* [PATCH 0/2] ACPI : Introduce support for _CCA object
@ 2015-04-29 13:44 ` Suravee Suthikulpanit
  0 siblings, 0 replies; 102+ messages in thread
From: Suravee Suthikulpanit @ 2015-04-29 13:44 UTC (permalink / raw)
  To: rjw, lenb, catalin.marinas, will.deacon
  Cc: msalter, hanjun.guo, al.stone, grant.likely, arnd, leo.duran,
	linux-arm-kernel, linux-acpi, linux-kernel, linaro-acpi,
	Suravee Suthikulpanit

This patch series introduces support for the _CCA object, which is currently
used mainly by ARM64 platforms to specify the DMA coherency attribute of
devices when booting with ACPI.

A copy of ACPIv6 can be found here:
    http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf

This has been tested on the AMD-Seattle platform, which implements the _CCA
object as described in the AMD Opteron A1100 Series Processor ACPI Porting Guide:

http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/Seattle_ACPI_Guide.pdf

Changes from RFC (https://lkml.org/lkml/2015/4/1/389):
    * New logic for deriving and propagating the coherent attribute from
      parent devices (by Mark).
    * Introduce the acpi_dma_is_coherent() API (per Tom's suggestion).
    * Introduce the CONFIG_ACPI_MUST_HAVE_CCA kernel configuration option.
    * Rebased to linux-4.1-rc1.
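
The inheritance rule from the first item can be modelled in plain userspace
C. This is only a sketch under stated assumptions, not the kernel code:
struct model_dev, enum cca_value and model_init_coherency() are hypothetical
stand-ins for struct acpi_device and the scan-time logic, and the fallback
for a missing _CCA is passed in as a parameter instead of coming from Kconfig.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: what evaluating _CCA on a node would yield. */
enum cca_value { CCA_ABSENT = -1, CCA_NONCOHERENT = 0, CCA_COHERENT = 1 };

/* Hypothetical miniature of an ACPI device node (not struct acpi_device). */
struct model_dev {
	const struct model_dev *parent;
	enum cca_value own_cca;
	bool cca_seen;      /* this node or an ancestor provided _CCA */
	bool is_coherent;   /* resolved DMA coherency */
};

/*
 * Resolve coherency for one node. The parent must already be resolved.
 * missing_default is the value used when neither the node nor any
 * ancestor provides _CCA.
 */
static void model_init_coherency(struct model_dev *dev, bool missing_default)
{
	const struct model_dev *parent;

	assert(dev != NULL);
	parent = dev->parent;

	if (parent && parent->cca_seen) {
		/* An ancestor already provided _CCA: inherit, ignore our own. */
		dev->cca_seen = true;
		dev->is_coherent = parent->is_coherent;
	} else if (dev->own_cca != CCA_ABSENT) {
		/* First _CCA seen on this branch of the namespace. */
		dev->cca_seen = true;
		dev->is_coherent = (dev->own_cca == CCA_COHERENT);
	} else {
		/* No _CCA anywhere above or on this node. */
		dev->cca_seen = false;
		dev->is_coherent = missing_default;
	}
}
```

With a parent that reports coherent and a child whose own _CCA says
non-coherent, the child still ends up coherent, because the ancestor's value
wins. Parents have to be resolved before their children for this to hold,
which is the order a top-down namespace walk would give.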

Suravee Suthikulpanit (2):
  arm/arm64: ACPI: Introduce CONFIG_ACPI_MUST_HAVE_CCA
  ACPI / scan: Parse _CCA and setup device coherency

 arch/arm/Kconfig             |  1 +
 arch/arm64/Kconfig           |  1 +
 drivers/acpi/Kconfig         |  3 +++
 drivers/acpi/acpi_platform.c |  5 ++++-
 drivers/acpi/scan.c          | 45 ++++++++++++++++++++++++++++++++++++++++++++
 include/acpi/acpi_bus.h      |  9 ++++++++-
 6 files changed, 62 insertions(+), 2 deletions(-)

-- 
2.1.0



* [PATCH 1/2] arm/arm64: ACPI: Introduce CONFIG_ACPI_MUST_HAVE_CCA
  2015-04-29 13:44 ` Suravee Suthikulpanit
  (?)
@ 2015-04-29 13:44   ` Suravee Suthikulpanit
  -1 siblings, 0 replies; 102+ messages in thread
From: Suravee Suthikulpanit @ 2015-04-29 13:44 UTC (permalink / raw)
  To: rjw, lenb, catalin.marinas, will.deacon
  Cc: msalter, hanjun.guo, al.stone, grant.likely, arnd, leo.duran,
	linux-arm-kernel, linux-acpi, linux-kernel, linaro-acpi,
	Suravee Suthikulpanit

ACPIv6 (http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf),
section 6.2.17 (_CCA), states that ARM platforms require the ACPI _CCA object
to be specified for DMA-capable devices. This patch introduces
ACPI_MUST_HAVE_CCA in the arm and arm64 Kconfig to express this requirement.

Note that when _CCA is required but missing from the DSDT, the ACPI driver
defaults to setting up devices as non-coherent.

Signed-off-by: Mark Salter <msalter@redhat.com>
Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
---
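
As a rough illustration of what the new symbol controls, the userspace sketch
below models the fallback used when _CCA cannot be evaluated.
MODEL_ACPI_MUST_HAVE_CCA and both function names are made up for this note;
in the kernel the check would be IS_ENABLED(CONFIG_ACPI_MUST_HAVE_CCA).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the Kconfig symbol. Build with
 * -DMODEL_ACPI_MUST_HAVE_CCA=1 to model an architecture that selects it.
 */
#ifndef MODEL_ACPI_MUST_HAVE_CCA
#define MODEL_ACPI_MUST_HAVE_CCA 0
#endif

static_assert(MODEL_ACPI_MUST_HAVE_CCA == 0 || MODEL_ACPI_MUST_HAVE_CCA == 1,
	      "MODEL_ACPI_MUST_HAVE_CCA must be 0 or 1");

/*
 * Fallback coherency when no _CCA is found:
 *   - _CCA required by the architecture -> assume non-coherent
 *   - _CCA not required                 -> assume coherent
 */
static bool model_default_coherency(bool must_have_cca)
{
	return !must_have_cca;
}

/* The same decision, fixed at build time by the macro above. */
static bool model_build_default(void)
{
	return model_default_coherency(MODEL_ACPI_MUST_HAVE_CCA);
}
```

The symbol has no prompt, so it can only be turned on by an architecture
selecting it, which is why the decision can be modelled as a build-time
constant.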
 arch/arm/Kconfig     | 1 +
 arch/arm64/Kconfig   | 1 +
 drivers/acpi/Kconfig | 3 +++
 3 files changed, 5 insertions(+)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 45df48b..2a0d036 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1,6 +1,7 @@
 config ARM
 	bool
 	default y
+	select ACPI_MUST_HAVE_CCA if ACPI
 	select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
 	select ARCH_HAS_ELF_RANDOMIZE
 	select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 4269dba..e5471f8 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1,6 +1,7 @@
 config ARM64
 	def_bool y
 	select ACPI_GENERIC_GSI if ACPI
+	select ACPI_MUST_HAVE_CCA if ACPI
 	select ACPI_REDUCED_HARDWARE_ONLY if ACPI
 	select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
 	select ARCH_HAS_ELF_RANDOMIZE
diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index ab2cbb5..620ee67 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -54,6 +54,9 @@ config ACPI_GENERIC_GSI
 config ACPI_SYSTEM_POWER_STATES_SUPPORT
 	bool
 
+config ACPI_MUST_HAVE_CCA
+	bool
+
 config ACPI_SLEEP
 	bool
 	depends on SUSPEND || HIBERNATION
-- 
2.1.0



* [PATCH 2/2] ACPI / scan: Parse _CCA and setup device coherency
  2015-04-29 13:44 ` Suravee Suthikulpanit
  (?)
@ 2015-04-29 13:44   ` Suravee Suthikulpanit
  -1 siblings, 0 replies; 102+ messages in thread
From: Suravee Suthikulpanit @ 2015-04-29 13:44 UTC (permalink / raw)
  To: rjw, lenb, catalin.marinas, will.deacon
  Cc: msalter, hanjun.guo, al.stone, grant.likely, arnd, leo.duran,
	linux-arm-kernel, linux-acpi, linux-kernel, linaro-acpi,
	Suravee Suthikulpanit

This patch implements support for the ACPI _CCA object, which was introduced
in ACPIv5.1 and can be used to specify the DMA coherency attribute of a device.

The parsing logic traverses the device namespace to parse coherency
information and stores it in acpi_device_flags. That information is then
passed to arch_setup_dma_ops() when creating each device enumerated in the
DSDT during the ACPI scan.

This patch also introduces acpi_dma_is_coherent(), which provides
an interface for device drivers to check the coherency information,
similar to of_dma_is_coherent().

Signed-off-by: Mark Salter <msalter@redhat.com>
Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
---
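
For readers who want to see the driver-facing side in isolation, here is a
small userspace sketch of the helper's NULL-safe shape and of one way a
driver might act on it. Everything prefixed model_ or MODEL_ is hypothetical,
including the two attribute profiles; only the "adev && flags.is_coherent"
pattern is taken from this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature of the two flag bits this patch adds. */
struct model_flags {
	unsigned int is_coherent:1;
	unsigned int cca_seen:1;
};

struct model_acpi_device {
	struct model_flags flags;
};

/* Same shape as acpi_dma_is_coherent(): a NULL device is non-coherent. */
static inline bool model_dma_is_coherent(const struct model_acpi_device *adev)
{
	return adev && adev->flags.is_coherent;
}

/* Hypothetical bus attribute profiles a driver could choose between. */
enum model_attrs { MODEL_ATTRS_UNCACHED, MODEL_ATTRS_CACHED };

/* Hypothetical driver decision based on the resolved coherency. */
static enum model_attrs model_pick_attrs(const struct model_acpi_device *adev)
{
	return model_dma_is_coherent(adev) ? MODEL_ATTRS_CACHED
					   : MODEL_ATTRS_UNCACHED;
}

/* Small self-check covering the coherent and the NULL case. */
static void model_self_check(void)
{
	struct model_acpi_device coherent = {
		.flags = { .is_coherent = 1, .cca_seen = 1 },
	};

	assert(model_dma_is_coherent(&coherent));
	assert(model_pick_attrs(&coherent) == MODEL_ATTRS_CACHED);
	assert(!model_dma_is_coherent(NULL));
}
```

Treating a NULL device as non-coherent keeps callers free of explicit NULL
checks and falls back to the more conservative setting.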
 drivers/acpi/acpi_platform.c |  5 ++++-
 drivers/acpi/scan.c          | 45 ++++++++++++++++++++++++++++++++++++++++++++
 include/acpi/acpi_bus.h      |  9 ++++++++-
 3 files changed, 57 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/acpi_platform.c b/drivers/acpi/acpi_platform.c
index 4bf7559..a4db208 100644
--- a/drivers/acpi/acpi_platform.c
+++ b/drivers/acpi/acpi_platform.c
@@ -108,9 +108,12 @@ struct platform_device *acpi_create_platform_device(struct acpi_device *adev)
 	if (IS_ERR(pdev))
 		dev_err(&adev->dev, "platform device creation failed: %ld\n",
 			PTR_ERR(pdev));
-	else
+	else {
+		arch_setup_dma_ops(&pdev->dev, 0, 0, NULL,
+				   adev->flags.is_coherent);
 		dev_dbg(&adev->dev, "created platform device %s\n",
 			dev_name(&pdev->dev));
+	}
 
 	kfree(resources);
 	return pdev;
diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index 849b699..509d0157 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -11,6 +11,7 @@
 #include <linux/kthread.h>
 #include <linux/dmi.h>
 #include <linux/nls.h>
+#include <linux/dma-mapping.h>
 
 #include <asm/pgtable.h>
 
@@ -2137,6 +2138,49 @@ void acpi_free_pnp_ids(struct acpi_device_pnp *pnp)
 	kfree(pnp->unique_id);
 }
 
+static void acpi_init_coherency(struct acpi_device *device)
+{
+	unsigned long long cca;
+	acpi_status status;
+	struct acpi_device *parent = device->parent;
+
+	if (parent && parent->flags.cca_seen) {
+		/*
+		 * From ACPIv5.1, OSPM will ignore _CCA if an ancestor
+		 * already saw one.
+		 */
+		device->flags.cca_seen = 1;
+		cca = acpi_dma_is_coherent(parent);
+	} else {
+		status = acpi_evaluate_integer(device->handle, "_CCA",
+					       NULL, &cca);
+		if (ACPI_SUCCESS(status)) {
+			device->flags.cca_seen = 1;
+		} else if (IS_ENABLED(CONFIG_ACPI_MUST_HAVE_CCA)) {
+			/*
+			 * Architecture has specified that if the device
+			 * can do DMA, it must have ACPI _CCA object.
+			 * Here, there could be two cases:
+			 *   1. Not DMA-able device.
+			 *   2. DMA-able device, but missing _CCA object.
+			 *
+			 * In both cases, we will default to dma non-coherent.
+			 */
+			cca = 0;
+		} else {
+			/*
+			 * If architecture does not specify that device must
+			 * specify ACPI _CCA (e.g. x86), we default to use
+			 * dma coherent.
+			 */
+			cca = 1;
+		}
+	}
+
+	device->flags.is_coherent = cca;
+	arch_setup_dma_ops(&device->dev, 0, 0, NULL, cca);
+}
+
 void acpi_init_device_object(struct acpi_device *device, acpi_handle handle,
 			     int type, unsigned long long sta)
 {
@@ -2155,6 +2199,7 @@ void acpi_init_device_object(struct acpi_device *device, acpi_handle handle,
 	device->flags.visited = false;
 	device_initialize(&device->dev);
 	dev_set_uevent_suppress(&device->dev, true);
+	acpi_init_coherency(device);
 }
 
 void acpi_device_add_finalize(struct acpi_device *device)
diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
index 8de4fa9..7e8cd4c 100644
--- a/include/acpi/acpi_bus.h
+++ b/include/acpi/acpi_bus.h
@@ -208,7 +208,9 @@ struct acpi_device_flags {
 	u32 visited:1;
 	u32 hotplug_notify:1;
 	u32 is_dock_station:1;
-	u32 reserved:23;
+	u32 is_coherent:1;
+	u32 cca_seen:1;
+	u32 reserved:21;
 };
 
 /* File System */
@@ -380,6 +382,11 @@ struct acpi_device {
 	void (*remove)(struct acpi_device *);
 };
 
+static inline bool acpi_dma_is_coherent(struct acpi_device *adev)
+{
+	return adev && adev->flags.is_coherent;
+}
+
 static inline bool is_acpi_node(struct fwnode_handle *fwnode)
 {
 	return fwnode && fwnode->type == FWNODE_ACPI;
-- 
2.1.0


* Re: [PATCH 2/2] ACPI / scan: Parse _CCA and setup device coherency
  2015-04-29 13:44   ` Suravee Suthikulpanit
@ 2015-04-29 14:03     ` Arnd Bergmann
  -1 siblings, 0 replies; 102+ messages in thread
From: Arnd Bergmann @ 2015-04-29 14:03 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Suravee Suthikulpanit, rjw, lenb, catalin.marinas, will.deacon,
	al.stone, linaro-acpi, linux-kernel, linux-acpi, leo.duran,
	hanjun.guo, msalter, grant.likely

On Wednesday 29 April 2015 08:44:09 Suravee Suthikulpanit wrote:
> +                       device->flags.cca_seen = 1;
> +               } else if (IS_ENABLED(CONFIG_ACPI_MUST_HAVE_CCA)) {
> +                       /*
> +                        * Architecture has specified that if the device
> +                        * can do DMA, it must have ACPI _CCA object.
> +                        * Here, there could be two cases:
> +                        *   1. Not DMA-able device.
> +                        *   2. DMA-able device, but missing _CCA object.
> +                        *
> +                        * In both cases, we will default to dma non-coherent.
> +                        */
> +                       cca = 0;
> +               } else {
> +                       /*
> +                        * If architecture does not specify that device must
> +                        * specify ACPI _CCA (e.g. x86), we default to use
> +                        * dma coherent.
> +                        */
> +                       cca = 1;
> +               }
> 

What does it mean here if a device does DMA but is not coherent? Do you
have an example of a server that needs this?

Can we please make the default for ARM64 cca=1 as well?

	Arnd


* Re: [PATCH 1/2] arm/arm64: ACPI: Introduce CONFIG_ACPI_MUST_HAVE_CCA
  2015-04-29 13:44   ` Suravee Suthikulpanit
@ 2015-04-29 14:04     ` Catalin Marinas
  -1 siblings, 0 replies; 102+ messages in thread
From: Catalin Marinas @ 2015-04-29 14:04 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: rjw, lenb, will.deacon, al.stone, arnd, linaro-acpi,
	linux-kernel, linux-acpi, leo.duran, hanjun.guo, msalter,
	grant.likely, linux-arm-kernel

On Wed, Apr 29, 2015 at 08:44:08AM -0500, Suravee Suthikulpanit wrote:
> ACPIv6 (http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf),
> section 6.2.17 (_CCA), states that ARM platforms require the ACPI _CCA object
> to be specified for DMA-capable devices. This patch introduces
> ACPI_MUST_HAVE_CCA in the arm and arm64 Kconfig to express this requirement.
> 
> Note that when _CCA is required but missing from the DSDT, the ACPI driver
> defaults to setting up devices as non-coherent.
> 
> Signed-off-by: Mark Salter <msalter@redhat.com>
> Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
> ---
>  arch/arm/Kconfig     | 1 +
>  arch/arm64/Kconfig   | 1 +
>  drivers/acpi/Kconfig | 3 +++
>  3 files changed, 5 insertions(+)
> 
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index 45df48b..2a0d036 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -1,6 +1,7 @@
>  config ARM
>  	bool
>  	default y
> +	select ACPI_MUST_HAVE_CCA if ACPI
>  	select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
>  	select ARCH_HAS_ELF_RANDOMIZE
>  	select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST

Any plans for ACPI on 32-bit ARM?

-- 
Catalin


* Re: [PATCH 1/2] arm/arm64: ACPI: Introduce CONFIG_ACPI_MUST_HAVE_CCA
  2015-04-29 14:04     ` Catalin Marinas
  (?)
@ 2015-04-29 14:31       ` Suravee Suthikulpanit
  -1 siblings, 0 replies; 102+ messages in thread
From: Suravee Suthikulpanit @ 2015-04-29 14:31 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: rjw, lenb, will.deacon, al.stone, arnd, linaro-acpi,
	linux-kernel, linux-acpi, leo.duran, hanjun.guo, msalter,
	grant.likely, linux-arm-kernel, Charles Garcia-Tobin



On 04/29/2015 09:04 AM, Catalin Marinas wrote:
> On Wed, Apr 29, 2015 at 08:44:08AM -0500, Suravee Suthikulpanit wrote:
>>  From ACPIv6 (http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf),
>> section 6.2.17 _CCA states that ARM platforms require ACPI _CCA object to be
>> specified for DMA-capable devices. This patch introduces ACPI_MUST_HAVE_CCA
>> in arm and arm64 Kconfig to specify such requirement.
>>
>> Note that when _CCA is required, if it is missing in the DSDT, the ACPI driver
>> will default to setting up devices as non-coherent.
>>
>> Signed-off-by: Mark Salter <msalter@redhat.com>
>> Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
>> ---
>>   arch/arm/Kconfig     | 1 +
>>   arch/arm64/Kconfig   | 1 +
>>   drivers/acpi/Kconfig | 3 +++
>>   3 files changed, 5 insertions(+)
>>
>> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
>> index 45df48b..2a0d036 100644
>> --- a/arch/arm/Kconfig
>> +++ b/arch/arm/Kconfig
>> @@ -1,6 +1,7 @@
>>   config ARM
>>   	bool
>>   	default y
>> +	select ACPI_MUST_HAVE_CCA if ACPI
>>   	select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
>>   	select ARCH_HAS_ELF_RANDOMIZE
>>   	select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
>
> Any plans for ACPI on 32-bit ARM?
>

Not that I am aware of, but I could be totally wrong. I am adding this
for 32-bit ARM because the ACPI spec mentions it.

If you think this is not necessary until we introduce ACPI for ARM32, it 
can be removed.

Thanks,

Suravee

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] arm/arm64: ACPI: Introduce CONFIG_ACPI_MUST_HAVE_CCA
  2015-04-29 14:31       ` Suravee Suthikulpanit
@ 2015-04-29 14:42         ` Catalin Marinas
  -1 siblings, 0 replies; 102+ messages in thread
From: Catalin Marinas @ 2015-04-29 14:42 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: al.stone, arnd, linaro-acpi, rjw, will.deacon, linux-kernel,
	linux-acpi, leo.duran, hanjun.guo, msalter, grant.likely,
	Charles Garcia-Tobin, linux-arm-kernel, lenb

On Wed, Apr 29, 2015 at 09:31:03AM -0500, Suravee Suthikulpanit wrote:
> 
> 
> On 04/29/2015 09:04 AM, Catalin Marinas wrote:
> >On Wed, Apr 29, 2015 at 08:44:08AM -0500, Suravee Suthikulpanit wrote:
> >> From ACPIv6 (http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf),
> >>section 6.2.17 _CCA states that ARM platforms require ACPI _CCA object to be
> >>specified for DMA-capable devices. This patch introduces ACPI_MUST_HAVE_CCA
> >>in arm and arm64 Kconfig to specify such requirement.
> >>
> >>Note that when _CCA is required, if it is missing in the DSDT, the ACPI driver
> >>will default to setting up devices as non-coherent.
> >>
> >>Signed-off-by: Mark Salter <msalter@redhat.com>
> >>Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
> >>---
> >>  arch/arm/Kconfig     | 1 +
> >>  arch/arm64/Kconfig   | 1 +
> >>  drivers/acpi/Kconfig | 3 +++
> >>  3 files changed, 5 insertions(+)
> >>
> >>diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> >>index 45df48b..2a0d036 100644
> >>--- a/arch/arm/Kconfig
> >>+++ b/arch/arm/Kconfig
> >>@@ -1,6 +1,7 @@
> >>  config ARM
> >>  	bool
> >>  	default y
> >>+	select ACPI_MUST_HAVE_CCA if ACPI
> >>  	select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
> >>  	select ARCH_HAS_ELF_RANDOMIZE
> >>  	select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
> >
> >Any plans for ACPI on 32-bit ARM?
> 
> Not that I am aware, but I could be totally wrong. The reason I am adding
> this here for 32-bit ARM is because the ACPI spec mentioned this.
> 
> If you think this is not necessary until we introduce ACPI for ARM32, it can
> be removed.

I think it should be removed (as long as ACPI cannot be selected on
arm32).

-- 
Catalin

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] arm/arm64: ACPI: Introduce CONFIG_ACPI_MUST_HAVE_CCA
  2015-04-29 14:42         ` Catalin Marinas
  (?)
@ 2015-04-29 14:44           ` Suravee Suthikulpanit
  -1 siblings, 0 replies; 102+ messages in thread
From: Suravee Suthikulpanit @ 2015-04-29 14:44 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: al.stone, arnd, linaro-acpi, rjw, will.deacon, linux-kernel,
	linux-acpi, leo.duran, hanjun.guo, msalter, grant.likely,
	Charles Garcia-Tobin, linux-arm-kernel, lenb



On 04/29/2015 09:42 AM, Catalin Marinas wrote:
> On Wed, Apr 29, 2015 at 09:31:03AM -0500, Suravee Suthikulpanit wrote:
>>
>>
>> On 04/29/2015 09:04 AM, Catalin Marinas wrote:
>>> On Wed, Apr 29, 2015 at 08:44:08AM -0500, Suravee Suthikulpanit wrote:
>>>>  From ACPIv6 (http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf),
>>>> section 6.2.17 _CCA states that ARM platforms require ACPI _CCA object to be
>>>> specified for DMA-capable devices. This patch introduces ACPI_MUST_HAVE_CCA
>>>> in arm and arm64 Kconfig to specify such requirement.
>>>>
>>>> Note that when _CCA is required, if it is missing in the DSDT, the ACPI driver
>>>> will default to setting up devices as non-coherent.
>>>>
>>>> Signed-off-by: Mark Salter <msalter@redhat.com>
>>>> Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
>>>> ---
>>>>   arch/arm/Kconfig     | 1 +
>>>>   arch/arm64/Kconfig   | 1 +
>>>>   drivers/acpi/Kconfig | 3 +++
>>>>   3 files changed, 5 insertions(+)
>>>>
>>>> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
>>>> index 45df48b..2a0d036 100644
>>>> --- a/arch/arm/Kconfig
>>>> +++ b/arch/arm/Kconfig
>>>> @@ -1,6 +1,7 @@
>>>>   config ARM
>>>>   	bool
>>>>   	default y
>>>> +	select ACPI_MUST_HAVE_CCA if ACPI
>>>>   	select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
>>>>   	select ARCH_HAS_ELF_RANDOMIZE
>>>>   	select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
>>>
>>> Any plans for ACPI on 32-bit ARM?
>>
>> Not that I am aware, but I could be totally wrong. The reason I am adding
>> this here for 32-bit ARM is because the ACPI spec mentioned this.
>>
>> If you think this is not necessary until we introduce ACPI for ARM32, it can
>> be removed.
>
> I think it should be removed (as long as ACPI cannot be selected on
> arm32).
>

Ok, I'll remove that in V2.

Thanks,

Suravee

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 2/2] ACPI / scan: Parse _CCA and setup device coherency
  2015-04-29 14:03     ` Arnd Bergmann
  (?)
@ 2015-04-29 14:45       ` Suravee Suthikulpanit
  -1 siblings, 0 replies; 102+ messages in thread
From: Suravee Suthikulpanit @ 2015-04-29 14:45 UTC (permalink / raw)
  To: Arnd Bergmann, linux-arm-kernel
  Cc: rjw, lenb, catalin.marinas, will.deacon, al.stone, linaro-acpi,
	linux-kernel, linux-acpi, leo.duran, hanjun.guo, msalter,
	grant.likely



On 04/29/2015 09:03 AM, Arnd Bergmann wrote:
> On Wednesday 29 April 2015 08:44:09 Suravee Suthikulpanit wrote:
>> +                       device->flags.cca_seen = 1;
>> +               } else if (IS_ENABLED(CONFIG_ACPI_MUST_HAVE_CCA)) {
>> +                       /*
>> +                        * Architecture has specified that if the device
>> +                        * can do DMA, it must have ACPI _CCA object.
>> +                        * Here, there could be two cases:
>> +                        *   1. Not DMA-able device.
>> +                        *   2. DMA-able device, but missing _CCA object.
>> +                        *
>> +                        * In both cases, we will default to dma non-coherent.
>> +                        */
>> +                       cca = 0;
>> +               } else {
>> +                       /*
>> +                        * If architecture does not specify that device must
>> +                        * specify ACPI _CCA (e.g. x86), we default to use
>> +                        * dma coherent.
>> +                        */
>> +                       cca = 1;
>> +               }
>>
>
> What does it mean here if a device does DMA but is not coherent? Do you
> have an example of a server that needs this?
>
> Can we please make the default for ARM64 cca=1 as well?
>
> 	Arnd
>

Actually, I am trying to make the behavior when _CCA is missing
consistent with the behavior when the devicetree node does not specify
the "dma-coherent" property. IIUC, in that case Linux defaults to
non-coherent DMA.

Thanks,

Suravee
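
For readers following the thread, the three-way fallback in the quoted hunk
can be modelled in plain user-space C. This is only a sketch under stated
assumptions, not the code from the patch: struct cca_info and
dma_is_coherent() are hypothetical names, and the must_have_cca argument
stands in for IS_ENABLED(CONFIG_ACPI_MUST_HAVE_CCA).

```c
#include <stdbool.h>

/* Hypothetical model of what firmware reported for one device. */
struct cca_info {
	bool present;	/* was a _CCA object found for the device? */
	int value;	/* _CCA result when present: 1 = coherent, 0 = not */
};

/*
 * Hypothetical helper. must_have_cca stands in for
 * IS_ENABLED(CONFIG_ACPI_MUST_HAVE_CCA) in the quoted hunk.
 */
static inline bool dma_is_coherent(struct cca_info info, bool must_have_cca)
{
	if (info.present)
		return info.value != 0;	/* firmware stated it explicitly */
	if (must_have_cca)
		return false;		/* required but missing: non-coherent */
	return true;			/* not required (e.g. x86): coherent */
}
```

Under this model, Arnd's request corresponds to returning true from the
second branch as well, which would make a missing _CCA behave the same on
every architecture.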

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Linaro-acpi] [PATCH 2/2] ACPI / scan: Parse _CCA and setup device coherency
  2015-04-29 14:45       ` Suravee Suthikulpanit
@ 2015-04-29 14:47         ` Arnd Bergmann
  -1 siblings, 0 replies; 102+ messages in thread
From: Arnd Bergmann @ 2015-04-29 14:47 UTC (permalink / raw)
  To: linaro-acpi
  Cc: Suravee Suthikulpanit, linux-arm-kernel, catalin.marinas, rjw,
	linux-kernel, will.deacon, linux-acpi, lenb

On Wednesday 29 April 2015 09:45:43 Suravee Suthikulpanit wrote:
> On 04/29/2015 09:03 AM, Arnd Bergmann wrote:
> > On Wednesday 29 April 2015 08:44:09 Suravee Suthikulpanit wrote:
> >> +                       device->flags.cca_seen = 1;
> >> +               } else if (IS_ENABLED(CONFIG_ACPI_MUST_HAVE_CCA)) {
> >> +                       /*
> >> +                        * Architecture has specified that if the device
> >> +                        * can do DMA, it must have ACPI _CCA object.
> >> +                        * Here, there could be two cases:
> >> +                        *   1. Not DMA-able device.
> >> +                        *   2. DMA-able device, but missing _CCA object.
> >> +                        *
> >> +                        * In both cases, we will default to dma non-coherent.
> >> +                        */
> >> +                       cca = 0;
> >> +               } else {
> >> +                       /*
> >> +                        * If architecture does not specify that device must
> >> +                        * specify ACPI _CCA (e.g. x86), we default to use
> >> +                        * dma coherent.
> >> +                        */
> >> +                       cca = 1;
> >> +               }
> >>
> >
> > What does it mean here if a device does DMA but is not coherent? Do you
> > have an example of a server that needs this?
> >
> > Can we please make the default for ARM64 cca=1 as well?
> >
> >       Arnd
> >
> 
> Actually, I am trying to implement the logic for when missing _CCA to be 
> consistent with the behavior when the devicetree entry does not specify 
> "dma-coherent" property. IIUC, in such case, Linux will default to using 
> non-coherent DMA.

Why?

	Arnd

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Linaro-acpi] [PATCH 2/2] ACPI / scan: Parse _CCA and setup device coherency
  2015-04-29 14:47         ` Arnd Bergmann
  (?)
@ 2015-04-29 14:57           ` Suthikulpanit, Suravee
  -1 siblings, 0 replies; 102+ messages in thread
From: Suthikulpanit, Suravee @ 2015-04-29 14:57 UTC (permalink / raw)
  To: Arnd Bergmann, linaro-acpi
  Cc: linux-arm-kernel, catalin.marinas, rjw, linux-kernel,
	will.deacon, linux-acpi, lenb, Charles Garcia-Tobin



On 4/29/15, 09:47, "Arnd Bergmann" <arnd@arndb.de> wrote:

>On Wednesday 29 April 2015 09:45:43 Suravee Suthikulpanit wrote:
>> On 04/29/2015 09:03 AM, Arnd Bergmann wrote:
>> > On Wednesday 29 April 2015 08:44:09 Suravee Suthikulpanit wrote:
>> >> +                       device->flags.cca_seen = 1;
>> >> +               } else if (IS_ENABLED(CONFIG_ACPI_MUST_HAVE_CCA)) {
>> >> +                       /*
>> >> +                        * Architecture has specified that if the device
>> >> +                        * can do DMA, it must have ACPI _CCA object.
>> >> +                        * Here, there could be two cases:
>> >> +                        *   1. Not DMA-able device.
>> >> +                        *   2. DMA-able device, but missing _CCA object.
>> >> +                        *
>> >> +                        * In both cases, we will default to dma non-coherent.
>> >> +                        */
>> >> +                       cca = 0;
>> >> +               } else {
>> >> +                       /*
>> >> +                        * If architecture does not specify that device must
>> >> +                        * specify ACPI _CCA (e.g. x86), we default to use
>> >> +                        * dma coherent.
>> >> +                        */
>> >> +                       cca = 1;
>> >> +               }
>> >>
>> >
>> > What does it mean here if a device does DMA but is not coherent? Do you
>> > have an example of a server that needs this?
>> >
>> > Can we please make the default for ARM64 cca=1 as well?
>> >
>> >       Arnd
>> >
>>
>> Actually, I am trying to implement the logic for when missing _CCA to be
>> consistent with the behavior when the devicetree entry does not specify
>> "dma-coherent" property. IIUC, in such case, Linux will default to using
>> non-coherent DMA.
>
>Why?
>
>	Arnd

Otherwise, it would seem inconsistent with what the ACPI spec states:

  CCA objects are only relevant for devices that can access CPU-visible
  memory, such as devices that are DMA capable. On ARM based systems, the
  _CCA object must be supplied all such devices. On Intel platforms, if
  the _CCA object is not supplied, the OSPM will assume the devices are
  hardware cache coherent.

From the statement above, I interpreted that if _CCA is not present, the
device would be non-coherent.

Suravee

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Linaro-acpi] [PATCH 2/2] ACPI / scan: Parse _CCA and setup device coherency
@ 2015-04-29 14:57           ` Suthikulpanit, Suravee
  0 siblings, 0 replies; 102+ messages in thread
From: Suthikulpanit, Suravee @ 2015-04-29 14:57 UTC (permalink / raw)
  To: Arnd Bergmann, linaro-acpi
  Cc: linux-arm-kernel, catalin.marinas, rjw, linux-kernel,
	will.deacon, linux-acpi, lenb, Charles Garcia-Tobin



On 4/29/15, 09:47, "Arnd Bergmann" <arnd@arndb.de> wrote:

>On Wednesday 29 April 2015 09:45:43 Suravee Suthikulpanit wrote:
>> On 04/29/2015 09:03 AM, Arnd Bergmann wrote:
>> > On Wednesday 29 April 2015 08:44:09 Suravee Suthikulpanit wrote:
>> >> +                       device->flags.cca_seen = 1;
>> >> +               } else if (IS_ENABLED(CONFIG_ACPI_MUST_HAVE_CCA)) {
>> >> +                       /*
>> >> +                        * Architecture has specified that if the device
>> >> +                        * can do DMA, it must have ACPI _CCA object.
>> >> +                        * Here, there could be two cases:
>> >> +                        *   1. Not DMA-able device.
>> >> +                        *   2. DMA-able device, but missing _CCA object.
>> >> +                        *
>> >> +                        * In both cases, we will default to dma non-coherent.
>> >> +                        */
>> >> +                       cca = 0;
>> >> +               } else {
>> >> +                       /*
>> >> +                        * If architecture does not specify that device must
>> >> +                        * specify ACPI _CCA (e.g. x86), we default to use
>> >> +                        * dma coherent.
>> >> +                        */
>> >> +                       cca = 1;
>> >> +               }
>> >>
>> >
>> > What does it mean here if a device does DMA but is not coherent? Do you
>> > have an example of a server that needs this?
>> >
>> > Can we please make the default for ARM64 cca=1 as well?
>> >
>> >       Arnd
>> >
>>
>> Actually, I am trying to implement the logic for when missing _CCA to be
>> consistent with the behavior when the devicetree entry does not specify
>> "dma-coherent" property. IIUC, in such case, Linux will default to using
>> non-coherent DMA.
>
>Why?
>
>	Arnd

Otherwise, it would seem inconsistent with what is stated in the ACPI spec:

  CCA objects are only relevant for devices that can access CPU-visible memory,
  such as devices that are DMA capable. On ARM based systems, the _CCA object
  must be supplied all such devices. On Intel platforms, if the _CCA object is
  not supplied, the OSPM will assume the devices are hardware cache coherent.

From the statement above, I interpreted a missing _CCA as meaning
non-coherent.
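
To make the cases concrete, here is a minimal standalone sketch of the
default selection in the hunk quoted above. It is only a model and not the
patch itself: enum cca_state, cca_default_coherent() and the must_have_cca
flag (standing in for IS_ENABLED(CONFIG_ACPI_MUST_HAVE_CCA)) are made-up
names, and it assumes that a _CCA value supplied by firmware is used as-is.

```c
#include <stdbool.h>

/* Hypothetical tri-state for what firmware reported; not a kernel type. */
enum cca_state { CCA_MISSING, CCA_ZERO, CCA_ONE };

/*
 * Model of the default selection in the quoted hunk.
 * must_have_cca stands in for IS_ENABLED(CONFIG_ACPI_MUST_HAVE_CCA).
 */
static bool cca_default_coherent(enum cca_state cca, bool must_have_cca)
{
	/* Assumption: a value supplied by firmware is taken as-is. */
	if (cca != CCA_MISSING)
		return cca == CCA_ONE;

	/*
	 * _CCA is missing: non-coherent where the architecture requires
	 * _CCA (e.g. arm64), coherent otherwise (e.g. x86).
	 */
	return !must_have_cca;
}
```

With must_have_cca set, a missing _CCA and _CCA=0 give the same result,
which is the behavior questioned earlier in this thread.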

Suravee


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Linaro-acpi] [PATCH 2/2] ACPI / scan: Parse _CCA and setup device coherency
  2015-04-29 14:57           ` Suthikulpanit, Suravee
  (?)
@ 2015-04-29 15:39             ` Al Stone
  -1 siblings, 0 replies; 102+ messages in thread
From: Al Stone @ 2015-04-29 15:39 UTC (permalink / raw)
  To: Suthikulpanit, Suravee, Arnd Bergmann, linaro-acpi
  Cc: rjw, catalin.marinas, will.deacon, linux-kernel, linux-acpi,
	linux-arm-kernel, lenb

On 04/29/2015 08:57 AM, Suthikulpanit, Suravee wrote:
> 
> 
> On 4/29/15, 09:47, "Arnd Bergmann" <arnd@arndb.de> wrote:
> 
>> On Wednesday 29 April 2015 09:45:43 Suravee Suthikulpanit wrote:
>>> On 04/29/2015 09:03 AM, Arnd Bergmann wrote:
>>>> On Wednesday 29 April 2015 08:44:09 Suravee Suthikulpanit wrote:
>>>>> +                       device->flags.cca_seen = 1;
>>>>> +               } else if (IS_ENABLED(CONFIG_ACPI_MUST_HAVE_CCA)) {
>>>>> +                       /*
>>>>> +                        * Architecture has specified that if the device
>>>>> +                        * can do DMA, it must have ACPI _CCA object.
>>>>> +                        * Here, there could be two cases:
>>>>> +                        *   1. Not DMA-able device.
>>>>> +                        *   2. DMA-able device, but missing _CCA object.
>>>>> +                        *
>>>>> +                        * In both cases, we will default to dma non-coherent.
>>>>> +                        */
>>>>> +                       cca = 0;
>>>>> +               } else {
>>>>> +                       /*
>>>>> +                        * If architecture does not specify that device must
>>>>> +                        * specify ACPI _CCA (e.g. x86), we default to use
>>>>> +                        * dma coherent.
>>>>> +                        */
>>>>> +                       cca = 1;
>>>>> +               }
>>>>>
>>>>
>>>> What does it mean here if a device does DMA but is not coherent? Do you
>>>> have an example of a server that needs this?
>>>>
>>>> Can we please make the default for ARM64 cca=1 as well?
>>>>
>>>>       Arnd
>>>>
>>>
>>> Actually, I am trying to implement the logic for when missing _CCA to be
>>> consistent with the behavior when the devicetree entry does not specify
>>> "dma-coherent" property. IIUC, in such case, Linux will default to using
>>> non-coherent DMA.
>>
>> Why?
>>
>> 	Arnd
> 
> Otherwise, it would seem inconsistent with what is stated in the ACPI spec:
>
>   CCA objects are only relevant for devices that can access CPU-visible memory,
>   such as devices that are DMA capable. On ARM based systems, the _CCA object
>   must be supplied all such devices. On Intel platforms, if the _CCA object is
>   not supplied, the OSPM will assume the devices are hardware cache coherent.
>
> From the statement above, I interpreted a missing _CCA as meaning
> non-coherent.
> 
> Suravee

A little background to Suravee's statement...

When the spec was being changed for _CCA, it was determined by the ASWG
that there was no reasonable default -- either choice would break something.
Multiple OSs, SoC vendors, and platform vendors were asked.  So, the spec
says for ARMv8, _CCA must be specified when needed and is not assumed to have
any value.  Obviously, any OS can choose to behave differently, but that's
what was specified and why it was specified that way.

-- 
ciao,
al
-----------------------------------
Al Stone
Software Engineer
Linaro Enterprise Group
al.stone@linaro.org
-----------------------------------

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Linaro-acpi] [PATCH 2/2] ACPI / scan: Parse _CCA and setup device coherency
  2015-04-29 14:57           ` Suthikulpanit, Suravee
  (?)
@ 2015-04-29 15:54             ` Arnd Bergmann
  -1 siblings, 0 replies; 102+ messages in thread
From: Arnd Bergmann @ 2015-04-29 15:54 UTC (permalink / raw)
  To: Suthikulpanit, Suravee
  Cc: linaro-acpi, linux-arm-kernel, catalin.marinas, rjw,
	linux-kernel, will.deacon, linux-acpi, lenb,
	Charles Garcia-Tobin

On Wednesday 29 April 2015 14:57:10 Suthikulpanit, Suravee wrote:
> Otherwise, it would seem inconsistent with what is stated in the ACPI spec:
>
>   CCA objects are only relevant for devices that can access CPU-visible memory,
>   such as devices that are DMA capable. On ARM based systems, the _CCA object
>   must be supplied all such devices. On Intel platforms, if the _CCA object is
>   not supplied, the OSPM will assume the devices are hardware cache coherent.
>
> From the statement above, I interpreted a missing _CCA as meaning
> non-coherent.
>

My guess is that this section was included for Windows Phone, which runs
on embedded SoCs that usually have noncoherent DMA in a particular way.

On ARM, however, Linux only uses ACPI for servers, so that case does not happen.

I guess it would be reasonable to add a run-time warning here if you
try to do DMA on a device that does not have CCA set, and you should
probably set the DMA mask to 0 in that case as well.
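
A rough standalone sketch of that idea, just to illustrate the shape of the
check: struct cca_dev, cca_dev_init() and cca_dev_map() are hypothetical
names and not kernel interfaces, and a 32-bit default mask is assumed for
devices that do have _CCA.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-device DMA state; this is not struct device. */
struct cca_dev {
	const char *name;
	bool cca_present;
	uint64_t dma_mask;	/* 0 means no DMA is allowed */
};

static void cca_dev_init(struct cca_dev *d, const char *name, bool cca_present)
{
	d->name = name;
	d->cca_present = cca_present;
	/* Assumed default: 32-bit mask with _CCA, zero mask without it. */
	d->dma_mask = cca_present ? 0xffffffffULL : 0;
}

/* Returns 0 if addr may be used for DMA, -1 otherwise. */
static int cca_dev_map(const struct cca_dev *d, uint64_t addr)
{
	if (d->dma_mask == 0) {
		/* The run-time warning: DMA attempted without _CCA. */
		fprintf(stderr, "%s: DMA attempted without _CCA\n", d->name);
		return -1;
	}
	/* Reject addresses that do not fit in the device's mask. */
	return (addr & ~d->dma_mask) ? -1 : 0;
}
```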

Note that there are lots of ways in which you could have noncoherent DMA:
the default on ARM32 is that it requires uncached access or explicit
cache flushes, but it's also possible to have an SMP system where a device
is only coherent with some of the CPUs and requires explicit synchronization
(not flushes) otherwise. In a multi-level cache hierarchy, there could be
all sorts of combinations of flushes and syncs you would need to do.

With DT, we handle this using SoC-specific overrides for platforms that
are noncoherent in funny ways, see
http://lxr.free-electrons.com/source/arch/arm/mach-mvebu/coherency.c?v=3.18#L263
for instance. If we just disallow DMA to devices that are marked with _CCA=0
in ACPI, we can avoid this case, or discuss it by the time someone has hardware
that wants it, and then make a more informed decision about it.

	Arnd

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Linaro-acpi] [PATCH 2/2] ACPI / scan: Parse _CCA and setup device coherency
  2015-04-29 15:39             ` Al Stone
  (?)
@ 2015-04-29 16:15               ` Arnd Bergmann
  -1 siblings, 0 replies; 102+ messages in thread
From: Arnd Bergmann @ 2015-04-29 16:15 UTC (permalink / raw)
  To: linaro-acpi
  Cc: Al Stone, Suthikulpanit, Suravee, catalin.marinas, rjw,
	linux-kernel, will.deacon, linux-acpi, linux-arm-kernel, lenb

On Wednesday 29 April 2015 09:39:25 Al Stone wrote:
> 
> When the spec was being changed for _CCA, it was determined by the ASWG
> that there was no reasonable default -- either choice would break something.
> Multiple OSs, SoC vendors, and platform vendors were asked.  So, the spec
> says for ARMv8, _CCA must be specified when needed and is not assumed to have
> any value.  Obviously, any OS can choose to behave differently, but that's
> what was specified and why it was specified that way.

Ok, so it was essentially a CYA strategy. Since for Linux we are only
interested in server parts here, but we also want to be compliant,
I'd still argue that we check the property value and just disallow DMA
for any device that lacks _CCA or has it set to zero.

The current patch actually implements non-standard behavior: if _CCA
is missing, it registers the device as dma-capable with coherency turned
off, where my interpretation of the cited standard would be that we
treat a missing _CCA as not being able to perform DMA. What I'd like
to see instead is to only enable DMA support if _CCA is present and
enabled.

	Arnd

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 2/2] ACPI / scan: Parse _CCA and setup device coherency
  2015-04-29 13:44   ` Suravee Suthikulpanit
@ 2015-04-29 16:25     ` Arnd Bergmann
  -1 siblings, 0 replies; 102+ messages in thread
From: Arnd Bergmann @ 2015-04-29 16:25 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Suravee Suthikulpanit, rjw, lenb, catalin.marinas, will.deacon,
	al.stone, linaro-acpi, linux-kernel, linux-acpi, leo.duran,
	hanjun.guo, msalter, grant.likely

On Wednesday 29 April 2015 08:44:09 Suravee Suthikulpanit wrote:
> diff --git a/drivers/acpi/acpi_platform.c b/drivers/acpi/acpi_platform.c
> index 4bf7559..a4db208 100644
> --- a/drivers/acpi/acpi_platform.c
> +++ b/drivers/acpi/acpi_platform.c
> @@ -108,9 +108,12 @@ struct platform_device *acpi_create_platform_device(struct acpi_device *adev)
>         if (IS_ERR(pdev))
>                 dev_err(&adev->dev, "platform device creation failed: %ld\n",
>                         PTR_ERR(pdev));
> -       else
> +       else {
> +               arch_setup_dma_ops(&pdev->dev, 0, 0, NULL,
> +                                  adev->flags.is_coherent);
>                 dev_dbg(&adev->dev, "created platform device %s\n",
>                         dev_name(&pdev->dev));
> +       }
>  
>         kfree(resources);
> 

Looking at this code in more detail, it seems that it unconditionally
sets pdevinfo.dma_mask = DMA_BIT_MASK(32) before calling
arch_setup_dma_ops(). This assignment should really be done inside of
arch_setup_dma_ops() instead, which means we should implement that
function on all architectures that support ACPI.

For the case where _CCA is missing (or coherency disabled, if you ask
me), we would not call that function.

On a related note, I'm not sure how to handle different DMA masks here.
arch_setup_dma_ops() gets passed a size (and offset) argument, which should
match the DMA mask, but I don't know if there is a way to find out the
size from ACPI. Should we assume it's always 64-bit DMA capable?

For legacy reasons, the default mask is probably best left at 32-bit,
but drivers are expected to call dma_set_mask() if they can do 64-bit DMA,
and that should fail based on the information provided by the platform
if the bus is not capable of doing that.
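
As a standalone sketch of what I mean (a model of the behavior described
here, not the kernel's dma_set_mask() implementation): struct mask_dev,
mask_dev_init() and sketch_set_mask() are hypothetical, and bus_limit is
assumed to come from whatever the platform firmware describes.

```c
#include <stdint.h>

/* Same shape as the kernel's DMA_BIT_MASK(); renamed to stay standalone. */
#define SKETCH_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical device state for this sketch only. */
struct mask_dev {
	uint64_t bus_limit;	/* highest address the bus can carry */
	uint64_t dma_mask;	/* mask currently in effect */
};

static void mask_dev_init(struct mask_dev *d, uint64_t bus_limit)
{
	d->bus_limit = bus_limit;
	d->dma_mask = SKETCH_BIT_MASK(32);	/* legacy 32-bit default */
}

/* Returns 0 on success, -1 if the bus cannot carry the requested mask. */
static int sketch_set_mask(struct mask_dev *d, uint64_t mask)
{
	if (mask & ~d->bus_limit)
		return -1;	/* leave the previous mask untouched */
	d->dma_mask = mask;
	return 0;
}
```

A driver that can do 64-bit DMA would ask for SKETCH_BIT_MASK(64) and keep
the 32-bit default when that request fails.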

	Arnd

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 2/2] ACPI / scan: Parse _CCA and setup device coherency
  2015-04-29 16:25     ` Arnd Bergmann
  (?)
@ 2015-04-29 21:53       ` Suravee Suthikulpanit
  -1 siblings, 0 replies; 102+ messages in thread
From: Suravee Suthikulpanit @ 2015-04-29 21:53 UTC (permalink / raw)
  To: Arnd Bergmann, linux-arm-kernel
  Cc: rjw, lenb, catalin.marinas, will.deacon, al.stone, linaro-acpi,
	linux-kernel, linux-acpi, leo.duran, hanjun.guo, msalter,
	grant.likely

On 4/29/15 11:25, Arnd Bergmann wrote:
> On Wednesday 29 April 2015 08:44:09 Suravee Suthikulpanit wrote:
>> diff --git a/drivers/acpi/acpi_platform.c b/drivers/acpi/acpi_platform.c
>> index 4bf7559..a4db208 100644
>> --- a/drivers/acpi/acpi_platform.c
>> +++ b/drivers/acpi/acpi_platform.c
>> @@ -108,9 +108,12 @@ struct platform_device *acpi_create_platform_device(struct acpi_device *adev)
>>          if (IS_ERR(pdev))
>>                  dev_err(&adev->dev, "platform device creation failed: %ld\n",
>>                          PTR_ERR(pdev));
>> -       else
>> +       else {
>> +               arch_setup_dma_ops(&pdev->dev, 0, 0, NULL,
>> +                                  adev->flags.is_coherent);
>>                  dev_dbg(&adev->dev, "created platform device %s\n",
>>                          dev_name(&pdev->dev));
>> +       }
>>
>>          kfree(resources);
>>
>
> Looking at this code in more detail, it seems that it unconditionally
> sets pdevinfo.dma_mask = DMA_BIT_MASK(32), before calling
> arch_setup_dma_ops().

I think that's just the default legacy value assigned when the
platform_device is first created from the acpi_device.

> This assignment should really be done inside of arch_setup_dma_ops()
> instead, which means we should implement that function on all
> architectures that support ACPI.


> For the case where _CCA is missing (or coherency disabled, if you ask
> me), we would not call that function.

Actually, I agree that when _CCA is missing but needed, the ACPI driver
should probably not make any assumption and should leave the decision to
the underlying arch-specific default. Basically, it should not be
calling arch_setup_dma_ops().

As for the case where _CCA=0, I think the ACPI driver should simply
communicate that the HW is non-coherent, as described in the spec, and
should call arch_setup_dma_ops() with the coherent argument set to false.
It is true that this is probably less likely on ARM64 server platforms.
However, I would think that the ACPI driver should not be making such an
assumption.
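
A minimal userspace sketch of the three cases above (missing _CCA, _CCA=0,
_CCA=1). The enum, the struct and the helper name are hypothetical and are
not taken from the patch; the sketch only models the decision of whether
arch_setup_dma_ops() would be called and with which coherent value.

```c
#include <stdbool.h>

/* Hypothetical tri-state for the result of evaluating _CCA. */
enum cca_state {
	CCA_MISSING,       /* no _CCA found although one is needed */
	CCA_NON_COHERENT,  /* _CCA evaluated to 0 */
	CCA_COHERENT,      /* _CCA evaluated to 1 */
};

/* Hypothetical summary of what the ACPI core would do for a device. */
struct dma_setup {
	bool call_arch_setup;  /* call arch_setup_dma_ops() at all? */
	bool coherent;         /* coherent argument, only valid if called */
};

static struct dma_setup cca_to_dma_setup(enum cca_state cca)
{
	struct dma_setup s = { .call_arch_setup = false, .coherent = false };

	switch (cca) {
	case CCA_MISSING:
		/* Make no assumption; leave the arch default untouched. */
		break;
	case CCA_NON_COHERENT:
		/* Pass the non-coherent attribute on to the arch code. */
		s.call_arch_setup = true;
		break;
	case CCA_COHERENT:
		s.call_arch_setup = true;
		s.coherent = true;
		break;
	}
	return s;
}
```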

> On a related note, I'm not sure how to handle different DMA masks here.
> arch_setup_dma_ops() gets passed a size (and offset) argument, which should
> match the DMA mask, but I don't know if there is a way to find out the
> size from ACPI. Should we assume it's always 64-bit DMA capable?

Looking at the ACPI spec, it does have the _DMA object. IIUC, this can 
be used to describe DMA properties of a particular bus.

Method(_DMA, ResourceTemplate()
{
	QWORDMemory(
	ResourceConsumer,
	PosDecode, // _DEC
	MinFixed, // _MIF
	MaxFixed, // _MAF
	Prefetchable, // _MEM
	ReadWrite, // _RW
	0, // _GRA
	0, // _MIN
	0x1fffffff, // _MAX
	0x200000000, // _TRA
	0x20000000, // _LEN
	, , ,	
	)
}

I am not sure if this is an appropriate use for this object, but this 
seems to be similar to the dma-ranges property for OF, and probably can 
be used to specify baseaddr and size information when calling 
arch_setup_dma_ops().
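
As a rough sketch of that idea, assume the address fields of the
QWORDMemory descriptor above have already been extracted into a
hypothetical struct dma_window. The sketch assumes _TRA is the offset
added to a child-side address as it crosses the bridge; which of the
resulting values would be passed as the base argument of
arch_setup_dma_ops() is left open here.

```c
#include <stdint.h>

/* Hypothetical holder for the address fields of one _DMA descriptor. */
struct dma_window {
	uint64_t min;  /* _MIN: lowest child-side (bus) address */
	uint64_t max;  /* _MAX: highest child-side (bus) address */
	uint64_t tra;  /* _TRA: translation offset (assumed: added upstream) */
	uint64_t len;  /* _LEN: length of the range in bytes */
};

/* Parent-side start of the window: bus address plus translation offset. */
static uint64_t dma_window_parent_base(const struct dma_window *w)
{
	return w->min + w->tra;
}

/* Returns 1 if _LEN agrees with the _MIN.._MAX span, 0 otherwise. */
static int dma_window_is_consistent(const struct dma_window *w)
{
	return w->len != 0 && w->max >= w->min &&
	       w->max - w->min == w->len - 1;
}
```

With the values from the example (_MIN = 0, _MAX = 0x1fffffff,
_TRA = 0x200000000, _LEN = 0x20000000), the window is consistent, covers
0x20000000 bytes and starts at 0x200000000 on the parent side.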

> For legacy reasons, the default mask is probably best left at 32-bit,
> but drivers are expected to call dma_set_mask() if they can do 64-bit DMA,
> and that should fail based on the information provided by the platform
> if the bus is not capable of doing that.
>
> 	Arnd
>

However, on ARM64 the dma_base and size parameters of
arch_setup_dma_ops() are currently not used; only the coherent flag is
used. We should probably look at this separately. For the moment, we can
probably say that if the _CCA object is missing when needed, the ACPI driver
won't set up dma_mask when creating the platform_device, which should be
equivalent to saying DMA is not supported.

Please let me know if this is acceptable, and I'll make the change in V2
accordingly.

Thanks,

Suravee

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Linaro-acpi] [PATCH 2/2] ACPI / scan: Parse _CCA and setup device coherency
  2015-04-29 21:53       ` Suravee Suthikulpanit
@ 2015-04-30  8:23         ` Arnd Bergmann
  -1 siblings, 0 replies; 102+ messages in thread
From: Arnd Bergmann @ 2015-04-30  8:23 UTC (permalink / raw)
  To: linaro-acpi
  Cc: Suravee Suthikulpanit, linux-arm-kernel, catalin.marinas, rjw,
	linux-kernel, will.deacon, linux-acpi, lenb

On Wednesday 29 April 2015 16:53:10 Suravee Suthikulpanit wrote:
> On 4/29/15 11:25, Arnd Bergmann wrote:
> > On Wednesday 29 April 2015 08:44:09 Suravee Suthikulpanit wrote:
> >> diff --git a/drivers/acpi/acpi_platform.c b/drivers/acpi/acpi_platform.c
> >> index 4bf7559..a4db208 100644
> >> --- a/drivers/acpi/acpi_platform.c
> >> +++ b/drivers/acpi/acpi_platform.c
> >> @@ -108,9 +108,12 @@ struct platform_device *acpi_create_platform_device(struct acpi_device *adev)
> >>          if (IS_ERR(pdev))
> >>                  dev_err(&adev->dev, "platform device creation failed: %ld\n",
> >>                          PTR_ERR(pdev));
> >> -       else
> >> +       else {
> >> +               arch_setup_dma_ops(&pdev->dev, 0, 0, NULL,
> >> +                                  adev->flags.is_coherent);
> >>                  dev_dbg(&adev->dev, "created platform device %s\n",
> >>                          dev_name(&pdev->dev));
> >> +       }
> >>
> >>          kfree(resources);
> >>
> >
> > Looking at this code in more detail, it seems that it unconditionally
> > sets pdevinfo.dma_mask = DMA_BIT_MASK(32), before calling
> > arch_setup_dma_ops().
> 
> I think that's just the default legacy value assigned when it first 
> create the platform_device from acpi_device.

Understood. And on x86 there is no way to find out if a device supports
DMA or not, so it has to do this I guess.

> > This assignment should really be done inside of arch_setup_dma_ops()
> > instead, which means we should implement that
> > function on all architectures that support ACPI.
> 
> 
> > For the case where _CCA is missing (or coherency disabled, if you ask
> > me), we would not call that function.
> 
> Actually, I agree for the case of missing _CCA when needed, ACPI driver 
> probably should not make assumption and leave the decision for the 
> default underlying arch-specific default. Basically, it should not be 
> calling arch_setup_dma_ops().

Ok.

> As for the case where _CCA=0, I think the ACPI driver should essentially 
> communicate the information as HW is non-coherent as described in the 
> spec, and should be calling arch_setup_dma_ops(dev, false). It is true 
> that this in probably less-likely for the ARM64 server platforms. 
> However, I would think that the ACPI driver should not be making such 
> assumption.

Can you add a description to the ACPI spec then to describe in detail what
"non-coherent" is supposed to mean, and which action the OS is supposed to
take when accessing data from device or CPU?

As I explained, the way we handle it by default on ARM64 is what embedded
systems typically do, but that might be completely different on the imagined
server chips that are not coherent for some reason. Just saying a device
is not coherent is like saying the CPU has known bugs but not saying how
to prevent it from crashing.

Is there some AML method that the OS can call to synchronize the cache
controller for all DMA to/from a particular device?

> > On a related note, I'm not sure how to handle different DMA masks here.
> > arch_setup_dma_ops() gets passed a size (and offset) argument, which should
> > match the DMA mask, but I don't know if there is a way to find out the
> > size from ACPI. Should we assume it's always 64-bit DMA capable?
> 
> Looking at the ACPI spec, it does have the _DMA object. IIUC, this can 
> be used to describe DMA properties of a particular bus.
> 
> Method(_DMA, ResourceTemplate()
> {
> 	QWORDMemory(
> 	ResourceConsumer,
> 	PosDecode, // _DEC
> 	MinFixed, // _MIF
> 	MaxFixed, // _MAF
> 	Prefetchable, // _MEM
> 	ReadWrite, // _RW
> 	0, // _GRA
> 	0, // _MIN
> 	0x1fffffff, // _MAX
> 	0x200000000, // _TRA
> 	0x20000000, // _LEN
> 	, , ,	
> 	)
> }
> 
> I am not sure if this is an appropriate use for this object, but this 
> seems to be similar to the dma-ranges property for OF, and probably can 
> be used to specify baseaddr and size information when calling 
> arch_setup_dma_ops().

Yes, that seems like a good idea. What is the expected behavior when that
object is absent? Do we assume that the parent device is not DMA capable?

Is this sufficient to describe the case where a device can only do DMA
to a specific address range that is not at bus address zero but that maps
to the beginning of physical RAM?

> > For legacy reasons, the default mask is probably best left at 32-bit,
> > but drivers are expected to call dma_set_mask() if they can do 64-bit DMA,
> > and that should fail based on the information provided by the platform
> > if the bus is not capable of doing that.
> >
> 
> However, on ARM64 the dma_base and size parameter for 
> arch_setup_dma_ops() is currently not used, and only coherent flag is 
> used. 

We can hope that we won't need the dma_base setting here, but it's
good to have the option to pass it down if we need it.

Not passing the size is a bug that needs to be fixed ASAP. I believe
a number of folks have run into this, most recently with the APM X-Gene
MMC controller.
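
To illustrate why the size matters, here is a small userspace sketch of
the behavior described above: a request for a wider mask fails when the
platform says the bus cannot reach that far. It models the intent only
and is not the kernel's dma_set_mask() implementation; struct sketch_dev
and sketch_set_mask() are hypothetical names.

```c
#include <stdint.h>

/* Mask with the low n bits set (same idea as the kernel's DMA_BIT_MASK). */
static uint64_t bit_mask(unsigned int n)
{
	return n >= 64 ? ~UINT64_C(0) : (UINT64_C(1) << n) - 1;
}

/* Hypothetical per-device state. */
struct sketch_dev {
	uint64_t bus_limit;  /* highest address the bus can reach (platform) */
	uint64_t dma_mask;   /* mask currently in effect */
};

/* Returns 0 on success, -1 if the mask exceeds what the bus can address. */
static int sketch_set_mask(struct sketch_dev *dev, uint64_t mask)
{
	if (mask > dev->bus_limit)
		return -1;  /* bus is not capable; keep the old mask */
	dev->dma_mask = mask;
	return 0;
}
```

A driver that can do 64-bit DMA would try bit_mask(64) first and keep the
32-bit default if that request is rejected.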

> We probably should look at this separately. For the moment, we can 
> probably say that if _CCA object is missing when needed, the ACPI driver 
> won't set up dma_mask when creating platform_device, which should be 
> equivalent to saying DMA is not supported.
> 
> Please let me know if this is acceptable, and I'll make change in V2 
> accordingly.

I would still ask that you treat non-coherent to mean "no DMA" until
we have come up with a way to sufficiently describe the kind of
non-coherency in ACPI.

	Arnd

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Linaro-acpi] [PATCH 2/2] ACPI / scan: Parse _CCA and setup device coherency
  2015-04-30  8:23         ` Arnd Bergmann
  (?)
@ 2015-04-30 10:41           ` Will Deacon
  -1 siblings, 0 replies; 102+ messages in thread
From: Will Deacon @ 2015-04-30 10:41 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linaro-acpi, suravee.suthikulpanit, linux-arm-kernel,
	Catalin Marinas, rjw, linux-kernel, linux-acpi, lenb

Hi Arnd,

On Thu, Apr 30, 2015 at 09:23:59AM +0100, Arnd Bergmann wrote:
> On Wednesday 29 April 2015 16:53:10 Suravee Suthikulpanit wrote:
> > As for the case where _CCA=0, I think the ACPI driver should essentially 
> > communicate the information as HW is non-coherent as described in the 
> > spec, and should be calling arch_setup_dma_ops(dev, false). It is true 
> > that this in probably less-likely for the ARM64 server platforms. 
> > However, I would think that the ACPI driver should not be making such 
> > assumption.
> 
> Can you add a description to the ACPI spec then to describe in detail what
> "non-coherent" is supposed to mean, and which action the OS is supposed to
> take when accessing data from device or CPU?

You may be interested in the IORT ACPI companion spec here:

  http://infocenter.arm.com/help/topic/com.arm.doc.den0049a/DEN0049A_IO_Remapping_Table.pdf

On CCA, it says:

  `This value must match the value returned by the _CCA object defined in
   the DSDT for the device represented by this node. The attribute can take
   the following values:

   - 0x1: The device is fully coherent. No cache maintenance[1] is required for
     memory shared with the device which is mapped on CPUs as
     Inner Write-Back (IWB), Outer Write-back (OWB), and Inner
     shareable (ISH). In addition, during system initialization at cold
     boot, or after wakeup from low-power state, if the cache
     coherency requires an SMMU override or some specific device
     configuration, the platform firmware has to ensure that this has
     been done. Therefore the semantics represented by a value of
     0x1 are always correct at the time of hand-off from firmware to
     OS.

   - 0x0: The device is not coherent. Therefore:
     * Cache maintenance is required for memory shared with the
       device that is mapped on CPUs as IWB-OWB-ISH.
     * No cache maintenance is required for memory shared with the
       device that is mapped on the CPU as device or Non-cacheable.

   All other values are reserved.

[1] Note: Caching operations described in this document apply to the CPU
    caches and any other caches in the system where device memory accesses
    can hit.'

This aside, the document introduces some useful, related concepts such
as CPM (coherent path to memory) and DACS (device attributes are cacheable
and inner shareable) for describing different IO subsystems. It also has
mechanisms to describe ID repainting from PCI->SMMU->ITS.
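
The two CCA values quoted above can be restated as a small lookup. This
is only a sketch of the quoted wording: the enum and function names are
hypothetical, and combinations the quote does not cover are reported as
unspecified instead of being guessed.

```c
/* CPU-side mapping types named in the quoted text. */
enum cpu_mapping {
	MAP_IWB_OWB_ISH,    /* Inner/Outer Write-Back, Inner Shareable */
	MAP_NON_CACHEABLE,  /* Non-cacheable */
	MAP_DEVICE,         /* Device memory */
};

enum maint {
	MAINT_NOT_REQUIRED,
	MAINT_REQUIRED,
	MAINT_UNSPECIFIED,  /* the quoted text makes no statement here */
};

static enum maint cca_cache_maintenance(unsigned int cca, enum cpu_mapping map)
{
	if (cca == 1)  /* fully coherent device */
		return map == MAP_IWB_OWB_ISH ? MAINT_NOT_REQUIRED
					      : MAINT_UNSPECIFIED;
	if (cca == 0)  /* non-coherent device */
		return map == MAP_IWB_OWB_ISH ? MAINT_REQUIRED
					      : MAINT_NOT_REQUIRED;
	return MAINT_UNSPECIFIED;  /* all other values are reserved */
}
```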

Will

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Linaro-acpi] [PATCH 2/2] ACPI / scan: Parse _CCA and setup device coherency
  2015-04-30 10:41           ` Will Deacon
  (?)
@ 2015-04-30 10:47             ` Arnd Bergmann
  -1 siblings, 0 replies; 102+ messages in thread
From: Arnd Bergmann @ 2015-04-30 10:47 UTC (permalink / raw)
  To: Will Deacon
  Cc: linaro-acpi, suravee.suthikulpanit, linux-arm-kernel,
	Catalin Marinas, rjw, linux-kernel, linux-acpi, lenb

On Thursday 30 April 2015 11:41:02 Will Deacon wrote:
> Hi Arnd,
> 
> On Thu, Apr 30, 2015 at 09:23:59AM +0100, Arnd Bergmann wrote:
> > On Wednesday 29 April 2015 16:53:10 Suravee Suthikulpanit wrote:
> > > As for the case where _CCA=0, I think the ACPI driver should essentially
> > > communicate the information as HW is non-coherent as described in the
> > > spec, and should be calling arch_setup_dma_ops(dev, false). It is true
> > > that this is probably less likely for the ARM64 server platforms.
> > > However, I would think that the ACPI driver should not be making such
> > > an assumption.
> > 
> > Can you add a description to the ACPI spec then to describe in detail what
> > "non-coherent" is supposed to mean, and which action the OS is supposed to
> > take when accessing data from device or CPU?
> 
> You may be interested in the IORT ACPI companion spec here:
> 
>   http://infocenter.arm.com/help/topic/com.arm.doc.den0049a/DEN0049A_IO_Remapping_Table.pdf
> 
> On CCA, it says:
> 
>   `This value must match the value returned by the _CCA object defined in
>    the DSDT for the device represented by this node. The attribute can take
>    the following values:
> 
>    - 0x1: The device is fully coherent. No cache maintenance[1] is required for
>      memory shared with the device which is mapped on CPUs as
>      Inner Write-Back (IWB), Outer Write-back (OWB), and Inner
>      shareable (ISH). In addition, during system initialization at cold
>      boot, or after wakeup from low-power state, if the cache
>      coherency requires an SMMU override or some specific device
>      configuration, the platform firmware has to ensure that this has
>      been done. Therefore the semantics represented by a value of
>      0x1 are always correct at the time of hand-off from firmware to
>      OS.

Ok, this part absolutely makes sense.

>    - 0x0: The device is not coherent. Therefore:
>      * Cache maintenance is required for memory shared with the
>        device that is mapped on CPUs as IWB-OWB-ISH.

This still seems insufficient. I guess this excludes having to
synchronize external bridges or write buffers, but it does not specify
what cache maintenance is required. Should there be an "outer-flush"?
Should the CPU cache be invalidated or flushed (or both), and do
we need to care about caches inside of the device or just inside of
the CPU?

>      * No cache maintenance is required for memory shared with the
>        device that is mapped on the CPU as device or Non-cacheable.
> 
>    All other values are reserved.
> 
> [1] Note: Caching operations described in this document apply to the CPU
>     caches and any other caches in the system where device memory accesses
>     can hit.'
> 
> This aside, the document introduces some useful, related concepts such
> as CPM (coherent path to memory) and DACS (device attributes are cacheable
> and inner shareable) for describing different IO subsystems. It also has
> mechanisms to describe ID repainting from PCI->SMMU->ITS.

Ah, good.

	Arnd

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Linaro-acpi] [PATCH 2/2] ACPI / scan: Parse _CCA and setup device coherency
  2015-04-30 10:47             ` Arnd Bergmann
  (?)
@ 2015-04-30 11:07               ` Will Deacon
  -1 siblings, 0 replies; 102+ messages in thread
From: Will Deacon @ 2015-04-30 11:07 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linaro-acpi, suravee.suthikulpanit, linux-arm-kernel,
	Catalin Marinas, rjw, linux-kernel, linux-acpi, lenb

On Thu, Apr 30, 2015 at 11:47:46AM +0100, Arnd Bergmann wrote:
> On Thursday 30 April 2015 11:41:02 Will Deacon wrote:
> >    - 0x0: The device is not coherent. Therefore:
> >      * Cache maintenance is required for memory shared with the
> >        device that is mapped on CPUs as IWB-OWB-ISH.
> 
> This still seems insufficient. I guess this excludes having to
> synchronize external bridges or write buffers, but it does not specify
> what cache maintenance is required. Should there be an "outer-flush"?
> Should the CPU cache be invalidated or flushed (or both), and do
> we need to care about caches inside of the device or just inside of
> the CPU?

See the note below:

> > [1] Note: Caching operations described in this document apply to the CPU
> >     caches and any other caches in the system where device memory accesses
> >     can hit.'

So for the CPU caches we'd do the usual clean to push dirty lines to the device
and (clean+)invalidate before reading data from the device. For the "other
caches in the system" we currently assume (for ARM64) that cache maintenance
will be broadcast and therefore I wouldn't anticipate doing anything extra.
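
The CPU-cache rule described above can be sketched as a small decision
function. This is an illustration only: the enum and function names are
hypothetical and do not correspond to the arm64 implementation.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
enum xfer_dir {
	XFER_TO_DEVICE,		/* CPU wrote the buffer, device will read it */
	XFER_FROM_DEVICE,	/* device wrote the buffer, CPU will read it */
};

enum cache_op {
	CACHE_OP_NONE,
	CACHE_OP_CLEAN,		/* push dirty lines out so the device sees them */
	CACHE_OP_INVALIDATE,	/* drop stale lines before the CPU reads; an
				 * implementation may clean+invalidate instead */
};

/*
 * Pick the CPU cache maintenance for one transfer. A coherent device
 * needs none. System caches are assumed to observe broadcast maintenance,
 * so no separate "outer" operation is modelled here.
 */
static enum cache_op cpu_cache_op_for(bool device_coherent, enum xfer_dir dir)
{
	if (device_coherent)
		return CACHE_OP_NONE;

	return dir == XFER_TO_DEVICE ? CACHE_OP_CLEAN : CACHE_OP_INVALIDATE;
}
```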

If people want to build system caches that don't respect broadcast cache
maintenance and require explicit management (e.g. outer_flush), then I
consider that a broken system and we should try to disable the cache before
entering the kernel. ARMv8 explicitly prohibits this type of cache in the
architecture (type 1 below):

  `Conceptually, three classes of system cache can be envisaged:

   1. System caches which lie before the point of coherency and cannot
      be managed by any cache maintenance instructions. Such systems
      fundamentally undermine the concept of cache maintenance
      instructions operating to the point of coherency, as they imply
      the use of non-architecture mechanisms to manage coherency. The
      use of such systems in the ARM architecture is explicitly
      prohibited.

   2. System caches which lie before the point of coherency and can be
      managed by cache maintenance by address instructions that apply to
      the point of coherency, but cannot be managed by cache maintenance
      by set/way instructions. Where maintenance of the entirety of such
      a cache must be performed, as in the case for power management, it
      must be performed using non-architectural mechanisms.

   3. System caches which lie beyond the point of coherency and so are
      invisible to the software. The management of such caches is
      outside the scope of the architecture.'

(sorry to keep throwing the book at you!)

Will

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Linaro-acpi] [PATCH 2/2] ACPI / scan: Parse _CCA and setup device coherency
  2015-04-30 11:07               ` Will Deacon
  (?)
@ 2015-04-30 11:24                 ` Arnd Bergmann
  -1 siblings, 0 replies; 102+ messages in thread
From: Arnd Bergmann @ 2015-04-30 11:24 UTC (permalink / raw)
  To: Will Deacon
  Cc: linaro-acpi, suravee.suthikulpanit, linux-arm-kernel,
	Catalin Marinas, rjw, linux-kernel, linux-acpi, lenb

On Thursday 30 April 2015 12:07:18 Will Deacon wrote:
> On Thu, Apr 30, 2015 at 11:47:46AM +0100, Arnd Bergmann wrote:
> > On Thursday 30 April 2015 11:41:02 Will Deacon wrote:
> > >    - 0x0: The device is not coherent. Therefore:
> > >      * Cache maintenance is required for memory shared with the
> > >        device that is mapped on CPUs as IWB-OWB-ISH.
> > 
> > This still seems insufficient. I guess this excludes having to
> > synchronize external bridges or write buffers, but it does not specify
> > what cache maintenance is required. Should there be an "outer-flush"?
> > Should the CPU cache be invalidated or flushed (or both), and do
> > we need to care about caches inside of the device or just inside of
> > the CPU?
> 
> See the note below:
> 
> > > [1] Note: Caching operations described in this document apply to the CPU
> > >     caches and any other caches in the system where device memory accesses
> > >     can hit.'
> 
> So for the CPU caches we'd do the usual clean to push dirty lines to the device
> and (clean+)invalidate before reading data from the device. For the "other
> caches in the system" we currently assume (for ARM64) that cache maintenance
> will be broadcast and therefore I wouldn't anticipate doing anything extra.
> 
> If people want to build system caches that don't respect broadcast cache
> maintenance and require explicit management (e.g. outer_flush), then I
> consider that a broken system and we should try to disable the cache before
> entering the kernel. ARMv8 explicitly prohibits this type of cache in the
> architecture (type 1 below):
> 
>   `Conceptually, three classes of system cache can be envisaged:
> 
>    1. System caches which lie before the point of coherency and cannot
>       be managed by any cache maintenance instructions. Such systems
>       fundamentally undermine the concept of cache maintenance
>       instructions operating to the point of coherency, as they imply
>       the use of non-architecture mechanisms to manage coherency. The
>       use of such systems in the ARM architecture is explicitly
>       prohibited.

Hmm, I thought this was what GPUs typically have, with their own
internal caches that are managed by the GPU rather than the normal
cache maintenance instructions. Does this prohibit the use of most
GPU devices with ARMv8, or did I misunderstand what they do?

>    2. System caches which lie before the point of coherency and can be
>       managed by cache maintenance by address instructions that apply to
>       the point of coherency, but cannot be managed by cache maintenance
>       by set/way instructions. Where maintenance of the entirety of such
>       a cache must be performed, as in the case for power management, it
>       must be performed using non-architectural mechanisms.

That still doesn't define which cache maintenance instructions are
required for a device that is marked as not coherent using the _CCA
property.

Here, I know that I have a cache that I can flush or invalidate or sync
using architected instructions, but should I?

In particular, there are two common models that we support in Linux:

a) embedded ARM32 and others

dma_alloc_noncoherent() == dma_alloc_coherent() == alloc uncached
dma_cache_sync() == not supportable
dma_sync_{single,sg,page}_for_{device,cpu} == {flush, invalidate, ...}

b) NUMA servers (parisc, itanium) and others

dma_alloc_noncoherent() == alloc cached
dma_alloc_coherent() == alloc uncached
dma_sync_{single,sg,page}_for_{device,cpu} ==  dma_cache_sync() == cache sync

There are probably other models that could happen, but the patch
set seems to assume a) is the only possible model, while the
architecture description you cite seems to still allow both a) and
b), as well as some variations, and it's possible that we will 
see b) on arm64 servers but not a).

You could also have a system that requires cache invalidation for
sending data from the device to memory, but does not require anything
for memory-to-device data, or you could have the opposite.
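
To make the last point concrete, here is a minimal sketch of why a single
coherent/non-coherent flag cannot describe every combination. The struct
and function names are hypothetical, chosen for this example only, and the
mapping in policy_from_cca() is an assumption about how a boolean would be
interpreted, not something taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical per-direction description of the maintenance a platform needs. */
struct dma_sync_policy {
	bool to_device;		/* maintenance needed for memory-to-device data */
	bool from_device;	/* maintenance needed for device-to-memory data */
};

/*
 * What a single boolean can express: either both directions need
 * maintenance (non-coherent) or neither does (coherent).
 */
static struct dma_sync_policy policy_from_cca(bool coherent)
{
	struct dma_sync_policy p = { !coherent, !coherent };

	return p;
}

/* True only for the two symmetric policies a single flag can represent. */
static bool expressible_by_single_flag(struct dma_sync_policy p)
{
	return p.to_device == p.from_device;
}
```

The two asymmetric cases, { false, true } and { true, false }, fall outside
what such a flag can encode.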

>    3. System caches which lie beyond the point of coherency and so are
>       invisible to the software. The management of such caches is
>       outside the scope of the architecture.'
> 
> (sorry to keep throwing the book at you!)

That's fine, at least I don't have to read it cover-to-cover then ;-)

	Arnd

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Linaro-acpi] [PATCH 2/2] ACPI / scan: Parse _CCA and setup device coherency
@ 2015-04-30 11:24                 ` Arnd Bergmann
  0 siblings, 0 replies; 102+ messages in thread
From: Arnd Bergmann @ 2015-04-30 11:24 UTC (permalink / raw)
  To: Will Deacon
  Cc: linaro-acpi, suravee.suthikulpanit, linux-arm-kernel,
	Catalin Marinas, rjw, linux-kernel, linux-acpi, lenb

On Thursday 30 April 2015 12:07:18 Will Deacon wrote:
> On Thu, Apr 30, 2015 at 11:47:46AM +0100, Arnd Bergmann wrote:
> > On Thursday 30 April 2015 11:41:02 Will Deacon wrote:
> > >    - 0x0: The device is not coherent. Therefore:
> > >      * Cache maintenance is required for memory shared with the
> > >        device that is mapped on CPUs as IWB-OWB-ISH.
> > 
> > This still seems insufficient. I guess this excludes having to
> > synchronize external bridges or write buffers, but it does not specify
> > what cache maintenance is required. Should there be an "outer-flush"?
> > Should the CPU cache be invalidated or flushed (or both), and do
> > we need to care about caches inside of the device or just inside of
> > the CPU?
> 
> See the note below:
> 
> > > [1] Note: Caching operations described in this document apply to the CPU
> > >     caches and any other caches in the system where device memory accesses
> > >     can hit.'
> 
> So for the CPU caches we'd do the usual clean to push dirty lines to the device
> and (clean+)invalidate before reading data from the device. For the "other
> caches in the system" we currently assume (for ARM64) that cache maintenance
> will be broadcast and therefore I wouldn't anticipate doing anything extra.
> 
> If people want to build system caches that don't respect broadcast cache
> maintenance and require explicit management (e.g outer_flush), then I
> consider that a broken system and we should try to disable the cache before
> entering the kernel. ARMv8 explicitly prohibits this type of cache in the
> architecture (type 1 below):
> 
>   `Conceptually, three classes of system cache can be envisaged:
> 
>    1. System caches which lie before the point of coherency and cannot
>       be managed by any cache maintenance instructions. Such systems
>       fundamentally undermine the concept of cache maintenance
>       instructions operating to the point of coherency, as they imply
>       the use of non-architecture mechanisms to manage coherency. The
>       use of such systems in the ARM architecture is explicitly
>       prohibited.

Hmm, I thought this was what GPUs typically have, with their own
internal caches that are managed by the GPU rather than the normal
cache maintenance instructions. Does this prohibit the use of most
GPU devices with ARMv8, or did I misunderstand what they do?

>    2. System caches which lie before the point of coherency and can be
>       managed by cache maintenance by address instructions that apply to
>       the point of coherency, but cannot be managed by cache maintenance
>       by set/way instructions. Where maintenance of the entirety of such
>       a cache must be performed, as in the case for power management, it
>       must be performed using non-architectural mechanisms.

That still doesn't define which cache maintenance instructions are
required for a device that is marked as not coherent using the _CCA
property.

Here, I know that I have a cache that I can flush or invalidate or sync
using architected instructions, but should I?

In particular, there are two common models that we support in Linux:

a) embedded ARM32 and others

dma_alloc_non_coherent() == dma_alloc_coherent() == alloc uncached
dma_cache_sync() == not supportable
dma_sync_{single,sg,page}_for_{device,cpu} == {flush, invalidate, ...}

b) NUMA servers (parisc, itanium) and others

dma_alloc_noncoherent() == alloc cached
dma_alloc_coherent() == alloc uncached
dma_sync_{single,sg,page}_for_{device,cpu} ==  dma_cache_sync() == cache sync

There are probably other models that could happen, but the patch
set seems to assume a) is the only possible model, while the
architecture description you cite seems to still allow both a) and
b), as well as some variations, and it's possible that we will 
see b) on arm64 servers but not a).

You could also have a system that requires cache invalidation for
sending data from the device to memory, but does not require anything
for memory-to-device data, or you could have the opposite.

>    3. System caches which lie beyond the point of coherency and so are
>       invisible to the software. The management of such caches is
>       outside the scope of the architecture.'
> 
> (sorry to keep throwing the book at you!)

That's fine, at least I don't have to read it cover-to-cover then ;-)

	Arnd

^ permalink raw reply	[flat|nested] 102+ messages in thread

* [Linaro-acpi] [PATCH 2/2] ACPI / scan: Parse _CCA and setup device coherency
@ 2015-04-30 11:24                 ` Arnd Bergmann
  0 siblings, 0 replies; 102+ messages in thread
From: Arnd Bergmann @ 2015-04-30 11:24 UTC (permalink / raw)
  To: linux-arm-kernel

On Thursday 30 April 2015 12:07:18 Will Deacon wrote:
> On Thu, Apr 30, 2015 at 11:47:46AM +0100, Arnd Bergmann wrote:
> > On Thursday 30 April 2015 11:41:02 Will Deacon wrote:
> > >    - 0x0: The device is not coherent. Therefore:
> > >      * Cache maintenance is required for memory shared with the
> > >        device that is mapped on CPUs as IWB-OWB-ISH.
> > 
> > This still seems insufficient. I guess this excludes having to
> > synchronize external bridges or write buffers, but it does not specify
> > what cache maintenance is required. Should there be an "outer-flush"?
> > Should the CPU cache be invalidated or flushed (or both), and do
> > we need to care about caches inside of the device or just inside of
> > the CPU?
> 
> See the note below:
> 
> > > [1] Note: Caching operations described in this document apply to the CPU
> > >     caches and any other caches in the system where device memory accesses
> > >     can hit.'
> 
> So for the CPU caches we'd do the usual clean to push dirty lines to the device
> and (clean+)invalidate before reading data from the device. For the "other
> caches in the system" we currently assume (for ARM64) that cache maintenance
> will be broadcast and therefore I wouldn't anticipate doing anything extra.
> 
> If people want to build system caches that don't respect broadcast cache
> maintenance and require explicit management (e.g outer_flush), then I
> consider that a broken system and we should try to disable the cache before
> entering the kernel. ARMv8 explicitly prohibits this type of cache in the
> architecture (type 1 below):
> 
>   `Conceptually, three classes of system cache can be envisaged:
> 
>    1. System caches which lie before the point of coherency and cannot
>       be managed by any cache maintenance instructions. Such systems
>       fundamentally undermine the concept of cache maintenance
>       instructions operating to the point of coherency, as they imply
>       the use of non-architecture mechanisms to manage coherency. The
>       use of such systems in the ARM architecture is explicitly
>       prohibited.

Hmm, I thought this was what GPUs typically have, with their own
internal caches that are managed by the GPU rather than the normal
cache maintenance instructions. Does this prohibit the use of most
GPU devices with ARMv8, or did I misunderstand what they do?

>    2. System caches which lie before the point of coherency and can be
>       managed by cache maintenance by address instructions that apply to
>       the point of coherency, but cannot be managed by cache maintenance
>       by set/way instructions. Where maintenance of the entirety of such
>       a cache must be performed, as in the case for power management, it
>       must be performed using non-architectural mechanisms.

That still doesn't define which cache maintenance instructions are
required for a device that is marked as not coherent using the _CCA
property.

Here, I know that I have a cache that I can flush or invalidate or sync
using architected instructions, but should I?

In particular, there are two common models that we support in Linux:

a) embedded ARM32 and others

dma_alloc_non_coherent() == dma_alloc_coherent() == alloc uncached
dma_cache_sync() == not supportable
dma_sync_{single,sg,page}_for_{device,cpu} == {flush, invalidate, ...}

b) NUMA servers (parisc, itanium) and others

dma_alloc_noncoherent() == alloc cached
dma_alloc_coherent() == alloc uncached
dma_sync_{single,sg,page}_for_{device,cpu} == dma_cache_sync() == cache sync

There are probably other models that could happen, but the patch
set seems to assume a) is the only possible model, while the
architecture description you cite seems to still allow both a) and
b), as well as some variations, and it's possible that we will 
see b) on arm64 servers but not a).

You could also have a system that requires cache invalidation for
sending data from the device to memory, but does not require anything
for memory-to-device data, or you could have the opposite.

>    3. System caches which lie beyond the point of coherency and so are
>       invisible to the software. The management of such caches is
>       outside the scope of the architecture.'
> 
> (sorry to keep throwing the book at you!)

That's fine, at least I don't have to read it cover-to-cover then ;-)

	Arnd

* Re: [Linaro-acpi] [PATCH 2/2] ACPI / scan: Parse _CCA and setup device coherency
  2015-04-30 11:24                 ` Arnd Bergmann
  (?)
@ 2015-04-30 11:46                   ` Will Deacon
  -1 siblings, 0 replies; 102+ messages in thread
From: Will Deacon @ 2015-04-30 11:46 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linaro-acpi, suravee.suthikulpanit, linux-arm-kernel,
	Catalin Marinas, rjw, linux-kernel, linux-acpi, lenb

On Thu, Apr 30, 2015 at 12:24:12PM +0100, Arnd Bergmann wrote:
> On Thursday 30 April 2015 12:07:18 Will Deacon wrote:
> > So for the CPU caches we'd do the usual clean to push dirty lines to the device
> > and (clean+)invalidate before reading data from the device. For the "other
> > caches in the system" we currently assume (for ARM64) that cache maintenance
> > will be broadcast and therefore I wouldn't anticipate doing anything extra.
> > 
> > If people want to build system caches that don't respect broadcast cache
> > maintenance and require explicit management (e.g. outer_flush), then I
> > consider that a broken system and we should try to disable the cache before
> > entering the kernel. ARMv8 explicitly prohibits this type of cache in the
> > architecture (type 1 below):
> > 
> >   `Conceptually, three classes of system cache can be envisaged:
> > 
> >    1. System caches which lie before the point of coherency and cannot
> >       be managed by any cache maintenance instructions. Such systems
> >       fundamentally undermine the concept of cache maintenance
> >       instructions operating to the point of coherency, as they imply
> >       the use of non-architecture mechanisms to manage coherency. The
> >       use of such systems in the ARM architecture is explicitly
> >       prohibited.
> 
> Hmm, I thought this was what GPUs typically have, with their own
> internal caches that are managed by the GPU rather than the normal
> cache maintenance instructions. Does this prohibit the use of most
> GPU devices with ARMv8, or did I misunderstand what they do?

No, because it's the responsibility of the GPU/GPU driver to ensure
that the internal caches are not visible to the CPU. I guess you can
think of data in the GPU private cache like data sitting in a CPU's write
buffer (i.e. non-snoopable).

> >    2. System caches which lie before the point of coherency and can be
> >       managed by cache maintenance by address instructions that apply to
> >       the point of coherency, but cannot be managed by cache maintenance
> >       by set/way instructions. Where maintenance of the entirety of such
> >       a cache must be performed, as in the case for power management, it
> >       must be performed using non-architectural mechanisms.
> 
> That still doesn't define which cache maintenance instructions are
> required for a device that is marked as not coherent using the _CCA
> property.
> 
> Here, I know that I have a cache that I can flush or invalidate or sync
> using architected instructions, but should I?

Table 15 in the IORT spec shows the 8 combinations of CCA/CPM/DACs,
the mapping requirements and whether or not maintenance is required.

The actual maintenance operations aren't described, but they would
correspond with what we currently do in the ARM and arm64 kernels (clean to
device, clean+inv from device).
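
As a rough sketch of that mapping (the enum and function names below are
made up for illustration and are not kernel interfaces; only the two
directions mentioned above are modelled):

```c
/* Illustrative only: these names are hypothetical, not kernel APIs. */
enum xfer_dir { XFER_TO_DEVICE, XFER_FROM_DEVICE };
enum cache_maint { MAINT_NONE, MAINT_CLEAN, MAINT_CLEAN_INV };

/*
 * Pick the CPU cache maintenance for one DMA transfer, following the
 * "clean to device, clean+inv from device" summary. A device marked
 * coherent is assumed to need no maintenance at all.
 */
static enum cache_maint maint_for_xfer(int coherent, enum xfer_dir dir)
{
	if (coherent)
		return MAINT_NONE;

	return dir == XFER_TO_DEVICE ? MAINT_CLEAN : MAINT_CLEAN_INV;
}
```

So a non-coherent device gets a clean before it reads a buffer and a
clean+invalidate around a buffer it writes, while a coherent one gets
nothing.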

> In particular, there are two common models that we support in Linux:
> 
> a) embedded ARM32 and others
> 
> dma_alloc_noncoherent() == dma_alloc_coherent() == alloc uncached
> dma_cache_sync() == not supportable
> dma_sync_{single,sg,page}_for_{device,cpu} == {flush, invalidate, ...}
> 
> b) NUMA servers (parisc, itanium) and others
> 
> dma_alloc_noncoherent() == alloc cached

This would lead to mismatched memory attributes on ARM/arm64.

> dma_alloc_coherent() == alloc uncached
> dma_sync_{single,sg,page}_for_{device,cpu} ==  dma_cache_sync() == cache sync

Cache sync doesn't exist in the ARM/arm64 architecture, so what are the
semantics supposed to be? Maybe it's just DSB for us (complete all pending
maintenance).

> There are probably other models that could happen, but the patch
> set seems to assume a) is the only possible model, while the
> architecture description you cite seems to still allow both a) and
> b), as well as some variations, and it's possible that we will 
> see b) on arm64 servers but not a)

Well, we should be careful not to confuse the ACPI spec with the ARM
architecture. The latter is more permissive, but does disallow system
caches that do not respect broadcast maintenance.

It's also worth pointing out that the architecture doesn't distinguish
between embedded and server machines using A-class processors.

> You could also have a system that requires cache invalidation for
> sending data from the device to memory, but does not require anything
> for memory-to-device data, or you could have the opposite.

You could theoretically build all sorts of strange devices, but that doesn't
mean we have to support them. In the case you describe, they'd have to put
up with the cost of redundant cache cleaning but it should at least function
correctly.
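
One way to picture the "redundant but correct" argument, as a sketch only
(the flag and helper names are hypothetical, not kernel definitions):
treat the operations as a bit mask and check that what the kernel does
covers what the hardware needs.

```c
/* Illustrative only: hypothetical bit flags, not kernel definitions. */
#define OP_CLEAN      0x1u  /* write dirty lines back before the device reads */
#define OP_INVALIDATE 0x2u  /* drop stale lines before the CPU reads */

/* Correct as long as every operation the hardware needs is performed. */
static int maint_is_sufficient(unsigned int needed, unsigned int performed)
{
	return (needed & ~performed) == 0;
}

/* Operations that were performed although the hardware did not need them. */
static unsigned int maint_redundant(unsigned int needed, unsigned int performed)
{
	return performed & ~needed;
}
```

For a device that only needs invalidation, doing clean plus invalidate is
still sufficient; the clean is simply the redundant cost.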

Will

* Re: [Linaro-acpi] [PATCH 2/2] ACPI / scan: Parse _CCA and setup device coherency
  2015-04-30 11:46                   ` Will Deacon
  (?)
@ 2015-04-30 13:03                     ` Arnd Bergmann
  -1 siblings, 0 replies; 102+ messages in thread
From: Arnd Bergmann @ 2015-04-30 13:03 UTC (permalink / raw)
  To: Will Deacon
  Cc: linaro-acpi, suravee.suthikulpanit, linux-arm-kernel,
	Catalin Marinas, rjw, linux-kernel, linux-acpi, lenb

On Thursday 30 April 2015 12:46:15 Will Deacon wrote:
> On Thu, Apr 30, 2015 at 12:24:12PM +0100, Arnd Bergmann wrote:
> > On Thursday 30 April 2015 12:07:18 Will Deacon wrote:
> > > So for the CPU caches we'd do the usual clean to push dirty lines to the device
> > > and (clean+)invalidate before reading data from the device. For the "other
> > > caches in the system" we currently assume (for ARM64) that cache maintenance
> > > will be broadcast and therefore I wouldn't anticipate doing anything extra.
> > > 
> > > If people want to build system caches that don't respect broadcast cache
> > > maintenance and require explicit management (e.g. outer_flush), then I
> > > consider that a broken system and we should try to disable the cache before
> > > entering the kernel. ARMv8 explicitly prohibits this type of cache in the
> > > architecture (type 1 below):
> > > 
> > >   `Conceptually, three classes of system cache can be envisaged:
> > > 
> > >    1. System caches which lie before the point of coherency and cannot
> > >       be managed by any cache maintenance instructions. Such systems
> > >       fundamentally undermine the concept of cache maintenance
> > >       instructions operating to the point of coherency, as they imply
> > >       the use of non-architecture mechanisms to manage coherency. The
> > >       use of such systems in the ARM architecture is explicitly
> > >       prohibited.
> > 
> > Hmm, I thought this was what GPUs typically have, with their own
> > internal caches that are managed by the GPU rather than the normal
> > cache maintenance instructions. Does this prohibit the use of most
> > GPU devices with ARMv8, or did I misunderstand what they do?
> 
> No, because it's the responsibility of the GPU/GPU driver to ensure
> that the internal caches are not visible to the CPU. I guess you can
> think of data in the GPU private cache like data sitting in a CPU's write
> buffer (i.e. non-snoopable).

Ok.

> > In particular, there are two common models that we support in Linux:
> > 
> > a) embedded ARM32 and others
> > 
> > dma_alloc_noncoherent() == dma_alloc_coherent() == alloc uncached
> > dma_cache_sync() == not supportable
> > dma_sync_{single,sg,page}_for_{device,cpu} == {flush, invalidate, ...}
> > 
> > b) NUMA servers (parisc, itanium) and others
> > 
> > dma_alloc_noncoherent() == alloc cached
> 
> This would lead to mismatched memory attributes on ARM/arm64.

How so? This is just what __dma_alloc() on arm64 does for
coherent devices:

        /* no need for non-cacheable mapping if coherent */
        if (coherent)
                return ptr;

> > dma_alloc_coherent() == alloc uncached
> > dma_sync_{single,sg,page}_for_{device,cpu} ==  dma_cache_sync() == cache sync
> 
> Cache sync doesn't exist in the ARM/arm64 architecture, what are the
> semantics supposed to be? Maybe it's just DSB for us (complete all pending
> maintenance).

It ensures that the state of a buffer as observed by the CPU and by the
device is identical. It's possible that we removed all platforms that did
something interesting here, so it's one of these:

a) On architectures that are mostly coherent, it's a barrier
   that is broadcast to all devices, like I assume DSB is. IA64
   currently does this for all machines, but IIRC it used to 
   access some cluster interconnect at some point to enforce a
   flush.
   The ARM32-based ArmadaXP also falls into this model if the cache
   coherency fabric is enabled, as that needs to be synchronized.
b) On architectures where the device may not see the state of the cache,
   but the CPU is always aware of anything the device sends it,
   it flushes the cache. This seems to be the case on parisc,
   and in particular, there are some variants that do not support
   dma_alloc_coherent but only dma_alloc_noncoherent.
c) On architectures that need the synchronization both ways,
   it does (almost) the same invalidate/clean/flush thing as
   ARM, except it doesn't have to worry about cache lines from
   speculative prefetch which make it impossible to implement on
   ARM.
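
Roughly, and only as my own sketch of the three cases (all names are
hypothetical, and I'm reading "flushes the cache" in b) as a write-back):

```c
/* Illustrative only: hypothetical names, not taken from any kernel tree. */
enum sync_model {
	SYNC_MOSTLY_COHERENT,	/* a) a broadcast barrier is enough */
	SYNC_DEVICE_BLIND,	/* b) device cannot see CPU cache state */
	SYNC_BOTH_WAYS,		/* c) synchronization needed both ways */
};

#define SYNC_BARRIER    0x1u
#define SYNC_WRITEBACK  0x2u
#define SYNC_INVALIDATE 0x4u

/* Which operations a dma_cache_sync()-style call performs per model. */
static unsigned int sync_ops(enum sync_model model)
{
	switch (model) {
	case SYNC_MOSTLY_COHERENT:
		return SYNC_BARRIER;
	case SYNC_DEVICE_BLIND:
		return SYNC_WRITEBACK;
	case SYNC_BOTH_WAYS:
		return SYNC_WRITEBACK | SYNC_INVALIDATE;
	}
	return 0;
}
```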

> > There are probably other models that could happen, but the patch
> > set seems to assume a) is the only possible model, while the
> > architecture description you cite seems to still allow both a) and
> > b), as well as some variations, and it's possible that we will 
> > see b) on arm64 servers but not a)
> 
> Well, we should be careful not to confuse the ACPI spec with the ARM
> architecture. The latter is more permissive, but does disallow system
> caches that do not respect broadcast maintenance.
> 
> It's also worth pointing out that the architecture doesn't distinguish
> between embedded and server machines using A-class processors.
> 
> > You could also have a system that requires cache invalidation for
> > sending data from the device to memory, but does not require anything
> > for memory-to-device data, or you could have the opposite.
> 
> You could theoretically build all sorts of strange devices, but that doesn't
> mean we have to support them. In the case you describe, they'd have to put
> up with the cost of redundant cache cleaning but it should at least function
> correctly.

Which case would a variant of ArmadaXP with a 64-bit core fall into then?
Do I understand it right that having to sync the coherency fabric
would make it noncompliant with ACPI but still architecturally compliant?

I guess we could handle that case as well, by requiring any ACPI-based
firmware to turn off the coherency fabric on that system and just making
it dog slow.

	Arnd

* [Linaro-acpi] [PATCH 2/2] ACPI / scan: Parse _CCA and setup device coherency
@ 2015-04-30 13:03                     ` Arnd Bergmann
  0 siblings, 0 replies; 102+ messages in thread
From: Arnd Bergmann @ 2015-04-30 13:03 UTC (permalink / raw)
  To: linux-arm-kernel

On Thursday 30 April 2015 12:46:15 Will Deacon wrote:
> On Thu, Apr 30, 2015 at 12:24:12PM +0100, Arnd Bergmann wrote:
> > On Thursday 30 April 2015 12:07:18 Will Deacon wrote:
> > > So for the CPU caches we'd do the usual clean to push dirty lines to the device
> > > and (clean+)invalidate before reading data from the device. For the "other
> > > caches in the system" we currently assume (for ARM64) that cache maintenance
> > > will be broadcast and therefore I wouldn't anticipate doing anything extra.
> > > 
> > > If people want to build system caches that don't respect broadcast cache
> > > maintenance and require explicit management (e.g outer_flush), then I
> > > consider that a broken system and we should try to disable the cache before
> > > entering the kernel. ARMv8 explicitly prohibits this type of cache in the
> > > architecture (type 1 below):
> > > 
> > >   `Conceptually, three classes of system cache can be envisaged:
> > > 
> > >    1. System caches which lie before the point of coherency and cannot
> > >       be managed by any cache maintenance instructions. Such systems
> > >       fundamentally undermine the concept of cache maintenance
> > >       instructions operating to the point of coherency, as they imply
> > >       the use of non-architecture mechanisms to manage coherency. The
> > >       use of such systems in the ARM architecture is explicitly
> > >       prohibited.
> > 
> > Hmm, I thought this was what GPUs typically have, with their own
> > internal caches that are managed by the GPU rather than the normal
> > cache maintenance instructions. Does this prohibit the use of most
> > GPU devices with ARMv8, or did I misunderstand what they do?
> 
> No, because it's the responsibility of the GPU/GPU driver to ensure
> that the internal caches are not visible to the CPU. I guess you can
> think of data in the GPU private cache like data sitting in a CPU's write
> buffer (i.e. non-snoopable).

Ok.

> > In particular, there are two common models that we support in Linux:
> > 
> > a) embedded ARM32 and others
> > 
> > dma_alloc_non_coherent() == dma_alloc_coherent() == alloc uncached
> > dma_cache_sync() == not supportable
> > dma_sync_{single,sg,page}_for_{device,cpu} == {flush, invalidate, ...}
> > 
> > b) NUMA servers (parisc, itanium) and others
> > 
> > dma_alloc_noncoherent() == alloc cached
> 
> This would lead to mismatched memory attributes on ARM/arm64.

How so? This is just what __dma_alloc() on arm64 does for
coherent devices:

        /* no need for non-cacheable mapping if coherent */
        if (coherent)
                return ptr;

> > dma_alloc_coherent() == alloc uncached
> > dma_sync_{single,sg,page}_for_{device,cpu} ==  dma_cache_sync() == cache sync
> 
> Cache sync doesn't exist in the ARM/arm64architecture, what are the
> semantics supposed to be? Maybe it's just DSB for us (complete all pending
> maintenance).

It ensures that a state of a buffer as observed by CPU and device is
identical. It's possible that we removed all platforms that did something
interesting here, so it's one of these:

a) On architectures that are mostly coherent, it's a barrier
   that is broadcast to all devices, like I assume DSB is. IA64
   currently does this for all machines, but IIRC it used to 
   access some cluster interconnect at some point to enforce a
   flush.
   The ARM32-based ArmadaXP also falls into this model if the cache
   coherency fabric is enabled, as that needs to be synchronized.
b) On architectures where the device may not see the state of the cache,
   but the CPU is always aware of anything the device sends it,
   it flushes the cache. This seems to be the case on parisc,
   and in particular, there are some variants that do not support
   dma_alloc_coherent but only dma_alloc_noncoherent.
c) On architectures that need the synchronization both ways,
   it does (almost) the same invalidate/clean/flush thing as
   ARM, except it doesn't have to worry about cache lines from
   speculative prefetch which make it impossible to implement on
   ARM.
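
As a rough illustration of the three cases above, here is a userspace
sketch of what a dma_cache_sync()-style primitive would have to do in each
model. This is not kernel code: the enum, the struct and sync_ops_for()
are hypothetical names made up for this example, and treating a barrier
as part of every model is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the three models a), b) and c) listed above. */
enum sync_model {
	SYNC_BARRIER_ONLY,	/* a) mostly coherent: broadcast barrier only */
	SYNC_FLUSH_TO_DEVICE,	/* b) device cannot see the CPU caches */
	SYNC_BOTH_WAYS,		/* c) maintenance needed in both directions */
};

/* Operations the sync primitive would need; purely illustrative. */
struct sync_ops {
	bool barrier;		/* wait for outstanding accesses to complete */
	bool clean;		/* write dirty lines back to memory */
	bool invalidate;	/* drop stale lines before the CPU reads */
};

static struct sync_ops sync_ops_for(enum sync_model model)
{
	/* Assumption: every model at least orders accesses with a barrier. */
	struct sync_ops ops = { .barrier = true };

	switch (model) {
	case SYNC_BARRIER_ONLY:
		break;
	case SYNC_FLUSH_TO_DEVICE:
		ops.clean = true;
		break;
	case SYNC_BOTH_WAYS:
		ops.clean = true;
		ops.invalidate = true;
		break;
	}
	return ops;
}
```

Only case c) ends up needing the invalidate step, which is the part that
interacts badly with speculative prefetch on ARM.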

> > There are probably other models that could happen, but the patch
> > set seems to assume a) is the only possible model, while the
> > architecture description you cite seems to still allow both a) and
> > b), as well as some variations, and it's possible that we will 
> > see b) on arm64 servers but not a)
> 
> Well, we should be careful not to confuse the ACPI spec with the ARM
> architecture. The latter is more permissive, but does disallow system
> caches that do not respect broadcast maintenance.
> 
> It's also worth pointing out that the architecture doesn't distinguish
> between embedded and server machines using A-class processors.
> 
> > You could also have a system that requires cache invalidation for
> > sending data from the device to memory, but does not require anything
> > for memory-to-device data, or you could have the opposite.
> 
> You could theoretically build all sorts of strange devices, but that doesn't
> mean we have to support them. In the case you describe, they'd have to put
> up with the cost of redundant cache cleaning but it should at least function
> correctly.

Which case would a variant of ArmadaXP with a 64-bit core fall into then?
Do I understand it right that requiring to sync the coherency fabric
would make it noncompliant with ACPI but still architecturally compliant?

I guess we could handle that case as well, by requiring any ACPI based
firmware to turn off the coherency fabric on that system and just making
it dog slow.

	Arnd

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Linaro-acpi] [PATCH 2/2] ACPI / scan: Parse _CCA and setup device coherency
  2015-04-30 13:03                     ` Arnd Bergmann
  (?)
@ 2015-04-30 13:13                       ` Will Deacon
  -1 siblings, 0 replies; 102+ messages in thread
From: Will Deacon @ 2015-04-30 13:13 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linaro-acpi, suravee.suthikulpanit, linux-arm-kernel,
	Catalin Marinas, rjw, linux-kernel, linux-acpi, lenb

On Thu, Apr 30, 2015 at 02:03:00PM +0100, Arnd Bergmann wrote:
> On Thursday 30 April 2015 12:46:15 Will Deacon wrote:
> > On Thu, Apr 30, 2015 at 12:24:12PM +0100, Arnd Bergmann wrote:
> > > In particular, there are two common models that we support in Linux:
> > > 
> > > a) embedded ARM32 and others
> > > 
> > > dma_alloc_non_coherent() == dma_alloc_coherent() == alloc uncached
> > > dma_cache_sync() == not supportable
> > > dma_sync_{single,sg,page}_for_{device,cpu} == {flush, invalidate, ...}
> > > 
> > > b) NUMA servers (parisc, itanium) and others
> > > 
> > > dma_alloc_noncoherent() == alloc cached
> > 
> > This would lead to mismatched memory attributes on ARM/arm64.
> 
> How so? This is just what __dma_alloc() on arm64 does for
> coherent devices:
> 
>         /* no need for non-cacheable mapping if coherent */
>         if (coherent)
>                 return ptr;

Ok, I thought that you were only describing the cases when the device is
non-coherent (_CCA=0). Otherwise, your assertion above that
dma_alloc_coherent == alloc uncached isn't true for coherent devices.
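
To spell out what I mean, here is a minimal userspace sketch of the
behaviour as I understand it. The names mem_attr and coherent_alloc_attr()
are hypothetical and only model the snippet quoted above; this is not the
actual arm64 __dma_alloc() code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical memory attributes of the buffer handed back to the driver. */
enum mem_attr { MEM_CACHED, MEM_UNCACHED };

/*
 * Models which mapping a dma_alloc_coherent()-style call returns,
 * depending on whether the device is coherent (_CCA=1) or not (_CCA=0).
 */
static enum mem_attr coherent_alloc_attr(bool dev_is_coherent)
{
	/* no need for non-cacheable mapping if coherent */
	if (dev_is_coherent)
		return MEM_CACHED;

	/* non-coherent device: fall back to a non-cacheable mapping */
	return MEM_UNCACHED;
}
```

So "alloc uncached" only describes the _CCA=0 branch of this function.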

So now I'm confused...

> > > dma_alloc_coherent() == alloc uncached
> > > dma_sync_{single,sg,page}_for_{device,cpu} ==  dma_cache_sync() == cache sync
> > 
> > > Cache sync doesn't exist in the ARM/arm64 architecture, what are the
> > semantics supposed to be? Maybe it's just DSB for us (complete all pending
> > maintenance).
> 
> It ensures that a state of a buffer as observed by CPU and device is
> identical. It's possible that we removed all platforms that did something
> interesting here, so it's one of these:
> 
> a) On architectures that are mostly coherent, it's a barrier
>    that is broadcast to all devices, like I assume DSB is. IA64
>    currently does this for all machines, but IIRC it used to 
>    access some cluster interconnect at some point to enforce a
>    flush.
>    The ARM32 based ArmadaXP also falls into this model if the cache
>    coherency fabric is enabled, as that needs to be synchronized
> b) On architectures where the device may not see the state of the cache,
>    but the CPU is always aware of anything the device sends it,
>    it flushes the cache. This seems to be the case on parisc,
>    and in particular, there are some variants that do not support
>    dma_alloc_coherent but only dma_alloc_noncoherent.
> c) On architectures that need the synchronization both ways,
>    it does (almost) the same invalidate/clean/flush thing as
>    ARM, except it doesn't have to worry about cache lines from
>    speculative prefetch which make it impossible to implement on
>    ARM.

Okey doke, thanks for the explanation. It sounds like we can just build
the primitive out of the existing cache maintenance routines if we need
to implement it.

> > > There are probably other models that could happen, but the patch
> > > set seems to assume a) is the only possible model, while the
> > > architecture description you cite seems to still allow both a) and
> > > b), as well as some variations, and it's possible that we will 
> > > see b) on arm64 servers but not a)
> > 
> > Well, we should be careful not to confuse the ACPI spec with the ARM
> > architecture. The latter is more permissive, but does disallow system
> > caches that do not respect broadcast maintenance.
> > 
> > It's also worth pointing out that the architecture doesn't distinguish
> > between embedded and server machines using A-class processors.
> > 
> > > You could also have a system that requires cache invalidation for
> > > sending data from the device to memory, but does not require anything
> > > for memory-to-device data, or you could have the opposite.
> > 
> > You could theoretically build all sorts of strange devices, but that doesn't
> > mean we have to support them. In the case you describe, they'd have to put
> > up with the cost of redundant cache cleaning but it should at least function
> > correctly.
> 
> Which case would a variant of ArmadaXP with a 64-bit core fall into then?
> Do I understand it right that requiring to sync the coherency fabric
> would make it noncompliant with ACPI but still architecturally compliant?

I would say that the ArmadaXP coherency fabric is not compliant with ARMv8
as it requires additional steps over those cache maintenance instructions
described by the architecture (i.e. it falls into class (1) of the three
classes of system cache in the architecture).

> I guess we could handle that case as well, by requiring any ACPI based
> firmware to turn off the coherency fabric on that system and just making
> it dog slow.

We already require something similar in Documentation/arm64/booting.txt:

  `System caches which do not respect architected cache maintenance by VA
   operations (not recommended) must be configured and disabled.'

Will

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] arm/arm64: ACPI: Introduce CONFIG_ACPI_MUST_HAVE_CCA
  2015-04-29 14:42         ` Catalin Marinas
  (?)
@ 2015-04-30 13:47           ` Hanjun Guo
  -1 siblings, 0 replies; 102+ messages in thread
From: Hanjun Guo @ 2015-04-30 13:47 UTC (permalink / raw)
  To: Catalin Marinas, Suravee Suthikulpanit
  Cc: al.stone, arnd, linaro-acpi, rjw, will.deacon, linux-kernel,
	linux-acpi, leo.duran, msalter, grant.likely,
	Charles Garcia-Tobin, linux-arm-kernel, lenb

On 2015-04-29 22:42, Catalin Marinas wrote:
> On Wed, Apr 29, 2015 at 09:31:03AM -0500, Suravee Suthikulpanit wrote:
>>
>>
>> On 04/29/2015 09:04 AM, Catalin Marinas wrote:
>>> On Wed, Apr 29, 2015 at 08:44:08AM -0500, Suravee Suthikulpanit wrote:
>>>> From ACPIv6 (http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf),
>>>> section 6.2.17 _CCA states that ARM platforms require ACPI _CCA object to be
>>>> specified for DMA-capable devices. This patch introduces ACPI_MUST_HAVE_CCA
>>>> in arm and arm64 Kconfig to specify such requirement.
>>>>
>>>> Note that when _CCA is required, if it is missing in the DSDT, the ACPI driver
>>>> will default to setting up devices as non-coherent.
>>>>
>>>> Signed-off-by: Mark Salter <msalter@redhat.com>
>>>> Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
>>>> ---
>>>>   arch/arm/Kconfig     | 1 +
>>>>   arch/arm64/Kconfig   | 1 +
>>>>   drivers/acpi/Kconfig | 3 +++
>>>>   3 files changed, 5 insertions(+)
>>>>
>>>> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
>>>> index 45df48b..2a0d036 100644
>>>> --- a/arch/arm/Kconfig
>>>> +++ b/arch/arm/Kconfig
>>>> @@ -1,6 +1,7 @@
>>>>   config ARM
>>>>   	bool
>>>>   	default y
>>>> +	select ACPI_MUST_HAVE_CCA if ACPI
>>>>   	select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
>>>>   	select ARCH_HAS_ELF_RANDOMIZE
>>>>   	select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
>>>
>>> Any plans for ACPI on 32-bit ARM?
>>
>> Not that I am aware, but I could be totally wrong. The reason I am adding
>> this here for 32-bit ARM is because the ACPI spec mentioned this.
>>
>> If you think this is not necessary until we introduce ACPI for ARM32, it can
>> be removed.
>
> I think it should be removed (as long as ACPI cannot be selected on
> arm32).

I agree.

As far as I know, there is currently no plan for ACPI on ARM32; ACPI
on ARM is aimed at ARM64-based enterprise systems for now.

Thanks
Hanjun

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] arm/arm64: ACPI: Introduce CONFIG_ACPI_MUST_HAVE_CCA
  2015-04-30 13:47           ` Hanjun Guo
  (?)
@ 2015-04-30 13:50             ` Will Deacon
  -1 siblings, 0 replies; 102+ messages in thread
From: Will Deacon @ 2015-04-30 13:50 UTC (permalink / raw)
  To: Hanjun Guo
  Cc: Catalin Marinas, suravee.suthikulpanit, al.stone, arnd,
	linaro-acpi, rjw, linux-kernel, linux-acpi, leo.duran, msalter,
	grant.likely, Charles Garcia-Tobin, linux-arm-kernel, lenb

On Thu, Apr 30, 2015 at 02:47:13PM +0100, Hanjun Guo wrote:
> On 2015-04-29 22:42, Catalin Marinas wrote:
> > On Wed, Apr 29, 2015 at 09:31:03AM -0500, Suravee Suthikulpanit wrote:
> >> On 04/29/2015 09:04 AM, Catalin Marinas wrote:
> >>> On Wed, Apr 29, 2015 at 08:44:08AM -0500, Suravee Suthikulpanit wrote:
> >>> Any plans for ACPI on 32-bit ARM?
> >>
> >> Not that I am aware, but I could be totally wrong. The reason I am adding
> >> this here for 32-bit ARM is because the ACPI spec mentioned this.
> >>
> >> If you think this is not necessary until we introduce ACPI for ARM32, it can
> >> be removed.
> >
> > I think it should be removed (as long as ACPI cannot be selected on
> > arm32).
> 
> I agree.
> 
> Now there is no plan for ARM32 ACPI as I know, ACPI for ARM targets
> for ARM64 based enterprise system at now.

While we're at it, do we *really* need to support CONFIG_ACPI_PROCFS_POWER
on arm64? It's a deprecated /proc/acpi interface and it would be nice to
avoid introducing deprecated behaviour if we can avoid it.

Will

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Linaro-acpi] [PATCH 2/2] ACPI / scan: Parse _CCA and setup device coherency
  2015-04-30 13:13                       ` Will Deacon
  (?)
@ 2015-04-30 13:52                         ` Arnd Bergmann
  -1 siblings, 0 replies; 102+ messages in thread
From: Arnd Bergmann @ 2015-04-30 13:52 UTC (permalink / raw)
  To: Will Deacon
  Cc: linaro-acpi, suravee.suthikulpanit, linux-arm-kernel,
	Catalin Marinas, rjw, linux-kernel, linux-acpi, lenb

On Thursday 30 April 2015 14:13:45 Will Deacon wrote:
> On Thu, Apr 30, 2015 at 02:03:00PM +0100, Arnd Bergmann wrote:
> > On Thursday 30 April 2015 12:46:15 Will Deacon wrote:
> > > On Thu, Apr 30, 2015 at 12:24:12PM +0100, Arnd Bergmann wrote:
> > > > In particular, there are two common models that we support in Linux:
> > > > 
> > > > a) embedded ARM32 and others
> > > > 
> > > > dma_alloc_non_coherent() == dma_alloc_coherent() == alloc uncached
> > > > dma_cache_sync() == not supportable
> > > > dma_sync_{single,sg,page}_for_{device,cpu} == {flush, invalidate, ...}
> > > > 
> > > > b) NUMA servers (parisc, itanium) and others
> > > > 
> > > > dma_alloc_noncoherent() == alloc cached
> > > 
> > > This would lead to mismatched memory attributes on ARM/arm64.
> > 
> > How so? This is just what __dma_alloc() on arm64 does for
> > coherent devices:
> > 
> >         /* no need for non-cacheable mapping if coherent */
> >         if (coherent)
> >                 return ptr;
> 
> Ok, I thought that you were only describing the cases when the device is
> non-coherent (_CCA=0). Otherwise, your assertion above that
> dma_alloc_coherent == alloc uncached isn't true for coherent devices.
> 
> So now I'm confused...

What I was describing here is a device that is not fully coherent,
but instead requires some operation other than a cache flush/invalidate
to complete before the memory can be accessed.

> > > > dma_alloc_coherent() == alloc uncached
> > > > dma_sync_{single,sg,page}_for_{device,cpu} ==  dma_cache_sync() == cache sync
> > > 
> > > > Cache sync doesn't exist in the ARM/arm64 architecture, what are the
> > > semantics supposed to be? Maybe it's just DSB for us (complete all pending
> > > maintenance).
> > 
> > It ensures that a state of a buffer as observed by CPU and device is
> > identical. It's possible that we removed all platforms that did something
> > interesting here, so it's one of these:
> > 
> > a) On architectures that are mostly coherent, it's a barrier
> >    that is broadcast to all devices, like I assume DSB is. IA64
> >    currently does this for all machines, but IIRC it used to 
> >    access some cluster interconnect at some point to enforce a
> >    flush.
> >    The ARM32 based ArmadaXP also falls into this model if the cache
> >    coherency fabric is enabled, as that needs to be synchronized
> > b) On architectures where the device may not see the state of the cache,
> >    but the CPU is always aware of anything the device sends it,
> >    it flushes the cache. This seems to be the case on parisc,
> >    and in particular, there are some variants that do not support
> >    dma_alloc_coherent but only dma_alloc_noncoherent.
> > c) On architectures that need the synchronization both ways,
> >    it does (almost) the same invalidate/clean/flush thing as
> >    ARM, except it doesn't have to worry about cache lines from
> >    speculative prefetch which make it impossible to implement on
> >    ARM.
> 
> Okey doke, thanks for the explanation. It sounds like we can just build
> the primitive out of the existing cache maintenance routines if we need
> to implement it.

Cases a) and b) yes, but not c), otherwise we could simplify
the ARM dma-mapping implementation and just merge __dma_page_cpu_to_dev
and __dma_page_dev_to_cpu into one function.
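
To illustrate why case c) needs two distinct steps on ARM, here is a
minimal sketch of a toy cache model. It is not the kernel's dma-mapping
code: every toy_* name is hypothetical and invented for this example, and
a speculative prefetch is modelled simply as a CPU load that happens
while the DMA is in flight.

```c
#include <stdbool.h>

/* Toy model (hypothetical, not kernel code): one cache line over one word of RAM. */
struct toy_line {
	int mem;	/* value in RAM, the only thing a non-coherent device sees */
	int cache;	/* value held in the CPU cache */
	bool valid;	/* the cache holds a copy of the line */
	bool dirty;	/* the cached copy is newer than RAM */
};

/* CPU store: lands in the cache and marks the line dirty. */
static void toy_cpu_write(struct toy_line *l, int v)
{
	l->cache = v;
	l->valid = true;
	l->dirty = true;
}

/* CPU load: refills the line from RAM if it is not cached. */
static int toy_cpu_read(struct toy_line *l)
{
	if (!l->valid) {
		l->cache = l->mem;
		l->valid = true;
		l->dirty = false;
	}
	return l->cache;
}

/* Clean + invalidate ("flush"): write back dirty data, then drop the line. */
static void toy_flush(struct toy_line *l)
{
	if (l->valid && l->dirty)
		l->mem = l->cache;
	l->valid = false;
	l->dirty = false;
}

/* Invalidate only: drop the line without writing it back. */
static void toy_invalidate(struct toy_line *l)
{
	l->valid = false;
	l->dirty = false;
}

/* Device-to-CPU DMA: returns what the CPU reads after the device wrote 2. */
static int toy_dma_from_device(bool speculative_prefetch, bool invalidate_after)
{
	struct toy_line l = { 0 };

	toy_cpu_write(&l, 1);		/* CPU initialises the buffer */
	toy_flush(&l);			/* single maintenance step before DMA */
	if (speculative_prefetch)
		(void)toy_cpu_read(&l);	/* line pulled back in during DMA */
	l.mem = 2;			/* device writes RAM, bypassing the cache */
	if (invalidate_after)
		toy_invalidate(&l);	/* second step, after DMA completes */
	return toy_cpu_read(&l);
}
```

In this model a single flush before the transfer is enough when nothing
refills the line: toy_dma_from_device(false, false) returns the device's
value 2. With the prefetch modelled, toy_dma_from_device(true, false)
returns the stale 1, and only the extra invalidate after the transfer
restores 2, which is the reason the two directions cannot share one
function.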

And a) and b) are both for systems that are more coherent than what
our noncoherent dma_map_ops implement, but less coherent than what
the coherent dma_map_ops do, and that is specifically what the ACPI
binding cannot describe, unless you argue that either ACPI or ARMv8
forbids both of these models.

> > Which case would a variant of ArmadaXP with a 64-bit core fall into then?
> > Do I understand it right that requiring to sync the coherency fabric
> > would make it noncompliant with ACPI but still architecturally compliant?
> 
> I would say that the ArmadaXP coherency fabric is not compliant with ARMv8
> as it requires additional steps over those cache maintenance instructions
> described by the architecture (i.e. it falls into class (1) of the three
> classes of system cache in the architecture).
> 
> > I guess we could handle that case as well, by requiring any ACPI based
> > firmware to turn off the coherency fabric on that system and just making
> > it dog slow.
> 
> We already require something similar in Documentation/arm64/booting.txt:
> 
>   `System caches which do not respect architected cache maintenance by VA
>    operations (not recommended) must be configured and disabled.'

Hmm, does that rule really get violated here? I think it fully respects
the cache maintenance (flush/invalidate/clean) operations, but it does
not fully respect the dsb/dmb instructions, which is something else.

	Arnd

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] arm/arm64: ACPI: Introduce CONFIG_ACPI_MUST_HAVE_CCA
  2015-04-30 13:50             ` Will Deacon
  (?)
@ 2015-04-30 14:14               ` Hanjun Guo
  -1 siblings, 0 replies; 102+ messages in thread
From: Hanjun Guo @ 2015-04-30 14:14 UTC (permalink / raw)
  To: Will Deacon
  Cc: Catalin Marinas, suravee.suthikulpanit, al.stone, arnd,
	linaro-acpi, rjw, linux-kernel, linux-acpi, leo.duran, msalter,
	grant.likely, Charles Garcia-Tobin, linux-arm-kernel, lenb

On 2015-04-30 21:50, Will Deacon wrote:
> On Thu, Apr 30, 2015 at 02:47:13PM +0100, Hanjun Guo wrote:
>> On 2015-04-29 22:42, Catalin Marinas wrote:
>>> On Wed, Apr 29, 2015 at 09:31:03AM -0500, Suravee Suthikulpanit wrote:
>>>> On 04/29/2015 09:04 AM, Catalin Marinas wrote:
>>>>> On Wed, Apr 29, 2015 at 08:44:08AM -0500, Suravee Suthikulpanit wrote:
>>>>> Any plans for ACPI on 32-bit ARM?
>>>>
>>>> Not that I am aware, but I could be totally wrong. The reason I am adding
>>>> this here for 32-bit ARM is because the ACPI spec mentioned this.
>>>>
>>>> If you think this is not necessary until we introduce ACPI for ARM32, it can
>>>> be removed.
>>>
>>> I think it should be removed (as long as ACPI cannot be selected on
>>> arm32).
>>
>> I agree.
>>
>> As far as I know, there is currently no plan for ARM32 ACPI; ACPI for
>> ARM targets ARM64-based enterprise systems for now.
>
> While we're at it, do we *really* need to support CONFIG_ACPI_PROCFS_POWER
> on arm64? It's a deprecated /proc/acpi interface and it would be nice to
> avoid introducing deprecated behaviour if we can avoid it.

I agree. It is used for the laptop AC adapter and battery; I will look
into that and clean it up for ARM64.

Thanks
Hanjun

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] arm/arm64: ACPI: Introduce CONFIG_ACPI_MUST_HAVE_CCA
  2015-04-30 13:50             ` Will Deacon
  (?)
@ 2015-04-30 15:01               ` Lorenzo Pieralisi
  -1 siblings, 0 replies; 102+ messages in thread
From: Lorenzo Pieralisi @ 2015-04-30 15:01 UTC (permalink / raw)
  To: Will Deacon
  Cc: hanjun.guo, Catalin Marinas, suravee.suthikulpanit, al.stone,
	arnd, linaro-acpi, rjw, linux-kernel, linux-acpi, leo.duran,
	msalter, grant.likely, Charles Garcia-Tobin, linux-arm-kernel,
	lenb

On Thu, Apr 30, 2015 at 02:50:18PM +0100, Will Deacon wrote:
> On Thu, Apr 30, 2015 at 02:47:13PM +0100, Hanjun Guo wrote:
> > On 2015-04-29 22:42, Catalin Marinas wrote:
> > > On Wed, Apr 29, 2015 at 09:31:03AM -0500, Suravee Suthikulpanit wrote:
> > >> On 04/29/2015 09:04 AM, Catalin Marinas wrote:
> > >>> On Wed, Apr 29, 2015 at 08:44:08AM -0500, Suravee Suthikulpanit wrote:
> > >>> Any plans for ACPI on 32-bit ARM?
> > >>
> > >> Not that I am aware, but I could be totally wrong. The reason I am adding
> > >> this here for 32-bit ARM is because the ACPI spec mentioned this.
> > >>
> > >> If you think this is not necessary until we introduce ACPI for ARM32, it can
> > >> be removed.
> > >
> > > I think it should be removed (as long as ACPI cannot be selected on
> > > arm32).
> > 
> > I agree.
> > 
> > As far as I know, there is currently no plan for ARM32 ACPI; ACPI for
> > ARM targets ARM64-based enterprise systems for now.
> 
> While we're at it, do we *really* need to support CONFIG_ACPI_PROCFS_POWER
> on arm64? It's a deprecated /proc/acpi interface and it would be nice to
> avoid introducing deprecated behaviour if we can avoid it.

I think we can make it depend on x86, because the compilation units that
create those proc dirs (ACPI_BATTERY and ACPI_AC) already depend on it.
At the moment, compiling drivers/acpi/cm_sbs.c is totally useless on
arm64.

Lorenzo

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Linaro-acpi] [PATCH 2/2] ACPI / scan: Parse _CCA and setup device coherency
  2015-04-30 13:52                         ` Arnd Bergmann
  (?)
@ 2015-04-30 15:55                           ` Catalin Marinas
  -1 siblings, 0 replies; 102+ messages in thread
From: Catalin Marinas @ 2015-04-30 15:55 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Will Deacon, linaro-acpi, rjw, linux-kernel, linux-acpi,
	suravee.suthikulpanit, linux-arm-kernel, lenb

On Thu, Apr 30, 2015 at 03:52:17PM +0200, Arnd Bergmann wrote:
> On Thursday 30 April 2015 14:13:45 Will Deacon wrote:
> > On Thu, Apr 30, 2015 at 02:03:00PM +0100, Arnd Bergmann wrote:
> > > On Thursday 30 April 2015 12:46:15 Will Deacon wrote:
> > > > Cache sync doesn't exist in the ARM/arm64 architecture, what are the
> > > > semantics supposed to be? Maybe it's just DSB for us (complete all pending
> > > > maintenance).
> > > 
> > > It ensures that a state of a buffer as observed by CPU and device is
> > > identical. It's possible that we removed all platforms that did something
> > > interesting here, so it's one of these:
> > > 
> > > a) On architectures that are mostly coherent, it's a barrier
> > >    that is broadcast to all devices, like I assume DSB is. IA64
> > >    currently does this for all machines, but IIRC it used to 
> > >    access some cluster interconnect at some point to enforce a
> > >    flush.
> > >    The ARM32 based ArmadaXP also falls into this model if the cache
> > >    coherency fabric is enabled, as that needs to be synchronized

I'm getting confused by the ArmadaXP case. IIRC, the point of the
arm,io-coherent property to the PL310 was precisely to make the
outer_sync a no-op when the coherency is enabled. So basically an mb()
would only issue a DSB on such a platform, without the PL310 cache sync.
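
As a rough sketch of that behaviour, assuming the 32-bit ARM mb() is a DSB
followed by an optional outer-cache sync hook: the toy_* names below are
hypothetical stand-ins that only count calls, and the io_coherent flag
stands in for the arm,io-coherent property. This is not the actual PL310
driver code.

```c
#include <stddef.h>

/* Counters standing in for the real instructions/register writes. */
static int toy_dsb_count;
static int toy_outer_sync_count;

/* Hypothetical outer-cache descriptor with an optional sync hook. */
struct toy_outer_cache {
	void (*sync)(void);
};

/* Stand-in for the PL310 cache sync operation. */
static void toy_pl310_sync(void)
{
	toy_outer_sync_count++;
}

/* Stand-in for the DSB instruction. */
static void toy_dsb(void)
{
	toy_dsb_count++;
}

/* Barrier: always a DSB, plus the outer sync only if a hook is installed. */
static void toy_mb(const struct toy_outer_cache *oc)
{
	toy_dsb();
	if (oc->sync)
		oc->sync();
}

/* With I/O coherency enabled, no sync hook is installed at all. */
static struct toy_outer_cache toy_outer_cache_init(int io_coherent)
{
	struct toy_outer_cache oc = {
		.sync = io_coherent ? NULL : toy_pl310_sync,
	};

	return oc;
}
```

Calling toy_mb() on a descriptor built with io_coherent set bumps only the
DSB counter, while the non-coherent descriptor bumps both counters.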

On coherent systems, devices usually snoop the inner/CPU cache and not
the system cache, that's further down the line. So a DSB would ensure
the visibility at the coherent interconnect level before the system
cache. I don't think it needs to be broadcast all the way to devices.

> > > b) On architectures where the device may not see the state of the cache,
> > >    but the CPU is always aware of anything the device sends it,
> > >    it flushes the cache. This seems to be the case on parisc,
> > >    and in particular, there are some variants that do not support
> > >    dma_alloc_coherent but only dma_alloc_noncoherent.
> > > c) On architectures that need the synchronization both ways,
> > >    it does (almost) the same invalidate/clean/flush thing as
> > >    ARM, except it doesn't have to worry about cache lines from
> > >    speculative prefetch which make it impossible to implement on
> > >    ARM.
> > 
> > Okey doke, thanks for the explanation. It sounds like we can just build
> > the primitive out of the existing cache maintenance routines if we need
> > to implement it.
> 
> Cases a) and b) yes, but not c), otherwise we could simplify
> the ARM dma-mapping implementation and just merge __dma_page_cpu_to_dev
> and __dma_page_dev_to_cpu into one function.

I don't fully understand c) or b). Wouldn't the non-coherent ops cover
them both, though potentially not as efficiently?

> And a) and b) are both for systems that are more coherent than what
> our noncoherent dma_map_ops implement, but less coherent than what
> the coherent dma_map_ops do, and that is specifically what the ACPI
> binding cannot describe, unless you argue that either ACPI or ARMv8
> forbids both of these models.

In general, a DSB should work as described in the ARM ARM without the
need to poke additional devices (PL310 is an example not to follow).

> > > I guess we could handle that case as well, by requiring any ACPI based
> > > firmware to turn off the coherency fabric on that system and just making
> > > it dog slow.
> > 
> > We already require something similar in Documentation/arm64/booting.txt:
> > 
> >   `System caches which do not respect architected cache maintenance by VA
> >    operations (not recommended) must be configured and disabled.'
> 
> Hmm, does that rule really get violated here? I think it fully respects
> the cache maintenance (flush/invalidate/clean) operations, but it does
> not fully respect the dsb/dmb instructions, which is something else.

If it fully respects the cache maintenance, it should also respect the
completion and ordering requirements of the cache maintenance
operations. That means that a DSB guarantees completion of such
operations.

-- 
Catalin

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Linaro-acpi] [PATCH 2/2] ACPI / scan: Parse _CCA and setup device coherency
  2015-04-30  8:23         ` Arnd Bergmann
  (?)
@ 2015-04-30 23:39           ` Suravee Suthikulanit
  -1 siblings, 0 replies; 102+ messages in thread
From: Suravee Suthikulanit @ 2015-04-30 23:39 UTC (permalink / raw)
  To: Arnd Bergmann, linaro-acpi
  Cc: linux-arm-kernel, catalin.marinas, rjw, linux-kernel,
	will.deacon, linux-acpi, lenb

On 4/30/2015 3:23 AM, Arnd Bergmann wrote:
> On Wednesday 29 April 2015 16:53:10 Suravee Suthikulpanit wrote:
>> On 4/29/15 11:25, Arnd Bergmann wrote:
>>> On Wednesday 29 April 2015 08:44:09 Suravee Suthikulpanit wrote:
>> [...]
>> As for the case where _CCA=0, I think the ACPI driver should essentially
>> communicate the information as HW is non-coherent as described in the
>> spec, and should be calling arch_setup_dma_ops(dev, false). It is true
>> that this is probably less likely for the ARM64 server platforms.
>> However, I would think that the ACPI driver should not be making such
>> an assumption.
>
> Can you add a description to the ACPI spec then to describe in detail what
> "non-coherent" is supposed to mean, and which action the OS is supposed to
> take when accessing data from device or CPU?

I believe Will has already provided this, and we have already discussed 
this on separate emails in this thread.

>>>[...]
>>> On a related note, I'm not sure how to handle different DMA masks here.
>>> arch_setup_dma_ops() gets passed a size (and offset) argument, which should
>>> match the DMA mask, but I don't know if there is a way to find out the
>>> size from ACPI. Should we assume it's always 64-bit DMA capable?
>>
>> Looking at the ACPI spec, it does have the _DMA object. IIUC, this can
>> be used to describe DMA properties of a particular bus.
>>
>> Name(_DMA, ResourceTemplate()
>> {
>> 	QWORDMemory(
>> 	ResourceConsumer,
>> 	PosDecode, // _DEC
>> 	MinFixed, // _MIF
>> 	MaxFixed, // _MAF
>> 	Prefetchable, // _MEM
>> 	ReadWrite, // _RW
>> 	0, // _GRA
>> 	0, // _MIN
>> 	0x1fffffff, // _MAX
>> 	0x200000000, // _TRA
>> 	0x20000000, // _LEN
>> 	, , ,
>> 	)
>> })
>>
>> I am not sure if this is an appropriate use for this object, but this
>> seems to be similar to the dma-ranges property for OF, and probably can
>> be used to specify baseaddr and size information when calling
>> arch_setup_dma_ops().
>
> Yes, that seems like a good idea. What is the expected behavior when that
> object is absent? Do we assume that the parent device is not DMA capable?

From the spec:
If the _DMA object is not present for a bus device, the OS assumes that 
any address placed on a bus by a child device will be decoded either by 
a device on the bus or by the bus itself, (in other words, all address 
ranges can be used for DMA).

The issue is that, since this object is optional, I don't know how often
firmware actually provides this information.

> Is this sufficient to describe the case where a device can only do DMA
> to a specific address range that is not at bus address zero but that maps
> to the beginning of physical RAM?

I believe that's what _MIN (Minimum Base Address) is for.
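
As a rough illustration of how a _DMA window could be turned into a DMA
mask, here is a small sketch. It is an assumption on my part, not code
from the patch: struct dma_window and dma_mask_for_window() are
hypothetical names, only _MIN and _LEN are modelled (the _TRA translation
is left out), and address wrap-around is not handled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of one _DMA window; field names are illustrative. */
struct dma_window {
	bool present;	/* false when the bus device has no _DMA object */
	uint64_t min;	/* _MIN: lowest bus address usable for DMA */
	uint64_t len;	/* _LEN: size of the window in bytes */
};

/* Smallest mask of the form 2^n - 1 covering every address in the window. */
static uint64_t dma_mask_for_window(const struct dma_window *w)
{
	uint64_t last, mask = 0;

	/* No _DMA object: per the spec text above, all addresses are usable. */
	if (!w->present)
		return UINT64_MAX;

	/* Empty window: nothing is addressable. */
	if (w->len == 0)
		return 0;

	/* Highest bus address inside the window. */
	last = w->min + (w->len - 1);

	/* Grow the mask one bit at a time until it covers that address. */
	while (mask < last)
		mask = (mask << 1) | 1;

	return mask;
}
```

With the values from the example above (_MIN = 0, _LEN = 0x20000000) this
gives a mask of 0x1fffffff; a window that does not start at bus address
zero simply needs a wider mask.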

>>> For legacy reasons, the default mask is probably best left at 32-bit,
>>> but drivers are expected to call dma_set_mask() if they can do 64-bit DMA,
>>> and that should fail based on the information provided by the platform
>>> if the bus is not capable of doing that.
>>>
>> However, on ARM64 the dma_base and size parameter for
>> arch_setup_dma_ops() is currently not used, and only coherent flag is
>> used.
>
> We can hope that we won't need the dma_base setting here, but it's
> good to have the option to pass it down if we need it.
>
> Not passing the size is a bug that needs to be fixed ASAP, I believe
> a number of folks have run into this, most recently the APM X-Gene
> MMC controller
>

Ok. I'll look at this separately.

>> We probably should look at this separately. For the moment, we can
>> probably say that if _CCA object is missing when needed, the ACPI driver
>> won't set up dma_mask when creating platform_device, which should be
>> equivalent to saying DMA is not supported.
>>
>> Please let me know if this is acceptable, and I'll make change in V2
>> accordingly.
>
> I would still ask that you treat non-coherent to mean "no DMA" until
> we have come up with a way to sufficiently describe the kind of
> non-coherency in ACPI.
>
> 	Arnd

Ok. Since we are not aware of any ARM64 system that works under such an
assumption with ACPI, in V2 I will also default to not calling
arch_setup_dma_ops() when _CCA=0 and fall back to the arch-specific
default. We can revisit this later once we need to support such a case.

Thanks,

Suravee


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Linaro-acpi] [PATCH 2/2] ACPI / scan: Parse _CCA and setup device coherency
  2015-04-29 15:54             ` Arnd Bergmann
  (?)
@ 2015-05-01 11:06               ` Catalin Marinas
  -1 siblings, 0 replies; 102+ messages in thread
From: Catalin Marinas @ 2015-05-01 11:06 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Suthikulpanit, Suravee, rjw, linaro-acpi, will.deacon,
	linux-kernel, linux-acpi, Charles Garcia-Tobin, linux-arm-kernel,
	lenb

On Wed, Apr 29, 2015 at 05:54:02PM +0200, Arnd Bergmann wrote:
> On Wednesday 29 April 2015 14:57:10 Suthikulpanit, Suravee wrote:
> > Otherwise, it would seem inconsistent with what is stated in the ACPI spec:
> >  
> >   CCA objects are only relevant for devices that can access
> >   CPU-visible memory, such as devices that are DMA capable. On ARM
> >   based systems, the _CCA object must be supplied all such devices.
> >   On Intel platforms, if the _CCA object is not supplied, the OSPM
> >   will assume the devices are hardware cache coherent.
> > 
> > From the statement above, I interpreted that if it is not present, the
> > device would be non-coherent.
> 
> My guess is that this section was included for Windows Phone, which runs
> on embedded SoCs that usually have noncoherent DMA in a particular way.
> 
> Linux however only uses ACPI for servers, so that case does not happen.
> 
> I guess it would be reasonable to add a run-time warning here if you
> try to do DMA on a device that does not have CCA set, and you should
> probably set the DMA mask to 0 in that case as well.

I agree, if _CCA isn't present, we should not allow DMA. With DT, the
default dma_ops point to non-coherent but with ACPI, we could change
the default to a dummy set of dma_ops which don't do anything (or just
return NULL). Something like below, untested:


diff --git a/arch/arm64/include/asm/dma-mapping.h b/arch/arm64/include/asm/dma-mapping.h
index 9437e3dc5833..3fd6ef019c8f 100644
--- a/arch/arm64/include/asm/dma-mapping.h
+++ b/arch/arm64/include/asm/dma-mapping.h
@@ -31,10 +31,14 @@ extern struct dma_map_ops *dma_ops;
 
 static inline struct dma_map_ops *__generic_dma_ops(struct device *dev)
 {
-	if (unlikely(!dev) || !dev->archdata.dma_ops)
+	if (!dev)
 		return dma_ops;
-	else
+	else if (dev->archdata.dma_ops)
 		return dev->archdata.dma_ops;
+	else if (!acpi_disabled)
+		return dummy_dma_ops;
+	else
+		return dma_ops;
 }
 
 static inline struct dma_map_ops *get_dma_ops(struct device *dev)
@@ -48,6 +52,8 @@ static inline struct dma_map_ops *get_dma_ops(struct device *dev)
 static inline void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 				      struct iommu_ops *iommu, bool coherent)
 {
+	if (!acpi_disabled && !dev->archdata.dma_ops)
+		dev->archdata.dma_ops = dma_ops;
 	dev->archdata.dma_coherent = coherent;
 }
 #define arch_setup_dma_ops	arch_setup_dma_ops


The core code should not call arch_setup_dma_ops() if no _CCA object is
found.
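
For anyone who wants to poke at the selection logic outside the kernel,
here is a self-contained userspace model of it. It is a sketch only: the
struct definitions are cut-down stand-ins for the kernel types,
dummy_dma_ops is the hypothetical "DMA not allowed" set of ops from the
untested diff above, and the function names are shortened to make clear
this is not the real header.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; only the fields used here. */
struct dma_map_ops {
	const char *name;
};

struct device {
	struct {
		struct dma_map_ops *dma_ops;
		bool dma_coherent;
	} archdata;
};

static struct dma_map_ops default_ops = { "default" };
static struct dma_map_ops dummy_ops = { "dummy" };	/* every operation would fail */

static struct dma_map_ops *dma_ops = &default_ops;
static struct dma_map_ops *dummy_dma_ops = &dummy_ops;
static bool acpi_disabled;	/* false means the system booted with ACPI */

/* Mirrors the ordering of checks in the diff above. */
static struct dma_map_ops *generic_dma_ops(struct device *dev)
{
	if (!dev)
		return dma_ops;
	if (dev->archdata.dma_ops)
		return dev->archdata.dma_ops;
	/* ACPI boot and nobody set the ops up: no _CCA, so no DMA. */
	return acpi_disabled ? dma_ops : dummy_dma_ops;
}

/* Called only when _CCA was found; installs the real ops under ACPI. */
static void setup_dma_ops(struct device *dev, bool coherent)
{
	if (!acpi_disabled && !dev->archdata.dma_ops)
		dev->archdata.dma_ops = dma_ops;
	dev->archdata.dma_coherent = coherent;
}
```

The point of the ordering is that under ACPI a device gets working ops
only after setup_dma_ops() has run for it, while a DT boot keeps the
existing default.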

> Note that there are lots of ways in which you could have noncoherent DMA:
> the default on ARM32 is that it requires uncached access or explicit
> cache flushes, but it's also possible to have an SMP system where a device
> is only coherent with some of the CPUs and requires explicit synchronization
> (not flushes) otherwise. In a multi-level cache hierarchy, there could be
> all sorts of combinations of flushes and syncs you would need to do.
> 
> With DT, we handle this using SoC-specific overrides for platforms that
> are noncoherent in funny ways, see
> http://lxr.free-electrons.com/source/arch/arm/mach-mvebu/coherency.c?v=3.18#L263
> for instance.

It looks like mach-mvebu no longer needs this, according to commit
1bd4d8a6de5c (ARM: mvebu: use arm_coherent_dma_ops and re-enable hardware
I/O coherency).

Even if some hardware needs this, it's usually because it has some
broken assumptions about barriers which most likely are architecture
non-compliant. We can work around it on a case by case basis (SoC
quirks). One option would be to disable coherency altogether for that
device, even if the performance is affected (e.g. no partial coherency).
Another possibility may be to add a bus driver for that broken
interconnect which installs its own dma ops for each device attached.

> If we just disallow DMA to devices that are marked with _CCA=0
> in ACPI, we can avoid this case, or discuss it by the time someone has hardware
> that wants it, and then make a more informed decision about it.

I don't think we should disallow DMA to devices with _CCA == 0 (only to
those that don't have a _CCA property at all) as long as _CCA == 0 has
clear semantics like only architected cache maintenance required (and
that's what the ARMv8 ARM requires from compliant system caches).

-- 
Catalin

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [Linaro-acpi] [PATCH 2/2] ACPI / scan: Parse _CCA and setup device coherency
@ 2015-05-01 11:06               ` Catalin Marinas
  0 siblings, 0 replies; 102+ messages in thread
From: Catalin Marinas @ 2015-05-01 11:06 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Apr 29, 2015 at 05:54:02PM +0200, Arnd Bergmann wrote:
> On Wednesday 29 April 2015 14:57:10 Suthikulpanit, Suravee wrote:
> > Otherwise, it would seem inconsistent with what states in the ACPI spec:
> >  
> >   CCA objects are only relevant for devices that can access
> >   CPU-visible memory, such as devices that are DMA capable. On ARM
> >   based systems, the _CCA object must be supplied all such devices.
> >   On Intel platforms, if the _CCA object is not supplied, the OSPM
> >   will assume the devices are hardware cache coherent.
> > 
> > From the statement above, I interpreted as if it is not present, it would
> > be non-coherent.
> 
> My guess is that this section was included for Windows Phone, which runs
> on embedded SoCs that usually have noncoherent DMA in a particular way.
> 
> Linux however only uses ACPI for servers, so that case does not happen.
> 
> I guess it would be reasonable to add a run-time warning here if you
> try to do DMA on a device that does not have CCA set, and you should
> probably set the DMA mask to 0 in that case as well.

I agree, if _CCA isn't present, we should not allow DMA. With DT, the
default dma_ops point to non-coherent but with ACPI, we could change
the default to a dummy set of dma_ops which don't do anything (or just
return NULL). Something like below, untested:


diff --git a/arch/arm64/include/asm/dma-mapping.h b/arch/arm64/include/asm/dma-mapping.h
index 9437e3dc5833..3fd6ef019c8f 100644
--- a/arch/arm64/include/asm/dma-mapping.h
+++ b/arch/arm64/include/asm/dma-mapping.h
@@ -31,10 +31,14 @@ extern struct dma_map_ops *dma_ops;
 
 static inline struct dma_map_ops *__generic_dma_ops(struct device *dev)
 {
-	if (unlikely(!dev) || !dev->archdata.dma_ops)
+	if (!dev)
 		return dma_ops;
-	else
+	else if (dev->archdata.dma_ops)
 		return dev->archdata.dma_ops;
+	else if (!acpi_disabled)
+		return dummy_dma_ops;
+	else
+		return dma_ops;
 }
 
 static inline struct dma_map_ops *get_dma_ops(struct device *dev)
@@ -48,6 +52,8 @@ static inline struct dma_map_ops *get_dma_ops(struct device *dev)
 static inline void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 				      struct iommu_ops *iommu, bool coherent)
 {
+	if (!acpi_disabled && !dev->archdata.dma_ops)
+		dev->archdata.dma_ops = dma_ops;
 	dev->archdata.dma_coherent = coherent;
 }
 #define arch_setup_dma_ops	arch_setup_dma_ops


The core code should not call arch_setup_dma_ops() if no _CCA option is
found.

> Note that there are lots of ways in which you could have noncoherent DMA:
> the default on ARM32 is that it requires uncached access or explicit
> cache flushes, but it's also possible to have an SMP system where a device
> is only coherent with some of the CPUs and requires explicit synchronization
> (not flushes) otherwise. In a multi-level cache hierarchy, there could be
> all sorts of combinations of flushes and syncs you would need to do.
> 
> With DT, we handle this using SoC-specific overrides for platforms that
> are noncoherent in funny ways, see
> http://lxr.free-electrons.com/source/arch/arm/mach-mvebu/coherency.c?v=3.18#L263
> for instance.

It looks like mach-mvebu no longer needs this, according to commit
1bd4d8a6de5c (ARM: mvebu: use arm_coherent_dma_ops and re-enable hardware
I/O coherency).

Even if some hardware needs this, it's usually because it has some
broken assumptions about barriers which most likely are architecture
non-compliant. We can work around it on a case by case basis (SoC
quirks). One option would be to disable coherency altogether for that
device, even if the performance is affected (e.g. no partial coherency).
Another possibility may be to add a bus driver for that broken
interconnect which installs its own dma ops for each device attached.

> If we just disallow DMA to devices that are marked with _CCA=0
> in ACPI, we can avoid this case, or discuss it by the time someone has hardware
> that wants it, and then make a more informed decision about it.

I don't think we should disallow DMA to devices with _CCA == 0 (only to
those that don't have a _CCA property at all) as long as _CCA == 0 has
clear semantics like only architected cache maintenance required (and
that's what the ARMv8 ARM requires from compliant system caches).
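
A minimal sketch of the three-way policy argued for above (no _CCA: no
DMA; _CCA == 0: non-coherent DMA with architected cache maintenance;
_CCA == 1: coherent DMA). The struct and function names are
hypothetical, and treating any non-zero value as coherent is an
assumption of this sketch, not something the thread settles.

```c
#include <stdbool.h>

/* Hypothetical result type; the names are illustrative, not kernel API. */
struct cca_policy {
	bool dma_allowed;	/* may the device do DMA at all? */
	bool coherent;		/* meaningful only if dma_allowed is true */
};

/*
 * has_cca: whether a _CCA object was found for the device.
 * value:   the integer _CCA evaluated to (0 or 1).
 * Assumption: any non-zero value is treated as coherent.
 */
static struct cca_policy cca_to_policy(bool has_cca, unsigned long long value)
{
	struct cca_policy p = { false, false };

	if (!has_cca)
		return p;		/* no _CCA: do not allow DMA */

	p.dma_allowed = true;
	p.coherent = (value != 0);	/* 0: cache maintenance required */
	return p;
}
```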

-- 
Catalin

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* Re: [Linaro-acpi] [PATCH 2/2] ACPI / scan: Parse _CCA and setup device coherency
  2015-04-30 15:55                           ` Catalin Marinas
  (?)
@ 2015-05-08 14:01                             ` Arnd Bergmann
  -1 siblings, 0 replies; 102+ messages in thread
From: Arnd Bergmann @ 2015-05-08 14:01 UTC (permalink / raw)
  To: linaro-acpi
  Cc: Catalin Marinas, Will Deacon, rjw, linux-kernel, linux-acpi,
	linux-arm-kernel, lenb

On Thursday 30 April 2015 16:55:14 Catalin Marinas wrote:
> On Thu, Apr 30, 2015 at 03:52:17PM +0200, Arnd Bergmann wrote:
> > On Thursday 30 April 2015 14:13:45 Will Deacon wrote:
> > > On Thu, Apr 30, 2015 at 02:03:00PM +0100, Arnd Bergmann wrote:
> > > > On Thursday 30 April 2015 12:46:15 Will Deacon wrote:
> > > > > Cache sync doesn't exist in the ARM/arm64 architecture, what are the
> > > > > semantics supposed to be? Maybe it's just DSB for us (complete all pending
> > > > > maintenance).
> > > > 
> > > > It ensures that a state of a buffer as observed by CPU and device is
> > > > identical. It's possible that we removed all platforms that did something
> > > > interesting here, so it's one of these:
> > > > 
> > > > a) On architectures that are mostly coherent, it's a barrier
> > > >    that is broadcast to all devices, like I assume DSB is. IA64
> > > >    currently does this for all machines, but IIRC it used to 
> > > >    access some cluster interconnect at some point to enforce a
> > > >    flush.
> > > >    The ARM32 based ArmadaXP also falls into this model if the cache
> > > >    coherency fabric is enabled, as that needs to be synchronized
> 
> I'm getting confused by the ArmadaXP case. IIRC, the point of the
> arm,io-coherent property to the PL310 was precisely to make the
> outer_sync a no-op when the coherency is enabled. So basically an mb()
> would only issue a DSB on such platform without the PL310 cache sync.
> 
> On coherent systems, devices usually snoop the inner/CPU cache and not
> the system cache, that's further down the line. So a DSB would ensure
> the visibility at the coherent interconnect level before the system
> cache. I don't think it needs to be broadcast all the way to devices.

Sorry for the late reply. IIRC, the sync on Armada XP was not required
for the cache controller, but rather for the bus fabric, to ensure
that a DMA has made it into the memory controller.

> > > > b) On architectures where the device may not see the state of the cache,
> > > >    but the CPU is always aware of anything the device sends it,
> > > >    it flushes the cache. This seems to be the case on parisc,
> > > >    and in particular, there are some variants that do not support
> > > >    dma_alloc_coherent but only dma_alloc_noncoherent.
> > > > c) On architectures that need the synchronization both ways,
> > > >    it does (almost) the same invalidate/clean/flush thing as
> > > >    ARM, except it doesn't have to worry about cache lines from
> > > >    speculative prefetch which make it impossible to implement on
> > > >    ARM.
> > > 
> > > Okey doke, thanks for the explanation. It sounds like we can just build
> > > the primitive out of the existing cache maintenance routines if we need
> > > to implement it.
> > 
> > Cases a) and b) yes, but not c), otherwise we could simplify
> > the ARM dma-mapping implementation and just merge __dma_page_cpu_to_dev
> > and __dma_page_dev_to_cpu into one function.
> 
> I don't fully understand c) or b). Wouldn't the non-coherent ops cover
> them both, though potentially not as efficient?

Turning off caches usually makes everything coherent, but the performance
cost can be gigantic. Also, it might not help if the problem with coherency
is the completion of the DMA as opposed to the caching.
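
As a toy illustration of the caching half of this (it does not model
DMA completion at all), here is a one-word write-back cache in user-space
C. All names are hypothetical. It only shows why a non-coherent device
needs a clean before it reads what the CPU wrote, and an invalidate
before the CPU reads what the device wrote.

```c
#include <stdbool.h>

/* Toy model: one word of memory shadowed by one write-back cache line. */
struct toy_line {
	int mem;	/* what a non-coherent device sees */
	int cache;	/* what the CPU sees while the line is valid */
	bool valid;
	bool dirty;
};

/* CPU store: lands in the cache only, memory is left stale. */
static void cpu_write(struct toy_line *l, int v)
{
	l->cache = v;
	l->valid = true;
	l->dirty = true;
}

/* CPU load: served from the cache if valid, otherwise refilled from memory. */
static int cpu_read(struct toy_line *l)
{
	if (!l->valid) {
		l->cache = l->mem;
		l->valid = true;
		l->dirty = false;
	}
	return l->cache;
}

/* Device accesses bypass the cache entirely (no snooping). */
static void dev_write(struct toy_line *l, int v) { l->mem = v; }
static int dev_read(const struct toy_line *l) { return l->mem; }

/* Clean: write a dirty line back to memory, keep it cached. */
static void cache_clean(struct toy_line *l)
{
	if (l->valid && l->dirty) {
		l->mem = l->cache;
		l->dirty = false;
	}
}

/* Invalidate: drop the line so the next CPU load refills from memory. */
static void cache_invalidate(struct toy_line *l)
{
	l->valid = false;
	l->dirty = false;
}
```

In this model the device keeps seeing the old value until cache_clean()
runs, and the CPU keeps seeing its cached value until
cache_invalidate() runs.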

> > > > I guess we could handle that case as well, by requiring any ACPI based
> > > > firmware to turn off the coherency fabric on that system and just making
> > > > it dog slow.
> > > 
> > > We already require something similar in Documentation/arm64/booting.txt:
> > > 
> > >   `System caches which do not respect architected cache maintenance by VA
> > >    operations (not recommended) must be configured and disabled.'
> > 
> > Hmm, does that rule really get violated here? I think it fully respects
> > the cache maintenance (flush/invalidate/clean) operations, but it does
> > not fully respect the dsb/dmb instructions, which is something else.
> 
> If it fully respects the cache maintenance, it should also respect the
> completion and ordering requirements of the cache maintenance
> operations. That means that a DSB guarantees completion of such
> operations.

Ok.

	Arnd

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Linaro-acpi] [PATCH 2/2] ACPI / scan: Parse _CCA and setup device coherency
  2015-05-01 11:06               ` Catalin Marinas
  (?)
@ 2015-05-08 14:08                 ` Arnd Bergmann
  -1 siblings, 0 replies; 102+ messages in thread
From: Arnd Bergmann @ 2015-05-08 14:08 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Suthikulpanit, Suravee, rjw, linaro-acpi, will.deacon,
	linux-kernel, linux-acpi, Charles Garcia-Tobin, linux-arm-kernel,
	lenb

On Friday 01 May 2015 12:06:44 Catalin Marinas wrote:
> 
> > Note that there are lots of ways in which you could have noncoherent DMA:
> > the default on ARM32 is that it requires uncached access or explicit
> > cache flushes, but it's also possible to have an SMP system where a device
> > is only coherent with some of the CPUs and requires explicit synchronization
> > (not flushes) otherwise. In a multi-level cache hierarchy, there could be
> > all sorts of combinations of flushes and syncs you would need to do.
> > 
> > With DT, we handle this using SoC-specific overrides for platforms that
> > are noncoherent in funny ways, see
> > http://lxr.free-electrons.com/source/arch/arm/mach-mvebu/coherency.c?v=3.18#L263
> > for instance.
> 
> It looks like mach-mvebu no longer needs this, according to commit
> 1bd4d8a6de5c (ARM: mvebu: use arm_coherent_dma_ops and re-enable hardware
> I/O coherency).

Yes, Thomas Petazzoni found a way to configure that chip to essentially
provide PCI semantics where an MMIO read from a device ensures that all
previous DMA has completed, which made the sync unnecessary. I believe
Marvell recommends against using that mode for performance reasons,
and they still use their own manual syncs in their vendor kernel.

> Even if some hardware needs this, it's usually because it has some
> broken assumptions about barriers which most likely are architecture
> non-compliant. We can work around it on a case by case basis (SoC
> quirks). One option would be to disable coherency altogether for that
> device, even if the performance is affected (e.g. no partial coherency).
> Another possibility may be to add a bus driver for that broken
> interconnect which installs its own dma ops for each device attached.

Whether the Armada XP example is broken or not is really a matter of
perspective. I would count it broken on the basis that it does not
match what the Linux DMA and MMIO APIs expect, but you can well build
an OS around their semantics.

> > If we just disallow DMA to devices that are marked with _CCA=0
> > in ACPI, we can avoid this case, or discuss it by the time someone has hardware
> > that wants it, and then make a more informed decision about it.
> 
> I don't think we should disallow DMA to devices with _CCA == 0 (only to
> those that don't have a _CCA property at all) as long as _CCA == 0 has
> clear semantics like only architected cache maintenance required (and
> that's what the ARMv8 ARM requires from compliant system caches).

Even if we exclude all cases in which the behavior may be unexpected,
there is still the other point I raised initially:

             what would that be good for?

Can you think of a case where a server system has a reason to use
a device in noncoherent mode? I think it's more likely to be a case
where a device got misconfigured accidentally by the firmware, and
we're better off warning about that in the kernel than trying to prepare
for unknown hardware that might use an obscure feature of the spec.

	Arnd

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Linaro-acpi] [PATCH 2/2] ACPI / scan: Parse _CCA and setup device coherency
  2015-05-08 14:08                 ` Arnd Bergmann
  (?)
@ 2015-05-11 17:10                   ` Catalin Marinas
  -1 siblings, 0 replies; 102+ messages in thread
From: Catalin Marinas @ 2015-05-11 17:10 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linaro-acpi, rjw, will.deacon, linux-kernel, linux-acpi,
	Suthikulpanit, Suravee, lenb, linux-arm-kernel,
	Charles Garcia-Tobin

On Fri, May 08, 2015 at 04:08:53PM +0200, Arnd Bergmann wrote:
> On Friday 01 May 2015 12:06:44 Catalin Marinas wrote:
> > > If we just disallow DMA to devices that are marked with _CCA=0
> > > in ACPI, we can avoid this case, or discuss it by the time someone has hardware
> > > that wants it, and then make a more informed decision about it.
> > 
> > I don't think we should disallow DMA to devices with _CCA == 0 (only to
> > those that don't have a _CCA property at all) as long as _CCA == 0 has
> > clear semantics like only architected cache maintenance required (and
> > that's what the ARMv8 ARM requires from compliant system caches).
> 
> Even if we exclude all cases in which the behavior may be unexpected,
> there is still the other point I raised initially:
> 
>              what would that be good for?
> 
> Can you think of a case where a server system has a reason to use
> a device in noncoherent mode? I think it's more likely to be a case
> where a device got misconfigured accidentally by the firmware, and
> we're better off warning about that in the kernel than trying to prepare
> for unknown hardware that might use an obscure feature of the spec.

Maybe some of the people involved in arm64 servers can give a better
answer, I'm not familiar with their hardware (plans).

I would expect most DMA-capable devices to be cache coherent. However,
for (system) performance reasons, some of them could be configured as
non-coherent. An example, though unlikely on servers, is a display
device continuously accessing a framebuffer. You may not want to
overload the coherent interconnect.

-- 
Catalin

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Linaro-acpi] [PATCH 2/2] ACPI / scan: Parse _CCA and setup device coherency
  2015-05-11 17:10                   ` Catalin Marinas
  (?)
@ 2015-05-11 17:24                     ` Robin Murphy
  -1 siblings, 0 replies; 102+ messages in thread
From: Robin Murphy @ 2015-05-11 17:24 UTC (permalink / raw)
  To: Catalin Marinas, Arnd Bergmann
  Cc: linaro-acpi, rjw, Will Deacon, linux-kernel, linux-acpi,
	suravee.suthikulpanit, Charles Garcia-Tobin, linux-arm-kernel,
	lenb

On 11/05/15 18:10, Catalin Marinas wrote:
> On Fri, May 08, 2015 at 04:08:53PM +0200, Arnd Bergmann wrote:
>> On Friday 01 May 2015 12:06:44 Catalin Marinas wrote:
>>>> If we just disallow DMA to devices that are marked with _CCA=0
>>>> in ACPI, we can avoid this case, or discuss it by the time someone has hardware
>>>> that wants it, and then make a more informed decision about it.
>>>
>>> I don't think we should disallow DMA to devices with _CCA == 0 (only to
>>> those that don't have a _CCA property at all) as long as _CCA == 0 has
>>> clear semantics like only architected cache maintenance required (and
>>> that's what the ARMv8 ARM requires from compliant system caches).
>>
>> Even if we exclude all cases in which the behavior may be unexpected,
>> there is still the other point I raised initially:
>>
>>               what would that be good for?
>>
>> Can you think of a case where a server system has a reason to use
>> a device in noncoherent mode? I think it's more likely to be a case
>> where a device got misconfigured accidentally by the firmware, and
>> we're better off warning about that in the kernel than trying to prepare
>> for an unknown hardware that might use an obscure feature of the spec.
>
> Maybe some of the people involved in arm64 servers can give a better
> answer, I'm not familiar with their hardware (plans).
>
> I would expect most DMA-capable devices to be cache coherent. However,
> for (system) performance reasons, some of them could be configured as
> non-coherent. An example, though unlikely on servers, is a display
> device continuously accessing a framebuffer. You may not want to
> overload the coherent interconnect.

FWIW, I've also had much the same argument put to me for IOMMUs, i.e. 
they want to make the page table walk interface non-coherent because 
they'd rather pay the cost of flushing the page tables once to save a 
few extra cycles of latency for cache snooping on every TLB miss.

Robin.
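
The policy Catalin argues for in the quoted text (refuse DMA only when a device has no _CCA at all, and treat _CCA == 0 as "non-coherent, architected cache maintenance required") can be sketched roughly as below. This is only an illustration of the discussion, not code from the patch series: the enum names and the dma_policy_from_cca() helper are hypothetical and do not exist in the kernel.

```c
/*
 * Illustrative sketch only. All identifiers here are hypothetical;
 * they are not taken from the patches or from the kernel's ACPI code.
 */

/* What firmware told us about the device's coherency attribute. */
enum cca_state {
	CCA_ABSENT,		/* no _CCA object found for the device */
	CCA_NONCOHERENT,	/* _CCA evaluated to 0 */
	CCA_COHERENT		/* _CCA evaluated to 1 */
};

/* What the OS would do with DMA for that device under this policy. */
enum dma_policy {
	DMA_DISALLOWED,		/* do not set up DMA at all */
	DMA_NONCOHERENT,	/* DMA allowed, with cache maintenance */
	DMA_COHERENT		/* DMA allowed, no cache maintenance */
};

/* Map the firmware-provided state to a DMA policy. */
static enum dma_policy dma_policy_from_cca(enum cca_state cca)
{
	switch (cca) {
	case CCA_COHERENT:
		return DMA_COHERENT;
	case CCA_NONCOHERENT:
		/* Proposed: keep DMA working, rely on cache maintenance. */
		return DMA_NONCOHERENT;
	case CCA_ABSENT:
	default:
		/* Missing attribute: semantics unknown, so refuse DMA. */
		return DMA_DISALLOWED;
	}
}
```

Under Arnd's alternative from the same quoted exchange, the CCA_NONCOHERENT case would instead return DMA_DISALLOWED (possibly with a warning), on the assumption that _CCA == 0 on a server more likely indicates misconfigured firmware.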


^ permalink raw reply	[flat|nested] 102+ messages in thread

end of thread, other threads:[~2015-05-11 17:24 UTC | newest]

Thread overview: 102+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-29 13:44 [PATCH 0/2] ACPI : Introduce support for _CCA object Suravee Suthikulpanit
2015-04-29 13:44 ` Suravee Suthikulpanit
2015-04-29 13:44 ` Suravee Suthikulpanit
2015-04-29 13:44 ` [PATCH 1/2] arm/arm64: ACPI: Introduce CONFIG_ACPI_MUST_HAVE_CCA Suravee Suthikulpanit
2015-04-29 13:44   ` Suravee Suthikulpanit
2015-04-29 13:44   ` Suravee Suthikulpanit
2015-04-29 14:04   ` Catalin Marinas
2015-04-29 14:04     ` Catalin Marinas
2015-04-29 14:31     ` Suravee Suthikulpanit
2015-04-29 14:31       ` Suravee Suthikulpanit
2015-04-29 14:31       ` Suravee Suthikulpanit
2015-04-29 14:42       ` Catalin Marinas
2015-04-29 14:42         ` Catalin Marinas
2015-04-29 14:44         ` Suravee Suthikulpanit
2015-04-29 14:44           ` Suravee Suthikulpanit
2015-04-29 14:44           ` Suravee Suthikulpanit
2015-04-30 13:47         ` Hanjun Guo
2015-04-30 13:47           ` Hanjun Guo
2015-04-30 13:47           ` Hanjun Guo
2015-04-30 13:50           ` Will Deacon
2015-04-30 13:50             ` Will Deacon
2015-04-30 13:50             ` Will Deacon
2015-04-30 14:14             ` Hanjun Guo
2015-04-30 14:14               ` Hanjun Guo
2015-04-30 14:14               ` Hanjun Guo
2015-04-30 15:01             ` Lorenzo Pieralisi
2015-04-30 15:01               ` Lorenzo Pieralisi
2015-04-30 15:01               ` Lorenzo Pieralisi
2015-04-29 13:44 ` [PATCH 2/2] ACPI / scan: Parse _CCA and setup device coherency Suravee Suthikulpanit
2015-04-29 13:44   ` Suravee Suthikulpanit
2015-04-29 13:44   ` Suravee Suthikulpanit
2015-04-29 14:03   ` Arnd Bergmann
2015-04-29 14:03     ` Arnd Bergmann
2015-04-29 14:45     ` Suravee Suthikulpanit
2015-04-29 14:45       ` Suravee Suthikulpanit
2015-04-29 14:45       ` Suravee Suthikulpanit
2015-04-29 14:47       ` [Linaro-acpi] " Arnd Bergmann
2015-04-29 14:47         ` Arnd Bergmann
2015-04-29 14:57         ` Suthikulpanit, Suravee
2015-04-29 14:57           ` Suthikulpanit, Suravee
2015-04-29 14:57           ` Suthikulpanit, Suravee
2015-04-29 15:39           ` Al Stone
2015-04-29 15:39             ` Al Stone
2015-04-29 15:39             ` Al Stone
2015-04-29 16:15             ` Arnd Bergmann
2015-04-29 16:15               ` Arnd Bergmann
2015-04-29 16:15               ` Arnd Bergmann
2015-04-29 15:54           ` Arnd Bergmann
2015-04-29 15:54             ` Arnd Bergmann
2015-04-29 15:54             ` Arnd Bergmann
2015-05-01 11:06             ` Catalin Marinas
2015-05-01 11:06               ` Catalin Marinas
2015-05-01 11:06               ` Catalin Marinas
2015-05-08 14:08               ` Arnd Bergmann
2015-05-08 14:08                 ` Arnd Bergmann
2015-05-08 14:08                 ` Arnd Bergmann
2015-05-11 17:10                 ` Catalin Marinas
2015-05-11 17:10                   ` Catalin Marinas
2015-05-11 17:10                   ` Catalin Marinas
2015-05-11 17:24                   ` Robin Murphy
2015-05-11 17:24                     ` Robin Murphy
2015-05-11 17:24                     ` Robin Murphy
2015-04-29 16:25   ` Arnd Bergmann
2015-04-29 16:25     ` Arnd Bergmann
2015-04-29 21:53     ` Suravee Suthikulpanit
2015-04-29 21:53       ` Suravee Suthikulpanit
2015-04-29 21:53       ` Suravee Suthikulpanit
2015-04-30  8:23       ` [Linaro-acpi] " Arnd Bergmann
2015-04-30  8:23         ` Arnd Bergmann
2015-04-30 10:41         ` Will Deacon
2015-04-30 10:41           ` Will Deacon
2015-04-30 10:41           ` Will Deacon
2015-04-30 10:47           ` Arnd Bergmann
2015-04-30 10:47             ` Arnd Bergmann
2015-04-30 10:47             ` Arnd Bergmann
2015-04-30 11:07             ` Will Deacon
2015-04-30 11:07               ` Will Deacon
2015-04-30 11:07               ` Will Deacon
2015-04-30 11:24               ` Arnd Bergmann
2015-04-30 11:24                 ` Arnd Bergmann
2015-04-30 11:24                 ` Arnd Bergmann
2015-04-30 11:46                 ` Will Deacon
2015-04-30 11:46                   ` Will Deacon
2015-04-30 11:46                   ` Will Deacon
2015-04-30 13:03                   ` Arnd Bergmann
2015-04-30 13:03                     ` Arnd Bergmann
2015-04-30 13:03                     ` Arnd Bergmann
2015-04-30 13:13                     ` Will Deacon
2015-04-30 13:13                       ` Will Deacon
2015-04-30 13:13                       ` Will Deacon
2015-04-30 13:52                       ` Arnd Bergmann
2015-04-30 13:52                         ` Arnd Bergmann
2015-04-30 13:52                         ` Arnd Bergmann
2015-04-30 15:55                         ` Catalin Marinas
2015-04-30 15:55                           ` Catalin Marinas
2015-04-30 15:55                           ` Catalin Marinas
2015-05-08 14:01                           ` Arnd Bergmann
2015-05-08 14:01                             ` Arnd Bergmann
2015-05-08 14:01                             ` Arnd Bergmann
2015-04-30 23:39         ` Suravee Suthikulanit
2015-04-30 23:39           ` Suravee Suthikulanit
2015-04-30 23:39           ` Suravee Suthikulanit
