linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/9] Apple M1 (Pro/Max) NVMe driver
@ 2022-03-21 16:50 Sven Peter
  2022-03-21 16:50 ` [PATCH 1/9] dt-bindings: soc: apple: Add Apple SART Sven Peter
                   ` (9 more replies)
  0 siblings, 10 replies; 55+ messages in thread
From: Sven Peter @ 2022-03-21 16:50 UTC (permalink / raw)
  Cc: Sven Peter, Hector Martin, Alyssa Rosenzweig, Rob Herring,
	Arnd Bergmann, Keith Busch, Jens Axboe, Christoph Hellwig,
	Sagi Grimberg, Marc Zyngier, devicetree, linux-arm-kernel,
	linux-kernel, linux-nvme

Hi,

This series includes everything[*] required to get NVMe up and running on
Apple's M1, M1 Pro and M1 Max SoCs.

The NVMe controller on these machines is not attached to a PCI bus and
this driver originally started out when Arnd added platform support to
pci.c and I added the required Apple quirks. Christoph Hellwig stumbled
upon an early version and suggested to instead rewrite it as a
stand-alone driver since some of these quirks are rather awkward to
implement and would affect the hot path otherwise [1].

Here's the first version that creates apple.c to handle these weird NVMe
controllers. The following parts are included:

  - Device tree bindings: Since this is the first and probably only
    SoC that has NVMe outside of a PCIe bus I've put them into
    soc/apple. The same bindings are also used by u-boot and OpenBSD
    already.

  - SART address filter: Some of the buffers required by the NVMe
    controller sit behind a simple DMA address filter Apple calls
    SART. It allows to specify up to 16 physical address ranges
    that are allowed and will reject access to anything else.
    Unlike a real IOMMU there's no way to setup pagetables and
    also not all buffers are subject to this filtering. Most
    buffers used by the NVMe commands themselves use an integrated
    IOMMU instead.

  - RTKit IPC protocol: The NVMe controller is running a proprietary
    RTOS Apple calls RTKit and we need to communicate with it in order
    to bring up and use the NVMe controller. This communication happens
    over a mailbox interface that is already upstream. This protocol
    is also used for various other drivers present on these SoCs
    (SMC, display controller, Thunderbolt/USB4).

  - And finally the NVMe driver itself: The driver registers a platform
    device and is mainly based on pci.c with a few special Apple quirks.
    The biggest difference to normal NVMe (except for the missing PCI
    bus) is that command submission works differently: The SQ is
    replaced with a simple array and a command is triggered by writing
    its tag to a MMIO register. Additionally, the command must also be
    setup in the proprietary NVMMU before it can be submitted.
    There is some code duplication with pci.c for the setup of the PRPs.
    Depending on what you prefer this could be moved to a common file
    shared between pci.c and apple.c.

The hardware here is the same hardware that's already used in T2 Macs.
The only difference is that the T2 chip itself initializes the
controller, disable some quirks (the NVMMU and the weird submission
array) and then exposes it over a PCIe interface.

The driver itself has been successfully used by multiple people as
their daily driver for weeks at this point and no major issues have
been reported.
A smaller issue is that flushes on the devices take *much* longer than
expected. Jens Axboe has a workaround where the flushes are delayed but
that one isn't ready for submission yet.

Best,

Sven

[1] https://lore.kernel.org/linux-nvme/YRE7vCyn9d1ClhRm@infradead.org/
[*] The only missing part in this series are the device tree updates
    but since these will go through arm-soc anyway I haven't included
    them here but will instead submit them once this series is in a shape
    where it can be merged.

Hector Martin (2):
  nvme-apple: Add support for multiple power domains
  nvme-apple: Add support for suspend/resume

Jens Axboe (1):
  nvme-apple: Serialize command issue

Sven Peter (6):
  dt-bindings: soc: apple: Add Apple SART
  dt-bindings: soc: apple: Add ANS NVMe
  soc: apple: Always include Makefile
  soc: apple: Add SART driver
  soc: apple: Add RTKit IPC library
  nvme-apple: Add initial Apple SoC NVMe driver

 .../bindings/soc/apple/apple,nvme-ans.yaml    |   75 +
 .../bindings/soc/apple/apple,sart.yaml        |   52 +
 MAINTAINERS                                   |    3 +
 drivers/nvme/host/Kconfig                     |   12 +
 drivers/nvme/host/Makefile                    |    3 +
 drivers/nvme/host/apple.c                     | 1568 +++++++++++++++++
 drivers/soc/Makefile                          |    2 +-
 drivers/soc/apple/Kconfig                     |   24 +
 drivers/soc/apple/Makefile                    |    6 +
 drivers/soc/apple/rtkit-crashlog.c            |  147 ++
 drivers/soc/apple/rtkit-internal.h            |   76 +
 drivers/soc/apple/rtkit.c                     |  842 +++++++++
 drivers/soc/apple/sart.c                      |  318 ++++
 include/linux/soc/apple/rtkit.h               |  203 +++
 include/linux/soc/apple/sart.h                |   77 +
 15 files changed, 3407 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/devicetree/bindings/soc/apple/apple,nvme-ans.yaml
 create mode 100644 Documentation/devicetree/bindings/soc/apple/apple,sart.yaml
 create mode 100644 drivers/nvme/host/apple.c
 create mode 100644 drivers/soc/apple/rtkit-crashlog.c
 create mode 100644 drivers/soc/apple/rtkit-internal.h
 create mode 100644 drivers/soc/apple/rtkit.c
 create mode 100644 drivers/soc/apple/sart.c
 create mode 100644 include/linux/soc/apple/rtkit.h
 create mode 100644 include/linux/soc/apple/sart.h

-- 
2.25.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 1/9] dt-bindings: soc: apple: Add Apple SART
  2022-03-21 16:50 [PATCH 0/9] Apple M1 (Pro/Max) NVMe driver Sven Peter
@ 2022-03-21 16:50 ` Sven Peter
  2022-03-31 21:23   ` Rob Herring
  2022-03-21 16:50 ` [PATCH 2/9] dt-bindings: soc: apple: Add ANS NVMe Sven Peter
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 55+ messages in thread
From: Sven Peter @ 2022-03-21 16:50 UTC (permalink / raw)
  To: Rob Herring
  Cc: Sven Peter, Hector Martin, Alyssa Rosenzweig, Arnd Bergmann,
	Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
	Marc Zyngier, devicetree, linux-arm-kernel, linux-kernel,
	linux-nvme

Apple SoCs such as the M1 come with a simple DMA address filter called
SART. Unlike a real IOMMU no pagetables can be configured but instead
DMA transactions can be allowed for up to 16 paddr regions.

Signed-off-by: Sven Peter <sven@svenpeter.dev>
---
 .../bindings/soc/apple/apple,sart.yaml        | 52 +++++++++++++++++++
 MAINTAINERS                                   |  1 +
 2 files changed, 53 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/soc/apple/apple,sart.yaml

diff --git a/Documentation/devicetree/bindings/soc/apple/apple,sart.yaml b/Documentation/devicetree/bindings/soc/apple/apple,sart.yaml
new file mode 100644
index 000000000000..d8177b3a3fba
--- /dev/null
+++ b/Documentation/devicetree/bindings/soc/apple/apple,sart.yaml
@@ -0,0 +1,52 @@
+# SPDX-License-Identifier: GPL-2.0 OR BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/soc/apple/apple,sart.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Apple SART DMA address filter
+
+maintainers:
+  - Sven Peter <sven@svenpeter.dev>
+
+description:
+  Apple SART is a simple address filter for DMA transactions. Regions of
+  physical memory must be added to the SART's allow list before any
+  DMA can target these. Unlike a proper IOMMU no remapping can be done and
+  special support in the consumer driver is required since not all DMA
+  transactions of a single device are subject to SART filtering.
+
+  SART1 has first been used since at least the A11 (iPhone 8 and iPhone X)
+  and allows 36 bit of physical address space and filter entries with sizes
+  up to 24 bit.
+
+  SART2, first seen in A14 and M1, allows 36 bit of physical address space
+  and filter entry size up to 36 bit.
+
+  SART3, first seen in M1 Pro/Max, extends both the address space and filter
+  entry size to 42 bit.
+
+properties:
+  compatible:
+    enum:
+      - apple,t6000-sart
+      - apple,t8103-sart
+
+  reg:
+    maxItems: 1
+
+  power-domains:
+    maxItems: 1
+
+required:
+  - compatible
+  - reg
+
+additionalProperties: false
+
+examples:
+  - |
+    sart@7bc50000 {
+      compatible = "apple,t8103-sart";
+      reg = <0x7bc50000 0x4000>;
+    };
diff --git a/MAINTAINERS b/MAINTAINERS
index cd0f68d4a34a..027c3b4ad61c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1774,6 +1774,7 @@ F:	Documentation/devicetree/bindings/mailbox/apple,mailbox.yaml
 F:	Documentation/devicetree/bindings/pci/apple,pcie.yaml
 F:	Documentation/devicetree/bindings/pinctrl/apple,pinctrl.yaml
 F:	Documentation/devicetree/bindings/power/apple*
+F:	Documentation/devicetree/bindings/soc/apple/*
 F:	Documentation/devicetree/bindings/watchdog/apple,wdt.yaml
 F:	arch/arm64/boot/dts/apple/
 F:	drivers/i2c/busses/i2c-pasemi-core.c
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 2/9] dt-bindings: soc: apple: Add ANS NVMe
  2022-03-21 16:50 [PATCH 0/9] Apple M1 (Pro/Max) NVMe driver Sven Peter
  2022-03-21 16:50 ` [PATCH 1/9] dt-bindings: soc: apple: Add Apple SART Sven Peter
@ 2022-03-21 16:50 ` Sven Peter
  2022-03-23 11:14   ` Krzysztof Kozlowski
  2022-03-21 16:50 ` [PATCH 3/9] soc: apple: Always include Makefile Sven Peter
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 55+ messages in thread
From: Sven Peter @ 2022-03-21 16:50 UTC (permalink / raw)
  To: Rob Herring
  Cc: Sven Peter, Hector Martin, Alyssa Rosenzweig, Arnd Bergmann,
	Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
	Marc Zyngier, devicetree, linux-arm-kernel, linux-kernel,
	linux-nvme

Apple SoCs such as the M1 come with an embedded NVMe coprocessor called
ANS2.

Signed-off-by: Sven Peter <sven@svenpeter.dev>
---
 .../bindings/soc/apple/apple,nvme-ans.yaml    | 75 +++++++++++++++++++
 1 file changed, 75 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/soc/apple/apple,nvme-ans.yaml

diff --git a/Documentation/devicetree/bindings/soc/apple/apple,nvme-ans.yaml b/Documentation/devicetree/bindings/soc/apple/apple,nvme-ans.yaml
new file mode 100644
index 000000000000..e1f4c1c572aa
--- /dev/null
+++ b/Documentation/devicetree/bindings/soc/apple/apple,nvme-ans.yaml
@@ -0,0 +1,75 @@
+# SPDX-License-Identifier: GPL-2.0 OR BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/soc/apple/apple,nvme-ans.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Apple ANS NVM Express host controller
+
+maintainers:
+  - Sven Peter <sven@svenpeter.dev>
+
+properties:
+  compatible:
+    items:
+      - enum:
+          - apple,t8103-nvme-ans2
+          - apple,t6000-nvme-ans2
+      - const: apple,nvme-ans2
+
+  reg:
+    items:
+      - description: NVMe and NVMMU registers
+      - description: ANS2 co-processor control registers
+
+  reg-names:
+    items:
+      - const: nvme
+      - const: ans
+
+  resets:
+    maxItems: 1
+
+  power-domains: true
+
+  mboxes:
+    maxItems: 1
+    description: Mailbox of the ANS2 co-processor
+
+  interrupts:
+    maxItems: 1
+
+  apple,sart:
+    maxItems: 1
+    $ref: /schemas/types.yaml#/definitions/phandle
+    description: |
+      Reference to the SART address filter.
+
+      The SART address filter is documented in apple,sart.yaml.
+
+required:
+  - compatible
+  - reg
+  - reg-names
+  - resets
+  - mboxes
+  - interrupts
+  - apple,sart
+
+additionalProperties: false
+
+examples:
+  - |
+    #include <dt-bindings/interrupt-controller/apple-aic.h>
+    #include <dt-bindings/interrupt-controller/irq.h>
+
+    nvme@7bcc0000 {
+      compatible = "apple,t8103-nvme-ans2", "apple,nvme-ans2";
+      reg = <0x7bcc0000 0x40000>, <0x77400000 0x4000>;
+      reg-names = "nvme", "ans";
+      interrupts = <AIC_IRQ 590 IRQ_TYPE_LEVEL_HIGH>;
+      mboxes = <&ans>;
+      apple,sart = <&sart>;
+      power-domains = <&ps_ans2>;
+      resets = <&ps_ans2>;
+     };
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 3/9] soc: apple: Always include Makefile
  2022-03-21 16:50 [PATCH 0/9] Apple M1 (Pro/Max) NVMe driver Sven Peter
  2022-03-21 16:50 ` [PATCH 1/9] dt-bindings: soc: apple: Add Apple SART Sven Peter
  2022-03-21 16:50 ` [PATCH 2/9] dt-bindings: soc: apple: Add ANS NVMe Sven Peter
@ 2022-03-21 16:50 ` Sven Peter
  2022-03-21 16:50 ` [PATCH 4/9] soc: apple: Add SART driver Sven Peter
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 55+ messages in thread
From: Sven Peter @ 2022-03-21 16:50 UTC (permalink / raw)
  Cc: Sven Peter, Hector Martin, Alyssa Rosenzweig, Rob Herring,
	Arnd Bergmann, Keith Busch, Jens Axboe, Christoph Hellwig,
	Sagi Grimberg, Marc Zyngier, devicetree, linux-arm-kernel,
	linux-kernel, linux-nvme

We want to allow the code inside drivers/soc/apple to be compiled with
COMPILE_TEST but this will currently result in linking errors because
ARCH_APPLE is not set and make will never recurse into
drivers/soc/apple.
Let's just unconditionally recurse into apple/ since all drivers
in there are guarded by config options anyways.

Signed-off-by: Sven Peter <sven@svenpeter.dev>
---
 drivers/soc/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/soc/Makefile b/drivers/soc/Makefile
index adb30c2d4fea..608f8ce8b600 100644
--- a/drivers/soc/Makefile
+++ b/drivers/soc/Makefile
@@ -4,7 +4,7 @@
 #
 
 obj-$(CONFIG_ARCH_ACTIONS)	+= actions/
-obj-$(CONFIG_ARCH_APPLE)	+= apple/
+obj-y				+= apple/
 obj-y				+= aspeed/
 obj-$(CONFIG_ARCH_AT91)		+= atmel/
 obj-y				+= bcm/
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 4/9] soc: apple: Add SART driver
  2022-03-21 16:50 [PATCH 0/9] Apple M1 (Pro/Max) NVMe driver Sven Peter
                   ` (2 preceding siblings ...)
  2022-03-21 16:50 ` [PATCH 3/9] soc: apple: Always include Makefile Sven Peter
@ 2022-03-21 16:50 ` Sven Peter
  2022-03-21 17:07   ` Arnd Bergmann
  2022-03-21 17:17   ` Alyssa Rosenzweig
  2022-03-21 16:50 ` [PATCH 5/9] soc: apple: Add RTKit IPC library Sven Peter
                   ` (5 subsequent siblings)
  9 siblings, 2 replies; 55+ messages in thread
From: Sven Peter @ 2022-03-21 16:50 UTC (permalink / raw)
  Cc: Sven Peter, Hector Martin, Alyssa Rosenzweig, Rob Herring,
	Arnd Bergmann, Keith Busch, Jens Axboe, Christoph Hellwig,
	Sagi Grimberg, Marc Zyngier, devicetree, linux-arm-kernel,
	linux-kernel, linux-nvme

The NVMe co-processor on the Apple M1 uses a DMA address filter called
SART for some DMA transactions. This adds a simple driver used to
configure the memory regions from which DMA transactions are allowed.

Co-developed-by: Hector Martin <marcan@marcan.st>
Signed-off-by: Hector Martin <marcan@marcan.st>
Signed-off-by: Sven Peter <sven@svenpeter.dev>
---
 MAINTAINERS                    |   1 +
 drivers/soc/apple/Kconfig      |  11 ++
 drivers/soc/apple/Makefile     |   3 +
 drivers/soc/apple/sart.c       | 318 +++++++++++++++++++++++++++++++++
 include/linux/soc/apple/sart.h |  77 ++++++++
 5 files changed, 410 insertions(+)
 create mode 100644 drivers/soc/apple/sart.c
 create mode 100644 include/linux/soc/apple/sart.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 027c3b4ad61c..3d37fe7a0408 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1787,6 +1787,7 @@ F:	drivers/watchdog/apple_wdt.c
 F:	include/dt-bindings/interrupt-controller/apple-aic.h
 F:	include/dt-bindings/pinctrl/apple.h
 F:	include/linux/apple-mailbox.h
+F:	include/linux/soc/apple/*
 
 ARM/ARTPEC MACHINE SUPPORT
 M:	Jesper Nilsson <jesper.nilsson@axis.com>
diff --git a/drivers/soc/apple/Kconfig b/drivers/soc/apple/Kconfig
index 9b8de31d6a8f..8c37ffd53fbd 100644
--- a/drivers/soc/apple/Kconfig
+++ b/drivers/soc/apple/Kconfig
@@ -17,6 +17,17 @@ config APPLE_PMGR_PWRSTATE
 	  controls for SoC devices. This driver manages them through the
 	  generic power domain framework, and also provides reset support.
 
+config APPLE_SART
+	tristate "Apple SART DMA address filter"
+	depends on ARCH_APPLE || COMPILE_TEST
+	default ARCH_APPLE
+	help
+	  Apple SART is a simple DMA address filter used on Apple SoCs such
+	  as the M1. It is usually required for the NVMe coprocessor which does
+	  not use a proper IOMMU.
+
+	  Say 'y' here if you have an Apple SoC.
+
 endmenu
 
 endif
diff --git a/drivers/soc/apple/Makefile b/drivers/soc/apple/Makefile
index c114e84667e4..c83c66317098 100644
--- a/drivers/soc/apple/Makefile
+++ b/drivers/soc/apple/Makefile
@@ -1,2 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0-only
 obj-$(CONFIG_APPLE_PMGR_PWRSTATE)	+= apple-pmgr-pwrstate.o
+
+obj-$(CONFIG_APPLE_SART) += apple-sart.o
+apple-sart-y = sart.o
diff --git a/drivers/soc/apple/sart.c b/drivers/soc/apple/sart.c
new file mode 100644
index 000000000000..836ea68db2fc
--- /dev/null
+++ b/drivers/soc/apple/sart.c
@@ -0,0 +1,318 @@
+// SPDX-License-Identifier: GPL-2.0-only OR MIT
+/*
+ * Apple SART device driver
+ * Copyright (C) The Asahi Linux Contributors
+ *
+ * Apple SART is a simple address filter for some DMA transactions.
+ * Regions of physical memory must be added to the SART's allow
+ * list before any DMA can target these. Unlike a proper
+ * IOMMU no remapping can be done and special support in the
+ * consumer driver is required since not all DMA transactions of
+ * a single device are subject to SART filtering.
+ */
+
+#include <linux/soc/apple/sart.h>
+#include <linux/atomic.h>
+#include <linux/bits.h>
+#include <linux/bitfield.h>
+#include <linux/io.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/of_platform.h>
+#include <linux/platform_device.h>
+#include <linux/types.h>
+
+#define APPLE_SART_MAX_ENTRIES 16
+
+/* This is probably a bitfield but the exact meaning of each bit is unknown. */
+#define APPLE_SART_FLAGS_ALLOW 0xff
+
+/* SARTv2 registers */
+#define APPLE_SART2_CONFIG(idx)	      (0x00 + 4 * (idx))
+#define APPLE_SART2_CONFIG_FLAGS      GENMASK(31, 24)
+#define APPLE_SART2_CONFIG_SIZE	      GENMASK(23, 0)
+#define APPLE_SART2_CONFIG_SIZE_SHIFT 12
+#define APPLE_SART2_CONFIG_SIZE_MAX   GENMASK(23, 0)
+
+#define APPLE_SART2_PADDR(idx)	(0x40 + 4 * (idx))
+#define APPLE_SART2_PADDR_SHIFT 12
+
+/* SARTv3 registers */
+#define APPLE_SART3_CONFIG(idx) (0x00 + 4 * (idx))
+
+#define APPLE_SART3_PADDR(idx)	(0x40 + 4 * (idx))
+#define APPLE_SART3_PADDR_SHIFT 12
+
+#define APPLE_SART3_SIZE(idx)  (0x80 + 4 * (idx))
+#define APPLE_SART3_SIZE_SHIFT 12
+#define APPLE_SART3_SIZE_MAX   GENMASK(29, 0)
+
+struct apple_sart_ops {
+	void (*get_entry)(struct apple_sart *sart, int index, u8 *flags,
+			  phys_addr_t *paddr, size_t *size);
+	int (*set_entry)(struct apple_sart *sart, int index, u8 flags,
+			 phys_addr_t paddr, size_t size);
+};
+
+struct apple_sart {
+	struct device *dev;
+	void __iomem *regs;
+
+	const struct apple_sart_ops *ops;
+
+	unsigned long protected_entries;
+	unsigned long used_entries;
+};
+
+static void sart2_get_entry(struct apple_sart *sart, int index, u8 *flags,
+			    phys_addr_t *paddr, size_t *size)
+{
+	u32 cfg = readl_relaxed(sart->regs + APPLE_SART2_CONFIG(index));
+	u32 paddr_ = readl_relaxed(sart->regs + APPLE_SART2_PADDR(index));
+	u32 size_ = FIELD_GET(APPLE_SART2_CONFIG_SIZE, cfg);
+
+	*flags = FIELD_GET(APPLE_SART2_CONFIG_FLAGS, cfg);
+	*size = (size_t)size_ << APPLE_SART2_CONFIG_SIZE_SHIFT;
+	*paddr = (phys_addr_t)paddr_ << APPLE_SART2_PADDR_SHIFT;
+}
+
+static int sart2_set_entry(struct apple_sart *sart, int index, u8 flags,
+			   phys_addr_t paddr, size_t size)
+{
+	u32 cfg;
+
+	if (size & ((1 << APPLE_SART2_CONFIG_SIZE_SHIFT) - 1))
+		return -EINVAL;
+	if (paddr & ((1 << APPLE_SART2_PADDR_SHIFT) - 1))
+		return -EINVAL;
+
+	size >>= APPLE_SART2_CONFIG_SIZE_SHIFT;
+	paddr >>= APPLE_SART2_PADDR_SHIFT;
+
+	if (size > APPLE_SART2_CONFIG_SIZE_MAX)
+		return -EINVAL;
+
+	cfg = FIELD_PREP(APPLE_SART2_CONFIG_FLAGS, flags);
+	cfg |= FIELD_PREP(APPLE_SART2_CONFIG_SIZE, size);
+
+	writel_relaxed(paddr, sart->regs + APPLE_SART2_PADDR(index));
+	writel_relaxed(cfg, sart->regs + APPLE_SART2_CONFIG(index));
+
+	return 0;
+}
+
+static struct apple_sart_ops sart_ops_v2 = {
+	.get_entry = sart2_get_entry,
+	.set_entry = sart2_set_entry,
+};
+
+static void sart3_get_entry(struct apple_sart *sart, int index, u8 *flags,
+			    phys_addr_t *paddr, size_t *size)
+{
+	u32 paddr_ = readl_relaxed(sart->regs + APPLE_SART3_PADDR(index));
+	u32 size_ = readl_relaxed(sart->regs + APPLE_SART3_SIZE(index));
+
+	*flags = readl_relaxed(sart->regs + APPLE_SART3_CONFIG(index));
+	*size = (size_t)size_ << APPLE_SART3_SIZE_SHIFT;
+	*paddr = (phys_addr_t)paddr_ << APPLE_SART3_PADDR_SHIFT;
+}
+
+static int sart3_set_entry(struct apple_sart *sart, int index, u8 flags,
+			   phys_addr_t paddr, size_t size)
+{
+	if (size & ((1 << APPLE_SART3_SIZE_SHIFT) - 1))
+		return -EINVAL;
+	if (paddr & ((1 << APPLE_SART3_PADDR_SHIFT) - 1))
+		return -EINVAL;
+
+	paddr >>= APPLE_SART3_PADDR_SHIFT;
+	size >>= APPLE_SART3_SIZE_SHIFT;
+
+	if (size > APPLE_SART3_SIZE_MAX)
+		return -EINVAL;
+
+	writel_relaxed(paddr, sart->regs + APPLE_SART3_PADDR(index));
+	writel_relaxed(size, sart->regs + APPLE_SART3_SIZE(index));
+	writel_relaxed(flags, sart->regs + APPLE_SART3_CONFIG(index));
+
+	return 0;
+}
+
+static struct apple_sart_ops sart_ops_v3 = {
+	.get_entry = sart3_get_entry,
+	.set_entry = sart3_set_entry,
+};
+
+static int apple_sart_probe(struct platform_device *pdev)
+{
+	int i;
+	struct apple_sart *sart;
+	struct device *dev = &pdev->dev;
+
+	sart = devm_kzalloc(dev, sizeof(*sart), GFP_KERNEL);
+	if (!sart)
+		return -ENOMEM;
+
+	sart->dev = dev;
+	sart->ops = of_device_get_match_data(dev);
+
+	sart->regs = devm_platform_ioremap_resource(pdev, 0);
+	if (IS_ERR(sart->regs))
+		return PTR_ERR(sart->regs);
+
+	for (i = 0; i < APPLE_SART_MAX_ENTRIES; ++i) {
+		u8 flags;
+		size_t size;
+		phys_addr_t paddr;
+
+		sart->ops->get_entry(sart, i, &flags, &paddr, &size);
+
+		if (!flags)
+			continue;
+
+		dev_dbg(sart->dev,
+			"SART bootloader entry: index %02d; flags: 0x%02x; paddr: %pa; size: 0x%zx\n",
+			i, flags, &paddr, size);
+		set_bit(i, &sart->protected_entries);
+	}
+
+	platform_set_drvdata(pdev, sart);
+	return 0;
+}
+
+struct apple_sart *apple_sart_get(struct device *dev)
+{
+	struct device_node *sart_node;
+	struct platform_device *sart_pdev;
+	struct apple_sart *sart;
+
+	sart_node = of_parse_phandle(dev->of_node, "apple,sart", 0);
+	if (!sart_node)
+		return ERR_PTR(ENODEV);
+
+	sart_pdev = of_find_device_by_node(sart_node);
+	of_node_put(sart_node);
+
+	if (!sart_pdev)
+		return ERR_PTR(ENODEV);
+
+	sart = dev_get_drvdata(&sart_pdev->dev);
+	if (!sart)
+		return ERR_PTR(EPROBE_DEFER);
+
+	device_link_add(dev, &sart_pdev->dev,
+			DL_FLAG_PM_RUNTIME | DL_FLAG_AUTOREMOVE_SUPPLIER);
+
+	return sart;
+}
+EXPORT_SYMBOL(apple_sart_get);
+
+int apple_sart_add_allowed_region(struct apple_sart *sart, phys_addr_t paddr,
+				  size_t size)
+{
+	int i, ret;
+
+	for (i = 0; i < APPLE_SART_MAX_ENTRIES; ++i) {
+		if (test_bit(i, &sart->protected_entries))
+			continue;
+		if (test_and_set_bit(i, &sart->used_entries))
+			continue;
+
+		ret = sart->ops->set_entry(sart, i, APPLE_SART_FLAGS_ALLOW,
+					   paddr, size);
+		if (ret) {
+			dev_dbg(sart->dev,
+				"unable to set entry %d to [%pa, 0x%zx]\n",
+				i, &paddr, size);
+			clear_bit(i, &sart->used_entries);
+			return ret;
+		}
+
+		dev_dbg(sart->dev, "wrote [%pa, 0x%zx] to %d\n", &paddr, size,
+			i);
+		return 0;
+	}
+
+	dev_warn(sart->dev,
+		 "no free entries left to add [paddr: 0x%llx, size: 0x%zx]\n",
+		 paddr, size);
+
+	return -EBUSY;
+}
+EXPORT_SYMBOL(apple_sart_add_allowed_region);
+
+int apple_sart_remove_allowed_region(struct apple_sart *sart, phys_addr_t paddr,
+				     size_t size)
+{
+	int i;
+
+	dev_dbg(sart->dev,
+		"will remove [paddr: %pa, size: 0x%zx] from allowed regions\n",
+		&paddr, size);
+
+	for (i = 0; i < APPLE_SART_MAX_ENTRIES; ++i) {
+		u8 eflags;
+		size_t esize;
+		phys_addr_t epaddr;
+
+		if (test_bit(i, &sart->protected_entries))
+			continue;
+
+		sart->ops->get_entry(sart, i, &eflags, &epaddr, &esize);
+
+		if (epaddr != paddr || esize != size)
+			continue;
+
+		sart->ops->set_entry(sart, i, 0, 0, 0);
+
+		clear_bit(i, &sart->used_entries);
+		dev_dbg(sart->dev, "cleared entry %d\n", i);
+		return 0;
+	}
+
+	dev_warn(sart->dev, "entry [paddr: 0x%llx, size: 0x%zx] not found\n",
+		 paddr, size);
+
+	return -EINVAL;
+}
+EXPORT_SYMBOL(apple_sart_remove_allowed_region);
+
+static void apple_sart_shutdown(struct platform_device *pdev)
+{
+	struct apple_sart *sart = dev_get_drvdata(&pdev->dev);
+	int i;
+
+	for (i = 0; i < APPLE_SART_MAX_ENTRIES; ++i) {
+		if (test_bit(i, &sart->protected_entries))
+			continue;
+
+		sart->ops->set_entry(sart, i, 0, 0, 0);
+	}
+}
+
+static const struct of_device_id apple_sart_of_match[] = {
+	{
+		.compatible = "apple,t6000-sart",
+		.data = &sart_ops_v3,
+	},
+	{
+		.compatible = "apple,t8103-sart",
+		.data = &sart_ops_v2,
+	},
+	{}
+};
+MODULE_DEVICE_TABLE(of, apple_sart_of_match);
+
+static struct platform_driver apple_sart_driver = {
+	.driver = {
+		.name = "apple-sart",
+		.of_match_table = apple_sart_of_match,
+	},
+	.probe = apple_sart_probe,
+	.shutdown = apple_sart_shutdown,
+};
+module_platform_driver(apple_sart_driver);
+
+MODULE_LICENSE("Dual MIT/GPL");
+MODULE_AUTHOR("Sven Peter <sven@svenpeter.dev>");
+MODULE_DESCRIPTION("Apple SART driver");
diff --git a/include/linux/soc/apple/sart.h b/include/linux/soc/apple/sart.h
new file mode 100644
index 000000000000..4e3d4960be5a
--- /dev/null
+++ b/include/linux/soc/apple/sart.h
@@ -0,0 +1,77 @@
+/* SPDX-License-Identifier: GPL-2.0-only OR MIT */
+/*
+ * Apple SART device driver
+ * Copyright (C) The Asahi Linux Contributors
+ *
+ * Apple SART is a simple address filter for DMA transactions.
+ * Regions of physical memory must be added to the SART's allow
+ * list before any DMA can target these. Unlike a proper
+ * IOMMU no remapping can be done.
+ */
+
+#ifndef _LINUX_SOC_APPLE_SART_H_
+#define _LINUX_SOC_APPLE_SART_H_
+
+#include <linux/device.h>
+#include <linux/err.h>
+#include <linux/types.h>
+
+struct apple_sart;
+
+#if IS_ENABLED(CONFIG_APPLE_SART)
+
+/*
+ * Get a reference to the SART attached to dev.
+ *
+ * Looks for the phandle reference in apple,sart and returns a pointer
+ * to the corresponding apple_sart struct to be used with
+ * apple_sart_add_allowed_region and apple_sart_remove_allowed_region.
+ */
+struct apple_sart *apple_sart_get(struct device *dev);
+
+/*
+ * Adds the region [paddr, paddr+size] to the DMA allow list.
+ *
+ * @sart: SART reference
+ * @paddr: Start address of the region to be used for DMA
+ * @size: Size of the region to be used for DMA.
+ */
+int apple_sart_add_allowed_region(struct apple_sart *sart, phys_addr_t paddr,
+				  size_t size);
+
+/*
+ * Removes the region [paddr, paddr+size] from the DMA allow list.
+ *
+ * Note that exact same paddr and size used for apple_sart_add_allowed_region
+ * have to be passed.
+ *
+ * @sart: SART reference
+ * @paddr: Start address of the region no longer used for DMA
+ * @size: Size of the region no longer used for DMA.
+ */
+int apple_sart_remove_allowed_region(struct apple_sart *sart, phys_addr_t paddr,
+				     size_t size);
+
+#else
+
+static inline struct apple_sart *apple_sart_get(struct device *dev)
+{
+	return ERR_PTR(-ENODEV);
+}
+
+static inline int apple_sart_add_allowed_region(struct apple_sart *sart,
+						phys_addr_t paddr, size_t size)
+{
+	return -ENODEV;
+}
+
+static inline int apple_sart_remove_allowed_region(struct apple_sart *sart,
+						   phys_addr_t paddr,
+						   size_t size)
+{
+	return -ENODEV;
+}
+
+#endif /* IS_ENABLED(CONFIG_APPLE_SART) */
+
+#endif /* _LINUX_SOC_APPLE_SART_H_ */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 5/9] soc: apple: Add RTKit IPC library
  2022-03-21 16:50 [PATCH 0/9] Apple M1 (Pro/Max) NVMe driver Sven Peter
                   ` (3 preceding siblings ...)
  2022-03-21 16:50 ` [PATCH 4/9] soc: apple: Add SART driver Sven Peter
@ 2022-03-21 16:50 ` Sven Peter
  2022-03-22 13:13   ` Arnd Bergmann
                     ` (2 more replies)
  2022-03-21 16:50 ` [PATCH 6/9] nvme-apple: Add initial Apple SoC NVMe driver Sven Peter
                   ` (4 subsequent siblings)
  9 siblings, 3 replies; 55+ messages in thread
From: Sven Peter @ 2022-03-21 16:50 UTC (permalink / raw)
  Cc: Sven Peter, Hector Martin, Alyssa Rosenzweig, Rob Herring,
	Arnd Bergmann, Keith Busch, Jens Axboe, Christoph Hellwig,
	Sagi Grimberg, Marc Zyngier, devicetree, linux-arm-kernel,
	linux-kernel, linux-nvme

Apple SoCs such as the M1 come with multiple embedded co-processors
running proprietary firmware. Communication with those is established
over a simple mailbox using the RTKit IPC protocol.

Signed-off-by: Sven Peter <sven@svenpeter.dev>
---
 drivers/soc/apple/Kconfig          |  13 +
 drivers/soc/apple/Makefile         |   3 +
 drivers/soc/apple/rtkit-crashlog.c | 147 +++++
 drivers/soc/apple/rtkit-internal.h |  76 +++
 drivers/soc/apple/rtkit.c          | 842 +++++++++++++++++++++++++++++
 include/linux/soc/apple/rtkit.h    | 203 +++++++
 6 files changed, 1284 insertions(+)
 create mode 100644 drivers/soc/apple/rtkit-crashlog.c
 create mode 100644 drivers/soc/apple/rtkit-internal.h
 create mode 100644 drivers/soc/apple/rtkit.c
 create mode 100644 include/linux/soc/apple/rtkit.h

diff --git a/drivers/soc/apple/Kconfig b/drivers/soc/apple/Kconfig
index 8c37ffd53fbd..feb56419ac3c 100644
--- a/drivers/soc/apple/Kconfig
+++ b/drivers/soc/apple/Kconfig
@@ -17,6 +17,19 @@ config APPLE_PMGR_PWRSTATE
 	  controls for SoC devices. This driver manages them through the
 	  generic power domain framework, and also provides reset support.
 
+config APPLE_RTKIT
+	tristate "Apple RTKit co-processor IPC protocol"
+	depends on MAILBOX
+	depends on ARCH_APPLE || (COMPILE_TEST && 64BIT)
+	default ARCH_APPLE
+	help
+	  Apple SoCs such as the M1 come with various co-processors running
+	  their proprietary RTKit operating system. This option enables support
+	  for the protocol library used to communicate with those. It is used
+	  by various client drivers.
+
+	  Say 'y' here if you have an Apple SoC.
+
 config APPLE_SART
 	tristate "Apple SART DMA address filter"
 	depends on ARCH_APPLE || COMPILE_TEST
diff --git a/drivers/soc/apple/Makefile b/drivers/soc/apple/Makefile
index c83c66317098..e293770cf66d 100644
--- a/drivers/soc/apple/Makefile
+++ b/drivers/soc/apple/Makefile
@@ -1,5 +1,8 @@
 # SPDX-License-Identifier: GPL-2.0-only
 obj-$(CONFIG_APPLE_PMGR_PWRSTATE)	+= apple-pmgr-pwrstate.o
 
+obj-$(CONFIG_APPLE_RTKIT) += apple-rtkit.o
+apple-rtkit-y = rtkit.o rtkit-crashlog.o
+
 obj-$(CONFIG_APPLE_SART) += apple-sart.o
 apple-sart-y = sart.o
diff --git a/drivers/soc/apple/rtkit-crashlog.c b/drivers/soc/apple/rtkit-crashlog.c
new file mode 100644
index 000000000000..4612c8997632
--- /dev/null
+++ b/drivers/soc/apple/rtkit-crashlog.c
@@ -0,0 +1,147 @@
+// SPDX-License-Identifier: GPL-2.0-only OR MIT
+/*
+ * Apple RTKit IPC library
+ * Copyright (C) The Asahi Linux Contributors
+ */
+#include "rtkit-internal.h"
+
+#define FOURCC(a, b, c, d)                                                     \
+	(((u32)(a) << 24) | ((u32)(b) << 16) | ((u32)(c) << 8) | ((u32)(d)))
+
+#define APPLE_RTKIT_CRASHLOG_HEADER FOURCC('C', 'L', 'H', 'E')
+#define APPLE_RTKIT_CRASHLOG_STR FOURCC('C', 's', 't', 'r')
+#define APPLE_RTKIT_CRASHLOG_VERSION FOURCC('C', 'v', 'e', 'r')
+#define APPLE_RTKIT_CRASHLOG_MBOX FOURCC('C', 'm', 'b', 'x')
+#define APPLE_RTKIT_CRASHLOG_TIME FOURCC('C', 't', 'i', 'm')
+
+struct apple_rtkit_crashlog_header {
+	u32 fourcc;
+	u32 version;
+	u32 size;
+	u32 flags;
+	u8 _unk[16];
+};
+static_assert(sizeof(struct apple_rtkit_crashlog_header) == 0x20);
+
+struct apple_rtkit_crashlog_mbox_entry {
+	u64 msg0;
+	u64 msg1;
+	u32 timestamp;
+	u8 _unk[4];
+};
+static_assert(sizeof(struct apple_rtkit_crashlog_mbox_entry) == 0x18);
+
+static void apple_rtkit_crashlog_dump_str(struct apple_rtkit *rtk, u8 *bfr,
+					  size_t size)
+{
+	u32 idx;
+	u8 *ptr, *end;
+
+	memcpy(&idx, bfr, 4);
+
+	ptr = bfr + 4;
+	end = bfr + size;
+	while (ptr < end) {
+		u8 *newline = memchr(ptr, '\n', end - ptr);
+
+		if (newline) {
+			u8 tmp = *newline;
+			*newline = '\0';
+			rtk_warn("Message (id=%x): %s\n", idx, ptr);
+			*newline = tmp;
+			ptr = newline + 1;
+		} else {
+			rtk_warn("Message (id=%x): %s", idx, ptr);
+			break;
+		}
+	}
+}
+
+static void apple_rtkit_crashlog_dump_version(struct apple_rtkit *rtk, u8 *bfr,
+					      size_t size)
+{
+	rtk_warn("Version: %s", bfr + 16);
+}
+
+static void apple_rtkit_crashlog_dump_time(struct apple_rtkit *rtk, u8 *bfr,
+					   size_t size)
+{
+	u64 crash_time;
+
+	memcpy(&crash_time, bfr, 8);
+	rtk_warn("Crash time: %lld", crash_time);
+}
+
+static void apple_rtkit_crashlog_dump_mailbox(struct apple_rtkit *rtk, u8 *bfr,
+					      size_t size)
+{
+	u32 type, index, i;
+	size_t n_messages;
+	struct apple_rtkit_crashlog_mbox_entry entry;
+
+	memcpy(&type, bfr + 16, 4);
+	memcpy(&index, bfr + 24, 4);
+	n_messages = (size - 28) / sizeof(entry);
+
+	rtk_warn("Mailbox history (type = %d, index = %d)", type, index);
+	for (i = 0; i < n_messages; ++i) {
+		memcpy(&entry, bfr + 28 + i * sizeof(entry), sizeof(entry));
+		rtk_warn(" #%03d@%08x: %016llx %016llx", i, entry.timestamp,
+			 entry.msg0, entry.msg1);
+	}
+}
+
+void apple_rtkit_crashlog_dump(struct apple_rtkit *rtk, u8 *bfr, size_t size)
+{
+	size_t offset;
+	u32 section_fourcc, section_size;
+	struct apple_rtkit_crashlog_header header;
+
+	memcpy(&header, bfr, sizeof(header));
+	if (header.fourcc != APPLE_RTKIT_CRASHLOG_HEADER) {
+		rtk_warn("Expected crashlog header but got %x", header.fourcc);
+		return;
+	}
+
+	if (header.size > size) {
+		rtk_warn("Crashlog size (%x) is too large", header.size);
+		return;
+	}
+
+	size = header.size;
+	offset = sizeof(header);
+
+	while (offset < size) {
+		memcpy(&section_fourcc, bfr + offset, 4);
+		memcpy(&section_size, bfr + offset + 12, 4);
+
+		switch (section_fourcc) {
+		case APPLE_RTKIT_CRASHLOG_HEADER:
+			rtk_dbg("End of crashlog reached");
+			return;
+		case APPLE_RTKIT_CRASHLOG_STR:
+			apple_rtkit_crashlog_dump_str(rtk, bfr + offset + 16,
+						      section_size);
+			break;
+		case APPLE_RTKIT_CRASHLOG_VERSION:
+			apple_rtkit_crashlog_dump_version(
+				rtk, bfr + offset + 16, section_size);
+			break;
+		case APPLE_RTKIT_CRASHLOG_MBOX:
+			apple_rtkit_crashlog_dump_mailbox(
+				rtk, bfr + offset + 16, section_size);
+			break;
+		case APPLE_RTKIT_CRASHLOG_TIME:
+			apple_rtkit_crashlog_dump_time(rtk, bfr + offset + 16,
+						       section_size);
+			break;
+		default:
+			rtk_warn("Unknown crashlog section: %x",
+				 section_fourcc);
+		}
+
+		offset += section_size;
+	}
+
+	rtk_warn("End of crashlog reached but no footer present");
+}
diff --git a/drivers/soc/apple/rtkit-internal.h b/drivers/soc/apple/rtkit-internal.h
new file mode 100644
index 000000000000..6ff8b2cd2532
--- /dev/null
+++ b/drivers/soc/apple/rtkit-internal.h
@@ -0,0 +1,76 @@
+/* SPDX-License-Identifier: GPL-2.0-only OR MIT */
+/*
+ * Apple RTKit IPC library
+ * Copyright (C) The Asahi Linux Contributors
+ */
+
+#ifndef _APPLE_RTKIT_INTERAL_H
+#define _APPLE_RTKIT_INTERAL_H
+
+#include <linux/apple-mailbox.h>
+#include <linux/bitfield.h>
+#include <linux/bitmap.h>
+#include <linux/completion.h>
+#include <linux/dma-mapping.h>
+#include <linux/io.h>
+#include <linux/kthread.h>
+#include <linux/kfifo.h>
+#include <linux/mailbox_client.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/soc/apple/rtkit.h>
+#include <linux/wait.h>
+
+#define rtk_err(format, arg...) dev_err(rtk->dev, "RTKit: " format, ##arg)
+#define rtk_warn(format, arg...) dev_warn(rtk->dev, "RTKit: " format, ##arg)
+#define rtk_info(format, arg...) dev_info(rtk->dev, "RTKit: " format, ##arg)
+#define rtk_dbg(format, arg...) dev_dbg(rtk->dev, "RTKit: " format, ##arg)
+
+
+#define APPLE_RTKIT_APP_ENDPOINT_START 0x20
+#define APPLE_RTKIT_MAX_ENDPOINTS 0x100
+
+struct apple_rtkit_work {
+	unsigned int type;
+	struct apple_mbox_msg msg;
+};
+
+struct apple_rtkit {
+	void *cookie;
+	const struct apple_rtkit_ops *ops;
+	struct device *dev;
+	struct mbox_client mbox_cl;
+	struct mbox_chan *mbox_chan;
+
+	struct completion epmap_completion;
+	struct completion reinit_completion;
+	struct completion iop_pwr_ack_completion;
+	struct completion ap_pwr_ack_completion;
+
+	int boot_result;
+	int version;
+
+	unsigned int iop_power_state;
+	unsigned int ap_power_state;
+	bool crashed;
+
+	struct task_struct *task;
+
+	struct wait_queue_head wq;
+	DECLARE_KFIFO(work_fifo, struct apple_rtkit_work, 64);
+	spinlock_t work_lock;
+
+	DECLARE_BITMAP(endpoints, APPLE_RTKIT_MAX_ENDPOINTS);
+
+	struct apple_rtkit_shmem ioreport_buffer;
+	struct apple_rtkit_shmem crashlog_buffer;
+
+	struct apple_rtkit_shmem syslog_buffer;
+	char *syslog_msg_buffer;
+	size_t syslog_n_entries;
+	size_t syslog_msg_size;
+};
+
+void apple_rtkit_crashlog_dump(struct apple_rtkit *rtk, u8 *bfr, size_t size);
+
+#endif
diff --git a/drivers/soc/apple/rtkit.c b/drivers/soc/apple/rtkit.c
new file mode 100644
index 000000000000..7a93c6a99ae9
--- /dev/null
+++ b/drivers/soc/apple/rtkit.c
@@ -0,0 +1,842 @@
+// SPDX-License-Identifier: GPL-2.0-only OR MIT
+/*
+ * Apple RTKit IPC library
+ * Copyright (C) The Asahi Linux Contributors
+ */
+
+//#define DEBUG
+
+#include "rtkit-internal.h"
+
+enum { APPLE_RTKIT_WORK_MSG,
+	APPLE_RTKIT_WORK_REINIT,
+};
+
+enum { APPLE_RTKIT_PWR_STATE_OFF = 0x00,
+	APPLE_RTKIT_PWR_STATE_SLEEP = 0x01,
+	APPLE_RTKIT_PWR_STATE_GATED = 0x02,
+	APPLE_RTKIT_PWR_STATE_QUIESCED = 0x10,
+	APPLE_RTKIT_PWR_STATE_ON = 0x20,
+};
+
+enum { APPLE_RTKIT_EP_MGMT = 0,
+	APPLE_RTKIT_EP_CRASHLOG = 1,
+	APPLE_RTKIT_EP_SYSLOG = 2,
+	APPLE_RTKIT_EP_DEBUG = 3,
+	APPLE_RTKIT_EP_IOREPORT = 4,
+	APPLE_RTKIT_EP_OSLOG = 8,
+};
+
+#define APPLE_RTKIT_MGMT_TYPE GENMASK(59, 52)
+
+enum { APPLE_RTKIT_MGMT_HELLO = 1,
+	APPLE_RTKIT_MGMT_HELLO_REPLY = 2,
+	APPLE_RTKIT_MGMT_STARTEP = 5,
+	APPLE_RTKIT_MGMT_SET_IOP_PWR_STATE = 6,
+	APPLE_RTKIT_MGMT_SET_IOP_PWR_STATE_ACK = 7,
+	APPLE_RTKIT_MGMT_EPMAP = 8,
+	APPLE_RTKIT_MGMT_EPMAP_REPLY = 8,
+	APPLE_RTKIT_MGMT_SET_AP_PWR_STATE = 0xb,
+	APPLE_RTKIT_MGMT_SET_AP_PWR_STATE_ACK = 0xb,
+};
+
+#define APPLE_RTKIT_MGMT_HELLO_MINVER GENMASK(15, 0)
+#define APPLE_RTKIT_MGMT_HELLO_MAXVER GENMASK(31, 16)
+
+#define APPLE_RTKIT_MGMT_EPMAP_LAST   BIT(51)
+#define APPLE_RTKIT_MGMT_EPMAP_BASE   GENMASK(34, 32)
+#define APPLE_RTKIT_MGMT_EPMAP_BITMAP GENMASK(31, 0)
+
+#define APPLE_RTKIT_MGMT_EPMAP_REPLY_MORE BIT(0)
+
+#define APPLE_RTKIT_MGMT_STARTEP_EP   GENMASK(39, 32)
+#define APPLE_RTKIT_MGMT_STARTEP_FLAG BIT(1)
+
+#define APPLE_RTKIT_MGMT_PWR_STATE GENMASK(15, 0)
+
+#define APPLE_RTKIT_CRASHLOG_CRASH 1
+
+#define APPLE_RTKIT_BUFFER_REQUEST	1
+#define APPLE_RTKIT_BUFFER_REQUEST_SIZE GENMASK(51, 44)
+#define APPLE_RTKIT_BUFFER_REQUEST_IOVA GENMASK(41, 0)
+
+#define APPLE_RTKIT_SYSLOG_TYPE GENMASK(59, 52)
+
+#define APPLE_RTKIT_SYSLOG_LOG 5
+
+#define APPLE_RTKIT_SYSLOG_INIT	     8
+#define APPLE_RTKIT_SYSLOG_N_ENTRIES GENMASK(7, 0)
+#define APPLE_RTKIT_SYSLOG_MSG_SIZE  GENMASK(31, 24)
+
+#define APPLE_RTKIT_OSLOG_TYPE GENMASK(63, 56)
+#define APPLE_RTKIT_OSLOG_INIT	1
+#define APPLE_RTKIT_OSLOG_ACK	3
+
+#define APPLE_RTKIT_MIN_SUPPORTED_VERSION 11
+#define APPLE_RTKIT_MAX_SUPPORTED_VERSION 12
+
+bool apple_rtkit_is_running(struct apple_rtkit *rtk)
+{
+	if (rtk->crashed)
+		return false;
+	if ((rtk->iop_power_state & 0xff) != APPLE_RTKIT_PWR_STATE_ON)
+		return false;
+	if ((rtk->ap_power_state & 0xff) != APPLE_RTKIT_PWR_STATE_ON)
+		return false;
+	return true;
+}
+EXPORT_SYMBOL_GPL(apple_rtkit_is_running);
+
+bool apple_rtkit_is_crashed(struct apple_rtkit *rtk)
+{
+	return rtk->crashed;
+}
+EXPORT_SYMBOL_GPL(apple_rtkit_is_crashed);
+
+static void apple_rtkit_management_send(struct apple_rtkit *rtk, u8 type,
+					u64 msg)
+{
+	msg &= ~APPLE_RTKIT_MGMT_TYPE;
+	msg |= FIELD_PREP(APPLE_RTKIT_MGMT_TYPE, type);
+	apple_rtkit_send_message(rtk, APPLE_RTKIT_EP_MGMT, msg);
+}
+
+static void apple_rtkit_management_rx_hello(struct apple_rtkit *rtk, u64 msg)
+{
+	u64 reply;
+
+	int min_ver = FIELD_GET(APPLE_RTKIT_MGMT_HELLO_MINVER, msg);
+	int max_ver = FIELD_GET(APPLE_RTKIT_MGMT_HELLO_MAXVER, msg);
+	int want_ver = min(APPLE_RTKIT_MAX_SUPPORTED_VERSION, max_ver);
+
+	rtk_dbg("Min ver %d, max ver %d\n", min_ver, max_ver);
+
+	if (min_ver > APPLE_RTKIT_MAX_SUPPORTED_VERSION) {
+		rtk_err("Firmware min version %d is too new\n", min_ver);
+		goto abort_boot;
+	}
+
+	if (max_ver < APPLE_RTKIT_MIN_SUPPORTED_VERSION) {
+		rtk_err("Firmware max version %d is too old\n", max_ver);
+		goto abort_boot;
+	}
+
+	rtk_info("Initializing (protocol version %d)\n", want_ver);
+	rtk->version = want_ver;
+
+	reply = FIELD_PREP(APPLE_RTKIT_MGMT_HELLO_MINVER, want_ver);
+	reply |= FIELD_PREP(APPLE_RTKIT_MGMT_HELLO_MAXVER, want_ver);
+	apple_rtkit_management_send(rtk, APPLE_RTKIT_MGMT_HELLO_REPLY, reply);
+
+	return;
+
+abort_boot:
+	rtk->boot_result = -EINVAL;
+	complete_all(&rtk->epmap_completion);
+}
+
+static void apple_rtkit_management_rx_epmap(struct apple_rtkit *rtk, u64 msg)
+{
+	int i, ep;
+	u64 reply;
+	unsigned long bitmap = FIELD_GET(APPLE_RTKIT_MGMT_EPMAP_BITMAP, msg);
+	u32 base = FIELD_GET(APPLE_RTKIT_MGMT_EPMAP_BASE, msg);
+
+	rtk_dbg("received endpoint bitmap 0x%lx with base 0x%x\n", bitmap,
+		base);
+
+	for_each_set_bit(i, &bitmap, 32) {
+		ep = 32 * base + i;
+		rtk_dbg("Discovered endpoint 0x%02x\n", ep);
+		set_bit(ep, rtk->endpoints);
+	}
+
+	reply = FIELD_PREP(APPLE_RTKIT_MGMT_EPMAP_BASE, base);
+	if (msg & APPLE_RTKIT_MGMT_EPMAP_LAST)
+		reply |= APPLE_RTKIT_MGMT_EPMAP_LAST;
+	else
+		reply |= APPLE_RTKIT_MGMT_EPMAP_REPLY_MORE;
+
+	apple_rtkit_management_send(rtk, APPLE_RTKIT_MGMT_EPMAP_REPLY, reply);
+
+	if (!(msg & APPLE_RTKIT_MGMT_EPMAP_LAST))
+		return;
+
+	for_each_set_bit(ep, rtk->endpoints, APPLE_RTKIT_APP_ENDPOINT_START) {
+		switch (ep) {
+		/* the management endpoint is started by default */
+		case APPLE_RTKIT_EP_MGMT:
+			break;
+
+		/* without starting these RTKit refuses to boot */
+		case APPLE_RTKIT_EP_SYSLOG:
+		case APPLE_RTKIT_EP_CRASHLOG:
+		case APPLE_RTKIT_EP_DEBUG:
+		case APPLE_RTKIT_EP_IOREPORT:
+		case APPLE_RTKIT_EP_OSLOG:
+			rtk_dbg("Starting system endpoint 0x%02x\n", ep);
+			apple_rtkit_start_ep(rtk, ep);
+			break;
+
+		default:
+			rtk_warn("Unknown system endpoint: 0x%02x\n", ep);
+		}
+	}
+
+	complete_all(&rtk->epmap_completion);
+}
+
+static void apple_rtkit_management_rx_iop_pwr_ack(struct apple_rtkit *rtk,
+						  u64 msg)
+{
+	unsigned int new_state = FIELD_GET(APPLE_RTKIT_MGMT_PWR_STATE, msg);
+
+	rtk_dbg("IOP power state transition: 0x%x -> 0x%x\n",
+		rtk->iop_power_state, new_state);
+	rtk->iop_power_state = new_state;
+
+	complete_all(&rtk->iop_pwr_ack_completion);
+}
+
+static void apple_rtkit_management_rx_ap_pwr_ack(struct apple_rtkit *rtk,
+						 u64 msg)
+{
+	unsigned int new_state = FIELD_GET(APPLE_RTKIT_MGMT_PWR_STATE, msg);
+
+	rtk_dbg("AP power state transition: 0x%x -> 0x%x\n",
+		rtk->ap_power_state, new_state);
+	rtk->ap_power_state = new_state;
+
+	complete_all(&rtk->ap_pwr_ack_completion);
+}
+
+static void apple_rtkit_management_rx(struct apple_rtkit *rtk, u64 msg)
+{
+	u8 type = FIELD_GET(APPLE_RTKIT_MGMT_TYPE, msg);
+
+	switch (type) {
+	case APPLE_RTKIT_MGMT_HELLO:
+		apple_rtkit_management_rx_hello(rtk, msg);
+		break;
+	case APPLE_RTKIT_MGMT_EPMAP:
+		apple_rtkit_management_rx_epmap(rtk, msg);
+		break;
+	case APPLE_RTKIT_MGMT_SET_IOP_PWR_STATE_ACK:
+		apple_rtkit_management_rx_iop_pwr_ack(rtk, msg);
+		break;
+	case APPLE_RTKIT_MGMT_SET_AP_PWR_STATE_ACK:
+		apple_rtkit_management_rx_ap_pwr_ack(rtk, msg);
+		break;
+	default:
+		rtk_warn("unknown management message: 0x%llx (type: 0x%02x)\n",
+			 msg, type);
+	}
+}
+
+static int apple_rtkit_common_rx_get_buffer(struct apple_rtkit *rtk,
+					    struct apple_rtkit_shmem *buffer,
+					    u8 ep, u64 msg)
+{
+	size_t n_4kpages = FIELD_GET(APPLE_RTKIT_BUFFER_REQUEST_SIZE, msg);
+	size_t size = n_4kpages << 12;
+	dma_addr_t iova = FIELD_GET(APPLE_RTKIT_BUFFER_REQUEST_IOVA, msg);
+	u64 reply;
+	int err;
+
+	rtk_dbg("buffer request for 0x%zx bytes at %pad\n", size, &iova);
+
+	if (iova && (!rtk->ops->shmem_setup || !rtk->ops->shmem_destroy))
+		return -EINVAL;
+
+	if (rtk->ops->shmem_setup) {
+		err = rtk->ops->shmem_setup(rtk->cookie, buffer, iova, size);
+		if (err < 0)
+			return err;
+	} else {
+		buffer->buffer =
+			dma_alloc_coherent(rtk->dev, size, &iova, GFP_KERNEL);
+		if (!buffer->buffer)
+			return -ENOMEM;
+
+		buffer->size = size;
+		buffer->iova = iova;
+	}
+
+	if (!buffer->is_mapped) {
+		reply = FIELD_PREP(APPLE_RTKIT_SYSLOG_TYPE,
+				   APPLE_RTKIT_BUFFER_REQUEST);
+		reply |= FIELD_PREP(APPLE_RTKIT_BUFFER_REQUEST_SIZE, n_4kpages);
+		reply |= FIELD_PREP(APPLE_RTKIT_BUFFER_REQUEST_IOVA,
+				    buffer->iova);
+		apple_rtkit_send_message(rtk, ep, reply);
+	}
+
+	return 0;
+}
+
+static void apple_rtkit_free_buffer(struct apple_rtkit *rtk,
+				    struct apple_rtkit_shmem *bfr)
+{
+	if (bfr->size == 0)
+		return;
+
+	if (rtk->ops->shmem_destroy)
+		rtk->ops->shmem_destroy(rtk->cookie, bfr);
+	else if (bfr->buffer)
+		dma_free_coherent(rtk->dev, bfr->size, bfr->buffer, bfr->iova);
+
+	bfr->buffer = NULL;
+	bfr->iomem = NULL;
+	bfr->iova = 0;
+	bfr->size = 0;
+	bfr->is_mapped = false;
+}
+
+static void apple_rtkit_memcpy(struct apple_rtkit *rtk, void *dst,
+			       struct apple_rtkit_shmem *bfr, size_t offset,
+			       size_t len)
+{
+	if (bfr->iomem)
+		memcpy_fromio(dst, bfr->iomem + offset, len);
+	else
+		memcpy(dst, bfr->buffer + offset, len);
+}
+
+static void apple_rtkit_crashlog_rx(struct apple_rtkit *rtk, u64 msg)
+{
+	u8 type = FIELD_GET(APPLE_RTKIT_SYSLOG_TYPE, msg);
+	u8 *bfr;
+
+	if (type != APPLE_RTKIT_CRASHLOG_CRASH) {
+		rtk_warn("Unknown crashlog message: %llx\n", msg);
+		return;
+	}
+
+	if (!rtk->crashlog_buffer.size) {
+		apple_rtkit_common_rx_get_buffer(rtk, &rtk->crashlog_buffer,
+						 APPLE_RTKIT_EP_CRASHLOG, msg);
+		return;
+	}
+
+	rtk_err("co-processor has crashed.\n");
+
+	/*
+	 * create a shadow copy here to make sure the co-processor isn't able
+	 * to change the log while we're dumping it. this also ensures
+	 * the buffer is in normal memory and not iomem for e.g. the SMC
+	 */
+	bfr = kzalloc(rtk->crashlog_buffer.size, GFP_KERNEL);
+	if (bfr) {
+		apple_rtkit_memcpy(rtk, bfr, &rtk->crashlog_buffer, 0,
+				   rtk->crashlog_buffer.size);
+		apple_rtkit_crashlog_dump(rtk, bfr, rtk->crashlog_buffer.size);
+		kfree(bfr);
+	} else {
+		rtk_err("Couldn't allocate crashlog shadow buffer.");
+	}
+
+	rtk->crashed = true;
+	if (rtk->ops->crashed)
+		rtk->ops->crashed(rtk->cookie);
+}
+
+static void apple_rtkit_ioreport_rx(struct apple_rtkit *rtk, u64 msg)
+{
+	u8 type = FIELD_GET(APPLE_RTKIT_SYSLOG_TYPE, msg);
+
+	switch (type) {
+	case APPLE_RTKIT_BUFFER_REQUEST:
+		apple_rtkit_common_rx_get_buffer(rtk, &rtk->ioreport_buffer,
+						 APPLE_RTKIT_EP_IOREPORT, msg);
+		break;
+	/* unknown, must be ACKed or the co-processor will hang */
+	case 0x8:
+	case 0xc:
+		apple_rtkit_send_message(rtk, APPLE_RTKIT_EP_IOREPORT, msg);
+		break;
+	default:
+		rtk_warn("Unknown ioreport message: %llx\n", msg);
+	}
+}
+
+static void apple_rtkit_syslog_rx_init(struct apple_rtkit *rtk, u64 msg)
+{
+	rtk->syslog_n_entries = FIELD_GET(APPLE_RTKIT_SYSLOG_N_ENTRIES, msg);
+	rtk->syslog_msg_size = FIELD_GET(APPLE_RTKIT_SYSLOG_MSG_SIZE, msg);
+
+	rtk->syslog_msg_buffer = kzalloc(rtk->syslog_msg_size, GFP_KERNEL);
+
+	rtk_dbg("syslog initialized: entries: %zd, msg_size: %zd\n",
+		rtk->syslog_n_entries, rtk->syslog_msg_size);
+}
+
+static void apple_rtkit_syslog_rx_log(struct apple_rtkit *rtk, u64 msg)
+{
+	u8 idx = msg & 0xff;
+	char log_context[24];
+	size_t entry_size = 0x20 + rtk->syslog_msg_size;
+
+	if (!rtk->syslog_buffer.size) {
+		rtk_warn(
+			"received syslog message but syslog_buffer.size is zero");
+		goto done;
+	}
+	if (!rtk->syslog_buffer.buffer && !rtk->syslog_buffer.iomem) {
+		rtk_warn("received syslog message but no syslog_buffer.buffer or syslog_buffer.iomem");
+		goto done;
+	}
+	if (idx > rtk->syslog_n_entries) {
+		rtk_warn("syslog index %d out of range", idx);
+		goto done;
+	}
+
+	apple_rtkit_memcpy(rtk, log_context, &rtk->syslog_buffer,
+			   idx * entry_size + 8, sizeof(log_context));
+	apple_rtkit_memcpy(rtk, rtk->syslog_msg_buffer, &rtk->syslog_buffer,
+			   idx * entry_size + 8 + sizeof(log_context),
+			   rtk->syslog_msg_size);
+
+	log_context[sizeof(log_context) - 1] = 0;
+	rtk->syslog_msg_buffer[rtk->syslog_msg_size - 1] = 0;
+	rtk_info("syslog message: %s: %s", log_context, rtk->syslog_msg_buffer);
+
+done:
+	apple_rtkit_send_message(rtk, APPLE_RTKIT_EP_SYSLOG, msg);
+}
+
+static void apple_rtkit_syslog_rx(struct apple_rtkit *rtk, u64 msg)
+{
+	u8 type = FIELD_GET(APPLE_RTKIT_SYSLOG_TYPE, msg);
+
+	switch (type) {
+	case APPLE_RTKIT_BUFFER_REQUEST:
+		apple_rtkit_common_rx_get_buffer(rtk, &rtk->syslog_buffer,
+						 APPLE_RTKIT_EP_SYSLOG, msg);
+		break;
+	case APPLE_RTKIT_SYSLOG_INIT:
+		apple_rtkit_syslog_rx_init(rtk, msg);
+		break;
+	case APPLE_RTKIT_SYSLOG_LOG:
+		apple_rtkit_syslog_rx_log(rtk, msg);
+		break;
+	default:
+		rtk_warn("Unknown syslog message: %llx\n", msg);
+	}
+}
+
+static void apple_rtkit_oslog_rx_init(struct apple_rtkit *rtk, u64 msg)
+{
+	u64 ack;
+
+	rtk_dbg("oslog init: msg: 0x%llx\n", msg);
+	ack = FIELD_PREP(APPLE_RTKIT_OSLOG_TYPE, APPLE_RTKIT_OSLOG_ACK);
+	apple_rtkit_send_message(rtk, APPLE_RTKIT_EP_OSLOG, ack);
+}
+
+static void apple_rtkit_oslog_rx(struct apple_rtkit *rtk, u64 msg)
+{
+	u8 type = FIELD_GET(APPLE_RTKIT_OSLOG_TYPE, msg);
+
+	switch (type) {
+	case APPLE_RTKIT_OSLOG_INIT:
+		apple_rtkit_oslog_rx_init(rtk, msg);
+		break;
+	default:
+		rtk_warn("Unknown oslog message: %llx\n", msg);
+	}
+}
+
+static void apple_rtkit_rx(struct apple_rtkit *rtk, struct apple_mbox_msg *msg)
+{
+	u8 ep = msg->msg1;
+
+	if (!test_bit(ep, rtk->endpoints))
+		rtk_warn("Message to undiscovered endpoint 0x%02x", ep);
+
+	switch (ep) {
+	case APPLE_RTKIT_EP_MGMT:
+		apple_rtkit_management_rx(rtk, msg->msg0);
+		break;
+	case APPLE_RTKIT_EP_CRASHLOG:
+		apple_rtkit_crashlog_rx(rtk, msg->msg0);
+		break;
+	case APPLE_RTKIT_EP_SYSLOG:
+		apple_rtkit_syslog_rx(rtk, msg->msg0);
+		break;
+	case APPLE_RTKIT_EP_IOREPORT:
+		apple_rtkit_ioreport_rx(rtk, msg->msg0);
+		break;
+	case APPLE_RTKIT_EP_OSLOG:
+		apple_rtkit_oslog_rx(rtk, msg->msg0);
+		break;
+	case APPLE_RTKIT_APP_ENDPOINT_START ... 0xff:
+		rtk->ops->recv_message(rtk->cookie, ep, msg->msg0);
+		break;
+	default:
+		rtk_warn("message to unknown endpoint %02x: %llx\n", ep,
+			 msg->msg0);
+	}
+}
+
+static void apple_rtkit_do_reinit(struct apple_rtkit *rtk)
+{
+	apple_rtkit_free_buffer(rtk, &rtk->ioreport_buffer);
+	apple_rtkit_free_buffer(rtk, &rtk->crashlog_buffer);
+	apple_rtkit_free_buffer(rtk, &rtk->syslog_buffer);
+
+	kfree(rtk->syslog_msg_buffer);
+
+	rtk->syslog_msg_buffer = NULL;
+	rtk->syslog_n_entries = 0;
+	rtk->syslog_msg_size = 0;
+
+	bitmap_zero(rtk->endpoints, APPLE_RTKIT_MAX_ENDPOINTS);
+	set_bit(APPLE_RTKIT_EP_MGMT, rtk->endpoints);
+
+	reinit_completion(&rtk->epmap_completion);
+	reinit_completion(&rtk->iop_pwr_ack_completion);
+	reinit_completion(&rtk->ap_pwr_ack_completion);
+
+	rtk->crashed = false;
+	rtk->iop_power_state = APPLE_RTKIT_PWR_STATE_OFF;
+	rtk->ap_power_state = APPLE_RTKIT_PWR_STATE_OFF;
+
+	complete_all(&rtk->reinit_completion);
+}
+
+static int apple_rtkit_worker(void *data)
+{
+	struct apple_rtkit *rtk = data;
+	struct apple_rtkit_work work;
+
+	while (!kthread_should_stop()) {
+		wait_event_interruptible(rtk->wq,
+					 kfifo_len(&rtk->work_fifo) > 0 ||
+						 kthread_should_stop());
+
+		if (kthread_should_stop())
+			break;
+
+		while (kfifo_out_spinlocked(&rtk->work_fifo, &work, 1,
+					    &rtk->work_lock) == 1) {
+			switch (work.type) {
+			case APPLE_RTKIT_WORK_MSG:
+				apple_rtkit_rx(rtk, &work.msg);
+				break;
+			case APPLE_RTKIT_WORK_REINIT:
+				apple_rtkit_do_reinit(rtk);
+				break;
+			}
+		}
+	}
+
+	return 0;
+}
+
+static void apple_rtkit_rx_callback(struct mbox_client *cl, void *mssg)
+{
+	struct apple_rtkit *rtk = container_of(cl, struct apple_rtkit, mbox_cl);
+	struct apple_mbox_msg *msg = mssg;
+	struct apple_rtkit_work work;
+
+	dma_rmb();
+
+	memcpy(&work.msg, msg, sizeof(*msg));
+	work.type = APPLE_RTKIT_WORK_MSG;
+
+	kfifo_in_spinlocked(&rtk->work_fifo, &work, 1, &rtk->work_lock);
+	wake_up(&rtk->wq);
+}
+
+int apple_rtkit_send_message(struct apple_rtkit *rtk, u8 ep, u64 message)
+{
+	struct apple_mbox_msg msg;
+
+	if (rtk->crashed)
+		return -EINVAL;
+	if (ep >= APPLE_RTKIT_APP_ENDPOINT_START &&
+	    !apple_rtkit_is_running(rtk))
+		return -EINVAL;
+
+	msg.msg0 = (u64)message;
+	msg.msg1 = ep;
+	dma_wmb();
+
+	return mbox_send_message(rtk->mbox_chan, &msg);
+}
+EXPORT_SYMBOL_GPL(apple_rtkit_send_message);
+
+int apple_rtkit_start_ep(struct apple_rtkit *rtk, u8 endpoint)
+{
+	u64 msg;
+
+	if (!test_bit(endpoint, rtk->endpoints))
+		return -EINVAL;
+	if (endpoint >= APPLE_RTKIT_APP_ENDPOINT_START &&
+	    !apple_rtkit_is_running(rtk))
+		return -EINVAL;
+
+	msg = FIELD_PREP(APPLE_RTKIT_MGMT_STARTEP_EP, endpoint);
+	msg |= APPLE_RTKIT_MGMT_STARTEP_FLAG;
+	apple_rtkit_management_send(rtk, APPLE_RTKIT_MGMT_STARTEP, msg);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(apple_rtkit_start_ep);
+
+static int apple_rtkit_start_worker(struct apple_rtkit *rtk)
+{
+	rtk->task = kthread_run(apple_rtkit_worker, rtk, "%s-rtkit-worker",
+				dev_name(rtk->dev));
+	if (IS_ERR(rtk->task))
+		return PTR_ERR(rtk->task);
+	return 0;
+}
+
+struct apple_rtkit *apple_rtkit_init(struct device *dev, void *cookie,
+				     const char *mbox_name, int mbox_idx,
+				     const struct apple_rtkit_ops *ops)
+{
+	struct apple_rtkit *rtk;
+	int ret;
+
+	if (!ops)
+		return ERR_PTR(-EINVAL);
+
+	rtk = kzalloc(sizeof(*rtk), GFP_KERNEL);
+	if (!rtk)
+		return ERR_PTR(-ENOMEM);
+
+	rtk->dev = dev;
+	rtk->cookie = cookie;
+	rtk->ops = ops;
+
+	INIT_KFIFO(rtk->work_fifo);
+	spin_lock_init(&rtk->work_lock);
+	init_waitqueue_head(&rtk->wq);
+	init_completion(&rtk->epmap_completion);
+	init_completion(&rtk->reinit_completion);
+	init_completion(&rtk->iop_pwr_ack_completion);
+	init_completion(&rtk->ap_pwr_ack_completion);
+
+	bitmap_zero(rtk->endpoints, APPLE_RTKIT_MAX_ENDPOINTS);
+	set_bit(APPLE_RTKIT_EP_MGMT, rtk->endpoints);
+
+	ret = apple_rtkit_start_worker(rtk);
+	if (ret)
+		return ERR_PTR(ret);
+
+	rtk->mbox_cl.dev = dev;
+	rtk->mbox_cl.tx_block = true;
+	rtk->mbox_cl.knows_txdone = false;
+	rtk->mbox_cl.rx_callback = &apple_rtkit_rx_callback;
+
+	if (mbox_name)
+		rtk->mbox_chan =
+			mbox_request_channel_byname(&rtk->mbox_cl, mbox_name);
+	else
+		rtk->mbox_chan = mbox_request_channel(&rtk->mbox_cl, mbox_idx);
+
+	if (IS_ERR(rtk->mbox_chan))
+		return (struct apple_rtkit *)rtk->mbox_chan;
+
+	return rtk;
+}
+EXPORT_SYMBOL_GPL(apple_rtkit_init);
+
+static int apple_rtkit_wait_for_completion(struct completion *c)
+{
+	long t;
+
+	t = wait_for_completion_interruptible_timeout(c,
+						      msecs_to_jiffies(1000));
+	if (t == -ERESTARTSYS)
+		return t;
+	else if (t == 0)
+		return -ETIME;
+	else
+		return 0;
+}
+
+int apple_rtkit_reinit(struct apple_rtkit *rtk)
+{
+	struct apple_rtkit_work work;
+
+	reinit_completion(&rtk->reinit_completion);
+
+	work.type = APPLE_RTKIT_WORK_REINIT;
+	kfifo_in_spinlocked(&rtk->work_fifo, &work, 1, &rtk->work_lock);
+	wake_up(&rtk->wq);
+
+	return apple_rtkit_wait_for_completion(&rtk->reinit_completion);
+}
+EXPORT_SYMBOL_GPL(apple_rtkit_reinit);
+
+static int apple_rtkit_set_ap_power_state(struct apple_rtkit *rtk,
+					  unsigned int state)
+{
+	u64 msg;
+	int ret;
+
+	reinit_completion(&rtk->ap_pwr_ack_completion);
+
+	msg = FIELD_PREP(APPLE_RTKIT_MGMT_PWR_STATE, state);
+	apple_rtkit_management_send(rtk, APPLE_RTKIT_MGMT_SET_AP_PWR_STATE,
+				    msg);
+
+	ret = apple_rtkit_wait_for_completion(&rtk->ap_pwr_ack_completion);
+	if (ret)
+		return ret;
+
+	if (rtk->ap_power_state != state)
+		return -EINVAL;
+	return 0;
+}
+
+static int apple_rtkit_set_iop_power_state(struct apple_rtkit *rtk,
+					   unsigned int state)
+{
+	u64 msg;
+	int ret;
+
+	reinit_completion(&rtk->iop_pwr_ack_completion);
+
+	msg = FIELD_PREP(APPLE_RTKIT_MGMT_PWR_STATE, state);
+	apple_rtkit_management_send(rtk, APPLE_RTKIT_MGMT_SET_IOP_PWR_STATE,
+				    msg);
+
+	ret = apple_rtkit_wait_for_completion(&rtk->iop_pwr_ack_completion);
+	if (ret)
+		return ret;
+
+	if (rtk->iop_power_state != state)
+		return -EINVAL;
+	return 0;
+}
+
+int apple_rtkit_boot(struct apple_rtkit *rtk)
+{
+	int ret;
+
+	if (apple_rtkit_is_running(rtk))
+		return 0;
+	if (rtk->crashed)
+		return -EINVAL;
+
+	rtk_dbg("waiting for boot to finish\n");
+	ret = apple_rtkit_wait_for_completion(&rtk->epmap_completion);
+	if (ret)
+		return ret;
+	if (rtk->boot_result)
+		return rtk->boot_result;
+
+	rtk_dbg("waiting for IOP power state ACK\n");
+	ret = apple_rtkit_wait_for_completion(&rtk->iop_pwr_ack_completion);
+	if (ret)
+		return ret;
+
+	return apple_rtkit_set_ap_power_state(rtk, APPLE_RTKIT_PWR_STATE_ON);
+}
+EXPORT_SYMBOL_GPL(apple_rtkit_boot);
+
+int apple_rtkit_shutdown(struct apple_rtkit *rtk)
+{
+	int ret;
+
+	/* if OFF is used here the co-processor will not wake up again */
+	ret = apple_rtkit_set_ap_power_state(rtk,
+					     APPLE_RTKIT_PWR_STATE_QUIESCED);
+	if (ret)
+		return ret;
+
+	ret = apple_rtkit_set_iop_power_state(rtk, APPLE_RTKIT_PWR_STATE_SLEEP);
+	if (ret)
+		return ret;
+
+	return apple_rtkit_reinit(rtk);
+}
+EXPORT_SYMBOL_GPL(apple_rtkit_shutdown);
+
+int apple_rtkit_hibernate(struct apple_rtkit *rtk)
+{
+	int ret;
+
+	ret = apple_rtkit_set_ap_power_state(rtk,
+					     APPLE_RTKIT_PWR_STATE_QUIESCED);
+	if (ret)
+		return ret;
+
+	ret = apple_rtkit_set_iop_power_state(rtk,
+					      APPLE_RTKIT_PWR_STATE_QUIESCED);
+	if (ret)
+		return ret;
+
+	ret = apple_rtkit_reinit(rtk);
+	if (ret)
+		return ret;
+
+	// TODO: apple_rtkit_reinit resets these so we have to restore them here :/
+	rtk->iop_power_state = APPLE_RTKIT_PWR_STATE_QUIESCED;
+	rtk->ap_power_state = APPLE_RTKIT_PWR_STATE_QUIESCED;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(apple_rtkit_hibernate);
+
+int apple_rtkit_wake(struct apple_rtkit *rtk)
+{
+	u64 msg;
+
+	if (apple_rtkit_is_running(rtk))
+		return -EINVAL;
+
+	reinit_completion(&rtk->iop_pwr_ack_completion);
+
+	/*
+	 * Use open-coded apple_rtkit_set_iop_power_state since apple_rtkit_boot
+	 * will wait for the completion anyway.
+	 */
+	msg = FIELD_PREP(APPLE_RTKIT_MGMT_PWR_STATE, APPLE_RTKIT_PWR_STATE_ON);
+	apple_rtkit_management_send(rtk, APPLE_RTKIT_MGMT_SET_IOP_PWR_STATE,
+				    msg);
+
+	return apple_rtkit_boot(rtk);
+}
+EXPORT_SYMBOL_GPL(apple_rtkit_wake);
+
+void apple_rtkit_free(struct apple_rtkit *rtk)
+{
+	kthread_stop(rtk->task);
+	mbox_free_channel(rtk->mbox_chan);
+
+	apple_rtkit_free_buffer(rtk, &rtk->ioreport_buffer);
+	apple_rtkit_free_buffer(rtk, &rtk->crashlog_buffer);
+	apple_rtkit_free_buffer(rtk, &rtk->syslog_buffer);
+
+	kfree(rtk->syslog_msg_buffer);
+	kfree(rtk);
+}
+EXPORT_SYMBOL_GPL(apple_rtkit_free);
+
+struct apple_rtkit *devm_apple_rtkit_init(struct device *dev, void *cookie,
+					  const char *mbox_name, int mbox_idx,
+					  const struct apple_rtkit_ops *ops)
+{
+	struct apple_rtkit *rtk;
+	int ret;
+
+	rtk = apple_rtkit_init(dev, cookie, mbox_name, mbox_idx, ops);
+	if (IS_ERR(rtk))
+		return rtk;
+
+	ret = devm_add_action_or_reset(dev, (void (*)(void *))apple_rtkit_free,
+				       rtk);
+	if (ret)
+		return ERR_PTR(ret);
+
+	return rtk;
+}
+EXPORT_SYMBOL_GPL(devm_apple_rtkit_init);
+
+MODULE_LICENSE("Dual MIT/GPL");
+MODULE_AUTHOR("Sven Peter <sven@svenpeter.dev>");
+MODULE_DESCRIPTION("Apple RTKit driver");
diff --git a/include/linux/soc/apple/rtkit.h b/include/linux/soc/apple/rtkit.h
new file mode 100644
index 000000000000..a1beb514fff6
--- /dev/null
+++ b/include/linux/soc/apple/rtkit.h
@@ -0,0 +1,203 @@
+/* SPDX-License-Identifier: GPL-2.0-only OR MIT */
+/*
+ * Apple RTKit IPC Library
+ * Copyright (C) The Asahi Linux Contributors
+ *
+ * Apple's SoCs come with various co-processors running their RTKit operating
+ * system. This protocol library is used by client drivers to use the
+ * features provided by them.
+ */
+#ifndef _LINUX_APPLE_RTKIT_H_
+#define _LINUX_APPLE_RTKIT_H_
+
+#include <linux/device.h>
+#include <linux/ioport.h>
+#include <linux/types.h>
+#include <linux/mailbox_client.h>
+
+/*
+ * Struct to represent implementation-specific RTKit operations.
+ *
+ * @buffer:    Shared memory buffer allocated inside normal RAM.
+ * @iomem:     Shared memory buffer controlled by the co-processors.
+ * @size:      Size of the shared memory buffer.
+ * @iova:      Device VA of shared memory buffer.
+ * @is_mapped: Shared memory buffer is managed by the co-processor.
+ */
+
+struct apple_rtkit_shmem {
+	void *buffer;
+	void __iomem *iomem;
+	size_t size;
+	dma_addr_t iova;
+	bool is_mapped;
+};
+
+/*
+ * Struct to represent implementation-specific RTKit operations.
+ *
+ * @crashed:       Called when the co-processor has crashed.
+ * @recv_message:  Function called when a message from RTKit is received
+ *                 on a non-system endpoint. Called from a worker thread.
+ * @shmem_setup:   Setup shared memory buffer. If bfr.is_iomem is true the
+ *                 buffer is managed by the co-processor and needs to be mapped.
+ *                 Otherwise the buffer is managed by Linux and needs to be
+ *                 allocated. If not specified dma_alloc_coherent is used.
+ * @shmem_destroy: Undo the shared memory buffer setup in shmem_setup. If not
+ *                 specified dma_free_coherent is used.
+ */
+struct apple_rtkit_ops {
+	void (*crashed)(void *cookie);
+	void (*recv_message)(void *cookie, u8 endpoint, u64 message);
+	int (*shmem_setup)(void *cookie, struct apple_rtkit_shmem *bfr,
+			   dma_addr_t addr, size_t len);
+	void (*shmem_destroy)(void *cookie, struct apple_rtkit_shmem *bfr);
+};
+
+struct apple_rtkit;
+
+#if IS_ENABLED(CONFIG_APPLE_RTKIT)
+
+/*
+ * Initializes the internal state required to handle RTKit. This
+ * should usually be called within _probe.
+ *
+ * @dev: Pointer to the device node this coprocessor is assocated with
+ * @cookie: opaque cookie passed to all functions defined in rtkit_ops
+ * @mbox_name: mailbox name used to communicate with the co-processor
+ * @mbox_idx: mailbox index to be used if mbox_name is NULL
+ * @ops: pointer to rtkit_ops to be used for this co-processor
+ */
+struct apple_rtkit *apple_rtkit_init(struct device *dev, void *cookie,
+				     const char *mbox_name, int mbox_idx,
+				     const struct apple_rtkit_ops *ops);
+
+/*
+ * Dev-res managed version of apple_rtkit_init.
+ */
+struct apple_rtkit *devm_apple_rtkit_init(struct device *dev, void *cookie,
+					  const char *mbox_name, int mbox_idx,
+					  const struct apple_rtkit_ops *ops);
+
+/*
+ * Free internal structures.
+ */
+void apple_rtkit_free(struct apple_rtkit *rtk);
+
+/*
+ * Reinitialize internal structures. Must only be called with the co-processor
+ * is held in reset.
+ */
+int apple_rtkit_reinit(struct apple_rtkit *rtk);
+
+/*
+ * Handle RTKit's boot process. Should be called after the CPU of the
+ * co-processor has been started.
+ */
+int apple_rtkit_boot(struct apple_rtkit *rtk);
+
+/*
+ * Hibernate the co-processor.
+ */
+int apple_rtkit_hibernate(struct apple_rtkit *rtk);
+
+/*
+ * Wake the co-processor up from hibernation mode.
+ */
+int apple_rtkit_wake(struct apple_rtkit *rtk);
+
+/*
+ * Shutdown the co-processor
+ */
+int apple_rtkit_shutdown(struct apple_rtkit *rtk);
+
+/*
+ * Checks if RTKit is running and ready to handle messages.
+ */
+bool apple_rtkit_is_running(struct apple_rtkit *rtk);
+
+/*
+ * Checks if RTKit has crashed.
+ */
+bool apple_rtkit_is_crashed(struct apple_rtkit *rtk);
+
+/*
+ * Starts an endpoint. Must be called after boot but before any messages can be
+ * sent or received from that endpoint.
+ */
+int apple_rtkit_start_ep(struct apple_rtkit *rtk, u8 endpoint);
+
+/*
+ * Send a message to the given endpoint.
+ */
+int apple_rtkit_send_message(struct apple_rtkit *rtk, u8 ep, u64 message);
+
+#else
+
+static inline struct apple_rtkit *
+apple_rtkit_init(struct device *dev, void *cookie, const char *mbox_name,
+		 int mbox_idx, const struct apple_rtkit_ops *ops)
+{
+	return ERR_PTR(-ENODEV);
+}
+
+static inline struct apple_rtkit *
+devm_apple_rtkit_init(struct device *dev, void *cookie, const char *mbox_name,
+		      int mbox_idx, const struct apple_rtkit_ops *ops)
+{
+	return ERR_PTR(-ENODEV);
+}
+
+static inline void apple_rtkit_free(struct apple_rtkit *rtk)
+{
+}
+
+static inline int apple_rtkit_reinit(struct apple_rtkit *rtk)
+{
+	return -ENODEV;
+}
+
+static inline int apple_rtkit_boot(struct apple_rtkit *rtk)
+{
+	return -ENODEV;
+}
+
+static inline int apple_rtkit_hibernate(struct apple_rtkit *rtk)
+{
+	return -ENODEV;
+}
+
+static inline int apple_rtkit_wake(struct apple_rtkit *rtk)
+{
+	return -ENODEV;
+}
+
+static inline int apple_rtkit_shutdown(struct apple_rtkit *rtk)
+{
+	return -ENODEV;
+}
+
+static inline bool apple_rtkit_is_running(struct apple_rtkit *rtk)
+{
+	return false;
+}
+
+static inline bool apple_rtkit_is_crashed(struct apple_rtkit *rtk)
+{
+	return false;
+}
+
+static inline int apple_rtkit_start_ep(struct apple_rtkit *rtk, u8 endpoint)
+{
+	return -ENODEV;
+}
+
+static inline int apple_rtkit_send_message(struct apple_rtkit *rtk, u8 ep,
+					   u64 message)
+{
+	return -ENODEV;
+}
+
+#endif /* IS_ENABLED(CONFIG_APPLE_RTKIT) */
+
+#endif /* _LINUX_APPLE_RTKIT_H_ */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 6/9] nvme-apple: Add initial Apple SoC NVMe driver
  2022-03-21 16:50 [PATCH 0/9] Apple M1 (Pro/Max) NVMe driver Sven Peter
                   ` (4 preceding siblings ...)
  2022-03-21 16:50 ` [PATCH 5/9] soc: apple: Add RTKit IPC library Sven Peter
@ 2022-03-21 16:50 ` Sven Peter
  2022-03-21 17:01   ` Keith Busch
                     ` (2 more replies)
  2022-03-21 16:50 ` [PATCH 7/9] nvme-apple: Serialize command issue Sven Peter
                   ` (3 subsequent siblings)
  9 siblings, 3 replies; 55+ messages in thread
From: Sven Peter @ 2022-03-21 16:50 UTC (permalink / raw)
  To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg
  Cc: Sven Peter, Hector Martin, Alyssa Rosenzweig, Rob Herring,
	Arnd Bergmann, Marc Zyngier, devicetree, linux-arm-kernel,
	linux-kernel, linux-nvme

Apple SoCs such as the M1 come with an embedded NVMe controller that
is not attached to any PCIe bus. Additionally, it doesn't confirm
to the NVMe specification and requires a bunch of changes to command
submission and IOMMU configuration to work.

Signed-off-by: Sven Peter <sven@svenpeter.dev>
---
 MAINTAINERS                |    1 +
 drivers/nvme/host/Kconfig  |   12 +
 drivers/nvme/host/Makefile |    3 +
 drivers/nvme/host/apple.c  | 1456 ++++++++++++++++++++++++++++++++++++
 4 files changed, 1472 insertions(+)
 create mode 100644 drivers/nvme/host/apple.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 3d37fe7a0408..0a493298b6ff 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1781,6 +1781,7 @@ F:	drivers/i2c/busses/i2c-pasemi-core.c
 F:	drivers/i2c/busses/i2c-pasemi-platform.c
 F:	drivers/irqchip/irq-apple-aic.c
 F:	drivers/mailbox/apple-mailbox.c
+F:	drivers/nvme/host/apple.c
 F:	drivers/pinctrl/pinctrl-apple-gpio.c
 F:	drivers/soc/apple/*
 F:	drivers/watchdog/apple_wdt.c
diff --git a/drivers/nvme/host/Kconfig b/drivers/nvme/host/Kconfig
index dc0450ca23a3..439848a4b685 100644
--- a/drivers/nvme/host/Kconfig
+++ b/drivers/nvme/host/Kconfig
@@ -83,3 +83,15 @@ config NVME_TCP
 	  from https://github.com/linux-nvme/nvme-cli.
 
 	  If unsure, say N.
+
+config NVME_APPLE
+	tristate "Apple ANS2 NVM Express host driver"
+	depends on OF && BLOCK
+	depends on (APPLE_RTKIT && APPLE_SART && ARCH_APPLE) || COMPILE_TEST
+	select NVME_CORE
+	help
+	  This provides support for the NVMe controller embedded in Apple SoCs
+	  such as the M1.
+
+	  To compile this driver as a module, choose M here: the
+	  module will be called nvme-apple.
diff --git a/drivers/nvme/host/Makefile b/drivers/nvme/host/Makefile
index dfaacd472e5d..2927820c70a3 100644
--- a/drivers/nvme/host/Makefile
+++ b/drivers/nvme/host/Makefile
@@ -8,6 +8,7 @@ obj-$(CONFIG_NVME_FABRICS)		+= nvme-fabrics.o
 obj-$(CONFIG_NVME_RDMA)			+= nvme-rdma.o
 obj-$(CONFIG_NVME_FC)			+= nvme-fc.o
 obj-$(CONFIG_NVME_TCP)			+= nvme-tcp.o
+obj-$(CONFIG_NVME_APPLE)		+= nvme-apple.o
 
 nvme-core-y				:= core.o ioctl.o
 nvme-core-$(CONFIG_TRACING)		+= trace.o
@@ -25,3 +26,5 @@ nvme-rdma-y				+= rdma.o
 nvme-fc-y				+= fc.o
 
 nvme-tcp-y				+= tcp.o
+
+nvme-apple-y				+= apple.o
diff --git a/drivers/nvme/host/apple.c b/drivers/nvme/host/apple.c
new file mode 100644
index 000000000000..587d6c7014a0
--- /dev/null
+++ b/drivers/nvme/host/apple.c
@@ -0,0 +1,1456 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Apple ANS NVM Express device driver
+ * Copyright The Asahi Linux Contributors
+ *
+ * Based on the pci.c NVM Express device driver
+ * Copyright (c) 2011-2014, Intel Corporation.
+ * and on the rdma.c NVMe over Fabrics RDMA host code.
+ * Copyright (c) 2015-2016 HGST, a Western Digital Company.
+ */
+
+//#define DEBUG
+
+#include <linux/async.h>
+#include <linux/blkdev.h>
+#include <linux/blk-mq.h>
+#include <linux/blk-integrity.h>
+#include <linux/device.h>
+#include <linux/dma-mapping.h>
+#include <linux/dmapool.h>
+#include <linux/interrupt.h>
+#include <linux/io-64-nonatomic-lo-hi.h>
+#include <linux/io.h>
+#include <linux/iopoll.h>
+#include <linux/jiffies.h>
+#include <linux/mempool.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/of_platform.h>
+#include <linux/once.h>
+#include <linux/platform_device.h>
+#include <linux/soc/apple/rtkit.h>
+#include <linux/soc/apple/sart.h>
+#include <linux/reset.h>
+#include <linux/time64.h>
+
+#include "nvme.h"
+
+#define APPLE_ANS_BOOT_TIMEOUT	  USEC_PER_SEC
+#define APPLE_ANS_MAX_QUEUE_DEPTH 64
+
+#define APPLE_ANS_COPROC_CPU_CONTROL	 0x44
+#define APPLE_ANS_COPROC_CPU_CONTROL_RUN BIT(4)
+
+#define APPLE_ANS_ACQ_DB  0x1004
+#define APPLE_ANS_IOCQ_DB 0x100c
+
+#define APPLE_ANS_MAX_PEND_CMDS_CTRL 0x1210
+
+#define APPLE_ANS_BOOT_STATUS	 0x1300
+#define APPLE_ANS_BOOT_STATUS_OK 0xde71ce55
+
+#define APPLE_ANS_UNKNOWN_CTRL	 0x24008
+#define APPLE_ANS_PRP_NULL_CHECK BIT(11)
+
+#define APPLE_ANS_LINEAR_SQ_CTRL 0x24908
+#define APPLE_ANS_LINEAR_SQ_EN	 BIT(0)
+
+#define APPLE_ANS_LINEAR_ASQ_DB	 0x2490c
+#define APPLE_ANS_LINEAR_IOSQ_DB 0x24910
+
+#define APPLE_NVMMU_NUM_TCBS	  0x28100
+#define APPLE_NVMMU_ASQ_TCB_BASE  0x28108
+#define APPLE_NVMMU_IOSQ_TCB_BASE 0x28110
+#define APPLE_NVMMU_TCB_INVAL	  0x28118
+#define APPLE_NVMMU_TCB_STAT	  0x28120
+
+/* NVM Express NVM Command Set Specification, Revision 1.0a, Figure 18 */
+#define NVME_OPCODE_DATA_XFER_HOST_TO_CTRL BIT(0)
+#define NVME_OPCODE_DATA_XFER_CTRL_TO_HOST BIT(1)
+
+/*
+ * This controller is a bit weird in the way command tags works: Both the
+ * admin and the IO queue share the same tag space. Additionally, tags
+ * cannot be higher than 0x40 which effectively limits the combined
+ * queue depth to 0x40. Instead of wasting half of that on the admin queue
+ * which gets much less traffic we instead reduce its size here.
+ * The controller also doesn't support async event such that no space must
+ * be reserved for NVME_NR_AEN_COMMANDS.
+ */
+#define APPLE_NVME_AQ_DEPTH	   2
+#define APPLE_NVME_AQ_MQ_TAG_DEPTH (APPLE_NVME_AQ_DEPTH - 1)
+
+/*
+ * These can be higher, but we need to ensure that any command doesn't
+ * require an sg allocation that needs more than a page of data.
+ */
+#define NVME_MAX_KB_SZ 4096
+#define NVME_MAX_SEGS  127
+
+/*
+ * This controller comes with an embedded IOMMU known as NVMMU.
+ * The NVMMU is pointed to an array of TCBs indexed by the command tag.
+ * Each command must be configured inside this structure before it's allowed
+ * to execute, including commands that don't require DMA transfers.
+ *
+ * An exception to this are Apple's vendor-specific commands (opcode 0xD8 on the
+ * admin queue): Those commands must still be added to the NVMMU but the DMA
+ * buffers cannot be represented as PRPs and must instead be allowed using SART.
+ *
+ * Programming the PRPs to the same values as those in the submission queue
+ * looks rather silly at first. This hardware is however designed for a kernel
+ * that runs the NVMMU code in a higher exception level than the NVMe driver.
+ * In that setting the NVMe driver first programs the submission queue entry
+ * and then executes a hypercall to the code that is allowed to program the
+ * NVMMU. The NVMMU driver then creates a shadow copy of the PRPs while
+ * verifying that they don't point to kernel text, data, pagetables, or similar
+ * protected areas before programming the TCB to point to this shadow copy.
+ * Since Linux doesn't do any of that we may as well just point both the queue
+ * and the TCB PRP pointer to the same memory.
+ */
+struct apple_nvmmu_tcb {
+	u8 opcode;
+
+#define APPLE_ANS_TCB_DMA_FROM_DEVICE BIT(0)
+#define APPLE_ANS_TCB_DMA_TO_DEVICE   BIT(1)
+	u8 dma_flags;
+
+	u8 command_id;
+	u8 _unk0;
+	u32 length;
+	u8 _unk1[16];
+	u64 prp1;
+	u64 prp2;
+	u8 _unk2[16];
+	u8 aes_iv[8];
+	u8 _aes_unk[64];
+};
+
+/*
+ * The Apple NVMe controller only supports a single admin and a single IO queue
+ * which are both limited to 64 entries and share a single interrupt.
+ *
+ * The completion queue works as usual. The submission "queue" instead is
+ * an array indexed by the command tag on this hardware. Commands must also be
+ * present in the NVMMU's tcb array. They are triggered by writing their tag to
+ * a MMIO register.
+ */
+struct apple_nvme_queue {
+	struct nvme_command *sqes;
+	struct nvme_completion *cqes;
+	struct apple_nvmmu_tcb *tcbs;
+
+	dma_addr_t sq_dma_addr;
+	dma_addr_t cq_dma_addr;
+	dma_addr_t tcb_dma_addr;
+
+	u32 __iomem *sq_db;
+	u32 __iomem *cq_db;
+
+	u16 cq_head;
+	u8 cq_phase;
+
+	bool is_adminq;
+	bool enabled;
+};
+
+/*
+ * The apple_nvme_iod describes the data in an I/O.
+ *
+ * The sg pointer contains the list of PRP chunk allocations in addition
+ * to the actual struct scatterlist.
+ */
+struct apple_nvme_iod {
+	struct nvme_request req;
+	struct nvme_command cmd;
+	struct apple_nvme_queue *q;
+	int npages; /* In the PRP list. 0 means small pool in use */
+	int nents; /* Used in scatterlist */
+	dma_addr_t first_dma;
+	unsigned int dma_len; /* length of single DMA segment mapping */
+	struct scatterlist *sg;
+};
+
+struct apple_nvme {
+	struct device *dev;
+
+	void __iomem *mmio_coproc;
+	void __iomem *mmio_nvme;
+
+	struct apple_sart *sart;
+	struct apple_rtkit *rtk;
+	struct reset_control *reset;
+
+	struct dma_pool *prp_page_pool;
+	struct dma_pool *prp_small_pool;
+	mempool_t *iod_mempool;
+
+	struct nvme_ctrl ctrl;
+	struct work_struct remove_work;
+
+	struct apple_nvme_queue adminq;
+	struct apple_nvme_queue ioq;
+
+	struct blk_mq_tag_set admin_tagset;
+	struct blk_mq_tag_set tagset;
+
+	int irq;
+	spinlock_t lock;
+};
+
+static_assert(sizeof(struct nvme_command) == 64);
+static_assert(sizeof(struct apple_nvmmu_tcb) == 128);
+
+static inline struct apple_nvme *ctrl_to_apple_nvme(struct nvme_ctrl *ctrl)
+{
+	return container_of(ctrl, struct apple_nvme, ctrl);
+}
+
+static inline struct apple_nvme *queue_to_apple_nvme(struct apple_nvme_queue *q)
+{
+	if (q->is_adminq)
+		return container_of(q, struct apple_nvme, adminq);
+	else
+		return container_of(q, struct apple_nvme, ioq);
+}
+
+static unsigned int apple_nvme_queue_depth(struct apple_nvme_queue *q)
+{
+	if (q->is_adminq)
+		return APPLE_NVME_AQ_DEPTH;
+	else
+		return APPLE_ANS_MAX_QUEUE_DEPTH;
+}
+
+static void apple_nvme_rtkit_crashed(void *cookie)
+{
+	struct apple_nvme *anv = cookie;
+
+	dev_warn(anv->dev, "RTKit crashed; unable to recover without a reboot");
+	nvme_reset_ctrl(&anv->ctrl);
+}
+
+static void apple_nvme_rtkit_recv(void *cookie, u8 endpoint, u64 message)
+{
+	struct apple_nvme *anv = cookie;
+
+	dev_warn(anv->dev, "Received unexpected message to EP%02d: %llx",
+		 endpoint, message);
+}
+
+static int apple_nvme_sart_dma_setup(void *cookie, struct apple_rtkit_shmem *bfr,
+				     dma_addr_t iova, size_t size)
+{
+	struct apple_nvme *anv = cookie;
+	int ret;
+
+	if (iova)
+		return -EINVAL;
+
+	bfr->buffer = dma_alloc_coherent(anv->dev, size, &iova, GFP_KERNEL);
+	if (!bfr->buffer)
+		return -ENOMEM;
+
+	ret = apple_sart_add_allowed_region(anv->sart, iova, size);
+	if (ret) {
+		dma_free_coherent(anv->dev, size, bfr->buffer, iova);
+		bfr->buffer = NULL;
+		return -ENOMEM;
+	}
+
+	bfr->size = size;
+	bfr->iova = iova;
+
+	return 0;
+}
+
+static void apple_nvme_sart_dma_destroy(void *cookie, struct apple_rtkit_shmem *bfr)
+{
+	struct apple_nvme *anv = cookie;
+
+	apple_sart_remove_allowed_region(anv->sart, bfr->iova, bfr->size);
+	dma_free_coherent(anv->dev, bfr->size, bfr->buffer, bfr->iova);
+}
+
+static const struct apple_rtkit_ops apple_nvme_rtkit_ops = {
+	.crashed = apple_nvme_rtkit_crashed,
+	.recv_message = apple_nvme_rtkit_recv,
+	.shmem_setup = apple_nvme_sart_dma_setup,
+	.shmem_destroy = apple_nvme_sart_dma_destroy,
+};
+
+static void apple_nvmmu_inval(struct apple_nvme_queue *q, unsigned int tag)
+{
+	struct apple_nvme *anv = queue_to_apple_nvme(q);
+
+	writel(tag, anv->mmio_nvme + APPLE_NVMMU_TCB_INVAL);
+	if (readl_relaxed(anv->mmio_nvme + APPLE_NVMMU_TCB_STAT))
+		dev_warn(anv->dev, "NVMMU TCB invalidation failed\n");
+}
+
+static void apple_nvme_submit_cmd(struct apple_nvme_queue *q,
+				  struct nvme_command *cmd)
+{
+	u32 tag = nvme_tag_from_cid(cmd->common.command_id);
+	struct apple_nvmmu_tcb *tcb = &q->tcbs[tag];
+
+	tcb->opcode = cmd->common.opcode;
+	tcb->prp1 = cmd->common.dptr.prp1;
+	tcb->prp2 = cmd->common.dptr.prp2;
+	tcb->length = cmd->rw.length;
+	tcb->command_id = tag;
+	tcb->dma_flags = 0;
+
+	if (cmd->common.opcode & NVME_OPCODE_DATA_XFER_HOST_TO_CTRL)
+		tcb->dma_flags |= APPLE_ANS_TCB_DMA_TO_DEVICE;
+	if (cmd->common.opcode & NVME_OPCODE_DATA_XFER_CTRL_TO_HOST)
+		tcb->dma_flags |= APPLE_ANS_TCB_DMA_FROM_DEVICE;
+
+	memcpy(&q->sqes[tag], cmd, sizeof(*cmd));
+	writel(tag, q->sq_db);
+}
+
+/*
+ * From pci.c:
+ * Will slightly overestimate the number of pages needed.  This is OK
+ * as it only leads to a small amount of wasted memory for the lifetime of
+ * the I/O.
+ */
+static inline size_t apple_nvme_iod_alloc_size(void)
+{
+	const unsigned int nprps = DIV_ROUND_UP(
+		NVME_MAX_KB_SZ + NVME_CTRL_PAGE_SIZE, NVME_CTRL_PAGE_SIZE);
+	const int npages = DIV_ROUND_UP(8 * nprps, PAGE_SIZE - 8);
+	const size_t alloc_size = sizeof(__le64 *) * npages +
+				  sizeof(struct scatterlist) * NVME_MAX_SEGS;
+
+	return alloc_size;
+}
+
+static void **apple_nvme_iod_list(struct request *req)
+{
+	struct apple_nvme_iod *iod = blk_mq_rq_to_pdu(req);
+
+	return (void **)(iod->sg + blk_rq_nr_phys_segments(req));
+}
+
+static void apple_nvme_free_prps(struct apple_nvme *anv, struct request *req)
+{
+	const int last_prp = NVME_CTRL_PAGE_SIZE / sizeof(__le64) - 1;
+	struct apple_nvme_iod *iod = blk_mq_rq_to_pdu(req);
+	dma_addr_t dma_addr = iod->first_dma;
+	int i;
+
+	for (i = 0; i < iod->npages; i++) {
+		__le64 *prp_list = apple_nvme_iod_list(req)[i];
+		dma_addr_t next_dma_addr = prp_list[last_prp];
+
+		dma_pool_free(anv->prp_page_pool, prp_list, dma_addr);
+		dma_addr = next_dma_addr;
+	}
+}
+
+static void apple_nvme_unmap_data(struct apple_nvme *anv, struct request *req)
+{
+	struct apple_nvme_iod *iod = blk_mq_rq_to_pdu(req);
+
+	if (iod->dma_len) {
+		dma_unmap_page(anv->dev, iod->first_dma, iod->dma_len,
+			       rq_dma_dir(req));
+		return;
+	}
+
+	WARN_ON_ONCE(!iod->nents);
+
+	dma_unmap_sg(anv->dev, iod->sg, iod->nents, rq_dma_dir(req));
+	if (iod->npages == 0)
+		dma_pool_free(anv->prp_small_pool, apple_nvme_iod_list(req)[0],
+			      iod->first_dma);
+	else
+		apple_nvme_free_prps(anv, req);
+	mempool_free(iod->sg, anv->iod_mempool);
+}
+
+static void apple_nvme_print_sgl(struct scatterlist *sgl, int nents)
+{
+	int i;
+	struct scatterlist *sg;
+
+	for_each_sg(sgl, sg, nents, i) {
+		dma_addr_t phys = sg_phys(sg);
+
+		pr_warn("sg[%d] phys_addr:%pad offset:%d length:%d dma_address:%pad dma_length:%d\n",
+			i, &phys, sg->offset, sg->length, &sg_dma_address(sg),
+			sg_dma_len(sg));
+	}
+}
+
+static blk_status_t apple_nvme_setup_prps(struct apple_nvme *anv,
+					  struct request *req,
+					  struct nvme_rw_command *cmnd)
+{
+	struct apple_nvme_iod *iod = blk_mq_rq_to_pdu(req);
+	struct dma_pool *pool;
+	int length = blk_rq_payload_bytes(req);
+	struct scatterlist *sg = iod->sg;
+	int dma_len = sg_dma_len(sg);
+	u64 dma_addr = sg_dma_address(sg);
+	int offset = dma_addr & (NVME_CTRL_PAGE_SIZE - 1);
+	__le64 *prp_list;
+	void **list = apple_nvme_iod_list(req);
+	dma_addr_t prp_dma;
+	int nprps, i;
+
+	length -= (NVME_CTRL_PAGE_SIZE - offset);
+	if (length <= 0) {
+		iod->first_dma = 0;
+		goto done;
+	}
+
+	dma_len -= (NVME_CTRL_PAGE_SIZE - offset);
+	if (dma_len) {
+		dma_addr += (NVME_CTRL_PAGE_SIZE - offset);
+	} else {
+		sg = sg_next(sg);
+		dma_addr = sg_dma_address(sg);
+		dma_len = sg_dma_len(sg);
+	}
+
+	if (length <= NVME_CTRL_PAGE_SIZE) {
+		iod->first_dma = dma_addr;
+		goto done;
+	}
+
+	nprps = DIV_ROUND_UP(length, NVME_CTRL_PAGE_SIZE);
+	if (nprps <= (256 / 8)) {
+		pool = anv->prp_small_pool;
+		iod->npages = 0;
+	} else {
+		pool = anv->prp_page_pool;
+		iod->npages = 1;
+	}
+
+	prp_list = dma_pool_alloc(pool, GFP_ATOMIC, &prp_dma);
+	if (!prp_list) {
+		iod->first_dma = dma_addr;
+		iod->npages = -1;
+		return BLK_STS_RESOURCE;
+	}
+	list[0] = prp_list;
+	iod->first_dma = prp_dma;
+	i = 0;
+	for (;;) {
+		if (i == NVME_CTRL_PAGE_SIZE >> 3) {
+			__le64 *old_prp_list = prp_list;
+
+			prp_list = dma_pool_alloc(pool, GFP_ATOMIC, &prp_dma);
+			if (!prp_list)
+				goto free_prps;
+			list[iod->npages++] = prp_list;
+			prp_list[0] = old_prp_list[i - 1];
+			old_prp_list[i - 1] = prp_dma;
+			i = 1;
+		}
+		prp_list[i++] = dma_addr;
+		dma_len -= NVME_CTRL_PAGE_SIZE;
+		dma_addr += NVME_CTRL_PAGE_SIZE;
+		length -= NVME_CTRL_PAGE_SIZE;
+		if (length <= 0)
+			break;
+		if (dma_len > 0)
+			continue;
+		if (unlikely(dma_len < 0))
+			goto bad_sgl;
+		sg = sg_next(sg);
+		dma_addr = sg_dma_address(sg);
+		dma_len = sg_dma_len(sg);
+	}
+done:
+	cmnd->dptr.prp1 = sg_dma_address(iod->sg);
+	cmnd->dptr.prp2 = iod->first_dma;
+	return BLK_STS_OK;
+free_prps:
+	apple_nvme_free_prps(anv, req);
+	return BLK_STS_RESOURCE;
+bad_sgl:
+	WARN(DO_ONCE(apple_nvme_print_sgl, iod->sg, iod->nents),
+	     "Invalid SGL for payload:%d nents:%d\n", blk_rq_payload_bytes(req),
+	     iod->nents);
+	return BLK_STS_IOERR;
+}
+
+static blk_status_t apple_nvme_setup_prp_simple(struct apple_nvme *anv,
+						struct request *req,
+						struct nvme_rw_command *cmnd,
+						struct bio_vec *bv)
+{
+	struct apple_nvme_iod *iod = blk_mq_rq_to_pdu(req);
+	unsigned int offset = bv->bv_offset & (NVME_CTRL_PAGE_SIZE - 1);
+	unsigned int first_prp_len = NVME_CTRL_PAGE_SIZE - offset;
+
+	iod->first_dma = dma_map_bvec(anv->dev, bv, rq_dma_dir(req), 0);
+	if (dma_mapping_error(anv->dev, iod->first_dma))
+		return BLK_STS_RESOURCE;
+	iod->dma_len = bv->bv_len;
+
+	cmnd->dptr.prp1 = iod->first_dma;
+	if (bv->bv_len > first_prp_len)
+		cmnd->dptr.prp2 = iod->first_dma + first_prp_len;
+	return BLK_STS_OK;
+}
+
+static blk_status_t apple_nvme_map_data(struct apple_nvme *anv,
+					struct request *req,
+					struct nvme_command *cmnd)
+{
+	struct apple_nvme_iod *iod = blk_mq_rq_to_pdu(req);
+	blk_status_t ret = BLK_STS_RESOURCE;
+	int nr_mapped;
+
+	if (blk_rq_nr_phys_segments(req) == 1) {
+		struct bio_vec bv = req_bvec(req);
+
+		if (bv.bv_offset + bv.bv_len <= NVME_CTRL_PAGE_SIZE * 2)
+			return apple_nvme_setup_prp_simple(anv, req, &cmnd->rw,
+							   &bv);
+	}
+
+	iod->dma_len = 0;
+	iod->sg = mempool_alloc(anv->iod_mempool, GFP_ATOMIC);
+	if (!iod->sg)
+		return BLK_STS_RESOURCE;
+	sg_init_table(iod->sg, blk_rq_nr_phys_segments(req));
+	iod->nents = blk_rq_map_sg(req->q, req, iod->sg);
+	if (!iod->nents)
+		goto out_free_sg;
+
+	nr_mapped = dma_map_sg_attrs(anv->dev, iod->sg, iod->nents,
+				     rq_dma_dir(req), DMA_ATTR_NO_WARN);
+	if (!nr_mapped)
+		goto out_free_sg;
+
+	ret = apple_nvme_setup_prps(anv, req, &cmnd->rw);
+	if (ret != BLK_STS_OK)
+		goto out_unmap_sg;
+	return BLK_STS_OK;
+
+out_unmap_sg:
+	dma_unmap_sg(anv->dev, iod->sg, iod->nents, rq_dma_dir(req));
+out_free_sg:
+	mempool_free(iod->sg, anv->iod_mempool);
+	return ret;
+}
+
+static __always_inline void apple_nvme_unmap_rq(struct request *req)
+{
+	struct apple_nvme_iod *iod = blk_mq_rq_to_pdu(req);
+	struct apple_nvme *anv = queue_to_apple_nvme(iod->q);
+
+	if (blk_rq_nr_phys_segments(req))
+		apple_nvme_unmap_data(anv, req);
+}
+
+static void apple_nvme_complete_rq(struct request *req)
+{
+	apple_nvme_unmap_rq(req);
+	nvme_complete_rq(req);
+}
+
+static void apple_nvme_complete_batch(struct io_comp_batch *iob)
+{
+	nvme_complete_batch(iob, apple_nvme_unmap_rq);
+}
+
+static inline bool apple_nvme_cqe_pending(struct apple_nvme_queue *q)
+{
+	struct nvme_completion *hcqe = &q->cqes[q->cq_head];
+
+	return (READ_ONCE(hcqe->status) & 1) == q->cq_phase;
+}
+
+static inline struct blk_mq_tags *
+apple_nvme_queue_tagset(struct apple_nvme *anv, struct apple_nvme_queue *q)
+{
+	if (q->is_adminq)
+		return anv->admin_tagset.tags[0];
+	else
+		return anv->tagset.tags[0];
+}
+
+static inline void apple_nvme_handle_cqe(struct apple_nvme_queue *q,
+					 struct io_comp_batch *iob, u16 idx)
+{
+	struct apple_nvme *anv = queue_to_apple_nvme(q);
+	struct nvme_completion *cqe = &q->cqes[idx];
+	__u16 command_id = READ_ONCE(cqe->command_id);
+	struct request *req;
+
+	apple_nvmmu_inval(q, command_id);
+
+	req = nvme_find_rq(apple_nvme_queue_tagset(anv, q), command_id);
+	if (unlikely(!req)) {
+		dev_warn(anv->dev, "invalid id %d completed", command_id);
+		return;
+	}
+
+	if (!nvme_try_complete_req(req, cqe->status, cqe->result) &&
+	    !blk_mq_add_to_batch(req, iob, nvme_req(req)->status,
+				 apple_nvme_complete_batch))
+		apple_nvme_complete_rq(req);
+}
+
+static inline void apple_nvme_update_cq_head(struct apple_nvme_queue *q)
+{
+	u32 tmp = q->cq_head + 1;
+
+	if (tmp == apple_nvme_queue_depth(q)) {
+		q->cq_head = 0;
+		q->cq_phase ^= 1;
+	} else {
+		q->cq_head = tmp;
+	}
+}
+
+static bool apple_nvme_poll_cq(struct apple_nvme_queue *q,
+			       struct io_comp_batch *iob)
+{
+	bool found = false;
+
+	while (apple_nvme_cqe_pending(q)) {
+		found = true;
+
+		/*
+		 * load-load control dependency between phase and the rest of
+		 * the cqe requires a full read memory barrier
+		 */
+		dma_rmb();
+		apple_nvme_handle_cqe(q, iob, q->cq_head);
+		apple_nvme_update_cq_head(q);
+	}
+
+	if (found)
+		writel_relaxed(q->cq_head, q->cq_db);
+
+	return found;
+}
+
+static bool apple_nvme_handle_cq(struct apple_nvme_queue *q, bool force)
+{
+	bool found;
+	DEFINE_IO_COMP_BATCH(iob);
+
+	if (!READ_ONCE(q->enabled) && !force)
+		return false;
+
+	found = apple_nvme_poll_cq(q, &iob);
+
+	if (!rq_list_empty(iob.req_list))
+		apple_nvme_complete_batch(&iob);
+
+	return found;
+}
+
+static irqreturn_t apple_nvme_irq(int irq, void *data)
+{
+	struct apple_nvme *anv = data;
+	bool handled = false;
+	unsigned long flags;
+
+	spin_lock_irqsave(&anv->lock, flags);
+	if (apple_nvme_handle_cq(&anv->ioq, false))
+		handled = true;
+	if (apple_nvme_handle_cq(&anv->adminq, false))
+		handled = true;
+	spin_unlock_irqrestore(&anv->lock, flags);
+
+	if (handled)
+		return IRQ_HANDLED;
+	return IRQ_NONE;
+}
+
+static int apple_nvme_create_cq(struct apple_nvme *anv)
+{
+	struct nvme_command c = {};
+
+	/*
+	 * Note: we (ab)use the fact that the prp fields survive if no data
+	 * is attached to the request.
+	 */
+	c.create_cq.opcode = nvme_admin_create_cq;
+	c.create_cq.prp1 = anv->ioq.cq_dma_addr;
+	c.create_cq.cqid = 1;
+	c.create_cq.qsize = APPLE_ANS_MAX_QUEUE_DEPTH - 1;
+	c.create_cq.cq_flags = NVME_QUEUE_PHYS_CONTIG | NVME_CQ_IRQ_ENABLED;
+	c.create_cq.irq_vector = 0;
+
+	return nvme_submit_sync_cmd(anv->ctrl.admin_q, &c, NULL, 0);
+}
+
+static int apple_nvme_remove_cq(struct apple_nvme *anv)
+{
+	struct nvme_command c = {};
+
+	c.delete_queue.opcode = nvme_admin_delete_cq;
+	c.delete_queue.qid = 1;
+
+	return nvme_submit_sync_cmd(anv->ctrl.admin_q, &c, NULL, 0);
+}
+
+static int apple_nvme_create_sq(struct apple_nvme *anv)
+{
+	struct nvme_command c = {};
+
+	/*
+	 * Note: we (ab)use the fact that the prp fields survive if no data
+	 * is attached to the request.
+	 */
+	c.create_sq.opcode = nvme_admin_create_sq;
+	c.create_sq.prp1 = anv->ioq.sq_dma_addr;
+	c.create_sq.sqid = 1;
+	c.create_sq.qsize = APPLE_ANS_MAX_QUEUE_DEPTH - 1;
+	c.create_sq.sq_flags = NVME_QUEUE_PHYS_CONTIG;
+	c.create_sq.cqid = 1;
+
+	return nvme_submit_sync_cmd(anv->ctrl.admin_q, &c, NULL, 0);
+}
+
+static int apple_nvme_remove_sq(struct apple_nvme *anv)
+{
+	struct nvme_command c = {};
+
+	c.delete_queue.opcode = nvme_admin_delete_sq;
+	c.delete_queue.qid = 1;
+
+	return nvme_submit_sync_cmd(anv->ctrl.admin_q, &c, NULL, 0);
+}
+
+static blk_status_t apple_nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
+					const struct blk_mq_queue_data *bd)
+{
+	struct nvme_ns *ns = hctx->queue->queuedata;
+	struct apple_nvme_queue *q = hctx->driver_data;
+	struct apple_nvme *anv = queue_to_apple_nvme(q);
+	struct request *req = bd->rq;
+	struct apple_nvme_iod *iod = blk_mq_rq_to_pdu(req);
+	struct nvme_command *cmnd = &iod->cmd;
+	blk_status_t ret;
+
+	iod->npages = -1;
+	iod->nents = 0;
+
+	/*
+	 * We should not need to do this, but we're still using this to
+	 * ensure we can drain requests on a dying queue.
+	 */
+	if (unlikely(!READ_ONCE(q->enabled)))
+		return BLK_STS_IOERR;
+
+	if (!nvme_check_ready(&anv->ctrl, req, true))
+		return nvme_fail_nonready_command(&anv->ctrl, req);
+
+	ret = nvme_setup_cmd(ns, req);
+	if (ret)
+		return ret;
+
+	if (blk_rq_nr_phys_segments(req)) {
+		ret = apple_nvme_map_data(anv, req, cmnd);
+		if (ret)
+			goto out_free_cmd;
+	}
+
+	blk_mq_start_request(req);
+	apple_nvme_submit_cmd(q, cmnd);
+	return BLK_STS_OK;
+
+out_free_cmd:
+	nvme_cleanup_cmd(req);
+	return ret;
+}
+
+static int apple_nvme_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
+				unsigned int hctx_idx)
+{
+	hctx->driver_data = data;
+	return 0;
+}
+
+static int apple_nvme_init_request(struct blk_mq_tag_set *set,
+				   struct request *req, unsigned int hctx_idx,
+				   unsigned int numa_node)
+{
+	struct apple_nvme_queue *q = set->driver_data;
+	struct apple_nvme *anv = queue_to_apple_nvme(q);
+	struct apple_nvme_iod *iod = blk_mq_rq_to_pdu(req);
+	struct nvme_request *nreq = nvme_req(req);
+
+	iod->q = q;
+	nreq->ctrl = &anv->ctrl;
+	nreq->cmd = &iod->cmd;
+
+	return 0;
+}
+
+static void apple_nvme_disable(struct apple_nvme *anv, bool shutdown)
+{
+	u32 csts = readl(anv->mmio_nvme + NVME_REG_CSTS);
+	bool dead = false, freeze = false;
+	unsigned long flags;
+
+	if (apple_rtkit_is_crashed(anv->rtk))
+		dead = true;
+	if (!(csts & NVME_CSTS_RDY))
+		dead = true;
+	if (csts & NVME_CSTS_CFS)
+		dead = true;
+
+	if (anv->ctrl.state == NVME_CTRL_LIVE ||
+	    anv->ctrl.state == NVME_CTRL_RESETTING) {
+		freeze = true;
+		nvme_start_freeze(&anv->ctrl);
+	}
+
+	/*
+	 * Give the controller a chance to complete all entered requests if
+	 * doing a safe shutdown.
+	 */
+	if (!dead && shutdown && freeze)
+		nvme_wait_freeze_timeout(&anv->ctrl, NVME_IO_TIMEOUT);
+
+	nvme_stop_queues(&anv->ctrl);
+
+	if (!dead) {
+		if (READ_ONCE(anv->ioq.enabled)) {
+			apple_nvme_remove_sq(anv);
+			apple_nvme_remove_cq(anv);
+		}
+
+		if (shutdown)
+			nvme_shutdown_ctrl(&anv->ctrl);
+		nvme_disable_ctrl(&anv->ctrl);
+	}
+
+	WRITE_ONCE(anv->ioq.enabled, false);
+	WRITE_ONCE(anv->adminq.enabled, false);
+	mb(); /* ensure that nvme_queue_rq() sees that enabled is cleared */
+	nvme_stop_admin_queue(&anv->ctrl);
+
+	/* last chance to complete any requests before nvme_cancel_request */
+	spin_lock_irqsave(&anv->lock, flags);
+	apple_nvme_handle_cq(&anv->ioq, true);
+	apple_nvme_handle_cq(&anv->adminq, true);
+	spin_unlock_irqrestore(&anv->lock, flags);
+
+	blk_mq_tagset_busy_iter(&anv->tagset, nvme_cancel_request, &anv->ctrl);
+	blk_mq_tagset_busy_iter(&anv->admin_tagset, nvme_cancel_request,
+				&anv->ctrl);
+	blk_mq_tagset_wait_completed_request(&anv->tagset);
+	blk_mq_tagset_wait_completed_request(&anv->admin_tagset);
+
+	/*
+	 * The driver will not be starting up queues again if shutting down so
+	 * must flush all entered requests to their failed completion to avoid
+	 * deadlocking blk-mq hot-cpu notifier.
+	 */
+	if (shutdown) {
+		nvme_start_queues(&anv->ctrl);
+		nvme_start_admin_queue(&anv->ctrl);
+	}
+}
+
+static enum blk_eh_timer_return apple_nvme_timeout(struct request *req,
+						   bool reserved)
+{
+	struct apple_nvme_iod *iod = blk_mq_rq_to_pdu(req);
+	struct apple_nvme_queue *q = iod->q;
+	struct apple_nvme *anv = queue_to_apple_nvme(q);
+	unsigned long flags;
+	u32 csts = readl_relaxed(anv->mmio_nvme + NVME_REG_CSTS);
+
+	if (anv->ctrl.state != NVME_CTRL_LIVE) {
+		/*
+		 * From rdma.c:
+		 * If we are resetting, connecting or deleting we should
+		 * complete immediately because we may block controller
+		 * teardown or setup sequence
+		 * - ctrl disable/shutdown fabrics requests
+		 * - connect requests
+		 * - initialization admin requests
+		 * - I/O requests that entered after unquiescing and
+		 *   the controller stopped responding
+		 *
+		 * All other requests should be cancelled by the error
+		 * recovery work, so it's fine that we fail it here.
+		 */
+		dev_warn(anv->dev,
+			 "I/O %d(aq:%d) timeout while not in live state\n",
+			 req->tag, q->is_adminq);
+		if (blk_mq_request_started(req) &&
+		    !blk_mq_request_completed(req)) {
+			nvme_req(req)->status = NVME_SC_HOST_ABORTED_CMD;
+			blk_mq_complete_request(req);
+		}
+		return BLK_EH_DONE;
+	}
+
+	/* check if we just missed an interrupt if we're still alive */
+	if (!apple_rtkit_is_crashed(anv->rtk) && !(csts & NVME_CSTS_CFS)) {
+		spin_lock_irqsave(&anv->lock, flags);
+		apple_nvme_handle_cq(q, false);
+		spin_unlock_irqrestore(&anv->lock, flags);
+		if (blk_mq_request_completed(req)) {
+			dev_warn(anv->dev,
+				 "I/O %d(aq:%d) timeout: completion polled\n",
+				 req->tag, q->is_adminq);
+			return BLK_EH_DONE;
+		}
+	}
+
+	/*
+	 * aborting commands isn't supported which leaves a full reset as our
+	 * only option here
+	 */
+	dev_warn(anv->dev, "I/O %d(aq:%d) timeout: resetting controller\n",
+		 req->tag, q->is_adminq);
+	nvme_req(req)->flags |= NVME_REQ_CANCELLED;
+	apple_nvme_disable(anv, false);
+	nvme_reset_ctrl(&anv->ctrl);
+	return BLK_EH_DONE;
+}
+
+static int apple_nvme_poll(struct blk_mq_hw_ctx *hctx,
+			   struct io_comp_batch *iob)
+{
+	struct apple_nvme_queue *q = hctx->driver_data;
+	struct apple_nvme *anv = queue_to_apple_nvme(q);
+	bool found;
+	unsigned long flags;
+
+	spin_lock_irqsave(&anv->lock, flags);
+	found = apple_nvme_poll_cq(q, iob);
+	spin_unlock_irqrestore(&anv->lock, flags);
+
+	return found;
+}
+
+static const struct blk_mq_ops apple_nvme_mq_admin_ops = {
+	.queue_rq = apple_nvme_queue_rq,
+	.complete = apple_nvme_complete_rq,
+	.init_hctx = apple_nvme_init_hctx,
+	.init_request = apple_nvme_init_request,
+	.timeout = apple_nvme_timeout,
+};
+
+static const struct blk_mq_ops apple_nvme_mq_ops = {
+	.queue_rq = apple_nvme_queue_rq,
+	.complete = apple_nvme_complete_rq,
+	.init_hctx = apple_nvme_init_hctx,
+	.init_request = apple_nvme_init_request,
+	.timeout = apple_nvme_timeout,
+	.poll = apple_nvme_poll,
+};
+
+static void apple_nvme_init_queue(struct apple_nvme_queue *q)
+{
+	unsigned int depth = apple_nvme_queue_depth(q);
+
+	q->cq_head = 0;
+	q->cq_phase = 1;
+	memset(q->tcbs, 0,
+	       APPLE_ANS_MAX_QUEUE_DEPTH * sizeof(struct apple_nvmmu_tcb));
+	memset(q->cqes, 0, depth * sizeof(struct nvme_completion));
+	WRITE_ONCE(q->enabled, true);
+	wmb(); /* ensure the first interrupt sees the initialization */
+}
+
+static void apple_nvme_reset_work(struct work_struct *work)
+{
+	unsigned int nr_io_queues = 1;
+	int ret;
+	u32 boot_status, aqa;
+	struct apple_nvme *anv =
+		container_of(work, struct apple_nvme, ctrl.reset_work);
+
+	if (anv->ctrl.state != NVME_CTRL_RESETTING) {
+		dev_warn(anv->dev, "ctrl state %d is not RESETTING\n",
+			 anv->ctrl.state);
+		ret = -ENODEV;
+		goto out;
+	}
+
+	/* there's unfortunately no known way to recover if RTKit crashed :( */
+	if (apple_rtkit_is_crashed(anv->rtk)) {
+		dev_err(anv->dev,
+			"RTKit has crashed without any way to recover.");
+		ret = -EIO;
+		goto out;
+	}
+
+	if (anv->ctrl.ctrl_config & NVME_CC_ENABLE)
+		apple_nvme_disable(anv, false);
+
+	/* RTKit must be shut down cleanly for the (soft)-reset to work */
+	if (apple_rtkit_is_running(anv->rtk)) {
+		dev_dbg(anv->dev, "Trying to shut down RTKit before reset.");
+		ret = apple_rtkit_shutdown(anv->rtk);
+		if (ret)
+			goto out;
+	}
+
+	writel_relaxed(0, anv->mmio_coproc + APPLE_ANS_COPROC_CPU_CONTROL);
+	(void)readl_relaxed(anv->mmio_coproc + APPLE_ANS_COPROC_CPU_CONTROL);
+
+	ret = reset_control_assert(anv->reset);
+	if (ret)
+		goto out;
+
+	ret = apple_rtkit_reinit(anv->rtk);
+	if (ret)
+		goto out;
+
+	ret = reset_control_deassert(anv->reset);
+	if (ret)
+		goto out;
+
+	writel_relaxed(APPLE_ANS_COPROC_CPU_CONTROL_RUN,
+		       anv->mmio_coproc + APPLE_ANS_COPROC_CPU_CONTROL);
+	(void)readl_relaxed(anv->mmio_coproc + APPLE_ANS_COPROC_CPU_CONTROL);
+	ret = apple_rtkit_boot(anv->rtk);
+	if (ret) {
+		dev_err(anv->dev, "ANS did not boot");
+		goto out;
+	}
+
+	ret = readl_relaxed_poll_timeout(
+		anv->mmio_nvme + APPLE_ANS_BOOT_STATUS, boot_status,
+		boot_status == APPLE_ANS_BOOT_STATUS_OK, USEC_PER_MSEC,
+		APPLE_ANS_BOOT_TIMEOUT);
+	if (ret) {
+		dev_err(anv->dev, "ANS did not initialize");
+		goto out;
+	}
+
+	dev_dbg(anv->dev, "ANS booted successfully.");
+
+	/*
+	 * Limit the max command size to prevent iod->sg allocations going
+	 * over a single page.
+	 */
+	anv->ctrl.max_hw_sectors = min_t(u32, NVME_MAX_KB_SZ << 1,
+					 dma_max_mapping_size(anv->dev) >> 9);
+	anv->ctrl.max_segments = NVME_MAX_SEGS;
+
+	/*
+	 * Enable NVMMU and linear submission queues.
+	 * While we could keep those disabled and pretend this is slightly
+	 * more common NVMe controller we'd still need some quirks (e.g.
+	 * sq entries will be 128 bytes) and Apple might drop support for
+	 * that mode in the future.
+	 */
+	writel_relaxed(APPLE_ANS_LINEAR_SQ_EN,
+		       anv->mmio_nvme + APPLE_ANS_LINEAR_SQ_CTRL);
+
+	/* Allow as many pending command as possible for both queues */
+	writel_relaxed(APPLE_ANS_MAX_QUEUE_DEPTH |
+			       (APPLE_ANS_MAX_QUEUE_DEPTH << 16),
+		       anv->mmio_nvme + APPLE_ANS_MAX_PEND_CMDS_CTRL);
+
+	/* Setup the NVMMU for the maximum admin and IO queue depth */
+	writel_relaxed(APPLE_ANS_MAX_QUEUE_DEPTH - 1,
+		       anv->mmio_nvme + APPLE_NVMMU_NUM_TCBS);
+
+	/*
+	 * This is probably a chicken bit: without it all commands where any PRP
+	 * is set to zero (including those that don't use that field) fail and
+	 * the co-processor complains about "completed with err BAD_CMD-" or
+	 * a "NULL_PRP_PTR_ERR" in the syslog
+	 */
+	writel_relaxed(readl_relaxed(anv->mmio_nvme + APPLE_ANS_UNKNOWN_CTRL) &
+			       ~APPLE_ANS_PRP_NULL_CHECK,
+		       anv->mmio_nvme + APPLE_ANS_UNKNOWN_CTRL);
+
+	/* Setup the admin queue */
+	aqa = APPLE_NVME_AQ_DEPTH - 1;
+	aqa |= aqa << 16;
+	writel_relaxed(aqa, anv->mmio_nvme + NVME_REG_AQA);
+	lo_hi_writeq_relaxed(anv->adminq.sq_dma_addr,
+			     anv->mmio_nvme + NVME_REG_ASQ);
+	lo_hi_writeq_relaxed(anv->adminq.cq_dma_addr,
+			     anv->mmio_nvme + NVME_REG_ACQ);
+
+	/* Setup NVMMU for both queues */
+	lo_hi_writeq_relaxed(anv->adminq.tcb_dma_addr,
+			     anv->mmio_nvme + APPLE_NVMMU_ASQ_TCB_BASE);
+	lo_hi_writeq_relaxed(anv->ioq.tcb_dma_addr,
+			     anv->mmio_nvme + APPLE_NVMMU_IOSQ_TCB_BASE);
+
+	anv->ctrl.sqsize =
+		APPLE_ANS_MAX_QUEUE_DEPTH - 1; /* 0's based queue depth */
+	anv->ctrl.cap = lo_hi_readq_relaxed(anv->mmio_nvme + NVME_REG_CAP);
+
+	dev_dbg(anv->dev, "Enabling controller now");
+	ret = nvme_enable_ctrl(&anv->ctrl);
+	if (ret)
+		goto out;
+
+	dev_dbg(anv->dev, "Starting admin queue");
+	apple_nvme_init_queue(&anv->adminq);
+	nvme_start_admin_queue(&anv->ctrl);
+
+	if (!nvme_change_ctrl_state(&anv->ctrl, NVME_CTRL_CONNECTING)) {
+		dev_warn(anv->ctrl.device,
+			 "failed to mark controller CONNECTING\n");
+		ret = -ENODEV;
+		goto out;
+	}
+
+	ret = nvme_init_ctrl_finish(&anv->ctrl);
+	if (ret)
+		goto out;
+
+	dev_dbg(anv->dev, "Creating IOCQ");
+	ret = apple_nvme_create_cq(anv);
+	if (ret)
+		goto out;
+	dev_dbg(anv->dev, "Creating IOSQ");
+	ret = apple_nvme_create_sq(anv);
+	if (ret)
+		goto out_remove_cq;
+
+	apple_nvme_init_queue(&anv->ioq);
+	nr_io_queues = 1;
+	ret = nvme_set_queue_count(&anv->ctrl, &nr_io_queues);
+	if (ret)
+		goto out_remove_sq;
+	if (nr_io_queues != 1) {
+		ret = -ENXIO;
+		goto out_remove_sq;
+	}
+
+	anv->ctrl.queue_count = nr_io_queues + 1;
+
+	nvme_start_queues(&anv->ctrl);
+	nvme_wait_freeze(&anv->ctrl);
+	blk_mq_update_nr_hw_queues(&anv->tagset, 1);
+	nvme_unfreeze(&anv->ctrl);
+
+	if (!nvme_change_ctrl_state(&anv->ctrl, NVME_CTRL_LIVE)) {
+		dev_warn(anv->ctrl.device,
+			 "failed to mark controller live state\n");
+		ret = -ENODEV;
+		goto out_remove_sq;
+	}
+
+	nvme_start_ctrl(&anv->ctrl);
+
+	dev_dbg(anv->dev, "ANS boot and NVMe init completed.");
+	return;
+
+out_remove_sq:
+	apple_nvme_remove_sq(anv);
+out_remove_cq:
+	apple_nvme_remove_cq(anv);
+out:
+	dev_warn(anv->ctrl.device, "Reset failure status: %d\n", ret);
+	nvme_change_ctrl_state(&anv->ctrl, NVME_CTRL_DELETING);
+	nvme_get_ctrl(&anv->ctrl);
+	apple_nvme_disable(anv, false);
+	nvme_kill_queues(&anv->ctrl);
+	if (!queue_work(nvme_wq, &anv->remove_work))
+		nvme_put_ctrl(&anv->ctrl);
+}
+
+static void apple_nvme_remove_dead_ctrl_work(struct work_struct *work)
+{
+	struct apple_nvme *anv =
+		container_of(work, struct apple_nvme, remove_work);
+
+	nvme_put_ctrl(&anv->ctrl);
+	device_release_driver(anv->dev);
+}
+
+static int apple_nvme_reg_read32(struct nvme_ctrl *ctrl, u32 off, u32 *val)
+{
+	*val = readl_relaxed(ctrl_to_apple_nvme(ctrl)->mmio_nvme + off);
+	return 0;
+}
+
+static int apple_nvme_reg_write32(struct nvme_ctrl *ctrl, u32 off, u32 val)
+{
+	writel_relaxed(val, ctrl_to_apple_nvme(ctrl)->mmio_nvme + off);
+	return 0;
+}
+
+static int apple_nvme_reg_read64(struct nvme_ctrl *ctrl, u32 off, u64 *val)
+{
+	*val = lo_hi_readq_relaxed(ctrl_to_apple_nvme(ctrl)->mmio_nvme + off);
+	return 0;
+}
+
+static int apple_nvme_get_address(struct nvme_ctrl *ctrl, char *buf, int size)
+{
+	struct device *dev = ctrl_to_apple_nvme(ctrl)->dev;
+
+	return snprintf(buf, size, "%s\n", dev_name(dev));
+}
+
+static void apple_nvme_free_ctrl(struct nvme_ctrl *ctrl)
+{
+}
+
+static const struct nvme_ctrl_ops nvme_ctrl_ops = {
+	.name = "apple-nvme",
+	.module = THIS_MODULE,
+	.flags = 0,
+	.reg_read32 = apple_nvme_reg_read32,
+	.reg_write32 = apple_nvme_reg_write32,
+	.reg_read64 = apple_nvme_reg_read64,
+	.free_ctrl = apple_nvme_free_ctrl,
+	.get_address = apple_nvme_get_address,
+};
+
+static void apple_nvme_async_probe(void *data, async_cookie_t cookie)
+{
+	struct apple_nvme *anv = data;
+
+	flush_work(&anv->ctrl.reset_work);
+	flush_work(&anv->ctrl.scan_work);
+	nvme_put_ctrl(&anv->ctrl);
+}
+
+static int apple_nvme_alloc_tagsets(struct apple_nvme *anv)
+{
+	int ret;
+
+	anv->admin_tagset.ops = &apple_nvme_mq_admin_ops;
+	anv->admin_tagset.nr_hw_queues = 1;
+	anv->admin_tagset.queue_depth = APPLE_NVME_AQ_MQ_TAG_DEPTH;
+	anv->admin_tagset.timeout = NVME_ADMIN_TIMEOUT;
+	anv->admin_tagset.numa_node = NUMA_NO_NODE;
+	anv->admin_tagset.cmd_size = sizeof(struct apple_nvme_iod);
+	anv->admin_tagset.flags = BLK_MQ_F_NO_SCHED;
+	anv->admin_tagset.driver_data = &anv->adminq;
+
+	ret = blk_mq_alloc_tag_set(&anv->admin_tagset);
+	if (ret)
+		return ret;
+	ret = devm_add_action_or_reset(anv->dev,
+				       (void (*)(void *))blk_mq_free_tag_set,
+				       &anv->admin_tagset);
+	if (ret)
+		return ret;
+
+	anv->tagset.ops = &apple_nvme_mq_ops;
+	anv->tagset.nr_hw_queues = 1;
+	anv->tagset.nr_maps = 1;
+	/*
+	 * Tags are used as an index to the NVMMU and must be unique across
+	 * both queues. The admin queue gets the first APPLE_NVME_AQ_DEPTH which
+	 * must be marked as reserved in the IO queue.
+	 */
+	anv->tagset.reserved_tags = APPLE_NVME_AQ_DEPTH;
+	anv->tagset.queue_depth = APPLE_ANS_MAX_QUEUE_DEPTH - 1;
+	anv->tagset.timeout = NVME_IO_TIMEOUT;
+	anv->tagset.numa_node = NUMA_NO_NODE;
+	anv->tagset.cmd_size = sizeof(struct apple_nvme_iod);
+	anv->tagset.flags = BLK_MQ_F_SHOULD_MERGE;
+	anv->tagset.driver_data = &anv->ioq;
+
+	ret = blk_mq_alloc_tag_set(&anv->tagset);
+	if (ret)
+		return ret;
+	ret = devm_add_action_or_reset(
+		anv->dev, (void (*)(void *))blk_mq_free_tag_set, &anv->tagset);
+	if (ret)
+		return ret;
+
+	anv->ctrl.admin_tagset = &anv->admin_tagset;
+	anv->ctrl.tagset = &anv->tagset;
+
+	return 0;
+}
+
+static int apple_nvme_queue_alloc(struct apple_nvme *anv,
+				  struct apple_nvme_queue *q)
+{
+	unsigned int depth = apple_nvme_queue_depth(q);
+
+	q->cqes = dmam_alloc_coherent(anv->dev,
+				      depth * sizeof(struct nvme_completion),
+				      &q->cq_dma_addr, GFP_KERNEL);
+	if (!q->cqes)
+		return -ENOMEM;
+
+	q->sqes = dmam_alloc_coherent(anv->dev,
+				      depth * sizeof(struct nvme_command),
+				      &q->sq_dma_addr, GFP_KERNEL);
+	if (!q->sqes)
+		return -ENOMEM;
+
+	/*
+	 * We need the maximum queue depth here because the NVMMU only has a
+	 * single depth configuration shared between both queues.
+	 */
+	q->tcbs = dmam_alloc_coherent(anv->dev,
+				      APPLE_ANS_MAX_QUEUE_DEPTH *
+					      sizeof(struct apple_nvmmu_tcb),
+				      &q->tcb_dma_addr, GFP_KERNEL);
+	if (!q->tcbs)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static int apple_nvme_probe(struct platform_device *pdev)
+{
+	struct device *dev = &pdev->dev;
+	struct apple_nvme *anv;
+	int ret;
+
+	anv = devm_kzalloc(dev, sizeof(*anv), GFP_KERNEL);
+	if (!anv)
+		return -ENOMEM;
+
+	anv->dev = dev;
+	anv->adminq.is_adminq = true;
+	platform_set_drvdata(pdev, anv);
+
+	if (dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64)))
+		return -ENXIO;
+
+	anv->irq = platform_get_irq(pdev, 0);
+	if (anv->irq < 0)
+		return anv->irq;
+	if (!anv->irq)
+		return -ENXIO;
+
+	anv->mmio_coproc = devm_platform_ioremap_resource_byname(pdev, "ans");
+	if (IS_ERR(anv->mmio_coproc))
+		return PTR_ERR(anv->mmio_coproc);
+	anv->mmio_nvme = devm_platform_ioremap_resource_byname(pdev, "nvme");
+	if (IS_ERR(anv->mmio_nvme))
+		return PTR_ERR(anv->mmio_nvme);
+
+	anv->adminq.sq_db = anv->mmio_nvme + APPLE_ANS_LINEAR_ASQ_DB;
+	anv->adminq.cq_db = anv->mmio_nvme + APPLE_ANS_ACQ_DB;
+	anv->ioq.sq_db = anv->mmio_nvme + APPLE_ANS_LINEAR_IOSQ_DB;
+	anv->ioq.cq_db = anv->mmio_nvme + APPLE_ANS_IOCQ_DB;
+
+	anv->sart = apple_sart_get(dev);
+	if (IS_ERR(anv->sart))
+		return dev_err_probe(dev, PTR_ERR(anv->sart),
+				     "Failed to initialize SART");
+
+	anv->reset = devm_reset_control_array_get_exclusive(anv->dev);
+	if (IS_ERR(anv->reset))
+		return dev_err_probe(dev, PTR_ERR(anv->reset),
+				     "Failed to get reset control");
+
+	INIT_WORK(&anv->ctrl.reset_work, apple_nvme_reset_work);
+	INIT_WORK(&anv->remove_work, apple_nvme_remove_dead_ctrl_work);
+	spin_lock_init(&anv->lock);
+
+	ret = apple_nvme_queue_alloc(anv, &anv->adminq);
+	if (ret)
+		return ret;
+	ret = apple_nvme_queue_alloc(anv, &anv->ioq);
+	if (ret)
+		return ret;
+
+	anv->prp_page_pool = dmam_pool_create("prp list page", anv->dev,
+					      NVME_CTRL_PAGE_SIZE,
+					      NVME_CTRL_PAGE_SIZE, 0);
+	if (!anv->prp_page_pool)
+		return -ENOMEM;
+
+	anv->prp_small_pool =
+		dmam_pool_create("prp list 256", anv->dev, 256, 256, 0);
+	if (!anv->prp_small_pool)
+		return -ENOMEM;
+
+	WARN_ON_ONCE(apple_nvme_iod_alloc_size() > PAGE_SIZE);
+	anv->iod_mempool =
+		mempool_create_kmalloc_pool(1, apple_nvme_iod_alloc_size());
+	if (!anv->iod_mempool)
+		return -ENOMEM;
+	ret = devm_add_action_or_reset(
+		anv->dev, (void (*)(void *))mempool_destroy, anv->iod_mempool);
+	if (ret)
+		return ret;
+
+	ret = apple_nvme_alloc_tagsets(anv);
+	if (ret)
+		return ret;
+
+	ret = devm_request_irq(anv->dev, anv->irq, apple_nvme_irq, 0,
+			       "nvme-apple", anv);
+	if (ret)
+		return dev_err_probe(dev, ret, "Failed to request IRQ");
+
+	anv->rtk =
+		devm_apple_rtkit_init(dev, anv, NULL, 0, &apple_nvme_rtkit_ops);
+	if (IS_ERR(anv->rtk))
+		return dev_err_probe(dev, PTR_ERR(anv->rtk),
+				     "Failed to initialize RTKit");
+
+	ret = nvme_init_ctrl(&anv->ctrl, anv->dev, &nvme_ctrl_ops,
+			     NVME_QUIRK_SKIP_CID_GEN);
+	if (ret)
+		return dev_err_probe(dev, ret,
+				     "Failed to initialize nvme_ctrl");
+
+	anv->ctrl.admin_q = blk_mq_init_queue(&anv->admin_tagset);
+	if (IS_ERR(anv->ctrl.admin_q))
+		return -ENOMEM;
+
+	nvme_reset_ctrl(&anv->ctrl);
+	async_schedule(apple_nvme_async_probe, anv);
+
+	return 0;
+}
+
+static int apple_nvme_remove(struct platform_device *pdev)
+{
+	struct apple_nvme *anv = platform_get_drvdata(pdev);
+
+	nvme_change_ctrl_state(&anv->ctrl, NVME_CTRL_DELETING);
+	flush_work(&anv->ctrl.reset_work);
+	nvme_stop_ctrl(&anv->ctrl);
+	nvme_remove_namespaces(&anv->ctrl);
+	apple_nvme_disable(anv, true);
+	nvme_uninit_ctrl(&anv->ctrl);
+
+	if (apple_rtkit_is_running(anv->rtk))
+		apple_rtkit_shutdown(anv->rtk);
+
+	return 0;
+}
+
+static void apple_nvme_shutdown(struct platform_device *pdev)
+{
+	struct apple_nvme *anv = platform_get_drvdata(pdev);
+
+	apple_nvme_disable(anv, true);
+	if (apple_rtkit_is_running(anv->rtk))
+		apple_rtkit_shutdown(anv->rtk);
+}
+
+static const struct of_device_id apple_nvme_of_match[] = {
+	{ .compatible = "apple,nvme-ans2" },
+	{},
+};
+MODULE_DEVICE_TABLE(of, apple_nvme_of_match);
+
+static struct platform_driver apple_nvme_driver = {
+	.driver = {
+		.name = "nvme-apple",
+		.of_match_table = apple_nvme_of_match,
+	},
+	.probe = apple_nvme_probe,
+	.remove = apple_nvme_remove,
+	.shutdown = apple_nvme_shutdown,
+};
+module_platform_driver(apple_nvme_driver);
+
+MODULE_AUTHOR("Sven Peter <sven@svenpeter.dev>");
+MODULE_LICENSE("GPL");
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 7/9] nvme-apple: Serialize command issue
  2022-03-21 16:50 [PATCH 0/9] Apple M1 (Pro/Max) NVMe driver Sven Peter
                   ` (5 preceding siblings ...)
  2022-03-21 16:50 ` [PATCH 6/9] nvme-apple: Add initial Apple SoC NVMe driver Sven Peter
@ 2022-03-21 16:50 ` Sven Peter
  2022-03-24  6:16   ` Christoph Hellwig
  2022-03-21 16:50 ` [PATCH 8/9] nvme-apple: Add support for multiple power domains Sven Peter
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 55+ messages in thread
From: Sven Peter @ 2022-03-21 16:50 UTC (permalink / raw)
  To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg
  Cc: Sven Peter, Hector Martin, Alyssa Rosenzweig, Rob Herring,
	Arnd Bergmann, Marc Zyngier, devicetree, linux-arm-kernel,
	linux-kernel, linux-nvme, Jens Axboe

From: Jens Axboe <axboe@kernel.dk>

This controller shouldn't need serialization of command issue since
the SQ is replaced by a simple array and commands are issued by writing
the array index to a MMIO register.
Without serialization however sometimes commands are still executed
correctly and appear in the CQ but never trigger an interrupt.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
[sven: added our best guess why this needs to be done]
Signed-off-by: Sven Peter <sven@svenpeter.dev>
---
 drivers/nvme/host/apple.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/nvme/host/apple.c b/drivers/nvme/host/apple.c
index 587d6c7014a0..a4193429564e 100644
--- a/drivers/nvme/host/apple.c
+++ b/drivers/nvme/host/apple.c
@@ -292,6 +292,7 @@ static void apple_nvmmu_inval(struct apple_nvme_queue *q, unsigned int tag)
 static void apple_nvme_submit_cmd(struct apple_nvme_queue *q,
 				  struct nvme_command *cmd)
 {
+	struct apple_nvme *anv = queue_to_apple_nvme(q);
 	u32 tag = nvme_tag_from_cid(cmd->common.command_id);
 	struct apple_nvmmu_tcb *tcb = &q->tcbs[tag];
 
@@ -308,7 +309,18 @@ static void apple_nvme_submit_cmd(struct apple_nvme_queue *q,
 		tcb->dma_flags |= APPLE_ANS_TCB_DMA_FROM_DEVICE;
 
 	memcpy(&q->sqes[tag], cmd, sizeof(*cmd));
+
+	/*
+	 * This lock here doesn't make much sense at a first glace but
+	 * removing it will result in occasional missed completetion
+	 * interrupts even though the commands still appear on the CQ.
+	 * It's unclear why this happens but our best guess is that
+	 * there is a bug in the firmware triggered when a write to the
+	 * CQ and the SQ happen simultaneously.
+	 */
+	spin_lock_irq(&anv->lock);
 	writel(tag, q->sq_db);
+	spin_unlock_irq(&anv->lock);
 }
 
 /*
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 8/9] nvme-apple: Add support for multiple power domains
  2022-03-21 16:50 [PATCH 0/9] Apple M1 (Pro/Max) NVMe driver Sven Peter
                   ` (6 preceding siblings ...)
  2022-03-21 16:50 ` [PATCH 7/9] nvme-apple: Serialize command issue Sven Peter
@ 2022-03-21 16:50 ` Sven Peter
  2022-03-21 16:50 ` [PATCH 9/9] nvme-apple: Add support for suspend/resume Sven Peter
  2022-03-22 17:26 ` [PATCH 0/9] Apple M1 (Pro/Max) NVMe driver Alyssa Rosenzweig
  9 siblings, 0 replies; 55+ messages in thread
From: Sven Peter @ 2022-03-21 16:50 UTC (permalink / raw)
  To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg
  Cc: Sven Peter, Hector Martin, Alyssa Rosenzweig, Rob Herring,
	Arnd Bergmann, Marc Zyngier, devicetree, linux-arm-kernel,
	linux-kernel, linux-nvme

From: Hector Martin <marcan@marcan.st>

Turns out we really need this, as the APCIE_ST*_SYS domains really do
hard-depend on ANS2, at least on t6000.

Signed-off-by: Hector Martin <marcan@marcan.st>
Signed-off-by: Sven Peter <sven@svenpeter.dev>
---
 drivers/nvme/host/apple.c | 67 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 67 insertions(+)

diff --git a/drivers/nvme/host/apple.c b/drivers/nvme/host/apple.c
index a4193429564e..d89b4ab80169 100644
--- a/drivers/nvme/host/apple.c
+++ b/drivers/nvme/host/apple.c
@@ -29,6 +29,7 @@
 #include <linux/of_platform.h>
 #include <linux/once.h>
 #include <linux/platform_device.h>
+#include <linux/pm_domain.h>
 #include <linux/soc/apple/rtkit.h>
 #include <linux/soc/apple/sart.h>
 #include <linux/reset.h>
@@ -178,6 +179,10 @@ struct apple_nvme {
 	void __iomem *mmio_coproc;
 	void __iomem *mmio_nvme;
 
+	struct device **pd_dev;
+	struct device_link **pd_link;
+	int pd_count;
+
 	struct apple_sart *sart;
 	struct apple_rtkit *rtk;
 	struct reset_control *reset;
@@ -1313,6 +1318,62 @@ static int apple_nvme_queue_alloc(struct apple_nvme *anv,
 	return 0;
 }
 
+static void apple_nvme_detach_genpd(struct apple_nvme *anv)
+{
+	int i;
+
+	if (anv->pd_count <= 1)
+		return;
+
+	for (i = anv->pd_count - 1; i >= 0; i--) {
+		if (anv->pd_link[i])
+			device_link_del(anv->pd_link[i]);
+		if (!IS_ERR_OR_NULL(anv->pd_dev[i]))
+			dev_pm_domain_detach(anv->pd_dev[i], true);
+	}
+}
+
+static int apple_nvme_attach_genpd(struct apple_nvme *anv)
+{
+	struct device *dev = anv->dev;
+	int i;
+
+	anv->pd_count = of_count_phandle_with_args(dev->of_node,
+						   "power-domains",
+						   "#power-domain-cells");
+	if (anv->pd_count <= 1)
+		return 0;
+
+	anv->pd_dev = devm_kcalloc(dev, anv->pd_count, sizeof(*anv->pd_dev),
+				   GFP_KERNEL);
+	if (!anv->pd_dev)
+		return -ENOMEM;
+
+	anv->pd_link = devm_kcalloc(dev, anv->pd_count, sizeof(*anv->pd_link),
+				    GFP_KERNEL);
+	if (!anv->pd_link)
+		return -ENOMEM;
+
+	for (i = 0; i < anv->pd_count; i++) {
+		anv->pd_dev[i] = dev_pm_domain_attach_by_id(dev, i);
+		if (IS_ERR(anv->pd_dev[i])) {
+			apple_nvme_detach_genpd(anv);
+			return PTR_ERR(anv->pd_dev[i]);
+		}
+
+		anv->pd_link[i] = device_link_add(dev, anv->pd_dev[i],
+						  DL_FLAG_STATELESS |
+						  DL_FLAG_PM_RUNTIME |
+						  DL_FLAG_RPM_ACTIVE);
+		if (!anv->pd_link[i]) {
+			apple_nvme_detach_genpd(anv);
+			return -EINVAL;
+		}
+	}
+
+	return 0;
+}
+
 static int apple_nvme_probe(struct platform_device *pdev)
 {
 	struct device *dev = &pdev->dev;
@@ -1327,6 +1388,10 @@ static int apple_nvme_probe(struct platform_device *pdev)
 	anv->adminq.is_adminq = true;
 	platform_set_drvdata(pdev, anv);
 
+	ret = apple_nvme_attach_genpd(anv);
+	if (ret < 0)
+		return dev_err_probe(dev, ret, "Failed to attach power domains");
+
 	if (dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64)))
 		return -ENXIO;
 
@@ -1435,6 +1500,8 @@ static int apple_nvme_remove(struct platform_device *pdev)
 	if (apple_rtkit_is_running(anv->rtk))
 		apple_rtkit_shutdown(anv->rtk);
 
+	apple_nvme_detach_genpd(anv);
+
 	return 0;
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 9/9] nvme-apple: Add support for suspend/resume
  2022-03-21 16:50 [PATCH 0/9] Apple M1 (Pro/Max) NVMe driver Sven Peter
                   ` (7 preceding siblings ...)
  2022-03-21 16:50 ` [PATCH 8/9] nvme-apple: Add support for multiple power domains Sven Peter
@ 2022-03-21 16:50 ` Sven Peter
  2022-03-24  6:17   ` Christoph Hellwig
  2022-03-22 17:26 ` [PATCH 0/9] Apple M1 (Pro/Max) NVMe driver Alyssa Rosenzweig
  9 siblings, 1 reply; 55+ messages in thread
From: Sven Peter @ 2022-03-21 16:50 UTC (permalink / raw)
  To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg
  Cc: Sven Peter, Hector Martin, Alyssa Rosenzweig, Rob Herring,
	Arnd Bergmann, Marc Zyngier, devicetree, linux-arm-kernel,
	linux-kernel, linux-nvme

From: Hector Martin <marcan@marcan.st>

Signed-off-by: Hector Martin <marcan@marcan.st>
Signed-off-by: Sven Peter <sven@svenpeter.dev>
---
 drivers/nvme/host/apple.c | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/drivers/nvme/host/apple.c b/drivers/nvme/host/apple.c
index d89b4ab80169..c949a18ec690 100644
--- a/drivers/nvme/host/apple.c
+++ b/drivers/nvme/host/apple.c
@@ -1514,6 +1514,36 @@ static void apple_nvme_shutdown(struct platform_device *pdev)
 		apple_rtkit_shutdown(anv->rtk);
 }
 
+#ifdef CONFIG_PM_SLEEP
+static int apple_nvme_resume(struct device *dev)
+{
+	struct apple_nvme *anv = dev_get_drvdata(dev);
+
+	return  nvme_reset_ctrl(&anv->ctrl);
+}
+
+static int apple_nvme_suspend(struct device *dev)
+{
+	struct apple_nvme *anv = dev_get_drvdata(dev);
+	int ret = 0;
+
+	apple_nvme_disable(anv, true);
+
+	if (apple_rtkit_is_running(anv->rtk))
+		ret = apple_rtkit_shutdown(anv->rtk);
+
+	writel_relaxed(0, anv->mmio_coproc + APPLE_ANS_COPROC_CPU_CONTROL);
+	(void)readl_relaxed(anv->mmio_coproc + APPLE_ANS_COPROC_CPU_CONTROL);
+
+	return ret;
+}
+
+static const struct dev_pm_ops apple_nvme_pm_ops = {
+	.suspend	= apple_nvme_suspend,
+	.resume		= apple_nvme_resume,
+};
+#endif
+
 static const struct of_device_id apple_nvme_of_match[] = {
 	{ .compatible = "apple,nvme-ans2" },
 	{},
@@ -1524,6 +1554,9 @@ static struct platform_driver apple_nvme_driver = {
 	.driver = {
 		.name = "nvme-apple",
 		.of_match_table = apple_nvme_of_match,
+#ifdef CONFIG_PM_SLEEP
+		.pm = &apple_nvme_pm_ops,
+#endif
 	},
 	.probe = apple_nvme_probe,
 	.remove = apple_nvme_remove,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH 6/9] nvme-apple: Add initial Apple SoC NVMe driver
  2022-03-21 16:50 ` [PATCH 6/9] nvme-apple: Add initial Apple SoC NVMe driver Sven Peter
@ 2022-03-21 17:01   ` Keith Busch
  2022-04-02 13:10     ` Sven Peter
  2022-03-22 13:38   ` Arnd Bergmann
  2022-03-24  6:16   ` Christoph Hellwig
  2 siblings, 1 reply; 55+ messages in thread
From: Keith Busch @ 2022-03-21 17:01 UTC (permalink / raw)
  To: Sven Peter
  Cc: Jens Axboe, Christoph Hellwig, Sagi Grimberg, Hector Martin,
	Alyssa Rosenzweig, Rob Herring, Arnd Bergmann, Marc Zyngier,
	devicetree, linux-arm-kernel, linux-kernel, linux-nvme

On Mon, Mar 21, 2022 at 05:50:46PM +0100, Sven Peter wrote:
> +static bool apple_nvme_poll_cq(struct apple_nvme_queue *q,
> +			       struct io_comp_batch *iob)
> +{
> +	bool found = false;
> +
> +	while (apple_nvme_cqe_pending(q)) {
> +		found = true;
> +
> +		/*
> +		 * load-load control dependency between phase and the rest of
> +		 * the cqe requires a full read memory barrier
> +		 */
> +		dma_rmb();
> +		apple_nvme_handle_cqe(q, iob, q->cq_head);
> +		apple_nvme_update_cq_head(q);
> +	}
> +
> +	if (found)
> +		writel_relaxed(q->cq_head, q->cq_db);

Doesn't a relaxed write mean that a subsequent write can bypass the previous?
If so, that sounds like it can corrupt the cq head.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 4/9] soc: apple: Add SART driver
  2022-03-21 16:50 ` [PATCH 4/9] soc: apple: Add SART driver Sven Peter
@ 2022-03-21 17:07   ` Arnd Bergmann
  2022-04-02 12:38     ` Sven Peter
  2022-03-21 17:17   ` Alyssa Rosenzweig
  1 sibling, 1 reply; 55+ messages in thread
From: Arnd Bergmann @ 2022-03-21 17:07 UTC (permalink / raw)
  To: Sven Peter
  Cc: Hector Martin, Alyssa Rosenzweig, Rob Herring, Arnd Bergmann,
	Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
	Marc Zyngier, DTML, Linux ARM, Linux Kernel Mailing List,
	linux-nvme

On Mon, Mar 21, 2022 at 5:50 PM Sven Peter <sven@svenpeter.dev> wrote:
>
> The NVMe co-processor on the Apple M1 uses a DMA address filter called
> SART for some DMA transactions. This adds a simple driver used to
> configure the memory regions from which DMA transactions are allowed.
>
> Co-developed-by: Hector Martin <marcan@marcan.st>
> Signed-off-by: Hector Martin <marcan@marcan.st>
> Signed-off-by: Sven Peter <sven@svenpeter.dev>

Can you add some explanation about why this uses a custom interface
instead of hooking into the dma_map_ops?

> +static void sart2_get_entry(struct apple_sart *sart, int index, u8 *flags,
> +                           phys_addr_t *paddr, size_t *size)
> +{
> +       u32 cfg = readl_relaxed(sart->regs + APPLE_SART2_CONFIG(index));
> +       u32 paddr_ = readl_relaxed(sart->regs + APPLE_SART2_PADDR(index));

Why do you use the _relaxed() accessors here and elsewhere in the driver?

> +struct apple_sart *apple_sart_get(struct device *dev)
> +{
> +       struct device_node *sart_node;
> +       struct platform_device *sart_pdev;
> +       struct apple_sart *sart;
> +
> +       sart_node = of_parse_phandle(dev->of_node, "apple,sart", 0);
> +       if (!sart_node)
> +               return ERR_PTR(ENODEV);

The error pointers need to take negative values, like 'ERR_PTR(-ENODEV)',
here and everywhere else in the driver.

       Arnd

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 4/9] soc: apple: Add SART driver
  2022-03-21 16:50 ` [PATCH 4/9] soc: apple: Add SART driver Sven Peter
  2022-03-21 17:07   ` Arnd Bergmann
@ 2022-03-21 17:17   ` Alyssa Rosenzweig
  2022-04-02 12:40     ` Sven Peter
  1 sibling, 1 reply; 55+ messages in thread
From: Alyssa Rosenzweig @ 2022-03-21 17:17 UTC (permalink / raw)
  To: Sven Peter
  Cc: Hector Martin, Rob Herring, Arnd Bergmann, Keith Busch,
	Jens Axboe, Christoph Hellwig, Sagi Grimberg, Marc Zyngier,
	devicetree, linux-arm-kernel, linux-kernel, linux-nvme

> +/*
> + * Adds the region [paddr, paddr+size] to the DMA allow list.
> + *
> + * @sart: SART reference
> + * @paddr: Start address of the region to be used for DMA
> + * @size: Size of the region to be used for DMA.
> + */
> +int apple_sart_add_allowed_region(struct apple_sart *sart, phys_addr_t paddr,
> +				  size_t size);
> +
> +/*
> + * Removes the region [paddr, paddr+size] from the DMA allow list.
> + *
> + * Note that exact same paddr and size used for apple_sart_add_allowed_region
> + * have to be passed.
> + *
> + * @sart: SART reference
> + * @paddr: Start address of the region no longer used for DMA
> + * @size: Size of the region no longer used for DMA.
> + */
> +int apple_sart_remove_allowed_region(struct apple_sart *sart, phys_addr_t paddr,
> +				     size_t size);

Might it be simpler for add_allowed_region to return a handle that
remove_allowed_region takes instead of paddr+size? Then
remove_allowed_region doesn't have to walk anything, and the requirement
to use the same paddr/size is implicit.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 5/9] soc: apple: Add RTKit IPC library
  2022-03-21 16:50 ` [PATCH 5/9] soc: apple: Add RTKit IPC library Sven Peter
@ 2022-03-22 13:13   ` Arnd Bergmann
  2022-03-22 17:41     ` Robin Murphy
  2022-04-02 12:56     ` Sven Peter
  2022-03-22 17:23   ` Alyssa Rosenzweig
  2022-03-23 11:19   ` Krzysztof Kozlowski
  2 siblings, 2 replies; 55+ messages in thread
From: Arnd Bergmann @ 2022-03-22 13:13 UTC (permalink / raw)
  To: Sven Peter
  Cc: Hector Martin, Alyssa Rosenzweig, Rob Herring, Arnd Bergmann,
	Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
	Marc Zyngier, DTML, Linux ARM, Linux Kernel Mailing List,
	linux-nvme

On Mon, Mar 21, 2022 at 5:50 PM Sven Peter <sven@svenpeter.dev> wrote:
>
> Apple SoCs such as the M1 come with multiple embedded co-processors
> running proprietary firmware. Communication with those is established
> over a simple mailbox using the RTKit IPC protocol.
>
> Signed-off-by: Sven Peter <sven@svenpeter.dev>

> +
> +#define rtk_err(format, arg...) dev_err(rtk->dev, "RTKit: " format, ##arg)
> +#define rtk_warn(format, arg...) dev_warn(rtk->dev, "RTKit: " format, ##arg)
> +#define rtk_info(format, arg...) dev_info(rtk->dev, "RTKit: " format, ##arg)
> +#define rtk_dbg(format, arg...) dev_dbg(rtk->dev, "RTKit: " format, ##arg)

I generally don't like the custom printing macros, please just open-code
the prints where they are used, that makes it easier for other kernel
developers to see exactly what is being printed.

> +enum { APPLE_RTKIT_WORK_MSG,
> +       APPLE_RTKIT_WORK_REINIT,
> +};
> +
> +enum { APPLE_RTKIT_PWR_STATE_OFF = 0x00,
> +       APPLE_RTKIT_PWR_STATE_SLEEP = 0x01,
> +       APPLE_RTKIT_PWR_STATE_GATED = 0x02,
> +       APPLE_RTKIT_PWR_STATE_QUIESCED = 0x10,
> +       APPLE_RTKIT_PWR_STATE_ON = 0x20,
> +};

This is an odd indentation style, I would insert a newline after the 'enum {'

> +static int apple_rtkit_worker(void *data)
> +{
> +       struct apple_rtkit *rtk = data;
> +       struct apple_rtkit_work work;
> +
> +       while (!kthread_should_stop()) {
> +               wait_event_interruptible(rtk->wq,
> +                                        kfifo_len(&rtk->work_fifo) > 0 ||
> +                                                kthread_should_stop());
> +
> +               if (kthread_should_stop())
> +                       break;
> +
> +               while (kfifo_out_spinlocked(&rtk->work_fifo, &work, 1,
> +                                           &rtk->work_lock) == 1) {
> +                       switch (work.type) {
> +                       case APPLE_RTKIT_WORK_MSG:
> +                               apple_rtkit_rx(rtk, &work.msg);
> +                               break;
> +                       case APPLE_RTKIT_WORK_REINIT:
> +                               apple_rtkit_do_reinit(rtk);
> +                               break;
> +                       }
> +               }

It looks like you add quite a bit of complexity by using a custom
worker thread implementation. Can you explain what this is
needed for? Isn't this roughly the same thing that one would
get more easily with create_singlethread_workqueue()?

> +#if IS_ENABLED(CONFIG_APPLE_RTKIT)

Instead of allowing the interface to be used without CONFIG_APPLE_RTKIT,
I think it is sufficient to allow the driver itself to be built with
CONFIG_COMPILE_TEST (as you already do), and then have
drivers using it marked as 'depends on APPLE_RTKIT'
unconditionally.

> +/*
> + * Initializes the internal state required to handle RTKit. This
> + * should usually be called within _probe.
> + *
> + * @dev: Pointer to the device node this coprocessor is assocated with
> + * @cookie: opaque cookie passed to all functions defined in rtkit_ops
> + * @mbox_name: mailbox name used to communicate with the co-processor
> + * @mbox_idx: mailbox index to be used if mbox_name is NULL
> + * @ops: pointer to rtkit_ops to be used for this co-processor
> + */
> +struct apple_rtkit *apple_rtkit_init(struct device *dev, void *cookie,
> +                                    const char *mbox_name, int mbox_idx,
> +                                    const struct apple_rtkit_ops *ops);
> +
> +/*
> + * Dev-res managed version of apple_rtkit_init.
> + */
> +struct apple_rtkit *devm_apple_rtkit_init(struct device *dev, void *cookie,
> +                                         const char *mbox_name, int mbox_idx,
> +                                         const struct apple_rtkit_ops *ops);

Do we need to export both of these?

         Arnd

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 6/9] nvme-apple: Add initial Apple SoC NVMe driver
  2022-03-21 16:50 ` [PATCH 6/9] nvme-apple: Add initial Apple SoC NVMe driver Sven Peter
  2022-03-21 17:01   ` Keith Busch
@ 2022-03-22 13:38   ` Arnd Bergmann
  2022-04-02 13:34     ` Sven Peter
  2022-03-24  6:16   ` Christoph Hellwig
  2 siblings, 1 reply; 55+ messages in thread
From: Arnd Bergmann @ 2022-03-22 13:38 UTC (permalink / raw)
  To: Sven Peter
  Cc: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
	Hector Martin, Alyssa Rosenzweig, Rob Herring, Arnd Bergmann,
	Marc Zyngier, DTML, Linux ARM, Linux Kernel Mailing List,
	linux-nvme

On Mon, Mar 21, 2022 at 5:50 PM Sven Peter <sven@svenpeter.dev> wrote:

> +static int apple_nvme_sart_dma_setup(void *cookie, struct apple_rtkit_shmem *bfr,
> +                                    dma_addr_t iova, size_t size)
> +{
> +       struct apple_nvme *anv = cookie;
> +       int ret;
> +
> +       if (iova)
> +               return -EINVAL;
> +
> +       bfr->buffer = dma_alloc_coherent(anv->dev, size, &iova, GFP_KERNEL);
> +       if (!bfr->buffer)
> +               return -ENOMEM;

You pass 'iova' as an argument, but then replace it with the address
returned by dma_alloc_coherent(). Can you remove the function
argument?

> +static void apple_nvmmu_inval(struct apple_nvme_queue *q, unsigned int tag)
> +{
> +       struct apple_nvme *anv = queue_to_apple_nvme(q);
> +
> +       writel(tag, anv->mmio_nvme + APPLE_NVMMU_TCB_INVAL);
> +       if (readl_relaxed(anv->mmio_nvme + APPLE_NVMMU_TCB_STAT))
> +               dev_warn(anv->dev, "NVMMU TCB invalidation failed\n");
> +}

I don't like to see the _relaxed() accessors used without an explanation
about why that helps. Please use the non-relaxed version, or make sure
it's obvious here why you use it.

> +bad_sgl:
> +       WARN(DO_ONCE(apple_nvme_print_sgl, iod->sg, iod->nents),
> +            "Invalid SGL for payload:%d nents:%d\n", blk_rq_payload_bytes(req),
> +            iod->nents);

I think you mean WARN_ONCE() here?

> +       writel_relaxed(0, anv->mmio_coproc + APPLE_ANS_COPROC_CPU_CONTROL);
> +       (void)readl_relaxed(anv->mmio_coproc + APPLE_ANS_COPROC_CPU_CONTROL);

What is the purpose of the readl_relaxed() here? It looks like you are
trying to flush
the write to the hardware, but then again

  a) on Apple hardware, the registers are mapped using PROT_DEVICE_nGnRnE,
      so MMIO writes are never posted

  b) the read is "_relaxed", so there is no barrier, and the result is
unused, so
      it would appear that the CPU can just keep executing code anyway.

Since this is all the initialization path, I can't imagine what the
relaxation of
the barriers helps with.

> +static int apple_nvme_reg_read32(struct nvme_ctrl *ctrl, u32 off, u32 *val)
> +{
> +       *val = readl_relaxed(ctrl_to_apple_nvme(ctrl)->mmio_nvme + off);
> +       return 0;
> +}
> +
> +static int apple_nvme_reg_write32(struct nvme_ctrl *ctrl, u32 off, u32 val)
> +{
> +       writel_relaxed(val, ctrl_to_apple_nvme(ctrl)->mmio_nvme + off);
> +       return 0;
> +}

If you have generic register access functions, don't make them use
_relaxed internally. If there are instances that need to be _relaxed,
add another version of the accessor that spells this out in the caller.

       Arnd

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 5/9] soc: apple: Add RTKit IPC library
  2022-03-21 16:50 ` [PATCH 5/9] soc: apple: Add RTKit IPC library Sven Peter
  2022-03-22 13:13   ` Arnd Bergmann
@ 2022-03-22 17:23   ` Alyssa Rosenzweig
  2022-04-02 12:50     ` Sven Peter
  2022-03-23 11:19   ` Krzysztof Kozlowski
  2 siblings, 1 reply; 55+ messages in thread
From: Alyssa Rosenzweig @ 2022-03-22 17:23 UTC (permalink / raw)
  To: Sven Peter
  Cc: Hector Martin, Rob Herring, Arnd Bergmann, Keith Busch,
	Jens Axboe, Christoph Hellwig, Sagi Grimberg, Marc Zyngier,
	devicetree, linux-arm-kernel, linux-kernel, linux-nvme

> +enum { APPLE_RTKIT_PWR_STATE_OFF = 0x00,
> +	APPLE_RTKIT_PWR_STATE_SLEEP = 0x01,
> +	APPLE_RTKIT_PWR_STATE_GATED = 0x02,
> +	APPLE_RTKIT_PWR_STATE_QUIESCED = 0x10,
> +	APPLE_RTKIT_PWR_STATE_ON = 0x20,
> +};

It would be great to get comments added explaining what these states
are. It's not obvious how off/sleep/gated/quiesced differ, and it's not
obvious from the code (doesn't look like GATED is used here at all?).
Are these Apple names or r/e'd names or a mix?

> +	if (!rtk->syslog_buffer.size) {
> +		rtk_warn(
> +			"received syslog message but syslog_buffer.size is zero");
> +		goto done;
> +	}
> +	if (!rtk->syslog_buffer.buffer && !rtk->syslog_buffer.iomem) {
> +		rtk_warn("received syslog message but no syslog_buffer.buffer or syslog_buffer.iomem");
> +		goto done;
> +	}

Nit: Wrapping is inconsistent between these two warns.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 0/9] Apple M1 (Pro/Max) NVMe driver
  2022-03-21 16:50 [PATCH 0/9] Apple M1 (Pro/Max) NVMe driver Sven Peter
                   ` (8 preceding siblings ...)
  2022-03-21 16:50 ` [PATCH 9/9] nvme-apple: Add support for suspend/resume Sven Peter
@ 2022-03-22 17:26 ` Alyssa Rosenzweig
  9 siblings, 0 replies; 55+ messages in thread
From: Alyssa Rosenzweig @ 2022-03-22 17:26 UTC (permalink / raw)
  To: Sven Peter
  Cc: Hector Martin, Rob Herring, Arnd Bergmann, Keith Busch,
	Jens Axboe, Christoph Hellwig, Sagi Grimberg, Marc Zyngier,
	devicetree, linux-arm-kernel, linux-kernel, linux-nvme

> The driver itself has been successfully used by multiple people as
> their daily driver for weeks at this point and no major issues have
> been reported.

FWIW, various versions of your NVMe code have been my daily driver for
development since late 2021 now, along with the display driver based on
your RTKit library here. I'm not qualified to judge code quality or
fitness for mainline, but I can testify that it's functional.

(Not a formal T-b since I haven't tested this exact version of the patch
series, though.)

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 5/9] soc: apple: Add RTKit IPC library
  2022-03-22 13:13   ` Arnd Bergmann
@ 2022-03-22 17:41     ` Robin Murphy
  2022-04-02 12:56     ` Sven Peter
  1 sibling, 0 replies; 55+ messages in thread
From: Robin Murphy @ 2022-03-22 17:41 UTC (permalink / raw)
  To: Arnd Bergmann, Sven Peter
  Cc: Hector Martin, Alyssa Rosenzweig, Rob Herring, Keith Busch,
	Jens Axboe, Christoph Hellwig, Sagi Grimberg, Marc Zyngier, DTML,
	Linux ARM, Linux Kernel Mailing List, linux-nvme

On 2022-03-22 13:13, Arnd Bergmann wrote:
>> +#define rtk_err(format, arg...) dev_err(rtk->dev, "RTKit: " format, ##arg)
>> +#define rtk_warn(format, arg...) dev_warn(rtk->dev, "RTKit: " format, ##arg)
>> +#define rtk_info(format, arg...) dev_info(rtk->dev, "RTKit: " format, ##arg)
>> +#define rtk_dbg(format, arg...) dev_dbg(rtk->dev, "RTKit: " format, ##arg)
> 
> I generally don't like the custom printing macros, please just open-code
> the prints where they are used, that makes it easier for other kernel
> developers to see exactly what is being printed

More to the point, implicitly depending on having something named "rtk" 
in scope is pretty much inexcusable, and the only thing left after 
fixing that is what we have pr_fmt for ;)

Robin.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 2/9] dt-bindings: soc: apple: Add ANS NVMe
  2022-03-21 16:50 ` [PATCH 2/9] dt-bindings: soc: apple: Add ANS NVMe Sven Peter
@ 2022-03-23 11:14   ` Krzysztof Kozlowski
  2022-04-02 13:05     ` Sven Peter
  0 siblings, 1 reply; 55+ messages in thread
From: Krzysztof Kozlowski @ 2022-03-23 11:14 UTC (permalink / raw)
  To: Sven Peter, Rob Herring
  Cc: Hector Martin, Alyssa Rosenzweig, Arnd Bergmann, Keith Busch,
	Jens Axboe, Christoph Hellwig, Sagi Grimberg, Marc Zyngier,
	devicetree, linux-arm-kernel, linux-kernel, linux-nvme

On 21/03/2022 17:50, Sven Peter wrote:
> Apple SoCs such as the M1 come with an embedded NVMe coprocessor called
> ANS2.
> 
> Signed-off-by: Sven Peter <sven@svenpeter.dev>
> ---
>  .../bindings/soc/apple/apple,nvme-ans.yaml    | 75 +++++++++++++++++++
>  1 file changed, 75 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/soc/apple/apple,nvme-ans.yaml
> 
> diff --git a/Documentation/devicetree/bindings/soc/apple/apple,nvme-ans.yaml b/Documentation/devicetree/bindings/soc/apple/apple,nvme-ans.yaml
> new file mode 100644
> index 000000000000..e1f4c1c572aa
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/soc/apple/apple,nvme-ans.yaml
> @@ -0,0 +1,75 @@
> +# SPDX-License-Identifier: GPL-2.0 OR BSD-2-Clause
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/soc/apple/apple,nvme-ans.yaml#

Do not drop all code in soc/apple, but please use respective subsystems.
Apple is not a subsystem, is not special.

> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: Apple ANS NVM Express host controller
> +
> +maintainers:
> +  - Sven Peter <sven@svenpeter.dev>
> +
> +properties:
> +  compatible:
> +    items:
> +      - enum:
> +          - apple,t8103-nvme-ans2
> +          - apple,t6000-nvme-ans2
> +      - const: apple,nvme-ans2
> +
> +  reg:
> +    items:
> +      - description: NVMe and NVMMU registers
> +      - description: ANS2 co-processor control registers
> +
> +  reg-names:
> +    items:
> +      - const: nvme
> +      - const: ans
> +
> +  resets:
> +    maxItems: 1
> +
> +  power-domains: true

maxItems



Best regards,
Krzysztof

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 5/9] soc: apple: Add RTKit IPC library
  2022-03-21 16:50 ` [PATCH 5/9] soc: apple: Add RTKit IPC library Sven Peter
  2022-03-22 13:13   ` Arnd Bergmann
  2022-03-22 17:23   ` Alyssa Rosenzweig
@ 2022-03-23 11:19   ` Krzysztof Kozlowski
  2022-04-02 13:51     ` Sven Peter
  2 siblings, 1 reply; 55+ messages in thread
From: Krzysztof Kozlowski @ 2022-03-23 11:19 UTC (permalink / raw)
  To: Sven Peter
  Cc: Hector Martin, Alyssa Rosenzweig, Rob Herring, Arnd Bergmann,
	Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
	Marc Zyngier, devicetree, linux-arm-kernel, linux-kernel,
	linux-nvme

On 21/03/2022 17:50, Sven Peter wrote:
> Apple SoCs such as the M1 come with multiple embedded co-processors
> running proprietary firmware. Communication with those is established
> over a simple mailbox using the RTKit IPC protocol.
> 
> Signed-off-by: Sven Peter <sven@svenpeter.dev>
> ---
>  drivers/soc/apple/Kconfig          |  13 +
>  drivers/soc/apple/Makefile         |   3 +
>  drivers/soc/apple/rtkit-crashlog.c | 147 +++++
>  drivers/soc/apple/rtkit-internal.h |  76 +++
>  drivers/soc/apple/rtkit.c          | 842 +++++++++++++++++++++++++++++
>  include/linux/soc/apple/rtkit.h    | 203 +++++++
>  6 files changed, 1284 insertions(+)

Isn't this some implementation of a mailbox? If so, it should be in
drivers/mailbox. Please don't put all stuff in soc/apple, that's not how
Linux is organized. To drivers/soc usually we put drivers which do not
fit regular subsystems.

Best regards,
Krzysztof

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 6/9] nvme-apple: Add initial Apple SoC NVMe driver
  2022-03-21 16:50 ` [PATCH 6/9] nvme-apple: Add initial Apple SoC NVMe driver Sven Peter
  2022-03-21 17:01   ` Keith Busch
  2022-03-22 13:38   ` Arnd Bergmann
@ 2022-03-24  6:16   ` Christoph Hellwig
  2022-04-02 12:47     ` Sven Peter
  2022-04-04 15:57     ` Hector Martin
  2 siblings, 2 replies; 55+ messages in thread
From: Christoph Hellwig @ 2022-03-24  6:16 UTC (permalink / raw)
  To: Sven Peter
  Cc: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
	Hector Martin, Alyssa Rosenzweig, Rob Herring, Arnd Bergmann,
	Marc Zyngier, devicetree, linux-arm-kernel, linux-kernel,
	linux-nvme

> +
> +//#define DEBUG

This should not leak into the driverṡ

> +#include <linux/blk-integrity.h>

As far as I can tell this driver does not support metadata or PI,
so why is this include needed?

> +/* NVM Express NVM Command Set Specification, Revision 1.0a, Figure 18 */
> +#define NVME_OPCODE_DATA_XFER_HOST_TO_CTRL BIT(0)
> +#define NVME_OPCODE_DATA_XFER_CTRL_TO_HOST BIT(1)

Please just use the nvme_is_write helper where you are using these.

> +static int apple_nvme_sart_dma_setup(void *cookie, struct apple_rtkit_shmem *bfr,

Please avoid > 80 character lines.

> +static void apple_nvme_free_ctrl(struct nvme_ctrl *ctrl)
> +{
> +}

So where are the apple specific resources free?  ->free_ctrl is
the callback from the struct device release callback, so without
one the resource release can't be tried to the device release.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 7/9] nvme-apple: Serialize command issue
  2022-03-21 16:50 ` [PATCH 7/9] nvme-apple: Serialize command issue Sven Peter
@ 2022-03-24  6:16   ` Christoph Hellwig
  2022-04-02 12:42     ` Sven Peter
  0 siblings, 1 reply; 55+ messages in thread
From: Christoph Hellwig @ 2022-03-24  6:16 UTC (permalink / raw)
  To: Sven Peter
  Cc: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
	Hector Martin, Alyssa Rosenzweig, Rob Herring, Arnd Bergmann,
	Marc Zyngier, devicetree, linux-arm-kernel, linux-kernel,
	linux-nvme, Jens Axboe

On Mon, Mar 21, 2022 at 05:50:47PM +0100, Sven Peter wrote:
> From: Jens Axboe <axboe@kernel.dk>
> 
> This controller shouldn't need serialization of command issue since
> the SQ is replaced by a simple array and commands are issued by writing
> the array index to a MMIO register.
> Without serialization however sometimes commands are still executed
> correctly and appear in the CQ but never trigger an interrupt.
> 
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> [sven: added our best guess why this needs to be done]
> Signed-off-by: Sven Peter <sven@svenpeter.dev>

This really should go into the previous patch.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 9/9] nvme-apple: Add support for suspend/resume
  2022-03-21 16:50 ` [PATCH 9/9] nvme-apple: Add support for suspend/resume Sven Peter
@ 2022-03-24  6:17   ` Christoph Hellwig
  0 siblings, 0 replies; 55+ messages in thread
From: Christoph Hellwig @ 2022-03-24  6:17 UTC (permalink / raw)
  To: Sven Peter
  Cc: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
	Hector Martin, Alyssa Rosenzweig, Rob Herring, Arnd Bergmann,
	Marc Zyngier, devicetree, linux-arm-kernel, linux-kernel,
	linux-nvme

Same for patches 8 and 9 - this should go into the main driver
submission.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/9] dt-bindings: soc: apple: Add Apple SART
  2022-03-21 16:50 ` [PATCH 1/9] dt-bindings: soc: apple: Add Apple SART Sven Peter
@ 2022-03-31 21:23   ` Rob Herring
  2022-04-02 12:58     ` Sven Peter
  0 siblings, 1 reply; 55+ messages in thread
From: Rob Herring @ 2022-03-31 21:23 UTC (permalink / raw)
  To: Sven Peter
  Cc: Hector Martin, Alyssa Rosenzweig, Arnd Bergmann, Keith Busch,
	Jens Axboe, Christoph Hellwig, Sagi Grimberg, Marc Zyngier,
	devicetree, linux-arm-kernel, linux-kernel, linux-nvme

On Mon, Mar 21, 2022 at 05:50:41PM +0100, Sven Peter wrote:
> Apple SoCs such as the M1 come with a simple DMA address filter called
> SART. Unlike a real IOMMU no pagetables can be configured but instead
> DMA transactions can be allowed for up to 16 paddr regions.
> 
> Signed-off-by: Sven Peter <sven@svenpeter.dev>
> ---
>  .../bindings/soc/apple/apple,sart.yaml        | 52 +++++++++++++++++++

Close enough to an IOMMU in terms of its purpose, so put in 
bindings/iommu/

>  MAINTAINERS                                   |  1 +
>  2 files changed, 53 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/soc/apple/apple,sart.yaml
> 
> diff --git a/Documentation/devicetree/bindings/soc/apple/apple,sart.yaml b/Documentation/devicetree/bindings/soc/apple/apple,sart.yaml
> new file mode 100644
> index 000000000000..d8177b3a3fba
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/soc/apple/apple,sart.yaml
> @@ -0,0 +1,52 @@
> +# SPDX-License-Identifier: GPL-2.0 OR BSD-2-Clause
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/soc/apple/apple,sart.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: Apple SART DMA address filter
> +
> +maintainers:
> +  - Sven Peter <sven@svenpeter.dev>
> +
> +description:
> +  Apple SART is a simple address filter for DMA transactions. Regions of
> +  physical memory must be added to the SART's allow list before any
> +  DMA can target these. Unlike a proper IOMMU no remapping can be done and
> +  special support in the consumer driver is required since not all DMA
> +  transactions of a single device are subject to SART filtering.
> +
> +  SART1 has first been used since at least the A11 (iPhone 8 and iPhone X)
> +  and allows 36 bit of physical address space and filter entries with sizes
> +  up to 24 bit.
> +
> +  SART2, first seen in A14 and M1, allows 36 bit of physical address space
> +  and filter entry size up to 36 bit.
> +
> +  SART3, first seen in M1 Pro/Max, extends both the address space and filter
> +  entry size to 42 bit.
> +
> +properties:
> +  compatible:
> +    enum:
> +      - apple,t6000-sart
> +      - apple,t8103-sart
> +
> +  reg:
> +    maxItems: 1
> +
> +  power-domains:
> +    maxItems: 1
> +
> +required:
> +  - compatible
> +  - reg
> +
> +additionalProperties: false
> +
> +examples:
> +  - |
> +    sart@7bc50000 {
> +      compatible = "apple,t8103-sart";
> +      reg = <0x7bc50000 0x4000>;
> +    };
> diff --git a/MAINTAINERS b/MAINTAINERS
> index cd0f68d4a34a..027c3b4ad61c 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -1774,6 +1774,7 @@ F:	Documentation/devicetree/bindings/mailbox/apple,mailbox.yaml
>  F:	Documentation/devicetree/bindings/pci/apple,pcie.yaml
>  F:	Documentation/devicetree/bindings/pinctrl/apple,pinctrl.yaml
>  F:	Documentation/devicetree/bindings/power/apple*
> +F:	Documentation/devicetree/bindings/soc/apple/*
>  F:	Documentation/devicetree/bindings/watchdog/apple,wdt.yaml
>  F:	arch/arm64/boot/dts/apple/
>  F:	drivers/i2c/busses/i2c-pasemi-core.c
> -- 
> 2.25.1
> 
> 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 4/9] soc: apple: Add SART driver
  2022-03-21 17:07   ` Arnd Bergmann
@ 2022-04-02 12:38     ` Sven Peter
  2022-04-02 19:07       ` Arnd Bergmann
  0 siblings, 1 reply; 55+ messages in thread
From: Sven Peter @ 2022-04-02 12:38 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Hector Martin, Alyssa Rosenzweig, Rob Herring, Keith Busch,
	axboe, hch, sagi, Marc Zyngier, DTML, Linux ARM,
	Linux Kernel Mailing List, linux-nvme

Hi,

Thanks for the review!

On Mon, Mar 21, 2022, at 18:07, Arnd Bergmann wrote:
> On Mon, Mar 21, 2022 at 5:50 PM Sven Peter <sven@svenpeter.dev> wrote:
>>
>> The NVMe co-processor on the Apple M1 uses a DMA address filter called
>> SART for some DMA transactions. This adds a simple driver used to
>> configure the memory regions from which DMA transactions are allowed.
>>
>> Co-developed-by: Hector Martin <marcan@marcan.st>
>> Signed-off-by: Hector Martin <marcan@marcan.st>
>> Signed-off-by: Sven Peter <sven@svenpeter.dev>
>
> Can you add some explanation about why this uses a custom interface
> instead of hooking into the dma_map_ops?

Sure.
In a perfect world this would just be an IOMMU implementation but since
SART can't create any real IOVA space using pagetables it doesn't fit
inside that subsytem.

In a slightly less perfect world I could just implement dma_map_ops here
but that won't work either because not all DMA buffers of the NVMe
device have to go through SART and those allocations happen
inside the same device and would use the same dma_map_ops.

The NVMe controller has two separate DMA filters:

   - NVMMU, which must be set up for any command that uses PRPs and
     ensures that the DMA transactions only touch the pages listed
     inside the PRP structure. NVMMU itself is tightly coupled
     to the NVMe controller: The list of allowed pages is configured
     based on command's tag id and even commands that require no DMA
     transactions must be listed inside NVMMU before they are started.
   - SART, which must be set up for some shared memory buffers (e.g.
     log messages from the NVMe firmware) and for some NVMe debug
     commands that don't use PRPs.
     SART is only loosely coupled to the NVMe controller and could
     also be used together with other devices. It's also the only
     thing that changed between M1 and M1 Pro/Max/Ultra and that's
     why I decided to separate it from the NVMe driver.

I'll add this explanation to the commit message.

>
>> +static void sart2_get_entry(struct apple_sart *sart, int index, u8 *flags,
>> +                           phys_addr_t *paddr, size_t *size)
>> +{
>> +       u32 cfg = readl_relaxed(sart->regs + APPLE_SART2_CONFIG(index));
>> +       u32 paddr_ = readl_relaxed(sart->regs + APPLE_SART2_PADDR(index));
>
> Why do you use the _relaxed() accessors here and elsewhere in the driver?

This device itself doesn't do any DMA transactions so it needs no memory
synchronization barriers. Only the consumer (i.e. rtkit and nvme) read/write
from/to these buffers (multiple times) and they have the required barriers
in place whenever they are used.

These buffers so far are only allocated at probe time though so even using
the normal writel/readl here won't hurt performance at all. I can just use
those if you prefer or alternatively add a comment why _relaxed is fine here.

This is a bit similar to the discussion for the pinctrl series last year [1].

>
>> +struct apple_sart *apple_sart_get(struct device *dev)
>> +{
>> +       struct device_node *sart_node;
>> +       struct platform_device *sart_pdev;
>> +       struct apple_sart *sart;
>> +
>> +       sart_node = of_parse_phandle(dev->of_node, "apple,sart", 0);
>> +       if (!sart_node)
>> +               return ERR_PTR(ENODEV);
>
> The error pointers need to take negative values, like 'ERR_PTR(-ENODEV)',
> here and everywhere else in the driver.

Ouch, that's my second bug of that kind in the past days. I'll fix it here
and check the other patches in this series as well.


Thanks,


Sven


[1] https://lore.kernel.org/lkml/87sfz8zdzb.wl-maz@kernel.org/

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 4/9] soc: apple: Add SART driver
  2022-03-21 17:17   ` Alyssa Rosenzweig
@ 2022-04-02 12:40     ` Sven Peter
  0 siblings, 0 replies; 55+ messages in thread
From: Sven Peter @ 2022-04-02 12:40 UTC (permalink / raw)
  To: Alyssa Rosenzweig
  Cc: Hector Martin, Rob Herring, Arnd Bergmann, Keith Busch, axboe,
	hch, sagi, Marc Zyngier, devicetree, linux-arm-kernel,
	linux-kernel, linux-nvme

On Mon, Mar 21, 2022, at 18:17, Alyssa Rosenzweig wrote:
>> +/*
>> + * Adds the region [paddr, paddr+size] to the DMA allow list.
>> + *
>> + * @sart: SART reference
>> + * @paddr: Start address of the region to be used for DMA
>> + * @size: Size of the region to be used for DMA.
>> + */
>> +int apple_sart_add_allowed_region(struct apple_sart *sart, phys_addr_t paddr,
>> +				  size_t size);
>> +
>> +/*
>> + * Removes the region [paddr, paddr+size] from the DMA allow list.
>> + *
>> + * Note that exact same paddr and size used for apple_sart_add_allowed_region
>> + * have to be passed.
>> + *
>> + * @sart: SART reference
>> + * @paddr: Start address of the region no longer used for DMA
>> + * @size: Size of the region no longer used for DMA.
>> + */
>> +int apple_sart_remove_allowed_region(struct apple_sart *sart, phys_addr_t paddr,
>> +				     size_t size);
>
> Might it be simpler for add_allowed_region to return a handle that
> remove_allowed_region takes instead of paddr+size? Then
> remove_allowed_region doesn't have to walk anything, and the requirement
> to use the same paddr/size is implicit.

I liked that idea and just prototyped it and them remembered why I have
the API like it is:

In a perfect world this would be an IOMMU and just implement the IOMMU device ops.
In a slightly less perfect world this would implement dma_map_ops and transparently
allow using the normal DMA API inside the NVMe driver.
In Apple's world the NVMe driver needs two separate IOMMU-like filters and you can't
have two separate dma_map_ops without ugly hacks.

This will usually be used just next to dma_map_coherent and dma_free_coherent and
for those you need to keep track of phys_addr, dma_addr and size anyway,
e.g. (with the error handling removed here)

	bfr->buffer = dma_alloc_coherent(anv->dev, size, &iova, GFP_KERNEL);
	ret = apple_sart_add_allowed_region(anv->sart, iova, size);

and later then

	apple_sart_remove_allowed_region(anv->sart, bfr->iova, bfr->size);
	dma_free_coherent(anv->dev, bfr->size, bfr->buffer, bfr->iova);



Best,

Sven

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 7/9] nvme-apple: Serialize command issue
  2022-03-24  6:16   ` Christoph Hellwig
@ 2022-04-02 12:42     ` Sven Peter
  0 siblings, 0 replies; 55+ messages in thread
From: Sven Peter @ 2022-04-02 12:42 UTC (permalink / raw)
  To: hch
  Cc: Keith Busch, axboe, sagi, Hector Martin, Alyssa Rosenzweig,
	Rob Herring, Arnd Bergmann, Marc Zyngier, devicetree,
	linux-arm-kernel, linux-kernel, linux-nvme, Jens Axboe

On Thu, Mar 24, 2022, at 07:16, Christoph Hellwig wrote:
> On Mon, Mar 21, 2022 at 05:50:47PM +0100, Sven Peter wrote:
>> From: Jens Axboe <axboe@kernel.dk>
>> 
>> This controller shouldn't need serialization of command issue since
>> the SQ is replaced by a simple array and commands are issued by writing
>> the array index to a MMIO register.
>> Without serialization however sometimes commands are still executed
>> correctly and appear in the CQ but never trigger an interrupt.
>> 
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>> [sven: added our best guess why this needs to be done]
>> Signed-off-by: Sven Peter <sven@svenpeter.dev>
>
> This really should go into the previous patch.

Ok, I'll squash it (and the following two patches) into the previous one.

Sven

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 6/9] nvme-apple: Add initial Apple SoC NVMe driver
  2022-03-24  6:16   ` Christoph Hellwig
@ 2022-04-02 12:47     ` Sven Peter
  2022-04-04 15:57     ` Hector Martin
  1 sibling, 0 replies; 55+ messages in thread
From: Sven Peter @ 2022-04-02 12:47 UTC (permalink / raw)
  To: hch
  Cc: Keith Busch, axboe, sagi, Hector Martin, Alyssa Rosenzweig,
	Rob Herring, Arnd Bergmann, Marc Zyngier, devicetree,
	linux-arm-kernel, linux-kernel, linux-nvme

Hi,

Thanks for the review!

On Thu, Mar 24, 2022, at 07:16, Christoph Hellwig wrote:
>> +
>> +//#define DEBUG
>
> This should not leak into the driverṡ

Agreed, I'll remove it.

>
>> +#include <linux/blk-integrity.h>
>
> As far as I can tell this driver does not support metadata or PI,
> so why is this include needed?

Yup, you're right, it's not required at all. It's a left-over from
when I hadn't realized that this controller doesn't support metadata
and still had it implemented.

>
>> +/* NVM Express NVM Command Set Specification, Revision 1.0a, Figure 18 */
>> +#define NVME_OPCODE_DATA_XFER_HOST_TO_CTRL BIT(0)
>> +#define NVME_OPCODE_DATA_XFER_CTRL_TO_HOST BIT(1)
>
> Please just use the nvme_is_write helper where you are using these.

I didn't realize that helper exists, will use it!


>
>> +static int apple_nvme_sart_dma_setup(void *cookie, struct apple_rtkit_shmem *bfr,
>
> Please avoid > 80 character lines.

Will fix it.

>
>> +static void apple_nvme_free_ctrl(struct nvme_ctrl *ctrl)
>> +{
>> +}
>
> So where are the apple specific resources free?  ->free_ctrl is
> the callback from the struct device release callback, so without
> one the resource release can't be tried to the device release.

The resources are all devres managed but I only just realized that
the nvme_ctrl creates its own struct device that has to be tied to
the platform device. I think I'll at least need a get_device in _probe
and a put_device here just like in pci.c to tie those two together.
I'll also double check to make sure that all resources are indeed
devres-managed.


Thanks,


Sven


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 5/9] soc: apple: Add RTKit IPC library
  2022-03-22 17:23   ` Alyssa Rosenzweig
@ 2022-04-02 12:50     ` Sven Peter
  0 siblings, 0 replies; 55+ messages in thread
From: Sven Peter @ 2022-04-02 12:50 UTC (permalink / raw)
  To: Alyssa Rosenzweig
  Cc: Hector Martin, Rob Herring, Arnd Bergmann, Keith Busch, axboe,
	hch, sagi, Marc Zyngier, devicetree, linux-arm-kernel,
	linux-kernel, linux-nvme



On Tue, Mar 22, 2022, at 18:23, Alyssa Rosenzweig wrote:
>> +enum { APPLE_RTKIT_PWR_STATE_OFF = 0x00,
>> +	APPLE_RTKIT_PWR_STATE_SLEEP = 0x01,
>> +	APPLE_RTKIT_PWR_STATE_GATED = 0x02,
>> +	APPLE_RTKIT_PWR_STATE_QUIESCED = 0x10,
>> +	APPLE_RTKIT_PWR_STATE_ON = 0x20,
>> +};
>
> It would be great to get comments added explaining what these states
> are. It's not obvious how off/sleep/gated/quiesced differ, and it's not
> obvious from the code (doesn't look like GATED is used here at all?).
> Are these Apple names or r/e'd names or a mix?

Good point. They come from XNU's serial output, I'll add comments
for those where I know exactly what they do. I'll also remove GATED
for now since that's not used yet.

>
>> +	if (!rtk->syslog_buffer.size) {
>> +		rtk_warn(
>> +			"received syslog message but syslog_buffer.size is zero");
>> +		goto done;
>> +	}
>> +	if (!rtk->syslog_buffer.buffer && !rtk->syslog_buffer.iomem) {
>> +		rtk_warn("received syslog message but no syslog_buffer.buffer or syslog_buffer.iomem");
>> +		goto done;
>> +	}
>
> Nit: Wrapping is inconsistent between these two warns.

Will fix it while getting rid of the rtk_ macros.

Sven

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 5/9] soc: apple: Add RTKit IPC library
  2022-03-22 13:13   ` Arnd Bergmann
  2022-03-22 17:41     ` Robin Murphy
@ 2022-04-02 12:56     ` Sven Peter
  2022-04-02 18:30       ` Arnd Bergmann
  1 sibling, 1 reply; 55+ messages in thread
From: Sven Peter @ 2022-04-02 12:56 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Hector Martin, Alyssa Rosenzweig, Rob Herring, Keith Busch,
	axboe, hch, sagi, Marc Zyngier, DTML, Linux ARM,
	Linux Kernel Mailing List, linux-nvme

On Tue, Mar 22, 2022, at 14:13, Arnd Bergmann wrote:
> On Mon, Mar 21, 2022 at 5:50 PM Sven Peter <sven@svenpeter.dev> wrote:
>>
>> Apple SoCs such as the M1 come with multiple embedded co-processors
>> running proprietary firmware. Communication with those is established
>> over a simple mailbox using the RTKit IPC protocol.
>>
>> Signed-off-by: Sven Peter <sven@svenpeter.dev>
>
>> +
>> +#define rtk_err(format, arg...) dev_err(rtk->dev, "RTKit: " format, ##arg)
>> +#define rtk_warn(format, arg...) dev_warn(rtk->dev, "RTKit: " format, ##arg)
>> +#define rtk_info(format, arg...) dev_info(rtk->dev, "RTKit: " format, ##arg)
>> +#define rtk_dbg(format, arg...) dev_dbg(rtk->dev, "RTKit: " format, ##arg)
>
> I generally don't like the custom printing macros, please just open-code
> the prints where they are used, that makes it easier for other kernel
> developers to see exactly what is being printed.
>

Sure, I'll remove them.

>> +enum { APPLE_RTKIT_WORK_MSG,
>> +       APPLE_RTKIT_WORK_REINIT,
>> +};
>> +
>> +enum { APPLE_RTKIT_PWR_STATE_OFF = 0x00,
>> +       APPLE_RTKIT_PWR_STATE_SLEEP = 0x01,
>> +       APPLE_RTKIT_PWR_STATE_GATED = 0x02,
>> +       APPLE_RTKIT_PWR_STATE_QUIESCED = 0x10,
>> +       APPLE_RTKIT_PWR_STATE_ON = 0x20,
>> +};
>
> This is an odd indentation style, I would insert a newline after the 'enum {'

Yeah, I blame clang-format and me not double-checking the result for that one.
I'll add the newline.

>
>> +static int apple_rtkit_worker(void *data)
>> +{
>> +       struct apple_rtkit *rtk = data;
>> +       struct apple_rtkit_work work;
>> +
>> +       while (!kthread_should_stop()) {
>> +               wait_event_interruptible(rtk->wq,
>> +                                        kfifo_len(&rtk->work_fifo) > 0 ||
>> +                                                kthread_should_stop());
>> +
>> +               if (kthread_should_stop())
>> +                       break;
>> +
>> +               while (kfifo_out_spinlocked(&rtk->work_fifo, &work, 1,
>> +                                           &rtk->work_lock) == 1) {
>> +                       switch (work.type) {
>> +                       case APPLE_RTKIT_WORK_MSG:
>> +                               apple_rtkit_rx(rtk, &work.msg);
>> +                               break;
>> +                       case APPLE_RTKIT_WORK_REINIT:
>> +                               apple_rtkit_do_reinit(rtk);
>> +                               break;
>> +                       }
>> +               }
>
> It looks like you add quite a bit of complexity by using a custom
> worker thread implementation. Can you explain what this is
> needed for? Isn't this roughly the same thing that one would
> get more easily with create_singlethread_workqueue()?

I originally had just a workqueue here but I can only put
one instance of e.g. APPLE_RTKIT_WORK_MSG onto these.
There could however be a new incoming message while the previous
one is still being handled and I couldn't figure out a way
to handle that with workqueues without introducing a race.


>
>> +#if IS_ENABLED(CONFIG_APPLE_RTKIT)
>
> Instead of allowing the interface to be used without CONFIG_APPLE_RTKIT,
> I think it is sufficient to allow the driver itself to be built with
> CONFIG_COMPILE_TEST (as you already do), and then have
> drivers using it marked as 'depends on APPLE_RTKIT'
> unconditionally.

That's indeed much simpler and what I wanted to achieve. I just couldn't
figure out how to do it. I'll use your approach! 

>
>> +/*
>> + * Initializes the internal state required to handle RTKit. This
>> + * should usually be called within _probe.
>> + *
>> + * @dev: Pointer to the device node this coprocessor is assocated with
>> + * @cookie: opaque cookie passed to all functions defined in rtkit_ops
>> + * @mbox_name: mailbox name used to communicate with the co-processor
>> + * @mbox_idx: mailbox index to be used if mbox_name is NULL
>> + * @ops: pointer to rtkit_ops to be used for this co-processor
>> + */
>> +struct apple_rtkit *apple_rtkit_init(struct device *dev, void *cookie,
>> +                                    const char *mbox_name, int mbox_idx,
>> +                                    const struct apple_rtkit_ops *ops);
>> +
>> +/*
>> + * Dev-res managed version of apple_rtkit_init.
>> + */
>> +struct apple_rtkit *devm_apple_rtkit_init(struct device *dev, void *cookie,
>> +                                         const char *mbox_name, int mbox_idx,
>> +                                         const struct apple_rtkit_ops *ops);
>
> Do we need to export both of these?

No, only devm_apple_rtkit_init needs to be exported.


Thanks,


Sven

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/9] dt-bindings: soc: apple: Add Apple SART
  2022-03-31 21:23   ` Rob Herring
@ 2022-04-02 12:58     ` Sven Peter
  0 siblings, 0 replies; 55+ messages in thread
From: Sven Peter @ 2022-04-02 12:58 UTC (permalink / raw)
  To: Rob Herring
  Cc: Hector Martin, Alyssa Rosenzweig, Arnd Bergmann, Keith Busch,
	axboe, hch, sagi, Marc Zyngier, devicetree, linux-arm-kernel,
	linux-kernel, linux-nvme

On Thu, Mar 31, 2022, at 23:23, Rob Herring wrote:
> On Mon, Mar 21, 2022 at 05:50:41PM +0100, Sven Peter wrote:
>> Apple SoCs such as the M1 come with a simple DMA address filter called
>> SART. Unlike a real IOMMU no pagetables can be configured but instead
>> DMA transactions can be allowed for up to 16 paddr regions.
>> 
>> Signed-off-by: Sven Peter <sven@svenpeter.dev>
>> ---
>>  .../bindings/soc/apple/apple,sart.yaml        | 52 +++++++++++++++++++
>
> Close enough to an IOMMU in terms of its purpose, so put in 
> bindings/iommu/

Ok, will put it there. I guess I can also use iommu for the node name
then, e.g.

    iommu@7bc50000 {
      compatible = "apple,t8103-sart";
      reg = <0x7bc50000 0x4000>;
    };


Thanks,


Sven

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 2/9] dt-bindings: soc: apple: Add ANS NVMe
  2022-03-23 11:14   ` Krzysztof Kozlowski
@ 2022-04-02 13:05     ` Sven Peter
  2022-04-02 16:06       ` Krzysztof Kozlowski
  0 siblings, 1 reply; 55+ messages in thread
From: Sven Peter @ 2022-04-02 13:05 UTC (permalink / raw)
  To: Krzysztof Kozlowski, Rob Herring
  Cc: Hector Martin, Alyssa Rosenzweig, Arnd Bergmann, Keith Busch,
	axboe, hch, sagi, Marc Zyngier, devicetree, linux-arm-kernel,
	linux-kernel, linux-nvme

Hi,

thanks for the review!

On Wed, Mar 23, 2022, at 12:14, Krzysztof Kozlowski wrote:
> On 21/03/2022 17:50, Sven Peter wrote:
>> Apple SoCs such as the M1 come with an embedded NVMe coprocessor called
>> ANS2.
>> 
>> Signed-off-by: Sven Peter <sven@svenpeter.dev>
>> ---
>>  .../bindings/soc/apple/apple,nvme-ans.yaml    | 75 +++++++++++++++++++
>>  1 file changed, 75 insertions(+)
>>  create mode 100644 Documentation/devicetree/bindings/soc/apple/apple,nvme-ans.yaml
>> 
>> diff --git a/Documentation/devicetree/bindings/soc/apple/apple,nvme-ans.yaml b/Documentation/devicetree/bindings/soc/apple/apple,nvme-ans.yaml
>> new file mode 100644
>> index 000000000000..e1f4c1c572aa
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/soc/apple/apple,nvme-ans.yaml
>> @@ -0,0 +1,75 @@
>> +# SPDX-License-Identifier: GPL-2.0 OR BSD-2-Clause
>> +%YAML 1.2
>> +---
>> +$id: http://devicetree.org/schemas/soc/apple/apple,nvme-ans.yaml#
>
> Do not drop all code in soc/apple, but please use respective subsystems.
> Apple is not a subsystem, is not special.
>

Sure, the code is already inside drivers/nvme/host but I'll also create
Documentation/devicetree/bindings/nvme and put the bindings in there as
well.

>> +$schema: http://devicetree.org/meta-schemas/core.yaml#
>> +
>> +title: Apple ANS NVM Express host controller
>> +
>> +maintainers:
>> +  - Sven Peter <sven@svenpeter.dev>
>> +
>> +properties:
>> +  compatible:
>> +    items:
>> +      - enum:
>> +          - apple,t8103-nvme-ans2
>> +          - apple,t6000-nvme-ans2
>> +      - const: apple,nvme-ans2
>> +
>> +  reg:
>> +    items:
>> +      - description: NVMe and NVMMU registers
>> +      - description: ANS2 co-processor control registers
>> +
>> +  reg-names:
>> +    items:
>> +      - const: nvme
>> +      - const: ans
>> +
>> +  resets:
>> +    maxItems: 1
>> +
>> +  power-domains: true
>
> maxItems

Ok, I guess I can just use the max number of domains in the HW
released so far and we can always increase it when Apple releases
a new SoC that requires more power domains then.


Thanks,

Sven

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 6/9] nvme-apple: Add initial Apple SoC NVMe driver
  2022-03-21 17:01   ` Keith Busch
@ 2022-04-02 13:10     ` Sven Peter
  0 siblings, 0 replies; 55+ messages in thread
From: Sven Peter @ 2022-04-02 13:10 UTC (permalink / raw)
  To: Keith Busch
  Cc: axboe, hch, sagi, Hector Martin, Alyssa Rosenzweig, Rob Herring,
	Arnd Bergmann, Marc Zyngier, devicetree, linux-arm-kernel,
	linux-kernel, linux-nvme

On Mon, Mar 21, 2022, at 18:01, Keith Busch wrote:
> On Mon, Mar 21, 2022 at 05:50:46PM +0100, Sven Peter wrote:
>> +static bool apple_nvme_poll_cq(struct apple_nvme_queue *q,
>> +			       struct io_comp_batch *iob)
>> +{
>> +	bool found = false;
>> +
>> +	while (apple_nvme_cqe_pending(q)) {
>> +		found = true;
>> +
>> +		/*
>> +		 * load-load control dependency between phase and the rest of
>> +		 * the cqe requires a full read memory barrier
>> +		 */
>> +		dma_rmb();
>> +		apple_nvme_handle_cqe(q, iob, q->cq_head);
>> +		apple_nvme_update_cq_head(q);
>> +	}
>> +
>> +	if (found)
>> +		writel_relaxed(q->cq_head, q->cq_db);
>
> Doesn't a relaxed write mean that a subsequent write can bypass the previous?
> If so, that sounds like it can corrupt the cq head.

No, writel_relaxed just means there's no synchronization barrier for writes to
normal memory and that those could bypass the store to device memory and only
become visible later.
The underlying memory (q->cq_db) is still mapped as Device-nGnRnE which prevents
reordering of writes to the same endpoint such that the N-th CQ update is still
guaranteed to be visible to the device before the N+1-th CQ update.

I'll either add a comment why it's okay to use _relaxed here or just use writel
instead since there doesn't seem to be any performance difference.



Sven

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 6/9] nvme-apple: Add initial Apple SoC NVMe driver
  2022-03-22 13:38   ` Arnd Bergmann
@ 2022-04-02 13:34     ` Sven Peter
  2022-04-02 13:58       ` Janne Grunau
  2022-04-02 21:26       ` Arnd Bergmann
  0 siblings, 2 replies; 55+ messages in thread
From: Sven Peter @ 2022-04-02 13:34 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Keith Busch, axboe, hch, sagi, Hector Martin, Alyssa Rosenzweig,
	Rob Herring, Marc Zyngier, DTML, Linux ARM,
	Linux Kernel Mailing List, linux-nvme



On Tue, Mar 22, 2022, at 14:38, Arnd Bergmann wrote:
> On Mon, Mar 21, 2022 at 5:50 PM Sven Peter <sven@svenpeter.dev> wrote:
>
>> +static int apple_nvme_sart_dma_setup(void *cookie, struct apple_rtkit_shmem *bfr,
>> +                                    dma_addr_t iova, size_t size)
>> +{
>> +       struct apple_nvme *anv = cookie;
>> +       int ret;
>> +
>> +       if (iova)
>> +               return -EINVAL;
>> +
>> +       bfr->buffer = dma_alloc_coherent(anv->dev, size, &iova, GFP_KERNEL);
>> +       if (!bfr->buffer)
>> +               return -ENOMEM;
>
> You pass 'iova' as an argument, but then replace it with the address
> returned by dma_alloc_coherent(). Can you remove the function
> argument?

Yup, will remove it.

>
>> +static void apple_nvmmu_inval(struct apple_nvme_queue *q, unsigned int tag)
>> +{
>> +       struct apple_nvme *anv = queue_to_apple_nvme(q);
>> +
>> +       writel(tag, anv->mmio_nvme + APPLE_NVMMU_TCB_INVAL);
>> +       if (readl_relaxed(anv->mmio_nvme + APPLE_NVMMU_TCB_STAT))
>> +               dev_warn(anv->dev, "NVMMU TCB invalidation failed\n");
>> +}
>
> I don't like to see the _relaxed() accessors used without an explanation
> about why that helps. Please use the non-relaxed version, or make sure
> it's obvious here why you use it.

Ok, I'll either use the non-relaxed ones or add a comment whenever I use
the relaxed version. In this case here there's no write to any DMA buffers
that needs to be visible to the device. That writel there could actually
be a writel_relaxed as well. There just used to be a write to a buffer
above (which is another good reason to always comment when using the
non-relaxed ones, maybe then I would've noticed then and updated it).


>
>> +bad_sgl:
>> +       WARN(DO_ONCE(apple_nvme_print_sgl, iod->sg, iod->nents),
>> +            "Invalid SGL for payload:%d nents:%d\n", blk_rq_payload_bytes(req),
>> +            iod->nents);
>
> I think you mean WARN_ONCE() here?

This is taken from pci.c which used to use WARN_ONCE but was replaced in
d08774738446e77734777adcf5d1045237b4475a with this construction here.
The commit message mentions

    The WARN_ONCE macro returns true if the condition is true, not if the
    warn was raised, so we're printing the scatter list every time it's
    invalid. This is excessive and makes debugging harder, so this patch
    prints it just once.


>
>> +       writel_relaxed(0, anv->mmio_coproc + APPLE_ANS_COPROC_CPU_CONTROL);
>> +       (void)readl_relaxed(anv->mmio_coproc + APPLE_ANS_COPROC_CPU_CONTROL);
>
> What is the purpose of the readl_relaxed() here? It looks like you are
> trying to flush
> the write to the hardware, but then again
>
>   a) on Apple hardware, the registers are mapped using PROT_DEVICE_nGnRnE,
>       so MMIO writes are never posted
>
>   b) the read is "_relaxed", so there is no barrier, and the result is
> unused, so
>       it would appear that the CPU can just keep executing code anyway.
>
> Since this is all the initialization path, I can't imagine what the
> relaxation of
> the barriers helps with.

Agreed, I've actually tried replacing all non-relaxed ones with the normal
accessors (even those inside the hot path) and didn't see any performance
difference.
I can use the normal ones here and I'll consider using the non-relaxed ones
in the hot path together with a comment why they are safe in those places.

>
>> +static int apple_nvme_reg_read32(struct nvme_ctrl *ctrl, u32 off, u32 *val)
>> +{
>> +       *val = readl_relaxed(ctrl_to_apple_nvme(ctrl)->mmio_nvme + off);
>> +       return 0;
>> +}
>> +
>> +static int apple_nvme_reg_write32(struct nvme_ctrl *ctrl, u32 off, u32 val)
>> +{
>> +       writel_relaxed(val, ctrl_to_apple_nvme(ctrl)->mmio_nvme + off);
>> +       return 0;
>> +}
>
> If you have generic register access functions, don't make them use
> _relaxed internally. If there are instances that need to be _relaxed,
> add another version of the accessor that spells this out in the caller.

Ok, there are used internally by the nvme core and can't do any kind of DMA
right now but I agree that it's better to use the non-relaxed ones here
to prevent surprises in the future.


Sven

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 5/9] soc: apple: Add RTKit IPC library
  2022-03-23 11:19   ` Krzysztof Kozlowski
@ 2022-04-02 13:51     ` Sven Peter
  2022-04-02 16:08       ` Krzysztof Kozlowski
  2022-04-04 15:02       ` Rob Herring
  0 siblings, 2 replies; 55+ messages in thread
From: Sven Peter @ 2022-04-02 13:51 UTC (permalink / raw)
  To: Krzysztof Kozlowski
  Cc: Hector Martin, Alyssa Rosenzweig, Rob Herring, Arnd Bergmann,
	Keith Busch, axboe, hch, sagi, Marc Zyngier, devicetree,
	linux-arm-kernel, linux-kernel, linux-nvme

On Wed, Mar 23, 2022, at 12:19, Krzysztof Kozlowski wrote:
> On 21/03/2022 17:50, Sven Peter wrote:
>> Apple SoCs such as the M1 come with multiple embedded co-processors
>> running proprietary firmware. Communication with those is established
>> over a simple mailbox using the RTKit IPC protocol.
>> 
>> Signed-off-by: Sven Peter <sven@svenpeter.dev>
>> ---
>>  drivers/soc/apple/Kconfig          |  13 +
>>  drivers/soc/apple/Makefile         |   3 +
>>  drivers/soc/apple/rtkit-crashlog.c | 147 +++++
>>  drivers/soc/apple/rtkit-internal.h |  76 +++
>>  drivers/soc/apple/rtkit.c          | 842 +++++++++++++++++++++++++++++
>>  include/linux/soc/apple/rtkit.h    | 203 +++++++
>>  6 files changed, 1284 insertions(+)
>
> Isn't this some implementation of a mailbox? If so, it should be in
> drivers/mailbox. Please don't put all stuff in soc/apple, that's not how
> Linux is organized. To drivers/soc usually we put drivers which do not
> fit regular subsystems.
>

I put this into soc/apple because I don't think it fits within the mailbox
framework very well.
(It actually uses the mailbox framework for the actual communication
with the hardware with a driver that's already upstream.)

Essentially, the mailbox subsystem provides a common API to send and
receive messages over indepedent hardware channels and devicetree bindings
to describe the relationship between those channels and other drivers.

One of the features that doesn't really fit is that we need to be able
to start, shutdown and re-start these co-processors. The NVMe driver
actually doesn't need to send/receive any messages except those required
to setup the common syslog/crashlog/etc. interfaces.
The mailbox framework would have to be extended to support these specific
use cases.

Another thing that doesn't fit is the memory management: These co-processors
sometimes need shared memory buffers to e.g. send syslog messages.
They always request these buffers with an IPC message but then there are
different possibilities:

	- For some processor the DMA API can just be used and an IOVA must be
	  sent back. For NVMe these buffers must additionally be allowed in this
	  SART address filter.
	- At least one other processor (SMC) does not request such buffers but
	  instead just sends a pointer into MMIO space and the buffer must be
	  accessed using readl/writel. This MMIO memory region is used for
	  both the common buffers (syslog etc.) and for the actual shared buffers
	  used for communication, such that the resource would have to be shared
	  across drivers.
	- And yet another coprocessor (for the display controller) requests some
	  buffers with an already existing IOVA that than need to be mapped
	  specifically inside the IOMMU.

Each of these co-processors also provides a single function and most
of them don't even have different endpoints. And even those that do (DCP) will
just become a single driver since all those endpoints are very much related.

While it's not impossible to do all that by extending and forcing this into the
mailbox framework at lest I think that it doesn't fit very well and would just
create unneccesarry impedance.


Best,


Sven

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 6/9] nvme-apple: Add initial Apple SoC NVMe driver
  2022-04-02 13:34     ` Sven Peter
@ 2022-04-02 13:58       ` Janne Grunau
  2022-04-02 14:02         ` Sven Peter
  2022-04-02 21:26       ` Arnd Bergmann
  1 sibling, 1 reply; 55+ messages in thread
From: Janne Grunau @ 2022-04-02 13:58 UTC (permalink / raw)
  To: Sven Peter
  Cc: Arnd Bergmann, Keith Busch, axboe, hch, sagi, Hector Martin,
	Alyssa Rosenzweig, Rob Herring, Marc Zyngier, DTML, Linux ARM,
	Linux Kernel Mailing List, linux-nvme

On 2022-04-02 15:34:55 +0200, Sven Peter wrote:
> 
> 
> On Tue, Mar 22, 2022, at 14:38, Arnd Bergmann wrote:
> > On Mon, Mar 21, 2022 at 5:50 PM Sven Peter <sven@svenpeter.dev> wrote:
> >
> >> +static int apple_nvme_sart_dma_setup(void *cookie, struct apple_rtkit_shmem *bfr,
> >> +                                    dma_addr_t iova, size_t size)
> >> +{
> >> +       struct apple_nvme *anv = cookie;
> >> +       int ret;
> >> +
> >> +       if (iova)
> >> +               return -EINVAL;
> >> +
> >> +       bfr->buffer = dma_alloc_coherent(anv->dev, size, &iova, GFP_KERNEL);
> >> +       if (!bfr->buffer)
> >> +               return -ENOMEM;
> >
> > You pass 'iova' as an argument, but then replace it with the address
> > returned by dma_alloc_coherent(). Can you remove the function
> > argument?
> 
> Yup, will remove it.

You can remove it but we will have to add it back once we submit the dcp 
driver (display co-processor). dcp is initialized during boot and uses a 
single pre-allocated buffer which needs to be mapped instead of 
allocated. It seemed easier to make apple_rtkit_ops.shmem_setup() 
omnipotent than adding a different function pointer for this case.

Janne

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 6/9] nvme-apple: Add initial Apple SoC NVMe driver
  2022-04-02 13:58       ` Janne Grunau
@ 2022-04-02 14:02         ` Sven Peter
  0 siblings, 0 replies; 55+ messages in thread
From: Sven Peter @ 2022-04-02 14:02 UTC (permalink / raw)
  To: Janne Grunau
  Cc: Arnd Bergmann, Keith Busch, axboe, hch, sagi, Hector Martin,
	Alyssa Rosenzweig, Rob Herring, Marc Zyngier, DTML, Linux ARM,
	Linux Kernel Mailing List, linux-nvme



On Sat, Apr 2, 2022, at 15:58, Janne Grunau wrote:
> On 2022-04-02 15:34:55 +0200, Sven Peter wrote:
>> 
>> 
>> On Tue, Mar 22, 2022, at 14:38, Arnd Bergmann wrote:
>> > On Mon, Mar 21, 2022 at 5:50 PM Sven Peter <sven@svenpeter.dev> wrote:
>> >
>> >> +static int apple_nvme_sart_dma_setup(void *cookie, struct apple_rtkit_shmem *bfr,
>> >> +                                    dma_addr_t iova, size_t size)
>> >> +{
>> >> +       struct apple_nvme *anv = cookie;
>> >> +       int ret;
>> >> +
>> >> +       if (iova)
>> >> +               return -EINVAL;
>> >> +
>> >> +       bfr->buffer = dma_alloc_coherent(anv->dev, size, &iova, GFP_KERNEL);
>> >> +       if (!bfr->buffer)
>> >> +               return -ENOMEM;
>> >
>> > You pass 'iova' as an argument, but then replace it with the address
>> > returned by dma_alloc_coherent(). Can you remove the function
>> > argument?
>> 
>> Yup, will remove it.
>
> You can remove it but we will have to add it back once we submit the dcp 
> driver (display co-processor). dcp is initialized during boot and uses a 
> single pre-allocated buffer which needs to be mapped instead of 
> allocated. It seemed easier to make apple_rtkit_ops.shmem_setup() 
> omnipotent than adding a different function pointer for this case.

Or we can already pass the iova DCP sent us into bfr->iova
and this function then has the task to just "do whatever is required
so that struct apple_rtkit_shmem *bfr is usable".


Sven

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 2/9] dt-bindings: soc: apple: Add ANS NVMe
  2022-04-02 13:05     ` Sven Peter
@ 2022-04-02 16:06       ` Krzysztof Kozlowski
  0 siblings, 0 replies; 55+ messages in thread
From: Krzysztof Kozlowski @ 2022-04-02 16:06 UTC (permalink / raw)
  To: Sven Peter, Rob Herring
  Cc: Hector Martin, Alyssa Rosenzweig, Arnd Bergmann, Keith Busch,
	axboe, hch, sagi, Marc Zyngier, devicetree, linux-arm-kernel,
	linux-kernel, linux-nvme

On 02/04/2022 15:05, Sven Peter wrote:
> Hi,
> 
> thanks for the review!
> 
> On Wed, Mar 23, 2022, at 12:14, Krzysztof Kozlowski wrote:
>> On 21/03/2022 17:50, Sven Peter wrote:
>>> Apple SoCs such as the M1 come with an embedded NVMe coprocessor called
>>> ANS2.
>>>
>>> Signed-off-by: Sven Peter <sven@svenpeter.dev>
>>> ---
>>>  .../bindings/soc/apple/apple,nvme-ans.yaml    | 75 +++++++++++++++++++
>>>  1 file changed, 75 insertions(+)
>>>  create mode 100644 Documentation/devicetree/bindings/soc/apple/apple,nvme-ans.yaml
>>>
>>> diff --git a/Documentation/devicetree/bindings/soc/apple/apple,nvme-ans.yaml b/Documentation/devicetree/bindings/soc/apple/apple,nvme-ans.yaml
>>> new file mode 100644
>>> index 000000000000..e1f4c1c572aa
>>> --- /dev/null
>>> +++ b/Documentation/devicetree/bindings/soc/apple/apple,nvme-ans.yaml
>>> @@ -0,0 +1,75 @@
>>> +# SPDX-License-Identifier: GPL-2.0 OR BSD-2-Clause
>>> +%YAML 1.2
>>> +---
>>> +$id: http://devicetree.org/schemas/soc/apple/apple,nvme-ans.yaml#
>>
>> Do not drop all code in soc/apple, but please use respective subsystems.
>> Apple is not a subsystem, is not special.
>>
> 
> Sure, the code is already inside drivers/nvme/host but I'll also create
> Documentation/devicetree/bindings/nvme and put the bindings in there as
> well.

Yes, please. We have also Documentation/devicetree/bindings/nvmem/ but
it seems its entirely different stuff.


Best regards,
Krzysztof

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 5/9] soc: apple: Add RTKit IPC library
  2022-04-02 13:51     ` Sven Peter
@ 2022-04-02 16:08       ` Krzysztof Kozlowski
  2022-04-04 15:02       ` Rob Herring
  1 sibling, 0 replies; 55+ messages in thread
From: Krzysztof Kozlowski @ 2022-04-02 16:08 UTC (permalink / raw)
  To: Sven Peter
  Cc: Hector Martin, Alyssa Rosenzweig, Rob Herring, Arnd Bergmann,
	Keith Busch, axboe, hch, sagi, Marc Zyngier, devicetree,
	linux-arm-kernel, linux-kernel, linux-nvme

On 02/04/2022 15:51, Sven Peter wrote:
> On Wed, Mar 23, 2022, at 12:19, Krzysztof Kozlowski wrote:
>> On 21/03/2022 17:50, Sven Peter wrote:
>>> Apple SoCs such as the M1 come with multiple embedded co-processors
>>> running proprietary firmware. Communication with those is established
>>> over a simple mailbox using the RTKit IPC protocol.
>>>
>>> Signed-off-by: Sven Peter <sven@svenpeter.dev>
>>> ---
>>>  drivers/soc/apple/Kconfig          |  13 +
>>>  drivers/soc/apple/Makefile         |   3 +
>>>  drivers/soc/apple/rtkit-crashlog.c | 147 +++++
>>>  drivers/soc/apple/rtkit-internal.h |  76 +++
>>>  drivers/soc/apple/rtkit.c          | 842 +++++++++++++++++++++++++++++
>>>  include/linux/soc/apple/rtkit.h    | 203 +++++++
>>>  6 files changed, 1284 insertions(+)
>>
>> Isn't this some implementation of a mailbox? If so, it should be in
>> drivers/mailbox. Please don't put all stuff in soc/apple, that's not how
>> Linux is organized. To drivers/soc usually we put drivers which do not
>> fit regular subsystems.
>>
> 
> I put this into soc/apple because I don't think it fits within the mailbox
> framework very well.
> (It actually uses the mailbox framework for the actual communication
> with the hardware with a driver that's already upstream.)
> 
> Essentially, the mailbox subsystem provides a common API to send and
> receive messages over indepedent hardware channels and devicetree bindings
> to describe the relationship between those channels and other drivers.
> 
> One of the features that doesn't really fit is that we need to be able
> to start, shutdown and re-start these co-processors. The NVMe driver
> actually doesn't need to send/receive any messages except those required
> to setup the common syslog/crashlog/etc. interfaces.
> The mailbox framework would have to be extended to support these specific
> use cases.
> 
> Another thing that doesn't fit is the memory management: These co-processors
> sometimes need shared memory buffers to e.g. send syslog messages.
> They always request these buffers with an IPC message but then there are
> different possibilities:
> 
> 	- For some processor the DMA API can just be used and an IOVA must be
> 	  sent back. For NVMe these buffers must additionally be allowed in this
> 	  SART address filter.
> 	- At least one other processor (SMC) does not request such buffers but
> 	  instead just sends a pointer into MMIO space and the buffer must be
> 	  accessed using readl/writel. This MMIO memory region is used for
> 	  both the common buffers (syslog etc.) and for the actual shared buffers
> 	  used for communication, such that the resource would have to be shared
> 	  across drivers.
> 	- And yet another coprocessor (for the display controller) requests some
> 	  buffers with an already existing IOVA that than need to be mapped
> 	  specifically inside the IOMMU.
> 
> Each of these co-processors also provides a single function and most
> of them don't even have different endpoints. And even those that do (DCP) will
> just become a single driver since all those endpoints are very much related.
> 
> While it's not impossible to do all that by extending and forcing this into the
> mailbox framework at lest I think that it doesn't fit very well and would just
> create unneccesarry impedance.

Thanks for explanation. I don't know the mailbox framework well enough
to advise you, so I don't mind keeping it in current location (drivers/soc).

Best regards,
Krzysztof

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 5/9] soc: apple: Add RTKit IPC library
  2022-04-02 12:56     ` Sven Peter
@ 2022-04-02 18:30       ` Arnd Bergmann
  2022-04-03 10:45         ` Sven Peter
  0 siblings, 1 reply; 55+ messages in thread
From: Arnd Bergmann @ 2022-04-02 18:30 UTC (permalink / raw)
  To: Sven Peter
  Cc: Arnd Bergmann, Hector Martin, Alyssa Rosenzweig, Rob Herring,
	Keith Busch, axboe, hch, sagi, Marc Zyngier, DTML, Linux ARM,
	Linux Kernel Mailing List, linux-nvme

On Sat, Apr 2, 2022 at 2:56 PM Sven Peter <sven@svenpeter.dev> wrote:
> On Tue, Mar 22, 2022, at 14:13, Arnd Bergmann wrote:
> >> +static int apple_rtkit_worker(void *data)
> >> +{
> >> +       struct apple_rtkit *rtk = data;
> >> +       struct apple_rtkit_work work;
> >> +
> >> +       while (!kthread_should_stop()) {
> >> +               wait_event_interruptible(rtk->wq,
> >> +                                        kfifo_len(&rtk->work_fifo) > 0 ||
> >> +                                                kthread_should_stop());
> >> +
> >> +               if (kthread_should_stop())
> >> +                       break;
> >> +
> >> +               while (kfifo_out_spinlocked(&rtk->work_fifo, &work, 1,
> >> +                                           &rtk->work_lock) == 1) {
> >> +                       switch (work.type) {
> >> +                       case APPLE_RTKIT_WORK_MSG:
> >> +                               apple_rtkit_rx(rtk, &work.msg);
> >> +                               break;
> >> +                       case APPLE_RTKIT_WORK_REINIT:
> >> +                               apple_rtkit_do_reinit(rtk);
> >> +                               break;
> >> +                       }
> >> +               }
> >
> > It looks like you add quite a bit of complexity by using a custom
> > worker thread implementation. Can you explain what this is
> > needed for? Isn't this roughly the same thing that one would
> > get more easily with create_singlethread_workqueue()?
>
> I originally had just a workqueue here but I can only put
> one instance of e.g. APPLE_RTKIT_WORK_MSG onto these.
> There could however be a new incoming message while the previous
> one is still being handled and I couldn't figure out a way
> to handle that with workqueues without introducing a race.

Are you trying to avoid dynamic allocation of the messages then
and have no other place that you can embed it in?

If you kmalloc() a messages that embeds a work_struct, you can
enqueue as many of those as you want, but the allocation adds
complexity through the need for error handling etc.

I wonder if you can change the mailbox driver to use a threaded
irq handler, which I think should ensure that the callback here
is run in process context, avoiding the need to defer execution
within the rtkit driver.

         Arnd

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 4/9] soc: apple: Add SART driver
  2022-04-02 12:38     ` Sven Peter
@ 2022-04-02 19:07       ` Arnd Bergmann
  2022-04-04 14:58         ` Rob Herring
  0 siblings, 1 reply; 55+ messages in thread
From: Arnd Bergmann @ 2022-04-02 19:07 UTC (permalink / raw)
  To: Sven Peter
  Cc: Arnd Bergmann, Hector Martin, Alyssa Rosenzweig, Rob Herring,
	Keith Busch, axboe, hch, sagi, Marc Zyngier, DTML, Linux ARM,
	Linux Kernel Mailing List, linux-nvme

On Sat, Apr 2, 2022 at 2:38 PM Sven Peter <sven@svenpeter.dev> wrote:
> On Mon, Mar 21, 2022, at 18:07, Arnd Bergmann wrote:
> > On Mon, Mar 21, 2022 at 5:50 PM Sven Peter <sven@svenpeter.dev> wrote:
> >> The NVMe co-processor on the Apple M1 uses a DMA address filter called
> >> SART for some DMA transactions. This adds a simple driver used to
> >> configure the memory regions from which DMA transactions are allowed.
> >>
> >> Co-developed-by: Hector Martin <marcan@marcan.st>
> >> Signed-off-by: Hector Martin <marcan@marcan.st>
> >> Signed-off-by: Sven Peter <sven@svenpeter.dev>
> >
> > Can you add some explanation about why this uses a custom interface
> > instead of hooking into the dma_map_ops?
>
> Sure.
> In a perfect world this would just be an IOMMU implementation but since
> SART can't create any real IOVA space using pagetables it doesn't fit
> inside that subsytem.
>
> In a slightly less perfect world I could just implement dma_map_ops here
> but that won't work either because not all DMA buffers of the NVMe
> device have to go through SART and those allocations happen
> inside the same device and would use the same dma_map_ops.
>
> The NVMe controller has two separate DMA filters:
>
>    - NVMMU, which must be set up for any command that uses PRPs and
>      ensures that the DMA transactions only touch the pages listed
>      inside the PRP structure. NVMMU itself is tightly coupled
>      to the NVMe controller: The list of allowed pages is configured
>      based on command's tag id and even commands that require no DMA
>      transactions must be listed inside NVMMU before they are started.
>    - SART, which must be set up for some shared memory buffers (e.g.
>      log messages from the NVMe firmware) and for some NVMe debug
>      commands that don't use PRPs.
>      SART is only loosely coupled to the NVMe controller and could
>      also be used together with other devices. It's also the only
>      thing that changed between M1 and M1 Pro/Max/Ultra and that's
>      why I decided to separate it from the NVMe driver.
>
> I'll add this explanation to the commit message.

Ok, thanks.

> >> +static void sart2_get_entry(struct apple_sart *sart, int index, u8 *flags,
> >> +                           phys_addr_t *paddr, size_t *size)
> >> +{
> >> +       u32 cfg = readl_relaxed(sart->regs + APPLE_SART2_CONFIG(index));
> >> +       u32 paddr_ = readl_relaxed(sart->regs + APPLE_SART2_PADDR(index));
> >
> > Why do you use the _relaxed() accessors here and elsewhere in the driver?
>
> This device itself doesn't do any DMA transactions so it needs no memory
> synchronization barriers. Only the consumer (i.e. rtkit and nvme) read/write
> from/to these buffers (multiple times) and they have the required barriers
> in place whenever they are used.
>
> These buffers so far are only allocated at probe time though so even using
> the normal writel/readl here won't hurt performance at all. I can just use
> those if you prefer or alternatively add a comment why _relaxed is fine here.
>
> This is a bit similar to the discussion for the pinctrl series last year [1].

I think it's better to only use the _relaxed version where it actually helps,
with a comment about it, and use the normal version elsewhere, in
particular in functions that you have copied from the normal nvme driver.
I had tried to compare some of your code with the other version and
was rather confused by that.

        Arnd

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 6/9] nvme-apple: Add initial Apple SoC NVMe driver
  2022-04-02 13:34     ` Sven Peter
  2022-04-02 13:58       ` Janne Grunau
@ 2022-04-02 21:26       ` Arnd Bergmann
  1 sibling, 0 replies; 55+ messages in thread
From: Arnd Bergmann @ 2022-04-02 21:26 UTC (permalink / raw)
  To: Sven Peter
  Cc: Arnd Bergmann, Keith Busch, axboe, hch, sagi, Hector Martin,
	Alyssa Rosenzweig, Rob Herring, Marc Zyngier, DTML, Linux ARM,
	Linux Kernel Mailing List, linux-nvme

On Sat, Apr 2, 2022 at 3:34 PM Sven Peter <sven@svenpeter.dev> wrote:
> On Tue, Mar 22, 2022, at 14:38, Arnd Bergmann wrote:
> > On Mon, Mar 21, 2022 at 5:50 PM Sven Peter <sven@svenpeter.dev> wrote:

> >> +bad_sgl:
> >> +       WARN(DO_ONCE(apple_nvme_print_sgl, iod->sg, iod->nents),
> >> +            "Invalid SGL for payload:%d nents:%d\n", blk_rq_payload_bytes(req),
> >> +            iod->nents);
> >
> > I think you mean WARN_ONCE() here?
>
> This is taken from pci.c which used to use WARN_ONCE but was replaced in
> d08774738446e77734777adcf5d1045237b4475a with this construction here.
> The commit message mentions
>
>     The WARN_ONCE macro returns true if the condition is true, not if the
>     warn was raised, so we're printing the scatter list every time it's
>     invalid. This is excessive and makes debugging harder, so this patch
>     prints it just once.
>

Ok, makes sense. If we get more of these in the kernel, we may want to
add a helper that makes this more obvious, but it appears that for now
these two are the only ones.

It could also be expressed by moving the WARN_ONCE() into the
condition above the 'goto', but I don't see a reason to change the other
driver and it's better to keep the two consistent.

> Agreed, I've actually tried replacing all non-relaxed ones with the normal
> accessors (even those inside the hot path) and didn't see any performance
> difference.
> I can use the normal ones here and I'll consider using the non-relaxed ones
> in the hot path together with a comment why they are safe in those places.

Sounds good. If you find any instances in the hot path that have a
corresponding version in the pci.c file, it would be good to keep these
two in sync, hopefully making the non-apple case faster as well.

        Arnd

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 5/9] soc: apple: Add RTKit IPC library
  2022-04-02 18:30       ` Arnd Bergmann
@ 2022-04-03 10:45         ` Sven Peter
  0 siblings, 0 replies; 55+ messages in thread
From: Sven Peter @ 2022-04-03 10:45 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Hector Martin, Alyssa Rosenzweig, Rob Herring, Keith Busch,
	axboe, hch, sagi, Marc Zyngier, DTML, Linux ARM,
	Linux Kernel Mailing List, linux-nvme



On Sat, Apr 2, 2022, at 20:30, Arnd Bergmann wrote:
> On Sat, Apr 2, 2022 at 2:56 PM Sven Peter <sven@svenpeter.dev> wrote:
>> On Tue, Mar 22, 2022, at 14:13, Arnd Bergmann wrote:
>> >> +static int apple_rtkit_worker(void *data)
>> >> +{
>> >> +       struct apple_rtkit *rtk = data;
>> >> +       struct apple_rtkit_work work;
>> >> +
>> >> +       while (!kthread_should_stop()) {
>> >> +               wait_event_interruptible(rtk->wq,
>> >> +                                        kfifo_len(&rtk->work_fifo) > 0 ||
>> >> +                                                kthread_should_stop());
>> >> +
>> >> +               if (kthread_should_stop())
>> >> +                       break;
>> >> +
>> >> +               while (kfifo_out_spinlocked(&rtk->work_fifo, &work, 1,
>> >> +                                           &rtk->work_lock) == 1) {
>> >> +                       switch (work.type) {
>> >> +                       case APPLE_RTKIT_WORK_MSG:
>> >> +                               apple_rtkit_rx(rtk, &work.msg);
>> >> +                               break;
>> >> +                       case APPLE_RTKIT_WORK_REINIT:
>> >> +                               apple_rtkit_do_reinit(rtk);
>> >> +                               break;
>> >> +                       }
>> >> +               }
>> >
>> > It looks like you add quite a bit of complexity by using a custom
>> > worker thread implementation. Can you explain what this is
>> > needed for? Isn't this roughly the same thing that one would
>> > get more easily with create_singlethread_workqueue()?
>>
>> I originally had just a workqueue here but I can only put
>> one instance of e.g. APPLE_RTKIT_WORK_MSG onto these.
>> There could however be a new incoming message while the previous
>> one is still being handled and I couldn't figure out a way
>> to handle that with workqueues without introducing a race.
>
> Are you trying to avoid dynamic allocation of the messages then
> and have no other place that you can embed it in?

Yeah, I didn't want to allocate anything because of the added
complexity here like you mentioned.

>
> If you kmalloc() a messages that embeds a work_struct, you can
> enqueue as many of those as you want, but the allocation adds
> complexity through the need for error handling etc.
>
> I wonder if you can change the mailbox driver to use a threaded
> irq handler, which I think should ensure that the callback here
> is run in process context, avoiding the need to defer execution
> within the rtkit driver.

Hrm, I just realized that's already the case. mailbox_client.h documents
rx_callback as "Atomic callback to provide client the data received"
but since we know rtkit will only ever be combined with that specific
Apple mailbox we could get away with just ignoring that.
It's a small hack but it might be worth it since it would reduce
the complexity inside rtkit.

Sven

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 4/9] soc: apple: Add SART driver
  2022-04-02 19:07       ` Arnd Bergmann
@ 2022-04-04 14:58         ` Rob Herring
  2022-04-04 15:01           ` Hector Martin
  2022-04-05 15:37           ` Sven Peter
  0 siblings, 2 replies; 55+ messages in thread
From: Rob Herring @ 2022-04-04 14:58 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Sven Peter, Hector Martin, Alyssa Rosenzweig, Keith Busch, axboe,
	hch, sagi, Marc Zyngier, DTML, Linux ARM,
	Linux Kernel Mailing List, linux-nvme

On Sat, Apr 02, 2022 at 09:07:17PM +0200, Arnd Bergmann wrote:
> On Sat, Apr 2, 2022 at 2:38 PM Sven Peter <sven@svenpeter.dev> wrote:
> > On Mon, Mar 21, 2022, at 18:07, Arnd Bergmann wrote:
> > > On Mon, Mar 21, 2022 at 5:50 PM Sven Peter <sven@svenpeter.dev> wrote:
> > >> The NVMe co-processor on the Apple M1 uses a DMA address filter called
> > >> SART for some DMA transactions. This adds a simple driver used to
> > >> configure the memory regions from which DMA transactions are allowed.
> > >>
> > >> Co-developed-by: Hector Martin <marcan@marcan.st>
> > >> Signed-off-by: Hector Martin <marcan@marcan.st>
> > >> Signed-off-by: Sven Peter <sven@svenpeter.dev>
> > >
> > > Can you add some explanation about why this uses a custom interface
> > > instead of hooking into the dma_map_ops?
> >
> > Sure.
> > In a perfect world this would just be an IOMMU implementation but since
> > SART can't create any real IOVA space using pagetables it doesn't fit
> > inside that subsytem.
> >
> > In a slightly less perfect world I could just implement dma_map_ops here
> > but that won't work either because not all DMA buffers of the NVMe
> > device have to go through SART and those allocations happen
> > inside the same device and would use the same dma_map_ops.
> >
> > The NVMe controller has two separate DMA filters:
> >
> >    - NVMMU, which must be set up for any command that uses PRPs and
> >      ensures that the DMA transactions only touch the pages listed
> >      inside the PRP structure. NVMMU itself is tightly coupled
> >      to the NVMe controller: The list of allowed pages is configured
> >      based on command's tag id and even commands that require no DMA
> >      transactions must be listed inside NVMMU before they are started.
> >    - SART, which must be set up for some shared memory buffers (e.g.
> >      log messages from the NVMe firmware) and for some NVMe debug
> >      commands that don't use PRPs.
> >      SART is only loosely coupled to the NVMe controller and could
> >      also be used together with other devices. It's also the only
> >      thing that changed between M1 and M1 Pro/Max/Ultra and that's
> >      why I decided to separate it from the NVMe driver.
> >
> > I'll add this explanation to the commit message.
> 
> Ok, thanks.
> 
> > >> +static void sart2_get_entry(struct apple_sart *sart, int index, u8 *flags,
> > >> +                           phys_addr_t *paddr, size_t *size)
> > >> +{
> > >> +       u32 cfg = readl_relaxed(sart->regs + APPLE_SART2_CONFIG(index));
> > >> +       u32 paddr_ = readl_relaxed(sart->regs + APPLE_SART2_PADDR(index));
> > >
> > > Why do you use the _relaxed() accessors here and elsewhere in the driver?
> >
> > This device itself doesn't do any DMA transactions so it needs no memory
> > synchronization barriers. Only the consumer (i.e. rtkit and nvme) read/write
> > from/to these buffers (multiple times) and they have the required barriers
> > in place whenever they are used.
> >
> > These buffers so far are only allocated at probe time though so even using
> > the normal writel/readl here won't hurt performance at all. I can just use
> > those if you prefer or alternatively add a comment why _relaxed is fine here.
> >
> > This is a bit similar to the discussion for the pinctrl series last year [1].
> 
> I think it's better to only use the _relaxed version where it actually helps,
> with a comment about it, and use the normal version elsewhere, in
> particular in functions that you have copied from the normal nvme driver.
> I had tried to compare some of your code with the other version and
> was rather confused by that.

Oh good, I tell folks the opposite (and others do too). We don't accept 
random explicit barriers without explanation, but implicit ones are 
okay? The resulting code on arm32 is also pretty horrible with the L2x0 
and OMAP sync hooks not that that matters here.

I don't really care too much which way we go, but we should document one 
rule and follow that.

Rob

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 4/9] soc: apple: Add SART driver
  2022-04-04 14:58         ` Rob Herring
@ 2022-04-04 15:01           ` Hector Martin
  2022-04-05 15:37           ` Sven Peter
  1 sibling, 0 replies; 55+ messages in thread
From: Hector Martin @ 2022-04-04 15:01 UTC (permalink / raw)
  To: Rob Herring, Arnd Bergmann
  Cc: Sven Peter, Alyssa Rosenzweig, Keith Busch, axboe, hch, sagi,
	Marc Zyngier, DTML, Linux ARM, Linux Kernel Mailing List,
	linux-nvme

On 04/04/2022 23.58, Rob Herring wrote:
> On Sat, Apr 02, 2022 at 09:07:17PM +0200, Arnd Bergmann wrote:
>> On Sat, Apr 2, 2022 at 2:38 PM Sven Peter <sven@svenpeter.dev> wrote:
>>> On Mon, Mar 21, 2022, at 18:07, Arnd Bergmann wrote:
>>>> On Mon, Mar 21, 2022 at 5:50 PM Sven Peter <sven@svenpeter.dev> wrote:
>>>>> The NVMe co-processor on the Apple M1 uses a DMA address filter called
>>>>> SART for some DMA transactions. This adds a simple driver used to
>>>>> configure the memory regions from which DMA transactions are allowed.
>>>>>
>>>>> Co-developed-by: Hector Martin <marcan@marcan.st>
>>>>> Signed-off-by: Hector Martin <marcan@marcan.st>
>>>>> Signed-off-by: Sven Peter <sven@svenpeter.dev>
>>>>
>>>> Can you add some explanation about why this uses a custom interface
>>>> instead of hooking into the dma_map_ops?
>>>
>>> Sure.
>>> In a perfect world this would just be an IOMMU implementation but since
>>> SART can't create any real IOVA space using pagetables it doesn't fit
>>> inside that subsytem.
>>>
>>> In a slightly less perfect world I could just implement dma_map_ops here
>>> but that won't work either because not all DMA buffers of the NVMe
>>> device have to go through SART and those allocations happen
>>> inside the same device and would use the same dma_map_ops.
>>>
>>> The NVMe controller has two separate DMA filters:
>>>
>>>    - NVMMU, which must be set up for any command that uses PRPs and
>>>      ensures that the DMA transactions only touch the pages listed
>>>      inside the PRP structure. NVMMU itself is tightly coupled
>>>      to the NVMe controller: The list of allowed pages is configured
>>>      based on command's tag id and even commands that require no DMA
>>>      transactions must be listed inside NVMMU before they are started.
>>>    - SART, which must be set up for some shared memory buffers (e.g.
>>>      log messages from the NVMe firmware) and for some NVMe debug
>>>      commands that don't use PRPs.
>>>      SART is only loosely coupled to the NVMe controller and could
>>>      also be used together with other devices. It's also the only
>>>      thing that changed between M1 and M1 Pro/Max/Ultra and that's
>>>      why I decided to separate it from the NVMe driver.
>>>
>>> I'll add this explanation to the commit message.
>>
>> Ok, thanks.
>>
>>>>> +static void sart2_get_entry(struct apple_sart *sart, int index, u8 *flags,
>>>>> +                           phys_addr_t *paddr, size_t *size)
>>>>> +{
>>>>> +       u32 cfg = readl_relaxed(sart->regs + APPLE_SART2_CONFIG(index));
>>>>> +       u32 paddr_ = readl_relaxed(sart->regs + APPLE_SART2_PADDR(index));
>>>>
>>>> Why do you use the _relaxed() accessors here and elsewhere in the driver?
>>>
>>> This device itself doesn't do any DMA transactions so it needs no memory
>>> synchronization barriers. Only the consumer (i.e. rtkit and nvme) read/write
>>> from/to these buffers (multiple times) and they have the required barriers
>>> in place whenever they are used.
>>>
>>> These buffers so far are only allocated at probe time though so even using
>>> the normal writel/readl here won't hurt performance at all. I can just use
>>> those if you prefer or alternatively add a comment why _relaxed is fine here.
>>>
>>> This is a bit similar to the discussion for the pinctrl series last year [1].
>>
>> I think it's better to only use the _relaxed version where it actually helps,
>> with a comment about it, and use the normal version elsewhere, in
>> particular in functions that you have copied from the normal nvme driver.
>> I had tried to compare some of your code with the other version and
>> was rather confused by that.
> 
> Oh good, I tell folks the opposite (and others do too). We don't accept 
> random explicit barriers without explanation, but implicit ones are 
> okay? The resulting code on arm32 is also pretty horrible with the L2x0 
> and OMAP sync hooks not that that matters here.
> 
> I don't really care too much which way we go, but we should document one 
> rule and follow that.

I'm sure maz@ has an opinion here too :-)

(3... 2... 1... fight!)

-- 
Hector Martin (marcan@marcan.st)
Public Key: https://mrcn.st/pub

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 5/9] soc: apple: Add RTKit IPC library
  2022-04-02 13:51     ` Sven Peter
  2022-04-02 16:08       ` Krzysztof Kozlowski
@ 2022-04-04 15:02       ` Rob Herring
  2022-04-04 15:47         ` Hector Martin
  1 sibling, 1 reply; 55+ messages in thread
From: Rob Herring @ 2022-04-04 15:02 UTC (permalink / raw)
  To: Sven Peter
  Cc: Krzysztof Kozlowski, Hector Martin, Alyssa Rosenzweig,
	Arnd Bergmann, Keith Busch, axboe, hch, sagi, Marc Zyngier,
	devicetree, linux-arm-kernel, linux-kernel, linux-nvme

On Sat, Apr 02, 2022 at 03:51:46PM +0200, Sven Peter wrote:
> On Wed, Mar 23, 2022, at 12:19, Krzysztof Kozlowski wrote:
> > On 21/03/2022 17:50, Sven Peter wrote:
> >> Apple SoCs such as the M1 come with multiple embedded co-processors
> >> running proprietary firmware. Communication with those is established
> >> over a simple mailbox using the RTKit IPC protocol.
> >> 
> >> Signed-off-by: Sven Peter <sven@svenpeter.dev>
> >> ---
> >>  drivers/soc/apple/Kconfig          |  13 +
> >>  drivers/soc/apple/Makefile         |   3 +
> >>  drivers/soc/apple/rtkit-crashlog.c | 147 +++++
> >>  drivers/soc/apple/rtkit-internal.h |  76 +++
> >>  drivers/soc/apple/rtkit.c          | 842 +++++++++++++++++++++++++++++
> >>  include/linux/soc/apple/rtkit.h    | 203 +++++++
> >>  6 files changed, 1284 insertions(+)
> >
> > Isn't this some implementation of a mailbox? If so, it should be in
> > drivers/mailbox. Please don't put all stuff in soc/apple, that's not how
> > Linux is organized. To drivers/soc usually we put drivers which do not
> > fit regular subsystems.
> >
> 
> I put this into soc/apple because I don't think it fits within the mailbox
> framework very well.
> (It actually uses the mailbox framework for the actual communication
> with the hardware with a driver that's already upstream.)
> 
> Essentially, the mailbox subsystem provides a common API to send and
> receive messages over indepedent hardware channels and devicetree bindings
> to describe the relationship between those channels and other drivers.
> 
> One of the features that doesn't really fit is that we need to be able
> to start, shutdown and re-start these co-processors. The NVMe driver

remoteproc does that. Did you look at it? Most remoteproc drivers use 
some combination of mailboxes and shared memory.

Rob

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 5/9] soc: apple: Add RTKit IPC library
  2022-04-04 15:02       ` Rob Herring
@ 2022-04-04 15:47         ` Hector Martin
  0 siblings, 0 replies; 55+ messages in thread
From: Hector Martin @ 2022-04-04 15:47 UTC (permalink / raw)
  To: Rob Herring, Sven Peter
  Cc: Krzysztof Kozlowski, Alyssa Rosenzweig, Arnd Bergmann,
	Keith Busch, axboe, hch, sagi, Marc Zyngier, devicetree,
	linux-arm-kernel, linux-kernel, linux-nvme

On 05/04/2022 00.02, Rob Herring wrote:
> On Sat, Apr 02, 2022 at 03:51:46PM +0200, Sven Peter wrote:
>> On Wed, Mar 23, 2022, at 12:19, Krzysztof Kozlowski wrote:
>>> On 21/03/2022 17:50, Sven Peter wrote:
>>>> Apple SoCs such as the M1 come with multiple embedded co-processors
>>>> running proprietary firmware. Communication with those is established
>>>> over a simple mailbox using the RTKit IPC protocol.
>>>>
>>>> Signed-off-by: Sven Peter <sven@svenpeter.dev>
>>>> ---
>>>>  drivers/soc/apple/Kconfig          |  13 +
>>>>  drivers/soc/apple/Makefile         |   3 +
>>>>  drivers/soc/apple/rtkit-crashlog.c | 147 +++++
>>>>  drivers/soc/apple/rtkit-internal.h |  76 +++
>>>>  drivers/soc/apple/rtkit.c          | 842 +++++++++++++++++++++++++++++
>>>>  include/linux/soc/apple/rtkit.h    | 203 +++++++
>>>>  6 files changed, 1284 insertions(+)
>>>
>>> Isn't this some implementation of a mailbox? If so, it should be in
>>> drivers/mailbox. Please don't put all stuff in soc/apple, that's not how
>>> Linux is organized. To drivers/soc usually we put drivers which do not
>>> fit regular subsystems.
>>>
>>
>> I put this into soc/apple because I don't think it fits within the mailbox
>> framework very well.
>> (It actually uses the mailbox framework for the actual communication
>> with the hardware with a driver that's already upstream.)
>>
>> Essentially, the mailbox subsystem provides a common API to send and
>> receive messages over indepedent hardware channels and devicetree bindings
>> to describe the relationship between those channels and other drivers.
>>
>> One of the features that doesn't really fit is that we need to be able
>> to start, shutdown and re-start these co-processors. The NVMe driver
> 
> remoteproc does that. Did you look at it? Most remoteproc drivers use 
> some combination of mailboxes and shared memory.

Remoteproc seems to be mostly about providing a standard interface for
loading firmware images and kickstarting somewhat generic remote
processors, as well as some high-level stuff for virtio. None of that is
useful to us as far as I can tell, because Linux doesn't load the
firmware for these things, it is pre-loaded by the bootloader and
therefore they might as well be fixed-function hardware as far as we're
concerned.

I can certainly see some similarities between the resourceproc API and
what we're doing, but I don't see what it would do for us other than
cause an impedance mismatch. Does it actually *do* something we can use?
Keep in mind these are Apple copros running Apple firmware that will
only ever talk to drivers we write for Apple machines, and we don't
control the firmware.

Impediance mismatches on first glance:

- The assumption that firmware is loaded by Linux seems to be hard-coded
into the subsystem
- Only one boot/shutdown path, while we need different power states and
boot modes depending on the specific instance
- The concept of the "resource table"; we have something similar for at
least one copro, but it's brokered via exchanged messages after it is
booted, and not something we can just look up in the firmware (our plan
was to just put the requested regions in the DT reg node and name them;
the firmware then *after boot* provides a list of mappings it wants from
that list and they can be mapped at that point). At least one other
copro does a subset of this an entirely different way altogether. Apple
aren't consistent and we can't do anything about that.
- Rproc trace buffers: Apple does syslogs but a different, incompatible way.
- ELF coredumps: even if this were useful for the blobs we get, the
blobs loaded by the bootloader themselves are Mach-O binaries, not ELF,
so that'd require some funny binary format conversion on one end or the
other to be able to line them up for postmortem debugging. And Apple's
crashdump format is a custom tag/value type thing.

I'm certainly willing to be convinced to use a kernel subsystem if it
actually does something useful for us, but last time we did that when it
wasn't entirely clear we should (mailbox, for the hardware underlying
these coprocessors) it just resulted in a bunch of headaches because
that subsystem is poorly designed and doesn't seem to have bought us
anything other than limitations. e.g. it is using suboptimal queueing
right now, and we had to switch to the atomic API to make SMC work,
which ends up even lower level, and isn't even properly documented and
different drivers interpret differently, so now it's just pure added
complexity and confusion for ~no gain over just reading/writing the
mailbox registers, which would've been *much* simpler (and more
performant than introducing additional queuing, since the hardware
*does* queuing now which we can't use because mailbox doesn't support
it). Remoteproc kind of seems like an even worse fit here...

Oh, and these device producer/consumer relations cannot be computed
statically as far as I can tell, so each one we introduce is another
special case in distro initramfs image building, since they need to
encode magic additional device-specific module dependency information
somehow since it isn't available in depmod, but only encoded in DTs
which are not necessarily known at initramfs build time.

-- 
Hector Martin (marcan@marcan.st)
Public Key: https://mrcn.st/pub

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 6/9] nvme-apple: Add initial Apple SoC NVMe driver
  2022-03-24  6:16   ` Christoph Hellwig
  2022-04-02 12:47     ` Sven Peter
@ 2022-04-04 15:57     ` Hector Martin
  2022-04-04 15:59       ` Christoph Hellwig
  1 sibling, 1 reply; 55+ messages in thread
From: Hector Martin @ 2022-04-04 15:57 UTC (permalink / raw)
  To: Christoph Hellwig, Sven Peter
  Cc: Keith Busch, Jens Axboe, Sagi Grimberg, Alyssa Rosenzweig,
	Rob Herring, Arnd Bergmann, Marc Zyngier, devicetree,
	linux-arm-kernel, linux-kernel, linux-nvme

On 24/03/2022 15.16, Christoph Hellwig wrote:
>> +
>> +//#define DEBUG
> 
> This should not leak into the driverṡ
> 
>> +#include <linux/blk-integrity.h>
> 
> As far as I can tell this driver does not support metadata or PI,
> so why is this include needed?
> 
>> +/* NVM Express NVM Command Set Specification, Revision 1.0a, Figure 18 */
>> +#define NVME_OPCODE_DATA_XFER_HOST_TO_CTRL BIT(0)
>> +#define NVME_OPCODE_DATA_XFER_CTRL_TO_HOST BIT(1)
> 
> Please just use the nvme_is_write helper where you are using these.
> 
>> +static int apple_nvme_sart_dma_setup(void *cookie, struct apple_rtkit_shmem *bfr,
> 
> Please avoid > 80 character lines.

The kernel hard limit is 100-character lines, not 80-character lines.
Maintainers for existing drivers are certainly free to stick to 80 chars
if they like it that way, but I don't see why we should still be
enforcing that for new code. See bdc48fa11e46.

And also:
https://lkml.iu.edu/hypermail/linux/kernel/2005.3/08168.html

Quoth Torvalds, addressing you personally:

"If you or Christoph have 80 character lines, you'll get possibly ugly
wrapped output. Tough. That's _your_ choice. Your hardware limitations
shouldn't be a pain for the rest of us."

-- 
Hector Martin (marcan@marcan.st)
Public Key: https://mrcn.st/pub

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 6/9] nvme-apple: Add initial Apple SoC NVMe driver
  2022-04-04 15:57     ` Hector Martin
@ 2022-04-04 15:59       ` Christoph Hellwig
  2022-04-04 16:03         ` Sven Peter
                           ` (2 more replies)
  0 siblings, 3 replies; 55+ messages in thread
From: Christoph Hellwig @ 2022-04-04 15:59 UTC (permalink / raw)
  To: Hector Martin
  Cc: Christoph Hellwig, Sven Peter, Keith Busch, Jens Axboe,
	Sagi Grimberg, Alyssa Rosenzweig, Rob Herring, Arnd Bergmann,
	Marc Zyngier, devicetree, linux-arm-kernel, linux-kernel,
	linux-nvme

On Tue, Apr 05, 2022 at 12:57:33AM +0900, Hector Martin wrote:
> The kernel hard limit is 100-character lines, not 80-character lines.
> Maintainers for existing drivers are certainly free to stick to 80 chars
> if they like it that way, but I don't see why we should still be
> enforcing that for new code. See bdc48fa11e46.

Because 100 is completely utterly unreadable if is not for individual
lines like strings, and that is actually how Linus stated it in CodingStyle.

Your code as-is is completely unreadable and will not go into
drivers/nvme/ in that form.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 6/9] nvme-apple: Add initial Apple SoC NVMe driver
  2022-04-04 15:59       ` Christoph Hellwig
@ 2022-04-04 16:03         ` Sven Peter
  2022-04-04 16:05           ` hch
  2022-04-04 16:05         ` Hector Martin
  2022-04-04 18:29         ` Jens Axboe
  2 siblings, 1 reply; 55+ messages in thread
From: Sven Peter @ 2022-04-04 16:03 UTC (permalink / raw)
  To: hch, Hector Martin
  Cc: Keith Busch, axboe, sagi, Alyssa Rosenzweig, Rob Herring,
	Arnd Bergmann, Marc Zyngier, devicetree, linux-arm-kernel,
	linux-kernel, linux-nvme



On Mon, Apr 4, 2022, at 17:59, Christoph Hellwig wrote:
> On Tue, Apr 05, 2022 at 12:57:33AM +0900, Hector Martin wrote:
>> The kernel hard limit is 100-character lines, not 80-character lines.
>> Maintainers for existing drivers are certainly free to stick to 80 chars
>> if they like it that way, but I don't see why we should still be
>> enforcing that for new code. See bdc48fa11e46.
>
> Because 100 is completely utterly unreadable if is not for individual
> lines like strings, and that is actually how Linus stated it in CodingStyle.
>
> Your code as-is is completely unreadable and will not go into
> drivers/nvme/ in that form.

fwiw, I wrote that code and I just forgot to check the line length
after some last minute changes again.
It's already been reduced to 80 chars in my local tree.


Sven

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 6/9] nvme-apple: Add initial Apple SoC NVMe driver
  2022-04-04 15:59       ` Christoph Hellwig
  2022-04-04 16:03         ` Sven Peter
@ 2022-04-04 16:05         ` Hector Martin
  2022-04-04 18:29         ` Jens Axboe
  2 siblings, 0 replies; 55+ messages in thread
From: Hector Martin @ 2022-04-04 16:05 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Sven Peter, Keith Busch, Jens Axboe, Sagi Grimberg,
	Alyssa Rosenzweig, Rob Herring, Arnd Bergmann, Marc Zyngier,
	devicetree, linux-arm-kernel, linux-kernel, linux-nvme

On 05/04/2022 00.59, Christoph Hellwig wrote:
> On Tue, Apr 05, 2022 at 12:57:33AM +0900, Hector Martin wrote:
>> The kernel hard limit is 100-character lines, not 80-character lines.
>> Maintainers for existing drivers are certainly free to stick to 80 chars
>> if they like it that way, but I don't see why we should still be
>> enforcing that for new code. See bdc48fa11e46.
> 
> Because 100 is completely utterly unreadable if is not for individual
> lines like strings, and that is actually how Linus stated it in CodingStyle.
> 
> Your code as-is is completely unreadable and will not go into
> drivers/nvme/ in that form.

That line is 81 characters. I'm sure Sven doesn't mind fixing it, but
"your code as-is is completely unreadable" seems uncalled for here, and
rather unreasonable.

-- 
Hector Martin (marcan@marcan.st)
Public Key: https://mrcn.st/pub

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 6/9] nvme-apple: Add initial Apple SoC NVMe driver
  2022-04-04 16:03         ` Sven Peter
@ 2022-04-04 16:05           ` hch
  0 siblings, 0 replies; 55+ messages in thread
From: hch @ 2022-04-04 16:05 UTC (permalink / raw)
  To: Sven Peter
  Cc: hch, Hector Martin, Keith Busch, axboe, sagi, Alyssa Rosenzweig,
	Rob Herring, Arnd Bergmann, Marc Zyngier, devicetree,
	linux-arm-kernel, linux-kernel, linux-nvme

On Mon, Apr 04, 2022 at 06:03:28PM +0200, Sven Peter wrote:
> fwiw, I wrote that code and I just forgot to check the line length
> after some last minute changes again.
> It's already been reduced to 80 chars in my local tree.

Awesome!

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 6/9] nvme-apple: Add initial Apple SoC NVMe driver
  2022-04-04 15:59       ` Christoph Hellwig
  2022-04-04 16:03         ` Sven Peter
  2022-04-04 16:05         ` Hector Martin
@ 2022-04-04 18:29         ` Jens Axboe
  2022-04-05  6:24           ` Christoph Hellwig
  2 siblings, 1 reply; 55+ messages in thread
From: Jens Axboe @ 2022-04-04 18:29 UTC (permalink / raw)
  To: Christoph Hellwig, Hector Martin
  Cc: Sven Peter, Keith Busch, Sagi Grimberg, Alyssa Rosenzweig,
	Rob Herring, Arnd Bergmann, Marc Zyngier, devicetree,
	linux-arm-kernel, linux-kernel, linux-nvme

On 4/4/22 9:59 AM, Christoph Hellwig wrote:
> On Tue, Apr 05, 2022 at 12:57:33AM +0900, Hector Martin wrote:
>> The kernel hard limit is 100-character lines, not 80-character lines.
>> Maintainers for existing drivers are certainly free to stick to 80 chars
>> if they like it that way, but I don't see why we should still be
>> enforcing that for new code. See bdc48fa11e46.
> 
> Because 100 is completely utterly unreadable if is not for individual
> lines like strings, and that is actually how Linus stated it in
> CodingStyle.
> 
> Your code as-is is completely unreadable and will not go into
> drivers/nvme/ in that form.

Please reconsider how you phrase these objections. Saying the code is
"completely unreadable" because it's _1_ character over your hard limit
is just nonsense, and not a very productive way to deal with this.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 6/9] nvme-apple: Add initial Apple SoC NVMe driver
  2022-04-04 18:29         ` Jens Axboe
@ 2022-04-05  6:24           ` Christoph Hellwig
  0 siblings, 0 replies; 55+ messages in thread
From: Christoph Hellwig @ 2022-04-05  6:24 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, Hector Martin, Sven Peter, Keith Busch,
	Sagi Grimberg, Alyssa Rosenzweig, Rob Herring, Arnd Bergmann,
	Marc Zyngier, devicetree, linux-arm-kernel, linux-kernel,
	linux-nvme

On Mon, Apr 04, 2022 at 12:29:08PM -0600, Jens Axboe wrote:
> Please reconsider how you phrase these objections. Saying the code is
> "completely unreadable" because it's _1_ character over your hard limit
> is just nonsense, and not a very productive way to deal with this.

Agreed and sorry.  I actually had a piece of code in mind that formatted
all the comments way to long that really tripped me off, but it wans't
this one.  Too much going on right now.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 4/9] soc: apple: Add SART driver
  2022-04-04 14:58         ` Rob Herring
  2022-04-04 15:01           ` Hector Martin
@ 2022-04-05 15:37           ` Sven Peter
  1 sibling, 0 replies; 55+ messages in thread
From: Sven Peter @ 2022-04-05 15:37 UTC (permalink / raw)
  To: Rob Herring, Arnd Bergmann
  Cc: Hector Martin, Alyssa Rosenzweig, Keith Busch, axboe, hch, sagi,
	Marc Zyngier, DTML, Linux ARM, Linux Kernel Mailing List,
	linux-nvme

On Mon, Apr 4, 2022, at 16:58, Rob Herring wrote:
> On Sat, Apr 02, 2022 at 09:07:17PM +0200, Arnd Bergmann wrote:
>> On Sat, Apr 2, 2022 at 2:38 PM Sven Peter <sven@svenpeter.dev> wrote:
>> > On Mon, Mar 21, 2022, at 18:07, Arnd Bergmann wrote:
>> > > On Mon, Mar 21, 2022 at 5:50 PM Sven Peter <sven@svenpeter.dev> wrote:
>> > >> The NVMe co-processor on the Apple M1 uses a DMA address filter called
>> > >> SART for some DMA transactions. This adds a simple driver used to
>> > >> configure the memory regions from which DMA transactions are allowed.
>> > >>
>> > >> Co-developed-by: Hector Martin <marcan@marcan.st>
>> > >> Signed-off-by: Hector Martin <marcan@marcan.st>
>> > >> Signed-off-by: Sven Peter <sven@svenpeter.dev>
>> > >
>> > > Can you add some explanation about why this uses a custom interface
>> > > instead of hooking into the dma_map_ops?
>> >
>> > Sure.
>> > In a perfect world this would just be an IOMMU implementation but since
>> > SART can't create any real IOVA space using pagetables it doesn't fit
>> > inside that subsytem.
>> >
>> > In a slightly less perfect world I could just implement dma_map_ops here
>> > but that won't work either because not all DMA buffers of the NVMe
>> > device have to go through SART and those allocations happen
>> > inside the same device and would use the same dma_map_ops.
>> >
>> > The NVMe controller has two separate DMA filters:
>> >
>> >    - NVMMU, which must be set up for any command that uses PRPs and
>> >      ensures that the DMA transactions only touch the pages listed
>> >      inside the PRP structure. NVMMU itself is tightly coupled
>> >      to the NVMe controller: The list of allowed pages is configured
>> >      based on command's tag id and even commands that require no DMA
>> >      transactions must be listed inside NVMMU before they are started.
>> >    - SART, which must be set up for some shared memory buffers (e.g.
>> >      log messages from the NVMe firmware) and for some NVMe debug
>> >      commands that don't use PRPs.
>> >      SART is only loosely coupled to the NVMe controller and could
>> >      also be used together with other devices. It's also the only
>> >      thing that changed between M1 and M1 Pro/Max/Ultra and that's
>> >      why I decided to separate it from the NVMe driver.
>> >
>> > I'll add this explanation to the commit message.
>> 
>> Ok, thanks.
>> 
>> > >> +static void sart2_get_entry(struct apple_sart *sart, int index, u8 *flags,
>> > >> +                           phys_addr_t *paddr, size_t *size)
>> > >> +{
>> > >> +       u32 cfg = readl_relaxed(sart->regs + APPLE_SART2_CONFIG(index));
>> > >> +       u32 paddr_ = readl_relaxed(sart->regs + APPLE_SART2_PADDR(index));
>> > >
>> > > Why do you use the _relaxed() accessors here and elsewhere in the driver?
>> >
>> > This device itself doesn't do any DMA transactions so it needs no memory
>> > synchronization barriers. Only the consumer (i.e. rtkit and nvme) read/write
>> > from/to these buffers (multiple times) and they have the required barriers
>> > in place whenever they are used.
>> >
>> > These buffers so far are only allocated at probe time though so even using
>> > the normal writel/readl here won't hurt performance at all. I can just use
>> > those if you prefer or alternatively add a comment why _relaxed is fine here.
>> >
>> > This is a bit similar to the discussion for the pinctrl series last year [1].
>> 
>> I think it's better to only use the _relaxed version where it actually helps,
>> with a comment about it, and use the normal version elsewhere, in
>> particular in functions that you have copied from the normal nvme driver.
>> I had tried to compare some of your code with the other version and
>> was rather confused by that.
>
> Oh good, I tell folks the opposite (and others do too). We don't accept 
> random explicit barriers without explanation, but implicit ones are 
> okay? The resulting code on arm32 is also pretty horrible with the L2x0 
> and OMAP sync hooks not that that matters here.
>
> I don't really care too much which way we go, but we should document one 
> rule and follow that.

I don't have a strong opinion either. Arnd's approach is currently documented
in Documentation/driver-api/device-io.rst fwiw:

  On architectures that require an expensive barrier for serializing against
  DMA, these "relaxed" versions of the MMIO accessors only serialize against
  each other, but contain a less expensive barrier operation. A device driver
  might use these in a particularly performance sensitive fast path, with a
  comment that explains why the usage in a specific location is safe without
  the extra barriers.


Sven

^ permalink raw reply	[flat|nested] 55+ messages in thread

end of thread, other threads:[~2022-04-05 22:47 UTC | newest]

Thread overview: 55+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-21 16:50 [PATCH 0/9] Apple M1 (Pro/Max) NVMe driver Sven Peter
2022-03-21 16:50 ` [PATCH 1/9] dt-bindings: soc: apple: Add Apple SART Sven Peter
2022-03-31 21:23   ` Rob Herring
2022-04-02 12:58     ` Sven Peter
2022-03-21 16:50 ` [PATCH 2/9] dt-bindings: soc: apple: Add ANS NVMe Sven Peter
2022-03-23 11:14   ` Krzysztof Kozlowski
2022-04-02 13:05     ` Sven Peter
2022-04-02 16:06       ` Krzysztof Kozlowski
2022-03-21 16:50 ` [PATCH 3/9] soc: apple: Always include Makefile Sven Peter
2022-03-21 16:50 ` [PATCH 4/9] soc: apple: Add SART driver Sven Peter
2022-03-21 17:07   ` Arnd Bergmann
2022-04-02 12:38     ` Sven Peter
2022-04-02 19:07       ` Arnd Bergmann
2022-04-04 14:58         ` Rob Herring
2022-04-04 15:01           ` Hector Martin
2022-04-05 15:37           ` Sven Peter
2022-03-21 17:17   ` Alyssa Rosenzweig
2022-04-02 12:40     ` Sven Peter
2022-03-21 16:50 ` [PATCH 5/9] soc: apple: Add RTKit IPC library Sven Peter
2022-03-22 13:13   ` Arnd Bergmann
2022-03-22 17:41     ` Robin Murphy
2022-04-02 12:56     ` Sven Peter
2022-04-02 18:30       ` Arnd Bergmann
2022-04-03 10:45         ` Sven Peter
2022-03-22 17:23   ` Alyssa Rosenzweig
2022-04-02 12:50     ` Sven Peter
2022-03-23 11:19   ` Krzysztof Kozlowski
2022-04-02 13:51     ` Sven Peter
2022-04-02 16:08       ` Krzysztof Kozlowski
2022-04-04 15:02       ` Rob Herring
2022-04-04 15:47         ` Hector Martin
2022-03-21 16:50 ` [PATCH 6/9] nvme-apple: Add initial Apple SoC NVMe driver Sven Peter
2022-03-21 17:01   ` Keith Busch
2022-04-02 13:10     ` Sven Peter
2022-03-22 13:38   ` Arnd Bergmann
2022-04-02 13:34     ` Sven Peter
2022-04-02 13:58       ` Janne Grunau
2022-04-02 14:02         ` Sven Peter
2022-04-02 21:26       ` Arnd Bergmann
2022-03-24  6:16   ` Christoph Hellwig
2022-04-02 12:47     ` Sven Peter
2022-04-04 15:57     ` Hector Martin
2022-04-04 15:59       ` Christoph Hellwig
2022-04-04 16:03         ` Sven Peter
2022-04-04 16:05           ` hch
2022-04-04 16:05         ` Hector Martin
2022-04-04 18:29         ` Jens Axboe
2022-04-05  6:24           ` Christoph Hellwig
2022-03-21 16:50 ` [PATCH 7/9] nvme-apple: Serialize command issue Sven Peter
2022-03-24  6:16   ` Christoph Hellwig
2022-04-02 12:42     ` Sven Peter
2022-03-21 16:50 ` [PATCH 8/9] nvme-apple: Add support for multiple power domains Sven Peter
2022-03-21 16:50 ` [PATCH 9/9] nvme-apple: Add support for suspend/resume Sven Peter
2022-03-24  6:17   ` Christoph Hellwig
2022-03-22 17:26 ` [PATCH 0/9] Apple M1 (Pro/Max) NVMe driver Alyssa Rosenzweig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).