IOMMU Archive on lore.kernel.org
* [RFC PATCH 0/2] Support for TI Page-based Address Translator
@ 2019-06-07 19:35 Andrew F. Davis via iommu
  2019-06-07 19:35 ` [RFC PATCH 1/2] dt-bindings: soc: ti: Add TI PAT bindings Andrew F. Davis via iommu
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Andrew F. Davis via iommu @ 2019-06-07 19:35 UTC (permalink / raw)
  To: Rob Herring, Mark Rutland, Santosh Shilimkar, Will Deacon,
	Robin Murphy, Joerg Roedel, David Airlie, Daniel Vetter,
	Tero Kristo, William Mills, Tomi Valkeinen, Sumit Semwal,
	John Stultz
  Cc: devicetree, linux-kernel, dri-devel, Andrew F . Davis,
	linaro-mm-sig, iommu, linux-arm-kernel, linux-media

Hello all,

So I've got a new IP on our new SoC that I'm looking to make use of and would
like some help figuring out which framework best matches its function. The
IP is called a "Page-based Address Translator" or PAT. A PAT instance
(there are 5 of these things on our J721e device[0]) is basically a
really simple IOMMU sitting on the interconnect between the device bus
and what is effectively our northbridge, called
MSMC (DRAM/SRAM/L3-Cache/Coherency controller).

Simplified it looks about like this:

         CPUs
          |
DRAM --- MSMC --- SRAM/L3
          |
        NAVSS - (PATs)
          |
  --- Device Bus ---------
 |      |      |          |
Device  Device  Device   etc..

Each PAT has a window in high memory (around the 0x48_0000_0000 area);
any transaction with an address targeting that window is routed into
the PAT. The PAT then does a simple calculation based on how far into
the window the address is and the current page size, does a lookup in
an internally held table of translations, then sends the transaction
back out on the interconnect with a new address. Usually this address
points somewhere in DRAM, but it can target some other device or even
loop back into the PAT (I'm not sure there is a valid use-case for
that, but it is a point of interest).

My gut reaction is that this is an IOMMU and belongs in the IOMMU
subsystem. But there are a couple of oddities that make me less sure it
is really suitable for the IOMMU framework. First, it doesn't sit in
front of any particular device; it sits in front of *all* devices, which
means every device would claim it as an IOMMU parent even though many
devices also have a traditional IOMMU connected. Second, there is only a
limited window of address space per PAT, so we will occasionally get
fragmentation and allocation failures; in this way it looks to me more
like an AGP GART. Third, the window is in high memory, so unlike some
IOMMU devices which can be used to allow DMA to high-mem from low-mem
only devices, PAT can't be used for that. Lastly, it doesn't provide any
isolation: if an access does not target the PAT window, the PAT is
simply not used (that is not to say we don't have isolation, just that
it is taken care of by other parts of the interconnect).

This means, to me, that PAT has one main purpose: presenting
physically-contiguous views of scattered pages in system memory for DMA.
But it does that really well; the whole translation table is held in a
PAT-internal SRAM, giving 1-bus-cycle latency at full bus bandwidth.

So what are my options here, is IOMMU the right way to go or not?

Looking around the kernel I also see the char dev AGP/GART interface,
which looks like a good fit, but also looks quite dated and is, my
guess, deprecated at this point. Moving right along..

Another thing I saw is the upstream support for the DMM device[1]
available in some OMAPx/AM57x SoCs. I'm a little more familiar with this
device. The DMM is a bundle of IPs, and in fact one of them is called
"PAT" and does basically the same thing this incarnation of "PAT" does.
Its upstream integration design is a bit questionable, unfortunately:
the DMM support was integrated into the OMAPDRM display driver, which
did make some sense given its support for rotation (using the TILER IP
contained in the DMM). The issue is that the DMM/TILER/PAT IP was not
part of our display IP but instead out at the end of the shared device
bus, inside the external memory controller. Like this new PAT, any IP
could make use of it, but only the display framework could actually
provide buffers backed by it. This meant, for instance, that if we
wanted to decode a video buffer using our video decoder we would have
to allocate from the DRM framework and then pass that over to the V4L2
system. This doesn't make much sense and required user-space to know
about this odd situation and allocate from the right spot, or else use
up valuable CMA space or waste memory with dedicated carveouts.

Another idea would be to have this as a special central allocator
(exposed through DMA-BUF heaps[2] or ION) that would give out normal
system memory as a DMA-BUF but remap it with PAT if a device that only
supports contiguous memory tries to attach/map that DMA-BUF.

One last option would be to allow user-space to choose to make the buffer
contiguous when it needs. That's what the driver in this series allows.
We expose a remapping device, user-space passes it a non-contiguous
DMA-BUF handle and it passes a contiguous one back. Simple as that.

So how do we use this? Let's take Android for example: we don't know at
allocation time whether a rendering buffer will end up going back into
the GPU for further processing or be consumed directly by the display.
This is a problem for us as our GPU can work with non-contiguous buffers
but our display cannot, so any buffer that could possibly go to the
display at some point currently needs to be allocated as contiguous from
the start, which leads to a lot of unneeded use of carveout/CMA memory.
With this driver, on the other hand, we allocate regular non-contiguous
system memory (again using DMA-BUF heaps, but ION could work here too);
then, only when a buffer is about to be sent to the display, we pass its
DMA-BUF handle to our driver here and hand the handle we get back to the
display instead.

As said, it is probably not the ideal solution but it does work and was
used for some early testing of the IP.

Well, sorry for the wall of text.
Any and all suggestions very welcome and appreciated.

Thanks,
Andrew

[0] http://www.ti.com/lit/pdf/spruil1
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/omapdrm/omap_dmm_tiler.c
[2] https://lkml.org/lkml/2019/6/6/1211

Andrew F. Davis (2):
  dt-bindings: soc: ti: Add TI PAT bindings
  soc: ti: Add Support for the TI Page-based Address Translator (PAT)

 .../devicetree/bindings/misc/ti,pat.txt       |  34 ++
 drivers/soc/ti/Kconfig                        |   9 +
 drivers/soc/ti/Makefile                       |   1 +
 drivers/soc/ti/ti-pat.c                       | 569 ++++++++++++++++++
 include/uapi/linux/ti-pat.h                   |  44 ++
 5 files changed, 657 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/misc/ti,pat.txt
 create mode 100644 drivers/soc/ti/ti-pat.c
 create mode 100644 include/uapi/linux/ti-pat.h

-- 
2.17.1

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


* [RFC PATCH 1/2] dt-bindings: soc: ti: Add TI PAT bindings
  2019-06-07 19:35 [RFC PATCH 0/2] Support for TI Page-based Address Translator Andrew F. Davis via iommu
@ 2019-06-07 19:35 ` Andrew F. Davis via iommu
  2019-06-07 19:35 ` [RFC PATCH 2/2] soc: ti: Add Support for the TI Page-based Address Translator (PAT) Andrew F. Davis via iommu
  2019-07-02 15:49 ` [RFC PATCH 0/2] Support for TI Page-based Address Translator Robin Murphy
  2 siblings, 0 replies; 5+ messages in thread
From: Andrew F. Davis via iommu @ 2019-06-07 19:35 UTC (permalink / raw)
  To: Rob Herring, Mark Rutland, Santosh Shilimkar, Will Deacon,
	Robin Murphy, Joerg Roedel, David Airlie, Daniel Vetter,
	Tero Kristo, William Mills, Tomi Valkeinen, Sumit Semwal,
	John Stultz
  Cc: devicetree, linux-kernel, dri-devel, Andrew F . Davis,
	linaro-mm-sig, iommu, linux-arm-kernel, linux-media

This patch adds the bindings for the Page-based Address Translator (PAT)
present on various TI SoCs. A Page-based Address Translator (PAT) device
performs address translation using tables stored in an internal SRAM.
Each PAT supports a set number of pages, each occupying a programmable
4KB, 16KB, 64KB, or 1MB of addresses in a window for which an incoming
transaction will be translated.

Signed-off-by: Andrew F. Davis <afd@ti.com>
---
 .../devicetree/bindings/misc/ti,pat.txt       | 34 +++++++++++++++++++
 1 file changed, 34 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/misc/ti,pat.txt

diff --git a/Documentation/devicetree/bindings/misc/ti,pat.txt b/Documentation/devicetree/bindings/misc/ti,pat.txt
new file mode 100644
index 000000000000..fac20d45ad4c
--- /dev/null
+++ b/Documentation/devicetree/bindings/misc/ti,pat.txt
@@ -0,0 +1,34 @@
+Texas Instruments Page-based Address Translator (PAT) driver binding
+--------------------------------------------------------------------
+
+A Page-based Address Translator (PAT) device performs address translation
+using tables stored in an internal SRAM. Each PAT supports a set number of
+pages, each occupying a programmable 4KB, 16KB, 64KB, or 1MB of addresses
+in a window for which an incoming transaction will be translated.
+
+TI-PAT controller Device Node
+=============================
+
+The TI-PAT node describes the Texas Instrument's Page-based Address
+Translator (PAT).
+
+Required properties:
+-------------------
+- compatible: should be "ti,j721e-pat"
+- reg-names:
+	mmrs - Memory mapped registers region
+	table - Location of the table of translation pages
+	window - Window of memory addresses translated by this PAT
+- reg: register addresses corresponding to the above
+
+Example:
+
+navss_pat0: pat@31010000 {
+	compatible = "ti,j721e-pat";
+	reg = <0x00 0x31010000 0x00 0x00000100>,
+	      <0x00 0x36400000 0x00 0x00040000>,
+	      <0x48 0x00000000 0x00 0x40000000>;
+	reg-names = "mmrs",
+	            "table",
+	            "window";
+};
-- 
2.17.1



* [RFC PATCH 2/2] soc: ti: Add Support for the TI Page-based Address Translator (PAT)
  2019-06-07 19:35 [RFC PATCH 0/2] Support for TI Page-based Address Translator Andrew F. Davis via iommu
  2019-06-07 19:35 ` [RFC PATCH 1/2] dt-bindings: soc: ti: Add TI PAT bindings Andrew F. Davis via iommu
@ 2019-06-07 19:35 ` Andrew F. Davis via iommu
  2019-06-18  9:07   ` Tero Kristo via iommu
  2019-07-02 15:49 ` [RFC PATCH 0/2] Support for TI Page-based Address Translator Robin Murphy
  2 siblings, 1 reply; 5+ messages in thread
From: Andrew F. Davis via iommu @ 2019-06-07 19:35 UTC (permalink / raw)
  To: Rob Herring, Mark Rutland, Santosh Shilimkar, Will Deacon,
	Robin Murphy, Joerg Roedel, David Airlie, Daniel Vetter,
	Tero Kristo, William Mills, Tomi Valkeinen, Sumit Semwal,
	John Stultz
  Cc: devicetree, linux-kernel, dri-devel, Andrew F . Davis,
	linaro-mm-sig, iommu, linux-arm-kernel, linux-media

This patch adds a driver for the Page-based Address Translator (PAT)
present on various TI SoCs. A PAT device performs address translation
using tables stored in an internal SRAM. Each PAT supports a set number
of pages, each occupying a programmable 4KB, 16KB, 64KB, or 1MB of
addresses in a window for which an incoming transaction will be
translated.

Signed-off-by: Andrew F. Davis <afd@ti.com>
---
 drivers/soc/ti/Kconfig      |   9 +
 drivers/soc/ti/Makefile     |   1 +
 drivers/soc/ti/ti-pat.c     | 569 ++++++++++++++++++++++++++++++++++++
 include/uapi/linux/ti-pat.h |  44 +++
 4 files changed, 623 insertions(+)
 create mode 100644 drivers/soc/ti/ti-pat.c
 create mode 100644 include/uapi/linux/ti-pat.h

diff --git a/drivers/soc/ti/Kconfig b/drivers/soc/ti/Kconfig
index f0be35d3dcba..b838ae74d01f 100644
--- a/drivers/soc/ti/Kconfig
+++ b/drivers/soc/ti/Kconfig
@@ -86,4 +86,13 @@ config TI_SCI_INTA_MSI_DOMAIN
 	help
 	  Driver to enable Interrupt Aggregator specific MSI Domain.
 
+config TI_PAT
+	tristate "TI PAT DMA-BUF exporter"
+	select REGMAP
+	help
+	  Driver for TI Page-based Address Translator (PAT). This driver
+	  provides the an API allowing the remapping of a non-contiguous
+	  DMA-BUF into a contiguous one that is sutable for devices needing
+	  coniguous memory.
+
 endif # SOC_TI
diff --git a/drivers/soc/ti/Makefile b/drivers/soc/ti/Makefile
index b3868d392d4f..1369642b40c3 100644
--- a/drivers/soc/ti/Makefile
+++ b/drivers/soc/ti/Makefile
@@ -9,3 +9,4 @@ obj-$(CONFIG_AMX3_PM)			+= pm33xx.o
 obj-$(CONFIG_WKUP_M3_IPC)		+= wkup_m3_ipc.o
 obj-$(CONFIG_TI_SCI_PM_DOMAINS)		+= ti_sci_pm_domains.o
 obj-$(CONFIG_TI_SCI_INTA_MSI_DOMAIN)	+= ti_sci_inta_msi.o
+obj-$(CONFIG_TI_PAT)			+= ti-pat.o
diff --git a/drivers/soc/ti/ti-pat.c b/drivers/soc/ti/ti-pat.c
new file mode 100644
index 000000000000..7359ea0f7ccf
--- /dev/null
+++ b/drivers/soc/ti/ti-pat.c
@@ -0,0 +1,569 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * TI PAT mapped DMA-BUF memory re-exporter
+ *
+ * Copyright (C) 2018-2019 Texas Instruments Incorporated - http://www.ti.com/
+ *	Andrew F. Davis <afd@ti.com>
+ */
+
+#include <linux/fs.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/mod_devicetable.h>
+#include <linux/uaccess.h>
+#include <linux/miscdevice.h>
+#include <linux/regmap.h>
+#include <linux/dma-buf.h>
+#include <linux/genalloc.h>
+#include <linux/vmalloc.h>
+#include <linux/slab.h>
+
+#include <linux/ti-pat.h>
+
+#define TI_PAT_DRIVER_NAME	"ti-pat"
+
+/* TI PAT MMRS registers */
+#define TI_PAT_MMRS_PID		0x0 /* Revision Register */
+#define TI_PAT_MMRS_CONFIG	0x4 /* Config Register */
+#define TI_PAT_MMRS_CONTROL	0x10 /* Control Register */
+
+/* TI PAT CONTROL register field values */
+#define TI_PAT_CONTROL_ARB_MODE_UF	0x0 /* Updates first */
+#define TI_PAT_CONTROL_ARB_MODE_RR	0x2 /* Round-robin */
+
+#define TI_PAT_CONTROL_PAGE_SIZE_4KB	0x0
+#define TI_PAT_CONTROL_PAGE_SIZE_16KB	0x1
+#define TI_PAT_CONTROL_PAGE_SIZE_64KB	0x2
+#define TI_PAT_CONTROL_PAGE_SIZE_1MB	0x3
+
+static unsigned int ti_pat_page_sizes[] = {
+	[TI_PAT_CONTROL_PAGE_SIZE_4KB]  = 4 * 1024,
+	[TI_PAT_CONTROL_PAGE_SIZE_16KB] = 16 * 1024,
+	[TI_PAT_CONTROL_PAGE_SIZE_64KB] = 64 * 1024,
+	[TI_PAT_CONTROL_PAGE_SIZE_1MB]  = 1024 * 1024,
+};
+
+enum ti_pat_mmrs_fields {
+	/* Revision */
+	F_PID_MAJOR,
+	F_PID_MINOR,
+
+	/* Controls */
+	F_CONTROL_ARB_MODE,
+	F_CONTROL_PAGE_SIZE,
+	F_CONTROL_REPLACE_OID_EN,
+	F_CONTROL_EN,
+
+	/* sentinel */
+	F_MAX_FIELDS
+};
+
+static const struct reg_field ti_pat_mmrs_reg_fields[] = {
+	/* Revision */
+	[F_PID_MAJOR]			= REG_FIELD(TI_PAT_MMRS_PID, 8, 10),
+	[F_PID_MINOR]			= REG_FIELD(TI_PAT_MMRS_PID, 0, 5),
+	/* Controls */
+	[F_CONTROL_ARB_MODE]		= REG_FIELD(TI_PAT_MMRS_CONTROL, 6, 7),
+	[F_CONTROL_PAGE_SIZE]		= REG_FIELD(TI_PAT_MMRS_CONTROL, 4, 5),
+	[F_CONTROL_REPLACE_OID_EN]	= REG_FIELD(TI_PAT_MMRS_CONTROL, 1, 1),
+	[F_CONTROL_EN]			= REG_FIELD(TI_PAT_MMRS_CONTROL, 0, 0),
+};
+
+/**
+ * struct ti_pat_data - PAT device instance data
+ * @dev: PAT device structure
+ * @mdev: misc device
+ * @mmrs_map: Register map of MMRS region
+ * @table_base: Base address of TABLE region
+ */
+struct ti_pat_data {
+	struct device *dev;
+	struct miscdevice mdev;
+	struct regmap *mmrs_map;
+	struct regmap_field *mmrs_fields[F_MAX_FIELDS];
+	void __iomem *table_base;
+	unsigned int page_count;
+	unsigned int page_size;
+	phys_addr_t window_base;
+	struct gen_pool *pool;
+};
+
+struct ti_pat_dma_buf_attachment {
+	struct device *dev;
+	struct sg_table *table;
+	struct ti_pat_buffer *buffer;
+	struct list_head list;
+};
+
+struct ti_pat_buffer {
+	struct ti_pat_data *pat;
+	struct dma_buf *i_dma_buf;
+	size_t size;
+	unsigned long offset;
+	struct dma_buf *e_dma_buf;
+
+	struct dma_buf_attachment *attachment;
+	struct sg_table *sgt;
+
+	struct list_head attachments;
+	int map_count;
+
+	struct mutex lock;
+};
+
+static const struct regmap_config ti_pat_regmap_config = {
+	.reg_bits = 32,
+	.val_bits = 32,
+	.reg_stride = 4,
+};
+
+static int ti_pat_dma_buf_attach(struct dma_buf *dmabuf,
+				 struct dma_buf_attachment *attachment)
+{
+	struct ti_pat_dma_buf_attachment *a;
+	struct ti_pat_buffer *buffer = dmabuf->priv;
+
+	a = kzalloc(sizeof(*a), GFP_KERNEL);
+	if (!a)
+		return -ENOMEM;
+
+	a->dev = attachment->dev;
+	a->buffer = buffer;
+	INIT_LIST_HEAD(&a->list);
+
+	a->table = kzalloc(sizeof(*a->table), GFP_KERNEL);
+	if (!a->table) {
+		kfree(a);
+		return -ENOMEM;
+	}
+
+	if (sg_alloc_table(a->table, 1, GFP_KERNEL)) {
+		kfree(a->table);
+		kfree(a);
+		return -ENOMEM;
+	}
+
+	sg_set_page(a->table->sgl, pfn_to_page(PFN_DOWN(buffer->offset)), buffer->size, 0);
+
+	attachment->priv = a;
+
+	mutex_lock(&buffer->lock);
+	/* First time attachment we attach to parent */
+	if (list_empty(&buffer->attachments)) {
+		buffer->attachment = dma_buf_attach(buffer->i_dma_buf, buffer->pat->dev);
+		if (IS_ERR(buffer->attachment)) {
+			dev_err(buffer->pat->dev, "Unable to attach to parent DMA-BUF\n");
+			mutex_unlock(&buffer->lock);
+			kfree(a->table);
+			kfree(a);
+			return PTR_ERR(buffer->attachment);
+		}
+	}
+	list_add(&a->list, &buffer->attachments);
+	mutex_unlock(&buffer->lock);
+
+	return 0;
+}
+
+static void ti_pat_dma_buf_detatch(struct dma_buf *dmabuf,
+				   struct dma_buf_attachment *attachment)
+{
+	struct ti_pat_dma_buf_attachment *a = attachment->priv;
+	struct ti_pat_buffer *buffer = dmabuf->priv;
+
+	mutex_lock(&buffer->lock);
+	list_del(&a->list);
+	/* Last attachment we detach from parent */
+	if (list_empty(&buffer->attachments)) {
+		dma_buf_detach(buffer->i_dma_buf, buffer->attachment);
+		buffer->attachment = NULL;
+	}
+	mutex_unlock(&buffer->lock);
+
+	kfree(a);
+}
+
+static void ti_pat_set_page(struct ti_pat_data *pat, size_t page_id, dma_addr_t dma_address)
+{
+	unsigned int j = page_id >> 8;
+	unsigned int k = page_id & 0xff;
+	dma_addr_t offset = (j * 0x1000) + (k * 0x8);
+	void __iomem *table_base_l = pat->table_base + offset;
+	void __iomem *table_base_h = table_base_l + 0x4;
+
+	u32 base_l = dma_address >> 12;
+	u32 base_h = (dma_address >> 44) & GENMASK(3, 0);
+
+	dev_dbg(pat->dev, "Enabling PAT index: %zu pointing to %pad\n", page_id, &dma_address);
+
+	writel(base_l, table_base_l);
+	writel(BIT(31) | base_h, table_base_h);
+}
+
+static void ti_pat_unset_page(struct ti_pat_data *pat, size_t page_id)
+{
+	unsigned int j = page_id >> 8;
+	unsigned int k = page_id & 0xff;
+	dma_addr_t offset = (j * 0x1000) + (k * 0x8);
+	void __iomem *table_base_l = pat->table_base + offset;
+	void __iomem *table_base_h = table_base_l + 0x4;
+
+	dev_dbg(pat->dev, "Disabling PAT index: %zu\n", page_id);
+
+	writel(0, table_base_h);
+}
+
+static struct sg_table *ti_pat_map_dma_buf(struct dma_buf_attachment *attachment,
+					   enum dma_data_direction direction)
+{
+	struct ti_pat_dma_buf_attachment *a = attachment->priv;
+	struct ti_pat_buffer *buffer = a->buffer;
+	struct ti_pat_data *pat = buffer->pat;
+	struct sg_table *table = a->table;
+	struct scatterlist *s;
+	unsigned int i, s_len;
+	size_t page_id;
+	int ret;
+
+	mutex_lock(&buffer->lock);
+	/* First time mapping we map to parent */
+	if (!buffer->map_count) {
+		buffer->sgt = dma_buf_map_attachment(buffer->attachment, DMA_BIDIRECTIONAL);
+		if (IS_ERR(buffer->sgt)) {
+			dev_err(pat->dev, "Unable to map parent DMA-BUF\n");
+			mutex_unlock(&buffer->lock);
+			return buffer->sgt;
+		}
+
+		/* And program PAT area for this set of pages */
+		page_id = (buffer->offset - pat->window_base) / pat->page_size;
+		for_each_sg(buffer->sgt->sgl, s, buffer->sgt->nents, i) {
+			if (s->offset) {
+				dev_err(pat->dev, "Cannot use offset buffers\n");
+				ret = -EINVAL;
+				goto unmap;
+			}
+
+			if (s->length % pat->page_size) {
+				dev_err(pat->dev, "Cannot use buffers not a multiple of page size\n");
+				ret = -EINVAL;
+				goto unmap;
+			}
+
+			for (s_len = 0; s_len < s->length; s_len += pat->page_size)
+				ti_pat_set_page(pat, page_id++, s->dma_address + s_len);
+		}
+	}
+	buffer->map_count++;
+	mutex_unlock(&buffer->lock);
+
+	/* Map the attached device's table to get DMA addresses */
+	if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents, direction, DMA_ATTR_SKIP_CPU_SYNC))
+		return ERR_PTR(-ENOMEM);
+
+	return table;
+
+unmap:
+	dma_buf_unmap_attachment(buffer->attachment, buffer->sgt, DMA_BIDIRECTIONAL);
+	mutex_unlock(&buffer->lock);
+	return ERR_PTR(ret);
+}
+
+static void ti_pat_unmap_dma_buf(struct dma_buf_attachment *attachment,
+				 struct sg_table *table,
+				 enum dma_data_direction direction)
+{
+	struct ti_pat_dma_buf_attachment *a = attachment->priv;
+	struct ti_pat_buffer *buffer = a->buffer;
+	struct ti_pat_data *pat = buffer->pat;
+
+	/* Unmap the attached device's table */
+	dma_unmap_sg_attrs(attachment->dev, table->sgl, table->nents, direction, DMA_ATTR_SKIP_CPU_SYNC);
+
+	mutex_lock(&buffer->lock);
+	buffer->map_count--;
+	/* Last mapping we unmap from parent */
+	if (!buffer->map_count) {
+		/* Disable PAT pages for this area */
+		size_t page_start = (buffer->offset - pat->window_base) / pat->page_size;
+		size_t page_end = page_start + (buffer->size / pat->page_size);
+		for (; page_start < page_end; page_start++)
+			ti_pat_unset_page(pat, page_start);
+
+		dma_buf_unmap_attachment(buffer->attachment, buffer->sgt, DMA_BIDIRECTIONAL);
+		buffer->sgt = NULL;
+	}
+	mutex_unlock(&buffer->lock);
+}
+
+static void ti_pat_dma_buf_release(struct dma_buf *dmabuf)
+{
+	struct ti_pat_buffer *buffer = dmabuf->priv;
+
+	if (buffer->attachment && buffer->sgt)
+		dma_buf_unmap_attachment(buffer->attachment, buffer->sgt, DMA_BIDIRECTIONAL);
+	if (buffer->i_dma_buf && !IS_ERR_OR_NULL(buffer->attachment))
+		dma_buf_detach(buffer->i_dma_buf, buffer->attachment);
+	if (buffer->i_dma_buf)
+		dma_buf_put(buffer->i_dma_buf);
+
+	if (buffer->offset)
+		gen_pool_free(buffer->pat->pool, buffer->offset, buffer->size);
+
+	kfree(buffer);
+}
+
+static void *ti_pat_dma_buf_map(struct dma_buf *dma_buf, unsigned long page_num)
+{
+	return NULL;
+}
+
+static int ti_pat_dma_buf_mmap(struct dma_buf *dma_buf, struct vm_area_struct *vma)
+{
+	return -EINVAL;
+}
+
+static const struct dma_buf_ops dma_buf_ops = {
+	.attach = ti_pat_dma_buf_attach,
+	.detach = ti_pat_dma_buf_detatch,
+
+	.map_dma_buf = ti_pat_map_dma_buf,
+	.unmap_dma_buf = ti_pat_unmap_dma_buf,
+
+	.release = ti_pat_dma_buf_release,
+
+	.map = ti_pat_dma_buf_map,
+	.mmap = ti_pat_dma_buf_mmap,
+};
+
+int ti_pat_export(struct ti_pat_data *pat,
+		  struct dma_buf *i_dma_buf,
+		  struct dma_buf **e_dma_buf)
+{
+	struct ti_pat_buffer *buffer;
+	DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
+	int ret;
+
+	buffer = kzalloc(sizeof(*buffer), GFP_KERNEL);
+	if (!buffer)
+		return -ENOMEM;
+
+	buffer->pat = pat;
+	buffer->i_dma_buf = i_dma_buf;
+	buffer->size = buffer->i_dma_buf->size;
+	mutex_init(&buffer->lock);
+	INIT_LIST_HEAD(&buffer->attachments);
+	buffer->map_count = 0;
+
+	/* Reserve PAT space */
+	buffer->offset = gen_pool_alloc(buffer->pat->pool, buffer->size);
+	if (!buffer->offset) {
+		ret = -ENOMEM;
+		goto free_buffer;
+	}
+
+	exp_info.ops = &dma_buf_ops;
+	exp_info.size = buffer->size;
+	exp_info.flags = O_RDWR;
+	exp_info.priv = buffer;
+
+	*e_dma_buf = dma_buf_export(&exp_info);
+	if (IS_ERR(*e_dma_buf)) {
+		ret = PTR_ERR(*e_dma_buf);
+		goto free_pool;
+	}
+
+	return 0;
+
+free_pool:
+	gen_pool_free(buffer->pat->pool, buffer->offset, buffer->size);
+free_buffer:
+	kfree(buffer);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(ti_pat_export);
+
+static long ti_pat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+	struct ti_pat_data *pat = container_of(file->private_data, struct ti_pat_data, mdev);
+
+	switch (cmd) {
+	case TI_PAT_IOC_EXPORT:
+	{
+		struct ti_pat_export_data export;
+		struct dma_buf *i_dma_buf;
+		struct dma_buf *e_dma_buf;
+		int ret;
+
+		if (_IOC_SIZE(cmd) > sizeof(export))
+			return -EINVAL;
+
+		if (copy_from_user(&export, (void __user *)arg, _IOC_SIZE(cmd)))
+			return -EFAULT;
+
+		i_dma_buf = dma_buf_get(export.fd);
+		if (IS_ERR(i_dma_buf))
+			return PTR_ERR(i_dma_buf);
+
+		ret = ti_pat_export(pat, i_dma_buf, &e_dma_buf);
+		if (ret) {
+			dma_buf_put(i_dma_buf);
+			return ret;
+		}
+
+		export.fd = dma_buf_fd(e_dma_buf, O_CLOEXEC);
+		if (export.fd < 0) {
+			dma_buf_put(e_dma_buf);
+			dma_buf_put(i_dma_buf);
+			return export.fd;
+		}
+
+		if (copy_to_user((void __user *)arg, &export, _IOC_SIZE(cmd)))
+			return -EFAULT;
+
+		break;
+	}
+	default:
+		return -ENOTTY;
+	}
+
+	return 0;
+}
+
+static const struct file_operations ti_pat_fops = {
+	.owner          = THIS_MODULE,
+	.unlocked_ioctl = ti_pat_ioctl,
+#ifdef CONFIG_COMPAT
+	.compat_ioctl	= ti_pat_ioctl,
+#endif
+};
+
+static const struct of_device_id ti_pat_of_match[] = {
+	{ .compatible = "ti,j721e-pat", },
+	{ /* sentinel */ },
+};
+MODULE_DEVICE_TABLE(of, ti_pat_of_match);
+
+static int ti_pat_probe(struct platform_device *pdev)
+{
+	struct ti_pat_data *pat;
+	struct resource *res;
+	void __iomem *base;
+	unsigned int revision_major;
+	unsigned int revision_minor;
+	resource_size_t size;
+	size_t page_size;
+	int i, ret;
+
+	pat = devm_kzalloc(&pdev->dev, sizeof(*pat), GFP_KERNEL);
+	if (!pat)
+		return -ENOMEM;
+	platform_set_drvdata(pdev, pat);
+	pat->dev = &pdev->dev;
+
+	/* MMRS */
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	base = devm_ioremap_resource(pat->dev, res);
+	if (IS_ERR(base))
+		return PTR_ERR(base);
+
+	pat->mmrs_map = devm_regmap_init_mmio(pat->dev, base, &ti_pat_regmap_config);
+	if (IS_ERR(pat->mmrs_map)) {
+		dev_err(pat->dev, "Unable to allocate MMRS register map\n");
+		return PTR_ERR(pat->mmrs_map);
+	}
+
+	for (i = 0; i < F_MAX_FIELDS; i++) {
+		pat->mmrs_fields[i] = devm_regmap_field_alloc(pat->dev, pat->mmrs_map, ti_pat_mmrs_reg_fields[i]);
+		if (IS_ERR(pat->mmrs_fields[i])) {
+			dev_err(pat->dev, "Unable to allocate Regmap fields\n");
+			return PTR_ERR(pat->mmrs_fields[i]);
+		}
+	}
+
+	ret = regmap_read(pat->mmrs_map, TI_PAT_MMRS_CONFIG, &pat->page_count);
+	if (ret) {
+		dev_err(pat->dev, "Unable to read device page count\n");
+		return ret;
+	}
+
+	ret = regmap_field_read(pat->mmrs_fields[F_PID_MAJOR], &revision_major);
+	if (ret) {
+		dev_err(pat->dev, "Unable to read device major revision\n");
+		return ret;
+	}
+
+	ret = regmap_field_read(pat->mmrs_fields[F_PID_MINOR], &revision_minor);
+	if (ret) {
+		dev_err(pat->dev, "Unable to read device minor revision\n");
+		return ret;
+	}
+
+	dev_info(pat->dev, "Found PAT Rev %d.%d with %d pages\n", revision_major, revision_minor, pat->page_count);
+
+	/* TABLE */
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
+	if (!res) {
+		dev_err(pat->dev, "Unable to find TABLE IO resource\n");
+		return -ENOENT;
+	}
+	size = resource_size(res);
+
+	/* 256 pages per 4KB of table space */
+	if (size != (pat->page_count << 4))
+		dev_warn(pat->dev, "TABLE region size (%pa) does not match reported page count\n", &size);
+
+	pat->table_base = devm_ioremap_resource(pat->dev, res);
+	if (IS_ERR(pat->table_base))
+		return PTR_ERR(pat->table_base);
+
+	/* WINDOW */
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 2);
+	if (!res) {
+		dev_err(pat->dev, "Unable to find WINDOW IO resource\n");
+		return -ENOENT;
+	}
+	pat->window_base = res->start;
+	size = resource_size(res);
+
+	pat->page_size = PAGE_SIZE;
+	for (page_size = 0; page_size < ARRAY_SIZE(ti_pat_page_sizes); page_size++)
+		if (ti_pat_page_sizes[page_size] == pat->page_size)
+			break;
+	if (page_size == ARRAY_SIZE(ti_pat_page_sizes)) {
+		dev_err(pat->dev, "Unsupported PAGE_SIZE (%d)\n", pat->page_size);
+		return -EINVAL;
+	}
+	regmap_field_write(pat->mmrs_fields[F_CONTROL_PAGE_SIZE], page_size);
+
+	/* Enable this PAT module */
+	regmap_field_write(pat->mmrs_fields[F_CONTROL_EN], 1);
+
+	pat->pool = gen_pool_create(PAGE_SHIFT, -1);
+	if (!pat->pool)
+		return -ENOMEM;
+	gen_pool_add(pat->pool, pat->window_base, size, -1);
+
+	pat->mdev.minor = MISC_DYNAMIC_MINOR;
+	pat->mdev.name = pdev->name;
+	pat->mdev.fops = &ti_pat_fops;
+	pat->mdev.parent = NULL;
+	ret = misc_register(&pat->mdev);
+	if (ret) {
+		dev_err(pat->dev, "Unable to register misc device\n");
+		return ret;
+	}
+
+	return 0;
+}
+
+static struct platform_driver ti_pat_driver = {
+	.probe = ti_pat_probe,
+	.driver = {
+		.name = TI_PAT_DRIVER_NAME,
+		.of_match_table = ti_pat_of_match,
+	},
+};
+module_platform_driver(ti_pat_driver);
+
+MODULE_AUTHOR("Andrew F. Davis <afd@ti.com>");
+MODULE_DESCRIPTION("TI PAT mapped DMA-BUF memory exporter");
+MODULE_LICENSE("GPL v2");
diff --git a/include/uapi/linux/ti-pat.h b/include/uapi/linux/ti-pat.h
new file mode 100644
index 000000000000..b06d79454815
--- /dev/null
+++ b/include/uapi/linux/ti-pat.h
@@ -0,0 +1,44 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * TI PAT mapped DMA-BUF memory exporter UAPI
+ *
+ * Copyright (C) 2018 Texas Instruments Incorporated - http://www.ti.com/
+ *	Andrew F. Davis <afd@ti.com>
+ */
+
+#ifndef _UAPI_LINUX_TI_PAT_H
+#define _UAPI_LINUX_TI_PAT_H
+
+#include <linux/ioctl.h>
+#include <linux/types.h>
+
+/**
+ * DOC: TI PAT Userspace API
+ *
+ * create a client by opening /dev/ti-pat
+ * most operations handled via following ioctls
+ */
+
+/**
+ * struct ti_pat_allocation_data - metadata passed from userspace for allocations
+ * @fd:			populated with DMA-BUF FD for this allocation
+ * @flags:		flags for the allocation
+ *
+ * Provided by userspace as an argument to the ioctl
+ */
+struct ti_pat_export_data {
+	__u32 fd;
+	__u32 flags;
+};
+
+#define TI_PAT_IOC_MAGIC 'P'
+
+/**
+ * DOC: TI_PAT_IOC_EXPORT - Re-export DMA-BUF through TI PAT
+ *
+ * Takes an ti_pat_export_data struct and returns it with the fd field
+ * populated with the DMA-BUF handle for the new export.
+ */
+#define TI_PAT_IOC_EXPORT _IOWR(TI_PAT_IOC_MAGIC, 0, struct ti_pat_export_data)
+
+#endif /* _UAPI_LINUX_TI_PAT_H */
-- 
2.17.1



* Re: [RFC PATCH 2/2] soc: ti: Add Support for the TI Page-based Address Translator (PAT)
  2019-06-07 19:35 ` [RFC PATCH 2/2] soc: ti: Add Support for the TI Page-based Address Translator (PAT) Andrew F. Davis via iommu
@ 2019-06-18  9:07   ` Tero Kristo via iommu
  0 siblings, 0 replies; 5+ messages in thread
From: Tero Kristo via iommu @ 2019-06-18  9:07 UTC (permalink / raw)
  To: Andrew F. Davis, Rob Herring, Mark Rutland, Santosh Shilimkar,
	Will Deacon, Robin Murphy, Joerg Roedel, David Airlie,
	Daniel Vetter, William Mills, Tomi Valkeinen, Sumit Semwal,
	John Stultz
  Cc: devicetree, linux-kernel, dri-devel, linaro-mm-sig, iommu,
	linux-arm-kernel, linux-media

On 07/06/2019 22:35, Andrew F. Davis wrote:
> This patch adds a driver for the Page-based Address Translator (PAT)
> present on various TI SoCs. A PAT device performs address translation
> using tables stored in an internal SRAM. Each PAT supports a set number
> of pages, each occupying a programmable 4KB, 16KB, 64KB, or 1MB of
> addresses in a window for which an incoming transaction will be
> translated.
> 
> Signed-off-by: Andrew F. Davis <afd@ti.com>
> ---
>   drivers/soc/ti/Kconfig      |   9 +
>   drivers/soc/ti/Makefile     |   1 +
>   drivers/soc/ti/ti-pat.c     | 569 ++++++++++++++++++++++++++++++++++++
>   include/uapi/linux/ti-pat.h |  44 +++
>   4 files changed, 623 insertions(+)
>   create mode 100644 drivers/soc/ti/ti-pat.c
>   create mode 100644 include/uapi/linux/ti-pat.h
> 
> diff --git a/drivers/soc/ti/Kconfig b/drivers/soc/ti/Kconfig
> index f0be35d3dcba..b838ae74d01f 100644
> --- a/drivers/soc/ti/Kconfig
> +++ b/drivers/soc/ti/Kconfig
> @@ -86,4 +86,13 @@ config TI_SCI_INTA_MSI_DOMAIN
>   	help
>   	  Driver to enable Interrupt Aggregator specific MSI Domain.
>   
> +config TI_PAT
> +	tristate "TI PAT DMA-BUF exporter"
> +	select REGMAP

What is the reasoning for using regmap for internal register access? Why 
not just use direct readl/writel for everything? To me it seems this is 
also only used at probe time...

> +	help
> +	  Driver for TI Page-based Address Translator (PAT). This driver
> +	  provides the an API allowing the remapping of a non-contiguous
> +	  DMA-BUF into a contiguous one that is sutable for devices needing
> +	  coniguous memory.

Minor typo: contiguous.

> +
>   endif # SOC_TI
> diff --git a/drivers/soc/ti/Makefile b/drivers/soc/ti/Makefile
> index b3868d392d4f..1369642b40c3 100644
> --- a/drivers/soc/ti/Makefile
> +++ b/drivers/soc/ti/Makefile
> @@ -9,3 +9,4 @@ obj-$(CONFIG_AMX3_PM)			+= pm33xx.o
>   obj-$(CONFIG_WKUP_M3_IPC)		+= wkup_m3_ipc.o
>   obj-$(CONFIG_TI_SCI_PM_DOMAINS)		+= ti_sci_pm_domains.o
>   obj-$(CONFIG_TI_SCI_INTA_MSI_DOMAIN)	+= ti_sci_inta_msi.o
> +obj-$(CONFIG_TI_PAT)			+= ti-pat.o
> diff --git a/drivers/soc/ti/ti-pat.c b/drivers/soc/ti/ti-pat.c
> new file mode 100644
> index 000000000000..7359ea0f7ccf
> --- /dev/null
> +++ b/drivers/soc/ti/ti-pat.c
> @@ -0,0 +1,569 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * TI PAT mapped DMA-BUF memory re-exporter
> + *
> + * Copyright (C) 2018-2019 Texas Instruments Incorporated - http://www.ti.com/
> + *	Andrew F. Davis <afd@ti.com>
> + */
> +
> +#include <linux/fs.h>
> +#include <linux/module.h>
> +#include <linux/platform_device.h>
> +#include <linux/mod_devicetable.h>
> +#include <linux/uaccess.h>
> +#include <linux/miscdevice.h>
> +#include <linux/regmap.h>
> +#include <linux/dma-buf.h>
> +#include <linux/genalloc.h>
> +#include <linux/vmalloc.h>
> +#include <linux/slab.h>
> +
> +#include <linux/ti-pat.h>
> +
> +#define TI_PAT_DRIVER_NAME	"ti-pat"

Why do you have a define for this, seeing it is only used in a single location?

> +
> +/* TI PAT MMRS registers */
> +#define TI_PAT_MMRS_PID		0x0 /* Revision Register */
> +#define TI_PAT_MMRS_CONFIG	0x4 /* Config Register */
> +#define TI_PAT_MMRS_CONTROL	0x10 /* Control Register */
> +
> +/* TI PAT CONTROL register field values */
> +#define TI_PAT_CONTROL_ARB_MODE_UF	0x0 /* Updates first */
> +#define TI_PAT_CONTROL_ARB_MODE_RR	0x2 /* Round-robin */
> +
> +#define TI_PAT_CONTROL_PAGE_SIZE_4KB	0x0
> +#define TI_PAT_CONTROL_PAGE_SIZE_16KB	0x1
> +#define TI_PAT_CONTROL_PAGE_SIZE_64KB	0x2
> +#define TI_PAT_CONTROL_PAGE_SIZE_1MB	0x3
> +
> +static unsigned int ti_pat_page_sizes[] = {
> +	[TI_PAT_CONTROL_PAGE_SIZE_4KB]  = 4 * 1024,
> +	[TI_PAT_CONTROL_PAGE_SIZE_16KB] = 16 * 1024,
> +	[TI_PAT_CONTROL_PAGE_SIZE_64KB] = 64 * 1024,
> +	[TI_PAT_CONTROL_PAGE_SIZE_1MB]  = 1024 * 1024,
> +};
> +
> +enum ti_pat_mmrs_fields {
> +	/* Revision */
> +	F_PID_MAJOR,
> +	F_PID_MINOR,
> +
> +	/* Controls */
> +	F_CONTROL_ARB_MODE,
> +	F_CONTROL_PAGE_SIZE,
> +	F_CONTROL_REPLACE_OID_EN,
> +	F_CONTROL_EN,
> +
> +	/* sentinel */
> +	F_MAX_FIELDS
> +};
> +
> +static const struct reg_field ti_pat_mmrs_reg_fields[] = {
> +	/* Revision */
> +	[F_PID_MAJOR]			= REG_FIELD(TI_PAT_MMRS_PID, 8, 10),
> +	[F_PID_MINOR]			= REG_FIELD(TI_PAT_MMRS_PID, 0, 5),
> +	/* Controls */
> +	[F_CONTROL_ARB_MODE]		= REG_FIELD(TI_PAT_MMRS_CONTROL, 6, 7),
> +	[F_CONTROL_PAGE_SIZE]		= REG_FIELD(TI_PAT_MMRS_CONTROL, 4, 5),
> +	[F_CONTROL_REPLACE_OID_EN]	= REG_FIELD(TI_PAT_MMRS_CONTROL, 1, 1),
> +	[F_CONTROL_EN]			= REG_FIELD(TI_PAT_MMRS_CONTROL, 0, 0),
> +};
> +
> +/**
> + * struct ti_pat_data - PAT device instance data
> + * @dev: PAT device structure
> + * @mdev: misc device
> + * @mmrs_map: Register map of MMRS region
> + * @table_base: Base address of TABLE region

Please add kerneldoc comments for all fields.

> + */
> +struct ti_pat_data {
> +	struct device *dev;
> +	struct miscdevice mdev;
> +	struct regmap *mmrs_map;
> +	struct regmap_field *mmrs_fields[F_MAX_FIELDS];
> +	void __iomem *table_base;
> +	unsigned int page_count;
> +	unsigned int page_size;
> +	phys_addr_t window_base;
> +	struct gen_pool *pool;
> +};
> +

Kerneldoc comments for the structs below would also be useful, especially 
for ti_pat_buffer.

> +struct ti_pat_dma_buf_attachment {
> +	struct device *dev;
> +	struct sg_table *table;
> +	struct ti_pat_buffer *buffer;
> +	struct list_head list;
> +};
> +
> +struct ti_pat_buffer {
> +	struct ti_pat_data *pat;
> +	struct dma_buf *i_dma_buf;
> +	size_t size;
> +	unsigned long offset;
> +	struct dma_buf *e_dma_buf;
> +
> +	struct dma_buf_attachment *attachment;
> +	struct sg_table *sgt;
> +
> +	struct list_head attachments;
> +	int map_count;
> +
> +	struct mutex lock;
> +};
> +
> +static const struct regmap_config ti_pat_regmap_config = {
> +	.reg_bits = 32,
> +	.val_bits = 32,
> +	.reg_stride = 4,
> +};
> +
> +static int ti_pat_dma_buf_attach(struct dma_buf *dmabuf,
> +				 struct dma_buf_attachment *attachment)
> +{
> +	struct ti_pat_dma_buf_attachment *a;
> +	struct ti_pat_buffer *buffer = dmabuf->priv;
> +
> +	a = kzalloc(sizeof(*a), GFP_KERNEL);
> +	if (!a)
> +		return -ENOMEM;
> +
> +	a->dev = attachment->dev;
> +	a->buffer = buffer;
> +	INIT_LIST_HEAD(&a->list);
> +
> +	a->table = kzalloc(sizeof(*a->table), GFP_KERNEL);
> +	if (!a->table) {
> +		kfree(a);
> +		return -ENOMEM;
> +	}
> +
> +	if (sg_alloc_table(a->table, 1, GFP_KERNEL)) {
> +		kfree(a->table);
> +		kfree(a);
> +		return -ENOMEM;
> +	}
> +
> +	sg_set_page(a->table->sgl, pfn_to_page(PFN_DOWN(buffer->offset)), buffer->size, 0);
> +
> +	attachment->priv = a;
> +
> +	mutex_lock(&buffer->lock);
> +	/* First time attachment we attach to parent */
> +	if (list_empty(&buffer->attachments)) {
> +		buffer->attachment = dma_buf_attach(buffer->i_dma_buf, buffer->pat->dev);
> +		if (IS_ERR(buffer->attachment)) {
> +			dev_err(buffer->pat->dev, "Unable to attach to parent DMA-BUF\n");
> +			mutex_unlock(&buffer->lock);
> +			kfree(a->table);
> +			kfree(a);
> +			return PTR_ERR(buffer->attachment);
> +		}
> +	}
> +	list_add(&a->list, &buffer->attachments);
> +	mutex_unlock(&buffer->lock);
> +
> +	return 0;
> +}
> +
> +static void ti_pat_dma_buf_detatch(struct dma_buf *dmabuf,
> +				   struct dma_buf_attachment *attachment)

Func name should be ti_pat_dma_buf_detach instead?

Other than that, I can't see anything obviously wrong, though I have 
limited experience with dma_buf. Is there a simple way to test this 
driver, btw?

-Tero

--
Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki. Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki

* Re: [RFC PATCH 0/2] Support for TI Page-based Address Translator
  2019-06-07 19:35 [RFC PATCH 0/2] Support for TI Page-based Address Translator Andrew F. Davis via iommu
  2019-06-07 19:35 ` [RFC PATCH 1/2] dt-bindings: soc: ti: Add TI PAT bindings Andrew F. Davis via iommu
  2019-06-07 19:35 ` [RFC PATCH 2/2] soc: ti: Add Support for the TI Page-based Address Translator (PAT) Andrew F. Davis via iommu
@ 2019-07-02 15:49 ` Robin Murphy
  2 siblings, 0 replies; 5+ messages in thread
From: Robin Murphy @ 2019-07-02 15:49 UTC (permalink / raw)
  To: Andrew F. Davis, Rob Herring, Mark Rutland, Santosh Shilimkar,
	Will Deacon, Joerg Roedel, David Airlie, Daniel Vetter,
	Tero Kristo, William Mills, Tomi Valkeinen, Sumit Semwal,
	John Stultz
  Cc: devicetree, linux-kernel, dri-devel, linaro-mm-sig, iommu,
	linux-arm-kernel, linux-media

On 07/06/2019 20:35, Andrew F. Davis wrote:
> Hello all,
> 
> So I've got a new IP on our new SoC I'm looking to make use of and would
> like some help figuring out what framework best matches its function. The
> IP is called a "Page-based Address Translator" or PAT. A PAT instance
> (there are 5 of these things on our J721e device[0]) is basically a
> really simple IOMMU sitting on the interconnect between the device bus
> and what is effectively our northbridge called
> MSMC (DRAM/SRAM/L3-Cache/Coherency controller).
> 
> Simplified it looks about like this:
> 
>           CPUs
>            |
> DRAM --- MSMC --- SRAM/L3
>            |
>          NAVSS - (PATs)
>            |
>    --- Device Bus ---------
>   |      |      |          |
> Device  Device  Device   etc..
> 
> Each PAT has a window in high memory (around the 0x48_0000_0000 area);
> any transaction with an address targeting that window is routed into the
> PAT. The PAT then does a simple calculation based on how far into the
> window the address is and the current page size, does a lookup in an
> internally held table of translations, then sends the transaction back
> out on the interconnect with a new address. Usually this address points
> somewhere in DRAM, but it can be some other device or even back into the
> PAT (I'm not sure there is a valid use-case for this, but it is a point
> of interest).
> 
> My gut reaction is that this is an IOMMU which belongs in the IOMMU
> subsystem. But there are a couple of oddities that make me less sure it
> is really suitable for the IOMMU framework. First, it doesn't sit in
> front of any one device, it sits in front of *all* devices; this means
> we would have every device claim it as an IOMMU parent, even though many
> devices also have a traditional IOMMU connected. Second, there is only a
> limited window of address space per PAT, so we will get fragmentation
> and allocation failures on occasion; in this way it looks to me more
> like an AGP GART. Third, the window is in high memory, so unlike some
> IOMMU devices which can be used to allow DMA to high-mem from
> low-mem-only devices, the PAT can't be used for that. Lastly, it doesn't
> provide any isolation: if an access does not target the PAT window the
> PAT is not used (that is not to say we don't have isolation, just that
> it is taken care of by other parts of the interconnect).
> 
> This means, to me, that PAT has one main purpose: making
> physically-contiguous views of scattered pages in system memory for DMA.
> But it does that really well: the whole translation table is held in a
> PAT-internal SRAM, giving one-bus-cycle latency at full bus bandwidth.
> 
> So what are my options here, is IOMMU the right way to go or not?

FWIW, that sounds almost exactly like my (vague) understanding of other 
GARTs, and as such should be pretty well manageable via the IOMMU API - 
we already have tegra-gart, for example. The aperture contention issue 
could certainly be mitigated by letting the firmware claim it's only 
associated with the display and any other devices which really need it.

A further interesting avenue of investigation - now that Christoph's 
recent work has made it much more possible - would be a second set of 
IOMMU DMA ops tailored for "GART-like" domains where force_aperture=0, 
which could behave as dma-direct wherever possible and only use IOMMU 
remaps when absolutely necessary.

Robin.

> Looking around the kernel I also see the char dev AGP/GART interface,
> which looks like a good fit but also looks quite dated and, I would
> guess, deprecated at this point. Moving right along...
> 
> Another thing I saw is we have the support upstream of the DMM device[1]
> available in some OMAPx/AM57x SoCs. I'm a little more familiar with this
> device. The DMM is a bundle of IPs and in fact one of them is called
> "PAT" and it even does basically the same thing this incarnation of "PAT"
> does. Its upstream integration is a bit questionable, unfortunately:
> the DMM support was integrated into the OMAPDRM display driver, which
> does make some sense given its support for rotation (using the TILER IP
> contained in the DMM). The issue with this was that the DMM/TILER/PAT IP
> was not part of our display IP, but instead sat at the end of the shared
> device bus, inside the external memory controller. Like this new PAT,
> this meant that any IP could make use of it, but only the display
> framework could actually provide buffers backed by it.
> This meant, for instance, that if we wanted to decode some video buffer
> using our video decoder we would have to allocate from the DRM framework
> and then pass that over to the V4L2 system. This doesn't make much sense
> and required user-space to know about this odd situation and allocate
> from the right spot, or else use up valuable CMA space or waste memory
> on dedicated carveouts.
> 
> Another idea would be to have this as a special central allocator
> (exposed through DMA-BUF heaps[2] or ION) that would give out normal
> system memory as a DMA-BUF but remap it with PAT if a device that only
> supports contiguous memory tries to attach/map that DMA-BUF.
> 
> One last option would be to allow user-space to make the buffer
> contiguous when it needs to. That's what the driver in this series
> allows. We expose a remapping device: user-space passes it a
> non-contiguous DMA-BUF handle and it passes a contiguous one back.
> Simple as that.
> 
> So how do we use this? Let's take Android for example: we don't know at
> allocation time if a rendering buffer will end up going back into the GPU
> for further processing or be consumed directly by the display. This is a
> problem for us, as our GPU can work with non-contiguous buffers but our
> display cannot, so any buffer that could possibly go to the display at
> some point currently needs to be allocated as contiguous from the start,
> which leads to a lot of unneeded use of carveout/CMA memory.
> With this driver, on the other hand, we allocate regular non-contiguous
> system memory (again using DMA-BUF heaps, but ION could work here too);
> then, only when a buffer is about to be sent to the display, we pass the
> handle of this DMA-BUF to our driver and hand the handle it gives back
> to the display instead.
> 
> As said, it is probably not the ideal solution but it does work and was
> used for some early testing of the IP.
> 
> Well, sorry for the wall of text.
> Any and all suggestions very welcome and appreciated.
> 
> Thanks,
> Andrew
> 
> [0] http://www.ti.com/lit/pdf/spruil1
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/omapdrm/omap_dmm_tiler.c
> [2] https://lkml.org/lkml/2019/6/6/1211
> 
> Andrew F. Davis (2):
>    dt-bindings: soc: ti: Add TI PAT bindings
>    soc: ti: Add Support for the TI Page-based Address Translator (PAT)
> 
>   .../devicetree/bindings/misc/ti,pat.txt       |  34 ++
>   drivers/soc/ti/Kconfig                        |   9 +
>   drivers/soc/ti/Makefile                       |   1 +
>   drivers/soc/ti/ti-pat.c                       | 569 ++++++++++++++++++
>   include/uapi/linux/ti-pat.h                   |  44 ++
>   5 files changed, 657 insertions(+)
>   create mode 100644 Documentation/devicetree/bindings/misc/ti,pat.txt
>   create mode 100644 drivers/soc/ti/ti-pat.c
>   create mode 100644 include/uapi/linux/ti-pat.h
> 
