dmaengine Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH v2 00/14] dmaengine/soc: Add Texas Instruments UDMA support
@ 2019-07-30  9:34 Peter Ujfalusi
  2019-07-30  9:34 ` [PATCH v2 01/14] bindings: soc: ti: add documentation for k3 ringacc Peter Ujfalusi
                   ` (15 more replies)
  0 siblings, 16 replies; 32+ messages in thread
From: Peter Ujfalusi @ 2019-07-30  9:34 UTC (permalink / raw)
  To: vkoul, robh+dt, nm, ssantosh
  Cc: dan.j.williams, dmaengine, linux-arm-kernel, devicetree,
	linux-kernel, grygorii.strashko, lokeshvutla, t-kristo, tony,
	j-keerthy

Changes since v1
(https://patchwork.kernel.org/project/linux-dmaengine/list/?series=114105&state=*)
- Added support for j721e
- Based on 5.3-rc2
- dropped ti_sci API patch for RM management as it is already upstream
- dropped dmadev_get_slave_channel() patch, using __dma_request_channel()
- Added Rob's Reviewed-by to ringacc DT binding document patch
- DT bindings changes:
 - linux,udma-mode is gone, I have a simple lookup table in the driver to flag
   TR channels.
 - Support for j721e
- Fix bug in of_node_put() handling in xlate function

Changes since RFC (https://patchwork.kernel.org/cover/10612465/):
- Based on linux-next (20190506) which now have the ti_sci interrupt support
- The series can be applied and the UDMA via DMAengine API will be functional
- Included in the series: ti_sci Resource management API, cppi5 header and
  driver for the ring accelerator.
- The DMAengine core patches have been updated as per the review comments for
  earlier submittion.
- The DMAengine driver patch is artificially split up to 6 smaller patches

The k3-udma driver implements the Data Movement Architecture described in
AM65x TRM (http://www.ti.com/lit/pdf/spruid7) and
j721e TRM (http://www.ti.com/lit/pdf/spruil1)

This DMA architecture is a big departure from 'traditional' architecture where
we had either EDMA or sDMA as system DMA.

Packet DMAs were used as dedicated DMAs to service only networking (Kesytone2)
or USB (am335x) while other peripherals were serviced by EDMA.

In AM65x/j721e the UDMA (Unified DMA) is used for all data movment within the
SoC, tasked to service all peripherals (UART, McSPI, McASP, networking, etc). 

The NAVSS/UDMA is built around CPPI5 (Communications Port Programming Interface)
and it supports Packet mode (similar to CPPI4.1 in Keystone2 for networking) and
TR mode (similar to EDMA descriptor).
The data movement is done within a PSI-L fabric, peripherals (including the
UDMA-P) are not addressed by their I/O register as with traditional DMAs but
with their PSI-L thread ID.

In AM65x/j721e we have two main type of peripherals:
Legacy: McASP, McSPI, UART, etc.
 to provide connectivity they are serviced by PDMA (Peripheral DMA)
 PDMA threads are locked to service a given peripheral, for example PSI-L thread
 0x4400/0xc400 is to service McASP0 rx/tx.
 The PDMa configuration can be done via the UDMA Real Time Peer registers.
Native: Networking, security accelerator
 these peripherals have native support for PSI-L.

To be able to use the DMA the following generic steps need to be taken:
- configure a DMA channel (tchan for TX, rchan for RX)
 - channel mode: Packet or TR mode
 - for memcpy a tchan and rchan pair is used.
 - for packet mode RX we also need to configure a receive flow to configure the
   packet receiption
- the source and destination threads must be paired
- at minimum one pair of rings need to be configured:
 - tx: transfer ring and transfer completion ring
 - rx: free descriptor ring and receive ring
- two interrupts: UDMA-P channel interrupt and ring interrupt for tc_ring/r_ring
 - If the channel is in packet mode or configured to memcpy then we only need
   one interrupt from the ring, events from UDMAP is not used.

When the channel setup is completed we only interract with the rings:
- TX: push a descriptor to t_ring and wait for it to be pushed to the tc_ring by
  the UDMA-P
- RX: push a descriptor to the fd_ring and waith for UDMA-P to push it back to
  the r_ring.

Since we have FIFOs in the DMA fabric (UDMA-P, PSI-L and PDMA) which was not the
case in previous DMAs we need to report the amount of data held in these FIFOs
to clients (delay calculation for ALSA, UART FIFO flush support).

Metadata support:
DMAengine user driver was posted upstream based/tested on the v1 of the UDMA
series: https://lkml.org/lkml/2019/6/28/20
SA2UL is using the metadata DMAengine API.

Note on the last patch:
In Keystone2 the networking had dedicated DMA (packet DMA) which is not the case
anymore and the DMAengine API currently missing support for the features we
would need to support networking, things like
- support for receive descriptor 'classification'
 - we need to support several receive queues for a channel.
 - the queues are used for packet priority handling for example, but they can be
   used to have pools of descriptors for different sizes.
- out of order completion of descriptors on a channel
 - when we have several queues to handle different priority packets the
   descriptors will be completed 'out-of-order'
- NAPI type of operation (polling instead of interrupt driven transfer)
 - without this we can not sustain gigabit speeds and we need to support NAPI
 - not to limit this to networking, but other high performance operations

It is my intention to work on these to be able to remove the 'glue' layer and
switch to DMAengine API - or have an API aside of DMAengine to have generic way
to support networking, but given how controversial and not trivial these changes
are we need something to support networking.

The series (+DT patch to enabled UDMA/PDMA on AM65x) on top of 5.3-rc2 is
available:
https://github.com/omap-audio/linux-audio.git peter/udma/series_v2-5.3-rc2

Regards,
Peter
---
Grygorii Strashko (3):
  bindings: soc: ti: add documentation for k3 ringacc
  soc: ti: k3: add navss ringacc driver
  dmaengine: ti: k3-udma: Add glue layer for non DMAengine users

Peter Ujfalusi (11):
  dmaengine: doc: Add sections for per descriptor metadata support
  dmaengine: Add metadata_ops for dma_async_tx_descriptor
  dmaengine: Add support for reporting DMA cached data amount
  dmaengine: ti: Add cppi5 header for UDMA
  dt-bindings: dma: ti: Add document for K3 UDMA
  dmaengine: ti: New driver for K3 UDMA - split#1: defines, structs, io
    func
  dmaengine: ti: New driver for K3 UDMA - split#2: probe/remove, xlate
    and filter_fn
  dmaengine: ti: New driver for K3 UDMA - split#3: alloc/free
    chan_resources
  dmaengine: ti: New driver for K3 UDMA - split#4: dma_device callbacks
    1
  dmaengine: ti: New driver for K3 UDMA - split#5: dma_device callbacks
    2
  dmaengine: ti: New driver for K3 UDMA - split#6: Kconfig and Makefile

 .../devicetree/bindings/dma/ti/k3-udma.txt    |  170 +
 .../devicetree/bindings/soc/ti/k3-ringacc.txt |   59 +
 Documentation/driver-api/dmaengine/client.rst |   75 +
 .../driver-api/dmaengine/provider.rst         |   46 +
 drivers/dma/dmaengine.c                       |   73 +
 drivers/dma/dmaengine.h                       |    8 +
 drivers/dma/ti/Kconfig                        |   22 +
 drivers/dma/ti/Makefile                       |    2 +
 drivers/dma/ti/k3-udma-glue.c                 | 1039 +++++
 drivers/dma/ti/k3-udma-private.c              |  124 +
 drivers/dma/ti/k3-udma.c                      | 3479 +++++++++++++++++
 drivers/dma/ti/k3-udma.h                      |  160 +
 drivers/soc/ti/Kconfig                        |   17 +
 drivers/soc/ti/Makefile                       |    1 +
 drivers/soc/ti/k3-ringacc.c                   | 1191 ++++++
 include/dt-bindings/dma/k3-udma.h             |   10 +
 include/linux/dma/k3-udma-glue.h              |  125 +
 include/linux/dma/ti-cppi5.h                  |  996 +++++
 include/linux/dmaengine.h                     |  110 +
 include/linux/soc/ti/k3-ringacc.h             |  262 ++
 20 files changed, 7969 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/dma/ti/k3-udma.txt
 create mode 100644 Documentation/devicetree/bindings/soc/ti/k3-ringacc.txt
 create mode 100644 drivers/dma/ti/k3-udma-glue.c
 create mode 100644 drivers/dma/ti/k3-udma-private.c
 create mode 100644 drivers/dma/ti/k3-udma.c
 create mode 100644 drivers/dma/ti/k3-udma.h
 create mode 100644 drivers/soc/ti/k3-ringacc.c
 create mode 100644 include/dt-bindings/dma/k3-udma.h
 create mode 100644 include/linux/dma/k3-udma-glue.h
 create mode 100644 include/linux/dma/ti-cppi5.h
 create mode 100644 include/linux/soc/ti/k3-ringacc.h

-- 
Peter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2 01/14] bindings: soc: ti: add documentation for k3 ringacc
  2019-07-30  9:34 [PATCH v2 00/14] dmaengine/soc: Add Texas Instruments UDMA support Peter Ujfalusi
@ 2019-07-30  9:34 ` Peter Ujfalusi
  2019-07-30  9:34 ` [PATCH v2 02/14] soc: ti: k3: add navss ringacc driver Peter Ujfalusi
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Peter Ujfalusi @ 2019-07-30  9:34 UTC (permalink / raw)
  To: vkoul, robh+dt, nm, ssantosh
  Cc: dan.j.williams, dmaengine, linux-arm-kernel, devicetree,
	linux-kernel, grygorii.strashko, lokeshvutla, t-kristo, tony,
	j-keerthy

From: Grygorii Strashko <grygorii.strashko@ti.com>

The Ring Accelerator (RINGACC or RA) provides hardware acceleration to
enable straightforward passing of work between a producer and a consumer.
There is one RINGACC module per NAVSS on TI AM65x and j721e.

This patch introduces RINGACC device tree bindings.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Reviewed-by: Rob Herring <robh@kernel.org>
---
 .../devicetree/bindings/soc/ti/k3-ringacc.txt | 59 +++++++++++++++++++
 1 file changed, 59 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/soc/ti/k3-ringacc.txt

diff --git a/Documentation/devicetree/bindings/soc/ti/k3-ringacc.txt b/Documentation/devicetree/bindings/soc/ti/k3-ringacc.txt
new file mode 100644
index 000000000000..86954cf4fa99
--- /dev/null
+++ b/Documentation/devicetree/bindings/soc/ti/k3-ringacc.txt
@@ -0,0 +1,59 @@
+* Texas Instruments K3 NavigatorSS Ring Accelerator
+
+The Ring Accelerator (RA) is a machine which converts read/write accesses
+from/to a constant address into corresponding read/write accesses from/to a
+circular data structure in memory. The RA eliminates the need for each DMA
+controller which needs to access ring elements from having to know the current
+state of the ring (base address, current offset). The DMA controller
+performs a read or write access to a specific address range (which maps to the
+source interface on the RA) and the RA replaces the address for the transaction
+with a new address which corresponds to the head or tail element of the ring
+(head for reads, tail for writes).
+
+The Ring Accelerator is a hardware module that is responsible for accelerating
+management of the packet queues. The K3 SoCs can have more than one RA instances
+
+Required properties:
+- compatible	: Must be "ti,am654-navss-ringacc";
+- reg		: Should contain register location and length of the following
+		  named register regions.
+- reg-names	: should be
+		  "rt" - The RA Ring Real-time Control/Status Registers
+		  "fifos" - The RA Queues Registers
+		  "proxy_gcfg" - The RA Proxy Global Config Registers
+		  "proxy_target" - The RA Proxy Datapath Registers
+- ti,num-rings	: Number of rings supported by RA
+- ti,sci-rm-range-gp-rings : TI-SCI RM subtype for GP ring range
+- ti,sci	: phandle on TI-SCI compatible System controller node
+- ti,sci-dev-id	: TI-SCI device id
+- msi-parent	: phandle for "ti,sci-inta" interrupt controller
+
+Optional properties:
+ -- ti,dma-ring-reset-quirk : enable ringacc / udma ring state interoperability
+		  issue software w/a
+
+Example:
+
+ringacc: ringacc@3c000000 {
+	compatible = "ti,am654-navss-ringacc";
+	reg =	<0x0 0x3c000000 0x0 0x400000>,
+		<0x0 0x38000000 0x0 0x400000>,
+		<0x0 0x31120000 0x0 0x100>,
+		<0x0 0x33000000 0x0 0x40000>;
+	reg-names = "rt", "fifos",
+		    "proxy_gcfg", "proxy_target";
+	ti,num-rings = <818>;
+	ti,sci-rm-range-gp-rings = <0x2>; /* GP ring range */
+	ti,dma-ring-reset-quirk;
+	ti,sci = <&dmsc>;
+	ti,sci-dev-id = <187>;
+	msi-parent = <&inta_main_udmass>;
+};
+
+client:
+
+dma_ipx: dma_ipx@<addr> {
+	...
+	ti,ringacc = <&ringacc>;
+	...
+}
-- 
Peter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2 02/14] soc: ti: k3: add navss ringacc driver
  2019-07-30  9:34 [PATCH v2 00/14] dmaengine/soc: Add Texas Instruments UDMA support Peter Ujfalusi
  2019-07-30  9:34 ` [PATCH v2 01/14] bindings: soc: ti: add documentation for k3 ringacc Peter Ujfalusi
@ 2019-07-30  9:34 ` Peter Ujfalusi
  2019-08-30 12:57   ` Peter Ujfalusi
  2019-09-09  6:09   ` Tero Kristo
  2019-07-30  9:34 ` [PATCH v2 03/14] dmaengine: doc: Add sections for per descriptor metadata support Peter Ujfalusi
                   ` (13 subsequent siblings)
  15 siblings, 2 replies; 32+ messages in thread
From: Peter Ujfalusi @ 2019-07-30  9:34 UTC (permalink / raw)
  To: vkoul, robh+dt, nm, ssantosh
  Cc: dan.j.williams, dmaengine, linux-arm-kernel, devicetree,
	linux-kernel, grygorii.strashko, lokeshvutla, t-kristo, tony,
	j-keerthy

From: Grygorii Strashko <grygorii.strashko@ti.com>

The Ring Accelerator (RINGACC or RA) provides hardware acceleration to
enable straightforward passing of work between a producer and a consumer.
There is one RINGACC module per NAVSS on TI AM65x SoCs.

The RINGACC converts constant-address read and write accesses to equivalent
read or write accesses to a circular data structure in memory. The RINGACC
eliminates the need for each DMA controller which needs to access ring
elements from having to know the current state of the ring (base address,
current offset). The DMA controller performs a read or write access to a
specific address range (which maps to the source interface on the RINGACC)
and the RINGACC replaces the address for the transaction with a new address
which corresponds to the head or tail element of the ring (head for reads,
tail for writes). Since the RINGACC maintains the state, multiple DMA
controllers or channels are allowed to coherently share the same rings as
applicable. The RINGACC is able to place data which is destined towards
software into cached memory directly.

Supported ring modes:
- Ring Mode
- Messaging Mode
- Credentials Mode
- Queue Manager Mode

TI-SCI integration:

Texas Instrument's System Control Interface (TI-SCI) Message Protocol now
has control over Ringacc module resources management (RM) and Rings
configuration.

The corresponding support of TI-SCI Ringacc module RM protocol
introduced as option through DT parameters:
- ti,sci: phandle on TI-SCI firmware controller DT node
- ti,sci-dev-id: TI-SCI device identifier as per TI-SCI firmware spec

if both parameters present - Ringacc driver will configure/free/reset Rings
using TI-SCI Message Ringacc RM Protocol.

The Ringacc driver manages Rings allocation by itself now and requests
TI-SCI firmware to allocate and configure specific Rings only. It's done
this way because, Linux driver implements two stage Rings allocation and
configuration (allocate ring and configure ring) while I-SCI Message
Protocol supports only one combined operation (allocate+configure).

Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
---
 drivers/soc/ti/Kconfig            |   17 +
 drivers/soc/ti/Makefile           |    1 +
 drivers/soc/ti/k3-ringacc.c       | 1191 +++++++++++++++++++++++++++++
 include/linux/soc/ti/k3-ringacc.h |  262 +++++++
 4 files changed, 1471 insertions(+)
 create mode 100644 drivers/soc/ti/k3-ringacc.c
 create mode 100644 include/linux/soc/ti/k3-ringacc.h

diff --git a/drivers/soc/ti/Kconfig b/drivers/soc/ti/Kconfig
index cf545f428d03..10c76faa503e 100644
--- a/drivers/soc/ti/Kconfig
+++ b/drivers/soc/ti/Kconfig
@@ -80,6 +80,23 @@ config TI_SCI_PM_DOMAINS
 	  called ti_sci_pm_domains. Note this is needed early in boot before
 	  rootfs may be available.
 
+config TI_K3_RINGACC
+	tristate "K3 Ring accelerator Sub System"
+	depends on ARCH_K3 || COMPILE_TEST
+	depends on TI_SCI_INTA_IRQCHIP
+	default y
+	help
+	  Say y here to support the K3 Ring accelerator module.
+	  The Ring Accelerator (RINGACC or RA)  provides hardware acceleration
+	  to enable straightforward passing of work between a producer
+	  and a consumer. There is one RINGACC module per NAVSS on TI AM65x SoCs
+	  If unsure, say N.
+
+config TI_K3_RINGACC_DEBUG
+	tristate "K3 Ring accelerator Sub System tests and debug"
+	depends on TI_K3_RINGACC
+	default n
+
 endif # SOC_TI
 
 config TI_SCI_INTA_MSI_DOMAIN
diff --git a/drivers/soc/ti/Makefile b/drivers/soc/ti/Makefile
index b3868d392d4f..cc4bc8b08bf5 100644
--- a/drivers/soc/ti/Makefile
+++ b/drivers/soc/ti/Makefile
@@ -9,3 +9,4 @@ obj-$(CONFIG_AMX3_PM)			+= pm33xx.o
 obj-$(CONFIG_WKUP_M3_IPC)		+= wkup_m3_ipc.o
 obj-$(CONFIG_TI_SCI_PM_DOMAINS)		+= ti_sci_pm_domains.o
 obj-$(CONFIG_TI_SCI_INTA_MSI_DOMAIN)	+= ti_sci_inta_msi.o
+obj-$(CONFIG_TI_K3_RINGACC)		+= k3-ringacc.o
diff --git a/drivers/soc/ti/k3-ringacc.c b/drivers/soc/ti/k3-ringacc.c
new file mode 100644
index 000000000000..401dfc963319
--- /dev/null
+++ b/drivers/soc/ti/k3-ringacc.c
@@ -0,0 +1,1191 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * TI K3 NAVSS Ring Accelerator subsystem driver
+ *
+ * Copyright (C) 2019 Texas Instruments Incorporated - http://www.ti.com
+ */
+
+#include <linux/dma-mapping.h>
+#include <linux/io.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/platform_device.h>
+#include <linux/pm_runtime.h>
+#include <linux/soc/ti/k3-ringacc.h>
+#include <linux/soc/ti/ti_sci_protocol.h>
+#include <linux/soc/ti/ti_sci_inta_msi.h>
+#include <linux/of_irq.h>
+#include <linux/irqdomain.h>
+
+static LIST_HEAD(k3_ringacc_list);
+static DEFINE_MUTEX(k3_ringacc_list_lock);
+
+#ifdef CONFIG_TI_K3_RINGACC_DEBUG
+#define	k3_nav_dbg(dev, arg...) dev_err(dev, arg)
+static	void dbg_writel(u32 v, void __iomem *reg)
+{
+	pr_err("WRITEL(32): v(%08X)-->reg(%p)\n", v, reg);
+	writel(v, reg);
+}
+
+static	u32 dbg_readl(void __iomem *reg)
+{
+	u32 v;
+
+	v = readl(reg);
+	pr_err("READL(32): v(%08X)<--reg(%p)\n", v, reg);
+	return v;
+}
+#else
+#define	k3_nav_dbg(dev, arg...) dev_dbg(dev, arg)
+#define dbg_writel(v, reg) writel(v, reg)
+
+#define dbg_readl(reg) readl(reg)
+#endif
+
+#define K3_RINGACC_CFG_RING_SIZE_ELCNT_MASK		GENMASK(19, 0)
+
+/**
+ * struct k3_ring_rt_regs -  The RA Control/Status Registers region
+ */
+struct k3_ring_rt_regs {
+	u32	resv_16[4];
+	u32	db;		/* RT Ring N Doorbell Register */
+	u32	resv_4[1];
+	u32	occ;		/* RT Ring N Occupancy Register */
+	u32	indx;		/* RT Ring N Current Index Register */
+	u32	hwocc;		/* RT Ring N Hardware Occupancy Register */
+	u32	hwindx;		/* RT Ring N Current Index Register */
+};
+
+#define K3_RINGACC_RT_REGS_STEP	0x1000
+
+/**
+ * struct k3_ring_fifo_regs -  The Ring Accelerator Queues Registers region
+ */
+struct k3_ring_fifo_regs {
+	u32	head_data[128];		/* Ring Head Entry Data Registers */
+	u32	tail_data[128];		/* Ring Tail Entry Data Registers */
+	u32	peek_head_data[128];	/* Ring Peek Head Entry Data Regs */
+	u32	peek_tail_data[128];	/* Ring Peek Tail Entry Data Regs */
+};
+
+/**
+ * struct k3_ringacc_proxy_gcfg_regs - RA Proxy Global Config MMIO Region
+ */
+struct k3_ringacc_proxy_gcfg_regs {
+	u32	revision;	/* Revision Register */
+	u32	config;		/* Config Register */
+};
+
+#define K3_RINGACC_PROXY_CFG_THREADS_MASK		GENMASK(15, 0)
+
+/**
+ * struct k3_ringacc_proxy_target_regs -  Proxy Datapath MMIO Region
+ */
+struct k3_ringacc_proxy_target_regs {
+	u32	control;	/* Proxy Control Register */
+	u32	status;		/* Proxy Status Register */
+	u8	resv_512[504];
+	u32	data[128];	/* Proxy Data Register */
+};
+
+#define K3_RINGACC_PROXY_TARGET_STEP	0x1000
+#define K3_RINGACC_PROXY_NOT_USED	(-1)
+
+enum k3_ringacc_proxy_access_mode {
+	PROXY_ACCESS_MODE_HEAD = 0,
+	PROXY_ACCESS_MODE_TAIL = 1,
+	PROXY_ACCESS_MODE_PEEK_HEAD = 2,
+	PROXY_ACCESS_MODE_PEEK_TAIL = 3,
+};
+
+#define K3_RINGACC_FIFO_WINDOW_SIZE_BYTES  (512U)
+#define K3_RINGACC_FIFO_REGS_STEP	0x1000
+#define K3_RINGACC_MAX_DB_RING_CNT    (127U)
+
+/**
+ * struct k3_ring_ops -  Ring operations
+ */
+struct k3_ring_ops {
+	int (*push_tail)(struct k3_ring *ring, void *elm);
+	int (*push_head)(struct k3_ring *ring, void *elm);
+	int (*pop_tail)(struct k3_ring *ring, void *elm);
+	int (*pop_head)(struct k3_ring *ring, void *elm);
+};
+
+/**
+ * struct k3_ring - RA Ring descriptor
+ *
+ * @rt - Ring control/status registers
+ * @fifos - Ring queues registers
+ * @proxy - Ring Proxy Datapath registers
+ * @ring_mem_dma - Ring buffer dma address
+ * @ring_mem_virt - Ring buffer virt address
+ * @ops - Ring operations
+ * @size - Ring size in elements
+ * @elm_size - Size of the ring element
+ * @mode - Ring mode
+ * @flags - flags
+ * @free - Number of free elements
+ * @occ - Ring occupancy
+ * @windex - Write index (only for @K3_RINGACC_RING_MODE_RING)
+ * @rindex - Read index (only for @K3_RINGACC_RING_MODE_RING)
+ * @ring_id - Ring Id
+ * @parent - Pointer on struct @k3_ringacc
+ * @use_count - Use count for shared rings
+ * @proxy_id - RA Ring Proxy Id (only if @K3_RINGACC_RING_USE_PROXY)
+ */
+struct k3_ring {
+	struct k3_ring_rt_regs __iomem *rt;
+	struct k3_ring_fifo_regs __iomem *fifos;
+	struct k3_ringacc_proxy_target_regs  __iomem *proxy;
+	dma_addr_t	ring_mem_dma;
+	void		*ring_mem_virt;
+	struct k3_ring_ops *ops;
+	u32		size;
+	enum k3_ring_size elm_size;
+	enum k3_ring_mode mode;
+	u32		flags;
+#define K3_RING_FLAG_BUSY	BIT(1)
+#define K3_RING_FLAG_SHARED	BIT(2)
+	u32		free;
+	u32		occ;
+	u32		windex;
+	u32		rindex;
+	u32		ring_id;
+	struct k3_ringacc	*parent;
+	u32		use_count;
+	int		proxy_id;
+};
+
+/**
+ * struct k3_ringacc - Rings accelerator descriptor
+ *
+ * @dev - pointer on RA device
+ * @proxy_gcfg - RA proxy global config registers
+ * @proxy_target_base - RA proxy datapath region
+ * @num_rings - number of ring in RA
+ * @rm_gp_range - general purpose rings range from tisci
+ * @dma_ring_reset_quirk - DMA reset w/a enable
+ * @num_proxies - number of RA proxies
+ * @rings - array of rings descriptors (struct @k3_ring)
+ * @list - list of RAs in the system
+ * @tisci - pointer ti-sci handle
+ * @tisci_ring_ops - ti-sci rings ops
+ * @tisci_dev_id - ti-sci device id
+ */
+struct k3_ringacc {
+	struct device *dev;
+	struct k3_ringacc_proxy_gcfg_regs __iomem *proxy_gcfg;
+	void __iomem *proxy_target_base;
+	u32 num_rings; /* number of rings in Ringacc module */
+	unsigned long *rings_inuse;
+	struct ti_sci_resource *rm_gp_range;
+
+	bool dma_ring_reset_quirk;
+	u32 num_proxies;
+	unsigned long *proxy_inuse;
+
+	struct k3_ring *rings;
+	struct list_head list;
+	struct mutex req_lock; /* protect rings allocation */
+
+	const struct ti_sci_handle *tisci;
+	const struct ti_sci_rm_ringacc_ops *tisci_ring_ops;
+	u32  tisci_dev_id;
+};
+
+static long k3_ringacc_ring_get_fifo_pos(struct k3_ring *ring)
+{
+	return K3_RINGACC_FIFO_WINDOW_SIZE_BYTES -
+	       (4 << ring->elm_size);
+}
+
+static void *k3_ringacc_get_elm_addr(struct k3_ring *ring, u32 idx)
+{
+	return (idx * (4 << ring->elm_size) + ring->ring_mem_virt);
+}
+
+static int k3_ringacc_ring_push_mem(struct k3_ring *ring, void *elem);
+static int k3_ringacc_ring_pop_mem(struct k3_ring *ring, void *elem);
+
+static struct k3_ring_ops k3_ring_mode_ring_ops = {
+		.push_tail = k3_ringacc_ring_push_mem,
+		.pop_head = k3_ringacc_ring_pop_mem,
+};
+
+static int k3_ringacc_ring_push_io(struct k3_ring *ring, void *elem);
+static int k3_ringacc_ring_pop_io(struct k3_ring *ring, void *elem);
+static int k3_ringacc_ring_push_head_io(struct k3_ring *ring, void *elem);
+static int k3_ringacc_ring_pop_tail_io(struct k3_ring *ring, void *elem);
+
+static struct k3_ring_ops k3_ring_mode_msg_ops = {
+		.push_tail = k3_ringacc_ring_push_io,
+		.push_head = k3_ringacc_ring_push_head_io,
+		.pop_tail = k3_ringacc_ring_pop_tail_io,
+		.pop_head = k3_ringacc_ring_pop_io,
+};
+
+static int k3_ringacc_ring_push_head_proxy(struct k3_ring *ring, void *elem);
+static int k3_ringacc_ring_push_tail_proxy(struct k3_ring *ring, void *elem);
+static int k3_ringacc_ring_pop_head_proxy(struct k3_ring *ring, void *elem);
+static int k3_ringacc_ring_pop_tail_proxy(struct k3_ring *ring, void *elem);
+
+static struct k3_ring_ops k3_ring_mode_proxy_ops = {
+		.push_tail = k3_ringacc_ring_push_tail_proxy,
+		.push_head = k3_ringacc_ring_push_head_proxy,
+		.pop_tail = k3_ringacc_ring_pop_tail_proxy,
+		.pop_head = k3_ringacc_ring_pop_head_proxy,
+};
+
+#ifdef CONFIG_TI_K3_RINGACC_DEBUG
+void k3_ringacc_ring_dump(struct k3_ring *ring)
+{
+	struct device *dev = ring->parent->dev;
+
+	k3_nav_dbg(dev, "dump ring: %d\n", ring->ring_id);
+	k3_nav_dbg(dev, "dump mem virt %p, dma %pad\n",
+		   ring->ring_mem_virt, &ring->ring_mem_dma);
+	k3_nav_dbg(dev, "dump elmsize %d, size %d, mode %d, proxy_id %d\n",
+		   ring->elm_size, ring->size, ring->mode, ring->proxy_id);
+
+	k3_nav_dbg(dev, "dump ring_rt_regs: db%08x\n",
+		   readl(&ring->rt->db));
+	k3_nav_dbg(dev, "dump occ%08x\n",
+		   readl(&ring->rt->occ));
+	k3_nav_dbg(dev, "dump indx%08x\n",
+		   readl(&ring->rt->indx));
+	k3_nav_dbg(dev, "dump hwocc%08x\n",
+		   readl(&ring->rt->hwocc));
+	k3_nav_dbg(dev, "dump hwindx%08x\n",
+		   readl(&ring->rt->hwindx));
+
+	if (ring->ring_mem_virt)
+		print_hex_dump(KERN_ERR, "dump ring_mem_virt ",
+			       DUMP_PREFIX_NONE, 16, 1,
+			       ring->ring_mem_virt, 16 * 8, false);
+}
+EXPORT_SYMBOL_GPL(k3_ringacc_ring_dump);
+#endif
+
+struct k3_ring *k3_ringacc_request_ring(struct k3_ringacc *ringacc,
+					int id, u32 flags)
+{
+	int proxy_id = K3_RINGACC_PROXY_NOT_USED;
+
+	mutex_lock(&ringacc->req_lock);
+
+	if (id == K3_RINGACC_RING_ID_ANY) {
+		/* Request for any general purpose ring */
+		struct ti_sci_resource_desc *gp_rings =
+						&ringacc->rm_gp_range->desc[0];
+		unsigned long size;
+
+		size = gp_rings->start + gp_rings->num;
+		id = find_next_zero_bit(ringacc->rings_inuse, size,
+					gp_rings->start);
+		if (id == size)
+			goto error;
+	} else if (id < 0) {
+		goto error;
+	}
+
+	if (test_bit(id, ringacc->rings_inuse) &&
+	    !(ringacc->rings[id].flags & K3_RING_FLAG_SHARED))
+		goto error;
+	else if (ringacc->rings[id].flags & K3_RING_FLAG_SHARED)
+		goto out;
+
+	if (flags & K3_RINGACC_RING_USE_PROXY) {
+		proxy_id = find_next_zero_bit(ringacc->proxy_inuse,
+					      ringacc->num_proxies, 0);
+		if (proxy_id == ringacc->num_proxies)
+			goto error;
+	}
+
+	if (!try_module_get(ringacc->dev->driver->owner))
+		goto error;
+
+	if (proxy_id != K3_RINGACC_PROXY_NOT_USED) {
+		set_bit(proxy_id, ringacc->proxy_inuse);
+		ringacc->rings[id].proxy_id = proxy_id;
+		k3_nav_dbg(ringacc->dev, "Giving ring#%d proxy#%d\n",
+			   id, proxy_id);
+	} else {
+		k3_nav_dbg(ringacc->dev, "Giving ring#%d\n", id);
+	}
+
+	set_bit(id, ringacc->rings_inuse);
+out:
+	ringacc->rings[id].use_count++;
+	mutex_unlock(&ringacc->req_lock);
+	return &ringacc->rings[id];
+
+error:
+	mutex_unlock(&ringacc->req_lock);
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(k3_ringacc_request_ring);
+
+static void k3_ringacc_ring_reset_sci(struct k3_ring *ring)
+{
+	struct k3_ringacc *ringacc = ring->parent;
+	int ret;
+
+	ret = ringacc->tisci_ring_ops->config(
+			ringacc->tisci,
+			TI_SCI_MSG_VALUE_RM_RING_COUNT_VALID,
+			ringacc->tisci_dev_id,
+			ring->ring_id,
+			0,
+			0,
+			ring->size,
+			0,
+			0,
+			0);
+	if (ret)
+		dev_err(ringacc->dev, "TISCI reset ring fail (%d) ring_idx %d\n",
+			ret, ring->ring_id);
+}
+
+void k3_ringacc_ring_reset(struct k3_ring *ring)
+{
+	if (!ring || !(ring->flags & K3_RING_FLAG_BUSY))
+		return;
+
+	ring->occ = 0;
+	ring->free = 0;
+	ring->rindex = 0;
+	ring->windex = 0;
+
+	k3_ringacc_ring_reset_sci(ring);
+}
+EXPORT_SYMBOL_GPL(k3_ringacc_ring_reset);
+
+static void k3_ringacc_ring_reconfig_qmode_sci(struct k3_ring *ring,
+					       enum k3_ring_mode mode)
+{
+	struct k3_ringacc *ringacc = ring->parent;
+	int ret;
+
+	ret = ringacc->tisci_ring_ops->config(
+			ringacc->tisci,
+			TI_SCI_MSG_VALUE_RM_RING_MODE_VALID,
+			ringacc->tisci_dev_id,
+			ring->ring_id,
+			0,
+			0,
+			0,
+			mode,
+			0,
+			0);
+	if (ret)
+		dev_err(ringacc->dev, "TISCI reconf qmode fail (%d) ring_idx %d\n",
+			ret, ring->ring_id);
+}
+
+void k3_ringacc_ring_reset_dma(struct k3_ring *ring, u32 occ)
+{
+	if (!ring || !(ring->flags & K3_RING_FLAG_BUSY))
+		return;
+
+	if (!ring->parent->dma_ring_reset_quirk)
+		return;
+
+	if (!occ)
+		occ = dbg_readl(&ring->rt->occ);
+
+	if (occ) {
+		u32 db_ring_cnt, db_ring_cnt_cur;
+
+		k3_nav_dbg(ring->parent->dev, "%s %u occ: %u\n", __func__,
+			   ring->ring_id, occ);
+		/* 2. Reset the ring */
+		k3_ringacc_ring_reset_sci(ring);
+
+		/*
+		 * 3. Setup the ring in ring/doorbell mode
+		 * (if not already in this mode)
+		 */
+		if (ring->mode != K3_RINGACC_RING_MODE_RING)
+			k3_ringacc_ring_reconfig_qmode_sci(
+					ring, K3_RINGACC_RING_MODE_RING);
+		/*
+		 * 4. Ring the doorbell 2**22 – ringOcc times.
+		 * This will wrap the internal UDMAP ring state occupancy
+		 * counter (which is 21-bits wide) to 0.
+		 */
+		db_ring_cnt = (1U << 22) - occ;
+
+		while (db_ring_cnt != 0) {
+			/*
+			 * Ring the doorbell with the maximum count each
+			 * iteration if possible to minimize the total
+			 * of writes
+			 */
+			if (db_ring_cnt > K3_RINGACC_MAX_DB_RING_CNT)
+				db_ring_cnt_cur = K3_RINGACC_MAX_DB_RING_CNT;
+			else
+				db_ring_cnt_cur = db_ring_cnt;
+
+			writel(db_ring_cnt_cur, &ring->rt->db);
+			db_ring_cnt -= db_ring_cnt_cur;
+		}
+
+		/* 5. Restore the original ring mode (if not ring mode) */
+		if (ring->mode != K3_RINGACC_RING_MODE_RING)
+			k3_ringacc_ring_reconfig_qmode_sci(ring, ring->mode);
+	}
+
+	/* 2. Reset the ring */
+	k3_ringacc_ring_reset(ring);
+}
+EXPORT_SYMBOL_GPL(k3_ringacc_ring_reset_dma);
+
+static void k3_ringacc_ring_free_sci(struct k3_ring *ring)
+{
+	struct k3_ringacc *ringacc = ring->parent;
+	int ret;
+
+	ret = ringacc->tisci_ring_ops->config(
+			ringacc->tisci,
+			TI_SCI_MSG_VALUE_RM_ALL_NO_ORDER,
+			ringacc->tisci_dev_id,
+			ring->ring_id,
+			0,
+			0,
+			0,
+			0,
+			0,
+			0);
+	if (ret)
+		dev_err(ringacc->dev, "TISCI ring free fail (%d) ring_idx %d\n",
+			ret, ring->ring_id);
+}
+
+int k3_ringacc_ring_free(struct k3_ring *ring)
+{
+	struct k3_ringacc *ringacc;
+
+	if (!ring)
+		return -EINVAL;
+
+	ringacc = ring->parent;
+
+	k3_nav_dbg(ring->parent->dev, "flags: 0x%08x\n", ring->flags);
+
+	if (!test_bit(ring->ring_id, ringacc->rings_inuse))
+		return -EINVAL;
+
+	mutex_lock(&ringacc->req_lock);
+
+	if (--ring->use_count)
+		goto out;
+
+	if (!(ring->flags & K3_RING_FLAG_BUSY))
+		goto no_init;
+
+	k3_ringacc_ring_free_sci(ring);
+
+	dma_free_coherent(ringacc->dev,
+			  ring->size * (4 << ring->elm_size),
+			  ring->ring_mem_virt, ring->ring_mem_dma);
+	ring->flags = 0;
+	ring->ops = NULL;
+	if (ring->proxy_id != K3_RINGACC_PROXY_NOT_USED) {
+		clear_bit(ring->proxy_id, ringacc->proxy_inuse);
+		ring->proxy = NULL;
+		ring->proxy_id = K3_RINGACC_PROXY_NOT_USED;
+	}
+
+no_init:
+	clear_bit(ring->ring_id, ringacc->rings_inuse);
+
+	module_put(ringacc->dev->driver->owner);
+
+out:
+	mutex_unlock(&ringacc->req_lock);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(k3_ringacc_ring_free);
+
+u32 k3_ringacc_get_ring_id(struct k3_ring *ring)
+{
+	if (!ring)
+		return -EINVAL;
+
+	return ring->ring_id;
+}
+EXPORT_SYMBOL_GPL(k3_ringacc_get_ring_id);
+
+u32 k3_ringacc_get_tisci_dev_id(struct k3_ring *ring)
+{
+	if (!ring)
+		return -EINVAL;
+
+	return ring->parent->tisci_dev_id;
+}
+EXPORT_SYMBOL_GPL(k3_ringacc_get_tisci_dev_id);
+
+int k3_ringacc_get_ring_irq_num(struct k3_ring *ring)
+{
+	int irq_num;
+
+	if (!ring)
+		return -EINVAL;
+
+	irq_num = ti_sci_inta_msi_get_virq(ring->parent->dev, ring->ring_id);
+	if (irq_num <= 0)
+		irq_num = -EINVAL;
+	return irq_num;
+}
+EXPORT_SYMBOL_GPL(k3_ringacc_get_ring_irq_num);
+
+static int k3_ringacc_ring_cfg_sci(struct k3_ring *ring)
+{
+	struct k3_ringacc *ringacc = ring->parent;
+	u32 ring_idx;
+	int ret;
+
+	if (!ringacc->tisci)
+		return -EINVAL;
+
+	ring_idx = ring->ring_id;
+	ret = ringacc->tisci_ring_ops->config(
+			ringacc->tisci,
+			TI_SCI_MSG_VALUE_RM_ALL_NO_ORDER,
+			ringacc->tisci_dev_id,
+			ring_idx,
+			lower_32_bits(ring->ring_mem_dma),
+			upper_32_bits(ring->ring_mem_dma),
+			ring->size,
+			ring->mode,
+			ring->elm_size,
+			0);
+	if (ret)
+		dev_err(ringacc->dev, "TISCI config ring fail (%d) ring_idx %d\n",
+			ret, ring_idx);
+
+	return ret;
+}
+
+int k3_ringacc_ring_cfg(struct k3_ring *ring, struct k3_ring_cfg *cfg)
+{
+	struct k3_ringacc *ringacc = ring->parent;
+	int ret = 0;
+
+	if (!ring || !cfg)
+		return -EINVAL;
+	if (cfg->elm_size > K3_RINGACC_RING_ELSIZE_256 ||
+	    cfg->mode > K3_RINGACC_RING_MODE_QM ||
+	    cfg->size & ~K3_RINGACC_CFG_RING_SIZE_ELCNT_MASK ||
+	    !test_bit(ring->ring_id, ringacc->rings_inuse))
+		return -EINVAL;
+
+	if (ring->use_count != 1)
+		return 0;
+
+	ring->size = cfg->size;
+	ring->elm_size = cfg->elm_size;
+	ring->mode = cfg->mode;
+	ring->occ = 0;
+	ring->free = 0;
+	ring->rindex = 0;
+	ring->windex = 0;
+
+	if (ring->proxy_id != K3_RINGACC_PROXY_NOT_USED)
+		ring->proxy = ringacc->proxy_target_base +
+			      ring->proxy_id * K3_RINGACC_PROXY_TARGET_STEP;
+
+	switch (ring->mode) {
+	case K3_RINGACC_RING_MODE_RING:
+		ring->ops = &k3_ring_mode_ring_ops;
+		break;
+	case K3_RINGACC_RING_MODE_QM:
+		/*
+		 * In Queue mode elm_size can be 8 only and each operation
+		 * uses 2 element slots
+		 */
+		if (cfg->elm_size != K3_RINGACC_RING_ELSIZE_8 ||
+		    cfg->size % 2)
+			goto err_free_proxy;
+		/* else, fall through */
+	case K3_RINGACC_RING_MODE_MESSAGE:
+		if (ring->proxy)
+			ring->ops = &k3_ring_mode_proxy_ops;
+		else
+			ring->ops = &k3_ring_mode_msg_ops;
+		break;
+	default:
+		ring->ops = NULL;
+		ret = -EINVAL;
+		goto err_free_proxy;
+	};
+
+	ring->ring_mem_virt =
+			dma_alloc_coherent(ringacc->dev,
+					   ring->size * (4 << ring->elm_size),
+					   &ring->ring_mem_dma, GFP_KERNEL);
+	if (!ring->ring_mem_virt) {
+		dev_err(ringacc->dev, "Failed to alloc ring mem\n");
+		ret = -ENOMEM;
+		goto err_free_ops;
+	}
+
+	ret = k3_ringacc_ring_cfg_sci(ring);
+
+	if (ret)
+		goto err_free_mem;
+
+	ring->flags |= K3_RING_FLAG_BUSY;
+	ring->flags |= (cfg->flags & K3_RINGACC_RING_SHARED) ?
+			K3_RING_FLAG_SHARED : 0;
+
+	k3_ringacc_ring_dump(ring);
+
+	return 0;
+
+err_free_mem:
+	dma_free_coherent(ringacc->dev,
+			  ring->size * (4 << ring->elm_size),
+			  ring->ring_mem_virt,
+			  ring->ring_mem_dma);
+err_free_ops:
+	ring->ops = NULL;
+err_free_proxy:
+	ring->proxy = NULL;
+	return ret;
+}
+EXPORT_SYMBOL_GPL(k3_ringacc_ring_cfg);
+
+u32 k3_ringacc_ring_get_size(struct k3_ring *ring)
+{
+	if (!ring || !(ring->flags & K3_RING_FLAG_BUSY))
+		return -EINVAL;
+
+	return ring->size;
+}
+EXPORT_SYMBOL_GPL(k3_ringacc_ring_get_size);
+
+u32 k3_ringacc_ring_get_free(struct k3_ring *ring)
+{
+	if (!ring || !(ring->flags & K3_RING_FLAG_BUSY))
+		return -EINVAL;
+
+	if (!ring->free)
+		ring->free = ring->size - dbg_readl(&ring->rt->occ);
+
+	return ring->free;
+}
+EXPORT_SYMBOL_GPL(k3_ringacc_ring_get_free);
+
+u32 k3_ringacc_ring_get_occ(struct k3_ring *ring)
+{
+	if (!ring || !(ring->flags & K3_RING_FLAG_BUSY))
+		return -EINVAL;
+
+	return dbg_readl(&ring->rt->occ);
+}
+EXPORT_SYMBOL_GPL(k3_ringacc_ring_get_occ);
+
+u32 k3_ringacc_ring_is_full(struct k3_ring *ring)
+{
+	return !k3_ringacc_ring_get_free(ring);
+}
+EXPORT_SYMBOL_GPL(k3_ringacc_ring_is_full);
+
+enum k3_ringacc_access_mode {
+	K3_RINGACC_ACCESS_MODE_PUSH_HEAD,
+	K3_RINGACC_ACCESS_MODE_POP_HEAD,
+	K3_RINGACC_ACCESS_MODE_PUSH_TAIL,
+	K3_RINGACC_ACCESS_MODE_POP_TAIL,
+	K3_RINGACC_ACCESS_MODE_PEEK_HEAD,
+	K3_RINGACC_ACCESS_MODE_PEEK_TAIL,
+};
+
+static int k3_ringacc_ring_cfg_proxy(struct k3_ring *ring,
+				     enum k3_ringacc_proxy_access_mode mode)
+{
+	u32 val;
+
+	val = ring->ring_id;
+	val |= mode << 16;
+	val |= ring->elm_size << 24;
+	dbg_writel(val, &ring->proxy->control);
+	return 0;
+}
+
+static int k3_ringacc_ring_access_proxy(struct k3_ring *ring, void *elem,
+					enum k3_ringacc_access_mode access_mode)
+{
+	void __iomem *ptr;
+
+	ptr = (void __iomem *)&ring->proxy->data;
+
+	switch (access_mode) {
+	case K3_RINGACC_ACCESS_MODE_PUSH_HEAD:
+	case K3_RINGACC_ACCESS_MODE_POP_HEAD:
+		k3_ringacc_ring_cfg_proxy(ring, PROXY_ACCESS_MODE_HEAD);
+		break;
+	case K3_RINGACC_ACCESS_MODE_PUSH_TAIL:
+	case K3_RINGACC_ACCESS_MODE_POP_TAIL:
+		k3_ringacc_ring_cfg_proxy(ring, PROXY_ACCESS_MODE_TAIL);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	ptr += k3_ringacc_ring_get_fifo_pos(ring);
+
+	switch (access_mode) {
+	case K3_RINGACC_ACCESS_MODE_POP_HEAD:
+	case K3_RINGACC_ACCESS_MODE_POP_TAIL:
+		k3_nav_dbg(ring->parent->dev, "proxy:memcpy_fromio(x): --> ptr(%p), mode:%d\n",
+			   ptr, access_mode);
+		memcpy_fromio(elem, ptr, (4 << ring->elm_size));
+		ring->occ--;
+		break;
+	case K3_RINGACC_ACCESS_MODE_PUSH_TAIL:
+	case K3_RINGACC_ACCESS_MODE_PUSH_HEAD:
+		k3_nav_dbg(ring->parent->dev, "proxy:memcpy_toio(x): --> ptr(%p), mode:%d\n",
+			   ptr, access_mode);
+		memcpy_toio(ptr, elem, (4 << ring->elm_size));
+		ring->free--;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	k3_nav_dbg(ring->parent->dev, "proxy: free%d occ%d\n",
+		   ring->free, ring->occ);
+	return 0;
+}
+
+static int k3_ringacc_ring_push_head_proxy(struct k3_ring *ring, void *elem)
+{
+	return k3_ringacc_ring_access_proxy(ring, elem,
+					    K3_RINGACC_ACCESS_MODE_PUSH_HEAD);
+}
+
+static int k3_ringacc_ring_push_tail_proxy(struct k3_ring *ring, void *elem)
+{
+	return k3_ringacc_ring_access_proxy(ring, elem,
+					    K3_RINGACC_ACCESS_MODE_PUSH_TAIL);
+}
+
+static int k3_ringacc_ring_pop_head_proxy(struct k3_ring *ring, void *elem)
+{
+	return k3_ringacc_ring_access_proxy(ring, elem,
+					    K3_RINGACC_ACCESS_MODE_POP_HEAD);
+}
+
+static int k3_ringacc_ring_pop_tail_proxy(struct k3_ring *ring, void *elem)
+{
+	return k3_ringacc_ring_access_proxy(ring, elem,
+					    K3_RINGACC_ACCESS_MODE_POP_HEAD);
+}
+
+static int k3_ringacc_ring_access_io(struct k3_ring *ring, void *elem,
+				     enum k3_ringacc_access_mode access_mode)
+{
+	void __iomem *ptr;
+
+	switch (access_mode) {
+	case K3_RINGACC_ACCESS_MODE_PUSH_HEAD:
+	case K3_RINGACC_ACCESS_MODE_POP_HEAD:
+		ptr = (void __iomem *)&ring->fifos->head_data;
+		break;
+	case K3_RINGACC_ACCESS_MODE_PUSH_TAIL:
+	case K3_RINGACC_ACCESS_MODE_POP_TAIL:
+		ptr = (void __iomem *)&ring->fifos->tail_data;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	ptr += k3_ringacc_ring_get_fifo_pos(ring);
+
+	switch (access_mode) {
+	case K3_RINGACC_ACCESS_MODE_POP_HEAD:
+	case K3_RINGACC_ACCESS_MODE_POP_TAIL:
+		k3_nav_dbg(ring->parent->dev, "memcpy_fromio(x): --> ptr(%p), mode:%d\n",
+			   ptr, access_mode);
+		memcpy_fromio(elem, ptr, (4 << ring->elm_size));
+		ring->occ--;
+		break;
+	case K3_RINGACC_ACCESS_MODE_PUSH_TAIL:
+	case K3_RINGACC_ACCESS_MODE_PUSH_HEAD:
+		k3_nav_dbg(ring->parent->dev, "memcpy_toio(x): --> ptr(%p), mode:%d\n",
+			   ptr, access_mode);
+		memcpy_toio(ptr, elem, (4 << ring->elm_size));
+		ring->free--;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	k3_nav_dbg(ring->parent->dev, "free%d index%d occ%d index%d\n",
+		   ring->free, ring->windex, ring->occ, ring->rindex);
+	return 0;
+}
+
+static int k3_ringacc_ring_push_head_io(struct k3_ring *ring, void *elem)
+{
+	return k3_ringacc_ring_access_io(ring, elem,
+					 K3_RINGACC_ACCESS_MODE_PUSH_HEAD);
+}
+
+static int k3_ringacc_ring_push_io(struct k3_ring *ring, void *elem)
+{
+	return k3_ringacc_ring_access_io(ring, elem,
+					 K3_RINGACC_ACCESS_MODE_PUSH_TAIL);
+}
+
+static int k3_ringacc_ring_pop_io(struct k3_ring *ring, void *elem)
+{
+	return k3_ringacc_ring_access_io(ring, elem,
+					 K3_RINGACC_ACCESS_MODE_POP_HEAD);
+}
+
+static int k3_ringacc_ring_pop_tail_io(struct k3_ring *ring, void *elem)
+{
+	return k3_ringacc_ring_access_io(ring, elem,
+					 K3_RINGACC_ACCESS_MODE_POP_HEAD);
+}
+
+static int k3_ringacc_ring_push_mem(struct k3_ring *ring, void *elem)
+{
+	void *elem_ptr;
+
+	elem_ptr = k3_ringacc_get_elm_addr(ring, ring->windex);
+
+	memcpy(elem_ptr, elem, (4 << ring->elm_size));
+
+	ring->windex = (ring->windex + 1) % ring->size;
+	ring->free--;
+	dbg_writel(1, &ring->rt->db);
+
+	k3_nav_dbg(ring->parent->dev, "ring_push_mem: free%d index%d\n",
+		   ring->free, ring->windex);
+
+	return 0;
+}
+
+static int k3_ringacc_ring_pop_mem(struct k3_ring *ring, void *elem)
+{
+	void *elem_ptr;
+
+	elem_ptr = k3_ringacc_get_elm_addr(ring, ring->rindex);
+
+	memcpy(elem, elem_ptr, (4 << ring->elm_size));
+
+	ring->rindex = (ring->rindex + 1) % ring->size;
+	ring->occ--;
+	dbg_writel(-1, &ring->rt->db);
+
+	k3_nav_dbg(ring->parent->dev, "ring_pop_mem: occ%d index%d pos_ptr%p\n",
+		   ring->occ, ring->rindex, elem_ptr);
+	return 0;
+}
+
+int k3_ringacc_ring_push(struct k3_ring *ring, void *elem)
+{
+	int ret = -EOPNOTSUPP;
+
+	if (!ring || !(ring->flags & K3_RING_FLAG_BUSY))
+		return -EINVAL;
+
+	k3_nav_dbg(ring->parent->dev, "ring_push: free%d index%d\n",
+		   ring->free, ring->windex);
+
+	if (k3_ringacc_ring_is_full(ring))
+		return -ENOMEM;
+
+	if (ring->ops && ring->ops->push_tail)
+		ret = ring->ops->push_tail(ring, elem);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(k3_ringacc_ring_push);
+
+int k3_ringacc_ring_push_head(struct k3_ring *ring, void *elem)
+{
+	int ret = -EOPNOTSUPP;
+
+	if (!ring || !(ring->flags & K3_RING_FLAG_BUSY))
+		return -EINVAL;
+
+	k3_nav_dbg(ring->parent->dev, "ring_push_head: free%d index%d\n",
+		   ring->free, ring->windex);
+
+	if (k3_ringacc_ring_is_full(ring))
+		return -ENOMEM;
+
+	if (ring->ops && ring->ops->push_head)
+		ret = ring->ops->push_head(ring, elem);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(k3_ringacc_ring_push_head);
+
+int k3_ringacc_ring_pop(struct k3_ring *ring, void *elem)
+{
+	int ret = -EOPNOTSUPP;
+
+	if (!ring || !(ring->flags & K3_RING_FLAG_BUSY))
+		return -EINVAL;
+
+	if (!ring->occ)
+		ring->occ = k3_ringacc_ring_get_occ(ring);
+
+	k3_nav_dbg(ring->parent->dev, "ring_pop: occ%d index%d\n",
+		   ring->occ, ring->rindex);
+
+	if (!ring->occ)
+		return -ENODATA;
+
+	if (ring->ops && ring->ops->pop_head)
+		ret = ring->ops->pop_head(ring, elem);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(k3_ringacc_ring_pop);
+
+int k3_ringacc_ring_pop_tail(struct k3_ring *ring, void *elem)
+{
+	int ret = -EOPNOTSUPP;
+
+	if (!ring || !(ring->flags & K3_RING_FLAG_BUSY))
+		return -EINVAL;
+
+	if (!ring->occ)
+		ring->occ = k3_ringacc_ring_get_occ(ring);
+
+	k3_nav_dbg(ring->parent->dev, "ring_pop_tail: occ%d index%d\n",
+		   ring->occ, ring->rindex);
+
+	if (!ring->occ)
+		return -ENODATA;
+
+	if (ring->ops && ring->ops->pop_tail)
+		ret = ring->ops->pop_tail(ring, elem);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(k3_ringacc_ring_pop_tail);
+
+struct k3_ringacc *of_k3_ringacc_get_by_phandle(struct device_node *np,
+						const char *property)
+{
+	struct device_node *ringacc_np;
+	struct k3_ringacc *ringacc = ERR_PTR(-EPROBE_DEFER);
+	struct k3_ringacc *entry;
+
+	ringacc_np = of_parse_phandle(np, property, 0);
+	if (!ringacc_np)
+		return ERR_PTR(-ENODEV);
+
+	mutex_lock(&k3_ringacc_list_lock);
+	list_for_each_entry(entry, &k3_ringacc_list, list)
+		if (entry->dev->of_node == ringacc_np) {
+			ringacc = entry;
+			break;
+		}
+	mutex_unlock(&k3_ringacc_list_lock);
+	of_node_put(ringacc_np);
+
+	return ringacc;
+}
+EXPORT_SYMBOL_GPL(of_k3_ringacc_get_by_phandle);
+
+static int k3_ringacc_probe_dt(struct k3_ringacc *ringacc)
+{
+	struct device_node *node = ringacc->dev->of_node;
+	struct device *dev = ringacc->dev;
+	struct platform_device *pdev = to_platform_device(dev);
+	int ret;
+
+	if (!node) {
+		dev_err(dev, "device tree info unavailable\n");
+		return -ENODEV;
+	}
+
+	ret = of_property_read_u32(node, "ti,num-rings", &ringacc->num_rings);
+	if (ret) {
+		dev_err(dev, "ti,num-rings read failure %d\n", ret);
+		return ret;
+	}
+
+	ringacc->dma_ring_reset_quirk =
+			of_property_read_bool(node, "ti,dma-ring-reset-quirk");
+
+	ringacc->tisci = ti_sci_get_by_phandle(node, "ti,sci");
+	if (IS_ERR(ringacc->tisci)) {
+		ret = PTR_ERR(ringacc->tisci);
+		if (ret != -EPROBE_DEFER)
+			dev_err(dev, "ti,sci read fail %d\n", ret);
+		ringacc->tisci = NULL;
+		return ret;
+	}
+
+	ret = of_property_read_u32(node, "ti,sci-dev-id",
+				   &ringacc->tisci_dev_id);
+	if (ret) {
+		dev_err(dev, "ti,sci-dev-id read fail %d\n", ret);
+		return ret;
+	}
+
+	pdev->id = ringacc->tisci_dev_id;
+
+	ringacc->rm_gp_range = devm_ti_sci_get_of_resource(ringacc->tisci, dev,
+						ringacc->tisci_dev_id,
+						"ti,sci-rm-range-gp-rings");
+	if (IS_ERR(ringacc->rm_gp_range)) {
+		dev_err(dev, "Failed to allocate MSI interrupts\n");
+		return PTR_ERR(ringacc->rm_gp_range);
+	}
+
+	return ti_sci_inta_msi_domain_alloc_irqs(ringacc->dev,
+						 ringacc->rm_gp_range);
+}
+
+static int k3_ringacc_probe(struct platform_device *pdev)
+{
+	struct k3_ringacc *ringacc;
+	void __iomem *base_fifo, *base_rt;
+	struct device *dev = &pdev->dev;
+	struct resource *res;
+	int ret, i;
+
+	ringacc = devm_kzalloc(dev, sizeof(*ringacc), GFP_KERNEL);
+	if (!ringacc)
+		return -ENOMEM;
+
+	ringacc->dev = dev;
+	mutex_init(&ringacc->req_lock);
+
+	dev->msi_domain = of_msi_get_domain(dev, dev->of_node,
+					    DOMAIN_BUS_TI_SCI_INTA_MSI);
+	if (!dev->msi_domain) {
+		dev_err(dev, "Failed to get MSI domain\n");
+		return -EPROBE_DEFER;
+	}
+
+	ret = k3_ringacc_probe_dt(ringacc);
+	if (ret)
+		return ret;
+
+	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "rt");
+	base_rt = devm_ioremap_resource(dev, res);
+	if (IS_ERR(base_rt))
+		return PTR_ERR(base_rt);
+
+	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "fifos");
+	base_fifo = devm_ioremap_resource(dev, res);
+	if (IS_ERR(base_fifo))
+		return PTR_ERR(base_fifo);
+
+	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "proxy_gcfg");
+	ringacc->proxy_gcfg = devm_ioremap_resource(dev, res);
+	if (IS_ERR(ringacc->proxy_gcfg))
+		return PTR_ERR(ringacc->proxy_gcfg);
+
+	res = platform_get_resource_byname(pdev, IORESOURCE_MEM,
+					   "proxy_target");
+	ringacc->proxy_target_base = devm_ioremap_resource(dev, res);
+	if (IS_ERR(ringacc->proxy_target_base))
+		return PTR_ERR(ringacc->proxy_target_base);
+
+	ringacc->num_proxies = dbg_readl(&ringacc->proxy_gcfg->config) &
+					 K3_RINGACC_PROXY_CFG_THREADS_MASK;
+
+	ringacc->rings = devm_kzalloc(dev,
+				      sizeof(*ringacc->rings) *
+				      ringacc->num_rings,
+				      GFP_KERNEL);
+	ringacc->rings_inuse = devm_kcalloc(dev,
+					    BITS_TO_LONGS(ringacc->num_rings),
+					    sizeof(unsigned long), GFP_KERNEL);
+	ringacc->proxy_inuse = devm_kcalloc(dev,
+					    BITS_TO_LONGS(ringacc->num_proxies),
+					    sizeof(unsigned long), GFP_KERNEL);
+
+	if (!ringacc->rings || !ringacc->rings_inuse || !ringacc->proxy_inuse)
+		return -ENOMEM;
+
+	for (i = 0; i < ringacc->num_rings; i++) {
+		ringacc->rings[i].rt = base_rt +
+				       K3_RINGACC_RT_REGS_STEP * i;
+		ringacc->rings[i].fifos = base_fifo +
+					  K3_RINGACC_FIFO_REGS_STEP * i;
+		ringacc->rings[i].parent = ringacc;
+		ringacc->rings[i].ring_id = i;
+		ringacc->rings[i].proxy_id = K3_RINGACC_PROXY_NOT_USED;
+	}
+	dev_set_drvdata(dev, ringacc);
+
+	ringacc->tisci_ring_ops = &ringacc->tisci->ops.rm_ring_ops;
+
+	pm_runtime_enable(dev);
+	ret = pm_runtime_get_sync(dev);
+	if (ret < 0) {
+		pm_runtime_put_noidle(dev);
+		dev_err(dev, "Failed to enable pm %d\n", ret);
+		goto err;
+	}
+
+	mutex_lock(&k3_ringacc_list_lock);
+	list_add_tail(&ringacc->list, &k3_ringacc_list);
+	mutex_unlock(&k3_ringacc_list_lock);
+
+	dev_info(dev, "Ring Accelerator probed rings:%u, gp-rings[%u,%u] sci-dev-id:%u\n",
+		 ringacc->num_rings,
+		 ringacc->rm_gp_range->desc[0].start,
+		 ringacc->rm_gp_range->desc[0].num,
+		 ringacc->tisci_dev_id);
+	dev_info(dev, "dma-ring-reset-quirk: %s\n",
+		 ringacc->dma_ring_reset_quirk ? "enabled" : "disabled");
+	dev_info(dev, "RA Proxy rev. %08x, num_proxies:%u\n",
+		 dbg_readl(&ringacc->proxy_gcfg->revision),
+		 ringacc->num_proxies);
+	return 0;
+
+err:
+	pm_runtime_disable(dev);
+	return ret;
+}
+
+static int k3_ringacc_remove(struct platform_device *pdev)
+{
+	struct k3_ringacc *ringacc = dev_get_drvdata(&pdev->dev);
+
+	pm_runtime_put_sync(&pdev->dev);
+	pm_runtime_disable(&pdev->dev);
+
+	mutex_lock(&k3_ringacc_list_lock);
+	list_del(&ringacc->list);
+	mutex_unlock(&k3_ringacc_list_lock);
+	return 0;
+}
+
+/* Match table for of_platform binding */
+static const struct of_device_id k3_ringacc_of_match[] = {
+	{ .compatible = "ti,am654-navss-ringacc", },
+	{},
+};
+MODULE_DEVICE_TABLE(of, k3_ringacc_of_match);
+
+static struct platform_driver k3_ringacc_driver = {
+	.probe		= k3_ringacc_probe,
+	.remove		= k3_ringacc_remove,
+	.driver		= {
+		.name	= "k3-ringacc",
+		.of_match_table = k3_ringacc_of_match,
+	},
+};
+module_platform_driver(k3_ringacc_driver);
+
+MODULE_LICENSE("GPL v2");
+MODULE_DESCRIPTION("TI Ringacc driver for K3 SOCs");
+MODULE_AUTHOR("Grygorii Strashko <grygorii.strashko@ti.com>");
diff --git a/include/linux/soc/ti/k3-ringacc.h b/include/linux/soc/ti/k3-ringacc.h
new file mode 100644
index 000000000000..debffba48ac9
--- /dev/null
+++ b/include/linux/soc/ti/k3-ringacc.h
@@ -0,0 +1,262 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * K3 Ring Accelerator (RA) subsystem interface
+ *
+ * Copyright (C) 2019 Texas Instruments Incorporated - http://www.ti.com
+ */
+
+#ifndef __SOC_TI_K3_RINGACC_API_H_
+#define __SOC_TI_K3_RINGACC_API_H_
+
+#include <linux/types.h>
+
+struct device_node;
+
+/**
+ * enum k3_ring_mode - &struct k3_ring_cfg mode
+ *
+ * RA ring operational modes
+ *
+ * @K3_RINGACC_RING_MODE_RING: Exposed Ring mode for SW direct access
+ * @K3_RINGACC_RING_MODE_MESSAGE: Messaging mode. Messaging mode requires
+ *	that all accesses to the queue must go through this IP so that all
+ *	accesses to the memory are controlled and ordered. This IP then
+ *	controls the entire state of the queue, and SW has no directly control,
+ *	such as through doorbells and cannot access the storage memory directly.
+ *	This is particularly useful when more than one SW or HW entity can be
+ *	the producer and/or consumer at the same time
+ * @K3_RINGACC_RING_MODE_CREDENTIALS: Credentials mode is message mode plus
+ *	stores credentials with each message, requiring the element size to be
+ *	doubled to fit the credentials. Any exposed memory should be protected
+ *	by a firewall from unwanted access
+ * @K3_RINGACC_RING_MODE_QM:  Queue manager mode. This takes the credentials
+ *	mode and adds packet length per element, along with additional read only
+ *	fields for element count and accumulated queue length. The QM mode only
+ *	operates with an 8 byte element size (any other element size is
+ *	illegal), and like in credentials mode each operation uses 2 element
+ *	slots to store the credentials and length fields
+ */
+enum k3_ring_mode {
+	K3_RINGACC_RING_MODE_RING = 0,
+	K3_RINGACC_RING_MODE_MESSAGE,
+	K3_RINGACC_RING_MODE_CREDENTIALS,
+	K3_RINGACC_RING_MODE_QM,
+	K3_RINGACC_RING_MODE_INVALID
+};
+
+/**
+ * enum k3_ring_size - &struct k3_ring_cfg elm_size
+ *
+ * RA ring element's sizes in bytes.
+ */
+enum k3_ring_size {
+	K3_RINGACC_RING_ELSIZE_4 = 0,
+	K3_RINGACC_RING_ELSIZE_8,
+	K3_RINGACC_RING_ELSIZE_16,
+	K3_RINGACC_RING_ELSIZE_32,
+	K3_RINGACC_RING_ELSIZE_64,
+	K3_RINGACC_RING_ELSIZE_128,
+	K3_RINGACC_RING_ELSIZE_256,
+	K3_RINGACC_RING_ELSIZE_INVALID
+};
+
+struct k3_ringacc;
+struct k3_ring;
+
+/**
+ * enum k3_ring_cfg - RA ring configuration structure
+ *
+ * @size: Ring size, number of elements
+ * @elm_size: Ring element size
+ * @mode: Ring operational mode
+ * @flags: Ring configuration flags. Possible values:
+ *	 @K3_RINGACC_RING_SHARED: when set allows to request the same ring
+ *	 few times. It's usable when the same ring is used as Free Host PD ring
+ *	 for different flows, for example.
+ *	 Note: Locking should be done by consumer if required
+ */
+struct k3_ring_cfg {
+	u32 size;
+	enum k3_ring_size elm_size;
+	enum k3_ring_mode mode;
+#define K3_RINGACC_RING_SHARED BIT(1)
+	u32 flags;
+};
+
+#define K3_RINGACC_RING_ID_ANY (-1)
+
+/**
+ * of_k3_ringacc_get_by_phandle - find a RA by phandle property
+ * @np: device node
+ * @propname: property name containing phandle on RA node
+ *
+ * Returns pointer on the RA - struct k3_ringacc
+ * or -ENODEV if not found,
+ * or -EPROBE_DEFER if not yet registered
+ */
+struct k3_ringacc *of_k3_ringacc_get_by_phandle(struct device_node *np,
+						const char *property);
+
+#define K3_RINGACC_RING_USE_PROXY BIT(1)
+
+/**
+ * k3_ringacc_request_ring - request ring from ringacc
+ * @ringacc: pointer on ringacc
+ * @id: ring id or K3_RINGACC_RING_ID_ANY for any general purpose ring
+ * @flags:
+ *	@K3_RINGACC_RING_USE_PROXY: if set - proxy will be allocated and
+ *		used to access ring memory. Sopported only for rings in
+ *		Message/Credentials/Queue mode.
+ *
+ * Returns pointer on the Ring - struct k3_ring
+ * or NULL in case of failure.
+ */
+struct k3_ring *k3_ringacc_request_ring(struct k3_ringacc *ringacc,
+					int id, u32 flags);
+
+/**
+ * k3_ringacc_ring_reset - ring reset
+ * @ring: pointer on Ring
+ *
+ * Resets ring internal state ((hw)occ, (hw)idx).
+ * TODO_GS: ? Ring can be reused without reconfiguration
+ */
+void k3_ringacc_ring_reset(struct k3_ring *ring);
+/**
+ * k3_ringacc_ring_reset - ring reset for DMA rings
+ * @ring: pointer on Ring
+ *
+ * Resets ring internal state ((hw)occ, (hw)idx). Should be used for rings
+ * which are read by K3 UDMA, like TX or Free Host PD rings.
+ */
+void k3_ringacc_ring_reset_dma(struct k3_ring *ring, u32 occ);
+
+/**
+ * k3_ringacc_ring_free - ring free
+ * @ring: pointer on Ring
+ *
+ * Resets ring and free all alocated resources.
+ */
+int k3_ringacc_ring_free(struct k3_ring *ring);
+
+/**
+ * k3_ringacc_get_ring_id - Get the Ring ID
+ * @ring: pointer on ring
+ *
+ * Returns the Ring ID
+ */
+u32 k3_ringacc_get_ring_id(struct k3_ring *ring);
+
+/**
+ * k3_ringacc_get_ring_irq_num - Get the irq number for the ring
+ * @ring: pointer on ring
+ *
+ * Returns the interrupt number which can be used to request the interrupt
+ */
+int k3_ringacc_get_ring_irq_num(struct k3_ring *ring);
+
+/**
+ * k3_ringacc_ring_cfg - ring configure
+ * @ring: pointer on ring
+ * @cfg: Ring configuration parameters (see &struct k3_ring_cfg)
+ *
+ * Configures ring, including ring memory allocation.
+ * Returns 0 on success, errno otherwise.
+ */
+int k3_ringacc_ring_cfg(struct k3_ring *ring, struct k3_ring_cfg *cfg);
+
+/**
+ * k3_ringacc_ring_get_size - get ring size
+ * @ring: pointer on ring
+ *
+ * Returns ring size in number of elements.
+ */
+u32 k3_ringacc_ring_get_size(struct k3_ring *ring);
+
+/**
+ * k3_ringacc_ring_get_free - get free elements
+ * @ring: pointer on ring
+ *
+ * Returns number of free elements in the ring.
+ */
+u32 k3_ringacc_ring_get_free(struct k3_ring *ring);
+
+/**
+ * k3_ringacc_ring_get_occ - get ring occupancy
+ * @ring: pointer on ring
+ *
+ * Returns total number of valid entries on the ring
+ */
+u32 k3_ringacc_ring_get_occ(struct k3_ring *ring);
+
+/**
+ * k3_ringacc_ring_is_full - checks if ring is full
+ * @ring: pointer on ring
+ *
+ * Returns true if the ring is full
+ */
+u32 k3_ringacc_ring_is_full(struct k3_ring *ring);
+
+/**
+ * k3_ringacc_ring_push - push element to the ring tail
+ * @ring: pointer on ring
+ * @elem: pointer on ring element buffer
+ *
+ * Push one ring element to the ring tail. Size of the ring element is
+ * determined by ring configuration &struct k3_ring_cfg elm_size.
+ *
+ * Returns 0 on success, errno otherwise.
+ */
+int k3_ringacc_ring_push(struct k3_ring *ring, void *elem);
+
+/**
+ * k3_ringacc_ring_pop - pop element from the ring head
+ * @ring: pointer on ring
+ * @elem: pointer on ring element buffer
+ *
+ * Push one ring element from the ring head. Size of the ring element is
+ * determined by ring configuration &struct k3_ring_cfg elm_size..
+ *
+ * Returns 0 on success, errno otherwise.
+ */
+int k3_ringacc_ring_pop(struct k3_ring *ring, void *elem);
+
+/**
+ * k3_ringacc_ring_push_head - push element to the ring head
+ * @ring: pointer on ring
+ * @elem: pointer on ring element buffer
+ *
+ * Push one ring element to the ring head. Size of the ring element is
+ * determined by ring configuration &struct k3_ring_cfg elm_size.
+ *
+ * Returns 0 on success, errno otherwise.
+ * Not Supported by ring modes: K3_RINGACC_RING_MODE_RING
+ */
+int k3_ringacc_ring_push_head(struct k3_ring *ring, void *elem);
+
+/**
+ * k3_ringacc_ring_pop_tail - pop element from the ring tail
+ * @ring: pointer on ring
+ * @elem: pointer on ring element buffer
+ *
+ * Push one ring element from the ring tail. Size of the ring element is
+ * determined by ring configuration &struct k3_ring_cfg elm_size.
+ *
+ * Returns 0 on success, errno otherwise.
+ * Not Supported by ring modes: K3_RINGACC_RING_MODE_RING
+ */
+int k3_ringacc_ring_pop_tail(struct k3_ring *ring, void *elem);
+
+u32 k3_ringacc_get_tisci_dev_id(struct k3_ring *ring);
+
+/**
+ * Debugging definitions
+ * TODO: might be removed
+ */
+#ifdef CONFIG_TI_K3_RINGACC_DEBUG
+void k3_ringacc_ring_dump(struct k3_ring *ring);
+#else
+static inline void k3_ringacc_ring_dump(struct k3_ring *ring) {};
+#endif
+
+#endif /* __SOC_TI_K3_RINGACC_API_H_ */
-- 
Peter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2 03/14] dmaengine: doc: Add sections for per descriptor metadata support
  2019-07-30  9:34 [PATCH v2 00/14] dmaengine/soc: Add Texas Instruments UDMA support Peter Ujfalusi
  2019-07-30  9:34 ` [PATCH v2 01/14] bindings: soc: ti: add documentation for k3 ringacc Peter Ujfalusi
  2019-07-30  9:34 ` [PATCH v2 02/14] soc: ti: k3: add navss ringacc driver Peter Ujfalusi
@ 2019-07-30  9:34 ` Peter Ujfalusi
  2019-07-30  9:34 ` [PATCH v2 04/14] dmaengine: Add metadata_ops for dma_async_tx_descriptor Peter Ujfalusi
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Peter Ujfalusi @ 2019-07-30  9:34 UTC (permalink / raw)
  To: vkoul, robh+dt, nm, ssantosh
  Cc: dan.j.williams, dmaengine, linux-arm-kernel, devicetree,
	linux-kernel, grygorii.strashko, lokeshvutla, t-kristo, tony,
	j-keerthy

Update the provider and client documentation with details about the
metadata support.

Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
---
 Documentation/driver-api/dmaengine/client.rst | 75 +++++++++++++++++++
 .../driver-api/dmaengine/provider.rst         | 46 ++++++++++++
 2 files changed, 121 insertions(+)

diff --git a/Documentation/driver-api/dmaengine/client.rst b/Documentation/driver-api/dmaengine/client.rst
index 45953f171500..d708e46b88a2 100644
--- a/Documentation/driver-api/dmaengine/client.rst
+++ b/Documentation/driver-api/dmaengine/client.rst
@@ -151,6 +151,81 @@ The details of these operations are:
      Note that callbacks will always be invoked from the DMA
      engines tasklet, never from interrupt context.
 
+  Optional: per descriptor metadata
+  ---------------------------------
+  DMAengine provides two ways for metadata support.
+
+  DESC_METADATA_CLIENT
+
+    The metadata buffer is allocated/provided by the client driver and it is
+    attached to the descriptor.
+
+  .. code-block:: c
+
+     int dmaengine_desc_attach_metadata(struct dma_async_tx_descriptor *desc,
+				   void *data, size_t len);
+
+  DESC_METADATA_ENGINE
+
+    The metadata buffer is allocated/managed by the DMA driver. The client
+    driver can ask for the pointer, maximum size and the currently used size of
+    the metadata and can directly update or read it.
+
+  .. code-block:: c
+
+     void *dmaengine_desc_get_metadata_ptr(struct dma_async_tx_descriptor *desc,
+		size_t *payload_len, size_t *max_len);
+
+     int dmaengine_desc_set_metadata_len(struct dma_async_tx_descriptor *desc,
+		size_t payload_len);
+
+  Client drivers can query if a given mode is supported with:
+
+  .. code-block:: c
+
+     bool dmaengine_is_metadata_mode_supported(struct dma_chan *chan,
+		enum dma_desc_metadata_mode mode);
+
+  Depending on the used mode client drivers must follow different flow.
+
+  DESC_METADATA_CLIENT
+
+    - DMA_MEM_TO_DEV / DEV_MEM_TO_MEM:
+      1. prepare the descriptor (dmaengine_prep_*)
+         construct the metadata in the client's buffer
+      2. use dmaengine_desc_attach_metadata() to attach the buffer to the
+         descriptor
+      3. submit the transfer
+    - DMA_DEV_TO_MEM:
+      1. prepare the descriptor (dmaengine_prep_*)
+      2. use dmaengine_desc_attach_metadata() to attach the buffer to the
+         descriptor
+      3. submit the transfer
+      4. when the transfer is completed, the metadata should be available in the
+         attached buffer
+
+  DESC_METADATA_ENGINE
+
+    - DMA_MEM_TO_DEV / DEV_MEM_TO_MEM:
+      1. prepare the descriptor (dmaengine_prep_*)
+      2. use dmaengine_desc_get_metadata_ptr() to get the pointer to the
+         engine's metadata area
+      3. update the metadata at the pointer
+      4. use dmaengine_desc_set_metadata_len()  to tell the DMA engine the
+         amount of data the client has placed into the metadata buffer
+      5. submit the transfer
+    - DMA_DEV_TO_MEM:
+      1. prepare the descriptor (dmaengine_prep_*)
+      2. submit the transfer
+      3. on transfer completion, use dmaengine_desc_get_metadata_ptr() to get the
+         pointer to the engine's metadata are
+      4. Read out the metadate from the pointer
+
+  .. note::
+
+     Mixed use of DESC_METADATA_CLIENT / DESC_METADATA_ENGINE is not allowed,
+     client drivers must use either of the modes per descriptor.
+
 4. Submit the transaction
 
    Once the descriptor has been prepared and the callback information
diff --git a/Documentation/driver-api/dmaengine/provider.rst b/Documentation/driver-api/dmaengine/provider.rst
index dfc4486b5743..9e6d87b3c477 100644
--- a/Documentation/driver-api/dmaengine/provider.rst
+++ b/Documentation/driver-api/dmaengine/provider.rst
@@ -247,6 +247,52 @@ after each transfer. In case of a ring buffer, they may loop
 (DMA_CYCLIC). Addresses pointing to a device's register (e.g. a FIFO)
 are typically fixed.
 
+Per descriptor metadata support
+-------------------------------
+Some data movement architecure (DMA controller and peripherals) uses metadata
+associated with a transaction. The DMA controller role is to transfer the
+payload and the metadata alongside.
+The metadata itself is not used by the DMA engine itself, but it contains
+parameters, keys, vectors, etc for peripheral or from the peripheral.
+
+The DMAengine framework provides a generic ways to facilitate the metadata for
+descriptors. Depending on the architecture the DMA driver can implement either
+or both of the methods and it is up to the client driver to choose which one
+to use.
+
+- DESC_METADATA_CLIENT
+
+  The metadata buffer is allocated/provided by the client driver and it is
+  attached (via the dmaengine_desc_attach_metadata() helper to the descriptor.
+
+  From the DMA driver the following is expected for this mode:
+  - DMA_MEM_TO_DEV / DEV_MEM_TO_MEM
+    The data from the provided metadata buffer should be prepared for the DMA
+    controller to be sent alongside of the payload data. Either by copying to a
+    hardware descriptor, or highly coupled packet.
+  - DMA_DEV_TO_MEM
+    On transfer completion the DMA driver must copy the metadata to the client
+    provided metadata buffer.
+
+- DESC_METADATA_ENGINE
+
+  The metadata buffer is allocated/managed by the DMA driver. The client driver
+  can ask for the pointer, maximum size and the currently used size of the
+  metadata and can directly update or read it. dmaengine_desc_get_metadata_ptr()
+  and dmaengine_desc_set_metadata_len() is provided as helper functions.
+
+  From the DMA driver the following is expected for this mode:
+  - get_metadata_ptr
+    Should return a pointer for the metadata buffer, the maximum size of the
+    metadata buffer and the currently used / valid (if any) bytes in the buffer.
+  - set_metadata_len
+    It is called by the clients after it have placed the metadata to the buffer
+    to let the DMA driver know the number of valid bytes provided.
+
+  Note: since the client will ask for the metadata pointer in the completion
+  callback (in DMA_DEV_TO_MEM case) the DMA driver must ensure that the
+  descriptor is not freed up prior the callback is called.
+
 Device operations
 -----------------
 
-- 
Peter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2 04/14] dmaengine: Add metadata_ops for dma_async_tx_descriptor
  2019-07-30  9:34 [PATCH v2 00/14] dmaengine/soc: Add Texas Instruments UDMA support Peter Ujfalusi
                   ` (2 preceding siblings ...)
  2019-07-30  9:34 ` [PATCH v2 03/14] dmaengine: doc: Add sections for per descriptor metadata support Peter Ujfalusi
@ 2019-07-30  9:34 ` Peter Ujfalusi
  2019-09-08 14:12   ` Vinod Koul
  2019-07-30  9:34 ` [PATCH v2 05/14] dmaengine: Add support for reporting DMA cached data amount Peter Ujfalusi
                   ` (11 subsequent siblings)
  15 siblings, 1 reply; 32+ messages in thread
From: Peter Ujfalusi @ 2019-07-30  9:34 UTC (permalink / raw)
  To: vkoul, robh+dt, nm, ssantosh
  Cc: dan.j.williams, dmaengine, linux-arm-kernel, devicetree,
	linux-kernel, grygorii.strashko, lokeshvutla, t-kristo, tony,
	j-keerthy

The metadata is best described as side band data or parameters traveling
alongside the data DMAd by the DMA engine. It is data
which is understood by the peripheral and the peripheral driver only, the
DMA engine see it only as data block and it is not interpreting it in any
way.

The metadata can be different per descriptor as it is a parameter for the
data being transferred.

If the DMA supports per descriptor metadata it can implement the attach,
get_ptr/set_len callbacks.

Client drivers must only use either attach or get_ptr/set_len to avoid
misconfiguration.

Client driver can check if a given metadata mode is supported by the
channel during probe time with
dmaengine_is_metadata_mode_supported(chan, DESC_METADATA_CLIENT);
dmaengine_is_metadata_mode_supported(chan, DESC_METADATA_ENGINE);

and based on this information can use either mode.

Wrappers are also added for the metadata_ops.

To be used in DESC_METADATA_CLIENT mode:
dmaengine_desc_attach_metadata()

To be used in DESC_METADATA_ENGINE mode:
dmaengine_desc_get_metadata_ptr()
dmaengine_desc_set_metadata_len()

Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
---
 drivers/dma/dmaengine.c   |  73 ++++++++++++++++++++++++++
 include/linux/dmaengine.h | 108 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 181 insertions(+)

diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
index 03ac4b96117c..6baddf7dcbfd 100644
--- a/drivers/dma/dmaengine.c
+++ b/drivers/dma/dmaengine.c
@@ -1302,6 +1302,79 @@ void dma_async_tx_descriptor_init(struct dma_async_tx_descriptor *tx,
 }
 EXPORT_SYMBOL(dma_async_tx_descriptor_init);
 
+static inline int desc_check_and_set_metadata_mode(
+	struct dma_async_tx_descriptor *desc, enum dma_desc_metadata_mode mode)
+{
+	/* Make sure that the metadata mode is not mixed */
+	if (!desc->desc_metadata_mode) {
+		if (dmaengine_is_metadata_mode_supported(desc->chan, mode))
+			desc->desc_metadata_mode = mode;
+		else
+			return -ENOTSUPP;
+	} else if (desc->desc_metadata_mode != mode) {
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+int dmaengine_desc_attach_metadata(struct dma_async_tx_descriptor *desc,
+				   void *data, size_t len)
+{
+	int ret;
+
+	if (!desc)
+		return -EINVAL;
+
+	ret = desc_check_and_set_metadata_mode(desc, DESC_METADATA_CLIENT);
+	if (ret)
+		return ret;
+
+	if (!desc->metadata_ops || !desc->metadata_ops->attach)
+		return -ENOTSUPP;
+
+	return desc->metadata_ops->attach(desc, data, len);
+}
+EXPORT_SYMBOL_GPL(dmaengine_desc_attach_metadata);
+
+void *dmaengine_desc_get_metadata_ptr(struct dma_async_tx_descriptor *desc,
+				      size_t *payload_len, size_t *max_len)
+{
+	int ret;
+
+	if (!desc)
+		return ERR_PTR(-EINVAL);
+
+	ret = desc_check_and_set_metadata_mode(desc, DESC_METADATA_ENGINE);
+	if (ret)
+		return ERR_PTR(ret);
+
+	if (!desc->metadata_ops || !desc->metadata_ops->get_ptr)
+		return ERR_PTR(-ENOTSUPP);
+
+	return desc->metadata_ops->get_ptr(desc, payload_len, max_len);
+}
+EXPORT_SYMBOL_GPL(dmaengine_desc_get_metadata_ptr);
+
+int dmaengine_desc_set_metadata_len(struct dma_async_tx_descriptor *desc,
+				    size_t payload_len)
+{
+	int ret;
+
+	if (!desc)
+		return -EINVAL;
+
+	ret = desc_check_and_set_metadata_mode(desc, DESC_METADATA_ENGINE);
+	if (ret)
+		return ret;
+
+	if (!desc->metadata_ops || !desc->metadata_ops->set_len)
+		return -ENOTSUPP;
+
+	return desc->metadata_ops->set_len(desc, payload_len);
+}
+EXPORT_SYMBOL_GPL(dmaengine_desc_set_metadata_len);
+
 /* dma_wait_for_async_tx - spin wait for a transaction to complete
  * @tx: in-flight transaction to wait on
  */
diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index 8fcdee1c0cf9..40d062c3b359 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -219,6 +219,58 @@ typedef struct { DECLARE_BITMAP(bits, DMA_TX_TYPE_END); } dma_cap_mask_t;
  * @bytes_transferred: byte counter
  */
 
+/**
+ * enum dma_desc_metadata_mode - per descriptor metadata mode types supported
+ * @DESC_METADATA_CLIENT - the metadata buffer is allocated/provided by the
+ *  client driver and it is attached (via the dmaengine_desc_attach_metadata()
+ *  helper) to the descriptor.
+ *
+ * Client drivers interested to use this mode can follow:
+ * - DMA_MEM_TO_DEV / DEV_MEM_TO_MEM:
+ *   1. prepare the descriptor (dmaengine_prep_*)
+ *	construct the metadata in the client's buffer
+ *   2. use dmaengine_desc_attach_metadata() to attach the buffer to the
+ *	descriptor
+ *   3. submit the transfer
+ * - DMA_DEV_TO_MEM:
+ *   1. prepare the descriptor (dmaengine_prep_*)
+ *   2. use dmaengine_desc_attach_metadata() to attach the buffer to the
+ *	descriptor
+ *   3. submit the transfer
+ *   4. when the transfer is completed, the metadata should be available in the
+ *	attached buffer
+ *
+ * @DESC_METADATA_ENGINE - the metadata buffer is allocated/managed by the DMA
+ *  driver. The client driver can ask for the pointer, maximum size and the
+ *  currently used size of the metadata and can directly update or read it.
+ *  dmaengine_desc_get_metadata_ptr() and dmaengine_desc_set_metadata_len() is
+ *  provided as helper functions.
+ *
+ * Client drivers interested to use this mode can follow:
+ * - DMA_MEM_TO_DEV / DEV_MEM_TO_MEM:
+ *   1. prepare the descriptor (dmaengine_prep_*)
+ *   2. use dmaengine_desc_get_metadata_ptr() to get the pointer to the engine's
+ *	metadata area
+ *   3. update the metadata at the pointer
+ *   4. use dmaengine_desc_set_metadata_len()  to tell the DMA engine the amount
+ *	of data the client has placed into the metadata buffer
+ *   5. submit the transfer
+ * - DMA_DEV_TO_MEM:
+ *   1. prepare the descriptor (dmaengine_prep_*)
+ *   2. submit the transfer
+ *   3. on transfer completion, use dmaengine_desc_get_metadata_ptr() to get the
+ *	pointer to the engine's metadata are
+ *   4. Read out the metadate from the pointer
+ *
+ * Note: the two mode is not compatible and clients must use one mode for a
+ * descriptor.
+ */
+enum dma_desc_metadata_mode {
+	DESC_METADATA_NONE = 0,
+	DESC_METADATA_CLIENT = BIT(0),
+	DESC_METADATA_ENGINE = BIT(1),
+};
+
 struct dma_chan_percpu {
 	/* stats */
 	unsigned long memcpy_count;
@@ -475,6 +527,18 @@ struct dmaengine_unmap_data {
 	dma_addr_t addr[0];
 };
 
+struct dma_async_tx_descriptor;
+
+struct dma_descriptor_metadata_ops {
+	int (*attach)(struct dma_async_tx_descriptor *desc, void *data,
+		      size_t len);
+
+	void *(*get_ptr)(struct dma_async_tx_descriptor *desc,
+			 size_t *payload_len, size_t *max_len);
+	int (*set_len)(struct dma_async_tx_descriptor *desc,
+		       size_t payload_len);
+};
+
 /**
  * struct dma_async_tx_descriptor - async transaction descriptor
  * ---dma generic offload fields---
@@ -488,6 +552,11 @@ struct dmaengine_unmap_data {
  * descriptor pending. To be pushed on .issue_pending() call
  * @callback: routine to call after this operation is complete
  * @callback_param: general parameter to pass to the callback routine
+ * @desc_metadata_mode: core managed metadata mode to protect mixed use of
+ *	DESC_METADATA_CLIENT or DESC_METADATA_ENGINE. Otherwise
+ *	DESC_METADATA_NONE
+ * @metadata_ops: DMA driver provided metadata mode ops, need to be set by the
+ *	DMA driver if metadata mode is supported with the descriptor
  * ---async_tx api specific fields---
  * @next: at completion submit this descriptor
  * @parent: pointer to the next level up in the dependency chain
@@ -504,6 +573,8 @@ struct dma_async_tx_descriptor {
 	dma_async_tx_callback_result callback_result;
 	void *callback_param;
 	struct dmaengine_unmap_data *unmap;
+	enum dma_desc_metadata_mode desc_metadata_mode;
+	struct dma_descriptor_metadata_ops *metadata_ops;
 #ifdef CONFIG_ASYNC_TX_ENABLE_CHANNEL_SWITCH
 	struct dma_async_tx_descriptor *next;
 	struct dma_async_tx_descriptor *parent;
@@ -666,6 +737,7 @@ struct dma_filter {
  * @global_node: list_head for global dma_device_list
  * @filter: information for device/slave to filter function/param mapping
  * @cap_mask: one or more dma_capability flags
+ * @desc_metadata_modes: supported metadata modes by the DMA device
  * @max_xor: maximum number of xor sources, 0 if no capability
  * @max_pq: maximum number of PQ sources and PQ-continue capability
  * @copy_align: alignment shift for memcpy operations
@@ -727,6 +799,7 @@ struct dma_device {
 	struct list_head global_node;
 	struct dma_filter filter;
 	dma_cap_mask_t  cap_mask;
+	enum dma_desc_metadata_mode desc_metadata_modes;
 	unsigned short max_xor;
 	unsigned short max_pq;
 	enum dmaengine_alignment copy_align;
@@ -902,6 +975,41 @@ static inline struct dma_async_tx_descriptor *dmaengine_prep_dma_memcpy(
 						    len, flags);
 }
 
+static inline bool dmaengine_is_metadata_mode_supported(struct dma_chan *chan,
+		enum dma_desc_metadata_mode mode)
+{
+	if (!chan)
+		return false;
+
+	return !!(chan->device->desc_metadata_modes & mode);
+}
+
+#ifdef CONFIG_DMA_ENGINE
+int dmaengine_desc_attach_metadata(struct dma_async_tx_descriptor *desc,
+				   void *data, size_t len);
+void *dmaengine_desc_get_metadata_ptr(struct dma_async_tx_descriptor *desc,
+				      size_t *payload_len, size_t *max_len);
+int dmaengine_desc_set_metadata_len(struct dma_async_tx_descriptor *desc,
+				    size_t payload_len);
+#else /* CONFIG_DMA_ENGINE */
+static inline int dmaengine_desc_attach_metadata(
+		struct dma_async_tx_descriptor *desc, void *data, size_t len)
+{
+	return -EINVAL;
+}
+static inline void *dmaengine_desc_get_metadata_ptr(
+		struct dma_async_tx_descriptor *desc, size_t *payload_len,
+		size_t *max_len)
+{
+	return NULL;
+}
+static inline int dmaengine_desc_set_metadata_len(
+		struct dma_async_tx_descriptor *desc, size_t payload_len)
+{
+	return -EINVAL;
+}
+#endif /* CONFIG_DMA_ENGINE */
+
 /**
  * dmaengine_terminate_all() - Terminate all active DMA transfers
  * @chan: The channel for which to terminate the transfers
-- 
Peter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2 05/14] dmaengine: Add support for reporting DMA cached data amount
  2019-07-30  9:34 [PATCH v2 00/14] dmaengine/soc: Add Texas Instruments UDMA support Peter Ujfalusi
                   ` (3 preceding siblings ...)
  2019-07-30  9:34 ` [PATCH v2 04/14] dmaengine: Add metadata_ops for dma_async_tx_descriptor Peter Ujfalusi
@ 2019-07-30  9:34 ` Peter Ujfalusi
  2019-07-30  9:34 ` [PATCH v2 06/14] dmaengine: ti: Add cppi5 header for UDMA Peter Ujfalusi
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Peter Ujfalusi @ 2019-07-30  9:34 UTC (permalink / raw)
  To: vkoul, robh+dt, nm, ssantosh
  Cc: dan.j.williams, dmaengine, linux-arm-kernel, devicetree,
	linux-kernel, grygorii.strashko, lokeshvutla, t-kristo, tony,
	j-keerthy

A DMA hardware can have big cache or FIFO and the amount of data sitting in
the DMA fabric can be an interest for the clients.

For example in audio we want to know the delay in the data flow and in case
the DMA have significantly large FIFO/cache, it can affect the latenc/delay

Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
---
 drivers/dma/dmaengine.h   | 8 ++++++++
 include/linux/dmaengine.h | 2 ++
 2 files changed, 10 insertions(+)

diff --git a/drivers/dma/dmaengine.h b/drivers/dma/dmaengine.h
index 501c0b063f85..b0b97475707a 100644
--- a/drivers/dma/dmaengine.h
+++ b/drivers/dma/dmaengine.h
@@ -77,6 +77,7 @@ static inline enum dma_status dma_cookie_status(struct dma_chan *chan,
 		state->last = complete;
 		state->used = used;
 		state->residue = 0;
+		state->in_flight_bytes = 0;
 	}
 	return dma_async_is_complete(cookie, complete, used);
 }
@@ -87,6 +88,13 @@ static inline void dma_set_residue(struct dma_tx_state *state, u32 residue)
 		state->residue = residue;
 }
 
+static inline void dma_set_in_flight_bytes(struct dma_tx_state *state,
+					   u32 in_flight_bytes)
+{
+	if (state)
+		state->in_flight_bytes = in_flight_bytes;
+}
+
 struct dmaengine_desc_callback {
 	dma_async_tx_callback callback;
 	dma_async_tx_callback_result callback_result;
diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index 40d062c3b359..02ceef95340a 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -682,11 +682,13 @@ static inline struct dma_async_tx_descriptor *txd_next(struct dma_async_tx_descr
  * @residue: the remaining number of bytes left to transmit
  *	on the selected transfer for states DMA_IN_PROGRESS and
  *	DMA_PAUSED if this is implemented in the driver, else 0
+ * @in_flight_bytes: amount of data in bytes cached by the DMA.
  */
 struct dma_tx_state {
 	dma_cookie_t last;
 	dma_cookie_t used;
 	u32 residue;
+	u32 in_flight_bytes;
 };
 
 /**
-- 
Peter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2 06/14] dmaengine: ti: Add cppi5 header for UDMA
  2019-07-30  9:34 [PATCH v2 00/14] dmaengine/soc: Add Texas Instruments UDMA support Peter Ujfalusi
                   ` (4 preceding siblings ...)
  2019-07-30  9:34 ` [PATCH v2 05/14] dmaengine: Add support for reporting DMA cached data amount Peter Ujfalusi
@ 2019-07-30  9:34 ` Peter Ujfalusi
  2019-09-08 14:25   ` Vinod Koul
  2019-07-30  9:34 ` [PATCH v2 07/14] dt-bindings: dma: ti: Add document for K3 UDMA Peter Ujfalusi
                   ` (9 subsequent siblings)
  15 siblings, 1 reply; 32+ messages in thread
From: Peter Ujfalusi @ 2019-07-30  9:34 UTC (permalink / raw)
  To: vkoul, robh+dt, nm, ssantosh
  Cc: dan.j.williams, dmaengine, linux-arm-kernel, devicetree,
	linux-kernel, grygorii.strashko, lokeshvutla, t-kristo, tony,
	j-keerthy

Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
---
 include/linux/dma/ti-cppi5.h | 996 +++++++++++++++++++++++++++++++++++
 1 file changed, 996 insertions(+)
 create mode 100644 include/linux/dma/ti-cppi5.h

diff --git a/include/linux/dma/ti-cppi5.h b/include/linux/dma/ti-cppi5.h
new file mode 100644
index 000000000000..a58bc1e2ba0b
--- /dev/null
+++ b/include/linux/dma/ti-cppi5.h
@@ -0,0 +1,996 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * CPPI5 descriptors interface
+ *
+ * Copyright (C) 2019 Texas Instruments Incorporated - http://www.ti.com
+ */
+
+#ifndef __TI_CPPI5_H__
+#define __TI_CPPI5_H__
+
+#include <linux/bitops.h>
+#include <linux/printk.h>
+#include <linux/bug.h>
+
+/**
+ * Descriptor header, present in all types of descriptors
+ */
+struct cppi5_desc_hdr_t {
+	u32 pkt_info0;	/* Packet info word 0 (n/a in Buffer desc) */
+	u32 pkt_info1;	/* Packet info word 1 (n/a in Buffer desc) */
+	u32 pkt_info2;	/* Packet info word 2 Buffer reclamation info */
+	u32 src_dst_tag; /* Packet info word 3 (n/a in Buffer desc) */
+} __packed;
+
+/**
+ * Host-mode packet and buffer descriptor definition
+ */
+struct cppi5_host_desc_t {
+	struct cppi5_desc_hdr_t hdr;
+	u64 next_desc;	/* w4/5: Linking word */
+	u64 buf_ptr;	/* w6/7: Buffer pointer */
+	u32 buf_info1;	/* w8: Buffer valid data length */
+	u32 org_buf_len; /* w9: Original buffer length */
+	u64 org_buf_ptr; /* w10/11: Original buffer pointer */
+	u32 epib[0];	/* Extended Packet Info Data (optional, 4 words) */
+	/*
+	 * Protocol Specific Data (optional, 0-128 bytes in multiples of 4),
+	 * and/or Other Software Data (0-N bytes, optional)
+	 */
+} __packed;
+
+#define CPPI5_DESC_MIN_ALIGN			(16U)
+
+#define CPPI5_INFO0_HDESC_EPIB_SIZE		(16U)
+#define CPPI5_INFO0_HDESC_PSDATA_MAX_SIZE	(128U)
+
+#define CPPI5_INFO0_HDESC_TYPE_SHIFT		(30U)
+#define CPPI5_INFO0_HDESC_TYPE_MASK		GENMASK(31, 30)
+#define   CPPI5_INFO0_DESC_TYPE_VAL_HOST	(1U)
+#define   CPPI5_INFO0_DESC_TYPE_VAL_MONO	(2U)
+#define   CPPI5_INFO0_DESC_TYPE_VAL_TR		(3U)
+#define CPPI5_INFO0_HDESC_EPIB_PRESENT		BIT(29)
+/*
+ * Protocol Specific Words location:
+ * 0 - located in the descriptor,
+ * 1 = located in the SOP Buffer immediately prior to the data.
+ */
+#define CPPI5_INFO0_HDESC_PSINFO_LOCATION	BIT(28)
+#define CPPI5_INFO0_HDESC_PSINFO_SIZE_SHIFT	(22U)
+#define CPPI5_INFO0_HDESC_PSINFO_SIZE_MASK	GENMASK(27, 22)
+#define CPPI5_INFO0_HDESC_PKTLEN_SHIFT		(0)
+#define CPPI5_INFO0_HDESC_PKTLEN_MASK		GENMASK(21, 0)
+
+#define CPPI5_INFO1_DESC_PKTERROR_SHIFT		(28U)
+#define CPPI5_INFO1_DESC_PKTERROR_MASK		GENMASK(31, 28)
+#define CPPI5_INFO1_HDESC_PSFLGS_SHIFT		(24U)
+#define CPPI5_INFO1_HDESC_PSFLGS_MASK		GENMASK(27, 24)
+#define CPPI5_INFO1_DESC_PKTID_SHIFT		(14U)
+#define CPPI5_INFO1_DESC_PKTID_MASK		GENMASK(23, 14)
+#define CPPI5_INFO1_DESC_FLOWID_SHIFT		(0)
+#define CPPI5_INFO1_DESC_FLOWID_MASK		GENMASK(13, 0)
+
+#define CPPI5_INFO2_HDESC_PKTTYPE_SHIFT		(27U)
+#define CPPI5_INFO2_HDESC_PKTTYPE_MASK		GENMASK(31, 27)
+/* Return Policy: 0 - Entire packet 1 - Each buffer */
+#define CPPI5_INFO2_HDESC_RETPOLICY		BIT(18)
+/*
+ * Early Return:
+ * 0 = desc pointers should be returned after all reads have been completed
+ * 1 = desc pointers should be returned immediately upon fetching
+ * the descriptor and beginning to transfer data.
+ */
+#define CPPI5_INFO2_HDESC_EARLYRET		BIT(17)
+/*
+ * Return Push Policy:
+ * 0 = Descriptor must be returned to tail of queue
+ * 1 = Descriptor must be returned to head of queue
+ */
+#define CPPI5_INFO2_DESC_RETPUSHPOLICY		BIT(16)
+#define CPPI5_INFO2_DESC_RETQ_SHIFT		(0)
+#define CPPI5_INFO2_DESC_RETQ_MASK		GENMASK(15, 0)
+
+#define CPPI5_INFO3_DESC_SRCTAG_SHIFT		(16U)
+#define CPPI5_INFO3_DESC_SRCTAG_MASK		GENMASK(31, 16)
+#define CPPI5_INFO3_DESC_DSTTAG_SHIFT		(0)
+#define CPPI5_INFO3_DESC_DSTTAG_MASK		GENMASK(15, 0)
+
+#define CPPI5_BUFINFO1_HDESC_DATA_LEN_SHIFT	(0)
+#define CPPI5_BUFINFO1_HDESC_DATA_LEN_MASK	GENMASK(27, 0)
+
+#define CPPI5_OBUFINFO0_HDESC_BUF_LEN_SHIFT	(0)
+#define CPPI5_OBUFINFO0_HDESC_BUF_LEN_MASK	GENMASK(27, 0)
+
+/*
+ * Host Packet Descriptor Extended Packet Info Block
+ */
+struct cppi5_desc_epib_t {
+	u32 timestamp;	/* w0: application specific timestamp */
+	u32 sw_info0;	/* w1: Software Info 0 */
+	u32 sw_info1;	/* w2: Software Info 1 */
+	u32 sw_info2;	/* w3: Software Info 2 */
+};
+
+/**
+ * Monolithic-mode packet descriptor
+ */
+struct cppi5_monolithic_desc_t {
+	struct cppi5_desc_hdr_t hdr;
+	u32 epib[0];	/* Extended Packet Info Data (optional, 4 words) */
+	/*
+	 * Protocol Specific Data (optional, 0-128 bytes in multiples of 4),
+	 *  and/or Other Software Data (0-N bytes, optional)
+	 */
+};
+
+#define CPPI5_INFO2_MDESC_DATA_OFFSET_SHIFT	(18U)
+#define CPPI5_INFO2_MDESC_DATA_OFFSET_MASK	GENMASK(26, 18)
+
+/*
+ * Reload Enable:
+ * 0 = Finish the packet and place the descriptor back on the return queue
+ * 1 = Vector to the Reload Index and resume processing
+ */
+#define CPPI5_INFO0_TRDESC_RLDCNT_SHIFT		(20U)
+#define CPPI5_INFO0_TRDESC_RLDCNT_MASK		GENMASK(28, 20)
+#define CPPI5_INFO0_TRDESC_RLDCNT_MAX		(0x1ff)
+#define CPPI5_INFO0_TRDESC_RLDCNT_INFINITE	CPPI5_INFO0_TRDESC_RLDCNT_MAX
+#define CPPI5_INFO0_TRDESC_RLDIDX_SHIFT		(14U)
+#define CPPI5_INFO0_TRDESC_RLDIDX_MASK		GENMASK(19, 14)
+#define CPPI5_INFO0_TRDESC_RLDIDX_MAX		(0x3f)
+#define CPPI5_INFO0_TRDESC_LASTIDX_SHIFT	(0)
+#define CPPI5_INFO0_TRDESC_LASTIDX_MASK		GENMASK(13, 0)
+
+#define CPPI5_INFO1_TRDESC_RECSIZE_SHIFT	(24U)
+#define CPPI5_INFO1_TRDESC_RECSIZE_MASK		GENMASK(26, 24)
+#define   CPPI5_INFO1_TRDESC_RECSIZE_VAL_16B	(0)
+#define   CPPI5_INFO1_TRDESC_RECSIZE_VAL_32B	(1U)
+#define   CPPI5_INFO1_TRDESC_RECSIZE_VAL_64B	(2U)
+#define   CPPI5_INFO1_TRDESC_RECSIZE_VAL_128B	(3U)
+
+static inline void cppi5_desc_dump(void *desc, u32 size)
+{
+	print_hex_dump(KERN_ERR, "dump udmap_desc: ", DUMP_PREFIX_NONE,
+		       32, 4, desc, size, false);
+}
+
+/**
+ * cppi5_desc_get_type - get descriptor type
+ * @desc_hdr: packet descriptor/TR header
+ *
+ * Returns descriptor type:
+ * CPPI5_INFO0_DESC_TYPE_VAL_HOST
+ * CPPI5_INFO0_DESC_TYPE_VAL_MONO
+ * CPPI5_INFO0_DESC_TYPE_VAL_TR
+ */
+static inline u32 cppi5_desc_get_type(struct cppi5_desc_hdr_t *desc_hdr)
+{
+	WARN_ON(!desc_hdr);
+
+	return (desc_hdr->pkt_info0 & CPPI5_INFO0_HDESC_TYPE_MASK) >>
+		CPPI5_INFO0_HDESC_TYPE_SHIFT;
+}
+
+/**
+ * cppi5_desc_get_errflags - get Error Flags from Desc
+ * @desc_hdr: packet/TR descriptor header
+ *
+ * Returns Error Flags from Packet/TR Descriptor
+ */
+static inline u32 cppi5_desc_get_errflags(struct cppi5_desc_hdr_t *desc_hdr)
+{
+	WARN_ON(!desc_hdr);
+
+	return (desc_hdr->pkt_info1 & CPPI5_INFO1_DESC_PKTERROR_MASK) >>
+		CPPI5_INFO1_DESC_PKTERROR_SHIFT;
+}
+
+/**
+ * cppi5_desc_get_pktids - get Packet and Flow ids from Desc
+ * @desc_hdr: packet/TR descriptor header
+ * @pkt_id: Packet ID
+ * @flow_id: Flow ID
+ *
+ * Returns Packet and Flow ids from packet/TR descriptor
+ */
+static inline void cppi5_desc_get_pktids(struct cppi5_desc_hdr_t *desc_hdr,
+					 u32 *pkt_id, u32 *flow_id)
+{
+	WARN_ON(!desc_hdr);
+
+	*pkt_id = (desc_hdr->pkt_info1 & CPPI5_INFO1_DESC_PKTID_MASK) >>
+		   CPPI5_INFO1_DESC_PKTID_SHIFT;
+	*flow_id = (desc_hdr->pkt_info1 & CPPI5_INFO1_DESC_FLOWID_MASK) >>
+		    CPPI5_INFO1_DESC_FLOWID_SHIFT;
+}
+
+/**
+ * cppi5_desc_set_pktids - set Packet and Flow ids in Desc
+ * @desc_hdr: packet/TR descriptor header
+ * @pkt_id: Packet ID
+ * @flow_id: Flow ID
+ */
+static inline void cppi5_desc_set_pktids(struct cppi5_desc_hdr_t *desc_hdr,
+					 u32 pkt_id, u32 flow_id)
+{
+	WARN_ON(!desc_hdr);
+
+	desc_hdr->pkt_info1 |= (pkt_id << CPPI5_INFO1_DESC_PKTID_SHIFT) &
+				CPPI5_INFO1_DESC_PKTID_MASK;
+	desc_hdr->pkt_info1 |= (flow_id << CPPI5_INFO1_DESC_FLOWID_SHIFT) &
+				CPPI5_INFO1_DESC_FLOWID_MASK;
+}
+
+/**
+ * cppi5_desc_set_retpolicy - set Packet Return Policy in Desc
+ * @desc_hdr: packet/TR descriptor header
+ * @flags: fags, supported values
+ *  CPPI5_INFO2_HDESC_RETPOLICY
+ *  CPPI5_INFO2_HDESC_EARLYRET
+ *  CPPI5_INFO2_DESC_RETPUSHPOLICY
+ * @return_ring_id: Packet Return Queue/Ring id, value 0xFFFF reserved
+ */
+static inline void cppi5_desc_set_retpolicy(struct cppi5_desc_hdr_t *desc_hdr,
+					    u32 flags, u32 return_ring_id)
+{
+	WARN_ON(!desc_hdr);
+
+	desc_hdr->pkt_info2 |= flags;
+	desc_hdr->pkt_info2 |= return_ring_id & CPPI5_INFO2_DESC_RETQ_MASK;
+}
+
+/**
+ * cppi5_desc_get_tags_ids - get Packet Src/Dst Tags from Desc
+ * @desc_hdr: packet/TR descriptor header
+ * @src_tag_id: Source Tag
+ * @dst_tag_id: Dest Tag
+ *
+ * Returns Packet Src/Dst Tags from packet/TR descriptor
+ */
+static inline void cppi5_desc_get_tags_ids(struct cppi5_desc_hdr_t *desc_hdr,
+					   u32 *src_tag_id, u32 *dst_tag_id)
+{
+	WARN_ON(!desc_hdr);
+
+	if (src_tag_id)
+		*src_tag_id = (desc_hdr->src_dst_tag &
+			      CPPI5_INFO3_DESC_SRCTAG_MASK) >>
+			      CPPI5_INFO3_DESC_SRCTAG_SHIFT;
+	if (dst_tag_id)
+		*dst_tag_id = desc_hdr->src_dst_tag &
+			      CPPI5_INFO3_DESC_DSTTAG_MASK;
+}
+
+/**
+ * cppi5_desc_set_tags_ids - set Packet Src/Dst Tags in HDesc
+ * @desc_hdr: packet/TR descriptor header
+ * @src_tag_id: Source Tag
+ * @dst_tag_id: Dest Tag
+ *
+ * Returns Packet Src/Dst Tags from packet/TR descriptor
+ */
+static inline void cppi5_desc_set_tags_ids(struct cppi5_desc_hdr_t *desc_hdr,
+					   u32 src_tag_id, u32 dst_tag_id)
+{
+	WARN_ON(!desc_hdr);
+
+	desc_hdr->src_dst_tag = (src_tag_id << CPPI5_INFO3_DESC_SRCTAG_SHIFT) &
+				CPPI5_INFO3_DESC_SRCTAG_MASK;
+	desc_hdr->src_dst_tag |= dst_tag_id & CPPI5_INFO3_DESC_DSTTAG_MASK;
+}
+
+/**
+ * cppi5_hdesc_calc_size - Calculate Host Packet Descriptor size
+ * @epib: is EPIB present
+ * @psdata_size: PSDATA size
+ * @sw_data_size: SWDATA size
+ *
+ * Returns required Host Packet Descriptor size
+ * 0 - if PSDATA > CPPI5_INFO0_HDESC_PSDATA_MAX_SIZE
+ */
+static inline u32 cppi5_hdesc_calc_size(bool epib, u32 psdata_size,
+					u32 sw_data_size)
+{
+	u32 desc_size;
+
+	if (psdata_size > CPPI5_INFO0_HDESC_PSDATA_MAX_SIZE)
+		return 0;
+	//TODO_GS: align
+	desc_size = sizeof(struct cppi5_host_desc_t) + psdata_size +
+		    sw_data_size;
+
+	if (epib)
+		desc_size += CPPI5_INFO0_HDESC_EPIB_SIZE;
+
+	return ALIGN(desc_size, CPPI5_DESC_MIN_ALIGN);
+}
+
+/**
+ * cppi5_hdesc_init - Init Host Packet Descriptor size
+ * @desc: Host packet descriptor
+ * @flags: supported values
+ *	CPPI5_INFO0_HDESC_EPIB_PRESENT
+ *	CPPI5_INFO0_HDESC_PSINFO_LOCATION
+ * @psdata_size: PSDATA size
+ *
+ * Returns required Host Packet Descriptor size
+ * 0 - if PSDATA > CPPI5_INFO0_HDESC_PSDATA_MAX_SIZE
+ */
+static inline void cppi5_hdesc_init(struct cppi5_host_desc_t *desc, u32 flags,
+				    u32 psdata_size)
+{
+	WARN_ON(!desc);
+	WARN_ON(psdata_size > CPPI5_INFO0_HDESC_PSDATA_MAX_SIZE);
+	WARN_ON(flags & ~(CPPI5_INFO0_HDESC_EPIB_PRESENT |
+			  CPPI5_INFO0_HDESC_PSINFO_LOCATION));
+
+	desc->hdr.pkt_info0 = (CPPI5_INFO0_DESC_TYPE_VAL_HOST <<
+			       CPPI5_INFO0_HDESC_TYPE_SHIFT) | (flags);
+	desc->hdr.pkt_info0 |= ((psdata_size >> 2) <<
+				CPPI5_INFO0_HDESC_PSINFO_SIZE_SHIFT) &
+				CPPI5_INFO0_HDESC_PSINFO_SIZE_MASK;
+	desc->next_desc = 0;
+}
+
+/**
+ * cppi5_hdesc_update_flags - Replace descriptor flags
+ * @desc: Host packet descriptor
+ * @flags: supported values
+ *	CPPI5_INFO0_HDESC_EPIB_PRESENT
+ *	CPPI5_INFO0_HDESC_PSINFO_LOCATION
+ */
+static inline void cppi5_hdesc_update_flags(struct cppi5_host_desc_t *desc,
+					    u32 flags)
+{
+	WARN_ON(!desc);
+	WARN_ON(flags & ~(CPPI5_INFO0_HDESC_EPIB_PRESENT |
+			  CPPI5_INFO0_HDESC_PSINFO_LOCATION));
+
+	desc->hdr.pkt_info0 &= ~(CPPI5_INFO0_HDESC_EPIB_PRESENT |
+				 CPPI5_INFO0_HDESC_PSINFO_LOCATION);
+	desc->hdr.pkt_info0 |= flags;
+}
+
+/**
+ * cppi5_hdesc_update_psdata_size - Replace PSdata size
+ * @desc: Host packet descriptor
+ * @psdata_size: PSDATA size
+ */
+static inline void cppi5_hdesc_update_psdata_size(
+		struct cppi5_host_desc_t *desc, u32 psdata_size)
+{
+	WARN_ON(!desc);
+	WARN_ON(psdata_size > CPPI5_INFO0_HDESC_PSDATA_MAX_SIZE);
+
+	desc->hdr.pkt_info0 &= ~CPPI5_INFO0_HDESC_PSINFO_SIZE_MASK;
+	desc->hdr.pkt_info0 |= ((psdata_size >> 2) <<
+				CPPI5_INFO0_HDESC_PSINFO_SIZE_SHIFT) &
+				CPPI5_INFO0_HDESC_PSINFO_SIZE_MASK;
+}
+
+/**
+ * cppi5_hdesc_get_psdata_size - get PSdata size in bytes
+ * @desc: Host packet descriptor
+ */
+static inline u32 cppi5_hdesc_get_psdata_size(struct cppi5_host_desc_t *desc)
+{
+	u32 psdata_size = 0;
+
+	WARN_ON(!desc);
+
+	if (!(desc->hdr.pkt_info0 & CPPI5_INFO0_HDESC_PSINFO_LOCATION))
+		psdata_size = (desc->hdr.pkt_info0 &
+			       CPPI5_INFO0_HDESC_PSINFO_SIZE_MASK) >>
+			       CPPI5_INFO0_HDESC_PSINFO_SIZE_SHIFT;
+
+	return (psdata_size << 2);
+}
+
+/**
+ * cppi5_hdesc_get_pktlen - get Packet Length from HDesc
+ * @desc: Host packet descriptor
+ *
+ * Returns Packet Length from Host Packet Descriptor
+ */
+static inline u32 cppi5_hdesc_get_pktlen(struct cppi5_host_desc_t *desc)
+{
+	WARN_ON(!desc);
+
+	return (desc->hdr.pkt_info0 & CPPI5_INFO0_HDESC_PKTLEN_MASK);
+}
+
+/**
+ * cppi5_hdesc_set_pktlen - set Packet Length in HDesc
+ * @desc: Host packet descriptor
+ */
+static inline void cppi5_hdesc_set_pktlen(struct cppi5_host_desc_t *desc,
+					  u32 pkt_len)
+{
+	WARN_ON(!desc);
+
+	desc->hdr.pkt_info0 |= (pkt_len & CPPI5_INFO0_HDESC_PKTLEN_MASK);
+}
+
+/**
+ * cppi5_hdesc_get_psflags - get Protocol Specific Flags from HDesc
+ * @desc: Host packet descriptor
+ *
+ * Returns Protocol Specific Flags from Host Packet Descriptor
+ */
+static inline u32 cppi5_hdesc_get_psflags(struct cppi5_host_desc_t *desc)
+{
+	WARN_ON(!desc);
+
+	return (desc->hdr.pkt_info1 & CPPI5_INFO1_HDESC_PSFLGS_MASK) >>
+		CPPI5_INFO1_HDESC_PSFLGS_SHIFT;
+}
+
+/**
+ * cppi5_hdesc_set_psflags - set Protocol Specific Flags in HDesc
+ * @desc: Host packet descriptor
+ */
+static inline void cppi5_hdesc_set_psflags(struct cppi5_host_desc_t *desc,
+					   u32 ps_flags)
+{
+	WARN_ON(!desc);
+
+	desc->hdr.pkt_info1 |= (ps_flags <<
+				CPPI5_INFO1_HDESC_PSFLGS_SHIFT) &
+				CPPI5_INFO1_HDESC_PSFLGS_MASK;
+}
+
+/**
+ * cppi5_hdesc_get_errflags - get Packet Type from HDesc
+ * @desc: Host packet descriptor
+ */
+static inline u32 cppi5_hdesc_get_pkttype(struct cppi5_host_desc_t *desc)
+{
+	WARN_ON(!desc);
+
+	return (desc->hdr.pkt_info2 & CPPI5_INFO2_HDESC_PKTTYPE_MASK) >>
+		CPPI5_INFO2_HDESC_PKTTYPE_SHIFT;
+}
+
+/**
+ * cppi5_hdesc_get_errflags - set Packet Type in HDesc
+ * @desc: Host packet descriptor
+ * @pkt_type: Packet Type
+ */
+static inline void cppi5_hdesc_set_pkttype(struct cppi5_host_desc_t *desc,
+					   u32 pkt_type)
+{
+	WARN_ON(!desc);
+	desc->hdr.pkt_info2 |=
+			(pkt_type << CPPI5_INFO2_HDESC_PKTTYPE_SHIFT) &
+			 CPPI5_INFO2_HDESC_PKTTYPE_MASK;
+}
+
+/**
+ * cppi5_hdesc_attach_buf - attach buffer to HDesc
+ * @desc: Host packet descriptor
+ * @buf: Buffer physical address
+ * @buf_data_len: Buffer length
+ * @obuf: Original Buffer physical address
+ * @obuf_len: Original Buffer length
+ *
+ * Attaches buffer to Host Packet Descriptor
+ */
+static inline void cppi5_hdesc_attach_buf(struct cppi5_host_desc_t *desc,
+					  dma_addr_t buf, u32 buf_data_len,
+					  dma_addr_t obuf, u32 obuf_len)
+{
+	WARN_ON(!desc);
+	WARN_ON(!buf && !obuf);
+
+	desc->buf_ptr = buf;
+	desc->buf_info1 = buf_data_len & CPPI5_BUFINFO1_HDESC_DATA_LEN_MASK;
+	desc->org_buf_ptr = obuf;
+	desc->org_buf_len = obuf_len & CPPI5_OBUFINFO0_HDESC_BUF_LEN_MASK;
+}
+
+static inline void cppi5_hdesc_get_obuf(struct cppi5_host_desc_t *desc,
+					dma_addr_t *obuf, u32 *obuf_len)
+{
+	WARN_ON(!desc);
+	WARN_ON(!obuf);
+	WARN_ON(!obuf_len);
+
+	*obuf = desc->org_buf_ptr;
+	*obuf_len = desc->org_buf_len & CPPI5_OBUFINFO0_HDESC_BUF_LEN_MASK;
+}
+
+static inline void cppi5_hdesc_reset_to_original(struct cppi5_host_desc_t *desc)
+{
+	WARN_ON(!desc);
+
+	desc->buf_ptr = desc->org_buf_ptr;
+	desc->buf_info1 = desc->org_buf_len;
+}
+
+/**
+ * cppi5_hdesc_link_hbdesc - link Host Buffer Descriptor to HDesc
+ * @desc: Host Packet Descriptor
+ * @buf_desc: Host Buffer Descriptor physical address
+ *
+ * add and link Host Buffer Descriptor to HDesc
+ */
+static inline void cppi5_hdesc_link_hbdesc(struct cppi5_host_desc_t *desc,
+					   dma_addr_t hbuf_desc)
+{
+	WARN_ON(!desc);
+	WARN_ON(!hbuf_desc);
+
+	desc->next_desc = hbuf_desc;
+}
+
+static inline dma_addr_t cppi5_hdesc_get_next_hbdesc(
+		struct cppi5_host_desc_t *desc)
+{
+	WARN_ON(!desc);
+
+	return (dma_addr_t)desc->next_desc;
+}
+
+static inline void cppi5_hdesc_reset_hbdesc(struct cppi5_host_desc_t *desc)
+{
+	WARN_ON(!desc);
+
+	desc->hdr = (struct cppi5_desc_hdr_t) { 0 };
+	desc->next_desc = 0;
+}
+
+/**
+ * cppi5_hdesc_epib_present -  check if EPIB present
+ * @desc_hdr: packet descriptor/TR header
+ *
+ * Returns true if EPIB present in the packet
+ */
+static inline bool cppi5_hdesc_epib_present(struct cppi5_desc_hdr_t *desc_hdr)
+{
+	WARN_ON(!desc_hdr);
+	return !!(desc_hdr->pkt_info0 & CPPI5_INFO0_HDESC_EPIB_PRESENT);
+}
+
+/**
+ * cppi5_hdesc_get_psdata -  Get pointer on PSDATA
+ * @desc: Host packet descriptor
+ *
+ * Returns pointer on PSDATA in HDesc.
+ * NULL - if ps_data placed at the start of data buffer.
+ */
+static inline void *cppi5_hdesc_get_psdata(struct cppi5_host_desc_t *desc)
+{
+	u32 psdata_size;
+	void *psdata;
+
+	WARN_ON(!desc);
+
+	if (desc->hdr.pkt_info0 & CPPI5_INFO0_HDESC_PSINFO_LOCATION)
+		return NULL;
+
+	psdata_size = (desc->hdr.pkt_info0 &
+		       CPPI5_INFO0_HDESC_PSINFO_SIZE_MASK) >>
+		       CPPI5_INFO0_HDESC_PSINFO_SIZE_SHIFT;
+
+	if (!psdata_size)
+		return NULL;
+
+	psdata = &desc->epib;
+
+	if (cppi5_hdesc_epib_present(&desc->hdr))
+		psdata += CPPI5_INFO0_HDESC_EPIB_SIZE;
+
+	return psdata;
+}
+
+static inline u32 *cppi5_hdesc_get_psdata32(struct cppi5_host_desc_t *desc)
+{
+	return (u32 *)cppi5_hdesc_get_psdata(desc);
+}
+
+/**
+ * cppi5_hdesc_get_swdata -  Get pointer on swdata
+ * @desc: Host packet descriptor
+ *
+ * Returns pointer on SWDATA in HDesc.
+ * NOTE. It's caller responsibility to be sure hdesc actually has swdata.
+ */
+static inline void *cppi5_hdesc_get_swdata(struct cppi5_host_desc_t *desc)
+{
+	u32 psdata_size = 0;
+	void *swdata;
+
+	WARN_ON(!desc);
+
+	if (!(desc->hdr.pkt_info0 & CPPI5_INFO0_HDESC_PSINFO_LOCATION))
+		psdata_size = (desc->hdr.pkt_info0 &
+			       CPPI5_INFO0_HDESC_PSINFO_SIZE_MASK) >>
+			       CPPI5_INFO0_HDESC_PSINFO_SIZE_SHIFT;
+
+	swdata = &desc->epib;
+
+	if (cppi5_hdesc_epib_present(&desc->hdr))
+		swdata += CPPI5_INFO0_HDESC_EPIB_SIZE;
+
+	swdata += (psdata_size << 2);
+
+	return swdata;
+}
+
+/* ================================== TR ================================== */
+
+#define CPPI5_TR_TYPE_SHIFT			(0U)
+#define CPPI5_TR_TYPE_MASK			GENMASK(3, 0)
+#define CPPI5_TR_STATIC				BIT(4)
+#define CPPI5_TR_WAIT				BIT(5)
+#define CPPI5_TR_EVENT_SIZE_SHIFT		(6U)
+#define CPPI5_TR_EVENT_SIZE_MASK		GENMASK(7, 6)
+#define CPPI5_TR_TRIGGER0_SHIFT			(8U)
+#define CPPI5_TR_TRIGGER0_MASK			GENMASK(9, 8)
+#define CPPI5_TR_TRIGGER0_TYPE_SHIFT		(10U)
+#define CPPI5_TR_TRIGGER0_TYPE_MASK		GENMASK(11, 10)
+#define CPPI5_TR_TRIGGER1_SHIFT			(12U)
+#define CPPI5_TR_TRIGGER1_MASK			GENMASK(13, 12)
+#define CPPI5_TR_TRIGGER1_TYPE_SHIFT		(14U)
+#define CPPI5_TR_TRIGGER1_TYPE_MASK		GENMASK(15, 14)
+#define CPPI5_TR_CMD_ID_SHIFT			(16U)
+#define CPPI5_TR_CMD_ID_MASK			GENMASK(23, 16)
+#define CPPI5_TR_CSF_FLAGS_SHIFT		(24U)
+#define CPPI5_TR_CSF_FLAGS_MASK			GENMASK(31, 24)
+#define   CPPI5_TR_CSF_SA_INDIRECT		BIT(0)
+#define   CPPI5_TR_CSF_DA_INDIRECT		BIT(1)
+#define   CPPI5_TR_CSF_SUPR_EVT			BIT(2)
+#define   CPPI5_TR_CSF_EOL_ADV_SHIFT		(4U)
+#define   CPPI5_TR_CSF_EOL_ADV_MASK		GENMASK(6, 4)
+#define   CPPI5_TR_CSF_EOP			BIT(7)
+
+/* Udmap TR flags Type field specifies the type of TR. */
+enum cppi5_tr_types {
+	/* type0: One dimensional data move */
+	CPPI5_TR_TYPE0 = 0,
+	/* type1: Two dimensional data move */
+	CPPI5_TR_TYPE1,
+	/* type2: Three dimensional data move */
+	CPPI5_TR_TYPE2,
+	/* type3: Four dimensional data move */
+	CPPI5_TR_TYPE3,
+	/* type4: Four dimensional data move with data formatting */
+	CPPI5_TR_TYPE4,
+	/* type5: Four dimensional Cache Warm */
+	CPPI5_TR_TYPE5,
+	/* type6-7: Reserved */
+	/* type8: Four Dimensional Block Move */
+	CPPI5_TR_TYPE8 = 8,
+	/* type9: Four Dimensional Block Move with Repacking */
+	CPPI5_TR_TYPE9,
+	/* type10: Two Dimensional Block Move */
+	CPPI5_TR_TYPE10,
+	/* type11: Two Dimensional Block Move with Repacking */
+	CPPI5_TR_TYPE11,
+	/* type12-14: Reserved */
+	/* type15 Four Dimensional Block Move with Repacking and Indirection */
+	CPPI5_TR_TYPE15 = 15,
+	CPPI5_TR_TYPE_MAX
+};
+
+/*
+ * Udmap TR Flags EVENT_SIZE field specifies when an event is generated
+ * for each TR.
+ */
+enum cppi5_tr_event_size {
+	/* When TR is complete and all status for the TR has been received */
+	CPPI5_TR_EVENT_SIZE_COMPLETION,
+	/*
+	 * Type 0: when the last data transaction is sent for the TR;
+	 * Type 1-11: when ICNT1 is decremented
+	 */
+	CPPI5_TR_EVENT_SIZE_ICNT1_DEC,
+	/*
+	 * Type 0-1,10-11: when the last data transaction is sent for the TR;
+	 * All other types: when ICNT2 is decremented
+	 */
+	CPPI5_TR_EVENT_SIZE_ICNT2_DEC,
+	/*
+	 * Type 0-2,10-11: when the last data transaction is sent for the TR;
+	 * All other types: when ICNT3 is decremented
+	 */
+	CPPI5_TR_EVENT_SIZE_ICNT3_DEC,
+	CPPI5_TR_EVENT_SIZE_MAX
+};
+
+/*
+ * Udmap TR Flags TRIGGERx field specifies the type of trigger used to
+ * enable the TR to transfer data as specified by TRIGGERx_TYPE field.
+ */
+enum cppi5_tr_trigger {
+	CPPI5_TR_TRIGGER_NONE,		/* No Trigger */
+	CPPI5_TR_TRIGGER_GLOBAL0,		/* Global Trigger 0 */
+	CPPI5_TR_TRIGGER_GLOBAL1,		/* Global Trigger 1 */
+	CPPI5_TR_TRIGGER_LOCAL_EVENT,	/* Local Event */
+	CPPI5_TR_TRIGGER_MAX
+};
+
+/*
+ * Udmap TR Flags TRIGGERx_TYPE field specifies the type of data transfer
+ * that will be enabled by receiving a trigger as specified by TRIGGERx.
+ */
+enum cppi5_tr_trigger_type {
+	/* The second inner most loop (ICNT1) will be decremented by 1 */
+	CPPI5_TR_TRIGGER_TYPE_ICNT1_DEC,
+	/* The third inner most loop (ICNT2) will be decremented by 1 */
+	CPPI5_TR_TRIGGER_TYPE_ICNT2_DEC,
+	/* The outer most loop (ICNT3) will be decremented by 1 */
+	CPPI5_TR_TRIGGER_TYPE_ICNT3_DEC,
+	/* The entire TR will be allowed to complete */
+	CPPI5_TR_TRIGGER_TYPE_ALL,
+	CPPI5_TR_TRIGGER_TYPE_MAX
+};
+
+typedef u32 cppi5_tr_flags_t;
+
+/* Type 0 (One dimensional data move) TR (16 byte) */
+struct cppi5_tr_type0_t {
+	cppi5_tr_flags_t flags;
+	u16 icnt0;
+	u16 unused;
+	u64 addr;
+} __aligned(16) __packed;
+
+/* Type 1 (Two dimensional data move) TR (32 byte) */
+struct cppi5_tr_type1_t {
+	cppi5_tr_flags_t flags;
+	u16 icnt0;
+	u16 icnt1;
+	u64 addr;
+	s32 dim1;
+} __aligned(32) __packed;
+
+/* Type 2 (Three dimensional data move) TR (32 byte) */
+struct cppi5_tr_type2_t {
+	cppi5_tr_flags_t flags;
+	u16 icnt0;
+	u16 icnt1;
+	u64 addr;
+	s32 dim1;
+	u16 icnt2;
+	u16 unused;
+	s32 dim2;
+} __aligned(32) __packed;
+
+/* Type 3 (Four dimensional data move) TR (32 byte) */
+struct cppi5_tr_type3_t {
+	cppi5_tr_flags_t flags;
+	u16 icnt0;
+	u16 icnt1;
+	u64 addr;
+	s32 dim1;
+	u16 icnt2;
+	u16 icnt3;
+	s32 dim2;
+	s32 dim3;
+} __aligned(32) __packed;
+
+/*
+ * Type 15 (Four Dimensional Block Copy with Repacking and
+ * Indirection Support) TR (64 byte).
+ */
+struct cppi5_tr_type15_t {
+	cppi5_tr_flags_t flags;
+	u16 icnt0;
+	u16 icnt1;
+	u64 addr;
+	s32 dim1;
+	u16 icnt2;
+	u16 icnt3;
+	s32 dim2;
+	s32 dim3;
+	u32 _reserved;
+	s32 ddim1;
+	u64 daddr;
+	s32 ddim2;
+	s32 ddim3;
+	u16 dicnt0;
+	u16 dicnt1;
+	u16 dicnt2;
+	u16 dicnt3;
+} __aligned(64) __packed;
+
+struct cppi5_tr_resp_t {
+	u8 status;
+	u8 reserved;
+	u8 cmd_id;
+	u8 flags;
+} __packed;
+
+#define CPPI5_TR_RESPONSE_STATUS_TYPE_SHIFT	(0U)
+#define CPPI5_TR_RESPONSE_STATUS_TYPE_MASK	GENMASK(3, 0)
+#define CPPI5_TR_RESPONSE_STATUS_INFO_SHIFT	(4U)
+#define CPPI5_TR_RESPONSE_STATUS_INFO_MASK	GENMASK(7, 4)
+#define CPPI5_TR_RESPONSE_CMDID_SHIFT		(16U)
+#define CPPI5_TR_RESPONSE_CMDID_MASK		GENMASK(23, 16)
+#define CPPI5_TR_RESPONSE_CFG_SPECIFIC_SHIFT	(24U)
+#define CPPI5_TR_RESPONSE_CFG_SPECIFIC_MASK	GENMASK(31, 24)
+
+/*
+ * Udmap TR Response Status Type field is used to determine
+ * what type of status is being returned.
+ */
+enum cppi5_tr_resp_status_type {
+	CPPI5_TR_RESPONSE_STATUS_COMPLETE,		/* None */
+	CPPI5_TR_RESPONSE_STATUS_TRANSFER_ERR,		/* Transfer Error */
+	CPPI5_TR_RESPONSE_STATUS_ABORTED_ERR,		/* Aborted Error */
+	CPPI5_TR_RESPONSE_STATUS_SUBMISSION_ERR,	/* Submission Error */
+	CPPI5_TR_RESPONSE_STATUS_UNSUPPORTED_ERR,	/* Unsup. Feature */
+	CPPI5_TR_RESPONSE_STATUS_MAX
+};
+
+/*
+ * Udmap TR Response Status field values which corresponds
+ * CPPI5_TR_RESPONSE_STATUS_SUBMISSION_ERR
+ */
+enum cppi5_tr_resp_status_submission {
+	/* ICNT0 was 0 */
+	CPPI5_TR_RESPONSE_STATUS_SUBMISSION_ICNT0,
+	/* Channel FIFO was full when TR received */
+	CPPI5_TR_RESPONSE_STATUS_SUBMISSION_FIFO_FULL,
+	/* Channel is not owned by the submitter */
+	CPPI5_TR_RESPONSE_STATUS_SUBMISSION_OWN,
+	CPPI5_TR_RESPONSE_STATUS_SUBMISSION_MAX
+};
+
+/*
+ * Udmap TR Response Status field values which corresponds
+ * CPPI5_TR_RESPONSE_STATUS_UNSUPPORTED_ERR
+ */
+enum cppi5_tr_resp_status_unsupported {
+	/* TR Type not supported */
+	CPPI5_TR_RESPONSE_STATUS_UNSUPPORTED_TR_TYPE,
+	/* STATIC not supported */
+	CPPI5_TR_RESPONSE_STATUS_UNSUPPORTED_STATIC,
+	/* EOL not supported */
+	CPPI5_TR_RESPONSE_STATUS_UNSUPPORTED_EOL,
+	/* CONFIGURATION SPECIFIC not supported */
+	CPPI5_TR_RESPONSE_STATUS_UNSUPPORTED_CFG_SPECIFIC,
+	/* AMODE not supported */
+	CPPI5_TR_RESPONSE_STATUS_UNSUPPORTED_AMODE,
+	/* ELTYPE not supported */
+	CPPI5_TR_RESPONSE_STATUS_UNSUPPORTED_ELTYPE,
+	/* DFMT not supported */
+	CPPI5_TR_RESPONSE_STATUS_UNSUPPORTED_DFMT,
+	/* SECTR not supported */
+	CPPI5_TR_RESPONSE_STATUS_UNSUPPORTED_SECTR,
+	/* AMODE SPECIFIC field not supported */
+	CPPI5_TR_RESPONSE_STATUS_UNSUPPORTED_AMODE_SPECIFIC,
+	CPPI5_TR_RESPONSE_STATUS_UNSUPPORTED_MAX
+};
+
+/**
+ * cppi5_trdesc_calc_size - Calculate TR Descriptor size
+ * @tr_count: number of TR records
+ * @tr_size: Nominal size of TR record (max) [16, 32, 64, 128]
+ *
+ * Returns required TR Descriptor size
+ */
+static inline size_t cppi5_trdesc_calc_size(u32 tr_count, u32 tr_size)
+{
+	/*
+	 * The Size of a TR descriptor is:
+	 * 1 x tr_size : the first 16 bytes is used by the packet info block +
+	 * tr_count x tr_size : Transfer Request Records +
+	 * tr_count x sizeof(struct cppi5_tr_resp_t) : Transfer Response Records
+	 */
+	return tr_size * (tr_count + 1) +
+		sizeof(struct cppi5_tr_resp_t) * tr_count;
+}
+
+/**
+ * cppi5_trdesc_init - Init TR Descriptor
+ * @desc: TR Descriptor
+ * @tr_count: number of TR records
+ * @tr_size: Nominal size of TR record (max) [16, 32, 64, 128]
+ * @reload_idx: Absolute index to jump to on the 2nd and following passes
+ *		through the TR packet.
+ * @reload_count: Number of times to jump from last entry to reload_idx. 0x1ff
+ *		  indicates infinite looping.
+ *
+ * Init TR Descriptor
+ */
+static inline void cppi5_trdesc_init(struct cppi5_desc_hdr_t *desc_hdr,
+				     u32 tr_count, u32 tr_size, u32 reload_idx,
+				     u32 reload_count)
+{
+	WARN_ON(!desc_hdr);
+	WARN_ON(tr_count & ~CPPI5_INFO0_TRDESC_LASTIDX_MASK);
+	WARN_ON(reload_idx > CPPI5_INFO0_TRDESC_RLDIDX_MAX);
+	WARN_ON(reload_count > CPPI5_INFO0_TRDESC_RLDCNT_MAX);
+
+	desc_hdr->pkt_info0 = CPPI5_INFO0_DESC_TYPE_VAL_TR <<
+			      CPPI5_INFO0_HDESC_TYPE_SHIFT;
+	desc_hdr->pkt_info0 |= (reload_count << CPPI5_INFO0_TRDESC_RLDCNT_SHIFT) &
+			       CPPI5_INFO0_TRDESC_RLDCNT_MASK;
+	desc_hdr->pkt_info0 |= (reload_idx << CPPI5_INFO0_TRDESC_RLDIDX_SHIFT) &
+			       CPPI5_INFO0_TRDESC_RLDIDX_MASK;
+	desc_hdr->pkt_info0 |= (tr_count - 1) & CPPI5_INFO0_TRDESC_LASTIDX_MASK;
+
+	desc_hdr->pkt_info1 |= ((ffs(tr_size >> 4) - 1) <<
+				CPPI5_INFO1_TRDESC_RECSIZE_SHIFT) &
+				CPPI5_INFO1_TRDESC_RECSIZE_MASK;
+}
+
+/**
+ * cppi5_tr_init - Init TR record
+ * @flags: Pointer to the TR's flags
+ * @type: TR type
+ * @static_tr: TR is static
+ * @wait: Wait for TR completion before allow the next TR to start
+ * @event_size: output event generation cfg
+ * @cmd_id: TR identifier (application specifics)
+ *
+ * Init TR record
+ */
+static inline void cppi5_tr_init(cppi5_tr_flags_t *flags,
+				 enum cppi5_tr_types type, bool static_tr,
+				 bool wait, enum cppi5_tr_event_size event_size,
+				 u32 cmd_id)
+{
+	WARN_ON(!flags);
+
+	*flags = type;
+	*flags |= (event_size << CPPI5_TR_EVENT_SIZE_SHIFT) &
+		  CPPI5_TR_EVENT_SIZE_MASK;
+
+	*flags |= (cmd_id << CPPI5_TR_CMD_ID_SHIFT) &
+		  CPPI5_TR_CMD_ID_MASK;
+
+	if (static_tr && (type == CPPI5_TR_TYPE8 || type == CPPI5_TR_TYPE9))
+		*flags |= CPPI5_TR_STATIC;
+
+	if (wait)
+		*flags |= CPPI5_TR_WAIT;
+}
+
+/**
+ * cppi5_tr_set_trigger - Configure trigger0/1 and trigger0/1_type
+ * @flags: Pointer to the TR's flags
+ * @trigger0: trigger0 selection
+ * @trigger0_type: type of data transfer that will be enabled by trigger0
+ * @trigger1: trigger1 selection
+ * @trigger1_type: type of data transfer that will be enabled by trigger1
+ *
+ * Configure the triggers for the TR
+ */
+static inline void cppi5_tr_set_trigger(cppi5_tr_flags_t *flags,
+		enum cppi5_tr_trigger trigger0,
+		enum cppi5_tr_trigger_type trigger0_type,
+		enum cppi5_tr_trigger trigger1,
+		enum cppi5_tr_trigger_type trigger1_type)
+{
+	WARN_ON(!flags);
+
+	*flags |= (trigger0 << CPPI5_TR_TRIGGER0_SHIFT) &
+		  CPPI5_TR_TRIGGER0_MASK;
+	*flags |= (trigger0_type << CPPI5_TR_TRIGGER0_TYPE_SHIFT) &
+		  CPPI5_TR_TRIGGER0_TYPE_MASK;
+
+	*flags |= (trigger1 << CPPI5_TR_TRIGGER1_SHIFT) &
+		  CPPI5_TR_TRIGGER1_MASK;
+	*flags |= (trigger1_type << CPPI5_TR_TRIGGER1_TYPE_SHIFT) &
+		  CPPI5_TR_TRIGGER1_TYPE_MASK;
+}
+
+/**
+ * cppi5_tr_cflag_set - Update the Configuration specific flags
+ * @flags: Pointer to the TR's flags
+ * @csf: Configuration specific flags
+ *
+ * Set a bit in Configuration Specific Flags section of the TR flags.
+ */
+static inline void cppi5_tr_csf_set(cppi5_tr_flags_t *flags, u32 csf)
+{
+	WARN_ON(!flags);
+
+	*flags |= (csf << CPPI5_TR_CSF_FLAGS_SHIFT) &
+		  CPPI5_TR_CSF_FLAGS_MASK;
+}
+
+#endif /* __TI_CPPI5_H__ */
-- 
Peter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2 07/14] dt-bindings: dma: ti: Add document for K3 UDMA
  2019-07-30  9:34 [PATCH v2 00/14] dmaengine/soc: Add Texas Instruments UDMA support Peter Ujfalusi
                   ` (5 preceding siblings ...)
  2019-07-30  9:34 ` [PATCH v2 06/14] dmaengine: ti: Add cppi5 header for UDMA Peter Ujfalusi
@ 2019-07-30  9:34 ` Peter Ujfalusi
  2019-08-21 17:59   ` Rob Herring
  2019-07-30  9:34 ` [PATCH v2 08/14] dmaengine: ti: New driver for K3 UDMA - split#1: defines, structs, io func Peter Ujfalusi
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 32+ messages in thread
From: Peter Ujfalusi @ 2019-07-30  9:34 UTC (permalink / raw)
  To: vkoul, robh+dt, nm, ssantosh
  Cc: dan.j.williams, dmaengine, linux-arm-kernel, devicetree,
	linux-kernel, grygorii.strashko, lokeshvutla, t-kristo, tony,
	j-keerthy

New binding document for
Texas Instruments K3 NAVSS Unified DMA – Peripheral Root Complex (UDMA-P).

UDMA-P is introduced as part of the K3 architecture and can be found on
AM654 and j721e.

Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
---
 .../devicetree/bindings/dma/ti/k3-udma.txt    | 170 ++++++++++++++++++
 include/dt-bindings/dma/k3-udma.h             |  10 ++
 2 files changed, 180 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/dma/ti/k3-udma.txt
 create mode 100644 include/dt-bindings/dma/k3-udma.h

diff --git a/Documentation/devicetree/bindings/dma/ti/k3-udma.txt b/Documentation/devicetree/bindings/dma/ti/k3-udma.txt
new file mode 100644
index 000000000000..7f30fe583ade
--- /dev/null
+++ b/Documentation/devicetree/bindings/dma/ti/k3-udma.txt
@@ -0,0 +1,170 @@
+* Texas Instruments K3 NAVSS Unified DMA – Peripheral Root Complex (UDMA-P)
+
+The UDMA-P is intended to perform similar (but significantly upgraded) functions
+as the packet-oriented DMA used on previous SoC devices. The UDMA-P module
+supports the transmission and reception of various packet types. The UDMA-P is
+architected to facilitate the segmentation and reassembly of SoC DMA data
+structure compliant packets to/from smaller data blocks that are natively
+compatible with the specific requirements of each connected peripheral. Multiple
+Tx and Rx channels are provided within the DMA which allow multiple segmentation
+or reassembly operations to be ongoing. The DMA controller maintains state
+information for each of the channels which allows packet segmentation and
+reassembly operations to be time division multiplexed between channels in order
+to share the underlying DMA hardware. An external DMA scheduler is used to
+control the ordering and rate at which this multiplexing occurs for Transmit
+operations. The ordering and rate of Receive operations is indirectly controlled
+by the order in which blocks are pushed into the DMA on the Rx PSI-L interface.
+
+The UDMA-P also supports acting as both a UTC and UDMA-C for its internal
+channels. Channels in the UDMA-P can be configured to be either Packet-Based or
+Third-Party channels on a channel by channel basis.
+
+Required properties:
+--------------------
+- compatible:		Should be
+			"ti,am654-navss-main-udmap" for am654 main NAVSS UDMAP
+			"ti,am654-navss-mcu-udmap" for am654 mcu NAVSS UDMAP
+			"ti,j721e-navss-main-udmap" for j721e main NAVSS UDMAP
+			"ti,j721e-navss-mcu-udmap" for j721e mcu NAVSS UDMAP
+- #dma-cells:		Should be set to <3>.
+			- The first parameter is a phandle to the remote PSI-L
+			  endpoint
+			- The second parameter is the thread offset within the
+			  remote thread ID range
+			- The third parameter is the channel direction.
+- reg:			Memory map of UDMAP
+- reg-names:		"gcfg", "rchanrt", "tchanrt"
+- msi-parent:		phandle for "ti,sci-inta" interrupt controller
+- ti,ringacc:		phandle for the ring accelerator node
+- ti,psil-base:		PSI-L thread ID base of the UDMAP channels
+- ti,sci:		phandle on TI-SCI compatible System controller node
+- ti,sci-dev-id:	TI-SCI device id
+- ti,sci-rm-range-tchan: UDMA tchan resource list in pairs of type and subtype
+- ti,sci-rm-range-rchan: UDMA rchan resource list in pairs of type and subtype
+- ti,sci-rm-range-rflow: UDMA rflow resource list in pairs of type and subtype
+
+For PSI-L thread management the parent NAVSS node must have:
+- ti,sci:		phandle on TI-SCI compatible System controller node
+- ti,sci-dev-id:	TI-SCI device id of the NAVSS instance
+
+Remote PSI-L endpoint
+
+Required properties:
+--------------------
+- ti,psil-base:		PSI-L thread ID base of the endpoint
+
+Within the PSI-L endpoint node thread configuration subnodes must present with:
+psil-configX naming convention, where X is the thread ID offset.
+
+Configuration node Optional properties:
+--------------------
+- pdma,statictr-type:	In case the remote endpoint (PDMAs) requires StaticTR
+			configuration:
+			- PSIL_STATIC_TR_XY (1): XY type of StaticTR
+			For endpoints without StaticTR the property is not
+			needed or to be set PSIL_STATIC_TR_NONE (0).
+- pdma,enable-acc32:	Force 32 bit access on peripheral port. Only valid for
+			XY type StaticTR, not supported on am654.
+			Must be enabled for threads servicing McASP with AFIFO
+			bypass mode.
+- pdma,enable-burst:	Enable burst access on peripheral port. Only valid for
+			XY type StaticTR, not supported on am654.
+- ti,channel-tpl:	Channel Throughput level:
+			0 / or not present - normal channel
+			1 - High Throughput channel
+			2 - Ultra High Throughput channel (j721e only)
+- ti,needs-epib:	If the endpoint require EPIB to be present in the
+			descriptor.
+- ti,psd-size:		Size of the Protocol Specific Data section of the
+			descriptor.
+
+Example:
+
+main_navss: main_navss {
+	compatible = "simple-bus";
+	#address-cells = <2>;
+	#size-cells = <2>;
+	dma-coherent;
+	dma-ranges;
+	ranges;
+
+	ti,sci = <&dmsc>;
+	ti,sci-dev-id = <118>;
+
+	main_udmap: dma-controller@31150000 {
+		compatible = "ti,am654-navss-main-udmap";
+		reg =	<0x0 0x31150000 0x0 0x100>,
+			<0x0 0x34000000 0x0 0x100000>,
+			<0x0 0x35000000 0x0 0x100000>;
+		reg-names = "gcfg", "rchanrt", "tchanrt";
+		#dma-cells = <3>;
+
+		ti,ringacc = <&ringacc>;
+		ti,psil-base = <0x1000>;
+
+		interrupt-parent = <&main_udmass_inta>;
+
+		ti,sci = <&dmsc>;
+		ti,sci-dev-id = <188>;
+
+		ti,sci-rm-range-tchan = <0x6 0x1>, /* TX_HCHAN */
+					<0x6 0x2>; /* TX_CHAN */
+		ti,sci-rm-range-rchan = <0x6 0x4>, /* RX_HCHAN */
+					<0x6 0x5>; /* RX_CHAN */
+		ti,sci-rm-range-rflow = <0x6 0x6>; /* GP RFLOW */
+	};
+};
+
+psilss@340c000 {
+	/* PSILSS1 AASRC */
+	compatible = "ti,j721e-psilss";
+	reg = <0x0 0x0340c000 0x0 0x1000>;
+	reg-names = "config";
+
+	pdma_main_mcasp_g0: pdma_main_mcasp_g0 {
+		/* PDMA6 (PDMA_MCASP_G0) */
+		ti,psil-base = <0x4400>;
+
+		/* psil-config0 */
+		psil-config0 {
+			pdma,statictr-type = <PSIL_STATIC_TR_XY>;
+			pdma,enable-acc32;
+			pdma,enable-burst;
+		};
+	};
+};
+
+mcasp0: mcasp@02B00000 {
+...
+	/* tx: PDMA_MAIN_MCASP_G0-0, rx: PDMA_MAIN_MCASP_G0-0 */
+	dmas = <&main_udmap &pdma_main_mcasp_g0 0 UDMA_DIR_TX>,
+	       <&main_udmap &pdma_main_mcasp_g0 0 UDMA_DIR_RX>;
+	dma-names = "tx", "rx";
+...
+};
+
+crypto: crypto@4E00000 {
+	compatible = "ti,sa2ul-crypto";
+...
+
+	/* tx: crypto_pnp-1, rx: crypto_pnp-1 */
+	dmas = <&main_udmap &crypto 0 UDMA_DIR_TX>,
+	       <&main_udmap &crypto 0 UDMA_DIR_RX>,
+	       <&main_udmap &crypto 1 UDMA_DIR_RX>;
+	dma-names = "tx", "rx1", "rx2";
+...
+	psil-config0 {
+		ti,needs-epib;
+		ti,psd-size = <64>;
+	};
+
+	psil-config1 {
+		ti,needs-epib;
+		ti,psd-size = <64>;
+	};
+
+	psil-config2 {
+		ti,needs-epib;
+		ti,psd-size = <64>;
+	};
+};
diff --git a/include/dt-bindings/dma/k3-udma.h b/include/dt-bindings/dma/k3-udma.h
new file mode 100644
index 000000000000..f5c8f5d50491
--- /dev/null
+++ b/include/dt-bindings/dma/k3-udma.h
@@ -0,0 +1,10 @@
+#ifndef __DT_TI_UDMA_H
+#define __DT_TI_UDMA_H
+
+#define UDMA_DIR_TX		0
+#define UDMA_DIR_RX		1
+
+#define PSIL_STATIC_TR_NONE	0
+#define PSIL_STATIC_TR_XY	1
+
+#endif /* __DT_TI_UDMA_H */
-- 
Peter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2 08/14] dmaengine: ti: New driver for K3 UDMA - split#1: defines, structs, io func
  2019-07-30  9:34 [PATCH v2 00/14] dmaengine/soc: Add Texas Instruments UDMA support Peter Ujfalusi
                   ` (6 preceding siblings ...)
  2019-07-30  9:34 ` [PATCH v2 07/14] dt-bindings: dma: ti: Add document for K3 UDMA Peter Ujfalusi
@ 2019-07-30  9:34 ` Peter Ujfalusi
  2019-09-10  7:27   ` Grygorii Strashko
  2019-07-30  9:34 ` [PATCH v2 09/14] dmaengine: ti: New driver for K3 UDMA - split#2: probe/remove, xlate and filter_fn Peter Ujfalusi
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 32+ messages in thread
From: Peter Ujfalusi @ 2019-07-30  9:34 UTC (permalink / raw)
  To: vkoul, robh+dt, nm, ssantosh
  Cc: dan.j.williams, dmaengine, linux-arm-kernel, devicetree,
	linux-kernel, grygorii.strashko, lokeshvutla, t-kristo, tony,
	j-keerthy

Split patch for review containing: defines, structs, io and low level
functions and interrupt callbacks.

DMA driver for
Texas Instruments K3 NAVSS Unified DMA – Peripheral Root Complex (UDMA-P)

The UDMA-P is intended to perform similar (but significantly upgraded) functions
as the packet-oriented DMA used on previous SoC devices. The UDMA-P module
supports the transmission and reception of various packet types. The UDMA-P is
architected to facilitate the segmentation and reassembly of SoC DMA data
structure compliant packets to/from smaller data blocks that are natively
compatible with the specific requirements of each connected peripheral. Multiple
Tx and Rx channels are provided within the DMA which allow multiple segmentation
or reassembly operations to be ongoing. The DMA controller maintains state
information for each of the channels which allows packet segmentation and
reassembly operations to be time division multiplexed between channels in order
to share the underlying DMA hardware. An external DMA scheduler is used to
control the ordering and rate at which this multiplexing occurs for Transmit
operations. The ordering and rate of Receive operations is indirectly controlled
by the order in which blocks are pushed into the DMA on the Rx PSI-L interface.

The UDMA-P also supports acting as both a UTC and UDMA-C for its internal
channels. Channels in the UDMA-P can be configured to be either Packet-Based or
Third-Party channels on a channel by channel basis.

The initial driver supports:
- MEM_TO_MEM (TR mode)
- DEV_TO_MEM (Packet / TR mode)
- MEM_TO_DEV (Packet / TR mode)
- Cyclic (Packet / TR mode)
- Metadata for descriptors

Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
---
 drivers/dma/ti/k3-udma.c | 1040 ++++++++++++++++++++++++++++++++++++++
 drivers/dma/ti/k3-udma.h |  130 +++++
 2 files changed, 1170 insertions(+)
 create mode 100644 drivers/dma/ti/k3-udma.c
 create mode 100644 drivers/dma/ti/k3-udma.h

diff --git a/drivers/dma/ti/k3-udma.c b/drivers/dma/ti/k3-udma.c
new file mode 100644
index 000000000000..e6d2c4b172e5
--- /dev/null
+++ b/drivers/dma/ti/k3-udma.c
@@ -0,0 +1,1040 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ *  Copyright (C) 2019 Texas Instruments Incorporated - http://www.ti.com
+ *  Author: Peter Ujfalusi <peter.ujfalusi@ti.com>
+ */
+
+#include <linux/kernel.h>
+#include <linux/delay.h>
+#include <linux/dmaengine.h>
+#include <linux/dma-mapping.h>
+#include <linux/dmapool.h>
+#include <linux/err.h>
+#include <linux/init.h>
+#include <linux/interrupt.h>
+#include <linux/list.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/pm_runtime.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/of.h>
+#include <linux/of_dma.h>
+#include <linux/of_device.h>
+#include <linux/of_irq.h>
+#include <linux/workqueue.h>
+#include <linux/completion.h>
+#include <dt-bindings/dma/k3-udma.h>
+#include <linux/soc/ti/k3-ringacc.h>
+#include <linux/soc/ti/ti_sci_protocol.h>
+#include <linux/soc/ti/ti_sci_inta_msi.h>
+#include <linux/dma/ti-cppi5.h>
+
+#include "../virt-dma.h"
+#include "k3-udma.h"
+
+struct udma_static_tr {
+	u8 elsize; /* RPSTR0 */
+	u16 elcnt; /* RPSTR0 */
+	u16 bstcnt; /* RPSTR1 */
+};
+
+#define K3_UDMA_MAX_RFLOWS		1024
+#define K3_UDMA_DEFAULT_RING_SIZE	16
+
+struct udma_chan;
+
+enum udma_mmr {
+	MMR_GCFG = 0,
+	MMR_RCHANRT,
+	MMR_TCHANRT,
+	MMR_LAST,
+};
+
+static const char * const mmr_names[] = { "gcfg", "rchanrt", "tchanrt" };
+
+struct udma_tchan {
+	void __iomem *reg_rt;
+
+	int id;
+	struct k3_ring *t_ring; /* Transmit ring */
+	struct k3_ring *tc_ring; /* Transmit Completion ring */
+};
+
+struct udma_rchan {
+	void __iomem *reg_rt;
+
+	int id;
+	struct k3_ring *fd_ring; /* Free Descriptor ring */
+	struct k3_ring *r_ring; /* Receive ring*/
+};
+
+struct udma_rflow {
+	void __iomem *reg_rflow;
+
+	int id;
+};
+
+struct udma_tr_thread_ranges {
+	int start;
+	int count;
+};
+
+struct udma_match_data {
+	bool enable_memcpy_support;
+	bool have_acc32;
+	bool have_burst;
+	u32 statictr_z_mask;
+	u32 rchan_oes_offset;
+
+	struct udma_tr_thread_ranges *tr_threads;
+
+	u8 tpl_levels;
+	u32 level_start_idx[];
+};
+
+struct udma_dev {
+	struct dma_device ddev;
+	struct device *dev;
+	void __iomem *mmrs[MMR_LAST];
+	const struct udma_match_data *match_data;
+
+	size_t desc_align; /* alignment to use for descriptors */
+
+	struct udma_tisci_rm tisci_rm;
+
+	struct k3_ringacc *ringacc;
+
+	struct work_struct purge_work;
+	struct list_head desc_to_purge;
+	spinlock_t lock;
+
+	int tchan_cnt;
+	int echan_cnt;
+	int rchan_cnt;
+	int rflow_cnt;
+	unsigned long *tchan_map;
+	unsigned long *rchan_map;
+	unsigned long *rflow_map;
+	unsigned long *rflow_map_reserved;
+
+	struct udma_tchan *tchans;
+	struct udma_rchan *rchans;
+	struct udma_rflow *rflows;
+
+	struct udma_chan *channels;
+	u32 psil_base;
+};
+
+struct udma_hwdesc {
+	size_t cppi5_desc_size;
+	void *cppi5_desc_vaddr;
+	dma_addr_t cppi5_desc_paddr;
+
+	/* TR descriptor internal pointers */
+	void *tr_req_base;
+	struct cppi5_tr_resp_t *tr_resp_base;
+};
+
+struct udma_desc {
+	struct virt_dma_desc vd;
+
+	bool terminated;
+
+	enum dma_transfer_direction dir;
+
+	struct udma_static_tr static_tr;
+	u32 residue;
+
+	unsigned int sglen;
+	unsigned int desc_idx; /* Only used for cyclic in packet mode */
+	unsigned int tr_idx;
+
+	u32 metadata_size;
+	void *metadata; /* pointer to provided metadata buffer (EPIP, PSdata) */
+
+	unsigned int hwdesc_count;
+	struct udma_hwdesc hwdesc[0];
+};
+
+enum udma_chan_state {
+	UDMA_CHAN_IS_IDLE = 0, /* not active, no teardown is in progress */
+	UDMA_CHAN_IS_ACTIVE, /* Normal operation */
+	UDMA_CHAN_IS_ACTIVE_FLUSH, /* Flushing for delayed tx */
+	UDMA_CHAN_IS_TERMINATING, /* channel is being terminated */
+};
+
+struct udma_chan {
+	struct virt_dma_chan vc;
+	struct dma_slave_config	cfg;
+	struct udma_dev *ud;
+	struct udma_desc *desc;
+	struct udma_desc *terminated_desc;
+	struct udma_static_tr static_tr;
+	char *name;
+
+	struct udma_tchan *tchan;
+	struct udma_rchan *rchan;
+	struct udma_rflow *rflow;
+
+	bool psil_paired;
+
+	int irq_num_ring;
+	int irq_num_udma;
+
+	bool cyclic;
+	bool paused;
+
+	enum udma_chan_state state;
+	struct completion teardown_completed;
+
+	u32 bcnt; /* number of bytes completed since the start of the channel */
+	u32 in_ring_cnt; /* number of descriptors in flight */
+
+	bool pkt_mode; /* TR or packet */
+	bool needs_epib; /* EPIB is needed for the communication or not */
+	u32 psd_size; /* size of Protocol Specific Data */
+	u32 metadata_size; /* (needs_epib ? 16:0) + psd_size */
+	u32 hdesc_size; /* Size of a packet descriptor in packet mode */
+	int remote_thread_id;
+	u32 src_thread;
+	u32 dst_thread;
+	u32 static_tr_type;
+	bool enable_acc32;
+	bool enable_burst;
+	enum udma_tp_level channel_tpl; /* Channel Throughput Level */
+
+	/* dmapool for packet mode descriptors */
+	bool use_dma_pool;
+	struct dma_pool *hdesc_pool;
+
+	u32 id;
+	enum dma_transfer_direction dir;
+};
+
+static inline struct udma_dev *to_udma_dev(struct dma_device *d)
+{
+	return container_of(d, struct udma_dev, ddev);
+}
+
+static inline struct udma_chan *to_udma_chan(struct dma_chan *c)
+{
+	return container_of(c, struct udma_chan, vc.chan);
+}
+
+static inline struct udma_desc *to_udma_desc(struct dma_async_tx_descriptor *t)
+{
+	return container_of(t, struct udma_desc, vd.tx);
+}
+
+/* Generic register access functions */
+static inline u32 udma_read(void __iomem *base, int reg)
+{
+	return __raw_readl(base + reg);
+}
+
+static inline void udma_write(void __iomem *base, int reg, u32 val)
+{
+	__raw_writel(val, base + reg);
+}
+
+static inline void udma_update_bits(void __iomem *base, int reg,
+				    u32 mask, u32 val)
+{
+	u32 tmp, orig;
+
+	orig = __raw_readl(base + reg);
+	tmp = orig & ~mask;
+	tmp |= (val & mask);
+
+	if (tmp != orig)
+		__raw_writel(tmp, base + reg);
+}
+
+/* TCHANRT */
+static inline u32 udma_tchanrt_read(struct udma_tchan *tchan, int reg)
+{
+	if (!tchan)
+		return 0;
+	return udma_read(tchan->reg_rt, reg);
+}
+
+static inline void udma_tchanrt_write(struct udma_tchan *tchan, int reg,
+				      u32 val)
+{
+	if (!tchan)
+		return;
+	udma_write(tchan->reg_rt, reg, val);
+}
+
+static inline void udma_tchanrt_update_bits(struct udma_tchan *tchan, int reg,
+					    u32 mask, u32 val)
+{
+	if (!tchan)
+		return;
+	udma_update_bits(tchan->reg_rt, reg, mask, val);
+}
+
+/* RCHANRT */
+static inline u32 udma_rchanrt_read(struct udma_rchan *rchan, int reg)
+{
+	if (!rchan)
+		return 0;
+	return udma_read(rchan->reg_rt, reg);
+}
+
+static inline void udma_rchanrt_write(struct udma_rchan *rchan, int reg,
+				      u32 val)
+{
+	if (!rchan)
+		return;
+	udma_write(rchan->reg_rt, reg, val);
+}
+
+static inline void udma_rchanrt_update_bits(struct udma_rchan *rchan, int reg,
+					    u32 mask, u32 val)
+{
+	if (!rchan)
+		return;
+	udma_update_bits(rchan->reg_rt, reg, mask, val);
+}
+
+static int navss_psil_pair(struct udma_dev *ud, u32 src_thread, u32 dst_thread)
+{
+	struct udma_tisci_rm *tisci_rm = &ud->tisci_rm;
+
+	dst_thread |= UDMA_PSIL_DST_THREAD_ID_OFFSET;
+	return tisci_rm->tisci_psil_ops->pair(tisci_rm->tisci,
+					      tisci_rm->tisci_navss_dev_id,
+					      src_thread, dst_thread);
+}
+
+static int navss_psil_unpair(struct udma_dev *ud, u32 src_thread,
+			     u32 dst_thread)
+{
+	struct udma_tisci_rm *tisci_rm = &ud->tisci_rm;
+
+	dst_thread |= UDMA_PSIL_DST_THREAD_ID_OFFSET;
+	return tisci_rm->tisci_psil_ops->unpair(tisci_rm->tisci,
+						tisci_rm->tisci_navss_dev_id,
+						src_thread, dst_thread);
+}
+
+static char *udma_get_dir_text(enum dma_transfer_direction dir)
+{
+	switch (dir) {
+	case DMA_DEV_TO_MEM:
+		return "DEV_TO_MEM";
+	case DMA_MEM_TO_DEV:
+		return "MEM_TO_DEV";
+	case DMA_MEM_TO_MEM:
+		return "MEM_TO_MEM";
+	case DMA_DEV_TO_DEV:
+		return "DEV_TO_DEV";
+	default:
+		break;
+	}
+
+	return "invalid";
+}
+
+static void udma_dump_chan_stdata(struct udma_chan *uc)
+{
+	struct device *dev = uc->ud->dev;
+	u32 offset;
+	int i;
+
+	if (uc->dir == DMA_MEM_TO_DEV || uc->dir == DMA_MEM_TO_MEM) {
+		dev_dbg(dev, "TCHAN State data:\n");
+		for (i = 0; i < 32; i++) {
+			offset = UDMA_TCHAN_RT_STDATA_REG + i * 4;
+			dev_dbg(dev, "TRT_STDATA[%02d]: 0x%08x\n", i,
+				udma_tchanrt_read(uc->tchan, offset));
+		}
+	}
+
+	if (uc->dir == DMA_DEV_TO_MEM || uc->dir == DMA_MEM_TO_MEM) {
+		dev_dbg(dev, "RCHAN State data:\n");
+		for (i = 0; i < 32; i++) {
+			offset = UDMA_RCHAN_RT_STDATA_REG + i * 4;
+			dev_dbg(dev, "RRT_STDATA[%02d]: 0x%08x\n", i,
+				udma_rchanrt_read(uc->rchan, offset));
+		}
+	}
+}
+
+static inline dma_addr_t udma_curr_cppi5_desc_paddr(struct udma_desc *d,
+						    int idx)
+{
+	return d->hwdesc[idx].cppi5_desc_paddr;
+}
+
+static inline void *udma_curr_cppi5_desc_vaddr(struct udma_desc *d, int idx)
+{
+	return d->hwdesc[idx].cppi5_desc_vaddr;
+}
+
+static struct udma_desc *udma_udma_desc_from_paddr(struct udma_chan *uc,
+						   dma_addr_t paddr)
+{
+	struct udma_desc *d = uc->terminated_desc;
+
+	if (d) {
+		dma_addr_t desc_paddr = udma_curr_cppi5_desc_paddr(d,
+								   d->desc_idx);
+
+		if (desc_paddr != paddr)
+			d = NULL;
+	}
+
+	if (!d) {
+		d = uc->desc;
+		if (d) {
+			dma_addr_t desc_paddr = udma_curr_cppi5_desc_paddr(d,
+								d->desc_idx);
+
+			if (desc_paddr != paddr)
+				d = NULL;
+		}
+	}
+
+	return d;
+}
+
+static void udma_free_hwdesc(struct udma_chan *uc, struct udma_desc *d)
+{
+	if (uc->use_dma_pool) {
+		int i;
+
+		for (i = 0; i < d->hwdesc_count; i++) {
+			if (!d->hwdesc[i].cppi5_desc_vaddr)
+				continue;
+
+			dma_pool_free(uc->hdesc_pool,
+				      d->hwdesc[i].cppi5_desc_vaddr,
+				      d->hwdesc[i].cppi5_desc_paddr);
+
+			d->hwdesc[i].cppi5_desc_vaddr = NULL;
+		}
+	} else if (d->hwdesc[0].cppi5_desc_vaddr) {
+		struct udma_dev *ud = uc->ud;
+
+		dma_free_coherent(ud->dev, d->hwdesc[0].cppi5_desc_size,
+				  d->hwdesc[0].cppi5_desc_vaddr,
+				  d->hwdesc[0].cppi5_desc_paddr);
+
+		d->hwdesc[0].cppi5_desc_vaddr = NULL;
+	}
+}
+
+static void udma_purge_desc_work(struct work_struct *work)
+{
+	struct udma_dev *ud = container_of(work, typeof(*ud), purge_work);
+	struct virt_dma_desc *vd, *_vd;
+	unsigned long flags;
+	LIST_HEAD(head);
+
+	spin_lock_irqsave(&ud->lock, flags);
+	list_splice_tail_init(&ud->desc_to_purge, &head);
+	spin_unlock_irqrestore(&ud->lock, flags);
+
+	list_for_each_entry_safe(vd, _vd, &head, node) {
+		struct udma_chan *uc = to_udma_chan(vd->tx.chan);
+		struct udma_desc *d = to_udma_desc(&vd->tx);
+
+		udma_free_hwdesc(uc, d);
+		list_del(&vd->node);
+		kfree(d);
+	}
+
+	/* If more to purge, schedule the work again */
+	if (!list_empty(&ud->desc_to_purge))
+		schedule_work(&ud->purge_work);
+}
+
+static void udma_desc_free(struct virt_dma_desc *vd)
+{
+	struct udma_dev *ud = to_udma_dev(vd->tx.chan->device);
+	struct udma_chan *uc = to_udma_chan(vd->tx.chan);
+	struct udma_desc *d = to_udma_desc(&vd->tx);
+	unsigned long flags;
+
+	if (uc->terminated_desc == d)
+		uc->terminated_desc = NULL;
+
+	if (uc->use_dma_pool) {
+		udma_free_hwdesc(uc, d);
+		kfree(d);
+		return;
+	}
+
+	spin_lock_irqsave(&ud->lock, flags);
+	list_add_tail(&vd->node, &ud->desc_to_purge);
+	spin_unlock_irqrestore(&ud->lock, flags);
+
+	schedule_work(&ud->purge_work);
+}
+
+static bool udma_is_chan_running(struct udma_chan *uc)
+{
+	u32 trt_ctl = 0;
+	u32 rrt_ctl = 0;
+
+	if (uc->tchan)
+		trt_ctl = udma_tchanrt_read(uc->tchan, UDMA_TCHAN_RT_CTL_REG);
+	if (uc->rchan)
+		rrt_ctl = udma_rchanrt_read(uc->rchan, UDMA_RCHAN_RT_CTL_REG);
+
+	if (trt_ctl & UDMA_CHAN_RT_CTL_EN || rrt_ctl & UDMA_CHAN_RT_CTL_EN)
+		return true;
+
+	return false;
+}
+
+static void udma_sync_for_device(struct udma_chan *uc, int idx)
+{
+	struct udma_desc *d = uc->desc;
+
+	if (uc->cyclic && uc->pkt_mode) {
+		dma_sync_single_for_device(uc->ud->dev,
+					   d->hwdesc[idx].cppi5_desc_paddr,
+					   d->hwdesc[idx].cppi5_desc_size,
+					   DMA_TO_DEVICE);
+	} else {
+		int i;
+
+		for (i = 0; i < d->hwdesc_count; i++) {
+			if (!d->hwdesc[i].cppi5_desc_vaddr)
+				continue;
+
+			dma_sync_single_for_device(uc->ud->dev,
+						d->hwdesc[i].cppi5_desc_paddr,
+						d->hwdesc[i].cppi5_desc_size,
+						DMA_TO_DEVICE);
+		}
+	}
+}
+
+static int udma_push_to_ring(struct udma_chan *uc, int idx)
+{
+	struct udma_desc *d = uc->desc;
+
+	struct k3_ring *ring = NULL;
+	int ret = -EINVAL;
+
+	switch (uc->dir) {
+	case DMA_DEV_TO_MEM:
+		ring = uc->rchan->fd_ring;
+		break;
+	case DMA_MEM_TO_DEV:
+	case DMA_MEM_TO_MEM:
+		ring = uc->tchan->t_ring;
+		break;
+	default:
+		break;
+	}
+
+	if (ring) {
+		dma_addr_t desc_addr = udma_curr_cppi5_desc_paddr(d, idx);
+
+		wmb(); /* Ensure that writes are not moved over this point */
+		udma_sync_for_device(uc, idx);
+		ret = k3_ringacc_ring_push(ring, &desc_addr);
+		uc->in_ring_cnt++;
+	}
+
+	return ret;
+}
+
+static int udma_pop_from_ring(struct udma_chan *uc, dma_addr_t *addr)
+{
+	struct k3_ring *ring = NULL;
+	int ret = -ENOENT;
+
+	switch (uc->dir) {
+	case DMA_DEV_TO_MEM:
+		ring = uc->rchan->r_ring;
+		break;
+	case DMA_MEM_TO_DEV:
+	case DMA_MEM_TO_MEM:
+		ring = uc->tchan->tc_ring;
+		break;
+	default:
+		break;
+	}
+
+	if (ring && k3_ringacc_ring_get_occ(ring)) {
+		struct udma_desc *d = NULL;
+
+		ret = k3_ringacc_ring_pop(ring, addr);
+		if (ret)
+			return ret;
+
+		/* Teardown completion */
+		if (*addr & 0x1)
+			return ret;
+
+		d = udma_udma_desc_from_paddr(uc, *addr);
+
+		if (d)
+			dma_sync_single_for_cpu(uc->ud->dev, *addr,
+						d->hwdesc[0].cppi5_desc_size,
+						DMA_FROM_DEVICE);
+		rmb(); /* Ensure that reads are not moved before this point */
+
+		if (!ret)
+			uc->in_ring_cnt--;
+	}
+
+	return ret;
+}
+
+static void udma_reset_rings(struct udma_chan *uc)
+{
+	struct k3_ring *ring1 = NULL;
+	struct k3_ring *ring2 = NULL;
+
+	switch (uc->dir) {
+	case DMA_DEV_TO_MEM:
+		if (uc->rchan) {
+			ring1 = uc->rchan->fd_ring;
+			ring2 = uc->rchan->r_ring;
+		}
+		break;
+	case DMA_MEM_TO_DEV:
+	case DMA_MEM_TO_MEM:
+		if (uc->tchan) {
+			ring1 = uc->tchan->t_ring;
+			ring2 = uc->tchan->tc_ring;
+		}
+		break;
+	default:
+		break;
+	}
+
+	if (ring1)
+		k3_ringacc_ring_reset_dma(ring1,
+					  k3_ringacc_ring_get_occ(ring1));
+	if (ring2)
+		k3_ringacc_ring_reset(ring2);
+
+	/* make sure we are not leaking memory by stalled descriptor */
+	if (uc->terminated_desc) {
+		udma_desc_free(&uc->terminated_desc->vd);
+		uc->terminated_desc = NULL;
+	}
+
+	uc->in_ring_cnt = 0;
+}
+
+static void udma_reset_counters(struct udma_chan *uc)
+{
+	u32 val;
+
+	if (uc->tchan) {
+		val = udma_tchanrt_read(uc->tchan, UDMA_TCHAN_RT_BCNT_REG);
+		udma_tchanrt_write(uc->tchan, UDMA_TCHAN_RT_BCNT_REG, val);
+
+		val = udma_tchanrt_read(uc->tchan, UDMA_TCHAN_RT_SBCNT_REG);
+		udma_tchanrt_write(uc->tchan, UDMA_TCHAN_RT_SBCNT_REG, val);
+
+		val = udma_tchanrt_read(uc->tchan, UDMA_TCHAN_RT_PCNT_REG);
+		udma_tchanrt_write(uc->tchan, UDMA_TCHAN_RT_PCNT_REG, val);
+
+		val = udma_tchanrt_read(uc->tchan, UDMA_TCHAN_RT_PEER_BCNT_REG);
+		udma_tchanrt_write(uc->tchan, UDMA_TCHAN_RT_PEER_BCNT_REG, val);
+	}
+
+	if (uc->rchan) {
+		val = udma_rchanrt_read(uc->rchan, UDMA_RCHAN_RT_BCNT_REG);
+		udma_rchanrt_write(uc->rchan, UDMA_RCHAN_RT_BCNT_REG, val);
+
+		val = udma_rchanrt_read(uc->rchan, UDMA_RCHAN_RT_SBCNT_REG);
+		udma_rchanrt_write(uc->rchan, UDMA_RCHAN_RT_SBCNT_REG, val);
+
+		val = udma_rchanrt_read(uc->rchan, UDMA_RCHAN_RT_PCNT_REG);
+		udma_rchanrt_write(uc->rchan, UDMA_RCHAN_RT_PCNT_REG, val);
+
+		val = udma_rchanrt_read(uc->rchan, UDMA_RCHAN_RT_PEER_BCNT_REG);
+		udma_rchanrt_write(uc->rchan, UDMA_RCHAN_RT_PEER_BCNT_REG, val);
+	}
+
+	uc->bcnt = 0;
+}
+
+static int udma_reset_chan(struct udma_chan *uc, bool hard)
+{
+	switch (uc->dir) {
+	case DMA_DEV_TO_MEM:
+		udma_rchanrt_write(uc->rchan, UDMA_RCHAN_RT_PEER_RT_EN_REG, 0);
+		udma_rchanrt_write(uc->rchan, UDMA_RCHAN_RT_CTL_REG, 0);
+		break;
+	case DMA_MEM_TO_DEV:
+		udma_tchanrt_write(uc->tchan, UDMA_TCHAN_RT_CTL_REG, 0);
+		udma_tchanrt_write(uc->tchan, UDMA_TCHAN_RT_PEER_RT_EN_REG, 0);
+		break;
+	case DMA_MEM_TO_MEM:
+		udma_rchanrt_write(uc->rchan, UDMA_RCHAN_RT_CTL_REG, 0);
+		udma_tchanrt_write(uc->tchan, UDMA_TCHAN_RT_CTL_REG, 0);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	/* Reset all counters */
+	udma_reset_counters(uc);
+
+	/* Hard reset: re-initialize the channel to reset */
+	if (hard) {
+		struct udma_chan uc_backup = *uc;
+		int ret;
+
+		uc->ud->ddev.device_free_chan_resources(&uc->vc.chan);
+		/* restore the channel configuration */
+		uc->dir = uc_backup.dir;
+		uc->remote_thread_id = uc_backup.remote_thread_id;
+		uc->pkt_mode = uc_backup.pkt_mode;
+		uc->static_tr_type = uc_backup.static_tr_type;
+		uc->enable_acc32 = uc_backup.enable_acc32;
+		uc->enable_burst = uc_backup.enable_burst;
+		uc->channel_tpl = uc_backup.channel_tpl;
+		uc->psd_size = uc_backup.psd_size;
+		uc->metadata_size = uc_backup.metadata_size;
+		uc->hdesc_size = uc_backup.hdesc_size;
+
+		ret = uc->ud->ddev.device_alloc_chan_resources(&uc->vc.chan);
+		if (ret)
+			return ret;
+	}
+	uc->state = UDMA_CHAN_IS_IDLE;
+
+	return 0;
+}
+
+static void udma_start_desc(struct udma_chan *uc)
+{
+	if (uc->pkt_mode && (uc->cyclic || uc->dir == DMA_DEV_TO_MEM)) {
+		int i;
+
+		/* Push all descriptors to ring for packet mode cyclic or RX */
+		for (i = 0; i < uc->desc->sglen; i++)
+			udma_push_to_ring(uc, i);
+	} else {
+		udma_push_to_ring(uc, 0);
+	}
+}
+
+static bool udma_chan_needs_reconfiguration(struct udma_chan *uc)
+{
+	/* Only PDMAs have staticTR */
+	if (!uc->static_tr_type)
+		return false;
+
+	/* Check if the staticTR configuration has changed for TX */
+	if (memcmp(&uc->static_tr, &uc->desc->static_tr, sizeof(uc->static_tr)))
+		return true;
+
+	return false;
+}
+
+static int udma_start(struct udma_chan *uc)
+{
+	struct virt_dma_desc *vd = vchan_next_desc(&uc->vc);
+
+	if (!vd) {
+		uc->desc = NULL;
+		return -ENOENT;
+	}
+
+	list_del(&vd->node);
+
+	uc->desc = to_udma_desc(&vd->tx);
+
+	/* Channel is already running and does not need reconfiguration */
+	if (udma_is_chan_running(uc) && !udma_chan_needs_reconfiguration(uc)) {
+		udma_start_desc(uc);
+		goto out;
+	}
+
+	/* Make sure that we clear the teardown bit, if it is set */
+	udma_reset_chan(uc, false);
+
+	/* Push descriptors before we start the channel */
+	udma_start_desc(uc);
+
+	switch (uc->desc->dir) {
+	case DMA_DEV_TO_MEM:
+		/* Config remote TR */
+		if (uc->static_tr_type) {
+			u32 val = PDMA_STATIC_TR_Y(uc->desc->static_tr.elcnt) |
+				  PDMA_STATIC_TR_X(uc->desc->static_tr.elsize);
+			const struct udma_match_data *match_data =
+							uc->ud->match_data;
+
+			if (uc->enable_acc32)
+				val |= PDMA_STATIC_TR_XY_ACC32;
+			if (uc->enable_burst)
+				val |= PDMA_STATIC_TR_XY_BURST;
+
+			udma_rchanrt_write(uc->rchan,
+				UDMA_RCHAN_RT_PEER_STATIC_TR_XY_REG, val);
+
+			udma_rchanrt_write(uc->rchan,
+				UDMA_RCHAN_RT_PEER_STATIC_TR_Z_REG,
+				PDMA_STATIC_TR_Z(uc->desc->static_tr.bstcnt,
+						 match_data->statictr_z_mask));
+
+			/* save the current staticTR configuration */
+			memcpy(&uc->static_tr, &uc->desc->static_tr,
+			       sizeof(uc->static_tr));
+		}
+
+		udma_rchanrt_write(uc->rchan, UDMA_RCHAN_RT_CTL_REG,
+				   UDMA_CHAN_RT_CTL_EN);
+
+		/* Enable remote */
+		udma_rchanrt_write(uc->rchan, UDMA_RCHAN_RT_PEER_RT_EN_REG,
+				   UDMA_PEER_RT_EN_ENABLE);
+
+		break;
+	case DMA_MEM_TO_DEV:
+		/* Config remote TR */
+		if (uc->static_tr_type) {
+			u32 val = PDMA_STATIC_TR_Y(uc->desc->static_tr.elcnt) |
+				  PDMA_STATIC_TR_X(uc->desc->static_tr.elsize);
+
+			if (uc->enable_acc32)
+				val |= PDMA_STATIC_TR_XY_ACC32;
+			if (uc->enable_burst)
+				val |= PDMA_STATIC_TR_XY_BURST;
+
+			udma_tchanrt_write(uc->tchan,
+				UDMA_TCHAN_RT_PEER_STATIC_TR_XY_REG, val);
+
+			/* save the current staticTR configuration */
+			memcpy(&uc->static_tr, &uc->desc->static_tr,
+			       sizeof(uc->static_tr));
+		}
+
+		/* Enable remote */
+		udma_tchanrt_write(uc->tchan, UDMA_TCHAN_RT_PEER_RT_EN_REG,
+				   UDMA_PEER_RT_EN_ENABLE);
+
+		udma_tchanrt_write(uc->tchan, UDMA_TCHAN_RT_CTL_REG,
+				   UDMA_CHAN_RT_CTL_EN);
+
+		break;
+	case DMA_MEM_TO_MEM:
+		udma_rchanrt_write(uc->rchan, UDMA_RCHAN_RT_CTL_REG,
+				   UDMA_CHAN_RT_CTL_EN);
+		udma_tchanrt_write(uc->tchan, UDMA_TCHAN_RT_CTL_REG,
+				   UDMA_CHAN_RT_CTL_EN);
+
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	uc->state = UDMA_CHAN_IS_ACTIVE;
+out:
+
+	return 0;
+}
+
+static int udma_stop(struct udma_chan *uc)
+{
+	enum udma_chan_state old_state = uc->state;
+
+	uc->state = UDMA_CHAN_IS_TERMINATING;
+	reinit_completion(&uc->teardown_completed);
+
+	switch (uc->dir) {
+	case DMA_DEV_TO_MEM:
+		udma_rchanrt_write(uc->rchan, UDMA_RCHAN_RT_PEER_RT_EN_REG,
+				   UDMA_PEER_RT_EN_ENABLE |
+				   UDMA_PEER_RT_EN_TEARDOWN);
+		break;
+	case DMA_MEM_TO_DEV:
+		udma_tchanrt_write(uc->tchan, UDMA_TCHAN_RT_PEER_RT_EN_REG,
+				   UDMA_PEER_RT_EN_ENABLE |
+				   UDMA_PEER_RT_EN_FLUSH);
+		udma_tchanrt_write(uc->tchan, UDMA_TCHAN_RT_CTL_REG,
+				   UDMA_CHAN_RT_CTL_EN |
+				   UDMA_CHAN_RT_CTL_TDOWN);
+		break;
+	case DMA_MEM_TO_MEM:
+		udma_tchanrt_write(uc->tchan, UDMA_TCHAN_RT_CTL_REG,
+				   UDMA_CHAN_RT_CTL_EN |
+				   UDMA_CHAN_RT_CTL_TDOWN);
+		break;
+	default:
+		uc->state = old_state;
+		complete_all(&uc->teardown_completed);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static void udma_cyclic_packet_elapsed(struct udma_chan *uc)
+{
+	struct udma_desc *d = uc->desc;
+	struct cppi5_host_desc_t *h_desc;
+
+	h_desc = d->hwdesc[d->desc_idx].cppi5_desc_vaddr;
+	cppi5_hdesc_reset_to_original(h_desc);
+	udma_push_to_ring(uc, d->desc_idx);
+	d->desc_idx = (d->desc_idx + 1) % d->sglen;
+}
+
+static inline void udma_fetch_epib(struct udma_chan *uc, struct udma_desc *d)
+{
+	struct cppi5_host_desc_t *h_desc = d->hwdesc[0].cppi5_desc_vaddr;
+
+	memcpy(d->metadata, h_desc->epib, d->metadata_size);
+}
+
+static bool udma_is_desc_really_done(struct udma_chan *uc,
+					    struct udma_desc *d)
+{
+	u32 peer_bcnt, bcnt;
+
+	/* Only TX towards PDMA is affected */
+	if (!uc->static_tr_type || uc->dir != DMA_MEM_TO_DEV)
+		return true;
+
+	peer_bcnt = udma_tchanrt_read(uc->tchan, UDMA_TCHAN_RT_PEER_BCNT_REG);
+	bcnt = udma_tchanrt_read(uc->tchan, UDMA_TCHAN_RT_BCNT_REG);
+
+	if (peer_bcnt < bcnt)
+		return false;
+
+	return true;
+}
+
+static void udma_flush_tx(struct udma_chan *uc)
+{
+	if (uc->dir != DMA_MEM_TO_DEV)
+		return;
+
+	uc->state = UDMA_CHAN_IS_ACTIVE_FLUSH;
+
+	udma_tchanrt_write(uc->tchan, UDMA_TCHAN_RT_CTL_REG,
+			   UDMA_CHAN_RT_CTL_EN |
+			   UDMA_CHAN_RT_CTL_TDOWN);
+}
+
+static void udma_ring_callback(struct udma_chan *uc, dma_addr_t paddr)
+{
+	struct udma_desc *d;
+	unsigned long flags;
+
+	if (!paddr)
+		return;
+
+	spin_lock_irqsave(&uc->vc.lock, flags);
+
+	/* Teardown completion message */
+	if (paddr & 0x1) {
+		/* Compensate our internal pop/push counter */
+		uc->in_ring_cnt++;
+
+		complete_all(&uc->teardown_completed);
+
+		if (uc->terminated_desc) {
+			udma_desc_free(&uc->terminated_desc->vd);
+			uc->terminated_desc = NULL;
+		}
+
+		if (!uc->desc)
+			udma_start(uc);
+
+		if (uc->state != UDMA_CHAN_IS_ACTIVE_FLUSH)
+			goto out;
+		else if (uc->desc)
+			paddr = udma_curr_cppi5_desc_paddr(uc->desc,
+							   uc->desc->desc_idx);
+	}
+
+	d = udma_udma_desc_from_paddr(uc, paddr);
+
+	if (d) {
+		dma_addr_t desc_paddr = udma_curr_cppi5_desc_paddr(d,
+								   d->desc_idx);
+		if (desc_paddr != paddr) {
+			dev_err(uc->ud->dev, "not matching descriptors!\n");
+			goto out;
+		}
+
+		if (uc->cyclic) {
+			/* push the descriptor back to the ring */
+			if (d == uc->desc) {
+				udma_cyclic_packet_elapsed(uc);
+				vchan_cyclic_callback(&d->vd);
+			}
+		} else {
+			bool desc_done = true;
+
+			if (d == uc->desc) {
+				desc_done = udma_is_desc_really_done(uc, d);
+
+				if (desc_done) {
+					uc->bcnt += d->residue;
+					udma_start(uc);
+				} else {
+					udma_flush_tx(uc);
+				}
+			} else if (d == uc->terminated_desc) {
+				uc->terminated_desc = NULL;
+			}
+
+			if (desc_done)
+				vchan_cookie_complete(&d->vd);
+		}
+	}
+out:
+	spin_unlock_irqrestore(&uc->vc.lock, flags);
+}
+
+static void udma_tr_event_callback(struct udma_chan *uc)
+{
+	struct udma_desc *d;
+	unsigned long flags;
+
+	spin_lock_irqsave(&uc->vc.lock, flags);
+	d = uc->desc;
+	if (d) {
+		d->tr_idx = (d->tr_idx + 1) % d->sglen;
+
+		if (uc->cyclic) {
+			vchan_cyclic_callback(&d->vd);
+		} else {
+			/* TODO: figure out the real amount of data */
+			uc->bcnt += d->residue;
+			udma_start(uc);
+			vchan_cookie_complete(&d->vd);
+		}
+	}
+
+	spin_unlock_irqrestore(&uc->vc.lock, flags);
+}
+
+static irqreturn_t udma_ring_irq_handler(int irq, void *data)
+{
+	struct udma_chan *uc = data;
+	dma_addr_t paddr = 0;
+
+	if (!udma_pop_from_ring(uc, &paddr))
+		udma_ring_callback(uc, paddr);
+
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t udma_udma_irq_handler(int irq, void *data)
+{
+	struct udma_chan *uc = data;
+
+	udma_tr_event_callback(uc);
+
+	return IRQ_HANDLED;
+}
diff --git a/drivers/dma/ti/k3-udma.h b/drivers/dma/ti/k3-udma.h
new file mode 100644
index 000000000000..a6153deb791b
--- /dev/null
+++ b/drivers/dma/ti/k3-udma.h
@@ -0,0 +1,130 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ *  Copyright (C) 2019 Texas Instruments Incorporated - http://www.ti.com
+ */
+
+#ifndef K3_UDMA_H_
+#define K3_UDMA_H_
+
+#include <linux/soc/ti/ti_sci_protocol.h>
+
+#define UDMA_PSIL_DST_THREAD_ID_OFFSET 0x8000
+
+/* Global registers */
+#define UDMA_REV_REG			0x0
+#define UDMA_PERF_CTL_REG		0x4
+#define UDMA_EMU_CTL_REG		0x8
+#define UDMA_PSIL_TO_REG		0x10
+#define UDMA_UTC_CTL_REG		0x1c
+#define UDMA_CAP_REG(i)			(0x20 + (i * 4))
+#define UDMA_RX_FLOW_ID_FW_OES_REG	0x80
+#define UDMA_RX_FLOW_ID_FW_STATUS_REG	0x88
+
+/* TX chan RT regs */
+#define UDMA_TCHAN_RT_CTL_REG		0x0
+#define UDMA_TCHAN_RT_SWTRIG_REG	0x8
+#define UDMA_TCHAN_RT_STDATA_REG	0x80
+
+#define UDMA_TCHAN_RT_PEERn_REG(i)	(0x200 + (i * 0x4))
+#define UDMA_TCHAN_RT_PEER_STATIC_TR_XY_REG	\
+	UDMA_TCHAN_RT_PEERn_REG(0)	/* PSI-L: 0x400 */
+#define UDMA_TCHAN_RT_PEER_STATIC_TR_Z_REG	\
+	UDMA_TCHAN_RT_PEERn_REG(1)	/* PSI-L: 0x401 */
+#define UDMA_TCHAN_RT_PEER_BCNT_REG		\
+	UDMA_TCHAN_RT_PEERn_REG(4)	/* PSI-L: 0x404 */
+#define UDMA_TCHAN_RT_PEER_RT_EN_REG		\
+	UDMA_TCHAN_RT_PEERn_REG(8)	/* PSI-L: 0x408 */
+
+#define UDMA_TCHAN_RT_PCNT_REG		0x400
+#define UDMA_TCHAN_RT_BCNT_REG		0x408
+#define UDMA_TCHAN_RT_SBCNT_REG		0x410
+
+/* RX chan RT regs */
+#define UDMA_RCHAN_RT_CTL_REG		0x0
+#define UDMA_RCHAN_RT_SWTRIG_REG	0x8
+#define UDMA_RCHAN_RT_STDATA_REG	0x80
+
+#define UDMA_RCHAN_RT_PEERn_REG(i)	(0x200 + (i * 0x4))
+#define UDMA_RCHAN_RT_PEER_STATIC_TR_XY_REG	\
+	UDMA_RCHAN_RT_PEERn_REG(0)	/* PSI-L: 0x400 */
+#define UDMA_RCHAN_RT_PEER_STATIC_TR_Z_REG	\
+	UDMA_RCHAN_RT_PEERn_REG(1)	/* PSI-L: 0x401 */
+#define UDMA_RCHAN_RT_PEER_BCNT_REG		\
+	UDMA_RCHAN_RT_PEERn_REG(4)	/* PSI-L: 0x404 */
+#define UDMA_RCHAN_RT_PEER_RT_EN_REG		\
+	UDMA_RCHAN_RT_PEERn_REG(8)	/* PSI-L: 0x408 */
+
+#define UDMA_RCHAN_RT_PCNT_REG		0x400
+#define UDMA_RCHAN_RT_BCNT_REG		0x408
+#define UDMA_RCHAN_RT_SBCNT_REG		0x410
+
+/* UDMA_TCHAN_RT_CTL_REG/UDMA_RCHAN_RT_CTL_REG */
+#define UDMA_CHAN_RT_CTL_EN		BIT(31)
+#define UDMA_CHAN_RT_CTL_TDOWN		BIT(30)
+#define UDMA_CHAN_RT_CTL_PAUSE		BIT(29)
+#define UDMA_CHAN_RT_CTL_FTDOWN		BIT(28)
+#define UDMA_CHAN_RT_CTL_ERROR		BIT(0)
+
+/* UDMA_TCHAN_RT_PEER_RT_EN_REG/UDMA_RCHAN_RT_PEER_RT_EN_REG (PSI-L: 0x408) */
+#define UDMA_PEER_RT_EN_ENABLE		BIT(31)
+#define UDMA_PEER_RT_EN_TEARDOWN	BIT(30)
+#define UDMA_PEER_RT_EN_PAUSE		BIT(29)
+#define UDMA_PEER_RT_EN_FLUSH		BIT(28)
+#define UDMA_PEER_RT_EN_IDLE		BIT(1)
+
+/*
+ * UDMA_TCHAN_RT_PEER_STATIC_TR_XY_REG /
+ * UDMA_RCHAN_RT_PEER_STATIC_TR_XY_REG
+ */
+#define PDMA_STATIC_TR_X_MASK		GENMASK(26, 24)
+#define PDMA_STATIC_TR_X_SHIFT		(24)
+#define PDMA_STATIC_TR_Y_MASK		GENMASK(11, 0)
+#define PDMA_STATIC_TR_Y_SHIFT		(0)
+
+#define PDMA_STATIC_TR_Y(x)	\
+	(((x) << PDMA_STATIC_TR_Y_SHIFT) & PDMA_STATIC_TR_Y_MASK)
+#define PDMA_STATIC_TR_X(x)	\
+	(((x) << PDMA_STATIC_TR_X_SHIFT) & PDMA_STATIC_TR_X_MASK)
+
+#define PDMA_STATIC_TR_XY_ACC32		BIT(30)
+#define PDMA_STATIC_TR_XY_BURST		BIT(31)
+
+/*
+ * UDMA_TCHAN_RT_PEER_STATIC_TR_Z_REG /
+ * UDMA_RCHAN_RT_PEER_STATIC_TR_Z_REG
+ */
+#define PDMA_STATIC_TR_Z(x, mask)	((x) & (mask))
+
+struct udma_dev;
+struct udma_tchan;
+struct udma_rchan;
+struct udma_rflow;
+
+enum udma_rm_range {
+	RM_RANGE_TCHAN = 0,
+	RM_RANGE_RCHAN,
+	RM_RANGE_RFLOW,
+	RM_RANGE_LAST,
+};
+
+/* Channel Throughput Levels */
+enum udma_tp_level {
+	UDMA_TP_NORMAL = 0,
+	UDMA_TP_HIGH = 1,
+	UDMA_TP_ULTRAHIGH = 2,
+	UDMA_TP_LAST,
+};
+
+struct udma_tisci_rm {
+	const struct ti_sci_handle *tisci;
+	const struct ti_sci_rm_udmap_ops *tisci_udmap_ops;
+	u32  tisci_dev_id;
+
+	/* tisci information for PSI-L thread pairing/unpairing */
+	const struct ti_sci_rm_psil_ops *tisci_psil_ops;
+	u32  tisci_navss_dev_id;
+
+	struct ti_sci_resource *rm_ranges[RM_RANGE_LAST];
+};
+
+#endif /* K3_UDMA_H_ */
-- 
Peter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2 09/14] dmaengine: ti: New driver for K3 UDMA - split#2: probe/remove, xlate and filter_fn
  2019-07-30  9:34 [PATCH v2 00/14] dmaengine/soc: Add Texas Instruments UDMA support Peter Ujfalusi
                   ` (7 preceding siblings ...)
  2019-07-30  9:34 ` [PATCH v2 08/14] dmaengine: ti: New driver for K3 UDMA - split#1: defines, structs, io func Peter Ujfalusi
@ 2019-07-30  9:34 ` Peter Ujfalusi
  2019-07-30  9:34 ` [PATCH v2 10/14] dmaengine: ti: New driver for K3 UDMA - split#3: alloc/free chan_resources Peter Ujfalusi
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Peter Ujfalusi @ 2019-07-30  9:34 UTC (permalink / raw)
  To: vkoul, robh+dt, nm, ssantosh
  Cc: dan.j.williams, dmaengine, linux-arm-kernel, devicetree,
	linux-kernel, grygorii.strashko, lokeshvutla, t-kristo, tony,
	j-keerthy

Split patch for review containing: module probe/remove functions, of_xlate
and filter_fn for slave channel requests.

DMA driver for
Texas Instruments K3 NAVSS Unified DMA – Peripheral Root Complex (UDMA-P)

The UDMA-P is intended to perform similar (but significantly upgraded) functions
as the packet-oriented DMA used on previous SoC devices. The UDMA-P module
supports the transmission and reception of various packet types. The UDMA-P is
architected to facilitate the segmentation and reassembly of SoC DMA data
structure compliant packets to/from smaller data blocks that are natively
compatible with the specific requirements of each connected peripheral. Multiple
Tx and Rx channels are provided within the DMA which allow multiple segmentation
or reassembly operations to be ongoing. The DMA controller maintains state
information for each of the channels which allows packet segmentation and
reassembly operations to be time division multiplexed between channels in order
to share the underlying DMA hardware. An external DMA scheduler is used to
control the ordering and rate at which this multiplexing occurs for Transmit
operations. The ordering and rate of Receive operations is indirectly controlled
by the order in which blocks are pushed into the DMA on the Rx PSI-L interface.

The UDMA-P also supports acting as both a UTC and UDMA-C for its internal
channels. Channels in the UDMA-P can be configured to be either Packet-Based or
Third-Party channels on a channel by channel basis.

The initial driver supports:
- MEM_TO_MEM (TR mode)
- DEV_TO_MEM (Packet / TR mode)
- MEM_TO_DEV (Packet / TR mode)
- Cyclic (Packet / TR mode)
- Metadata for descriptors

Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
---
 drivers/dma/ti/k3-udma.c | 605 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 605 insertions(+)

diff --git a/drivers/dma/ti/k3-udma.c b/drivers/dma/ti/k3-udma.c
index e6d2c4b172e5..52ccc6d46de9 100644
--- a/drivers/dma/ti/k3-udma.c
+++ b/drivers/dma/ti/k3-udma.c
@@ -1038,3 +1038,608 @@ static irqreturn_t udma_udma_irq_handler(int irq, void *data)
 
 	return IRQ_HANDLED;
 }
+
+static struct platform_driver udma_driver;
+
+static bool udma_slave_thread_is_packet_mode(struct udma_chan *uc)
+{
+	struct udma_dev *ud = uc->ud;
+	const struct udma_match_data *match_data = ud->match_data;
+	struct udma_tr_thread_ranges *tr_threads = match_data->tr_threads;
+	int i;
+
+	if (!tr_threads)
+		return true;
+
+	for (i = 0; tr_threads[i].count; i++) {
+		int start = tr_threads[i].start;
+		int count = tr_threads[i].count;
+
+		if (uc->remote_thread_id >= start &&
+		    uc->remote_thread_id < (start + count))
+			return false;
+	}
+	return true;
+}
+
+static bool udma_dma_filter_fn(struct dma_chan *chan, void *param)
+{
+	u32 *args;
+	struct udma_chan *uc;
+	struct udma_dev *ud;
+	struct device_node *chconf_node, *slave_node;
+	char prop[50];
+	u32 val;
+
+	if (chan->device->dev->driver != &udma_driver.driver)
+		return false;
+
+	uc = to_udma_chan(chan);
+	ud = uc->ud;
+	args = param;
+
+	if (args[2] == UDMA_DIR_TX) {
+		uc->dir = DMA_MEM_TO_DEV;
+	} else if (args[2] == UDMA_DIR_RX) {
+		uc->dir = DMA_DEV_TO_MEM;
+	} else {
+		dev_err(ud->dev, "Invalid direction (%u)\n", args[2]);
+		return false;
+	}
+
+	slave_node = of_find_node_by_phandle(args[0]);
+	if (!slave_node) {
+		dev_err(ud->dev, "Slave node is missing\n");
+		uc->dir = DMA_MEM_TO_MEM;
+		return false;
+	}
+
+	if (of_property_read_u32(slave_node, "ti,psil-base", &val)) {
+		dev_err(ud->dev, "ti,psil-base is missing\n");
+		uc->dir = DMA_MEM_TO_MEM;
+		return false;
+	}
+
+	uc->remote_thread_id = val + args[1];
+
+	snprintf(prop, sizeof(prop), "psil-config%u", args[1]);
+	/* Does of_node_put on slave_node */
+	chconf_node = of_find_node_by_name(slave_node, prop);
+	if (!chconf_node) {
+		dev_err(ud->dev, "Channel configuration node is missing\n");
+		uc->dir = DMA_MEM_TO_MEM;
+		uc->remote_thread_id = -1;
+		return false;
+	}
+
+	uc->pkt_mode = udma_slave_thread_is_packet_mode(uc);
+
+	if (!of_property_read_u32(chconf_node, "pdma,statictr-type", &val))
+		uc->static_tr_type = val;
+
+	if (uc->static_tr_type == PSIL_STATIC_TR_XY) {
+		const struct udma_match_data *match_data = ud->match_data;
+
+		if (match_data->have_acc32)
+			uc->enable_acc32 = of_property_read_bool(chconf_node,
+							"pdma,enable-acc32");
+		if (match_data->have_burst)
+			uc->enable_burst = of_property_read_bool(chconf_node,
+							"pdma,enable-burst");
+	}
+
+	if (!of_property_read_u32(chconf_node, "ti,channel-tpl", &val))
+		uc->channel_tpl = val;
+
+	uc->needs_epib = of_property_read_bool(chconf_node, "ti,needs-epib");
+	if (!of_property_read_u32(chconf_node, "ti,psd-size", &val))
+		uc->psd_size = val;
+	uc->metadata_size = (uc->needs_epib ? CPPI5_INFO0_HDESC_EPIB_SIZE : 0) +
+			    uc->psd_size;
+
+	if (uc->pkt_mode)
+		uc->hdesc_size = ALIGN(sizeof(struct cppi5_host_desc_t) +
+				 uc->metadata_size, ud->desc_align);
+
+	of_node_put(chconf_node);
+
+	dev_dbg(ud->dev, "chan%d: Remote thread: 0x%04x (%s)\n", uc->id,
+		uc->remote_thread_id, udma_get_dir_text(uc->dir));
+
+	return true;
+}
+
+static struct dma_chan *udma_of_xlate(struct of_phandle_args *dma_spec,
+				      struct of_dma *ofdma)
+{
+	struct udma_dev *ud = ofdma->of_dma_data;
+	dma_cap_mask_t mask = ud->ddev.cap_mask;
+	struct dma_chan *chan;
+
+	if (dma_spec->args_count != 3)
+		return NULL;
+
+	chan = __dma_request_channel(&mask, udma_dma_filter_fn,
+				     &dma_spec->args[0], ofdma->of_node);
+	if (!chan) {
+		dev_err(ud->dev, "get channel fail in %s.\n", __func__);
+		return ERR_PTR(-EINVAL);
+	}
+
+	return chan;
+}
+
+static struct udma_tr_thread_ranges am654_tr_threads[] = {
+	{
+		/* PDMA0 - McASPs */
+		.start = 0x4400,
+		.count = 3,
+	},
+	{
+		/* MCU_PDMA0 - ADCs */
+		.start = 0x7100,
+		.count = 4,
+	},
+	{ /* Sentinel */ },
+};
+
+static struct udma_tr_thread_ranges j721e_tr_threads[] = {
+	{
+		/* PDMA_MCASP_G0 - McASPs */
+		.start = 0x4400,
+		.count = 3,
+	},
+	{
+		/* PDMA_MCASP_G1 - McASPs */
+		.start = 0x4500,
+		.count = 9,
+	},
+	{
+		/* MCU_PDMA_ADC - ADCs */
+		.start = 0x7400,
+		.count = 4,
+	},
+	{ /* Sentinel */ },
+};
+
+static struct udma_match_data am654_main_data = {
+	.enable_memcpy_support = true,
+	.have_acc32 = false,
+	.have_burst = false,
+	.statictr_z_mask = GENMASK(11, 0),
+	.rchan_oes_offset = 0x2000,
+	.tr_threads = am654_tr_threads,
+	.tpl_levels = 2,
+	.level_start_idx = {
+		[0] = 8, /* Normal channels */
+		[1] = 0, /* High Throughput channels */
+	},
+};
+
+static struct udma_match_data am654_mcu_data = {
+	.enable_memcpy_support = false, /* MEM_TO_MEM is slow via MCU UDMA */
+	.have_acc32 = false,
+	.have_burst = false,
+	.statictr_z_mask = GENMASK(11, 0),
+	.rchan_oes_offset = 0x2000,
+	.tr_threads = am654_tr_threads,
+	.tpl_levels = 2,
+	.level_start_idx = {
+		[0] = 2, /* Normal channels */
+		[1] = 0, /* High Throughput channels */
+	},
+};
+
+static struct udma_match_data j721e_main_data = {
+	.enable_memcpy_support = true,
+	.have_acc32 = true,
+	.have_burst = true,
+	.statictr_z_mask = GENMASK(23, 0),
+	.rchan_oes_offset = 0x400,
+	.tr_threads = j721e_tr_threads,
+	.tpl_levels = 3,
+	.level_start_idx = {
+		[0] = 16, /* Normal channels */
+		[1] = 4, /* High Throughput channels */
+		[2] = 0, /* Ultra High Throughput channels */
+	},
+};
+
+static struct udma_match_data j721e_mcu_data = {
+	.enable_memcpy_support = false, /* MEM_TO_MEM is slow via MCU UDMA */
+	.have_acc32 = true,
+	.have_burst = true,
+	.statictr_z_mask = GENMASK(23, 0),
+	.rchan_oes_offset = 0x400,
+	.tr_threads = j721e_tr_threads,
+	.tpl_levels = 2,
+	.level_start_idx = {
+		[0] = 2, /* Normal channels */
+		[1] = 0, /* High Throughput channels */
+	},
+};
+
+static const struct of_device_id udma_of_match[] = {
+	{
+		.compatible = "ti,am654-navss-main-udmap",
+		.data = &am654_main_data,
+	},
+	{
+		.compatible = "ti,am654-navss-mcu-udmap",
+		.data = &am654_mcu_data,
+	}, {
+		.compatible = "ti,j721e-navss-main-udmap",
+		.data = &j721e_main_data,
+	}, {
+		.compatible = "ti,j721e-navss-mcu-udmap",
+		.data = &j721e_mcu_data,
+	},
+	{ /* Sentinel */ },
+};
+MODULE_DEVICE_TABLE(of, udma_of_match);
+
+static int udma_get_mmrs(struct platform_device *pdev, struct udma_dev *ud)
+{
+	struct resource *res;
+	int i;
+
+	for (i = 0; i < MMR_LAST; i++) {
+		res = platform_get_resource_byname(pdev, IORESOURCE_MEM,
+						   mmr_names[i]);
+		ud->mmrs[i] = devm_ioremap_resource(&pdev->dev, res);
+		if (IS_ERR(ud->mmrs[i]))
+			return PTR_ERR(ud->mmrs[i]);
+	}
+
+	return 0;
+}
+
+static int udma_setup_resources(struct udma_dev *ud)
+{
+	struct device *dev = ud->dev;
+	int ch_count, ret, i, j;
+	u32 cap2, cap3;
+	struct ti_sci_resource_desc *rm_desc;
+	struct ti_sci_resource *rm_res, irq_res;
+	struct udma_tisci_rm *tisci_rm = &ud->tisci_rm;
+	static const char * const range_names[] = { "ti,sci-rm-range-tchan",
+						    "ti,sci-rm-range-rchan",
+						    "ti,sci-rm-range-rflow" };
+
+	cap2 = udma_read(ud->mmrs[MMR_GCFG], 0x28);
+	cap3 = udma_read(ud->mmrs[MMR_GCFG], 0x2c);
+
+	ud->rflow_cnt = cap3 & 0x3fff;
+	ud->tchan_cnt = cap2 & 0x1ff;
+	ud->echan_cnt = (cap2 >> 9) & 0x1ff;
+	ud->rchan_cnt = (cap2 >> 18) & 0x1ff;
+	ch_count  = ud->tchan_cnt + ud->rchan_cnt;
+
+	ud->tchan_map = devm_kmalloc_array(dev, BITS_TO_LONGS(ud->tchan_cnt),
+					   sizeof(unsigned long), GFP_KERNEL);
+	ud->tchans = devm_kcalloc(dev, ud->tchan_cnt, sizeof(*ud->tchans),
+				  GFP_KERNEL);
+	ud->rchan_map = devm_kmalloc_array(dev, BITS_TO_LONGS(ud->rchan_cnt),
+					   sizeof(unsigned long), GFP_KERNEL);
+	ud->rchans = devm_kcalloc(dev, ud->rchan_cnt, sizeof(*ud->rchans),
+				  GFP_KERNEL);
+	ud->rflow_map = devm_kmalloc_array(dev, BITS_TO_LONGS(ud->rflow_cnt),
+					   sizeof(unsigned long), GFP_KERNEL);
+	ud->rflow_map_reserved = devm_kcalloc(dev, BITS_TO_LONGS(ud->rflow_cnt),
+					      sizeof(unsigned long),
+					      GFP_KERNEL);
+	ud->rflows = devm_kcalloc(dev, ud->rflow_cnt, sizeof(*ud->rflows),
+				  GFP_KERNEL);
+
+	if (!ud->tchan_map || !ud->rchan_map || !ud->rflow_map ||
+	    !ud->rflow_map_reserved || !ud->tchans || !ud->rchans ||
+	    !ud->rflows)
+		return -ENOMEM;
+
+	/*
+	 * RX flows with the same Ids as RX channels are reserved to be used
+	 * as default flows if remote HW can't generate flow_ids. Those
+	 * RX flows can be requested only explicitly by id.
+	 */
+	bitmap_set(ud->rflow_map_reserved, 0, ud->rchan_cnt);
+
+	/* Get resource ranges from tisci */
+	for (i = 0; i < RM_RANGE_LAST; i++)
+		tisci_rm->rm_ranges[i] =
+			devm_ti_sci_get_of_resource(tisci_rm->tisci, dev,
+						    tisci_rm->tisci_dev_id,
+						    (char *)range_names[i]);
+
+	/* tchan ranges */
+	rm_res = tisci_rm->rm_ranges[RM_RANGE_TCHAN];
+	if (IS_ERR(rm_res)) {
+		bitmap_zero(ud->tchan_map, ud->tchan_cnt);
+	} else {
+		bitmap_fill(ud->tchan_map, ud->tchan_cnt);
+		for (i = 0; i < rm_res->sets; i++) {
+			rm_desc = &rm_res->desc[i];
+			bitmap_clear(ud->tchan_map, rm_desc->start,
+				     rm_desc->num);
+		}
+	}
+	irq_res.sets = rm_res->sets;
+
+	/* rchan and matching default flow ranges */
+	rm_res = tisci_rm->rm_ranges[RM_RANGE_RCHAN];
+	if (IS_ERR(rm_res)) {
+		bitmap_zero(ud->rchan_map, ud->rchan_cnt);
+		bitmap_zero(ud->rflow_map, ud->rchan_cnt);
+	} else {
+		bitmap_fill(ud->rchan_map, ud->rchan_cnt);
+		bitmap_fill(ud->rflow_map, ud->rchan_cnt);
+		for (i = 0; i < rm_res->sets; i++) {
+			rm_desc = &rm_res->desc[i];
+			bitmap_clear(ud->rchan_map, rm_desc->start,
+				     rm_desc->num);
+			bitmap_clear(ud->rflow_map, rm_desc->start,
+				     rm_desc->num);
+		}
+	}
+
+	irq_res.sets += rm_res->sets;
+	irq_res.desc = kcalloc(irq_res.sets, sizeof(*irq_res.desc), GFP_KERNEL);
+	rm_res = tisci_rm->rm_ranges[RM_RANGE_TCHAN];
+	for (i = 0; i < rm_res->sets; i++) {
+		irq_res.desc[i].start = rm_res->desc[i].start;
+		irq_res.desc[i].num = rm_res->desc[i].num;
+	}
+	rm_res = tisci_rm->rm_ranges[RM_RANGE_RCHAN];
+	for (j = 0; j < rm_res->sets; j++, i++) {
+		irq_res.desc[i].start = rm_res->desc[j].start + 0x2000;
+		irq_res.desc[i].num = rm_res->desc[j].num;
+	}
+	ret = ti_sci_inta_msi_domain_alloc_irqs(ud->dev, &irq_res);
+	kfree(irq_res.desc);
+	if (ret) {
+		dev_err(ud->dev, "Failed to allocate MSI interrupts\n");
+		return ret;
+	}
+
+	/* GP rflow ranges */
+	rm_res = tisci_rm->rm_ranges[RM_RANGE_RFLOW];
+	if (IS_ERR(rm_res)) {
+		bitmap_clear(ud->rflow_map, ud->rchan_cnt,
+			     ud->rflow_cnt - ud->rchan_cnt);
+	} else {
+		bitmap_set(ud->rflow_map, ud->rchan_cnt,
+			   ud->rflow_cnt - ud->rchan_cnt);
+		for (i = 0; i < rm_res->sets; i++) {
+			rm_desc = &rm_res->desc[i];
+			bitmap_clear(ud->rflow_map, rm_desc->start,
+				     rm_desc->num);
+		}
+	}
+
+	ch_count -= bitmap_weight(ud->tchan_map, ud->tchan_cnt);
+	ch_count -= bitmap_weight(ud->rchan_map, ud->rchan_cnt);
+	if (!ch_count)
+		return -ENODEV;
+
+	ud->channels = devm_kcalloc(dev, ch_count, sizeof(*ud->channels),
+				    GFP_KERNEL);
+	if (!ud->channels)
+		return -ENOMEM;
+
+	dev_info(dev, "Channels: %d (tchan: %u, rchan: %u, rflow: %u)\n",
+		 ch_count,
+		 ud->tchan_cnt - bitmap_weight(ud->tchan_map, ud->tchan_cnt),
+		 ud->rchan_cnt - bitmap_weight(ud->rchan_map, ud->rchan_cnt),
+		 ud->rflow_cnt - bitmap_weight(ud->rflow_map, ud->rflow_cnt));
+
+	return ch_count;
+}
+
+#define TI_UDMAC_BUSWIDTHS	(BIT(DMA_SLAVE_BUSWIDTH_1_BYTE) | \
+				 BIT(DMA_SLAVE_BUSWIDTH_2_BYTES) | \
+				 BIT(DMA_SLAVE_BUSWIDTH_3_BYTES) | \
+				 BIT(DMA_SLAVE_BUSWIDTH_4_BYTES) | \
+				 BIT(DMA_SLAVE_BUSWIDTH_8_BYTES))
+
+static int udma_probe(struct platform_device *pdev)
+{
+	struct device_node *navss_node = pdev->dev.parent->of_node;
+	struct device *dev = &pdev->dev;
+	struct udma_dev *ud;
+	const struct of_device_id *match;
+	int i, ret;
+	int ch_count;
+
+	ret = dma_coerce_mask_and_coherent(dev, DMA_BIT_MASK(48));
+	if (ret)
+		dev_err(dev, "failed to set dma mask stuff\n");
+
+	ud = devm_kzalloc(dev, sizeof(*ud), GFP_KERNEL);
+	if (!ud)
+		return -ENOMEM;
+
+	ret = udma_get_mmrs(pdev, ud);
+	if (ret)
+		return ret;
+
+	ud->tisci_rm.tisci = ti_sci_get_by_phandle(dev->of_node, "ti,sci");
+	if (IS_ERR(ud->tisci_rm.tisci))
+		return PTR_ERR(ud->tisci_rm.tisci);
+
+	ret = of_property_read_u32(dev->of_node, "ti,sci-dev-id",
+				   &ud->tisci_rm.tisci_dev_id);
+	if (ret) {
+		dev_err(dev, "ti,sci-dev-id read failure %d\n", ret);
+		return ret;
+	}
+	pdev->id = ud->tisci_rm.tisci_dev_id;
+
+	ret = of_property_read_u32(navss_node, "ti,sci-dev-id",
+				   &ud->tisci_rm.tisci_navss_dev_id);
+	if (ret) {
+		dev_err(dev, "NAVSS ti,sci-dev-id read failure %d\n", ret);
+		return ret;
+	}
+
+	ud->tisci_rm.tisci_udmap_ops = &ud->tisci_rm.tisci->ops.rm_udmap_ops;
+	ud->tisci_rm.tisci_psil_ops = &ud->tisci_rm.tisci->ops.rm_psil_ops;
+
+	ud->ringacc = of_k3_ringacc_get_by_phandle(dev->of_node, "ti,ringacc");
+	if (IS_ERR(ud->ringacc))
+		return PTR_ERR(ud->ringacc);
+
+	dev->msi_domain = of_msi_get_domain(dev, dev->of_node,
+					    DOMAIN_BUS_TI_SCI_INTA_MSI);
+	if (!dev->msi_domain) {
+		dev_err(dev, "Failed to get MSI domain\n");
+		return -EPROBE_DEFER;
+	}
+
+	match = of_match_node(udma_of_match, dev->of_node);
+	if (!match) {
+		dev_err(dev, "No compatible match found\n");
+		return -ENODEV;
+	}
+	ud->match_data = match->data;
+
+	dma_cap_set(DMA_SLAVE, ud->ddev.cap_mask);
+	dma_cap_set(DMA_CYCLIC, ud->ddev.cap_mask);
+
+	ud->ddev.device_alloc_chan_resources = udma_alloc_chan_resources;
+	ud->ddev.device_config = udma_slave_config;
+	ud->ddev.device_prep_slave_sg = udma_prep_slave_sg;
+	ud->ddev.device_prep_dma_cyclic = udma_prep_dma_cyclic;
+	ud->ddev.device_issue_pending = udma_issue_pending;
+	ud->ddev.device_tx_status = udma_tx_status;
+	ud->ddev.device_pause = udma_pause;
+	ud->ddev.device_resume = udma_resume;
+	ud->ddev.device_terminate_all = udma_terminate_all;
+	ud->ddev.device_synchronize = udma_synchronize;
+
+	ud->ddev.device_free_chan_resources = udma_free_chan_resources;
+	ud->ddev.src_addr_widths = TI_UDMAC_BUSWIDTHS;
+	ud->ddev.dst_addr_widths = TI_UDMAC_BUSWIDTHS;
+	ud->ddev.directions = BIT(DMA_DEV_TO_MEM) | BIT(DMA_MEM_TO_DEV);
+	ud->ddev.residue_granularity = DMA_RESIDUE_GRANULARITY_BURST;
+	ud->ddev.copy_align = DMAENGINE_ALIGN_8_BYTES;
+	ud->ddev.desc_metadata_modes = DESC_METADATA_CLIENT |
+				       DESC_METADATA_ENGINE;
+	if (ud->match_data->enable_memcpy_support) {
+		dma_cap_set(DMA_MEMCPY, ud->ddev.cap_mask);
+		ud->ddev.device_prep_dma_memcpy = udma_prep_dma_memcpy;
+		ud->ddev.directions |= BIT(DMA_MEM_TO_MEM);
+	}
+
+	ud->ddev.dev = dev;
+	ud->dev = dev;
+
+	INIT_LIST_HEAD(&ud->ddev.channels);
+	INIT_LIST_HEAD(&ud->desc_to_purge);
+
+	ret = of_property_read_u32(dev->of_node, "ti,psil-base",
+				   &ud->psil_base);
+	if (ret) {
+		dev_info(dev, "Missing ti,psil-base property, using %d.\n",
+			 ret);
+		return ret;
+	}
+
+	ch_count = udma_setup_resources(ud);
+	if (ch_count <= 0)
+		return ch_count;
+
+	spin_lock_init(&ud->lock);
+	INIT_WORK(&ud->purge_work, udma_purge_desc_work);
+
+	ud->desc_align = 64;
+	if (ud->desc_align < dma_get_cache_alignment())
+		ud->desc_align = dma_get_cache_alignment();
+
+	for (i = 0; i < ud->tchan_cnt; i++) {
+		struct udma_tchan *tchan = &ud->tchans[i];
+
+		tchan->id = i;
+		tchan->reg_rt = ud->mmrs[MMR_TCHANRT] + i * 0x1000;
+	}
+
+	for (i = 0; i < ud->rchan_cnt; i++) {
+		struct udma_rchan *rchan = &ud->rchans[i];
+
+		rchan->id = i;
+		rchan->reg_rt = ud->mmrs[MMR_RCHANRT] + i * 0x1000;
+	}
+
+	for (i = 0; i < ud->rflow_cnt; i++) {
+		struct udma_rflow *rflow = &ud->rflows[i];
+
+		rflow->id = i;
+	}
+
+	for (i = 0; i < ch_count; i++) {
+		struct udma_chan *uc = &ud->channels[i];
+
+		uc->ud = ud;
+		uc->vc.desc_free = udma_desc_free;
+		uc->id = i;
+		uc->remote_thread_id = -1;
+		uc->tchan = NULL;
+		uc->rchan = NULL;
+		uc->dir = DMA_MEM_TO_MEM;
+		uc->name = devm_kasprintf(dev, GFP_KERNEL, "%s chan%d",
+					  dev_name(dev), i);
+
+		vchan_init(&uc->vc, &ud->ddev);
+		/* Use custom vchan completion handling */
+		tasklet_init(&uc->vc.task, udma_vchan_complete,
+			     (unsigned long)&uc->vc);
+		init_completion(&uc->teardown_completed);
+	}
+
+	ret = dma_async_device_register(&ud->ddev);
+	if (ret) {
+		dev_err(dev, "failed to register slave DMA engine: %d\n", ret);
+		return ret;
+	}
+
+	platform_set_drvdata(pdev, ud);
+
+	ret = of_dma_controller_register(dev->of_node, udma_of_xlate, ud);
+	if (ret) {
+		dev_err(dev, "failed to register of_dma controller\n");
+		dma_async_device_unregister(&ud->ddev);
+	}
+
+	return ret;
+}
+
+static int udma_remove(struct platform_device *pdev)
+{
+	struct udma_dev *ud = platform_get_drvdata(pdev);
+
+	of_dma_controller_free(pdev->dev.of_node);
+	dma_async_device_unregister(&ud->ddev);
+
+	/* Make sure that we did proper cleanup */
+	cancel_work_sync(&ud->purge_work);
+	udma_purge_desc_work(&ud->purge_work);
+
+	pm_runtime_put_sync(&pdev->dev);
+	pm_runtime_disable(&pdev->dev);
+
+	return 0;
+}
+
+static struct platform_driver udma_driver = {
+	.driver = {
+		.name	= "ti-udma",
+		.of_match_table = udma_of_match,
+	},
+	.probe		= udma_probe,
+	.remove		= udma_remove,
+};
+
+module_platform_driver(udma_driver);
+
+MODULE_ALIAS("platform:ti-udma");
+MODULE_DESCRIPTION("TI K3 DMA driver for CPPI 5.0 compliant devices");
+MODULE_AUTHOR("Peter Ujfalusi <peter.ujfalusi@ti.com>");
+MODULE_LICENSE("GPL v2");
-- 
Peter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2 10/14] dmaengine: ti: New driver for K3 UDMA - split#3: alloc/free chan_resources
  2019-07-30  9:34 [PATCH v2 00/14] dmaengine/soc: Add Texas Instruments UDMA support Peter Ujfalusi
                   ` (8 preceding siblings ...)
  2019-07-30  9:34 ` [PATCH v2 09/14] dmaengine: ti: New driver for K3 UDMA - split#2: probe/remove, xlate and filter_fn Peter Ujfalusi
@ 2019-07-30  9:34 ` Peter Ujfalusi
  2019-09-10  7:25   ` Grygorii Strashko
  2019-07-30  9:34 ` [PATCH v2 11/14] dmaengine: ti: New driver for K3 UDMA - split#4: dma_device callbacks 1 Peter Ujfalusi
                   ` (5 subsequent siblings)
  15 siblings, 1 reply; 32+ messages in thread
From: Peter Ujfalusi @ 2019-07-30  9:34 UTC (permalink / raw)
  To: vkoul, robh+dt, nm, ssantosh
  Cc: dan.j.williams, dmaengine, linux-arm-kernel, devicetree,
	linux-kernel, grygorii.strashko, lokeshvutla, t-kristo, tony,
	j-keerthy

Split patch for review containing: channel rsource allocation and free
functions.

DMA driver for
Texas Instruments K3 NAVSS Unified DMA – Peripheral Root Complex (UDMA-P)

The UDMA-P is intended to perform similar (but significantly upgraded) functions
as the packet-oriented DMA used on previous SoC devices. The UDMA-P module
supports the transmission and reception of various packet types. The UDMA-P is
architected to facilitate the segmentation and reassembly of SoC DMA data
structure compliant packets to/from smaller data blocks that are natively
compatible with the specific requirements of each connected peripheral. Multiple
Tx and Rx channels are provided within the DMA which allow multiple segmentation
or reassembly operations to be ongoing. The DMA controller maintains state
information for each of the channels which allows packet segmentation and
reassembly operations to be time division multiplexed between channels in order
to share the underlying DMA hardware. An external DMA scheduler is used to
control the ordering and rate at which this multiplexing occurs for Transmit
operations. The ordering and rate of Receive operations is indirectly controlled
by the order in which blocks are pushed into the DMA on the Rx PSI-L interface.

The UDMA-P also supports acting as both a UTC and UDMA-C for its internal
channels. Channels in the UDMA-P can be configured to be either Packet-Based or
Third-Party channels on a channel by channel basis.

The initial driver supports:
- MEM_TO_MEM (TR mode)
- DEV_TO_MEM (Packet / TR mode)
- MEM_TO_DEV (Packet / TR mode)
- Cyclic (Packet / TR mode)
- Metadata for descriptors

Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
---
 drivers/dma/ti/k3-udma.c | 780 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 780 insertions(+)

diff --git a/drivers/dma/ti/k3-udma.c b/drivers/dma/ti/k3-udma.c
index 52ccc6d46de9..0de38db03b8d 100644
--- a/drivers/dma/ti/k3-udma.c
+++ b/drivers/dma/ti/k3-udma.c
@@ -1039,6 +1039,786 @@ static irqreturn_t udma_udma_irq_handler(int irq, void *data)
 	return IRQ_HANDLED;
 }
 
+static struct udma_rflow *__udma_reserve_rflow(struct udma_dev *ud,
+					       enum udma_tp_level tpl, int id)
+{
+	DECLARE_BITMAP(tmp, K3_UDMA_MAX_RFLOWS);
+
+	if (id >= 0) {
+		if (test_bit(id, ud->rflow_map)) {
+			dev_err(ud->dev, "rflow%d is in use\n", id);
+			return ERR_PTR(-ENOENT);
+		}
+	} else {
+		bitmap_or(tmp, ud->rflow_map, ud->rflow_map_reserved,
+			  ud->rflow_cnt);
+
+		id = find_next_zero_bit(tmp, ud->rflow_cnt, ud->rchan_cnt);
+		if (id >= ud->rflow_cnt)
+			return ERR_PTR(-ENOENT);
+	}
+
+	set_bit(id, ud->rflow_map);
+	return &ud->rflows[id];
+}
+
+#define UDMA_RESERVE_RESOURCE(res)					\
+static struct udma_##res *__udma_reserve_##res(struct udma_dev *ud,	\
+					       enum udma_tp_level tpl,	\
+					       int id)			\
+{									\
+	if (id >= 0) {							\
+		if (test_bit(id, ud->res##_map)) {			\
+			dev_err(ud->dev, "res##%d is in use\n", id);	\
+			return ERR_PTR(-ENOENT);			\
+		}							\
+	} else {							\
+		int start;						\
+									\
+		if (tpl >= ud->match_data->tpl_levels)			\
+			tpl = ud->match_data->tpl_levels - 1;		\
+									\
+		start = ud->match_data->level_start_idx[tpl];		\
+									\
+		id = find_next_zero_bit(ud->res##_map, ud->res##_cnt,	\
+					start);				\
+		if (id == ud->res##_cnt) {				\
+			return ERR_PTR(-ENOENT);			\
+		}							\
+	}								\
+									\
+	set_bit(id, ud->res##_map);					\
+	return &ud->res##s[id];						\
+}
+
+UDMA_RESERVE_RESOURCE(tchan);
+UDMA_RESERVE_RESOURCE(rchan);
+
+static int udma_get_tchan(struct udma_chan *uc)
+{
+	struct udma_dev *ud = uc->ud;
+
+	if (uc->tchan) {
+		dev_dbg(ud->dev, "chan%d: already have tchan%d allocated\n",
+			uc->id, uc->tchan->id);
+		return 0;
+	}
+
+	uc->tchan = __udma_reserve_tchan(ud, uc->channel_tpl, -1);
+	if (IS_ERR(uc->tchan))
+		return PTR_ERR(uc->tchan);
+
+	return 0;
+}
+
+static int udma_get_rchan(struct udma_chan *uc)
+{
+	struct udma_dev *ud = uc->ud;
+
+	if (uc->rchan) {
+		dev_dbg(ud->dev, "chan%d: already have rchan%d allocated\n",
+			uc->id, uc->rchan->id);
+		return 0;
+	}
+
+	uc->rchan = __udma_reserve_rchan(ud, uc->channel_tpl, -1);
+	if (IS_ERR(uc->rchan))
+		return PTR_ERR(uc->rchan);
+
+	return 0;
+}
+
+static int udma_get_chan_pair(struct udma_chan *uc)
+{
+	struct udma_dev *ud = uc->ud;
+	const struct udma_match_data *match_data = ud->match_data;
+	int chan_id, end;
+
+	if ((uc->tchan && uc->rchan) && uc->tchan->id == uc->rchan->id) {
+		dev_info(ud->dev, "chan%d: already have %d pair allocated\n",
+			 uc->id, uc->tchan->id);
+		return 0;
+	}
+
+	if (uc->tchan) {
+		dev_err(ud->dev, "chan%d: already have tchan%d allocated\n",
+			uc->id, uc->tchan->id);
+		return -EBUSY;
+	} else if (uc->rchan) {
+		dev_err(ud->dev, "chan%d: already have rchan%d allocated\n",
+			uc->id, uc->rchan->id);
+		return -EBUSY;
+	}
+
+	/* Can be optimized, but let's have it like this for now */
+	end = min(ud->tchan_cnt, ud->rchan_cnt);
+	/* Try to use the highest TPL channel pair for MEM_TO_MEM channels */
+	chan_id = match_data->level_start_idx[match_data->tpl_levels - 1];
+	for (; chan_id < end; chan_id++) {
+		if (!test_bit(chan_id, ud->tchan_map) &&
+		    !test_bit(chan_id, ud->rchan_map))
+			break;
+	}
+
+	if (chan_id == end)
+		return -ENOENT;
+
+	set_bit(chan_id, ud->tchan_map);
+	set_bit(chan_id, ud->rchan_map);
+	uc->tchan = &ud->tchans[chan_id];
+	uc->rchan = &ud->rchans[chan_id];
+
+	return 0;
+}
+
+static int udma_get_rflow(struct udma_chan *uc, int flow_id)
+{
+	struct udma_dev *ud = uc->ud;
+
+	if (uc->rflow) {
+		dev_dbg(ud->dev, "chan%d: already have rflow%d allocated\n",
+			uc->id, uc->rflow->id);
+		return 0;
+	}
+
+	if (!uc->rchan)
+		dev_warn(ud->dev, "chan%d: does not have rchan??\n", uc->id);
+
+	uc->rflow = __udma_reserve_rflow(ud, uc->channel_tpl, flow_id);
+	if (IS_ERR(uc->rflow))
+		return PTR_ERR(uc->rflow);
+
+	return 0;
+}
+
+static void udma_put_rchan(struct udma_chan *uc)
+{
+	struct udma_dev *ud = uc->ud;
+
+	if (uc->rchan) {
+		dev_dbg(ud->dev, "chan%d: put rchan%d\n", uc->id,
+			uc->rchan->id);
+		clear_bit(uc->rchan->id, ud->rchan_map);
+		uc->rchan = NULL;
+	}
+}
+
+static void udma_put_tchan(struct udma_chan *uc)
+{
+	struct udma_dev *ud = uc->ud;
+
+	if (uc->tchan) {
+		dev_dbg(ud->dev, "chan%d: put tchan%d\n", uc->id,
+			uc->tchan->id);
+		clear_bit(uc->tchan->id, ud->tchan_map);
+		uc->tchan = NULL;
+	}
+}
+
+static void udma_put_rflow(struct udma_chan *uc)
+{
+	struct udma_dev *ud = uc->ud;
+
+	if (uc->rflow) {
+		dev_dbg(ud->dev, "chan%d: put rflow%d\n", uc->id,
+			uc->rflow->id);
+		clear_bit(uc->rflow->id, ud->rflow_map);
+		uc->rflow = NULL;
+	}
+}
+
+static void udma_free_tx_resources(struct udma_chan *uc)
+{
+	if (!uc->tchan)
+		return;
+
+	k3_ringacc_ring_free(uc->tchan->t_ring);
+	k3_ringacc_ring_free(uc->tchan->tc_ring);
+	uc->tchan->t_ring = NULL;
+	uc->tchan->tc_ring = NULL;
+
+	udma_put_tchan(uc);
+}
+
+static int udma_alloc_tx_resources(struct udma_chan *uc)
+{
+	struct k3_ring_cfg ring_cfg;
+	struct udma_dev *ud = uc->ud;
+	int ret;
+
+	ret = udma_get_tchan(uc);
+	if (ret)
+		return ret;
+
+	uc->tchan->t_ring = k3_ringacc_request_ring(ud->ringacc,
+						    uc->tchan->id, 0);
+	if (!uc->tchan->t_ring) {
+		ret = -EBUSY;
+		goto err_tx_ring;
+	}
+
+	uc->tchan->tc_ring = k3_ringacc_request_ring(ud->ringacc, -1, 0);
+	if (!uc->tchan->tc_ring) {
+		ret = -EBUSY;
+		goto err_txc_ring;
+	}
+
+	memset(&ring_cfg, 0, sizeof(ring_cfg));
+	ring_cfg.size = K3_UDMA_DEFAULT_RING_SIZE;
+	ring_cfg.elm_size = K3_RINGACC_RING_ELSIZE_8;
+	ring_cfg.mode = K3_RINGACC_RING_MODE_MESSAGE;
+
+	ret = k3_ringacc_ring_cfg(uc->tchan->t_ring, &ring_cfg);
+	ret |= k3_ringacc_ring_cfg(uc->tchan->tc_ring, &ring_cfg);
+
+	if (ret)
+		goto err_ringcfg;
+
+	return 0;
+
+err_ringcfg:
+	k3_ringacc_ring_free(uc->tchan->tc_ring);
+	uc->tchan->tc_ring = NULL;
+err_txc_ring:
+	k3_ringacc_ring_free(uc->tchan->t_ring);
+	uc->tchan->t_ring = NULL;
+err_tx_ring:
+	udma_put_tchan(uc);
+
+	return ret;
+}
+
+static void udma_free_rx_resources(struct udma_chan *uc)
+{
+	if (!uc->rchan)
+		return;
+
+	if (uc->dir != DMA_MEM_TO_MEM) {
+		k3_ringacc_ring_free(uc->rchan->fd_ring);
+		k3_ringacc_ring_free(uc->rchan->r_ring);
+		uc->rchan->fd_ring = NULL;
+		uc->rchan->r_ring = NULL;
+
+		udma_put_rflow(uc);
+	}
+
+	udma_put_rchan(uc);
+}
+
+static int udma_alloc_rx_resources(struct udma_chan *uc)
+{
+	struct k3_ring_cfg ring_cfg;
+	struct udma_dev *ud = uc->ud;
+	int fd_ring_id;
+	int ret;
+
+	ret = udma_get_rchan(uc);
+	if (ret)
+		return ret;
+
+	/* For MEM_TO_MEM we don't need rflow or rings */
+	if (uc->dir == DMA_MEM_TO_MEM)
+		return 0;
+
+	ret = udma_get_rflow(uc, uc->rchan->id);
+	if (ret) {
+		ret = -EBUSY;
+		goto err_rflow;
+	}
+
+	fd_ring_id = ud->tchan_cnt + ud->echan_cnt + uc->rchan->id;
+	uc->rchan->fd_ring = k3_ringacc_request_ring(ud->ringacc,
+						     fd_ring_id, 0);
+	if (!uc->rchan->fd_ring) {
+		ret = -EBUSY;
+		goto err_rx_ring;
+	}
+
+	uc->rchan->r_ring = k3_ringacc_request_ring(ud->ringacc, -1, 0);
+	if (!uc->rchan->r_ring) {
+		ret = -EBUSY;
+		goto err_rxc_ring;
+	}
+
+	memset(&ring_cfg, 0, sizeof(ring_cfg));
+
+	if (uc->pkt_mode)
+		ring_cfg.size = SG_MAX_SEGMENTS;
+	else
+		ring_cfg.size = K3_UDMA_DEFAULT_RING_SIZE;
+
+	ring_cfg.elm_size = K3_RINGACC_RING_ELSIZE_8;
+	ring_cfg.mode = K3_RINGACC_RING_MODE_MESSAGE;
+
+	ret = k3_ringacc_ring_cfg(uc->rchan->fd_ring, &ring_cfg);
+	ring_cfg.size = K3_UDMA_DEFAULT_RING_SIZE;
+	ret |= k3_ringacc_ring_cfg(uc->rchan->r_ring, &ring_cfg);
+
+	if (ret)
+		goto err_ringcfg;
+
+	return 0;
+
+err_ringcfg:
+	k3_ringacc_ring_free(uc->rchan->r_ring);
+	uc->rchan->r_ring = NULL;
+err_rxc_ring:
+	k3_ringacc_ring_free(uc->rchan->fd_ring);
+	uc->rchan->fd_ring = NULL;
+err_rx_ring:
+	udma_put_rflow(uc);
+err_rflow:
+	udma_put_rchan(uc);
+
+	return ret;
+}
+
+static int udma_tisci_channel_config(struct udma_chan *uc)
+{
+	struct udma_dev *ud = uc->ud;
+	struct udma_tisci_rm *tisci_rm = &ud->tisci_rm;
+	const struct ti_sci_rm_udmap_ops *tisci_ops = tisci_rm->tisci_udmap_ops;
+	struct udma_tchan *tchan = uc->tchan;
+	struct udma_rchan *rchan = uc->rchan;
+	int ret = 0;
+
+	if (uc->dir == DMA_MEM_TO_MEM) {
+		/* Non synchronized - mem to mem type of transfer */
+		int tc_ring = k3_ringacc_get_ring_id(tchan->tc_ring);
+		struct ti_sci_msg_rm_udmap_tx_ch_cfg req_tx = { 0 };
+		struct ti_sci_msg_rm_udmap_rx_ch_cfg req_rx = { 0 };
+
+		req_tx.valid_params =
+			TI_SCI_MSG_VALUE_RM_UDMAP_CH_PAUSE_ON_ERR_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_CH_TX_FILT_EINFO_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_CH_TX_FILT_PSWORDS_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_CH_CHAN_TYPE_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_CH_TX_SUPR_TDPKT_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_CH_FETCH_SIZE_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_CH_CQ_QNUM_VALID;
+
+		req_tx.nav_id = tisci_rm->tisci_dev_id;
+		req_tx.index = tchan->id;
+		req_tx.tx_pause_on_err = 0;
+		req_tx.tx_filt_einfo = 0;
+		req_tx.tx_filt_pswords = 0;
+		req_tx.tx_chan_type = TI_SCI_RM_UDMAP_CHAN_TYPE_3RDP_BCOPY_PBRR;
+		req_tx.tx_supr_tdpkt = 0;
+		req_tx.tx_fetch_size = sizeof(struct cppi5_desc_hdr_t) >> 2;
+		req_tx.txcq_qnum = tc_ring;
+
+		ret = tisci_ops->tx_ch_cfg(tisci_rm->tisci, &req_tx);
+		if (ret) {
+			dev_err(ud->dev, "tchan%d cfg failed %d\n",
+				tchan->id, ret);
+			return ret;
+		}
+
+		req_rx.valid_params =
+			TI_SCI_MSG_VALUE_RM_UDMAP_CH_PAUSE_ON_ERR_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_CH_FETCH_SIZE_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_CH_CQ_QNUM_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_CH_CHAN_TYPE_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_CH_RX_IGNORE_SHORT_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_CH_RX_IGNORE_LONG_VALID;
+
+		req_rx.nav_id = tisci_rm->tisci_dev_id;
+		req_rx.index = rchan->id;
+		req_rx.rx_fetch_size = sizeof(struct cppi5_desc_hdr_t) >> 2;
+		req_rx.rxcq_qnum = tc_ring;
+		req_rx.rx_pause_on_err = 0;
+		req_rx.rx_chan_type = TI_SCI_RM_UDMAP_CHAN_TYPE_3RDP_BCOPY_PBRR;
+		req_rx.rx_ignore_short = 0;
+		req_rx.rx_ignore_long = 0;
+
+		ret = tisci_ops->rx_ch_cfg(tisci_rm->tisci, &req_rx);
+		if (ret) {
+			dev_err(ud->dev, "rchan%d alloc failed %d\n",
+				rchan->id, ret);
+			return ret;
+		}
+	} else {
+		/* Slave transfer */
+		u32 mode, fetch_size;
+
+		if (uc->pkt_mode) {
+			mode = TI_SCI_RM_UDMAP_CHAN_TYPE_PKT_PBRR;
+			fetch_size = cppi5_hdesc_calc_size(uc->needs_epib,
+							   uc->psd_size, 0);
+		} else {
+			mode = TI_SCI_RM_UDMAP_CHAN_TYPE_3RDP_PBRR;
+			fetch_size = sizeof(struct cppi5_desc_hdr_t);
+		}
+
+		if (uc->dir == DMA_MEM_TO_DEV) {
+			/* TX */
+			int tc_ring = k3_ringacc_get_ring_id(tchan->tc_ring);
+			struct ti_sci_msg_rm_udmap_tx_ch_cfg req_tx = { 0 };
+
+			req_tx.valid_params =
+			TI_SCI_MSG_VALUE_RM_UDMAP_CH_PAUSE_ON_ERR_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_CH_TX_FILT_EINFO_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_CH_TX_FILT_PSWORDS_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_CH_CHAN_TYPE_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_CH_TX_SUPR_TDPKT_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_CH_FETCH_SIZE_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_CH_CQ_QNUM_VALID;
+
+			req_tx.nav_id = tisci_rm->tisci_dev_id;
+			req_tx.index = tchan->id;
+			req_tx.tx_pause_on_err = 0;
+			req_tx.tx_filt_einfo = 0;
+			req_tx.tx_filt_pswords = 0;
+			req_tx.tx_chan_type = mode;
+			req_tx.tx_supr_tdpkt = 0;
+			req_tx.tx_fetch_size = fetch_size >> 2;
+			req_tx.txcq_qnum = tc_ring;
+
+			ret = tisci_ops->tx_ch_cfg(tisci_rm->tisci, &req_tx);
+			if (ret) {
+				dev_err(ud->dev, "tchan%d cfg failed %d\n",
+					tchan->id, ret);
+				return ret;
+			}
+		} else {
+			/* RX */
+			int fd_ring = k3_ringacc_get_ring_id(rchan->fd_ring);
+			int rx_ring = k3_ringacc_get_ring_id(rchan->r_ring);
+			struct ti_sci_msg_rm_udmap_rx_ch_cfg req_rx = { 0 };
+			struct ti_sci_msg_rm_udmap_flow_cfg flow_req = { 0 };
+
+			req_rx.valid_params =
+			TI_SCI_MSG_VALUE_RM_UDMAP_CH_PAUSE_ON_ERR_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_CH_FETCH_SIZE_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_CH_CQ_QNUM_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_CH_CHAN_TYPE_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_CH_RX_IGNORE_SHORT_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_CH_RX_IGNORE_LONG_VALID;
+
+			req_rx.nav_id = tisci_rm->tisci_dev_id;
+			req_rx.index = rchan->id;
+			req_rx.rx_fetch_size =  fetch_size >> 2;
+			req_rx.rxcq_qnum = rx_ring;
+			req_rx.rx_pause_on_err = 0;
+			req_rx.rx_chan_type = mode;
+			req_rx.rx_ignore_short = 0;
+			req_rx.rx_ignore_long = 0;
+
+			ret = tisci_ops->rx_ch_cfg(tisci_rm->tisci, &req_rx);
+			if (ret) {
+				dev_err(ud->dev, "rchan%d cfg failed %d\n",
+					rchan->id, ret);
+				return ret;
+			}
+
+			flow_req.valid_params =
+			TI_SCI_MSG_VALUE_RM_UDMAP_FLOW_EINFO_PRESENT_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_FLOW_PSINFO_PRESENT_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_FLOW_ERROR_HANDLING_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_FLOW_DESC_TYPE_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_FLOW_DEST_QNUM_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_FLOW_SRC_TAG_HI_SEL_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_FLOW_SRC_TAG_LO_SEL_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_FLOW_DEST_TAG_HI_SEL_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_FLOW_DEST_TAG_LO_SEL_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_FLOW_FDQ0_SZ0_QNUM_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_FLOW_FDQ1_QNUM_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_FLOW_FDQ2_QNUM_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_FLOW_FDQ3_QNUM_VALID;
+
+			flow_req.nav_id = tisci_rm->tisci_dev_id;
+			flow_req.flow_index = rchan->id;
+
+			if (uc->needs_epib)
+				flow_req.rx_einfo_present = 1;
+			else
+				flow_req.rx_einfo_present = 0;
+			if (uc->psd_size)
+				flow_req.rx_psinfo_present = 1;
+			else
+				flow_req.rx_psinfo_present = 0;
+			flow_req.rx_error_handling = 1;
+			flow_req.rx_desc_type = 0;
+			flow_req.rx_dest_qnum = rx_ring;
+			flow_req.rx_src_tag_hi_sel = 2;
+			flow_req.rx_src_tag_lo_sel = 4;
+			flow_req.rx_dest_tag_hi_sel = 5;
+			flow_req.rx_dest_tag_lo_sel = 4;
+			flow_req.rx_fdq0_sz0_qnum = fd_ring;
+			flow_req.rx_fdq1_qnum = fd_ring;
+			flow_req.rx_fdq2_qnum = fd_ring;
+			flow_req.rx_fdq3_qnum = fd_ring;
+
+			ret = tisci_ops->rx_flow_cfg(tisci_rm->tisci,
+						     &flow_req);
+
+			if (ret) {
+				dev_err(ud->dev, "flow%d config failed: %d\n",
+					rchan->id, ret);
+				return ret;
+			}
+		}
+	}
+
+	return 0;
+}
+
+static int udma_alloc_chan_resources(struct dma_chan *chan)
+{
+	struct udma_chan *uc = to_udma_chan(chan);
+	struct udma_dev *ud = to_udma_dev(chan->device);
+	const struct udma_match_data *match_data = ud->match_data;
+	struct k3_ring *irq_ring;
+	u32 irq_udma_idx;
+	int ret;
+
+	if (uc->pkt_mode || uc->dir == DMA_MEM_TO_MEM) {
+		uc->use_dma_pool = true;
+		/* in case of MEM_TO_MEM we have maximum of two TRs */
+		if (uc->dir == DMA_MEM_TO_MEM) {
+			uc->hdesc_size = cppi5_trdesc_calc_size(
+					sizeof(struct cppi5_tr_type15_t), 2);
+			uc->pkt_mode = false;
+		}
+	}
+
+	if (uc->use_dma_pool) {
+		uc->hdesc_pool = dma_pool_create(uc->name, ud->ddev.dev,
+						 uc->hdesc_size, ud->desc_align,
+						 0);
+		if (!uc->hdesc_pool) {
+			dev_err(ud->ddev.dev,
+				"Descriptor pool allocation failed\n");
+			uc->use_dma_pool = false;
+			return -ENOMEM;
+		}
+	}
+
+	pm_runtime_get_sync(ud->ddev.dev);
+
+	/*
+	 * Make sure that the completion is in a known state:
+	 * No teardown, the channel is idle
+	 */
+	reinit_completion(&uc->teardown_completed);
+	complete_all(&uc->teardown_completed);
+	uc->state = UDMA_CHAN_IS_IDLE;
+
+	switch (uc->dir) {
+	case DMA_MEM_TO_MEM:
+		/* Non synchronized - mem to mem type of transfer */
+		dev_dbg(uc->ud->dev, "%s: chan%d as MEM-to-MEM\n", __func__,
+			uc->id);
+
+		ret = udma_get_chan_pair(uc);
+		if (ret)
+			return ret;
+
+		ret = udma_alloc_tx_resources(uc);
+		if (ret)
+			return ret;
+
+		ret = udma_alloc_rx_resources(uc);
+		if (ret) {
+			udma_free_tx_resources(uc);
+			return ret;
+		}
+
+		uc->src_thread = ud->psil_base + uc->tchan->id;
+		uc->dst_thread = (ud->psil_base + uc->rchan->id) |
+				 UDMA_PSIL_DST_THREAD_ID_OFFSET;
+
+		irq_ring = uc->tchan->tc_ring;
+		irq_udma_idx = uc->tchan->id;
+		break;
+	case DMA_MEM_TO_DEV:
+		/* Slave transfer synchronized - mem to dev (TX) trasnfer */
+		dev_dbg(uc->ud->dev, "%s: chan%d as MEM-to-DEV\n", __func__,
+			uc->id);
+
+		ret = udma_alloc_tx_resources(uc);
+		if (ret) {
+			uc->remote_thread_id = -1;
+			return ret;
+		}
+
+		uc->src_thread = ud->psil_base + uc->tchan->id;
+		uc->dst_thread = uc->remote_thread_id;
+		uc->dst_thread |= UDMA_PSIL_DST_THREAD_ID_OFFSET;
+
+		irq_ring = uc->tchan->tc_ring;
+		irq_udma_idx = uc->tchan->id;
+		break;
+	case DMA_DEV_TO_MEM:
+		/* Slave transfer synchronized - dev to mem (RX) trasnfer */
+		dev_dbg(uc->ud->dev, "%s: chan%d as DEV-to-MEM\n", __func__,
+			uc->id);
+
+		ret = udma_alloc_rx_resources(uc);
+		if (ret) {
+			uc->remote_thread_id = -1;
+			return ret;
+		}
+
+		uc->src_thread = uc->remote_thread_id;
+		uc->dst_thread = (ud->psil_base + uc->rchan->id) |
+				 UDMA_PSIL_DST_THREAD_ID_OFFSET;
+
+		irq_ring = uc->rchan->r_ring;
+		irq_udma_idx = match_data->rchan_oes_offset + uc->rchan->id;
+		break;
+	default:
+		/* Can not happen */
+		dev_err(uc->ud->dev, "%s: chan%d invalid direction (%u)\n",
+			__func__, uc->id, uc->dir);
+		return -EINVAL;
+	}
+
+	/* Configure channel(s), rflow via tisci */
+	ret = udma_tisci_channel_config(uc);
+	if (ret)
+		goto err_res_free;
+
+	if (udma_is_chan_running(uc)) {
+		dev_warn(ud->dev, "chan%d: is running!\n", uc->id);
+		udma_stop(uc);
+		if (udma_is_chan_running(uc)) {
+			dev_err(ud->dev, "chan%d: won't stop!\n", uc->id);
+			goto err_res_free;
+		}
+	}
+
+	/* PSI-L pairing */
+	ret = navss_psil_pair(ud, uc->src_thread, uc->dst_thread);
+	if (ret) {
+		dev_err(ud->dev, "PSI-L pairing failed: 0x%04x -> 0x%04x\n",
+			uc->src_thread, uc->dst_thread);
+		goto err_res_free;
+	}
+
+	uc->psil_paired = true;
+
+	uc->irq_num_ring = k3_ringacc_get_ring_irq_num(irq_ring);
+	if (uc->irq_num_ring <= 0) {
+		dev_err(ud->dev, "Failed to get ring irq (index: %u)\n",
+			k3_ringacc_get_ring_id(irq_ring));
+		ret = -EINVAL;
+		goto err_psi_free;
+	}
+
+	ret = request_irq(uc->irq_num_ring, udma_ring_irq_handler,
+			  IRQF_TRIGGER_HIGH, uc->name, uc);
+	if (ret) {
+		dev_err(ud->dev, "chan%d: ring irq request failed\n", uc->id);
+		goto err_irq_free;
+	}
+
+	/* Event from UDMA (TR events) only needed for slave TR mode channels */
+	if (is_slave_direction(uc->dir) && !uc->pkt_mode) {
+		uc->irq_num_udma = ti_sci_inta_msi_get_virq(ud->dev,
+							    irq_udma_idx);
+		if (uc->irq_num_udma <= 0) {
+			dev_err(ud->dev, "Failed to get udma irq (index: %u)\n",
+				irq_udma_idx);
+			free_irq(uc->irq_num_ring, uc);
+			ret = -EINVAL;
+			goto err_irq_free;
+		}
+
+		ret = request_irq(uc->irq_num_udma, udma_udma_irq_handler, 0,
+				  uc->name, uc);
+		if (ret) {
+			dev_err(ud->dev, "chan%d: UDMA irq request failed\n",
+				uc->id);
+			free_irq(uc->irq_num_ring, uc);
+			goto err_irq_free;
+		}
+	} else {
+		uc->irq_num_udma = 0;
+	}
+
+	udma_reset_rings(uc);
+
+	return 0;
+
+err_irq_free:
+	uc->irq_num_ring = 0;
+	uc->irq_num_udma = 0;
+err_psi_free:
+	navss_psil_unpair(ud, uc->src_thread, uc->dst_thread);
+	uc->psil_paired = false;
+err_res_free:
+	udma_free_tx_resources(uc);
+	udma_free_rx_resources(uc);
+
+	uc->remote_thread_id = -1;
+	uc->dir = DMA_MEM_TO_MEM;
+	uc->pkt_mode = false;
+	uc->static_tr_type = 0;
+	uc->enable_acc32 = 0;
+	uc->enable_burst = 0;
+	uc->channel_tpl = 0;
+	uc->psd_size = 0;
+	uc->metadata_size = 0;
+	uc->hdesc_size = 0;
+
+	if (uc->use_dma_pool) {
+		dma_pool_destroy(uc->hdesc_pool);
+		uc->use_dma_pool = false;
+	}
+
+	return ret;
+}
+
+static void udma_free_chan_resources(struct dma_chan *chan)
+{
+	struct udma_chan *uc = to_udma_chan(chan);
+	struct udma_dev *ud = to_udma_dev(chan->device);
+
+	udma_terminate_all(chan);
+
+	if (uc->irq_num_ring > 0) {
+		free_irq(uc->irq_num_ring, uc);
+
+		uc->irq_num_ring = 0;
+	}
+	if (uc->irq_num_udma > 0) {
+		free_irq(uc->irq_num_udma, uc);
+
+		uc->irq_num_udma = 0;
+	}
+
+	/* Release PSI-L pairing */
+	if (uc->psil_paired) {
+		navss_psil_unpair(ud, uc->src_thread, uc->dst_thread);
+		uc->psil_paired = false;
+	}
+
+	vchan_free_chan_resources(&uc->vc);
+	tasklet_kill(&uc->vc.task);
+
+	pm_runtime_put(ud->ddev.dev);
+
+	udma_free_tx_resources(uc);
+	udma_free_rx_resources(uc);
+
+	uc->remote_thread_id = -1;
+	uc->dir = DMA_MEM_TO_MEM;
+	uc->pkt_mode = false;
+	uc->static_tr_type = 0;
+	uc->enable_acc32 = 0;
+	uc->enable_burst = 0;
+	uc->channel_tpl = 0;
+	uc->psd_size = 0;
+	uc->metadata_size = 0;
+	uc->hdesc_size = 0;
+
+	if (uc->use_dma_pool) {
+		dma_pool_destroy(uc->hdesc_pool);
+		uc->use_dma_pool = false;
+	}
+}
+
 static struct platform_driver udma_driver;
 
 static bool udma_slave_thread_is_packet_mode(struct udma_chan *uc)
-- 
Peter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2 11/14] dmaengine: ti: New driver for K3 UDMA - split#4: dma_device callbacks 1
  2019-07-30  9:34 [PATCH v2 00/14] dmaengine/soc: Add Texas Instruments UDMA support Peter Ujfalusi
                   ` (9 preceding siblings ...)
  2019-07-30  9:34 ` [PATCH v2 10/14] dmaengine: ti: New driver for K3 UDMA - split#3: alloc/free chan_resources Peter Ujfalusi
@ 2019-07-30  9:34 ` Peter Ujfalusi
  2019-07-30  9:34 ` [PATCH v2 12/14] dmaengine: ti: New driver for K3 UDMA - split#5: dma_device callbacks 2 Peter Ujfalusi
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Peter Ujfalusi @ 2019-07-30  9:34 UTC (permalink / raw)
  To: vkoul, robh+dt, nm, ssantosh
  Cc: dan.j.williams, dmaengine, linux-arm-kernel, devicetree,
	linux-kernel, grygorii.strashko, lokeshvutla, t-kristo, tony,
	j-keerthy

Split patch for review containing:
device_config, device_issue_pending, device_tx_status, device_pause,
device_resume, device_terminate_all and device_synchronize callback
implementation and the custom udma_vchan_complete.

DMA driver for
Texas Instruments K3 NAVSS Unified DMA – Peripheral Root Complex (UDMA-P)

The UDMA-P is intended to perform similar (but significantly upgraded) functions
as the packet-oriented DMA used on previous SoC devices. The UDMA-P module
supports the transmission and reception of various packet types. The UDMA-P is
architected to facilitate the segmentation and reassembly of SoC DMA data
structure compliant packets to/from smaller data blocks that are natively
compatible with the specific requirements of each connected peripheral. Multiple
Tx and Rx channels are provided within the DMA which allow multiple segmentation
or reassembly operations to be ongoing. The DMA controller maintains state
information for each of the channels which allows packet segmentation and
reassembly operations to be time division multiplexed between channels in order
to share the underlying DMA hardware. An external DMA scheduler is used to
control the ordering and rate at which this multiplexing occurs for Transmit
operations. The ordering and rate of Receive operations is indirectly controlled
by the order in which blocks are pushed into the DMA on the Rx PSI-L interface.

The UDMA-P also supports acting as both a UTC and UDMA-C for its internal
channels. Channels in the UDMA-P can be configured to be either Packet-Based or
Third-Party channels on a channel by channel basis.

The initial driver supports:
- MEM_TO_MEM (TR mode)
- DEV_TO_MEM (Packet / TR mode)
- MEM_TO_DEV (Packet / TR mode)
- Cyclic (Packet / TR mode)
- Metadata for descriptors

Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
---
 drivers/dma/ti/k3-udma.c | 297 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 297 insertions(+)

diff --git a/drivers/dma/ti/k3-udma.c b/drivers/dma/ti/k3-udma.c
index 0de38db03b8d..807670ba9774 100644
--- a/drivers/dma/ti/k3-udma.c
+++ b/drivers/dma/ti/k3-udma.c
@@ -1770,6 +1770,303 @@ static int udma_alloc_chan_resources(struct dma_chan *chan)
 	return ret;
 }
 
+static int udma_slave_config(struct dma_chan *chan,
+			     struct dma_slave_config *cfg)
+{
+	struct udma_chan *uc = to_udma_chan(chan);
+
+	memcpy(&uc->cfg, cfg, sizeof(uc->cfg));
+
+	return 0;
+}
+
+static void udma_issue_pending(struct dma_chan *chan)
+{
+	struct udma_chan *uc = to_udma_chan(chan);
+	unsigned long flags;
+
+	spin_lock_irqsave(&uc->vc.lock, flags);
+
+	/* If we have something pending and no active descriptor, then */
+	if (vchan_issue_pending(&uc->vc) && !uc->desc) {
+		/*
+		 * start a descriptor if the channel is NOT [marked as
+		 * terminating _and_ it is still running (teardown has not
+		 * completed yet)].
+		 */
+		if (!(uc->state == UDMA_CHAN_IS_TERMINATING &&
+		      udma_is_chan_running(uc)))
+			udma_start(uc);
+	}
+
+	spin_unlock_irqrestore(&uc->vc.lock, flags);
+}
+
+/* Not much yet */
+static enum dma_status udma_tx_status(struct dma_chan *chan,
+				      dma_cookie_t cookie,
+				      struct dma_tx_state *txstate)
+{
+	struct udma_chan *uc = to_udma_chan(chan);
+	enum dma_status ret;
+
+	ret = dma_cookie_status(chan, cookie, txstate);
+
+	if (!udma_is_chan_running(uc))
+		ret = DMA_COMPLETE;
+
+	if (ret == DMA_COMPLETE || !txstate)
+		return ret;
+
+	if (uc->desc && uc->desc->vd.tx.cookie == cookie) {
+		u32 pdma_bcnt = 0;
+		u32 bcnt = 0;
+		u32 pcnt = 0;
+		u32 residue = uc->desc->residue;
+		u32 delay = 0;
+
+		if (uc->desc->dir == DMA_MEM_TO_DEV) {
+			bcnt = udma_tchanrt_read(uc->tchan,
+						 UDMA_TCHAN_RT_SBCNT_REG);
+			pdma_bcnt = udma_tchanrt_read(uc->tchan,
+						UDMA_TCHAN_RT_PEER_BCNT_REG);
+			pcnt = udma_tchanrt_read(uc->tchan,
+						 UDMA_TCHAN_RT_PCNT_REG);
+
+			if (bcnt > pdma_bcnt)
+				delay = bcnt - pdma_bcnt;
+		} else if (uc->desc->dir == DMA_DEV_TO_MEM) {
+			bcnt = udma_rchanrt_read(uc->rchan,
+						 UDMA_RCHAN_RT_BCNT_REG);
+			pdma_bcnt = udma_rchanrt_read(uc->rchan,
+						UDMA_RCHAN_RT_PEER_BCNT_REG);
+			pcnt = udma_rchanrt_read(uc->rchan,
+						 UDMA_RCHAN_RT_PCNT_REG);
+
+			if (pdma_bcnt > bcnt)
+				delay = pdma_bcnt - bcnt;
+		} else {
+			u32 sbcnt;
+
+			sbcnt = udma_tchanrt_read(uc->tchan,
+						  UDMA_TCHAN_RT_BCNT_REG);
+			bcnt = udma_tchanrt_read(uc->tchan,
+						 UDMA_TCHAN_RT_PEER_BCNT_REG);
+			pcnt = udma_tchanrt_read(uc->tchan,
+						 UDMA_TCHAN_RT_PCNT_REG);
+
+			if (sbcnt > bcnt)
+				delay = sbcnt - bcnt;
+		}
+
+		bcnt -= uc->bcnt;
+		if (bcnt && !(bcnt % uc->desc->residue))
+			residue = 0;
+		else
+			residue -= bcnt % uc->desc->residue;
+
+		if (!residue && (uc->dir == DMA_DEV_TO_MEM || !delay)) {
+			ret = DMA_COMPLETE;
+			delay = 0;
+		}
+
+		dma_set_residue(txstate, residue);
+		dma_set_in_flight_bytes(txstate, delay);
+
+	} else {
+		ret = DMA_COMPLETE;
+	}
+
+	return ret;
+}
+
+static int udma_pause(struct dma_chan *chan)
+{
+	struct udma_chan *uc = to_udma_chan(chan);
+
+	if (!uc->desc)
+		return -EINVAL;
+
+	/* pause the channel */
+	switch (uc->desc->dir) {
+	case DMA_DEV_TO_MEM:
+		udma_rchanrt_update_bits(uc->rchan,
+					 UDMA_RCHAN_RT_PEER_RT_EN_REG,
+					 UDMA_PEER_RT_EN_PAUSE,
+					 UDMA_PEER_RT_EN_PAUSE);
+		break;
+	case DMA_MEM_TO_DEV:
+		udma_tchanrt_update_bits(uc->tchan,
+					 UDMA_TCHAN_RT_PEER_RT_EN_REG,
+					 UDMA_PEER_RT_EN_PAUSE,
+					 UDMA_PEER_RT_EN_PAUSE);
+		break;
+	case DMA_MEM_TO_MEM:
+		udma_tchanrt_update_bits(uc->tchan, UDMA_TCHAN_RT_CTL_REG,
+					 UDMA_CHAN_RT_CTL_PAUSE,
+					 UDMA_CHAN_RT_CTL_PAUSE);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int udma_resume(struct dma_chan *chan)
+{
+	struct udma_chan *uc = to_udma_chan(chan);
+
+	if (!uc->desc)
+		return -EINVAL;
+
+	/* resume the channel */
+	switch (uc->desc->dir) {
+	case DMA_DEV_TO_MEM:
+		udma_rchanrt_update_bits(uc->rchan,
+					 UDMA_RCHAN_RT_PEER_RT_EN_REG,
+					 UDMA_PEER_RT_EN_PAUSE, 0);
+
+		break;
+	case DMA_MEM_TO_DEV:
+		udma_tchanrt_update_bits(uc->tchan,
+					 UDMA_TCHAN_RT_PEER_RT_EN_REG,
+					 UDMA_PEER_RT_EN_PAUSE, 0);
+		break;
+	case DMA_MEM_TO_MEM:
+		udma_tchanrt_update_bits(uc->tchan, UDMA_TCHAN_RT_CTL_REG,
+					 UDMA_CHAN_RT_CTL_PAUSE, 0);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int udma_terminate_all(struct dma_chan *chan)
+{
+	struct udma_chan *uc = to_udma_chan(chan);
+	unsigned long flags;
+	LIST_HEAD(head);
+
+	spin_lock_irqsave(&uc->vc.lock, flags);
+
+	if (udma_is_chan_running(uc))
+		udma_stop(uc);
+
+	if (uc->desc) {
+		uc->terminated_desc = uc->desc;
+		uc->desc = NULL;
+		uc->terminated_desc->terminated = true;
+	}
+
+	uc->paused = false;
+
+	vchan_get_all_descriptors(&uc->vc, &head);
+	spin_unlock_irqrestore(&uc->vc.lock, flags);
+	vchan_dma_desc_free_list(&uc->vc, &head);
+
+	return 0;
+}
+
+static void udma_synchronize(struct dma_chan *chan)
+{
+	struct udma_chan *uc = to_udma_chan(chan);
+	unsigned long timeout = msecs_to_jiffies(1000);
+
+	vchan_synchronize(&uc->vc);
+
+	if (uc->state == UDMA_CHAN_IS_TERMINATING) {
+		timeout = wait_for_completion_timeout(&uc->teardown_completed,
+						      timeout);
+		if (!timeout) {
+			dev_warn(uc->ud->dev, "chan%d teardown timeout!\n",
+				 uc->id);
+			udma_dump_chan_stdata(uc);
+			udma_reset_chan(uc, true);
+		}
+	}
+
+	udma_reset_chan(uc, false);
+	if (udma_is_chan_running(uc))
+		dev_warn(uc->ud->dev, "chan%d refused to stop!\n", uc->id);
+
+	udma_reset_rings(uc);
+}
+
+static void udma_desc_pre_callback(struct virt_dma_chan *vc,
+				   struct virt_dma_desc *vd,
+				   struct dmaengine_result *result)
+{
+	struct udma_chan *uc = to_udma_chan(&vc->chan);
+	struct udma_desc *d;
+
+	if (!vd)
+		return;
+
+	d = to_udma_desc(&vd->tx);
+
+	if (d->metadata_size)
+		udma_fetch_epib(uc, d);
+
+	/* Provide residue information for the client */
+	if (result) {
+		void *desc_vaddr = udma_curr_cppi5_desc_vaddr(d, d->desc_idx);
+
+		if (cppi5_desc_get_type(desc_vaddr) ==
+		    CPPI5_INFO0_DESC_TYPE_VAL_HOST) {
+			result->residue = cppi5_hdesc_get_pktlen(desc_vaddr);
+			if (result->residue == d->residue)
+				result->result = DMA_TRANS_NOERROR;
+			else
+				result->result = DMA_TRANS_ABORTED;
+		} else {
+			result->residue = d->residue;
+			result->result = DMA_TRANS_NOERROR;
+		}
+	}
+}
+
+/*
+ * This tasklet handles the completion of a DMA descriptor by
+ * calling its callback and freeing it.
+ */
+static void udma_vchan_complete(unsigned long arg)
+{
+	struct virt_dma_chan *vc = (struct virt_dma_chan *)arg;
+	struct virt_dma_desc *vd, *_vd;
+	struct dmaengine_desc_callback cb;
+	LIST_HEAD(head);
+
+	spin_lock_irq(&vc->lock);
+	list_splice_tail_init(&vc->desc_completed, &head);
+	vd = vc->cyclic;
+	if (vd) {
+		vc->cyclic = NULL;
+		dmaengine_desc_get_callback(&vd->tx, &cb);
+	} else {
+		memset(&cb, 0, sizeof(cb));
+	}
+	spin_unlock_irq(&vc->lock);
+
+	udma_desc_pre_callback(vc, vd, NULL);
+	dmaengine_desc_callback_invoke(&cb, NULL);
+
+	list_for_each_entry_safe(vd, _vd, &head, node) {
+		struct dmaengine_result result;
+
+		dmaengine_desc_get_callback(&vd->tx, &cb);
+
+		list_del(&vd->node);
+
+		udma_desc_pre_callback(vc, vd, &result);
+		dmaengine_desc_callback_invoke(&cb, &result);
+
+		vchan_vdesc_fini(vd);
+	}
+}
+
 static void udma_free_chan_resources(struct dma_chan *chan)
 {
 	struct udma_chan *uc = to_udma_chan(chan);
-- 
Peter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2 12/14] dmaengine: ti: New driver for K3 UDMA - split#5: dma_device callbacks 2
  2019-07-30  9:34 [PATCH v2 00/14] dmaengine/soc: Add Texas Instruments UDMA support Peter Ujfalusi
                   ` (10 preceding siblings ...)
  2019-07-30  9:34 ` [PATCH v2 11/14] dmaengine: ti: New driver for K3 UDMA - split#4: dma_device callbacks 1 Peter Ujfalusi
@ 2019-07-30  9:34 ` Peter Ujfalusi
  2019-07-30  9:34 ` [PATCH v2 13/14] dmaengine: ti: New driver for K3 UDMA - split#6: Kconfig and Makefile Peter Ujfalusi
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Peter Ujfalusi @ 2019-07-30  9:34 UTC (permalink / raw)
  To: vkoul, robh+dt, nm, ssantosh
  Cc: dan.j.williams, dmaengine, linux-arm-kernel, devicetree,
	linux-kernel, grygorii.strashko, lokeshvutla, t-kristo, tony,
	j-keerthy

Split patch for review containing:
device_prep_slave_sg and device_prep_dma_cyclic implementation supporting
packet and TR channels.

DMA driver for
Texas Instruments K3 NAVSS Unified DMA – Peripheral Root Complex (UDMA-P)

The UDMA-P is intended to perform similar (but significantly upgraded) functions
as the packet-oriented DMA used on previous SoC devices. The UDMA-P module
supports the transmission and reception of various packet types. The UDMA-P is
architected to facilitate the segmentation and reassembly of SoC DMA data
structure compliant packets to/from smaller data blocks that are natively
compatible with the specific requirements of each connected peripheral. Multiple
Tx and Rx channels are provided within the DMA which allow multiple segmentation
or reassembly operations to be ongoing. The DMA controller maintains state
information for each of the channels which allows packet segmentation and
reassembly operations to be time division multiplexed between channels in order
to share the underlying DMA hardware. An external DMA scheduler is used to
control the ordering and rate at which this multiplexing occurs for Transmit
operations. The ordering and rate of Receive operations is indirectly controlled
by the order in which blocks are pushed into the DMA on the Rx PSI-L interface.

The UDMA-P also supports acting as both a UTC and UDMA-C for its internal
channels. Channels in the UDMA-P can be configured to be either Packet-Based or
Third-Party channels on a channel by channel basis.

The initial driver supports:
- MEM_TO_MEM (TR mode)
- DEV_TO_MEM (Packet / TR mode)
- MEM_TO_DEV (Packet / TR mode)
- Cyclic (Packet / TR mode)
- Metadata for descriptors

Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
---
 drivers/dma/ti/k3-udma.c | 696 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 696 insertions(+)

diff --git a/drivers/dma/ti/k3-udma.c b/drivers/dma/ti/k3-udma.c
index 807670ba9774..bdd7652b01cf 100644
--- a/drivers/dma/ti/k3-udma.c
+++ b/drivers/dma/ti/k3-udma.c
@@ -1780,6 +1780,702 @@ static int udma_slave_config(struct dma_chan *chan,
 	return 0;
 }
 
+static struct udma_desc *udma_alloc_tr_desc(struct udma_chan *uc,
+					    size_t tr_size, int tr_count,
+					    enum dma_transfer_direction dir)
+{
+	struct udma_hwdesc *hwdesc;
+	struct cppi5_desc_hdr_t *tr_desc;
+	struct udma_desc *d;
+	u32 reload_count = 0;
+	u32 ring_id;
+
+	switch (tr_size) {
+	case 16:
+	case 32:
+	case 64:
+	case 128:
+		break;
+	default:
+		dev_err(uc->ud->dev, "Unsupported TR size of %zu\n", tr_size);
+		return NULL;
+	}
+
+	/* We have only one descriptor containing multiple TRs */
+	d = kzalloc(sizeof(*d) + sizeof(d->hwdesc[0]), GFP_ATOMIC);
+	if (!d)
+		return NULL;
+
+	d->sglen = tr_count;
+
+	d->hwdesc_count = 1;
+	hwdesc = &d->hwdesc[0];
+
+	/* Allocate memory for DMA ring descriptor */
+	if (uc->use_dma_pool) {
+		hwdesc->cppi5_desc_size = uc->hdesc_size;
+		hwdesc->cppi5_desc_vaddr = dma_pool_zalloc(uc->hdesc_pool,
+						GFP_ATOMIC,
+						&hwdesc->cppi5_desc_paddr);
+	} else {
+		hwdesc->cppi5_desc_size = cppi5_trdesc_calc_size(tr_size,
+								 tr_count);
+		hwdesc->cppi5_desc_size = ALIGN(hwdesc->cppi5_desc_size,
+						uc->ud->desc_align);
+		hwdesc->cppi5_desc_vaddr = dma_alloc_coherent(uc->ud->dev,
+						hwdesc->cppi5_desc_size,
+						&hwdesc->cppi5_desc_paddr,
+						GFP_ATOMIC);
+	}
+
+	if (!hwdesc->cppi5_desc_vaddr) {
+		kfree(d);
+		return NULL;
+	}
+
+	/* Start of the TR req records */
+	hwdesc->tr_req_base = hwdesc->cppi5_desc_vaddr + tr_size;
+	/* Start address of the TR response array */
+	hwdesc->tr_resp_base = hwdesc->tr_req_base + tr_size * tr_count;
+
+	tr_desc = hwdesc->cppi5_desc_vaddr;
+
+	if (uc->cyclic)
+		reload_count = CPPI5_INFO0_TRDESC_RLDCNT_INFINITE;
+
+	if (dir == DMA_DEV_TO_MEM)
+		ring_id = k3_ringacc_get_ring_id(uc->rchan->r_ring);
+	else
+		ring_id = k3_ringacc_get_ring_id(uc->tchan->tc_ring);
+
+	cppi5_trdesc_init(tr_desc, tr_count, tr_size, 0, reload_count);
+	cppi5_desc_set_pktids(tr_desc, uc->id, 0x3fff);
+	cppi5_desc_set_retpolicy(tr_desc, 0, ring_id);
+
+	return d;
+}
+
+static struct udma_desc *udma_prep_slave_sg_tr(
+	struct udma_chan *uc, struct scatterlist *sgl, unsigned int sglen,
+	enum dma_transfer_direction dir, unsigned long tx_flags, void *context)
+{
+	enum dma_slave_buswidth dev_width;
+	struct scatterlist *sgent;
+	struct udma_desc *d;
+	size_t tr_size;
+	struct cppi5_tr_type1_t *tr_req = NULL;
+	unsigned int i;
+	u32 burst;
+
+	if (dir == DMA_DEV_TO_MEM) {
+		dev_width = uc->cfg.src_addr_width;
+		burst = uc->cfg.src_maxburst;
+	} else if (dir == DMA_MEM_TO_DEV) {
+		dev_width = uc->cfg.dst_addr_width;
+		burst = uc->cfg.dst_maxburst;
+	} else {
+		dev_err(uc->ud->dev, "%s: bad direction?\n", __func__);
+		return NULL;
+	}
+
+	if (!burst)
+		burst = 1;
+
+	/* Now allocate and setup the descriptor. */
+	tr_size = sizeof(struct cppi5_tr_type1_t);
+	d = udma_alloc_tr_desc(uc, tr_size, sglen, dir);
+	if (!d)
+		return NULL;
+
+	d->sglen = sglen;
+
+	tr_req = (struct cppi5_tr_type1_t *)d->hwdesc[0].tr_req_base;
+	for_each_sg(sgl, sgent, sglen, i) {
+		d->residue += sg_dma_len(sgent);
+
+		cppi5_tr_init(&tr_req[i].flags, CPPI5_TR_TYPE1, false, false,
+			      CPPI5_TR_EVENT_SIZE_COMPLETION, 0);
+		cppi5_tr_csf_set(&tr_req[i].flags, CPPI5_TR_CSF_SUPR_EVT);
+
+		tr_req[i].addr = sg_dma_address(sgent);
+		tr_req[i].icnt0 = burst * dev_width;
+		tr_req[i].dim1 = burst * dev_width;
+		tr_req[i].icnt1 = sg_dma_len(sgent) / tr_req[i].icnt0;
+	}
+
+	cppi5_tr_csf_set(&tr_req[i - 1].flags, CPPI5_TR_CSF_EOP);
+
+	return d;
+}
+
+static int udma_configure_statictr(struct udma_chan *uc, struct udma_desc *d,
+				   enum dma_slave_buswidth dev_width,
+				   u16 elcnt)
+{
+	if (!uc->static_tr_type)
+		return 0;
+
+	/* Bus width translates to the element size (ES) */
+	switch (dev_width) {
+	case DMA_SLAVE_BUSWIDTH_1_BYTE:
+		d->static_tr.elsize = 0;
+		break;
+	case DMA_SLAVE_BUSWIDTH_2_BYTES:
+		d->static_tr.elsize = 1;
+		break;
+	case DMA_SLAVE_BUSWIDTH_3_BYTES:
+		d->static_tr.elsize = 2;
+		break;
+	case DMA_SLAVE_BUSWIDTH_4_BYTES:
+		d->static_tr.elsize = 3;
+		break;
+	case DMA_SLAVE_BUSWIDTH_8_BYTES:
+		d->static_tr.elsize = 4;
+		break;
+	default: /* not reached */
+		return -EINVAL;
+	}
+
+	d->static_tr.elcnt = elcnt;
+
+	/*
+	 * PDMA must to close the packet when the channel is in packet mode.
+	 * For TR mode when the channel is not cyclic we also need PDMA to close
+	 * the packet otherwise the transfer will stall because PDMA holds on
+	 * the data it has received from the peripheral.
+	 */
+	if (uc->pkt_mode || !uc->cyclic) {
+		unsigned int div = dev_width * elcnt;
+
+		if (uc->cyclic)
+			d->static_tr.bstcnt = d->residue / d->sglen / div;
+		else
+			d->static_tr.bstcnt = d->residue / div;
+
+		if (uc->dir == DMA_DEV_TO_MEM &&
+		    d->static_tr.bstcnt > uc->ud->match_data->statictr_z_mask)
+			return -EINVAL;
+	} else {
+		d->static_tr.bstcnt = 0;
+	}
+
+	return 0;
+}
+
+static struct udma_desc *udma_prep_slave_sg_pkt(
+	struct udma_chan *uc, struct scatterlist *sgl, unsigned int sglen,
+	enum dma_transfer_direction dir, unsigned long tx_flags, void *context)
+{
+	struct scatterlist *sgent;
+	struct cppi5_host_desc_t *h_desc = NULL;
+	struct udma_desc *d;
+	u32 ring_id;
+	unsigned int i;
+
+	d = kzalloc(sizeof(*d) + sglen * sizeof(d->hwdesc[0]), GFP_ATOMIC);
+	if (!d)
+		return NULL;
+
+	d->sglen = sglen;
+	d->hwdesc_count = sglen;
+
+	if (dir == DMA_DEV_TO_MEM)
+		ring_id = k3_ringacc_get_ring_id(uc->rchan->r_ring);
+	else
+		ring_id = k3_ringacc_get_ring_id(uc->tchan->tc_ring);
+
+	for_each_sg(sgl, sgent, sglen, i) {
+		struct udma_hwdesc *hwdesc = &d->hwdesc[i];
+		dma_addr_t sg_addr = sg_dma_address(sgent);
+		struct cppi5_host_desc_t *desc;
+		size_t sg_len = sg_dma_len(sgent);
+
+		hwdesc->cppi5_desc_vaddr = dma_pool_zalloc(uc->hdesc_pool,
+						GFP_ATOMIC,
+						&hwdesc->cppi5_desc_paddr);
+		if (!hwdesc->cppi5_desc_vaddr) {
+			dev_err(uc->ud->dev,
+				"descriptor%d allocation failed\n", i);
+
+			udma_free_hwdesc(uc, d);
+			kfree(d);
+			return NULL;
+		}
+
+		d->residue += sg_len;
+		hwdesc->cppi5_desc_size = uc->hdesc_size;
+		desc = hwdesc->cppi5_desc_vaddr;
+
+		if (i == 0) {
+			cppi5_hdesc_init(desc, 0, 0);
+			/* Flow and Packed ID */
+			cppi5_desc_set_pktids(&desc->hdr, uc->id, 0x3fff);
+			cppi5_desc_set_retpolicy(&desc->hdr, 0, ring_id);
+		} else {
+			cppi5_hdesc_reset_hbdesc(desc);
+			cppi5_desc_set_retpolicy(&desc->hdr, 0, 0xffff);
+		}
+
+		/* attach the sg buffer to the descriptor */
+		cppi5_hdesc_attach_buf(desc, sg_addr, sg_len, sg_addr, sg_len);
+
+		/* Attach link as host buffer descriptor */
+		if (h_desc)
+			cppi5_hdesc_link_hbdesc(h_desc,
+						hwdesc->cppi5_desc_paddr);
+
+		if (dir == DMA_MEM_TO_DEV)
+			h_desc = desc;
+	}
+
+	if (d->residue >= SZ_4M) {
+		dev_err(uc->ud->dev,
+			"%s: Transfer size %u is over the supported 4M range\n",
+			__func__, d->residue);
+		udma_free_hwdesc(uc, d);
+		kfree(d);
+		return NULL;
+	}
+
+	h_desc = d->hwdesc[0].cppi5_desc_vaddr;
+	cppi5_hdesc_set_pktlen(h_desc, d->residue);
+
+	return d;
+}
+
+static int udma_attach_metadata(struct dma_async_tx_descriptor *desc,
+				void *data, size_t len)
+{
+	struct udma_desc *d = to_udma_desc(desc);
+	struct udma_chan *uc = to_udma_chan(desc->chan);
+	struct cppi5_host_desc_t *h_desc;
+	u32 psd_size = len;
+	u32 flags = 0;
+
+	if (!uc->pkt_mode || !uc->metadata_size)
+		return -ENOTSUPP;
+
+	if (!data || len > uc->metadata_size)
+		return -EINVAL;
+
+	if (uc->needs_epib && len < CPPI5_INFO0_HDESC_EPIB_SIZE)
+		return -EINVAL;
+
+	h_desc = d->hwdesc[0].cppi5_desc_vaddr;
+	if (d->dir == DMA_MEM_TO_DEV)
+		memcpy(h_desc->epib, data, len);
+
+	if (uc->needs_epib)
+		psd_size -= CPPI5_INFO0_HDESC_EPIB_SIZE;
+
+	d->metadata = data;
+	d->metadata_size = len;
+	if (uc->needs_epib)
+		flags |= CPPI5_INFO0_HDESC_EPIB_PRESENT;
+
+	cppi5_hdesc_update_flags(h_desc, flags);
+	cppi5_hdesc_update_psdata_size(h_desc, psd_size);
+
+	return 0;
+}
+
+static void *udma_get_metadata_ptr(struct dma_async_tx_descriptor *desc,
+				   size_t *payload_len, size_t *max_len)
+{
+	struct udma_desc *d = to_udma_desc(desc);
+	struct udma_chan *uc = to_udma_chan(desc->chan);
+	struct cppi5_host_desc_t *h_desc;
+
+	if (!uc->pkt_mode || !uc->metadata_size)
+		return ERR_PTR(-ENOTSUPP);
+
+	h_desc = d->hwdesc[0].cppi5_desc_vaddr;
+
+	*max_len = uc->metadata_size;
+
+	*payload_len = cppi5_hdesc_epib_present(&h_desc->hdr) ?
+		       CPPI5_INFO0_HDESC_EPIB_SIZE : 0;
+	*payload_len += cppi5_hdesc_get_psdata_size(h_desc);
+
+	return h_desc->epib;
+}
+
+static int udma_set_metadata_len(struct dma_async_tx_descriptor *desc,
+				 size_t payload_len)
+{
+	struct udma_desc *d = to_udma_desc(desc);
+	struct udma_chan *uc = to_udma_chan(desc->chan);
+	struct cppi5_host_desc_t *h_desc;
+	u32 psd_size = payload_len;
+	u32 flags = 0;
+
+	if (!uc->pkt_mode || !uc->metadata_size)
+		return -ENOTSUPP;
+
+	if (payload_len > uc->metadata_size)
+		return -EINVAL;
+
+	if (uc->needs_epib && payload_len < CPPI5_INFO0_HDESC_EPIB_SIZE)
+		return -EINVAL;
+
+	h_desc = d->hwdesc[0].cppi5_desc_vaddr;
+
+	if (uc->needs_epib) {
+		psd_size -= CPPI5_INFO0_HDESC_EPIB_SIZE;
+		flags |= CPPI5_INFO0_HDESC_EPIB_PRESENT;
+	}
+
+	cppi5_hdesc_update_flags(h_desc, flags);
+	cppi5_hdesc_update_psdata_size(h_desc, psd_size);
+
+	return 0;
+}
+
+static struct dma_descriptor_metadata_ops metadata_ops = {
+	.attach = udma_attach_metadata,
+	.get_ptr = udma_get_metadata_ptr,
+	.set_len = udma_set_metadata_len,
+};
+
+static struct dma_async_tx_descriptor *udma_prep_slave_sg(
+	struct dma_chan *chan, struct scatterlist *sgl, unsigned int sglen,
+	enum dma_transfer_direction dir, unsigned long tx_flags, void *context)
+{
+	struct udma_chan *uc = to_udma_chan(chan);
+	enum dma_slave_buswidth dev_width;
+	struct udma_desc *d;
+	u32 burst;
+
+	if (dir != uc->dir) {
+		dev_err(chan->device->dev,
+			"%s: chan%d is for %s, not supporting %s\n",
+			__func__, uc->id, udma_get_dir_text(uc->dir),
+			udma_get_dir_text(dir));
+		return NULL;
+	}
+
+	if (dir == DMA_DEV_TO_MEM) {
+		dev_width = uc->cfg.src_addr_width;
+		burst = uc->cfg.src_maxburst;
+	} else if (dir == DMA_MEM_TO_DEV) {
+		dev_width = uc->cfg.dst_addr_width;
+		burst = uc->cfg.dst_maxburst;
+	} else {
+		dev_err(chan->device->dev, "%s: bad direction?\n", __func__);
+		return NULL;
+	}
+
+	if (!burst)
+		burst = 1;
+
+	if (uc->pkt_mode)
+		d = udma_prep_slave_sg_pkt(uc, sgl, sglen, dir, tx_flags,
+					   context);
+	else
+		d = udma_prep_slave_sg_tr(uc, sgl, sglen, dir, tx_flags,
+					  context);
+
+	if (!d)
+		return NULL;
+
+	d->dir = dir;
+	d->desc_idx = 0;
+	d->tr_idx = 0;
+
+	/* static TR for remote PDMA */
+	if (udma_configure_statictr(uc, d, dev_width, burst)) {
+		dev_err(uc->ud->dev,
+			"%s: StaticTR Z is limted to maximum 4095 (%u)\n",
+			__func__, d->static_tr.bstcnt);
+
+		udma_free_hwdesc(uc, d);
+		kfree(d);
+		return NULL;
+	}
+
+	if (uc->metadata_size)
+		d->vd.tx.metadata_ops = &metadata_ops;
+
+	return vchan_tx_prep(&uc->vc, &d->vd, tx_flags);
+}
+
+static struct udma_desc *udma_prep_dma_cyclic_tr(
+	struct udma_chan *uc, dma_addr_t buf_addr, size_t buf_len,
+	size_t period_len, enum dma_transfer_direction dir, unsigned long flags)
+{
+	enum dma_slave_buswidth dev_width;
+	struct udma_desc *d;
+	size_t tr_size;
+	struct cppi5_tr_type1_t *tr_req;
+	unsigned int i;
+	unsigned int periods = buf_len / period_len;
+	u32 burst;
+
+	if (dir == DMA_DEV_TO_MEM) {
+		dev_width = uc->cfg.src_addr_width;
+		burst = uc->cfg.src_maxburst;
+	} else if (dir == DMA_MEM_TO_DEV) {
+		dev_width = uc->cfg.dst_addr_width;
+		burst = uc->cfg.dst_maxburst;
+	} else {
+		dev_err(uc->ud->dev, "%s: bad direction?\n", __func__);
+		return NULL;
+	}
+
+	if (!burst)
+		burst = 1;
+
+	/* Now allocate and setup the descriptor. */
+	tr_size = sizeof(struct cppi5_tr_type1_t);
+	d = udma_alloc_tr_desc(uc, tr_size, periods, dir);
+	if (!d)
+		return NULL;
+
+	tr_req = (struct cppi5_tr_type1_t *)d->hwdesc[0].tr_req_base;
+	for (i = 0; i < periods; i++) {
+		cppi5_tr_init(&tr_req[i].flags, CPPI5_TR_TYPE1, false, false,
+			      CPPI5_TR_EVENT_SIZE_COMPLETION, 0);
+
+		tr_req[i].addr = buf_addr + period_len * i;
+		tr_req[i].icnt0 = dev_width;
+		tr_req[i].icnt1 = period_len / dev_width;
+		tr_req[i].dim1 = dev_width;
+
+		if (!(flags & DMA_PREP_INTERRUPT))
+			cppi5_tr_csf_set(&tr_req[i].flags,
+					 CPPI5_TR_CSF_SUPR_EVT);
+	}
+
+	return d;
+}
+
+static struct udma_desc *udma_prep_dma_cyclic_pkt(
+	struct udma_chan *uc, dma_addr_t buf_addr, size_t buf_len,
+	size_t period_len, enum dma_transfer_direction dir, unsigned long flags)
+{
+	struct udma_desc *d;
+	u32 ring_id;
+	int i;
+	int periods = buf_len / period_len;
+
+	if (periods > (K3_UDMA_DEFAULT_RING_SIZE - 1))
+		return NULL;
+
+	if (period_len > 0x3FFFFF)
+		return NULL;
+
+	d = kzalloc(sizeof(*d) + periods * sizeof(d->hwdesc[0]), GFP_ATOMIC);
+	if (!d)
+		return NULL;
+
+	d->hwdesc_count = periods;
+
+	/* TODO: re-check this... */
+	if (dir == DMA_DEV_TO_MEM)
+		ring_id = k3_ringacc_get_ring_id(uc->rchan->r_ring);
+	else
+		ring_id = k3_ringacc_get_ring_id(uc->tchan->tc_ring);
+
+	for (i = 0; i < periods; i++) {
+		struct udma_hwdesc *hwdesc = &d->hwdesc[i];
+		dma_addr_t period_addr = buf_addr + (period_len * i);
+		struct cppi5_host_desc_t *h_desc;
+
+		hwdesc->cppi5_desc_vaddr = dma_pool_zalloc(uc->hdesc_pool,
+						GFP_ATOMIC,
+						&hwdesc->cppi5_desc_paddr);
+		if (!hwdesc->cppi5_desc_vaddr) {
+			dev_err(uc->ud->dev,
+				"descriptor%d allocation failed\n", i);
+
+			udma_free_hwdesc(uc, d);
+			kfree(d);
+			return NULL;
+		}
+
+		hwdesc->cppi5_desc_size = uc->hdesc_size;
+		h_desc = hwdesc->cppi5_desc_vaddr;
+
+		cppi5_hdesc_init(h_desc, 0, 0);
+		cppi5_hdesc_set_pktlen(h_desc, period_len);
+
+		/* Flow and Packed ID */
+		cppi5_desc_set_pktids(&h_desc->hdr, uc->id, 0x3fff);
+		cppi5_desc_set_retpolicy(&h_desc->hdr, 0, ring_id);
+
+		/* attach each period to a new descriptor */
+		cppi5_hdesc_attach_buf(h_desc,
+				       period_addr, period_len,
+				       period_addr, period_len);
+	}
+
+	return d;
+}
+
+static struct dma_async_tx_descriptor *udma_prep_dma_cyclic(
+	struct dma_chan *chan, dma_addr_t buf_addr, size_t buf_len,
+	size_t period_len, enum dma_transfer_direction dir, unsigned long flags)
+{
+	struct udma_chan *uc = to_udma_chan(chan);
+	enum dma_slave_buswidth dev_width;
+	struct udma_desc *d;
+	u32 burst;
+
+	if (dir != uc->dir) {
+		dev_err(chan->device->dev,
+			"%s: chan%d is for %s, not supporting %s\n",
+			__func__, uc->id, udma_get_dir_text(uc->dir),
+			udma_get_dir_text(dir));
+		return NULL;
+	}
+
+	uc->cyclic = true;
+
+	if (dir == DMA_DEV_TO_MEM) {
+		dev_width = uc->cfg.src_addr_width;
+		burst = uc->cfg.src_maxburst;
+	} else if (dir == DMA_MEM_TO_DEV) {
+		dev_width = uc->cfg.dst_addr_width;
+		burst = uc->cfg.dst_maxburst;
+	} else {
+		dev_err(uc->ud->dev, "%s: bad direction?\n", __func__);
+		return NULL;
+	}
+
+	if (!burst)
+		burst = 1;
+
+	if (uc->pkt_mode)
+		d = udma_prep_dma_cyclic_pkt(uc, buf_addr, buf_len, period_len,
+					     dir, flags);
+	else
+		d = udma_prep_dma_cyclic_tr(uc, buf_addr, buf_len, period_len,
+					    dir, flags);
+
+	if (!d)
+		return NULL;
+
+	d->sglen = buf_len / period_len;
+
+	d->dir = dir;
+	d->residue = buf_len;
+
+	/* static TR for remote PDMA */
+	if (udma_configure_statictr(uc, d, dev_width, burst)) {
+		dev_err(uc->ud->dev,
+			"%s: StaticTR Z is limted to maximum 4095 (%u)\n",
+			__func__, d->static_tr.bstcnt);
+
+		udma_free_hwdesc(uc, d);
+		kfree(d);
+		return NULL;
+	}
+
+	if (uc->metadata_size)
+		d->vd.tx.metadata_ops = &metadata_ops;
+
+	return vchan_tx_prep(&uc->vc, &d->vd, flags);
+}
+
+static struct dma_async_tx_descriptor *udma_prep_dma_memcpy(
+	struct dma_chan *chan, dma_addr_t dest, dma_addr_t src,
+	size_t len, unsigned long tx_flags)
+{
+	struct udma_chan *uc = to_udma_chan(chan);
+	struct udma_desc *d;
+	struct cppi5_tr_type15_t *tr_req;
+	int num_tr;
+	size_t tr_size = sizeof(struct cppi5_tr_type15_t);
+	u16 tr0_cnt0, tr0_cnt1, tr1_cnt0;
+
+	if (uc->dir != DMA_MEM_TO_MEM) {
+		dev_err(chan->device->dev,
+			"%s: chan%d is for %s, not supporting %s\n",
+			__func__, uc->id, udma_get_dir_text(uc->dir),
+			udma_get_dir_text(DMA_MEM_TO_MEM));
+		return NULL;
+	}
+
+	if (len < SZ_64K) {
+		num_tr = 1;
+		tr0_cnt0 = len;
+		tr0_cnt1 = 1;
+	} else {
+		unsigned long align_to = __ffs(src | dest);
+
+		if (align_to > 3)
+			align_to = 3;
+		/*
+		 * Keep simple: tr0: SZ_64K-alignment blocks,
+		 *		tr1: the remaining
+		 */
+		num_tr = 2;
+		tr0_cnt0 = (SZ_64K - BIT(align_to));
+		if (len / tr0_cnt0 >= SZ_64K) {
+			dev_err(uc->ud->dev, "size %zu is not supported\n",
+				len);
+			return NULL;
+		}
+
+		tr0_cnt1 = len / tr0_cnt0;
+		tr1_cnt0 = len % tr0_cnt0;
+	}
+
+	d = udma_alloc_tr_desc(uc, tr_size, num_tr, DMA_MEM_TO_MEM);
+	if (!d)
+		return NULL;
+
+	d->dir = DMA_MEM_TO_MEM;
+	d->desc_idx = 0;
+	d->tr_idx = 0;
+	d->residue = len;
+
+	tr_req = (struct cppi5_tr_type15_t *)d->hwdesc[0].tr_req_base;
+
+	cppi5_tr_init(&tr_req[0].flags, CPPI5_TR_TYPE15, false, true,
+		      CPPI5_TR_EVENT_SIZE_COMPLETION, 0);
+	cppi5_tr_csf_set(&tr_req[0].flags, CPPI5_TR_CSF_SUPR_EVT);
+
+	tr_req[0].addr = src;
+	tr_req[0].icnt0 = tr0_cnt0;
+	tr_req[0].icnt1 = tr0_cnt1;
+	tr_req[0].icnt2 = 1;
+	tr_req[0].icnt3 = 1;
+	tr_req[0].dim1 = tr0_cnt0;
+
+	tr_req[0].daddr = dest;
+	tr_req[0].dicnt0 = tr0_cnt0;
+	tr_req[0].dicnt1 = tr0_cnt1;
+	tr_req[0].dicnt2 = 1;
+	tr_req[0].dicnt3 = 1;
+	tr_req[0].ddim1 = tr0_cnt0;
+
+	if (num_tr == 2) {
+		cppi5_tr_init(&tr_req[1].flags, CPPI5_TR_TYPE15, false, true,
+			      CPPI5_TR_EVENT_SIZE_COMPLETION, 0);
+		cppi5_tr_csf_set(&tr_req[1].flags, CPPI5_TR_CSF_SUPR_EVT);
+
+		tr_req[1].addr = src + tr0_cnt1 * tr0_cnt0;
+		tr_req[1].icnt0 = tr1_cnt0;
+		tr_req[1].icnt1 = 1;
+		tr_req[1].icnt2 = 1;
+		tr_req[1].icnt3 = 1;
+
+		tr_req[1].daddr = dest + tr0_cnt1 * tr0_cnt0;
+		tr_req[1].dicnt0 = tr1_cnt0;
+		tr_req[1].dicnt1 = 1;
+		tr_req[1].dicnt2 = 1;
+		tr_req[1].dicnt3 = 1;
+	}
+
+	cppi5_tr_csf_set(&tr_req[num_tr - 1].flags, CPPI5_TR_CSF_EOP);
+
+	if (uc->metadata_size)
+		d->vd.tx.metadata_ops = &metadata_ops;
+
+	return vchan_tx_prep(&uc->vc, &d->vd, tx_flags);
+}
+
 static void udma_issue_pending(struct dma_chan *chan)
 {
 	struct udma_chan *uc = to_udma_chan(chan);
-- 
Peter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2 13/14] dmaengine: ti: New driver for K3 UDMA - split#6: Kconfig and Makefile
  2019-07-30  9:34 [PATCH v2 00/14] dmaengine/soc: Add Texas Instruments UDMA support Peter Ujfalusi
                   ` (11 preceding siblings ...)
  2019-07-30  9:34 ` [PATCH v2 12/14] dmaengine: ti: New driver for K3 UDMA - split#5: dma_device callbacks 2 Peter Ujfalusi
@ 2019-07-30  9:34 ` Peter Ujfalusi
  2019-07-30  9:34 ` [PATCH v2 14/14] dmaengine: ti: k3-udma: Add glue layer for non DMAengine users Peter Ujfalusi
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Peter Ujfalusi @ 2019-07-30  9:34 UTC (permalink / raw)
  To: vkoul, robh+dt, nm, ssantosh
  Cc: dan.j.williams, dmaengine, linux-arm-kernel, devicetree,
	linux-kernel, grygorii.strashko, lokeshvutla, t-kristo, tony,
	j-keerthy

Split patch for review containing:
Kconfig and Makefile changes

DMA driver for
Texas Instruments K3 NAVSS Unified DMA – Peripheral Root Complex (UDMA-P)

The UDMA-P is intended to perform similar (but significantly upgraded) functions
as the packet-oriented DMA used on previous SoC devices. The UDMA-P module
supports the transmission and reception of various packet types. The UDMA-P is
architected to facilitate the segmentation and reassembly of SoC DMA data
structure compliant packets to/from smaller data blocks that are natively
compatible with the specific requirements of each connected peripheral. Multiple
Tx and Rx channels are provided within the DMA which allow multiple segmentation
or reassembly operations to be ongoing. The DMA controller maintains state
information for each of the channels which allows packet segmentation and
reassembly operations to be time division multiplexed between channels in order
to share the underlying DMA hardware. An external DMA scheduler is used to
control the ordering and rate at which this multiplexing occurs for Transmit
operations. The ordering and rate of Receive operations is indirectly controlled
by the order in which blocks are pushed into the DMA on the Rx PSI-L interface.

The UDMA-P also supports acting as both a UTC and UDMA-C for its internal
channels. Channels in the UDMA-P can be configured to be either Packet-Based or
Third-Party channels on a channel by channel basis.

The initial driver supports:
- MEM_TO_MEM (TR mode)
- DEV_TO_MEM (Packet / TR mode)
- MEM_TO_DEV (Packet / TR mode)
- Cyclic (Packet / TR mode)
- Metadata for descriptors

Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
---
 drivers/dma/ti/Kconfig  | 13 +++++++++++++
 drivers/dma/ti/Makefile |  1 +
 2 files changed, 14 insertions(+)

diff --git a/drivers/dma/ti/Kconfig b/drivers/dma/ti/Kconfig
index d507c24fbf31..b6b7571be394 100644
--- a/drivers/dma/ti/Kconfig
+++ b/drivers/dma/ti/Kconfig
@@ -34,5 +34,18 @@ config DMA_OMAP
 	  Enable support for the TI sDMA (System DMA or DMA4) controller. This
 	  DMA engine is found on OMAP and DRA7xx parts.
 
+config TI_K3_UDMA
+	tristate "Texas Instruments UDMA support"
+	depends on ARCH_K3 || COMPILE_TEST
+	depends on TI_SCI_PROTOCOL
+	depends on TI_SCI_INTA_IRQCHIP
+	select DMA_ENGINE
+	select DMA_VIRTUAL_CHANNELS
+	select TI_K3_RINGACC
+	default y
+        help
+	  Enable support for the TI UDMA (Unified DMA) controller. This
+	  DMA engine is used in AM65x.
+
 config TI_DMA_CROSSBAR
 	bool
diff --git a/drivers/dma/ti/Makefile b/drivers/dma/ti/Makefile
index 113e59ec9c32..ebd4822e064e 100644
--- a/drivers/dma/ti/Makefile
+++ b/drivers/dma/ti/Makefile
@@ -2,4 +2,5 @@
 obj-$(CONFIG_TI_CPPI41) += cppi41.o
 obj-$(CONFIG_TI_EDMA) += edma.o
 obj-$(CONFIG_DMA_OMAP) += omap-dma.o
+obj-$(CONFIG_TI_K3_UDMA) += k3-udma.o
 obj-$(CONFIG_TI_DMA_CROSSBAR) += dma-crossbar.o
-- 
Peter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2 14/14] dmaengine: ti: k3-udma: Add glue layer for non DMAengine users
  2019-07-30  9:34 [PATCH v2 00/14] dmaengine/soc: Add Texas Instruments UDMA support Peter Ujfalusi
                   ` (12 preceding siblings ...)
  2019-07-30  9:34 ` [PATCH v2 13/14] dmaengine: ti: New driver for K3 UDMA - split#6: Kconfig and Makefile Peter Ujfalusi
@ 2019-07-30  9:34 ` Peter Ujfalusi
  2019-07-31  7:08 ` [PATCH v2 00/14] dmaengine/soc: Add Texas Instruments UDMA support Peter Ujfalusi
  2019-08-30 12:12 ` Peter Ujfalusi
  15 siblings, 0 replies; 32+ messages in thread
From: Peter Ujfalusi @ 2019-07-30  9:34 UTC (permalink / raw)
  To: vkoul, robh+dt, nm, ssantosh
  Cc: dan.j.williams, dmaengine, linux-arm-kernel, devicetree,
	linux-kernel, grygorii.strashko, lokeshvutla, t-kristo, tony,
	j-keerthy

From: Grygorii Strashko <grygorii.strashko@ti.com>

Certain users can not use right now the DMAengine API due to missing
features in the core. Prime example is Networking.

These users can use the glue layer interface to avoid misuse of DMAengine
API and when the core gains the needed features they can be converted to
use generic API.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
---
 drivers/dma/ti/Kconfig           |    9 +
 drivers/dma/ti/Makefile          |    1 +
 drivers/dma/ti/k3-udma-glue.c    | 1039 ++++++++++++++++++++++++++++++
 drivers/dma/ti/k3-udma-private.c |  124 ++++
 drivers/dma/ti/k3-udma.c         |   61 ++
 drivers/dma/ti/k3-udma.h         |   30 +
 include/linux/dma/k3-udma-glue.h |  125 ++++
 7 files changed, 1389 insertions(+)
 create mode 100644 drivers/dma/ti/k3-udma-glue.c
 create mode 100644 drivers/dma/ti/k3-udma-private.c
 create mode 100644 include/linux/dma/k3-udma-glue.h

diff --git a/drivers/dma/ti/Kconfig b/drivers/dma/ti/Kconfig
index b6b7571be394..88f65c2123e9 100644
--- a/drivers/dma/ti/Kconfig
+++ b/drivers/dma/ti/Kconfig
@@ -47,5 +47,14 @@ config TI_K3_UDMA
 	  Enable support for the TI UDMA (Unified DMA) controller. This
 	  DMA engine is used in AM65x.
 
+config TI_K3_UDMA_GLUE_LAYER
+	tristate "Texas Instruments UDMA Glue layer for non DMAengine users"
+	depends on ARCH_K3 || COMPILE_TEST
+	depends on TI_K3_UDMA
+	default y
+	help
+	  Say y here to support the K3 NAVSS DMA glue interface
+	  If unsure, say N.
+
 config TI_DMA_CROSSBAR
 	bool
diff --git a/drivers/dma/ti/Makefile b/drivers/dma/ti/Makefile
index ebd4822e064e..fc6e0a2c7ce9 100644
--- a/drivers/dma/ti/Makefile
+++ b/drivers/dma/ti/Makefile
@@ -3,4 +3,5 @@ obj-$(CONFIG_TI_CPPI41) += cppi41.o
 obj-$(CONFIG_TI_EDMA) += edma.o
 obj-$(CONFIG_DMA_OMAP) += omap-dma.o
 obj-$(CONFIG_TI_K3_UDMA) += k3-udma.o
+obj-$(CONFIG_TI_K3_UDMA_GLUE_LAYER) += k3-udma-glue.o
 obj-$(CONFIG_TI_DMA_CROSSBAR) += dma-crossbar.o
diff --git a/drivers/dma/ti/k3-udma-glue.c b/drivers/dma/ti/k3-udma-glue.c
new file mode 100644
index 000000000000..43284ccb52f4
--- /dev/null
+++ b/drivers/dma/ti/k3-udma-glue.c
@@ -0,0 +1,1039 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * K3 NAVSS DMA glue interface
+ *
+ * Copyright (C) 2019 Texas Instruments Incorporated - http://www.ti.com
+ *
+ */
+
+#include <linux/atomic.h>
+#include <linux/delay.h>
+#include <linux/dma-mapping.h>
+#include <linux/io.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/platform_device.h>
+#include <linux/pm_runtime.h>
+#include <dt-bindings/dma/k3-udma.h>
+#include <linux/soc/ti/k3-ringacc.h>
+#include <linux/dma/ti-cppi5.h>
+#include <linux/dma/k3-udma-glue.h>
+
+#include "k3-udma.h"
+
+struct k3_udma_glue_common {
+	struct device *dev;
+	struct udma_dev *udmax;
+	const struct udma_tisci_rm *tisci_rm;
+	struct k3_ringacc *ringacc;
+	u32 src_thread;
+	u32 dst_thread;
+
+	u32  hdesc_size;
+	bool epib;
+	u32  psdata_size;
+	u32  swdata_size;
+};
+
+struct k3_udma_glue_tx_channel {
+	struct k3_udma_glue_common common;
+
+	struct udma_tchan *udma_tchanx;
+	int udma_tchan_id;
+	bool need_tisci_free;
+
+	struct k3_ring *ringtx;
+	struct k3_ring *ringtxcq;
+
+	bool psil_paired;
+
+	int virq;
+
+	atomic_t free_pkts;
+	bool tx_pause_on_err;
+	bool tx_filt_einfo;
+	bool tx_filt_pswords;
+	bool tx_supr_tdpkt;
+};
+
+/**
+ * k3_udma_glue_rx_flow - UDMA RX flow context data
+ *
+ */
+struct k3_udma_glue_rx_flow {
+	struct udma_rflow *udma_rflow;
+	int udma_rflow_id;
+	struct k3_ring *ringrx;
+	struct k3_ring *ringrxfdq;
+
+	int virq;
+};
+
+struct k3_udma_glue_rx_channel {
+	struct k3_udma_glue_common common;
+
+	struct udma_rchan *udma_rchanx;
+	int udma_rchan_id;
+	bool need_tisci_free;
+
+	bool psil_paired;
+
+	u32  swdata_size;
+	int  flow_id_base;
+
+	struct k3_udma_glue_rx_flow *flows;
+	u32 flow_num;
+	u32 flows_ready;
+};
+
+#define K3_UDMAX_TDOWN_TIMEOUT_US 1000
+
+static int of_k3_udma_glue_parse(struct device_node *udmax_np,
+				 struct k3_udma_glue_common *common)
+{
+	common->ringacc = of_k3_ringacc_get_by_phandle(udmax_np,
+						       "ti,ringacc");
+	if (IS_ERR(common->ringacc))
+		return PTR_ERR(common->ringacc);
+
+	common->udmax = of_xudma_dev_get(udmax_np, NULL);
+	if (IS_ERR(common->udmax))
+		return PTR_ERR(common->udmax);
+
+	common->tisci_rm = xudma_dev_get_tisci_rm(common->udmax);
+
+	return 0;
+}
+
+static int of_k3_udma_glue_parse_chn(struct device_node *chn_np,
+		const char *name, struct k3_udma_glue_common *common,
+		bool tx_chn)
+{
+	struct device_node *psil_cfg_node;
+	struct device_node *ch_cfg_node;
+	struct of_phandle_args dma_spec;
+	int index, ret = 0;
+	char prop[50];
+	u32 val;
+
+	if (unlikely(!name))
+		return -EINVAL;
+
+	index = of_property_match_string(chn_np, "dma-names", name);
+	if (index < 0)
+		return index;
+
+	if (of_parse_phandle_with_args(chn_np, "dmas", "#dma-cells", index,
+				       &dma_spec))
+		return -ENOENT;
+
+	if (tx_chn && dma_spec.args[2] != UDMA_DIR_TX) {
+		ret = -EINVAL;
+		goto out_put_spec;
+	}
+
+	if (!tx_chn && dma_spec.args[2] != UDMA_DIR_RX) {
+		ret = -EINVAL;
+		goto out_put_spec;
+	}
+
+	/* get psil cfg node */
+	psil_cfg_node = of_find_node_by_phandle(dma_spec.args[0]);
+	if (!psil_cfg_node) {
+		ret = -ENOENT;
+		goto out_put_spec;
+	}
+
+	snprintf(prop, sizeof(prop), "ti,psil-config%u", dma_spec.args[1]);
+	ch_cfg_node = of_find_node_by_name(psil_cfg_node, prop);
+	if (!ch_cfg_node) {
+		dev_err(common->dev,
+			"Channel %u configuration node is missing\n",
+			dma_spec.args[1]);
+		goto out_put_psil_cfg;
+	}
+
+	common->epib = of_property_read_bool(ch_cfg_node, "ti,needs-epib");
+
+	if (!of_property_read_u32(ch_cfg_node, "ti,psd-size", &val))
+		common->psdata_size = val;
+
+	ret = of_property_read_u32(psil_cfg_node, "ti,psil-base", &val);
+	if (ret) {
+		dev_err(common->dev, "ti,psil-base is missing %d\n", ret);
+		goto out_ch_cfg;
+	}
+
+	if (tx_chn)
+		common->dst_thread = val + dma_spec.args[1];
+	else
+		common->src_thread = val + dma_spec.args[1];
+	ret = of_k3_udma_glue_parse(dma_spec.np, common);
+
+out_ch_cfg:
+	of_node_put(ch_cfg_node);
+out_put_psil_cfg:
+	of_node_put(psil_cfg_node);
+out_put_spec:
+	of_node_put(dma_spec.np);
+	return ret;
+};
+
+static void k3_udma_glue_dump_tx_chn(struct k3_udma_glue_tx_channel *tx_chn)
+{
+	struct device *dev = tx_chn->common.dev;
+
+	dev_dbg(dev, "dump_tx_chn:\n"
+		"udma_tchan_id: %d\n"
+		"src_thread: %08x\n"
+		"dst_thread: %08x\n",
+		tx_chn->udma_tchan_id,
+		tx_chn->common.src_thread,
+		tx_chn->common.dst_thread);
+}
+
+static void k3_udma_glue_dump_tx_rt_chn(struct k3_udma_glue_tx_channel *chn,
+					char *mark)
+{
+	struct device *dev = chn->common.dev;
+
+	dev_dbg(dev, "=== dump ===> %s\n", mark);
+	dev_dbg(dev, "0x%08X: %08X\n", UDMA_TCHAN_RT_CTL_REG,
+		xudma_tchanrt_read(chn->udma_tchanx, UDMA_TCHAN_RT_CTL_REG));
+	dev_dbg(dev, "0x%08X: %08X\n", UDMA_TCHAN_RT_PEER_RT_EN_REG,
+		xudma_tchanrt_read(chn->udma_tchanx,
+				   UDMA_TCHAN_RT_PEER_RT_EN_REG));
+	dev_dbg(dev, "0x%08X: %08X\n", UDMA_TCHAN_RT_PCNT_REG,
+		xudma_tchanrt_read(chn->udma_tchanx, UDMA_TCHAN_RT_PCNT_REG));
+	dev_dbg(dev, "0x%08X: %08X\n", UDMA_TCHAN_RT_BCNT_REG,
+		xudma_tchanrt_read(chn->udma_tchanx, UDMA_TCHAN_RT_BCNT_REG));
+	dev_dbg(dev, "0x%08X: %08X\n", UDMA_TCHAN_RT_SBCNT_REG,
+		xudma_tchanrt_read(chn->udma_tchanx, UDMA_TCHAN_RT_SBCNT_REG));
+}
+
+static int k3_udma_glue_cfg_tx_chn(struct k3_udma_glue_tx_channel *tx_chn)
+{
+	const struct udma_tisci_rm *tisci_rm = tx_chn->common.tisci_rm;
+	struct ti_sci_msg_rm_udmap_tx_ch_cfg req;
+
+	memset(&req, 0, sizeof(req));
+
+	req.valid_params = TI_SCI_MSG_VALUE_RM_UDMAP_CH_PAUSE_ON_ERR_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_CH_TX_FILT_EINFO_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_CH_TX_FILT_PSWORDS_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_CH_CHAN_TYPE_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_CH_TX_SUPR_TDPKT_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_CH_FETCH_SIZE_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_CH_CQ_QNUM_VALID;
+	req.nav_id = tisci_rm->tisci_dev_id;
+	req.index = tx_chn->udma_tchan_id;
+	if (tx_chn->tx_pause_on_err)
+		req.tx_pause_on_err = 1;
+	if (tx_chn->tx_filt_einfo)
+		req.tx_filt_einfo = 1;
+	if (tx_chn->tx_filt_pswords)
+		req.tx_filt_pswords = 1;
+	req.tx_chan_type = TI_SCI_RM_UDMAP_CHAN_TYPE_PKT_PBRR;
+	if (tx_chn->tx_supr_tdpkt)
+		req.tx_supr_tdpkt = 1;
+	req.tx_fetch_size = tx_chn->common.hdesc_size >> 2;
+	req.txcq_qnum = k3_ringacc_get_ring_id(tx_chn->ringtxcq);
+
+	return tisci_rm->tisci_udmap_ops->tx_ch_cfg(tisci_rm->tisci, &req);
+}
+
+struct k3_udma_glue_tx_channel *k3_udma_glue_request_tx_chn(struct device *dev,
+		const char *name, struct k3_udma_glue_tx_channel_cfg *cfg)
+{
+	struct k3_udma_glue_tx_channel *tx_chn;
+	int ret;
+
+	tx_chn = devm_kzalloc(dev, sizeof(*tx_chn), GFP_KERNEL);
+	if (!tx_chn)
+		return ERR_PTR(-ENOMEM);
+
+	tx_chn->common.dev = dev;
+	tx_chn->common.swdata_size = cfg->swdata_size;
+	tx_chn->tx_pause_on_err = cfg->tx_pause_on_err;
+	tx_chn->tx_filt_einfo = cfg->tx_filt_einfo;
+	tx_chn->tx_filt_pswords = cfg->tx_filt_pswords;
+	tx_chn->tx_supr_tdpkt = cfg->tx_supr_tdpkt;
+
+	/* parse of udmap channel */
+	ret = of_k3_udma_glue_parse_chn(dev->of_node, name,
+					&tx_chn->common, true);
+	if (ret)
+		goto err;
+
+	tx_chn->common.hdesc_size = cppi5_hdesc_calc_size(tx_chn->common.epib,
+						tx_chn->common.psdata_size,
+						tx_chn->common.swdata_size);
+
+	/* request and cfg UDMAP TX channel */
+	tx_chn->udma_tchanx = xudma_tchan_get(tx_chn->common.udmax, -1);
+	if (IS_ERR(tx_chn->udma_tchanx)) {
+		ret = PTR_ERR(tx_chn->udma_tchanx);
+		dev_err(dev, "UDMAX tchanx get err %d\n", ret);
+		goto err;
+	}
+	tx_chn->udma_tchan_id = xudma_tchan_get_id(tx_chn->udma_tchanx);
+
+	atomic_set(&tx_chn->free_pkts, cfg->txcq_cfg.size);
+
+	/* request and cfg rings */
+	tx_chn->ringtx = k3_ringacc_request_ring(tx_chn->common.ringacc,
+						 tx_chn->udma_tchan_id, 0);
+	if (!tx_chn->ringtx) {
+		ret = -ENODEV;
+		dev_err(dev, "Failed to get TX ring %u\n",
+			tx_chn->udma_tchan_id);
+		goto err;
+	}
+
+	tx_chn->ringtxcq = k3_ringacc_request_ring(tx_chn->common.ringacc,
+						   -1, 0);
+	if (!tx_chn->ringtxcq) {
+		ret = -ENODEV;
+		dev_err(dev, "Failed to get TXCQ ring\n");
+		goto err;
+	}
+
+	ret = k3_ringacc_ring_cfg(tx_chn->ringtx, &cfg->tx_cfg);
+	if (ret) {
+		dev_err(dev, "Failed to cfg ringtx %d\n", ret);
+		goto err;
+	}
+
+	ret = k3_ringacc_ring_cfg(tx_chn->ringtxcq, &cfg->txcq_cfg);
+	if (ret) {
+		dev_err(dev, "Failed to cfg ringtx %d\n", ret);
+		goto err;
+	}
+
+	/* request and cfg psi-l */
+	tx_chn->common.src_thread =
+			xudma_dev_get_psil_base(tx_chn->common.udmax) +
+			tx_chn->udma_tchan_id;
+
+	tx_chn->need_tisci_free = false;
+	ret = k3_udma_glue_cfg_tx_chn(tx_chn);
+	if (ret) {
+		dev_err(dev, "Failed to cfg tchan %d\n", ret);
+		goto err;
+	}
+
+	tx_chn->need_tisci_free = true;
+
+	ret = xudma_navss_psil_pair(tx_chn->common.udmax,
+				    tx_chn->common.src_thread,
+				    tx_chn->common.dst_thread);
+	if (ret) {
+		dev_err(dev, "PSI-L request err %d\n", ret);
+		goto err;
+	}
+
+	tx_chn->psil_paired = true;
+
+	k3_udma_glue_dump_tx_chn(tx_chn);
+
+	return tx_chn;
+
+err:
+	k3_udma_glue_release_tx_chn(tx_chn);
+	return ERR_PTR(ret);
+}
+EXPORT_SYMBOL_GPL(k3_udma_glue_request_tx_chn);
+
+void k3_udma_glue_release_tx_chn(struct k3_udma_glue_tx_channel *tx_chn)
+{
+	if (tx_chn->psil_paired) {
+		xudma_navss_psil_unpair(tx_chn->common.udmax,
+					tx_chn->common.src_thread,
+					tx_chn->common.dst_thread);
+		tx_chn->psil_paired = false;
+	}
+
+	if (!IS_ERR_OR_NULL(tx_chn->common.udmax)) {
+		if (tx_chn->need_tisci_free)
+			tx_chn->need_tisci_free = false;
+
+		if (!IS_ERR_OR_NULL(tx_chn->udma_tchanx))
+			xudma_tchan_put(tx_chn->common.udmax,
+					tx_chn->udma_tchanx);
+
+		xudma_dev_put(tx_chn->common.udmax);
+	}
+
+	if (tx_chn->ringtxcq)
+		k3_ringacc_ring_free(tx_chn->ringtxcq);
+
+	if (tx_chn->ringtx)
+		k3_ringacc_ring_free(tx_chn->ringtx);
+}
+EXPORT_SYMBOL_GPL(k3_udma_glue_release_tx_chn);
+
+int k3_udma_glue_push_tx_chn(struct k3_udma_glue_tx_channel *tx_chn,
+			     struct cppi5_host_desc_t *desc_tx,
+			     dma_addr_t desc_dma)
+{
+	u32 ringtxcq_id;
+
+	if (!atomic_add_unless(&tx_chn->free_pkts, -1, 0))
+		return -ENOMEM;
+
+	ringtxcq_id = k3_ringacc_get_ring_id(tx_chn->ringtxcq);
+	cppi5_desc_set_retpolicy(&desc_tx->hdr, 0, ringtxcq_id);
+
+	return k3_ringacc_ring_push(tx_chn->ringtx, &desc_dma);
+}
+EXPORT_SYMBOL_GPL(k3_udma_glue_push_tx_chn);
+
+int k3_udma_glue_pop_tx_chn(struct k3_udma_glue_tx_channel *tx_chn,
+			    dma_addr_t *desc_dma)
+{
+	int ret;
+
+	ret = k3_ringacc_ring_pop(tx_chn->ringtxcq, desc_dma);
+	if (!ret)
+		atomic_inc(&tx_chn->free_pkts);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(k3_udma_glue_pop_tx_chn);
+
+int k3_udma_glue_enable_tx_chn(struct k3_udma_glue_tx_channel *tx_chn)
+{
+	u32 txrt_ctl;
+
+	txrt_ctl = UDMA_PEER_RT_EN_ENABLE;
+	xudma_tchanrt_write(tx_chn->udma_tchanx,
+			    UDMA_TCHAN_RT_PEER_RT_EN_REG,
+			    txrt_ctl);
+
+	txrt_ctl = xudma_tchanrt_read(tx_chn->udma_tchanx,
+				      UDMA_TCHAN_RT_CTL_REG);
+	txrt_ctl |= UDMA_CHAN_RT_CTL_EN;
+	xudma_tchanrt_write(tx_chn->udma_tchanx, UDMA_TCHAN_RT_CTL_REG,
+			    txrt_ctl);
+
+	k3_udma_glue_dump_tx_rt_chn(tx_chn, "txchn en");
+	return 0;
+}
+EXPORT_SYMBOL_GPL(k3_udma_glue_enable_tx_chn);
+
+void k3_udma_glue_disable_tx_chn(struct k3_udma_glue_tx_channel *tx_chn)
+{
+	k3_udma_glue_dump_tx_rt_chn(tx_chn, "txchn dis1");
+
+	xudma_tchanrt_write(tx_chn->udma_tchanx, UDMA_TCHAN_RT_CTL_REG, 0);
+
+	xudma_tchanrt_write(tx_chn->udma_tchanx,
+			    UDMA_TCHAN_RT_PEER_RT_EN_REG, 0);
+	k3_udma_glue_dump_tx_rt_chn(tx_chn, "txchn dis2");
+}
+EXPORT_SYMBOL_GPL(k3_udma_glue_disable_tx_chn);
+
+void k3_udma_glue_tdown_tx_chn(struct k3_udma_glue_tx_channel *tx_chn,
+			       bool sync)
+{
+	int i = 0;
+	u32 val;
+
+	k3_udma_glue_dump_tx_rt_chn(tx_chn, "txchn tdown1");
+
+	xudma_tchanrt_write(tx_chn->udma_tchanx, UDMA_TCHAN_RT_CTL_REG,
+			    UDMA_CHAN_RT_CTL_EN | UDMA_CHAN_RT_CTL_TDOWN);
+
+	val = xudma_tchanrt_read(tx_chn->udma_tchanx, UDMA_TCHAN_RT_CTL_REG);
+
+	while (sync && (val & UDMA_CHAN_RT_CTL_EN)) {
+		val = xudma_tchanrt_read(tx_chn->udma_tchanx,
+					 UDMA_TCHAN_RT_CTL_REG);
+		udelay(1);
+		if (i > K3_UDMAX_TDOWN_TIMEOUT_US) {
+			dev_err(tx_chn->common.dev, "TX tdown timeout\n");
+			break;
+		}
+		i++;
+	}
+
+	val = xudma_tchanrt_read(tx_chn->udma_tchanx,
+				 UDMA_TCHAN_RT_PEER_RT_EN_REG);
+	if (sync && (val & UDMA_PEER_RT_EN_ENABLE))
+		dev_err(tx_chn->common.dev, "TX tdown peer not stopped\n");
+	k3_udma_glue_dump_tx_rt_chn(tx_chn, "txchn tdown2");
+}
+EXPORT_SYMBOL_GPL(k3_udma_glue_tdown_tx_chn);
+
+void k3_udma_glue_reset_tx_chn(struct k3_udma_glue_tx_channel *tx_chn,
+			       void *data,
+			       void (*cleanup)(void *data, dma_addr_t desc_dma))
+{
+	dma_addr_t desc_dma;
+	int occ_tx, i, ret;
+
+	/* reset TXCQ as it is not input for udma - expected to be empty */
+	if (tx_chn->ringtxcq)
+		k3_ringacc_ring_reset(tx_chn->ringtxcq);
+
+	/*
+	 * TXQ reset need to be special way as it is input for udma and its
+	 * state cached by udma, so:
+	 * 1) save TXQ occ
+	 * 2) clean up TXQ and call callback .cleanup() for each desc
+	 * 3) reset TXQ in a special way
+	 */
+	occ_tx = k3_ringacc_ring_get_occ(tx_chn->ringtx);
+	dev_dbg(tx_chn->common.dev, "TX reset occ_tx %u\n", occ_tx);
+
+	for (i = 0; i < occ_tx; i++) {
+		ret = k3_ringacc_ring_pop(tx_chn->ringtx, &desc_dma);
+		if (ret) {
+			dev_err(tx_chn->common.dev, "TX reset pop %d\n", ret);
+			break;
+		}
+		cleanup(data, desc_dma);
+	}
+
+	k3_ringacc_ring_reset_dma(tx_chn->ringtx, occ_tx);
+}
+EXPORT_SYMBOL_GPL(k3_udma_glue_reset_tx_chn);
+
+u32 k3_udma_glue_tx_get_hdesc_size(struct k3_udma_glue_tx_channel *tx_chn)
+{
+	return tx_chn->common.hdesc_size;
+}
+EXPORT_SYMBOL_GPL(k3_udma_glue_tx_get_hdesc_size);
+
+u32 k3_udma_glue_tx_get_txcq_id(struct k3_udma_glue_tx_channel *tx_chn)
+{
+	return k3_ringacc_get_ring_id(tx_chn->ringtxcq);
+}
+EXPORT_SYMBOL_GPL(k3_udma_glue_tx_get_txcq_id);
+
+int k3_udma_glue_tx_get_irq(struct k3_udma_glue_tx_channel *tx_chn)
+{
+	tx_chn->virq = k3_ringacc_get_ring_irq_num(tx_chn->ringtxcq);
+
+	return tx_chn->virq;
+}
+EXPORT_SYMBOL_GPL(k3_udma_glue_tx_get_irq);
+
+static int k3_udma_glue_cfg_rx_chn(struct k3_udma_glue_rx_channel *rx_chn)
+{
+	const struct udma_tisci_rm *tisci_rm = rx_chn->common.tisci_rm;
+	struct ti_sci_msg_rm_udmap_rx_ch_cfg req;
+	int ret;
+
+	memset(&req, 0, sizeof(req));
+
+	req.valid_params = TI_SCI_MSG_VALUE_RM_UDMAP_CH_FETCH_SIZE_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_CH_CQ_QNUM_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_CH_CHAN_TYPE_VALID;
+
+	req.nav_id = tisci_rm->tisci_dev_id;
+	req.index = rx_chn->udma_rchan_id;
+	req.rx_fetch_size = rx_chn->common.hdesc_size >> 2;
+	/*
+	 * TODO: we can't support rxcq_qnum/RCHAN[a]_RCQ cfg with current sysfw
+	 * and udmax impl, so just configure it to invalid value.
+	 * req.rxcq_qnum = k3_ringacc_get_ring_id(rx_chn->flows[0].ringrx);
+	 */
+	req.rxcq_qnum = 0xFFFF;
+	if (rx_chn->flow_num && rx_chn->flow_id_base != rx_chn->udma_rchan_id) {
+		/* Default flow + extra ones */
+		req.flowid_start = rx_chn->flow_id_base;
+		req.flowid_cnt = rx_chn->flow_num;
+		req.valid_params |=
+			TI_SCI_MSG_VALUE_RM_UDMAP_CH_RX_FLOWID_START_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_CH_RX_FLOWID_CNT_VALID;
+	}
+	req.rx_chan_type = TI_SCI_RM_UDMAP_CHAN_TYPE_PKT_PBRR;
+
+	ret = tisci_rm->tisci_udmap_ops->rx_ch_cfg(tisci_rm->tisci, &req);
+	if (ret)
+		dev_err(rx_chn->common.dev, "rchan%d cfg failed %d\n",
+			rx_chn->udma_rchan_id, ret);
+
+	return ret;
+}
+
+static void k3_udma_glue_release_rx_flow(struct k3_udma_glue_rx_channel *rx_chn,
+					 u32 flow_num)
+{
+	struct k3_udma_glue_rx_flow *flow = &rx_chn->flows[flow_num];
+
+	if (IS_ERR_OR_NULL(flow->udma_rflow))
+		return;
+
+	if (flow->ringrxfdq)
+		k3_ringacc_ring_free(flow->ringrxfdq);
+
+	if (flow->ringrx)
+		k3_ringacc_ring_free(flow->ringrx);
+
+	xudma_rflow_put(rx_chn->common.udmax, flow->udma_rflow);
+	rx_chn->flows_ready--;
+}
+
+static int k3_udma_glue_cfg_rx_flow(struct k3_udma_glue_rx_channel *rx_chn,
+				    u32 flow_idx,
+				    struct k3_udma_glue_rx_flow_cfg *flow_cfg)
+{
+	struct k3_udma_glue_rx_flow *flow = &rx_chn->flows[flow_idx];
+	const struct udma_tisci_rm *tisci_rm = rx_chn->common.tisci_rm;
+	struct device *dev = rx_chn->common.dev;
+	struct ti_sci_msg_rm_udmap_flow_cfg req;
+	int rx_ring_id;
+	int rx_ringfdq_id;
+	int ret = 0;
+
+	flow->udma_rflow = xudma_rflow_get(rx_chn->common.udmax,
+					   flow->udma_rflow_id);
+	if (IS_ERR(flow->udma_rflow)) {
+		ret = PTR_ERR(flow->udma_rflow);
+		dev_err(dev, "UDMAX rflow get err %d\n", ret);
+		goto err;
+	}
+
+	if (flow->udma_rflow_id != xudma_rflow_get_id(flow->udma_rflow)) {
+		xudma_rflow_put(rx_chn->common.udmax, flow->udma_rflow);
+		return -ENODEV;
+	}
+
+	/* request and cfg rings */
+	flow->ringrx = k3_ringacc_request_ring(rx_chn->common.ringacc,
+					       flow_cfg->ring_rxq_id, 0);
+	if (!flow->ringrx) {
+		ret = -ENODEV;
+		dev_err(dev, "Failed to get RX ring\n");
+		goto err;
+	}
+
+	flow->ringrxfdq = k3_ringacc_request_ring(rx_chn->common.ringacc,
+						  flow_cfg->ring_rxfdq0_id, 0);
+	if (!flow->ringrxfdq) {
+		ret = -ENODEV;
+		dev_err(dev, "Failed to get RXFDQ ring\n");
+		goto err;
+	}
+
+	ret = k3_ringacc_ring_cfg(flow->ringrx, &flow_cfg->rx_cfg);
+	if (ret) {
+		dev_err(dev, "Failed to cfg ringrx %d\n", ret);
+		goto err;
+	}
+
+	ret = k3_ringacc_ring_cfg(flow->ringrxfdq, &flow_cfg->rxfdq_cfg);
+	if (ret) {
+		dev_err(dev, "Failed to cfg ringrxfdq %d\n", ret);
+		goto err;
+	}
+
+	rx_ring_id = k3_ringacc_get_ring_id(flow->ringrx);
+	rx_ringfdq_id = k3_ringacc_get_ring_id(flow->ringrxfdq);
+
+	memset(&req, 0, sizeof(req));
+
+	req.valid_params =
+			TI_SCI_MSG_VALUE_RM_UDMAP_FLOW_EINFO_PRESENT_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_FLOW_PSINFO_PRESENT_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_FLOW_ERROR_HANDLING_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_FLOW_DESC_TYPE_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_FLOW_DEST_QNUM_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_FLOW_SRC_TAG_HI_SEL_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_FLOW_SRC_TAG_LO_SEL_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_FLOW_DEST_TAG_HI_SEL_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_FLOW_DEST_TAG_LO_SEL_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_FLOW_FDQ0_SZ0_QNUM_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_FLOW_FDQ1_QNUM_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_FLOW_FDQ2_QNUM_VALID |
+			TI_SCI_MSG_VALUE_RM_UDMAP_FLOW_FDQ3_QNUM_VALID;
+	req.nav_id = tisci_rm->tisci_dev_id;
+	req.flow_index = flow->udma_rflow_id;
+	if (rx_chn->common.epib)
+		req.rx_einfo_present = 1;
+	if (rx_chn->common.psdata_size)
+		req.rx_psinfo_present = 1;
+	if (flow_cfg->rx_error_handling)
+		req.rx_error_handling = 1;
+	req.rx_desc_type = 0;
+	req.rx_dest_qnum = rx_ring_id;
+	req.rx_src_tag_hi_sel = 0;
+	req.rx_src_tag_lo_sel = flow_cfg->src_tag_lo_sel;
+	req.rx_dest_tag_hi_sel = 0;
+	req.rx_dest_tag_lo_sel = 0;
+	req.rx_fdq0_sz0_qnum = rx_ringfdq_id;
+	req.rx_fdq1_qnum = rx_ringfdq_id;
+	req.rx_fdq2_qnum = rx_ringfdq_id;
+	req.rx_fdq3_qnum = rx_ringfdq_id;
+
+	ret = tisci_rm->tisci_udmap_ops->rx_flow_cfg(tisci_rm->tisci, &req);
+	if (ret) {
+		dev_err(dev, "flow%d config failed: %d\n", flow->udma_rflow_id,
+			ret);
+		goto err;
+	}
+
+	rx_chn->flows_ready++;
+
+	return 0;
+err:
+	k3_udma_glue_release_rx_flow(rx_chn, flow_idx);
+	return ret;
+}
+
+static void k3_udma_glue_dump_rx_chn(struct k3_udma_glue_rx_channel *chn)
+{
+	struct device *dev = chn->common.dev;
+
+	dev_dbg(dev, "dump_rx_chn:\n"
+		"udma_rchan_id: %d\n"
+		"src_thread: %08x\n"
+		"dst_thread: %08x\n"
+		"epib: %d\n"
+		"hdesc_size: %u\n"
+		"psdata_size: %u\n"
+		"swdata_size: %u\n"
+		"flow_id_base: %d\n"
+		"flow_num: %d\n",
+		chn->udma_rchan_id,
+		chn->common.src_thread,
+		chn->common.dst_thread,
+		chn->common.epib,
+		chn->common.hdesc_size,
+		chn->common.psdata_size,
+		chn->common.swdata_size,
+		chn->flow_id_base,
+		chn->flow_num);
+}
+
+static void k3_udma_glue_dump_rx_rt_chn(struct k3_udma_glue_rx_channel *chn,
+					char *mark)
+{
+	struct device *dev = chn->common.dev;
+
+	dev_dbg(dev, "=== dump ===> %s\n", mark);
+	dev_dbg(dev, "0x%08X: %08X\n", UDMA_RCHAN_RT_CTL_REG,
+		xudma_rchanrt_read(chn->udma_rchanx, UDMA_RCHAN_RT_CTL_REG));
+	dev_dbg(dev, "0x%08X: %08X\n", UDMA_RCHAN_RT_PEER_RT_EN_REG,
+		xudma_rchanrt_read(chn->udma_rchanx,
+				   UDMA_RCHAN_RT_PEER_RT_EN_REG));
+	dev_dbg(dev, "0x%08X: %08X\n", UDMA_RCHAN_RT_PCNT_REG,
+		xudma_rchanrt_read(chn->udma_rchanx, UDMA_RCHAN_RT_PCNT_REG));
+	dev_dbg(dev, "0x%08X: %08X\n", UDMA_RCHAN_RT_BCNT_REG,
+		xudma_rchanrt_read(chn->udma_rchanx, UDMA_RCHAN_RT_BCNT_REG));
+	dev_dbg(dev, "0x%08X: %08X\n", UDMA_RCHAN_RT_SBCNT_REG,
+		xudma_rchanrt_read(chn->udma_rchanx, UDMA_RCHAN_RT_SBCNT_REG));
+}
+
+struct k3_udma_glue_rx_channel *k3_udma_glue_request_rx_chn(struct device *dev,
+		const char *name, struct k3_udma_glue_rx_channel_cfg *cfg)
+{
+	struct k3_udma_glue_rx_channel *rx_chn;
+	int ret, i;
+
+	if (cfg->flow_id_num <= 0)
+		return ERR_PTR(-EINVAL);
+
+	if (cfg->flow_id_num != 1 &&
+	    (cfg->def_flow_cfg || cfg->flow_id_use_rxchan_id))
+		return ERR_PTR(-EINVAL);
+
+	rx_chn = devm_kzalloc(dev, sizeof(*rx_chn), GFP_KERNEL);
+	if (!rx_chn)
+		return ERR_PTR(-ENOMEM);
+
+	rx_chn->common.dev = dev;
+	rx_chn->common.swdata_size = cfg->swdata_size;
+
+	/* parse of udmap channel */
+	ret = of_k3_udma_glue_parse_chn(dev->of_node, name,
+					&rx_chn->common, false);
+	if (ret)
+		goto err;
+
+	rx_chn->common.hdesc_size = cppi5_hdesc_calc_size(rx_chn->common.epib,
+						rx_chn->common.psdata_size,
+						rx_chn->common.swdata_size);
+
+	/* request and cfg UDMAP RX channel */
+	rx_chn->udma_rchanx = xudma_rchan_get(rx_chn->common.udmax, -1);
+	if (IS_ERR(rx_chn->udma_rchanx)) {
+		ret = PTR_ERR(rx_chn->udma_rchanx);
+		dev_err(dev, "UDMAX rchanx get err %d\n", ret);
+		goto err;
+	}
+	rx_chn->udma_rchan_id = xudma_rchan_get_id(rx_chn->udma_rchanx);
+
+	rx_chn->flow_num = cfg->flow_id_num;
+	rx_chn->flow_id_base = cfg->flow_id_base;
+
+	/* Use RX channel id as flow id: target dev can't generate flow_id */
+	if (cfg->flow_id_use_rxchan_id)
+		rx_chn->flow_id_base = rx_chn->udma_rchan_id;
+
+	rx_chn->flows = devm_kcalloc(dev, rx_chn->flow_num,
+				     sizeof(*rx_chn->flows), GFP_KERNEL);
+	if (!rx_chn->flows) {
+		ret = -ENOMEM;
+		goto err;
+	}
+
+	/* Reserve range of RX flows */
+	if (!cfg->flow_id_use_rxchan_id) {
+		ret = xudma_reserve_rflow_range(rx_chn->common.udmax,
+						rx_chn->flow_id_base,
+						rx_chn->flow_num);
+		if (ret < 0) {
+			dev_err(dev, "UDMAX reserve_rflow get err %d\n", ret);
+			goto err;
+		}
+		rx_chn->flow_id_base = ret;
+	}
+
+	for (i = 0; i < rx_chn->flow_num; i++)
+		rx_chn->flows[i].udma_rflow_id = rx_chn->flow_id_base + i;
+
+	/* request and cfg psi-l */
+	rx_chn->common.dst_thread =
+			xudma_dev_get_psil_base(rx_chn->common.udmax) +
+			rx_chn->udma_rchan_id;
+
+	rx_chn->need_tisci_free = false;
+	ret = k3_udma_glue_cfg_rx_chn(rx_chn);
+	if (ret) {
+		dev_err(dev, "Failed to cfg rchan %d\n", ret);
+		goto err;
+	}
+	rx_chn->need_tisci_free = true;
+
+	/* init default RX flow only if flow_num = 1 */
+	if (cfg->def_flow_cfg) {
+		ret = k3_udma_glue_cfg_rx_flow(rx_chn, 0, cfg->def_flow_cfg);
+		if (ret)
+			goto err;
+	}
+
+	if (!cfg->skip_psil) {
+		ret = xudma_navss_psil_pair(rx_chn->common.udmax,
+					    rx_chn->common.src_thread,
+					    rx_chn->common.dst_thread);
+		if (ret) {
+			dev_err(dev, "PSI-L request err %d\n", ret);
+			goto err;
+		}
+
+		rx_chn->psil_paired = true;
+	}
+
+	k3_udma_glue_dump_rx_chn(rx_chn);
+
+	return rx_chn;
+
+err:
+	k3_udma_glue_release_rx_chn(rx_chn);
+	return ERR_PTR(ret);
+}
+EXPORT_SYMBOL_GPL(k3_udma_glue_request_rx_chn);
+
+void k3_udma_glue_release_rx_chn(struct k3_udma_glue_rx_channel *rx_chn)
+{
+	int i;
+
+	if (rx_chn->psil_paired) {
+		xudma_navss_psil_unpair(rx_chn->common.udmax,
+					rx_chn->common.src_thread,
+					rx_chn->common.dst_thread);
+		rx_chn->psil_paired = false;
+	}
+
+	if (IS_ERR_OR_NULL(rx_chn->common.udmax))
+		return;
+
+	for (i = 0; i < rx_chn->flow_num; i++)
+		k3_udma_glue_release_rx_flow(rx_chn, i);
+
+	if (rx_chn->need_tisci_free)
+		rx_chn->need_tisci_free = false;
+
+	xudma_free_rflow_range(rx_chn->common.udmax,
+			       rx_chn->flow_id_base, rx_chn->flow_num);
+
+	if (!IS_ERR_OR_NULL(rx_chn->udma_rchanx))
+		xudma_rchan_put(rx_chn->common.udmax,
+				rx_chn->udma_rchanx);
+
+	xudma_dev_put(rx_chn->common.udmax);
+}
+EXPORT_SYMBOL_GPL(k3_udma_glue_release_rx_chn);
+
+int k3_udma_glue_rx_flow_init(struct k3_udma_glue_rx_channel *rx_chn,
+			      u32 flow_idx,
+			      struct k3_udma_glue_rx_flow_cfg *flow_cfg)
+{
+	if (flow_idx >= rx_chn->flow_num)
+		return -EINVAL;
+
+	return k3_udma_glue_cfg_rx_flow(rx_chn, flow_idx, flow_cfg);
+}
+EXPORT_SYMBOL_GPL(k3_udma_glue_rx_flow_init);
+
+u32 k3_udma_glue_rx_flow_get_fdq_id(struct k3_udma_glue_rx_channel *rx_chn,
+				    u32 flow_idx)
+{
+	struct k3_udma_glue_rx_flow *flow;
+
+	if (flow_idx >= rx_chn->flow_num)
+		return -EINVAL;
+
+	flow = &rx_chn->flows[flow_idx];
+
+	return k3_ringacc_get_ring_id(flow->ringrxfdq);
+}
+EXPORT_SYMBOL_GPL(k3_udma_glue_rx_flow_get_fdq_id);
+
+u32 k3_udma_glue_rx_get_flow_id_base(struct k3_udma_glue_rx_channel *rx_chn)
+{
+	return rx_chn->flow_id_base;
+}
+EXPORT_SYMBOL_GPL(k3_udma_glue_rx_get_flow_id_base);
+
+int k3_udma_glue_enable_rx_chn(struct k3_udma_glue_rx_channel *rx_chn)
+{
+	u32 rxrt_ctl;
+
+	if (rx_chn->flows_ready < rx_chn->flow_num)
+		return -EINVAL;
+
+	rxrt_ctl = xudma_rchanrt_read(rx_chn->udma_rchanx,
+				      UDMA_RCHAN_RT_CTL_REG);
+	rxrt_ctl |= UDMA_CHAN_RT_CTL_EN;
+	xudma_rchanrt_write(rx_chn->udma_rchanx, UDMA_RCHAN_RT_CTL_REG,
+			    rxrt_ctl);
+
+	xudma_rchanrt_write(rx_chn->udma_rchanx,
+			    UDMA_RCHAN_RT_PEER_RT_EN_REG,
+			    UDMA_PEER_RT_EN_ENABLE);
+
+	k3_udma_glue_dump_rx_rt_chn(rx_chn, "rxrt en");
+	return 0;
+}
+EXPORT_SYMBOL_GPL(k3_udma_glue_enable_rx_chn);
+
+void k3_udma_glue_disable_rx_chn(struct k3_udma_glue_rx_channel *rx_chn)
+{
+	k3_udma_glue_dump_rx_rt_chn(rx_chn, "rxrt dis1");
+
+	xudma_rchanrt_write(rx_chn->udma_rchanx,
+			    UDMA_RCHAN_RT_PEER_RT_EN_REG,
+			    0);
+	xudma_rchanrt_write(rx_chn->udma_rchanx, UDMA_RCHAN_RT_CTL_REG, 0);
+
+	k3_udma_glue_dump_rx_rt_chn(rx_chn, "rxrt dis2");
+}
+EXPORT_SYMBOL_GPL(k3_udma_glue_disable_rx_chn);
+
+void k3_udma_glue_tdown_rx_chn(struct k3_udma_glue_rx_channel *rx_chn,
+			       bool sync)
+{
+	int i = 0;
+	u32 val;
+
+	k3_udma_glue_dump_rx_rt_chn(rx_chn, "rxrt tdown1");
+
+	xudma_rchanrt_write(rx_chn->udma_rchanx, UDMA_RCHAN_RT_PEER_RT_EN_REG,
+			    UDMA_PEER_RT_EN_ENABLE | UDMA_PEER_RT_EN_TEARDOWN);
+
+	val = xudma_rchanrt_read(rx_chn->udma_rchanx, UDMA_RCHAN_RT_CTL_REG);
+
+	while (sync && (val & UDMA_CHAN_RT_CTL_EN)) {
+		val = xudma_rchanrt_read(rx_chn->udma_rchanx,
+					 UDMA_RCHAN_RT_CTL_REG);
+		udelay(1);
+		if (i > K3_UDMAX_TDOWN_TIMEOUT_US) {
+			dev_err(rx_chn->common.dev, "RX tdown timeout\n");
+			break;
+		}
+		i++;
+	}
+
+	val = xudma_rchanrt_read(rx_chn->udma_rchanx,
+				 UDMA_RCHAN_RT_PEER_RT_EN_REG);
+	if (sync && (val & UDMA_PEER_RT_EN_ENABLE))
+		dev_err(rx_chn->common.dev, "TX tdown peer not stopped\n");
+	k3_udma_glue_dump_rx_rt_chn(rx_chn, "rxrt tdown2");
+}
+EXPORT_SYMBOL_GPL(k3_udma_glue_tdown_rx_chn);
+
+void k3_udma_glue_reset_rx_chn(struct k3_udma_glue_rx_channel *rx_chn,
+		u32 flow_num, void *data,
+		void (*cleanup)(void *data, dma_addr_t desc_dma), bool skip_fdq)
+{
+	struct k3_udma_glue_rx_flow *flow = &rx_chn->flows[flow_num];
+	struct device *dev = rx_chn->common.dev;
+	dma_addr_t desc_dma;
+	int occ_rx, i, ret;
+
+	/* reset RXCQ as it is not input for udma - expected to be empty */
+	if (flow->ringrx)
+		k3_ringacc_ring_reset(flow->ringrx);
+
+	/* Skip RX FDQ in case one FDQ is used for the set of flows */
+	if (skip_fdq)
+		return;
+
+	/*
+	 * RX FDQ reset need to be special way as it is input for udma and its
+	 * state cached by udma, so:
+	 * 1) save RX FDQ occ
+	 * 2) clean up RX FDQ and call callback .cleanup() for each desc
+	 * 3) reset RX FDQ in a special way
+	 */
+	occ_rx = k3_ringacc_ring_get_occ(flow->ringrxfdq);
+	dev_dbg(dev, "RX reset flow %u occ_tx %u\n", flow_num, occ_rx);
+
+	for (i = 0; i < occ_rx; i++) {
+		ret = k3_ringacc_ring_pop(flow->ringrxfdq, &desc_dma);
+		if (ret) {
+			dev_err(dev, "RX reset pop %d\n", ret);
+			break;
+		}
+		cleanup(data, desc_dma);
+	}
+
+	k3_ringacc_ring_reset_dma(flow->ringrxfdq, occ_rx);
+}
+EXPORT_SYMBOL_GPL(k3_udma_glue_reset_rx_chn);
+
+int k3_udma_glue_push_rx_chn(struct k3_udma_glue_rx_channel *rx_chn,
+			     u32 flow_num, struct cppi5_host_desc_t *desc_rx,
+			     dma_addr_t desc_dma)
+{
+	struct k3_udma_glue_rx_flow *flow = &rx_chn->flows[flow_num];
+
+	return k3_ringacc_ring_push(flow->ringrxfdq, &desc_dma);
+}
+EXPORT_SYMBOL_GPL(k3_udma_glue_push_rx_chn);
+
+int k3_udma_glue_pop_rx_chn(struct k3_udma_glue_rx_channel *rx_chn,
+			    u32 flow_num, dma_addr_t *desc_dma)
+{
+	struct k3_udma_glue_rx_flow *flow = &rx_chn->flows[flow_num];
+
+	return k3_ringacc_ring_pop(flow->ringrx, desc_dma);
+}
+EXPORT_SYMBOL_GPL(k3_udma_glue_pop_rx_chn);
+
+int k3_udma_glue_rx_get_irq(struct k3_udma_glue_rx_channel *rx_chn,
+			    u32 flow_num)
+{
+	struct k3_udma_glue_rx_flow *flow;
+
+	flow = &rx_chn->flows[flow_num];
+
+	flow->virq = k3_ringacc_get_ring_irq_num(flow->ringrx);
+
+	return flow->virq;
+}
+EXPORT_SYMBOL_GPL(k3_udma_glue_rx_get_irq);
diff --git a/drivers/dma/ti/k3-udma-private.c b/drivers/dma/ti/k3-udma-private.c
new file mode 100644
index 000000000000..2b75f9d0af10
--- /dev/null
+++ b/drivers/dma/ti/k3-udma-private.c
@@ -0,0 +1,124 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ *  Copyright (C) 2019 Texas Instruments Incorporated - http://www.ti.com
+ *  Author: Peter Ujfalusi <peter.ujfalusi@ti.com>
+ */
+
+int xudma_navss_psil_pair(struct udma_dev *ud, u32 src_thread, u32 dst_thread)
+{
+	return navss_psil_pair(ud, src_thread, dst_thread);
+}
+EXPORT_SYMBOL(xudma_navss_psil_pair);
+
+int xudma_navss_psil_unpair(struct udma_dev *ud, u32 src_thread, u32 dst_thread)
+{
+	return navss_psil_unpair(ud, src_thread, dst_thread);
+}
+EXPORT_SYMBOL(xudma_navss_psil_unpair);
+
+struct udma_dev *of_xudma_dev_get(struct device_node *np, const char *property)
+{
+	struct device_node *udma_node = np;
+	struct platform_device *pdev;
+	struct udma_dev *ud;
+
+	if (property) {
+		udma_node = of_parse_phandle(np, property, 0);
+		if (!udma_node) {
+			pr_err("UDMA node is not found\n");
+			return ERR_PTR(-ENODEV);
+		}
+	}
+
+	pdev = of_find_device_by_node(udma_node);
+	if (!pdev) {
+		pr_err("UDMA device not found\n");
+		return ERR_PTR(-EPROBE_DEFER);
+	}
+
+	if (np != udma_node)
+		of_node_put(udma_node);
+
+	ud = platform_get_drvdata(pdev);
+	if (!ud) {
+		pr_err("UDMA has not been proped\n");
+		return ERR_PTR(-EPROBE_DEFER);
+	}
+
+	pm_runtime_get_sync(&pdev->dev);
+
+	return ud;
+}
+EXPORT_SYMBOL(of_xudma_dev_get);
+
+void xudma_dev_put(struct udma_dev *ud)
+{
+	pm_runtime_put_sync(ud->ddev.dev);
+}
+EXPORT_SYMBOL(xudma_dev_put);
+
+u32 xudma_dev_get_psil_base(struct udma_dev *ud)
+{
+	return ud->psil_base;
+}
+EXPORT_SYMBOL(xudma_dev_get_psil_base);
+
+struct udma_tisci_rm *xudma_dev_get_tisci_rm(struct udma_dev *ud)
+{
+	return &ud->tisci_rm;
+}
+EXPORT_SYMBOL(xudma_dev_get_tisci_rm);
+
+int xudma_reserve_rflow_range(struct udma_dev *ud, int from, int cnt)
+{
+	return __udma_reserve_rflow_range(ud, from, cnt);
+}
+EXPORT_SYMBOL(xudma_reserve_rflow_range);
+
+int xudma_free_rflow_range(struct udma_dev *ud, int from, int cnt)
+{
+	return __udma_free_rflow_range(ud, from, cnt);
+}
+EXPORT_SYMBOL(xudma_free_rflow_range);
+
+#define XUDMA_GET_PUT_RESOURCE(res)					\
+struct udma_##res *xudma_##res##_get(struct udma_dev *ud, int id)	\
+{									\
+	return __udma_reserve_##res(ud, false, id);			\
+}									\
+EXPORT_SYMBOL(xudma_##res##_get);					\
+									\
+void xudma_##res##_put(struct udma_dev *ud, struct udma_##res *p)	\
+{									\
+	clear_bit(p->id, ud->res##_map);				\
+}									\
+EXPORT_SYMBOL(xudma_##res##_put)
+XUDMA_GET_PUT_RESOURCE(tchan);
+XUDMA_GET_PUT_RESOURCE(rchan);
+XUDMA_GET_PUT_RESOURCE(rflow);
+
+#define XUDMA_GET_RESOURCE_ID(res)					\
+int xudma_##res##_get_id(struct udma_##res *p)				\
+{									\
+	return p->id;							\
+}									\
+EXPORT_SYMBOL(xudma_##res##_get_id)
+XUDMA_GET_RESOURCE_ID(tchan);
+XUDMA_GET_RESOURCE_ID(rchan);
+XUDMA_GET_RESOURCE_ID(rflow);
+
+/* Exported register access functions */
+#define XUDMA_RT_IO_FUNCTIONS(res)					\
+u32 xudma_##res##rt_read(struct udma_##res *p, int reg)			\
+{									\
+	return udma_##res##rt_read(p, reg);				\
+}									\
+EXPORT_SYMBOL(xudma_##res##rt_read);					\
+									\
+void xudma_##res##rt_write(struct udma_##res *p, int reg, u32 val)	\
+{									\
+	udma_##res##rt_write(p, reg, val);				\
+}									\
+EXPORT_SYMBOL(xudma_##res##rt_write)
+XUDMA_RT_IO_FUNCTIONS(tchan);
+XUDMA_RT_IO_FUNCTIONS(rchan);
diff --git a/drivers/dma/ti/k3-udma.c b/drivers/dma/ti/k3-udma.c
index bdd7652b01cf..54133a1ececc 100644
--- a/drivers/dma/ti/k3-udma.c
+++ b/drivers/dma/ti/k3-udma.c
@@ -1039,6 +1039,64 @@ static irqreturn_t udma_udma_irq_handler(int irq, void *data)
 	return IRQ_HANDLED;
 }
 
+/**
+ * __udma_reserve_rflow_range - reserve range of flow ids
+ * @ud: UDMA device
+ * @from: Start the search from this flow id number
+ * @cnt: Number of consecutive flow ids to allocate
+ *
+ * Reserve range of flow ids for future use, those flows can be allocated
+ * only using explicit flow id number. if @from is set to -1 it will try to find
+ * first free range. if @from is positive value it will force allocation only
+ * of the specified range of flows.
+ *
+ * Returns -ENOMEM if can't find free range.
+ * -EEXIST if requested range is busy.
+ * -EINVAL if wrong input values passed.
+ * Returns flow id on success.
+ */
+static int __udma_reserve_rflow_range(struct udma_dev *ud, int from, int cnt)
+{
+	int start, tmp_from;
+	DECLARE_BITMAP(tmp, K3_UDMA_MAX_RFLOWS);
+
+	tmp_from = from;
+	if (tmp_from < 0)
+		tmp_from = ud->rchan_cnt;
+	/* default flows can't be reserved and accessible only by id */
+	if (tmp_from < ud->rchan_cnt)
+		return -EINVAL;
+
+	if (tmp_from + cnt > ud->rflow_cnt)
+		return -EINVAL;
+
+	bitmap_or(tmp, ud->rflow_map, ud->rflow_map_reserved,
+		  ud->rflow_cnt);
+
+	start = bitmap_find_next_zero_area(tmp,
+					   ud->rflow_cnt,
+					   tmp_from, cnt, 0);
+	if (start >= ud->rflow_cnt)
+		return -ENOMEM;
+
+	if (from >= 0 && start != from)
+		return -EEXIST;
+
+	bitmap_set(ud->rflow_map_reserved, start, cnt);
+	return start;
+}
+
+static int __udma_free_rflow_range(struct udma_dev *ud, int from, int cnt)
+{
+	if (from < ud->rchan_cnt)
+		return -EINVAL;
+	if (from + cnt > ud->rflow_cnt)
+		return -EINVAL;
+
+	bitmap_clear(ud->rflow_map_reserved, from, cnt);
+	return 0;
+}
+
 static struct udma_rflow *__udma_reserve_rflow(struct udma_dev *ud,
 					       enum udma_tp_level tpl, int id)
 {
@@ -3412,6 +3470,9 @@ static struct platform_driver udma_driver = {
 
 module_platform_driver(udma_driver);
 
+/* Private interfaces to UDMA */
+#include "k3-udma-private.c"
+
 MODULE_ALIAS("platform:ti-udma");
 MODULE_DESCRIPTION("TI K3 DMA driver for CPPI 5.0 compliant devices");
 MODULE_AUTHOR("Peter Ujfalusi <peter.ujfalusi@ti.com>");
diff --git a/drivers/dma/ti/k3-udma.h b/drivers/dma/ti/k3-udma.h
index a6153deb791b..bf5a823c13be 100644
--- a/drivers/dma/ti/k3-udma.h
+++ b/drivers/dma/ti/k3-udma.h
@@ -127,4 +127,34 @@ struct udma_tisci_rm {
 	struct ti_sci_resource *rm_ranges[RM_RANGE_LAST];
 };
 
+/* Direct access to UDMA low lever resources for the glue layer */
+int xudma_navss_psil_pair(struct udma_dev *ud, u32 src_thread, u32 dst_thread);
+int xudma_navss_psil_unpair(struct udma_dev *ud, u32 src_thread,
+			    u32 dst_thread);
+
+struct udma_dev *of_xudma_dev_get(struct device_node *np, const char *property);
+void xudma_dev_put(struct udma_dev *ud);
+u32 xudma_dev_get_psil_base(struct udma_dev *ud);
+struct udma_tisci_rm *xudma_dev_get_tisci_rm(struct udma_dev *ud);
+
+int xudma_reserve_rflow_range(struct udma_dev *ud, int from, int cnt);
+int xudma_free_rflow_range(struct udma_dev *ud, int from, int cnt);
+
+struct udma_tchan *xudma_tchan_get(struct udma_dev *ud, int id);
+struct udma_rchan *xudma_rchan_get(struct udma_dev *ud, int id);
+struct udma_rflow *xudma_rflow_get(struct udma_dev *ud, int id);
+
+void xudma_tchan_put(struct udma_dev *ud, struct udma_tchan *p);
+void xudma_rchan_put(struct udma_dev *ud, struct udma_rchan *p);
+void xudma_rflow_put(struct udma_dev *ud, struct udma_rflow *p);
+
+int xudma_tchan_get_id(struct udma_tchan *p);
+int xudma_rchan_get_id(struct udma_rchan *p);
+int xudma_rflow_get_id(struct udma_rflow *p);
+
+u32 xudma_tchanrt_read(struct udma_tchan *tchan, int reg);
+void xudma_tchanrt_write(struct udma_tchan *tchan, int reg, u32 val);
+u32 xudma_rchanrt_read(struct udma_rchan *rchan, int reg);
+void xudma_rchanrt_write(struct udma_rchan *rchan, int reg, u32 val);
+
 #endif /* K3_UDMA_H_ */
diff --git a/include/linux/dma/k3-udma-glue.h b/include/linux/dma/k3-udma-glue.h
new file mode 100644
index 000000000000..05e71c17c0ef
--- /dev/null
+++ b/include/linux/dma/k3-udma-glue.h
@@ -0,0 +1,125 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ *  Copyright (C) 2019 Texas Instruments Incorporated - http://www.ti.com
+ */
+
+#ifndef K3_UDMA_GLUE_H_
+#define K3_UDMA_GLUE_H_
+
+#include <linux/types.h>
+#include <linux/soc/ti/k3-ringacc.h>
+#include <linux/dma/ti-cppi5.h>
+
+struct k3_udma_glue_tx_channel_cfg {
+	struct k3_ring_cfg tx_cfg;
+	struct k3_ring_cfg txcq_cfg;
+
+	bool tx_pause_on_err;
+	bool tx_filt_einfo;
+	bool tx_filt_pswords;
+	bool tx_supr_tdpkt;
+	u32  swdata_size;
+};
+
+struct k3_udma_glue_tx_channel;
+
+struct k3_udma_glue_tx_channel *k3_udma_glue_request_tx_chn(struct device *dev,
+		const char *name, struct k3_udma_glue_tx_channel_cfg *cfg);
+
+void k3_udma_glue_release_tx_chn(struct k3_udma_glue_tx_channel *tx_chn);
+int k3_udma_glue_push_tx_chn(struct k3_udma_glue_tx_channel *tx_chn,
+			     struct cppi5_host_desc_t *desc_tx,
+			     dma_addr_t desc_dma);
+int k3_udma_glue_pop_tx_chn(struct k3_udma_glue_tx_channel *tx_chn,
+			    dma_addr_t *desc_dma);
+int k3_udma_glue_enable_tx_chn(struct k3_udma_glue_tx_channel *tx_chn);
+void k3_udma_glue_disable_tx_chn(struct k3_udma_glue_tx_channel *tx_chn);
+void k3_udma_glue_tdown_tx_chn(struct k3_udma_glue_tx_channel *tx_chn,
+			       bool sync);
+void k3_udma_glue_reset_tx_chn(struct k3_udma_glue_tx_channel *tx_chn,
+		void *data, void (*cleanup)(void *data, dma_addr_t desc_dma));
+u32 k3_udma_glue_tx_get_hdesc_size(struct k3_udma_glue_tx_channel *tx_chn);
+u32 k3_udma_glue_tx_get_txcq_id(struct k3_udma_glue_tx_channel *tx_chn);
+int k3_udma_glue_tx_get_irq(struct k3_udma_glue_tx_channel *tx_chn);
+
+enum {
+	K3_NAV_UDMAX_SRC_TAG_LO_KEEP = 0,
+	K3_NAV_UDMAX_SRC_TAG_LO_USE_FLOW_REG = 1,
+	K3_NAV_UDMAX_SRC_TAG_LO_USE_REMOTE_FLOW_ID = 2,
+	K3_NAV_UDMAX_SRC_TAG_LO_USE_REMOTE_SRC_TAG = 4,
+};
+
+/**
+ * k3_udma_glue_rx_flow_cfg - UDMA RX flow cfg
+ *
+ * @rx_cfg:		RX ring configuration
+ * @rxfdq_cfg:		RX free Host PD ring configuration
+ * @ring_rxq_id:	RX ring id (or -1 for any)
+ * @ring_rxfdq0_id:	RX free Host PD ring (FDQ) if (or -1 for any)
+ * @rx_error_handling:	Rx Error Handling Mode (0 - drop, 1 - re-try)
+ * @src_tag_lo_sel:	Rx Source Tag Low Byte Selector in Host PD
+ */
+struct k3_udma_glue_rx_flow_cfg {
+	struct k3_ring_cfg rx_cfg;
+	struct k3_ring_cfg rxfdq_cfg;
+	int ring_rxq_id;
+	int ring_rxfdq0_id;
+	bool rx_error_handling;
+	int src_tag_lo_sel;
+};
+
+/**
+ * k3_udma_glue_rx_channel_cfg - UDMA RX channel cfg
+ *
+ * @psdata_size:	SW Data is present in Host PD of @swdata_size bytes
+ * @flow_id_base:	first flow_id used by channel.
+ *			if @flow_id_base = -1 - flow ids range will be allocated
+ *			dynamically.
+ * @flow_id_num:	number of RX flows used by channel
+ * @flow_id_use_rxchan_id:	use RX channel id as flow id,
+ *				used only if @flow_id_num = 1
+ * @def_flow_cfg	default RX flow configuration,
+ *			used only if @flow_id_num = 1
+ */
+struct k3_udma_glue_rx_channel_cfg {
+	u32  swdata_size;
+	int  flow_id_base;
+	int  flow_id_num;
+	bool flow_id_use_rxchan_id;
+	bool skip_psil;
+
+	struct k3_udma_glue_rx_flow_cfg *def_flow_cfg;
+};
+
+struct k3_udma_glue_rx_channel;
+
+struct k3_udma_glue_rx_channel *k3_udma_glue_request_rx_chn(
+		struct device *dev,
+		const char *name,
+		struct k3_udma_glue_rx_channel_cfg *cfg);
+
+void k3_udma_glue_release_rx_chn(struct k3_udma_glue_rx_channel *rx_chn);
+int k3_udma_glue_enable_rx_chn(struct k3_udma_glue_rx_channel *rx_chn);
+void k3_udma_glue_disable_rx_chn(struct k3_udma_glue_rx_channel *rx_chn);
+void k3_udma_glue_tdown_rx_chn(struct k3_udma_glue_rx_channel *rx_chn,
+			       bool sync);
+int k3_udma_glue_push_rx_chn(struct k3_udma_glue_rx_channel *rx_chn,
+		u32 flow_num, struct cppi5_host_desc_t *desc_tx,
+		dma_addr_t desc_dma);
+int k3_udma_glue_pop_rx_chn(struct k3_udma_glue_rx_channel *rx_chn,
+		u32 flow_num, dma_addr_t *desc_dma);
+int k3_udma_glue_rx_flow_init(struct k3_udma_glue_rx_channel *rx_chn,
+		u32 flow_idx, struct k3_udma_glue_rx_flow_cfg *flow_cfg);
+u32 k3_udma_glue_rx_flow_get_fdq_id(struct k3_udma_glue_rx_channel *rx_chn,
+				    u32 flow_idx);
+u32 k3_udma_glue_rx_get_flow_id_base(struct k3_udma_glue_rx_channel *rx_chn);
+int k3_udma_glue_rx_get_irq(struct k3_udma_glue_rx_channel *rx_chn,
+			    u32 flow_num);
+void k3_udma_glue_rx_put_irq(struct k3_udma_glue_rx_channel *rx_chn,
+			     u32 flow_num);
+void k3_udma_glue_reset_rx_chn(struct k3_udma_glue_rx_channel *rx_chn,
+		u32 flow_num, void *data,
+		void (*cleanup)(void *data, dma_addr_t desc_dma),
+		bool skip_fdq);
+
+#endif /* K3_UDMA_GLUE_H_ */
-- 
Peter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 00/14] dmaengine/soc: Add Texas Instruments UDMA support
  2019-07-30  9:34 [PATCH v2 00/14] dmaengine/soc: Add Texas Instruments UDMA support Peter Ujfalusi
                   ` (13 preceding siblings ...)
  2019-07-30  9:34 ` [PATCH v2 14/14] dmaengine: ti: k3-udma: Add glue layer for non DMAengine users Peter Ujfalusi
@ 2019-07-31  7:08 ` Peter Ujfalusi
  2019-08-30 12:12 ` Peter Ujfalusi
  15 siblings, 0 replies; 32+ messages in thread
From: Peter Ujfalusi @ 2019-07-31  7:08 UTC (permalink / raw)
  To: vkoul, robh+dt, nm, ssantosh
  Cc: dan.j.williams, dmaengine, linux-arm-kernel, devicetree,
	linux-kernel, grygorii.strashko, lokeshvutla, t-kristo, tony,
	j-keerthy

Rob,

On 30/07/2019 12.34, Peter Ujfalusi wrote:
> Changes since v1
> (https://patchwork.kernel.org/project/linux-dmaengine/list/?series=114105&state=*)
> - Added support for j721e
> - Based on 5.3-rc2
> - dropped ti_sci API patch for RM management as it is already upstream
> - dropped dmadev_get_slave_channel() patch, using __dma_request_channel()
> - Added Rob's Reviewed-by to ringacc DT binding document patch
> - DT bindings changes:
>  - linux,udma-mode is gone, I have a simple lookup table in the driver to flag
>    TR channels.
>  - Support for j721e

I have also made smaller adjustment of some property names within the
psi-l config nodes to make it uniform and more readable.

> - Fix bug in of_node_put() handling in xlate function
> 
> Changes since RFC (https://patchwork.kernel.org/cover/10612465/):
> - Based on linux-next (20190506) which now have the ti_sci interrupt support
> - The series can be applied and the UDMA via DMAengine API will be functional
> - Included in the series: ti_sci Resource management API, cppi5 header and
>   driver for the ring accelerator.
> - The DMAengine core patches have been updated as per the review comments for
>   earlier submittion.
> - The DMAengine driver patch is artificially split up to 6 smaller patches
> 
> The k3-udma driver implements the Data Movement Architecture described in
> AM65x TRM (http://www.ti.com/lit/pdf/spruid7) and
> j721e TRM (http://www.ti.com/lit/pdf/spruil1)
> 
> This DMA architecture is a big departure from 'traditional' architecture where
> we had either EDMA or sDMA as system DMA.
> 
> Packet DMAs were used as dedicated DMAs to service only networking (Kesytone2)
> or USB (am335x) while other peripherals were serviced by EDMA.
> 
> In AM65x/j721e the UDMA (Unified DMA) is used for all data movment within the
> SoC, tasked to service all peripherals (UART, McSPI, McASP, networking, etc). 
> 
> The NAVSS/UDMA is built around CPPI5 (Communications Port Programming Interface)
> and it supports Packet mode (similar to CPPI4.1 in Keystone2 for networking) and
> TR mode (similar to EDMA descriptor).
> The data movement is done within a PSI-L fabric, peripherals (including the
> UDMA-P) are not addressed by their I/O register as with traditional DMAs but
> with their PSI-L thread ID.
> 
> In AM65x/j721e we have two main type of peripherals:
> Legacy: McASP, McSPI, UART, etc.
>  to provide connectivity they are serviced by PDMA (Peripheral DMA)
>  PDMA threads are locked to service a given peripheral, for example PSI-L thread
>  0x4400/0xc400 is to service McASP0 rx/tx.
>  The PDMa configuration can be done via the UDMA Real Time Peer registers.
> Native: Networking, security accelerator
>  these peripherals have native support for PSI-L.
> 
> To be able to use the DMA the following generic steps need to be taken:
> - configure a DMA channel (tchan for TX, rchan for RX)
>  - channel mode: Packet or TR mode
>  - for memcpy a tchan and rchan pair is used.
>  - for packet mode RX we also need to configure a receive flow to configure the
>    packet receiption
> - the source and destination threads must be paired
> - at minimum one pair of rings need to be configured:
>  - tx: transfer ring and transfer completion ring
>  - rx: free descriptor ring and receive ring
> - two interrupts: UDMA-P channel interrupt and ring interrupt for tc_ring/r_ring
>  - If the channel is in packet mode or configured to memcpy then we only need
>    one interrupt from the ring, events from UDMAP is not used.
> 
> When the channel setup is completed we only interract with the rings:
> - TX: push a descriptor to t_ring and wait for it to be pushed to the tc_ring by
>   the UDMA-P
> - RX: push a descriptor to the fd_ring and waith for UDMA-P to push it back to
>   the r_ring.
> 
> Since we have FIFOs in the DMA fabric (UDMA-P, PSI-L and PDMA) which was not the
> case in previous DMAs we need to report the amount of data held in these FIFOs
> to clients (delay calculation for ALSA, UART FIFO flush support).
> 
> Metadata support:
> DMAengine user driver was posted upstream based/tested on the v1 of the UDMA
> series: https://lkml.org/lkml/2019/6/28/20
> SA2UL is using the metadata DMAengine API.
> 
> Note on the last patch:
> In Keystone2 the networking had dedicated DMA (packet DMA) which is not the case
> anymore and the DMAengine API currently missing support for the features we
> would need to support networking, things like
> - support for receive descriptor 'classification'
>  - we need to support several receive queues for a channel.
>  - the queues are used for packet priority handling for example, but they can be
>    used to have pools of descriptors for different sizes.
> - out of order completion of descriptors on a channel
>  - when we have several queues to handle different priority packets the
>    descriptors will be completed 'out-of-order'
> - NAPI type of operation (polling instead of interrupt driven transfer)
>  - without this we can not sustain gigabit speeds and we need to support NAPI
>  - not to limit this to networking, but other high performance operations
> 
> It is my intention to work on these to be able to remove the 'glue' layer and
> switch to DMAengine API - or have an API aside of DMAengine to have generic way
> to support networking, but given how controversial and not trivial these changes
> are we need something to support networking.
> 
> The series (+DT patch to enabled UDMA/PDMA on AM65x) on top of 5.3-rc2 is
> available:
> https://github.com/omap-audio/linux-audio.git peter/udma/series_v2-5.3-rc2
> 
> Regards,
> Peter
> ---
> Grygorii Strashko (3):
>   bindings: soc: ti: add documentation for k3 ringacc
>   soc: ti: k3: add navss ringacc driver
>   dmaengine: ti: k3-udma: Add glue layer for non DMAengine users
> 
> Peter Ujfalusi (11):
>   dmaengine: doc: Add sections for per descriptor metadata support
>   dmaengine: Add metadata_ops for dma_async_tx_descriptor
>   dmaengine: Add support for reporting DMA cached data amount
>   dmaengine: ti: Add cppi5 header for UDMA
>   dt-bindings: dma: ti: Add document for K3 UDMA
>   dmaengine: ti: New driver for K3 UDMA - split#1: defines, structs, io
>     func
>   dmaengine: ti: New driver for K3 UDMA - split#2: probe/remove, xlate
>     and filter_fn
>   dmaengine: ti: New driver for K3 UDMA - split#3: alloc/free
>     chan_resources
>   dmaengine: ti: New driver for K3 UDMA - split#4: dma_device callbacks
>     1
>   dmaengine: ti: New driver for K3 UDMA - split#5: dma_device callbacks
>     2
>   dmaengine: ti: New driver for K3 UDMA - split#6: Kconfig and Makefile
> 
>  .../devicetree/bindings/dma/ti/k3-udma.txt    |  170 +
>  .../devicetree/bindings/soc/ti/k3-ringacc.txt |   59 +
>  Documentation/driver-api/dmaengine/client.rst |   75 +
>  .../driver-api/dmaengine/provider.rst         |   46 +
>  drivers/dma/dmaengine.c                       |   73 +
>  drivers/dma/dmaengine.h                       |    8 +
>  drivers/dma/ti/Kconfig                        |   22 +
>  drivers/dma/ti/Makefile                       |    2 +
>  drivers/dma/ti/k3-udma-glue.c                 | 1039 +++++
>  drivers/dma/ti/k3-udma-private.c              |  124 +
>  drivers/dma/ti/k3-udma.c                      | 3479 +++++++++++++++++
>  drivers/dma/ti/k3-udma.h                      |  160 +
>  drivers/soc/ti/Kconfig                        |   17 +
>  drivers/soc/ti/Makefile                       |    1 +
>  drivers/soc/ti/k3-ringacc.c                   | 1191 ++++++
>  include/dt-bindings/dma/k3-udma.h             |   10 +
>  include/linux/dma/k3-udma-glue.h              |  125 +
>  include/linux/dma/ti-cppi5.h                  |  996 +++++
>  include/linux/dmaengine.h                     |  110 +
>  include/linux/soc/ti/k3-ringacc.h             |  262 ++
>  20 files changed, 7969 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/dma/ti/k3-udma.txt
>  create mode 100644 Documentation/devicetree/bindings/soc/ti/k3-ringacc.txt
>  create mode 100644 drivers/dma/ti/k3-udma-glue.c
>  create mode 100644 drivers/dma/ti/k3-udma-private.c
>  create mode 100644 drivers/dma/ti/k3-udma.c
>  create mode 100644 drivers/dma/ti/k3-udma.h
>  create mode 100644 drivers/soc/ti/k3-ringacc.c
>  create mode 100644 include/dt-bindings/dma/k3-udma.h
>  create mode 100644 include/linux/dma/k3-udma-glue.h
>  create mode 100644 include/linux/dma/ti-cppi5.h
>  create mode 100644 include/linux/soc/ti/k3-ringacc.h
> 

- Péter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 07/14] dt-bindings: dma: ti: Add document for K3 UDMA
  2019-07-30  9:34 ` [PATCH v2 07/14] dt-bindings: dma: ti: Add document for K3 UDMA Peter Ujfalusi
@ 2019-08-21 17:59   ` Rob Herring
  2019-08-22 11:18     ` Peter Ujfalusi
  0 siblings, 1 reply; 32+ messages in thread
From: Rob Herring @ 2019-08-21 17:59 UTC (permalink / raw)
  To: Peter Ujfalusi
  Cc: vkoul, nm, ssantosh, dan.j.williams, dmaengine, linux-arm-kernel,
	devicetree, linux-kernel, grygorii.strashko, lokeshvutla,
	t-kristo, tony, j-keerthy

On Tue, Jul 30, 2019 at 12:34:43PM +0300, Peter Ujfalusi wrote:
> New binding document for
> Texas Instruments K3 NAVSS Unified DMA – Peripheral Root Complex (UDMA-P).
> 
> UDMA-P is introduced as part of the K3 architecture and can be found on
> AM654 and j721e.
> 
> Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
> ---
>  .../devicetree/bindings/dma/ti/k3-udma.txt    | 170 ++++++++++++++++++
>  include/dt-bindings/dma/k3-udma.h             |  10 ++
>  2 files changed, 180 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/dma/ti/k3-udma.txt
>  create mode 100644 include/dt-bindings/dma/k3-udma.h
> 
> diff --git a/Documentation/devicetree/bindings/dma/ti/k3-udma.txt b/Documentation/devicetree/bindings/dma/ti/k3-udma.txt
> new file mode 100644
> index 000000000000..7f30fe583ade
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/dma/ti/k3-udma.txt
> @@ -0,0 +1,170 @@
> +* Texas Instruments K3 NAVSS Unified DMA – Peripheral Root Complex (UDMA-P)
> +
> +The UDMA-P is intended to perform similar (but significantly upgraded) functions
> +as the packet-oriented DMA used on previous SoC devices. The UDMA-P module
> +supports the transmission and reception of various packet types. The UDMA-P is
> +architected to facilitate the segmentation and reassembly of SoC DMA data
> +structure compliant packets to/from smaller data blocks that are natively
> +compatible with the specific requirements of each connected peripheral. Multiple
> +Tx and Rx channels are provided within the DMA which allow multiple segmentation
> +or reassembly operations to be ongoing. The DMA controller maintains state
> +information for each of the channels which allows packet segmentation and
> +reassembly operations to be time division multiplexed between channels in order
> +to share the underlying DMA hardware. An external DMA scheduler is used to
> +control the ordering and rate at which this multiplexing occurs for Transmit
> +operations. The ordering and rate of Receive operations is indirectly controlled
> +by the order in which blocks are pushed into the DMA on the Rx PSI-L interface.
> +
> +The UDMA-P also supports acting as both a UTC and UDMA-C for its internal
> +channels. Channels in the UDMA-P can be configured to be either Packet-Based or
> +Third-Party channels on a channel by channel basis.
> +
> +Required properties:
> +--------------------
> +- compatible:		Should be
> +			"ti,am654-navss-main-udmap" for am654 main NAVSS UDMAP
> +			"ti,am654-navss-mcu-udmap" for am654 mcu NAVSS UDMAP
> +			"ti,j721e-navss-main-udmap" for j721e main NAVSS UDMAP
> +			"ti,j721e-navss-mcu-udmap" for j721e mcu NAVSS UDMAP
> +- #dma-cells:		Should be set to <3>.
> +			- The first parameter is a phandle to the remote PSI-L
> +			  endpoint

This is the phandle of the client? That's weird. More below.

> +			- The second parameter is the thread offset within the
> +			  remote thread ID range
> +			- The third parameter is the channel direction.
> +- reg:			Memory map of UDMAP
> +- reg-names:		"gcfg", "rchanrt", "tchanrt"
> +- msi-parent:		phandle for "ti,sci-inta" interrupt controller
> +- ti,ringacc:		phandle for the ring accelerator node
> +- ti,psil-base:		PSI-L thread ID base of the UDMAP channels
> +- ti,sci:		phandle on TI-SCI compatible System controller node
> +- ti,sci-dev-id:	TI-SCI device id
> +- ti,sci-rm-range-tchan: UDMA tchan resource list in pairs of type and subtype
> +- ti,sci-rm-range-rchan: UDMA rchan resource list in pairs of type and subtype
> +- ti,sci-rm-range-rflow: UDMA rflow resource list in pairs of type and subtype
> +
> +For PSI-L thread management the parent NAVSS node must have:
> +- ti,sci:		phandle on TI-SCI compatible System controller node
> +- ti,sci-dev-id:	TI-SCI device id of the NAVSS instance
> +
> +Remote PSI-L endpoint
> +
> +Required properties:
> +--------------------
> +- ti,psil-base:		PSI-L thread ID base of the endpoint
> +
> +Within the PSI-L endpoint node thread configuration subnodes must present with:
> +psil-configX naming convention, where X is the thread ID offset.
> +
> +Configuration node Optional properties:
> +--------------------
> +- pdma,statictr-type:	In case the remote endpoint (PDMAs) requires StaticTR

Property names are in the form [<vendor>,]prop-name. pdma is not a 
vendor.

> +			configuration:
> +			- PSIL_STATIC_TR_XY (1): XY type of StaticTR
> +			For endpoints without StaticTR the property is not
> +			needed or to be set PSIL_STATIC_TR_NONE (0).
> +- pdma,enable-acc32:	Force 32 bit access on peripheral port. Only valid for
> +			XY type StaticTR, not supported on am654.
> +			Must be enabled for threads servicing McASP with AFIFO
> +			bypass mode.
> +- pdma,enable-burst:	Enable burst access on peripheral port. Only valid for
> +			XY type StaticTR, not supported on am654.
> +- ti,channel-tpl:	Channel Throughput level:
> +			0 / or not present - normal channel
> +			1 - High Throughput channel
> +			2 - Ultra High Throughput channel (j721e only)
> +- ti,needs-epib:	If the endpoint require EPIB to be present in the
> +			descriptor.
> +- ti,psd-size:		Size of the Protocol Specific Data section of the
> +			descriptor.
> +
> +Example:
> +
> +main_navss: main_navss {
> +	compatible = "simple-bus";
> +	#address-cells = <2>;
> +	#size-cells = <2>;
> +	dma-coherent;
> +	dma-ranges;
> +	ranges;
> +
> +	ti,sci = <&dmsc>;
> +	ti,sci-dev-id = <118>;
> +
> +	main_udmap: dma-controller@31150000 {
> +		compatible = "ti,am654-navss-main-udmap";
> +		reg =	<0x0 0x31150000 0x0 0x100>,
> +			<0x0 0x34000000 0x0 0x100000>,
> +			<0x0 0x35000000 0x0 0x100000>;
> +		reg-names = "gcfg", "rchanrt", "tchanrt";
> +		#dma-cells = <3>;
> +
> +		ti,ringacc = <&ringacc>;
> +		ti,psil-base = <0x1000>;
> +
> +		interrupt-parent = <&main_udmass_inta>;
> +
> +		ti,sci = <&dmsc>;
> +		ti,sci-dev-id = <188>;
> +
> +		ti,sci-rm-range-tchan = <0x6 0x1>, /* TX_HCHAN */
> +					<0x6 0x2>; /* TX_CHAN */
> +		ti,sci-rm-range-rchan = <0x6 0x4>, /* RX_HCHAN */
> +					<0x6 0x5>; /* RX_CHAN */
> +		ti,sci-rm-range-rflow = <0x6 0x6>; /* GP RFLOW */
> +	};
> +};
> +
> +psilss@340c000 {
> +	/* PSILSS1 AASRC */
> +	compatible = "ti,j721e-psilss";
> +	reg = <0x0 0x0340c000 0x0 0x1000>;
> +	reg-names = "config";
> +
> +	pdma_main_mcasp_g0: pdma_main_mcasp_g0 {
> +		/* PDMA6 (PDMA_MCASP_G0) */
> +		ti,psil-base = <0x4400>;
> +
> +		/* psil-config0 */
> +		psil-config0 {
> +			pdma,statictr-type = <PSIL_STATIC_TR_XY>;
> +			pdma,enable-acc32;
> +			pdma,enable-burst;
> +		};
> +	};
> +};
> +
> +mcasp0: mcasp@02B00000 {

I don't really follow what psilss and mcasp are...

> +...
> +	/* tx: PDMA_MAIN_MCASP_G0-0, rx: PDMA_MAIN_MCASP_G0-0 */
> +	dmas = <&main_udmap &pdma_main_mcasp_g0 0 UDMA_DIR_TX>,
> +	       <&main_udmap &pdma_main_mcasp_g0 0 UDMA_DIR_RX>;
> +	dma-names = "tx", "rx";
> +...
> +};
> +
> +crypto: crypto@4E00000 {
> +	compatible = "ti,sa2ul-crypto";
> +...
> +
> +	/* tx: crypto_pnp-1, rx: crypto_pnp-1 */
> +	dmas = <&main_udmap &crypto 0 UDMA_DIR_TX>,
> +	       <&main_udmap &crypto 0 UDMA_DIR_RX>,
> +	       <&main_udmap &crypto 1 UDMA_DIR_RX>;

'thread offset' is 1?

> +	dma-names = "tx", "rx1", "rx2";
> +...
> +	psil-config0 {

Are these nodes 1-1 with the 'dmas' entries? I think these flags should 
all be DMA cells. They are all configuration of DMA channels, right?

Though I'm not sure about how that would work for the previous example.

> +		ti,needs-epib;
> +		ti,psd-size = <64>;
> +	};
> +
> +	psil-config1 {
> +		ti,needs-epib;
> +		ti,psd-size = <64>;
> +	};
> +
> +	psil-config2 {
> +		ti,needs-epib;
> +		ti,psd-size = <64>;
> +	};
> +};
> diff --git a/include/dt-bindings/dma/k3-udma.h b/include/dt-bindings/dma/k3-udma.h
> new file mode 100644
> index 000000000000..f5c8f5d50491
> --- /dev/null
> +++ b/include/dt-bindings/dma/k3-udma.h
> @@ -0,0 +1,10 @@
> +#ifndef __DT_TI_UDMA_H
> +#define __DT_TI_UDMA_H
> +
> +#define UDMA_DIR_TX		0
> +#define UDMA_DIR_RX		1
> +
> +#define PSIL_STATIC_TR_NONE	0
> +#define PSIL_STATIC_TR_XY	1
> +
> +#endif /* __DT_TI_UDMA_H */
> -- 
> Peter
> 
> Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
> Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 07/14] dt-bindings: dma: ti: Add document for K3 UDMA
  2019-08-21 17:59   ` Rob Herring
@ 2019-08-22 11:18     ` Peter Ujfalusi
  0 siblings, 0 replies; 32+ messages in thread
From: Peter Ujfalusi @ 2019-08-22 11:18 UTC (permalink / raw)
  To: Rob Herring
  Cc: vkoul, nm, ssantosh, dan.j.williams, dmaengine, linux-arm-kernel,
	devicetree, linux-kernel, grygorii.strashko, lokeshvutla,
	t-kristo, tony, j-keerthy



On 21/08/2019 20.59, Rob Herring wrote:
> On Tue, Jul 30, 2019 at 12:34:43PM +0300, Peter Ujfalusi wrote:
>> New binding document for
>> Texas Instruments K3 NAVSS Unified DMA – Peripheral Root Complex (UDMA-P).
>>
>> UDMA-P is introduced as part of the K3 architecture and can be found on
>> AM654 and j721e.
>>
>> Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
>> ---
>>  .../devicetree/bindings/dma/ti/k3-udma.txt    | 170 ++++++++++++++++++
>>  include/dt-bindings/dma/k3-udma.h             |  10 ++
>>  2 files changed, 180 insertions(+)
>>  create mode 100644 Documentation/devicetree/bindings/dma/ti/k3-udma.txt
>>  create mode 100644 include/dt-bindings/dma/k3-udma.h
>>
>> diff --git a/Documentation/devicetree/bindings/dma/ti/k3-udma.txt b/Documentation/devicetree/bindings/dma/ti/k3-udma.txt
>> new file mode 100644
>> index 000000000000..7f30fe583ade
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/dma/ti/k3-udma.txt
>> @@ -0,0 +1,170 @@
>> +* Texas Instruments K3 NAVSS Unified DMA – Peripheral Root Complex (UDMA-P)
>> +
>> +The UDMA-P is intended to perform similar (but significantly upgraded) functions
>> +as the packet-oriented DMA used on previous SoC devices. The UDMA-P module
>> +supports the transmission and reception of various packet types. The UDMA-P is
>> +architected to facilitate the segmentation and reassembly of SoC DMA data
>> +structure compliant packets to/from smaller data blocks that are natively
>> +compatible with the specific requirements of each connected peripheral. Multiple
>> +Tx and Rx channels are provided within the DMA which allow multiple segmentation
>> +or reassembly operations to be ongoing. The DMA controller maintains state
>> +information for each of the channels which allows packet segmentation and
>> +reassembly operations to be time division multiplexed between channels in order
>> +to share the underlying DMA hardware. An external DMA scheduler is used to
>> +control the ordering and rate at which this multiplexing occurs for Transmit
>> +operations. The ordering and rate of Receive operations is indirectly controlled
>> +by the order in which blocks are pushed into the DMA on the Rx PSI-L interface.
>> +
>> +The UDMA-P also supports acting as both a UTC and UDMA-C for its internal
>> +channels. Channels in the UDMA-P can be configured to be either Packet-Based or
>> +Third-Party channels on a channel by channel basis.
>> +
>> +Required properties:
>> +--------------------
>> +- compatible:		Should be
>> +			"ti,am654-navss-main-udmap" for am654 main NAVSS UDMAP
>> +			"ti,am654-navss-mcu-udmap" for am654 mcu NAVSS UDMAP
>> +			"ti,j721e-navss-main-udmap" for j721e main NAVSS UDMAP
>> +			"ti,j721e-navss-mcu-udmap" for j721e mcu NAVSS UDMAP
>> +- #dma-cells:		Should be set to <3>.
>> +			- The first parameter is a phandle to the remote PSI-L
>> +			  endpoint
> 
> This is the phandle of the client? That's weird. More below.

Not the client, but of the psi-l gasket. PSI-L stands for Packet
Streaming Interface Link.
UDMA can not talk directly to peripherals, it operates within the PSI-L
fabric. If a peripheral needs to be serviced by UDMA it has to have a
PSI-L thread ID.

In PSI-L every source and destination have unique thread ID which is
used by the PSI-L to steer the data flow from source to destination.
The thread is map is broken down to gaskets, for example
PDMA0's thread IDs are 0x4400-0x44ff
SA2UL's threads are between 0x4000-0x40ff
UDMAP0's threads are 0x1000-0x3fff

Gaskets usually group multiple threads, so for example to communicate
with PDMA0's 2nd thread, the thread ID which need to be used for
configuration is 0x4402.

I'll try to give more details later.

> 
>> +			- The second parameter is the thread offset within the
>> +			  remote thread ID range
>> +			- The third parameter is the channel direction.
>> +- reg:			Memory map of UDMAP
>> +- reg-names:		"gcfg", "rchanrt", "tchanrt"
>> +- msi-parent:		phandle for "ti,sci-inta" interrupt controller
>> +- ti,ringacc:		phandle for the ring accelerator node
>> +- ti,psil-base:		PSI-L thread ID base of the UDMAP channels
>> +- ti,sci:		phandle on TI-SCI compatible System controller node
>> +- ti,sci-dev-id:	TI-SCI device id
>> +- ti,sci-rm-range-tchan: UDMA tchan resource list in pairs of type and subtype
>> +- ti,sci-rm-range-rchan: UDMA rchan resource list in pairs of type and subtype
>> +- ti,sci-rm-range-rflow: UDMA rflow resource list in pairs of type and subtype
>> +
>> +For PSI-L thread management the parent NAVSS node must have:
>> +- ti,sci:		phandle on TI-SCI compatible System controller node
>> +- ti,sci-dev-id:	TI-SCI device id of the NAVSS instance
>> +
>> +Remote PSI-L endpoint
>> +
>> +Required properties:
>> +--------------------
>> +- ti,psil-base:		PSI-L thread ID base of the endpoint
>> +
>> +Within the PSI-L endpoint node thread configuration subnodes must present with:
>> +psil-configX naming convention, where X is the thread ID offset.
>> +
>> +Configuration node Optional properties:
>> +--------------------
>> +- pdma,statictr-type:	In case the remote endpoint (PDMAs) requires StaticTR
> 
> Property names are in the form [<vendor>,]prop-name. pdma is not a 
> vendor.

Which one is acceptable replacement ti,pdma,statictr-type, or
ti,pdma-statictr-type?

> 
>> +			configuration:
>> +			- PSIL_STATIC_TR_XY (1): XY type of StaticTR
>> +			For endpoints without StaticTR the property is not
>> +			needed or to be set PSIL_STATIC_TR_NONE (0).
>> +- pdma,enable-acc32:	Force 32 bit access on peripheral port. Only valid for
>> +			XY type StaticTR, not supported on am654.
>> +			Must be enabled for threads servicing McASP with AFIFO
>> +			bypass mode.
>> +- pdma,enable-burst:	Enable burst access on peripheral port. Only valid for
>> +			XY type StaticTR, not supported on am654.
>> +- ti,channel-tpl:	Channel Throughput level:
>> +			0 / or not present - normal channel
>> +			1 - High Throughput channel
>> +			2 - Ultra High Throughput channel (j721e only)
>> +- ti,needs-epib:	If the endpoint require EPIB to be present in the
>> +			descriptor.
>> +- ti,psd-size:		Size of the Protocol Specific Data section of the
>> +			descriptor.
>> +
>> +Example:
>> +
>> +main_navss: main_navss {
>> +	compatible = "simple-bus";
>> +	#address-cells = <2>;
>> +	#size-cells = <2>;
>> +	dma-coherent;
>> +	dma-ranges;
>> +	ranges;
>> +
>> +	ti,sci = <&dmsc>;
>> +	ti,sci-dev-id = <118>;
>> +
>> +	main_udmap: dma-controller@31150000 {
>> +		compatible = "ti,am654-navss-main-udmap";
>> +		reg =	<0x0 0x31150000 0x0 0x100>,
>> +			<0x0 0x34000000 0x0 0x100000>,
>> +			<0x0 0x35000000 0x0 0x100000>;
>> +		reg-names = "gcfg", "rchanrt", "tchanrt";
>> +		#dma-cells = <3>;
>> +
>> +		ti,ringacc = <&ringacc>;
>> +		ti,psil-base = <0x1000>;
>> +
>> +		interrupt-parent = <&main_udmass_inta>;
>> +
>> +		ti,sci = <&dmsc>;
>> +		ti,sci-dev-id = <188>;
>> +
>> +		ti,sci-rm-range-tchan = <0x6 0x1>, /* TX_HCHAN */
>> +					<0x6 0x2>; /* TX_CHAN */
>> +		ti,sci-rm-range-rchan = <0x6 0x4>, /* RX_HCHAN */
>> +					<0x6 0x5>; /* RX_CHAN */
>> +		ti,sci-rm-range-rflow = <0x6 0x6>; /* GP RFLOW */
>> +	};
>> +};
>> +
>> +psilss@340c000 {
>> +	/* PSILSS1 AASRC */
>> +	compatible = "ti,j721e-psilss";
>> +	reg = <0x0 0x0340c000 0x0 0x1000>;
>> +	reg-names = "config";
>> +
>> +	pdma_main_mcasp_g0: pdma_main_mcasp_g0 {
>> +		/* PDMA6 (PDMA_MCASP_G0) */
>> +		ti,psil-base = <0x4400>;
>> +
>> +		/* psil-config0 */
>> +		psil-config0 {
>> +			pdma,statictr-type = <PSIL_STATIC_TR_XY>;
>> +			pdma,enable-acc32;
>> +			pdma,enable-burst;
>> +		};
>> +	};
>> +};
>> +
>> +mcasp0: mcasp@02B00000 {
> 
> I don't really follow what psilss and mcasp are...

McASP is a peripheral which does not have PSI-L interface, it is legacy
IP. It is serviced by PDMA which is on one side is a PSI-L peripheral,
on the other side it is communicating with McASP.

Each thread in the PDMA is dedicated to service a given legacy
peripheral and within a PDMA block we could have mixed type of
channels/threads servicing different legacy peripherals.

In essence the PDMA channel is a purpose built small DMA to service a
single peripheral and the channels are grouped together to for a PDMA
gasket.

The PDMA itself can only be programmed via the UDMA channel's remote
peer register area, it is like an extension of the UDMA to legacy
peripherals.

Native PSI-L peripherals like SA2UL, CPSW or ICSSG does not have PDMA as
they are built for PSI-L, thus they don't have any static TR registers
or things which is needed in PDMAs to be able to service lefacy
peripherals.

>> +...
>> +	/* tx: PDMA_MAIN_MCASP_G0-0, rx: PDMA_MAIN_MCASP_G0-0 */
>> +	dmas = <&main_udmap &pdma_main_mcasp_g0 0 UDMA_DIR_TX>,
>> +	       <&main_udmap &pdma_main_mcasp_g0 0 UDMA_DIR_RX>;
>> +	dma-names = "tx", "rx";
>> +...
>> +};
>> +
>> +crypto: crypto@4E00000 {
>> +	compatible = "ti,sa2ul-crypto";
>> +...
>> +
>> +	/* tx: crypto_pnp-1, rx: crypto_pnp-1 */
>> +	dmas = <&main_udmap &crypto 0 UDMA_DIR_TX>,
>> +	       <&main_udmap &crypto 0 UDMA_DIR_RX>,
>> +	       <&main_udmap &crypto 1 UDMA_DIR_RX>;
> 
> 'thread offset' is 1?

Yes. SA2UL uses one tx thread (paired with one UDMA tx channel): 0x4000
and it has two RX threads (each paired to separate UDMA rx channel):
0x4000 and 0x4001

In protocol level PSI-L the thread IDs are defined as:
0x0000 - 0x7fff Source PSI-L threads
0x8000 - 0xffff Destination threads

The documentation defines the source ID only and the destination ID is
derived by src_id + 0x8000 or src_id | 0x8000

and the PSI-L thread configuration is symmetric among them, so the
settings for 0x4000 applies to 0x4000|0x8000 as well, this is why I
opted to add the direction parameter instead of spelling out the
configuration for src and dst threads.

> 
>> +	dma-names = "tx", "rx1", "rx2";
>> +...
>> +	psil-config0 {
> 
> Are these nodes 1-1 with the 'dmas' entries? I think these flags should 
> all be DMA cells. They are all configuration of DMA channels, right?

These are unique for the given PSI-L thread. Depending on the remote
(remote to UDMA) thread the parameters can be different and the valid
set of parameters as well.

> Though I'm not sure about how that would work for the previous example.

Like native native PSI-L peripherals does not have for example static TR
support as they don't need, so anything which is needed for static TR
does not apply to them.

If I would put all the possible thread parameters in one line then I
would easily go beyond 15 entries and we still have not supported
features on the threads. Adding them would break the bindings as I would
need to expand the parameter list and it would be really hard to decode
and understand what the given dmas line is actually describing.

> 
>> +		ti,needs-epib;
>> +		ti,psd-size = <64>;
>> +	};
>> +
>> +	psil-config1 {
>> +		ti,needs-epib;
>> +		ti,psd-size = <64>;
>> +	};
>> +
>> +	psil-config2 {
>> +		ti,needs-epib;
>> +		ti,psd-size = <64>;
>> +	};
>> +};
>> diff --git a/include/dt-bindings/dma/k3-udma.h b/include/dt-bindings/dma/k3-udma.h
>> new file mode 100644
>> index 000000000000..f5c8f5d50491
>> --- /dev/null
>> +++ b/include/dt-bindings/dma/k3-udma.h
>> @@ -0,0 +1,10 @@
>> +#ifndef __DT_TI_UDMA_H
>> +#define __DT_TI_UDMA_H
>> +
>> +#define UDMA_DIR_TX		0
>> +#define UDMA_DIR_RX		1
>> +
>> +#define PSIL_STATIC_TR_NONE	0
>> +#define PSIL_STATIC_TR_XY	1
>> +
>> +#endif /* __DT_TI_UDMA_H */
>> -- 
>> Peter
>>
>> Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
>> Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
>>

- Péter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 00/14] dmaengine/soc: Add Texas Instruments UDMA support
  2019-07-30  9:34 [PATCH v2 00/14] dmaengine/soc: Add Texas Instruments UDMA support Peter Ujfalusi
                   ` (14 preceding siblings ...)
  2019-07-31  7:08 ` [PATCH v2 00/14] dmaengine/soc: Add Texas Instruments UDMA support Peter Ujfalusi
@ 2019-08-30 12:12 ` Peter Ujfalusi
  15 siblings, 0 replies; 32+ messages in thread
From: Peter Ujfalusi @ 2019-08-30 12:12 UTC (permalink / raw)
  To: vkoul, robh+dt, nm, ssantosh
  Cc: dan.j.williams, dmaengine, linux-arm-kernel, devicetree,
	linux-kernel, grygorii.strashko, lokeshvutla, t-kristo, tony,
	j-keerthy

Hi,

On 30/07/2019 12.34, Peter Ujfalusi wrote:
> Changes since v1
> (https://patchwork.kernel.org/project/linux-dmaengine/list/?series=114105&state=*)
> - Added support for j721e
> - Based on 5.3-rc2
> - dropped ti_sci API patch for RM management as it is already upstream
> - dropped dmadev_get_slave_channel() patch, using __dma_request_channel()
> - Added Rob's Reviewed-by to ringacc DT binding document patch
> - DT bindings changes:
>  - linux,udma-mode is gone, I have a simple lookup table in the driver to flag
>    TR channels.
>  - Support for j721e
> - Fix bug in of_node_put() handling in xlate function
> 
> Changes since RFC (https://patchwork.kernel.org/cover/10612465/):
> - Based on linux-next (20190506) which now have the ti_sci interrupt support
> - The series can be applied and the UDMA via DMAengine API will be functional
> - Included in the series: ti_sci Resource management API, cppi5 header and
>   driver for the ring accelerator.
> - The DMAengine core patches have been updated as per the review comments for
>   earlier submittion.
> - The DMAengine driver patch is artificially split up to 6 smaller patches

it has been exactly a month ago that I have sent the v2.
Before I send v3 to address Rob's pdma, prefix comment for the binding
document I would like to have some feedback on the rest of the series.

I believe the DMAengine core patches could be applied without the UDMA
driver stack as there is now a clear user and implementation coming up
for the metadata and the cached data reporting.

Lokesh, Tero, Santosh: Can you look at the ringacc driver, please?

Thanks,
- Péter

> The k3-udma driver implements the Data Movement Architecture described in
> AM65x TRM (http://www.ti.com/lit/pdf/spruid7) and
> j721e TRM (http://www.ti.com/lit/pdf/spruil1)
> 
> This DMA architecture is a big departure from 'traditional' architecture where
> we had either EDMA or sDMA as system DMA.
> 
> Packet DMAs were used as dedicated DMAs to service only networking (Kesytone2)
> or USB (am335x) while other peripherals were serviced by EDMA.
> 
> In AM65x/j721e the UDMA (Unified DMA) is used for all data movment within the
> SoC, tasked to service all peripherals (UART, McSPI, McASP, networking, etc). 
> 
> The NAVSS/UDMA is built around CPPI5 (Communications Port Programming Interface)
> and it supports Packet mode (similar to CPPI4.1 in Keystone2 for networking) and
> TR mode (similar to EDMA descriptor).
> The data movement is done within a PSI-L fabric, peripherals (including the
> UDMA-P) are not addressed by their I/O register as with traditional DMAs but
> with their PSI-L thread ID.
> 
> In AM65x/j721e we have two main type of peripherals:
> Legacy: McASP, McSPI, UART, etc.
>  to provide connectivity they are serviced by PDMA (Peripheral DMA)
>  PDMA threads are locked to service a given peripheral, for example PSI-L thread
>  0x4400/0xc400 is to service McASP0 rx/tx.
>  The PDMa configuration can be done via the UDMA Real Time Peer registers.
> Native: Networking, security accelerator
>  these peripherals have native support for PSI-L.
> 
> To be able to use the DMA the following generic steps need to be taken:
> - configure a DMA channel (tchan for TX, rchan for RX)
>  - channel mode: Packet or TR mode
>  - for memcpy a tchan and rchan pair is used.
>  - for packet mode RX we also need to configure a receive flow to configure the
>    packet receiption
> - the source and destination threads must be paired
> - at minimum one pair of rings need to be configured:
>  - tx: transfer ring and transfer completion ring
>  - rx: free descriptor ring and receive ring
> - two interrupts: UDMA-P channel interrupt and ring interrupt for tc_ring/r_ring
>  - If the channel is in packet mode or configured to memcpy then we only need
>    one interrupt from the ring, events from UDMAP is not used.
> 
> When the channel setup is completed we only interract with the rings:
> - TX: push a descriptor to t_ring and wait for it to be pushed to the tc_ring by
>   the UDMA-P
> - RX: push a descriptor to the fd_ring and waith for UDMA-P to push it back to
>   the r_ring.
> 
> Since we have FIFOs in the DMA fabric (UDMA-P, PSI-L and PDMA) which was not the
> case in previous DMAs we need to report the amount of data held in these FIFOs
> to clients (delay calculation for ALSA, UART FIFO flush support).
> 
> Metadata support:
> DMAengine user driver was posted upstream based/tested on the v1 of the UDMA
> series: https://lkml.org/lkml/2019/6/28/20
> SA2UL is using the metadata DMAengine API.
> 
> Note on the last patch:
> In Keystone2 the networking had dedicated DMA (packet DMA) which is not the case
> anymore and the DMAengine API currently missing support for the features we
> would need to support networking, things like
> - support for receive descriptor 'classification'
>  - we need to support several receive queues for a channel.
>  - the queues are used for packet priority handling for example, but they can be
>    used to have pools of descriptors for different sizes.
> - out of order completion of descriptors on a channel
>  - when we have several queues to handle different priority packets the
>    descriptors will be completed 'out-of-order'
> - NAPI type of operation (polling instead of interrupt driven transfer)
>  - without this we can not sustain gigabit speeds and we need to support NAPI
>  - not to limit this to networking, but other high performance operations
> 
> It is my intention to work on these to be able to remove the 'glue' layer and
> switch to DMAengine API - or have an API aside of DMAengine to have generic way
> to support networking, but given how controversial and not trivial these changes
> are we need something to support networking.
> 
> The series (+DT patch to enabled UDMA/PDMA on AM65x) on top of 5.3-rc2 is
> available:
> https://github.com/omap-audio/linux-audio.git peter/udma/series_v2-5.3-rc2
> 
> Regards,
> Peter
> ---
> Grygorii Strashko (3):
>   bindings: soc: ti: add documentation for k3 ringacc
>   soc: ti: k3: add navss ringacc driver
>   dmaengine: ti: k3-udma: Add glue layer for non DMAengine users
> 
> Peter Ujfalusi (11):
>   dmaengine: doc: Add sections for per descriptor metadata support
>   dmaengine: Add metadata_ops for dma_async_tx_descriptor
>   dmaengine: Add support for reporting DMA cached data amount
>   dmaengine: ti: Add cppi5 header for UDMA
>   dt-bindings: dma: ti: Add document for K3 UDMA
>   dmaengine: ti: New driver for K3 UDMA - split#1: defines, structs, io
>     func
>   dmaengine: ti: New driver for K3 UDMA - split#2: probe/remove, xlate
>     and filter_fn
>   dmaengine: ti: New driver for K3 UDMA - split#3: alloc/free
>     chan_resources
>   dmaengine: ti: New driver for K3 UDMA - split#4: dma_device callbacks
>     1
>   dmaengine: ti: New driver for K3 UDMA - split#5: dma_device callbacks
>     2
>   dmaengine: ti: New driver for K3 UDMA - split#6: Kconfig and Makefile
> 
>  .../devicetree/bindings/dma/ti/k3-udma.txt    |  170 +
>  .../devicetree/bindings/soc/ti/k3-ringacc.txt |   59 +
>  Documentation/driver-api/dmaengine/client.rst |   75 +
>  .../driver-api/dmaengine/provider.rst         |   46 +
>  drivers/dma/dmaengine.c                       |   73 +
>  drivers/dma/dmaengine.h                       |    8 +
>  drivers/dma/ti/Kconfig                        |   22 +
>  drivers/dma/ti/Makefile                       |    2 +
>  drivers/dma/ti/k3-udma-glue.c                 | 1039 +++++
>  drivers/dma/ti/k3-udma-private.c              |  124 +
>  drivers/dma/ti/k3-udma.c                      | 3479 +++++++++++++++++
>  drivers/dma/ti/k3-udma.h                      |  160 +
>  drivers/soc/ti/Kconfig                        |   17 +
>  drivers/soc/ti/Makefile                       |    1 +
>  drivers/soc/ti/k3-ringacc.c                   | 1191 ++++++
>  include/dt-bindings/dma/k3-udma.h             |   10 +
>  include/linux/dma/k3-udma-glue.h              |  125 +
>  include/linux/dma/ti-cppi5.h                  |  996 +++++
>  include/linux/dmaengine.h                     |  110 +
>  include/linux/soc/ti/k3-ringacc.h             |  262 ++
>  20 files changed, 7969 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/dma/ti/k3-udma.txt
>  create mode 100644 Documentation/devicetree/bindings/soc/ti/k3-ringacc.txt
>  create mode 100644 drivers/dma/ti/k3-udma-glue.c
>  create mode 100644 drivers/dma/ti/k3-udma-private.c
>  create mode 100644 drivers/dma/ti/k3-udma.c
>  create mode 100644 drivers/dma/ti/k3-udma.h
>  create mode 100644 drivers/soc/ti/k3-ringacc.c
>  create mode 100644 include/dt-bindings/dma/k3-udma.h
>  create mode 100644 include/linux/dma/k3-udma-glue.h
>  create mode 100644 include/linux/dma/ti-cppi5.h
>  create mode 100644 include/linux/soc/ti/k3-ringacc.h
> 

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 02/14] soc: ti: k3: add navss ringacc driver
  2019-07-30  9:34 ` [PATCH v2 02/14] soc: ti: k3: add navss ringacc driver Peter Ujfalusi
@ 2019-08-30 12:57   ` Peter Ujfalusi
  2019-09-09  6:09   ` Tero Kristo
  1 sibling, 0 replies; 32+ messages in thread
From: Peter Ujfalusi @ 2019-08-30 12:57 UTC (permalink / raw)
  To: vkoul, robh+dt, nm, ssantosh
  Cc: dan.j.williams, dmaengine, linux-arm-kernel, devicetree,
	linux-kernel, grygorii.strashko, lokeshvutla, t-kristo, tony,
	j-keerthy

Hi,

On 30/07/2019 12.34, Peter Ujfalusi wrote:
> From: Grygorii Strashko <grygorii.strashko@ti.com>
> 
> The Ring Accelerator (RINGACC or RA) provides hardware acceleration to
> enable straightforward passing of work between a producer and a consumer.
> There is one RINGACC module per NAVSS on TI AM65x SoCs.
> 
> The RINGACC converts constant-address read and write accesses to equivalent
> read or write accesses to a circular data structure in memory. The RINGACC
> eliminates the need for each DMA controller which needs to access ring
> elements from having to know the current state of the ring (base address,
> current offset). The DMA controller performs a read or write access to a
> specific address range (which maps to the source interface on the RINGACC)
> and the RINGACC replaces the address for the transaction with a new address
> which corresponds to the head or tail element of the ring (head for reads,
> tail for writes). Since the RINGACC maintains the state, multiple DMA
> controllers or channels are allowed to coherently share the same rings as
> applicable. The RINGACC is able to place data which is destined towards
> software into cached memory directly.
> 
> Supported ring modes:
> - Ring Mode
> - Messaging Mode
> - Credentials Mode
> - Queue Manager Mode
> 
> TI-SCI integration:
> 
> Texas Instrument's System Control Interface (TI-SCI) Message Protocol now
> has control over Ringacc module resources management (RM) and Rings
> configuration.
> 
> The corresponding support of TI-SCI Ringacc module RM protocol
> introduced as option through DT parameters:
> - ti,sci: phandle on TI-SCI firmware controller DT node
> - ti,sci-dev-id: TI-SCI device identifier as per TI-SCI firmware spec
> 
> if both parameters present - Ringacc driver will configure/free/reset Rings
> using TI-SCI Message Ringacc RM Protocol.
> 
> The Ringacc driver manages Rings allocation by itself now and requests
> TI-SCI firmware to allocate and configure specific Rings only. It's done
> this way because, Linux driver implements two stage Rings allocation and
> configuration (allocate ring and configure ring) while I-SCI Message
> Protocol supports only one combined operation (allocate+configure).
> 
> Grygorii Strashko <grygorii.strashko@ti.com>
> Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
> ---
>  drivers/soc/ti/Kconfig            |   17 +
>  drivers/soc/ti/Makefile           |    1 +
>  drivers/soc/ti/k3-ringacc.c       | 1191 +++++++++++++++++++++++++++++
>  include/linux/soc/ti/k3-ringacc.h |  262 +++++++
>  4 files changed, 1471 insertions(+)
>  create mode 100644 drivers/soc/ti/k3-ringacc.c
>  create mode 100644 include/linux/soc/ti/k3-ringacc.h
> 
> diff --git a/drivers/soc/ti/Kconfig b/drivers/soc/ti/Kconfig
> index cf545f428d03..10c76faa503e 100644
> --- a/drivers/soc/ti/Kconfig
> +++ b/drivers/soc/ti/Kconfig
> @@ -80,6 +80,23 @@ config TI_SCI_PM_DOMAINS
>  	  called ti_sci_pm_domains. Note this is needed early in boot before
>  	  rootfs may be available.
>  
> +config TI_K3_RINGACC
> +	tristate "K3 Ring accelerator Sub System"
> +	depends on ARCH_K3 || COMPILE_TEST
> +	depends on TI_SCI_INTA_IRQCHIP
> +	default y
> +	help
> +	  Say y here to support the K3 Ring accelerator module.
> +	  The Ring Accelerator (RINGACC or RA)  provides hardware acceleration
> +	  to enable straightforward passing of work between a producer
> +	  and a consumer. There is one RINGACC module per NAVSS on TI AM65x SoCs
> +	  If unsure, say N.
> +
> +config TI_K3_RINGACC_DEBUG
> +	tristate "K3 Ring accelerator Sub System tests and debug"
> +	depends on TI_K3_RINGACC
> +	default n
> +
>  endif # SOC_TI
>  
>  config TI_SCI_INTA_MSI_DOMAIN
> diff --git a/drivers/soc/ti/Makefile b/drivers/soc/ti/Makefile
> index b3868d392d4f..cc4bc8b08bf5 100644
> --- a/drivers/soc/ti/Makefile
> +++ b/drivers/soc/ti/Makefile
> @@ -9,3 +9,4 @@ obj-$(CONFIG_AMX3_PM)			+= pm33xx.o
>  obj-$(CONFIG_WKUP_M3_IPC)		+= wkup_m3_ipc.o
>  obj-$(CONFIG_TI_SCI_PM_DOMAINS)		+= ti_sci_pm_domains.o
>  obj-$(CONFIG_TI_SCI_INTA_MSI_DOMAIN)	+= ti_sci_inta_msi.o
> +obj-$(CONFIG_TI_K3_RINGACC)		+= k3-ringacc.o
> diff --git a/drivers/soc/ti/k3-ringacc.c b/drivers/soc/ti/k3-ringacc.c
> new file mode 100644
> index 000000000000..401dfc963319
> --- /dev/null
> +++ b/drivers/soc/ti/k3-ringacc.c
> @@ -0,0 +1,1191 @@

...

> +void k3_ringacc_ring_reset_dma(struct k3_ring *ring, u32 occ)
> +{
> +	if (!ring || !(ring->flags & K3_RING_FLAG_BUSY))
> +		return;
> +
> +	if (!ring->parent->dma_ring_reset_quirk)

k3_ringacc_ring_reset(ring); is missing from here.

> +		return;
> +
> +	if (!occ)
> +		occ = dbg_readl(&ring->rt->occ);
> +
> +	if (occ) {
> +		u32 db_ring_cnt, db_ring_cnt_cur;
> +
> +		k3_nav_dbg(ring->parent->dev, "%s %u occ: %u\n", __func__,
> +			   ring->ring_id, occ);
> +		/* 2. Reset the ring */
> +		k3_ringacc_ring_reset_sci(ring);
> +
> +		/*
> +		 * 3. Setup the ring in ring/doorbell mode
> +		 * (if not already in this mode)
> +		 */
> +		if (ring->mode != K3_RINGACC_RING_MODE_RING)
> +			k3_ringacc_ring_reconfig_qmode_sci(
> +					ring, K3_RINGACC_RING_MODE_RING);
> +		/*
> +		 * 4. Ring the doorbell 2**22 – ringOcc times.
> +		 * This will wrap the internal UDMAP ring state occupancy
> +		 * counter (which is 21-bits wide) to 0.
> +		 */
> +		db_ring_cnt = (1U << 22) - occ;
> +
> +		while (db_ring_cnt != 0) {
> +			/*
> +			 * Ring the doorbell with the maximum count each
> +			 * iteration if possible to minimize the total
> +			 * of writes
> +			 */
> +			if (db_ring_cnt > K3_RINGACC_MAX_DB_RING_CNT)
> +				db_ring_cnt_cur = K3_RINGACC_MAX_DB_RING_CNT;
> +			else
> +				db_ring_cnt_cur = db_ring_cnt;
> +
> +			writel(db_ring_cnt_cur, &ring->rt->db);
> +			db_ring_cnt -= db_ring_cnt_cur;
> +		}
> +
> +		/* 5. Restore the original ring mode (if not ring mode) */
> +		if (ring->mode != K3_RINGACC_RING_MODE_RING)
> +			k3_ringacc_ring_reconfig_qmode_sci(ring, ring->mode);
> +	}
> +
> +	/* 2. Reset the ring */
> +	k3_ringacc_ring_reset(ring);
> +}
> +EXPORT_SYMBOL_GPL(k3_ringacc_ring_reset_dma);

- Péter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 04/14] dmaengine: Add metadata_ops for dma_async_tx_descriptor
  2019-07-30  9:34 ` [PATCH v2 04/14] dmaengine: Add metadata_ops for dma_async_tx_descriptor Peter Ujfalusi
@ 2019-09-08 14:12   ` Vinod Koul
  2019-09-09  6:52     ` Peter Ujfalusi
  0 siblings, 1 reply; 32+ messages in thread
From: Vinod Koul @ 2019-09-08 14:12 UTC (permalink / raw)
  To: Peter Ujfalusi
  Cc: robh+dt, nm, ssantosh, dan.j.williams, dmaengine,
	linux-arm-kernel, devicetree, linux-kernel, grygorii.strashko,
	lokeshvutla, t-kristo, tony, j-keerthy

On 30-07-19, 12:34, Peter Ujfalusi wrote:
> The metadata is best described as side band data or parameters traveling
> alongside the data DMAd by the DMA engine. It is data
> which is understood by the peripheral and the peripheral driver only, the
> DMA engine see it only as data block and it is not interpreting it in any
> way.
> 
> The metadata can be different per descriptor as it is a parameter for the
> data being transferred.
> 
> If the DMA supports per descriptor metadata it can implement the attach,
> get_ptr/set_len callbacks.
> 
> Client drivers must only use either attach or get_ptr/set_len to avoid
> misconfiguration.
> 
> Client driver can check if a given metadata mode is supported by the
> channel during probe time with
> dmaengine_is_metadata_mode_supported(chan, DESC_METADATA_CLIENT);
> dmaengine_is_metadata_mode_supported(chan, DESC_METADATA_ENGINE);
> 
> and based on this information can use either mode.
> 
> Wrappers are also added for the metadata_ops.
> 
> To be used in DESC_METADATA_CLIENT mode:
> dmaengine_desc_attach_metadata()
> 
> To be used in DESC_METADATA_ENGINE mode:
> dmaengine_desc_get_metadata_ptr()
> dmaengine_desc_set_metadata_len()
> 
> Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
> ---
>  drivers/dma/dmaengine.c   |  73 ++++++++++++++++++++++++++
>  include/linux/dmaengine.h | 108 ++++++++++++++++++++++++++++++++++++++
>  2 files changed, 181 insertions(+)
> 
> diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
> index 03ac4b96117c..6baddf7dcbfd 100644
> --- a/drivers/dma/dmaengine.c
> +++ b/drivers/dma/dmaengine.c
> @@ -1302,6 +1302,79 @@ void dma_async_tx_descriptor_init(struct dma_async_tx_descriptor *tx,
>  }
>  EXPORT_SYMBOL(dma_async_tx_descriptor_init);
>  
> +static inline int desc_check_and_set_metadata_mode(
> +	struct dma_async_tx_descriptor *desc, enum dma_desc_metadata_mode mode)
> +{
> +	/* Make sure that the metadata mode is not mixed */
> +	if (!desc->desc_metadata_mode) {
> +		if (dmaengine_is_metadata_mode_supported(desc->chan, mode))
> +			desc->desc_metadata_mode = mode;

So do we have different descriptors supporting different modes or is it
controlled based? For latter we can do this check at controller
registration!

-- 
~Vinod

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 06/14] dmaengine: ti: Add cppi5 header for UDMA
  2019-07-30  9:34 ` [PATCH v2 06/14] dmaengine: ti: Add cppi5 header for UDMA Peter Ujfalusi
@ 2019-09-08 14:25   ` Vinod Koul
  2019-09-09 10:59     ` Peter Ujfalusi
  0 siblings, 1 reply; 32+ messages in thread
From: Vinod Koul @ 2019-09-08 14:25 UTC (permalink / raw)
  To: Peter Ujfalusi
  Cc: robh+dt, nm, ssantosh, dan.j.williams, dmaengine,
	linux-arm-kernel, devicetree, linux-kernel, grygorii.strashko,
	lokeshvutla, t-kristo, tony, j-keerthy

On 30-07-19, 12:34, Peter Ujfalusi wrote:

> +/**
> + * Descriptor header, present in all types of descriptors
> + */
> +struct cppi5_desc_hdr_t {
> +	u32 pkt_info0;	/* Packet info word 0 (n/a in Buffer desc) */
> +	u32 pkt_info1;	/* Packet info word 1 (n/a in Buffer desc) */
> +	u32 pkt_info2;	/* Packet info word 2 Buffer reclamation info */
> +	u32 src_dst_tag; /* Packet info word 3 (n/a in Buffer desc) */

Can we move these comments to kernel-doc style please

> +/**
> + * cppi5_desc_get_type - get descriptor type
> + * @desc_hdr: packet descriptor/TR header
> + *
> + * Returns descriptor type:
> + * CPPI5_INFO0_DESC_TYPE_VAL_HOST
> + * CPPI5_INFO0_DESC_TYPE_VAL_MONO
> + * CPPI5_INFO0_DESC_TYPE_VAL_TR
> + */
> +static inline u32 cppi5_desc_get_type(struct cppi5_desc_hdr_t *desc_hdr)
> +{
> +	WARN_ON(!desc_hdr);

why WARN_ON and not return error!

> +/**
> + * cppi5_hdesc_calc_size - Calculate Host Packet Descriptor size
> + * @epib: is EPIB present
> + * @psdata_size: PSDATA size
> + * @sw_data_size: SWDATA size
> + *
> + * Returns required Host Packet Descriptor size
> + * 0 - if PSDATA > CPPI5_INFO0_HDESC_PSDATA_MAX_SIZE
> + */
> +static inline u32 cppi5_hdesc_calc_size(bool epib, u32 psdata_size,
> +					u32 sw_data_size)
> +{
> +	u32 desc_size;
> +
> +	if (psdata_size > CPPI5_INFO0_HDESC_PSDATA_MAX_SIZE)
> +		return 0;
> +	//TODO_GS: align

:)

-- 
~Vinod

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 02/14] soc: ti: k3: add navss ringacc driver
  2019-07-30  9:34 ` [PATCH v2 02/14] soc: ti: k3: add navss ringacc driver Peter Ujfalusi
  2019-08-30 12:57   ` Peter Ujfalusi
@ 2019-09-09  6:09   ` Tero Kristo
  2019-09-09  7:25     ` Vignesh Raghavendra
  2019-09-09 13:00     ` Peter Ujfalusi
  1 sibling, 2 replies; 32+ messages in thread
From: Tero Kristo @ 2019-09-09  6:09 UTC (permalink / raw)
  To: Peter Ujfalusi, vkoul, robh+dt, nm, ssantosh
  Cc: dan.j.williams, dmaengine, linux-arm-kernel, devicetree,
	linux-kernel, grygorii.strashko, lokeshvutla, tony, j-keerthy

Hi,

Mostly some cosmetic comments below, other than that seems fine to me.

On 30/07/2019 12:34, Peter Ujfalusi wrote:
> From: Grygorii Strashko <grygorii.strashko@ti.com>
> 
> The Ring Accelerator (RINGACC or RA) provides hardware acceleration to
> enable straightforward passing of work between a producer and a consumer.
> There is one RINGACC module per NAVSS on TI AM65x SoCs.
> 
> The RINGACC converts constant-address read and write accesses to equivalent
> read or write accesses to a circular data structure in memory. The RINGACC
> eliminates the need for each DMA controller which needs to access ring
> elements from having to know the current state of the ring (base address,
> current offset). The DMA controller performs a read or write access to a
> specific address range (which maps to the source interface on the RINGACC)
> and the RINGACC replaces the address for the transaction with a new address
> which corresponds to the head or tail element of the ring (head for reads,
> tail for writes). Since the RINGACC maintains the state, multiple DMA
> controllers or channels are allowed to coherently share the same rings as
> applicable. The RINGACC is able to place data which is destined towards
> software into cached memory directly.
> 
> Supported ring modes:
> - Ring Mode
> - Messaging Mode
> - Credentials Mode
> - Queue Manager Mode
> 
> TI-SCI integration:
> 
> Texas Instrument's System Control Interface (TI-SCI) Message Protocol now
> has control over Ringacc module resources management (RM) and Rings
> configuration.
> 
> The corresponding support of TI-SCI Ringacc module RM protocol
> introduced as option through DT parameters:
> - ti,sci: phandle on TI-SCI firmware controller DT node
> - ti,sci-dev-id: TI-SCI device identifier as per TI-SCI firmware spec
> 
> if both parameters present - Ringacc driver will configure/free/reset Rings
> using TI-SCI Message Ringacc RM Protocol.
> 
> The Ringacc driver manages Rings allocation by itself now and requests
> TI-SCI firmware to allocate and configure specific Rings only. It's done
> this way because, Linux driver implements two stage Rings allocation and
> configuration (allocate ring and configure ring) while I-SCI Message

I-SCI should be TI-SCI I believe.

> Protocol supports only one combined operation (allocate+configure).
> 
> Grygorii Strashko <grygorii.strashko@ti.com>

Above seems to be missing SoB?

> Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
> ---
>   drivers/soc/ti/Kconfig            |   17 +
>   drivers/soc/ti/Makefile           |    1 +
>   drivers/soc/ti/k3-ringacc.c       | 1191 +++++++++++++++++++++++++++++
>   include/linux/soc/ti/k3-ringacc.h |  262 +++++++
>   4 files changed, 1471 insertions(+)
>   create mode 100644 drivers/soc/ti/k3-ringacc.c
>   create mode 100644 include/linux/soc/ti/k3-ringacc.h
> 
> diff --git a/drivers/soc/ti/Kconfig b/drivers/soc/ti/Kconfig
> index cf545f428d03..10c76faa503e 100644
> --- a/drivers/soc/ti/Kconfig
> +++ b/drivers/soc/ti/Kconfig
> @@ -80,6 +80,23 @@ config TI_SCI_PM_DOMAINS
>   	  called ti_sci_pm_domains. Note this is needed early in boot before
>   	  rootfs may be available.
>   
> +config TI_K3_RINGACC
> +	tristate "K3 Ring accelerator Sub System"
> +	depends on ARCH_K3 || COMPILE_TEST
> +	depends on TI_SCI_INTA_IRQCHIP
> +	default y
> +	help
> +	  Say y here to support the K3 Ring accelerator module.
> +	  The Ring Accelerator (RINGACC or RA)  provides hardware acceleration
> +	  to enable straightforward passing of work between a producer
> +	  and a consumer. There is one RINGACC module per NAVSS on TI AM65x SoCs
> +	  If unsure, say N.
> +
> +config TI_K3_RINGACC_DEBUG
> +	tristate "K3 Ring accelerator Sub System tests and debug"
> +	depends on TI_K3_RINGACC
> +	default n
> +
>   endif # SOC_TI
>   
>   config TI_SCI_INTA_MSI_DOMAIN
> diff --git a/drivers/soc/ti/Makefile b/drivers/soc/ti/Makefile
> index b3868d392d4f..cc4bc8b08bf5 100644
> --- a/drivers/soc/ti/Makefile
> +++ b/drivers/soc/ti/Makefile
> @@ -9,3 +9,4 @@ obj-$(CONFIG_AMX3_PM)			+= pm33xx.o
>   obj-$(CONFIG_WKUP_M3_IPC)		+= wkup_m3_ipc.o
>   obj-$(CONFIG_TI_SCI_PM_DOMAINS)		+= ti_sci_pm_domains.o
>   obj-$(CONFIG_TI_SCI_INTA_MSI_DOMAIN)	+= ti_sci_inta_msi.o
> +obj-$(CONFIG_TI_K3_RINGACC)		+= k3-ringacc.o
> diff --git a/drivers/soc/ti/k3-ringacc.c b/drivers/soc/ti/k3-ringacc.c
> new file mode 100644
> index 000000000000..401dfc963319
> --- /dev/null
> +++ b/drivers/soc/ti/k3-ringacc.c
> @@ -0,0 +1,1191 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * TI K3 NAVSS Ring Accelerator subsystem driver
> + *
> + * Copyright (C) 2019 Texas Instruments Incorporated - http://www.ti.com
> + */
> +
> +#include <linux/dma-mapping.h>
> +#include <linux/io.h>
> +#include <linux/module.h>
> +#include <linux/of.h>
> +#include <linux/platform_device.h>
> +#include <linux/pm_runtime.h>
> +#include <linux/soc/ti/k3-ringacc.h>
> +#include <linux/soc/ti/ti_sci_protocol.h>
> +#include <linux/soc/ti/ti_sci_inta_msi.h>
> +#include <linux/of_irq.h>
> +#include <linux/irqdomain.h>
> +
> +static LIST_HEAD(k3_ringacc_list);
> +static DEFINE_MUTEX(k3_ringacc_list_lock);
> +
> +#ifdef CONFIG_TI_K3_RINGACC_DEBUG
> +#define	k3_nav_dbg(dev, arg...) dev_err(dev, arg)

dev_err seems exaggeration for debug purposes, maybe just dev_info.

> +static	void dbg_writel(u32 v, void __iomem *reg)
> +{
> +	pr_err("WRITEL(32): v(%08X)-->reg(%p)\n", v, reg);

Again, maybe just pr_info.

> +	writel(v, reg);
> +}
> +
> +static	u32 dbg_readl(void __iomem *reg)
> +{
> +	u32 v;
> +
> +	v = readl(reg);
> +	pr_err("READL(32): v(%08X)<--reg(%p)\n", v, reg);
> +	return v;
> +}
> +#else
> +#define	k3_nav_dbg(dev, arg...) dev_dbg(dev, arg)
> +#define dbg_writel(v, reg) writel(v, reg)

Do you need to use hard writel, writel_relaxed is not enough?

> +
> +#define dbg_readl(reg) readl(reg)

Same as above but for read?

> +#endif
> +
> +#define K3_RINGACC_CFG_RING_SIZE_ELCNT_MASK		GENMASK(19, 0)
> +
> +/**
> + * struct k3_ring_rt_regs -  The RA Control/Status Registers region
> + */
> +struct k3_ring_rt_regs {
> +	u32	resv_16[4];
> +	u32	db;		/* RT Ring N Doorbell Register */
> +	u32	resv_4[1];
> +	u32	occ;		/* RT Ring N Occupancy Register */
> +	u32	indx;		/* RT Ring N Current Index Register */
> +	u32	hwocc;		/* RT Ring N Hardware Occupancy Register */
> +	u32	hwindx;		/* RT Ring N Current Index Register */
> +};
> +
> +#define K3_RINGACC_RT_REGS_STEP	0x1000
> +
> +/**
> + * struct k3_ring_fifo_regs -  The Ring Accelerator Queues Registers region
> + */
> +struct k3_ring_fifo_regs {
> +	u32	head_data[128];		/* Ring Head Entry Data Registers */
> +	u32	tail_data[128];		/* Ring Tail Entry Data Registers */
> +	u32	peek_head_data[128];	/* Ring Peek Head Entry Data Regs */
> +	u32	peek_tail_data[128];	/* Ring Peek Tail Entry Data Regs */
> +};
> +
> +/**
> + * struct k3_ringacc_proxy_gcfg_regs - RA Proxy Global Config MMIO Region
> + */
> +struct k3_ringacc_proxy_gcfg_regs {
> +	u32	revision;	/* Revision Register */
> +	u32	config;		/* Config Register */
> +};
> +
> +#define K3_RINGACC_PROXY_CFG_THREADS_MASK		GENMASK(15, 0)
> +
> +/**
> + * struct k3_ringacc_proxy_target_regs -  Proxy Datapath MMIO Region
> + */
> +struct k3_ringacc_proxy_target_regs {
> +	u32	control;	/* Proxy Control Register */
> +	u32	status;		/* Proxy Status Register */
> +	u8	resv_512[504];
> +	u32	data[128];	/* Proxy Data Register */
> +};
> +
> +#define K3_RINGACC_PROXY_TARGET_STEP	0x1000
> +#define K3_RINGACC_PROXY_NOT_USED	(-1)
> +
> +enum k3_ringacc_proxy_access_mode {
> +	PROXY_ACCESS_MODE_HEAD = 0,
> +	PROXY_ACCESS_MODE_TAIL = 1,
> +	PROXY_ACCESS_MODE_PEEK_HEAD = 2,
> +	PROXY_ACCESS_MODE_PEEK_TAIL = 3,
> +};
> +
> +#define K3_RINGACC_FIFO_WINDOW_SIZE_BYTES  (512U)
> +#define K3_RINGACC_FIFO_REGS_STEP	0x1000
> +#define K3_RINGACC_MAX_DB_RING_CNT    (127U)
> +
> +/**
> + * struct k3_ring_ops -  Ring operations
> + */
> +struct k3_ring_ops {
> +	int (*push_tail)(struct k3_ring *ring, void *elm);
> +	int (*push_head)(struct k3_ring *ring, void *elm);
> +	int (*pop_tail)(struct k3_ring *ring, void *elm);
> +	int (*pop_head)(struct k3_ring *ring, void *elm);
> +};
> +
> +/**
> + * struct k3_ring - RA Ring descriptor
> + *
> + * @rt - Ring control/status registers
> + * @fifos - Ring queues registers
> + * @proxy - Ring Proxy Datapath registers
> + * @ring_mem_dma - Ring buffer dma address
> + * @ring_mem_virt - Ring buffer virt address
> + * @ops - Ring operations
> + * @size - Ring size in elements
> + * @elm_size - Size of the ring element
> + * @mode - Ring mode
> + * @flags - flags
> + * @free - Number of free elements
> + * @occ - Ring occupancy
> + * @windex - Write index (only for @K3_RINGACC_RING_MODE_RING)
> + * @rindex - Read index (only for @K3_RINGACC_RING_MODE_RING)
> + * @ring_id - Ring Id
> + * @parent - Pointer on struct @k3_ringacc
> + * @use_count - Use count for shared rings
> + * @proxy_id - RA Ring Proxy Id (only if @K3_RINGACC_RING_USE_PROXY)
> + */
> +struct k3_ring {
> +	struct k3_ring_rt_regs __iomem *rt;
> +	struct k3_ring_fifo_regs __iomem *fifos;
> +	struct k3_ringacc_proxy_target_regs  __iomem *proxy;
> +	dma_addr_t	ring_mem_dma;
> +	void		*ring_mem_virt;
> +	struct k3_ring_ops *ops;
> +	u32		size;
> +	enum k3_ring_size elm_size;
> +	enum k3_ring_mode mode;
> +	u32		flags;
> +#define K3_RING_FLAG_BUSY	BIT(1)
> +#define K3_RING_FLAG_SHARED	BIT(2)
> +	u32		free;
> +	u32		occ;
> +	u32		windex;
> +	u32		rindex;
> +	u32		ring_id;
> +	struct k3_ringacc	*parent;
> +	u32		use_count;
> +	int		proxy_id;
> +};
> +
> +/**
> + * struct k3_ringacc - Rings accelerator descriptor
> + *
> + * @dev - pointer on RA device
> + * @proxy_gcfg - RA proxy global config registers
> + * @proxy_target_base - RA proxy datapath region
> + * @num_rings - number of ring in RA
> + * @rm_gp_range - general purpose rings range from tisci
> + * @dma_ring_reset_quirk - DMA reset w/a enable
> + * @num_proxies - number of RA proxies
> + * @rings - array of rings descriptors (struct @k3_ring)
> + * @list - list of RAs in the system
> + * @tisci - pointer ti-sci handle
> + * @tisci_ring_ops - ti-sci rings ops
> + * @tisci_dev_id - ti-sci device id
> + */
> +struct k3_ringacc {
> +	struct device *dev;
> +	struct k3_ringacc_proxy_gcfg_regs __iomem *proxy_gcfg;
> +	void __iomem *proxy_target_base;
> +	u32 num_rings; /* number of rings in Ringacc module */
> +	unsigned long *rings_inuse;
> +	struct ti_sci_resource *rm_gp_range;
> +
> +	bool dma_ring_reset_quirk;
> +	u32 num_proxies;
> +	unsigned long *proxy_inuse;

proxy_inuse is not documented above.

> +
> +	struct k3_ring *rings;
> +	struct list_head list;
> +	struct mutex req_lock; /* protect rings allocation */
> +
> +	const struct ti_sci_handle *tisci;
> +	const struct ti_sci_rm_ringacc_ops *tisci_ring_ops;
> +	u32  tisci_dev_id;
> +};
> +
> +static long k3_ringacc_ring_get_fifo_pos(struct k3_ring *ring)
> +{
> +	return K3_RINGACC_FIFO_WINDOW_SIZE_BYTES -
> +	       (4 << ring->elm_size);
> +}
> +
> +static void *k3_ringacc_get_elm_addr(struct k3_ring *ring, u32 idx)
> +{
> +	return (idx * (4 << ring->elm_size) + ring->ring_mem_virt);

The arithmetic here seems backwards compared to most other code I've 
seen. It would be more readable if you have it like:

ring->ring_mem_virt + idx * (4 << ring->elm_size);

> +}
> +
> +static int k3_ringacc_ring_push_mem(struct k3_ring *ring, void *elem);
> +static int k3_ringacc_ring_pop_mem(struct k3_ring *ring, void *elem);
> +
> +static struct k3_ring_ops k3_ring_mode_ring_ops = {
> +		.push_tail = k3_ringacc_ring_push_mem,
> +		.pop_head = k3_ringacc_ring_pop_mem,
> +};
> +
> +static int k3_ringacc_ring_push_io(struct k3_ring *ring, void *elem);
> +static int k3_ringacc_ring_pop_io(struct k3_ring *ring, void *elem);
> +static int k3_ringacc_ring_push_head_io(struct k3_ring *ring, void *elem);
> +static int k3_ringacc_ring_pop_tail_io(struct k3_ring *ring, void *elem);
> +
> +static struct k3_ring_ops k3_ring_mode_msg_ops = {
> +		.push_tail = k3_ringacc_ring_push_io,
> +		.push_head = k3_ringacc_ring_push_head_io,
> +		.pop_tail = k3_ringacc_ring_pop_tail_io,
> +		.pop_head = k3_ringacc_ring_pop_io,
> +};
> +
> +static int k3_ringacc_ring_push_head_proxy(struct k3_ring *ring, void *elem);
> +static int k3_ringacc_ring_push_tail_proxy(struct k3_ring *ring, void *elem);
> +static int k3_ringacc_ring_pop_head_proxy(struct k3_ring *ring, void *elem);
> +static int k3_ringacc_ring_pop_tail_proxy(struct k3_ring *ring, void *elem);
> +
> +static struct k3_ring_ops k3_ring_mode_proxy_ops = {
> +		.push_tail = k3_ringacc_ring_push_tail_proxy,
> +		.push_head = k3_ringacc_ring_push_head_proxy,
> +		.pop_tail = k3_ringacc_ring_pop_tail_proxy,
> +		.pop_head = k3_ringacc_ring_pop_head_proxy,
> +};
> +
> +#ifdef CONFIG_TI_K3_RINGACC_DEBUG
> +void k3_ringacc_ring_dump(struct k3_ring *ring)
> +{
> +	struct device *dev = ring->parent->dev;
> +
> +	k3_nav_dbg(dev, "dump ring: %d\n", ring->ring_id);
> +	k3_nav_dbg(dev, "dump mem virt %p, dma %pad\n",
> +		   ring->ring_mem_virt, &ring->ring_mem_dma);
> +	k3_nav_dbg(dev, "dump elmsize %d, size %d, mode %d, proxy_id %d\n",
> +		   ring->elm_size, ring->size, ring->mode, ring->proxy_id);
> +
> +	k3_nav_dbg(dev, "dump ring_rt_regs: db%08x\n",
> +		   readl(&ring->rt->db));

Why not use readl_relaxed in this func?

> +	k3_nav_dbg(dev, "dump occ%08x\n",
> +		   readl(&ring->rt->occ));
> +	k3_nav_dbg(dev, "dump indx%08x\n",
> +		   readl(&ring->rt->indx));
> +	k3_nav_dbg(dev, "dump hwocc%08x\n",
> +		   readl(&ring->rt->hwocc));
> +	k3_nav_dbg(dev, "dump hwindx%08x\n",
> +		   readl(&ring->rt->hwindx));
> +
> +	if (ring->ring_mem_virt)
> +		print_hex_dump(KERN_ERR, "dump ring_mem_virt ",
> +			       DUMP_PREFIX_NONE, 16, 1,
> +			       ring->ring_mem_virt, 16 * 8, false);
> +}
> +EXPORT_SYMBOL_GPL(k3_ringacc_ring_dump);

Do you really need to export a debug function?

> +#endif
> +
> +struct k3_ring *k3_ringacc_request_ring(struct k3_ringacc *ringacc,
> +					int id, u32 flags)
> +{
> +	int proxy_id = K3_RINGACC_PROXY_NOT_USED;
> +
> +	mutex_lock(&ringacc->req_lock);
> +
> +	if (id == K3_RINGACC_RING_ID_ANY) {
> +		/* Request for any general purpose ring */
> +		struct ti_sci_resource_desc *gp_rings =
> +						&ringacc->rm_gp_range->desc[0];
> +		unsigned long size;
> +
> +		size = gp_rings->start + gp_rings->num;
> +		id = find_next_zero_bit(ringacc->rings_inuse, size,
> +					gp_rings->start);
> +		if (id == size)
> +			goto error;
> +	} else if (id < 0) {
> +		goto error;
> +	}
> +
> +	if (test_bit(id, ringacc->rings_inuse) &&
> +	    !(ringacc->rings[id].flags & K3_RING_FLAG_SHARED))
> +		goto error;
> +	else if (ringacc->rings[id].flags & K3_RING_FLAG_SHARED)
> +		goto out;
> +
> +	if (flags & K3_RINGACC_RING_USE_PROXY) {
> +		proxy_id = find_next_zero_bit(ringacc->proxy_inuse,
> +					      ringacc->num_proxies, 0);
> +		if (proxy_id == ringacc->num_proxies)
> +			goto error;
> +	}
> +
> +	if (!try_module_get(ringacc->dev->driver->owner))
> +		goto error;
> +
> +	if (proxy_id != K3_RINGACC_PROXY_NOT_USED) {
> +		set_bit(proxy_id, ringacc->proxy_inuse);
> +		ringacc->rings[id].proxy_id = proxy_id;
> +		k3_nav_dbg(ringacc->dev, "Giving ring#%d proxy#%d\n",
> +			   id, proxy_id);
> +	} else {
> +		k3_nav_dbg(ringacc->dev, "Giving ring#%d\n", id);
> +	}
> +
> +	set_bit(id, ringacc->rings_inuse);
> +out:
> +	ringacc->rings[id].use_count++;
> +	mutex_unlock(&ringacc->req_lock);
> +	return &ringacc->rings[id];
> +
> +error:
> +	mutex_unlock(&ringacc->req_lock);
> +	return NULL;
> +}
> +EXPORT_SYMBOL_GPL(k3_ringacc_request_ring);
> +
> +static void k3_ringacc_ring_reset_sci(struct k3_ring *ring)
> +{
> +	struct k3_ringacc *ringacc = ring->parent;
> +	int ret;
> +
> +	ret = ringacc->tisci_ring_ops->config(
> +			ringacc->tisci,
> +			TI_SCI_MSG_VALUE_RM_RING_COUNT_VALID,
> +			ringacc->tisci_dev_id,
> +			ring->ring_id,
> +			0,
> +			0,
> +			ring->size,
> +			0,
> +			0,
> +			0);
> +	if (ret)
> +		dev_err(ringacc->dev, "TISCI reset ring fail (%d) ring_idx %d\n",
> +			ret, ring->ring_id);

Return value of sci ops is masked, why not return it and let the caller 
handle it properly?

Same comment for anything similar that follows.

> +}
> +
> +void k3_ringacc_ring_reset(struct k3_ring *ring)
> +{
> +	if (!ring || !(ring->flags & K3_RING_FLAG_BUSY))
> +		return;
> +
> +	ring->occ = 0;
> +	ring->free = 0;
> +	ring->rindex = 0;
> +	ring->windex = 0;
> +
> +	k3_ringacc_ring_reset_sci(ring);
> +}
> +EXPORT_SYMBOL_GPL(k3_ringacc_ring_reset);
> +
> +static void k3_ringacc_ring_reconfig_qmode_sci(struct k3_ring *ring,
> +					       enum k3_ring_mode mode)
> +{
> +	struct k3_ringacc *ringacc = ring->parent;
> +	int ret;
> +
> +	ret = ringacc->tisci_ring_ops->config(
> +			ringacc->tisci,
> +			TI_SCI_MSG_VALUE_RM_RING_MODE_VALID,
> +			ringacc->tisci_dev_id,
> +			ring->ring_id,
> +			0,
> +			0,
> +			0,
> +			mode,
> +			0,
> +			0);
> +	if (ret)
> +		dev_err(ringacc->dev, "TISCI reconf qmode fail (%d) ring_idx %d\n",
> +			ret, ring->ring_id);
>+}
> +
> +void k3_ringacc_ring_reset_dma(struct k3_ring *ring, u32 occ)
> +{
> +	if (!ring || !(ring->flags & K3_RING_FLAG_BUSY))
> +		return;
> +
> +	if (!ring->parent->dma_ring_reset_quirk)
> +		return;
> +
> +	if (!occ)
> +		occ = dbg_readl(&ring->rt->occ);
> +
> +	if (occ) {
> +		u32 db_ring_cnt, db_ring_cnt_cur;
> +
> +		k3_nav_dbg(ring->parent->dev, "%s %u occ: %u\n", __func__,
> +			   ring->ring_id, occ);
> +		/* 2. Reset the ring */

2? Where is 1?

> +		k3_ringacc_ring_reset_sci(ring);
> +
> +		/*
> +		 * 3. Setup the ring in ring/doorbell mode
> +		 * (if not already in this mode)
> +		 */
> +		if (ring->mode != K3_RINGACC_RING_MODE_RING)
> +			k3_ringacc_ring_reconfig_qmode_sci(
> +					ring, K3_RINGACC_RING_MODE_RING);
> +		/*
> +		 * 4. Ring the doorbell 2**22 – ringOcc times.
> +		 * This will wrap the internal UDMAP ring state occupancy
> +		 * counter (which is 21-bits wide) to 0.
> +		 */
> +		db_ring_cnt = (1U << 22) - occ;
> +
> +		while (db_ring_cnt != 0) {
> +			/*
> +			 * Ring the doorbell with the maximum count each
> +			 * iteration if possible to minimize the total
> +			 * of writes
> +			 */
> +			if (db_ring_cnt > K3_RINGACC_MAX_DB_RING_CNT)
> +				db_ring_cnt_cur = K3_RINGACC_MAX_DB_RING_CNT;
> +			else
> +				db_ring_cnt_cur = db_ring_cnt;
> +
> +			writel(db_ring_cnt_cur, &ring->rt->db);
> +			db_ring_cnt -= db_ring_cnt_cur;
> +		}
> +
> +		/* 5. Restore the original ring mode (if not ring mode) */
> +		if (ring->mode != K3_RINGACC_RING_MODE_RING)
> +			k3_ringacc_ring_reconfig_qmode_sci(ring, ring->mode);
> +	}
> +
> +	/* 2. Reset the ring */

Again 2?

> +	k3_ringacc_ring_reset(ring);
> +}
> +EXPORT_SYMBOL_GPL(k3_ringacc_ring_reset_dma);
> +
> +static void k3_ringacc_ring_free_sci(struct k3_ring *ring)
> +{
> +	struct k3_ringacc *ringacc = ring->parent;
> +	int ret;
> +
> +	ret = ringacc->tisci_ring_ops->config(
> +			ringacc->tisci,
> +			TI_SCI_MSG_VALUE_RM_ALL_NO_ORDER,
> +			ringacc->tisci_dev_id,
> +			ring->ring_id,
> +			0,
> +			0,
> +			0,
> +			0,
> +			0,
> +			0);
> +	if (ret)
> +		dev_err(ringacc->dev, "TISCI ring free fail (%d) ring_idx %d\n",
> +			ret, ring->ring_id);
> +}
> +
> +int k3_ringacc_ring_free(struct k3_ring *ring)
> +{
> +	struct k3_ringacc *ringacc;
> +
> +	if (!ring)
> +		return -EINVAL;
> +
> +	ringacc = ring->parent;
> +
> +	k3_nav_dbg(ring->parent->dev, "flags: 0x%08x\n", ring->flags);
> +
> +	if (!test_bit(ring->ring_id, ringacc->rings_inuse))
> +		return -EINVAL;
> +
> +	mutex_lock(&ringacc->req_lock);
> +
> +	if (--ring->use_count)
> +		goto out;
> +
> +	if (!(ring->flags & K3_RING_FLAG_BUSY))
> +		goto no_init;
> +
> +	k3_ringacc_ring_free_sci(ring);
> +
> +	dma_free_coherent(ringacc->dev,
> +			  ring->size * (4 << ring->elm_size),
> +			  ring->ring_mem_virt, ring->ring_mem_dma);
> +	ring->flags = 0;
> +	ring->ops = NULL;
> +	if (ring->proxy_id != K3_RINGACC_PROXY_NOT_USED) {
> +		clear_bit(ring->proxy_id, ringacc->proxy_inuse);
> +		ring->proxy = NULL;
> +		ring->proxy_id = K3_RINGACC_PROXY_NOT_USED;
> +	}
> +
> +no_init:
> +	clear_bit(ring->ring_id, ringacc->rings_inuse);
> +
> +	module_put(ringacc->dev->driver->owner);
> +
> +out:
> +	mutex_unlock(&ringacc->req_lock);
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(k3_ringacc_ring_free);
> +
> +u32 k3_ringacc_get_ring_id(struct k3_ring *ring)
> +{
> +	if (!ring)
> +		return -EINVAL;
> +
> +	return ring->ring_id;
> +}
> +EXPORT_SYMBOL_GPL(k3_ringacc_get_ring_id);
> +
> +u32 k3_ringacc_get_tisci_dev_id(struct k3_ring *ring)
> +{
> +	if (!ring)
> +		return -EINVAL;
> +

What if parent is NULL? Can it ever be here?

> +	return ring->parent->tisci_dev_id;
> +}
> +EXPORT_SYMBOL_GPL(k3_ringacc_get_tisci_dev_id);
> +
> +int k3_ringacc_get_ring_irq_num(struct k3_ring *ring)
> +{
> +	int irq_num;
> +
> +	if (!ring)
> +		return -EINVAL;
> +
> +	irq_num = ti_sci_inta_msi_get_virq(ring->parent->dev, ring->ring_id);
> +	if (irq_num <= 0)
> +		irq_num = -EINVAL;
> +	return irq_num;
> +}
> +EXPORT_SYMBOL_GPL(k3_ringacc_get_ring_irq_num);
> +
> +static int k3_ringacc_ring_cfg_sci(struct k3_ring *ring)
> +{
> +	struct k3_ringacc *ringacc = ring->parent;
> +	u32 ring_idx;
> +	int ret;
> +
> +	if (!ringacc->tisci)
> +		return -EINVAL;
> +
> +	ring_idx = ring->ring_id;
> +	ret = ringacc->tisci_ring_ops->config(
> +			ringacc->tisci,
> +			TI_SCI_MSG_VALUE_RM_ALL_NO_ORDER,
> +			ringacc->tisci_dev_id,
> +			ring_idx,
> +			lower_32_bits(ring->ring_mem_dma),
> +			upper_32_bits(ring->ring_mem_dma),
> +			ring->size,
> +			ring->mode,
> +			ring->elm_size,
> +			0);
> +	if (ret)
> +		dev_err(ringacc->dev, "TISCI config ring fail (%d) ring_idx %d\n",
> +			ret, ring_idx);
> +
> +	return ret;
> +}
> +
> +int k3_ringacc_ring_cfg(struct k3_ring *ring, struct k3_ring_cfg *cfg)
> +{
> +	struct k3_ringacc *ringacc = ring->parent;
> +	int ret = 0;
> +
> +	if (!ring || !cfg)
> +		return -EINVAL;
> +	if (cfg->elm_size > K3_RINGACC_RING_ELSIZE_256 ||
> +	    cfg->mode > K3_RINGACC_RING_MODE_QM ||
> +	    cfg->size & ~K3_RINGACC_CFG_RING_SIZE_ELCNT_MASK ||
> +	    !test_bit(ring->ring_id, ringacc->rings_inuse))
> +		return -EINVAL;
> +
> +	if (ring->use_count != 1)

Hmm, isn't this a failure actually?

> +		return 0;
> +
> +	ring->size = cfg->size;
> +	ring->elm_size = cfg->elm_size;
> +	ring->mode = cfg->mode;
> +	ring->occ = 0;
> +	ring->free = 0;
> +	ring->rindex = 0;
> +	ring->windex = 0;
> +
> +	if (ring->proxy_id != K3_RINGACC_PROXY_NOT_USED)
> +		ring->proxy = ringacc->proxy_target_base +
> +			      ring->proxy_id * K3_RINGACC_PROXY_TARGET_STEP;
> +
> +	switch (ring->mode) {
> +	case K3_RINGACC_RING_MODE_RING:
> +		ring->ops = &k3_ring_mode_ring_ops;
> +		break;
> +	case K3_RINGACC_RING_MODE_QM:
> +		/*
> +		 * In Queue mode elm_size can be 8 only and each operation
> +		 * uses 2 element slots
> +		 */
> +		if (cfg->elm_size != K3_RINGACC_RING_ELSIZE_8 ||
> +		    cfg->size % 2)
> +			goto err_free_proxy;
> +		/* else, fall through */
> +	case K3_RINGACC_RING_MODE_MESSAGE:
> +		if (ring->proxy)
> +			ring->ops = &k3_ring_mode_proxy_ops;
> +		else
> +			ring->ops = &k3_ring_mode_msg_ops;
> +		break;
> +	default:
> +		ring->ops = NULL;
> +		ret = -EINVAL;
> +		goto err_free_proxy;
> +	};
> +
> +	ring->ring_mem_virt =
> +			dma_alloc_coherent(ringacc->dev,
> +					   ring->size * (4 << ring->elm_size),
> +					   &ring->ring_mem_dma, GFP_KERNEL);
> +	if (!ring->ring_mem_virt) {
> +		dev_err(ringacc->dev, "Failed to alloc ring mem\n");
> +		ret = -ENOMEM;
> +		goto err_free_ops;
> +	}
> +
> +	ret = k3_ringacc_ring_cfg_sci(ring);
> +
> +	if (ret)
> +		goto err_free_mem;
> +
> +	ring->flags |= K3_RING_FLAG_BUSY;
> +	ring->flags |= (cfg->flags & K3_RINGACC_RING_SHARED) ?
> +			K3_RING_FLAG_SHARED : 0;
> +
> +	k3_ringacc_ring_dump(ring);
> +
> +	return 0;
> +
> +err_free_mem:
> +	dma_free_coherent(ringacc->dev,
> +			  ring->size * (4 << ring->elm_size),
> +			  ring->ring_mem_virt,
> +			  ring->ring_mem_dma);
> +err_free_ops:
> +	ring->ops = NULL;
> +err_free_proxy:
> +	ring->proxy = NULL;
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(k3_ringacc_ring_cfg);
> +
> +u32 k3_ringacc_ring_get_size(struct k3_ring *ring)
> +{
> +	if (!ring || !(ring->flags & K3_RING_FLAG_BUSY))
> +		return -EINVAL;
> +
> +	return ring->size;
> +}
> +EXPORT_SYMBOL_GPL(k3_ringacc_ring_get_size);
> +
> +u32 k3_ringacc_ring_get_free(struct k3_ring *ring)
> +{
> +	if (!ring || !(ring->flags & K3_RING_FLAG_BUSY))
> +		return -EINVAL;
> +
> +	if (!ring->free)
> +		ring->free = ring->size - dbg_readl(&ring->rt->occ);
> +
> +	return ring->free;
> +}
> +EXPORT_SYMBOL_GPL(k3_ringacc_ring_get_free);
> +
> +u32 k3_ringacc_ring_get_occ(struct k3_ring *ring)
> +{
> +	if (!ring || !(ring->flags & K3_RING_FLAG_BUSY))
> +		return -EINVAL;
> +
> +	return dbg_readl(&ring->rt->occ);
> +}
> +EXPORT_SYMBOL_GPL(k3_ringacc_ring_get_occ);
> +
> +u32 k3_ringacc_ring_is_full(struct k3_ring *ring)
> +{
> +	return !k3_ringacc_ring_get_free(ring);
> +}
> +EXPORT_SYMBOL_GPL(k3_ringacc_ring_is_full);
> +
> +enum k3_ringacc_access_mode {
> +	K3_RINGACC_ACCESS_MODE_PUSH_HEAD,
> +	K3_RINGACC_ACCESS_MODE_POP_HEAD,
> +	K3_RINGACC_ACCESS_MODE_PUSH_TAIL,
> +	K3_RINGACC_ACCESS_MODE_POP_TAIL,
> +	K3_RINGACC_ACCESS_MODE_PEEK_HEAD,
> +	K3_RINGACC_ACCESS_MODE_PEEK_TAIL,
> +};
> +
> +static int k3_ringacc_ring_cfg_proxy(struct k3_ring *ring,
> +				     enum k3_ringacc_proxy_access_mode mode)
> +{
> +	u32 val;
> +
> +	val = ring->ring_id;
> +	val |= mode << 16;
> +	val |= ring->elm_size << 24;

Would be nice to have these magic shifts as defines.

> +	dbg_writel(val, &ring->proxy->control);
> +	return 0;
> +}
> +
> +static int k3_ringacc_ring_access_proxy(struct k3_ring *ring, void *elem,
> +					enum k3_ringacc_access_mode access_mode)
> +{
> +	void __iomem *ptr;
> +
> +	ptr = (void __iomem *)&ring->proxy->data;
> +
> +	switch (access_mode) {
> +	case K3_RINGACC_ACCESS_MODE_PUSH_HEAD:
> +	case K3_RINGACC_ACCESS_MODE_POP_HEAD:
> +		k3_ringacc_ring_cfg_proxy(ring, PROXY_ACCESS_MODE_HEAD);
> +		break;
> +	case K3_RINGACC_ACCESS_MODE_PUSH_TAIL:
> +	case K3_RINGACC_ACCESS_MODE_POP_TAIL:
> +		k3_ringacc_ring_cfg_proxy(ring, PROXY_ACCESS_MODE_TAIL);
> +		break;
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	ptr += k3_ringacc_ring_get_fifo_pos(ring);
> +
> +	switch (access_mode) {
> +	case K3_RINGACC_ACCESS_MODE_POP_HEAD:
> +	case K3_RINGACC_ACCESS_MODE_POP_TAIL:
> +		k3_nav_dbg(ring->parent->dev, "proxy:memcpy_fromio(x): --> ptr(%p), mode:%d\n",
> +			   ptr, access_mode);
> +		memcpy_fromio(elem, ptr, (4 << ring->elm_size));
> +		ring->occ--;
> +		break;
> +	case K3_RINGACC_ACCESS_MODE_PUSH_TAIL:
> +	case K3_RINGACC_ACCESS_MODE_PUSH_HEAD:
> +		k3_nav_dbg(ring->parent->dev, "proxy:memcpy_toio(x): --> ptr(%p), mode:%d\n",
> +			   ptr, access_mode);
> +		memcpy_toio(ptr, elem, (4 << ring->elm_size));
> +		ring->free--;
> +		break;
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	k3_nav_dbg(ring->parent->dev, "proxy: free%d occ%d\n",
> +		   ring->free, ring->occ);
> +	return 0;
> +}
> +
> +static int k3_ringacc_ring_push_head_proxy(struct k3_ring *ring, void *elem)
> +{
> +	return k3_ringacc_ring_access_proxy(ring, elem,
> +					    K3_RINGACC_ACCESS_MODE_PUSH_HEAD);
> +}
> +
> +static int k3_ringacc_ring_push_tail_proxy(struct k3_ring *ring, void *elem)
> +{
> +	return k3_ringacc_ring_access_proxy(ring, elem,
> +					    K3_RINGACC_ACCESS_MODE_PUSH_TAIL);
> +}
> +
> +static int k3_ringacc_ring_pop_head_proxy(struct k3_ring *ring, void *elem)
> +{
> +	return k3_ringacc_ring_access_proxy(ring, elem,
> +					    K3_RINGACC_ACCESS_MODE_POP_HEAD);
> +}
> +
> +static int k3_ringacc_ring_pop_tail_proxy(struct k3_ring *ring, void *elem)
> +{
> +	return k3_ringacc_ring_access_proxy(ring, elem,
> +					    K3_RINGACC_ACCESS_MODE_POP_HEAD);
> +}
> +
> +static int k3_ringacc_ring_access_io(struct k3_ring *ring, void *elem,
> +				     enum k3_ringacc_access_mode access_mode)
> +{
> +	void __iomem *ptr;
> +
> +	switch (access_mode) {
> +	case K3_RINGACC_ACCESS_MODE_PUSH_HEAD:
> +	case K3_RINGACC_ACCESS_MODE_POP_HEAD:
> +		ptr = (void __iomem *)&ring->fifos->head_data;
> +		break;
> +	case K3_RINGACC_ACCESS_MODE_PUSH_TAIL:
> +	case K3_RINGACC_ACCESS_MODE_POP_TAIL:
> +		ptr = (void __iomem *)&ring->fifos->tail_data;
> +		break;
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	ptr += k3_ringacc_ring_get_fifo_pos(ring);
> +
> +	switch (access_mode) {
> +	case K3_RINGACC_ACCESS_MODE_POP_HEAD:
> +	case K3_RINGACC_ACCESS_MODE_POP_TAIL:
> +		k3_nav_dbg(ring->parent->dev, "memcpy_fromio(x): --> ptr(%p), mode:%d\n",
> +			   ptr, access_mode);
> +		memcpy_fromio(elem, ptr, (4 << ring->elm_size));
> +		ring->occ--;
> +		break;
> +	case K3_RINGACC_ACCESS_MODE_PUSH_TAIL:
> +	case K3_RINGACC_ACCESS_MODE_PUSH_HEAD:
> +		k3_nav_dbg(ring->parent->dev, "memcpy_toio(x): --> ptr(%p), mode:%d\n",
> +			   ptr, access_mode);
> +		memcpy_toio(ptr, elem, (4 << ring->elm_size));
> +		ring->free--;
> +		break;
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	k3_nav_dbg(ring->parent->dev, "free%d index%d occ%d index%d\n",
> +		   ring->free, ring->windex, ring->occ, ring->rindex);
> +	return 0;
> +}
> +
> +static int k3_ringacc_ring_push_head_io(struct k3_ring *ring, void *elem)
> +{
> +	return k3_ringacc_ring_access_io(ring, elem,
> +					 K3_RINGACC_ACCESS_MODE_PUSH_HEAD);
> +}
> +
> +static int k3_ringacc_ring_push_io(struct k3_ring *ring, void *elem)
> +{
> +	return k3_ringacc_ring_access_io(ring, elem,
> +					 K3_RINGACC_ACCESS_MODE_PUSH_TAIL);
> +}
> +
> +static int k3_ringacc_ring_pop_io(struct k3_ring *ring, void *elem)
> +{
> +	return k3_ringacc_ring_access_io(ring, elem,
> +					 K3_RINGACC_ACCESS_MODE_POP_HEAD);
> +}
> +
> +static int k3_ringacc_ring_pop_tail_io(struct k3_ring *ring, void *elem)
> +{
> +	return k3_ringacc_ring_access_io(ring, elem,
> +					 K3_RINGACC_ACCESS_MODE_POP_HEAD);
> +}
> +
> +static int k3_ringacc_ring_push_mem(struct k3_ring *ring, void *elem)
> +{
> +	void *elem_ptr;
> +
> +	elem_ptr = k3_ringacc_get_elm_addr(ring, ring->windex);
> +
> +	memcpy(elem_ptr, elem, (4 << ring->elm_size));
> +
> +	ring->windex = (ring->windex + 1) % ring->size;
> +	ring->free--;
> +	dbg_writel(1, &ring->rt->db);
> +
> +	k3_nav_dbg(ring->parent->dev, "ring_push_mem: free%d index%d\n",
> +		   ring->free, ring->windex);
> +
> +	return 0;
> +}
> +
> +static int k3_ringacc_ring_pop_mem(struct k3_ring *ring, void *elem)
> +{
> +	void *elem_ptr;
> +
> +	elem_ptr = k3_ringacc_get_elm_addr(ring, ring->rindex);
> +
> +	memcpy(elem, elem_ptr, (4 << ring->elm_size));
> +
> +	ring->rindex = (ring->rindex + 1) % ring->size;
> +	ring->occ--;
> +	dbg_writel(-1, &ring->rt->db);
> +
> +	k3_nav_dbg(ring->parent->dev, "ring_pop_mem: occ%d index%d pos_ptr%p\n",
> +		   ring->occ, ring->rindex, elem_ptr);
> +	return 0;
> +}
> +
> +int k3_ringacc_ring_push(struct k3_ring *ring, void *elem)
> +{
> +	int ret = -EOPNOTSUPP;
> +
> +	if (!ring || !(ring->flags & K3_RING_FLAG_BUSY))
> +		return -EINVAL;
> +
> +	k3_nav_dbg(ring->parent->dev, "ring_push: free%d index%d\n",
> +		   ring->free, ring->windex);
> +
> +	if (k3_ringacc_ring_is_full(ring))
> +		return -ENOMEM;
> +
> +	if (ring->ops && ring->ops->push_tail)
> +		ret = ring->ops->push_tail(ring, elem);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(k3_ringacc_ring_push);
> +
> +int k3_ringacc_ring_push_head(struct k3_ring *ring, void *elem)
> +{
> +	int ret = -EOPNOTSUPP;
> +
> +	if (!ring || !(ring->flags & K3_RING_FLAG_BUSY))
> +		return -EINVAL;
> +
> +	k3_nav_dbg(ring->parent->dev, "ring_push_head: free%d index%d\n",
> +		   ring->free, ring->windex);
> +
> +	if (k3_ringacc_ring_is_full(ring))
> +		return -ENOMEM;
> +
> +	if (ring->ops && ring->ops->push_head)
> +		ret = ring->ops->push_head(ring, elem);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(k3_ringacc_ring_push_head);
> +
> +int k3_ringacc_ring_pop(struct k3_ring *ring, void *elem)
> +{
> +	int ret = -EOPNOTSUPP;
> +
> +	if (!ring || !(ring->flags & K3_RING_FLAG_BUSY))
> +		return -EINVAL;
> +
> +	if (!ring->occ)
> +		ring->occ = k3_ringacc_ring_get_occ(ring);
> +
> +	k3_nav_dbg(ring->parent->dev, "ring_pop: occ%d index%d\n",
> +		   ring->occ, ring->rindex);
> +
> +	if (!ring->occ)
> +		return -ENODATA;
> +
> +	if (ring->ops && ring->ops->pop_head)
> +		ret = ring->ops->pop_head(ring, elem);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(k3_ringacc_ring_pop);
> +
> +int k3_ringacc_ring_pop_tail(struct k3_ring *ring, void *elem)
> +{
> +	int ret = -EOPNOTSUPP;
> +
> +	if (!ring || !(ring->flags & K3_RING_FLAG_BUSY))
> +		return -EINVAL;
> +
> +	if (!ring->occ)
> +		ring->occ = k3_ringacc_ring_get_occ(ring);
> +
> +	k3_nav_dbg(ring->parent->dev, "ring_pop_tail: occ%d index%d\n",
> +		   ring->occ, ring->rindex);
> +
> +	if (!ring->occ)
> +		return -ENODATA;
> +
> +	if (ring->ops && ring->ops->pop_tail)
> +		ret = ring->ops->pop_tail(ring, elem);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(k3_ringacc_ring_pop_tail);
> +
> +struct k3_ringacc *of_k3_ringacc_get_by_phandle(struct device_node *np,
> +						const char *property)
> +{
> +	struct device_node *ringacc_np;
> +	struct k3_ringacc *ringacc = ERR_PTR(-EPROBE_DEFER);
> +	struct k3_ringacc *entry;
> +
> +	ringacc_np = of_parse_phandle(np, property, 0);
> +	if (!ringacc_np)
> +		return ERR_PTR(-ENODEV);
> +
> +	mutex_lock(&k3_ringacc_list_lock);
> +	list_for_each_entry(entry, &k3_ringacc_list, list)
> +		if (entry->dev->of_node == ringacc_np) {
> +			ringacc = entry;
> +			break;
> +		}
> +	mutex_unlock(&k3_ringacc_list_lock);
> +	of_node_put(ringacc_np);
> +
> +	return ringacc;
> +}
> +EXPORT_SYMBOL_GPL(of_k3_ringacc_get_by_phandle);
> +
> +static int k3_ringacc_probe_dt(struct k3_ringacc *ringacc)
> +{
> +	struct device_node *node = ringacc->dev->of_node;
> +	struct device *dev = ringacc->dev;
> +	struct platform_device *pdev = to_platform_device(dev);
> +	int ret;
> +
> +	if (!node) {
> +		dev_err(dev, "device tree info unavailable\n");
> +		return -ENODEV;
> +	}
> +
> +	ret = of_property_read_u32(node, "ti,num-rings", &ringacc->num_rings);
> +	if (ret) {
> +		dev_err(dev, "ti,num-rings read failure %d\n", ret);
> +		return ret;
> +	}
> +
> +	ringacc->dma_ring_reset_quirk =
> +			of_property_read_bool(node, "ti,dma-ring-reset-quirk");
> +
> +	ringacc->tisci = ti_sci_get_by_phandle(node, "ti,sci");
> +	if (IS_ERR(ringacc->tisci)) {
> +		ret = PTR_ERR(ringacc->tisci);
> +		if (ret != -EPROBE_DEFER)
> +			dev_err(dev, "ti,sci read fail %d\n", ret);
> +		ringacc->tisci = NULL;
> +		return ret;
> +	}
> +
> +	ret = of_property_read_u32(node, "ti,sci-dev-id",
> +				   &ringacc->tisci_dev_id);
> +	if (ret) {
> +		dev_err(dev, "ti,sci-dev-id read fail %d\n", ret);
> +		return ret;
> +	}
> +
> +	pdev->id = ringacc->tisci_dev_id;
> +
> +	ringacc->rm_gp_range = devm_ti_sci_get_of_resource(ringacc->tisci, dev,
> +						ringacc->tisci_dev_id,
> +						"ti,sci-rm-range-gp-rings");
> +	if (IS_ERR(ringacc->rm_gp_range)) {
> +		dev_err(dev, "Failed to allocate MSI interrupts\n");
> +		return PTR_ERR(ringacc->rm_gp_range);
> +	}
> +
> +	return ti_sci_inta_msi_domain_alloc_irqs(ringacc->dev,
> +						 ringacc->rm_gp_range);
> +}
> +
> +static int k3_ringacc_probe(struct platform_device *pdev)
> +{
> +	struct k3_ringacc *ringacc;
> +	void __iomem *base_fifo, *base_rt;
> +	struct device *dev = &pdev->dev;
> +	struct resource *res;
> +	int ret, i;
> +
> +	ringacc = devm_kzalloc(dev, sizeof(*ringacc), GFP_KERNEL);
> +	if (!ringacc)
> +		return -ENOMEM;
> +
> +	ringacc->dev = dev;
> +	mutex_init(&ringacc->req_lock);
> +
> +	dev->msi_domain = of_msi_get_domain(dev, dev->of_node,
> +					    DOMAIN_BUS_TI_SCI_INTA_MSI);
> +	if (!dev->msi_domain) {
> +		dev_err(dev, "Failed to get MSI domain\n");
> +		return -EPROBE_DEFER;
> +	}
> +
> +	ret = k3_ringacc_probe_dt(ringacc);
> +	if (ret)
> +		return ret;
> +
> +	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "rt");
> +	base_rt = devm_ioremap_resource(dev, res);
> +	if (IS_ERR(base_rt))
> +		return PTR_ERR(base_rt);
> +
> +	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "fifos");
> +	base_fifo = devm_ioremap_resource(dev, res);
> +	if (IS_ERR(base_fifo))
> +		return PTR_ERR(base_fifo);
> +
> +	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "proxy_gcfg");
> +	ringacc->proxy_gcfg = devm_ioremap_resource(dev, res);
> +	if (IS_ERR(ringacc->proxy_gcfg))
> +		return PTR_ERR(ringacc->proxy_gcfg);
> +
> +	res = platform_get_resource_byname(pdev, IORESOURCE_MEM,
> +					   "proxy_target");
> +	ringacc->proxy_target_base = devm_ioremap_resource(dev, res);
> +	if (IS_ERR(ringacc->proxy_target_base))
> +		return PTR_ERR(ringacc->proxy_target_base);
> +
> +	ringacc->num_proxies = dbg_readl(&ringacc->proxy_gcfg->config) &
> +					 K3_RINGACC_PROXY_CFG_THREADS_MASK;
> +
> +	ringacc->rings = devm_kzalloc(dev,
> +				      sizeof(*ringacc->rings) *
> +				      ringacc->num_rings,
> +				      GFP_KERNEL);
> +	ringacc->rings_inuse = devm_kcalloc(dev,
> +					    BITS_TO_LONGS(ringacc->num_rings),
> +					    sizeof(unsigned long), GFP_KERNEL);
> +	ringacc->proxy_inuse = devm_kcalloc(dev,
> +					    BITS_TO_LONGS(ringacc->num_proxies),
> +					    sizeof(unsigned long), GFP_KERNEL);
> +
> +	if (!ringacc->rings || !ringacc->rings_inuse || !ringacc->proxy_inuse)
> +		return -ENOMEM;
> +
> +	for (i = 0; i < ringacc->num_rings; i++) {
> +		ringacc->rings[i].rt = base_rt +
> +				       K3_RINGACC_RT_REGS_STEP * i;
> +		ringacc->rings[i].fifos = base_fifo +
> +					  K3_RINGACC_FIFO_REGS_STEP * i;
> +		ringacc->rings[i].parent = ringacc;
> +		ringacc->rings[i].ring_id = i;
> +		ringacc->rings[i].proxy_id = K3_RINGACC_PROXY_NOT_USED;
> +	}
> +	dev_set_drvdata(dev, ringacc);
> +
> +	ringacc->tisci_ring_ops = &ringacc->tisci->ops.rm_ring_ops;
> +
> +	pm_runtime_enable(dev);
> +	ret = pm_runtime_get_sync(dev);
> +	if (ret < 0) {
> +		pm_runtime_put_noidle(dev);
> +		dev_err(dev, "Failed to enable pm %d\n", ret);
> +		goto err;
> +	}
> +
> +	mutex_lock(&k3_ringacc_list_lock);
> +	list_add_tail(&ringacc->list, &k3_ringacc_list);
> +	mutex_unlock(&k3_ringacc_list_lock);
> +
> +	dev_info(dev, "Ring Accelerator probed rings:%u, gp-rings[%u,%u] sci-dev-id:%u\n",
> +		 ringacc->num_rings,
> +		 ringacc->rm_gp_range->desc[0].start,
> +		 ringacc->rm_gp_range->desc[0].num,
> +		 ringacc->tisci_dev_id);
> +	dev_info(dev, "dma-ring-reset-quirk: %s\n",
> +		 ringacc->dma_ring_reset_quirk ? "enabled" : "disabled");
> +	dev_info(dev, "RA Proxy rev. %08x, num_proxies:%u\n",
> +		 dbg_readl(&ringacc->proxy_gcfg->revision),
> +		 ringacc->num_proxies);
> +	return 0;
> +
> +err:
> +	pm_runtime_disable(dev);
> +	return ret;
> +}
> +
> +static int k3_ringacc_remove(struct platform_device *pdev)
> +{
> +	struct k3_ringacc *ringacc = dev_get_drvdata(&pdev->dev);
> +
> +	pm_runtime_put_sync(&pdev->dev);
> +	pm_runtime_disable(&pdev->dev);
> +
> +	mutex_lock(&k3_ringacc_list_lock);
> +	list_del(&ringacc->list);
> +	mutex_unlock(&k3_ringacc_list_lock);
> +	return 0;
> +}
> +
> +/* Match table for of_platform binding */
> +static const struct of_device_id k3_ringacc_of_match[] = {
> +	{ .compatible = "ti,am654-navss-ringacc", },
> +	{},
> +};
> +MODULE_DEVICE_TABLE(of, k3_ringacc_of_match);
> +
> +static struct platform_driver k3_ringacc_driver = {
> +	.probe		= k3_ringacc_probe,
> +	.remove		= k3_ringacc_remove,
> +	.driver		= {
> +		.name	= "k3-ringacc",
> +		.of_match_table = k3_ringacc_of_match,
> +	},
> +};
> +module_platform_driver(k3_ringacc_driver);
> +
> +MODULE_LICENSE("GPL v2");
> +MODULE_DESCRIPTION("TI Ringacc driver for K3 SOCs");
> +MODULE_AUTHOR("Grygorii Strashko <grygorii.strashko@ti.com>");
> diff --git a/include/linux/soc/ti/k3-ringacc.h b/include/linux/soc/ti/k3-ringacc.h
> new file mode 100644
> index 000000000000..debffba48ac9
> --- /dev/null
> +++ b/include/linux/soc/ti/k3-ringacc.h
> @@ -0,0 +1,262 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * K3 Ring Accelerator (RA) subsystem interface
> + *
> + * Copyright (C) 2019 Texas Instruments Incorporated - http://www.ti.com
> + */
> +
> +#ifndef __SOC_TI_K3_RINGACC_API_H_
> +#define __SOC_TI_K3_RINGACC_API_H_
> +
> +#include <linux/types.h>
> +
> +struct device_node;
> +
> +/**
> + * enum k3_ring_mode - &struct k3_ring_cfg mode
> + *
> + * RA ring operational modes
> + *
> + * @K3_RINGACC_RING_MODE_RING: Exposed Ring mode for SW direct access
> + * @K3_RINGACC_RING_MODE_MESSAGE: Messaging mode. Messaging mode requires
> + *	that all accesses to the queue must go through this IP so that all
> + *	accesses to the memory are controlled and ordered. This IP then
> + *	controls the entire state of the queue, and SW has no directly control,
> + *	such as through doorbells and cannot access the storage memory directly.
> + *	This is particularly useful when more than one SW or HW entity can be
> + *	the producer and/or consumer at the same time
> + * @K3_RINGACC_RING_MODE_CREDENTIALS: Credentials mode is message mode plus
> + *	stores credentials with each message, requiring the element size to be
> + *	doubled to fit the credentials. Any exposed memory should be protected
> + *	by a firewall from unwanted access
> + * @K3_RINGACC_RING_MODE_QM:  Queue manager mode. This takes the credentials
> + *	mode and adds packet length per element, along with additional read only
> + *	fields for element count and accumulated queue length. The QM mode only
> + *	operates with an 8 byte element size (any other element size is
> + *	illegal), and like in credentials mode each operation uses 2 element
> + *	slots to store the credentials and length fields
> + */
> +enum k3_ring_mode {
> +	K3_RINGACC_RING_MODE_RING = 0,
> +	K3_RINGACC_RING_MODE_MESSAGE,
> +	K3_RINGACC_RING_MODE_CREDENTIALS,
> +	K3_RINGACC_RING_MODE_QM,
> +	K3_RINGACC_RING_MODE_INVALID
> +};
> +
> +/**
> + * enum k3_ring_size - &struct k3_ring_cfg elm_size
> + *
> + * RA ring element's sizes in bytes.
> + */
> +enum k3_ring_size {
> +	K3_RINGACC_RING_ELSIZE_4 = 0,
> +	K3_RINGACC_RING_ELSIZE_8,
> +	K3_RINGACC_RING_ELSIZE_16,
> +	K3_RINGACC_RING_ELSIZE_32,
> +	K3_RINGACC_RING_ELSIZE_64,
> +	K3_RINGACC_RING_ELSIZE_128,
> +	K3_RINGACC_RING_ELSIZE_256,
> +	K3_RINGACC_RING_ELSIZE_INVALID
> +};
> +
> +struct k3_ringacc;
> +struct k3_ring;
> +
> +/**
> + * enum k3_ring_cfg - RA ring configuration structure
> + *
> + * @size: Ring size, number of elements
> + * @elm_size: Ring element size
> + * @mode: Ring operational mode
> + * @flags: Ring configuration flags. Possible values:
> + *	 @K3_RINGACC_RING_SHARED: when set allows to request the same ring
> + *	 few times. It's usable when the same ring is used as Free Host PD ring
> + *	 for different flows, for example.
> + *	 Note: Locking should be done by consumer if required
> + */
> +struct k3_ring_cfg {
> +	u32 size;
> +	enum k3_ring_size elm_size;
> +	enum k3_ring_mode mode;
> +#define K3_RINGACC_RING_SHARED BIT(1)
> +	u32 flags;
> +};
> +
> +#define K3_RINGACC_RING_ID_ANY (-1)
> +
> +/**
> + * of_k3_ringacc_get_by_phandle - find a RA by phandle property
> + * @np: device node
> + * @propname: property name containing phandle on RA node
> + *
> + * Returns pointer on the RA - struct k3_ringacc
> + * or -ENODEV if not found,
> + * or -EPROBE_DEFER if not yet registered
> + */
> +struct k3_ringacc *of_k3_ringacc_get_by_phandle(struct device_node *np,
> +						const char *property);
> +
> +#define K3_RINGACC_RING_USE_PROXY BIT(1)
> +
> +/**
> + * k3_ringacc_request_ring - request ring from ringacc
> + * @ringacc: pointer on ringacc
> + * @id: ring id or K3_RINGACC_RING_ID_ANY for any general purpose ring
> + * @flags:
> + *	@K3_RINGACC_RING_USE_PROXY: if set - proxy will be allocated and
> + *		used to access ring memory. Sopported only for rings in
> + *		Message/Credentials/Queue mode.
> + *
> + * Returns pointer on the Ring - struct k3_ring
> + * or NULL in case of failure.
> + */
> +struct k3_ring *k3_ringacc_request_ring(struct k3_ringacc *ringacc,
> +					int id, u32 flags);
> +
> +/**
> + * k3_ringacc_ring_reset - ring reset
> + * @ring: pointer on Ring
> + *
> + * Resets ring internal state ((hw)occ, (hw)idx).
> + * TODO_GS: ? Ring can be reused without reconfiguration
> + */
> +void k3_ringacc_ring_reset(struct k3_ring *ring);
> +/**
> + * k3_ringacc_ring_reset - ring reset for DMA rings
> + * @ring: pointer on Ring
> + *
> + * Resets ring internal state ((hw)occ, (hw)idx). Should be used for rings
> + * which are read by K3 UDMA, like TX or Free Host PD rings.
> + */
> +void k3_ringacc_ring_reset_dma(struct k3_ring *ring, u32 occ);
> +
> +/**
> + * k3_ringacc_ring_free - ring free
> + * @ring: pointer on Ring
> + *
> + * Resets ring and free all alocated resources.
> + */
> +int k3_ringacc_ring_free(struct k3_ring *ring);
> +
> +/**
> + * k3_ringacc_get_ring_id - Get the Ring ID
> + * @ring: pointer on ring
> + *
> + * Returns the Ring ID
> + */
> +u32 k3_ringacc_get_ring_id(struct k3_ring *ring);
> +
> +/**
> + * k3_ringacc_get_ring_irq_num - Get the irq number for the ring
> + * @ring: pointer on ring
> + *
> + * Returns the interrupt number which can be used to request the interrupt
> + */
> +int k3_ringacc_get_ring_irq_num(struct k3_ring *ring);
> +
> +/**
> + * k3_ringacc_ring_cfg - ring configure
> + * @ring: pointer on ring
> + * @cfg: Ring configuration parameters (see &struct k3_ring_cfg)
> + *
> + * Configures ring, including ring memory allocation.
> + * Returns 0 on success, errno otherwise.
> + */
> +int k3_ringacc_ring_cfg(struct k3_ring *ring, struct k3_ring_cfg *cfg);
> +
> +/**
> + * k3_ringacc_ring_get_size - get ring size
> + * @ring: pointer on ring
> + *
> + * Returns ring size in number of elements.
> + */
> +u32 k3_ringacc_ring_get_size(struct k3_ring *ring);
> +
> +/**
> + * k3_ringacc_ring_get_free - get free elements
> + * @ring: pointer on ring
> + *
> + * Returns number of free elements in the ring.
> + */
> +u32 k3_ringacc_ring_get_free(struct k3_ring *ring);
> +
> +/**
> + * k3_ringacc_ring_get_occ - get ring occupancy
> + * @ring: pointer on ring
> + *
> + * Returns total number of valid entries on the ring
> + */
> +u32 k3_ringacc_ring_get_occ(struct k3_ring *ring);
> +
> +/**
> + * k3_ringacc_ring_is_full - checks if ring is full
> + * @ring: pointer on ring
> + *
> + * Returns true if the ring is full
> + */
> +u32 k3_ringacc_ring_is_full(struct k3_ring *ring);
> +
> +/**
> + * k3_ringacc_ring_push - push element to the ring tail
> + * @ring: pointer on ring
> + * @elem: pointer on ring element buffer
> + *
> + * Push one ring element to the ring tail. Size of the ring element is
> + * determined by ring configuration &struct k3_ring_cfg elm_size.
> + *
> + * Returns 0 on success, errno otherwise.
> + */
> +int k3_ringacc_ring_push(struct k3_ring *ring, void *elem);
> +
> +/**
> + * k3_ringacc_ring_pop - pop element from the ring head
> + * @ring: pointer on ring
> + * @elem: pointer on ring element buffer
> + *
> + * Push one ring element from the ring head. Size of the ring element is
> + * determined by ring configuration &struct k3_ring_cfg elm_size..
> + *
> + * Returns 0 on success, errno otherwise.
> + */
> +int k3_ringacc_ring_pop(struct k3_ring *ring, void *elem);
> +
> +/**
> + * k3_ringacc_ring_push_head - push element to the ring head
> + * @ring: pointer on ring
> + * @elem: pointer on ring element buffer
> + *
> + * Push one ring element to the ring head. Size of the ring element is
> + * determined by ring configuration &struct k3_ring_cfg elm_size.
> + *
> + * Returns 0 on success, errno otherwise.
> + * Not Supported by ring modes: K3_RINGACC_RING_MODE_RING
> + */
> +int k3_ringacc_ring_push_head(struct k3_ring *ring, void *elem);
> +
> +/**
> + * k3_ringacc_ring_pop_tail - pop element from the ring tail
> + * @ring: pointer on ring
> + * @elem: pointer on ring element buffer
> + *
> + * Push one ring element from the ring tail. Size of the ring element is
> + * determined by ring configuration &struct k3_ring_cfg elm_size.
> + *
> + * Returns 0 on success, errno otherwise.
> + * Not Supported by ring modes: K3_RINGACC_RING_MODE_RING
> + */
> +int k3_ringacc_ring_pop_tail(struct k3_ring *ring, void *elem);
> +
> +u32 k3_ringacc_get_tisci_dev_id(struct k3_ring *ring);
> +
> +/**
> + * Debugging definitions
> + * TODO: might be removed
> + */
> +#ifdef CONFIG_TI_K3_RINGACC_DEBUG
> +void k3_ringacc_ring_dump(struct k3_ring *ring);
> +#else
> +static inline void k3_ringacc_ring_dump(struct k3_ring *ring) {};
> +#endif
> +
> +#endif /* __SOC_TI_K3_RINGACC_API_H_ */
> 

--
Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki. Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 04/14] dmaengine: Add metadata_ops for dma_async_tx_descriptor
  2019-09-08 14:12   ` Vinod Koul
@ 2019-09-09  6:52     ` Peter Ujfalusi
  0 siblings, 0 replies; 32+ messages in thread
From: Peter Ujfalusi @ 2019-09-09  6:52 UTC (permalink / raw)
  To: Vinod Koul, Radhey Shyam Pandey
  Cc: robh+dt, Menon, Nishanth, ssantosh, dan.j.williams, dmaengine,
	linux-arm-kernel, devicetree, linux-kernel, grygorii.strashko,
	lokeshvutla, t-kristo, tony, j-keerthy



On 08/09/2019 17.12, Vinod Koul wrote:
> On 30-07-19, 12:34, Peter Ujfalusi wrote:
>> The metadata is best described as side band data or parameters traveling
>> alongside the data DMAd by the DMA engine. It is data
>> which is understood by the peripheral and the peripheral driver only, the
>> DMA engine see it only as data block and it is not interpreting it in any
>> way.
>>
>> The metadata can be different per descriptor as it is a parameter for the
>> data being transferred.
>>
>> If the DMA supports per descriptor metadata it can implement the attach,
>> get_ptr/set_len callbacks.
>>
>> Client drivers must only use either attach or get_ptr/set_len to avoid
>> misconfiguration.
>>
>> Client driver can check if a given metadata mode is supported by the
>> channel during probe time with
>> dmaengine_is_metadata_mode_supported(chan, DESC_METADATA_CLIENT);
>> dmaengine_is_metadata_mode_supported(chan, DESC_METADATA_ENGINE);
>>
>> and based on this information can use either mode.
>>
>> Wrappers are also added for the metadata_ops.
>>
>> To be used in DESC_METADATA_CLIENT mode:
>> dmaengine_desc_attach_metadata()
>>
>> To be used in DESC_METADATA_ENGINE mode:
>> dmaengine_desc_get_metadata_ptr()
>> dmaengine_desc_set_metadata_len()
>>
>> Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
>> ---
>>  drivers/dma/dmaengine.c   |  73 ++++++++++++++++++++++++++
>>  include/linux/dmaengine.h | 108 ++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 181 insertions(+)
>>
>> diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
>> index 03ac4b96117c..6baddf7dcbfd 100644
>> --- a/drivers/dma/dmaengine.c
>> +++ b/drivers/dma/dmaengine.c
>> @@ -1302,6 +1302,79 @@ void dma_async_tx_descriptor_init(struct dma_async_tx_descriptor *tx,
>>  }
>>  EXPORT_SYMBOL(dma_async_tx_descriptor_init);
>>  
>> +static inline int desc_check_and_set_metadata_mode(
>> +	struct dma_async_tx_descriptor *desc, enum dma_desc_metadata_mode mode)
>> +{
>> +	/* Make sure that the metadata mode is not mixed */
>> +	if (!desc->desc_metadata_mode) {
>> +		if (dmaengine_is_metadata_mode_supported(desc->chan, mode))
>> +			desc->desc_metadata_mode = mode;
> 
> So do we have different descriptors supporting different modes or is it
> controlled based? For latter we can do this check at controller
> registration!

It is actually on channel basis (in UDMAP):
TR channel does not support metadata at all.
Packet Mode channel have support for metadata, but it might not be
available for certain remote peripherals. PDMAs for example does not use
metadata.
Any channel can be configured as TR or Packet Mode, any channel can
service a peripheral which needs or does not need metadata.

The reason we ended up per descriptor callbacks with Radhey (added to
CC) is that all functions operate on the descriptor and it was natural
to have them attached to the descriptor rather than add channel based
callbacks which must also take the descriptor pointer in addition. The
descriptor have pointer to the channel it is issued on.

I only know if metadata is going to be supported when the channel is
requested, based on the psil-config of the remote thread.

Clients still can check and plan ahead on how to use the metadata.

- Péter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 02/14] soc: ti: k3: add navss ringacc driver
  2019-09-09  6:09   ` Tero Kristo
@ 2019-09-09  7:25     ` Vignesh Raghavendra
  2019-09-09 13:00     ` Peter Ujfalusi
  1 sibling, 0 replies; 32+ messages in thread
From: Vignesh Raghavendra @ 2019-09-09  7:25 UTC (permalink / raw)
  To: Tero Kristo, Peter Ujfalusi, vkoul, robh+dt, nm, ssantosh
  Cc: devicetree, grygorii.strashko, lokeshvutla, j-keerthy,
	linux-kernel, tony, dmaengine, dan.j.williams, linux-arm-kernel

Hi,

On 09/09/19 11:39 AM, Tero Kristo wrote:
[...]
>> diff --git a/drivers/soc/ti/k3-ringacc.c b/drivers/soc/ti/k3-ringacc.c
>> new file mode 100644
>> index 000000000000..401dfc963319
>> --- /dev/null
>> +++ b/drivers/soc/ti/k3-ringacc.c
>> @@ -0,0 +1,1191 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * TI K3 NAVSS Ring Accelerator subsystem driver
>> + *
>> + * Copyright (C) 2019 Texas Instruments Incorporated - http://www.ti.com
>> + */
>> +
>> +#include <linux/dma-mapping.h>
>> +#include <linux/io.h>
>> +#include <linux/module.h>
>> +#include <linux/of.h>
>> +#include <linux/platform_device.h>
>> +#include <linux/pm_runtime.h>
>> +#include <linux/soc/ti/k3-ringacc.h>
>> +#include <linux/soc/ti/ti_sci_protocol.h>
>> +#include <linux/soc/ti/ti_sci_inta_msi.h>
>> +#include <linux/of_irq.h>
>> +#include <linux/irqdomain.h>
>> +
>> +static LIST_HEAD(k3_ringacc_list);
>> +static DEFINE_MUTEX(k3_ringacc_list_lock);
>> +
>> +#ifdef CONFIG_TI_K3_RINGACC_DEBUG
>> +#define    k3_nav_dbg(dev, arg...) dev_err(dev, arg)
> 
> dev_err seems exaggeration for debug purposes, maybe just dev_info.

If you make this dev_dbg(), it should be possible to just enable
CONFIG_DYNAMIC_DEBUG[1] and control whether or not debug prints are
enabled for this module. Have you explored that option? If that works we
may not need CONFIG_TI_K3_RINGACC_DEBUG at all.

[1] Documentation/admin-guide/dynamic-debug-howto.rst

Regards
Vignesh

> 
>> +static    void dbg_writel(u32 v, void __iomem *reg)
>> +{
>> +    pr_err("WRITEL(32): v(%08X)-->reg(%p)\n", v, reg);
> 
> Again, maybe just pr_info.
> 
>> +    writel(v, reg);
>> +}
>> +
>> +static    u32 dbg_readl(void __iomem *reg)
>> +{
>> +    u32 v;
>> +
>> +    v = readl(reg);
>> +    pr_err("READL(32): v(%08X)<--reg(%p)\n", v, reg);
>> +    return v;
>> +}
>> +#else
>> +#define    k3_nav_dbg(dev, arg...) dev_dbg(dev, arg)
>> +#define dbg_writel(v, reg) writel(v, reg)
> 
> Do you need to use hard writel, writel_relaxed is not enough?
> 
>> +
>> +#define dbg_readl(reg) readl(reg)
> 
> Same as above but for read?
> 
>> +#endif
>> +
>> +#define K3_RINGACC_CFG_RING_SIZE_ELCNT_MASK        GENMASK(19, 0)
>> +
>> +/**
>> + * struct k3_ring_rt_regs -  The RA Control/Status Registers region
>> + */
>> +struct k3_ring_rt_regs {
>> +    u32    resv_16[4];
>> +    u32    db;        /* RT Ring N Doorbell Register */
>> +    u32    resv_4[1];
>> +    u32    occ;        /* RT Ring N Occupancy Register */
>> +    u32    indx;        /* RT Ring N Current Index Register */
>> +    u32    hwocc;        /* RT Ring N Hardware Occupancy Register */
>> +    u32    hwindx;        /* RT Ring N Current Index Register */
>> +};
>> +
>> +#define K3_RINGACC_RT_REGS_STEP    0x1000
>> +
>> +/**
>> + * struct k3_ring_fifo_regs -  The Ring Accelerator Queues Registers
>> region
>> + */
>> +struct k3_ring_fifo_regs {
>> +    u32    head_data[128];        /* Ring Head Entry Data Registers */
>> +    u32    tail_data[128];        /* Ring Tail Entry Data Registers */
>> +    u32    peek_head_data[128];    /* Ring Peek Head Entry Data Regs */
>> +    u32    peek_tail_data[128];    /* Ring Peek Tail Entry Data Regs */
>> +};
>> +
>> +/**
>> + * struct k3_ringacc_proxy_gcfg_regs - RA Proxy Global Config MMIO
>> Region
>> + */
>> +struct k3_ringacc_proxy_gcfg_regs {
>> +    u32    revision;    /* Revision Register */
>> +    u32    config;        /* Config Register */
>> +};
>> +
>> +#define K3_RINGACC_PROXY_CFG_THREADS_MASK        GENMASK(15, 0)
>> +
>> +/**
>> + * struct k3_ringacc_proxy_target_regs -  Proxy Datapath MMIO Region
>> + */
>> +struct k3_ringacc_proxy_target_regs {
>> +    u32    control;    /* Proxy Control Register */
>> +    u32    status;        /* Proxy Status Register */
>> +    u8    resv_512[504];
>> +    u32    data[128];    /* Proxy Data Register */
>> +};
>> +
>> +#define K3_RINGACC_PROXY_TARGET_STEP    0x1000
>> +#define K3_RINGACC_PROXY_NOT_USED    (-1)
>> +
>> +enum k3_ringacc_proxy_access_mode {
>> +    PROXY_ACCESS_MODE_HEAD = 0,
>> +    PROXY_ACCESS_MODE_TAIL = 1,
>> +    PROXY_ACCESS_MODE_PEEK_HEAD = 2,
>> +    PROXY_ACCESS_MODE_PEEK_TAIL = 3,
>> +};
>> +
>> +#define K3_RINGACC_FIFO_WINDOW_SIZE_BYTES  (512U)
>> +#define K3_RINGACC_FIFO_REGS_STEP    0x1000
>> +#define K3_RINGACC_MAX_DB_RING_CNT    (127U)
>> +
>> +/**
>> + * struct k3_ring_ops -  Ring operations
>> + */
>> +struct k3_ring_ops {
>> +    int (*push_tail)(struct k3_ring *ring, void *elm);
>> +    int (*push_head)(struct k3_ring *ring, void *elm);
>> +    int (*pop_tail)(struct k3_ring *ring, void *elm);
>> +    int (*pop_head)(struct k3_ring *ring, void *elm);
>> +};
>> +
>> +/**
>> + * struct k3_ring - RA Ring descriptor
>> + *
>> + * @rt - Ring control/status registers
>> + * @fifos - Ring queues registers
>> + * @proxy - Ring Proxy Datapath registers
>> + * @ring_mem_dma - Ring buffer dma address
>> + * @ring_mem_virt - Ring buffer virt address
>> + * @ops - Ring operations
>> + * @size - Ring size in elements
>> + * @elm_size - Size of the ring element
>> + * @mode - Ring mode
>> + * @flags - flags
>> + * @free - Number of free elements
>> + * @occ - Ring occupancy
>> + * @windex - Write index (only for @K3_RINGACC_RING_MODE_RING)
>> + * @rindex - Read index (only for @K3_RINGACC_RING_MODE_RING)
>> + * @ring_id - Ring Id
>> + * @parent - Pointer on struct @k3_ringacc
>> + * @use_count - Use count for shared rings
>> + * @proxy_id - RA Ring Proxy Id (only if @K3_RINGACC_RING_USE_PROXY)
>> + */
>> +struct k3_ring {
>> +    struct k3_ring_rt_regs __iomem *rt;
>> +    struct k3_ring_fifo_regs __iomem *fifos;
>> +    struct k3_ringacc_proxy_target_regs  __iomem *proxy;
>> +    dma_addr_t    ring_mem_dma;
>> +    void        *ring_mem_virt;
>> +    struct k3_ring_ops *ops;
>> +    u32        size;
>> +    enum k3_ring_size elm_size;
>> +    enum k3_ring_mode mode;
>> +    u32        flags;
>> +#define K3_RING_FLAG_BUSY    BIT(1)
>> +#define K3_RING_FLAG_SHARED    BIT(2)
>> +    u32        free;
>> +    u32        occ;
>> +    u32        windex;
>> +    u32        rindex;
>> +    u32        ring_id;
>> +    struct k3_ringacc    *parent;
>> +    u32        use_count;
>> +    int        proxy_id;
>> +};
>> +
>> +/**
>> + * struct k3_ringacc - Rings accelerator descriptor
>> + *
>> + * @dev - pointer on RA device
>> + * @proxy_gcfg - RA proxy global config registers
>> + * @proxy_target_base - RA proxy datapath region
>> + * @num_rings - number of ring in RA
>> + * @rm_gp_range - general purpose rings range from tisci
>> + * @dma_ring_reset_quirk - DMA reset w/a enable
>> + * @num_proxies - number of RA proxies
>> + * @rings - array of rings descriptors (struct @k3_ring)
>> + * @list - list of RAs in the system
>> + * @tisci - pointer ti-sci handle
>> + * @tisci_ring_ops - ti-sci rings ops
>> + * @tisci_dev_id - ti-sci device id
>> + */
>> +struct k3_ringacc {
>> +    struct device *dev;
>> +    struct k3_ringacc_proxy_gcfg_regs __iomem *proxy_gcfg;
>> +    void __iomem *proxy_target_base;
>> +    u32 num_rings; /* number of rings in Ringacc module */
>> +    unsigned long *rings_inuse;
>> +    struct ti_sci_resource *rm_gp_range;
>> +
>> +    bool dma_ring_reset_quirk;
>> +    u32 num_proxies;
>> +    unsigned long *proxy_inuse;
> 
> proxy_inuse is not documented above.
> 
>> +
>> +    struct k3_ring *rings;
>> +    struct list_head list;
>> +    struct mutex req_lock; /* protect rings allocation */
>> +
>> +    const struct ti_sci_handle *tisci;
>> +    const struct ti_sci_rm_ringacc_ops *tisci_ring_ops;
>> +    u32  tisci_dev_id;
>> +};
>> +
>> +static long k3_ringacc_ring_get_fifo_pos(struct k3_ring *ring)
>> +{
>> +    return K3_RINGACC_FIFO_WINDOW_SIZE_BYTES -
>> +           (4 << ring->elm_size);
>> +}
>> +
>> +static void *k3_ringacc_get_elm_addr(struct k3_ring *ring, u32 idx)
>> +{
>> +    return (idx * (4 << ring->elm_size) + ring->ring_mem_virt);
> 
> The arithmetic here seems backwards compared to most other code I've
> seen. It would be more readable if you have it like:
> 
> ring->ring_mem_virt + idx * (4 << ring->elm_size);
> 
>> +}
>> +
>> +static int k3_ringacc_ring_push_mem(struct k3_ring *ring, void *elem);
>> +static int k3_ringacc_ring_pop_mem(struct k3_ring *ring, void *elem);
>> +
>> +static struct k3_ring_ops k3_ring_mode_ring_ops = {
>> +        .push_tail = k3_ringacc_ring_push_mem,
>> +        .pop_head = k3_ringacc_ring_pop_mem,
>> +};
>> +
>> +static int k3_ringacc_ring_push_io(struct k3_ring *ring, void *elem);
>> +static int k3_ringacc_ring_pop_io(struct k3_ring *ring, void *elem);
>> +static int k3_ringacc_ring_push_head_io(struct k3_ring *ring, void
>> *elem);
>> +static int k3_ringacc_ring_pop_tail_io(struct k3_ring *ring, void
>> *elem);
>> +
>> +static struct k3_ring_ops k3_ring_mode_msg_ops = {
>> +        .push_tail = k3_ringacc_ring_push_io,
>> +        .push_head = k3_ringacc_ring_push_head_io,
>> +        .pop_tail = k3_ringacc_ring_pop_tail_io,
>> +        .pop_head = k3_ringacc_ring_pop_io,
>> +};
>> +
>> +static int k3_ringacc_ring_push_head_proxy(struct k3_ring *ring, void
>> *elem);
>> +static int k3_ringacc_ring_push_tail_proxy(struct k3_ring *ring, void
>> *elem);
>> +static int k3_ringacc_ring_pop_head_proxy(struct k3_ring *ring, void
>> *elem);
>> +static int k3_ringacc_ring_pop_tail_proxy(struct k3_ring *ring, void
>> *elem);
>> +
>> +static struct k3_ring_ops k3_ring_mode_proxy_ops = {
>> +        .push_tail = k3_ringacc_ring_push_tail_proxy,
>> +        .push_head = k3_ringacc_ring_push_head_proxy,
>> +        .pop_tail = k3_ringacc_ring_pop_tail_proxy,
>> +        .pop_head = k3_ringacc_ring_pop_head_proxy,
>> +};
>> +
>> +#ifdef CONFIG_TI_K3_RINGACC_DEBUG
>> +void k3_ringacc_ring_dump(struct k3_ring *ring)
>> +{
>> +    struct device *dev = ring->parent->dev;
>> +
>> +    k3_nav_dbg(dev, "dump ring: %d\n", ring->ring_id);
>> +    k3_nav_dbg(dev, "dump mem virt %p, dma %pad\n",
>> +           ring->ring_mem_virt, &ring->ring_mem_dma);
>> +    k3_nav_dbg(dev, "dump elmsize %d, size %d, mode %d, proxy_id %d\n",
>> +           ring->elm_size, ring->size, ring->mode, ring->proxy_id);
>> +
>> +    k3_nav_dbg(dev, "dump ring_rt_regs: db%08x\n",
>> +           readl(&ring->rt->db));
> 
> Why not use readl_relaxed in this func?
> 
>> +    k3_nav_dbg(dev, "dump occ%08x\n",
>> +           readl(&ring->rt->occ));
>> +    k3_nav_dbg(dev, "dump indx%08x\n",
>> +           readl(&ring->rt->indx));
>> +    k3_nav_dbg(dev, "dump hwocc%08x\n",
>> +           readl(&ring->rt->hwocc));
>> +    k3_nav_dbg(dev, "dump hwindx%08x\n",
>> +           readl(&ring->rt->hwindx));
>> +
>> +    if (ring->ring_mem_virt)
>> +        print_hex_dump(KERN_ERR, "dump ring_mem_virt ",
>> +                   DUMP_PREFIX_NONE, 16, 1,
>> +                   ring->ring_mem_virt, 16 * 8, false);
>> +}
>> +EXPORT_SYMBOL_GPL(k3_ringacc_ring_dump);
> 
> Do you really need to export a debug function?
> 
>> +#endif
>> +
>> +struct k3_ring *k3_ringacc_request_ring(struct k3_ringacc *ringacc,
>> +                    int id, u32 flags)
>> +{
>> +    int proxy_id = K3_RINGACC_PROXY_NOT_USED;
>> +
>> +    mutex_lock(&ringacc->req_lock);
>> +
>> +    if (id == K3_RINGACC_RING_ID_ANY) {
>> +        /* Request for any general purpose ring */
>> +        struct ti_sci_resource_desc *gp_rings =
>> +                        &ringacc->rm_gp_range->desc[0];
>> +        unsigned long size;
>> +
>> +        size = gp_rings->start + gp_rings->num;
>> +        id = find_next_zero_bit(ringacc->rings_inuse, size,
>> +                    gp_rings->start);
>> +        if (id == size)
>> +            goto error;
>> +    } else if (id < 0) {
>> +        goto error;
>> +    }
>> +
>> +    if (test_bit(id, ringacc->rings_inuse) &&
>> +        !(ringacc->rings[id].flags & K3_RING_FLAG_SHARED))
>> +        goto error;
>> +    else if (ringacc->rings[id].flags & K3_RING_FLAG_SHARED)
>> +        goto out;
>> +
>> +    if (flags & K3_RINGACC_RING_USE_PROXY) {
>> +        proxy_id = find_next_zero_bit(ringacc->proxy_inuse,
>> +                          ringacc->num_proxies, 0);
>> +        if (proxy_id == ringacc->num_proxies)
>> +            goto error;
>> +    }
>> +
>> +    if (!try_module_get(ringacc->dev->driver->owner))
>> +        goto error;
>> +
>> +    if (proxy_id != K3_RINGACC_PROXY_NOT_USED) {
>> +        set_bit(proxy_id, ringacc->proxy_inuse);
>> +        ringacc->rings[id].proxy_id = proxy_id;
>> +        k3_nav_dbg(ringacc->dev, "Giving ring#%d proxy#%d\n",
>> +               id, proxy_id);
>> +    } else {
>> +        k3_nav_dbg(ringacc->dev, "Giving ring#%d\n", id);
>> +    }
>> +
>> +    set_bit(id, ringacc->rings_inuse);
>> +out:
>> +    ringacc->rings[id].use_count++;
>> +    mutex_unlock(&ringacc->req_lock);
>> +    return &ringacc->rings[id];
>> +
>> +error:
>> +    mutex_unlock(&ringacc->req_lock);
>> +    return NULL;
>> +}
>> +EXPORT_SYMBOL_GPL(k3_ringacc_request_ring);
>> +
>> +static void k3_ringacc_ring_reset_sci(struct k3_ring *ring)
>> +{
>> +    struct k3_ringacc *ringacc = ring->parent;
>> +    int ret;
>> +
>> +    ret = ringacc->tisci_ring_ops->config(
>> +            ringacc->tisci,
>> +            TI_SCI_MSG_VALUE_RM_RING_COUNT_VALID,
>> +            ringacc->tisci_dev_id,
>> +            ring->ring_id,
>> +            0,
>> +            0,
>> +            ring->size,
>> +            0,
>> +            0,
>> +            0);
>> +    if (ret)
>> +        dev_err(ringacc->dev, "TISCI reset ring fail (%d) ring_idx
>> %d\n",
>> +            ret, ring->ring_id);
> 
> Return value of sci ops is masked, why not return it and let the caller
> handle it properly?
> 
> Same comment for anything similar that follows.
> 
>> +}
>> +
>> +void k3_ringacc_ring_reset(struct k3_ring *ring)
>> +{
>> +    if (!ring || !(ring->flags & K3_RING_FLAG_BUSY))
>> +        return;
>> +
>> +    ring->occ = 0;
>> +    ring->free = 0;
>> +    ring->rindex = 0;
>> +    ring->windex = 0;
>> +
>> +    k3_ringacc_ring_reset_sci(ring);
>> +}
>> +EXPORT_SYMBOL_GPL(k3_ringacc_ring_reset);
>> +
>> +static void k3_ringacc_ring_reconfig_qmode_sci(struct k3_ring *ring,
>> +                           enum k3_ring_mode mode)
>> +{
>> +    struct k3_ringacc *ringacc = ring->parent;
>> +    int ret;
>> +
>> +    ret = ringacc->tisci_ring_ops->config(
>> +            ringacc->tisci,
>> +            TI_SCI_MSG_VALUE_RM_RING_MODE_VALID,
>> +            ringacc->tisci_dev_id,
>> +            ring->ring_id,
>> +            0,
>> +            0,
>> +            0,
>> +            mode,
>> +            0,
>> +            0);
>> +    if (ret)
>> +        dev_err(ringacc->dev, "TISCI reconf qmode fail (%d) ring_idx
>> %d\n",
>> +            ret, ring->ring_id);
>> +}
>> +
>> +void k3_ringacc_ring_reset_dma(struct k3_ring *ring, u32 occ)
>> +{
>> +    if (!ring || !(ring->flags & K3_RING_FLAG_BUSY))
>> +        return;
>> +
>> +    if (!ring->parent->dma_ring_reset_quirk)
>> +        return;
>> +
>> +    if (!occ)
>> +        occ = dbg_readl(&ring->rt->occ);
>> +
>> +    if (occ) {
>> +        u32 db_ring_cnt, db_ring_cnt_cur;
>> +
>> +        k3_nav_dbg(ring->parent->dev, "%s %u occ: %u\n", __func__,
>> +               ring->ring_id, occ);
>> +        /* 2. Reset the ring */
> 
> 2? Where is 1?
> 
>> +        k3_ringacc_ring_reset_sci(ring);
>> +
>> +        /*
>> +         * 3. Setup the ring in ring/doorbell mode
>> +         * (if not already in this mode)
>> +         */
>> +        if (ring->mode != K3_RINGACC_RING_MODE_RING)
>> +            k3_ringacc_ring_reconfig_qmode_sci(
>> +                    ring, K3_RINGACC_RING_MODE_RING);
>> +        /*
>> +         * 4. Ring the doorbell 2**22 – ringOcc times.
>> +         * This will wrap the internal UDMAP ring state occupancy
>> +         * counter (which is 21-bits wide) to 0.
>> +         */
>> +        db_ring_cnt = (1U << 22) - occ;
>> +
>> +        while (db_ring_cnt != 0) {
>> +            /*
>> +             * Ring the doorbell with the maximum count each
>> +             * iteration if possible to minimize the total
>> +             * of writes
>> +             */
>> +            if (db_ring_cnt > K3_RINGACC_MAX_DB_RING_CNT)
>> +                db_ring_cnt_cur = K3_RINGACC_MAX_DB_RING_CNT;
>> +            else
>> +                db_ring_cnt_cur = db_ring_cnt;
>> +
>> +            writel(db_ring_cnt_cur, &ring->rt->db);
>> +            db_ring_cnt -= db_ring_cnt_cur;
>> +        }
>> +
>> +        /* 5. Restore the original ring mode (if not ring mode) */
>> +        if (ring->mode != K3_RINGACC_RING_MODE_RING)
>> +            k3_ringacc_ring_reconfig_qmode_sci(ring, ring->mode);
>> +    }
>> +
>> +    /* 2. Reset the ring */
> 
> Again 2?
> 
>> +    k3_ringacc_ring_reset(ring);
>> +}
>> +EXPORT_SYMBOL_GPL(k3_ringacc_ring_reset_dma);
>> +
>> +static void k3_ringacc_ring_free_sci(struct k3_ring *ring)
>> +{
>> +    struct k3_ringacc *ringacc = ring->parent;
>> +    int ret;
>> +
>> +    ret = ringacc->tisci_ring_ops->config(
>> +            ringacc->tisci,
>> +            TI_SCI_MSG_VALUE_RM_ALL_NO_ORDER,
>> +            ringacc->tisci_dev_id,
>> +            ring->ring_id,
>> +            0,
>> +            0,
>> +            0,
>> +            0,
>> +            0,
>> +            0);
>> +    if (ret)
>> +        dev_err(ringacc->dev, "TISCI ring free fail (%d) ring_idx %d\n",
>> +            ret, ring->ring_id);
>> +}
>> +
>> +int k3_ringacc_ring_free(struct k3_ring *ring)
>> +{
>> +    struct k3_ringacc *ringacc;
>> +
>> +    if (!ring)
>> +        return -EINVAL;
>> +
>> +    ringacc = ring->parent;
>> +
>> +    k3_nav_dbg(ring->parent->dev, "flags: 0x%08x\n", ring->flags);
>> +
>> +    if (!test_bit(ring->ring_id, ringacc->rings_inuse))
>> +        return -EINVAL;
>> +
>> +    mutex_lock(&ringacc->req_lock);
>> +
>> +    if (--ring->use_count)
>> +        goto out;
>> +
>> +    if (!(ring->flags & K3_RING_FLAG_BUSY))
>> +        goto no_init;
>> +
>> +    k3_ringacc_ring_free_sci(ring);
>> +
>> +    dma_free_coherent(ringacc->dev,
>> +              ring->size * (4 << ring->elm_size),
>> +              ring->ring_mem_virt, ring->ring_mem_dma);
>> +    ring->flags = 0;
>> +    ring->ops = NULL;
>> +    if (ring->proxy_id != K3_RINGACC_PROXY_NOT_USED) {
>> +        clear_bit(ring->proxy_id, ringacc->proxy_inuse);
>> +        ring->proxy = NULL;
>> +        ring->proxy_id = K3_RINGACC_PROXY_NOT_USED;
>> +    }
>> +
>> +no_init:
>> +    clear_bit(ring->ring_id, ringacc->rings_inuse);
>> +
>> +    module_put(ringacc->dev->driver->owner);
>> +
>> +out:
>> +    mutex_unlock(&ringacc->req_lock);
>> +    return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(k3_ringacc_ring_free);
>> +
>> +u32 k3_ringacc_get_ring_id(struct k3_ring *ring)
>> +{
>> +    if (!ring)
>> +        return -EINVAL;
>> +
>> +    return ring->ring_id;
>> +}
>> +EXPORT_SYMBOL_GPL(k3_ringacc_get_ring_id);
>> +
>> +u32 k3_ringacc_get_tisci_dev_id(struct k3_ring *ring)
>> +{
>> +    if (!ring)
>> +        return -EINVAL;
>> +
> 
> What if parent is NULL? Can it ever be here?
> 
>> +    return ring->parent->tisci_dev_id;
>> +}
>> +EXPORT_SYMBOL_GPL(k3_ringacc_get_tisci_dev_id);
>> +
>> +int k3_ringacc_get_ring_irq_num(struct k3_ring *ring)
>> +{
>> +    int irq_num;
>> +
>> +    if (!ring)
>> +        return -EINVAL;
>> +
>> +    irq_num = ti_sci_inta_msi_get_virq(ring->parent->dev,
>> ring->ring_id);
>> +    if (irq_num <= 0)
>> +        irq_num = -EINVAL;
>> +    return irq_num;
>> +}
>> +EXPORT_SYMBOL_GPL(k3_ringacc_get_ring_irq_num);
>> +
>> +static int k3_ringacc_ring_cfg_sci(struct k3_ring *ring)
>> +{
>> +    struct k3_ringacc *ringacc = ring->parent;
>> +    u32 ring_idx;
>> +    int ret;
>> +
>> +    if (!ringacc->tisci)
>> +        return -EINVAL;
>> +
>> +    ring_idx = ring->ring_id;
>> +    ret = ringacc->tisci_ring_ops->config(
>> +            ringacc->tisci,
>> +            TI_SCI_MSG_VALUE_RM_ALL_NO_ORDER,
>> +            ringacc->tisci_dev_id,
>> +            ring_idx,
>> +            lower_32_bits(ring->ring_mem_dma),
>> +            upper_32_bits(ring->ring_mem_dma),
>> +            ring->size,
>> +            ring->mode,
>> +            ring->elm_size,
>> +            0);
>> +    if (ret)
>> +        dev_err(ringacc->dev, "TISCI config ring fail (%d) ring_idx
>> %d\n",
>> +            ret, ring_idx);
>> +
>> +    return ret;
>> +}
>> +
>> +int k3_ringacc_ring_cfg(struct k3_ring *ring, struct k3_ring_cfg *cfg)
>> +{
>> +    struct k3_ringacc *ringacc = ring->parent;
>> +    int ret = 0;
>> +
>> +    if (!ring || !cfg)
>> +        return -EINVAL;
>> +    if (cfg->elm_size > K3_RINGACC_RING_ELSIZE_256 ||
>> +        cfg->mode > K3_RINGACC_RING_MODE_QM ||
>> +        cfg->size & ~K3_RINGACC_CFG_RING_SIZE_ELCNT_MASK ||
>> +        !test_bit(ring->ring_id, ringacc->rings_inuse))
>> +        return -EINVAL;
>> +
>> +    if (ring->use_count != 1)
> 
> Hmm, isn't this a failure actually?
> 
>> +        return 0;
>> +
>> +    ring->size = cfg->size;
>> +    ring->elm_size = cfg->elm_size;
>> +    ring->mode = cfg->mode;
>> +    ring->occ = 0;
>> +    ring->free = 0;
>> +    ring->rindex = 0;
>> +    ring->windex = 0;
>> +
>> +    if (ring->proxy_id != K3_RINGACC_PROXY_NOT_USED)
>> +        ring->proxy = ringacc->proxy_target_base +
>> +                  ring->proxy_id * K3_RINGACC_PROXY_TARGET_STEP;
>> +
>> +    switch (ring->mode) {
>> +    case K3_RINGACC_RING_MODE_RING:
>> +        ring->ops = &k3_ring_mode_ring_ops;
>> +        break;
>> +    case K3_RINGACC_RING_MODE_QM:
>> +        /*
>> +         * In Queue mode elm_size can be 8 only and each operation
>> +         * uses 2 element slots
>> +         */
>> +        if (cfg->elm_size != K3_RINGACC_RING_ELSIZE_8 ||
>> +            cfg->size % 2)
>> +            goto err_free_proxy;
>> +        /* else, fall through */
>> +    case K3_RINGACC_RING_MODE_MESSAGE:
>> +        if (ring->proxy)
>> +            ring->ops = &k3_ring_mode_proxy_ops;
>> +        else
>> +            ring->ops = &k3_ring_mode_msg_ops;
>> +        break;
>> +    default:
>> +        ring->ops = NULL;
>> +        ret = -EINVAL;
>> +        goto err_free_proxy;
>> +    };
>> +
>> +    ring->ring_mem_virt =
>> +            dma_alloc_coherent(ringacc->dev,
>> +                       ring->size * (4 << ring->elm_size),
>> +                       &ring->ring_mem_dma, GFP_KERNEL);
>> +    if (!ring->ring_mem_virt) {
>> +        dev_err(ringacc->dev, "Failed to alloc ring mem\n");
>> +        ret = -ENOMEM;
>> +        goto err_free_ops;
>> +    }
>> +
>> +    ret = k3_ringacc_ring_cfg_sci(ring);
>> +
>> +    if (ret)
>> +        goto err_free_mem;
>> +
>> +    ring->flags |= K3_RING_FLAG_BUSY;
>> +    ring->flags |= (cfg->flags & K3_RINGACC_RING_SHARED) ?
>> +            K3_RING_FLAG_SHARED : 0;
>> +
>> +    k3_ringacc_ring_dump(ring);
>> +
>> +    return 0;
>> +
>> +err_free_mem:
>> +    dma_free_coherent(ringacc->dev,
>> +              ring->size * (4 << ring->elm_size),
>> +              ring->ring_mem_virt,
>> +              ring->ring_mem_dma);
>> +err_free_ops:
>> +    ring->ops = NULL;
>> +err_free_proxy:
>> +    ring->proxy = NULL;
>> +    return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(k3_ringacc_ring_cfg);
>> +
>> +u32 k3_ringacc_ring_get_size(struct k3_ring *ring)
>> +{
>> +    if (!ring || !(ring->flags & K3_RING_FLAG_BUSY))
>> +        return -EINVAL;
>> +
>> +    return ring->size;
>> +}
>> +EXPORT_SYMBOL_GPL(k3_ringacc_ring_get_size);
>> +
>> +u32 k3_ringacc_ring_get_free(struct k3_ring *ring)
>> +{
>> +    if (!ring || !(ring->flags & K3_RING_FLAG_BUSY))
>> +        return -EINVAL;
>> +
>> +    if (!ring->free)
>> +        ring->free = ring->size - dbg_readl(&ring->rt->occ);
>> +
>> +    return ring->free;
>> +}
>> +EXPORT_SYMBOL_GPL(k3_ringacc_ring_get_free);
>> +
>> +u32 k3_ringacc_ring_get_occ(struct k3_ring *ring)
>> +{
>> +    if (!ring || !(ring->flags & K3_RING_FLAG_BUSY))
>> +        return -EINVAL;
>> +
>> +    return dbg_readl(&ring->rt->occ);
>> +}
>> +EXPORT_SYMBOL_GPL(k3_ringacc_ring_get_occ);
>> +
>> +u32 k3_ringacc_ring_is_full(struct k3_ring *ring)
>> +{
>> +    return !k3_ringacc_ring_get_free(ring);
>> +}
>> +EXPORT_SYMBOL_GPL(k3_ringacc_ring_is_full);
>> +
>> +enum k3_ringacc_access_mode {
>> +    K3_RINGACC_ACCESS_MODE_PUSH_HEAD,
>> +    K3_RINGACC_ACCESS_MODE_POP_HEAD,
>> +    K3_RINGACC_ACCESS_MODE_PUSH_TAIL,
>> +    K3_RINGACC_ACCESS_MODE_POP_TAIL,
>> +    K3_RINGACC_ACCESS_MODE_PEEK_HEAD,
>> +    K3_RINGACC_ACCESS_MODE_PEEK_TAIL,
>> +};
>> +
>> +static int k3_ringacc_ring_cfg_proxy(struct k3_ring *ring,
>> +                     enum k3_ringacc_proxy_access_mode mode)
>> +{
>> +    u32 val;
>> +
>> +    val = ring->ring_id;
>> +    val |= mode << 16;
>> +    val |= ring->elm_size << 24;
> 
> Would be nice to have these magic shifts as defines.
> 
>> +    dbg_writel(val, &ring->proxy->control);
>> +    return 0;
>> +}
>> +
>> +static int k3_ringacc_ring_access_proxy(struct k3_ring *ring, void
>> *elem,
>> +                    enum k3_ringacc_access_mode access_mode)
>> +{
>> +    void __iomem *ptr;
>> +
>> +    ptr = (void __iomem *)&ring->proxy->data;
>> +
>> +    switch (access_mode) {
>> +    case K3_RINGACC_ACCESS_MODE_PUSH_HEAD:
>> +    case K3_RINGACC_ACCESS_MODE_POP_HEAD:
>> +        k3_ringacc_ring_cfg_proxy(ring, PROXY_ACCESS_MODE_HEAD);
>> +        break;
>> +    case K3_RINGACC_ACCESS_MODE_PUSH_TAIL:
>> +    case K3_RINGACC_ACCESS_MODE_POP_TAIL:
>> +        k3_ringacc_ring_cfg_proxy(ring, PROXY_ACCESS_MODE_TAIL);
>> +        break;
>> +    default:
>> +        return -EINVAL;
>> +    }
>> +
>> +    ptr += k3_ringacc_ring_get_fifo_pos(ring);
>> +
>> +    switch (access_mode) {
>> +    case K3_RINGACC_ACCESS_MODE_POP_HEAD:
>> +    case K3_RINGACC_ACCESS_MODE_POP_TAIL:
>> +        k3_nav_dbg(ring->parent->dev, "proxy:memcpy_fromio(x): -->
>> ptr(%p), mode:%d\n",
>> +               ptr, access_mode);
>> +        memcpy_fromio(elem, ptr, (4 << ring->elm_size));
>> +        ring->occ--;
>> +        break;
>> +    case K3_RINGACC_ACCESS_MODE_PUSH_TAIL:
>> +    case K3_RINGACC_ACCESS_MODE_PUSH_HEAD:
>> +        k3_nav_dbg(ring->parent->dev, "proxy:memcpy_toio(x): -->
>> ptr(%p), mode:%d\n",
>> +               ptr, access_mode);
>> +        memcpy_toio(ptr, elem, (4 << ring->elm_size));
>> +        ring->free--;
>> +        break;
>> +    default:
>> +        return -EINVAL;
>> +    }
>> +
>> +    k3_nav_dbg(ring->parent->dev, "proxy: free%d occ%d\n",
>> +           ring->free, ring->occ);
>> +    return 0;
>> +}
>> +
>> +static int k3_ringacc_ring_push_head_proxy(struct k3_ring *ring, void
>> *elem)
>> +{
>> +    return k3_ringacc_ring_access_proxy(ring, elem,
>> +                        K3_RINGACC_ACCESS_MODE_PUSH_HEAD);
>> +}
>> +
>> +static int k3_ringacc_ring_push_tail_proxy(struct k3_ring *ring, void
>> *elem)
>> +{
>> +    return k3_ringacc_ring_access_proxy(ring, elem,
>> +                        K3_RINGACC_ACCESS_MODE_PUSH_TAIL);
>> +}
>> +
>> +static int k3_ringacc_ring_pop_head_proxy(struct k3_ring *ring, void
>> *elem)
>> +{
>> +    return k3_ringacc_ring_access_proxy(ring, elem,
>> +                        K3_RINGACC_ACCESS_MODE_POP_HEAD);
>> +}
>> +
>> +static int k3_ringacc_ring_pop_tail_proxy(struct k3_ring *ring, void
>> *elem)
>> +{
>> +    return k3_ringacc_ring_access_proxy(ring, elem,
>> +                        K3_RINGACC_ACCESS_MODE_POP_HEAD);
>> +}
>> +
>> +static int k3_ringacc_ring_access_io(struct k3_ring *ring, void *elem,
>> +                     enum k3_ringacc_access_mode access_mode)
>> +{
>> +    void __iomem *ptr;
>> +
>> +    switch (access_mode) {
>> +    case K3_RINGACC_ACCESS_MODE_PUSH_HEAD:
>> +    case K3_RINGACC_ACCESS_MODE_POP_HEAD:
>> +        ptr = (void __iomem *)&ring->fifos->head_data;
>> +        break;
>> +    case K3_RINGACC_ACCESS_MODE_PUSH_TAIL:
>> +    case K3_RINGACC_ACCESS_MODE_POP_TAIL:
>> +        ptr = (void __iomem *)&ring->fifos->tail_data;
>> +        break;
>> +    default:
>> +        return -EINVAL;
>> +    }
>> +
>> +    ptr += k3_ringacc_ring_get_fifo_pos(ring);
>> +
>> +    switch (access_mode) {
>> +    case K3_RINGACC_ACCESS_MODE_POP_HEAD:
>> +    case K3_RINGACC_ACCESS_MODE_POP_TAIL:
>> +        k3_nav_dbg(ring->parent->dev, "memcpy_fromio(x): --> ptr(%p),
>> mode:%d\n",
>> +               ptr, access_mode);
>> +        memcpy_fromio(elem, ptr, (4 << ring->elm_size));
>> +        ring->occ--;
>> +        break;
>> +    case K3_RINGACC_ACCESS_MODE_PUSH_TAIL:
>> +    case K3_RINGACC_ACCESS_MODE_PUSH_HEAD:
>> +        k3_nav_dbg(ring->parent->dev, "memcpy_toio(x): --> ptr(%p),
>> mode:%d\n",
>> +               ptr, access_mode);
>> +        memcpy_toio(ptr, elem, (4 << ring->elm_size));
>> +        ring->free--;
>> +        break;
>> +    default:
>> +        return -EINVAL;
>> +    }
>> +
>> +    k3_nav_dbg(ring->parent->dev, "free%d index%d occ%d index%d\n",
>> +           ring->free, ring->windex, ring->occ, ring->rindex);
>> +    return 0;
>> +}
>> +
>> +static int k3_ringacc_ring_push_head_io(struct k3_ring *ring, void
>> *elem)
>> +{
>> +    return k3_ringacc_ring_access_io(ring, elem,
>> +                     K3_RINGACC_ACCESS_MODE_PUSH_HEAD);
>> +}
>> +
>> +static int k3_ringacc_ring_push_io(struct k3_ring *ring, void *elem)
>> +{
>> +    return k3_ringacc_ring_access_io(ring, elem,
>> +                     K3_RINGACC_ACCESS_MODE_PUSH_TAIL);
>> +}
>> +
>> +static int k3_ringacc_ring_pop_io(struct k3_ring *ring, void *elem)
>> +{
>> +    return k3_ringacc_ring_access_io(ring, elem,
>> +                     K3_RINGACC_ACCESS_MODE_POP_HEAD);
>> +}
>> +
>> +static int k3_ringacc_ring_pop_tail_io(struct k3_ring *ring, void *elem)
>> +{
>> +    return k3_ringacc_ring_access_io(ring, elem,
>> +                     K3_RINGACC_ACCESS_MODE_POP_HEAD);
>> +}
>> +
>> +static int k3_ringacc_ring_push_mem(struct k3_ring *ring, void *elem)
>> +{
>> +    void *elem_ptr;
>> +
>> +    elem_ptr = k3_ringacc_get_elm_addr(ring, ring->windex);
>> +
>> +    memcpy(elem_ptr, elem, (4 << ring->elm_size));
>> +
>> +    ring->windex = (ring->windex + 1) % ring->size;
>> +    ring->free--;
>> +    dbg_writel(1, &ring->rt->db);
>> +
>> +    k3_nav_dbg(ring->parent->dev, "ring_push_mem: free%d index%d\n",
>> +           ring->free, ring->windex);
>> +
>> +    return 0;
>> +}
>> +
>> +static int k3_ringacc_ring_pop_mem(struct k3_ring *ring, void *elem)
>> +{
>> +    void *elem_ptr;
>> +
>> +    elem_ptr = k3_ringacc_get_elm_addr(ring, ring->rindex);
>> +
>> +    memcpy(elem, elem_ptr, (4 << ring->elm_size));
>> +
>> +    ring->rindex = (ring->rindex + 1) % ring->size;
>> +    ring->occ--;
>> +    dbg_writel(-1, &ring->rt->db);
>> +
>> +    k3_nav_dbg(ring->parent->dev, "ring_pop_mem: occ%d index%d
>> pos_ptr%p\n",
>> +           ring->occ, ring->rindex, elem_ptr);
>> +    return 0;
>> +}
>> +
>> +int k3_ringacc_ring_push(struct k3_ring *ring, void *elem)
>> +{
>> +    int ret = -EOPNOTSUPP;
>> +
>> +    if (!ring || !(ring->flags & K3_RING_FLAG_BUSY))
>> +        return -EINVAL;
>> +
>> +    k3_nav_dbg(ring->parent->dev, "ring_push: free%d index%d\n",
>> +           ring->free, ring->windex);
>> +
>> +    if (k3_ringacc_ring_is_full(ring))
>> +        return -ENOMEM;
>> +
>> +    if (ring->ops && ring->ops->push_tail)
>> +        ret = ring->ops->push_tail(ring, elem);
>> +
>> +    return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(k3_ringacc_ring_push);
>> +
>> +int k3_ringacc_ring_push_head(struct k3_ring *ring, void *elem)
>> +{
>> +    int ret = -EOPNOTSUPP;
>> +
>> +    if (!ring || !(ring->flags & K3_RING_FLAG_BUSY))
>> +        return -EINVAL;
>> +
>> +    k3_nav_dbg(ring->parent->dev, "ring_push_head: free%d index%d\n",
>> +           ring->free, ring->windex);
>> +
>> +    if (k3_ringacc_ring_is_full(ring))
>> +        return -ENOMEM;
>> +
>> +    if (ring->ops && ring->ops->push_head)
>> +        ret = ring->ops->push_head(ring, elem);
>> +
>> +    return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(k3_ringacc_ring_push_head);
>> +
>> +int k3_ringacc_ring_pop(struct k3_ring *ring, void *elem)
>> +{
>> +    int ret = -EOPNOTSUPP;
>> +
>> +    if (!ring || !(ring->flags & K3_RING_FLAG_BUSY))
>> +        return -EINVAL;
>> +
>> +    if (!ring->occ)
>> +        ring->occ = k3_ringacc_ring_get_occ(ring);
>> +
>> +    k3_nav_dbg(ring->parent->dev, "ring_pop: occ%d index%d\n",
>> +           ring->occ, ring->rindex);
>> +
>> +    if (!ring->occ)
>> +        return -ENODATA;
>> +
>> +    if (ring->ops && ring->ops->pop_head)
>> +        ret = ring->ops->pop_head(ring, elem);
>> +
>> +    return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(k3_ringacc_ring_pop);
>> +
>> +int k3_ringacc_ring_pop_tail(struct k3_ring *ring, void *elem)
>> +{
>> +    int ret = -EOPNOTSUPP;
>> +
>> +    if (!ring || !(ring->flags & K3_RING_FLAG_BUSY))
>> +        return -EINVAL;
>> +
>> +    if (!ring->occ)
>> +        ring->occ = k3_ringacc_ring_get_occ(ring);
>> +
>> +    k3_nav_dbg(ring->parent->dev, "ring_pop_tail: occ%d index%d\n",
>> +           ring->occ, ring->rindex);
>> +
>> +    if (!ring->occ)
>> +        return -ENODATA;
>> +
>> +    if (ring->ops && ring->ops->pop_tail)
>> +        ret = ring->ops->pop_tail(ring, elem);
>> +
>> +    return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(k3_ringacc_ring_pop_tail);
>> +
>> +struct k3_ringacc *of_k3_ringacc_get_by_phandle(struct device_node *np,
>> +                        const char *property)
>> +{
>> +    struct device_node *ringacc_np;
>> +    struct k3_ringacc *ringacc = ERR_PTR(-EPROBE_DEFER);
>> +    struct k3_ringacc *entry;
>> +
>> +    ringacc_np = of_parse_phandle(np, property, 0);
>> +    if (!ringacc_np)
>> +        return ERR_PTR(-ENODEV);
>> +
>> +    mutex_lock(&k3_ringacc_list_lock);
>> +    list_for_each_entry(entry, &k3_ringacc_list, list)
>> +        if (entry->dev->of_node == ringacc_np) {
>> +            ringacc = entry;
>> +            break;
>> +        }
>> +    mutex_unlock(&k3_ringacc_list_lock);
>> +    of_node_put(ringacc_np);
>> +
>> +    return ringacc;
>> +}
>> +EXPORT_SYMBOL_GPL(of_k3_ringacc_get_by_phandle);
>> +
>> +static int k3_ringacc_probe_dt(struct k3_ringacc *ringacc)
>> +{
>> +    struct device_node *node = ringacc->dev->of_node;
>> +    struct device *dev = ringacc->dev;
>> +    struct platform_device *pdev = to_platform_device(dev);
>> +    int ret;
>> +
>> +    if (!node) {
>> +        dev_err(dev, "device tree info unavailable\n");
>> +        return -ENODEV;
>> +    }
>> +
>> +    ret = of_property_read_u32(node, "ti,num-rings",
>> &ringacc->num_rings);
>> +    if (ret) {
>> +        dev_err(dev, "ti,num-rings read failure %d\n", ret);
>> +        return ret;
>> +    }
>> +
>> +    ringacc->dma_ring_reset_quirk =
>> +            of_property_read_bool(node, "ti,dma-ring-reset-quirk");
>> +
>> +    ringacc->tisci = ti_sci_get_by_phandle(node, "ti,sci");
>> +    if (IS_ERR(ringacc->tisci)) {
>> +        ret = PTR_ERR(ringacc->tisci);
>> +        if (ret != -EPROBE_DEFER)
>> +            dev_err(dev, "ti,sci read fail %d\n", ret);
>> +        ringacc->tisci = NULL;
>> +        return ret;
>> +    }
>> +
>> +    ret = of_property_read_u32(node, "ti,sci-dev-id",
>> +                   &ringacc->tisci_dev_id);
>> +    if (ret) {
>> +        dev_err(dev, "ti,sci-dev-id read fail %d\n", ret);
>> +        return ret;
>> +    }
>> +
>> +    pdev->id = ringacc->tisci_dev_id;
>> +
>> +    ringacc->rm_gp_range =
>> devm_ti_sci_get_of_resource(ringacc->tisci, dev,
>> +                        ringacc->tisci_dev_id,
>> +                        "ti,sci-rm-range-gp-rings");
>> +    if (IS_ERR(ringacc->rm_gp_range)) {
>> +        dev_err(dev, "Failed to allocate MSI interrupts\n");
>> +        return PTR_ERR(ringacc->rm_gp_range);
>> +    }
>> +
>> +    return ti_sci_inta_msi_domain_alloc_irqs(ringacc->dev,
>> +                         ringacc->rm_gp_range);
>> +}
>> +
>> +static int k3_ringacc_probe(struct platform_device *pdev)
>> +{
>> +    struct k3_ringacc *ringacc;
>> +    void __iomem *base_fifo, *base_rt;
>> +    struct device *dev = &pdev->dev;
>> +    struct resource *res;
>> +    int ret, i;
>> +
>> +    ringacc = devm_kzalloc(dev, sizeof(*ringacc), GFP_KERNEL);
>> +    if (!ringacc)
>> +        return -ENOMEM;
>> +
>> +    ringacc->dev = dev;
>> +    mutex_init(&ringacc->req_lock);
>> +
>> +    dev->msi_domain = of_msi_get_domain(dev, dev->of_node,
>> +                        DOMAIN_BUS_TI_SCI_INTA_MSI);
>> +    if (!dev->msi_domain) {
>> +        dev_err(dev, "Failed to get MSI domain\n");
>> +        return -EPROBE_DEFER;
>> +    }
>> +
>> +    ret = k3_ringacc_probe_dt(ringacc);
>> +    if (ret)
>> +        return ret;
>> +
>> +    res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "rt");
>> +    base_rt = devm_ioremap_resource(dev, res);
>> +    if (IS_ERR(base_rt))
>> +        return PTR_ERR(base_rt);
>> +
>> +    res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "fifos");
>> +    base_fifo = devm_ioremap_resource(dev, res);
>> +    if (IS_ERR(base_fifo))
>> +        return PTR_ERR(base_fifo);
>> +
>> +    res = platform_get_resource_byname(pdev, IORESOURCE_MEM,
>> "proxy_gcfg");
>> +    ringacc->proxy_gcfg = devm_ioremap_resource(dev, res);
>> +    if (IS_ERR(ringacc->proxy_gcfg))
>> +        return PTR_ERR(ringacc->proxy_gcfg);
>> +
>> +    res = platform_get_resource_byname(pdev, IORESOURCE_MEM,
>> +                       "proxy_target");
>> +    ringacc->proxy_target_base = devm_ioremap_resource(dev, res);
>> +    if (IS_ERR(ringacc->proxy_target_base))
>> +        return PTR_ERR(ringacc->proxy_target_base);
>> +
>> +    ringacc->num_proxies = dbg_readl(&ringacc->proxy_gcfg->config) &
>> +                     K3_RINGACC_PROXY_CFG_THREADS_MASK;
>> +
>> +    ringacc->rings = devm_kzalloc(dev,
>> +                      sizeof(*ringacc->rings) *
>> +                      ringacc->num_rings,
>> +                      GFP_KERNEL);
>> +    ringacc->rings_inuse = devm_kcalloc(dev,
>> +                        BITS_TO_LONGS(ringacc->num_rings),
>> +                        sizeof(unsigned long), GFP_KERNEL);
>> +    ringacc->proxy_inuse = devm_kcalloc(dev,
>> +                        BITS_TO_LONGS(ringacc->num_proxies),
>> +                        sizeof(unsigned long), GFP_KERNEL);
>> +
>> +    if (!ringacc->rings || !ringacc->rings_inuse ||
>> !ringacc->proxy_inuse)
>> +        return -ENOMEM;
>> +
>> +    for (i = 0; i < ringacc->num_rings; i++) {
>> +        ringacc->rings[i].rt = base_rt +
>> +                       K3_RINGACC_RT_REGS_STEP * i;
>> +        ringacc->rings[i].fifos = base_fifo +
>> +                      K3_RINGACC_FIFO_REGS_STEP * i;
>> +        ringacc->rings[i].parent = ringacc;
>> +        ringacc->rings[i].ring_id = i;
>> +        ringacc->rings[i].proxy_id = K3_RINGACC_PROXY_NOT_USED;
>> +    }
>> +    dev_set_drvdata(dev, ringacc);
>> +
>> +    ringacc->tisci_ring_ops = &ringacc->tisci->ops.rm_ring_ops;
>> +
>> +    pm_runtime_enable(dev);
>> +    ret = pm_runtime_get_sync(dev);
>> +    if (ret < 0) {
>> +        pm_runtime_put_noidle(dev);
>> +        dev_err(dev, "Failed to enable pm %d\n", ret);
>> +        goto err;
>> +    }
>> +
>> +    mutex_lock(&k3_ringacc_list_lock);
>> +    list_add_tail(&ringacc->list, &k3_ringacc_list);
>> +    mutex_unlock(&k3_ringacc_list_lock);
>> +
>> +    dev_info(dev, "Ring Accelerator probed rings:%u, gp-rings[%u,%u]
>> sci-dev-id:%u\n",
>> +         ringacc->num_rings,
>> +         ringacc->rm_gp_range->desc[0].start,
>> +         ringacc->rm_gp_range->desc[0].num,
>> +         ringacc->tisci_dev_id);
>> +    dev_info(dev, "dma-ring-reset-quirk: %s\n",
>> +         ringacc->dma_ring_reset_quirk ? "enabled" : "disabled");
>> +    dev_info(dev, "RA Proxy rev. %08x, num_proxies:%u\n",
>> +         dbg_readl(&ringacc->proxy_gcfg->revision),
>> +         ringacc->num_proxies);
>> +    return 0;
>> +
>> +err:
>> +    pm_runtime_disable(dev);
>> +    return ret;
>> +}
>> +
>> +static int k3_ringacc_remove(struct platform_device *pdev)
>> +{
>> +    struct k3_ringacc *ringacc = dev_get_drvdata(&pdev->dev);
>> +
>> +    pm_runtime_put_sync(&pdev->dev);
>> +    pm_runtime_disable(&pdev->dev);
>> +
>> +    mutex_lock(&k3_ringacc_list_lock);
>> +    list_del(&ringacc->list);
>> +    mutex_unlock(&k3_ringacc_list_lock);
>> +    return 0;
>> +}
>> +
>> +/* Match table for of_platform binding */
>> +static const struct of_device_id k3_ringacc_of_match[] = {
>> +    { .compatible = "ti,am654-navss-ringacc", },
>> +    {},
>> +};
>> +MODULE_DEVICE_TABLE(of, k3_ringacc_of_match);
>> +
>> +static struct platform_driver k3_ringacc_driver = {
>> +    .probe        = k3_ringacc_probe,
>> +    .remove        = k3_ringacc_remove,
>> +    .driver        = {
>> +        .name    = "k3-ringacc",
>> +        .of_match_table = k3_ringacc_of_match,
>> +    },
>> +};
>> +module_platform_driver(k3_ringacc_driver);
>> +
>> +MODULE_LICENSE("GPL v2");
>> +MODULE_DESCRIPTION("TI Ringacc driver for K3 SOCs");
>> +MODULE_AUTHOR("Grygorii Strashko <grygorii.strashko@ti.com>");
>> diff --git a/include/linux/soc/ti/k3-ringacc.h
>> b/include/linux/soc/ti/k3-ringacc.h
>> new file mode 100644
>> index 000000000000..debffba48ac9
>> --- /dev/null
>> +++ b/include/linux/soc/ti/k3-ringacc.h
>> @@ -0,0 +1,262 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +/*
>> + * K3 Ring Accelerator (RA) subsystem interface
>> + *
>> + * Copyright (C) 2019 Texas Instruments Incorporated - http://www.ti.com
>> + */
>> +
>> +#ifndef __SOC_TI_K3_RINGACC_API_H_
>> +#define __SOC_TI_K3_RINGACC_API_H_
>> +
>> +#include <linux/types.h>
>> +
>> +struct device_node;
>> +
>> +/**
>> + * enum k3_ring_mode - &struct k3_ring_cfg mode
>> + *
>> + * RA ring operational modes
>> + *
>> + * @K3_RINGACC_RING_MODE_RING: Exposed Ring mode for SW direct access
>> + * @K3_RINGACC_RING_MODE_MESSAGE: Messaging mode. Messaging mode
>> requires
>> + *    that all accesses to the queue must go through this IP so that all
>> + *    accesses to the memory are controlled and ordered. This IP then
>> + *    controls the entire state of the queue, and SW has no directly
>> control,
>> + *    such as through doorbells and cannot access the storage memory
>> directly.
>> + *    This is particularly useful when more than one SW or HW entity
>> can be
>> + *    the producer and/or consumer at the same time
>> + * @K3_RINGACC_RING_MODE_CREDENTIALS: Credentials mode is message
>> mode plus
>> + *    stores credentials with each message, requiring the element
>> size to be
>> + *    doubled to fit the credentials. Any exposed memory should be
>> protected
>> + *    by a firewall from unwanted access
>> + * @K3_RINGACC_RING_MODE_QM:  Queue manager mode. This takes the
>> credentials
>> + *    mode and adds packet length per element, along with additional
>> read only
>> + *    fields for element count and accumulated queue length. The QM
>> mode only
>> + *    operates with an 8 byte element size (any other element size is
>> + *    illegal), and like in credentials mode each operation uses 2
>> element
>> + *    slots to store the credentials and length fields
>> + */
>> +enum k3_ring_mode {
>> +    K3_RINGACC_RING_MODE_RING = 0,
>> +    K3_RINGACC_RING_MODE_MESSAGE,
>> +    K3_RINGACC_RING_MODE_CREDENTIALS,
>> +    K3_RINGACC_RING_MODE_QM,
>> +    K3_RINGACC_RING_MODE_INVALID
>> +};
>> +
>> +/**
>> + * enum k3_ring_size - &struct k3_ring_cfg elm_size
>> + *
>> + * RA ring element's sizes in bytes.
>> + */
>> +enum k3_ring_size {
>> +    K3_RINGACC_RING_ELSIZE_4 = 0,
>> +    K3_RINGACC_RING_ELSIZE_8,
>> +    K3_RINGACC_RING_ELSIZE_16,
>> +    K3_RINGACC_RING_ELSIZE_32,
>> +    K3_RINGACC_RING_ELSIZE_64,
>> +    K3_RINGACC_RING_ELSIZE_128,
>> +    K3_RINGACC_RING_ELSIZE_256,
>> +    K3_RINGACC_RING_ELSIZE_INVALID
>> +};
>> +
>> +struct k3_ringacc;
>> +struct k3_ring;
>> +
>> +/**
>> + * enum k3_ring_cfg - RA ring configuration structure
>> + *
>> + * @size: Ring size, number of elements
>> + * @elm_size: Ring element size
>> + * @mode: Ring operational mode
>> + * @flags: Ring configuration flags. Possible values:
>> + *     @K3_RINGACC_RING_SHARED: when set allows to request the same ring
>> + *     few times. It's usable when the same ring is used as Free Host
>> PD ring
>> + *     for different flows, for example.
>> + *     Note: Locking should be done by consumer if required
>> + */
>> +struct k3_ring_cfg {
>> +    u32 size;
>> +    enum k3_ring_size elm_size;
>> +    enum k3_ring_mode mode;
>> +#define K3_RINGACC_RING_SHARED BIT(1)
>> +    u32 flags;
>> +};
>> +
>> +#define K3_RINGACC_RING_ID_ANY (-1)
>> +
>> +/**
>> + * of_k3_ringacc_get_by_phandle - find a RA by phandle property
>> + * @np: device node
>> + * @propname: property name containing phandle on RA node
>> + *
>> + * Returns pointer on the RA - struct k3_ringacc
>> + * or -ENODEV if not found,
>> + * or -EPROBE_DEFER if not yet registered
>> + */
>> +struct k3_ringacc *of_k3_ringacc_get_by_phandle(struct device_node *np,
>> +                        const char *property);
>> +
>> +#define K3_RINGACC_RING_USE_PROXY BIT(1)
>> +
>> +/**
>> + * k3_ringacc_request_ring - request ring from ringacc
>> + * @ringacc: pointer on ringacc
>> + * @id: ring id or K3_RINGACC_RING_ID_ANY for any general purpose ring
>> + * @flags:
>> + *    @K3_RINGACC_RING_USE_PROXY: if set - proxy will be allocated and
>> + *        used to access ring memory. Sopported only for rings in
>> + *        Message/Credentials/Queue mode.
>> + *
>> + * Returns pointer on the Ring - struct k3_ring
>> + * or NULL in case of failure.
>> + */
>> +struct k3_ring *k3_ringacc_request_ring(struct k3_ringacc *ringacc,
>> +                    int id, u32 flags);
>> +
>> +/**
>> + * k3_ringacc_ring_reset - ring reset
>> + * @ring: pointer on Ring
>> + *
>> + * Resets ring internal state ((hw)occ, (hw)idx).
>> + * TODO_GS: ? Ring can be reused without reconfiguration
>> + */
>> +void k3_ringacc_ring_reset(struct k3_ring *ring);
>> +/**
>> + * k3_ringacc_ring_reset - ring reset for DMA rings
>> + * @ring: pointer on Ring
>> + *
>> + * Resets ring internal state ((hw)occ, (hw)idx). Should be used for
>> rings
>> + * which are read by K3 UDMA, like TX or Free Host PD rings.
>> + */
>> +void k3_ringacc_ring_reset_dma(struct k3_ring *ring, u32 occ);
>> +
>> +/**
>> + * k3_ringacc_ring_free - ring free
>> + * @ring: pointer on Ring
>> + *
>> + * Resets ring and free all alocated resources.
>> + */
>> +int k3_ringacc_ring_free(struct k3_ring *ring);
>> +
>> +/**
>> + * k3_ringacc_get_ring_id - Get the Ring ID
>> + * @ring: pointer on ring
>> + *
>> + * Returns the Ring ID
>> + */
>> +u32 k3_ringacc_get_ring_id(struct k3_ring *ring);
>> +
>> +/**
>> + * k3_ringacc_get_ring_irq_num - Get the irq number for the ring
>> + * @ring: pointer on ring
>> + *
>> + * Returns the interrupt number which can be used to request the
>> interrupt
>> + */
>> +int k3_ringacc_get_ring_irq_num(struct k3_ring *ring);
>> +
>> +/**
>> + * k3_ringacc_ring_cfg - ring configure
>> + * @ring: pointer on ring
>> + * @cfg: Ring configuration parameters (see &struct k3_ring_cfg)
>> + *
>> + * Configures ring, including ring memory allocation.
>> + * Returns 0 on success, errno otherwise.
>> + */
>> +int k3_ringacc_ring_cfg(struct k3_ring *ring, struct k3_ring_cfg *cfg);
>> +
>> +/**
>> + * k3_ringacc_ring_get_size - get ring size
>> + * @ring: pointer on ring
>> + *
>> + * Returns ring size in number of elements.
>> + */
>> +u32 k3_ringacc_ring_get_size(struct k3_ring *ring);
>> +
>> +/**
>> + * k3_ringacc_ring_get_free - get free elements
>> + * @ring: pointer on ring
>> + *
>> + * Returns number of free elements in the ring.
>> + */
>> +u32 k3_ringacc_ring_get_free(struct k3_ring *ring);
>> +
>> +/**
>> + * k3_ringacc_ring_get_occ - get ring occupancy
>> + * @ring: pointer on ring
>> + *
>> + * Returns total number of valid entries on the ring
>> + */
>> +u32 k3_ringacc_ring_get_occ(struct k3_ring *ring);
>> +
>> +/**
>> + * k3_ringacc_ring_is_full - checks if ring is full
>> + * @ring: pointer on ring
>> + *
>> + * Returns true if the ring is full
>> + */
>> +u32 k3_ringacc_ring_is_full(struct k3_ring *ring);
>> +
>> +/**
>> + * k3_ringacc_ring_push - push element to the ring tail
>> + * @ring: pointer on ring
>> + * @elem: pointer on ring element buffer
>> + *
>> + * Push one ring element to the ring tail. Size of the ring element is
>> + * determined by ring configuration &struct k3_ring_cfg elm_size.
>> + *
>> + * Returns 0 on success, errno otherwise.
>> + */
>> +int k3_ringacc_ring_push(struct k3_ring *ring, void *elem);
>> +
>> +/**
>> + * k3_ringacc_ring_pop - pop element from the ring head
>> + * @ring: pointer on ring
>> + * @elem: pointer on ring element buffer
>> + *
>> + * Push one ring element from the ring head. Size of the ring element is
>> + * determined by ring configuration &struct k3_ring_cfg elm_size..
>> + *
>> + * Returns 0 on success, errno otherwise.
>> + */
>> +int k3_ringacc_ring_pop(struct k3_ring *ring, void *elem);
>> +
>> +/**
>> + * k3_ringacc_ring_push_head - push element to the ring head
>> + * @ring: pointer on ring
>> + * @elem: pointer on ring element buffer
>> + *
>> + * Push one ring element to the ring head. Size of the ring element is
>> + * determined by ring configuration &struct k3_ring_cfg elm_size.
>> + *
>> + * Returns 0 on success, errno otherwise.
>> + * Not Supported by ring modes: K3_RINGACC_RING_MODE_RING
>> + */
>> +int k3_ringacc_ring_push_head(struct k3_ring *ring, void *elem);
>> +
>> +/**
>> + * k3_ringacc_ring_pop_tail - pop element from the ring tail
>> + * @ring: pointer on ring
>> + * @elem: pointer on ring element buffer
>> + *
>> + * Push one ring element from the ring tail. Size of the ring element is
>> + * determined by ring configuration &struct k3_ring_cfg elm_size.
>> + *
>> + * Returns 0 on success, errno otherwise.
>> + * Not Supported by ring modes: K3_RINGACC_RING_MODE_RING
>> + */
>> +int k3_ringacc_ring_pop_tail(struct k3_ring *ring, void *elem);
>> +
>> +u32 k3_ringacc_get_tisci_dev_id(struct k3_ring *ring);
>> +
>> +/**
>> + * Debugging definitions
>> + * TODO: might be removed
>> + */
>> +#ifdef CONFIG_TI_K3_RINGACC_DEBUG
>> +void k3_ringacc_ring_dump(struct k3_ring *ring);
>> +#else
>> +static inline void k3_ringacc_ring_dump(struct k3_ring *ring) {};
>> +#endif
>> +
>> +#endif /* __SOC_TI_K3_RINGACC_API_H_ */
>>
> 
> -- 
> Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
> Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

-- 
Regards
Vignesh

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 06/14] dmaengine: ti: Add cppi5 header for UDMA
  2019-09-08 14:25   ` Vinod Koul
@ 2019-09-09 10:59     ` Peter Ujfalusi
  2019-09-10  7:06       ` Grygorii Strashko
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Ujfalusi @ 2019-09-09 10:59 UTC (permalink / raw)
  To: Vinod Koul
  Cc: robh+dt, nm, ssantosh, dan.j.williams, dmaengine,
	linux-arm-kernel, devicetree, linux-kernel, grygorii.strashko,
	lokeshvutla, t-kristo, tony, j-keerthy



On 08/09/2019 17.25, Vinod Koul wrote:
> On 30-07-19, 12:34, Peter Ujfalusi wrote:
> 
>> +/**
>> + * Descriptor header, present in all types of descriptors
>> + */
>> +struct cppi5_desc_hdr_t {
>> +	u32 pkt_info0;	/* Packet info word 0 (n/a in Buffer desc) */
>> +	u32 pkt_info1;	/* Packet info word 1 (n/a in Buffer desc) */
>> +	u32 pkt_info2;	/* Packet info word 2 Buffer reclamation info */
>> +	u32 src_dst_tag; /* Packet info word 3 (n/a in Buffer desc) */
> 
> Can we move these comments to kernel-doc style please

Sure, I'll move all struct and enums.

>> +/**
>> + * cppi5_desc_get_type - get descriptor type
>> + * @desc_hdr: packet descriptor/TR header
>> + *
>> + * Returns descriptor type:
>> + * CPPI5_INFO0_DESC_TYPE_VAL_HOST
>> + * CPPI5_INFO0_DESC_TYPE_VAL_MONO
>> + * CPPI5_INFO0_DESC_TYPE_VAL_TR
>> + */
>> +static inline u32 cppi5_desc_get_type(struct cppi5_desc_hdr_t *desc_hdr)
>> +{
>> +	WARN_ON(!desc_hdr);
> 
> why WARN_ON and not return error!

these helpers were intended to be as simple as possible.
I can go through with all of the WARN_ONs and replace them with if()
pr_warn() and either just return or return with 0.

Would that be acceptable?

>> +/**
>> + * cppi5_hdesc_calc_size - Calculate Host Packet Descriptor size
>> + * @epib: is EPIB present
>> + * @psdata_size: PSDATA size
>> + * @sw_data_size: SWDATA size
>> + *
>> + * Returns required Host Packet Descriptor size
>> + * 0 - if PSDATA > CPPI5_INFO0_HDESC_PSDATA_MAX_SIZE
>> + */
>> +static inline u32 cppi5_hdesc_calc_size(bool epib, u32 psdata_size,
>> +					u32 sw_data_size)
>> +{
>> +	u32 desc_size;
>> +
>> +	if (psdata_size > CPPI5_INFO0_HDESC_PSDATA_MAX_SIZE)
>> +		return 0;
>> +	//TODO_GS: align
> 
> :)

Leftover TODO from Grygorii, the align is already done.

- Péter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 02/14] soc: ti: k3: add navss ringacc driver
  2019-09-09  6:09   ` Tero Kristo
  2019-09-09  7:25     ` Vignesh Raghavendra
@ 2019-09-09 13:00     ` Peter Ujfalusi
  2019-09-09 16:58       ` Grygorii Strashko
  1 sibling, 1 reply; 32+ messages in thread
From: Peter Ujfalusi @ 2019-09-09 13:00 UTC (permalink / raw)
  To: Tero Kristo, vkoul, robh+dt, nm, ssantosh
  Cc: dan.j.williams, dmaengine, linux-arm-kernel, devicetree,
	linux-kernel, grygorii.strashko, lokeshvutla, tony, j-keerthy

Hi,

Grygorii, can you take a look?

On 09/09/2019 9.09, Tero Kristo wrote:
> Hi,
> 
> Mostly some cosmetic comments below, other than that seems fine to me.
> 
> On 30/07/2019 12:34, Peter Ujfalusi wrote:
>> From: Grygorii Strashko <grygorii.strashko@ti.com>
>>
>> The Ring Accelerator (RINGACC or RA) provides hardware acceleration to
>> enable straightforward passing of work between a producer and a consumer.
>> There is one RINGACC module per NAVSS on TI AM65x SoCs.
>>
>> The RINGACC converts constant-address read and write accesses to
>> equivalent
>> read or write accesses to a circular data structure in memory. The
>> RINGACC
>> eliminates the need for each DMA controller which needs to access ring
>> elements from having to know the current state of the ring (base address,
>> current offset). The DMA controller performs a read or write access to a
>> specific address range (which maps to the source interface on the
>> RINGACC)
>> and the RINGACC replaces the address for the transaction with a new
>> address
>> which corresponds to the head or tail element of the ring (head for
>> reads,
>> tail for writes). Since the RINGACC maintains the state, multiple DMA
>> controllers or channels are allowed to coherently share the same rings as
>> applicable. The RINGACC is able to place data which is destined towards
>> software into cached memory directly.
>>
>> Supported ring modes:
>> - Ring Mode
>> - Messaging Mode
>> - Credentials Mode
>> - Queue Manager Mode
>>
>> TI-SCI integration:
>>
>> Texas Instrument's System Control Interface (TI-SCI) Message Protocol now
>> has control over Ringacc module resources management (RM) and Rings
>> configuration.
>>
>> The corresponding support of TI-SCI Ringacc module RM protocol
>> introduced as option through DT parameters:
>> - ti,sci: phandle on TI-SCI firmware controller DT node
>> - ti,sci-dev-id: TI-SCI device identifier as per TI-SCI firmware spec
>>
>> if both parameters present - Ringacc driver will configure/free/reset
>> Rings
>> using TI-SCI Message Ringacc RM Protocol.
>>
>> The Ringacc driver manages Rings allocation by itself now and requests
>> TI-SCI firmware to allocate and configure specific Rings only. It's done
>> this way because, Linux driver implements two stage Rings allocation and
>> configuration (allocate ring and configure ring) while I-SCI Message
> 
> I-SCI should be TI-SCI I believe.

Yes, it supposed to be.

> 
>> Protocol supports only one combined operation (allocate+configure).
>>
>> Grygorii Strashko <grygorii.strashko@ti.com>
> 
> Above seems to be missing SoB?

Oh, it is really missing.

> 
>> Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
>> ---
>>   drivers/soc/ti/Kconfig            |   17 +
>>   drivers/soc/ti/Makefile           |    1 +
>>   drivers/soc/ti/k3-ringacc.c       | 1191 +++++++++++++++++++++++++++++
>>   include/linux/soc/ti/k3-ringacc.h |  262 +++++++
>>   4 files changed, 1471 insertions(+)
>>   create mode 100644 drivers/soc/ti/k3-ringacc.c
>>   create mode 100644 include/linux/soc/ti/k3-ringacc.h
>>
>> diff --git a/drivers/soc/ti/Kconfig b/drivers/soc/ti/Kconfig
>> index cf545f428d03..10c76faa503e 100644
>> --- a/drivers/soc/ti/Kconfig
>> +++ b/drivers/soc/ti/Kconfig
>> @@ -80,6 +80,23 @@ config TI_SCI_PM_DOMAINS
>>         called ti_sci_pm_domains. Note this is needed early in boot
>> before
>>         rootfs may be available.
>>   +config TI_K3_RINGACC
>> +    tristate "K3 Ring accelerator Sub System"
>> +    depends on ARCH_K3 || COMPILE_TEST
>> +    depends on TI_SCI_INTA_IRQCHIP
>> +    default y
>> +    help
>> +      Say y here to support the K3 Ring accelerator module.
>> +      The Ring Accelerator (RINGACC or RA)  provides hardware
>> acceleration
>> +      to enable straightforward passing of work between a producer
>> +      and a consumer. There is one RINGACC module per NAVSS on TI
>> AM65x SoCs
>> +      If unsure, say N.
>> +
>> +config TI_K3_RINGACC_DEBUG
>> +    tristate "K3 Ring accelerator Sub System tests and debug"
>> +    depends on TI_K3_RINGACC
>> +    default n
>> +
>>   endif # SOC_TI
>>     config TI_SCI_INTA_MSI_DOMAIN
>> diff --git a/drivers/soc/ti/Makefile b/drivers/soc/ti/Makefile
>> index b3868d392d4f..cc4bc8b08bf5 100644
>> --- a/drivers/soc/ti/Makefile
>> +++ b/drivers/soc/ti/Makefile
>> @@ -9,3 +9,4 @@ obj-$(CONFIG_AMX3_PM)            += pm33xx.o
>>   obj-$(CONFIG_WKUP_M3_IPC)        += wkup_m3_ipc.o
>>   obj-$(CONFIG_TI_SCI_PM_DOMAINS)        += ti_sci_pm_domains.o
>>   obj-$(CONFIG_TI_SCI_INTA_MSI_DOMAIN)    += ti_sci_inta_msi.o
>> +obj-$(CONFIG_TI_K3_RINGACC)        += k3-ringacc.o
>> diff --git a/drivers/soc/ti/k3-ringacc.c b/drivers/soc/ti/k3-ringacc.c
>> new file mode 100644
>> index 000000000000..401dfc963319
>> --- /dev/null
>> +++ b/drivers/soc/ti/k3-ringacc.c
>> @@ -0,0 +1,1191 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * TI K3 NAVSS Ring Accelerator subsystem driver
>> + *
>> + * Copyright (C) 2019 Texas Instruments Incorporated - http://www.ti.com
>> + */
>> +
>> +#include <linux/dma-mapping.h>
>> +#include <linux/io.h>
>> +#include <linux/module.h>
>> +#include <linux/of.h>
>> +#include <linux/platform_device.h>
>> +#include <linux/pm_runtime.h>
>> +#include <linux/soc/ti/k3-ringacc.h>
>> +#include <linux/soc/ti/ti_sci_protocol.h>
>> +#include <linux/soc/ti/ti_sci_inta_msi.h>
>> +#include <linux/of_irq.h>
>> +#include <linux/irqdomain.h>
>> +
>> +static LIST_HEAD(k3_ringacc_list);
>> +static DEFINE_MUTEX(k3_ringacc_list_lock);
>> +
>> +#ifdef CONFIG_TI_K3_RINGACC_DEBUG
>> +#define    k3_nav_dbg(dev, arg...) dev_err(dev, arg)
> 
> dev_err seems exaggeration for debug purposes, maybe just dev_info.
> 
>> +static    void dbg_writel(u32 v, void __iomem *reg)
>> +{
>> +    pr_err("WRITEL(32): v(%08X)-->reg(%p)\n", v, reg);
> 
> Again, maybe just pr_info.

I think I'll just drop CONFIG_TI_K3_RINGACC_DEBUG altogether along with
dbg_writel/dbg_readl/k3_nav_dbg and use dev_dbg() when appropriate.

> 
>> +    writel(v, reg);
>> +}
>> +
>> +static    u32 dbg_readl(void __iomem *reg)
>> +{
>> +    u32 v;
>> +
>> +    v = readl(reg);
>> +    pr_err("READL(32): v(%08X)<--reg(%p)\n", v, reg);
>> +    return v;
>> +}
>> +#else
>> +#define    k3_nav_dbg(dev, arg...) dev_dbg(dev, arg)
>> +#define dbg_writel(v, reg) writel(v, reg)
> 
> Do you need to use hard writel, writel_relaxed is not enough?

not sure if we really need the barriers, but __raw_writel() should be
fine here imho

>> +
>> +#define dbg_readl(reg) readl(reg)
> 
> Same as above but for read?

__raw_readl() could be fine in also.

...

>> +/**
>> + * struct k3_ringacc - Rings accelerator descriptor
>> + *
>> + * @dev - pointer on RA device
>> + * @proxy_gcfg - RA proxy global config registers
>> + * @proxy_target_base - RA proxy datapath region
>> + * @num_rings - number of ring in RA
>> + * @rm_gp_range - general purpose rings range from tisci
>> + * @dma_ring_reset_quirk - DMA reset w/a enable
>> + * @num_proxies - number of RA proxies
>> + * @rings - array of rings descriptors (struct @k3_ring)
>> + * @list - list of RAs in the system
>> + * @tisci - pointer ti-sci handle
>> + * @tisci_ring_ops - ti-sci rings ops
>> + * @tisci_dev_id - ti-sci device id
>> + */
>> +struct k3_ringacc {
>> +    struct device *dev;
>> +    struct k3_ringacc_proxy_gcfg_regs __iomem *proxy_gcfg;
>> +    void __iomem *proxy_target_base;
>> +    u32 num_rings; /* number of rings in Ringacc module */
>> +    unsigned long *rings_inuse;
>> +    struct ti_sci_resource *rm_gp_range;
>> +
>> +    bool dma_ring_reset_quirk;
>> +    u32 num_proxies;
>> +    unsigned long *proxy_inuse;
> 
> proxy_inuse is not documented above.

I see, I'll update the documentation.

>> +
>> +    struct k3_ring *rings;
>> +    struct list_head list;
>> +    struct mutex req_lock; /* protect rings allocation */
>> +
>> +    const struct ti_sci_handle *tisci;
>> +    const struct ti_sci_rm_ringacc_ops *tisci_ring_ops;
>> +    u32  tisci_dev_id;
>> +};
>> +
>> +static long k3_ringacc_ring_get_fifo_pos(struct k3_ring *ring)
>> +{
>> +    return K3_RINGACC_FIFO_WINDOW_SIZE_BYTES -
>> +           (4 << ring->elm_size);
>> +}
>> +
>> +static void *k3_ringacc_get_elm_addr(struct k3_ring *ring, u32 idx)
>> +{
>> +    return (idx * (4 << ring->elm_size) + ring->ring_mem_virt);
> 
> The arithmetic here seems backwards compared to most other code I've
> seen. It would be more readable if you have it like:
> 
> ring->ring_mem_virt + idx * (4 << ring->elm_size);

Yes, I'll update.

> 
>> +}
>> +
>> +static int k3_ringacc_ring_push_mem(struct k3_ring *ring, void *elem);
>> +static int k3_ringacc_ring_pop_mem(struct k3_ring *ring, void *elem);
>> +
>> +static struct k3_ring_ops k3_ring_mode_ring_ops = {
>> +        .push_tail = k3_ringacc_ring_push_mem,
>> +        .pop_head = k3_ringacc_ring_pop_mem,
>> +};
>> +
>> +static int k3_ringacc_ring_push_io(struct k3_ring *ring, void *elem);
>> +static int k3_ringacc_ring_pop_io(struct k3_ring *ring, void *elem);
>> +static int k3_ringacc_ring_push_head_io(struct k3_ring *ring, void
>> *elem);
>> +static int k3_ringacc_ring_pop_tail_io(struct k3_ring *ring, void
>> *elem);
>> +
>> +static struct k3_ring_ops k3_ring_mode_msg_ops = {
>> +        .push_tail = k3_ringacc_ring_push_io,
>> +        .push_head = k3_ringacc_ring_push_head_io,
>> +        .pop_tail = k3_ringacc_ring_pop_tail_io,
>> +        .pop_head = k3_ringacc_ring_pop_io,
>> +};
>> +
>> +static int k3_ringacc_ring_push_head_proxy(struct k3_ring *ring, void
>> *elem);
>> +static int k3_ringacc_ring_push_tail_proxy(struct k3_ring *ring, void
>> *elem);
>> +static int k3_ringacc_ring_pop_head_proxy(struct k3_ring *ring, void
>> *elem);
>> +static int k3_ringacc_ring_pop_tail_proxy(struct k3_ring *ring, void
>> *elem);
>> +
>> +static struct k3_ring_ops k3_ring_mode_proxy_ops = {
>> +        .push_tail = k3_ringacc_ring_push_tail_proxy,
>> +        .push_head = k3_ringacc_ring_push_head_proxy,
>> +        .pop_tail = k3_ringacc_ring_pop_tail_proxy,
>> +        .pop_head = k3_ringacc_ring_pop_head_proxy,
>> +};
>> +
>> +#ifdef CONFIG_TI_K3_RINGACC_DEBUG
>> +void k3_ringacc_ring_dump(struct k3_ring *ring)
>> +{
>> +    struct device *dev = ring->parent->dev;
>> +
>> +    k3_nav_dbg(dev, "dump ring: %d\n", ring->ring_id);
>> +    k3_nav_dbg(dev, "dump mem virt %p, dma %pad\n",
>> +           ring->ring_mem_virt, &ring->ring_mem_dma);
>> +    k3_nav_dbg(dev, "dump elmsize %d, size %d, mode %d, proxy_id %d\n",
>> +           ring->elm_size, ring->size, ring->mode, ring->proxy_id);
>> +
>> +    k3_nav_dbg(dev, "dump ring_rt_regs: db%08x\n",
>> +           readl(&ring->rt->db));
> 
> Why not use readl_relaxed in this func?

__raw_readl() might be enough?

> 
>> +    k3_nav_dbg(dev, "dump occ%08x\n",
>> +           readl(&ring->rt->occ));
>> +    k3_nav_dbg(dev, "dump indx%08x\n",
>> +           readl(&ring->rt->indx));
>> +    k3_nav_dbg(dev, "dump hwocc%08x\n",
>> +           readl(&ring->rt->hwocc));
>> +    k3_nav_dbg(dev, "dump hwindx%08x\n",
>> +           readl(&ring->rt->hwindx));
>> +
>> +    if (ring->ring_mem_virt)
>> +        print_hex_dump(KERN_ERR, "dump ring_mem_virt ",
>> +                   DUMP_PREFIX_NONE, 16, 1,
>> +                   ring->ring_mem_virt, 16 * 8, false);
>> +}
>> +EXPORT_SYMBOL_GPL(k3_ringacc_ring_dump);
> 
> Do you really need to export a debug function?

It might come helpful for clients to dump the ring status runtime, but
since we don't have users, I'll move it to static.

>> +#endif
>> +
>> +struct k3_ring *k3_ringacc_request_ring(struct k3_ringacc *ringacc,
>> +                    int id, u32 flags)
>> +{
>> +    int proxy_id = K3_RINGACC_PROXY_NOT_USED;
>> +
>> +    mutex_lock(&ringacc->req_lock);
>> +
>> +    if (id == K3_RINGACC_RING_ID_ANY) {
>> +        /* Request for any general purpose ring */
>> +        struct ti_sci_resource_desc *gp_rings =
>> +                        &ringacc->rm_gp_range->desc[0];
>> +        unsigned long size;
>> +
>> +        size = gp_rings->start + gp_rings->num;
>> +        id = find_next_zero_bit(ringacc->rings_inuse, size,
>> +                    gp_rings->start);
>> +        if (id == size)
>> +            goto error;
>> +    } else if (id < 0) {
>> +        goto error;
>> +    }
>> +
>> +    if (test_bit(id, ringacc->rings_inuse) &&
>> +        !(ringacc->rings[id].flags & K3_RING_FLAG_SHARED))
>> +        goto error;
>> +    else if (ringacc->rings[id].flags & K3_RING_FLAG_SHARED)
>> +        goto out;
>> +
>> +    if (flags & K3_RINGACC_RING_USE_PROXY) {
>> +        proxy_id = find_next_zero_bit(ringacc->proxy_inuse,
>> +                          ringacc->num_proxies, 0);
>> +        if (proxy_id == ringacc->num_proxies)
>> +            goto error;
>> +    }
>> +
>> +    if (!try_module_get(ringacc->dev->driver->owner))
>> +        goto error;
>> +
>> +    if (proxy_id != K3_RINGACC_PROXY_NOT_USED) {
>> +        set_bit(proxy_id, ringacc->proxy_inuse);
>> +        ringacc->rings[id].proxy_id = proxy_id;
>> +        k3_nav_dbg(ringacc->dev, "Giving ring#%d proxy#%d\n",
>> +               id, proxy_id);
>> +    } else {
>> +        k3_nav_dbg(ringacc->dev, "Giving ring#%d\n", id);
>> +    }
>> +
>> +    set_bit(id, ringacc->rings_inuse);
>> +out:
>> +    ringacc->rings[id].use_count++;
>> +    mutex_unlock(&ringacc->req_lock);
>> +    return &ringacc->rings[id];
>> +
>> +error:
>> +    mutex_unlock(&ringacc->req_lock);
>> +    return NULL;
>> +}
>> +EXPORT_SYMBOL_GPL(k3_ringacc_request_ring);
>> +
>> +static void k3_ringacc_ring_reset_sci(struct k3_ring *ring)
>> +{
>> +    struct k3_ringacc *ringacc = ring->parent;
>> +    int ret;
>> +
>> +    ret = ringacc->tisci_ring_ops->config(
>> +            ringacc->tisci,
>> +            TI_SCI_MSG_VALUE_RM_RING_COUNT_VALID,
>> +            ringacc->tisci_dev_id,
>> +            ring->ring_id,
>> +            0,
>> +            0,
>> +            ring->size,
>> +            0,
>> +            0,
>> +            0);
>> +    if (ret)
>> +        dev_err(ringacc->dev, "TISCI reset ring fail (%d) ring_idx
>> %d\n",
>> +            ret, ring->ring_id);
> 
> Return value of sci ops is masked, why not return it and let the caller
> handle it properly?
> 
> Same comment for anything similar that follows.

Hrm, there is not much a caller can do other than PANIC in case the ring
configuration fails.
I can probagate the error, but not sure what action can be taken, if any.

>> +}
>> +
>> +void k3_ringacc_ring_reset(struct k3_ring *ring)
>> +{
>> +    if (!ring || !(ring->flags & K3_RING_FLAG_BUSY))
>> +        return;
>> +
>> +    ring->occ = 0;
>> +    ring->free = 0;
>> +    ring->rindex = 0;
>> +    ring->windex = 0;
>> +
>> +    k3_ringacc_ring_reset_sci(ring);
>> +}
>> +EXPORT_SYMBOL_GPL(k3_ringacc_ring_reset);
>> +
>> +static void k3_ringacc_ring_reconfig_qmode_sci(struct k3_ring *ring,
>> +                           enum k3_ring_mode mode)
>> +{
>> +    struct k3_ringacc *ringacc = ring->parent;
>> +    int ret;
>> +
>> +    ret = ringacc->tisci_ring_ops->config(
>> +            ringacc->tisci,
>> +            TI_SCI_MSG_VALUE_RM_RING_MODE_VALID,
>> +            ringacc->tisci_dev_id,
>> +            ring->ring_id,
>> +            0,
>> +            0,
>> +            0,
>> +            mode,
>> +            0,
>> +            0);
>> +    if (ret)
>> +        dev_err(ringacc->dev, "TISCI reconf qmode fail (%d) ring_idx
>> %d\n",
>> +            ret, ring->ring_id);
>> +}
>> +
>> +void k3_ringacc_ring_reset_dma(struct k3_ring *ring, u32 occ)
>> +{
>> +    if (!ring || !(ring->flags & K3_RING_FLAG_BUSY))
>> +        return;
>> +
>> +    if (!ring->parent->dma_ring_reset_quirk)
>> +        return;
>> +
>> +    if (!occ)
>> +        occ = dbg_readl(&ring->rt->occ);
>> +
>> +    if (occ) {
>> +        u32 db_ring_cnt, db_ring_cnt_cur;
>> +
>> +        k3_nav_dbg(ring->parent->dev, "%s %u occ: %u\n", __func__,
>> +               ring->ring_id, occ);
>> +        /* 2. Reset the ring */
> 
> 2? Where is 1?

Oh, I'll fix the numbering.

> 
>> +        k3_ringacc_ring_reset_sci(ring);
>> +
>> +        /*
>> +         * 3. Setup the ring in ring/doorbell mode
>> +         * (if not already in this mode)
>> +         */
>> +        if (ring->mode != K3_RINGACC_RING_MODE_RING)
>> +            k3_ringacc_ring_reconfig_qmode_sci(
>> +                    ring, K3_RINGACC_RING_MODE_RING);
>> +        /*
>> +         * 4. Ring the doorbell 2**22 – ringOcc times.
>> +         * This will wrap the internal UDMAP ring state occupancy
>> +         * counter (which is 21-bits wide) to 0.
>> +         */
>> +        db_ring_cnt = (1U << 22) - occ;
>> +
>> +        while (db_ring_cnt != 0) {
>> +            /*
>> +             * Ring the doorbell with the maximum count each
>> +             * iteration if possible to minimize the total
>> +             * of writes
>> +             */
>> +            if (db_ring_cnt > K3_RINGACC_MAX_DB_RING_CNT)
>> +                db_ring_cnt_cur = K3_RINGACC_MAX_DB_RING_CNT;
>> +            else
>> +                db_ring_cnt_cur = db_ring_cnt;
>> +
>> +            writel(db_ring_cnt_cur, &ring->rt->db);
>> +            db_ring_cnt -= db_ring_cnt_cur;
>> +        }
>> +
>> +        /* 5. Restore the original ring mode (if not ring mode) */
>> +        if (ring->mode != K3_RINGACC_RING_MODE_RING)
>> +            k3_ringacc_ring_reconfig_qmode_sci(ring, ring->mode);
>> +    }
>> +
>> +    /* 2. Reset the ring */
> 
> Again 2?

I'll drop the '2.'

> 
>> +    k3_ringacc_ring_reset(ring);
>> +}
>> +EXPORT_SYMBOL_GPL(k3_ringacc_ring_reset_dma);
>> +
>> +static void k3_ringacc_ring_free_sci(struct k3_ring *ring)
>> +{
>> +    struct k3_ringacc *ringacc = ring->parent;
>> +    int ret;
>> +
>> +    ret = ringacc->tisci_ring_ops->config(
>> +            ringacc->tisci,
>> +            TI_SCI_MSG_VALUE_RM_ALL_NO_ORDER,
>> +            ringacc->tisci_dev_id,
>> +            ring->ring_id,
>> +            0,
>> +            0,
>> +            0,
>> +            0,
>> +            0,
>> +            0);
>> +    if (ret)
>> +        dev_err(ringacc->dev, "TISCI ring free fail (%d) ring_idx %d\n",
>> +            ret, ring->ring_id);
>> +}
>> +
>> +int k3_ringacc_ring_free(struct k3_ring *ring)
>> +{
>> +    struct k3_ringacc *ringacc;
>> +
>> +    if (!ring)
>> +        return -EINVAL;
>> +
>> +    ringacc = ring->parent;
>> +
>> +    k3_nav_dbg(ring->parent->dev, "flags: 0x%08x\n", ring->flags);
>> +
>> +    if (!test_bit(ring->ring_id, ringacc->rings_inuse))
>> +        return -EINVAL;
>> +
>> +    mutex_lock(&ringacc->req_lock);
>> +
>> +    if (--ring->use_count)
>> +        goto out;
>> +
>> +    if (!(ring->flags & K3_RING_FLAG_BUSY))
>> +        goto no_init;
>> +
>> +    k3_ringacc_ring_free_sci(ring);
>> +
>> +    dma_free_coherent(ringacc->dev,
>> +              ring->size * (4 << ring->elm_size),
>> +              ring->ring_mem_virt, ring->ring_mem_dma);
>> +    ring->flags = 0;
>> +    ring->ops = NULL;
>> +    if (ring->proxy_id != K3_RINGACC_PROXY_NOT_USED) {
>> +        clear_bit(ring->proxy_id, ringacc->proxy_inuse);
>> +        ring->proxy = NULL;
>> +        ring->proxy_id = K3_RINGACC_PROXY_NOT_USED;
>> +    }
>> +
>> +no_init:
>> +    clear_bit(ring->ring_id, ringacc->rings_inuse);
>> +
>> +    module_put(ringacc->dev->driver->owner);
>> +
>> +out:
>> +    mutex_unlock(&ringacc->req_lock);
>> +    return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(k3_ringacc_ring_free);
>> +
>> +u32 k3_ringacc_get_ring_id(struct k3_ring *ring)
>> +{
>> +    if (!ring)
>> +        return -EINVAL;
>> +
>> +    return ring->ring_id;
>> +}
>> +EXPORT_SYMBOL_GPL(k3_ringacc_get_ring_id);
>> +
>> +u32 k3_ringacc_get_tisci_dev_id(struct k3_ring *ring)
>> +{
>> +    if (!ring)
>> +        return -EINVAL;
>> +
> 
> What if parent is NULL? Can it ever be here?

No, parent can not be NULL as the client would not have the ring in the
first place.

> 
>> +    return ring->parent->tisci_dev_id;
>> +}
>> +EXPORT_SYMBOL_GPL(k3_ringacc_get_tisci_dev_id);
>> +
>> +int k3_ringacc_get_ring_irq_num(struct k3_ring *ring)
>> +{
>> +    int irq_num;
>> +
>> +    if (!ring)
>> +        return -EINVAL;
>> +
>> +    irq_num = ti_sci_inta_msi_get_virq(ring->parent->dev,
>> ring->ring_id);
>> +    if (irq_num <= 0)
>> +        irq_num = -EINVAL;
>> +    return irq_num;
>> +}
>> +EXPORT_SYMBOL_GPL(k3_ringacc_get_ring_irq_num);
>> +
>> +static int k3_ringacc_ring_cfg_sci(struct k3_ring *ring)
>> +{
>> +    struct k3_ringacc *ringacc = ring->parent;
>> +    u32 ring_idx;
>> +    int ret;
>> +
>> +    if (!ringacc->tisci)
>> +        return -EINVAL;
>> +
>> +    ring_idx = ring->ring_id;
>> +    ret = ringacc->tisci_ring_ops->config(
>> +            ringacc->tisci,
>> +            TI_SCI_MSG_VALUE_RM_ALL_NO_ORDER,
>> +            ringacc->tisci_dev_id,
>> +            ring_idx,
>> +            lower_32_bits(ring->ring_mem_dma),
>> +            upper_32_bits(ring->ring_mem_dma),
>> +            ring->size,
>> +            ring->mode,
>> +            ring->elm_size,
>> +            0);
>> +    if (ret)
>> +        dev_err(ringacc->dev, "TISCI config ring fail (%d) ring_idx
>> %d\n",
>> +            ret, ring_idx);
>> +
>> +    return ret;
>> +}
>> +
>> +int k3_ringacc_ring_cfg(struct k3_ring *ring, struct k3_ring_cfg *cfg)
>> +{
>> +    struct k3_ringacc *ringacc = ring->parent;
>> +    int ret = 0;
>> +
>> +    if (!ring || !cfg)
>> +        return -EINVAL;
>> +    if (cfg->elm_size > K3_RINGACC_RING_ELSIZE_256 ||
>> +        cfg->mode > K3_RINGACC_RING_MODE_QM ||
>> +        cfg->size & ~K3_RINGACC_CFG_RING_SIZE_ELCNT_MASK ||
>> +        !test_bit(ring->ring_id, ringacc->rings_inuse))
>> +        return -EINVAL;
>> +
>> +    if (ring->use_count != 1)
> 
> Hmm, isn't this a failure actually?

Yes, it is: -EBUSY

>> +        return 0;
>> +
>> +    ring->size = cfg->size;
>> +    ring->elm_size = cfg->elm_size;
>> +    ring->mode = cfg->mode;
>> +    ring->occ = 0;
>> +    ring->free = 0;
>> +    ring->rindex = 0;
>> +    ring->windex = 0;
>> +
>> +    if (ring->proxy_id != K3_RINGACC_PROXY_NOT_USED)
>> +        ring->proxy = ringacc->proxy_target_base +
>> +                  ring->proxy_id * K3_RINGACC_PROXY_TARGET_STEP;
>> +
>> +    switch (ring->mode) {
>> +    case K3_RINGACC_RING_MODE_RING:
>> +        ring->ops = &k3_ring_mode_ring_ops;
>> +        break;
>> +    case K3_RINGACC_RING_MODE_QM:
>> +        /*
>> +         * In Queue mode elm_size can be 8 only and each operation
>> +         * uses 2 element slots
>> +         */
>> +        if (cfg->elm_size != K3_RINGACC_RING_ELSIZE_8 ||
>> +            cfg->size % 2)
>> +            goto err_free_proxy;
>> +        /* else, fall through */
>> +    case K3_RINGACC_RING_MODE_MESSAGE:
>> +        if (ring->proxy)
>> +            ring->ops = &k3_ring_mode_proxy_ops;
>> +        else
>> +            ring->ops = &k3_ring_mode_msg_ops;
>> +        break;
>> +    default:
>> +        ring->ops = NULL;
>> +        ret = -EINVAL;
>> +        goto err_free_proxy;
>> +    };
>> +
>> +    ring->ring_mem_virt =
>> +            dma_alloc_coherent(ringacc->dev,
>> +                       ring->size * (4 << ring->elm_size),
>> +                       &ring->ring_mem_dma, GFP_KERNEL);
>> +    if (!ring->ring_mem_virt) {
>> +        dev_err(ringacc->dev, "Failed to alloc ring mem\n");
>> +        ret = -ENOMEM;
>> +        goto err_free_ops;
>> +    }
>> +
>> +    ret = k3_ringacc_ring_cfg_sci(ring);
>> +
>> +    if (ret)
>> +        goto err_free_mem;
>> +
>> +    ring->flags |= K3_RING_FLAG_BUSY;
>> +    ring->flags |= (cfg->flags & K3_RINGACC_RING_SHARED) ?
>> +            K3_RING_FLAG_SHARED : 0;
>> +
>> +    k3_ringacc_ring_dump(ring);
>> +
>> +    return 0;
>> +
>> +err_free_mem:
>> +    dma_free_coherent(ringacc->dev,
>> +              ring->size * (4 << ring->elm_size),
>> +              ring->ring_mem_virt,
>> +              ring->ring_mem_dma);
>> +err_free_ops:
>> +    ring->ops = NULL;
>> +err_free_proxy:
>> +    ring->proxy = NULL;
>> +    return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(k3_ringacc_ring_cfg);
>> +
>> +u32 k3_ringacc_ring_get_size(struct k3_ring *ring)
>> +{
>> +    if (!ring || !(ring->flags & K3_RING_FLAG_BUSY))
>> +        return -EINVAL;
>> +
>> +    return ring->size;
>> +}
>> +EXPORT_SYMBOL_GPL(k3_ringacc_ring_get_size);
>> +
>> +u32 k3_ringacc_ring_get_free(struct k3_ring *ring)
>> +{
>> +    if (!ring || !(ring->flags & K3_RING_FLAG_BUSY))
>> +        return -EINVAL;
>> +
>> +    if (!ring->free)
>> +        ring->free = ring->size - dbg_readl(&ring->rt->occ);
>> +
>> +    return ring->free;
>> +}
>> +EXPORT_SYMBOL_GPL(k3_ringacc_ring_get_free);
>> +
>> +u32 k3_ringacc_ring_get_occ(struct k3_ring *ring)
>> +{
>> +    if (!ring || !(ring->flags & K3_RING_FLAG_BUSY))
>> +        return -EINVAL;
>> +
>> +    return dbg_readl(&ring->rt->occ);
>> +}
>> +EXPORT_SYMBOL_GPL(k3_ringacc_ring_get_occ);
>> +
>> +u32 k3_ringacc_ring_is_full(struct k3_ring *ring)
>> +{
>> +    return !k3_ringacc_ring_get_free(ring);
>> +}
>> +EXPORT_SYMBOL_GPL(k3_ringacc_ring_is_full);
>> +
>> +enum k3_ringacc_access_mode {
>> +    K3_RINGACC_ACCESS_MODE_PUSH_HEAD,
>> +    K3_RINGACC_ACCESS_MODE_POP_HEAD,
>> +    K3_RINGACC_ACCESS_MODE_PUSH_TAIL,
>> +    K3_RINGACC_ACCESS_MODE_POP_TAIL,
>> +    K3_RINGACC_ACCESS_MODE_PEEK_HEAD,
>> +    K3_RINGACC_ACCESS_MODE_PEEK_TAIL,
>> +};
>> +
>> +static int k3_ringacc_ring_cfg_proxy(struct k3_ring *ring,
>> +                     enum k3_ringacc_proxy_access_mode mode)
>> +{
>> +    u32 val;
>> +
>> +    val = ring->ring_id;
>> +    val |= mode << 16;
>> +    val |= ring->elm_size << 24;
> 
> Would be nice to have these magic shifts as defines.

OK, I'll add defines for the magic shifts.

> 
>> +    dbg_writel(val, &ring->proxy->control);
>> +    return 0;
>> +}
>> +

Thanks for the review,
- Péter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 02/14] soc: ti: k3: add navss ringacc driver
  2019-09-09 13:00     ` Peter Ujfalusi
@ 2019-09-09 16:58       ` Grygorii Strashko
  0 siblings, 0 replies; 32+ messages in thread
From: Grygorii Strashko @ 2019-09-09 16:58 UTC (permalink / raw)
  To: Peter Ujfalusi, Tero Kristo, vkoul, robh+dt, nm, ssantosh
  Cc: dan.j.williams, dmaengine, linux-arm-kernel, devicetree,
	linux-kernel, lokeshvutla, tony, j-keerthy



On 09/09/2019 16:00, Peter Ujfalusi wrote:
> Hi,
> 
> Grygorii, can you take a look?
> 
> On 09/09/2019 9.09, Tero Kristo wrote:
>> Hi,
>>
>> Mostly some cosmetic comments below, other than that seems fine to me.
>>
>> On 30/07/2019 12:34, Peter Ujfalusi wrote:
>>> From: Grygorii Strashko <grygorii.strashko@ti.com>
>>>
>>> The Ring Accelerator (RINGACC or RA) provides hardware acceleration to
>>> enable straightforward passing of work between a producer and a consumer.
>>> There is one RINGACC module per NAVSS on TI AM65x SoCs.
>>>
>>> The RINGACC converts constant-address read and write accesses to
>>> equivalent
>>> read or write accesses to a circular data structure in memory. The
>>> RINGACC
>>> eliminates the need for each DMA controller which needs to access ring
>>> elements from having to know the current state of the ring (base address,
>>> current offset). The DMA controller performs a read or write access to a
>>> specific address range (which maps to the source interface on the
>>> RINGACC)
>>> and the RINGACC replaces the address for the transaction with a new
>>> address
>>> which corresponds to the head or tail element of the ring (head for
>>> reads,
>>> tail for writes). Since the RINGACC maintains the state, multiple DMA
>>> controllers or channels are allowed to coherently share the same rings as
>>> applicable. The RINGACC is able to place data which is destined towards
>>> software into cached memory directly.
>>>
>>> Supported ring modes:
>>> - Ring Mode
>>> - Messaging Mode
>>> - Credentials Mode
>>> - Queue Manager Mode
>>>
>>> TI-SCI integration:
>>>
>>> Texas Instrument's System Control Interface (TI-SCI) Message Protocol now
>>> has control over Ringacc module resources management (RM) and Rings
>>> configuration.
>>>
>>> The corresponding support of TI-SCI Ringacc module RM protocol
>>> introduced as option through DT parameters:
>>> - ti,sci: phandle on TI-SCI firmware controller DT node
>>> - ti,sci-dev-id: TI-SCI device identifier as per TI-SCI firmware spec
>>>
>>> if both parameters present - Ringacc driver will configure/free/reset
>>> Rings
>>> using TI-SCI Message Ringacc RM Protocol.
>>>
>>> The Ringacc driver manages Rings allocation by itself now and requests
>>> TI-SCI firmware to allocate and configure specific Rings only. It's done
>>> this way because, Linux driver implements two stage Rings allocation and
>>> configuration (allocate ring and configure ring) while I-SCI Message
>>
>> I-SCI should be TI-SCI I believe.
> 
> Yes, it supposed to be.
> 
>>
>>> Protocol supports only one combined operation (allocate+configure).
>>>
>>> Grygorii Strashko <grygorii.strashko@ti.com>
>>
>> Above seems to be missing SoB?
> 
> Oh, it is really missing.
> 
>>
>>> Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
>>> ---
>>>    drivers/soc/ti/Kconfig            |   17 +
>>>    drivers/soc/ti/Makefile           |    1 +
>>>    drivers/soc/ti/k3-ringacc.c       | 1191 +++++++++++++++++++++++++++++
>>>    include/linux/soc/ti/k3-ringacc.h |  262 +++++++
>>>    4 files changed, 1471 insertions(+)
>>>    create mode 100644 drivers/soc/ti/k3-ringacc.c
>>>    create mode 100644 include/linux/soc/ti/k3-ringacc.h
>>>
>>> diff --git a/drivers/soc/ti/Kconfig b/drivers/soc/ti/Kconfig
>>> index cf545f428d03..10c76faa503e 100644
>>> --- a/drivers/soc/ti/Kconfig
>>> +++ b/drivers/soc/ti/Kconfig
>>> @@ -80,6 +80,23 @@ config TI_SCI_PM_DOMAINS
>>>          called ti_sci_pm_domains. Note this is needed early in boot
>>> before
>>>          rootfs may be available.
>>>    +config TI_K3_RINGACC
>>> +    tristate "K3 Ring accelerator Sub System"
>>> +    depends on ARCH_K3 || COMPILE_TEST
>>> +    depends on TI_SCI_INTA_IRQCHIP
>>> +    default y
>>> +    help
>>> +      Say y here to support the K3 Ring accelerator module.
>>> +      The Ring Accelerator (RINGACC or RA)  provides hardware
>>> acceleration
>>> +      to enable straightforward passing of work between a producer
>>> +      and a consumer. There is one RINGACC module per NAVSS on TI
>>> AM65x SoCs
>>> +      If unsure, say N.
>>> +
>>> +config TI_K3_RINGACC_DEBUG
>>> +    tristate "K3 Ring accelerator Sub System tests and debug"
>>> +    depends on TI_K3_RINGACC
>>> +    default n
>>> +
>>>    endif # SOC_TI
>>>      config TI_SCI_INTA_MSI_DOMAIN
>>> diff --git a/drivers/soc/ti/Makefile b/drivers/soc/ti/Makefile
>>> index b3868d392d4f..cc4bc8b08bf5 100644
>>> --- a/drivers/soc/ti/Makefile
>>> +++ b/drivers/soc/ti/Makefile
>>> @@ -9,3 +9,4 @@ obj-$(CONFIG_AMX3_PM)            += pm33xx.o
>>>    obj-$(CONFIG_WKUP_M3_IPC)        += wkup_m3_ipc.o
>>>    obj-$(CONFIG_TI_SCI_PM_DOMAINS)        += ti_sci_pm_domains.o
>>>    obj-$(CONFIG_TI_SCI_INTA_MSI_DOMAIN)    += ti_sci_inta_msi.o
>>> +obj-$(CONFIG_TI_K3_RINGACC)        += k3-ringacc.o
>>> diff --git a/drivers/soc/ti/k3-ringacc.c b/drivers/soc/ti/k3-ringacc.c
>>> new file mode 100644
>>> index 000000000000..401dfc963319
>>> --- /dev/null
>>> +++ b/drivers/soc/ti/k3-ringacc.c
>>> @@ -0,0 +1,1191 @@
>>> +// SPDX-License-Identifier: GPL-2.0
>>> +/*
>>> + * TI K3 NAVSS Ring Accelerator subsystem driver
>>> + *
>>> + * Copyright (C) 2019 Texas Instruments Incorporated - http://www.ti.com
>>> + */
>>> +
>>> +#include <linux/dma-mapping.h>
>>> +#include <linux/io.h>
>>> +#include <linux/module.h>
>>> +#include <linux/of.h>
>>> +#include <linux/platform_device.h>
>>> +#include <linux/pm_runtime.h>
>>> +#include <linux/soc/ti/k3-ringacc.h>
>>> +#include <linux/soc/ti/ti_sci_protocol.h>
>>> +#include <linux/soc/ti/ti_sci_inta_msi.h>
>>> +#include <linux/of_irq.h>
>>> +#include <linux/irqdomain.h>
>>> +
>>> +static LIST_HEAD(k3_ringacc_list);
>>> +static DEFINE_MUTEX(k3_ringacc_list_lock);
>>> +
>>> +#ifdef CONFIG_TI_K3_RINGACC_DEBUG
>>> +#define    k3_nav_dbg(dev, arg...) dev_err(dev, arg)
>>
>> dev_err seems exaggeration for debug purposes, maybe just dev_info.
>>
>>> +static    void dbg_writel(u32 v, void __iomem *reg)
>>> +{
>>> +    pr_err("WRITEL(32): v(%08X)-->reg(%p)\n", v, reg);
>>
>> Again, maybe just pr_info.
> 
> I think I'll just drop CONFIG_TI_K3_RINGACC_DEBUG altogether along with
> dbg_writel/dbg_readl/k3_nav_dbg and use dev_dbg() when appropriate.

Sounds good.

> 
>>
>>> +    writel(v, reg);
>>> +}
>>> +
>>> +static    u32 dbg_readl(void __iomem *reg)
>>> +{
>>> +    u32 v;
>>> +
>>> +    v = readl(reg);
>>> +    pr_err("READL(32): v(%08X)<--reg(%p)\n", v, reg);
>>> +    return v;
>>> +}
>>> +#else
>>> +#define    k3_nav_dbg(dev, arg...) dev_dbg(dev, arg)
>>> +#define dbg_writel(v, reg) writel(v, reg)
>>
>> Do you need to use hard writel, writel_relaxed is not enough?
> 
> not sure if we really need the barriers, but __raw_writel() should be
> fine here imho

xxx_relaxed relaxed versions should be used only when necessary and with
adding appropriate comments why they've been used and what benefits from using
them for each particular case.
So, i do not agree with this blind conversation.

> 
>>> +
>>> +#define dbg_readl(reg) readl(reg)
>>
>> Same as above but for read?
> 
> __raw_readl() could be fine in also.

No. __raw_xxx api should never be used by drivers.


> 
> ...
> 
>>> +/**
>>> + * struct k3_ringacc - Rings accelerator descriptor
>>> + *
>>> + * @dev - pointer on RA device
>>> + * @proxy_gcfg - RA proxy global config registers
>>> + * @proxy_target_base - RA proxy datapath region
>>> + * @num_rings - number of ring in RA
>>> + * @rm_gp_range - general purpose rings range from tisci
>>> + * @dma_ring_reset_quirk - DMA reset w/a enable
>>> + * @num_proxies - number of RA proxies
>>> + * @rings - array of rings descriptors (struct @k3_ring)
>>> + * @list - list of RAs in the system
>>> + * @tisci - pointer ti-sci handle
>>> + * @tisci_ring_ops - ti-sci rings ops
>>> + * @tisci_dev_id - ti-sci device id
>>> + */

...

>>> +
>>> +#ifdef CONFIG_TI_K3_RINGACC_DEBUG
>>> +void k3_ringacc_ring_dump(struct k3_ring *ring)
>>> +{
>>> +    struct device *dev = ring->parent->dev;
>>> +
>>> +    k3_nav_dbg(dev, "dump ring: %d\n", ring->ring_id);
>>> +    k3_nav_dbg(dev, "dump mem virt %p, dma %pad\n",
>>> +           ring->ring_mem_virt, &ring->ring_mem_dma);
>>> +    k3_nav_dbg(dev, "dump elmsize %d, size %d, mode %d, proxy_id %d\n",
>>> +           ring->elm_size, ring->size, ring->mode, ring->proxy_id);
>>> +
>>> +    k3_nav_dbg(dev, "dump ring_rt_regs: db%08x\n",
>>> +           readl(&ring->rt->db));
>>
>> Why not use readl_relaxed in this func?
> 
> __raw_readl() might be enough?

No Raw, but this seems only one place where relaxed version can be used.

> 
>>
>>> +    k3_nav_dbg(dev, "dump occ%08x\n",
>>> +           readl(&ring->rt->occ));
>>> +    k3_nav_dbg(dev, "dump indx%08x\n",
>>> +           readl(&ring->rt->indx));
>>> +    k3_nav_dbg(dev, "dump hwocc%08x\n",
>>> +           readl(&ring->rt->hwocc));
>>> +    k3_nav_dbg(dev, "dump hwindx%08x\n",
>>> +           readl(&ring->rt->hwindx));
>>> +
>>> +    if (ring->ring_mem_virt)
>>> +        print_hex_dump(KERN_ERR, "dump ring_mem_virt ",
>>> +                   DUMP_PREFIX_NONE, 16, 1,
>>> +                   ring->ring_mem_virt, 16 * 8, false);
>>> +}
>>> +EXPORT_SYMBOL_GPL(k3_ringacc_ring_dump);
>>
>> Do you really need to export a debug function?
> 
> It might come helpful for clients to dump the ring status runtime, but
> since we don't have users, I'll move it to static.

Yep. It was exported for debug purposes. But hence there are no active users - cna be removed.

> 
>>> +#endif
>>> +
>>> +struct k3_ring *k3_ringacc_request_ring(struct k3_ringacc *ringacc,
>>> +                    int id, u32 flags)
>>> +{
>>> +    int proxy_id = K3_RINGACC_PROXY_NOT_USED;
>>> +
>>> +    mutex_lock(&ringacc->req_lock);
>>> +
>>> +    if (id == K3_RINGACC_RING_ID_ANY) {
>>> +        /* Request for any general purpose ring */
>>> +        struct ti_sci_resource_desc *gp_rings =
>>> +                        &ringacc->rm_gp_range->desc[0];
>>> +        unsigned long size;
>>> +
>>> +        size = gp_rings->start + gp_rings->num;
>>> +        id = find_next_zero_bit(ringacc->rings_inuse, size,
>>> +                    gp_rings->start);
>>> +        if (id == size)
>>> +            goto error;
>>> +    } else if (id < 0) {
>>> +        goto error;
>>> +    }
>>> +
>>> +    if (test_bit(id, ringacc->rings_inuse) &&
>>> +        !(ringacc->rings[id].flags & K3_RING_FLAG_SHARED))
>>> +        goto error;
>>> +    else if (ringacc->rings[id].flags & K3_RING_FLAG_SHARED)
>>> +        goto out;
>>> +
>>> +    if (flags & K3_RINGACC_RING_USE_PROXY) {
>>> +        proxy_id = find_next_zero_bit(ringacc->proxy_inuse,
>>> +                          ringacc->num_proxies, 0);
>>> +        if (proxy_id == ringacc->num_proxies)
>>> +            goto error;
>>> +    }
>>> +
>>> +    if (!try_module_get(ringacc->dev->driver->owner))
>>> +        goto error;
>>> +
>>> +    if (proxy_id != K3_RINGACC_PROXY_NOT_USED) {
>>> +        set_bit(proxy_id, ringacc->proxy_inuse);
>>> +        ringacc->rings[id].proxy_id = proxy_id;
>>> +        k3_nav_dbg(ringacc->dev, "Giving ring#%d proxy#%d\n",
>>> +               id, proxy_id);
>>> +    } else {
>>> +        k3_nav_dbg(ringacc->dev, "Giving ring#%d\n", id);
>>> +    }
>>> +
>>> +    set_bit(id, ringacc->rings_inuse);
>>> +out:
>>> +    ringacc->rings[id].use_count++;
>>> +    mutex_unlock(&ringacc->req_lock);
>>> +    return &ringacc->rings[id];
>>> +
>>> +error:
>>> +    mutex_unlock(&ringacc->req_lock);
>>> +    return NULL;
>>> +}
>>> +EXPORT_SYMBOL_GPL(k3_ringacc_request_ring);
>>> +
>>> +static void k3_ringacc_ring_reset_sci(struct k3_ring *ring)
>>> +{
>>> +    struct k3_ringacc *ringacc = ring->parent;
>>> +    int ret;
>>> +
>>> +    ret = ringacc->tisci_ring_ops->config(
>>> +            ringacc->tisci,
>>> +            TI_SCI_MSG_VALUE_RM_RING_COUNT_VALID,
>>> +            ringacc->tisci_dev_id,
>>> +            ring->ring_id,
>>> +            0,
>>> +            0,
>>> +            ring->size,
>>> +            0,
>>> +            0,
>>> +            0);
>>> +    if (ret)
>>> +        dev_err(ringacc->dev, "TISCI reset ring fail (%d) ring_idx
>>> %d\n",
>>> +            ret, ring->ring_id);
>>
>> Return value of sci ops is masked, why not return it and let the caller
>> handle it properly?
>>
>> Same comment for anything similar that follows.
> 
> Hrm, there is not much a caller can do other than PANIC in case the ring
> configuration fails.
> I can probagate the error, but not sure what action can be taken, if any.
> 
>>> +}
>>> +
>>> +void k3_ringacc_ring_reset(struct k3_ring *ring)
>>> +{
>>> +    if (!ring || !(ring->flags & K3_RING_FLAG_BUSY))
>>> +        return;
>>> +
>>> +    ring->occ = 0;
>>> +    ring->free = 0;
>>> +    ring->rindex = 0;
>>> +    ring->windex = 0;
>>> +
>>> +    k3_ringacc_ring_reset_sci(ring);
>>> +}
>>> +EXPORT_SYMBOL_GPL(k3_ringacc_ring_reset);
>>> +
>>> +static void k3_ringacc_ring_reconfig_qmode_sci(struct k3_ring *ring,
>>> +                           enum k3_ring_mode mode)
>>> +{
>>> +    struct k3_ringacc *ringacc = ring->parent;
>>> +    int ret;
>>> +
>>> +    ret = ringacc->tisci_ring_ops->config(
>>> +            ringacc->tisci,
>>> +            TI_SCI_MSG_VALUE_RM_RING_MODE_VALID,
>>> +            ringacc->tisci_dev_id,
>>> +            ring->ring_id,
>>> +            0,
>>> +            0,
>>> +            0,
>>> +            mode,
>>> +            0,
>>> +            0);
>>> +    if (ret)
>>> +        dev_err(ringacc->dev, "TISCI reconf qmode fail (%d) ring_idx
>>> %d\n",
>>> +            ret, ring->ring_id);
>>> +}
>>> +
>>> +void k3_ringacc_ring_reset_dma(struct k3_ring *ring, u32 occ)
>>> +{
>>> +    if (!ring || !(ring->flags & K3_RING_FLAG_BUSY))
>>> +        return;
>>> +
>>> +    if (!ring->parent->dma_ring_reset_quirk)
>>> +        return;
>>> +
>>> +    if (!occ)
>>> +        occ = dbg_readl(&ring->rt->occ);
>>> +
>>> +    if (occ) {
>>> +        u32 db_ring_cnt, db_ring_cnt_cur;
>>> +
>>> +        k3_nav_dbg(ring->parent->dev, "%s %u occ: %u\n", __func__,
>>> +               ring->ring_id, occ);
>>> +        /* 2. Reset the ring */
>>
>> 2? Where is 1?
> 
> Oh, I'll fix the numbering.

1. is 'Get ring occupancy count"
I think you can just drop numbering

> 
>>
>>> +        k3_ringacc_ring_reset_sci(ring);
>>> +
>>> +        /*
>>> +         * 3. Setup the ring in ring/doorbell mode
>>> +         * (if not already in this mode)
>>> +         */
>>> +        if (ring->mode != K3_RINGACC_RING_MODE_RING)
>>> +            k3_ringacc_ring_reconfig_qmode_sci(
>>> +                    ring, K3_RINGACC_RING_MODE_RING);
>>> +        /*
>>> +         * 4. Ring the doorbell 2**22 – ringOcc times.
>>> +         * This will wrap the internal UDMAP ring state occupancy
>>> +         * counter (which is 21-bits wide) to 0.
>>> +         */
>>> +        db_ring_cnt = (1U << 22) - occ;
>>> +
>>> +        while (db_ring_cnt != 0) {
>>> +            /*
>>> +             * Ring the doorbell with the maximum count each
>>> +             * iteration if possible to minimize the total
>>> +             * of writes
>>> +             */
>>> +            if (db_ring_cnt > K3_RINGACC_MAX_DB_RING_CNT)
>>> +                db_ring_cnt_cur = K3_RINGACC_MAX_DB_RING_CNT;
>>> +            else
>>> +                db_ring_cnt_cur = db_ring_cnt;
>>> +
>>> +            writel(db_ring_cnt_cur, &ring->rt->db);
>>> +            db_ring_cnt -= db_ring_cnt_cur;
>>> +        }
>>> +
>>> +        /* 5. Restore the original ring mode (if not ring mode) */
>>> +        if (ring->mode != K3_RINGACC_RING_MODE_RING)
>>> +            k3_ringacc_ring_reconfig_qmode_sci(ring, ring->mode);
>>> +    }
>>> +
>>> +    /* 2. Reset the ring */
>>

>>> +
>>> +u32 k3_ringacc_get_tisci_dev_id(struct k3_ring *ring)
>>> +{
>>> +    if (!ring)
>>> +        return -EINVAL;
>>> +
>>
>> What if parent is NULL? Can it ever be here?
> 
> No, parent can not be NULL as the client would not have the ring in the
> first place.
> 
>>
>>> +    return ring->parent->tisci_dev_id;
>>> +}
>>> +EXPORT_SYMBOL_GPL(k3_ringacc_get_tisci_dev_id);
>>> +
>>> +int k3_ringacc_get_ring_irq_num(struct k3_ring *ring)
>>> +{
>>> +    int irq_num;
>>> +
>>> +    if (!ring)
>>> +        return -EINVAL;
>>> +
>>> +    irq_num = ti_sci_inta_msi_get_virq(ring->parent->dev,
>>> ring->ring_id);
>>> +    if (irq_num <= 0)
>>> +        irq_num = -EINVAL;
>>> +    return irq_num;
>>> +}
>>> +EXPORT_SYMBOL_GPL(k3_ringacc_get_ring_irq_num);
>>> +
>>> +static int k3_ringacc_ring_cfg_sci(struct k3_ring *ring)
>>> +{
>>> +    struct k3_ringacc *ringacc = ring->parent;
>>> +    u32 ring_idx;
>>> +    int ret;
>>> +
>>> +    if (!ringacc->tisci)
>>> +        return -EINVAL;
>>> +
>>> +    ring_idx = ring->ring_id;
>>> +    ret = ringacc->tisci_ring_ops->config(
>>> +            ringacc->tisci,
>>> +            TI_SCI_MSG_VALUE_RM_ALL_NO_ORDER,
>>> +            ringacc->tisci_dev_id,
>>> +            ring_idx,
>>> +            lower_32_bits(ring->ring_mem_dma),
>>> +            upper_32_bits(ring->ring_mem_dma),
>>> +            ring->size,
>>> +            ring->mode,
>>> +            ring->elm_size,
>>> +            0);
>>> +    if (ret)
>>> +        dev_err(ringacc->dev, "TISCI config ring fail (%d) ring_idx
>>> %d\n",
>>> +            ret, ring_idx);
>>> +
>>> +    return ret;
>>> +}
>>> +
>>> +int k3_ringacc_ring_cfg(struct k3_ring *ring, struct k3_ring_cfg *cfg)
>>> +{
>>> +    struct k3_ringacc *ringacc = ring->parent;
>>> +    int ret = 0;
>>> +
>>> +    if (!ring || !cfg)
>>> +        return -EINVAL;
>>> +    if (cfg->elm_size > K3_RINGACC_RING_ELSIZE_256 ||
>>> +        cfg->mode > K3_RINGACC_RING_MODE_QM ||
>>> +        cfg->size & ~K3_RINGACC_CFG_RING_SIZE_ELCNT_MASK ||
>>> +        !test_bit(ring->ring_id, ringacc->rings_inuse))
>>> +        return -EINVAL;
>>> +
>>> +    if (ring->use_count != 1)
>>
>> Hmm, isn't this a failure actually?
> 
> Yes, it is: -EBUSY

No. This is for shared rings.
0 - should never happens once ring is requested.
1 - only one user - configure ring
>1 - shared ring which is configured already - just exit as ring configure already.

> 
>>> +        return 0;
>>> +
>>> +    ring->size = cfg->size;
>>> +    ring->elm_size = cfg->elm_size;
>>> +    ring->mode = cfg->mode;
>>> +    ring->occ = 0;
>>> +    ring->free = 0;
>>> +    ring->rindex = 0;
>>> +    ring->windex = 0;
>>> +

[...]

-- 
Best regards,
grygorii

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 06/14] dmaengine: ti: Add cppi5 header for UDMA
  2019-09-09 10:59     ` Peter Ujfalusi
@ 2019-09-10  7:06       ` Grygorii Strashko
  0 siblings, 0 replies; 32+ messages in thread
From: Grygorii Strashko @ 2019-09-10  7:06 UTC (permalink / raw)
  To: Peter Ujfalusi, Vinod Koul
  Cc: robh+dt, nm, ssantosh, dan.j.williams, dmaengine,
	linux-arm-kernel, devicetree, linux-kernel, lokeshvutla,
	t-kristo, tony, j-keerthy



On 09/09/2019 13:59, Peter Ujfalusi wrote:
> 
> 
> On 08/09/2019 17.25, Vinod Koul wrote:
>> On 30-07-19, 12:34, Peter Ujfalusi wrote:
>>
>>> +/**
>>> + * Descriptor header, present in all types of descriptors
>>> + */
>>> +struct cppi5_desc_hdr_t {
>>> +	u32 pkt_info0;	/* Packet info word 0 (n/a in Buffer desc) */
>>> +	u32 pkt_info1;	/* Packet info word 1 (n/a in Buffer desc) */
>>> +	u32 pkt_info2;	/* Packet info word 2 Buffer reclamation info */
>>> +	u32 src_dst_tag; /* Packet info word 3 (n/a in Buffer desc) */
>>
>> Can we move these comments to kernel-doc style please
> 
> Sure, I'll move all struct and enums.
> 
>>> +/**
>>> + * cppi5_desc_get_type - get descriptor type
>>> + * @desc_hdr: packet descriptor/TR header
>>> + *
>>> + * Returns descriptor type:
>>> + * CPPI5_INFO0_DESC_TYPE_VAL_HOST
>>> + * CPPI5_INFO0_DESC_TYPE_VAL_MONO
>>> + * CPPI5_INFO0_DESC_TYPE_VAL_TR
>>> + */
>>> +static inline u32 cppi5_desc_get_type(struct cppi5_desc_hdr_t *desc_hdr)
>>> +{
>>> +	WARN_ON(!desc_hdr);
>>
>> why WARN_ON and not return error!
> 
> these helpers were intended to be as simple as possible.
> I can go through with all of the WARN_ONs and replace them with if()
> pr_warn() and either just return or return with 0.
> 
> Would that be acceptable?
> 

This should never happens in working system unless there is buggy code.
I think It can be just removed

-- 
Best regards,
grygorii

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 10/14] dmaengine: ti: New driver for K3 UDMA - split#3: alloc/free chan_resources
  2019-07-30  9:34 ` [PATCH v2 10/14] dmaengine: ti: New driver for K3 UDMA - split#3: alloc/free chan_resources Peter Ujfalusi
@ 2019-09-10  7:25   ` Grygorii Strashko
  2019-09-10  7:53     ` Peter Ujfalusi
  0 siblings, 1 reply; 32+ messages in thread
From: Grygorii Strashko @ 2019-09-10  7:25 UTC (permalink / raw)
  To: Peter Ujfalusi, vkoul, robh+dt, nm, ssantosh
  Cc: dan.j.williams, dmaengine, linux-arm-kernel, devicetree,
	linux-kernel, lokeshvutla, t-kristo, tony, j-keerthy



On 30/07/2019 12:34, Peter Ujfalusi wrote:
> Split patch for review containing: channel rsource allocation and free
> functions.
> 
> DMA driver for
> Texas Instruments K3 NAVSS Unified DMA – Peripheral Root Complex (UDMA-P)
> 
> The UDMA-P is intended to perform similar (but significantly upgraded) functions
> as the packet-oriented DMA used on previous SoC devices. The UDMA-P module
> supports the transmission and reception of various packet types. The UDMA-P is
> architected to facilitate the segmentation and reassembly of SoC DMA data
> structure compliant packets to/from smaller data blocks that are natively
> compatible with the specific requirements of each connected peripheral. Multiple
> Tx and Rx channels are provided within the DMA which allow multiple segmentation
> or reassembly operations to be ongoing. The DMA controller maintains state
> information for each of the channels which allows packet segmentation and
> reassembly operations to be time division multiplexed between channels in order
> to share the underlying DMA hardware. An external DMA scheduler is used to
> control the ordering and rate at which this multiplexing occurs for Transmit
> operations. The ordering and rate of Receive operations is indirectly controlled
> by the order in which blocks are pushed into the DMA on the Rx PSI-L interface.
> 
> The UDMA-P also supports acting as both a UTC and UDMA-C for its internal
> channels. Channels in the UDMA-P can be configured to be either Packet-Based or
> Third-Party channels on a channel by channel basis.
> 
> The initial driver supports:
> - MEM_TO_MEM (TR mode)
> - DEV_TO_MEM (Packet / TR mode)
> - MEM_TO_DEV (Packet / TR mode)
> - Cyclic (Packet / TR mode)
> - Metadata for descriptors
> 
> Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
> ---
>   drivers/dma/ti/k3-udma.c | 780 +++++++++++++++++++++++++++++++++++++++
>   1 file changed, 780 insertions(+)
> 
> diff --git a/drivers/dma/ti/k3-udma.c b/drivers/dma/ti/k3-udma.c
> index 52ccc6d46de9..0de38db03b8d 100644
> --- a/drivers/dma/ti/k3-udma.c
> +++ b/drivers/dma/ti/k3-udma.c
> @@ -1039,6 +1039,786 @@ static irqreturn_t udma_udma_irq_handler(int irq, void *data)
>   	return IRQ_HANDLED;
>   }
>   
> +static struct udma_rflow *__udma_reserve_rflow(struct udma_dev *ud,
> +					       enum udma_tp_level tpl, int id)
> +{
> +	DECLARE_BITMAP(tmp, K3_UDMA_MAX_RFLOWS);
> +
> +	if (id >= 0) {
> +		if (test_bit(id, ud->rflow_map)) {
> +			dev_err(ud->dev, "rflow%d is in use\n", id);
> +			return ERR_PTR(-ENOENT);
> +		}
> +	} else {
> +		bitmap_or(tmp, ud->rflow_map, ud->rflow_map_reserved,
> +			  ud->rflow_cnt);
> +
> +		id = find_next_zero_bit(tmp, ud->rflow_cnt, ud->rchan_cnt);
> +		if (id >= ud->rflow_cnt)
> +			return ERR_PTR(-ENOENT);
> +	}
> +
> +	set_bit(id, ud->rflow_map);
> +	return &ud->rflows[id];
> +}
> +
> +#define UDMA_RESERVE_RESOURCE(res)					\
> +static struct udma_##res *__udma_reserve_##res(struct udma_dev *ud,	\
> +					       enum udma_tp_level tpl,	\
> +					       int id)			\
> +{									\
> +	if (id >= 0) {							\
> +		if (test_bit(id, ud->res##_map)) {			\
> +			dev_err(ud->dev, "res##%d is in use\n", id);	\
> +			return ERR_PTR(-ENOENT);			\
> +		}							\
> +	} else {							\
> +		int start;						\
> +									\
> +		if (tpl >= ud->match_data->tpl_levels)			\
> +			tpl = ud->match_data->tpl_levels - 1;		\
> +									\
> +		start = ud->match_data->level_start_idx[tpl];		\
> +									\
> +		id = find_next_zero_bit(ud->res##_map, ud->res##_cnt,	\
> +					start);				\
> +		if (id == ud->res##_cnt) {				\
> +			return ERR_PTR(-ENOENT);			\
> +		}							\
> +	}								\
> +									\
> +	set_bit(id, ud->res##_map);					\
> +	return &ud->res##s[id];						\
> +}
> +
> +UDMA_RESERVE_RESOURCE(tchan);
> +UDMA_RESERVE_RESOURCE(rchan);

Personally I'm not a fan of such a big macro, wouldn't be static functions better.
  

> +
> +static int udma_get_tchan(struct udma_chan *uc)
> +{
> +	struct udma_dev *ud = uc->ud;
> +
> +	if (uc->tchan) {
> +		dev_dbg(ud->dev, "chan%d: already have tchan%d allocated\n",
> +			uc->id, uc->tchan->id);
> +		return 0;
> +	}
> +
> +	uc->tchan = __udma_reserve_tchan(ud, uc->channel_tpl, -1);
> +	if (IS_ERR(uc->tchan))
> +		return PTR_ERR(uc->tchan);
> +
> +	return 0;
> +}
> +

[...]

> +
> +static int udma_tisci_channel_config(struct udma_chan *uc)
> +{
> +	struct udma_dev *ud = uc->ud;
> +	struct udma_tisci_rm *tisci_rm = &ud->tisci_rm;
> +	const struct ti_sci_rm_udmap_ops *tisci_ops = tisci_rm->tisci_udmap_ops;
> +	struct udma_tchan *tchan = uc->tchan;
> +	struct udma_rchan *rchan = uc->rchan;
> +	int ret = 0;
> +
> +	if (uc->dir == DMA_MEM_TO_MEM) {
> +		/* Non synchronized - mem to mem type of transfer */
> +		int tc_ring = k3_ringacc_get_ring_id(tchan->tc_ring);
> +		struct ti_sci_msg_rm_udmap_tx_ch_cfg req_tx = { 0 };
> +		struct ti_sci_msg_rm_udmap_rx_ch_cfg req_rx = { 0 };
> +
> +		req_tx.valid_params =
> +			TI_SCI_MSG_VALUE_RM_UDMAP_CH_PAUSE_ON_ERR_VALID |
> +			TI_SCI_MSG_VALUE_RM_UDMAP_CH_TX_FILT_EINFO_VALID |
> +			TI_SCI_MSG_VALUE_RM_UDMAP_CH_TX_FILT_PSWORDS_VALID |
> +			TI_SCI_MSG_VALUE_RM_UDMAP_CH_CHAN_TYPE_VALID |
> +			TI_SCI_MSG_VALUE_RM_UDMAP_CH_TX_SUPR_TDPKT_VALID |
> +			TI_SCI_MSG_VALUE_RM_UDMAP_CH_FETCH_SIZE_VALID |
> +			TI_SCI_MSG_VALUE_RM_UDMAP_CH_CQ_QNUM_VALID;
> +
> +		req_tx.nav_id = tisci_rm->tisci_dev_id;
> +		req_tx.index = tchan->id;
> +		req_tx.tx_pause_on_err = 0;
> +		req_tx.tx_filt_einfo = 0;
> +		req_tx.tx_filt_pswords = 0;
> +		req_tx.tx_chan_type = TI_SCI_RM_UDMAP_CHAN_TYPE_3RDP_BCOPY_PBRR;
> +		req_tx.tx_supr_tdpkt = 0;
> +		req_tx.tx_fetch_size = sizeof(struct cppi5_desc_hdr_t) >> 2;
> +		req_tx.txcq_qnum = tc_ring;
> +
> +		ret = tisci_ops->tx_ch_cfg(tisci_rm->tisci, &req_tx);
> +		if (ret) {
> +			dev_err(ud->dev, "tchan%d cfg failed %d\n",
> +				tchan->id, ret);
> +			return ret;
> +		}
> +
> +		req_rx.valid_params =
> +			TI_SCI_MSG_VALUE_RM_UDMAP_CH_PAUSE_ON_ERR_VALID |
> +			TI_SCI_MSG_VALUE_RM_UDMAP_CH_FETCH_SIZE_VALID |
> +			TI_SCI_MSG_VALUE_RM_UDMAP_CH_CQ_QNUM_VALID |
> +			TI_SCI_MSG_VALUE_RM_UDMAP_CH_CHAN_TYPE_VALID |
> +			TI_SCI_MSG_VALUE_RM_UDMAP_CH_RX_IGNORE_SHORT_VALID |
> +			TI_SCI_MSG_VALUE_RM_UDMAP_CH_RX_IGNORE_LONG_VALID;
> +
> +		req_rx.nav_id = tisci_rm->tisci_dev_id;
> +		req_rx.index = rchan->id;
> +		req_rx.rx_fetch_size = sizeof(struct cppi5_desc_hdr_t) >> 2;
> +		req_rx.rxcq_qnum = tc_ring;
> +		req_rx.rx_pause_on_err = 0;
> +		req_rx.rx_chan_type = TI_SCI_RM_UDMAP_CHAN_TYPE_3RDP_BCOPY_PBRR;
> +		req_rx.rx_ignore_short = 0;
> +		req_rx.rx_ignore_long = 0;Personally I'm not a fan of such a big macro.
> +
> +		ret = tisci_ops->rx_ch_cfg(tisci_rm->tisci, &req_rx);
> +		if (ret) {
> +			dev_err(ud->dev, "rchan%d alloc failed %d\n",
> +				rchan->id, ret);
> +			return ret;
> +		}
> +	} else {
> +		/* Slave transfer */
> +		u32 mode, fetch_size;
> +
> +		if (uc->pkt_mode) {
> +			mode = TI_SCI_RM_UDMAP_CHAN_TYPE_PKT_PBRR;
> +			fetch_size = cppi5_hdesc_calc_size(uc->needs_epib,
> +							   uc->psd_size, 0);
> +		} else {
> +			mode = TI_SCI_RM_UDMAP_CHAN_TYPE_3RDP_PBRR;
> +			fetch_size = sizeof(struct cppi5_desc_hdr_t);
> +		}
> +
> +		if (uc->dir == DMA_MEM_TO_DEV) {
> +			/* TX */
> +			int tc_ring = k3_ringacc_get_ring_id(tchan->tc_ring);
> +			struct ti_sci_msg_rm_udmap_tx_ch_cfg req_tx = { 0 };
> +
> +			req_tx.valid_params =
> +			TI_SCI_MSG_VALUE_RM_UDMAP_CH_PAUSE_ON_ERR_VALID |
> +			TI_SCI_MSG_VALUE_RM_UDMAP_CH_TX_FILT_EINFO_VALID |
> +			TI_SCI_MSG_VALUE_RM_UDMAP_CH_TX_FILT_PSWORDS_VALID |
> +			TI_SCI_MSG_VALUE_RM_UDMAP_CH_CHAN_TYPE_VALID |
> +			TI_SCI_MSG_VALUE_RM_UDMAP_CH_TX_SUPR_TDPKT_VALID |
> +			TI_SCI_MSG_VALUE_RM_UDMAP_CH_FETCH_SIZE_VALID |
> +			TI_SCI_MSG_VALUE_RM_UDMAP_CH_CQ_QNUM_VALID;
> +
> +			req_tx.nav_id = tisci_rm->tisci_dev_id;
> +			req_tx.index = tchan->id;
> +			req_tx.tx_pause_on_err = 0;
> +			req_tx.tx_filt_einfo = 0;
> +			req_tx.tx_filt_pswords = 0;
> +			req_tx.tx_chan_type = mode;
> +			req_tx.tx_supr_tdpkt = 0;
> +			req_tx.tx_fetch_size = fetch_size >> 2;
> +			req_tx.txcq_qnum = tc_ring;
> +
> +			ret = tisci_ops->tx_ch_cfg(tisci_rm->tisci, &req_tx);
> +			if (ret) {
> +				dev_err(ud->dev, "tchan%d cfg failed %d\n",
> +					tchan->id, ret);
> +				return ret;
> +			}
> +		} else {
> +			/* RX */
> +			int fd_ring = k3_ringacc_get_ring_id(rchan->fd_ring);
> +			int rx_ring = k3_ringacc_get_ring_id(rchan->r_ring);
> +			struct ti_sci_msg_rm_udmap_rx_ch_cfg req_rx = { 0 };
> +			struct ti_sci_msg_rm_udmap_flow_cfg flow_req = { 0 };
> +
> +			req_rx.valid_params =
> +			TI_SCI_MSG_VALUE_RM_UDMAP_CH_PAUSE_ON_ERR_VALID |
> +			TI_SCI_MSG_VALUE_RM_UDMAP_CH_FETCH_SIZE_VALID |
> +			TI_SCI_MSG_VALUE_RM_UDMAP_CH_CQ_QNUM_VALID |
> +			TI_SCI_MSG_VALUE_RM_UDMAP_CH_CHAN_TYPE_VALID |
> +			TI_SCI_MSG_VALUE_RM_UDMAP_CH_RX_IGNORE_SHORT_VALID |
> +			TI_SCI_MSG_VALUE_RM_UDMAP_CH_RX_IGNORE_LONG_VALID;
> +
> +			req_rx.nav_id = tisci_rm->tisci_dev_id;
> +			req_rx.index = rchan->id;
> +			req_rx.rx_fetch_size =  fetch_size >> 2;
> +			req_rx.rxcq_qnum = rx_ring;
> +			req_rx.rx_pause_on_err = 0;
> +			req_rx.rx_chan_type = mode;
> +			req_rx.rx_ignore_short = 0;
> +			req_rx.rx_ignore_long = 0;
> +
> +			ret = tisci_ops->rx_ch_cfg(tisci_rm->tisci, &req_rx);
> +			if (ret) {
> +				dev_err(ud->dev, "rchan%d cfg failed %d\n",
> +					rchan->id, ret);
> +				return ret;
> +			}
> +
> +			flow_req.valid_params =
> +			TI_SCI_MSG_VALUE_RM_UDMAP_FLOW_EINFO_PRESENT_VALID |
> +			TI_SCI_MSG_VALUE_RM_UDMAP_FLOW_PSINFO_PRESENT_VALID |
> +			TI_SCI_MSG_VALUE_RM_UDMAP_FLOW_ERROR_HANDLING_VALID |
> +			TI_SCI_MSG_VALUE_RM_UDMAP_FLOW_DESC_TYPE_VALID |
> +			TI_SCI_MSG_VALUE_RM_UDMAP_FLOW_DEST_QNUM_VALID |
> +			TI_SCI_MSG_VALUE_RM_UDMAP_FLOW_SRC_TAG_HI_SEL_VALID |
> +			TI_SCI_MSG_VALUE_RM_UDMAP_FLOW_SRC_TAG_LO_SEL_VALID |
> +			TI_SCI_MSG_VALUE_RM_UDMAP_FLOW_DEST_TAG_HI_SEL_VALID |
> +			TI_SCI_MSG_VALUE_RM_UDMAP_FLOW_DEST_TAG_LO_SEL_VALID |
> +			TI_SCI_MSG_VALUE_RM_UDMAP_FLOW_FDQ0_SZ0_QNUM_VALID |
> +			TI_SCI_MSG_VALUE_RM_UDMAP_FLOW_FDQ1_QNUM_VALID |
> +			TI_SCI_MSG_VALUE_RM_UDMAP_FLOW_FDQ2_QNUM_VALID |
> +			TI_SCI_MSG_VALUE_RM_UDMAP_FLOW_FDQ3_QNUM_VALID;
> +
> +			flow_req.nav_id = tisci_rm->tisci_dev_id;
> +			flow_req.flow_index = rchan->id;
> +
> +			if (uc->needs_epib)
> +				flow_req.rx_einfo_present = 1;
> +			else
> +				flow_req.rx_einfo_present = 0;
> +			if (uc->psd_size)
> +				flow_req.rx_psinfo_present = 1;
> +			else
> +				flow_req.rx_psinfo_present = 0;
> +			flow_req.rx_error_handling = 1;
> +			flow_req.rx_desc_type = 0;
> +			flow_req.rx_dest_qnum = rx_ring;
> +			flow_req.rx_src_tag_hi_sel = 2;
> +			flow_req.rx_src_tag_lo_sel = 4;
> +			flow_req.rx_dest_tag_hi_sel = 5;
> +			flow_req.rx_dest_tag_lo_sel = 4;
> +			flow_req.rx_fdq0_sz0_qnum = fd_ring;
> +			flow_req.rx_fdq1_qnum = fd_ring;
> +			flow_req.rx_fdq2_qnum = fd_ring;
> +			flow_req.rx_fdq3_qnum = fd_ring;
> +
> +			ret = tisci_ops->rx_flow_cfg(tisci_rm->tisci,
> +						     &flow_req);
> +
> +			if (ret) {
> +				dev_err(ud->dev, "flow%d config failed: %d\n",
> +					rchan->id, ret);
> +				return ret;
> +			}
> +		}
> +	}
> +
> +	return 0;
> +}

Could you split above big function pls?

> +
> +static int udma_alloc_chan_resources(struct dma_chan *chan)
> +{
> +	struct udma_chan *uc = to_udma_chan(chan);
> +	struct udma_dev *ud = to_udma_dev(chan->device);
> +	const struct udma_match_data *match_data = ud->match_data;
> +	struct k3_ring *irq_ring;
> +	u32 irq_udma_idx;
> +	int ret;
> +
> +	if (uc->pkt_mode || uc->dir == DMA_MEM_TO_MEM) {
> +		uc->use_dma_pool = true;
> +		/* in case of MEM_TO_MEM we have maximum of two TRs */
> +		if (uc->dir == DMA_MEM_TO_MEM) {
> +			uc->hdesc_size = cppi5_trdesc_calc_size(
> +					sizeof(struct cppi5_tr_type15_t), 2);
> +			uc->pkt_mode = false;
> +		}
> +	}
> +
> +	if (uc->use_dma_pool) {
> +		uc->hdesc_pool = dma_pool_create(uc->name, ud->ddev.dev,
> +						 uc->hdesc_size, ud->desc_align,
> +						 0);
> +		if (!uc->hdesc_pool) {
> +			dev_err(ud->ddev.dev,
> +				"Descriptor pool allocation failed\n");
> +			uc->use_dma_pool = false;
> +			return -ENOMEM;
> +		}
> +	}
> +
> +	pm_runtime_get_sync(ud->ddev.dev);
> +
> +	/*
> +	 * Make sure that the completion is in a known state:
> +	 * No teardown, the channel is idle
> +	 */
> +	reinit_completion(&uc->teardown_completed);
> +	complete_all(&uc->teardown_completed);
> +	uc->state = UDMA_CHAN_IS_IDLE;
> +
> +	switch (uc->dir) {
> +	case DMA_MEM_TO_MEM:
> +		/* Non synchronized - mem to mem type of transfer */
> +		dev_dbg(uc->ud->dev, "%s: chan%d as MEM-to-MEM\n", __func__,
> +			uc->id);
> +
> +		ret = udma_get_chan_pair(uc);
> +		if (ret)
> +			return ret;
> +
> +		ret = udma_alloc_tx_resources(uc);
> +		if (ret)
> +			return ret;
> +
> +		ret = udma_alloc_rx_resources(uc);
> +		if (ret) {
> +			udma_free_tx_resources(uc);
> +			return ret;
> +		}
> +
> +		uc->src_thread = ud->psil_base + uc->tchan->id;
> +		uc->dst_thread = (ud->psil_base + uc->rchan->id) |
> +				 UDMA_PSIL_DST_THREAD_ID_OFFSET;
> +
> +		irq_ring = uc->tchan->tc_ring;
> +		irq_udma_idx = uc->tchan->id;
> +		break;
> +	case DMA_MEM_TO_DEV:
> +		/* Slave transfer synchronized - mem to dev (TX) trasnfer */
> +		dev_dbg(uc->ud->dev, "%s: chan%d as MEM-to-DEV\n", __func__,
> +			uc->id);
> +
> +		ret = udma_alloc_tx_resources(uc);
> +		if (ret) {
> +			uc->remote_thread_id = -1;
> +			return ret;
> +		}
> +
> +		uc->src_thread = ud->psil_base + uc->tchan->id;
> +		uc->dst_thread = uc->remote_thread_id;
> +		uc->dst_thread |= UDMA_PSIL_DST_THREAD_ID_OFFSET;
> +
> +		irq_ring = uc->tchan->tc_ring;
> +		irq_udma_idx = uc->tchan->id;
> +		break;
> +	case DMA_DEV_TO_MEM:
> +		/* Slave transfer synchronized - dev to mem (RX) trasnfer */
> +		dev_dbg(uc->ud->dev, "%s: chan%d as DEV-to-MEM\n", __func__,
> +			uc->id);
> +
> +		ret = udma_alloc_rx_resources(uc);
> +		if (ret) {
> +			uc->remote_thread_id = -1;
> +			return ret;
> +		}
> +
> +		uc->src_thread = uc->remote_thread_id;
> +		uc->dst_thread = (ud->psil_base + uc->rchan->id) |
> +				 UDMA_PSIL_DST_THREAD_ID_OFFSET;
> +
> +		irq_ring = uc->rchan->r_ring;
> +		irq_udma_idx = match_data->rchan_oes_offset + uc->rchan->id;
> +		break;
> +	default:
> +		/* Can not happen */
> +		dev_err(uc->ud->dev, "%s: chan%d invalid direction (%u)\n",
> +			__func__, uc->id, uc->dir);
> +		return -EINVAL;
> +	}
> +
> +	/* Configure channel(s), rflow via tisci */
> +	ret = udma_tisci_channel_config(uc);
> +	if (ret)
> +		goto err_res_free;
> +
> +	if (udma_is_chan_running(uc)) {
> +		dev_warn(ud->dev, "chan%d: is running!\n", uc->id);
> +		udma_stop(uc);
> +		if (udma_is_chan_running(uc)) {
> +			dev_err(ud->dev, "chan%d: won't stop!\n", uc->id);
> +			goto err_res_free;
> +		}
> +	}
> +
> +	/* PSI-L pairing */
> +	ret = navss_psil_pair(ud, uc->src_thread, uc->dst_thread);
> +	if (ret) {
> +		dev_err(ud->dev, "PSI-L pairing failed: 0x%04x -> 0x%04x\n",
> +			uc->src_thread, uc->dst_thread);
> +		goto err_res_free;
> +	}
> +
> +	uc->psil_paired = true;
> +
> +	uc->irq_num_ring = k3_ringacc_get_ring_irq_num(irq_ring);
> +	if (uc->irq_num_ring <= 0) {
> +		dev_err(ud->dev, "Failed to get ring irq (index: %u)\n",
> +			k3_ringacc_get_ring_id(irq_ring));
> +		ret = -EINVAL;
> +		goto err_psi_free;
> +	}
> +
> +	ret = request_irq(uc->irq_num_ring, udma_ring_irq_handler,
> +			  IRQF_TRIGGER_HIGH, uc->name, uc);
> +	if (ret) {
> +		dev_err(ud->dev, "chan%d: ring irq request failed\n", uc->id);
> +		goto err_irq_free;
> +	}
> +
> +	/* Event from UDMA (TR events) only needed for slave TR mode channels */
> +	if (is_slave_direction(uc->dir) && !uc->pkt_mode) {
> +		uc->irq_num_udma = ti_sci_inta_msi_get_virq(ud->dev,
> +							    irq_udma_idx);
> +		if (uc->irq_num_udma <= 0) {
> +			dev_err(ud->dev, "Failed to get udma irq (index: %u)\n",
> +				irq_udma_idx);
> +			free_irq(uc->irq_num_ring, uc);
> +			ret = -EINVAL;
> +			goto err_irq_free;
> +		}
> +
> +		ret = request_irq(uc->irq_num_udma, udma_udma_irq_handler, 0,
> +				  uc->name, uc);
> +		if (ret) {
> +			dev_err(ud->dev, "chan%d: UDMA irq request failed\n",
> +				uc->id);
> +			free_irq(uc->irq_num_ring, uc);
> +			goto err_irq_free;
> +		}
> +	} else {
> +		uc->irq_num_udma = 0;
> +	}
> +
> +	udma_reset_rings(uc);
> +
> +	return 0;
> +
> +err_irq_free:
> +	uc->irq_num_ring = 0;
> +	uc->irq_num_udma = 0;
> +err_psi_free:
> +	navss_psil_unpair(ud, uc->src_thread, uc->dst_thread);
> +	uc->psil_paired = false;
> +err_res_free:
> +	udma_free_tx_resources(uc);
> +	udma_free_rx_resources(uc);
> +
> +	uc->remote_thread_id = -1;
> +	uc->dir = DMA_MEM_TO_MEM;
> +	uc->pkt_mode = false;
> +	uc->static_tr_type = 0;
> +	uc->enable_acc32 = 0;
> +	uc->enable_burst = 0;
> +	uc->channel_tpl = 0;
> +	uc->psd_size = 0;
> +	uc->metadata_size = 0;
> +	uc->hdesc_size = 0;
> +
> +	if (uc->use_dma_pool) {
> +		dma_pool_destroy(uc->hdesc_pool);
> +		uc->use_dma_pool = false;
> +	}
> +
> +	return ret;
> +}
> +

[...]

-- 
Best regards,
grygorii

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 08/14] dmaengine: ti: New driver for K3 UDMA - split#1: defines, structs, io func
  2019-07-30  9:34 ` [PATCH v2 08/14] dmaengine: ti: New driver for K3 UDMA - split#1: defines, structs, io func Peter Ujfalusi
@ 2019-09-10  7:27   ` Grygorii Strashko
  0 siblings, 0 replies; 32+ messages in thread
From: Grygorii Strashko @ 2019-09-10  7:27 UTC (permalink / raw)
  To: Peter Ujfalusi, vkoul, robh+dt, nm, ssantosh
  Cc: dan.j.williams, dmaengine, linux-arm-kernel, devicetree,
	linux-kernel, lokeshvutla, t-kristo, tony, j-keerthy



On 30/07/2019 12:34, Peter Ujfalusi wrote:
> Split patch for review containing: defines, structs, io and low level
> functions and interrupt callbacks.
> 
> DMA driver for
> Texas Instruments K3 NAVSS Unified DMA – Peripheral Root Complex (UDMA-P)
> 
> The UDMA-P is intended to perform similar (but significantly upgraded) functions
> as the packet-oriented DMA used on previous SoC devices. The UDMA-P module
> supports the transmission and reception of various packet types. The UDMA-P is
> architected to facilitate the segmentation and reassembly of SoC DMA data
> structure compliant packets to/from smaller data blocks that are natively
> compatible with the specific requirements of each connected peripheral. Multiple
> Tx and Rx channels are provided within the DMA which allow multiple segmentation
> or reassembly operations to be ongoing. The DMA controller maintains state
> information for each of the channels which allows packet segmentation and
> reassembly operations to be time division multiplexed between channels in order
> to share the underlying DMA hardware. An external DMA scheduler is used to
> control the ordering and rate at which this multiplexing occurs for Transmit
> operations. The ordering and rate of Receive operations is indirectly controlled
> by the order in which blocks are pushed into the DMA on the Rx PSI-L interface.
> 
> The UDMA-P also supports acting as both a UTC and UDMA-C for its internal
> channels. Channels in the UDMA-P can be configured to be either Packet-Based or
> Third-Party channels on a channel by channel basis.
> 
> The initial driver supports:
> - MEM_TO_MEM (TR mode)
> - DEV_TO_MEM (Packet / TR mode)
> - MEM_TO_DEV (Packet / TR mode)
> - Cyclic (Packet / TR mode)
> - Metadata for descriptors
> 
> Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
> ---

[...]

> +
> +/* Generic register access functions */
> +static inline u32 udma_read(void __iomem *base, int reg)
> +{
> +	return __raw_readl(base + reg);
> +}
> +
> +static inline void udma_write(void __iomem *base, int reg, u32 val)
> +{
> +	__raw_writel(val, base + reg);
> +}
> +
> +static inline void udma_update_bits(void __iomem *base, int reg,
> +				    u32 mask, u32 val)
> +{
> +	u32 tmp, orig;
> +
> +	orig = __raw_readl(base + reg);
> +	tmp = orig & ~mask;
> +	tmp |= (val & mask);
> +
> +	if (tmp != orig)
> +		__raw_writel(tmp, base + reg);
> +}

Pls, do not use  _raw APIs in drivers.

[...]

-- 
Best regards,
grygorii

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 10/14] dmaengine: ti: New driver for K3 UDMA - split#3: alloc/free chan_resources
  2019-09-10  7:25   ` Grygorii Strashko
@ 2019-09-10  7:53     ` Peter Ujfalusi
  0 siblings, 0 replies; 32+ messages in thread
From: Peter Ujfalusi @ 2019-09-10  7:53 UTC (permalink / raw)
  To: Grygorii Strashko, vkoul, robh+dt, nm, ssantosh
  Cc: dan.j.williams, dmaengine, linux-arm-kernel, devicetree,
	linux-kernel, lokeshvutla, t-kristo, tony, j-keerthy



On 10/09/2019 10.25, Grygorii Strashko wrote:
> 
> 
> On 30/07/2019 12:34, Peter Ujfalusi wrote:
>> Split patch for review containing: channel rsource allocation and free
>> functions.
>>
>> DMA driver for
>> Texas Instruments K3 NAVSS Unified DMA – Peripheral Root Complex (UDMA-P)
>>
>> The UDMA-P is intended to perform similar (but significantly upgraded)
>> functions
>> as the packet-oriented DMA used on previous SoC devices. The UDMA-P
>> module
>> supports the transmission and reception of various packet types. The
>> UDMA-P is
>> architected to facilitate the segmentation and reassembly of SoC DMA data
>> structure compliant packets to/from smaller data blocks that are natively
>> compatible with the specific requirements of each connected
>> peripheral. Multiple
>> Tx and Rx channels are provided within the DMA which allow multiple
>> segmentation
>> or reassembly operations to be ongoing. The DMA controller maintains
>> state
>> information for each of the channels which allows packet segmentation and
>> reassembly operations to be time division multiplexed between channels
>> in order
>> to share the underlying DMA hardware. An external DMA scheduler is
>> used to
>> control the ordering and rate at which this multiplexing occurs for
>> Transmit
>> operations. The ordering and rate of Receive operations is indirectly
>> controlled
>> by the order in which blocks are pushed into the DMA on the Rx PSI-L
>> interface.
>>
>> The UDMA-P also supports acting as both a UTC and UDMA-C for its internal
>> channels. Channels in the UDMA-P can be configured to be either
>> Packet-Based or
>> Third-Party channels on a channel by channel basis.
>>
>> The initial driver supports:
>> - MEM_TO_MEM (TR mode)
>> - DEV_TO_MEM (Packet / TR mode)
>> - MEM_TO_DEV (Packet / TR mode)
>> - Cyclic (Packet / TR mode)
>> - Metadata for descriptors
>>
>> Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
>> ---
>>   drivers/dma/ti/k3-udma.c | 780 +++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 780 insertions(+)
>>
>> diff --git a/drivers/dma/ti/k3-udma.c b/drivers/dma/ti/k3-udma.c
>> index 52ccc6d46de9..0de38db03b8d 100644
>> --- a/drivers/dma/ti/k3-udma.c
>> +++ b/drivers/dma/ti/k3-udma.c
>> @@ -1039,6 +1039,786 @@ static irqreturn_t udma_udma_irq_handler(int
>> irq, void *data)
>>       return IRQ_HANDLED;
>>   }
>>   +static struct udma_rflow *__udma_reserve_rflow(struct udma_dev *ud,
>> +                           enum udma_tp_level tpl, int id)
>> +{
>> +    DECLARE_BITMAP(tmp, K3_UDMA_MAX_RFLOWS);
>> +
>> +    if (id >= 0) {
>> +        if (test_bit(id, ud->rflow_map)) {
>> +            dev_err(ud->dev, "rflow%d is in use\n", id);
>> +            return ERR_PTR(-ENOENT);
>> +        }
>> +    } else {
>> +        bitmap_or(tmp, ud->rflow_map, ud->rflow_map_reserved,
>> +              ud->rflow_cnt);
>> +
>> +        id = find_next_zero_bit(tmp, ud->rflow_cnt, ud->rchan_cnt);
>> +        if (id >= ud->rflow_cnt)
>> +            return ERR_PTR(-ENOENT);
>> +    }
>> +
>> +    set_bit(id, ud->rflow_map);
>> +    return &ud->rflows[id];
>> +}
>> +
>> +#define UDMA_RESERVE_RESOURCE(res)                    \
>> +static struct udma_##res *__udma_reserve_##res(struct udma_dev *ud,    \
>> +                           enum udma_tp_level tpl,    \
>> +                           int id)            \
>> +{                                    \
>> +    if (id >= 0) {                            \
>> +        if (test_bit(id, ud->res##_map)) {            \
>> +            dev_err(ud->dev, "res##%d is in use\n", id);    \
>> +            return ERR_PTR(-ENOENT);            \
>> +        }                            \
>> +    } else {                            \
>> +        int start;                        \
>> +                                    \
>> +        if (tpl >= ud->match_data->tpl_levels)            \
>> +            tpl = ud->match_data->tpl_levels - 1;        \
>> +                                    \
>> +        start = ud->match_data->level_start_idx[tpl];        \
>> +                                    \
>> +        id = find_next_zero_bit(ud->res##_map, ud->res##_cnt,    \
>> +                    start);                \
>> +        if (id == ud->res##_cnt) {                \
>> +            return ERR_PTR(-ENOENT);            \
>> +        }                            \
>> +    }                                \
>> +                                    \
>> +    set_bit(id, ud->res##_map);                    \
>> +    return &ud->res##s[id];                        \
>> +}
>> +
>> +UDMA_RESERVE_RESOURCE(tchan);
>> +UDMA_RESERVE_RESOURCE(rchan);
> 
> Personally I'm not a fan of such a big macro, wouldn't be static
> functions better.

The other option is to have two identical function with only difference
is s/tchan/rchan.

> 
>> +
>> +static int udma_get_tchan(struct udma_chan *uc)
>> +{
>> +    struct udma_dev *ud = uc->ud;
>> +
>> +    if (uc->tchan) {
>> +        dev_dbg(ud->dev, "chan%d: already have tchan%d allocated\n",
>> +            uc->id, uc->tchan->id);
>> +        return 0;
>> +    }
>> +
>> +    uc->tchan = __udma_reserve_tchan(ud, uc->channel_tpl, -1);
>> +    if (IS_ERR(uc->tchan))
>> +        return PTR_ERR(uc->tchan);
>> +
>> +    return 0;
>> +}
>> +
> 
> [...]
> 
>> +
>> +static int udma_tisci_channel_config(struct udma_chan *uc)
>> +{
>> +    struct udma_dev *ud = uc->ud;
>> +    struct udma_tisci_rm *tisci_rm = &ud->tisci_rm;
>> +    const struct ti_sci_rm_udmap_ops *tisci_ops =
>> tisci_rm->tisci_udmap_ops;
>> +    struct udma_tchan *tchan = uc->tchan;
>> +    struct udma_rchan *rchan = uc->rchan;
>> +    int ret = 0;
>> +
>> +    if (uc->dir == DMA_MEM_TO_MEM) {
>> +        /* Non synchronized - mem to mem type of transfer */
>> +        int tc_ring = k3_ringacc_get_ring_id(tchan->tc_ring);
>> +        struct ti_sci_msg_rm_udmap_tx_ch_cfg req_tx = { 0 };
>> +        struct ti_sci_msg_rm_udmap_rx_ch_cfg req_rx = { 0 };
>> +
>> +        req_tx.valid_params =
>> +            TI_SCI_MSG_VALUE_RM_UDMAP_CH_PAUSE_ON_ERR_VALID |
>> +            TI_SCI_MSG_VALUE_RM_UDMAP_CH_TX_FILT_EINFO_VALID |
>> +            TI_SCI_MSG_VALUE_RM_UDMAP_CH_TX_FILT_PSWORDS_VALID |
>> +            TI_SCI_MSG_VALUE_RM_UDMAP_CH_CHAN_TYPE_VALID |
>> +            TI_SCI_MSG_VALUE_RM_UDMAP_CH_TX_SUPR_TDPKT_VALID |
>> +            TI_SCI_MSG_VALUE_RM_UDMAP_CH_FETCH_SIZE_VALID |
>> +            TI_SCI_MSG_VALUE_RM_UDMAP_CH_CQ_QNUM_VALID;
>> +
>> +        req_tx.nav_id = tisci_rm->tisci_dev_id;
>> +        req_tx.index = tchan->id;
>> +        req_tx.tx_pause_on_err = 0;
>> +        req_tx.tx_filt_einfo = 0;
>> +        req_tx.tx_filt_pswords = 0;
>> +        req_tx.tx_chan_type = TI_SCI_RM_UDMAP_CHAN_TYPE_3RDP_BCOPY_PBRR;
>> +        req_tx.tx_supr_tdpkt = 0;
>> +        req_tx.tx_fetch_size = sizeof(struct cppi5_desc_hdr_t) >> 2;
>> +        req_tx.txcq_qnum = tc_ring;
>> +
>> +        ret = tisci_ops->tx_ch_cfg(tisci_rm->tisci, &req_tx);
>> +        if (ret) {
>> +            dev_err(ud->dev, "tchan%d cfg failed %d\n",
>> +                tchan->id, ret);
>> +            return ret;
>> +        }
>> +
>> +        req_rx.valid_params =
>> +            TI_SCI_MSG_VALUE_RM_UDMAP_CH_PAUSE_ON_ERR_VALID |
>> +            TI_SCI_MSG_VALUE_RM_UDMAP_CH_FETCH_SIZE_VALID |
>> +            TI_SCI_MSG_VALUE_RM_UDMAP_CH_CQ_QNUM_VALID |
>> +            TI_SCI_MSG_VALUE_RM_UDMAP_CH_CHAN_TYPE_VALID |
>> +            TI_SCI_MSG_VALUE_RM_UDMAP_CH_RX_IGNORE_SHORT_VALID |
>> +            TI_SCI_MSG_VALUE_RM_UDMAP_CH_RX_IGNORE_LONG_VALID;
>> +
>> +        req_rx.nav_id = tisci_rm->tisci_dev_id;
>> +        req_rx.index = rchan->id;
>> +        req_rx.rx_fetch_size = sizeof(struct cppi5_desc_hdr_t) >> 2;
>> +        req_rx.rxcq_qnum = tc_ring;
>> +        req_rx.rx_pause_on_err = 0;
>> +        req_rx.rx_chan_type = TI_SCI_RM_UDMAP_CHAN_TYPE_3RDP_BCOPY_PBRR;
>> +        req_rx.rx_ignore_short = 0;
>> +        req_rx.rx_ignore_long = 0;
>> +
>> +        ret = tisci_ops->rx_ch_cfg(tisci_rm->tisci, &req_rx);
>> +        if (ret) {
>> +            dev_err(ud->dev, "rchan%d alloc failed %d\n",
>> +                rchan->id, ret);
>> +            return ret;
>> +        }
>> +    } else {
>> +        /* Slave transfer */
>> +        u32 mode, fetch_size;
>> +
>> +        if (uc->pkt_mode) {
>> +            mode = TI_SCI_RM_UDMAP_CHAN_TYPE_PKT_PBRR;
>> +            fetch_size = cppi5_hdesc_calc_size(uc->needs_epib,
>> +                               uc->psd_size, 0);
>> +        } else {
>> +            mode = TI_SCI_RM_UDMAP_CHAN_TYPE_3RDP_PBRR;
>> +            fetch_size = sizeof(struct cppi5_desc_hdr_t);
>> +        }
>> +
>> +        if (uc->dir == DMA_MEM_TO_DEV) {
>> +            /* TX */
>> +            int tc_ring = k3_ringacc_get_ring_id(tchan->tc_ring);
>> +            struct ti_sci_msg_rm_udmap_tx_ch_cfg req_tx = { 0 };
>> +
>> +            req_tx.valid_params =
>> +            TI_SCI_MSG_VALUE_RM_UDMAP_CH_PAUSE_ON_ERR_VALID |
>> +            TI_SCI_MSG_VALUE_RM_UDMAP_CH_TX_FILT_EINFO_VALID |
>> +            TI_SCI_MSG_VALUE_RM_UDMAP_CH_TX_FILT_PSWORDS_VALID |
>> +            TI_SCI_MSG_VALUE_RM_UDMAP_CH_CHAN_TYPE_VALID |
>> +            TI_SCI_MSG_VALUE_RM_UDMAP_CH_TX_SUPR_TDPKT_VALID |
>> +            TI_SCI_MSG_VALUE_RM_UDMAP_CH_FETCH_SIZE_VALID |
>> +            TI_SCI_MSG_VALUE_RM_UDMAP_CH_CQ_QNUM_VALID;
>> +
>> +            req_tx.nav_id = tisci_rm->tisci_dev_id;
>> +            req_tx.index = tchan->id;
>> +            req_tx.tx_pause_on_err = 0;
>> +            req_tx.tx_filt_einfo = 0;
>> +            req_tx.tx_filt_pswords = 0;
>> +            req_tx.tx_chan_type = mode;
>> +            req_tx.tx_supr_tdpkt = 0;
>> +            req_tx.tx_fetch_size = fetch_size >> 2;
>> +            req_tx.txcq_qnum = tc_ring;
>> +
>> +            ret = tisci_ops->tx_ch_cfg(tisci_rm->tisci, &req_tx);
>> +            if (ret) {
>> +                dev_err(ud->dev, "tchan%d cfg failed %d\n",
>> +                    tchan->id, ret);
>> +                return ret;
>> +            }
>> +        } else {
>> +            /* RX */
>> +            int fd_ring = k3_ringacc_get_ring_id(rchan->fd_ring);
>> +            int rx_ring = k3_ringacc_get_ring_id(rchan->r_ring);
>> +            struct ti_sci_msg_rm_udmap_rx_ch_cfg req_rx = { 0 };
>> +            struct ti_sci_msg_rm_udmap_flow_cfg flow_req = { 0 };
>> +
>> +            req_rx.valid_params =
>> +            TI_SCI_MSG_VALUE_RM_UDMAP_CH_PAUSE_ON_ERR_VALID |
>> +            TI_SCI_MSG_VALUE_RM_UDMAP_CH_FETCH_SIZE_VALID |
>> +            TI_SCI_MSG_VALUE_RM_UDMAP_CH_CQ_QNUM_VALID |
>> +            TI_SCI_MSG_VALUE_RM_UDMAP_CH_CHAN_TYPE_VALID |
>> +            TI_SCI_MSG_VALUE_RM_UDMAP_CH_RX_IGNORE_SHORT_VALID |
>> +            TI_SCI_MSG_VALUE_RM_UDMAP_CH_RX_IGNORE_LONG_VALID;
>> +
>> +            req_rx.nav_id = tisci_rm->tisci_dev_id;
>> +            req_rx.index = rchan->id;
>> +            req_rx.rx_fetch_size =  fetch_size >> 2;
>> +            req_rx.rxcq_qnum = rx_ring;
>> +            req_rx.rx_pause_on_err = 0;
>> +            req_rx.rx_chan_type = mode;
>> +            req_rx.rx_ignore_short = 0;
>> +            req_rx.rx_ignore_long = 0;
>> +
>> +            ret = tisci_ops->rx_ch_cfg(tisci_rm->tisci, &req_rx);
>> +            if (ret) {
>> +                dev_err(ud->dev, "rchan%d cfg failed %d\n",
>> +                    rchan->id, ret);
>> +                return ret;
>> +            }
>> +
>> +            flow_req.valid_params =
>> +            TI_SCI_MSG_VALUE_RM_UDMAP_FLOW_EINFO_PRESENT_VALID |
>> +            TI_SCI_MSG_VALUE_RM_UDMAP_FLOW_PSINFO_PRESENT_VALID |
>> +            TI_SCI_MSG_VALUE_RM_UDMAP_FLOW_ERROR_HANDLING_VALID |
>> +            TI_SCI_MSG_VALUE_RM_UDMAP_FLOW_DESC_TYPE_VALID |
>> +            TI_SCI_MSG_VALUE_RM_UDMAP_FLOW_DEST_QNUM_VALID |
>> +            TI_SCI_MSG_VALUE_RM_UDMAP_FLOW_SRC_TAG_HI_SEL_VALID |
>> +            TI_SCI_MSG_VALUE_RM_UDMAP_FLOW_SRC_TAG_LO_SEL_VALID |
>> +            TI_SCI_MSG_VALUE_RM_UDMAP_FLOW_DEST_TAG_HI_SEL_VALID |
>> +            TI_SCI_MSG_VALUE_RM_UDMAP_FLOW_DEST_TAG_LO_SEL_VALID |
>> +            TI_SCI_MSG_VALUE_RM_UDMAP_FLOW_FDQ0_SZ0_QNUM_VALID |
>> +            TI_SCI_MSG_VALUE_RM_UDMAP_FLOW_FDQ1_QNUM_VALID |
>> +            TI_SCI_MSG_VALUE_RM_UDMAP_FLOW_FDQ2_QNUM_VALID |
>> +            TI_SCI_MSG_VALUE_RM_UDMAP_FLOW_FDQ3_QNUM_VALID;
>> +
>> +            flow_req.nav_id = tisci_rm->tisci_dev_id;
>> +            flow_req.flow_index = rchan->id;
>> +
>> +            if (uc->needs_epib)
>> +                flow_req.rx_einfo_present = 1;
>> +            else
>> +                flow_req.rx_einfo_present = 0;
>> +            if (uc->psd_size)
>> +                flow_req.rx_psinfo_present = 1;
>> +            else
>> +                flow_req.rx_psinfo_present = 0;
>> +            flow_req.rx_error_handling = 1;
>> +            flow_req.rx_desc_type = 0;
>> +            flow_req.rx_dest_qnum = rx_ring;
>> +            flow_req.rx_src_tag_hi_sel = 2;
>> +            flow_req.rx_src_tag_lo_sel = 4;
>> +            flow_req.rx_dest_tag_hi_sel = 5;
>> +            flow_req.rx_dest_tag_lo_sel = 4;
>> +            flow_req.rx_fdq0_sz0_qnum = fd_ring;
>> +            flow_req.rx_fdq1_qnum = fd_ring;
>> +            flow_req.rx_fdq2_qnum = fd_ring;
>> +            flow_req.rx_fdq3_qnum = fd_ring;
>> +
>> +            ret = tisci_ops->rx_flow_cfg(tisci_rm->tisci,
>> +                             &flow_req);
>> +
>> +            if (ret) {
>> +                dev_err(ud->dev, "flow%d config failed: %d\n",
>> +                    rchan->id, ret);
>> +                return ret;
>> +            }
>> +        }
>> +    }
>> +
>> +    return 0;
>> +}
> 
> Could you split above big function pls?

I can slit to:
udma_tisci_m2m_channel_config()
udma_tisci_tx_channel_config()
udma_tisci_rx_channel_config()

and call them from the first switch case in udma_alloc_chan_resources()

> 
>> +
>> +static int udma_alloc_chan_resources(struct dma_chan *chan)
>> +{
>> +    struct udma_chan *uc = to_udma_chan(chan);
>> +    struct udma_dev *ud = to_udma_dev(chan->device);
>> +    const struct udma_match_data *match_data = ud->match_data;
>> +    struct k3_ring *irq_ring;
>> +    u32 irq_udma_idx;
>> +    int ret;
>> +
>> +    if (uc->pkt_mode || uc->dir == DMA_MEM_TO_MEM) {
>> +        uc->use_dma_pool = true;
>> +        /* in case of MEM_TO_MEM we have maximum of two TRs */
>> +        if (uc->dir == DMA_MEM_TO_MEM) {
>> +            uc->hdesc_size = cppi5_trdesc_calc_size(
>> +                    sizeof(struct cppi5_tr_type15_t), 2);
>> +            uc->pkt_mode = false;
>> +        }
>> +    }
>> +
>> +    if (uc->use_dma_pool) {
>> +        uc->hdesc_pool = dma_pool_create(uc->name, ud->ddev.dev,
>> +                         uc->hdesc_size, ud->desc_align,
>> +                         0);
>> +        if (!uc->hdesc_pool) {
>> +            dev_err(ud->ddev.dev,
>> +                "Descriptor pool allocation failed\n");
>> +            uc->use_dma_pool = false;
>> +            return -ENOMEM;
>> +        }
>> +    }
>> +
>> +    pm_runtime_get_sync(ud->ddev.dev);
>> +
>> +    /*
>> +     * Make sure that the completion is in a known state:
>> +     * No teardown, the channel is idle
>> +     */
>> +    reinit_completion(&uc->teardown_completed);
>> +    complete_all(&uc->teardown_completed);
>> +    uc->state = UDMA_CHAN_IS_IDLE;
>> +
>> +    switch (uc->dir) {
>> +    case DMA_MEM_TO_MEM:
>> +        /* Non synchronized - mem to mem type of transfer */
>> +        dev_dbg(uc->ud->dev, "%s: chan%d as MEM-to-MEM\n", __func__,
>> +            uc->id);
>> +
>> +        ret = udma_get_chan_pair(uc);
>> +        if (ret)
>> +            return ret;
>> +
>> +        ret = udma_alloc_tx_resources(uc);
>> +        if (ret)
>> +            return ret;
>> +
>> +        ret = udma_alloc_rx_resources(uc);
>> +        if (ret) {
>> +            udma_free_tx_resources(uc);
>> +            return ret;
>> +        }
>> +
>> +        uc->src_thread = ud->psil_base + uc->tchan->id;
>> +        uc->dst_thread = (ud->psil_base + uc->rchan->id) |
>> +                 UDMA_PSIL_DST_THREAD_ID_OFFSET;
>> +
>> +        irq_ring = uc->tchan->tc_ring;
>> +        irq_udma_idx = uc->tchan->id;
>> +        break;
>> +    case DMA_MEM_TO_DEV:
>> +        /* Slave transfer synchronized - mem to dev (TX) trasnfer */
>> +        dev_dbg(uc->ud->dev, "%s: chan%d as MEM-to-DEV\n", __func__,
>> +            uc->id);
>> +
>> +        ret = udma_alloc_tx_resources(uc);
>> +        if (ret) {
>> +            uc->remote_thread_id = -1;
>> +            return ret;
>> +        }
>> +
>> +        uc->src_thread = ud->psil_base + uc->tchan->id;
>> +        uc->dst_thread = uc->remote_thread_id;
>> +        uc->dst_thread |= UDMA_PSIL_DST_THREAD_ID_OFFSET;
>> +
>> +        irq_ring = uc->tchan->tc_ring;
>> +        irq_udma_idx = uc->tchan->id;
>> +        break;
>> +    case DMA_DEV_TO_MEM:
>> +        /* Slave transfer synchronized - dev to mem (RX) trasnfer */
>> +        dev_dbg(uc->ud->dev, "%s: chan%d as DEV-to-MEM\n", __func__,
>> +            uc->id);
>> +
>> +        ret = udma_alloc_rx_resources(uc);
>> +        if (ret) {
>> +            uc->remote_thread_id = -1;
>> +            return ret;
>> +        }
>> +
>> +        uc->src_thread = uc->remote_thread_id;
>> +        uc->dst_thread = (ud->psil_base + uc->rchan->id) |
>> +                 UDMA_PSIL_DST_THREAD_ID_OFFSET;
>> +
>> +        irq_ring = uc->rchan->r_ring;
>> +        irq_udma_idx = match_data->rchan_oes_offset + uc->rchan->id;
>> +        break;
>> +    default:
>> +        /* Can not happen */
>> +        dev_err(uc->ud->dev, "%s: chan%d invalid direction (%u)\n",
>> +            __func__, uc->id, uc->dir);
>> +        return -EINVAL;
>> +    }
>> +
>> +    /* Configure channel(s), rflow via tisci */
>> +    ret = udma_tisci_channel_config(uc);
>> +    if (ret)
>> +        goto err_res_free;
>> +
>> +    if (udma_is_chan_running(uc)) {
>> +        dev_warn(ud->dev, "chan%d: is running!\n", uc->id);
>> +        udma_stop(uc);
>> +        if (udma_is_chan_running(uc)) {
>> +            dev_err(ud->dev, "chan%d: won't stop!\n", uc->id);
>> +            goto err_res_free;
>> +        }
>> +    }
>> +
>> +    /* PSI-L pairing */
>> +    ret = navss_psil_pair(ud, uc->src_thread, uc->dst_thread);
>> +    if (ret) {
>> +        dev_err(ud->dev, "PSI-L pairing failed: 0x%04x -> 0x%04x\n",
>> +            uc->src_thread, uc->dst_thread);
>> +        goto err_res_free;
>> +    }
>> +
>> +    uc->psil_paired = true;
>> +
>> +    uc->irq_num_ring = k3_ringacc_get_ring_irq_num(irq_ring);
>> +    if (uc->irq_num_ring <= 0) {
>> +        dev_err(ud->dev, "Failed to get ring irq (index: %u)\n",
>> +            k3_ringacc_get_ring_id(irq_ring));
>> +        ret = -EINVAL;
>> +        goto err_psi_free;
>> +    }
>> +
>> +    ret = request_irq(uc->irq_num_ring, udma_ring_irq_handler,
>> +              IRQF_TRIGGER_HIGH, uc->name, uc);
>> +    if (ret) {
>> +        dev_err(ud->dev, "chan%d: ring irq request failed\n", uc->id);
>> +        goto err_irq_free;
>> +    }
>> +
>> +    /* Event from UDMA (TR events) only needed for slave TR mode
>> channels */
>> +    if (is_slave_direction(uc->dir) && !uc->pkt_mode) {
>> +        uc->irq_num_udma = ti_sci_inta_msi_get_virq(ud->dev,
>> +                                irq_udma_idx);
>> +        if (uc->irq_num_udma <= 0) {
>> +            dev_err(ud->dev, "Failed to get udma irq (index: %u)\n",
>> +                irq_udma_idx);
>> +            free_irq(uc->irq_num_ring, uc);
>> +            ret = -EINVAL;
>> +            goto err_irq_free;
>> +        }
>> +
>> +        ret = request_irq(uc->irq_num_udma, udma_udma_irq_handler, 0,
>> +                  uc->name, uc);
>> +        if (ret) {
>> +            dev_err(ud->dev, "chan%d: UDMA irq request failed\n",
>> +                uc->id);
>> +            free_irq(uc->irq_num_ring, uc);
>> +            goto err_irq_free;
>> +        }
>> +    } else {
>> +        uc->irq_num_udma = 0;
>> +    }
>> +
>> +    udma_reset_rings(uc);
>> +
>> +    return 0;
>> +
>> +err_irq_free:
>> +    uc->irq_num_ring = 0;
>> +    uc->irq_num_udma = 0;
>> +err_psi_free:
>> +    navss_psil_unpair(ud, uc->src_thread, uc->dst_thread);
>> +    uc->psil_paired = false;
>> +err_res_free:
>> +    udma_free_tx_resources(uc);
>> +    udma_free_rx_resources(uc);
>> +
>> +    uc->remote_thread_id = -1;
>> +    uc->dir = DMA_MEM_TO_MEM;
>> +    uc->pkt_mode = false;
>> +    uc->static_tr_type = 0;
>> +    uc->enable_acc32 = 0;
>> +    uc->enable_burst = 0;
>> +    uc->channel_tpl = 0;
>> +    uc->psd_size = 0;
>> +    uc->metadata_size = 0;
>> +    uc->hdesc_size = 0;
>> +
>> +    if (uc->use_dma_pool) {
>> +        dma_pool_destroy(uc->hdesc_pool);
>> +        uc->use_dma_pool = false;
>> +    }
>> +
>> +    return ret;
>> +}
>> +
> 
> [...]
> 

- Péter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, back to index

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-30  9:34 [PATCH v2 00/14] dmaengine/soc: Add Texas Instruments UDMA support Peter Ujfalusi
2019-07-30  9:34 ` [PATCH v2 01/14] bindings: soc: ti: add documentation for k3 ringacc Peter Ujfalusi
2019-07-30  9:34 ` [PATCH v2 02/14] soc: ti: k3: add navss ringacc driver Peter Ujfalusi
2019-08-30 12:57   ` Peter Ujfalusi
2019-09-09  6:09   ` Tero Kristo
2019-09-09  7:25     ` Vignesh Raghavendra
2019-09-09 13:00     ` Peter Ujfalusi
2019-09-09 16:58       ` Grygorii Strashko
2019-07-30  9:34 ` [PATCH v2 03/14] dmaengine: doc: Add sections for per descriptor metadata support Peter Ujfalusi
2019-07-30  9:34 ` [PATCH v2 04/14] dmaengine: Add metadata_ops for dma_async_tx_descriptor Peter Ujfalusi
2019-09-08 14:12   ` Vinod Koul
2019-09-09  6:52     ` Peter Ujfalusi
2019-07-30  9:34 ` [PATCH v2 05/14] dmaengine: Add support for reporting DMA cached data amount Peter Ujfalusi
2019-07-30  9:34 ` [PATCH v2 06/14] dmaengine: ti: Add cppi5 header for UDMA Peter Ujfalusi
2019-09-08 14:25   ` Vinod Koul
2019-09-09 10:59     ` Peter Ujfalusi
2019-09-10  7:06       ` Grygorii Strashko
2019-07-30  9:34 ` [PATCH v2 07/14] dt-bindings: dma: ti: Add document for K3 UDMA Peter Ujfalusi
2019-08-21 17:59   ` Rob Herring
2019-08-22 11:18     ` Peter Ujfalusi
2019-07-30  9:34 ` [PATCH v2 08/14] dmaengine: ti: New driver for K3 UDMA - split#1: defines, structs, io func Peter Ujfalusi
2019-09-10  7:27   ` Grygorii Strashko
2019-07-30  9:34 ` [PATCH v2 09/14] dmaengine: ti: New driver for K3 UDMA - split#2: probe/remove, xlate and filter_fn Peter Ujfalusi
2019-07-30  9:34 ` [PATCH v2 10/14] dmaengine: ti: New driver for K3 UDMA - split#3: alloc/free chan_resources Peter Ujfalusi
2019-09-10  7:25   ` Grygorii Strashko
2019-09-10  7:53     ` Peter Ujfalusi
2019-07-30  9:34 ` [PATCH v2 11/14] dmaengine: ti: New driver for K3 UDMA - split#4: dma_device callbacks 1 Peter Ujfalusi
2019-07-30  9:34 ` [PATCH v2 12/14] dmaengine: ti: New driver for K3 UDMA - split#5: dma_device callbacks 2 Peter Ujfalusi
2019-07-30  9:34 ` [PATCH v2 13/14] dmaengine: ti: New driver for K3 UDMA - split#6: Kconfig and Makefile Peter Ujfalusi
2019-07-30  9:34 ` [PATCH v2 14/14] dmaengine: ti: k3-udma: Add glue layer for non DMAengine users Peter Ujfalusi
2019-07-31  7:08 ` [PATCH v2 00/14] dmaengine/soc: Add Texas Instruments UDMA support Peter Ujfalusi
2019-08-30 12:12 ` Peter Ujfalusi

dmaengine Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/dmaengine/0 dmaengine/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 dmaengine dmaengine/ https://lore.kernel.org/dmaengine \
		dmaengine@vger.kernel.org dmaengine@archiver.kernel.org
	public-inbox-index dmaengine


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.dmaengine


AGPL code for this site: git clone https://public-inbox.org/ public-inbox