linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/2] dmaengine: Add DW AXI DMAC driver
@ 2017-04-07 14:04 Eugeniy Paltsev
  2017-04-07 14:04 ` [PATCH v2 1/2] dt-bindings: Document the Synopsys DW AXI DMA bindings Eugeniy Paltsev
  2017-04-07 14:04 ` [PATCH v2 2/2] dmaengine: Add DW AXI DMAC driver Eugeniy Paltsev
  0 siblings, 2 replies; 14+ messages in thread
From: Eugeniy Paltsev @ 2017-04-07 14:04 UTC (permalink / raw)
  To: dmaengine
  Cc: linux-kernel, devicetree, linux-snps-arc, Dan Williams,
	Vinod Koul, Rob Herring, Andy Shevchenko, Alexey Brodkin,
	Eugeniy Paltsev

This patch series add support for the DW AXI DMAC controller.

DW AXI DMAC is a part of upcoming development board from Synopsys.

In this driver implementation only DMA_MEMCPY and DMA_SG transfers
are supported.

Changes for v2:
 * Use async version of runtime PM get/put callbacks.
 * Use atomic_t (and corresponding operations) for allocated descriptors
   counter.
 * Use GFP_NOWAIT flag for allocating dma descriptors in "dma_pool_zalloc"
   instead of GFP_ATOMIC flag.
 * Add kernel-doc style comments for the irq enum, cleanup.

Changes for v1:
 * Implement Runtime PM (the driver can operate with or without
   Runtime PM support)
 * Move submitting new txn to interrupt handler from tasklet
 * Free IRQ manually in driver remove function
 * Add 64 bit support
 * Rid of subsys_initcall
 * Use dev_vdbg instead dev_dbg in some places
 * Rid of C99 style comments
 * Add IP version to DT compatible string

Note:
 * I left "is_paused" variable untouched. I checked the drivers which
   have 'enum dma_status' field in their channel data structures -
   there is no much sense to add enum to this driver channel data
   structure - it will be used only for determinating is channel
   paused or not.
 * I left preparation of SG list according to max data width and
   max block size in dma_chan_prep_dma_sg function.
 * I left axi_chan_is_hw_enable assert untouched because it is
   HW per-channel assert. It can't be managed by runtime PM.

Changes for v0:
 * Switch to virt-dma API (according to previous RFC)
 * Small fixies according to previous RFC
 * Add DT bindings

Eugeniy Paltsev (2):
  dt-bindings: Document the Synopsys DW AXI DMA bindings
  dmaengine: Add DW AXI DMAC driver

 .../devicetree/bindings/dma/snps,axi-dw-dmac.txt   |   34 +
 drivers/dma/Kconfig                                |   10 +
 drivers/dma/Makefile                               |    1 +
 drivers/dma/axi_dma_platform.c                     | 1044 ++++++++++++++++++++
 drivers/dma/axi_dma_platform.h                     |  119 +++
 drivers/dma/axi_dma_platform_reg.h                 |  220 +++++
 6 files changed, 1428 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/dma/snps,axi-dw-dmac.txt
 create mode 100644 drivers/dma/axi_dma_platform.c
 create mode 100644 drivers/dma/axi_dma_platform.h
 create mode 100644 drivers/dma/axi_dma_platform_reg.h

-- 
2.5.5

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 1/2] dt-bindings: Document the Synopsys DW AXI DMA bindings
  2017-04-07 14:04 [PATCH v2 0/2] dmaengine: Add DW AXI DMAC driver Eugeniy Paltsev
@ 2017-04-07 14:04 ` Eugeniy Paltsev
  2017-04-07 14:04 ` [PATCH v2 2/2] dmaengine: Add DW AXI DMAC driver Eugeniy Paltsev
  1 sibling, 0 replies; 14+ messages in thread
From: Eugeniy Paltsev @ 2017-04-07 14:04 UTC (permalink / raw)
  To: dmaengine
  Cc: linux-kernel, devicetree, linux-snps-arc, Dan Williams,
	Vinod Koul, Rob Herring, Andy Shevchenko, Alexey Brodkin,
	Eugeniy Paltsev

This patch adds documentation of device tree bindings for the Synopsys
DesignWare AXI DMA controller.

Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Acked-by: Rob Herring <robh@kernel.org>
---
 .../devicetree/bindings/dma/snps,axi-dw-dmac.txt   | 34 ++++++++++++++++++++++
 1 file changed, 34 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/dma/snps,axi-dw-dmac.txt

diff --git a/Documentation/devicetree/bindings/dma/snps,axi-dw-dmac.txt b/Documentation/devicetree/bindings/dma/snps,axi-dw-dmac.txt
new file mode 100644
index 0000000..21666fe
--- /dev/null
+++ b/Documentation/devicetree/bindings/dma/snps,axi-dw-dmac.txt
@@ -0,0 +1,34 @@
+* Synopsys DesignWare AXI DMA Controller
+
+Required properties:
+- compatible: "snps,axi-dma-1.01a"
+- reg: Address range of the DMAC registers. This should include
+  all of the per-channel registers.
+- interrupt: Should contain the DMAC interrupt number.
+- interrupt-parent: Should be the phandle for the interrupt controller
+  that services interrupts for this device.
+- dma-channels: Number of channels supported by hardware.
+- snps,dma-masters: Number of AXI masters supported by the hardware.
+- snps,data-width: Maximum AXI data width supported by hardware.
+  (0 - 8bits, 1 - 16bits, 2 - 32bits, ..., 6 - 512bits)
+- snps,priority: Priority of channel. Array size is equal to the number of
+  dma-channels. Priority value must be programmed within [0:dma-channels-1]
+  range. (0 - minimum priority)
+- snps,block-size: Maximum block size supported by the controller channel.
+  Array size is equal to the number of dma-channels.
+
+Example:
+
+dmac: dma-controller@80000 {
+	compatible = "snps,axi-dma-1.01a";
+	reg = <0x80000 0x400>;
+	clocks = <&core_clk>;
+	interrupt-parent = <&intc>;
+	interrupts = <27>;
+
+	dma-channels = <4>;
+	snps,dma-masters = <2>;
+	snps,data-width = <3>;
+	snps,block-size = <4096 4096 4096 4096>;
+	snps,priority = <0 1 2 3>;
+};
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 2/2] dmaengine: Add DW AXI DMAC driver
  2017-04-07 14:04 [PATCH v2 0/2] dmaengine: Add DW AXI DMAC driver Eugeniy Paltsev
  2017-04-07 14:04 ` [PATCH v2 1/2] dt-bindings: Document the Synopsys DW AXI DMA bindings Eugeniy Paltsev
@ 2017-04-07 14:04 ` Eugeniy Paltsev
  2017-04-18 12:31   ` Andy Shevchenko
  1 sibling, 1 reply; 14+ messages in thread
From: Eugeniy Paltsev @ 2017-04-07 14:04 UTC (permalink / raw)
  To: dmaengine
  Cc: linux-kernel, devicetree, linux-snps-arc, Dan Williams,
	Vinod Koul, Rob Herring, Andy Shevchenko, Alexey Brodkin,
	Eugeniy Paltsev

This patch adds support for the DW AXI DMAC controller.

DW AXI DMAC is a part of upcoming development board from Synopsys.

In this driver implementation only DMA_MEMCPY and DMA_SG transfers
are supported.

Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
---
 drivers/dma/Kconfig                |   10 +
 drivers/dma/Makefile               |    1 +
 drivers/dma/axi_dma_platform.c     | 1044 ++++++++++++++++++++++++++++++++++++
 drivers/dma/axi_dma_platform.h     |  119 ++++
 drivers/dma/axi_dma_platform_reg.h |  220 ++++++++
 5 files changed, 1394 insertions(+)
 create mode 100644 drivers/dma/axi_dma_platform.c
 create mode 100644 drivers/dma/axi_dma_platform.h
 create mode 100644 drivers/dma/axi_dma_platform_reg.h

diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index fc3435c..0ad946a 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -578,6 +578,16 @@ config ZX_DMA
 	help
 	  Support the DMA engine for ZTE ZX family platform devices.
 
+config AXI_DW_DMAC
+	tristate "Synopsys DesignWare AXI DMA support"
+	depends on OF || COMPILE_TEST
+	select DMA_ENGINE
+	select DMA_VIRTUAL_CHANNELS
+	help
+	  Enable support for Synopsys DesignWare AXI DMA controller.
+	  NOTE: This driver wasn't tested on 64 bit platform because
+	  of lack 64 bit platform with Synopsys DW AXI DMAC.
+
 
 # driver files
 source "drivers/dma/bestcomm/Kconfig"
diff --git a/drivers/dma/Makefile b/drivers/dma/Makefile
index 0b723e9..7f1824d 100644
--- a/drivers/dma/Makefile
+++ b/drivers/dma/Makefile
@@ -17,6 +17,7 @@ obj-$(CONFIG_AMCC_PPC440SPE_ADMA) += ppc4xx/
 obj-$(CONFIG_AT_HDMAC) += at_hdmac.o
 obj-$(CONFIG_AT_XDMAC) += at_xdmac.o
 obj-$(CONFIG_AXI_DMAC) += dma-axi-dmac.o
+obj-$(CONFIG_AXI_DW_DMAC) += axi_dma_platform.o
 obj-$(CONFIG_COH901318) += coh901318.o coh901318_lli.o
 obj-$(CONFIG_DMA_BCM2835) += bcm2835-dma.o
 obj-$(CONFIG_DMA_JZ4740) += dma-jz4740.o
diff --git a/drivers/dma/axi_dma_platform.c b/drivers/dma/axi_dma_platform.c
new file mode 100644
index 0000000..d8d9bf3b
--- /dev/null
+++ b/drivers/dma/axi_dma_platform.c
@@ -0,0 +1,1044 @@
+/*
+ * Synopsys DesignWare AXI DMA Controller driver.
+ *
+ * Copyright (C) 2017 Synopsys, Inc. (www.synopsys.com)
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/bitops.h>
+#include <linux/delay.h>
+#include <linux/device.h>
+#include <linux/dmaengine.h>
+#include <linux/dmapool.h>
+#include <linux/err.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/platform_device.h>
+#include <linux/pm_runtime.h>
+#include <linux/property.h>
+#include <linux/types.h>
+
+#include "axi_dma_platform.h"
+#include "axi_dma_platform_reg.h"
+#include "dmaengine.h"
+#include "virt-dma.h"
+
+#define DRV_NAME	"axi_dw_dmac"
+
+/*
+ * The set of bus widths supported by the DMA controller. DW AXI DMAC supports
+ * master data bus width up to 512 bits (for both AXI master interfaces), but
+ * it depends on IP block configurarion.
+ */
+#define AXI_DMA_BUSWIDTHS		  \
+	(DMA_SLAVE_BUSWIDTH_1_BYTE	| \
+	DMA_SLAVE_BUSWIDTH_2_BYTES	| \
+	DMA_SLAVE_BUSWIDTH_4_BYTES	| \
+	DMA_SLAVE_BUSWIDTH_8_BYTES	| \
+	DMA_SLAVE_BUSWIDTH_16_BYTES	| \
+	DMA_SLAVE_BUSWIDTH_32_BYTES	| \
+	DMA_SLAVE_BUSWIDTH_64_BYTES)
+/* TODO: check: do we need to use BIT() macro here? */
+
+static inline void
+axi_dma_iowrite32(struct axi_dma_chip *chip, u32 reg, u32 val)
+{
+	iowrite32(val, chip->regs + reg);
+}
+
+static inline u32 axi_dma_ioread32(struct axi_dma_chip *chip, u32 reg)
+{
+	return ioread32(chip->regs + reg);
+}
+
+static inline void
+axi_chan_iowrite32(struct axi_dma_chan *chan, u32 reg, u32 val)
+{
+	iowrite32(val, chan->chan_regs + reg);
+}
+
+static inline u32 axi_chan_ioread32(struct axi_dma_chan *chan, u32 reg)
+{
+	return ioread32(chan->chan_regs + reg);
+}
+
+static inline void
+axi_chan_iowrite64(struct axi_dma_chan *chan, u32 reg, u64 val)
+{
+	iowrite32(val & 0xFFFFFFFF, chan->chan_regs + reg);
+	iowrite32(val >> 32, chan->chan_regs + reg + 4);
+}
+
+static inline void axi_dma_disable(struct axi_dma_chip *chip)
+{
+	u32 val;
+
+	val = axi_dma_ioread32(chip, DMAC_CFG);
+	val &= ~DMAC_EN_MASK;
+	axi_dma_iowrite32(chip, DMAC_CFG, val);
+}
+
+static inline void axi_dma_enable(struct axi_dma_chip *chip)
+{
+	u32 val;
+
+	val = axi_dma_ioread32(chip, DMAC_CFG);
+	val |= DMAC_EN_MASK;
+	axi_dma_iowrite32(chip, DMAC_CFG, val);
+}
+
+static inline void axi_dma_irq_disable(struct axi_dma_chip *chip)
+{
+	u32 val;
+
+	val = axi_dma_ioread32(chip, DMAC_CFG);
+	val &= ~INT_EN_MASK;
+	axi_dma_iowrite32(chip, DMAC_CFG, val);
+}
+
+static inline void axi_dma_irq_enable(struct axi_dma_chip *chip)
+{
+	u32 val;
+
+	val = axi_dma_ioread32(chip, DMAC_CFG);
+	val |= INT_EN_MASK;
+	axi_dma_iowrite32(chip, DMAC_CFG, val);
+}
+
+static inline void axi_chan_irq_disable(struct axi_dma_chan *chan, u32 irq_mask)
+{
+	u32 val;
+
+	if (likely(irq_mask == DWAXIDMAC_IRQ_ALL)) {
+		axi_chan_iowrite32(chan, CH_INTSTATUS_ENA, DWAXIDMAC_IRQ_NONE);
+	} else {
+		val = axi_chan_ioread32(chan, CH_INTSTATUS_ENA);
+		val &= ~irq_mask;
+		axi_chan_iowrite32(chan, CH_INTSTATUS_ENA, val);
+	}
+}
+
+static inline void axi_chan_irq_set(struct axi_dma_chan *chan, u32 irq_mask)
+{
+	axi_chan_iowrite32(chan, CH_INTSTATUS_ENA, irq_mask);
+}
+
+static inline void axi_chan_irq_sig_set(struct axi_dma_chan *chan, u32 irq_mask)
+{
+	axi_chan_iowrite32(chan, CH_INTSIGNAL_ENA, irq_mask);
+}
+
+static inline void axi_chan_irq_clear(struct axi_dma_chan *chan, u32 irq_mask)
+{
+	axi_chan_iowrite32(chan, CH_INTCLEAR, irq_mask);
+}
+
+static inline u32 axi_chan_irq_read(struct axi_dma_chan *chan)
+{
+	return axi_chan_ioread32(chan, CH_INTSTATUS);
+}
+
+static inline void axi_chan_disable(struct axi_dma_chan *chan)
+{
+	u32 val;
+
+	val = axi_dma_ioread32(chan->chip, DMAC_CHEN);
+	val &= ~(BIT(chan->id) << DMAC_CHAN_EN_SHIFT);
+	val |=   BIT(chan->id) << DMAC_CHAN_EN_WE_SHIFT;
+	axi_dma_iowrite32(chan->chip, DMAC_CHEN, val);
+}
+
+static inline void axi_chan_enable(struct axi_dma_chan *chan)
+{
+	u32 val;
+
+	val = axi_dma_ioread32(chan->chip, DMAC_CHEN);
+	val |= BIT(chan->id) << DMAC_CHAN_EN_SHIFT |
+	       BIT(chan->id) << DMAC_CHAN_EN_WE_SHIFT;
+	axi_dma_iowrite32(chan->chip, DMAC_CHEN, val);
+}
+
+static inline bool axi_chan_is_hw_enable(struct axi_dma_chan *chan)
+{
+	u32 val;
+
+	val = axi_dma_ioread32(chan->chip, DMAC_CHEN);
+
+	return !!(val & (BIT(chan->id) << DMAC_CHAN_EN_SHIFT));
+}
+
+static void axi_dma_hw_init(struct axi_dma_chip *chip)
+{
+	u32 i;
+
+	for (i = 0; i < chip->dw->hdata->nr_channels; i++) {
+		axi_chan_irq_disable(&chip->dw->chan[i], DWAXIDMAC_IRQ_ALL);
+		axi_chan_disable(&chip->dw->chan[i]);
+	}
+}
+
+static u32 axi_chan_get_xfer_width(struct axi_dma_chan *chan, dma_addr_t src,
+				   dma_addr_t dst, size_t len)
+{
+	u32 max_width = chan->chip->dw->hdata->m_data_width;
+	size_t sdl = (src | dst | len);
+
+	return min_t(size_t, __ffs(sdl), max_width);
+}
+
+static inline const char *axi_chan_name(struct axi_dma_chan *chan)
+{
+	return dma_chan_name(&chan->vc.chan);
+}
+
+static struct axi_dma_desc *axi_desc_get(struct axi_dma_chan *chan)
+{
+	struct dw_axi_dma *dw = chan->chip->dw;
+	struct axi_dma_desc *desc;
+	dma_addr_t phys;
+
+	desc = dma_pool_zalloc(dw->desc_pool, GFP_NOWAIT, &phys);
+	if (unlikely(!desc)) {
+		dev_err(chan2dev(chan), "%s: not enough descriptors available\n",
+			axi_chan_name(chan));
+		return NULL;
+	}
+
+	atomic_inc(&chan->descs_allocated);
+	INIT_LIST_HEAD(&desc->xfer_list);
+	desc->vd.tx.phys = phys;
+	desc->chan = chan;
+
+	return desc;
+}
+
+static void axi_desc_put(struct axi_dma_desc *desc)
+{
+	struct axi_dma_chan *chan = desc->chan;
+	struct dw_axi_dma *dw = chan->chip->dw;
+	struct axi_dma_desc *child, *_next;
+	unsigned int descs_put = 0;
+
+	list_for_each_entry_safe(child, _next, &desc->xfer_list, xfer_list) {
+		list_del(&child->xfer_list);
+		dma_pool_free(dw->desc_pool, child, child->vd.tx.phys);
+		descs_put++;
+	}
+
+	dma_pool_free(dw->desc_pool, desc, desc->vd.tx.phys);
+	descs_put++;
+
+	atomic_sub(descs_put, &chan->descs_allocated);
+	dev_vdbg(chan2dev(chan), "%s: %d descs put, %d still allocated\n",
+		axi_chan_name(chan), descs_put,
+		atomic_read(&chan->descs_allocated));
+}
+
+static void vchan_desc_put(struct virt_dma_desc *vdesc)
+{
+	axi_desc_put(vd_to_axi_desc(vdesc));
+}
+
+static enum dma_status
+dma_chan_tx_status(struct dma_chan *dchan, dma_cookie_t cookie,
+		  struct dma_tx_state *txstate)
+{
+	struct axi_dma_chan *chan = dchan_to_axi_dma_chan(dchan);
+	enum dma_status ret;
+
+	ret = dma_cookie_status(dchan, cookie, txstate);
+
+	if (chan->is_paused && ret == DMA_IN_PROGRESS)
+		return DMA_PAUSED;
+
+	return ret;
+}
+
+static void write_desc_llp(struct axi_dma_desc *desc, dma_addr_t adr)
+{
+	desc->lli.llp = cpu_to_le64(adr);
+}
+
+static void write_chan_llp(struct axi_dma_chan *chan, dma_addr_t adr)
+{
+	axi_chan_iowrite64(chan, CH_LLP, adr);
+}
+
+/* Called in chan locked context */
+static void axi_chan_block_xfer_start(struct axi_dma_chan *chan,
+				      struct axi_dma_desc *first)
+{
+	u32 reg, irq_mask;
+	u8 lms = 0;
+	u32 priority = chan->chip->dw->hdata->priority[chan->id];
+
+	if (unlikely(axi_chan_is_hw_enable(chan))) {
+		dev_err(chan2dev(chan), "%s is non-idle!\n",
+			axi_chan_name(chan));
+
+		return;
+	}
+
+	axi_dma_enable(chan->chip);
+
+	reg = (DWAXIDMAC_MBLK_TYPE_LL << CH_CFG_L_DST_MULTBLK_TYPE_POS |
+	       DWAXIDMAC_MBLK_TYPE_LL << CH_CFG_L_SRC_MULTBLK_TYPE_POS);
+	axi_chan_iowrite32(chan, CH_CFG_L, reg);
+
+	reg = (DWAXIDMAC_TT_FC_MEM_TO_MEM_DMAC << CH_CFG_H_TT_FC_POS |
+	       priority << CH_CFG_H_PRIORITY_POS |
+	       DWAXIDMAC_HS_SEL_HW << CH_CFG_H_HS_SEL_DST_POS |
+	       DWAXIDMAC_HS_SEL_HW << CH_CFG_H_HS_SEL_SRC_POS);
+	axi_chan_iowrite32(chan, CH_CFG_H, reg);
+
+	write_chan_llp(chan, first->vd.tx.phys | lms);
+
+	irq_mask = DWAXIDMAC_IRQ_DMA_TRF | DWAXIDMAC_IRQ_ALL_ERR;
+	axi_chan_irq_sig_set(chan, irq_mask);
+
+	/* Generate 'suspend' status but don't generate interrupt */
+	irq_mask |= DWAXIDMAC_IRQ_SUSPENDED;
+	axi_chan_irq_set(chan, irq_mask);
+
+	axi_chan_enable(chan);
+}
+
+static void axi_chan_start_first_queued(struct axi_dma_chan *chan)
+{
+	struct axi_dma_desc *desc;
+	struct virt_dma_desc *vd;
+
+	vd = vchan_next_desc(&chan->vc);
+	if (!vd)
+		return;
+
+	desc = vd_to_axi_desc(vd);
+	dev_vdbg(chan2dev(chan), "%s: started %u\n", axi_chan_name(chan),
+		vd->tx.cookie);
+	axi_chan_block_xfer_start(chan, desc);
+}
+
+static void dma_chan_issue_pending(struct dma_chan *dchan)
+{
+	struct axi_dma_chan *chan = dchan_to_axi_dma_chan(dchan);
+	unsigned long flags;
+
+	spin_lock_irqsave(&chan->vc.lock, flags);
+	if (vchan_issue_pending(&chan->vc))
+		axi_chan_start_first_queued(chan);
+	spin_unlock_irqrestore(&chan->vc.lock, flags);
+}
+
+static int dma_chan_alloc_chan_resources(struct dma_chan *dchan)
+{
+	struct axi_dma_chan *chan = dchan_to_axi_dma_chan(dchan);
+
+	/* ASSERT: channel hw is idle */
+	if (axi_chan_is_hw_enable(chan)) {
+		dev_err(chan2dev(chan), "%s is non-idle!\n",
+			axi_chan_name(chan));
+		return -EBUSY;
+	}
+
+	dev_vdbg(dchan2dev(dchan), "%s: allocating\n", axi_chan_name(chan));
+
+	dma_cookie_init(dchan);
+
+	pm_runtime_get(chan->chip->dev);
+
+	return 0;
+}
+
+static void dma_chan_free_chan_resources(struct dma_chan *dchan)
+{
+	struct axi_dma_chan *chan = dchan_to_axi_dma_chan(dchan);
+
+	/* ASSERT: channel hw is idle */
+	if (axi_chan_is_hw_enable(chan))
+		dev_err(dchan2dev(dchan), "%s is non-idle!\n",
+			axi_chan_name(chan));
+
+	axi_chan_disable(chan);
+	axi_chan_irq_disable(chan, DWAXIDMAC_IRQ_ALL);
+
+	vchan_free_chan_resources(&chan->vc);
+
+	dev_vdbg(dchan2dev(dchan), "%s: %s: descriptor still allocated: %u\n",
+		__func__, axi_chan_name(chan),
+		atomic_read(&chan->descs_allocated));
+
+	pm_runtime_put(chan->chip->dev);
+}
+
+/*
+ * If DW_axi_dmac sees CHx_CTL.ShadowReg_Or_LLI_Last bit of the fetched LLI
+ * as 1, it understands that the current block is the final block in the
+ * transfer and completes the DMA transfer operation at the end of current
+ * block transfer.
+ */
+static void set_desc_last(struct axi_dma_desc *desc)
+{
+	u32 val;
+
+	val = le32_to_cpu(desc->lli.ctl_hi);
+	val |= CH_CTL_H_LLI_LAST;
+	desc->lli.ctl_hi = cpu_to_le32(val);
+}
+
+static void write_desc_sar(struct axi_dma_desc *desc, dma_addr_t adr)
+{
+	desc->lli.sar = cpu_to_le64(adr);
+}
+
+static void write_desc_dar(struct axi_dma_desc *desc, dma_addr_t adr)
+{
+	desc->lli.dar = cpu_to_le64(adr);
+}
+
+static void set_desc_src_master(struct axi_dma_desc *desc)
+{
+	u32 val;
+
+	/* Select AXI0 for source master */
+	val = le32_to_cpu(desc->lli.ctl_lo);
+	val &= ~CH_CTL_L_SRC_MAST;
+	desc->lli.ctl_lo = cpu_to_le32(val);
+}
+
+static void set_desc_dest_master(struct axi_dma_desc *desc)
+{
+	u32 val;
+
+	/* Select AXI1 for source master if available */
+	val = le32_to_cpu(desc->lli.ctl_lo);
+	if (desc->chan->chip->dw->hdata->nr_masters > 1)
+		val |= CH_CTL_L_DST_MAST;
+	else
+		val &= ~CH_CTL_L_DST_MAST;
+
+	desc->lli.ctl_lo = cpu_to_le32(val);
+}
+
+static struct dma_async_tx_descriptor *
+dma_chan_prep_dma_sg(struct dma_chan *dchan,
+		     struct scatterlist *dst_sg, unsigned int dst_nents,
+		     struct scatterlist *src_sg, unsigned int src_nents,
+		     unsigned long flags)
+{
+	struct axi_dma_chan *chan = dchan_to_axi_dma_chan(dchan);
+	struct axi_dma_desc *first = NULL, *desc = NULL, *prev = NULL;
+	size_t dst_len = 0, src_len = 0, xfer_len = 0;
+	dma_addr_t dst_adr = 0, src_adr = 0;
+	u32 src_width, dst_width;
+	size_t block_ts, max_block_ts;
+	u32 reg;
+	u8 lms = 0;
+
+	dev_dbg(chan2dev(chan), "%s: %s: sn: %d dn: %d flags: 0x%lx",
+		__func__, axi_chan_name(chan), src_nents, dst_nents, flags);
+
+	if (unlikely(dst_nents == 0 || src_nents == 0))
+		return NULL;
+
+	if (unlikely(dst_sg == NULL || src_sg == NULL))
+		return NULL;
+
+	max_block_ts = chan->chip->dw->hdata->block_size[chan->id];
+
+	/*
+	 * Loop until there is either no more source or no more destination
+	 * scatterlist entry.
+	 */
+	while ((dst_len || (dst_sg && dst_nents)) &&
+	       (src_len || (src_sg && src_nents))) {
+		if (dst_len == 0) {
+			dst_adr = sg_dma_address(dst_sg);
+			dst_len = sg_dma_len(dst_sg);
+
+			dst_sg = sg_next(dst_sg);
+			dst_nents--;
+		}
+
+		/* Process src sg list */
+		if (src_len == 0) {
+			src_adr = sg_dma_address(src_sg);
+			src_len = sg_dma_len(src_sg);
+
+			src_sg = sg_next(src_sg);
+			src_nents--;
+		}
+
+		/* Min of src and dest length will be this xfer length */
+		xfer_len = min_t(size_t, src_len, dst_len);
+		if (xfer_len == 0)
+			continue;
+
+		/* Take care for the alignment */
+		src_width = axi_chan_get_xfer_width(chan, src_adr,
+						    dst_adr, xfer_len);
+		/*
+		 * Actually src_width and dst_width can be different, but make
+		 * them same to be simpler.
+		 */
+		dst_width = src_width;
+
+		/*
+		 * block_ts indicates the total number of data of width
+		 * src_width to be transferred in a DMA block transfer.
+		 * BLOCK_TS register should be set to block_ts -1
+		 */
+		block_ts = xfer_len >> src_width;
+		if (block_ts > max_block_ts) {
+			block_ts = max_block_ts;
+			xfer_len = max_block_ts << src_width;
+		}
+
+		desc = axi_desc_get(chan);
+		if (unlikely(!desc))
+			goto err_desc_get;
+
+		write_desc_sar(desc, src_adr);
+		write_desc_dar(desc, dst_adr);
+		desc->lli.block_ts_lo = cpu_to_le32(block_ts - 1);
+		desc->lli.ctl_hi = cpu_to_le32(CH_CTL_H_LLI_VALID);
+
+		reg = (DWAXIDMAC_BURST_TRANS_LEN_4 << CH_CTL_L_DST_MSIZE_POS |
+		       DWAXIDMAC_BURST_TRANS_LEN_4 << CH_CTL_L_SRC_MSIZE_POS |
+		       dst_width << CH_CTL_L_DST_WIDTH_POS |
+		       src_width << CH_CTL_L_SRC_WIDTH_POS |
+		       DWAXIDMAC_CH_CTL_L_INC << CH_CTL_L_DST_INC_POS |
+		       DWAXIDMAC_CH_CTL_L_INC << CH_CTL_L_SRC_INC_POS);
+		desc->lli.ctl_lo = cpu_to_le32(reg);
+
+		set_desc_src_master(desc);
+		set_desc_dest_master(desc);
+
+		/* Manage transfer list (xfer_list) */
+		if (!first) {
+			first = desc;
+		} else {
+			list_add_tail(&desc->xfer_list, &first->xfer_list);
+			write_desc_llp(prev, desc->vd.tx.phys | lms);
+		}
+		prev = desc;
+
+		/* update the lengths and addresses for the next loop cycle */
+		dst_len -= xfer_len;
+		src_len -= xfer_len;
+		dst_adr += xfer_len;
+		src_adr += xfer_len;
+	}
+
+	/* Total len of src/dest sg == 0, so no descriptor were allocated */
+	if (unlikely(!first))
+		return NULL;
+
+	/* Set end-of-link to the last link descriptor of list */
+	set_desc_last(desc);
+
+	return vchan_tx_prep(&chan->vc, &first->vd, flags);
+
+err_desc_get:
+	axi_desc_put(first);
+	return NULL;
+}
+
+static struct dma_async_tx_descriptor *
+dma_chan_prep_dma_memcpy(struct dma_chan *dchan, dma_addr_t dest,
+			 dma_addr_t src, size_t len, unsigned long flags)
+{
+	unsigned int nents = 1;
+	struct scatterlist dst_sg;
+	struct scatterlist src_sg;
+
+	sg_init_table(&dst_sg, nents);
+	sg_init_table(&src_sg, nents);
+
+	sg_dma_address(&dst_sg) = dest;
+	sg_dma_address(&src_sg) = src;
+
+	sg_dma_len(&dst_sg) = len;
+	sg_dma_len(&src_sg) = len;
+
+	/* Implement memcpy transfer as sg transfer with single list */
+	return dma_chan_prep_dma_sg(dchan, &dst_sg, nents,
+				    &src_sg, nents, flags);
+}
+
+static void axi_chan_dump_lli(struct axi_dma_chan *chan,
+			      struct axi_dma_desc *desc)
+{
+	dev_err(dchan2dev(&chan->vc.chan),
+		"SAR: 0x%llx DAR: 0x%llx LLP: 0x%llx BTS 0x%x CTL: 0x%x:%08x",
+		le64_to_cpu(desc->lli.sar),
+		le64_to_cpu(desc->lli.dar),
+		le64_to_cpu(desc->lli.llp),
+		le32_to_cpu(desc->lli.block_ts_lo),
+		le32_to_cpu(desc->lli.ctl_hi),
+		le32_to_cpu(desc->lli.ctl_lo));
+}
+
+static void axi_chan_list_dump_lli(struct axi_dma_chan *chan,
+				   struct axi_dma_desc *desc_head)
+{
+	struct axi_dma_desc *desc;
+
+	axi_chan_dump_lli(chan, desc_head);
+	list_for_each_entry(desc, &desc_head->xfer_list, xfer_list)
+		axi_chan_dump_lli(chan, desc);
+}
+
+
+static void axi_chan_handle_err(struct axi_dma_chan *chan, u32 status)
+{
+	struct virt_dma_desc *vd;
+	unsigned long flags;
+
+	spin_lock_irqsave(&chan->vc.lock, flags);
+
+	axi_chan_disable(chan);
+
+	/* The bad descriptor currently is in the head of vc list */
+	vd = vchan_next_desc(&chan->vc);
+	/* Remove the completed descriptor from issued list */
+	list_del(&vd->node);
+
+	/* WARN about bad descriptor */
+	dev_err(chan2dev(chan),
+		"Bad descriptor submitted for %s, cookie: %d, irq: 0x%08x\n",
+		axi_chan_name(chan), vd->tx.cookie, status);
+	axi_chan_list_dump_lli(chan, vd_to_axi_desc(vd));
+
+	/* Pretend the bad descriptor completed successfully */
+	vchan_cookie_complete(vd);
+
+	/* Try to restart the controller */
+	axi_chan_start_first_queued(chan);
+
+	spin_unlock_irqrestore(&chan->vc.lock, flags);
+}
+
+static void axi_chan_block_xfer_complete(struct axi_dma_chan *chan)
+{
+	struct virt_dma_desc *vd;
+	unsigned long flags;
+
+	spin_lock_irqsave(&chan->vc.lock, flags);
+	if (unlikely(axi_chan_is_hw_enable(chan))) {
+		dev_err(chan2dev(chan), "BUG: %s catched DWAXIDMAC_IRQ_DMA_TRF, but channel not idle!\n",
+			axi_chan_name(chan));
+		axi_chan_disable(chan);
+	}
+
+	/* The completed descriptor currently is in the head of vc list */
+	vd = vchan_next_desc(&chan->vc);
+	/* Remove the completed descriptor from issued list before completing */
+	list_del(&vd->node);
+	vchan_cookie_complete(vd);
+
+	/* Submit queued descriptors after processing the completed ones */
+	axi_chan_start_first_queued(chan);
+
+	spin_unlock_irqrestore(&chan->vc.lock, flags);
+}
+
+static irqreturn_t dw_axi_dma_intretupt(int irq, void *dev_id)
+{
+	struct axi_dma_chip *chip = dev_id;
+	struct dw_axi_dma *dw = chip->dw;
+	struct axi_dma_chan *chan;
+
+	u32 status, i;
+
+	/* Disable DMAC inerrupts. We'll enable them after processing chanels */
+	axi_dma_irq_disable(chip);
+
+	/* Poll, clear and process every chanel interrupt status */
+	for (i = 0; i < dw->hdata->nr_channels; i++) {
+		chan = &dw->chan[i];
+		status = axi_chan_irq_read(chan);
+		axi_chan_irq_clear(chan, status);
+
+		dev_vdbg(chip->dev, "%s %u IRQ status: 0x%08x\n",
+			axi_chan_name(chan), i, status);
+
+		if (status & DWAXIDMAC_IRQ_ALL_ERR)
+			axi_chan_handle_err(chan, status);
+		else if (status & DWAXIDMAC_IRQ_DMA_TRF)
+			axi_chan_block_xfer_complete(chan);
+	}
+
+	/* Re-enable interrupts */
+	axi_dma_irq_enable(chip);
+
+	return IRQ_HANDLED;
+}
+
+static int dma_chan_terminate_all(struct dma_chan *dchan)
+{
+	struct axi_dma_chan *chan = dchan_to_axi_dma_chan(dchan);
+	unsigned long flags;
+	LIST_HEAD(head);
+
+	spin_lock_irqsave(&chan->vc.lock, flags);
+
+	axi_chan_disable(chan);
+
+	vchan_get_all_descriptors(&chan->vc, &head);
+
+	/*
+	 * As vchan_dma_desc_free_list can access to desc_allocated list
+	 * we need to call it in vc.lock context.
+	 */
+	vchan_dma_desc_free_list(&chan->vc, &head);
+
+	spin_unlock_irqrestore(&chan->vc.lock, flags);
+
+	dev_vdbg(dchan2dev(dchan), "terminated: %s\n", axi_chan_name(chan));
+
+	return 0;
+}
+
+static int dma_chan_pause(struct dma_chan *dchan)
+{
+	struct axi_dma_chan *chan = dchan_to_axi_dma_chan(dchan);
+	unsigned long flags;
+	unsigned int timeout = 20; /* timeout iterations */
+	int ret = -EAGAIN;
+	u32 val;
+
+	spin_lock_irqsave(&chan->vc.lock, flags);
+
+	val = axi_dma_ioread32(chan->chip, DMAC_CHEN);
+	val |= BIT(chan->id) << DMAC_CHAN_SUSP_SHIFT |
+	       BIT(chan->id) << DMAC_CHAN_SUSP_WE_SHIFT;
+	axi_dma_iowrite32(chan->chip, DMAC_CHEN, val);
+
+	while (timeout--) {
+		if (axi_chan_irq_read(chan) & DWAXIDMAC_IRQ_SUSPENDED) {
+			ret = 0;
+			break;
+		}
+		udelay(2);
+	}
+
+	axi_chan_irq_clear(chan, DWAXIDMAC_IRQ_SUSPENDED);
+
+	chan->is_paused = true;
+
+	spin_unlock_irqrestore(&chan->vc.lock, flags);
+
+	return ret;
+}
+
+/* Called in chan locked context */
+static inline void axi_chan_resume(struct axi_dma_chan *chan)
+{
+	u32 val;
+
+	val = axi_dma_ioread32(chan->chip, DMAC_CHEN);
+	val &= ~(BIT(chan->id) << DMAC_CHAN_SUSP_SHIFT);
+	val |=  (BIT(chan->id) << DMAC_CHAN_SUSP_WE_SHIFT);
+	axi_dma_iowrite32(chan->chip, DMAC_CHEN, val);
+
+	chan->is_paused = false;
+}
+
+static int dma_chan_resume(struct dma_chan *dchan)
+{
+	struct axi_dma_chan *chan = dchan_to_axi_dma_chan(dchan);
+	unsigned long flags;
+
+	spin_lock_irqsave(&chan->vc.lock, flags);
+
+	if (chan->is_paused)
+		axi_chan_resume(chan);
+
+	spin_unlock_irqrestore(&chan->vc.lock, flags);
+
+	return 0;
+}
+
+static int axi_dma_runtime_suspend(struct device *dev)
+{
+	struct axi_dma_chip *chip = dev_get_drvdata(dev);
+
+	dev_info(dev, "PAL: %s\n", __func__);
+
+	axi_dma_irq_disable(chip);
+	axi_dma_disable(chip);
+
+	clk_disable_unprepare(chip->clk);
+
+	return 0;
+}
+
+static int axi_dma_runtime_resume(struct device *dev)
+{
+	struct axi_dma_chip *chip = dev_get_drvdata(dev);
+	int ret = 0;
+
+	dev_info(dev, "PAL: %s\n", __func__);
+
+	ret = clk_prepare_enable(chip->clk);
+	if (ret < 0)
+		return ret;
+
+	axi_dma_enable(chip);
+	axi_dma_irq_enable(chip);
+
+	return 0;
+}
+
+static int parse_device_properties(struct axi_dma_chip *chip)
+{
+	struct device *dev = chip->dev;
+	u32 tmp, carr[DMAC_MAX_CHANNELS];
+	int ret;
+
+	ret = device_property_read_u32(dev, "dma-channels", &tmp);
+	if (ret)
+		return ret;
+	if (tmp == 0 || tmp > DMAC_MAX_CHANNELS)
+		return -EINVAL;
+
+	chip->dw->hdata->nr_channels = tmp;
+
+	ret = device_property_read_u32(dev, "snps,dma-masters", &tmp);
+	if (ret)
+		return ret;
+	if (tmp == 0 || tmp > DMAC_MAX_MASTERS)
+		return -EINVAL;
+
+	chip->dw->hdata->nr_masters = tmp;
+
+	ret = device_property_read_u32(dev, "snps,data-width", &tmp);
+	if (ret)
+		return ret;
+	if (tmp > DWAXIDMAC_TRANS_WIDTH_MAX)
+		return -EINVAL;
+
+	chip->dw->hdata->m_data_width = tmp;
+
+	ret = device_property_read_u32_array(dev, "snps,block-size", carr,
+					     chip->dw->hdata->nr_channels);
+	if (ret)
+		return ret;
+	for (tmp = 0; tmp < chip->dw->hdata->nr_channels; tmp++)
+		if (carr[tmp] == 0 || carr[tmp] > DMAC_MAX_BLK_SIZE)
+			return -EINVAL;
+		else
+			chip->dw->hdata->block_size[tmp] = carr[tmp];
+
+	ret = device_property_read_u32_array(dev, "snps,priority", carr,
+					     chip->dw->hdata->nr_channels);
+	if (ret)
+		return ret;
+	/* Priority value must be programmed within [0:nr_channels-1] range */
+	for (tmp = 0; tmp < chip->dw->hdata->nr_channels; tmp++)
+		if (carr[tmp] >= chip->dw->hdata->nr_channels)
+			return -EINVAL;
+		else
+			chip->dw->hdata->priority[tmp] = carr[tmp];
+
+	return 0;
+}
+
+static int dw_probe(struct platform_device *pdev)
+{
+	struct axi_dma_chip *chip;
+	struct resource *mem;
+	struct dw_axi_dma *dw;
+	struct dw_axi_dma_hcfg *hdata;
+	u32 i;
+	int ret;
+
+	chip = devm_kzalloc(&pdev->dev, sizeof(*chip), GFP_KERNEL);
+	if (!chip)
+		return -ENOMEM;
+
+	dw = devm_kzalloc(&pdev->dev, sizeof(*dw), GFP_KERNEL);
+	if (!dw)
+		return -ENOMEM;
+
+	hdata = devm_kzalloc(&pdev->dev, sizeof(*hdata), GFP_KERNEL);
+	if (!hdata)
+		return -ENOMEM;
+
+	chip->dw = dw;
+	chip->dev = &pdev->dev;
+	chip->dw->hdata = hdata;
+
+	chip->irq = platform_get_irq(pdev, 0);
+	if (chip->irq < 0)
+		return chip->irq;
+
+	mem = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	chip->regs = devm_ioremap_resource(chip->dev, mem);
+	if (IS_ERR(chip->regs))
+		return PTR_ERR(chip->regs);
+
+	chip->clk = devm_clk_get(chip->dev, NULL);
+	if (IS_ERR(chip->clk))
+		return PTR_ERR(chip->clk);
+
+	ret = parse_device_properties(chip);
+	if (ret)
+		return ret;
+
+	dw->chan = devm_kcalloc(chip->dev, hdata->nr_channels,
+				sizeof(*dw->chan), GFP_KERNEL);
+	if (!dw->chan)
+		return -ENOMEM;
+
+	ret = devm_request_irq(chip->dev, chip->irq, dw_axi_dma_intretupt,
+			       IRQF_SHARED, DRV_NAME, chip);
+	if (ret)
+		return ret;
+
+	/* Lli address must be aligned to a 64-byte boundary */
+	dw->desc_pool = dmam_pool_create(DRV_NAME, chip->dev,
+					 sizeof(struct axi_dma_desc), 64, 0);
+	if (!dw->desc_pool) {
+		dev_err(chip->dev, "No memory for descriptors dma pool\n");
+		return -ENOMEM;
+	}
+
+	INIT_LIST_HEAD(&dw->dma.channels);
+	for (i = 0; i < hdata->nr_channels; i++) {
+		struct axi_dma_chan *chan = &dw->chan[i];
+
+		chan->chip = chip;
+		chan->id = (u8)i;
+		chan->chan_regs = chip->regs + COMMON_REG_LEN + i * CHAN_REG_LEN;
+		atomic_set(&chan->descs_allocated, 0);
+
+		chan->vc.desc_free = vchan_desc_put;
+		vchan_init(&chan->vc, &dw->dma);
+	}
+
+	/* Set capabilities */
+	dma_cap_set(DMA_MEMCPY, dw->dma.cap_mask);
+	dma_cap_set(DMA_SG, dw->dma.cap_mask);
+
+	/* DMA capabilities */
+	dw->dma.chancnt = hdata->nr_channels;
+	dw->dma.src_addr_widths = AXI_DMA_BUSWIDTHS;
+	dw->dma.dst_addr_widths = AXI_DMA_BUSWIDTHS;
+	dw->dma.directions = BIT(DMA_MEM_TO_MEM);
+	dw->dma.residue_granularity = DMA_RESIDUE_GRANULARITY_DESCRIPTOR;
+
+	dw->dma.dev = chip->dev;
+	dw->dma.device_tx_status = dma_chan_tx_status;
+	dw->dma.device_issue_pending = dma_chan_issue_pending;
+	dw->dma.device_terminate_all = dma_chan_terminate_all;
+	dw->dma.device_pause = dma_chan_pause;
+	dw->dma.device_resume = dma_chan_resume;
+
+	dw->dma.device_alloc_chan_resources = dma_chan_alloc_chan_resources;
+	dw->dma.device_free_chan_resources = dma_chan_free_chan_resources;
+
+	dw->dma.device_prep_dma_memcpy = dma_chan_prep_dma_memcpy;
+	dw->dma.device_prep_dma_sg = dma_chan_prep_dma_sg;
+
+	platform_set_drvdata(pdev, chip);
+
+	pm_runtime_enable(chip->dev);
+
+	/*
+	 * We can't just call pm_runtime_get here instead of
+	 * pm_runtime_get_noresume + axi_dma_runtime_resume because we need
+	 * driver to work also without Runtime PM.
+	 */
+	pm_runtime_get_noresume(chip->dev);
+	ret = axi_dma_runtime_resume(chip->dev);
+	if (ret < 0)
+		goto err_pm_disable;
+
+	axi_dma_hw_init(chip);
+
+	pm_runtime_put(chip->dev);
+
+	ret = dma_async_device_register(&dw->dma);
+	if (ret)
+		goto err_pm_disable;
+
+	dev_info(chip->dev, "DesignWare AXI DMA Controller, %d channels\n",
+		 dw->hdata->nr_channels);
+
+	return 0;
+
+err_pm_disable:
+	pm_runtime_disable(chip->dev);
+
+	return ret;
+}
+
+static int dw_remove(struct platform_device *pdev)
+{
+	struct axi_dma_chip *chip = platform_get_drvdata(pdev);
+	struct dw_axi_dma *dw = chip->dw;
+	struct axi_dma_chan *chan, *_chan;
+	u32 i;
+
+	/* Enable clk before accessing to registers */
+	clk_prepare_enable(chip->clk);
+	axi_dma_irq_disable(chip);
+	for (i = 0; i < dw->hdata->nr_channels; i++) {
+		axi_chan_disable(&chip->dw->chan[i]);
+		axi_chan_irq_disable(&chip->dw->chan[i], DWAXIDMAC_IRQ_ALL);
+	}
+	axi_dma_disable(chip);
+
+	pm_runtime_disable(chip->dev);
+	axi_dma_runtime_suspend(chip->dev);
+
+	devm_free_irq(chip->dev, chip->irq, chip);
+
+	list_for_each_entry_safe(chan, _chan, &dw->dma.channels,
+			vc.chan.device_node) {
+		list_del(&chan->vc.chan.device_node);
+		tasklet_kill(&chan->vc.task);
+	}
+
+	dma_async_device_unregister(&dw->dma);
+
+	return 0;
+}
+
+static const struct dev_pm_ops dw_axi_dma_pm_ops = {
+	SET_RUNTIME_PM_OPS(axi_dma_runtime_suspend, axi_dma_runtime_resume, NULL)
+};
+
+static const struct of_device_id dw_dma_of_id_table[] = {
+	{ .compatible = "snps,axi-dma-1.01a" },
+	{}
+};
+MODULE_DEVICE_TABLE(of, dw_dma_of_id_table);
+
+static struct platform_driver dw_driver = {
+	.probe		= dw_probe,
+	.remove		= dw_remove,
+	.driver = {
+		.name	= DRV_NAME,
+		.of_match_table = of_match_ptr(dw_dma_of_id_table),
+		.pm = &dw_axi_dma_pm_ops,
+	},
+};
+module_platform_driver(dw_driver);
+
+MODULE_LICENSE("GPL v2");
+MODULE_DESCRIPTION("Synopsys DesignWare AXI DMA Controller platform driver");
+MODULE_AUTHOR("Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>");
diff --git a/drivers/dma/axi_dma_platform.h b/drivers/dma/axi_dma_platform.h
new file mode 100644
index 0000000..3d644d2
--- /dev/null
+++ b/drivers/dma/axi_dma_platform.h
@@ -0,0 +1,119 @@
+/*
+ * Synopsys DesignWare AXI DMA Controller driver.
+ *
+ * Copyright (C) 2017 Synopsys, Inc. (www.synopsys.com)
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _AXI_DMA_PLATFORM_H
+#define _AXI_DMA_PLATFORM_H
+
+#include <linux/bitops.h>
+#include <linux/clk.h>
+#include <linux/device.h>
+#include <linux/dmaengine.h>
+#include <linux/types.h>
+
+#include "virt-dma.h"
+
+#define DMAC_MAX_CHANNELS	8
+#define DMAC_MAX_MASTERS	2
+#define DMAC_MAX_BLK_SIZE	0x200000
+
+struct dw_axi_dma_hcfg {
+	u32	nr_channels;
+	u32	nr_masters;
+	u32	m_data_width;
+	u32	block_size[DMAC_MAX_CHANNELS];
+	u32	priority[DMAC_MAX_CHANNELS];
+};
+
+struct axi_dma_chan {
+	struct axi_dma_chip		*chip;
+	void __iomem			*chan_regs;
+	u8				id;
+	atomic_t			descs_allocated;
+
+	struct virt_dma_chan		vc;
+
+	/* these other elements are all protected by vc.lock */
+	bool				is_paused;
+};
+
+struct dw_axi_dma {
+	struct dma_device	dma;
+	struct dw_axi_dma_hcfg	*hdata;
+	struct dma_pool		*desc_pool;
+
+	/* channels */
+	struct axi_dma_chan	*chan;
+};
+
+struct axi_dma_chip {
+	struct device		*dev;
+	int			irq;
+	void __iomem		*regs;
+	struct clk		*clk;
+	struct dw_axi_dma	*dw;
+};
+
+/* LLI == Linked List Item */
+struct __attribute__ ((__packed__)) axi_dma_lli {
+	__le64		sar;
+	__le64		dar;
+	__le32		block_ts_lo;
+	__le32		block_ts_hi;
+	__le64		llp;
+	__le32		ctl_lo;
+	__le32		ctl_hi;
+	__le32		sstat;
+	__le32		dstat;
+	__le32		status_lo;
+	__le32		ststus_hi;
+	__le32		reserved_lo;
+	__le32		reserved_hi;
+};
+
+struct axi_dma_desc {
+	struct axi_dma_lli		lli;
+
+	struct virt_dma_desc		vd;
+	struct axi_dma_chan		*chan;
+	struct list_head		xfer_list;
+};
+
+static inline struct device *dchan2dev(struct dma_chan *dchan)
+{
+	return &dchan->dev->device;
+}
+
+static inline struct device *chan2dev(struct axi_dma_chan *chan)
+{
+	return &chan->vc.chan.dev->device;
+}
+
+static inline struct axi_dma_desc *vd_to_axi_desc(struct virt_dma_desc *vd)
+{
+	return container_of(vd, struct axi_dma_desc, vd);
+}
+
+static inline struct axi_dma_chan *vc_to_axi_dma_chan(struct virt_dma_chan *vc)
+{
+	return container_of(vc, struct axi_dma_chan, vc);
+}
+
+static inline struct axi_dma_chan *dchan_to_axi_dma_chan(struct dma_chan *dchan)
+{
+	return vc_to_axi_dma_chan(to_virt_chan(dchan));
+}
+
+#endif /* _AXI_DMA_PLATFORM_H */
diff --git a/drivers/dma/axi_dma_platform_reg.h b/drivers/dma/axi_dma_platform_reg.h
new file mode 100644
index 0000000..72110cc
--- /dev/null
+++ b/drivers/dma/axi_dma_platform_reg.h
@@ -0,0 +1,220 @@
+/*
+ * Synopsys DesignWare AXI DMA Controller driver.
+ *
+ * Copyright (C) 2017 Synopsys, Inc. (www.synopsys.com)
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _AXI_DMA_PLATFORM_REG_H
+#define _AXI_DMA_PLATFORM_REG_H
+
+#include <linux/bitops.h>
+
+#define COMMON_REG_LEN		0x100
+#define CHAN_REG_LEN		0x100
+
+/* Common registers offset */
+#define DMAC_ID			0x000 /* R DMAC ID */
+#define DMAC_COMPVER		0x008 /* R DMAC Component Version */
+#define DMAC_CFG		0x010 /* R/W DMAC Configuration */
+#define DMAC_CHEN		0x018 /* R/W DMAC Channel Enable */
+#define DMAC_CHEN_L		0x018 /* R/W DMAC Channel Enable 00-31 */
+#define DMAC_CHEN_H		0x01C /* R/W DMAC Channel Enable 32-63 */
+#define DMAC_INTSTATUS		0x030 /* R DMAC Interrupt Status */
+#define DMAC_COMMON_INTCLEAR	0x038 /* W DMAC Interrupt Clear */
+#define DMAC_COMMON_INTSTATUS_ENA 0x040 /* R DMAC Interrupt Status Enable */
+#define DMAC_COMMON_INTSIGNAL_ENA 0x048 /* R/W DMAC Interrupt Signal Enable */
+#define DMAC_COMMON_INTSTATUS	0x050 /* R DMAC Interrupt Status */
+#define DMAC_RESET		0x058 /* R DMAC Reset Register1 */
+
+/* DMA channel registers offset */
+#define CH_SAR			0x000 /* R/W Chan Source Address */
+#define CH_DAR			0x008 /* R/W Chan Destination Address */
+#define CH_BLOCK_TS		0x010 /* R/W Chan Block Transfer Size */
+#define CH_CTL			0x018 /* R/W Chan Control */
+#define CH_CTL_L		0x018 /* R/W Chan Control 00-31 */
+#define CH_CTL_H		0x01C /* R/W Chan Control 32-63 */
+#define CH_CFG			0x020 /* R/W Chan Configuration */
+#define CH_CFG_L		0x020 /* R/W Chan Configuration 00-31 */
+#define CH_CFG_H		0x024 /* R/W Chan Configuration 32-63 */
+#define CH_LLP			0x028 /* R/W Chan Linked List Pointer */
+#define CH_STATUS		0x030 /* R Chan Status */
+#define CH_SWHSSRC		0x038 /* R/W Chan SW Handshake Source */
+#define CH_SWHSDST		0x040 /* R/W Chan SW Handshake Destination */
+#define CH_BLK_TFR_RESUMEREQ	0x048 /* W Chan Block Transfer Resume Req */
+#define CH_AXI_ID		0x050 /* R/W Chan AXI ID */
+#define CH_AXI_QOS		0x058 /* R/W Chan AXI QOS */
+#define CH_SSTAT		0x060 /* R Chan Source Status */
+#define CH_DSTAT		0x068 /* R Chan Destination Status */
+#define CH_SSTATAR		0x070 /* R/W Chan Source Status Fetch Addr */
+#define CH_DSTATAR		0x078 /* R/W Chan Destination Status Fetch Addr */
+#define CH_INTSTATUS_ENA	0x080 /* R/W Chan Interrupt Status Enable */
+#define CH_INTSTATUS		0x088 /* R/W Chan Interrupt Status */
+#define CH_INTSIGNAL_ENA	0x090 /* R/W Chan Interrupt Signal Enable */
+#define CH_INTCLEAR		0x098 /* W Chan Interrupt Clear */
+
+
+/* DMAC_CFG */
+#define DMAC_EN_MASK		0x00000001U
+#define DMAC_EN_POS		0
+
+#define INT_EN_MASK		0x00000002U
+#define INT_EN_POS		1
+
+#define DMAC_CHAN_EN_SHIFT	0
+#define DMAC_CHAN_EN_WE_SHIFT	8
+
+#define DMAC_CHAN_SUSP_SHIFT	16
+#define DMAC_CHAN_SUSP_WE_SHIFT	24
+
+/* CH_CTL_H */
+#define CH_CTL_H_LLI_LAST	BIT(30)
+#define CH_CTL_H_LLI_VALID	BIT(31)
+
+/* CH_CTL_L */
+#define CH_CTL_L_LAST_WRITE_EN	BIT(30)
+
+#define CH_CTL_L_DST_MSIZE_POS	18
+#define CH_CTL_L_SRC_MSIZE_POS	14
+
+enum {
+	DWAXIDMAC_BURST_TRANS_LEN_1	= 0x0,
+	DWAXIDMAC_BURST_TRANS_LEN_4,
+	DWAXIDMAC_BURST_TRANS_LEN_8,
+	DWAXIDMAC_BURST_TRANS_LEN_16,
+	DWAXIDMAC_BURST_TRANS_LEN_32,
+	DWAXIDMAC_BURST_TRANS_LEN_64,
+	DWAXIDMAC_BURST_TRANS_LEN_128,
+	DWAXIDMAC_BURST_TRANS_LEN_256,
+	DWAXIDMAC_BURST_TRANS_LEN_512,
+	DWAXIDMAC_BURST_TRANS_LEN_1024
+};
+
+#define CH_CTL_L_DST_WIDTH_POS	11
+#define CH_CTL_L_SRC_WIDTH_POS	8
+
+#define CH_CTL_L_DST_INC_POS	6
+#define CH_CTL_L_SRC_INC_POS	4
+enum {
+	DWAXIDMAC_CH_CTL_L_INC	= 0x0,
+	DWAXIDMAC_CH_CTL_L_NOINC
+};
+
+#define CH_CTL_L_DST_MAST	BIT(2)
+#define CH_CTL_L_SRC_MAST	BIT(0)
+
+/* CH_CFG_H */
+#define CH_CFG_H_PRIORITY_POS	17
+#define CH_CFG_H_HS_SEL_DST_POS	4
+#define CH_CFG_H_HS_SEL_SRC_POS	3
+enum {
+	DWAXIDMAC_HS_SEL_HW	= 0x0,
+	DWAXIDMAC_HS_SEL_SW
+};
+
+#define CH_CFG_H_TT_FC_POS	0
+enum {
+	DWAXIDMAC_TT_FC_MEM_TO_MEM_DMAC	= 0x0,
+	DWAXIDMAC_TT_FC_MEM_TO_PER_DMAC,
+	DWAXIDMAC_TT_FC_PER_TO_MEM_DMAC,
+	DWAXIDMAC_TT_FC_PER_TO_PER_DMAC,
+	DWAXIDMAC_TT_FC_PER_TO_MEM_SRC,
+	DWAXIDMAC_TT_FC_PER_TO_PER_SRC,
+	DWAXIDMAC_TT_FC_MEM_TO_PER_DST,
+	DWAXIDMAC_TT_FC_PER_TO_PER_DST
+};
+
+/* CH_CFG_L */
+#define CH_CFG_L_DST_MULTBLK_TYPE_POS	2
+#define CH_CFG_L_SRC_MULTBLK_TYPE_POS	0
+enum {
+	DWAXIDMAC_MBLK_TYPE_CONTIGUOUS	= 0x0,
+	DWAXIDMAC_MBLK_TYPE_RELOAD,
+	DWAXIDMAC_MBLK_TYPE_SHADOW_REG,
+	DWAXIDMAC_MBLK_TYPE_LL
+};
+
+/**
+ * Dw axi dma channel interrupts
+ *
+ * @DWAXIDMAC_IRQ_NONE: Bitmask of no one interrupt
+ * @DWAXIDMAC_IRQ_BLOCK_TRF: Block transfer complete
+ * @DWAXIDMAC_IRQ_DMA_TRF: Dma transfer complete
+ * @DWAXIDMAC_IRQ_SRC_TRAN: Source transaction complete
+ * @DWAXIDMAC_IRQ_DST_TRAN: Destination transaction complete
+ * @DWAXIDMAC_IRQ_SRC_DEC_ERR: Source decode error
+ * @DWAXIDMAC_IRQ_DST_DEC_ERR: Destination decode error
+ * @DWAXIDMAC_IRQ_SRC_SLV_ERR: Source slave error
+ * @DWAXIDMAC_IRQ_DST_SLV_ERR: Destination slave error
+ * @DWAXIDMAC_IRQ_LLI_RD_DEC_ERR: LLI read decode error
+ * @DWAXIDMAC_IRQ_LLI_WR_DEC_ERR: LLI write decode error
+ * @DWAXIDMAC_IRQ_LLI_RD_SLV_ERR: LLI read slave error
+ * @DWAXIDMAC_IRQ_LLI_WR_SLV_ERR: LLI write slave error
+ * @DWAXIDMAC_IRQ_INVALID_ERR: LLI invalide error or Shadow register error
+ * @DWAXIDMAC_IRQ_MULTIBLKTYPE_ERR: Slave Interface Multiblock type error
+ * @DWAXIDMAC_IRQ_DEC_ERR: Slave Interface decode error
+ * @DWAXIDMAC_IRQ_WR2RO_ERR: Slave Interface write to read only error
+ * @DWAXIDMAC_IRQ_RD2RWO_ERR: Slave Interface read to write only error
+ * @DWAXIDMAC_IRQ_WRONCHEN_ERR: Slave Interface write to channel error
+ * @DWAXIDMAC_IRQ_SHADOWREG_ERR: Slave Interface shadow reg error
+ * @DWAXIDMAC_IRQ_WRONHOLD_ERR: Slave Interface hold error
+ * @DWAXIDMAC_IRQ_LOCK_CLEARED: Lock Cleared Status
+ * @DWAXIDMAC_IRQ_SRC_SUSPENDED: Source Suspended Status
+ * @DWAXIDMAC_IRQ_SUSPENDED: Channel Suspended Status
+ * @DWAXIDMAC_IRQ_DISABLED: Channel Disabled Status
+ * @DWAXIDMAC_IRQ_ABORTED: Channel Aborted Status
+ * @DWAXIDMAC_IRQ_ALL_ERR: Bitmask of all error interrupts
+ * @DWAXIDMAC_IRQ_ALL: Bitmask of all interrupts
+ */
+enum {
+	DWAXIDMAC_IRQ_NONE		= 0x0,
+	DWAXIDMAC_IRQ_BLOCK_TRF		= BIT(0),
+	DWAXIDMAC_IRQ_DMA_TRF		= BIT(1),
+	DWAXIDMAC_IRQ_SRC_TRAN		= BIT(3),
+	DWAXIDMAC_IRQ_DST_TRAN		= BIT(4),
+	DWAXIDMAC_IRQ_SRC_DEC_ERR	= BIT(5),
+	DWAXIDMAC_IRQ_DST_DEC_ERR	= BIT(6),
+	DWAXIDMAC_IRQ_SRC_SLV_ERR	= BIT(7),
+	DWAXIDMAC_IRQ_DST_SLV_ERR	= BIT(8),
+	DWAXIDMAC_IRQ_LLI_RD_DEC_ERR	= BIT(9),
+	DWAXIDMAC_IRQ_LLI_WR_DEC_ERR	= BIT(10),
+	DWAXIDMAC_IRQ_LLI_RD_SLV_ERR	= BIT(11),
+	DWAXIDMAC_IRQ_LLI_WR_SLV_ERR	= BIT(12),
+	DWAXIDMAC_IRQ_INVALID_ERR	= BIT(13),
+	DWAXIDMAC_IRQ_MULTIBLKTYPE_ERR	= BIT(14),
+	DWAXIDMAC_IRQ_DEC_ERR		= BIT(16),
+	DWAXIDMAC_IRQ_WR2RO_ERR		= BIT(17),
+	DWAXIDMAC_IRQ_RD2RWO_ERR	= BIT(18),
+	DWAXIDMAC_IRQ_WRONCHEN_ERR	= BIT(19),
+	DWAXIDMAC_IRQ_SHADOWREG_ERR	= BIT(20),
+	DWAXIDMAC_IRQ_WRONHOLD_ERR	= BIT(21),
+	DWAXIDMAC_IRQ_LOCK_CLEARED	= BIT(27),
+	DWAXIDMAC_IRQ_SRC_SUSPENDED	= BIT(28),
+	DWAXIDMAC_IRQ_SUSPENDED		= BIT(29),
+	DWAXIDMAC_IRQ_DISABLED		= BIT(30),
+	DWAXIDMAC_IRQ_ABORTED		= BIT(31),
+	DWAXIDMAC_IRQ_ALL_ERR		= 0x003F7FE0,
+	DWAXIDMAC_IRQ_ALL		= 0xFFFFFFFF
+};
+
+enum {
+	DWAXIDMAC_TRANS_WIDTH_8		= 0x0,
+	DWAXIDMAC_TRANS_WIDTH_16,
+	DWAXIDMAC_TRANS_WIDTH_32,
+	DWAXIDMAC_TRANS_WIDTH_64,
+	DWAXIDMAC_TRANS_WIDTH_128,
+	DWAXIDMAC_TRANS_WIDTH_256,
+	DWAXIDMAC_TRANS_WIDTH_512,
+	DWAXIDMAC_TRANS_WIDTH_MAX	= DWAXIDMAC_TRANS_WIDTH_512
+};
+
+#endif /* _AXI_DMA_PLATFORM_H */
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/2] dmaengine: Add DW AXI DMAC driver
  2017-04-07 14:04 ` [PATCH v2 2/2] dmaengine: Add DW AXI DMAC driver Eugeniy Paltsev
@ 2017-04-18 12:31   ` Andy Shevchenko
  2017-04-21 14:29     ` Eugeniy Paltsev
  0 siblings, 1 reply; 14+ messages in thread
From: Andy Shevchenko @ 2017-04-18 12:31 UTC (permalink / raw)
  To: Eugeniy Paltsev, dmaengine
  Cc: linux-kernel, devicetree, linux-snps-arc, Dan Williams,
	Vinod Koul, Rob Herring, Alexey Brodkin

On Fri, 2017-04-07 at 17:04 +0300, Eugeniy Paltsev wrote:
> This patch adds support for the DW AXI DMAC controller.
> 
> DW AXI DMAC is a part of upcoming development board from Synopsys.
> 
> In this driver implementation only DMA_MEMCPY and DMA_SG transfers
> are supported.

> 

> +++ b/drivers/dma/axi_dma_platform.c
> @@ -0,0 +1,1044 @@
> +/*
> + * Synopsys DesignWare AXI DMA Controller driver.
> + *
> + * Copyright (C) 2017 Synopsys, Inc. (www.synopsys.com)
> + *
> + * This program is free software; you can redistribute it and/or
> modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#include <linux/bitops.h>
> +#include <linux/delay.h>
> +#include <linux/device.h>
> +#include <linux/dmaengine.h>
> +#include <linux/dmapool.h>
> +#include <linux/err.h>
> +#include <linux/interrupt.h>
> +#include <linux/io.h>
> +#include <linux/kernel.h>
> +#include <linux/module.h>

> +#include <linux/of.h>

Are you sure you still need of.h along with depends OF ?

> +#include <linux/platform_device.h>
> +#include <linux/pm_runtime.h>
> +#include <linux/property.h>
> +#include <linux/types.h>
> +

> +#include "axi_dma_platform.h"
> +#include "axi_dma_platform_reg.h"

Can't you have this in one header?

> +#include "dmaengine.h"
> +#include "virt-dma.h"

> +#define AXI_DMA_BUSWIDTHS		  \
> +	(DMA_SLAVE_BUSWIDTH_1_BYTE	| \
> +	DMA_SLAVE_BUSWIDTH_2_BYTES	| \
> +	DMA_SLAVE_BUSWIDTH_4_BYTES	| \
> +	DMA_SLAVE_BUSWIDTH_8_BYTES	| \
> +	DMA_SLAVE_BUSWIDTH_16_BYTES	| \
> +	DMA_SLAVE_BUSWIDTH_32_BYTES	| \
> +	DMA_SLAVE_BUSWIDTH_64_BYTES)
> +/* TODO: check: do we need to use BIT() macro here? */

Still TODO? I remember I answered to this on the first round.

> +
> +static inline void
> +axi_dma_iowrite32(struct axi_dma_chip *chip, u32 reg, u32 val)
> +{
> +	iowrite32(val, chip->regs + reg);

Are you going to use IO ports for this IP? I don't think so.
Wouldn't be better to call readl()/writel() instead?

> +}

> +static inline void
> +axi_chan_iowrite64(struct axi_dma_chan *chan, u32 reg, u64 val)
> +{
> +	iowrite32(val & 0xFFFFFFFF, chan->chan_regs + reg);

Useless conjunction.

> +	iowrite32(val >> 32, chan->chan_regs + reg + 4);
> +}

Can your hardware get 8 bytes at once?

For such cases we have iowrite64() for 64-bit kernels

But with readq()/writeq() we have specific helpers to make this pretty,
i.e. lo_hi_readq() / lo_hi_writeq() (or hi_lo_*() variants).

> +static inline void axi_chan_irq_disable(struct axi_dma_chan *chan,
> u32 irq_mask)
> +{
> +	u32 val;
> +

> +	if (likely(irq_mask == DWAXIDMAC_IRQ_ALL)) {
> +		axi_chan_iowrite32(chan, CH_INTSTATUS_ENA,
> DWAXIDMAC_IRQ_NONE);
> +	} else {

I don't see the benefit. (Yes, I see one read-less path, I think it
makes perplexity for nothing here)

> +		val = axi_chan_ioread32(chan, CH_INTSTATUS_ENA);
> +		val &= ~irq_mask;
> +		axi_chan_iowrite32(chan, CH_INTSTATUS_ENA, val);
> +	}
> +}

> +static inline void axi_chan_disable(struct axi_dma_chan *chan)
> +{

> +}
> +
> +static inline void axi_chan_enable(struct axi_dma_chan *chan)
> +{

> +}

> +static u32 axi_chan_get_xfer_width(struct axi_dma_chan *chan,
> dma_addr_t src,
> +				   dma_addr_t dst, size_t len)
> +{
> +	u32 max_width = chan->chip->dw->hdata->m_data_width;

> +	size_t sdl = (src | dst | len);

Redundant parens, redundant temporary variable (you may do this in
place).

> +
> +	return min_t(size_t, __ffs(sdl), max_width);
> +}

> +static void axi_desc_put(struct axi_dma_desc *desc)
> +{
> +	struct axi_dma_chan *chan = desc->chan;
> +	struct dw_axi_dma *dw = chan->chip->dw;
> +	struct axi_dma_desc *child, *_next;
> +	unsigned int descs_put = 0;

> +	list_for_each_entry_safe(child, _next, &desc->xfer_list,
> xfer_list) {

xfer_list looks redundant.
Can you elaborate why virtual channel management is not working for you?

> +		list_del(&child->xfer_list);
> +		dma_pool_free(dw->desc_pool, child, child-
> >vd.tx.phys);
> +		descs_put++;
> +	}

> +}

> +/* Called in chan locked context */
> +static void axi_chan_block_xfer_start(struct axi_dma_chan *chan,
> +				      struct axi_dma_desc *first)
> +{

> +	u32 reg, irq_mask;
> +	u8 lms = 0;

Does it make any sense? It looks like lms is always 0.
Or I miss the source of its value?

> +	u32 priority = chan->chip->dw->hdata->priority[chan->id];

Reversed xmas tree, please.

Btw, are you planning to use priority at all? For now on I didn't see a
single driver (from the set I have checked, like 4-5 of them) that uses
priority anyhow. It makes driver more complex for nothing.

> +
> +	if (unlikely(axi_chan_is_hw_enable(chan))) {
> +		dev_err(chan2dev(chan), "%s is non-idle!\n",
> +			axi_chan_name(chan));
> +
> +		return;
> +	}

> +}

> +static void dma_chan_free_chan_resources(struct dma_chan *dchan)
> +{
> +	struct axi_dma_chan *chan = dchan_to_axi_dma_chan(dchan);
> +
> +	/* ASSERT: channel hw is idle */
> +	if (axi_chan_is_hw_enable(chan))
> +		dev_err(dchan2dev(dchan), "%s is non-idle!\n",
> +			axi_chan_name(chan));
> +
> +	axi_chan_disable(chan);
> +	axi_chan_irq_disable(chan, DWAXIDMAC_IRQ_ALL);
> +
> +	vchan_free_chan_resources(&chan->vc);
> +
> +	dev_vdbg(dchan2dev(dchan), "%s: %s: descriptor still
> allocated: %u\n",
> +		__func__, axi_chan_name(chan),

Redundant __func__ parameter for debug prints.

> +		atomic_read(&chan->descs_allocated));
> +
> +	pm_runtime_put(chan->chip->dev);
> +}

> +static struct dma_async_tx_descriptor *
> +dma_chan_prep_dma_sg(struct dma_chan *dchan,
> +		     struct scatterlist *dst_sg, unsigned int
> dst_nents,
> +		     struct scatterlist *src_sg, unsigned int
> src_nents,
> +		     unsigned long flags)
> +{
> +	struct axi_dma_chan *chan = dchan_to_axi_dma_chan(dchan);
> +	struct axi_dma_desc *first = NULL, *desc = NULL, *prev =
> NULL;
> +	size_t dst_len = 0, src_len = 0, xfer_len = 0;
> +	dma_addr_t dst_adr = 0, src_adr = 0;
> +	u32 src_width, dst_width;
> +	size_t block_ts, max_block_ts;
> +	u32 reg;

> +	u8 lms = 0;

Same about lms.

> +
> +	dev_dbg(chan2dev(chan), "%s: %s: sn: %d dn: %d flags: 0x%lx",
> +		__func__, axi_chan_name(chan), src_nents, dst_nents,
> flags);

Ditto for __func__.

> +
> 

> +	if (unlikely(dst_nents == 0 || src_nents == 0))
> +		return NULL;
> +
> +	if (unlikely(dst_sg == NULL || src_sg == NULL))
> +		return NULL;

If we need those checks they should go to dmaengine.h/dmaengine.c.

> +static void axi_chan_list_dump_lli(struct axi_dma_chan *chan,
> +				   struct axi_dma_desc *desc_head)
> +{
> +	struct axi_dma_desc *desc;
> +
> +	axi_chan_dump_lli(chan, desc_head);
> +	list_for_each_entry(desc, &desc_head->xfer_list, xfer_list)
> +		axi_chan_dump_lli(chan, desc);
> +}

> +
> +

Redundant (one) empty line.

> +static void axi_chan_handle_err(struct axi_dma_chan *chan, u32
> status)
> +{

> +	/* WARN about bad descriptor */
> 
> +	dev_err(chan2dev(chan),
> +		"Bad descriptor submitted for %s, cookie: %d, irq:
> 0x%08x\n",
> +		axi_chan_name(chan), vd->tx.cookie, status);
> +	axi_chan_list_dump_lli(chan, vd_to_axi_desc(vd));

As I said earlier dw_dmac is *bad* example of the (virtual channel
based) DMA driver.

I guess you may just fail the descriptor and don't pretend it has been
processed successfully.

> +static int dma_chan_pause(struct dma_chan *dchan)
> +{
> +	struct axi_dma_chan *chan = dchan_to_axi_dma_chan(dchan);
> +	unsigned long flags;
> +	unsigned int timeout = 20; /* timeout iterations */
> +	int ret = -EAGAIN;
> +	u32 val;
> +
> +	spin_lock_irqsave(&chan->vc.lock, flags);
> +

> +	val = axi_dma_ioread32(chan->chip, DMAC_CHEN);
> +	val |= BIT(chan->id) << DMAC_CHAN_SUSP_SHIFT |
> +	       BIT(chan->id) << DMAC_CHAN_SUSP_WE_SHIFT;
> +	axi_dma_iowrite32(chan->chip, DMAC_CHEN, val);

You have helpers which you don't use. Why?

> +
> +	while (timeout--) {

In such cases I prefer do {} while (); to explicitly show that body goes
at least once.

> +		if (axi_chan_irq_read(chan) &
> DWAXIDMAC_IRQ_SUSPENDED) {
> +			ret = 0;
> +			break;
> +		}
> +		udelay(2);
> +	}
> +
> +	axi_chan_irq_clear(chan, DWAXIDMAC_IRQ_SUSPENDED);
> +
> +	chan->is_paused = true;
> +
> +	spin_unlock_irqrestore(&chan->vc.lock, flags);
> +
> +	return ret;
> +}
> +
> +/* Called in chan locked context */
> +static inline void axi_chan_resume(struct axi_dma_chan *chan)
> +{
> +	u32 val;
> +

> +	val = axi_dma_ioread32(chan->chip, DMAC_CHEN);
> +	val &= ~(BIT(chan->id) << DMAC_CHAN_SUSP_SHIFT);
> +	val |=  (BIT(chan->id) << DMAC_CHAN_SUSP_WE_SHIFT);
> +	axi_dma_iowrite32(chan->chip, DMAC_CHEN, val);

Use helper.

> +
> +	chan->is_paused = false;
> +}

> +static int axi_dma_runtime_suspend(struct device *dev)
> +{
> +	struct axi_dma_chip *chip = dev_get_drvdata(dev);
> +

> +	dev_info(dev, "PAL: %s\n", __func__);

Noisy and useless.
We have functional tracer in kernel. Use it.

> +
> +	axi_dma_irq_disable(chip);
> +	axi_dma_disable(chip);
> +
> +	clk_disable_unprepare(chip->clk);
> +
> +	return 0;
> +}
> +
> +static int axi_dma_runtime_resume(struct device *dev)
> +{
> +	struct axi_dma_chip *chip = dev_get_drvdata(dev);
> +	int ret = 0;
> +

> +	dev_info(dev, "PAL: %s\n", __func__);

Ditto.

> +
> +	ret = clk_prepare_enable(chip->clk);
> +	if (ret < 0)
> +		return ret;
> +
> +	axi_dma_enable(chip);
> +	axi_dma_irq_enable(chip);
> +
> +	return 0;
> +}

> +static int dw_probe(struct platform_device *pdev)
> +{
> +	struct axi_dma_chip *chip;
> +	struct resource *mem;
> +	struct dw_axi_dma *dw;
> +	struct dw_axi_dma_hcfg *hdata;
> +	u32 i;
> +	int ret;
> +
> +	chip = devm_kzalloc(&pdev->dev, sizeof(*chip), GFP_KERNEL);
> +	if (!chip)
> +		return -ENOMEM;
> +
> +	dw = devm_kzalloc(&pdev->dev, sizeof(*dw), GFP_KERNEL);
> +	if (!dw)
> +		return -ENOMEM;
> +
> +	hdata = devm_kzalloc(&pdev->dev, sizeof(*hdata), GFP_KERNEL);
> +	if (!hdata)
> +		return -ENOMEM;
> +
> +	chip->dw = dw;
> +	chip->dev = &pdev->dev;
> +	chip->dw->hdata = hdata;
> +
> +	chip->irq = platform_get_irq(pdev, 0);
> +	if (chip->irq < 0)
> +		return chip->irq;
> +
> +	mem = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> +	chip->regs = devm_ioremap_resource(chip->dev, mem);
> +	if (IS_ERR(chip->regs))
> +		return PTR_ERR(chip->regs);
> +
> +	chip->clk = devm_clk_get(chip->dev, NULL);
> +	if (IS_ERR(chip->clk))
> +		return PTR_ERR(chip->clk);
> +
> +	ret = parse_device_properties(chip);
> +	if (ret)
> +		return ret;
> +
> +	dw->chan = devm_kcalloc(chip->dev, hdata->nr_channels,
> +				sizeof(*dw->chan), GFP_KERNEL);
> +	if (!dw->chan)
> +		return -ENOMEM;
> +
> +	ret = devm_request_irq(chip->dev, chip->irq,
> dw_axi_dma_intretupt,
> +			       IRQF_SHARED, DRV_NAME, chip);
> +	if (ret)
> +		return ret;
> +
> +	/* Lli address must be aligned to a 64-byte boundary */
> +	dw->desc_pool = dmam_pool_create(DRV_NAME, chip->dev,
> +					 sizeof(struct axi_dma_desc),
> 64, 0);
> +	if (!dw->desc_pool) {
> +		dev_err(chip->dev, "No memory for descriptors dma
> pool\n");
> +		return -ENOMEM;
> +	}
> +
> +	INIT_LIST_HEAD(&dw->dma.channels);
> +	for (i = 0; i < hdata->nr_channels; i++) {
> +		struct axi_dma_chan *chan = &dw->chan[i];
> +
> +		chan->chip = chip;
> +		chan->id = (u8)i;
> +		chan->chan_regs = chip->regs + COMMON_REG_LEN + i *
> CHAN_REG_LEN;
> +		atomic_set(&chan->descs_allocated, 0);
> +
> +		chan->vc.desc_free = vchan_desc_put;
> +		vchan_init(&chan->vc, &dw->dma);
> +	}
> +
> +	/* Set capabilities */
> +	dma_cap_set(DMA_MEMCPY, dw->dma.cap_mask);
> +	dma_cap_set(DMA_SG, dw->dma.cap_mask);
> +
> +	/* DMA capabilities */
> +	dw->dma.chancnt = hdata->nr_channels;
> +	dw->dma.src_addr_widths = AXI_DMA_BUSWIDTHS;
> +	dw->dma.dst_addr_widths = AXI_DMA_BUSWIDTHS;
> +	dw->dma.directions = BIT(DMA_MEM_TO_MEM);
> +	dw->dma.residue_granularity =
> DMA_RESIDUE_GRANULARITY_DESCRIPTOR;
> +
> +	dw->dma.dev = chip->dev;
> +	dw->dma.device_tx_status = dma_chan_tx_status;
> +	dw->dma.device_issue_pending = dma_chan_issue_pending;
> +	dw->dma.device_terminate_all = dma_chan_terminate_all;
> +	dw->dma.device_pause = dma_chan_pause;
> +	dw->dma.device_resume = dma_chan_resume;
> +
> +	dw->dma.device_alloc_chan_resources =
> dma_chan_alloc_chan_resources;
> +	dw->dma.device_free_chan_resources =
> dma_chan_free_chan_resources;
> +
> +	dw->dma.device_prep_dma_memcpy = dma_chan_prep_dma_memcpy;
> +	dw->dma.device_prep_dma_sg = dma_chan_prep_dma_sg;
> +
> +	platform_set_drvdata(pdev, chip);
> +
> +	pm_runtime_enable(chip->dev);
> +
> +	/*
> +	 * We can't just call pm_runtime_get here instead of
> +	 * pm_runtime_get_noresume + axi_dma_runtime_resume because
> we need
> +	 * driver to work also without Runtime PM.
> +	 */
> +	pm_runtime_get_noresume(chip->dev);
> +	ret = axi_dma_runtime_resume(chip->dev);
> +	if (ret < 0)
> +		goto err_pm_disable;
> +
> +	axi_dma_hw_init(chip);
> +
> +	pm_runtime_put(chip->dev);
> +
> +	ret = dma_async_device_register(&dw->dma);
> +	if (ret)
> +		goto err_pm_disable;
> +
> +	dev_info(chip->dev, "DesignWare AXI DMA Controller, %d
> channels\n",
> +		 dw->hdata->nr_channels);
> +
> +	return 0;
> +
> +err_pm_disable:
> +	pm_runtime_disable(chip->dev);
> +
> +	return ret;
> +}
> +
> +static int dw_remove(struct platform_device *pdev)
> +{
> +	struct axi_dma_chip *chip = platform_get_drvdata(pdev);
> +	struct dw_axi_dma *dw = chip->dw;
> +	struct axi_dma_chan *chan, *_chan;
> +	u32 i;
> +
> +	/* Enable clk before accessing to registers */
> +	clk_prepare_enable(chip->clk);
> +	axi_dma_irq_disable(chip);
> +	for (i = 0; i < dw->hdata->nr_channels; i++) {
> +		axi_chan_disable(&chip->dw->chan[i]);
> +		axi_chan_irq_disable(&chip->dw->chan[i],
> DWAXIDMAC_IRQ_ALL);
> +	}
> +	axi_dma_disable(chip);
> +
> +	pm_runtime_disable(chip->dev);
> +	axi_dma_runtime_suspend(chip->dev);
> +
> +	devm_free_irq(chip->dev, chip->irq, chip);
> +
> +	list_for_each_entry_safe(chan, _chan, &dw->dma.channels,
> +			vc.chan.device_node) {
> +		list_del(&chan->vc.chan.device_node);
> +		tasklet_kill(&chan->vc.task);
> +	}
> +
> +	dma_async_device_unregister(&dw->dma);
> +
> +	return 0;
> +}
> +

> +static const struct dev_pm_ops dw_axi_dma_pm_ops = {
> +	SET_RUNTIME_PM_OPS(axi_dma_runtime_suspend,
> axi_dma_runtime_resume, NULL)
> +};

Have you tried to build with CONFIG_PM disabled?

I'm pretty sure you need __maybe_unused applied to your PM ops.

> +struct axi_dma_chan {
> +	struct axi_dma_chip		*chip;
> +	void __iomem			*chan_regs;
> +	u8				id;
> +	atomic_t			descs_allocated;
> +
> +	struct virt_dma_chan		vc;
> +

> +	/* these other elements are all protected by vc.lock */
> +	bool				is_paused;

I still didn't get (already forgot) why you can't use dma_status instead
for the active descriptor?

> +};

> +/* LLI == Linked List Item */
> +struct __attribute__ ((__packed__)) axi_dma_lli {

...

> +	__le64		sar;
> +	__le64		dar;
> +	__le32		block_ts_lo;
> +	__le32		block_ts_hi;
> +	__le64		llp;
> +	__le32		ctl_lo;
> +	__le32		ctl_hi;
> +	__le32		sstat;
> +	__le32		dstat;
> +	__le32		status_lo;
> +	__le32		ststus_hi;
> +	__le32		reserved_lo;
> +	__le32		reserved_hi;
> +};

Just __packed here.

> +
> +struct axi_dma_desc {
> +	struct axi_dma_lli		lli;
> +
> +	struct virt_dma_desc		vd;
> +	struct axi_dma_chan		*chan;

> +	struct list_head		xfer_list;

This looks redundant. Already asked above about it.

> +};
> +

> +/* Common registers offset */
> +#define DMAC_ID			0x000 /* R DMAC ID */
> +#define DMAC_COMPVER		0x008 /* R DMAC Component Version
> */
> +#define DMAC_CFG		0x010 /* R/W DMAC Configuration */
> +#define DMAC_CHEN		0x018 /* R/W DMAC Channel Enable */
> +#define DMAC_CHEN_L		0x018 /* R/W DMAC Channel Enable
> 00-31 */
> +#define DMAC_CHEN_H		0x01C /* R/W DMAC Channel Enable
> 32-63 */
> +#define DMAC_INTSTATUS		0x030 /* R DMAC Interrupt
> Status */
> +#define DMAC_COMMON_INTCLEAR	0x038 /* W DMAC Interrupt Clear
> */
> +#define DMAC_COMMON_INTSTATUS_ENA 0x040 /* R DMAC Interrupt Status
> Enable */
> +#define DMAC_COMMON_INTSIGNAL_ENA 0x048 /* R/W DMAC Interrupt Signal
> Enable */
> +#define DMAC_COMMON_INTSTATUS	0x050 /* R DMAC Interrupt Status
> */
> +#define DMAC_RESET		0x058 /* R DMAC Reset Register1 */
> +
> +/* DMA channel registers offset */
> +#define CH_SAR			0x000 /* R/W Chan Source
> Address */
> +#define CH_DAR			0x008 /* R/W Chan Destination
> Address */
> +#define CH_BLOCK_TS		0x010 /* R/W Chan Block Transfer
> Size */
> +#define CH_CTL			0x018 /* R/W Chan Control */
> +#define CH_CTL_L		0x018 /* R/W Chan Control 00-31 */
> +#define CH_CTL_H		0x01C /* R/W Chan Control 32-63 */
> +#define CH_CFG			0x020 /* R/W Chan Configuration
> */
> +#define CH_CFG_L		0x020 /* R/W Chan Configuration 00-31 
> */
> +#define CH_CFG_H		0x024 /* R/W Chan Configuration 32-63 
> */
> +#define CH_LLP			0x028 /* R/W Chan Linked List
> Pointer */
> +#define CH_STATUS		0x030 /* R Chan Status */
> +#define CH_SWHSSRC		0x038 /* R/W Chan SW Handshake
> Source */
> +#define CH_SWHSDST		0x040 /* R/W Chan SW Handshake
> Destination */
> +#define CH_BLK_TFR_RESUMEREQ	0x048 /* W Chan Block Transfer
> Resume Req */
> +#define CH_AXI_ID		0x050 /* R/W Chan AXI ID */
> +#define CH_AXI_QOS		0x058 /* R/W Chan AXI QOS */
> +#define CH_SSTAT		0x060 /* R Chan Source Status */
> +#define CH_DSTAT		0x068 /* R Chan Destination Status */
> +#define CH_SSTATAR		0x070 /* R/W Chan Source Status
> Fetch Addr */
> +#define CH_DSTATAR		0x078 /* R/W Chan Destination
> Status Fetch Addr */
> +#define CH_INTSTATUS_ENA	0x080 /* R/W Chan Interrupt Status
> Enable */
> +#define CH_INTSTATUS		0x088 /* R/W Chan Interrupt
> Status */
> +#define CH_INTSIGNAL_ENA	0x090 /* R/W Chan Interrupt Signal
> Enable */
> +#define CH_INTCLEAR		0x098 /* W Chan Interrupt Clear */

I'm wondering if you can use regmap API instead.

> +
> +

Redundant (one) empty line.

> +/* DMAC_CFG */
> +#define DMAC_EN_MASK		0x00000001U

GENMASK()

> +#define DMAC_EN_POS		0

Usually _SHIFT, but it's up to you.

> +
> +#define INT_EN_MASK		0x00000002U

GENMASK()

> +#define INT_EN_POS		1
> +

_SHIFT ?

> +#define CH_CTL_L_DST_MSIZE_POS	18
> +#define CH_CTL_L_SRC_MSIZE_POS	14

Ditto.

> +enum {
> +	DWAXIDMAC_BURST_TRANS_LEN_1	= 0x0,
> +	DWAXIDMAC_BURST_TRANS_LEN_4,
> +	DWAXIDMAC_BURST_TRANS_LEN_8,
> +	DWAXIDMAC_BURST_TRANS_LEN_16,
> +	DWAXIDMAC_BURST_TRANS_LEN_32,
> +	DWAXIDMAC_BURST_TRANS_LEN_64,
> +	DWAXIDMAC_BURST_TRANS_LEN_128,
> +	DWAXIDMAC_BURST_TRANS_LEN_256,
> +	DWAXIDMAC_BURST_TRANS_LEN_512,
> +	DWAXIDMAC_BURST_TRANS_LEN_1024

..._1024, ?

> +};

Hmm... do you need them in the header?

> +#define CH_CFG_H_TT_FC_POS	0
> +enum {
> +	DWAXIDMAC_TT_FC_MEM_TO_MEM_DMAC	= 0x0,
> +	DWAXIDMAC_TT_FC_MEM_TO_PER_DMAC,
> +	DWAXIDMAC_TT_FC_PER_TO_MEM_DMAC,
> +	DWAXIDMAC_TT_FC_PER_TO_PER_DMAC,
> +	DWAXIDMAC_TT_FC_PER_TO_MEM_SRC,
> +	DWAXIDMAC_TT_FC_PER_TO_PER_SRC,
> +	DWAXIDMAC_TT_FC_MEM_TO_PER_DST,
> +	DWAXIDMAC_TT_FC_PER_TO_PER_DST
> +};

Some of definitions are the same as for dw_dmac, right? We might split
them to a common header, though I have no strong opinion about it.
Vinod?

> +/**
> + * Dw axi dma channel interrupts

Dw axi dma - > DW AXI DMA?

> + *
> + * @DWAXIDMAC_IRQ_NONE: Bitmask of no one interrupt
> + * @DWAXIDMAC_IRQ_BLOCK_TRF: Block transfer complete
> + * @DWAXIDMAC_IRQ_DMA_TRF: Dma transfer complete
> + * @DWAXIDMAC_IRQ_SRC_TRAN: Source transaction complete
> + * @DWAXIDMAC_IRQ_DST_TRAN: Destination transaction complete
> + * @DWAXIDMAC_IRQ_SRC_DEC_ERR: Source decode error
> + * @DWAXIDMAC_IRQ_DST_DEC_ERR: Destination decode error
> + * @DWAXIDMAC_IRQ_SRC_SLV_ERR: Source slave error
> + * @DWAXIDMAC_IRQ_DST_SLV_ERR: Destination slave error
> + * @DWAXIDMAC_IRQ_LLI_RD_DEC_ERR: LLI read decode error
> + * @DWAXIDMAC_IRQ_LLI_WR_DEC_ERR: LLI write decode error
> + * @DWAXIDMAC_IRQ_LLI_RD_SLV_ERR: LLI read slave error
> + * @DWAXIDMAC_IRQ_LLI_WR_SLV_ERR: LLI write slave error
> + * @DWAXIDMAC_IRQ_INVALID_ERR: LLI invalide error or Shadow register
> error
> + * @DWAXIDMAC_IRQ_MULTIBLKTYPE_ERR: Slave Interface Multiblock type
> error
> + * @DWAXIDMAC_IRQ_DEC_ERR: Slave Interface decode error
> + * @DWAXIDMAC_IRQ_WR2RO_ERR: Slave Interface write to read only error
> + * @DWAXIDMAC_IRQ_RD2RWO_ERR: Slave Interface read to write only
> error
> + * @DWAXIDMAC_IRQ_WRONCHEN_ERR: Slave Interface write to channel
> error
> + * @DWAXIDMAC_IRQ_SHADOWREG_ERR: Slave Interface shadow reg error
> + * @DWAXIDMAC_IRQ_WRONHOLD_ERR: Slave Interface hold error
> + * @DWAXIDMAC_IRQ_LOCK_CLEARED: Lock Cleared Status
> + * @DWAXIDMAC_IRQ_SRC_SUSPENDED: Source Suspended Status
> + * @DWAXIDMAC_IRQ_SUSPENDED: Channel Suspended Status
> + * @DWAXIDMAC_IRQ_DISABLED: Channel Disabled Status
> + * @DWAXIDMAC_IRQ_ABORTED: Channel Aborted Status
> + * @DWAXIDMAC_IRQ_ALL_ERR: Bitmask of all error interrupts
> + * @DWAXIDMAC_IRQ_ALL: Bitmask of all interrupts
> + */
> +enum {
> +	DWAXIDMAC_IRQ_NONE		= 0x0,

Just 0.

> +	DWAXIDMAC_IRQ_BLOCK_TRF		= BIT(0),
> +	DWAXIDMAC_IRQ_DMA_TRF		= BIT(1),
> +	DWAXIDMAC_IRQ_SRC_TRAN		= BIT(3),
> +	DWAXIDMAC_IRQ_DST_TRAN		= BIT(4),
> +	DWAXIDMAC_IRQ_SRC_DEC_ERR	= BIT(5),
> +	DWAXIDMAC_IRQ_DST_DEC_ERR	= BIT(6),
> +	DWAXIDMAC_IRQ_SRC_SLV_ERR	= BIT(7),
> +	DWAXIDMAC_IRQ_DST_SLV_ERR	= BIT(8),
> +	DWAXIDMAC_IRQ_LLI_RD_DEC_ERR	= BIT(9),
> +	DWAXIDMAC_IRQ_LLI_WR_DEC_ERR	= BIT(10),
> +	DWAXIDMAC_IRQ_LLI_RD_SLV_ERR	= BIT(11),
> +	DWAXIDMAC_IRQ_LLI_WR_SLV_ERR	= BIT(12),
> +	DWAXIDMAC_IRQ_INVALID_ERR	= BIT(13),
> +	DWAXIDMAC_IRQ_MULTIBLKTYPE_ERR	= BIT(14),
> +	DWAXIDMAC_IRQ_DEC_ERR		= BIT(16),
> +	DWAXIDMAC_IRQ_WR2RO_ERR		= BIT(17),
> +	DWAXIDMAC_IRQ_RD2RWO_ERR	= BIT(18),
> +	DWAXIDMAC_IRQ_WRONCHEN_ERR	= BIT(19),
> +	DWAXIDMAC_IRQ_SHADOWREG_ERR	= BIT(20),
> +	DWAXIDMAC_IRQ_WRONHOLD_ERR	= BIT(21),
> +	DWAXIDMAC_IRQ_LOCK_CLEARED	= BIT(27),
> +	DWAXIDMAC_IRQ_SRC_SUSPENDED	= BIT(28),
> +	DWAXIDMAC_IRQ_SUSPENDED		= BIT(29),
> +	DWAXIDMAC_IRQ_DISABLED		= BIT(30),
> +	DWAXIDMAC_IRQ_ABORTED		= BIT(31),

> +	DWAXIDMAC_IRQ_ALL_ERR		= 0x003F7FE0,

GENMASK() | GENMASK() ?

> +	DWAXIDMAC_IRQ_ALL		= 0xFFFFFFFF

GENMASK()

> +};

-- 
Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Intel Finland Oy

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/2] dmaengine: Add DW AXI DMAC driver
  2017-04-18 12:31   ` Andy Shevchenko
@ 2017-04-21 14:29     ` Eugeniy Paltsev
  2017-04-21 15:13       ` Andy Shevchenko
  0 siblings, 1 reply; 14+ messages in thread
From: Eugeniy Paltsev @ 2017-04-21 14:29 UTC (permalink / raw)
  To: andriy.shevchenko
  Cc: vinod.koul, linux-kernel, robh+dt, Alexey.Brodkin, devicetree,
	Eugeniy.Paltsev, linux-snps-arc, dan.j.williams, dmaengine

Hi Andy,
thanks for respond.
My comments are inlined below.

On Tue, 2017-04-18 at 15:31 +0300, Andy Shevchenko wrote:
> On Fri, 2017-04-07 at 17:04 +0300, Eugeniy Paltsev wrote:
> > This patch adds support for the DW AXI DMAC controller.
> >
> > DW AXI DMAC is a part of upcoming development board from Synopsys.
> >
> > In this driver implementation only DMA_MEMCPY and DMA_SG transfers
> > are supported.
> >
> > +++ b/drivers/dma/axi_dma_platform.c
> > @@ -0,0 +1,1044 @@
> > +#include <linux/bitops.h>
> > +#include <linux/delay.h>
> > +#include <linux/device.h>
> > +#include <linux/dmaengine.h>
> > +#include <linux/dmapool.h>
> > +#include <linux/err.h>
> > +#include <linux/interrupt.h>
> > +#include <linux/io.h>
> > +#include <linux/kernel.h>
> > +#include <linux/module.h>
> > +#include <linux/of.h>
>
> Are you sure you still need of.h along with depends OF ?
"of_match_ptr" used from of.h

> > +#include <linux/platform_device.h>
> > +#include <linux/pm_runtime.h>
> > +#include <linux/property.h>
> > +#include <linux/types.h>
> > +
> > +#include "axi_dma_platform.h"
> > +#include "axi_dma_platform_reg.h"
>
> Can't you have this in one header?
Sure.

> > +#include "dmaengine.h"
> > +#include "virt-dma.h"
> > +#define AXI_DMA_BUSWIDTHS		  \
> > +	(DMA_SLAVE_BUSWIDTH_1_BYTE	| \
> > +	DMA_SLAVE_BUSWIDTH_2_BYTES	| \
> > +	DMA_SLAVE_BUSWIDTH_4_BYTES	| \
> > +	DMA_SLAVE_BUSWIDTH_8_BYTES	| \
> > +	DMA_SLAVE_BUSWIDTH_16_BYTES	| \
> > +	DMA_SLAVE_BUSWIDTH_32_BYTES	| \
> > +	DMA_SLAVE_BUSWIDTH_64_BYTES)
> > +/* TODO: check: do we need to use BIT() macro here? */
>
> Still TODO? I remember I answered to this on the first round.
Yes, I remember it.
I left this TODO as a reminder because src_addr_widths and
dst_addr_widths are
not used anywhere and they are set differently in different drivers
(with or without BIT macro).

> > +
> > +static inline void
> > +axi_dma_iowrite32(struct axi_dma_chip *chip, u32 reg, u32 val)
> > +{
> > +	iowrite32(val, chip->regs + reg);
>
> Are you going to use IO ports for this IP? I don't think so.
> Wouldn't be better to call readl()/writel() instead?
As I understand, it's better to use ioread/iowrite as more universal IO
access way. Am I wrong?

> > +}
> > +static inline void
> > +axi_chan_iowrite64(struct axi_dma_chan *chan, u32 reg, u64 val)
> > +{
> > +	iowrite32(val & 0xFFFFFFFF, chan->chan_regs + reg);
> Useless conjunction.
>
> > +	iowrite32(val >> 32, chan->chan_regs + reg + 4);
> > +}
>
> Can your hardware get 8 bytes at once?
> For such cases we have iowrite64() for 64-bit kernels
>
> But with readq()/writeq() we have specific helpers to make this
> pretty,
> i.e. lo_hi_readq() / lo_hi_writeq() (or hi_lo_*() variants).
Ok, I possibly can use lo_hi_writeq here.

> > +static inline void axi_chan_irq_disable(struct axi_dma_chan *chan,
> > u32 irq_mask)
> > +{
> > +	u32 val;
> > +
> > +	if (likely(irq_mask == DWAXIDMAC_IRQ_ALL)) {
> > +		axi_chan_iowrite32(chan, CH_INTSTATUS_ENA,
> > DWAXIDMAC_IRQ_NONE);
> > +	} else {
>
> I don't see the benefit. (Yes, I see one read-less path, I think it
> makes perplexity for nothing here)
This function is called mostly with irq_mask = DWAXIDMAC_IRQ_ALL.
Actually it is called only with irq_mask = DWAXIDMAC_IRQ_ALL until I
add DMA_SLAVE support.
But I can cut off this 'if' statment, if it is necessery.

> > +		val = axi_chan_ioread32(chan, CH_INTSTATUS_ENA);
> > +		val &= ~irq_mask;
> > +		axi_chan_iowrite32(chan, CH_INTSTATUS_ENA, val);
> > +	}
> > +}
> > +static inline void axi_chan_disable(struct axi_dma_chan *chan)
> > +{
> > +}
> > +
> > +static inline void axi_chan_enable(struct axi_dma_chan *chan)
> > +{
> > +}
> > +static u32 axi_chan_get_xfer_width(struct axi_dma_chan *chan,
> > dma_addr_t src,
> > +				   dma_addr_t dst, size_t len)
> > +{
> > +	u32 max_width = chan->chip->dw->hdata->m_data_width;
> > +	size_t sdl = (src | dst | len);
>
> Redundant parens, redundant temporary variable (you may do this in
> place).
Ok.

> > +
> > +	return min_t(size_t, __ffs(sdl), max_width);
> > +}
> > +static void axi_desc_put(struct axi_dma_desc *desc)
> > +{
> > +	struct axi_dma_chan *chan = desc->chan;
> > +	struct dw_axi_dma *dw = chan->chip->dw;
> > +	struct axi_dma_desc *child, *_next;
> > +	unsigned int descs_put = 0;
> > +	list_for_each_entry_safe(child, _next, &desc->xfer_list,
> > xfer_list) {
>
> xfer_list looks redundant.
> Can you elaborate why virtual channel management is not working for
> you?
Each virtual descriptor encapsulates several hardware descriptors,
which belong to same transfer.
This list (xfer_list) is used only for allocating/freeing these
descriptors and it doesn't affect on virtual dma work logic.
I can see this approach in several drivers with VirtDMA (but they
mostly use array instead of list)

> > +		list_del(&child->xfer_list);
> > +		dma_pool_free(dw->desc_pool, child, child-
> > > vd.tx.phys);
> >
> > +		descs_put++;
> > +	}
> > +}
> > +/* Called in chan locked context */
> > +static void axi_chan_block_xfer_start(struct axi_dma_chan *chan,
> > +				      struct axi_dma_desc *first)
> > +{
> > +	u32 reg, irq_mask;
> > +	u8 lms = 0;
>
> Does it make any sense? It looks like lms is always 0.
> Or I miss the source of its value?
lms variable uset to determine axi bus for reading lli descriptors. It
is equal to 0 for mem-to-mem transfers. Perhaps it is better to use
define instead of this variable.

> > +	u32 priority = chan->chip->dw->hdata->priority[chan->id];
>
> Reversed xmas tree, please.
>
> Btw, are you planning to use priority at all? For now on I didn't see
> a single driver (from the set I have checked, like 4-5 of them) that
> uses priority anyhow. It makes driver more complex for nothing.
Only for dma slave operations.

>
> > +
> > +	if (unlikely(axi_chan_is_hw_enable(chan))) {
> > +		dev_err(chan2dev(chan), "%s is non-idle!\n",
> > +			axi_chan_name(chan));
> > +
> > +		return;
> > +	}
> > +}
> > +static void dma_chan_free_chan_resources(struct dma_chan *dchan)
> > +{
> > +	struct axi_dma_chan *chan = dchan_to_axi_dma_chan(dchan);
> > +
> > +	/* ASSERT: channel hw is idle */
> > +	if (axi_chan_is_hw_enable(chan))
> > +		dev_err(dchan2dev(dchan), "%s is non-idle!\n",
> > +			axi_chan_name(chan));
> > +
> > +	axi_chan_disable(chan);
> > +	axi_chan_irq_disable(chan, DWAXIDMAC_IRQ_ALL);
> > +
> > +	vchan_free_chan_resources(&chan->vc);
> > +
> > +	dev_vdbg(dchan2dev(dchan), "%s: %s: descriptor still
> > allocated: %u\n",
> > +		__func__, axi_chan_name(chan),
>
> Redundant __func__ parameter for debug prints.
>
> > +		atomic_read(&chan->descs_allocated));
> > +
> > +	pm_runtime_put(chan->chip->dev);
> > +}
> > +static struct dma_async_tx_descriptor *
> > +dma_chan_prep_dma_sg(struct dma_chan *dchan,
> > +		     struct scatterlist *dst_sg, unsigned int
> > dst_nents,
> > +		     struct scatterlist *src_sg, unsigned int
> > src_nents,
> > +		     unsigned long flags)
> > +{
> > +	struct axi_dma_chan *chan = dchan_to_axi_dma_chan(dchan);
> > +	struct axi_dma_desc *first = NULL, *desc = NULL, *prev =
> > NULL;
> > +	size_t dst_len = 0, src_len = 0, xfer_len = 0;
> > +	dma_addr_t dst_adr = 0, src_adr = 0;
> > +	u32 src_width, dst_width;
> > +	size_t block_ts, max_block_ts;
> > +	u32 reg;
> > +	u8 lms = 0;
>
> Same about lms.
>
> > +
> > +	dev_dbg(chan2dev(chan), "%s: %s: sn: %d dn: %d flags:
> > 0x%lx",
> > +		__func__, axi_chan_name(chan), src_nents,
> > dst_nents,
> > flags);
>
> Ditto for __func__.
>
> > +
> >
> > +	if (unlikely(dst_nents == 0 || src_nents == 0))
> > +		return NULL;
> > +
> > +	if (unlikely(dst_sg == NULL || src_sg == NULL))
> > +		return NULL;
>
> If we need those checks they should go to dmaengine.h/dmaengine.c.
I checked several drivers, which implements device_prep_dma_sg and they
implements this checkers.
Should I add something like "dma_sg_desc_invalid" function to
dmaengine.h ?

> > +static void axi_chan_list_dump_lli(struct axi_dma_chan *chan,
> > +				   struct axi_dma_desc *desc_head)
> > +{
> > +	struct axi_dma_desc *desc;
> > +
> > +	axi_chan_dump_lli(chan, desc_head);
> > +	list_for_each_entry(desc, &desc_head->xfer_list,
> > xfer_list)
> > +		axi_chan_dump_lli(chan, desc);
> > +}
> > +
> > +static void axi_chan_handle_err(struct axi_dma_chan *chan, u32
> > status)
> > +{
> > +	/* WARN about bad descriptor */
> >
> > +	dev_err(chan2dev(chan),
> > +		"Bad descriptor submitted for %s, cookie: %d, irq:
> > 0x%08x\n",
> > +		axi_chan_name(chan), vd->tx.cookie, status);
> > +	axi_chan_list_dump_lli(chan, vd_to_axi_desc(vd));
>
> As I said earlier dw_dmac is *bad* example of the (virtual channel
> based) DMA driver.
>
> I guess you may just fail the descriptor and don't pretend it has
> been processed successfully.
What do you mean by saying "fail the descriptor"?
After I get error I cancel current transfer and free all descriptors
from it (by calling vchan_cookie_complete).
I can't store error status in descriptor structure because it will be
freed by vchan_cookie_complete.
I can't store error status in channel structure because it will be
overwritten by next transfer.


> > +static int dma_chan_pause(struct dma_chan *dchan)
> > +{
> > +	struct axi_dma_chan *chan = dchan_to_axi_dma_chan(dchan);
> > +	unsigned long flags;
> > +	unsigned int timeout = 20; /* timeout iterations */
> > +	int ret = -EAGAIN;
> > +	u32 val;
> > +
> > +	spin_lock_irqsave(&chan->vc.lock, flags);
> > +
> > +	val = axi_dma_ioread32(chan->chip, DMAC_CHEN);
> > +	val |= BIT(chan->id) << DMAC_CHAN_SUSP_SHIFT |
> > +	       BIT(chan->id) << DMAC_CHAN_SUSP_WE_SHIFT;
> > +	axi_dma_iowrite32(chan->chip, DMAC_CHEN, val);
>
> You have helpers which you don't use. Why?
Ok, will use.

> > +
> > +	while (timeout--) {
>
> In such cases I prefer do {} while (); to explicitly show that body
> goes at least once.
Good idea. Will change to do {} while () here.

> > +		if (axi_chan_irq_read(chan) &
> > DWAXIDMAC_IRQ_SUSPENDED) {
> > +			ret = 0;
> > +			break;
> > +		}
> > +		udelay(2);
> > +	}
> > +
> > +	axi_chan_irq_clear(chan, DWAXIDMAC_IRQ_SUSPENDED);
> > +
> > +	chan->is_paused = true;
> > +
> > +	spin_unlock_irqrestore(&chan->vc.lock, flags);
> > +
> > +	return ret;
> > +}
> > +
> > +/* Called in chan locked context */
> > +static inline void axi_chan_resume(struct axi_dma_chan *chan)
> > +{
> > +	u32 val;
> > +
> > +	val = axi_dma_ioread32(chan->chip, DMAC_CHEN);
> > +	val &= ~(BIT(chan->id) << DMAC_CHAN_SUSP_SHIFT);
> > +	val |=  (BIT(chan->id) << DMAC_CHAN_SUSP_WE_SHIFT);
> > +	axi_dma_iowrite32(chan->chip, DMAC_CHEN, val);
> > +
> > +	chan->is_paused = false;
> > +}
> > +static int axi_dma_runtime_suspend(struct device *dev)
> > +{
> > +	struct axi_dma_chip *chip = dev_get_drvdata(dev);
> > +
> > +	dev_info(dev, "PAL: %s\n", __func__);
>
> Noisy and useless.
> We have functional tracer in kernel. Use it.
Ok.

> > +
> > +	axi_dma_irq_disable(chip);
> > +	axi_dma_disable(chip);
> > +
> > +	clk_disable_unprepare(chip->clk);
> > +
> > +	return 0;
> > +}

[snip]

> > +
> > +static const struct dev_pm_ops dw_axi_dma_pm_ops = {
> > +	SET_RUNTIME_PM_OPS(axi_dma_runtime_suspend,
> > axi_dma_runtime_resume, NULL)
> > +};
>
> Have you tried to build with CONFIG_PM disabled?
Yes.

> I'm pretty sure you need __maybe_unused applied to your PM ops.
I call axi_dma_runtime_suspend / axi_dma_runtime_resume even I dont't
use PM.
(I call them in probe / remove function.)
So I don't need to declare them with __maybe_unused.

> > +struct axi_dma_chan {
> > +	struct axi_dma_chip		*chip;
> > +	void __iomem			*chan_regs;
> > +	u8				id;
> > +	atomic_t			descs_allocated;
> > +
> > +	struct virt_dma_chan		vc;
> > +
> > +	/* these other elements are all protected by vc.lock */
> > +	bool				is_paused;
>
> I still didn't get (already forgot) why you can't use dma_status
> instead for the active descriptor?
As I said before, I checked several driver, which have status variable
in their channel structure - it is used *only* for determinating is
channel paused or not. So there is no much sense in replacing
"is_paused" to "status" and I left "is_paused" variable untouched.

(I described above why we can't use status in channel structure for
error handling)

> > +};
> > +/* LLI == Linked List Item */
> > +struct __attribute__ ((__packed__)) axi_dma_lli {
>
> ...
>
> > +	__le64		sar;
> > +	__le64		dar;
> > +	__le32		block_ts_lo;
> > +	__le32		block_ts_hi;
> > +	__le64		llp;
> > +	__le32		ctl_lo;
> > +	__le32		ctl_hi;
> > +	__le32		sstat;
> > +	__le32		dstat;
> > +	__le32		status_lo;
> > +	__le32		ststus_hi;
> > +	__le32		reserved_lo;
> > +	__le32		reserved_hi;
> > +};
>
> Just __packed here.
>
Ok.

> > +
> > +struct axi_dma_desc {
> > +	struct axi_dma_lli		lli;
> > +
> > +	struct virt_dma_desc		vd;
> > +	struct axi_dma_chan		*chan;
> > +	struct list_head		xfer_list;
>
> This looks redundant. Already asked above about it.
Answered above.

> > +};
> > +
> > +/* Common registers offset */
> > +#define DMAC_ID			0x000 /* R DMAC ID */
> > +#define DMAC_COMPVER		0x008 /* R DMAC Component
> > Version
> > */
> > +#define DMAC_CFG		0x010 /* R/W DMAC Configuration */
> > +#define DMAC_CHEN		0x018 /* R/W DMAC Channel Enable
> > */
> > +#define DMAC_CHEN_L		0x018 /* R/W DMAC Channel
> > Enable
> > 00-31 */
> > +#define DMAC_CHEN_H		0x01C /* R/W DMAC Channel
> > Enable
> > 32-63 */
> > +#define DMAC_INTSTATUS		0x030 /* R DMAC Interrupt
> > Status */
> > +#define DMAC_COMMON_INTCLEAR	0x038 /* W DMAC Interrupt
> > Clear
> > */
> > +#define DMAC_COMMON_INTSTATUS_ENA 0x040 /* R DMAC Interrupt Status
> > Enable */
> > +#define DMAC_COMMON_INTSIGNAL_ENA 0x048 /* R/W DMAC Interrupt
> > Signal
> > Enable */
> > +#define DMAC_COMMON_INTSTATUS	0x050 /* R DMAC Interrupt
> > Status
> > */
> > +#define DMAC_RESET		0x058 /* R DMAC Reset Register1
> > */
> > +
> > +/* DMA channel registers offset */
> > +#define CH_SAR			0x000 /* R/W Chan Source
> > Address */
> > +#define CH_DAR			0x008 /* R/W Chan
> > Destination
> > Address */
> > +#define CH_BLOCK_TS		0x010 /* R/W Chan Block
> > Transfer
> > Size */
> > +#define CH_CTL			0x018 /* R/W Chan Control */
> > +#define CH_CTL_L		0x018 /* R/W Chan Control 00-31 */
> > +#define CH_CTL_H		0x01C /* R/W Chan Control 32-63 */
> > +#define CH_CFG			0x020 /* R/W Chan
> > Configuration
> > */
> > +#define CH_CFG_L		0x020 /* R/W Chan Configuration
> > 00-31
> > */
> > +#define CH_CFG_H		0x024 /* R/W Chan Configuration
> > 32-63
> > */
> > +#define CH_LLP			0x028 /* R/W Chan Linked
> > List
> > Pointer */
> > +#define CH_STATUS		0x030 /* R Chan Status */
> > +#define CH_SWHSSRC		0x038 /* R/W Chan SW Handshake
> > Source */
> > +#define CH_SWHSDST		0x040 /* R/W Chan SW Handshake
> > Destination */
> > +#define CH_BLK_TFR_RESUMEREQ	0x048 /* W Chan Block Transfer
> > Resume Req */
> > +#define CH_AXI_ID		0x050 /* R/W Chan AXI ID */
> > +#define CH_AXI_QOS		0x058 /* R/W Chan AXI QOS */
> > +#define CH_SSTAT		0x060 /* R Chan Source Status */
> > +#define CH_DSTAT		0x068 /* R Chan Destination Status
> > */
> > +#define CH_SSTATAR		0x070 /* R/W Chan Source Status
> > Fetch Addr */
> > +#define CH_DSTATAR		0x078 /* R/W Chan Destination
> > Status Fetch Addr */
> > +#define CH_INTSTATUS_ENA	0x080 /* R/W Chan Interrupt Status
> > Enable */
> > +#define CH_INTSTATUS		0x088 /* R/W Chan Interrupt
> > Status */
> > +#define CH_INTSIGNAL_ENA	0x090 /* R/W Chan Interrupt Signal
> > Enable */
> > +#define CH_INTCLEAR		0x098 /* W Chan Interrupt Clear
> > */
>
> I'm wondering if you can use regmap API instead.
Is it really necessary? It'll make driver more complex for nothing.
>
> > +/* DMAC_CFG */
> > +#define DMAC_EN_MASK		0x00000001U
>
> GENMASK()
Ok.

> > +#define DMAC_EN_POS		0
>
> Usually _SHIFT, but it's up to you.
>
> > +enum {
> > +	DWAXIDMAC_BURST_TRANS_LEN_1	= 0x0,
> > +	DWAXIDMAC_BURST_TRANS_LEN_4,
> > +	DWAXIDMAC_BURST_TRANS_LEN_8,
> > +	DWAXIDMAC_BURST_TRANS_LEN_16,
> > +	DWAXIDMAC_BURST_TRANS_LEN_32,
> > +	DWAXIDMAC_BURST_TRANS_LEN_64,
> > +	DWAXIDMAC_BURST_TRANS_LEN_128,
> > +	DWAXIDMAC_BURST_TRANS_LEN_256,
> > +	DWAXIDMAC_BURST_TRANS_LEN_512,
> > +	DWAXIDMAC_BURST_TRANS_LEN_1024
>
> ..._1024, ?
What exactly you are asking about?

> > +};
>
> Hmm... do you need them in the header?
I use some of these definitions in my code so I guess yes.
/* Maybe I misunderstood your question... */

> > +#define CH_CFG_H_TT_FC_POS	0
> > +enum {
> > +	DWAXIDMAC_TT_FC_MEM_TO_MEM_DMAC	= 0x0,
> > +	DWAXIDMAC_TT_FC_MEM_TO_PER_DMAC,
> > +	DWAXIDMAC_TT_FC_PER_TO_MEM_DMAC,
> > +	DWAXIDMAC_TT_FC_PER_TO_PER_DMAC,
> > +	DWAXIDMAC_TT_FC_PER_TO_MEM_SRC,
> > +	DWAXIDMAC_TT_FC_PER_TO_PER_SRC,
> > +	DWAXIDMAC_TT_FC_MEM_TO_PER_DST,
> > +	DWAXIDMAC_TT_FC_PER_TO_PER_DST
> > +};
>
> Some of definitions are the same as for dw_dmac, right? We might
> split them to a common header, though I have no strong opinion about
it.
> Vinod?
APB DMAC and AXI DMAC have completely different regmap. So there is no
much sense to do that.

--
 Eugeniy Paltsev

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/2] dmaengine: Add DW AXI DMAC driver
  2017-04-21 14:29     ` Eugeniy Paltsev
@ 2017-04-21 15:13       ` Andy Shevchenko
  2017-04-24 15:55         ` Eugeniy Paltsev
  0 siblings, 1 reply; 14+ messages in thread
From: Andy Shevchenko @ 2017-04-21 15:13 UTC (permalink / raw)
  To: Eugeniy Paltsev
  Cc: vinod.koul, linux-kernel, robh+dt, Alexey.Brodkin, devicetree,
	linux-snps-arc, dan.j.williams, dmaengine

On Fri, 2017-04-21 at 14:29 +0000, Eugeniy Paltsev wrote:
> On Tue, 2017-04-18 at 15:31 +0300, Andy Shevchenko wrote:
> > On Fri, 2017-04-07 at 17:04 +0300, Eugeniy Paltsev wrote:
> > > This patch adds support for the DW AXI DMAC controller.
> > > 

> > > +#include <linux/of.h>
> > 
> > Are you sure you still need of.h along with depends OF ?
> 
> "of_match_ptr" used from of.h

It safe not to use it and always have a table. In this case the driver
even would be available for ACPI-enabled platforms (I suppose some ARM64
might find this useful).

> > > +#define AXI_DMA_BUSWIDTHS		  \
> > > +	(DMA_SLAVE_BUSWIDTH_1_BYTE	| \
> > > +	DMA_SLAVE_BUSWIDTH_2_BYTES	| \
> > > +	DMA_SLAVE_BUSWIDTH_4_BYTES	| \
> > > +	DMA_SLAVE_BUSWIDTH_8_BYTES	| \
> > > +	DMA_SLAVE_BUSWIDTH_16_BYTES	| \
> > > +	DMA_SLAVE_BUSWIDTH_32_BYTES	| \
> > > +	DMA_SLAVE_BUSWIDTH_64_BYTES)
> > > +/* TODO: check: do we need to use BIT() macro here? */
> > 
> > Still TODO? I remember I answered to this on the first round.
> 
> Yes, I remember it.
> I left this TODO as a reminder because src_addr_widths and
> dst_addr_widths are
> not used anywhere and they are set differently in different drivers
> (with or without BIT macro).

Strange. AFAIK they are representing bits (which is not the best idea)
in the resulting u64 field. So, anything bigger than 63 doesn't make
sense. In drivers where they are not bits quite likely a bug is hidden.

> 
> > > +
> > > +static inline void
> > > +axi_dma_iowrite32(struct axi_dma_chip *chip, u32 reg, u32 val)
> > > +{
> > > +	iowrite32(val, chip->regs + reg);
> > 
> > Are you going to use IO ports for this IP? I don't think so.
> > Wouldn't be better to call readl()/writel() instead?
> 
> As I understand, it's better to use ioread/iowrite as more universal
> IO
> access way. Am I wrong?

As I said above the ioreadX/iowriteX makes only sense when your IP would
be accessed via IO region or MMIO. I'm pretty sure IO is not the case at
all for this IP.

> > > +static inline void axi_chan_irq_disable(struct axi_dma_chan
> > > *chan,
> > > u32 irq_mask)
> > > +{
> > > +	u32 val;
> > > +
> > > +	if (likely(irq_mask == DWAXIDMAC_IRQ_ALL)) {
> > > +		axi_chan_iowrite32(chan, CH_INTSTATUS_ENA,
> > > DWAXIDMAC_IRQ_NONE);
> > > +	} else {
> > 
> > I don't see the benefit. (Yes, I see one read-less path, I think it
> > makes perplexity for nothing here)
> 
> This function is called mostly with irq_mask = DWAXIDMAC_IRQ_ALL.
> Actually it is called only with irq_mask = DWAXIDMAC_IRQ_ALL until I
> add DMA_SLAVE support.
> But I can cut off this 'if' statment, if it is necessery.

It's not necessary, but from my point of view increases readability of
the code a lot for the price of an additional readl().

> 
> > > +		val = axi_chan_ioread32(chan, CH_INTSTATUS_ENA);
> > > +		val &= ~irq_mask;
> > > +		axi_chan_iowrite32(chan, CH_INTSTATUS_ENA, val);
> > > +	}

> > > +
> > > +	return min_t(size_t, __ffs(sdl), max_width);
> > > +}
> > > +static void axi_desc_put(struct axi_dma_desc *desc)
> > > +{
> > > +	struct axi_dma_chan *chan = desc->chan;
> > > +	struct dw_axi_dma *dw = chan->chip->dw;
> > > +	struct axi_dma_desc *child, *_next;
> > > +	unsigned int descs_put = 0;
> > > +	list_for_each_entry_safe(child, _next, &desc->xfer_list,
> > > xfer_list) {
> > 
> > xfer_list looks redundant.
> > Can you elaborate why virtual channel management is not working for
> > you?
> 
> Each virtual descriptor encapsulates several hardware descriptors,
> which belong to same transfer.
> This list (xfer_list) is used only for allocating/freeing these
> descriptors and it doesn't affect on virtual dma work logic.
> I can see this approach in several drivers with VirtDMA (but they
> mostly use array instead of list)

You described how most of the DMA drivers are implemented, though they
are using just sg_list directly. I would recommend to do the same and
get rid of this list.

> > Btw, are you planning to use priority at all? For now on I didn't
> > see
> > a single driver (from the set I have checked, like 4-5 of them) that
> > uses priority anyhow. It makes driver more complex for nothing.
> 
> Only for dma slave operations.

So, in other words you *have* an actual two or more users that *need*
prioritization?

> > > +	if (unlikely(dst_nents == 0 || src_nents == 0))
> > > +		return NULL;
> > > +
> > > +	if (unlikely(dst_sg == NULL || src_sg == NULL))
> > > +		return NULL;
> > 
> > If we need those checks they should go to dmaengine.h/dmaengine.c.
> 
> I checked several drivers, which implements device_prep_dma_sg and
> they
> implements this checkers.
> Should I add something like "dma_sg_desc_invalid" function to
> dmaengine.h ?

I suppose it's better to implement them in the corresponding helpers in
dmaengine.h.

> > > +static void axi_chan_list_dump_lli(struct axi_dma_chan *chan,
> > > +				   struct axi_dma_desc
> > > *desc_head)
> > > +{
> > > +	struct axi_dma_desc *desc;
> > > +
> > > +	axi_chan_dump_lli(chan, desc_head);
> > > +	list_for_each_entry(desc, &desc_head->xfer_list,
> > > xfer_list)
> > > +		axi_chan_dump_lli(chan, desc);
> > > +}
> > > +
> > > +static void axi_chan_handle_err(struct axi_dma_chan *chan, u32
> > > status)
> > > +{
> > > +	/* WARN about bad descriptor */
> > > 
> > > +	dev_err(chan2dev(chan),
> > > +		"Bad descriptor submitted for %s, cookie: %d,
> > > irq:
> > > 0x%08x\n",
> > > +		axi_chan_name(chan), vd->tx.cookie, status);
> > > +	axi_chan_list_dump_lli(chan, vd_to_axi_desc(vd));
> > 
> > As I said earlier dw_dmac is *bad* example of the (virtual channel
> > based) DMA driver.
> > 
> > I guess you may just fail the descriptor and don't pretend it has
> > been processed successfully.
> 
> What do you mean by saying "fail the descriptor"?
> After I get error I cancel current transfer and free all descriptors
> from it (by calling vchan_cookie_complete).
> I can't store error status in descriptor structure because it will be
> freed by vchan_cookie_complete.
> I can't store error status in channel structure because it will be
> overwritten by next transfer.

Better not to pretend that it has been processed successfully. Don't
call callback on it and set its status to DMA_ERROR (that's why
descriptors in many drivers have dma_status field). When user asks for
status (using cookie) the saved value would be returned until descriptor
is active. 

Do you have some other workflow in mind?

> > > +
> > > +static const struct dev_pm_ops dw_axi_dma_pm_ops = {
> > > +	SET_RUNTIME_PM_OPS(axi_dma_runtime_suspend,
> > > axi_dma_runtime_resume, NULL)
> > > +};
> > 
> > Have you tried to build with CONFIG_PM disabled?
> 
> Yes.
> 
> > I'm pretty sure you need __maybe_unused applied to your PM ops.
> 
> I call axi_dma_runtime_suspend / axi_dma_runtime_resume even I dont't
> use PM.
> (I call them in probe / remove function.)

Hmm... I didn't check your ->probe() and ->remove(). Do you mean you
call them explicitly by those names?

If so, please don't do that. Use pm_runtime_*() instead. And...

> So I don't need to declare them with __maybe_unused.

...in that case it's possible you have them defined but not used.


> >> +struct axi_dma_chan {
> > > +	struct axi_dma_chip		*chip;
> > > +	void __iomem			*chan_regs;
> > > +	u8				id;
> > > +	atomic_t			descs_allocated;
> > > +
> > > +	struct virt_dma_chan		vc;
> > > +
> > > +	/* these other elements are all protected by vc.lock */
> > > +	bool				is_paused;
> > 
> > I still didn't get (already forgot) why you can't use dma_status
> > instead for the active descriptor?
> 
> As I said before, I checked several driver, which have status variable
> in their channel structure - it is used *only* for determinating is
> channel paused or not. So there is no much sense in replacing
> "is_paused" to "status" and I left "is_paused" variable untouched.

Not only (see above), the errored descriptor keeps that status.

> (I described above why we can't use status in channel structure for
> error handling)

Ah, I'm talking about descriptor.

> > > Status Fetch Addr */
> > > +#define CH_INTSTATUS_ENA	0x080 /* R/W Chan Interrupt
> > > Status
> > > Enable */
> > > +#define CH_INTSTATUS		0x088 /* R/W Chan Interrupt
> > > Status */
> > > +#define CH_INTSIGNAL_ENA	0x090 /* R/W Chan Interrupt
> > > Signal
> > > Enable */
> > > +#define CH_INTCLEAR		0x098 /* W Chan Interrupt
> > > Clear
> > > */
> > 
> > I'm wondering if you can use regmap API instead.
> 
> Is it really necessary? It'll make driver more complex for nothing.

That's why I'm not suggesting but wondering.

> > > +	DWAXIDMAC_BURST_TRANS_LEN_1024
> > 
> > ..._1024, ?
> 
> What exactly you are asking about?

Comma at the end.

> 
> > > +};
> > 
> > Hmm... do you need them in the header?
> 
> I use some of these definitions in my code so I guess yes.
> /* Maybe I misunderstood your question... */

I mean, who are the users of them? If it's only one module, there is no
need to put them in header.

> 
> > > +#define CH_CFG_H_TT_FC_POS	0
> > > +enum {
> > > +	DWAXIDMAC_TT_FC_MEM_TO_MEM_DMAC	= 0x0,
> > > +	DWAXIDMAC_TT_FC_MEM_TO_PER_DMAC,
> > > +	DWAXIDMAC_TT_FC_PER_TO_MEM_DMAC,
> > > +	DWAXIDMAC_TT_FC_PER_TO_PER_DMAC,
> > > +	DWAXIDMAC_TT_FC_PER_TO_MEM_SRC,
> > > +	DWAXIDMAC_TT_FC_PER_TO_PER_SRC,
> > > +	DWAXIDMAC_TT_FC_MEM_TO_PER_DST,
> > > +	DWAXIDMAC_TT_FC_PER_TO_PER_DST
> > > +};
> > 
> > Some of definitions are the same as for dw_dmac, right? We might
> > split them to a common header, though I have no strong opinion about
> 
> it.
> > Vinod?
> 
> APB DMAC and AXI DMAC have completely different regmap. So there is no
> much sense to do that.

I'm not talking about registers, I'm talking about definitions like
above. Though it would be indeed little sense to split some common code
between those two.

-- 
Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Intel Finland Oy

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/2] dmaengine: Add DW AXI DMAC driver
  2017-04-21 15:13       ` Andy Shevchenko
@ 2017-04-24 15:55         ` Eugeniy Paltsev
  2017-04-24 16:56           ` Andy Shevchenko
  0 siblings, 1 reply; 14+ messages in thread
From: Eugeniy Paltsev @ 2017-04-24 15:55 UTC (permalink / raw)
  To: andriy.shevchenko
  Cc: vinod.koul, linux-kernel, robh+dt, Alexey.Brodkin, devicetree,
	Eugeniy.Paltsev, linux-snps-arc, dan.j.williams, dmaengine

Hi,
On Fri, 2017-04-21 at 18:13 +0300, Andy Shevchenko wrote:
> On Fri, 2017-04-21 at 14:29 +0000, Eugeniy Paltsev wrote:
> > On Tue, 2017-04-18 at 15:31 +0300, Andy Shevchenko wrote:
> > > On Fri, 2017-04-07 at 17:04 +0300, Eugeniy Paltsev wrote:
> > > > This patch adds support for the DW AXI DMAC controller.
> > > > +#define AXI_DMA_BUSWIDTHS		  \
> > > > +	(DMA_SLAVE_BUSWIDTH_1_BYTE	| \
> > > > +	DMA_SLAVE_BUSWIDTH_2_BYTES	| \
> > > > +	DMA_SLAVE_BUSWIDTH_4_BYTES	| \
> > > > +	DMA_SLAVE_BUSWIDTH_8_BYTES	| \
> > > > +	DMA_SLAVE_BUSWIDTH_16_BYTES	| \
> > > > +	DMA_SLAVE_BUSWIDTH_32_BYTES	| \
> > > > +	DMA_SLAVE_BUSWIDTH_64_BYTES)
> > > > +/* TODO: check: do we need to use BIT() macro here? */
> > >
> > > Still TODO? I remember I answered to this on the first round.
> >
> > Yes, I remember it.
> > I left this TODO as a reminder because src_addr_widths and
> > dst_addr_widths are
> > not used anywhere and they are set differently in different drivers
> > (with or without BIT macro).
>
> Strange. AFAIK they are representing bits (which is not the best
> idea) in the resulting u64 field. So, anything bigger than 63 doesn't
> make sense.
They are u32 fields!
>From dmaengine.h :
struct dma_device {
...
    u32 src_addr_widths;
    u32 dst_addr_widths;
};

> In drivers where they are not bits quite likely a bug is hidden.
For example (from pxa_dma.c):
const enum dma_slave_buswidth widths = DMA_SLAVE_BUSWIDTH_1_BYTE |
DMA_SLAVE_BUSWIDTH_2_BYTES | DMA_SLAVE_BUSWIDTH_4_BYTES;

And there are a lot of drivers, which don't use BIT for this fields.
sh/shdmac.c
sh/rcar-dmac.c
qcom/bam_dma.c
mmp_pdma.c
ste_dma40.c
And many others...

> >
> > > > +
> > > > +static inline void
> > > > +axi_dma_iowrite32(struct axi_dma_chip *chip, u32 reg, u32 val)
> > > > +{
> > > > +	iowrite32(val, chip->regs + reg);
> > >
> > > Are you going to use IO ports for this IP? I don't think so.
> > > Wouldn't be better to call readl()/writel() instead?
> >
> > As I understand, it's better to use ioread/iowrite as more
> > universal
> > IO
> > access way. Am I wrong?
>
> As I said above the ioreadX/iowriteX makes only sense when your IP
> would be accessed via IO region or MMIO. I'm pretty sure IO is not
> the case at all for this IP.
MMIO? This IP works exactly via memory-mapped I/O.

> > > > +static inline void axi_chan_irq_disable(struct axi_dma_chan
> > > > *chan,
> > > > u32 irq_mask)
> > > > +{
> > > > +	u32 val;
> > > > +
> > > > +	if (likely(irq_mask == DWAXIDMAC_IRQ_ALL)) {
> > > > +		axi_chan_iowrite32(chan, CH_INTSTATUS_ENA,
> > > > DWAXIDMAC_IRQ_NONE);
> > > > +	} else {
> > >
> > > I don't see the benefit. (Yes, I see one read-less path, I think
> > > it
> > > makes perplexity for nothing here)
> >
> > This function is called mostly with irq_mask = DWAXIDMAC_IRQ_ALL.
> > Actually it is called only with irq_mask = DWAXIDMAC_IRQ_ALL until
> > I add DMA_SLAVE support.
> > But I can cut off this 'if' statment, if it is necessery.
>
> It's not necessary, but from my point of view increases readability
> of the code a lot for the price of an additional readl().
Ok.

> >
> > > > +		val = axi_chan_ioread32(chan,
> > > > CH_INTSTATUS_ENA);
> > > > +		val &= ~irq_mask;
> > > > +		axi_chan_iowrite32(chan, CH_INTSTATUS_ENA,
> > > > val);
> > > > +	}
> > > > +
> > > > +	return min_t(size_t, __ffs(sdl), max_width);
> > > > +}
> > > > +static void axi_desc_put(struct axi_dma_desc *desc)
> > > > +{
> > > > +	struct axi_dma_chan *chan = desc->chan;
> > > > +	struct dw_axi_dma *dw = chan->chip->dw;
> > > > +	struct axi_dma_desc *child, *_next;
> > > > +	unsigned int descs_put = 0;
> > > > +	list_for_each_entry_safe(child, _next, &desc-
> > > > >xfer_list,
> > > > xfer_list) {
> > >
> > > xfer_list looks redundant.
> > > Can you elaborate why virtual channel management is not working
> > > for
> > > you?
> >
> > Each virtual descriptor encapsulates several hardware descriptors,
> > which belong to same transfer.
> > This list (xfer_list) is used only for allocating/freeing these
> > descriptors and it doesn't affect on virtual dma work logic.
> > I can see this approach in several drivers with VirtDMA (but they
> > mostly use array instead of list)
>
> You described how most of the DMA drivers are implemented, though
> they
> are using just sg_list directly. I would recommend to do the same and
> get rid of this list.
This IP can be (ans is) configured with small block size.
(note, that I am not saying about runtime HW configuration)

And there is opportunity what we can't use sg_list directly and need to
split sg_list to a smaller chunks.

> > > Btw, are you planning to use priority at all? For now on I didn't
> > > see
> > > a single driver (from the set I have checked, like 4-5 of them)
> > > that
> > > uses priority anyhow. It makes driver more complex for nothing.
> >
> > Only for dma slave operations.
>
> So, in other words you *have* an actual two or more users that *need*
> prioritization?
As I remember there was an idea to give higher priority to audio dma
chanels.

> > > > +	if (unlikely(dst_nents == 0 || src_nents == 0))
> > > > +		return NULL;
> > > > +
> > > > +	if (unlikely(dst_sg == NULL || src_sg == NULL))
> > > > +		return NULL;
> > >
> > > If we need those checks they should go to
> > > dmaengine.h/dmaengine.c.
> >
> > I checked several drivers, which implements device_prep_dma_sg and
> > they
> > implements this checkers.
> > Should I add something like "dma_sg_desc_invalid" function to
> > dmaengine.h ?
>
> I suppose it's better to implement them in the corresponding helpers
> in dmaengine.h.
Ok.

> > > > +static void axi_chan_list_dump_lli(struct axi_dma_chan *chan,
> > > > +				   struct axi_dma_desc
> > > > *desc_head)
> > > > +{
> > > > +	struct axi_dma_desc *desc;
> > > > +
> > > > +	axi_chan_dump_lli(chan, desc_head);
> > > > +	list_for_each_entry(desc, &desc_head->xfer_list,
> > > > xfer_list)
> > > > +		axi_chan_dump_lli(chan, desc);
> > > > +}
> > > > +
> > > > +static void axi_chan_handle_err(struct axi_dma_chan *chan, u32
> > > > status)
> > > > +{
> > > > +	/* WARN about bad descriptor */
> > > >
> > > > +	dev_err(chan2dev(chan),
> > > > +		"Bad descriptor submitted for %s, cookie: %d,
> > > > irq:
> > > > 0x%08x\n",
> > > > +		axi_chan_name(chan), vd->tx.cookie, status);
> > > > +	axi_chan_list_dump_lli(chan, vd_to_axi_desc(vd));
> > >
> > > As I said earlier dw_dmac is *bad* example of the (virtual
> > > channel
> > > based) DMA driver.
> > >
> > > I guess you may just fail the descriptor and don't pretend it has
> > > been processed successfully.
> >
> > What do you mean by saying "fail the descriptor"?
> > After I get error I cancel current transfer and free all
> > descriptors
> > from it (by calling vchan_cookie_complete).
> > I can't store error status in descriptor structure because it will
> > be
> > freed by vchan_cookie_complete.
> > I can't store error status in channel structure because it will be
> > overwritten by next transfer.
>
> Better not to pretend that it has been processed successfully. Don't
> call callback on it and set its status to DMA_ERROR (that's why
> descriptors in many drivers have dma_status field). When user asks
> for
> status (using cookie) the saved value would be returned until
> descriptor
> is active.
>
> Do you have some other workflow in mind?

Hmm...
Do you mean I should left error descriptors in desc_issued list
or I should create another list (like desc_error) in my driver and move
error descriptors to desc_error list?

And when exactly should I free error descriptors?

I checked hsu/hsu.c dma driver implementation:
  vdma descriptor is deleted from desc_issued list when transfer
  starts. When descriptor marked as error descriptor
  vchan_cookie_complete isn't called for this descriptor. And this
  descriptor isn't placed in any list. So error descriptors *never*
  will be freed.
  I don't actually like this approach.

> > > > +
> > > > +static const struct dev_pm_ops dw_axi_dma_pm_ops = {
> > > > +	SET_RUNTIME_PM_OPS(axi_dma_runtime_suspend,
> > > > axi_dma_runtime_resume, NULL)
> > > > +};
> > >
> > > Have you tried to build with CONFIG_PM disabled?
> >
> > Yes.
> >
> > > I'm pretty sure you need __maybe_unused applied to your PM ops.
> >
> > I call axi_dma_runtime_suspend / axi_dma_runtime_resume even I
> > dont't
> > use PM.
> > (I call them in probe / remove function.)
>
> Hmm... I didn't check your ->probe() and ->remove(). Do you mean you
> call them explicitly by those names?
>
> If so, please don't do that. Use pm_runtime_*() instead. And...
>
> > So I don't need to declare them with __maybe_unused.
>
> ...in that case it's possible you have them defined but not used.
>
>From my ->probe() function:

pm_runtime_get_noresume(chip->dev);
ret = axi_dma_runtime_resume(chip->dev);

Firstly I only incrememt counter.
Secondly explicitly call my resume function.

I call them explicitly because I need driver to work also without
Runtime PM. So I can't just call pm_runtime_get here instead of
pm_runtime_get_noresume + axi_dma_runtime_resume.

Of course I can copy *all* code from axi_dma_runtime_resume
to ->probe() function, but I don't really like this idea.

> > > > +struct axi_dma_chan {
> > > > +	struct axi_dma_chip		*chip;
> > > > +	void __iomem			*chan_regs;
> > > > +	u8				id;
> > > > +	atomic_t			descs_allocated;
> > > > +
> > > > +	struct virt_dma_chan		vc;
> > > > +
> > > > +	/* these other elements are all protected by vc.lock
> > > > */
> > > > +	bool				is_paused;
> > >
> > > I still didn't get (already forgot) why you can't use dma_status
> > > instead for the active descriptor?
> >
> > As I said before, I checked several driver, which have status
> > variable
> > in their channel structure - it is used *only* for determinating is
> > channel paused or not. So there is no much sense in replacing
> > "is_paused" to "status" and I left "is_paused" variable untouched.
>
> Not only (see above), the errored descriptor keeps that status.
>
> > (I described above why we can't use status in channel structure for
> > error handling)
>
> Ah, I'm talking about descriptor.

Again - PAUSED is per-channel flag. So, even if we have status field in
each descriptor, it is simpler to use one per-channel flag instead of
plenty per-descriptor flags.
When we pausing/resuming dma channel it is simpler to set only one flag
instead of writing DMA_PAUSED to *each* descriptor status field.

> > > > Status Fetch Addr */
> > > > +#define CH_INTSTATUS_ENA	0x080 /* R/W Chan Interrupt
> > > > Status
> > > > Enable */
> > > > +#define CH_INTSTATUS		0x088 /* R/W Chan
> > > > Interrupt
> > > > Status */
> > > > +#define CH_INTSIGNAL_ENA	0x090 /* R/W Chan Interrupt
> > > > Signal
> > > > Enable */
> > > > +#define CH_INTCLEAR		0x098 /* W Chan Interrupt
> > > > Clear
> > > > */
> > >
> > > I'm wondering if you can use regmap API instead.
> >
> > Is it really necessary? It'll make driver more complex for nothing.
>
> That's why I'm not suggesting but wondering.
> >
> > > > +};
> > >
> > > Hmm... do you need them in the header?
> >
> > I use some of these definitions in my code so I guess yes.
> > /* Maybe I misunderstood your question... */
>
> I mean, who are the users of them? If it's only one module, there is
> no need to put them in header.
Yes, only one module.
Should I move all this definitions to axi_dma_platform.c file and rid
of both axi_dma_platform_reg.h and axi_dma_platform.h headers?

> >
> > > > +#define CH_CFG_H_TT_FC_POS	0
> > > > +enum {
> > > > +	DWAXIDMAC_TT_FC_MEM_TO_MEM_DMAC	= 0x0,
> > > > +	DWAXIDMAC_TT_FC_MEM_TO_PER_DMAC,
> > > > +	DWAXIDMAC_TT_FC_PER_TO_MEM_DMAC,
> > > > +	DWAXIDMAC_TT_FC_PER_TO_PER_DMAC,
> > > > +	DWAXIDMAC_TT_FC_PER_TO_MEM_SRC,
> > > > +	DWAXIDMAC_TT_FC_PER_TO_PER_SRC,
> > > > +	DWAXIDMAC_TT_FC_MEM_TO_PER_DST,
> > > > +	DWAXIDMAC_TT_FC_PER_TO_PER_DST
> > > > +};
> > >
> > > Some of definitions are the same as for dw_dmac, right? We might
> > > split them to a common header, though I have no strong opinion
> > > about
> >
> > it.
> > > Vinod?
> >
> > APB DMAC and AXI DMAC have completely different regmap. So there is
> > no
> > much sense to do that.
>
> I'm not talking about registers, I'm talking about definitions like
> above. Though it would be indeed little sense to split some common
> code between those two.
This definitions are the fields of some registers. So they are also
different.

--
 Eugeniy Paltsev

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/2] dmaengine: Add DW AXI DMAC driver
  2017-04-24 15:55         ` Eugeniy Paltsev
@ 2017-04-24 16:56           ` Andy Shevchenko
  2017-04-25 15:16             ` Eugeniy Paltsev
  0 siblings, 1 reply; 14+ messages in thread
From: Andy Shevchenko @ 2017-04-24 16:56 UTC (permalink / raw)
  To: Eugeniy Paltsev
  Cc: vinod.koul, linux-kernel, robh+dt, Alexey.Brodkin, devicetree,
	linux-snps-arc, dan.j.williams, dmaengine

On Mon, 2017-04-24 at 15:55 +0000, Eugeniy Paltsev wrote:
> Hi,
> On Fri, 2017-04-21 at 18:13 +0300, Andy Shevchenko wrote:
> > On Fri, 2017-04-21 at 14:29 +0000, Eugeniy Paltsev wrote:
> > > On Tue, 2017-04-18 at 15:31 +0300, Andy Shevchenko wrote:
> > > > On Fri, 2017-04-07 at 17:04 +0300, Eugeniy Paltsev wrote:
> > > > > This patch adds support for the DW AXI DMAC controller.
> > > > > +#define AXI_DMA_BUSWIDTHS		  \
> > > > > +	(DMA_SLAVE_BUSWIDTH_1_BYTE	| \
> > > > > +	DMA_SLAVE_BUSWIDTH_2_BYTES	| \
> > > > > +	DMA_SLAVE_BUSWIDTH_4_BYTES	| \
> > > > > +	DMA_SLAVE_BUSWIDTH_8_BYTES	| \
> > > > > +	DMA_SLAVE_BUSWIDTH_16_BYTES	| \
> > > > > +	DMA_SLAVE_BUSWIDTH_32_BYTES	| \
> > > > > +	DMA_SLAVE_BUSWIDTH_64_BYTES)
> > > > > +/* TODO: check: do we need to use BIT() macro here? */
> > > > 
> > > > Still TODO? I remember I answered to this on the first round.
> > > 
> > > Yes, I remember it.
> > > I left this TODO as a reminder because src_addr_widths and
> > > dst_addr_widths are
> > > not used anywhere and they are set differently in different
> > > drivers
> > > (with or without BIT macro).
> > 
> > Strange. AFAIK they are representing bits (which is not the best
> > idea) in the resulting u64 field. So, anything bigger than 63
> > doesn't
> >  make sense.
> 
> They are u32 fields!

Even "better"!

> From dmaengine.h :
> struct dma_device {
> ...
>     u32 src_addr_widths;
>     u32 dst_addr_widths;
> };
> 
> > In drivers where they are not bits quite likely a bug is hidden.
> 
> For example (from pxa_dma.c):
> const enum dma_slave_buswidth widths = DMA_SLAVE_BUSWIDTH_1_BYTE |
> DMA_SLAVE_BUSWIDTH_2_BYTES | DMA_SLAVE_BUSWIDTH_4_BYTES;
> 
> And there are a lot of drivers, which don't use BIT for this fields.
> sh/shdmac.c
> sh/rcar-dmac.c
> qcom/bam_dma.c
> mmp_pdma.c
> ste_dma40.c
> And many others...

Definitely the concept of that interface never thought enough and broken
by design.

> > > > > +static inline void
> > > > > +axi_dma_iowrite32(struct axi_dma_chip *chip, u32 reg, u32
> > > > > val)
> > > > > +{
> > > > > +	iowrite32(val, chip->regs + reg);
> > > > 
> > > > Are you going to use IO ports for this IP? I don't think so.
> > > > Wouldn't be better to call readl()/writel() instead?
> > > 
> > > As I understand, it's better to use ioread/iowrite as more
> > > universal
> > > IO
> > > access way. Am I wrong?
> > 
> > As I said above the ioreadX/iowriteX makes only sense when your IP
> > would be accessed via IO region or MMIO. I'm pretty sure IO is not
> > the case at all for this IP.
> 
> MMIO? This IP works exactly via memory-mapped I/O.

Yes, and why do you need to check this on each IO read/write?
Please, switch to plain readX()/writeX() instead.

> > > > > +		val = axi_chan_ioread32(chan,
> > > > > CH_INTSTATUS_ENA);
> > > > > +		val &= ~irq_mask;
> > > > > +		axi_chan_iowrite32(chan, CH_INTSTATUS_ENA,
> > > > > val);
> > > > > +	}
> > > > > +
> > > > > +	return min_t(size_t, __ffs(sdl), max_width);
> > > > > +}
> > > > > +static void axi_desc_put(struct axi_dma_desc *desc)
> > > > > +{
> > > > > +	struct axi_dma_chan *chan = desc->chan;
> > > > > +	struct dw_axi_dma *dw = chan->chip->dw;
> > > > > +	struct axi_dma_desc *child, *_next;
> > > > > +	unsigned int descs_put = 0;
> > > > > +	list_for_each_entry_safe(child, _next, &desc-
> > > > > > xfer_list,
> > > > > 
> > > > > xfer_list) {
> > > > 
> > > > xfer_list looks redundant.
> > > > Can you elaborate why virtual channel management is not working
> > > > for
> > > > you?
> > > 
> > > Each virtual descriptor encapsulates several hardware descriptors,
> > > which belong to same transfer.
> > > This list (xfer_list) is used only for allocating/freeing these
> > > descriptors and it doesn't affect on virtual dma work logic.
> > > I can see this approach in several drivers with VirtDMA (but they
> > > mostly use array instead of list)
> > 
> > You described how most of the DMA drivers are implemented, though
> > they
> > are using just sg_list directly. I would recommend to do the same
> > and
> > get rid of this list.
> 
> This IP can be (ans is) configured with small block size.
> (note, that I am not saying about runtime HW configuration)
> 
> And there is opportunity what we can't use sg_list directly and need
> to
> split sg_list to a smaller chunks.

That's what I have referred quite ago. The driver should provide an
interface to tell potential caller what maximum block (number of items
with given bus width) it supports.

We have struct dma_parms in struct device, but what we actually need is
to support similar on per channel basis in DMAengine framework.

So, instead of working around this I recommend either to implement it
properly or rely on the fact that in the future someone eventually does
that for you.

Each driver which has this re-splitting mechanism should be cleaned up
and refactored.

> > > > Btw, are you planning to use priority at all? For now on I
> > > > didn't
> > > > see
> > > > a single driver (from the set I have checked, like 4-5 of them)
> > > > that
> > > > uses priority anyhow. It makes driver more complex for nothing.
> > > 
> > > Only for dma slave operations.
> > 
> > So, in other words you *have* an actual two or more users that
> > *need*
> > prioritization?
> 
> As I remember there was an idea to give higher priority to audio dma
> chanels.

I don't see cyclic transfers support in the driver. So, I would suggest
just drop entire prioritization for now. When it would be actual user
one may start thinking of it.

Just a rule of common sense: do not implement something which will have
no user or solve non-existing problem.

> > > > As I said earlier dw_dmac is *bad* example of the (virtual
> > > > channel
> > > > based) DMA driver.
> > > > 
> > > > I guess you may just fail the descriptor and don't pretend it
> > > > has
> > > > been processed successfully.
> > > 
> > > What do you mean by saying "fail the descriptor"?
> > > After I get error I cancel current transfer and free all
> > > descriptors
> > > from it (by calling vchan_cookie_complete).
> > > I can't store error status in descriptor structure because it will
> > > be
> > > freed by vchan_cookie_complete.
> > > I can't store error status in channel structure because it will be
> > > overwritten by next transfer.
> > 
> > Better not to pretend that it has been processed successfully. Don't
> > call callback on it and set its status to DMA_ERROR (that's why
> > descriptors in many drivers have dma_status field). When user asks
> > for
> > status (using cookie) the saved value would be returned until
> > descriptor
> > is active.
> > 
> > Do you have some other workflow in mind?
> 
> Hmm...
> Do you mean I should left error descriptors in desc_issued list
> or I should create another list (like desc_error) in my driver and
> move
> error descriptors to desc_error list?
> 
> And when exactly should I free error descriptors?

See below.

> I checked hsu/hsu.c dma driver implementation:
>   vdma descriptor is deleted from desc_issued list when transfer
>   starts. When descriptor marked as error descriptor
>   vchan_cookie_complete isn't called for this descriptor. And this
>   descriptor isn't placed in any list. So error descriptors *never*
>   will be freed.
>   I don't actually like this approach.

Descriptor is active until terminate_all() is called or new descriptor
is supplied. So, the caller has a quite time to check on it.

So, what's wrong on it by your opinion?

Of course, if you want to keep by some reason (should be stated what the
reason in comment) erred descriptors, you can do that.

> > > > > +static const struct dev_pm_ops dw_axi_dma_pm_ops = {
> > > > > +	SET_RUNTIME_PM_OPS(axi_dma_runtime_suspend,
> > > > > axi_dma_runtime_resume, NULL)
> > > > > +};
> > > > 
> > > > Have you tried to build with CONFIG_PM disabled?
> > > 
> > > Yes.
> > > 
> > > > I'm pretty sure you need __maybe_unused applied to your PM ops.
> > > 
> > > I call axi_dma_runtime_suspend / axi_dma_runtime_resume even I
> > > dont't
> > > use PM.
> > > (I call them in probe / remove function.)
> > 
> > Hmm... I didn't check your ->probe() and ->remove(). Do you mean you
> > call them explicitly by those names?
> > 
> > If so, please don't do that. Use pm_runtime_*() instead. And...
> > 
> > > So I don't need to declare them with __maybe_unused.
> > 
> > ...in that case it's possible you have them defined but not used.
> > 
> 
> From my ->probe() function:
> 
> pm_runtime_get_noresume(chip->dev);
> ret = axi_dma_runtime_resume(chip->dev);
> 
> Firstly I only incrememt counter.
> Secondly explicitly call my resume function.
> 
> I call them explicitly because I need driver to work also without
> Runtime PM. So I can't just call pm_runtime_get here instead of
> pm_runtime_get_noresume + axi_dma_runtime_resume.
> 
> Of course I can copy *all* code from axi_dma_runtime_resume
> to ->probe() function, but I don't really like this idea.

It looks like you need more time to investigate how runtime PM works
from driver point of view, but you shouldn't call your PM callbacks
directly without a really good reason (weird silicon bugs, architectural
impediments). I don't think that is the case here.

> > > > > +
> > > > > +	/* these other elements are all protected by vc.lock
> > > > > */
> > > > > +	bool				is_paused;
> > > > 
> > > > I still didn't get (already forgot) why you can't use dma_status
> > > > instead for the active descriptor?
> > > 
> > > As I said before, I checked several driver, which have status
> > > variable
> > > in their channel structure - it is used *only* for determinating
> > > is
> > > channel paused or not. So there is no much sense in replacing
> > > "is_paused" to "status" and I left "is_paused" variable untouched.
> > 
> > Not only (see above), the errored descriptor keeps that status.
> > 
> > > (I described above why we can't use status in channel structure
> > > for
> > > error handling)
> > 
> > Ah, I'm talking about descriptor.
> 
> Again - PAUSED is per-channel flag. So, even if we have status field
> in
> each descriptor, it is simpler to use one per-channel flag instead of
> plenty per-descriptor flags.

> When we pausing/resuming dma channel it is simpler to set only one
> flag
> instead of writing DMA_PAUSED to *each* descriptor status field.

What do you mean by "each"? I don't recall the driver which can handle
more than one *active* descriptor per channel. Do you?

In that case status of active descriptor == status of channel. That
trick (I also already referred to earlier) is used in some drivers.

> > I mean, who are the users of them? If it's only one module, there is
> > no need to put them in header.
> 
> Yes, only one module.
> Should I move all this definitions to axi_dma_platform.c file and rid
> of both axi_dma_platform_reg.h and axi_dma_platform.h headers?

Depends on your design.

If it would be just one C module it might make sense to use driver.c and
driver.h or just driver.c.

I see several drivers in current linux-next that are using latter and
some that are using former, and number of plain driver.c variant is
bigger.

-- 
Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Intel Finland Oy

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/2] dmaengine: Add DW AXI DMAC driver
  2017-04-24 16:56           ` Andy Shevchenko
@ 2017-04-25 15:16             ` Eugeniy Paltsev
  2017-04-25 18:12               ` Andy Shevchenko
  0 siblings, 1 reply; 14+ messages in thread
From: Eugeniy Paltsev @ 2017-04-25 15:16 UTC (permalink / raw)
  To: andriy.shevchenko
  Cc: vinod.koul, linux-kernel, robh+dt, Alexey.Brodkin, devicetree,
	Eugeniy.Paltsev, linux-snps-arc, dan.j.williams, dmaengine

On Mon, 2017-04-24 at 19:56 +0300, Andy Shevchenko wrote:
> On Mon, 2017-04-24 at 15:55 +0000, Eugeniy Paltsev wrote:
> > Hi,
> > On Fri, 2017-04-21 at 18:13 +0300, Andy Shevchenko wrote:
> > > On Fri, 2017-04-21 at 14:29 +0000, Eugeniy Paltsev wrote:
> > > > On Tue, 2017-04-18 at 15:31 +0300, Andy Shevchenko wrote:
> > > > > On Fri, 2017-04-07 at 17:04 +0300, Eugeniy Paltsev wrote:
> > > > > > This patch adds support for the DW AXI DMAC controller.
> > > > > > +static inline void
> > > > > > +axi_dma_iowrite32(struct axi_dma_chip *chip, u32 reg, u32
> > > > > > val)
> > > > > > +{
> > > > > > +	iowrite32(val, chip->regs + reg);
> > > > >
> > > > > Are you going to use IO ports for this IP? I don't think so.
> > > > > Wouldn't be better to call readl()/writel() instead?
> > > >
> > > > As I understand, it's better to use ioread/iowrite as more
> > > > universal
> > > > IO
> > > > access way. Am I wrong?
> > >
> > > As I said above the ioreadX/iowriteX makes only sense when your
> > > IP
> > > would be accessed via IO region or MMIO. I'm pretty sure IO is
> > > not
> > > the case at all for this IP.
> >
> > MMIO? This IP works exactly via memory-mapped I/O.
>
> Yes, and why do you need to check this on each IO read/write?
> Please, switch to plain readX()/writeX() instead.
Ok, I'll switch to readX()/writeX().

> > > > > > +		val = axi_chan_ioread32(chan,
> > > > > > CH_INTSTATUS_ENA);
> > > > > > +		val &= ~irq_mask;
> > > > > > +		axi_chan_iowrite32(chan, CH_INTSTATUS_ENA,
> > > > > > val);
> > > > > > +	}
> > > > > > +
> > > > > > +	return min_t(size_t, __ffs(sdl), max_width);
> > > > > > +}
> > > > > > +static void axi_desc_put(struct axi_dma_desc *desc)
> > > > > > +{
> > > > > > +	struct axi_dma_chan *chan = desc->chan;
> > > > > > +	struct dw_axi_dma *dw = chan->chip->dw;
> > > > > > +	struct axi_dma_desc *child, *_next;
> > > > > > +	unsigned int descs_put = 0;
> > > > > > +	list_for_each_entry_safe(child, _next, &desc-
> > > > > > > xfer_list,
> > > > > >
> > > > > > xfer_list) {
> > > > >
> > > > > xfer_list looks redundant.
> > > > > Can you elaborate why virtual channel management is not
> > > > > working
> > > > > for
> > > > > you?
> > > >
> > > > Each virtual descriptor encapsulates several hardware
> > > > descriptors,
> > > > which belong to same transfer.
> > > > This list (xfer_list) is used only for allocating/freeing these
> > > > descriptors and it doesn't affect on virtual dma work logic.
> > > > I can see this approach in several drivers with VirtDMA (but
> > > > they
> > > > mostly use array instead of list)
> > >
> > > You described how most of the DMA drivers are implemented, though
> > > they
> > > are using just sg_list directly. I would recommend to do the same
> > > and
> > > get rid of this list.
> >
> > This IP can be (ans is) configured with small block size.
> > (note, that I am not saying about runtime HW configuration)
> >
> > And there is opportunity what we can't use sg_list directly and
> > need
> > to
> > split sg_list to a smaller chunks.
>
> That's what I have referred quite ago. The driver should provide an
> interface to tell potential caller what maximum block (number of
> items
> with given bus width) it supports.
>
> We have struct dma_parms in struct device, but what we actually need
> is
> to support similar on per channel basis in DMAengine framework.
>
> So, instead of working around this I recommend either to implement it
> properly or rely on the fact that in the future someone eventually
> does that for you.
>
> Each driver which has this re-splitting mechanism should be cleaned
> up and refactored.
I still can't see any pros of this implementation.
There is no performance profit: we anyway need to re-splitt sg_list
(but now in dma-user driver instead of dma driver)

If we want to use same descriptors several times we just can use
DMA_CTRL_REUSE option - the descriptors will be created one time and
re-splitting will be сompleted only one time.

But there are cons of this implementation:
we need to implement re-splitting mechanism in each place we use dma
instead of one dma driver. So there are more places for bugs and etc...

> > > > > Btw, are you planning to use priority at all? For now on I
> > > > > didn't
> > > > > see
> > > > > a single driver (from the set I have checked, like 4-5 of
> > > > > them)
> > > > > that
> > > > > uses priority anyhow. It makes driver more complex for
> > > > > nothing.
> > > >
> > > > Only for dma slave operations.
> > >
> > > So, in other words you *have* an actual two or more users that
> > > *need*
> > > prioritization?
> >
> > As I remember there was an idea to give higher priority to audio
> > dma
> > chanels.
>
> I don't see cyclic transfers support in the driver. So, I would
> suggest
> just drop entire prioritization for now. When it would be actual user
> one may start thinking of it.
> Just a rule of common sense: do not implement something which will
> have no user or solve non-existing problem.

Ok, I'll drop prioritization untill I implement cyclic transfers.

> > > > > As I said earlier dw_dmac is *bad* example of the (virtual
> > > > > channel
> > > > > based) DMA driver.
> > > > >
> > > > > I guess you may just fail the descriptor and don't pretend it
> > > > > has
> > > > > been processed successfully.
> > > >
> > > > What do you mean by saying "fail the descriptor"?
> > > > After I get error I cancel current transfer and free all
> > > > descriptors
> > > > from it (by calling vchan_cookie_complete).
> > > > I can't store error status in descriptor structure because it
> > > > will
> > > > be
> > > > freed by vchan_cookie_complete.
> > > > I can't store error status in channel structure because it will
> > > > be
> > > > overwritten by next transfer.
> > >
> > > Better not to pretend that it has been processed successfully.
> > > Don't
> > > call callback on it and set its status to DMA_ERROR (that's why
> > > descriptors in many drivers have dma_status field). When user
> > > asks for
> > > status (using cookie) the saved value would be returned until
> > > descriptor
> > > is active.
> > >
> > > Do you have some other workflow in mind?
> >
> > Hmm...
> > Do you mean I should left error descriptors in desc_issued list
> > or I should create another list (like desc_error) in my driver and
> > move
> > error descriptors to desc_error list?
> >
> > And when exactly should I free error descriptors?
> See below.
>
> > I checked hsu/hsu.c dma driver implementation:
> >   vdma descriptor is deleted from desc_issued list when transfer
> >   starts. When descriptor marked as error descriptor
> >   vchan_cookie_complete isn't called for this descriptor. And this
> >   descriptor isn't placed in any list. So error descriptors *never*
> >   will be freed.
> >   I don't actually like this approach.
>
> Descriptor is active until terminate_all() is called or new
> descriptor
> is supplied. So, the caller has a quite time to check on it.
>
> So, what's wrong on it by your opinion?

Hmm, this looks OK. (In my example (hsu/hsu.c driver) error descriptors
are not freed even after terminate_all is called)

> Of course, if you want to keep by some reason (should be stated what
> the reason in comment) erred descriptors, you can do that.

So, I'll create desc_error list and store failed descriptors in this
list until terminate_all() is called.
Is it OK implementation?

> > > > > > +static const struct dev_pm_ops dw_axi_dma_pm_ops = {
> > > > > > +	SET_RUNTIME_PM_OPS(axi_dma_runtime_suspend,
> > > > > > axi_dma_runtime_resume, NULL)
> > > > > > +};
> > > > >
> > > > > Have you tried to build with CONFIG_PM disabled?
> > > >
> > > > Yes.
> > > >
> > > > > I'm pretty sure you need __maybe_unused applied to your PM
> > > > > ops.
> > > >
> > > > I call axi_dma_runtime_suspend / axi_dma_runtime_resume even I
> > > > dont't
> > > > use PM.
> > > > (I call them in probe / remove function.)
> > >
> > > Hmm... I didn't check your ->probe() and ->remove(). Do you mean
> > > you
> > > call them explicitly by those names?
> > >
> > > If so, please don't do that. Use pm_runtime_*() instead. And...
> > >
> > > > So I don't need to declare them with __maybe_unused.
> > >
> > > ...in that case it's possible you have them defined but not used.
> > >
> >
> > From my ->probe() function:
> >
> > pm_runtime_get_noresume(chip->dev);
> > ret = axi_dma_runtime_resume(chip->dev);
> >
> > Firstly I only incrememt counter.
> > Secondly explicitly call my resume function.
> >
> > I call them explicitly because I need driver to work also without
> > Runtime PM. So I can't just call pm_runtime_get here instead of
> > pm_runtime_get_noresume + axi_dma_runtime_resume.
> >
> > Of course I can copy *all* code from axi_dma_runtime_resume
> > to ->probe() function, but I don't really like this idea.
>
> It looks like you need more time to investigate how runtime PM works
> from driver point of view, but you shouldn't call your PM callbacks
> directly without a really good reason (weird silicon bugs,
> architectural impediments). I don't think that is the case here.
>
There is a simple reason:
I had to do same actions in probe/remove as in
runtime_resume/runtime_suspend.
(like enabling clock, enabling dma)

If my driver is build with RUNTIME_PM this actions will be
automatically handled by runtime pm (clock and dma will be enabled
before using and disabled after using).
Otherwise, if my driver is build without RUNTIME_PM clock and dma will
be enabled only one time - when ->probe() is called.
So I use runtime_resume callback directly for this purpose (because if
my driver is build without RUNTIME_PM this callback wiil not be called
at all)

> > > > > > +	bool		is_paused;
> > > > >
> > > > > I still didn't get (already forgot) why you can't use
> > > > > dma_status
> > > > > instead for the active descriptor?
> > > >
> > > > As I said before, I checked several driver, which have status
> > > > variable
> > > > in their channel structure - it is used *only* for
> > > > determinating
> > > > is
> > > > channel paused or not. So there is no much sense in replacing
> > > > "is_paused" to "status" and I left "is_paused" variable
> > > > untouched.
> > >
> > > Not only (see above), the errored descriptor keeps that status.
> > >
> > > > (I described above why we can't use status in channel structure
> > > > for
> > > > error handling)
> > >
> > > Ah, I'm talking about descriptor.
> >
> > Again - PAUSED is per-channel flag. So, even if we have status
> > field
> > in
> > each descriptor, it is simpler to use one per-channel flag instead
> > of
> > plenty per-descriptor flags.
> > When we pausing/resuming dma channel it is simpler to set only one
> > flag
> > instead of writing DMA_PAUSED to *each* descriptor status field.
>
> What do you mean by "each"? I don't recall the driver which can
> handle
> more than one *active* descriptor per channel. Do you?
>
> In that case status of active descriptor == status of channel. That
> trick (I also already referred to earlier) is used in some drivers.

Ok, I'll recheck others implementation.

> > > I mean, who are the users of them? If it's only one module, there
> > > is
> > > no need to put them in header.
> >
> > Yes, only one module.
> > Should I move all this definitions to axi_dma_platform.c file and
> > rid
> > of both axi_dma_platform_reg.h and axi_dma_platform.h headers?
> Depends on your design.
>
> If it would be just one C module it might make sense to use driver.c
> and driver.h or just driver.c.
> I see several drivers in current linux-next that are using latter and
> some that are using former, and number of plain driver.c variant is
> bigger.

Ok.

--
 Eugeniy Paltsev

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/2] dmaengine: Add DW AXI DMAC driver
  2017-04-25 15:16             ` Eugeniy Paltsev
@ 2017-04-25 18:12               ` Andy Shevchenko
  2017-04-26 15:04                 ` Andy Shevchenko
  2017-04-27 15:34                 ` Eugeniy Paltsev
  0 siblings, 2 replies; 14+ messages in thread
From: Andy Shevchenko @ 2017-04-25 18:12 UTC (permalink / raw)
  To: Eugeniy Paltsev
  Cc: vinod.koul, linux-kernel, robh+dt, Alexey.Brodkin, devicetree,
	linux-snps-arc, dan.j.williams, dmaengine

On Tue, 2017-04-25 at 15:16 +0000, Eugeniy Paltsev wrote:
> On Mon, 2017-04-24 at 19:56 +0300, Andy Shevchenko wrote:
> > On Mon, 2017-04-24 at 15:55 +0000, Eugeniy Paltsev wrote:
> > > Hi,
> > > On Fri, 2017-04-21 at 18:13 +0300, Andy Shevchenko wrote:
> > > > On Fri, 2017-04-21 at 14:29 +0000, Eugeniy Paltsev wrote:
> > > > > On Tue, 2017-04-18 at 15:31 +0300, Andy Shevchenko wrote:
> > > > > > On Fri, 2017-04-07 at 17:04 +0300, Eugeniy Paltsev wrote:

> > > This IP can be (ans is) configured with small block size.
> > > (note, that I am not saying about runtime HW configuration)
> > > 
> > > And there is opportunity what we can't use sg_list directly and
> > > need
> > > to
> > > split sg_list to a smaller chunks.
> > 
> > That's what I have referred quite ago. The driver should provide an
> > interface to tell potential caller what maximum block (number of
> > items
> > with given bus width) it supports.
> > 
> > We have struct dma_parms in struct device, but what we actually need
> > is
> > to support similar on per channel basis in DMAengine framework.
> > 
> > So, instead of working around this I recommend either to implement
> > it
> > properly or rely on the fact that in the future someone eventually
> > does that for you.
> > 
> > Each driver which has this re-splitting mechanism should be cleaned
> > up and refactored.
> 
> I still can't see any pros of this implementation.
> There is no performance profit: we anyway need to re-splitt sg_list
> (but now in dma-user driver instead of dma driver)

There is, you seems don't see it.

Currently:
 User:
  prepares sg-list

 DMA driver:
  a) iterates over it, and
  b) if sg_len is bigger than block it re-splits it.

New approach:

 User:
  a) gets DMA channel and its properties
  b) prepares sg-list taking into consideration block size

 DMA driver:
  a) uses the given sg-list and for sake of simplicity bails out if
something wrong with it

So, it means in some cases (where entry is big enough) we have to
prepare list *and* re-split it. It's really performance consuming (like
UART case where buffer is small enough and latency like above matters).

> If we want to use same descriptors several times we just can use
> DMA_CTRL_REUSE option - the descriptors will be created one time and
> re-splitting will be сompleted only one time.

There are two type of descriptors: SW and HW. That flag about SW
descriptor, so, it in most cases has nothing to do with the actual entry
size.

> But there are cons of this implementation:
> we need to implement re-splitting mechanism in each place we use dma
> instead of one dma driver. So there are more places for bugs and
> etc...

No, you completely missed the point.

With new approach we do the preparation only once per descriptor /
transfer and in one place where the sg-list is created.

> > > > Better not to pretend that it has been processed successfully.
> > > > Don't
> > > > call callback on it and set its status to DMA_ERROR (that's why
> > > > descriptors in many drivers have dma_status field). When user
> > > > asks for
> > > > status (using cookie) the saved value would be returned until
> > > > descriptor
> > > > is active.
> > > > 
> > > > Do you have some other workflow in mind?
> > > 
> > > Hmm...
> > > Do you mean I should left error descriptors in desc_issued list
> > > or I should create another list (like desc_error) in my driver and
> > > move
> > > error descriptors to desc_error list?
> > > 
> > > And when exactly should I free error descriptors?
> > 
> > See below.
> > 
> > > I checked hsu/hsu.c dma driver implementation:
> > >    vdma descriptor is deleted from desc_issued list when transfer
> > >    starts. When descriptor marked as error descriptor
> > >    vchan_cookie_complete isn't called for this descriptor. And
> > > this
> > >    descriptor isn't placed in any list. So error descriptors
> > > *never*
> > >    will be freed.
> > >    I don't actually like this approach.
> > 
> > Descriptor is active until terminate_all() is called or new
> > descriptor
> > is supplied. So, the caller has a quite time to check on it.
> > 
> > So, what's wrong on it by your opinion?
> 
> Hmm, this looks OK. (In my example (hsu/hsu.c driver) error
> descriptors
> are not freed even after terminate_all is called)

If it's active it will be freed.
Otherwise caller should check somewhere that descriptor fails.

But actually this is fragile and we need to monitor failed descriptors.
Thanks for reporting.

> 
> > Of course, if you want to keep by some reason (should be stated what
> > the reason in comment) erred descriptors, you can do that.
> 
> So, I'll create desc_error list and store failed descriptors in this
> list until terminate_all() is called.
> Is it OK implementation?

Nope, we need to amend virt-chan API for that. I'm on it. Will send a
series soon.

> There is a simple reason:
> I had to do same actions in probe/remove as in
> runtime_resume/runtime_suspend.
> (like enabling clock, enabling dma)

> If my driver is build with RUNTIME_PM this actions will be
> automatically handled by runtime pm (clock and dma will be enabled
> before using and disabled after using).
> Otherwise, if my driver is build without RUNTIME_PM clock and dma will
> be enabled only one time - when ->probe() is called.
> So I use runtime_resume callback directly for this purpose (because if
> my driver is build without RUNTIME_PM this callback wiil not be called
> at all)

Okay, The best way to do that is to create functions which take struct
chip and do that. Re-use them in runtime PM hooks.

But you still need to learn a bit about runtime PM. There is no harm to
call clk_enable() twice in a raw if you guarantee you do the opposite
(twice call clk_disable()) on tear down.

Also I would reconsider clock (un)preparation in those hooks. Why do you
need that part? It's resource consuming.

-- 
Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Intel Finland Oy

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/2] dmaengine: Add DW AXI DMAC driver
  2017-04-25 18:12               ` Andy Shevchenko
@ 2017-04-26 15:04                 ` Andy Shevchenko
  2017-04-27 13:21                   ` Eugeniy Paltsev
  2017-04-27 15:34                 ` Eugeniy Paltsev
  1 sibling, 1 reply; 14+ messages in thread
From: Andy Shevchenko @ 2017-04-26 15:04 UTC (permalink / raw)
  To: Eugeniy Paltsev
  Cc: vinod.koul, linux-kernel, robh+dt, Alexey.Brodkin, devicetree,
	linux-snps-arc, dan.j.williams, dmaengine

On Tue, 2017-04-25 at 21:12 +0300, Andy Shevchenko wrote:
> On Tue, 2017-04-25 at 15:16 +0000, Eugeniy Paltsev wrote:
> > On Mon, 2017-04-24 at 19:56 +0300, Andy Shevchenko wrote:
> > > On Mon, 2017-04-24 at 15:55 +0000, Eugeniy Paltsev wrote:

> > > Descriptor is active until terminate_all() is called or new
> > > descriptor
> > > is supplied. So, the caller has a quite time to check on it.
> > > 
> > > So, what's wrong on it by your opinion?
> > 
> > Hmm, this looks OK. (In my example (hsu/hsu.c driver) error
> > descriptors
> > are not freed even after terminate_all is called)
> 
> If it's active it will be freed.
> Otherwise caller should check somewhere that descriptor fails.
> 
> But actually this is fragile and we need to monitor failed
> descriptors.
> Thanks for reporting.
> 
> > 
> > > Of course, if you want to keep by some reason (should be stated
> > > what
> > > the reason in comment) erred descriptors, you can do that.
> > 
> > So, I'll create desc_error list and store failed descriptors in this
> > list until terminate_all() is called.
> > Is it OK implementation?
> 
> Nope, we need to amend virt-chan API for that. I'm on it. Will send a
> series soon.

I have to correct what I wrote before.

We have two options:
a) one I proposed above;
b) move descriptor to complete list and call complete callback with
result.

So, it looks like the b) variant is what is done already in 4 (did I
calculate correctly?) drivers and respective users.

-- 
Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Intel Finland Oy

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/2] dmaengine: Add DW AXI DMAC driver
  2017-04-26 15:04                 ` Andy Shevchenko
@ 2017-04-27 13:21                   ` Eugeniy Paltsev
  2017-04-27 13:33                     ` Andy Shevchenko
  0 siblings, 1 reply; 14+ messages in thread
From: Eugeniy Paltsev @ 2017-04-27 13:21 UTC (permalink / raw)
  To: andriy.shevchenko
  Cc: vinod.koul, linux-kernel, robh+dt, Alexey.Brodkin, devicetree,
	Eugeniy.Paltsev, linux-snps-arc, dan.j.williams, dmaengine

On Wed, 2017-04-26 at 18:04 +0300, Andy Shevchenko wrote:
> On Tue, 2017-04-25 at 21:12 +0300, Andy Shevchenko wrote:
> > On Tue, 2017-04-25 at 15:16 +0000, Eugeniy Paltsev wrote:
> > > On Mon, 2017-04-24 at 19:56 +0300, Andy Shevchenko wrote:
> > > > On Mon, 2017-04-24 at 15:55 +0000, Eugeniy Paltsev wrote:
> > > > Descriptor is active until terminate_all() is called or new
> > > > descriptor
> > > > is supplied. So, the caller has a quite time to check on it.
> > > >
> > > > So, what's wrong on it by your opinion?
> > >
> > > Hmm, this looks OK. (In my example (hsu/hsu.c driver) error
> > > descriptors
> > > are not freed even after terminate_all is called)
> >
> > If it's active it will be freed.
> > Otherwise caller should check somewhere that descriptor fails.
> >
> > But actually this is fragile and we need to monitor failed
> > descriptors.
> > Thanks for reporting.
> >
> > >
> > > > Of course, if you want to keep by some reason (should be stated
> > > > what
> > > > the reason in comment) erred descriptors, you can do that.
> > >
> > > So, I'll create desc_error list and store failed descriptors in
> > > this
> > > list until terminate_all() is called.
> > > Is it OK implementation?
> >
> > Nope, we need to amend virt-chan API for that. I'm on it. Will send
> > a series soon.
>
> I have to correct what I wrote before.
>
> We have two options:
> a) one I proposed above;
> b) move descriptor to complete list and call complete callback with
> result.
>
> So, it looks like the b) variant is what is done already in 4 (did I
> calculate correctly?) drivers and respective users.

In my opinion we should call error descriptor complete callback.
But I don't think we should move error descriptor to desc_completed
list.

When descriptor following our error descriptor will be completed
successfully vchan tasklet will be called.
So all descriptors from desc_completed list will be freed (including
our error descriptor)
We'll lost information about error descriptors and we'll not be able to
return correct status from dma_async_is_tx_complete().

In my opinion, we should create desc_error list.
When we get error we'll move descriptor to desc_error list and call
complete callback.

--
 Eugeniy Paltsev

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/2] dmaengine: Add DW AXI DMAC driver
  2017-04-27 13:21                   ` Eugeniy Paltsev
@ 2017-04-27 13:33                     ` Andy Shevchenko
  0 siblings, 0 replies; 14+ messages in thread
From: Andy Shevchenko @ 2017-04-27 13:33 UTC (permalink / raw)
  To: Eugeniy Paltsev, Lars-Peter Clausen
  Cc: vinod.koul, linux-kernel, robh+dt, Alexey.Brodkin, devicetree,
	linux-snps-arc, dan.j.williams, dmaengine

On Thu, 2017-04-27 at 13:21 +0000, Eugeniy Paltsev wrote:
> On Wed, 2017-04-26 at 18:04 +0300, Andy Shevchenko wrote:
> > On Tue, 2017-04-25 at 21:12 +0300, Andy Shevchenko wrote:
> > > On Tue, 2017-04-25 at 15:16 +0000, Eugeniy Paltsev wrote:
> > > > On Mon, 2017-04-24 at 19:56 +0300, Andy Shevchenko wrote:
> > > > > On Mon, 2017-04-24 at 15:55 +0000, Eugeniy Paltsev wrote:

> > > > Hmm, this looks OK. (In my example (hsu/hsu.c driver) error
> > > > descriptors
> > > > are not freed even after terminate_all is called)
> > > 
> > > If it's active it will be freed.
> > > Otherwise caller should check somewhere that descriptor fails.
> > > 
> > > But actually this is fragile and we need to monitor failed
> > > descriptors.
> > > Thanks for reporting.
> > > 
> > > > 
> > > > > Of course, if you want to keep by some reason (should be
> > > > > stated
> > > > > what
> > > > > the reason in comment) erred descriptors, you can do that.
> > > > 
> > > > So, I'll create desc_error list and store failed descriptors in
> > > > this
> > > > list until terminate_all() is called.
> > > > Is it OK implementation?
> > > 
> > > Nope, we need to amend virt-chan API for that. I'm on it. Will
> > > send
> > > a series soon.
> > 
> > I have to correct what I wrote before.
> > 
> > We have two options:
> > a) one I proposed above;
> > b) move descriptor to complete list and call complete callback with
> > result.
> > 
> > So, it looks like the b) variant is what is done already in 4 (did I
> > calculate correctly?) drivers and respective users.
> 
> In my opinion we should call error descriptor complete callback.
> But I don't think we should move error descriptor to desc_completed
> list.
> 
> When descriptor following our error descriptor will be completed
> successfully vchan tasklet will be called.
> So all descriptors from desc_completed list will be freed (including
> our error descriptor)
> We'll lost information about error descriptors and we'll not be able
> to
> return correct status from dma_async_is_tx_complete().

While I more agree on the point that we better to have explicit list of
failed descriptors, here is another point that user (which is interested
in error checking) has to provide callback_result instead of callback.

> In my opinion, we should create desc_error list.
> When we get error we'll move descriptor to desc_error list and call
> complete callback.

Vinod, Lars, others, what is(are) your opinion(s)?

-- 
Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Intel Finland Oy

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/2] dmaengine: Add DW AXI DMAC driver
  2017-04-25 18:12               ` Andy Shevchenko
  2017-04-26 15:04                 ` Andy Shevchenko
@ 2017-04-27 15:34                 ` Eugeniy Paltsev
  1 sibling, 0 replies; 14+ messages in thread
From: Eugeniy Paltsev @ 2017-04-27 15:34 UTC (permalink / raw)
  To: andriy.shevchenko
  Cc: vinod.koul, linux-kernel, robh+dt, Alexey.Brodkin, devicetree,
	Eugeniy.Paltsev, linux-snps-arc, dan.j.williams, dmaengine

On Tue, 2017-04-25 at 21:12 +0300, Andy Shevchenko wrote:
> On Tue, 2017-04-25 at 15:16 +0000, Eugeniy Paltsev wrote:
> > On Mon, 2017-04-24 at 19:56 +0300, Andy Shevchenko wrote:
> > > On Mon, 2017-04-24 at 15:55 +0000, Eugeniy Paltsev wrote:
> > > > Hi,
> > > > On Fri, 2017-04-21 at 18:13 +0300, Andy Shevchenko wrote:
> > > > > On Fri, 2017-04-21 at 14:29 +0000, Eugeniy Paltsev wrote:
> > > > > > On Tue, 2017-04-18 at 15:31 +0300, Andy Shevchenko wrote:
> > > > > > > On Fri, 2017-04-07 at 17:04 +0300, Eugeniy Paltsev wrote:
> > > > This IP can be (ans is) configured with small block size.
> > > > (note, that I am not saying about runtime HW configuration)
> > > >
> > > > And there is opportunity what we can't use sg_list directly and
> > > > need
> > > > to
> > > > split sg_list to a smaller chunks.
> > >
> > > That's what I have referred quite ago. The driver should provide
> > > an
> > > interface to tell potential caller what maximum block (number of
> > > items
> > > with given bus width) it supports.
> > >
> > > We have struct dma_parms in struct device, but what we actually
> > > need
> > > is
> > > to support similar on per channel basis in DMAengine framework.
> > >
> > > So, instead of working around this I recommend either to
> > > implement
> > > it
> > > properly or rely on the fact that in the future someone
> > > eventually
> > > does that for you.
> > >
> > > Each driver which has this re-splitting mechanism should be
> > > cleaned
> > > up and refactored.
> >
> > I still can't see any pros of this implementation.
> > There is no performance profit: we anyway need to re-splitt sg_list
> > (but now in dma-user driver instead of dma driver)
--->---
> > If we want to use same descriptors several times we just can use
> > DMA_CTRL_REUSE option - the descriptors will be created one time
> > and re-splitting will be сompleted only one time.
>
> There are two type of descriptors: SW and HW. That flag about SW
> descriptor, so, it in most cases has nothing to do with the actual
> entry size.

Hmm, I checked all virt-dma code attentively and I don't see this
limitation.
The only existing limitation I can see is that we can use
DMA_CTRL_REUSE only for channels supporting slave transfers. (But it is
irrelevant to our discussion)

So, we can use DMA_CTRL_REUSE for both HW/SW descriptor types.

--
 Eugeniy Paltsev

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2017-04-27 15:35 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-07 14:04 [PATCH v2 0/2] dmaengine: Add DW AXI DMAC driver Eugeniy Paltsev
2017-04-07 14:04 ` [PATCH v2 1/2] dt-bindings: Document the Synopsys DW AXI DMA bindings Eugeniy Paltsev
2017-04-07 14:04 ` [PATCH v2 2/2] dmaengine: Add DW AXI DMAC driver Eugeniy Paltsev
2017-04-18 12:31   ` Andy Shevchenko
2017-04-21 14:29     ` Eugeniy Paltsev
2017-04-21 15:13       ` Andy Shevchenko
2017-04-24 15:55         ` Eugeniy Paltsev
2017-04-24 16:56           ` Andy Shevchenko
2017-04-25 15:16             ` Eugeniy Paltsev
2017-04-25 18:12               ` Andy Shevchenko
2017-04-26 15:04                 ` Andy Shevchenko
2017-04-27 13:21                   ` Eugeniy Paltsev
2017-04-27 13:33                     ` Andy Shevchenko
2017-04-27 15:34                 ` Eugeniy Paltsev

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).