All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH] fsldma: Add DMA_SLAVE support
@ 2009-05-15 22:56 Ira Snyder
  2009-06-03 18:10 ` Ira Snyder
  2009-06-16 19:01 ` Dan Williams
  0 siblings, 2 replies; 11+ messages in thread
From: Ira Snyder @ 2009-05-15 22:56 UTC (permalink / raw)
  To: Dan Williams, Li Yang, linuxppc-dev

Use the DMA_SLAVE capability of the DMAEngine API to copy/from a
scatterlist into an arbitrary list of hardware address/length pairs.

This allows a single DMA transaction to copy data from several different
devices into a scatterlist at the same time.

This also adds support to enable some controller-specific features such as
external start and external pause of a DMA transaction.

Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
---

This is a request for comments on this patch. I hunch it is not quite
ready for inclusion, though it is certainly ready for review. Correct
functioning of this patch depends on the patches submitted earlier.

As suggested by Dan Williams, I implemented DMA_SLAVE support for the
fsldma controller to allow me to use the hardware to transfer to/from a
scatterlist to a list of hardware address/length pairs.

I implemented support for the extra features available in the DMA
controller, such as external pause and external start. I have not tested
the features yet. I am willing to drop the support if everything else
looks good.

I have implemented helper functions for creating the list of hardware
address/length pairs as static inline functions in the linux/fsldma.h
header. Should I incorporate these into the driver itself and use
EXPORT_SYMBOL()? I've never done this before :)

Thanks for your review,
Ira

 drivers/dma/fsldma.c   |  226 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/fsldma.h |  105 ++++++++++++++++++++++
 2 files changed, 331 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/fsldma.h

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index de0e5c8..465846c 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -29,6 +29,7 @@
 #include <linux/dmapool.h>
 #include <linux/of_platform.h>
 
+#include <linux/fsldma.h>
 #include "fsldma.h"
 
 static void dma_init(struct fsl_dma_chan *fsl_chan)
@@ -530,6 +531,228 @@ fail:
 	return NULL;
 }
 
+/*
+ * Setup the DMA controller for a DMA_SLAVE transaction
+ *
+ * NOTE: this gets the hardware address/length pairs from the
+ * NOTE: struct fsl_dma_slave stored in chan->private
+ *
+ * @param chan the DMA channel
+ * @param sgl the scatterlist to transfer to/from
+ * @param sg_len the number of entries in the scatterlist
+ * @param direction the DMA direction
+ * @param flags DMAEngine flags
+ *
+ * @return a new struct dma_async_tx_descriptor or NULL
+ */
+static struct dma_async_tx_descriptor *fsl_dma_prep_slave_sg(
+	struct dma_chan *chan, struct scatterlist *sgl, unsigned int sg_len,
+	enum dma_data_direction direction, unsigned long flags)
+{
+	struct fsl_dma_chan *fsl_chan;
+	struct fsl_desc_sw *first = NULL, *prev = NULL, *new = NULL;
+	struct fsl_dma_slave *slave;
+	struct list_head *tx_list;
+	size_t copy;
+
+	int i;
+	struct scatterlist *sg;
+	size_t sg_used;
+	size_t hw_used;
+	struct fsl_dma_hw_addr *hw;
+	dma_addr_t dma_dst, dma_src;
+
+	if (!chan)
+		return NULL;
+
+	if (!chan->private)
+		return NULL;
+
+	fsl_chan = to_fsl_chan(chan);
+	slave = chan->private;
+
+	if (list_empty(&slave->addresses))
+		return NULL;
+
+	hw = list_first_entry(&slave->addresses, struct fsl_dma_hw_addr, entry);
+	hw_used = 0;
+
+	/*
+	 * Build the hardware transaction to copy from the scatterlist to
+	 * the hardware, or from the hardware to the scatterlist
+	 *
+	 * If you are copying from the hardware to the scatterlist and it
+	 * takes two hardware entries to fill an entire page, then both
+	 * hardware entries will be coalesced into the same page
+	 *
+	 * If you are copying from the scatterlist to the hardware and a
+	 * single page can fill two hardware entries, then the data will
+	 * be read out of the page into the first hardware entry, and so on
+	 */
+	for_each_sg(sgl, sg, sg_len, i) {
+		sg_used = 0;
+
+		/* Loop until the entire scatterlist entry is used */
+		while (sg_used < sg_dma_len(sg)) {
+
+			/*
+			 * If we've used up the current hardware address/length
+			 * pair, we need to load a new one
+			 *
+			 * This is done in a while loop so that descriptors with
+			 * length == 0 will be skipped
+			 */
+			while (hw_used >= hw->length) {
+
+				/*
+				 * If the current hardware entry is the last
+				 * entry in the list, we're finished
+				 */
+				if (list_is_last(&hw->entry, &slave->addresses))
+					goto finished;
+
+				/* Get the next hardware address/length pair */
+				hw = list_entry(hw->entry.next,
+						struct fsl_dma_hw_addr, entry);
+				hw_used = 0;
+			}
+
+			/* Allocate the link descriptor from DMA pool */
+			new = fsl_dma_alloc_descriptor(fsl_chan);
+			if (!new) {
+				dev_err(fsl_chan->dev, "No free memory for "
+						       "link descriptor\n");
+				goto fail;
+			}
+#ifdef FSL_DMA_LD_DEBUG
+			dev_dbg(fsl_chan->dev, "new link desc alloc %p\n", new);
+#endif
+
+			/*
+			 * Calculate the maximum number of bytes to transfer,
+			 * making sure it is less than the DMA controller limit
+			 */
+			copy = min_t(size_t, sg_dma_len(sg) - sg_used,
+					     hw->length - hw_used);
+			copy = min_t(size_t, copy, FSL_DMA_BCR_MAX_CNT);
+
+			/*
+			 * DMA_FROM_DEVICE
+			 * from the hardware to the scatterlist
+			 *
+			 * DMA_TO_DEVICE
+			 * from the scatterlist to the hardware
+			 */
+			if (direction == DMA_FROM_DEVICE) {
+				dma_src = hw->address + hw_used;
+				dma_dst = sg_dma_address(sg) + sg_used;
+			} else {
+				dma_src = sg_dma_address(sg) + sg_used;
+				dma_dst = hw->address + hw_used;
+			}
+
+			/* Fill in the descriptor */
+			set_desc_cnt(fsl_chan, &new->hw, copy);
+			set_desc_src(fsl_chan, &new->hw, dma_src);
+			set_desc_dest(fsl_chan, &new->hw, dma_dst);
+
+			/*
+			 * If this is not the first descriptor, chain the
+			 * current descriptor after the previous descriptor
+			 */
+			if (!first) {
+				first = new;
+			} else {
+				set_desc_next(fsl_chan, &prev->hw,
+					      new->async_tx.phys);
+			}
+
+			new->async_tx.cookie = 0;
+			async_tx_ack(&new->async_tx);
+
+			prev = new;
+			sg_used += copy;
+			hw_used += copy;
+
+			/* Insert the link descriptor into the LD ring */
+			list_add_tail(&new->node, &first->async_tx.tx_list);
+		}
+	}
+
+finished:
+
+	/* All of the hardware address/length pairs had length == 0 */
+	if (!first || !new)
+		return NULL;
+
+	new->async_tx.flags = flags;
+	new->async_tx.cookie = -EBUSY;
+
+	/* Set End-of-link to the last link descriptor of new list */
+	set_ld_eol(fsl_chan, new);
+
+	/* Enable extra controller features */
+	if (fsl_chan->set_src_loop_size)
+		fsl_chan->set_src_loop_size(fsl_chan, slave->src_loop_size);
+
+	if (fsl_chan->set_dest_loop_size)
+		fsl_chan->set_dest_loop_size(fsl_chan, slave->dst_loop_size);
+
+	if (fsl_chan->toggle_ext_start)
+		fsl_chan->toggle_ext_start(fsl_chan, slave->external_start);
+
+	if (fsl_chan->toggle_ext_pause)
+		fsl_chan->toggle_ext_pause(fsl_chan, slave->external_pause);
+
+	return &first->async_tx;
+
+fail:
+	/* If first was not set, then we failed to allocate the very first
+	 * descriptor, and we're done */
+	if (!first)
+		return NULL;
+
+	/*
+	 * First is set, so all of the descriptors we allocated have been added
+	 * to first->async_tx.tx_list, INCLUDING "first" itself. Therefore we
+	 * must traverse the list backwards freeing each descriptor in turn
+	 *
+	 * We're re-using variables for the loop, oh well
+	 */
+	tx_list = &first->async_tx.tx_list;
+	list_for_each_entry_safe_reverse(new, prev, tx_list, node) {
+		list_del_init(&new->node);
+		dma_pool_free(fsl_chan->desc_pool, new, new->async_tx.phys);
+	}
+
+	return NULL;
+}
+
+static void fsl_dma_device_terminate_all(struct dma_chan *chan)
+{
+	struct fsl_dma_chan *fsl_chan;
+	struct fsl_desc_sw *desc, *tmp;
+	unsigned long flags;
+
+	if (!chan)
+		return;
+
+	fsl_chan = to_fsl_chan(chan);
+
+	/* Halt the DMA engine */
+	dma_halt(fsl_chan);
+
+	spin_lock_irqsave(&fsl_chan->desc_lock, flags);
+
+	/* Remove and free all of the descriptors in the LD queue */
+	list_for_each_entry_safe(desc, tmp, &fsl_chan->ld_queue, node) {
+		list_del(&desc->node);
+		dma_pool_free(fsl_chan->desc_pool, desc, desc->async_tx.phys);
+	}
+
+	spin_unlock_irqrestore(&fsl_chan->desc_lock, flags);
+}
+
 /**
  * fsl_dma_update_completed_cookie - Update the completed cookie.
  * @fsl_chan : Freescale DMA channel
@@ -952,12 +1175,15 @@ static int __devinit of_fsl_dma_probe(struct of_device *dev,
 
 	dma_cap_set(DMA_MEMCPY, fdev->common.cap_mask);
 	dma_cap_set(DMA_INTERRUPT, fdev->common.cap_mask);
+	dma_cap_set(DMA_SLAVE, fdev->common.cap_mask);
 	fdev->common.device_alloc_chan_resources = fsl_dma_alloc_chan_resources;
 	fdev->common.device_free_chan_resources = fsl_dma_free_chan_resources;
 	fdev->common.device_prep_dma_interrupt = fsl_dma_prep_interrupt;
 	fdev->common.device_prep_dma_memcpy = fsl_dma_prep_memcpy;
 	fdev->common.device_is_tx_complete = fsl_dma_is_complete;
 	fdev->common.device_issue_pending = fsl_dma_memcpy_issue_pending;
+	fdev->common.device_prep_slave_sg = fsl_dma_prep_slave_sg;
+	fdev->common.device_terminate_all = fsl_dma_device_terminate_all;
 	fdev->common.dev = &dev->dev;
 
 	fdev->irq = irq_of_parse_and_map(dev->node, 0);
diff --git a/include/linux/fsldma.h b/include/linux/fsldma.h
new file mode 100644
index 0000000..a42dcdd
--- /dev/null
+++ b/include/linux/fsldma.h
@@ -0,0 +1,105 @@
+/*
+ * Freescale MPC83XX / MPC85XX DMA Controller
+ *
+ * Copyright (c) 2009 Ira W. Snyder <iws@ovro.caltech.edu>
+ *
+ * This file is licensed under the terms of the GNU General Public License
+ * version 2. This program is licensed "as is" without any warranty of any
+ * kind, whether express or implied.
+ */
+
+#ifndef __LINUX_FSLDMA_H__
+#define __LINUX_FSLDMA_H__
+
+#include <linux/dmaengine.h>
+
+/*
+ * physical hardware address / length pair for use with the
+ * DMAEngine DMA_SLAVE API
+ */
+struct fsl_dma_hw_addr {
+	struct list_head entry;
+
+	dma_addr_t address;
+	size_t length;
+};
+
+/*
+ * structure passed to the DMAEngine DMA_SLAVE API via the
+ * chan->private pointer
+ */
+struct fsl_dma_slave {
+
+	/* List of hardware address/length pairs */
+	struct list_head addresses;
+
+	/* Support for extra controller features */
+	unsigned int src_loop_size;
+	unsigned int dst_loop_size;
+	bool external_start;
+	bool external_pause;
+};
+
+/*
+ * Add an address/length pair to an existing DMA_SLAVE structure
+ *
+ * @param slave the DMA_SLAVE structure
+ * @param address the hardware address
+ * @param length the length of bytes to transfer
+ * @return 0 on success, -ERRNO otherwise
+ */
+static inline int fsl_dma_slave_append(struct fsl_dma_slave *slave,
+				       dma_addr_t address, size_t length)
+{
+	struct fsl_dma_hw_addr *addr;
+
+	addr = kzalloc(sizeof(*addr), GFP_ATOMIC);
+	if (!addr)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&addr->entry);
+	addr->address = address;
+	addr->length = length;
+
+	list_add_tail(&addr->entry, &slave->addresses);
+	return 0;
+}
+
+/*
+ * Free a DMA_SLAVE structure and associated address/length pairs
+ *
+ * @param slave the DMA_SLAVE structure to free
+ */
+static inline void fsl_dma_slave_free(struct fsl_dma_slave *slave)
+{
+	struct fsl_dma_hw_addr *addr, *tmp;
+
+	if (slave) {
+		list_for_each_entry_safe(addr, tmp, &slave->addresses, entry) {
+			list_del(&addr->entry);
+			kfree(addr);
+		}
+
+		kfree(slave);
+	}
+}
+
+/*
+ * Allocate a DMA_SLAVE structure
+ *
+ * @param gfp memory allocation flags
+ * @return a new struct fsl_dma_slave or NULL
+ */
+static inline struct fsl_dma_slave *fsl_dma_slave_alloc(gfp_t gfp)
+{
+	struct fsl_dma_slave *slave;
+
+	slave = kzalloc(sizeof(*slave), gfp);
+	if (!slave)
+		return NULL;
+
+	INIT_LIST_HEAD(&slave->addresses);
+	return slave;
+}
+
+#endif /* __LINUX_FSLDMA_H__ */
-- 
1.5.4.3

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH] fsldma: Add DMA_SLAVE support
  2009-05-15 22:56 [RFC PATCH] fsldma: Add DMA_SLAVE support Ira Snyder
@ 2009-06-03 18:10 ` Ira Snyder
  2009-06-04 11:20   ` Li Yang-R58472
  2009-06-16 19:01 ` Dan Williams
  1 sibling, 1 reply; 11+ messages in thread
From: Ira Snyder @ 2009-06-03 18:10 UTC (permalink / raw)
  To: Dan Williams, Li Yang, linuxppc-dev

On Fri, May 15, 2009 at 03:56:59PM -0700, Ira Snyder wrote:
> Use the DMA_SLAVE capability of the DMAEngine API to copy/from a
> scatterlist into an arbitrary list of hardware address/length pairs.
> 
> This allows a single DMA transaction to copy data from several different
> devices into a scatterlist at the same time.
> 
> This also adds support to enable some controller-specific features such as
> external start and external pause of a DMA transaction.
> 
> Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
> ---
> 
> This is a request for comments on this patch. I hunch it is not quite
> ready for inclusion, though it is certainly ready for review. Correct
> functioning of this patch depends on the patches submitted earlier.
> 
> As suggested by Dan Williams, I implemented DMA_SLAVE support for the
> fsldma controller to allow me to use the hardware to transfer to/from a
> scatterlist to a list of hardware address/length pairs.
> 
> I implemented support for the extra features available in the DMA
> controller, such as external pause and external start. I have not tested
> the features yet. I am willing to drop the support if everything else
> looks good.
> 
> I have implemented helper functions for creating the list of hardware
> address/length pairs as static inline functions in the linux/fsldma.h
> header. Should I incorporate these into the driver itself and use
> EXPORT_SYMBOL()? I've never done this before :)
> 

Any comments on this patch?

I've tested the external start feature, and it works great. I had to set
the DMA Request Count (in the mode register) after the driver was in
control. There is no interface for doing this in the existing driver. I
just ioremap()ed the registers and added the value from my driver after
claiming the channel with dma_request_channel().

There are ways to get the DMA controller into a state where the CB bit
stays set no matter what you do. I have only seen this during an
externally controlled transfer, when the external master does weird
things. I would fix this state in terminate_all(), but I have been
unsuccessful in getting the DMA controller working again after it has
been messed up.

Ira

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [RFC PATCH] fsldma: Add DMA_SLAVE support
  2009-06-03 18:10 ` Ira Snyder
@ 2009-06-04 11:20   ` Li Yang-R58472
  2009-06-04 16:22     ` Ira Snyder
  0 siblings, 1 reply; 11+ messages in thread
From: Li Yang-R58472 @ 2009-06-04 11:20 UTC (permalink / raw)
  To: Ira Snyder, Dan Williams, linuxppc-dev


>> This is a request for comments on this patch. I hunch it is=20
>not quite=20
>> ready for inclusion, though it is certainly ready for=20
>review. Correct=20
>> functioning of this patch depends on the patches submitted earlier.
>>=20
>> As suggested by Dan Williams, I implemented DMA_SLAVE=20
>support for the=20
>> fsldma controller to allow me to use the hardware to=20
>transfer to/from=20
>> a scatterlist to a list of hardware address/length pairs.
>>=20
>> I implemented support for the extra features available in the DMA=20
>> controller, such as external pause and external start. I have not=20
>> tested the features yet. I am willing to drop the support if=20
>> everything else looks good.
>>=20
>> I have implemented helper functions for creating the list of=20
>hardware=20
>> address/length pairs as static inline functions in the=20
>linux/fsldma.h=20
>> header. Should I incorporate these into the driver itself and use=20
>> EXPORT_SYMBOL()? I've never done this before :)
>>=20
>
>Any comments on this patch?
>

No comment for now.  Didn't get the time to study the DMA_SLAVE thing.
:(

>I've tested the external start feature, and it works great. I=20

Good.  Is it working with your custom board?  Can I verify this with my
reference board?

>had to set the DMA Request Count (in the mode register) after=20
>the driver was in control. There is no interface for doing=20
>this in the existing driver. I just ioremap()ed the registers=20
>and added the value from my driver after claiming the channel=20
>with dma_request_channel().
>
>There are ways to get the DMA controller into a state where=20
>the CB bit stays set no matter what you do. I have only seen=20
>this during an externally controlled transfer, when the=20
>external master does weird things. I would fix this state in=20
>terminate_all(), but I have been unsuccessful in getting the=20
>DMA controller working again after it has been messed up.

What's the frequency for this to happen?  Is the problem reproducable?

- Leo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH] fsldma: Add DMA_SLAVE support
  2009-06-04 11:20   ` Li Yang-R58472
@ 2009-06-04 16:22     ` Ira Snyder
  0 siblings, 0 replies; 11+ messages in thread
From: Ira Snyder @ 2009-06-04 16:22 UTC (permalink / raw)
  To: Li Yang-R58472; +Cc: linuxppc-dev, Dan Williams

On Thu, Jun 04, 2009 at 07:20:26PM +0800, Li Yang-R58472 wrote:
> 
> >> This is a request for comments on this patch. I hunch it is 
> >not quite 
> >> ready for inclusion, though it is certainly ready for 
> >review. Correct 
> >> functioning of this patch depends on the patches submitted earlier.
> >> 
> >> As suggested by Dan Williams, I implemented DMA_SLAVE 
> >support for the 
> >> fsldma controller to allow me to use the hardware to 
> >transfer to/from 
> >> a scatterlist to a list of hardware address/length pairs.
> >> 
> >> I implemented support for the extra features available in the DMA 
> >> controller, such as external pause and external start. I have not 
> >> tested the features yet. I am willing to drop the support if 
> >> everything else looks good.
> >> 
> >> I have implemented helper functions for creating the list of 
> >hardware 
> >> address/length pairs as static inline functions in the 
> >linux/fsldma.h 
> >> header. Should I incorporate these into the driver itself and use 
> >> EXPORT_SYMBOL()? I've never done this before :)
> >> 
> >
> >Any comments on this patch?
> >
> 
> No comment for now.  Didn't get the time to study the DMA_SLAVE thing.
> :(
> 

No problem, I just wanted to make sure that it had been noticed.

The DMA_SLAVE thing is pretty simple: it uses chan->private to pass in
arbitrary parameters to the device_prep_slave_sg() routine. The
device_prep_slave_sg() routine takes a scatterlist and a DMA direction,
then uses the private data to setup the transfer.

When you're ready, submit the transfer and go, just like a normal memcpy
operation. It is basically a way to have a device-specific function in
the generic API.

> >I've tested the external start feature, and it works great. I 
> 
> Good.  Is it working with your custom board?  Can I verify this with my
> reference board?
> 

Yes, I only tested it on my custom board. We use the external DMA
control as part of the programming sequence for some FPGA's on the
board.

If I'm reading the eval board's schematic correctly, the pins for
external DMA control aren't hooked to anything useful at the moment.

If you really want to test it, you'll have to do some modifications to
your board. On page 9 of the schematic (upper right corner) you'll see
that the DMA pins are connected to the GPIO controller on the processor.
You could solder some wires between the first and second three GPIO
pins. Then you can configure SICRL so the first 3 pins are configured
for GPIO, and the second 3 pins are configured for external DMA control.
It is a bit of work, but not too bad.

> >had to set the DMA Request Count (in the mode register) after 
> >the driver was in control. There is no interface for doing 
> >this in the existing driver. I just ioremap()ed the registers 
> >and added the value from my driver after claiming the channel 
> >with dma_request_channel().
> >
> >There are ways to get the DMA controller into a state where 
> >the CB bit stays set no matter what you do. I have only seen 
> >this during an externally controlled transfer, when the 
> >external master does weird things. I would fix this state in 
> >terminate_all(), but I have been unsuccessful in getting the 
> >DMA controller working again after it has been messed up.
> 
> What's the frequency for this to happen?  Is the problem reproducable?

I can only cause it to totally lock up when there is an error, and my
FPGA programmer stops toggling the external DMA control pins in the
middle of a transfer. Nothing I can do will get the DMASR[CB] bit to
zero.

I tried causing the problem with a memory-to-memory transfer, and was
able to cause a similar problem. I programmed to the controller to read
32 bytes from 0xc0000000, which isn't part of the memory map.

The DMA never completes, and the controller has the following bits set:
DMAMR[CS]==1
DMASR[TE]==1
DMASR[CB]==1

The fsldma driver doesn't seem to detect the error at this point.

I was able to re-start the DMA controller with the following sequence:
DMAMR[CS]=0
DMASR[TE]=1 DMASR[CB]=1				(write 1's to set bits)
DMACDAR=0 DMASAR=0 DMADAR=0 DMABCR=0 DMANDAR=0	(zero all registers)
DMAMR[CS]=1

Now the fsldma driver tells me "transfer error!". The DMA channel is now
in working order, and I can use it again.

So, I can get something similar by doing a "bad" memory-to-memory
transfer, but I cannot completely lock up the DMA controller.

Thanks,
Ira

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH] fsldma: Add DMA_SLAVE support
  2009-05-15 22:56 [RFC PATCH] fsldma: Add DMA_SLAVE support Ira Snyder
  2009-06-03 18:10 ` Ira Snyder
@ 2009-06-16 19:01 ` Dan Williams
  2009-06-16 20:12   ` Ira Snyder
  1 sibling, 1 reply; 11+ messages in thread
From: Dan Williams @ 2009-06-16 19:01 UTC (permalink / raw)
  To: Ira Snyder; +Cc: linuxppc-dev, Li Yang

Hi Ira,

Ira Snyder wrote:
> Use the DMA_SLAVE capability of the DMAEngine API to copy/from a
> scatterlist into an arbitrary list of hardware address/length pairs.
> 
> This allows a single DMA transaction to copy data from several different
> devices into a scatterlist at the same time.
> 
> This also adds support to enable some controller-specific features such as
> external start and external pause of a DMA transaction.
> 
> Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
> ---
> 
> This is a request for comments on this patch. I hunch it is not quite
> ready for inclusion, though it is certainly ready for review. Correct
> functioning of this patch depends on the patches submitted earlier.
> 
> As suggested by Dan Williams, I implemented DMA_SLAVE support for the
> fsldma controller to allow me to use the hardware to transfer to/from a
> scatterlist to a list of hardware address/length pairs.
> 
> I implemented support for the extra features available in the DMA
> controller, such as external pause and external start. I have not tested
> the features yet. I am willing to drop the support if everything else
> looks good.
> 
> I have implemented helper functions for creating the list of hardware
> address/length pairs as static inline functions in the linux/fsldma.h
> header. Should I incorporate these into the driver itself and use
> EXPORT_SYMBOL()? I've never done this before :)

Using EXPORT_SYMBOL would defeat the purpose of conforming to the 
dmaengine api which should allow other subsystems to generically 
discover an fsldma resource.

> diff --git a/include/linux/fsldma.h b/include/linux/fsldma.h
> new file mode 100644
> index 0000000..a42dcdd
> --- /dev/null
> +++ b/include/linux/fsldma.h
> @@ -0,0 +1,105 @@
> +/*
> + * Freescale MPC83XX / MPC85XX DMA Controller
> + *
> + * Copyright (c) 2009 Ira W. Snyder <iws@ovro.caltech.edu>
> + *
> + * This file is licensed under the terms of the GNU General Public License
> + * version 2. This program is licensed "as is" without any warranty of any
> + * kind, whether express or implied.
> + */
> +
> +#ifndef __LINUX_FSLDMA_H__
> +#define __LINUX_FSLDMA_H__
> +
> +#include <linux/dmaengine.h>
> +
> +/*
> + * physical hardware address / length pair for use with the
> + * DMAEngine DMA_SLAVE API
> + */
> +struct fsl_dma_hw_addr {
> +       struct list_head entry;
> +
> +       dma_addr_t address;
> +       size_t length;
> +};

Can you explain a bit more why you need the new dma address list, would 
a struct scatterlist suffice?

In general it is difficult to merge new functionality without an in-tree 
user.  Can you share the client of this new api?

I suspect you can get away without needing these new helper routines. 
Have a look at how Haavard implemented his DMA_SLAVE client in 
drivers/mmc/host/atmel-mci.c.

Haavard ended up needing to add some public structure definitions to 
include/linux, but my preference is to keep this in an 
architecture/platform specific header file location if possible.

Regards,
Dan

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH] fsldma: Add DMA_SLAVE support
  2009-06-16 19:01 ` Dan Williams
@ 2009-06-16 20:12   ` Ira Snyder
  2009-06-17 17:17     ` Dan Williams
  0 siblings, 1 reply; 11+ messages in thread
From: Ira Snyder @ 2009-06-16 20:12 UTC (permalink / raw)
  To: Dan Williams; +Cc: linuxppc-dev, Li Yang

On Tue, Jun 16, 2009 at 12:01:40PM -0700, Dan Williams wrote:
> Hi Ira,
>
> Ira Snyder wrote:
>> Use the DMA_SLAVE capability of the DMAEngine API to copy/from a
>> scatterlist into an arbitrary list of hardware address/length pairs.
>>
>> This allows a single DMA transaction to copy data from several different
>> devices into a scatterlist at the same time.
>>
>> This also adds support to enable some controller-specific features such as
>> external start and external pause of a DMA transaction.
>>
>> Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
>> ---
>>
>> This is a request for comments on this patch. I hunch it is not quite
>> ready for inclusion, though it is certainly ready for review. Correct
>> functioning of this patch depends on the patches submitted earlier.
>>
>> As suggested by Dan Williams, I implemented DMA_SLAVE support for the
>> fsldma controller to allow me to use the hardware to transfer to/from a
>> scatterlist to a list of hardware address/length pairs.
>>
>> I implemented support for the extra features available in the DMA
>> controller, such as external pause and external start. I have not tested
>> the features yet. I am willing to drop the support if everything else
>> looks good.
>>
>> I have implemented helper functions for creating the list of hardware
>> address/length pairs as static inline functions in the linux/fsldma.h
>> header. Should I incorporate these into the driver itself and use
>> EXPORT_SYMBOL()? I've never done this before :)
>
> Using EXPORT_SYMBOL would defeat the purpose of conforming to the  
> dmaengine api which should allow other subsystems to generically  
> discover an fsldma resource.
>

Any driver would still use dma_request_channel(), etc. to get access to
a DMA channel. AFAICT, DMA_SLAVE is intended for doing something
completely hardware-specific with the DMA controller.

>> diff --git a/include/linux/fsldma.h b/include/linux/fsldma.h
>> new file mode 100644
>> index 0000000..a42dcdd
>> --- /dev/null
>> +++ b/include/linux/fsldma.h
>> @@ -0,0 +1,105 @@
>> +/*
>> + * Freescale MPC83XX / MPC85XX DMA Controller
>> + *
>> + * Copyright (c) 2009 Ira W. Snyder <iws@ovro.caltech.edu>
>> + *
>> + * This file is licensed under the terms of the GNU General Public License
>> + * version 2. This program is licensed "as is" without any warranty of any
>> + * kind, whether express or implied.
>> + */
>> +
>> +#ifndef __LINUX_FSLDMA_H__
>> +#define __LINUX_FSLDMA_H__
>> +
>> +#include <linux/dmaengine.h>
>> +
>> +/*
>> + * physical hardware address / length pair for use with the
>> + * DMAEngine DMA_SLAVE API
>> + */
>> +struct fsl_dma_hw_addr {
>> +       struct list_head entry;
>> +
>> +       dma_addr_t address;
>> +       size_t length;
>> +};
>
> Can you explain a bit more why you need the new dma address list, would  
> a struct scatterlist suffice?
>

I don't believe so. A scatterlist only holds page/length pairs. How
would you pass an arbitrary dma_addr_t/length pair in a scatterlist. I
/could/ abuse sg_dma_address() and do something like the following, but
I think you'd be even less inclined to take the patch:

struct scatterlist sg[10];

sg_dma_address(sg) = addr1;
sg_dma_len(sg) = len1;
sg++;
sg_dma_address(sg) = addr2;
sg_dma_len(sg) = len2;

/* and so on */

This would mean that there is a scatterlist with the struct page
pointers set to NULL, which has not had dma_map_sg() run on it. Seems
like abuse to me.

What I've implemented is this: (sorry about the poor drawing)

scatterlist           fsl_dma_hw_addr
+--------+            +-------+
|  DATA  | -------->  | DEST1 |
|  DATA  | ----+      +-------+
|  DATA  |     |
|  DATA  |     |      +-------+
|  DATA  |     +--->  | DEST2 |
|  DATA  |            +-------+
+--------+
                          .
                          .
                          .

Of course, the reverse works as well. You can copy from a list of
hardware address/length pairs to a scatterlist.

So, using my implementation of the DMA_SLAVE feature, you can take a big
chunk of data (which is organized into a scatterlist) and DMA it
directly to a set of hardware addresses, all in a single, unbroken
transaction.

I've got an FPGA programmer which needs a ~12MB image dumped to a FIFO
at 0xf0003000 in 4K chunks (all writes must be in the 0xf0003000 to
0xf0004000 range). The programmer is actually in control of the DMA
controller at that time. Internally, the FPGA programmer does some
toggling of pins, etc. which is needed to actually push the image into
the FPGA's themselves.

> In general it is difficult to merge new functionality without an in-tree  
> user.  Can you share the client of this new api?
>

I've inlined the driver for the FPGA programmer below. I don't think it
is appropriate to push into mainline, since it will only work for our
board, and nothing else.

It is pretty simple, but I'm totally open to suggestions for changes. I
used a char device to fill in a scatterlist, then set up the DMA to
0xf0003000 in 4K chunks.

I've got another driver that uses the interface, but this one is a bit
simpler. I can post the other one if you'd like, as well.

> I suspect you can get away without needing these new helper routines.  
> Have a look at how Haavard implemented his DMA_SLAVE client in  
> drivers/mmc/host/atmel-mci.c.
>
> Haavard ended up needing to add some public structure definitions to  
> include/linux, but my preference is to keep this in an  
> architecture/platform specific header file location if possible.
>

I was studying his code when I implemented this routine. He defined
struct dw_dma_slave in include/linux/dw_dmac.h. This is how he passes
information to device_prep_slave_sg() in the mmc driver.

It appears that his controller just uses a single register for
device-to-peripheral transfers. I implemented something different:
scatter/gather IO to/from a list of address/length pairs.

He also used platform data to get the register addresses. I'm unaware of
a way to put arbitrary platform data into the OF tree used on PowerPC.

I didn't want to force other users to implement the allocation routines
for the struct fsl_dma_hw_addr themselves, so I provided routines to do
so.

Thanks for the questions/comments. I feel like I've left something
unanswered, but I can't figure out what it is. Any more questions?
Ira


Some quick notes on the driver below:
carma_device_create()/carma_device_destroy() are just wrappers around
device_create()/device_destroy() which share a struct class, so that all
of the related drivers for FPGA's can go into /sys/class/carma/.

/*
 * CARMA Board Data FPGA Programmer
 *
 * Copyright (c) 2009 Ira W. Snyder <iws@ovro.caltech.edu>
 *
 * This file is licensed under the terms of the GNU General Public License
 * version 2. This program is licensed "as is" without any warranty of any
 * kind, whether express or implied.
 */

#include <linux/cdev.h>
#include <linux/completion.h>
#include <linux/delay.h>
#include <linux/dma-mapping.h>
#include <linux/highmem.h>
#include <linux/init.h>
#include <linux/interrupt.h>
#include <linux/io.h>
#include <linux/kernel.h>
#include <linux/leds.h>
#include <linux/module.h>
#include <linux/mutex.h>
#include <linux/of_platform.h>
#include <linux/dmaengine.h>

/* Freescale DMA Controller DMA_SLAVE interface */
#include <linux/fsldma.h>

/* MPC8349EMDS specific get_immrbase() */
#include <sysdev/fsl_soc.h>

/* CARMA device class */
#include "carma.h"

static const char driver_name[] = "fpga-program";

/*
 * Maximum firmware size
 *
 * 12849552 bytes for a CARMA Digitizer Board
 * XXXXXXXX bytes for a CARMA Correlator Board
 */
#define FW_SIZE_MAX (16 << 20)

struct fpga_fw_elem {
	struct page *page;
	unsigned int len;
	struct list_head entry;
};

struct fpga_dev {

	/* Character Device */
	struct cdev cdev;
	dev_t devno;

	/* Device Registers */
	struct device *dev;
	void __iomem *regs;
	void __iomem *immr;

	/* Freescale DMA Device */
	struct device *dmadev;
	struct dma_chan *chan;

	/* Interrupts */
	int irq, status;
	struct completion completion;

	/* FPGA Bitfile */
	struct mutex lock;
	struct list_head list;
	size_t bytes;
};

#define to_fpga_dev(X) container_of(X, struct fpga_dev, cdev)

/*----------------------------------------------------------------------------*/
/* FPGA Bitfile Element Allocation Helpers                                    */
/*----------------------------------------------------------------------------*/

static struct fpga_fw_elem *fpga_fw_elem_alloc(gfp_t gfp)
{
	struct fpga_fw_elem *elem;

	elem = kzalloc(sizeof(*elem), gfp);
	if (!elem)
		return NULL;

	elem->page = alloc_page(gfp);
	if (!elem->page) {
		kfree(elem);
		return NULL;
	}

	INIT_LIST_HEAD(&elem->entry);
	return elem;
}

static void fpga_fw_elem_free(struct fpga_fw_elem *elem)
{
	if (elem) {
		__free_page(elem->page);
		kfree(elem);
	}
}

/*----------------------------------------------------------------------------*/
/* FPGA Bitfile Helpers                                                       */
/*----------------------------------------------------------------------------*/

/*
 * Drop the firmware bitfile image from memory
 *
 * LOCKING: you must hold the priv->lock mutex
 *
 * @param priv the driver's private data structure
 */
static void fpga_drop_firmware_data(struct fpga_dev *priv)
{
	struct fpga_fw_elem *elem, *tmp;

	priv->bytes = 0;
	list_for_each_entry_safe(elem, tmp, &priv->list, entry) {
		list_del(&elem->entry);
		fpga_fw_elem_free(elem);
	}
}

static unsigned int list_num_entries(struct list_head *list)
{
	struct list_head *tmp;
	unsigned int num = 0;

	list_for_each(tmp, list)
		num++;

	return num;
}

/*----------------------------------------------------------------------------*/
/* LED Trigger (could be a seperate module)                                   */
/*----------------------------------------------------------------------------*/

/*
 * NOTE: this whole thing does have the problem that whenever the led's are
 * NOTE: first set to use the fpga trigger, they could be in the wrong state
 */

DEFINE_LED_TRIGGER(ledtrig_fpga);

static void ledtrig_fpga_programmed(bool enabled)
{
	if (enabled)
		led_trigger_event(ledtrig_fpga, LED_FULL);
	else
		led_trigger_event(ledtrig_fpga, LED_OFF);
}

/*----------------------------------------------------------------------------*/
/* FPGA Register Helpers                                                      */
/*----------------------------------------------------------------------------*/

/* Register Definitions */
#define FPGA_CONFIG_CONTROL		0x40
#define FPGA_CONFIG_STATUS		0x44
#define FPGA_CONFIG_FIFO_SIZE		0x48
#define FPGA_CONFIG_FIFO_USED		0x4C
#define FPGA_CONFIG_TOTAL_BYTE_COUNT	0x50
#define FPGA_CONFIG_CUR_BYTE_COUNT	0x54

#define FPGA_FIFO_ADDRESS		0x3000

static int fpga_fifo_size(void __iomem *regs)
{
	return ioread32be(regs + FPGA_CONFIG_FIFO_SIZE);
}

static int fpga_config_error(void __iomem *regs)
{
	return ioread32be(regs + FPGA_CONFIG_STATUS) & 0xFFFE;
}

static int fpga_fifo_empty(void __iomem *regs)
{
	return ioread32be(regs + FPGA_CONFIG_FIFO_USED) == 0;
}

static void fpga_fifo_write(void __iomem *regs, u32 val)
{
	iowrite32be(val, regs + FPGA_FIFO_ADDRESS);
}

static void fpga_set_byte_count(void __iomem *regs, u32 count)
{
	iowrite32be(count, regs + FPGA_CONFIG_TOTAL_BYTE_COUNT);
}

static void fpga_programmer_enable(struct fpga_dev *priv, bool dma)
{
	if (dma)
		iowrite32be(0x5, priv->regs + FPGA_CONFIG_CONTROL);
	else
		iowrite32be(0x1, priv->regs + FPGA_CONFIG_CONTROL);
}

static void fpga_programmer_disable(struct fpga_dev *priv)
{
	iowrite32be(0x0, priv->regs + FPGA_CONFIG_CONTROL);
}

#ifdef DEBUG
static void fpga_dump_registers(struct fpga_dev *priv)
{
	/* Dump all status registers */
	dev_info(priv->dev, "Configuration failed, dumping status registers\n");
	dev_info(priv->dev, "Control:    0x%.8x\n", ioread32be(priv->regs + FPGA_CONFIG_CONTROL));
	dev_info(priv->dev, "Status:     0x%.8x\n", ioread32be(priv->regs + FPGA_CONFIG_STATUS));
	dev_info(priv->dev, "FIFO Size:  0x%.8x\n", ioread32be(priv->regs + FPGA_CONFIG_FIFO_SIZE));
	dev_info(priv->dev, "FIFO Used:  0x%.8x\n", ioread32be(priv->regs + FPGA_CONFIG_FIFO_USED));
	dev_info(priv->dev, "FIFO Total: 0x%.8x\n", ioread32be(priv->regs + FPGA_CONFIG_TOTAL_BYTE_COUNT));
	dev_info(priv->dev, "FIFO Curr:  0x%.8x\n", ioread32be(priv->regs + FPGA_CONFIG_CUR_BYTE_COUNT));
}
#else
static void fpga_dump_registers(struct fpga_dev *priv)
{
}
#endif

/*----------------------------------------------------------------------------*/
/* FPGA Power Supply Code                                                     */
/*----------------------------------------------------------------------------*/

/*
 * Determine if the FPGA power is good for all supplies
 */
static bool fpga_power_good(struct fpga_dev *priv)
{
	const int addr[] = { 0x2006, 0x2008, 0x200A, 0x200D, };
	int i;

	for (i = 0; i < ARRAY_SIZE(addr); i++) {
		u8 val = ioread8(priv->regs + addr[i]);
		dev_dbg(priv->dev, "FPGA pgood 0x%.4x -> 0x%.2x\n", addr[i], val);

		if (!(val & 0x1))
			return false;
	}

	return true;
}

/*
 * Disable the FPGA power supplies
 */
static void fpga_disable_power_supplies(struct fpga_dev *priv)
{
	iowrite8(0x00, priv->regs + 0x2007); /* raw supply */
	iowrite8(0x00, priv->regs + 0x2009); /* 1.2v supply */
	iowrite8(0x00, priv->regs + 0x200B); /* 2.5v supply */
	iowrite8(0x00, priv->regs + 0x200E); /* 3.3v supply */
}

/*
 * Enable the FPGA power supplies
 *
 * @return 0 if the power went good, -ERRNO otherwise
 */
static int fpga_enable_power_supplies(struct fpga_dev *priv)
{
	unsigned long timeout;

	iowrite8(0x01, priv->regs + 0x2007); /* raw supply */
	iowrite8(0x7A, priv->regs + 0x2009); /* 1.2v supply */
	iowrite8(0x03, priv->regs + 0x200B); /* 2.5v supply */
	iowrite8(0x01, priv->regs + 0x200E); /* 3.3v supply */

	/* We'll give 1 second for the power supplies to enable */
	timeout = jiffies + HZ;

	while (!time_after(jiffies, timeout)) {
		if (fpga_power_good(priv))
			return 0;

		msleep(10);
	}

	/* Timed out, so disable the power supplies */
	fpga_disable_power_supplies(priv);

	return -ETIMEDOUT;
}

/*
 * Determine if the FPGA power supplies are all enabled
 */
static bool fpga_power_enabled(struct fpga_dev *priv)
{
	const int addr[] = { 0x2007, 0x2009, 0x200B, 0x200E, };
	const int vals[] = { 0x01, 0x7A, 0x03, 0x01, };
	int i;

	BUILD_BUG_ON(ARRAY_SIZE(addr) != ARRAY_SIZE(vals));

	/* Check each enable against the expected values */
	for (i = 0; i < ARRAY_SIZE(addr); i++) {
		if (ioread8(priv->regs + addr[i]) != vals[i])
			return false;
	}

	return true;
}

/*
 * Determine if the FPGA's are programmed and running correctly
 */
static bool fpga_running(struct fpga_dev *priv)
{
	if (!fpga_power_good(priv))
		return false;

	/* Check the config done bit */
	return ioread32be(priv->regs + FPGA_CONFIG_STATUS) & (1 << 18);
}

/*----------------------------------------------------------------------------*/
/* FPGA Programming Code                                                      */
/*----------------------------------------------------------------------------*/

/*
 * Program some data to the FPGA fifo
 *
 * @priv the private data
 * @buf the data to program
 * @count the length of data to program (must be a multiple of 4 bytes)
 *
 * @return 0 on success, -ERRNO otherwise
 */
static int fpga_program_block(struct fpga_dev *priv, void *buf, size_t count)
{
	u32 *data = buf;
	int size = fpga_fifo_size(priv->regs);
	int i, len;
	unsigned long timeout;

	/* FIXME: BUG_ON instead */
	WARN_ON_ONCE(count % 4 != 0);

	while (count > 0) {

		/* Get the size of the block to write (maximum is FIFO_SIZE) */
		len = min_t(size_t, count, size);
		timeout = jiffies + HZ / 4;

		/* Write the block */
		for (i = 0; i < len / 4; i++)
			fpga_fifo_write(priv->regs, data[i]);

		/* Update the amounts left */
		count -= len;
		data += len / 4;

		/* Wait for the fifo to empty */
		while (true) {

			if (fpga_fifo_empty(priv->regs)) {
				break;
			} else {
				dev_dbg(priv->dev, "Fifo not empty\n");
				cpu_relax();
			}

			if (fpga_config_error(priv->regs)) {
				dev_err(priv->dev, "Error detected\n");
				return -EIO;
			}

			if (time_after(jiffies, timeout)) {
				dev_err(priv->dev, "Fifo timed out\n");
				return -ETIMEDOUT;
			}

			msleep(10);
		}
	}

	return 0;
}

/*
 * Program the FPGA's using the CPU
 *
 * @param priv the driver's private data structure
 * @return 0 on success, -ERRNO otherwise
 */
static noinline int fpga_program_cpu(struct fpga_dev *priv)
{
	struct fpga_fw_elem *elem;
	void *data;
	int ret;

	/* Disable the programmer */
	fpga_programmer_disable(priv);

	/* Set the total byte count */
	fpga_set_byte_count(priv->regs, priv->bytes);
	dev_dbg(priv->dev, "total byte count %u bytes\n", priv->bytes);

	/* Enable the controller for programming */
	fpga_programmer_enable(priv, false);
	dev_dbg(priv->dev, "enabled the controller\n");

	/* Write each chunk of the FPGA bitfile to FPGA programmer */
	list_for_each_entry(elem, &priv->list, entry) {
		data = kmap(elem->page);
		ret = fpga_program_block(priv, data, elem->len);
		kunmap(elem->page);

		if (ret)
			goto out_disable_controller;
	}

	/* Wait for the interrupt handler to notify us that programming finished */
	ret = wait_for_completion_timeout(&priv->completion, 2 * HZ);
	if (!ret) {
		dev_err(priv->dev, "Timed out waiting for completion\n");
		ret = -ETIMEDOUT;
		goto out_disable_controller;
	}

	/* Retrieve the status from the interrupt handler */
	ret = priv->status;

out_disable_controller:
	fpga_programmer_disable(priv);
	return ret;
}

/*
 * Program the FPGA's using the DMA controller
 *
 * @param priv the driver's private data structure
 * @return 0 on success, -ERRNO otherwise
 */
static noinline int fpga_program_dma(struct fpga_dev *priv)
{
	struct dma_chan *chan = priv->chan;
	struct dma_async_tx_descriptor *tx;
	struct fsl_dma_slave *slave;
	struct sg_table table;
	dma_cookie_t cookie;
	unsigned int nents, npages;
	size_t len, avail = 0;
	int ret;
	struct scatterlist *sg;
	struct fpga_fw_elem *elem;

	/* Disable the programmer */
	fpga_programmer_disable(priv);
	chan->device->device_terminate_all(chan);

	/* Allocate the DMA_SLAVE structure */
	slave = fsl_dma_slave_alloc(GFP_KERNEL);
	if (!slave) {
		dev_err(priv->dev, "Unable to allocate DMA_SLAVE structure\n");
		ret = -ENOMEM;
		goto out_return;
	}

	/* Set the DMA controller in external start mode */
	slave->external_start = true;

	/* Allocate the SG table */
	npages = list_num_entries(&priv->list);
	ret = sg_alloc_table(&table, npages, GFP_KERNEL);
	if (ret) {
		dev_err(priv->dev, "Unable to allocate SG table\n");
		goto out_free_slave;
	}

	/* Fill the SG table */
	sg = table.sgl;
	list_for_each_entry(elem, &priv->list, entry) {
		sg_set_page(sg, elem->page, elem->len, 0);
		sg = sg_next(sg);
	}

	/* Map the SG table for DMA */
	nents = dma_map_sg(priv->dmadev, table.sgl, npages, DMA_TO_DEVICE);
	if (nents <= 0) {
		dev_err(priv->dev, "Unable to DMA map SG table\n");
		ret = -ENOMEM;
		goto out_free_table;
	}

	/* Append the addresses to the DMA_SLAVE structure */
	avail = priv->bytes;
	while (avail > 0) {
		len = min_t(size_t, avail, PAGE_SIZE);
		ret = fsl_dma_slave_append(slave, 0xf0003000, len);
		if (ret) {
			dev_err(priv->dev, "Unable to append FIFO address\n");
			goto out_unmap_table;
		}

		avail -= len;
	}

	/* Submit the DMA slave */
	chan->private = slave;
	tx = chan->device->device_prep_slave_sg(chan, table.sgl, nents,
						DMA_TO_DEVICE, 0);
	if (!tx) {
		dev_err(priv->dev, "Unable to prep DMA_SLAVE transaction\n");
		ret = -ENOMEM;
		goto out_unmap_table;
	}

	/*
	 * Submit the transaction to the DMA controller
	 *
	 * We would leak memory if the submission failed, but that doesn't
	 * happen in the fsldma driver, so it isn't an issue
	 */
	cookie = tx->tx_submit(tx);
	if (dma_submit_error(cookie)) {
		dev_err(priv->dev, "Unable to submit DMA_SLAVE transaction\n");
		ret = -ENOMEM;
		goto out_unmap_table;
	}

	dma_async_memcpy_issue_pending(chan);

	/* Set the total byte count */
	fpga_set_byte_count(priv->regs, priv->bytes);
	dev_dbg(priv->dev, "total byte count %u bytes\n", priv->bytes);

	/* Enable the controller for DMA programming */
	fpga_programmer_enable(priv, true);
	dev_dbg(priv->dev, "enabled the controller\n");

	/* Wait for the interrupt handler to notify us that programming finished */
	ret = wait_for_completion_timeout(&priv->completion, 2 * HZ);
	if (!ret) {
		dev_err(priv->dev, "Timed out waiting for completion\n");
		ret = -ETIMEDOUT;
		goto out_disable_controller;
	}

	/* Retrieve the status from the interrupt handler */
	ret = priv->status;

out_disable_controller:
	fpga_programmer_disable(priv);
	chan->device->device_terminate_all(chan);
out_unmap_table:
	dma_unmap_sg(priv->dmadev, table.sgl, npages, DMA_TO_DEVICE);
out_free_table:
	sg_free_table(&table);
out_free_slave:
	fsl_dma_slave_free(slave);
out_return:
	return ret;
}

/*----------------------------------------------------------------------------*/
/* Interrupt Handling                                                         */
/*----------------------------------------------------------------------------*/

static irqreturn_t fpga_interrupt(int irq, void *dev_id)
{
	struct fpga_dev *priv = dev_id;

	/* Save the status */
	priv->status = fpga_config_error(priv->regs) ? -EIO : 0;
	dev_dbg(priv->dev, "INTERRUPT status %d\n", priv->status);
	fpga_dump_registers(priv);

	/* Disabling the programmer clears the interrupt */
	fpga_programmer_disable(priv);

	/* Notify any waiters */
	complete(&priv->completion);

	return IRQ_HANDLED;
}

/*----------------------------------------------------------------------------*/
/* SYSFS Helpers                                                              */
/*----------------------------------------------------------------------------*/

static int fpga_do_stop(struct fpga_dev *priv)
{
	/* TODO: anything else here ? */

	/* Set the led to unprogrammed */
	ledtrig_fpga_programmed(false);

	return 0;
}

static noinline int fpga_do_program(struct fpga_dev *priv)
{
	int ret;

	if (list_empty(&priv->list)) {
		dev_err(priv->dev, "No data to program\n");
		return -EINVAL;
	}

	if (!fpga_power_enabled(priv)) {
		dev_err(priv->dev, "Power not enabled\n");
		return -EINVAL;
	}

	if (!fpga_power_good(priv)) {
		dev_err(priv->dev, "Power not good\n");
		return -EINVAL;
	}

	/* Try to program the FPGA's using DMA */
	ret = fpga_program_dma(priv);

	/* If DMA failed or doesn't exist, try with CPU */
	if (ret) {
		dev_warn(priv->dev, "Falling back to CPU programming\n");
		ret = fpga_program_cpu(priv);
	}

	if (ret) {
		dev_err(priv->dev, "Unable to program FPGA's\n");
		return ret;
	}

	/* Drop the firmware bitfile from memory */
	fpga_drop_firmware_data(priv);

	dev_dbg(priv->dev, "FPGA programming successful\n");
	ledtrig_fpga_programmed(true);

	return 0;
}

/*----------------------------------------------------------------------------*/
/* File Operations                                                            */
/*----------------------------------------------------------------------------*/

static int fpga_open(struct inode *inode, struct file *filp)
{
	struct fpga_dev *priv = to_fpga_dev(inode->i_cdev);

	/* We only allow one process at a time */
	if (mutex_lock_interruptible(&priv->lock))
		return -ERESTARTSYS;

	filp->private_data = priv;

	/* Free any data left over from a previous open (if any) */
	fpga_drop_firmware_data(priv);

	return nonseekable_open(inode, filp);
}

static int fpga_release(struct inode *inode, struct file *filp)
{
	struct fpga_dev *priv = filp->private_data;

	mutex_unlock(&priv->lock);
	return 0;
}

static ssize_t fpga_write(struct file *filp, const char __user *buf,
			  size_t count, loff_t *f_pos)
{
	struct fpga_dev *priv = filp->private_data;
	struct fpga_fw_elem *elem, *tmp;
	size_t used, len, copy;
	struct list_head list;
	void *mem;
	int ret;

	/* Disallow firmware images larger than 16MB */
	if (priv->bytes >= FW_SIZE_MAX)
		return -ENOSPC;

	count = min_t(size_t, FW_SIZE_MAX - priv->bytes, count);

	INIT_LIST_HEAD(&list);
	len = count;
	used = 0;

	while (len > 0) {

		/* Allocate a list element */
		elem = fpga_fw_elem_alloc(GFP_KERNEL);
		if (!elem) {
			count = used;
			goto out_success;
		}

		/* Copy the data from userspace */
		copy = min_t(size_t, PAGE_SIZE, len);
		mem = kmap(elem->page);
		ret = copy_from_user(mem, buf + used, copy);
		kunmap(elem->page);

		if (ret) {
			count = -EFAULT;
			goto out_cleanup;
		}

		elem->len = copy;
		list_add_tail(&elem->entry, &list);

		len -= copy;
		used += copy;
	}

out_success:
	if (list_empty(&list))
		return -ENOMEM;

	list_splice_tail_init(&list, &priv->list);
	priv->bytes += count;

	*f_pos += count;
	return count;

out_cleanup:
	/* Free the last allocated element */
	fpga_fw_elem_free(elem);

	/* Free all of the elements on the temporary list */
	list_for_each_entry_safe(elem, tmp, &list, entry) {
		list_del(&elem->entry);
		fpga_fw_elem_free(elem);
	}

	return count;
}

static const struct file_operations fpga_fops = {
	.owner		= THIS_MODULE,
	.open		= fpga_open,
	.release	= fpga_release,
	.write		= fpga_write,
	.llseek		= no_llseek,
};

/*----------------------------------------------------------------------------*/
/* Device Attributes                                                          */
/*----------------------------------------------------------------------------*/

static ssize_t pgood_show(struct device *dev, struct device_attribute *attr,
			  char *buf)
{
	struct fpga_dev *priv = dev_get_drvdata(dev);
	return snprintf(buf, PAGE_SIZE, "%d\n", fpga_power_good(priv));
}

static ssize_t penable_show(struct device *dev, struct device_attribute *attr,
			    char *buf)
{
	struct fpga_dev *priv = dev_get_drvdata(dev);
	return snprintf(buf, PAGE_SIZE, "%d\n", fpga_power_enabled(priv));
}

static ssize_t penable_store(struct device *dev, struct device_attribute *attr,
			     const char *buf, size_t count)
{
	struct fpga_dev *priv = dev_get_drvdata(dev);
	unsigned long val;

	if (strict_strtoul(buf, 0, &val))
		return -EINVAL;

	if (val) {
		fpga_enable_power_supplies(priv);
	} else {
		fpga_do_stop(priv);
		fpga_disable_power_supplies(priv);
	}

	return count;
}

static ssize_t program_show(struct device *dev, struct device_attribute *attr,
			    char *buf)
{
	struct fpga_dev *priv = dev_get_drvdata(dev);
	return snprintf(buf, PAGE_SIZE, "%d\n", fpga_running(priv));
}

static ssize_t program_store(struct device *dev, struct device_attribute *attr,
			     const char *buf, size_t count)
{
	struct fpga_dev *priv = dev_get_drvdata(dev);
	unsigned long val;
	int ret;

	if (strict_strtoul(buf, 0, &val))
		return -EINVAL;

	/* We can't have an image writer and be programming simultaneously */
	if (mutex_lock_interruptible(&priv->lock))
		return -ERESTARTSYS;

	if (val) {
		ret = fpga_do_program(priv);
		if (ret)
			goto out_unlock;
	}

	ret = count;

out_unlock:
	mutex_unlock(&priv->lock);
	return ret;
}

static DEVICE_ATTR(power_good, S_IRUGO, pgood_show, NULL);
static DEVICE_ATTR(power_enable, S_IRUGO | S_IWUSR, penable_show, penable_store);
static DEVICE_ATTR(program, S_IRUGO | S_IWUSR, program_show, program_store);

static struct attribute *fpga_attributes[] = {
	&dev_attr_power_good.attr,
	&dev_attr_power_enable.attr,
	&dev_attr_program.attr,
	NULL,
};

static const struct attribute_group fpga_attr_group = {
	.attrs = fpga_attributes,
};

/*----------------------------------------------------------------------------*/
/* OpenFirmware Device Subsystem                                              */
/*----------------------------------------------------------------------------*/

static bool dma_filter(struct dma_chan *chan, void *data)
{
	/*
	 * DMA Channel #0 is the only acceptable device
	 *
	 * This probably won't survive an unload/load cycle of the Freescale
	 * DMAEngine driver, but that won't be a problem
	 */
	return chan->chan_id == 0 && chan->device->dev_id == 0;
}

static int fpga_of_remove(struct of_device *op)
{
	struct fpga_dev *priv = dev_get_drvdata(&op->dev);

	sysfs_remove_group(&priv->dev->kobj, &fpga_attr_group);

	cdev_del(&priv->cdev);
	free_irq(priv->irq, priv);

	iounmap(priv->immr);

	fpga_disable_power_supplies(priv);
	iounmap(priv->regs);

	dma_release_channel(priv->chan);
	carma_device_destroy(priv->devno);
	unregister_chrdev_region(priv->devno, 1);

	/* Free any firmware image that has not been programmed */
	fpga_drop_firmware_data(priv);

	mutex_destroy(&priv->lock);
	kfree(priv);

	return 0;
}

static int fpga_of_probe(struct of_device *op, const struct of_device_id *match)
{
	struct fpga_dev *priv;
	dma_cap_mask_t mask;
	u32 val;
	int ret;

	/* Allocate private data */
	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
	if (!priv) {
		dev_err(&op->dev, "Unable to allocate private data\n");
		ret = -ENOMEM;
		goto out_return;
	}

	dev_set_drvdata(&op->dev, priv);
	mutex_init(&priv->lock);
	init_completion(&priv->completion);
	cdev_init(&priv->cdev, &fpga_fops);
	priv->dmadev = &op->dev;
	INIT_LIST_HEAD(&priv->list);

	/* Allocate the character device */
	ret = alloc_chrdev_region(&priv->devno, 0, 1, driver_name);
	if (ret) {
		dev_err(&op->dev, "Unable to allocate chardev region\n");
		goto out_free_priv;
	}

	/* Allocate the CARMA device */
	priv->dev = carma_device_create(&op->dev, priv->devno, driver_name);
	if (IS_ERR(priv->dev)) {
		dev_err(&op->dev, "Unable to create CARMA device\n");
		ret = PTR_ERR(priv->dev);
		goto out_unregister_chrdev_region;
	}

	dev_set_drvdata(priv->dev, priv);
	dma_cap_zero(mask);
	dma_cap_set(DMA_MEMCPY, mask);
	dma_cap_set(DMA_INTERRUPT, mask);
	dma_cap_set(DMA_SLAVE, mask);

	/* Get control of DMA channel #0 */
	priv->chan = dma_request_channel(mask, dma_filter, NULL);
	if (!priv->chan) {
		dev_err(&op->dev, "Unable to acquire DMA channel #0\n");
		goto out_carma_device_destroy;
	}

	/* Remap the registers for use */
	priv->regs = of_iomap(op->node, 0);
	if (!priv->regs) {
		dev_err(&op->dev, "Unable to ioremap registers\n");
		ret = -ENOMEM;
		goto out_dma_release_channel;
	}

	/* Remap the IMMR for use */
	priv->immr = ioremap(get_immrbase(), 0x100000);
	if (!priv->immr) {
		dev_err(&op->dev, "Unable to ioremap IMMR\n");
		ret = -ENOMEM;
		goto out_unmap_regs;
	}

	/*
	 * Enable external DMA control pins
	 *
	 * WARNING: this must be done before attempting to set up the DMA
	 * WARNING: controller for externally initiated transfers. The default
	 * WARNING: state of the DMA control pins is incorrect for proper
	 * WARNING: operation of the FPGA programmer!!!
	 *
	 * NOTE: 2009-06-12: the U-Boot bootloader handles this now
	 *
	 * Failing to do so will cause the DMA controller to transfer a single
	 * cacheline worth of data, and then wedge itself.
	 */
	iowrite32be(0x80000E00, priv->immr + 0x114);

	/*
	 * TODO: make a better interface for this
	 *
	 * The DMA controller is not being used by the driver at this point,
	 * so we will just add the bits we need, and the driver will not alter
	 * them. This sets the following values:
	 *
	 * DMA Request Count: 8 cache lines
	 * Bandwidth Control: 8 cache lines
	 *
	 * These request count setting is necessary for the programmer to work
	 * in external start mode. The bandwidth control setting is optional.
	 */
	val = ioread32(priv->immr + 0x8100);
	iowrite32(val | 0x08600000, priv->immr + 0x8100);

	/* Enable the FPGA Programmer Interrupt */
	iowrite32be(0x1, priv->regs + 0x24); /* IRQ0 internal control */
	iowrite32be(0x1, priv->regs + 0x28); /* IRQ0 enabled */

	/* Find the correct IRQ number */
	priv->irq = irq_of_parse_and_map(op->node, 0);
	ret = request_irq(priv->irq, fpga_interrupt, IRQF_SHARED, driver_name, priv);
	if (ret) {
		dev_err(&op->dev, "Unable to request IRQ %d\n", priv->irq);
		ret = -ENODEV;
		goto out_unmap_immr;
	}

	/* Reset and stop the FPGA's, just in case */
	fpga_do_stop(priv);

	/* Enable the FPGA power supplies */
	ret = fpga_enable_power_supplies(priv);
	if (ret) {
		dev_err(&op->dev, "Unable to enable FPGA power supplies\n");
		ret = -ENODEV;
		goto out_free_irq;
	}

	/* Register the character device */
	ret = cdev_add(&priv->cdev, priv->devno, 1);
	if (ret) {
		dev_err(&op->dev, "Unable to add character device\n");
		goto out_disable_power_supplies;
	}

	/* Create the sysfs files */
	ret = sysfs_create_group(&priv->dev->kobj, &fpga_attr_group);
	if (ret) {
		dev_err(&op->dev, "Unable to create sysfs files\n");
		goto out_cdev_del;
	}

	dev_info(priv->dev, "CARMA FPGA Programmer Driver loaded\n");
	return 0;

out_cdev_del:
	cdev_del(&priv->cdev);
out_disable_power_supplies:
	fpga_disable_power_supplies(priv);
out_free_irq:
	free_irq(priv->irq, priv);
out_unmap_immr:
	iounmap(priv->immr);
out_unmap_regs:
	iounmap(priv->regs);
out_dma_release_channel:
	dma_release_channel(priv->chan);
out_carma_device_destroy:
	carma_device_destroy(priv->devno);
out_unregister_chrdev_region:
	unregister_chrdev_region(priv->devno, 1);
out_free_priv:
	mutex_destroy(&priv->lock);
	kfree(priv);
out_return:
	return ret;
}

static struct of_device_id fpga_of_match[] = {
	{ .compatible = "carma,fpga-programmer", },
	{},
};

static struct of_platform_driver fpga_of_driver = {
	.owner		= THIS_MODULE,
	.name		= driver_name,
	.match_table	= fpga_of_match,
	.probe		= fpga_of_probe,
	.remove		= fpga_of_remove,
};

/*----------------------------------------------------------------------------*/
/* Module Init / Exit                                                         */
/*----------------------------------------------------------------------------*/

static int __init fpga_init(void)
{
	led_trigger_register_simple("fpga", &ledtrig_fpga);
	return of_register_platform_driver(&fpga_of_driver);
}

static void __exit fpga_exit(void)
{
	of_unregister_platform_driver(&fpga_of_driver);
	led_trigger_unregister_simple(ledtrig_fpga);
}

MODULE_AUTHOR("Ira W. Snyder <iws@ovro.caltech.edu>");
MODULE_DESCRIPTION("CARMA FPGA Programmer Driver");
MODULE_LICENSE("GPL");

module_init(fpga_init);
module_exit(fpga_exit);

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH] fsldma: Add DMA_SLAVE support
  2009-06-16 20:12   ` Ira Snyder
@ 2009-06-17 17:17     ` Dan Williams
  2009-06-17 18:29       ` Ira Snyder
  0 siblings, 1 reply; 11+ messages in thread
From: Dan Williams @ 2009-06-17 17:17 UTC (permalink / raw)
  To: Ira Snyder; +Cc: linuxppc-dev, Li Yang

Ira Snyder wrote:
>> Using EXPORT_SYMBOL would defeat the purpose of conforming to the
>> dmaengine api which should allow other subsystems to generically
>> discover an fsldma resource.
>>
> 
> Any driver would still use dma_request_channel(), etc. to get access to
> a DMA channel. AFAICT, DMA_SLAVE is intended for doing something
> completely hardware-specific with the DMA controller.

Yes.  Specifically DMA_SLAVE operations imply a pre-established context 
with a 3rd device (the other 2 devices being system memory and the dma 
channel), as compared to plain memcpy.  The assumption is that a dma 
device with this capability may not be shared with other requesters 
until the context is torn down.

[..]
> What I've implemented is this: (sorry about the poor drawing)
> 
> scatterlist           fsl_dma_hw_addr
> +--------+            +-------+
> |  DATA  | -------->  | DEST1 |
> |  DATA  | ----+      +-------+
> |  DATA  |     |
> |  DATA  |     |      +-------+
> |  DATA  |     +--->  | DEST2 |
> |  DATA  |            +-------+
> +--------+
>                           .
>                           .
>                           .
> 
> Of course, the reverse works as well. You can copy from a list of
> hardware address/length pairs to a scatterlist.
> 
> So, using my implementation of the DMA_SLAVE feature, you can take a big
> chunk of data (which is organized into a scatterlist) and DMA it
> directly to a set of hardware addresses, all in a single, unbroken
> transaction.

Could the same effect be achieved by calling ->prep_slave_sg multiple 
times?  Once you need to add dma-driver specific helper routines you are 
effectively extending the dmaengine interface in an fsldma specific 
fashion.  Can we reduce this to just the existing primitives?  If it 
turns out that this is untenable can we extend dmaengine to make this a 
generic capability?  My preference is the former.

> I've inlined the driver for the FPGA programmer below. I don't think it
> is appropriate to push into mainline, since it will only work for our
> board, and nothing else.

If we find that we need to extend the dmaengine interface we will need 
an in-tree user.  In my opinion, as long as it passes the Voyager test 
[1] then I do not see why it should be barred from upstream.

[..]
> He also used platform data to get the register addresses. I'm unaware of
> a way to put arbitrary platform data into the OF tree used on PowerPC.
> 

Anybody else on ppc-dev know if this is possible??

> I didn't want to force other users to implement the allocation routines
> for the struct fsl_dma_hw_addr themselves, so I provided routines to do
> so.

It depends on how many users of this feature there ends up being, 
pushing this into each driver that needs it would not be too horrible 
especially if it just the three straightforward routines 
(fsl_dma_slave_append, fsl_dma_slave_free, and fsl_dma_slave_alloc).

So, the three options in order of preference are:
1/ arrange to call ->prep multiple times to handle each dma address
2/ push the functionality down into the individual drivers that need it
3/ up level the functionality into dmaengine to make it generically 
available

Thanks,
Dan

[1]: The Voyager test refers to James Bottomley's maintenance of the 
Voyager x86-sub-architecture.  If at least one person cares about a 
feature, is willing to maintain the code, and it does not impose a 
maintenance burden on other parts of the tree then it can go upstream.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH] fsldma: Add DMA_SLAVE support
  2009-06-17 17:17     ` Dan Williams
@ 2009-06-17 18:29       ` Ira Snyder
  2009-06-18 18:16         ` Dan Williams
  0 siblings, 1 reply; 11+ messages in thread
From: Ira Snyder @ 2009-06-17 18:29 UTC (permalink / raw)
  To: Dan Williams; +Cc: linuxppc-dev, Li Yang

On Wed, Jun 17, 2009 at 10:17:54AM -0700, Dan Williams wrote:
> Ira Snyder wrote:
>>> Using EXPORT_SYMBOL would defeat the purpose of conforming to the
>>> dmaengine api which should allow other subsystems to generically
>>> discover an fsldma resource.
>>>
>>
>> Any driver would still use dma_request_channel(), etc. to get access to
>> a DMA channel. AFAICT, DMA_SLAVE is intended for doing something
>> completely hardware-specific with the DMA controller.
>
> Yes.  Specifically DMA_SLAVE operations imply a pre-established context  
> with a 3rd device (the other 2 devices being system memory and the dma  
> channel), as compared to plain memcpy.  The assumption is that a dma  
> device with this capability may not be shared with other requesters  
> until the context is torn down.
>
> [..]
>> What I've implemented is this: (sorry about the poor drawing)
>>
>> scatterlist           fsl_dma_hw_addr
>> +--------+            +-------+
>> |  DATA  | -------->  | DEST1 |
>> |  DATA  | ----+      +-------+
>> |  DATA  |     |
>> |  DATA  |     |      +-------+
>> |  DATA  |     +--->  | DEST2 |
>> |  DATA  |            +-------+
>> +--------+
>>                           .
>>                           .
>>                           .
>>
>> Of course, the reverse works as well. You can copy from a list of
>> hardware address/length pairs to a scatterlist.
>>
>> So, using my implementation of the DMA_SLAVE feature, you can take a big
>> chunk of data (which is organized into a scatterlist) and DMA it
>> directly to a set of hardware addresses, all in a single, unbroken
>> transaction.
>
> Could the same effect be achieved by calling ->prep_slave_sg multiple  
> times?  Once you need to add dma-driver specific helper routines you are  
> effectively extending the dmaengine interface in an fsldma specific  
> fashion.  Can we reduce this to just the existing primitives?  If it  
> turns out that this is untenable can we extend dmaengine to make this a  
> generic capability?  My preference is the former.
>

I can do something similar by calling device_prep_dma_memcpy() lots of
times. Once for each page in the scatterlist.

This has the advantage of not changing the DMAEngine API.

This has the disadvantage of not being a single transaction. The DMA
controller will get an interrupt after each memcpy() transaction, clean
up descriptors, etc.

It also has the disadvantage of not being able to change the
controller-specific features I'm using: external start. This feature
lets the "3rd device" control the DMA controller. It looks like the
atmel-mci driver has a similar feature. The DMAEngine API has no way to
expose this type of feature except through DMA_SLAVE.

If it is just the 3 helper routines that are the issue with this patch,
I have no problem dropping them and letting each user re-create them
themselves.

>> I've inlined the driver for the FPGA programmer below. I don't think it
>> is appropriate to push into mainline, since it will only work for our
>> board, and nothing else.
>
> If we find that we need to extend the dmaengine interface we will need  
> an in-tree user.  In my opinion, as long as it passes the Voyager test  
> [1] then I do not see why it should be barred from upstream.
>

Yep, I understand the Voyager test. I just didn't think it would be
useful to anyone to try and push it upstream, especially since the
hardware is in-house only.

The virtio-over-pci patches I've posted a while back may benefit from
this as well. They look more likely to go upstream. I've been really
busy and haven't had time to work on them recently, but Grant Likely is
working on them at the moment.

> [..]
>> He also used platform data to get the register addresses. I'm unaware of
>> a way to put arbitrary platform data into the OF tree used on PowerPC.
>>
>
> Anybody else on ppc-dev know if this is possible??
>
>> I didn't want to force other users to implement the allocation routines
>> for the struct fsl_dma_hw_addr themselves, so I provided routines to do
>> so.
>
> It depends on how many users of this feature there ends up being,  
> pushing this into each driver that needs it would not be too horrible  
> especially if it just the three straightforward routines  
> (fsl_dma_slave_append, fsl_dma_slave_free, and fsl_dma_slave_alloc).
>

Right, the routines are very simple. I wouldn't have any problem leaving
it up to users.

> So, the three options in order of preference are:
> 1/ arrange to call ->prep multiple times to handle each dma address

This doesn't seem quite right. I'd end up doing something like:

struct fsl_dma_slave slave;
slave.external_start = true;
slave.address = 0xf0003000;
slave.length = 4096;
chan->private = &slave;

for_each_sg(sgl, sg, sg_len, i) {
	device_prep_slave_sg(chan, sg, 1, DMA_TO_DEVICE, 0);
}

That pretty much defeats the purpose of the scatterlist argument,
doesn't it? Also, it is no better than just calling
device_prep_dma_memcpy() lots of times. You don't get a single DMA
transaction.

It is trivial to implement a dma_addr_t to dma_addr_t memcpy function.
Here is a version I've used in the past, based very heavily on
dma_async_memcpy_buf_to_buf():

static dma_cookie_t dma_async_memcpy_raw_to_raw(struct dma_chan *chan,
                                               dma_addr_t dst,
                                               dma_addr_t src,
                                               size_t len)
{
        struct dma_device *dev = chan->device;
        struct dma_async_tx_descriptor *tx;
        enum dma_ctrl_flags flags;
        dma_cookie_t cookie;
        int cpu;

        flags = DMA_COMPL_SKIP_SRC_UNMAP | DMA_COMPL_SKIP_DEST_UNMAP;
        tx = dev->device_prep_dma_memcpy(chan, dst, src, len, flags);
        if (!tx)
                return -ENOMEM;

        tx->callback = NULL;
        cookie = tx->tx_submit(tx);

        cpu = get_cpu();
        per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
        per_cpu_ptr(chan->local, cpu)->memcpy_count++;
        put_cpu();

        return cookie;
}

I've used it when copying data from another machine over PCI, and for
copying to peripherals, such as the FPGA programmer.

> 2/ push the functionality down into the individual drivers that need it
> 3/ up level the functionality into dmaengine to make it generically  
> available

A single-transaction scatterlist-to-scatterlist copy would be nice.

One of the major advantages of working with the DMA controller is that
it automatically handles scatter/gather. Almost all DMA controllers have
the feature, but it is totally missing from the high-level API.

Thanks,
Ira

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH] fsldma: Add DMA_SLAVE support
  2009-06-17 18:29       ` Ira Snyder
@ 2009-06-18 18:16         ` Dan Williams
  2009-06-18 20:50           ` Ira Snyder
  0 siblings, 1 reply; 11+ messages in thread
From: Dan Williams @ 2009-06-18 18:16 UTC (permalink / raw)
  To: Ira Snyder; +Cc: linuxppc-dev, Li Yang

Ira Snyder wrote:
> I can do something similar by calling device_prep_dma_memcpy() lots of
> times. Once for each page in the scatterlist.
> 
> This has the advantage of not changing the DMAEngine API.
> 
> This has the disadvantage of not being a single transaction. The DMA
> controller will get an interrupt after each memcpy() transaction, clean
> up descriptors, etc.
>

Why would it need an interrupt per memcpy?  There is a 
DMA_PREP_INTERRUPT flag to gate interrupt generation to the last 
descriptor.  This is how async_tx delays callbacks until the last 
operation in a chain has completed.  Also, I am not currently seeing how 
your implementation achieves a single hardware transaction.  It still 
calls fsl_dma_alloc_descriptor() per address pair?

The api currently allows a call to ->prep_* to generate multiple 
internal descriptors for a single input address (i.e. memcpy in the case 
where the transfer length exceeds the hardware maximum).  The slave api 
allows for transfers from a scatterlist to a slave context.  I think 
what is bothering me is that the fsldma slave implementation is passing 
a list of sideband addresses rather than a constant address context like 
the existing dw_dmac, so it is different.  However, I can now see that 
trying to enforce uniformity in this area is counterproductive.  The 
DMA_SLAVE interface will always have irreconcilable differences across 
architectures.

> It also has the disadvantage of not being able to change the
> controller-specific features I'm using: external start. This feature
> lets the "3rd device" control the DMA controller. It looks like the
> atmel-mci driver has a similar feature. The DMAEngine API has no way to
> expose this type of feature except through DMA_SLAVE.

Yeah, an example of an architecture specific quirk that has no chance of 
being uniformly handled in a generic api.

> If it is just the 3 helper routines that are the issue with this patch,
> I have no problem dropping them and letting each user re-create them
> themselves.

I think this makes the most sense at this point.  We can reserve 
judgement on the approach until the next fsldma-slave user arrives and 
tries to use this implementation for their device.  Can we move the 
header file under arch/powerpc/include?

[..]
> A single-transaction scatterlist-to-scatterlist copy would be nice.
> 
> One of the major advantages of working with the DMA controller is that
> it automatically handles scatter/gather. Almost all DMA controllers have
> the feature, but it is totally missing from the high-level API.

The engines I am familiar with (ioatdma and iop-adma) still need a 
hardware descriptor per address pair I do not see how fsldma does this 
any more automatically than those engines?  I could see having a helper 
routine that does the mapping and iterating, but in the end the driver 
still sees multiple calls to ->prep (unless of course you are doing this 
in a DMA_SLAVE context, then anything goes).

Thanks,
Dan

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH] fsldma: Add DMA_SLAVE support
  2009-06-18 18:16         ` Dan Williams
@ 2009-06-18 20:50           ` Ira Snyder
  2009-06-18 21:36             ` Dan Williams
  0 siblings, 1 reply; 11+ messages in thread
From: Ira Snyder @ 2009-06-18 20:50 UTC (permalink / raw)
  To: Dan Williams; +Cc: linuxppc-dev, Li Yang

On Thu, Jun 18, 2009 at 11:16:03AM -0700, Dan Williams wrote:
> Ira Snyder wrote:
>> I can do something similar by calling device_prep_dma_memcpy() lots of
>> times. Once for each page in the scatterlist.
>>
>> This has the advantage of not changing the DMAEngine API.
>>
>> This has the disadvantage of not being a single transaction. The DMA
>> controller will get an interrupt after each memcpy() transaction, clean
>> up descriptors, etc.
>>
>
> Why would it need an interrupt per memcpy?  There is a  
> DMA_PREP_INTERRUPT flag to gate interrupt generation to the last  
> descriptor.  This is how async_tx delays callbacks until the last  
> operation in a chain has completed.  Also, I am not currently seeing how  
> your implementation achieves a single hardware transaction.  It still  
> calls fsl_dma_alloc_descriptor() per address pair?
>

Ok, there are a few things here:

1) an interrupt per memcpy

The *software* using DMAEngine doesn't need one interrupt per memcpy.
That is controlled by the DMA_PREP_INTERRUPT flag, exactly as you
describe.

The Freescale DMA driver DOES use one interrupt per async_tx descriptor.
See drivers/dma/fsldma.c dma_init() and append_ld_queue().

The FSL_DMA_MR_EOTIE flag in dma_init() tells the controller to generate
an interrupt at the end of each transfer. A transfer is (potentially
long) list of address pairs / hardware descriptors.

The FSL_DMA_MR_EOSIE flag in append_ld_queue() tells the controller to
generate an interrupt at the end of transferring this hardware
descriptor (AKA segment). The driver combines multiple memcpy operations
into a single transfer. When the driver combines operations, it adds the
EOSIE flag to the descriptor that would-have-been the end of a transfer.
It uses this interrupt to update the DMA cookie, presumably to speed up
users of dma_sync_wait() when there are lots of users sharing the DMA
controller.

Let me try to explain what will happen with some code:

=== Case 1: Two seperate transfers ===

dma_cookie_t t1, t2;
t1 = dma_async_memcpy_buf_to_buf(...);
dma_async_memcpy_issue_pending();

/*
 * some time goes by, the DMA transfer completes,
 * and the controller gets the end-of-transfer interrupt
 */

t2 = dma_async_memcpy_buf_to_buf(...);
dma_async_memcpy_issue_pending();

/*
 * some time goes by, the DMA transfer completes,
 * and the controller gets the end-of-transfer interrupt
 */

This is exactly what I would expect from the API. There are two seperate
transfers, and there are two hardware interrupts. Here is a crude
timeline.

----|----------|----------------------------|----------|-------
    |          |                            |          |
    t1 start   t1 end, EOT interrupt        t2 start   t2 end, EOT interrupt

=== Case 2: Two seperate transfers, merged by the driver ===

t1 = dma_async_memcpy_buf_to_buf(...);
dma_async_memcpy_issue_pending();

/*
 * the controller starts executing the transfer, but has not
 * finished yet
 */
t2 = dma_async_memcpy_buf_to_buf(...);

/*
 * append_ld_queue() sets the EOSIE flag on the last hardware
 * descriptor in t1, then sets the next link descriptor to the
 * first descriptor in t2
 */

Now there are two transfers, and again two hardware interrupts. Again,
not really a problem. Every part of the API still works as expected.

----|-----------|---------------|---------------------------------|--------
    |           |               |                                 |
    t1 start    t2 tx_submit()  t1 end, EOS interrupt, t2 start   t2 end, EOT interrupt

=== Case 3: Two transfers, merged by the driver ===

t1 = dma_async_memcpy_buf_to_buf(...);
t2 = dma_async_memcpy_buf_to_buf(...);
dma_async_memcpy_issue_pending();

After a very careful reading of the driver, I just noticed that if there
is no transfer in progress (as would be expected on a private channel)
then the EOS interrupt never gets set, and the requests are simply
merged together. This would lead to a timeline like this:

----|----------------|----------------------|--------------------------
    |                |                      |
    t1 start         t1 end, t2 start       t2 end, EOT interrupt

This is perfectly fine as well. It is the behavior I want.

Some more study of the code shows that the Freescale DMA driver will not
halt the channel as long as the channel is busy, meaning that it will
not clear the external start bits during a transfer.

So, in conclusion, I can call memcpy multiple times and have it all
merged into a single transfer by the driver, with a single hardware
interrupt (at the end-of-transfer) and have everything complete without
halting the DMA controller.

2) Single transaction

I think we're using different terms here. I define a single transaction
to be the time while the DMA controller is busy transferring things.

In case #1 above, there are two transfers. In case #2 above, one
transfer, and two interrupts. In case #3 above, one transfer, one
interrupt.

3) Hardware descriptor per address pair

Yes, there can be many hardware descriptors per address pair. And
therefore many hardware descriptors per transaction, with my definition.

> The api currently allows a call to ->prep_* to generate multiple  
> internal descriptors for a single input address (i.e. memcpy in the case  
> where the transfer length exceeds the hardware maximum).  The slave api  
> allows for transfers from a scatterlist to a slave context.  I think  
> what is bothering me is that the fsldma slave implementation is passing  
> a list of sideband addresses rather than a constant address context like  
> the existing dw_dmac, so it is different.  However, I can now see that  
> trying to enforce uniformity in this area is counterproductive.  The  
> DMA_SLAVE interface will always have irreconcilable differences across  
> architectures.
>

Yep, we're in agreement here.

In another driver, I used this DMA_SLAVE API to transfer from hardware
addresses to a scatterlist. I have ~50 blocks, all at different
non-contiguous addresses that I want transferred into a single
scatterlist.

It was awfully convenient to have this happen in a single call, rather
than 50 seperate calls to dma_async_memcpy_buf_to_buf().

>> It also has the disadvantage of not being able to change the
>> controller-specific features I'm using: external start. This feature
>> lets the "3rd device" control the DMA controller. It looks like the
>> atmel-mci driver has a similar feature. The DMAEngine API has no way to
>> expose this type of feature except through DMA_SLAVE.
>
> Yeah, an example of an architecture specific quirk that has no chance of  
> being uniformly handled in a generic api.
>

Again, we're in agreement.

>> If it is just the 3 helper routines that are the issue with this patch,
>> I have no problem dropping them and letting each user re-create them
>> themselves.
>
> I think this makes the most sense at this point.  We can reserve  
> judgement on the approach until the next fsldma-slave user arrives and  
> tries to use this implementation for their device.  Can we move the  
> header file under arch/powerpc/include?
>

Sure, that would be fine.

> [..]
>> A single-transaction scatterlist-to-scatterlist copy would be nice.
>>
>> One of the major advantages of working with the DMA controller is that
>> it automatically handles scatter/gather. Almost all DMA controllers have
>> the feature, but it is totally missing from the high-level API.
>
> The engines I am familiar with (ioatdma and iop-adma) still need a  
> hardware descriptor per address pair I do not see how fsldma does this  
> any more automatically than those engines?  I could see having a helper  
> routine that does the mapping and iterating, but in the end the driver  
> still sees multiple calls to ->prep (unless of course you are doing this  
> in a DMA_SLAVE context, then anything goes).
>

Yep. The Freescale DMA controller behaves exactly as you describe the
ioatdma and iop-dma engines. A helper routine that does the mapping and
iterating would be nice, in my opinion. It took me a while to convince
myself that the nested loops in device_prep_slave_sg() were correct.

Following on the earlier observations in this email, it would be
possible to emulate the slave behavior using just
dma_async_memcpy_buf_to_buf(). With the exception of the external start
bit, of course.

Now, the only thing I would use device_prep_slave_sg() for would be to
set the external start bit. No actual data transfer needs to happen, no
descriptors need to be allocated. Just the "Enable extra controller
features" block from my patch. This seems like an abuse of the DMA_SLAVE
API. What would be returned in that case?  An async_tx descriptor is
wrong, because the controller won't be able to free() it...

So, I see a couple of ways of moving forward:
1) Keep my implementation, moving the includes to arch/powerpc/include.
   Do we drop the allocation functions?

2) Drop the data transfer part of device_prep_slave_sg(), and just use
   it for setting device-specific bits.

Thanks for all the input Dan. I finally feel like we're getting
somewhere :)

Ira

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH] fsldma: Add DMA_SLAVE support
  2009-06-18 20:50           ` Ira Snyder
@ 2009-06-18 21:36             ` Dan Williams
  0 siblings, 0 replies; 11+ messages in thread
From: Dan Williams @ 2009-06-18 21:36 UTC (permalink / raw)
  To: Ira Snyder; +Cc: linuxppc-dev, Li Yang

Ira Snyder wrote:
> So, I see a couple of ways of moving forward:
> 1) Keep my implementation, moving the includes to arch/powerpc/include.
>    Do we drop the allocation functions?

+1 for option 1.  Having it under arch/powerpc/include makes it clear 
that it is a powerpc specific api, so keep the allocation routines. 
Copy Kumar on the updated patch as I'll need a ppc-dev ack for carrying 
this file addition through the dmaengine tree.

> Thanks for all the input Dan. I finally feel like we're getting
> somewhere :)

Thanks for the exchange it always helps to get a good picture of the 
underlying design rationale.

Regards,
Dan

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2009-06-18 21:36 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-05-15 22:56 [RFC PATCH] fsldma: Add DMA_SLAVE support Ira Snyder
2009-06-03 18:10 ` Ira Snyder
2009-06-04 11:20   ` Li Yang-R58472
2009-06-04 16:22     ` Ira Snyder
2009-06-16 19:01 ` Dan Williams
2009-06-16 20:12   ` Ira Snyder
2009-06-17 17:17     ` Dan Williams
2009-06-17 18:29       ` Ira Snyder
2009-06-18 18:16         ` Dan Williams
2009-06-18 20:50           ` Ira Snyder
2009-06-18 21:36             ` Dan Williams

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.