* FSL DMA engine transfer to PCI memory
@ 2011-01-24 21:47 Felix Radensky
  2011-01-24 22:26 ` Ira W. Snyder
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Felix Radensky @ 2011-01-24 21:47 UTC
  To: linuxppc-dev

Hi,

I'm trying to use the FSL DMA engine to perform a DMA transfer from a
memory buffer obtained by kmalloc() to PCI memory. This is on a
custom board based on the P2020 running linux-2.6.35. The PCI
device is an Altera FPGA, connected directly to the SoC PCI-E controller.

01:00.0 Unassigned class [ff00]: Altera Corporation Unknown device 0004 (rev 01)
         Subsystem: Altera Corporation Unknown device 0004
         Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
         Interrupt: pin A routed to IRQ 16
         Region 0: Memory at c0000000 (32-bit, non-prefetchable) [size=128K]
         Capabilities: [50] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable-
                 Address: 0000000000000000  Data: 0000
         Capabilities: [78] Power Management version 3
                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                 Status: D0 PME-Enable- DSel=0 DScale=0 PME-
         Capabilities: [80] Express Endpoint IRQ 0
                 Device: Supported: MaxPayload 256 bytes, PhantFunc 0, ExtTag-
                 Device: Latency L0s <64ns, L1 <1us
                 Device: AtnBtn- AtnInd- PwrInd-
                 Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-
                 Device: RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                 Device: MaxPayload 128 bytes, MaxReadReq 512 bytes
                 Link: Supported Speed 2.5Gb/s, Width x1, ASPM L0s, Port 1
                 Link: Latency L0s unlimited, L1 unlimited
                 Link: ASPM Disabled RCB 64 bytes CommClk- ExtSynch-
                 Link: Speed 2.5Gb/s, Width x1
         Capabilities: [100] Virtual Channel


I can successfully writel() to PCI memory via the address obtained from
pci_ioremap_bar(). Here's my DMA transfer routine:

static int dma_transfer(struct dma_chan *chan, void *dst, void *src,
                        size_t len)
{
     int rc = 0;
     dma_addr_t dma_src;
     dma_addr_t dma_dst;
     dma_cookie_t cookie;
     struct completion cmp;
     enum dma_status status;
     enum dma_ctrl_flags flags = 0;
     struct dma_device *dev = chan->device;
     struct dma_async_tx_descriptor *tx = NULL;
     unsigned long tmo = msecs_to_jiffies(FPGA_DMA_TIMEOUT_MS);

     dma_src = dma_map_single(dev->dev, src, len, DMA_TO_DEVICE);
     if (dma_mapping_error(dev->dev, dma_src)) {
         printk(KERN_ERR "Failed to map src for DMA\n");
         return -EIO;
     }

     dma_dst = (dma_addr_t)dst;

     flags = DMA_CTRL_ACK |
         DMA_COMPL_SRC_UNMAP_SINGLE  |
         DMA_COMPL_SKIP_DEST_UNMAP |
         DMA_PREP_INTERRUPT;

     tx = dev->device_prep_dma_memcpy(chan, dma_dst, dma_src, len, flags);
     if (!tx) {
         printk(KERN_ERR "%s: Failed to prepare DMA transfer\n",
                __FUNCTION__);
         dma_unmap_single(dev->dev, dma_src, len, DMA_TO_DEVICE);
         return -ENOMEM;
     }

     init_completion(&cmp);
     tx->callback = dma_callback;
     tx->callback_param = &cmp;
     cookie = tx->tx_submit(tx);

     if (dma_submit_error(cookie)) {
         printk(KERN_ERR "%s: Failed to start DMA transfer\n",
                __FUNCTION__);
         return -ENOMEM;
     }

     dma_async_issue_pending(chan);

     tmo = wait_for_completion_timeout(&cmp, tmo);
     status = dma_async_is_tx_complete(chan, cookie, NULL, NULL);

     if (tmo == 0) {
         printk(KERN_ERR "%s: Transfer timed out\n", __FUNCTION__);
         rc = -ETIMEDOUT;
     } else if (status != DMA_SUCCESS) {
         printk(KERN_ERR "%s: Transfer failed: status is %s\n",
                __FUNCTION__,
                status == DMA_ERROR ? "error" : "in progress");

         dev->device_control(chan, DMA_TERMINATE_ALL, 0);
         rc = -EIO;
     }

     return rc;
}

The destination address is the PCI memory address returned by
pci_ioremap_bar(). The transfer silently fails: the destination buffer
doesn't change its contents, but no error condition is reported.

What am I doing wrong?

Thanks a lot in advance.

Felix.


* Re: FSL DMA engine transfer to PCI memory
  2011-01-24 21:47 FSL DMA engine transfer to PCI memory Felix Radensky
@ 2011-01-24 22:26 ` Ira W. Snyder
  2011-01-24 23:39   ` Felix Radensky
  2011-01-24 22:44 ` Scott Wood
  2011-01-25  8:56 ` David Laight
  2 siblings, 1 reply; 14+ messages in thread
From: Ira W. Snyder @ 2011-01-24 22:26 UTC
  To: Felix Radensky; +Cc: linuxppc-dev

On Mon, Jan 24, 2011 at 11:47:22PM +0200, Felix Radensky wrote:
> The destination address is PCI memory address returned by pci_ioremap_bar().
> The transfer silently fails, destination buffer doesn't change contents,
> but no error condition is reported.
> 
> What am I doing wrong ?

Your destination address is wrong. The device_prep_dma_memcpy() routine
works in physical addresses only (dma_addr_t type). Your source address
looks fine: you're using the result of dma_map_single(), which returns a
physical address.

Your destination address should be something that comes from struct
pci_dev.resource[x].start + offset if necessary. In your lspci output
above, that will be 0xc0000000.
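A minimal userspace sketch of that address arithmetic, assuming the BAR 0 values from the lspci output above (start 0xc0000000, size 128K). The struct and function names are hypothetical; in a driver the two fields would be filled from pci_resource_start() and pci_resource_len(), and the computed value, not the pci_ioremap_bar() pointer, is what would be passed as the destination to device_prep_dma_memcpy().

```c
#include <stdint.h>

/* Hypothetical description of one PCI memory BAR as seen from the CPU
 * side: its physical start address and its size in bytes. In a driver
 * these would come from pci_resource_start() and pci_resource_len(). */
struct bar_window {
    uint64_t start;
    uint64_t len;
};

/* Compute the physical DMA address for a transfer of 'len' bytes at
 * byte offset 'off' inside the BAR. Returns 0 and stores the address
 * in *out on success, or -1 if the range would run past the BAR. */
static int bar_dma_addr(const struct bar_window *bar, uint64_t off,
                        uint64_t len, uint64_t *out)
{
    if (off > bar->len || len > bar->len - off)
        return -1;              /* range does not fit in the window */
    *out = bar->start + off;    /* physical start + offset */
    return 0;
}
```

For example, a 4 KB transfer at offset 0x100 yields 0xc0000100, while a range that runs past 128K is rejected instead of being handed to the DMA engine.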

Another possible problem: AFAIK you must use the _ONSTACK() variants
from include/linux/completion.h (e.g. DECLARE_COMPLETION_ONSTACK()) for
a struct completion that lives on the stack.

Hope it helps,
Ira


* Re: FSL DMA engine transfer to PCI memory
  2011-01-24 21:47 FSL DMA engine transfer to PCI memory Felix Radensky
  2011-01-24 22:26 ` Ira W. Snyder
@ 2011-01-24 22:44 ` Scott Wood
  2011-01-25  8:56 ` David Laight
  2 siblings, 0 replies; 14+ messages in thread
From: Scott Wood @ 2011-01-24 22:44 UTC
  To: Felix Radensky; +Cc: linuxppc-dev

On Mon, 24 Jan 2011 23:47:22 +0200
Felix Radensky <felix@embedded-sol.com> wrote:

> static int dma_transfer(struct dma_chan *chan, void *dst, void *src,
>                         size_t len)
> {
>      int rc = 0;
>      dma_addr_t dma_src;
>      dma_addr_t dma_dst;
>      dma_cookie_t cookie;
>      struct completion cmp;
>      enum dma_status status;
>      enum dma_ctrl_flags flags = 0;
>      struct dma_device *dev = chan->device;
>      struct dma_async_tx_descriptor *tx = NULL;
>      unsigned long tmo = msecs_to_jiffies(FPGA_DMA_TIMEOUT_MS);
> 
>      dma_src = dma_map_single(dev->dev, src, len, DMA_TO_DEVICE);
>      if (dma_mapping_error(dev->dev, dma_src)) {
>          printk(KERN_ERR "Failed to map src for DMA\n");
>          return -EIO;
>      }
> 
>      dma_dst = (dma_addr_t)dst;

Why are you casting a virtual address to dma_addr_t?

> The destination address is PCI memory address returned by 
> pci_ioremap_bar().

You need the physical address.

-Scott


* Re: FSL DMA engine transfer to PCI memory
  2011-01-24 22:26 ` Ira W. Snyder
@ 2011-01-24 23:39   ` Felix Radensky
  2011-01-25  0:18     ` Ira W. Snyder
  0 siblings, 1 reply; 14+ messages in thread
From: Felix Radensky @ 2011-01-24 23:39 UTC
  To: Ira W. Snyder; +Cc: Scott Wood, linuxppc-dev

Hi Ira, Scott

On 01/25/2011 12:26 AM, Ira W. Snyder wrote:
> Your destination address is wrong. The device_prep_dma_memcpy() routine
> works in physical addresses only (dma_addr_t type). Your source address
> looks fine: you're using the result of dma_map_single(), which returns a
> physical address.
>
> Your destination address should be something that comes from struct
> pci_dev.resource[x].start + offset if necessary. In your lspci output
> above, that will be 0xc0000000.
>
> Another possible problem: AFAIK you must use the _ONSTACK() variants
> from include/linux/completion.h for struct completion which are on the
> stack.
>
> Hope it helps,
> Ira

Thanks for your help. I'm now passing the result of
pci_resource_start(pdev, 0) as the destination address, and the
destination buffer changes after the transfer. But the contents of the
source and destination buffers are different. What else could be wrong?

Thanks.

Felix.


* Re: FSL DMA engine transfer to PCI memory
  2011-01-24 23:39   ` Felix Radensky
@ 2011-01-25  0:18     ` Ira W. Snyder
  2011-01-25 14:32       ` Felix Radensky
  0 siblings, 1 reply; 14+ messages in thread
From: Ira W. Snyder @ 2011-01-25  0:18 UTC
  To: Felix Radensky; +Cc: Scott Wood, linuxppc-dev

On Tue, Jan 25, 2011 at 01:39:39AM +0200, Felix Radensky wrote:
> Hi Ira, Scott
> 
> Thanks for your help. I'm now passing the result of
> pci_resource_start(pdev, 0) as destination address, and destination
> buffer changes after the transfer. But the contents of source and
> destination buffers are different. What else could be wrong ?
> 

After you changed the dst address to pci_resource_start(pdev, 0), I
don't see anything wrong with the code.

Try using memcpy_toio() to copy some bytes to the FPGA. Also try writing
a single byte at a time (writeb()?) in a loop. This should help
establish that your device is working.
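A rough userspace analogue of the byte-at-a-time test, for illustration only: the function name is hypothetical, and an ordinary buffer stands in for the pci_ioremap_bar() mapping. In the driver the two loops would use writeb() and readb() on the mapped address instead of plain volatile accesses.

```c
#include <stddef.h>
#include <stdint.h>

/* Write src to dst one byte at a time, then read every byte back.
 * 'volatile' stops the compiler from merging or dropping the byte
 * accesses, which is one of the properties writeb()/readb() provide
 * on real MMIO (they also handle ordering, which this sketch does not).
 * Returns the first offset that did not read back correctly, or len
 * if every byte matched. */
static size_t write_bytewise_and_verify(volatile uint8_t *dst,
                                        const uint8_t *src, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = src[i];            /* one single-byte store per byte */
    for (size_t i = 0; i < len; i++)
        if (dst[i] != src[i])       /* one single-byte load per byte */
            return i;
    return len;
}
```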

If you put some pattern in your src buffer (such as 0x0, 0x1, 0x2, ...
0xff, repeat) does the destination show some pattern after the DMA
completes? (Such as, every 4th byte is correct.)
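One way to automate that comparison, sketched with hypothetical helper names: fill the source with the repeating 0x00..0xff pattern, then report which byte lanes within each 32-bit word came through intact, and whether the destination is simply every word byte-reversed (a common symptom of a big-endian/little-endian mix-up).

```c
#include <stddef.h>
#include <stdint.h>

/* Fill buf with 0x00, 0x01, ..., 0xff, 0x00, ... (wraps every 256 bytes). */
static void fill_pattern(uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = (uint8_t)i;
}

/* Bit k of the result is set if every byte at offset i with i % 4 == k
 * matched. 0xf means a perfect copy; 0x1 means "only every 4th byte
 * is correct". */
static unsigned good_lane_mask(const uint8_t *src, const uint8_t *dst,
                               size_t len)
{
    unsigned mask = 0xfu;
    for (size_t i = 0; i < len; i++)
        if (src[i] != dst[i])
            mask &= ~(1u << (i % 4));
    return mask;
}

/* Return 1 if dst equals src with the 4 bytes of every 32-bit word
 * reversed, else 0. Only whole words are compared. */
static int is_word_byteswapped(const uint8_t *src, const uint8_t *dst,
                               size_t len)
{
    if (len < 4)
        return 0;
    for (size_t i = 0; i + 4 <= len; i += 4)
        for (size_t j = 0; j < 4; j++)
            if (dst[i + j] != src[i + 3 - j])
                return 0;
    return 1;
}
```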

Ira


* RE: FSL DMA engine transfer to PCI memory
  2011-01-24 21:47 FSL DMA engine transfer to PCI memory Felix Radensky
  2011-01-24 22:26 ` Ira W. Snyder
  2011-01-24 22:44 ` Scott Wood
@ 2011-01-25  8:56 ` David Laight
  2 siblings, 0 replies; 14+ messages in thread
From: David Laight @ 2011-01-25  8:56 UTC
  To: linuxppc-dev

> I'm trying to use FSL DMA engine to perform DMA transfer from
> memory buffer obtained by kmalloc() to PCI memory. This is on
> custom board based on P2020 running linux-2.6.35. The PCI
> device is Altera FPGA, connected directly to SoC PCI-E controller.

You'll need to use the DMA engine that is part of the PCIe
interface in order to get large PCIe transfers.
I think everything else will still generate single 32-bit PCIe
transfers, which are (if your measurements match mine)
exceptionally lethargic: the ISA bus is faster!

That does work provided you remember to give the DMA controller
physical addresses and byteswap absolutely everything.
(Oh, and I didn't get single-word transfers to work: they locked
the DMA controller. Not a problem, since they are faster by PIO.)
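Since the PowerPC core runs big-endian and the controller needs everything byte-swapped, each descriptor word has to be stored little-endian; st_le32() and ld_le32() do that in the code below. A portable sketch of the same byte order, with hypothetical helper names, shows exactly which byte lands where:

```c
#include <stdint.h>

/* Store a 32-bit value as little-endian bytes regardless of host byte
 * order: the effect st_le32() has on a big-endian PowerPC core. */
static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v);
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Read a little-endian 32-bit value back (the ld_le32() direction). */
static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

Writing 0x11223344 this way puts 0x44 at the lowest address, which is what a little-endian bus master expects to find there.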

Note that PPC Linux (Linux in general??) doesn't have a
'virtual to physical' function that works for all addresses, so
you'll need to remember the physical address of the PCIe slave
and use malloc'ed memory for the descriptors (on which
virt_to_phys() actually works).

I don't think there is a standard device driver for the PCIe DMA;
I couldn't even find any header files that were vaguely relevant
except in the U-Boot sources.
I certainly wrote some code that just assumes it is on the right
hardware!

These are the relevant bits of code ....

Global initialisation:
    /* Enable the read/write dma controllers */
    csb_ctrl = in_le32(&pex->pex_csb_ctrl);
    csb_ctrl |= PEX_CSB_CTRL_WDMAE | PEX_CSB_CTRL_RDMAE;
    out_le32(&pex->pex_csb_ctrl, csb_ctrl);

    /* We don't rely on the dma polling the descriptor, I have NFI
     * whether the default of 0 means 'never poll' or 'poll very
     * quickly'. Set a large slow value for sanity. */
    out_le32(&pex->pex_dms_dstmr, ~0u);

Transfer setup:
    /* We only support aligned writes - caller must verify */
    dma_ctrl = PDMAD_CTRL_VALID;
    dma_ctrl |= PDMAD_CTRL_SNOOP_CSB;
    dma_ctrl |= PDMAD_CTRL_1ST_BYTES | PDMAD_CTRL_LAST_BYTES;
    dma_ctrl |= PDMAD_CTRL_NEXT_VALID;
    dma_ctrl |= len << (PDMAD_CTRL_LEN_SHIFT - 2);

    /* Fill in DMA descriptor */
    st_le32(&desc->pdmad_ctrl, dma_ctrl);
    /* We MUST clear the status - otherwise the xfer will be skipped */
    st_le32(&desc->pdmad_stat, 0);
    st_le32(&desc->pdmad_src_address, src_phys);
    st_le32(&desc->pdmad_dst_address, dst_phys);
    st_le32(&desc->pdmad_next_desc, 0);

    /* Clear old status */
    st_le32(&pex_dma->pex_dma_stat, in_le32(&pex_dma->pex_dma_stat));

    /* Give descriptor address to dma engine */
    st_le32(&pex_dma->pex_dma_addr, virt_to_phys(desc));

    /* Wait for all above memory cycles, then start xfer */
    iosync();
    st_le32(&pex_dma->pex_dma_ctrl, PEX_DMA_CTRL_START | PEX_DMA_CTRL_SNOOP);
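The length line appears to pack a byte count into a field counted in 32-bit words: for a length that is a multiple of 4, shifting left by SHIFT - 2 is the same as dividing by 4 and shifting by SHIFT. A small check of that identity follows; EXAMPLE_LEN_SHIFT is a made-up placeholder, since the real PDMAD_CTRL_LEN_SHIFT value is not given here, and the word-count interpretation is an inference from the code, not from documentation.

```c
#include <stdint.h>

/* Placeholder only: the real PDMAD_CTRL_LEN_SHIFT comes from the
 * controller documentation and is NOT known here. */
#define EXAMPLE_LEN_SHIFT 8

/* Encoding as written in the setup code: byte count << (SHIFT - 2). */
static uint32_t len_field_from_bytes(uint32_t len_bytes)
{
    return len_bytes << (EXAMPLE_LEN_SHIFT - 2);
}

/* Same field with the assumed word count made explicit. The two agree
 * only when len_bytes is a multiple of 4; otherwise the two low bits
 * of the byte count land in the two bits just below the field. */
static uint32_t len_field_from_words(uint32_t len_bytes)
{
    return (len_bytes / 4) << EXAMPLE_LEN_SHIFT;
}
```

With an unaligned length such as 6 the two encodings differ, which is one concrete reason for the "aligned writes only" rule in the comment.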

Poll for completion:
    /* Wait for transfer to complete/fail */
    do {
        desc_stat = ld_le32(&desc->pdmad_stat);
    } while (!(desc_stat & PDMAD_STAT_DONE));

    status = ld_le32(&pex_dma->pex_dma_stat);

    if (status == (PEX_DMA_STAT_DSCPL | PEX_DMA_STAT_CHCPL)
            && desc_stat == PDMAD_STAT_DONE)
        /* Transfer ok */
        return 0;
    /* Transfer failed */
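The polling loop spins forever if the DONE bit never appears, for example when the controller locks up. A bounded variant, sketched here with a hypothetical function name, a hypothetical done mask, and an iteration cap so it can be exercised in userspace, returns an error instead. A driver would more likely bound the wait by time (a jiffies deadline) and read the status with ld_le32() as above; this sketch leaves both out.

```c
#include <stdint.h>

/* Poll *stat until any bit in done_mask is set or max_polls reads have
 * been made. Returns 0 and stores the final status word in *status_out
 * on success, or -1 if the bit never appeared. Byte-order conversion
 * of the status word is deliberately omitted. */
static int poll_done_bounded(const volatile uint32_t *stat,
                             uint32_t done_mask, unsigned long max_polls,
                             uint32_t *status_out)
{
    for (unsigned long i = 0; i < max_polls; i++) {
        uint32_t s = *stat;     /* fresh read on every iteration */
        if (s & done_mask) {
            *status_out = s;
            return 0;
        }
    }
    return -1;  /* treat as a failed or stuck transfer */
}
```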

Oh, since I couldn't find it in the documentation, the first
word of the dma descriptor is 'ctrl' and the last 'next_desc'.

    David


* Re: FSL DMA engine transfer to PCI memory
  2011-01-25  0:18     ` Ira W. Snyder
@ 2011-01-25 14:32       ` Felix Radensky
  2011-01-25 16:29         ` Ira W. Snyder
  0 siblings, 1 reply; 14+ messages in thread
From: Felix Radensky @ 2011-01-25 14:32 UTC
  To: Ira W. Snyder; +Cc: Scott Wood, linuxppc-dev

Hi Ira,

On 01/25/2011 02:18 AM, Ira W. Snyder wrote:
> On Tue, Jan 25, 2011 at 01:39:39AM +0200, Felix Radensky wrote:
>> Hi Ira, Scott
>> Thanks for your help. I'm now passing the result of
>> pci_resource_start(pdev, 0)
>> as destination address, and destination buffer changes after the
>> transfer. But
>> the contents of source and destination buffers are different. What
>> else could
>> be wrong ?
>>
> After you changed the dst address to pci_resource_start(pdev, 0), I
> don't see anything wrong with the code.
>
> Try using memcpy_toio() to copy some bytes to the FPGA. Also try writing
> a single byte at a time (writeb()?) in a loop. This should help
> establish that your device is working.
>
> If you put some pattern in your src buffer (such as 0x0, 0x1, 0x2, ...
> 0xff, repeat) does the destination show some pattern after the DMA
> completes? (Such as, every 4th byte is correct.)
>
> Ira

memcpy_toio() works fine, the data is written correctly. After
DMA, the correct data appears at offsets 0xC, 0x1C, 0x2C, etc.
of the destination buffer. I have 12 bytes of junk, 4 bytes of
correct data, then again 12 bytes of junk and so on.

Felix.
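
A minimal userspace sketch of the pattern test Ira suggested. It assumes the destination contents have already been read back into an ordinary buffer, for example with memcpy_fromio() in the driver. It prints each run of bytes that matches the repeating 0x00..0xff pattern. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Expected value of byte i in the repeating 0x00, 0x01, ..., 0xff pattern. */
static unsigned char pattern_byte(size_t i)
{
    return (unsigned char)(i & 0xff);
}

/* Fill the DMA source buffer with the test pattern. */
static void fill_pattern(unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++)
        buf[i] = pattern_byte(i);
}

/*
 * Scan a copy of the destination and print every run of bytes that
 * matches the pattern. Returns the total number of matching bytes and
 * stores the offset of the first matching byte in *first_match (len if
 * nothing matched).
 */
static size_t report_pattern_matches(const unsigned char *dst, size_t len,
                                     size_t *first_match)
{
    size_t matched = 0;
    size_t i = 0;

    assert(dst != NULL && first_match != NULL);
    *first_match = len;
    while (i < len) {
        if (dst[i] != pattern_byte(i)) {
            i++;
            continue;
        }
        size_t start = i;
        while (i < len && dst[i] == pattern_byte(i))
            i++;
        if (*first_match == len)
            *first_match = start;
        printf("offset 0x%zx: %zu correct bytes\n", start, i - start);
        matched += i - start;
    }
    return matched;
}
```

Feed it data shaped like the result above (12 bytes of junk, then 4 correct bytes, repeating). It then reports 4-byte runs at 0xc, 0x1c, 0x2c and so on.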

* Re: FSL DMA engine transfer to PCI memory
  2011-01-25 14:32       ` Felix Radensky
@ 2011-01-25 16:29         ` Ira W. Snyder
  2011-01-25 16:34           ` David Laight
  2011-01-27  8:32           ` Felix Radensky
  0 siblings, 2 replies; 14+ messages in thread
From: Ira W. Snyder @ 2011-01-25 16:29 UTC (permalink / raw)
  To: Felix Radensky; +Cc: Scott Wood, linuxppc-dev

On Tue, Jan 25, 2011 at 04:32:02PM +0200, Felix Radensky wrote:
> Hi Ira,
> 
> On 01/25/2011 02:18 AM, Ira W. Snyder wrote:
> > On Tue, Jan 25, 2011 at 01:39:39AM +0200, Felix Radensky wrote:
> >> Hi Ira, Scott
> >>
> >> On 01/25/2011 12:26 AM, Ira W. Snyder wrote:
> >>> On Mon, Jan 24, 2011 at 11:47:22PM +0200, Felix Radensky wrote:
> >>>> Hi,
> >>>>
> >>>> I'm trying to use FSL DMA engine to perform DMA transfer from
> >>>> memory buffer obtained by kmalloc() to PCI memory. This is on
> >>>> custom board based on P2020 running linux-2.6.35. The PCI
> >>>> device is Altera FPGA, connected directly to SoC PCI-E controller.
> >>>>
> >>>> The destination address is PCI memory address returned by
> >>>> pci_ioremap_bar().
> >>>> The transfer silently fails, destination buffer doesn't change
> >>>> contents, but no
> >>>> error condition is reported.
> >>>>
> >>>> What am I doing wrong ?
> >>>>
> >>>> Thanks a lot in advance.
> >>>>
> >>> Your destination address is wrong. The device_prep_dma_memcpy() routine
> >>> works in physical addresses only (dma_addr_t type). Your source address
> >>> looks fine: you're using the result of dma_map_single(), which returns a
> >>> physical address.
> >>>
> >>> Your destination address should be something that comes from struct
> >>> pci_dev.resource[x].start + offset if necessary. In your lspci output
> >>> above, that will be 0xc0000000.
> >>>
> >>> Another possible problem: AFAIK you must use the _ONSTACK() variants
> >>> from include/linux/completion.h for struct completion which are on the
> >>> stack.
> >>>
> >>> Hope it helps,
> >>> Ira
> >> Thanks for your help. I'm now passing the result of
> >> pci_resource_start(pdev, 0)
> >> as destination address, and destination buffer changes after the
> >> transfer. But
> >> the contents of source and destination buffers are different. What
> >> else could
> >> be wrong ?
> >>
> > After you changed the dst address to pci_resource_start(pdev, 0), I
> > don't see anything wrong with the code.
> >
> > Try using memcpy_toio() to copy some bytes to the FPGA. Also try writing
> > a single byte at a time (writeb()?) in a loop. This should help
> > establish that your device is working.
> >
> > If you put some pattern in your src buffer (such as 0x0, 0x1, 0x2, ...
> > 0xff, repeat) does the destination show some pattern after the DMA
> > completes? (Such as, every 4th byte is correct.)
> >
> > Ira
> 
> memcpy_toio() works fine, the data is written correctly. After
> DMA, the correct data appears at offsets 0xC, 0x1C, 0x2C, etc.
> of the destination buffer. I have 12 bytes of junk, 4 bytes of
> correct data, then again 12 bytes of junk and so on.
> 

This sounds like your FPGA doesn't handle burst mode accesses correctly.
A logic analyzer will help you prove it.

Another quick test is to do an unaligned transfer and see what
happens. The 83xx DMA controller handles unaligned transfers by doing
several small, non-burst transfers until the src and dst are aligned,
and then does cacheline-sized burst transfers until complete. My hunch
is that the 85xx/86xx controller behaves the same way.

Something like this:

dma_src = dma_map_single(...);
dma_dst = pci_resource_start(pdev, 0) + 1;

Notice that the dst address is offset by one byte, so you'll need to
take that into account when comparing data after the transfer.

Ira
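
A small model makes the one-byte offset argument concrete. Suppose the controller only bursts once src and dst are both aligned. Then an aligned source paired with dst = pci_resource_start(pdev, 0) + 1 never reaches that state, because both addresses advance by the same amount. The whole transfer would go out as small accesses, and that is what makes it a useful burst-versus-no-burst test. This is a simplified model of the behaviour described above, not Freescale's documented algorithm. The 32-byte burst size in the example and the plan_transfer() helper are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct xfer_plan {
    size_t head;    /* bytes moved with small, non-burst accesses first */
    size_t bursts;  /* number of full burst_size transfers */
    size_t tail;    /* leftover bytes after the last burst */
};

/*
 * Simplified model: small accesses until src and dst are both aligned
 * to burst_size, then full bursts, then the remainder.
 */
static struct xfer_plan plan_transfer(uint64_t src, uint64_t dst, size_t len,
                                      size_t burst_size)
{
    struct xfer_plan p = { 0, 0, 0 };
    uint64_t mask = burst_size - 1;

    assert(burst_size != 0 && (burst_size & mask) == 0); /* power of two */

    if ((src ^ dst) & mask) {
        /* Offsets within a burst differ, so advancing both pointers by the
         * same amount never aligns them together: no bursts at all. */
        p.head = len;
        return p;
    }
    p.head = (size_t)((burst_size - (src & mask)) & mask);
    if (p.head > len)
        p.head = len;
    p.bursts = (len - p.head) / burst_size;
    p.tail = (len - p.head) % burst_size;
    return p;
}
```

For example, plan_transfer(0x1000, 0xc0000001, 256, 32) reports all 256 bytes as small accesses. An aligned pair gives eight 32-byte bursts.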

* RE: FSL DMA engine transfer to PCI memory
  2011-01-25 16:29         ` Ira W. Snyder
@ 2011-01-25 16:34           ` David Laight
  2011-01-25 19:57             ` Scott Wood
  2011-01-27  8:32           ` Felix Radensky
  1 sibling, 1 reply; 14+ messages in thread
From: David Laight @ 2011-01-25 16:34 UTC (permalink / raw)
  To: Ira W. Snyder, Felix Radensky; +Cc: Scott Wood, linuxppc-dev

> > >>>> custom board based on P2020 running linux-2.6.35. The PCI
> > >>>> device is Altera FPGA, connected directly to SoC PCI-E controller.

> This sounds like your FPGA doesn't handle burst mode accesses correctly.
> A logic analyzer will help you prove it.

He is doing PCIe, not PCI.
A PCIe transfer is an HDLC packet pair, one containing the
request, the other the response.
In order to get any significant throughput, the hdlc packet(s)
have to contain all the data (eg 128 bytes).
On the ppc we used, that means you have to use the dma
controller inside the PCIe interface block.
The generic dma controller can't even generate 64bit
cycles into the ppc's PCIe engine.

	David
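
Some rough numbers show why payload size matters here. Every PCIe memory write TLP carries a fixed header plus link-layer overhead, whatever its payload. The sketch assumes 20 bytes of overhead per TLP: a 12-byte 3DW header for 32-bit addressing, plus framing, sequence number and LCRC. It ignores ECRC, DLLPs and 8b/10b encoding. The 128-byte payload matches the MaxPayload shown in the lspci output above. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Assumed per-TLP overhead for a posted memory write with a 32-bit
 * address: 12-byte header plus 8 bytes of framing, sequence number and
 * LCRC. ECRC, DLLPs and 8b/10b encoding are ignored.
 */
#define TLP_OVERHEAD_BYTES 20u

/* Number of TLPs needed to move len bytes with a given payload per TLP. */
static size_t tlp_count(size_t len, size_t payload)
{
    assert(payload > 0);
    return (len + payload - 1) / payload;
}

/* Percentage (rounded down) of bytes on the link that carry data. */
static unsigned efficiency_percent(size_t len, size_t payload)
{
    size_t wire = len + tlp_count(len, payload) * TLP_OVERHEAD_BYTES;
    return (unsigned)(len * 100 / wire);
}
```

Under these assumptions, moving 4 KiB as 4-byte writes leaves roughly 16% of the link bytes carrying data. With 128-byte payloads the figure is roughly 86%.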

* Re: FSL DMA engine transfer to PCI memory
  2011-01-25 16:34           ` David Laight
@ 2011-01-25 19:57             ` Scott Wood
  2011-01-26 10:18               ` David Laight
  0 siblings, 1 reply; 14+ messages in thread
From: Scott Wood @ 2011-01-25 19:57 UTC (permalink / raw)
  To: David Laight; +Cc: linuxppc-dev, Felix Radensky, Ira W. Snyder

On Tue, 25 Jan 2011 16:34:49 +0000
David Laight <David.Laight@ACULAB.COM> wrote:

>  
> > > >>>> custom board based on P2020 running linux-2.6.35. The PCI
> > > >>>> device is Altera FPGA, connected directly to SoC PCI-E controller.
>
> > This sounds like your FPGA doesn't handle burst mode accesses correctly.
> > A logic analyzer will help you prove it.
> 
> He is doing PCIe, not PCI.
> A PCIe transfer is an HDLC packet pair, one containing the
> request, the other the response.
> In order to get any significant throughput, the hdlc packet(s)
> have to contain all the data (eg 128 bytes).
> On the ppc we used, that means you have to use the dma
> controller inside the PCIe interface block.

What was the ppc you used?

On 85xx/QorIQ-family chips such as P2020, there is no DMA controller
inside the PCIe controller itself (or are you talking about bus
mastering by the PCIe device[1]?  "interface" is a bit ambiguous),
though it was considered part of the PCI controller on 82xx.

The DMA engine and PCIe are both on OCeaN, so the traffic does not need
to pass through the e500 Coherency Module.  My understanding -- for
what it's worth, coming from a software person :-) -- is that you should
be able to get large transfer chunks using the DMA engine.

I suggest getting things working, and then seeing whether the
performance is acceptable.

> The generic dma controller can't even generate 64bit
> cycles into the ppc's PCIe engine.

Could you elaborate?

-Scott

[1] To the original poster, is there any reason you're not doing bus
mastering from the PCIe device, assuming you control the content of
the FPGA?

* RE: FSL DMA engine transfer to PCI memory
  2011-01-25 19:57             ` Scott Wood
@ 2011-01-26 10:18               ` David Laight
  2011-01-26 19:09                 ` Scott Wood
  0 siblings, 1 reply; 14+ messages in thread
From: David Laight @ 2011-01-26 10:18 UTC (permalink / raw)
  Cc: linuxppc-dev

> What was the ppc you used?

The 8315E PowerQUICC II

> On 85xx/QorIQ-family chips such as P2020, there is no DMA controller
> inside the PCIe controller itself (or are you talking about bus
> mastering by the PCIe device[1]?  "interface" is a bit ambiguous),
> though it was considered part of the PCI controller on 82xx.
>
> The DMA engine and PCIe are both on OCeaN, so the traffic
> does not need to pass through the e500 Coherency Module.
> My understanding -- for what it's worth, coming from a
> software person :-) -- is that you should
> be able to get large transfer chunks using the DMA engine.

It might be possible - but the ppc's pcie would need to know
the length of the dma (or at least be told that there was more
data to arrive) before even starting the pcie transfer.
I used 128 bytes per pcie transfer (which the altera slave
can handle) but that is longer than you want a burst on
the internal (CSB in my case) bus on the ppc.
It is also longer than a cache line - so the dma engine's
memory reads might induce a cache flush.

> I suggest getting things working, and then seeing whether the
> performance is acceptable.

The only reason for using dma (instead of pio) is to get
long pcie transfers - otherwise it isn't really worth the
effort. Transfers are unlikely to take long enough to make
it worth taking an interrupt at the end of the dma.

My device driver implements read() and write() (and poll()
to wait for interrupts). So I do overlap the copy_to/from_user
with the next dma.

> > The generic dma controller can't even generate 64bit
> > cycles into the ppc's PCIe engine.
>=20
> Could you elaborate?

The pcie is (apparently) a 64bit interface, so a single 32bit
transfer is actually a 64bit one with only 4 byte enables driven.

I couldn't see anything that would allow a CSB master to generate
two 32bit cycles (since it is a 32bit bus) that the pcie hardware
could convert into a single 64bit pcie transfer.
The fpga target is likely to have 32bit targets (it could have
64 bit ones, but if you've instantiated a NiosII cpu it won't!)
so you get a bus width adapter (which carefully does the cycle
with no byte enables driven) as well as the clock crossing bridge.
These both make the slave even slower than it would otherwise be!

IIRC we managed to get 2us for a read and 500ns for a write cycle.
The per byte costs are relatively small in comparison.

	David
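
David's latencies translate directly into PIO throughput, because each access pays the fixed cycle time. The following is a back-of-envelope sketch that assumes one access per cycle with no overlap between accesses. The figures are his 8315E numbers, not P2020 measurements. The 8-byte case is hypothetical, since he could not get 64-bit cycles.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sustained PIO throughput in bytes per second when every access moves
 * bytes_per_access bytes and costs ns_per_access nanoseconds, with no
 * overlap between accesses.
 */
static uint64_t pio_bytes_per_sec(uint64_t bytes_per_access,
                                  uint64_t ns_per_access)
{
    assert(ns_per_access > 0);
    return bytes_per_access * 1000000000ULL / ns_per_access;
}
```

With those numbers, 32-bit writes top out around 8 MB/s and 32-bit reads around 2 MB/s. Doubling the access width only doubles that, which is why long transfers matter so much here.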

* Re: FSL DMA engine transfer to PCI memory
  2011-01-26 10:18               ` David Laight
@ 2011-01-26 19:09                 ` Scott Wood
  0 siblings, 0 replies; 14+ messages in thread
From: Scott Wood @ 2011-01-26 19:09 UTC (permalink / raw)
  To: David Laight; +Cc: linuxppc-dev

On Wed, 26 Jan 2011 10:18:01 +0000
David Laight <David.Laight@ACULAB.COM> wrote:

>  
> > What was the ppc you used?
> 
> The 8315E PowerQUICC II

Ah.  The interconnect between the DMA engine and PCIe is different on
83xx.

> > The DMA engine and PCIe are both on OCeaN, so the traffic 
> > does not need to pass through the e500 Coherency Module.
> > My understanding -- for what it's worth, coming from a
> > software person :-) -- is that you should
> > be able to get large transfer chunks using the DMA engine.
> 
> It might be possible - but the ppc's pcie would need to know
> the length of the dma (or at least be told that there was more
> data to arrive) before even starting the pcie transfer.

On 85xx/QorIQ, I believe the connection between the DMA engine and the
PCIe controller allows the data to arrive in suitably large chunks.

> > I suggest getting things working, and then seeing whether the
> > performance is acceptable.
> 
> The only reason for using dma (instead of pio) is to get
> long pcie transfers - otherwise it isn't really worth the
> effort. Transfers are unlikely to take long enough to make
> it worth taking an interrupt at the end of the dma.

But in the absence of specific knowledge about this specific
chip, implementing it and testing is a good way of determining whether
you get those large PCIe transactions on this particular hardware.

And even if the transfers aren't particularly fast, if the total
transfer size (not the size of the chunks that go on the bus) is large
enough, it could be worth freeing up the core to do something else.  It
could also avoid running the data through the core's caches, or be a
transfer from one PCIe device to another, etc.  Don't be too quick to
say don't bother. :-)
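
A rough break-even sketch makes Scott's trade-off concrete. It assumes the core stalls for every PIO access, using David's 500 ns write figure from his 8315E board purely for illustration. It also assumes a DMA transfer costs the core a fixed overhead for descriptor setup plus the completion interrupt. Both cost parameters are hypothetical inputs, not measurements from this P2020 board. The model counts only CPU time, not cache effects or when the data actually lands.

```c
#include <assert.h>
#include <stdint.h>

/*
 * CPU time in ns to move len bytes by PIO, assuming the core stalls for
 * ns_per_access on every access of bytes_per_access bytes.
 */
static uint64_t pio_cpu_ns(uint64_t len, uint64_t bytes_per_access,
                           uint64_t ns_per_access)
{
    assert(bytes_per_access > 0);
    return (len + bytes_per_access - 1) / bytes_per_access * ns_per_access;
}

/*
 * Smallest transfer size (a whole number of accesses) for which PIO
 * costs the core more than dma_overhead_ns, the assumed fixed CPU cost
 * of setting up a DMA descriptor and handling its completion interrupt.
 */
static uint64_t dma_break_even_len(uint64_t dma_overhead_ns,
                                   uint64_t bytes_per_access,
                                   uint64_t ns_per_access)
{
    assert(bytes_per_access > 0 && ns_per_access > 0);
    /* k accesses cost k * ns_per_access > overhead once k > overhead / ns. */
    uint64_t accesses = dma_overhead_ns / ns_per_access + 1;
    return accesses * bytes_per_access;
}
```

Take a hypothetical 10 us of DMA overhead and 500 ns per 4-byte write. PIO then costs the core more from 84 bytes upward. The real crossover depends entirely on measured overheads.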

> > > The generic dma controller can't even generate 64bit
> > > cycles into the ppc's PCIe engine.
> > 
> > Could you elaborate?
> 
> The pcie is (apparently) a 64bit interface, so a single 32bit
> transfer is actually a 64bit one with only 4 byte enables driven.

My understanding is that PCIe is an aggregation of one or more
serial links, over which packets are sent.  I'm not sure to what extent
it makes sense to call it a 64-bit interface, other than addressing.

> I couldn't see anything that would allow a CSB master to generate
> two 32bit cycles (since it is a 32bit bus) that the pcie hardware
> could convert into a single 64bit pcie transfer.

Again, that's an 83xx thing, 85xx/QorIQ is different.

Though from the 8315 manual it looks like the CSB can do 64-bit data
(but not addresses).

-Scott

* Re: FSL DMA engine transfer to PCI memory
  2011-01-25 16:29         ` Ira W. Snyder
  2011-01-25 16:34           ` David Laight
@ 2011-01-27  8:32           ` Felix Radensky
  2011-01-27 16:34             ` Ira W. Snyder
  1 sibling, 1 reply; 14+ messages in thread
From: Felix Radensky @ 2011-01-27  8:32 UTC (permalink / raw)
  To: Ira W. Snyder; +Cc: Scott Wood, linuxppc-dev

Hi Ira,

On 01/25/2011 06:29 PM, Ira W. Snyder wrote:
> On Tue, Jan 25, 2011 at 04:32:02PM +0200, Felix Radensky wrote:
>> Hi Ira,
>>
>> On 01/25/2011 02:18 AM, Ira W. Snyder wrote:
>>> On Tue, Jan 25, 2011 at 01:39:39AM +0200, Felix Radensky wrote:
>>>> Hi Ira, Scott
>>>>
>>>> On 01/25/2011 12:26 AM, Ira W. Snyder wrote:
>>>>> On Mon, Jan 24, 2011 at 11:47:22PM +0200, Felix Radensky wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I'm trying to use FSL DMA engine to perform DMA transfer from
>>>>>> memory buffer obtained by kmalloc() to PCI memory. This is on
>>>>>> custom board based on P2020 running linux-2.6.35. The PCI
>>>>>> device is Altera FPGA, connected directly to SoC PCI-E controller.
>>>>>>
>>>>>> The destination address is PCI memory address returned by
>>>>>> pci_ioremap_bar().
>>>>>> The transfer silently fails, destination buffer doesn't change
>>>>>> contents, but no
>>>>>> error condition is reported.
>>>>>>
>>>>>> What am I doing wrong ?
>>>>>>
>>>>>> Thanks a lot in advance.
>>>>>>
>>>>> Your destination address is wrong. The device_prep_dma_memcpy() routine
>>>>> works in physical addresses only (dma_addr_t type). Your source address
>>>>> looks fine: you're using the result of dma_map_single(), which returns a
>>>>> physical address.
>>>>>
>>>>> Your destination address should be something that comes from struct
>>>>> pci_dev.resource[x].start + offset if necessary. In your lspci output
>>>>> above, that will be 0xc0000000.
>>>>>
>>>>> Another possible problem: AFAIK you must use the _ONSTACK() variants
>>>>> from include/linux/completion.h for struct completion which are on the
>>>>> stack.
>>>>>
>>>>> Hope it helps,
>>>>> Ira
>>>> Thanks for your help. I'm now passing the result of
>>>> pci_resource_start(pdev, 0)
>>>> as destination address, and destination buffer changes after the
>>>> transfer. But
>>>> the contents of source and destination buffers are different. What
>>>> else could
>>>> be wrong ?
>>>>
>>> After you changed the dst address to pci_resource_start(pdev, 0), I
>>> don't see anything wrong with the code.
>>>
>>> Try using memcpy_toio() to copy some bytes to the FPGA. Also try writing
>>> a single byte at a time (writeb()?) in a loop. This should help
>>> establish that your device is working.
>>>
>>> If you put some pattern in your src buffer (such as 0x0, 0x1, 0x2, ...
>>> 0xff, repeat) does the destination show some pattern after the DMA
>>> completes? (Such as, every 4th byte is correct.)
>>>
>>> Ira
>> memcpy_toio() works fine, the data is written correctly. After
>> DMA, the correct data appears at offsets 0xC, 0x1C, 0x2C, etc.
>> of the destination buffer. I have 12 bytes of junk, 4 bytes of
>> correct data, then again 12 bytes of junk and so on.
>>
> This sounds like your FPGA doesn't handle burst mode accesses correctly.
> A logic analyzer will help you prove it.
>
> Another quick test is to do an unaligned transfer and see what
> happens. The 83xx DMA controller handles unaligned transfers by doing
> several small, non-burst transfers until the src and dst are aligned,
> and then does cacheline-sized burst transfers until complete. My hunch
> is that the 85xx/86xx controller behaves the same way.
>
> Something like this:
>
> dma_src = dma_map_single(...);
> dma_dst = pci_resource_start(pdev, 0) + 1;
>
> Notice that the dst address is offset by one byte, so you'll need to
> take that into account when comparing data after the transfer.
>
> Ira

Thanks a lot for your help. It seems the problem was in the fsldma.c
code, which was fixed in later kernels (I'm using 2.6.35). The BWC
field in the MR (mode) register was not set, resulting in single-byte
transfers. This did not work well with the FPGA, which implements a
FIFO whose minimal transfer unit is 32 bits. After setting the BWC
field, DMA works fine.

Felix.
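
On an older kernel with the same symptom, the fix described above comes down to putting a non-zero BWC value into the channel's mode register. The helpers below only manipulate a 32-bit register value held in a variable. Several details are assumptions to check against the P2020 reference manual and a later kernel's fsldma.h: the field position (bits 24-27, mask 0x0f000000), the power-of-two encoding (0 meaning 1 byte, 0xA meaning 1024 bytes) and the names mr_set_bwc()/mr_bwc_bytes(). This is not code taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed layout of the BWC (bandwidth/pause control) field in the
 * 85xx/QorIQ DMA mode register: bits 24-27, encoding a byte count as a
 * power of two. Verify against the reference manual before use.
 */
#define MR_BWC_SHIFT 24
#define MR_BWC_MASK  (0xfu << MR_BWC_SHIFT)

/* Return mr with the BWC field replaced by log2_bytes. */
static uint32_t mr_set_bwc(uint32_t mr, unsigned log2_bytes)
{
    assert(log2_bytes <= 0xa); /* codes above 0xA assumed reserved/special */
    return (mr & ~MR_BWC_MASK) | ((uint32_t)log2_bytes << MR_BWC_SHIFT);
}

/* Byte count encoded in the BWC field of mr, under the assumed encoding. */
static uint32_t mr_bwc_bytes(uint32_t mr)
{
    return 1u << ((mr & MR_BWC_MASK) >> MR_BWC_SHIFT);
}
```

Under that assumed encoding, a mode register left with BWC = 0 decodes to 1 byte. That would be consistent with the single-byte transfers seen here.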

* Re: FSL DMA engine transfer to PCI memory
  2011-01-27  8:32           ` Felix Radensky
@ 2011-01-27 16:34             ` Ira W. Snyder
  0 siblings, 0 replies; 14+ messages in thread
From: Ira W. Snyder @ 2011-01-27 16:34 UTC (permalink / raw)
  To: Felix Radensky; +Cc: Scott Wood, linuxppc-dev

On Thu, Jan 27, 2011 at 10:32:19AM +0200, Felix Radensky wrote:
> Hi Ira,
> 
> On 01/25/2011 06:29 PM, Ira W. Snyder wrote:
> > On Tue, Jan 25, 2011 at 04:32:02PM +0200, Felix Radensky wrote:
> >> Hi Ira,
> >>
> >> On 01/25/2011 02:18 AM, Ira W. Snyder wrote:
> >>> On Tue, Jan 25, 2011 at 01:39:39AM +0200, Felix Radensky wrote:
> >>>> Hi Ira, Scott
> >>>>
> >>>> On 01/25/2011 12:26 AM, Ira W. Snyder wrote:
> >>>>> On Mon, Jan 24, 2011 at 11:47:22PM +0200, Felix Radensky wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> I'm trying to use FSL DMA engine to perform DMA transfer from
> >>>>>> memory buffer obtained by kmalloc() to PCI memory. This is on
> >>>>>> custom board based on P2020 running linux-2.6.35. The PCI
> >>>>>> device is Altera FPGA, connected directly to SoC PCI-E controller.
> >>>>>>
> >>>>>> I can successfully writel() to PCI memory via address obtained from
> >>>>>> pci_ioremap_bar().
> >>>>>> Here's my DMA transfer routine
> >>>>>>
> >>>>>> static int dma_transfer(struct dma_chan *chan, void *dst, void *src, size_t len)
> >>>>>> {
> >>>>>>         int rc = 0;
> >>>>>>         dma_addr_t dma_src;
> >>>>>>         dma_addr_t dma_dst;
> >>>>>>         dma_cookie_t cookie;
> >>>>>>         struct completion cmp;
> >>>>>>         enum dma_status status;
> >>>>>>         enum dma_ctrl_flags flags = 0;
> >>>>>>         struct dma_device *dev = chan->device;
> >>>>>>         struct dma_async_tx_descriptor *tx = NULL;
> >>>>>>         unsigned long tmo = msecs_to_jiffies(FPGA_DMA_TIMEOUT_MS);
> >>>>>>
> >>>>>>         dma_src = dma_map_single(dev->dev, src, len, DMA_TO_DEVICE);
> >>>>>>         if (dma_mapping_error(dev->dev, dma_src)) {
> >>>>>>             printk(KERN_ERR "Failed to map src for DMA\n");
> >>>>>>             return -EIO;
> >>>>>>         }
> >>>>>>
> >>>>>>         dma_dst = (dma_addr_t)dst;
> >>>>>>
> >>>>>>         flags = DMA_CTRL_ACK |
> >>>>>>             DMA_COMPL_SRC_UNMAP_SINGLE  |
> >>>>>>             DMA_COMPL_SKIP_DEST_UNMAP |
> >>>>>>             DMA_PREP_INTERRUPT;
> >>>>>>
> >>>>>>         tx = dev->device_prep_dma_memcpy(chan, dma_dst, dma_src, len, flags);
> >>>>>>         if (!tx) {
> >>>>>>             printk(KERN_ERR "%s: Failed to prepare DMA transfer\n",
> >>>>>>                    __FUNCTION__);
> >>>>>>             dma_unmap_single(dev->dev, dma_src, len, DMA_TO_DEVICE);
> >>>>>>             return -ENOMEM;
> >>>>>>         }
> >>>>>>
> >>>>>>         init_completion(&cmp);
> >>>>>>         tx->callback = dma_callback;
> >>>>>>         tx->callback_param = &cmp;
> >>>>>>         cookie = tx->tx_submit(tx);
> >>>>>>
> >>>>>>         if (dma_submit_error(cookie)) {
> >>>>>>             printk(KERN_ERR "%s: Failed to start DMA transfer\n",
> >>>>>>                    __FUNCTION__);
> >>>>>>             return -ENOMEM;
> >>>>>>         }
> >>>>>>
> >>>>>>         dma_async_issue_pending(chan);
> >>>>>>
> >>>>>>         tmo = wait_for_completion_timeout(&cmp, tmo);
> >>>>>>         status = dma_async_is_tx_complete(chan, cookie, NULL, NULL);
> >>>>>>
> >>>>>>         if (tmo == 0) {
> >>>>>>             printk(KERN_ERR "%s: Transfer timed out\n", __FUNCTION__);
> >>>>>>             rc = -ETIMEDOUT;
> >>>>>>         } else if (status != DMA_SUCCESS) {
> >>>>>>             printk(KERN_ERR "%s: Transfer failed: status is %s\n",
> >>>>>>                    __FUNCTION__,
> >>>>>>                    status == DMA_ERROR ? "error" : "in progress");
> >>>>>>
> >>>>>>             dev->device_control(chan, DMA_TERMINATE_ALL, 0);
> >>>>>>             rc = -EIO;
> >>>>>>         }
> >>>>>>
> >>>>>>         return rc;
> >>>>>> }
> >>>>>>
> >>>>>> The destination address is the PCI memory address returned by
> >>>>>> pci_ioremap_bar(). The transfer silently fails: the destination
> >>>>>> buffer doesn't change contents, but no error condition is reported.
> >>>>>>
> >>>>>> What am I doing wrong?
> >>>>>>
> >>>>>> Thanks a lot in advance.
> >>>>>>
> >>>>> Your destination address is wrong. The device_prep_dma_memcpy() routine
> >>>>> works in physical addresses only (dma_addr_t type). Your source address
> >>>>> looks fine: you're using the result of dma_map_single(), which returns a
> >>>>> physical address.
> >>>>>
> >>>>> Your destination address should be something that comes from struct
> >>>>> pci_dev.resource[x].start + offset if necessary. In your lspci output
> >>>>> above, that will be 0xc0000000.
> >>>>>
> >>>>> Another possible problem: AFAIK you must use the _ONSTACK() variants
> >>>>> from include/linux/completion.h for a struct completion that lives on
> >>>>> the stack.
> >>>>>
> >>>>> Hope it helps,
> >>>>> Ira
> >>>> Thanks for your help. I'm now passing the result of
> >>>> pci_resource_start(pdev, 0) as the destination address, and the
> >>>> destination buffer changes after the transfer. But the contents of
> >>>> the source and destination buffers are different. What else could
> >>>> be wrong?
> >>>>
> >>> After you changed the dst address to pci_resource_start(pdev, 0), I
> >>> don't see anything wrong with the code.
> >>>
> >>> Try using memcpy_toio() to copy some bytes to the FPGA. Also try writing
> >>> a single byte at a time (writeb()?) in a loop. This should help
> >>> establish that your device is working.
> >>>
> >>> If you put some pattern in your src buffer (such as 0x0, 0x1, 0x2, ...
> >>> 0xff, repeat) does the destination show some pattern after the DMA
> >>> completes? (Such as, every 4th byte is correct.)
> >>>
> >>> Ira
> >> memcpy_toio() works fine, the data is written correctly. After
> >> DMA, the correct data appears at offsets 0xC, 0x1C, 0x2C, etc.
> >> of the destination buffer. I have 12 bytes of junk, 4 bytes of
> >> correct data, then again 12 bytes of junk and so on.
> >>
> > This sounds like your FPGA doesn't handle burst mode accesses correctly.
> > A logic analyzer will help you prove it.
> >
> > Another quick test to try is using an unaligned transfer and seeing
> > what happens. The 83xx DMA controller handles unaligned transfers by
> > doing several small, non-burst transfers until the src and dst are
> > aligned, and then does cacheline-sized burst transfers until complete.
> > I suspect the 85xx/86xx controller behaves the same way.
> >
> > Something like this:
> >
> > dma_src = dma_map_single(...);
> > dma_dst = pci_resource_start(pdev, 0) + 1;
> >
> > Notice that the dst address is offset by one byte, so you'll need to
> > take that into account when comparing data after the transfer.
> >
> > Ira
> 
> Thanks a lot for your help. It seems the problem was in the fsldma.c
> code, which was fixed in later kernels (I'm using 2.6.35). The BWC
> field in the MR register was not set, resulting in single-byte
> transfers. This did not work well with the FPGA, which implements a
> FIFO with a minimum transfer unit of 32 bits. After setting the BWC
> field, DMA works fine.
> 

I'm glad to hear it works.

Ira



Thread overview: 14+ messages
2011-01-24 21:47 FSL DMA engine transfer to PCI memory Felix Radensky
2011-01-24 22:26 ` Ira W. Snyder
2011-01-24 23:39   ` Felix Radensky
2011-01-25  0:18     ` Ira W. Snyder
2011-01-25 14:32       ` Felix Radensky
2011-01-25 16:29         ` Ira W. Snyder
2011-01-25 16:34           ` David Laight
2011-01-25 19:57             ` Scott Wood
2011-01-26 10:18               ` David Laight
2011-01-26 19:09                 ` Scott Wood
2011-01-27  8:32           ` Felix Radensky
2011-01-27 16:34             ` Ira W. Snyder
2011-01-24 22:44 ` Scott Wood
2011-01-25  8:56 ` David Laight
