* cobalt & dma
@ 2015-11-17  7:39 Ran Shalit
  2015-11-17  7:53 ` Hans Verkuil
  0 siblings, 1 reply; 10+ messages in thread
From: Ran Shalit @ 2015-11-17  7:39 UTC (permalink / raw)
  To: linux-media

Hello,

I intend to use the cobalt driver as a reference for a new PCI V4L2 driver,
which is required to use several inputs simultaneously. For this, cobalt
seems like the best starting point.
Read/write streaming will probably be sufficient (at least for the
first debugging).
The configuration in my case is i7 core <-- pci ---> fpga.
I see that the DMA implementation is quite complex, and would like to
ask for some tips regarding the following points related to DMA:

1. Is it possible to do read/write without DMA (for debugging, as a start)?
What changes are required for read without DMA (I assume DMA is used
by default in read/write)?
Is it done by using #include <media/videobuf2-vmalloc.h> instead of
#include <media/videobuf2-dma*>?

2. I find it difficult to understand the cobalt_dma_start_streaming()
implementation, which has many cobalt-specific iowrite32() register
writes.
How can I work out what DMA code to implement for my specific platform/device?


Best Regards,
Ran


* Re: cobalt & dma
  2015-11-17  7:39 cobalt & dma Ran Shalit
@ 2015-11-17  7:53 ` Hans Verkuil
  2015-11-17 13:15   ` Ran Shalit
  2015-11-20 14:49   ` Ran Shalit
  0 siblings, 2 replies; 10+ messages in thread
From: Hans Verkuil @ 2015-11-17  7:53 UTC (permalink / raw)
  To: Ran Shalit, linux-media

On 11/17/2015 08:39 AM, Ran Shalit wrote:
> Hello,
> 
> I intend to use the cobalt driver as a reference for a new PCI V4L2 driver,
> which is required to use several inputs simultaneously. For this, cobalt
> seems like the best starting point.
> Read/write streaming will probably be sufficient (at least for the
> first debugging).
> The configuration in my case is i7 core <-- pci ---> fpga.
> I see that the DMA implementation is quite complex, and would like to
> ask for some tips regarding the following points related to DMA:
> 
> 1. Is it possible to do read/write without DMA (for debugging, as a start)?

No. All video capture/output devices use DMA since it would be prohibitively
expensive for the CPU to do otherwise. So just dig in and implement it.

> What changes are required for read without DMA (I assume DMA is used
> by default in read/write)?
> Is it done by using #include <media/videobuf2-vmalloc.h> instead of
> #include <media/videobuf2-dma*>?

No. The vmalloc variant is typically used for USB devices. For PCI(e) you'll
use videobuf2-dma-contig if the DMA engine requires physically contiguous DMA,
or videobuf2-dma-sg if the DMA engine supports scatter-gather DMA. You can
start with dma-contig since the DMA code tends to be simpler, but it is
harder to get the required physically contiguous memory if memory fragmentation
takes place. So you may not be able to allocate the buffers. dma-sg works much
better with virtual memory.

> 
> 2. I find it difficult to understand the cobalt_dma_start_streaming()
> implementation, which has many cobalt-specific iowrite32() register
> writes.
> How can I work out what DMA code to implement for my specific platform/device?

Read include/media/videobuf2-core.h.

There is also an LWN article somewhere (albeit somewhat outdated by now).

Don't expect to write three lines of code and everything works. You *do*
have to write the code for your DMA hardware, there is no way around that.

Regards,

	Hans

> 
> 
> Best Regards,
> Ran



* Re: cobalt & dma
  2015-11-17  7:53 ` Hans Verkuil
@ 2015-11-17 13:15   ` Ran Shalit
  2015-11-17 13:32     ` Steven Toth
  2015-11-17 13:54     ` Hans Verkuil
  2015-11-20 14:49   ` Ran Shalit
  1 sibling, 2 replies; 10+ messages in thread
From: Ran Shalit @ 2015-11-17 13:15 UTC (permalink / raw)
  To: Hans Verkuil; +Cc: linux-media

On Tue, Nov 17, 2015 at 9:53 AM, Hans Verkuil <hverkuil@xs4all.nl> wrote:
> On 11/17/2015 08:39 AM, Ran Shalit wrote:
>> Hello,
>>
>> I intend to use the cobalt driver as a reference for a new PCI V4L2 driver,
>> which is required to use several inputs simultaneously. For this, cobalt
>> seems like the best starting point.
>> Read/write streaming will probably be sufficient (at least for the
>> first debugging).
>> The configuration in my case is i7 core <-- pci ---> fpga.
>> I see that the DMA implementation is quite complex, and would like to
>> ask for some tips regarding the following points related to DMA:
>>
>> 1. Is it possible to do read/write without DMA (for debugging, as a start)?
>
> No. All video capture/output devices use DMA since it would be prohibitively
> expensive for the CPU to do otherwise. So just dig in and implement it.
>

Hi,

Does the cobalt or any other PCI V4L device have a chip datasheet
available, so that we can do some reverse engineering and gain more
understanding of the register reads/writes for the DMA transactions?
I searched, but it seems that the PCIe chip datasheet for these
devices is not available anywhere.

Best Regards,
Ran


* Re: cobalt & dma
  2015-11-17 13:15   ` Ran Shalit
@ 2015-11-17 13:32     ` Steven Toth
  2015-11-17 13:54     ` Hans Verkuil
  1 sibling, 0 replies; 10+ messages in thread
From: Steven Toth @ 2015-11-17 13:32 UTC (permalink / raw)
  To: Ran Shalit; +Cc: Hans Verkuil, linux-media

> Does the cobalt or any other PCI V4L device have a chip datasheet
> available, so that we can do some reverse engineering and gain more
> understanding of the register reads/writes for the DMA transactions?
> I searched, but it seems that the PCIe chip datasheet for these
> devices is not available anywhere.

Generally you wouldn't need it, and I'm not sure it would help having it.

Get to grips with the fundamentals and don't worry about cobalt registers.

DMA programming is highly chip specific, but in general terms it's
highly similar in concept on any PCIe controller. Every
driver+controller pair has to deal with the same things: virtual/physical
bus addresses that need to be understood, scatter-gather lists created
and programmed into the h/w, interrupts serviced, buffer/transfer
completion identified, and transfer sizes.

Look hard enough at any of the PCI/E drivers in the media tree and
you'll see each of them implementing their own versions of the above.
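
To make the scatter-gather idea a little more concrete, here is a small
userspace C model (not kernel code, and not taken from any driver). It takes
the bus addresses of the pages backing one buffer and merges adjacent pages
into segments. The names sg_seg, build_sg and PAGE_SZ are invented for this
sketch; a real driver would normally get an equivalent list from the kernel's
scatterlist/DMA mapping helpers instead of building it by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SZ 4096u /* assumed page size for this model */

/* One scatter-gather segment: a physically contiguous run of bytes. */
struct sg_seg {
    uint64_t bus_addr; /* address the device would use */
    uint32_t len;      /* length in bytes */
};

/*
 * Turn the page bus addresses backing one buffer into segments,
 * merging pages that happen to be adjacent in bus address space.
 * Returns the number of segments written; 0 means there was nothing
 * to map or 'max_segs' was too small.
 */
static size_t build_sg(const uint64_t *pages, size_t npages, size_t buf_len,
                       struct sg_seg *segs, size_t max_segs)
{
    size_t nsegs = 0;

    assert(buf_len <= npages * (size_t)PAGE_SZ);

    for (size_t i = 0; i < npages && buf_len > 0; i++) {
        uint32_t chunk = buf_len < PAGE_SZ ? (uint32_t)buf_len : PAGE_SZ;

        if (nsegs > 0 &&
            segs[nsegs - 1].bus_addr + segs[nsegs - 1].len == pages[i]) {
            segs[nsegs - 1].len += chunk; /* extends the previous run */
        } else {
            if (nsegs == max_segs)
                return 0;
            segs[nsegs].bus_addr = pages[i];
            segs[nsegs].len = chunk;
            nsegs++;
        }
        buf_len -= chunk;
    }
    return nsegs;
}
```

For example, pages at 0x10000, 0x11000 and 0x30000 holding a 12188-byte
buffer collapse into two segments: 8192 bytes at 0x10000 and 3996 bytes at
0x30000. Each segment is what would end up in one hardware descriptor.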

-- 
Steven Toth - Kernel Labs
http://www.kernellabs.com


* Re: cobalt & dma
  2015-11-17 13:15   ` Ran Shalit
  2015-11-17 13:32     ` Steven Toth
@ 2015-11-17 13:54     ` Hans Verkuil
  2015-11-17 21:43       ` Ran Shalit
  1 sibling, 1 reply; 10+ messages in thread
From: Hans Verkuil @ 2015-11-17 13:54 UTC (permalink / raw)
  To: Ran Shalit; +Cc: linux-media

On 11/17/15 14:15, Ran Shalit wrote:
> On Tue, Nov 17, 2015 at 9:53 AM, Hans Verkuil <hverkuil@xs4all.nl> wrote:
>> On 11/17/2015 08:39 AM, Ran Shalit wrote:
>>> Hello,
>>>
>>> I intend to use the cobalt driver as a reference for a new PCI V4L2 driver,
>>> which is required to use several inputs simultaneously. For this, cobalt
>>> seems like the best starting point.
>>> Read/write streaming will probably be sufficient (at least for the
>>> first debugging).
>>> The configuration in my case is i7 core <-- pci ---> fpga.
>>> I see that the DMA implementation is quite complex, and would like to
>>> ask for some tips regarding the following points related to DMA:
>>>
>>> 1. Is it possible to do read/write without DMA (for debugging, as a start)?
>>
>> No. All video capture/output devices use DMA since it would be prohibitively
>> expensive for the CPU to do otherwise. So just dig in and implement it.
>>
> 
> Hi,
> 
> Does the cobalt or any other PCI V4L device have a chip datasheet
> available, so that we can do some reverse engineering and gain more
> understanding of the register reads/writes for the DMA transactions?
> I searched, but it seems that the PCIe chip datasheet for these
> devices is not available anywhere.

Sorry, no, it's not publicly available.

But they all work along the same lines: each DMA descriptor has a
PCI DMA address (where the data should be written to in memory), the length
(bytes) of the DMA transfer and the pointer to the next DMA descriptor (chaining
descriptors together). Finally there is some bit to trigger an interrupt when
the full frame has been transferred.
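
As a rough illustration of that description (and explicitly not the cobalt
layout, which is not public), a descriptor chain can be modelled in plain
userspace C as below. The struct layout, the DESC_FLAG_IRQ bit and the helper
names build_chain/chain_bytes are all assumptions made up for this sketch;
real hardware defines its own field widths, ordering and flag bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DESC_FLAG_IRQ 0x1u /* hypothetical "interrupt when done" bit */

/* Hypothetical descriptor layout; a real device defines its own. */
struct dma_desc {
    uint64_t dst_addr; /* PCI DMA address to write the data to */
    uint32_t len;      /* bytes to transfer */
    uint32_t flags;    /* DESC_FLAG_IRQ on the last descriptor of a frame */
    uint64_t next;     /* bus address of the next descriptor, 0 = end */
};

/*
 * Fill 'ndesc' descriptors for one frame made of 'ndesc' memory chunks.
 * 'desc_bus' is the bus address at which descs[0] is visible to the device.
 */
static void build_chain(struct dma_desc *descs, uint64_t desc_bus,
                        const uint64_t *chunk_addr, const uint32_t *chunk_len,
                        size_t ndesc)
{
    assert(ndesc > 0);
    for (size_t i = 0; i < ndesc; i++) {
        int last = (i == ndesc - 1);

        descs[i].dst_addr = chunk_addr[i];
        descs[i].len = chunk_len[i];
        /* Only the final descriptor of the frame raises the interrupt. */
        descs[i].flags = last ? DESC_FLAG_IRQ : 0;
        /* Chain to the following descriptor, or terminate the list. */
        descs[i].next = last ? 0
                             : desc_bus + (i + 1) * sizeof(struct dma_desc);
    }
}

/* Total bytes the device would move when it walks the chain. */
static uint64_t chain_bytes(const struct dma_desc *descs, size_t ndesc)
{
    uint64_t total = 0;

    for (size_t i = 0; i < ndesc; i++)
        total += descs[i].len;
    return total;
}
```

In a driver the descriptor array itself would live in DMA-coherent memory,
and the bus address of the first descriptor would be written to a device
register to start the transfer; those steps are device specific and are left
out here.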

Regards,

	Hans


* Re: cobalt & dma
  2015-11-17 13:54     ` Hans Verkuil
@ 2015-11-17 21:43       ` Ran Shalit
  0 siblings, 0 replies; 10+ messages in thread
From: Ran Shalit @ 2015-11-17 21:43 UTC (permalink / raw)
  To: Hans Verkuil; +Cc: linux-media

On Tue, Nov 17, 2015 at 3:54 PM, Hans Verkuil <hverkuil@xs4all.nl> wrote:
> On 11/17/15 14:15, Ran Shalit wrote:
>> On Tue, Nov 17, 2015 at 9:53 AM, Hans Verkuil <hverkuil@xs4all.nl> wrote:
>>> On 11/17/2015 08:39 AM, Ran Shalit wrote:
>>>> Hello,
>>>>
>>>> I intend to use the cobalt driver as a reference for a new PCI V4L2 driver,
>>>> which is required to use several inputs simultaneously. For this, cobalt
>>>> seems like the best starting point.
>>>> Read/write streaming will probably be sufficient (at least for the
>>>> first debugging).
>>>> The configuration in my case is i7 core <-- pci ---> fpga.
>>>> I see that the DMA implementation is quite complex, and would like to
>>>> ask for some tips regarding the following points related to DMA:
>>>>
>>>> 1. Is it possible to do read/write without DMA (for debugging, as a start)?
>>>
>>> No. All video capture/output devices use DMA since it would be prohibitively
>>> expensive for the CPU to do otherwise. So just dig in and implement it.
>>>
>>
>> Hi,
>>
>> Does the cobalt or any other PCI V4L device have a chip datasheet
>> available, so that we can do some reverse engineering and gain more
>> understanding of the register reads/writes for the DMA transactions?
>> I searched, but it seems that the PCIe chip datasheet for these
>> devices is not available anywhere.
>
> Sorry, no, it's not publicly available.
>
> But they all work along the same lines: each DMA descriptor has a
> PCI DMA address (where the data should be written to in memory), the length
> (bytes) of the DMA transfer and the pointer to the next DMA descriptor (chaining
> descriptors together). Finally there is some bit to trigger an interrupt when
> the full frame has been transferred.
>


Thank you all very much for all this valuable information!
I must admit that when I look at the source examples,
they seem quite complex (at least much more complex than the driver work I
am familiar with, which most of the time means taking a functional
example and understanding what to change and how, or writing simple
drivers...).

If there are any other tips and ideas about debug/testing/development
steps when writing a PCI V4L device driver, please tell me.

Thank you all very much,
Ran


* Re: cobalt & dma
  2015-11-17  7:53 ` Hans Verkuil
  2015-11-17 13:15   ` Ran Shalit
@ 2015-11-20 14:49   ` Ran Shalit
  2015-11-20 14:55     ` Hans Verkuil
  1 sibling, 1 reply; 10+ messages in thread
From: Ran Shalit @ 2015-11-20 14:49 UTC (permalink / raw)
  To: Hans Verkuil; +Cc: linux-media

Hello,



>
> No. All video capture/output devices use DMA since it would be prohibitively
> expensive for the CPU to do otherwise. So just dig in and implement it.

I am trying to better understand how the read() operation actually uses
DMA, but I can't yet understand it from the code.

>
> No. The vmalloc variant is typically used for USB devices. For PCI(e) you'll
> use videobuf2-dma-contig if the DMA engine requires physically contiguous DMA,
> or videobuf2-dma-sg if the DMA engine supports scatter-gather DMA. You can
> start with dma-contig since the DMA code tends to be simpler, but it is
> harder to get the required physically contiguous memory if memory fragmentation
> takes place. So you may not be able to allocate the buffers. dma-sg works much
> better with virtual memory.
>
>


1. I tried to understand the code implementation of videobuf2 with
regard to read():
read() ->
    vb2_read() ->
          __vb2_perform_fileio()->
             vb2_internal_dqbuf() &  copy_to_user()

Where is the actual allocation of the DMA contiguous memory? Is it done with
the calloc() call in userspace (as shown in the V4L2 API
example)? As I understand it, calloc/malloc are not guaranteed to be
contiguous.
     How do I know whether the attempt to allocate contiguous memory has failed?


2. Does the call to copy_to_user() result in a performance degradation of
read() compared to the mmap() method?

Best Regards,
Ran


* Re: cobalt & dma
  2015-11-20 14:49   ` Ran Shalit
@ 2015-11-20 14:55     ` Hans Verkuil
  2015-11-20 16:14       ` Ran Shalit
  0 siblings, 1 reply; 10+ messages in thread
From: Hans Verkuil @ 2015-11-20 14:55 UTC (permalink / raw)
  To: Ran Shalit; +Cc: linux-media

On 11/20/2015 03:49 PM, Ran Shalit wrote:
> Hello,
> 
> 
> 
>>
>> No. All video capture/output devices use DMA since it would be prohibitively
>> expensive for the CPU to do otherwise. So just dig in and implement it.
> 
> I am trying to better understand how the read() operation actually uses
> DMA, but I can't yet understand it from the code.
> 
>>
>> No. The vmalloc variant is typically used for USB devices. For PCI(e) you'll
>> use videobuf2-dma-contig if the DMA engine requires physically contiguous DMA,
>> or videobuf2-dma-sg if the DMA engine supports scatter-gather DMA. You can
>> start with dma-contig since the DMA code tends to be simpler, but it is
>> harder to get the required physically contiguous memory if memory fragmentation
>> takes place. So you may not be able to allocate the buffers. dma-sg works much
>> better with virtual memory.
>>
>>
> 
> 
> 1. I tried to understand the code implementation of videobuf2 with
> regard to read():
> read() ->
>     vb2_read() ->
>           __vb2_perform_fileio()->
>              vb2_internal_dqbuf() &  copy_to_user()
> 
> Where is the actual allocation of the DMA contiguous memory? Is it done with
> the calloc() call in userspace (as shown in the V4L2 API
> example)? As I understand it, calloc/malloc are not guaranteed to be
> contiguous.
>      How do I know whether the attempt to allocate contiguous memory has failed?

The actual allocation happens in videobuf2-vmalloc/dma-contig/dma-sg depending
on the flavor of buffers you want (virtual memory, DMA into physically contiguous
memory or DMA into scatter-gather memory). The alloc operation is the one that
allocates the memory.

> 
> 
> 2. Does the call to copy_to_user() result in a performance degradation of
> read() compared to the mmap() method?

Correct. But if you use the vb2 framework then you get stream I/O and the
read/write operations for free. vb2_read() sits on top of the stream I/O
implementation. It basically requests buffers and loops while queuing and
dequeuing buffers and calling copy_to_user() to copy the data into the
read() buffer.

This is (very) inefficient and applications should use the V4L2 stream I/O
mechanism directly.
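
The loop described above can be pictured with a toy userspace model. This is
only a sketch of the idea and not the vb2 implementation: toy_queue, toy_fill,
toy_init and toy_read are invented names, the "hardware" is a byte counter,
and memcpy() stands in for copy_to_user(). It is meant to show where the
extra copy comes from compared with handing buffers to the application
directly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NBUF 2
#define BUF_SIZE 8

/* Toy stand-in for a buffer queue; not the real vb2 data structures. */
struct toy_queue {
    unsigned char buf[NBUF][BUF_SIZE]; /* "kernel" frame buffers */
    size_t filled[NBUF];               /* bytes of data in each buffer */
    size_t pos;                        /* read offset in current buffer */
    unsigned cur;                      /* buffer currently being drained */
    unsigned char next_byte;           /* fake capture data generator */
};

/* Pretend the hardware DMA'd a new frame into buffer 'i'. */
static void toy_fill(struct toy_queue *q, unsigned i)
{
    for (size_t k = 0; k < BUF_SIZE; k++)
        q->buf[i][k] = q->next_byte++;
    q->filled[i] = BUF_SIZE;
}

static void toy_init(struct toy_queue *q)
{
    memset(q, 0, sizeof(*q));
    for (unsigned i = 0; i < NBUF; i++)
        toy_fill(q, i);
}

/*
 * read()-style call: copy up to 'count' bytes out of the current buffer.
 * Once a buffer is drained, hand it back for refilling and move on to
 * the next one. The memcpy() is the extra copy that read() costs.
 */
static size_t toy_read(struct toy_queue *q, unsigned char *user, size_t count)
{
    size_t avail = q->filled[q->cur] - q->pos;
    size_t n = count < avail ? count : avail;

    memcpy(user, &q->buf[q->cur][q->pos], n);
    q->pos += n;
    if (q->pos == q->filled[q->cur]) {
        toy_fill(q, q->cur);          /* "requeue" the drained buffer */
        q->cur = (q->cur + 1) % NBUF; /* "dequeue" the next one */
        q->pos = 0;
    }
    return n;
}
```

With stream I/O the application would instead receive the filled buffer
itself (for example via mmap()), so the memcpy() step disappears.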

Regards,

	Hans


* Re: cobalt & dma
  2015-11-20 14:55     ` Hans Verkuil
@ 2015-11-20 16:14       ` Ran Shalit
  2015-11-20 16:25         ` Hans Verkuil
  0 siblings, 1 reply; 10+ messages in thread
From: Ran Shalit @ 2015-11-20 16:14 UTC (permalink / raw)
  To: Hans Verkuil; +Cc: linux-media

>>
>> 1. I tried to understand the code implementation of videobuf2 with
>> regard to read():
>> read() ->
>>     vb2_read() ->
>>           __vb2_perform_fileio()->
>>              vb2_internal_dqbuf() &  copy_to_user()
>>
>> Where is the actual allocation of the DMA contiguous memory? Is it done with
>> the calloc() call in userspace (as shown in the V4L2 API
>> example)? As I understand it, calloc/malloc are not guaranteed to be
>> contiguous.
>>      How do I know whether the attempt to allocate contiguous memory has failed?
>
> The actual allocation happens in videobuf2-vmalloc/dma-contig/dma-sg depending
> on the flavor of buffers you want (virtual memory, DMA into physically contiguous
> memory or DMA into scatter-gather memory). The alloc operation is the one that
> allocates the memory.


Thank you very much for your time.

Just to be sure I understand the general mechanism of DMA with regard
to the read() operation when using contiguous memory,
I'll try to draw the general sequence as I understand it from the code
and from reading about this:

read() into user memory buffer ->
          vb2_read() ->
                __vb2_perform_fileio() ->
                        vb2_internal_dqbuf() dequeues a buffer located in
                        contiguous DMA memory (kernel) ->
                               copy_to_user() copies from the contiguous
                               DMA memory (kernel) into the user buffer
                               (userspace)

1. Is the above sequence correct?
2. When talking about contiguous DMA memory (or scatter-gather) we
always refer to memory allocated in the kernel, right?

Best Regards,

Ran


* Re: cobalt & dma
  2015-11-20 16:14       ` Ran Shalit
@ 2015-11-20 16:25         ` Hans Verkuil
  0 siblings, 0 replies; 10+ messages in thread
From: Hans Verkuil @ 2015-11-20 16:25 UTC (permalink / raw)
  To: Ran Shalit; +Cc: linux-media

On 11/20/2015 05:14 PM, Ran Shalit wrote:
>>>
>>> 1. I tried to understand the code implementation of videobuf2 with
>>> regard to read():
>>> read() ->
>>>     vb2_read() ->
>>>           __vb2_perform_fileio()->
>>>              vb2_internal_dqbuf() &  copy_to_user()
>>>
>>> Where is the actual allocation of the DMA contiguous memory? Is it done with
>>> the calloc() call in userspace (as shown in the V4L2 API
>>> example)? As I understand it, calloc/malloc are not guaranteed to be
>>> contiguous.
>>>      How do I know whether the attempt to allocate contiguous memory has failed?
>>
>> The actual allocation happens in videobuf2-vmalloc/dma-contig/dma-sg depending
>> on the flavor of buffers you want (virtual memory, DMA into physically contiguous
>> memory or DMA into scatter-gather memory). The alloc operation is the one that
>> allocates the memory.
> 
> 
> Thank you very much for your time.
> 
> Just to be sure I understand the general mechanism of DMA with regard
> to the read() operation when using contiguous memory,
> I'll try to draw the general sequence as I understand it from the code
> and from reading about this:
> 
> read() into user memory buffer ->
>           vb2_read() ->
>                 __vb2_perform_fileio() ->
>                         vb2_internal_dqbuf() dequeues a buffer located in
>                         contiguous DMA memory (kernel) ->
>                                copy_to_user() copies from the contiguous
>                                DMA memory (kernel) into the user buffer
>                                (userspace)
> 
> 1. Is the above sequence correct?

Yes.

> 2. When talking about contiguous DMA memory (or scatter-gather) we
> always refer to memory allocated in the kernel, right?

Usually. With the V4L2_MEMORY_USERPTR stream I/O mode it is userspace
that allocates the memory, but when using physically contiguous DMA
this particular streaming mode is normally not supported.

With V4L2_MEMORY_MMAP it is always the kernel that allocates the memory.

Regards,

	Hans

