From: Inki Dae
Subject: [RFC PATCH] dmabuf-sync: Introduce buffer synchronization framework
Date: Tue, 25 Jun 2013 18:09:32 +0900
To: Jerome Glisse
Cc: linux-fbdev, Russell King - ARM Linux, DRI mailing list, Kyungmin Park,
 "myungjoo.ham", YoungJun Cho, linux-media@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org
List-Id: dri-devel@lists.freedesktop.org

2013/6/22 Jerome Glisse <j.glisse@gmail.com>:
> On Fri, Jun 21, 2013 at 12:55 PM, Inki Dae <daeinki@gmail.com> wrote:
>> 2013/6/21 Lucas Stach <l.stach@pengutronix.de>:
>>> Hi Inki,
>>>
>>> please refrain from sending HTML mails, it makes proper quoting without
>>> messing up the layout everywhere pretty hard.
>>>
>>
>> Sorry about that. I should have used text mode.
>>
>>> On Friday, 21.06.2013 at 20:01 +0900, Inki Dae wrote:
>>> [...]
>>>
>>>>         Yeah, you'll need some knowledge and understanding about the
>>>>         API you are working with to get things right. But I think
>>>>         it's not an unreasonable thing to expect the programmer
>>>>         working directly with kernel interfaces to read up on how
>>>>         things work.
>>>>
>>>>         Second thing: I'd rather have *one* consistent API for every
>>>>         subsystem, even if they differ from each other, than have to
>>>>         implement this syncpoint thing in every subsystem. Remember:
>>>>         a single execbuf in DRM might reference both GEM objects
>>>>         backed by dma-buf as well as native SHM or CMA backed
>>>>         objects. The dma-buf-mgr proposal already allows you to
>>>>         handle dma-bufs much the same way during validation as
>>>>         native GEM objects.
>>>>
>>>> Actually, at first I implemented a fence helper framework based on
>>>> reservation and dma fence to provide an easy-to-use interface for
>>>> device drivers. However, that was the wrong implementation: I had not
>>>> only customized the dma fence but also not considered the deadlock
>>>> issue.
>>>> After that, I reimplemented it as dmabuf sync to solve the deadlock
>>>> issue, and at that time I realized that we first need to concentrate
>>>> on the most basic things: the fact that CPU and CPU, CPU and DMA, or
>>>> DMA and DMA can access the same buffer; that simple is best; and that
>>>> we need not only kernel-side but also user-side interfaces. After
>>>> that, I collected what is common to all subsystems, and I devised
>>>> this dmabuf sync framework for it. I'm not really a specialist in the
>>>> desktop world, so a question: isn't execbuf used only for the GPU?
>>>> The GPU has dedicated video memory (VRAM), so it needs a migration
>>>> mechanism between system memory and the dedicated video memory, and
>>>> it also has to consider ordering issues while buffers are migrated.
>>>>
>>>
>>> Yeah, execbuf is pretty GPU specific, but I don't see how this matters
>>> for this discussion. Also I don't see a big difference between embedded
>>> and desktop GPUs. Buffer migration is more of a detail here. Both take
>>> command streams that potentially reference other buffers, which might
>>> be native GEM or dma-buf backed objects. Both have to make sure the
>>> buffers are in the right domain (caches cleaned and address mappings
>>> set up) and are available for the desired operation, meaning you have
>>> to sync with other DMA engines and maybe also with the CPU.
>>
>> Yeah, right. Then, in the case of a desktop GPU, doesn't it need to do
>> something additional when buffers are migrated from the system memory
>> domain to the video memory domain, or from video memory back to system
>> memory? I guess the members below do something similar, and all other
>> DMA devices would not need them:
>>         struct fence {
>>                 ...
>>                 unsigned int context, seqno;
>>                 ...
>>         };
>>
>> And,
>>         struct seqno_fence {
>>                 ...
>>                 uint32_t seqno_ofs;
>>                 ...
>>         };
>>
>>>
>>> The only case where sync isn't clearly defined right now by the current
>>> API entrypoints is when you access memory through the dma-buf fallback
>>> mmap support, which might happen with some software processing element
>>> in a video pipeline or something. I agree that we will need a userspace
>>> interface here, but I think this shouldn't be yet another sync object,
>>> but rather more a prepare/fini_cpu_access ioctl on the dma-buf which
>>> hooks into the existing dma-fence and reservation stuff.
>>
>> I don't think we need additional ioctl commands for that. I am thinking
>> of reusing existing resources as much as possible. My idea is also
>> similar to yours in using the reservation stuff, because my approach
>> should also use the dma-buf resource. However, my idea is that a user
>> process that wants buffer synchronization with another one sees a sync
>> object as a file descriptor, just like dma-buf does. The below is a
>> simple outline of the idea:
>>
>> ioctl(dmabuf_fd, DMA_BUF_IOC_OPEN_SYNC, &sync);
>>
>> flock(sync->fd, LOCK_SH); <- LOCK_SH means a shared lock.
>> CPU access for read
>> flock(sync->fd, LOCK_UN);
>>
>> Or
>>
>> flock(sync->fd, LOCK_EX); <- LOCK_EX means an exclusive lock
>> CPU access for write
>> flock(sync->fd, LOCK_UN);
>>
>> close(sync->fd);
>>
>> As you know, that's similar to the dma-buf export feature.
>>
>> In addition, a simpler idea:
>> flock(dmabuf_fd, LOCK_SH/EX);
>> CPU access for read/write
>> flock(dmabuf_fd, LOCK_UN);
>>
>> However, I'm not sure yet that the above examples would work well and
>> be free of problems: actually, I don't fully understand the flock
>> mechanism, so I am still looking into it.
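
Just to make the idea above a bit more concrete, here is a rough
userspace sketch of the sync-fd variant. Please note that
DMA_BUF_IOC_OPEN_SYNC, the argument struct and the whole flow are only
assumptions for illustration; none of this user interface exists yet:

    #include <stdint.h>
    #include <sys/file.h>
    #include <sys/ioctl.h>
    #include <linux/ioctl.h>
    #include <unistd.h>

    /* Hypothetical ioctl and argument, for illustration only. */
    struct dma_buf_sync_arg {
            int32_t fd;     /* sync object exposed as a file descriptor */
    };

    #define DMA_BUF_IOC_OPEN_SYNC _IOWR('b', 0x20, struct dma_buf_sync_arg)

    static int cpu_write_locked(int dmabuf_fd)
    {
            struct dma_buf_sync_arg sync;

            /* Ask the exporter for a sync object tied to this dma-buf. */
            if (ioctl(dmabuf_fd, DMA_BUF_IOC_OPEN_SYNC, &sync) < 0)
                    return -1;

            flock(sync.fd, LOCK_EX);        /* exclusive lock for CPU write */

            /* ... CPU writes to the mmapped buffer here ... */

            flock(sync.fd, LOCK_UN);
            close(sync.fd);

            return 0;
    }
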
>>
>>>
>>>>
>>>>         And to get back to my original point: if you have more than
>>>>         one task operating together on a buffer you absolutely need
>>>>         some kind of real IPC to sync them up and do something
>>>>         useful. Both your syncpoints and the proposed dma-fences only
>>>>         protect the buffer accesses to make sure different tasks
>>>>         don't stomp on each other. There is nothing in there to make
>>>>         sure that the output of your pipeline is valid. You have to
>>>>         take care of that yourself in userspace. I'll reuse your
>>>>         example to make clear what I mean:
>>>>
>>>>         Task A                        Task B
>>>>         ------                        ------
>>>>         dma_buf_sync_lock(buf1)
>>>>         CPU write buf1
>>>>         dma_buf_sync_unlock(buf1)
>>>>           ---------schedule Task A again-------
>>>>         dma_buf_sync_lock(buf1)
>>>>         CPU write buf1
>>>>         dma_buf_sync_unlock(buf1)
>>>>            ---------schedule Task B---------
>>>>                                       qbuf(buf1)
>>>>
>>>>         dma_buf_sync_lock(buf1)
>>>>                                       ....
>>>>
>>>>         This is what can happen if you don't take care of proper
>>>>         syncing. Task A writes something to the buffer in the
>>>>         expectation that Task B will take care of it, but before
>>>>         Task B even gets scheduled Task A overwrites the buffer
>>>>         again. Not what you wanted, is it?
>>>>
>>>> That is exactly the wrong example. I had already mentioned that: "In
>>>> case the data flow goes from A to B, it needs some kind of IPC between
>>>> the two tasks every time." So again, your example would have no
>>>> problem in the case that *two tasks share the same buffer, both tasks
>>>> access the buffer (buf1) for write, and the data of the buffer (buf1)
>>>> does not need to be shared*. They just need to use the buffer as
>>>> *storage*. So all they want in this case is to avoid stomping on the
>>>> buffer.
>>>>
>>> Sorry, but I don't see the point. If no one is interested in the data
>>> of the buffer, why are you sharing it in the first place?
>>>
>>
>> It is just used as storage, i.e., Task A fills the buffer with "AAAAAA"
>> using the CPU, and Task B fills the buffer with "BBBBBB" using DMA. They
>> don't share the data of the buffer, but they share the *memory region*
>> of the buffer. That would be very useful for embedded systems with very
>> little system memory.
>
> Just so I understand: you want to share the backing memory, but you
> don't want to share the content, i.e. you want to do memory management
> in userspace. This sounds wrong on so many levels (not even considering
> the security implications).
>
> If Task A needs memory and can then release it for Task B's usage,

Not true. Task A can never release the memory, because all Task A can do
is unreference the dma-buf object of the sync object. And please note
that the user interfaces haven't been implemented yet; we just have a
plan for them, as I already mentioned.

> that should be the role of kernel memory management, which of course
> needs synchronization between A and B. But in no case should this be
> done using dma-buf. dma-buf is for sharing content between different
> devices, not for sharing resources.
>

Hmm, is that true? Are you sure? Then what do you think about
reservation? As far as I know, reservation is also tied to dma-buf for
the same reason: actually, we use reservation in order to use dma-buf.
As you may know, a reservation object is allocated and initialized when
a buffer object is exported to a dma-buf.

Thanks,
Inki Dae

>
> Also don't overcomplicate the VRAM case; just consider desktop GPUs as
> using system memory directly. They can do it and they do it. Migration
> to VRAM is orthogonal to all this; it's an optimization, so to speak.
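
Coming back to the storage-only case above, below is roughly the
userspace pattern I mean, using the simpler flock-on-the-dmabuf-fd
variant. Again, this is only an illustrative sketch; whether flock() can
really be made to work on a dma-buf fd is exactly what I still have to
check:

    #include <string.h>
    #include <sys/file.h>
    #include <sys/mman.h>

    /*
     * Two independent users treat the same dma-buf purely as scratch
     * storage.  The exclusive lock only keeps them from stomping on each
     * other; no data is handed over between them, so no extra IPC is
     * needed for either task's result to be correct.
     */
    static void task_a_cpu_fill(int dmabuf_fd, size_t size)
    {
            void *vaddr = mmap(NULL, size, PROT_READ | PROT_WRITE,
                               MAP_SHARED, dmabuf_fd, 0);

            if (vaddr == MAP_FAILED)
                    return;

            flock(dmabuf_fd, LOCK_EX);      /* exclusive access for the CPU */
            memset(vaddr, 'A', size);       /* Task A's private use of the memory */
            flock(dmabuf_fd, LOCK_UN);

            munmap(vaddr, size);
    }

    /*
     * Task B would take the same exclusive lock around the DMA transfer
     * that fills the buffer with 'B's.
     */
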
>
> Cheers,
> Jerome