On Thu, Jan 10, 2019, 4:15 AM Koenig, Christian Am 10.01.19 um 00:39 schrieb Marek Olšák: > > On Wed, Jan 9, 2019 at 1:41 PM Christian König < > ckoenig.leichtzumerken-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote: > >> Am 09.01.19 um 17:14 schrieb Marek Olšák: >> >> On Wed, Jan 9, 2019 at 8:09 AM Christian König < >> ckoenig.leichtzumerken-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote: >> >>> Am 09.01.19 um 13:36 schrieb Marek Olšák: >>> >>> >>> >>> On Wed, Jan 9, 2019, 5:28 AM Christian König < >>> ckoenig.leichtzumerken-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org wrote: >>> >>>> Looks good, but I'm wondering what's the actual improvement? >>>> >>> >>> No malloc calls and 1 less for loop copying the bo list. >>> >>> >>> Yeah, but didn't we want to get completely rid of the bo list? >>> >> >> If we have multiple IBs (e.g. gfx + compute) that share a BO list, I >> think it's faster to send the BO list to the kernel only once. >> >> >> That's not really faster. >> >> The only thing we safe us is a single loop over all BOs to lockup the >> handle into a pointer and that is only a tiny fraction of the overhead. >> >> The majority of the overhead is locking the BOs and reserving space for >> the submission. >> >> What could really help here is to submit gfx+comput together in just one >> CS IOCTL. This way we would need the locking and space reservation only >> once. >> >> It's a bit of work in the kernel side, but certainly doable. >> > > OK. Any objections to this patch? > > > In general I'm wondering if we couldn't avoid adding so much new interface. > There are Vulkan drivers that still use the bo_list interface. > For example we can avoid the malloc() when we just cache the last freed > bo_list structure in the device. We would just need an atomic pointer > exchange operation for that. > > This way we even don't need to change mesa at all. > There is still the for loop that we need to get rid of. > Regarding optimization, this chunk can be replaced by a cast on 64bit: > > + chunk_array = alloca(sizeof(uint64_t) * num_chunks); > + for (i = 0; i < num_chunks; i++) > + chunk_array[i] = (uint64_t)(uintptr_t)&chunks[i]; > > It can't. The input is an array of structures. The ioctl takes an array of pointers. Marek > Regards, > Christian. > > > Thanks, > Marek > > >