Am 10.01.19 um 00:39 schrieb Marek Olšák:

On Wed, Jan 9, 2019 at 1:41 PM Christian König <ckoenig.leichtzumerken@gmail.com> wrote:

Am 09.01.19 um 17:14 schrieb Marek Olšák:

On Wed, Jan 9, 2019 at 8:09 AM Christian König <ckoenig.leichtzumerken@gmail.com> wrote:

Am 09.01.19 um 13:36 schrieb Marek Olšák:

On Wed, Jan 9, 2019, 5:28 AM Christian König <ckoenig.leichtzumerken@gmail.com wrote:

Looks good, but I'm wondering what's the actual improvement?

No malloc calls and 1 less for loop copying the bo list.

Yeah, but didn't we want to get completely rid of the bo list?

If we have multiple IBs (e.g. gfx + compute) that share a BO list, I think it's faster to send the BO list to the kernel only once.

That's not really faster.

The only thing we safe us is a single loop over all BOs to lockup the handle into a pointer and that is only a tiny fraction of the overhead.

The majority of the overhead is locking the BOs and reserving space for the submission.

What could really help here is to submit gfx+comput together in just one CS IOCTL. This way we would need the locking and space reservation only once.

It's a bit of work in the kernel side, but certainly doable.

OK. Any objections to this patch?

In general I'm wondering if we couldn't avoid adding so much new interface.

For example we can avoid the malloc() when we just cache the last freed bo_list structure in the device. We would just need an atomic pointer exchange operation for that.

This way we even don't need to change mesa at all.

Regarding optimization, this chunk can be replaced by a cast on 64bit:

+	chunk_array = alloca(sizeof(uint64_t) * num_chunks);
+	for (i = 0; i < num_chunks; i++)
+		chunk_array[i] = (uint64_t)(uintptr_t)&chunks[i];

Regards,
Christian.

Thanks,

Marek