From: Christian König <deathsimple@vodafone.de>
Subject: Re: [PATCH 9/9] drm/amdgpu: WIP add IOCTL interface for per VM BOs
Date: Tue, 29 Aug 2017 15:59:24 +0200
Message-ID: <8e7b93cf-033b-ac3a-4c81-446db00186f5@vodafone.de>
To: "Zhou, David (ChunMing)"
Cc: "Olsak, Marek", amd-gfx, Marek Olšák

OK, found something that works: Xonotic at the lowest resolution and lowest effects quality (i.e. totally CPU bound):

Without per process BOs:

Xonotic 0.8:
    pts/xonotic-1.4.0 [Resolution: 800 x 600 - Effects Quality: Low]
    Test 1 of 1
    Estimated Trial Run Count:    3
    Estimated Time To Completion: 3 Minutes
        Started Run 1 @ 21:13:50
        Started Run 2 @ 21:14:57
        Started Run 3 @ 21:16:03  [Std. Dev: 0.94%]

    Test Results:
        187.436577
        189.514724
        190.9605812

    Average: 189.30 Frames Per Second
    Minimum: 131
    Maximum: 355

With per process BOs:

Xonotic 0.8:
    pts/xonotic-1.4.0 [Resolution: 800 x 600 - Effects Quality: Low]
    Test 1 of 1
    Estimated Trial Run Count:    3
    Estimated Time To Completion: 3 Minutes
        Started Run 1 @ 21:20:05
        Started Run 2 @ 21:21:07
        Started Run 3 @ 21:22:10  [Std. Dev: 1.49%]

    Test Results:
        203.0471676
        199.6622532
        197.0954183

    Average: 199.93 Frames Per Second
    Minimum: 132
    Maximum: 349

Well, that looks like some improvement.
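
Just to put a number on it, here is a minimal standalone check of the gain, using nothing but the three runs listed above (values copied verbatim):

#include <stdio.h>

/* Recompute the averages and the relative gain from the three
 * Xonotic runs listed above. */
int main(void)
{
    const double without_vm[] = { 187.436577, 189.514724, 190.9605812 };
    const double with_vm[]    = { 203.0471676, 199.6622532, 197.0954183 };
    double avg_without = 0.0, avg_with = 0.0;

    for (int i = 0; i < 3; ++i) {
        avg_without += without_vm[i] / 3.0;
        avg_with    += with_vm[i] / 3.0;
    }

    printf("without per-process BOs: %.2f fps\n", avg_without);
    printf("with    per-process BOs: %.2f fps\n", avg_with);
    printf("relative gain: %.1f%%\n",
           (avg_with / avg_without - 1.0) * 100.0);
    return 0;
}

That works out to roughly a 5.6% gain, noticeably more than the 0.94%/1.49% run-to-run deviation reported above.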

Regards,
Christian.

On 28.08.2017 at 14:59, Zhou, David (ChunMing) wrote:
I will push our Vulkan guys to test it; their BO list is very long.

Sent from my Nut Pro

Christian König <deathsimple-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org> wrote on 28.08.2017 at 19:55:

On 28.08.2017 at 06:21, zhoucm1 wrote:
>
>
> On 2017-08-27 at 18:03, Christian König wrote:
>> On 25.08.2017 at 21:19, Christian König wrote:
>>> On 25.08.2017 at 18:22, Marek Olšák wrote:
>>>> On Fri, Aug 25, 2017 at 3:00 PM, Christian König
>>>> <deathsimple-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org> wrote:
>>>>> On 25.08.2017 at 12:32, zhoucm1 wrote:
>>>>>>
>>>>>>
>>>>>> On 2017-08-25 at 17:38, Christian König wrote:
>>>>>>> From: Christian König <christian.koenig-5C7GfCeVMHo@public.gmane.org>
>>>>>>>
>>>>>>> Add the IOCTL interface so that applications can allocate per VM
>>>>>>> BOs.
>>>>>>>
>>>>>>> Still WIP since not all corner cases are tested yet, but this
>>>>>>> reduces average CS overhead for 10K BOs from 21ms down to 48us.
>>>>>> Wow, cheers, eventually you got per-VM BOs onto the same reservation
>>>>>> as the PD/PTs, which indeed saves a lot of BO list handling.
>>>>>
>>>>> Don't cheer too loud yet, that is a completely constructed test case.
>>>>>
>>>>> So far I wasn't able to achieve any improvement with any real
>>>>> game on this with Mesa.
> Thinking about it more: with too many BOs sharing one reservation, the
> reservation lock could often be busy. If evictions or destroys also
> happen frequently in the meantime, that could affect VM updates
> and CS submission as well.

That's exactly the reason why I've added code to the BO destroy path to
avoid at least some of the problems. But yeah, that's only the tip of
the iceberg of problems with that approach.
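
To make that contention concern a bit more concrete, here is a tiny userspace model (plain pthreads, nothing amdgpu-specific, all names and sizes made up): several threads update N "BOs", once with a single shared lock standing in for the shared reservation object and once with per-BO locks. Timing the two run() calls gives a feel for what funnelling everything through one hot lock can cost; the real reservation object is of course a ww_mutex with fences attached, so this only models that one aspect.

#include <pthread.h>
#include <stdio.h>

#define NUM_BOS     1024
#define NUM_THREADS 8
#define ITERATIONS  200000

static pthread_mutex_t shared_resv = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t per_bo_resv[NUM_BOS];
static long counters[NUM_BOS];
static int use_shared;                       /* 1: one lock for all BOs */

static void *worker(void *arg)
{
    long id = (long)arg;

    for (long i = 0; i < ITERATIONS; ++i) {
        long bo = (id + i) % NUM_BOS;        /* pick some BO to "update" */
        pthread_mutex_t *lock = use_shared ? &shared_resv
                                           : &per_bo_resv[bo];

        pthread_mutex_lock(lock);            /* "take the reservation"   */
        counters[bo]++;                      /* "VM update / validation" */
        pthread_mutex_unlock(lock);
    }
    return NULL;
}

static void run(int shared)
{
    pthread_t t[NUM_THREADS];

    use_shared = shared;
    for (long i = 0; i < NUM_THREADS; ++i)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (int i = 0; i < NUM_THREADS; ++i)
        pthread_join(t[i], NULL);
}

int main(void)
{
    for (int i = 0; i < NUM_BOS; ++i)
        pthread_mutex_init(&per_bo_resv[i], NULL);

    run(1);  /* time this run: everything goes through one lock       */
    run(0);  /* time this run: contention spreads over per-BO locks   */

    puts("done; wrap the two run() calls in a timer to compare");
    return 0;
}
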

> Anyway, this is a very good start at reducing CS overhead,
> especially since we've seen that it "reduces average CS overhead for
> 10K BOs from 21ms down to 48us".

Actually, it's not that good. See, this is a completely constructed test
case, run on a kernel with lockdep and KASAN enabled.

In reality we usually don't have that many BOs, and so far I wasn't able to
find much of an improvement in any real-world testing.

Regards,
Christian.


_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
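
Purely for illustration, allocating one of these per-VM BOs from userspace could end up looking roughly like the sketch below. The AMDGPU_GEM_CREATE_VM_ALWAYS_VALID flag name and value are my assumption about where the WIP IOCTL interface quoted above is heading; the union/struct and ioctl number are the existing drm_amdgpu_gem_create UAPI, and the render node path is just an example.

#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <xf86drm.h>        /* drmIoctl(), from libdrm                  */
#include <amdgpu_drm.h>     /* amdgpu UAPI, via libdrm's include path   */

#ifndef AMDGPU_GEM_CREATE_VM_ALWAYS_VALID
/* Assumed flag; the WIP patch may name or number this differently. */
#define AMDGPU_GEM_CREATE_VM_ALWAYS_VALID (1 << 6)
#endif

int main(void)
{
    int fd = open("/dev/dri/renderD128", O_RDWR);   /* example node */
    if (fd < 0) {
        perror("open");
        return 1;
    }

    union drm_amdgpu_gem_create args;
    memset(&args, 0, sizeof(args));
    args.in.bo_size      = 1 << 20;                 /* 1 MiB        */
    args.in.alignment    = 1 << 12;                 /* 4 KiB        */
    args.in.domains      = AMDGPU_GEM_DOMAIN_VRAM;
    /* The only new bit: ask for a BO that stays valid in this
     * process' VM, so it never has to appear in a per-CS BO list. */
    args.in.domain_flags = AMDGPU_GEM_CREATE_VM_ALWAYS_VALID;

    if (drmIoctl(fd, DRM_IOCTL_AMDGPU_GEM_CREATE, &args)) {
        perror("DRM_IOCTL_AMDGPU_GEM_CREATE");
        close(fd);
        return 1;
    }

    printf("per-VM BO created, GEM handle %u\n", args.out.handle);
    close(fd);
    return 0;
}

Build against libdrm, e.g. gcc sketch.c $(pkg-config --cflags --libs libdrm). The interesting part is only the extra flag; everything else is a normal GEM allocation.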

