From: Felix Kuehling <felix.kuehling-5C7GfCeVMHo@public.gmane.org>
To: Dave Airlie <airlied-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: "Oded Gabbay"
	<oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	"amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org"
	<amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org>,
	"Maling list - DRI developers"
	<dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org>,
	"Christian König" <christian.koenig-5C7GfCeVMHo@public.gmane.org>
Subject: Re: New KFD ioctls: taking the skeletons out of the closet
Date: Mon, 12 Mar 2018 14:17:59 -0400
Message-ID: <c9209a60-0d33-49ce-8944-1f9874aaef17@amd.com>
In-Reply-To: <4ad64912-1fb7-6bcf-8d00-97d9e4ac04bd-5C7GfCeVMHo@public.gmane.org>

On 2018-03-07 03:34 PM, Felix Kuehling wrote:
>> Again stop worrying about ioctl overhead, this isn't Windows. If you
>> can show the overhead as being a problem then address it, but I
>> think it's premature worrying about it at this stage.
> I'd like syscall overhead to be small. But with recent kernel page table
> isolation, NUMA systems and lots of GPUs, I think this may not be
> negligible. For example we're working with some Intel NUMA systems and 8
> GPUs for HPC or deep learning applications. I'll be measuring the
> overhead on such systems and get back with results in a few days. I want
> to have an API that can scale to such applications.

I ran some tests on a 2-socket Xeon E5-2680 v4 with 56 CPU threads and
8 Vega10 GPUs. The kernel was based on 4.16-rc1 with KPTI enabled and a
kernel config derived from a standard Ubuntu kernel; no debug options
were enabled. My test application measures KFD memory management API
performance by allocating, mapping, unmapping and freeing 1000 buffers
of different sizes (4K, 16K, 64K, 256K) and memory types (VRAM and
system memory). The impact of the ioctl overhead depended on whether
the page table updates were done by the CPU or by SDMA.
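
For reference, the structure of the measurement loop is roughly the
sketch below. This is not the actual test code: kfd_alloc(),
kfd_map_to_gpus(), kfd_unmap() and kfd_free() are hypothetical
stand-ins for the real thunk/ioctl wrappers, and only the timing
structure is meant to be representative.

/* Rough sketch of the measurement loop; NOT the actual test code.
 * The kfd_* functions below are hypothetical stubs standing in for
 * the KFD thunk/ioctl wrappers the real test calls.
 */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NUM_BUFFERS 1000
#define NUM_GPUS    8

/* Hypothetical stubs; the real test calls into the KFD thunk here. */
static void *kfd_alloc(size_t size, int vram) { (void)vram; return malloc(size); }
static void kfd_map_to_gpus(void *p, size_t size, int ngpus) { (void)p; (void)size; (void)ngpus; }
static void kfd_unmap(void *p, int ngpus) { (void)p; (void)ngpus; }
static void kfd_free(void *p, size_t size) { (void)size; free(p); }

static uint64_t now_ns(void)
{
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
}

static void bench_one(size_t size, int vram)
{
        static void *buf[NUM_BUFFERS];
        uint64_t t0, t1, t2, t3, t4;
        int i;

        t0 = now_ns();
        for (i = 0; i < NUM_BUFFERS; i++)
                buf[i] = kfd_alloc(size, vram);
        t1 = now_ns();
        for (i = 0; i < NUM_BUFFERS; i++)
                kfd_map_to_gpus(buf[i], size, NUM_GPUS); /* 1 ioctl vs. 8 */
        t2 = now_ns();
        for (i = 0; i < NUM_BUFFERS; i++)
                kfd_unmap(buf[i], NUM_GPUS);
        t3 = now_ns();
        for (i = 0; i < NUM_BUFFERS; i++)
                kfd_free(buf[i], size);
        t4 = now_ns();

        printf("%6zuK %-6s alloc %6lu us  map %6lu us  unmap %6lu us  free %6lu us\n",
               size >> 10, vram ? "VRAM" : "sysmem",
               (unsigned long)((t1 - t0) / 1000),
               (unsigned long)((t2 - t1) / 1000),
               (unsigned long)((t3 - t2) / 1000),
               (unsigned long)((t4 - t3) / 1000));
}

int main(void)
{
        static const size_t sizes[] = { 4 << 10, 16 << 10, 64 << 10, 256 << 10 };
        int s, vram;

        for (vram = 0; vram <= 1; vram++)
                for (s = 0; s < 4; s++)
                        bench_one(sizes[s], vram);
        return 0;
}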

I averaged 10 runs of the application and also calculated the standard
deviation to check whether the differences I saw were just random noise.
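
The statistic is just the sample mean and standard deviation over the
per-run timings, along the lines of:

#include <math.h>

/* Sample mean and (n-1) standard deviation over the per-run timings
 * (n > 1), used to judge whether a few percent difference is above
 * the noise floor.
 */
static void mean_stddev(const double *x, int n, double *mean, double *stddev)
{
        double sum = 0.0, var = 0.0;
        int i;

        for (i = 0; i < n; i++)
                sum += x[i];
        *mean = sum / n;
        for (i = 0; i < n; i++)
                var += (x[i] - *mean) * (x[i] - *mean);
        *stddev = sqrt(var / (n - 1));
}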

With SDMA, using a single ioctl was about 5% faster for mapping and
10% faster for unmapping; the standard deviations were 2.5% and 7.5%,
respectively.

With the CPU, a single ioctl was 2.5% faster for mapping and 18% faster
for unmapping; the standard deviations were 0.2% and 3%, respectively.

For unmapping the difference was bigger than for mapping because
unmapping is faster to begin with, so the system call overhead is
larger in proportion. Mapping a single buffer to 8 GPUs takes about
220us with SDMA or 190us with the CPU, with only a minor dependence on
buffer size and memory type. Unmapping takes about 35us with SDMA or
13us with the CPU.
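
As a very rough back-of-the-envelope check (assuming the 13us CPU
unmap time above is for the single-ioctl path and that most of the
difference is raw syscall entry/exit cost): 18% of ~13us is ~2.3us
saved by collapsing 8 per-GPU ioctls into one, i.e. on the order of
0.3us per eliminated ioctl round trip. That is nearly invisible on a
~200us mapping operation, but clearly measurable on a 13us unmap.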

>
> Regards,
>   Felix
>
>
