* Measuring the impact of buffer copy for virtio-gpu guests
@ 2021-02-17 13:46 Alex Bennée
  2021-02-17 14:48 ` François Ozog
  2021-02-18 11:06 ` Gerd Hoffmann
  0 siblings, 2 replies; 4+ messages in thread
From: Alex Bennée @ 2021-02-17 13:46 UTC (permalink / raw)
  To: Gerd Hoffmann, Marc-André Lureau
  Cc: Francois Ozog, Mikhail Golubev, Vasyl Vavrychuk,
	Zhao Jiancong (Jerry 趙 健淙),
	qemu-devel, Peter Griffin, Stratos Mailing List

Hi Gerd,

I was in a discussion with the AGL folks today about approaches to
achieving zero-copy when running VirGL virtio guests. AIUI (which is
probably not very much) copies can happen for a number of reasons:

  - the GPA not being mapped to a HPA that is accessible to the final HW
  - the guest allocation of a buffer not meeting stride/alignment requirements
  - data needing to be transformed for consumption by the real hardware?

Any others? Is there an impedance mismatch between different buffer
resource allocators in the guest and the host? Is that just a problem for
non-FLOSS blob drivers in the kernel?

I'm curious if it's possible to measure the effect of these extra copies
and where they occur. Do all resources get copied from the guest buffer to
the host, or does this only occur when there is a mismatch in the buffer
requirements?

Are there any functions where I could add trace points to measure this?
If this occurs in the kernel I wonder if I could use an eBPF probe to
count the number of bytes copied?
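
Something like the libbpf-style sketch below is roughly what I had in
mind for the guest side. It only counts how often the transfer path fires
rather than the bytes moved, and it assumes the guest driver funnels 2D
transfers through virtio_gpu_cmd_transfer_to_host_2d() - I haven't
checked the symbol name, so treat it as a sketch rather than working
code:

/* count_xfers.bpf.c - count guest-initiated 2D transfers to the host.
 * Assumes a virtio_gpu_cmd_transfer_to_host_2d symbol is there to kprobe
 * (check /proc/kallsyms first); counting bytes as well would need access
 * to the transfer rectangle arguments. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
        __uint(type, BPF_MAP_TYPE_ARRAY);
        __uint(max_entries, 1);
        __type(key, __u32);
        __type(value, __u64);
} xfer_count SEC(".maps");

SEC("kprobe/virtio_gpu_cmd_transfer_to_host_2d")
int count_transfer(void *ctx)
{
        __u32 key = 0;
        __u64 *val = bpf_map_lookup_elem(&xfer_count, &key);

        if (val)
                __sync_fetch_and_add(val, 1);
        return 0;
}

char LICENSE[] SEC("license") = "GPL";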

Apologies for the wall of questions - I'm still very new to the 3D side of
things ;-)

-- 
Alex Bennée



* Re: Measuring the impact of buffer copy for virtio-gpu guests
  2021-02-17 13:46 Measuring the impact of buffer copy for virtio-gpu guests Alex Bennée
@ 2021-02-17 14:48 ` François Ozog
  2021-02-17 15:48   ` Alex Bennée
  2021-02-18 11:06 ` Gerd Hoffmann
  1 sibling, 1 reply; 4+ messages in thread
From: François Ozog @ 2021-02-17 14:48 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Mikhail Golubev, Vasyl Vavrychuk, Zhao Jiancong, QEMU Developers,
	Peter Griffin, Gerd Hoffmann, Marc-André Lureau,
	Stratos Mailing List


On Wed, 17 Feb 2021 at 15:13, Alex Bennée <alex.bennee@linaro.org> wrote:

> Hi Gerd,
>
> I was in a discussion with the AGL folks today about approaches to
> achieving zero-copy when running VirGL virtio guests. AIUI (which is
> probably not very much) copies can happen for a number of reasons:
>
>   - the GPA not being mapped to a HPA that is accessible to the final HW
>   - the guest allocation of a buffer not meeting stride/alignment
> requirements
>   - data needing to be transformed for consumption by the real hardware?
>
> Any others? Is there an impedance mismatch between different buffer
> resource allocators in the guest and the host? Is that just a problem for
> non-FLOSS blob drivers in the kernel?
>
> I'm curious if it's possible to measure the effect of these extra copies
> and where they occur.

Making a good benchmark is going to be difficult. Copying has big impacts
on:
- L3 pressure (pure cost of evictions and loss of "sticky" cache line
  benefits)
- Memory request queue and prefetching
- TLB pressure
Conversely, as we are in VM environments, the more pressure other VMs put
on those resources, the more the jitter of the bounce copies will grow
(a lesson learnt from high-speed - >100Gbps - user-space networking).
All this to say that a unit test may wrongly give the impression that
copying is not that costly.

> Do all resources get copied from the guest buffer to
> the host, or does this only occur when there is a mismatch in the buffer
> requirements?
>
> Are there any functions where I could add trace points to measure this?
> If this occurs in the kernel I wonder if I could use an eBPF probe to
> count the number of bytes copied?
>
> Apologies for the wall of questions - I'm still very new to the 3D side of
> things ;-)
>
> --
> Alex Bennée
>


-- 
François-Frédéric Ozog | Director Linaro Edge & Fog Computing Group
T: +33.67221.6485
francois.ozog@linaro.org | Skype: ffozog



* Re: Measuring the impact of buffer copy for virtio-gpu guests
  2021-02-17 14:48 ` François Ozog
@ 2021-02-17 15:48   ` Alex Bennée
  0 siblings, 0 replies; 4+ messages in thread
From: Alex Bennée @ 2021-02-17 15:48 UTC (permalink / raw)
  To: François Ozog
  Cc: Mikhail Golubev, Vasyl Vavrychuk, Zhao Jiancong, QEMU Developers,
	Peter Griffin, Gerd Hoffmann, Marc-André Lureau,
	Stratos Mailing List


François Ozog <francois.ozog@linaro.org> writes:

> On Wed, 17 Feb 2021 at 15:13, Alex Bennée <alex.bennee@linaro.org> wrote:
>
>> Hi Gerd,
>>
>> I was in a discussion with the AGL folks today about approaches to
>> achieving zero-copy when running VirGL virtio guests. AIUI (which is
>> probably not very much) copies can happen for a number of reasons:
>>
>>   - the GPA not being mapped to a HPA that is accessible to the final HW
>>   - the guest allocation of a buffer not meeting stride/alignment
>> requirements
>>   - data needing to be transformed for consumption by the real hardware?
>>
>> Any others? Is there an impedance mismatch between different buffer
>> resource allocators in the guest and the host? Is that just a problem for
>> non-FLOSS blob drivers in the kernel?
>>
>> I'm curious if it's possible to measure the effect of these extra copies
>> and where they occur.
>
> Making a good benchmark is going to be difficult. Copying has big impacts
> on:
> - L3 pressure (pure cost of evictions and loss of "sticky" cache line
>   benefits)
> - Memory request queue and prefetching
> - TLB pressure
> Conversely, as we are in VM environments, the more pressure other VMs put
> on those resources, the more the jitter of the bounce copies will grow
> (a lesson learnt from high-speed - >100Gbps - user-space networking).
> All this to say that a unit test may wrongly give the impression that
> copying is not that costly.

No, I'm not doubting that unneeded copying can be costly - I'm just
trying to get an understanding of the scope of the problem: how often
buffers get copied, rather than measuring the total effect, which as you
say can be very load dependent.

>
>> Do all resources get copied from the guest buffer to
>> the host, or does this only occur when there is a mismatch in the buffer
>> requirements?
>>
>> Are there any functions where I could add trace points to measure this?
>> If this occurs in the kernel I wonder if I could use an eBPF probe to
>> count the number of bytes copied?
>>
>> Apologies for the wall of questions - I'm still very new to the 3D side of
>> things ;-)
>>
>> --
>> Alex Bennée
>>


-- 
Alex Bennée



* Re: Measuring the impact of buffer copy for virtio-gpu guests
  2021-02-17 13:46 Measuring the impact of buffer copy for virtio-gpu guests Alex Bennée
  2021-02-17 14:48 ` François Ozog
@ 2021-02-18 11:06 ` Gerd Hoffmann
  1 sibling, 0 replies; 4+ messages in thread
From: Gerd Hoffmann @ 2021-02-18 11:06 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Francois Ozog, Mikhail Golubev, Vasyl Vavrychuk, Zhao Jiancong,
	qemu-devel, Peter Griffin, Marc-André Lureau,
	Stratos Mailing List

On Wed, Feb 17, 2021 at 01:46:28PM +0000, Alex Bennée wrote:
> Hi Gerd,
> 
> I was in a discussion with the AGL folks today about approaches to
> achieving zero-copy when running VirGL virtio guests. AIUI (which is
> probably not very much) copies can happen for a number of reasons:
> 
>   - the GPA not being mapped to a HPA that is accessible to the final HW
>   - the guest allocation of a buffer not meeting stride/alignment requirements
>   - data needing to be transformed for consumption by the real hardware?

With the current qemu code base each resource has both a guest and a host
buffer, and the data is copied over when the guest asks for it.

virtio-gpu got a new feature (VIRTIO_GPU_F_RESOURCE_BLOB) to improve
that.  For blob resources we have stride/alignment negotiation, and they
can also be allocated by the host and mapped into the guest address
space instead of living in guest ram.

linux guest support is there in the kernel and mesa; the host side is
supported by crosvm.  qemu doesn't support blob resources though.
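
For reference, these are the protocol pieces involved (names quoted from
memory of include/uapi/linux/virtio_gpu.h, so double-check them before
relying on this):

/*
 * VIRTIO_GPU_F_RESOURCE_BLOB          - feature bit negotiated at init
 * VIRTIO_GPU_CMD_RESOURCE_CREATE_BLOB - create a blob resource; blob_mem
 *                                       says where it lives, e.g.
 *                                       VIRTIO_GPU_BLOB_MEM_GUEST (guest ram)
 *                                       or VIRTIO_GPU_BLOB_MEM_HOST3D
 *                                       (allocated by the host gpu stack)
 * VIRTIO_GPU_CMD_RESOURCE_MAP_BLOB    - map a host-allocated blob into the
 *                                       guest address space (the zero-copy
 *                                       path mentioned above)
 * VIRTIO_GPU_CMD_RESOURCE_UNMAP_BLOB  - tear the mapping down again
 */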

> I'm curious if it's possible to measure the effect of these extra copies
> and where they occur. Do all resources get copied from the guest buffer to
> the host, or does this only occur when there is a mismatch in the buffer
> requirements?

Without blob resources a copy is required whenever the guest cpu wants
access to the resource (e.g. glWritePixels / glReadPixels + similar).
For resources which are a gpu render target and never touched by the cpu
this is not needed.  For these you wouldn't even need guest ram backing
storage (VIRTIO_GPU_CMD_RESOURCE_ATTACH_BACKING); linux doesn't
implement that optimization though.

> Are there any functions where I could add trace points to measure this?
> If this occurs in the kernel I wonder if I could use an eBPF probe to
> count the number of bytes copied?

The copy happens in qemu or virglrenderer, in response to
VIRTIO_GPU_CMD_TRANSFER_* commands from the guest.

There are trace points already in qemu (trace_virtio_gpu_cmd_res_xfer_*);
they only log the resource id though, not the amount of data transferred.
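
If you want byte counts, having those trace points also log the transfer
rectangle would be enough to derive them, roughly along these lines (just
a sketch: I'm assuming 4 bytes per pixel, which should hold for the 2D
formats, and ignoring that the real copy walks the resource line by line):

#include <stdint.h>

/* Transfer rectangle as carried by VIRTIO_GPU_CMD_TRANSFER_TO_HOST_2D
 * (mirrors struct virtio_gpu_rect from the virtio-gpu spec). */
struct gpu_rect {
    uint32_t x, y, width, height;
};

/* Approximate bytes moved by one 2D transfer command. */
static inline uint64_t transfer_2d_bytes(const struct gpu_rect *r,
                                         uint32_t bytes_per_pixel)
{
    return (uint64_t)r->width * r->height * bytes_per_pixel;
}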

Tracing on the guest side by adding trace points to the kernel shouldn't
be hard either.

take care,
  Gerd




