All of lore.kernel.org
* QEMU-KVM and video performance
@ 2010-04-19 19:14 ` Gerhard Wiesinger
  0 siblings, 0 replies; 52+ messages in thread
From: Gerhard Wiesinger @ 2010-04-19 19:14 UTC (permalink / raw)
  To: qemu-devel, kvm

Hello,

Finally I got QEMU-KVM to work, but video performance under DOS is very 
low (QEMU 0.12.3 stable and QEMU GIT master branch are fast, QEMU-KVM is 
slow).

I'm measuring 2 performance-critical video parameters:
1.) INT 10h, function AX=4F05h (set same window/set window/get window)
2.) Memory performance to segment A000h

So BIOS performance (which might be port performance to the VGA index/value 
port) is about a factor of 5 slower, and memory performance is about a factor 
of 100 slower.

QEMU 0.12.3 and QEMU GIT performance is the same (within measurement 
tolerance), so it is listed only once; QEMU-KVM is much slower (details 
below).

Test programs can be provided; source code will be released soon.

Any ideas why KVM is so slow? Any ideas for improvement?

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/

==========================================================
QEMU 0.12.3 and QEMU GIT
==========================================================
INT10PER performance information V1.0, (c) 2010 by Gerhard Wiesinger

VESA set window: seconds=2, operations=1000000, ops/s=500000.0
VESA set window (nc): seconds=2, operations=1000000, ops/s=500000.0
VESA get window: seconds=2, operations=1000000, ops/s=500000.0

MEMPERF video performance V1.0, (c) 2010 by Gerhard Wiesinger

BYTE performance, time=10s, bytes=930611200, rate=88.750 MB/s
WORD performance, time=10s, bytes=766771200, rate=73.125 MB/s
DWORD performance, time=10s, bytes=812646400, rate=77.500 MB/s
QWORD performance, time=10s, bytes=806092800, rate=76.875 MB/s
==========================================================

==========================================================
QEMU-KVM
==========================================================
INT10PER performance information V1.0, (c) 2010 by Gerhard Wiesinger

VESA set window: seconds=9, operations=1000000, ops/s=111111.1
VESA set window (nc): seconds=9, operations=1000000, ops/s=111111.1
VESA get window: seconds=5, operations=1000000, ops/s=200000.0

MEMPERF video performance V1.0, (c) 2010 by Gerhard Wiesinger

BYTE performance, time=13s, bytes=13107200, rate=0.962 MB/s
WORD performance, time=13s, bytes=13107200, rate=0.962 MB/s
DWORD performance, time=12s, bytes=13107200, rate=1.042 MB/s
QWORD performance, time=13s, bytes=13107200, rate=0.962 MB/s
==========================================================
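The slowdown factors quoted above can be recomputed from the raw counters. A minimal Python sketch, assuming MEMPERF's "MB" is 2^20 bytes (the printed rates match that exactly); the tables only copy the numbers printed above, and `rate_mib_s`, `slowdown` and `ops_per_second` are illustrative names, not part of the test programs.

```python
# Raw MEMPERF counters copied from the output above: width -> (seconds, bytes written).
QEMU = {"BYTE": (10, 930611200), "WORD": (10, 766771200),
        "DWORD": (10, 812646400), "QWORD": (10, 806092800)}
QEMU_KVM = {"BYTE": (13, 13107200), "WORD": (13, 13107200),
            "DWORD": (12, 13107200), "QWORD": (13, 13107200)}

MIB = 1 << 20  # assumption: the tools' "MB" means 2**20 bytes


def rate_mib_s(seconds, nbytes):
    """Throughput in MiB/s, computed the way the printed rates appear to be."""
    return nbytes / seconds / MIB


def slowdown(fast, slow):
    """Ratio of the fast rate to the slow rate for each access width."""
    return {w: rate_mib_s(*fast[w]) / rate_mib_s(*slow[w]) for w in fast}


def ops_per_second(seconds, operations=1_000_000):
    """INT10PER figure: INT 10h calls completed per second."""
    return operations / seconds


if __name__ == "__main__":
    for width, factor in slowdown(QEMU, QEMU_KVM).items():
        print(f"{width:5s} memory writes: about {factor:.0f}x slower under QEMU-KVM")
    print(f"set window: {ops_per_second(2) / ops_per_second(9):.1f}x slower")
    print(f"get window: {ops_per_second(2) / ops_per_second(5):.1f}x slower")
```

Run as is, it reports roughly 74x to 92x for the memory tests and 4.5x / 2.5x for the INT 10h calls, in line with the rough estimates in the message.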

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: QEMU-KVM and video performance
  2010-04-19 19:14 ` [Qemu-devel] " Gerhard Wiesinger
@ 2010-04-21  8:59   ` Avi Kivity
  -1 siblings, 0 replies; 52+ messages in thread
From: Avi Kivity @ 2010-04-21  8:59 UTC (permalink / raw)
  To: Gerhard Wiesinger; +Cc: qemu-devel, kvm

On 04/19/2010 10:14 PM, Gerhard Wiesinger wrote:
> Hello,
>
> Finally I got QEMU-KVM to work, but video performance under DOS is very 
> low (QEMU 0.12.3 stable and QEMU GIT master branch are fast, QEMU-KVM 
> is slow).
>
> I'm measuring 2 performance-critical video parameters:
> 1.) INT 10h, function AX=4F05h (set same window/set window/get window)
> 2.) Memory performance to segment A000h
>
> So BIOS performance (which might be port performance to the VGA 
> index/value port) is about a factor of 5 slower, and memory performance 
> is about a factor of 100 slower.
>
> QEMU 0.12.3 and QEMU GIT performance is the same (within measurement 
> tolerance), so it is listed only once; QEMU-KVM is much slower (details 
> below).
>
> Test programs can be provided; source code will be released soon.
>
> Any ideas why KVM is so slow? 

16-color vga is slow because kvm cannot map the framebuffer to the guest 
(writes are not interpreted as RAM writes).  256+-color vga should be 
fast, except when switching the vga window.  Note that it's only fast on 
average: the first write into a page will be slow as kvm maps it in.
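Why "fast on average" holds can be illustrated with a toy cost model (not KVM code): the first write to each page pays a trap-and-map cost, and later writes to the same page run at RAM speed. `TRAP_COST` and `DIRECT_COST` are made-up units chosen only to show the amortization.

```python
PAGE_SIZE = 4096       # x86 page size in bytes
TRAP_COST = 10_000     # hypothetical cost of one exit plus mapping the page in
DIRECT_COST = 1        # hypothetical cost of one write to an already-mapped page


def average_write_cost(offsets, page_size=PAGE_SIZE):
    """Average cost per write when pages are mapped lazily on first touch."""
    mapped = set()
    total = 0
    count = 0
    for off in offsets:
        page = off // page_size
        if page not in mapped:       # first touch: trap and map the page in
            mapped.add(page)
            total += TRAP_COST
        total += DIRECT_COST         # the write itself
        count += 1
    return total / count


if __name__ == "__main__":
    # Filling a 64 KiB window byte by byte touches 16 pages 65,536 times.
    print(average_write_cost(range(64 * 1024)))
    # One write per page never amortizes the trap.
    print(average_write_cost(range(0, 64 * 1024, PAGE_SIZE)))
```

A dense fill spreads 16 traps over 65,536 writes, whereas touching each page once pays the full trap cost on every write.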

Which mode are you using?

> Any ideas for improvement?

Currently when the physical memory map changes (which is what happens 
when the vga window is updated), kvm drops the entire shadow cache.  
It's possible to do this only for vga memory, but not easy.
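The difference between dropping the whole shadow cache and dropping only the VGA mappings can be shown with another toy model (again not KVM's MMU code; the page numbers and the workload are made up). Each event is either a write to a guest page or a VGA window change; a write to an unmapped page counts as one trap.

```python
# Guest page frames covering the 64 KiB window at A0000h (pages A0h..AFh).
VGA_WINDOW_PAGES = range(0xA0000 >> 12, 0xB0000 >> 12)


def count_traps(events, flush_all):
    """Count faults when mappings are torn down on every VGA window change.

    flush_all=True drops every mapping (the behavior described above);
    flush_all=False drops only the VGA window pages (the selective variant).
    """
    mapped = set()
    traps = 0
    for kind, page in events:
        if kind == "window_change":
            if flush_all:
                mapped.clear()
            else:
                mapped.difference_update(VGA_WINDOW_PAGES)
        elif page not in mapped:
            traps += 1               # page has to be faulted in again
            mapped.add(page)
    return traps


def workload(iterations, ram_pages=100):
    """Hypothetical loop: touch some ordinary RAM, one VGA page, then switch banks."""
    events = []
    for _ in range(iterations):
        events += [("write", p) for p in range(ram_pages)]
        events.append(("write", VGA_WINDOW_PAGES[0]))
        events.append(("window_change", None))
    return events


if __name__ == "__main__":
    ev = workload(10)
    print("flush everything:", count_traps(ev, flush_all=True))
    print("flush VGA only:  ", count_traps(ev, flush_all=False))
```

In this model the full flush re-faults every RAM page after each bank switch (1010 traps for 10 iterations), while the selective flush only re-faults the VGA page (110 traps).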

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] Re: QEMU-KVM and video performance
  2010-04-21  8:59   ` [Qemu-devel] " Avi Kivity
  (?)
@ 2010-04-21 10:08   ` Jamie Lokier
  2010-04-21 10:49     ` Avi Kivity
  -1 siblings, 1 reply; 52+ messages in thread
From: Jamie Lokier @ 2010-04-21 10:08 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Gerhard Wiesinger, qemu-devel, kvm

Avi Kivity wrote:
> On 04/19/2010 10:14 PM, Gerhard Wiesinger wrote:
> >Hello,
> >
> >Finally I got QEMU-KVM to work, but video performance under DOS is very 
> >low (QEMU 0.12.3 stable and QEMU GIT master branch are fast, QEMU-KVM 
> >is slow).
> >
> >I'm measuring 2 performance-critical video parameters:
> >1.) INT 10h, function AX=4F05h (set same window/set window/get window)
> >2.) Memory performance to segment A000h
> >
> >So BIOS performance (which might be port performance to the VGA 
> >index/value port) is about a factor of 5 slower, and memory performance 
> >is about a factor of 100 slower.
> >
> >QEMU 0.12.3 and QEMU GIT performance is the same (within measurement 
> >tolerance), so it is listed only once; QEMU-KVM is much slower (details 
> >below).
> >
> >Test programs can be provided; source code will be released soon.
> >
> >Any ideas why KVM is so slow? 
> 
> 16-color vga is slow because kvm cannot map the framebuffer to the guest 
> (writes are not interpreted as RAM writes).  256+-color vga should be 
> fast, except when switching the vga window.  Note that it's only fast on 
> average: the first write into a page will be slow as kvm maps it in.

I don't understand: why is 256+-colour mappable and 16-colour not mappable?

Is this a case where TCG would run significantly faster for code blocks
that have been detected to access the VGA memory?

> Which mode are you using?
> 
> >Any ideas for improvement?
> 
> Currently when the physical memory map changes (which is what happens 
> when the vga window is updated), kvm drops the entire shadow cache.  
> It's possible to do this only for vga memory, but not easy.

If it's a page fault handled in the kernel, I would expect it to be
about as fast as those old VGA DOS-extender drivers which provide the
illusion of a single flat mapping, and bank switch on page faults -
multiplied by the speed of modern CPUs compared with then.  For many
graphics things those DOS-extender drivers worked perfectly well.

If it's a trap out to qemu on every vga window change, perhaps not
quite so well.
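The DOS-extender technique described here can be sketched as a flat framebuffer emulated on top of a single banked window, where a bank switch happens only when an access falls outside the currently selected bank (the "page fault" path). This is a hypothetical Python model: it assumes a 64 KiB window with 64 KiB granularity, and `BankedFramebuffer` is an illustrative name.

```python
WINDOW_SIZE = 64 * 1024   # assumption: 64 KiB window at A000h, 64 KiB granularity


class BankedFramebuffer:
    """Flat framebuffer emulated on one banked window, switching banks on demand."""

    def __init__(self, vram_size):
        self.vram = bytearray(vram_size)
        self.current_bank = None
        self.bank_switches = 0

    def _switch_bank(self, bank):
        # A real driver would call INT 10h AX=4F05h or program the chipset here.
        self.current_bank = bank
        self.bank_switches += 1

    def write(self, linear_offset, value):
        bank, offset = divmod(linear_offset, WINDOW_SIZE)
        if bank != self.current_bank:   # outside the selected bank: the "fault" path
            self._switch_bank(bank)
        # Equivalent to storing at A000h:offset with `bank` selected.
        self.vram[bank * WINDOW_SIZE + offset] = value


if __name__ == "__main__":
    fb = BankedFramebuffer(1 << 20)
    for i in range(640 * 480):          # sequential fill of a 640x480, 8 bpp frame
        fb.write(i, i & 0xFF)
    print("sequential fill:", fb.bank_switches, "bank switches")

    fb2 = BankedFramebuffer(1 << 20)
    for _ in range(10):                 # ping-pong between two banks
        fb2.write(0, 1)
        fb2.write(70_000, 2)
    print("ping-pong:", fb2.bank_switches, "bank switches")
```

Sequential drawing costs only a handful of switches per frame, which is why such drivers performed acceptably; access patterns that bounce between banks pay for a switch on nearly every access.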

-- Jamie

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] Re: QEMU-KVM and video performance
  2010-04-21 10:08   ` Jamie Lokier
@ 2010-04-21 10:49     ` Avi Kivity
  2010-04-21 18:14         ` [Qemu-devel] " Gerhard Wiesinger
  2010-04-21 18:39       ` Jamie Lokier
  0 siblings, 2 replies; 52+ messages in thread
From: Avi Kivity @ 2010-04-21 10:49 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Gerhard Wiesinger, qemu-devel, kvm

On 04/21/2010 01:08 PM, Jamie Lokier wrote:
> Avi Kivity wrote:
>    
>> On 04/19/2010 10:14 PM, Gerhard Wiesinger wrote:
>>      
>>> Hello,
>>>
>>> Finally I got QEMU-KVM to work, but video performance under DOS is very
>>> low (QEMU 0.12.3 stable and QEMU GIT master branch are fast, QEMU-KVM
>>> is slow).
>>>
>>> I'm measuring 2 performance-critical video parameters:
>>> 1.) INT 10h, function AX=4F05h (set same window/set window/get window)
>>> 2.) Memory performance to segment A000h
>>>
>>> So BIOS performance (which might be port performance to the VGA
>>> index/value port) is about a factor of 5 slower, and memory performance
>>> is about a factor of 100 slower.
>>>
>>> QEMU 0.12.3 and QEMU GIT performance is the same (within measurement
>>> tolerance), so it is listed only once; QEMU-KVM is much slower (details
>>> below).
>>>
>>> Test programs can be provided; source code will be released soon.
>>>
>>> Any ideas why KVM is so slow?
>>>        
>> 16-color vga is slow because kvm cannot map the framebuffer to the guest
>> (writes are not interpreted as RAM writes).  256+-color vga should be
>> fast, except when switching the vga window.  Note that it's only fast on
>> average: the first write into a page will be slow as kvm maps it in.
>>      
> I don't understand: why is 256+-colour mappable and 16-colour not mappable?
>    

Writes to vga in 16-color mode don't set a memory location to a 
value; instead they change multiple memory locations.
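A simplified model of 16-color (planar) write mode 0 shows why such writes can't be backed by an ordinary RAM mapping: one guest byte store updates up to four plane bytes, and the result depends on the Map Mask register, the Bit Mask register and the latches loaded by the previous read. This sketch assumes no data rotation, Set/Reset disabled and the "replace" logical operation; `PlanarVGA` is an illustrative name.

```python
PLANE_SIZE = 64 * 1024


class PlanarVGA:
    """Simplified VGA planar memory, write mode 0 only."""

    def __init__(self):
        self.planes = [bytearray(PLANE_SIZE) for _ in range(4)]
        self.latches = [0, 0, 0, 0]
        self.map_mask = 0x0F   # Sequencer Map Mask (index 02h): planes enabled for writes
        self.bit_mask = 0xFF   # Graphics Controller Bit Mask (index 08h): bits from CPU data

    def cpu_read(self, offset):
        # Any read loads all four latches; plane 0 is returned (Read Map Select = 0).
        self.latches = [plane[offset] for plane in self.planes]
        return self.planes[0][offset]

    def cpu_write(self, offset, value):
        # A single guest store fans out to every enabled plane at the same offset,
        # taking masked-out bits from the latches instead of from the CPU data.
        for p in range(4):
            if self.map_mask & (1 << p):
                self.planes[p][offset] = (value & self.bit_mask) | (
                    self.latches[p] & ~self.bit_mask & 0xFF)


if __name__ == "__main__":
    vga = PlanarVGA()
    vga.cpu_write(10, 0xAA)                      # one store, four plane bytes changed
    print([hex(pl[10]) for pl in vga.planes])
    vga.cpu_read(10)                             # load the latches
    vga.bit_mask = 0x0F                          # only the low nibble comes from the CPU
    vga.cpu_write(10, 0x55)
    print([hex(pl[10]) for pl in vga.planes])
```

A plain RAM mapping would store the written byte once at one address, so every such write has to be trapped and emulated instead.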

> Is this a case where TCG would run significantly faster for code blocks
> that have been detected to access the VGA memory?
>    

Yes.

>> Currently when the physical memory map changes (which is what happens
>> when the vga window is updated), kvm drops the entire shadow cache.
>> It's possible to do this only for vga memory, but not easy.
>>      
> If it's a page fault handled in the kernel, I would expect it to be
> about as fast as those old VGA DOS-extender drivers which provide the
> illusion of a single flat mapping, and bank switch on page faults -
> multiplied by the speed of modern CPUs compared with then.  For many
> graphics things those DOS-extender drivers worked perfectly well.
>
> If it's a trap out to qemu on every vga window change, perhaps not
> quite so well.
>    

It's much more complicated.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] Re: QEMU-KVM and video performance
  2010-04-21  8:59   ` [Qemu-devel] " Avi Kivity
  (?)
  (?)
@ 2010-04-21 18:09   ` Gerhard Wiesinger
  2010-04-21 18:33       ` Jamie Lokier
  -1 siblings, 1 reply; 52+ messages in thread
From: Gerhard Wiesinger @ 2010-04-21 18:09 UTC (permalink / raw)
  To: Avi Kivity; +Cc: qemu-devel, kvm

On Wed, 21 Apr 2010, Avi Kivity wrote:

> On 04/19/2010 10:14 PM, Gerhard Wiesinger wrote:
>> Hello,
>> 
>> Finally I got QEMU-KVM to work, but video performance under DOS is very low 
>> (QEMU 0.12.3 stable and QEMU GIT master branch are fast, QEMU-KVM is slow).
>> 
>> I'm measuring 2 performance-critical video parameters:
>> 1.) INT 10h, function AX=4F05h (set same window/set window/get window)
>> 2.) Memory performance to segment A000h
>> 
>> So BIOS performance (which might be port performance to the VGA index/value 
>> port) is about a factor of 5 slower, and memory performance is about a factor 
>> of 100 slower.
>> 
>> QEMU 0.12.3 and QEMU GIT performance is the same (within measurement 
>> tolerance), so it is listed only once; QEMU-KVM is much slower (details 
>> below).
>> 
>> Test programs can be provided; source code will be released soon.
>> 
>> Any ideas why KVM is so slow? 
>
> 16-color vga is slow because kvm cannot map the framebuffer to the guest 
> (writes are not interpreted as RAM writes).  256+-color vga should be fast, 
> except when switching the vga window.  Note that it's only fast on average: the 
> first write into a page will be slow as kvm maps it in.
>
> Which mode are you using?
>

I'm using VESA mode 0x101 (640x480, 256 colors), but performance there is 
very low (~1MB/s). The test also runs WITHOUT any vga window change, so 
there isn't any page switching overhead involved in this test case.
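For a sense of scale, a quick calculation of what these rates mean in mode 0x101 (one byte per pixel at 640x480). It assumes a 64 KiB window at A000h (the actual window size and granularity are reported by the VBE mode information) and ignores the cost of bank switches.

```python
WIDTH, HEIGHT = 640, 480          # VESA mode 0x101, 256 colors: 1 byte per pixel
WINDOW_SIZE = 64 * 1024           # assumption: 64 KiB window at A000h
MIB = 1 << 20

frame_bytes = WIDTH * HEIGHT
windows_per_frame = -(-frame_bytes // WINDOW_SIZE)   # ceiling division


def full_frames_per_second(rate_mib_s):
    """Full-screen redraws per second at a given write rate to the window."""
    return rate_mib_s * MIB / frame_bytes


if __name__ == "__main__":
    print(frame_bytes, "bytes per frame,", windows_per_frame, "window positions")
    for label, rate in (("QEMU-KVM", 0.962), ("QEMU", 88.75)):
        print(f"{label}: {full_frames_per_second(rate):.1f} frames/s")
```

That is roughly 3 full redraws per second under QEMU-KVM versus about 300 under plain QEMU.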

>> Any ideas for improvement?
>
> Currently when the physical memory map changes (which is what happens when 
> the vga window is updated), kvm drops the entire shadow cache.  It's possible 
> to do this only for vga memory, but not easy.

I don't think changing the VGA window is a problem, because 
500,000 to 1 million changes/s are possible.

Would it be possible to handle these writes directly in QEMU (without 
KVM)? Performance there is very good (looking at the code, there is 
just some pointer arithmetic and a memory write).

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: Re: QEMU-KVM and video performance
  2010-04-21 10:49     ` Avi Kivity
@ 2010-04-21 18:14         ` Gerhard Wiesinger
  2010-04-21 18:39       ` Jamie Lokier
  1 sibling, 0 replies; 52+ messages in thread
From: Gerhard Wiesinger @ 2010-04-21 18:14 UTC (permalink / raw)
  To: Avi Kivity; +Cc: qemu-devel, kvm

On Wed, 21 Apr 2010, Avi Kivity wrote:

> On 04/21/2010 01:08 PM, Jamie Lokier wrote:
>> Avi Kivity wrote:
>> 
>>> On 04/19/2010 10:14 PM, Gerhard Wiesinger wrote:
>>> 
>>>> Hello,
>>>> 
>>>> Finally I got QEMU-KVM to work, but video performance under DOS is very
>>>> low (QEMU 0.12.3 stable and QEMU GIT master branch are fast, QEMU-KVM
>>>> is slow).
>>>> 
>>>> I'm measuring 2 performance-critical video parameters:
>>>> 1.) INT 10h, function AX=4F05h (set same window/set window/get window)
>>>> 2.) Memory performance to segment A000h
>>>> 
>>>> So BIOS performance (which might be port performance to the VGA
>>>> index/value port) is about a factor of 5 slower, and memory performance
>>>> is about a factor of 100 slower.
>>>> 
>>>> QEMU 0.12.3 and QEMU GIT performance is the same (within measurement
>>>> tolerance), so it is listed only once; QEMU-KVM is much slower (details
>>>> below).
>>>> 
>>>> Test programs can be provided; source code will be released soon.
>>>> 
>>>> Any ideas why KVM is so slow?
>>>> 
>>> 16-color vga is slow because kvm cannot map the framebuffer to the guest
>>> (writes are not interpreted as RAM writes).  256+-color vga should be
>>> fast, except when switching the vga window.  Note that it's only fast on
>>> average: the first write into a page will be slow as kvm maps it in.
>>> 
>> I don't understand: why is 256+-colour mappable and 16-colour not mappable?
>> 
>
> Writes to vga in 16-color mode don't set a memory location to a value; 
> instead they change multiple memory locations.
>
>> Is this a case where TCG would run significantly faster for code blocks
>> that have been detected to access the VGA memory?
>> 
>
> Yes.
>
>>> Currently when the physical memory map changes (which is what happens
>>> when the vga window is updated), kvm drops the entire shadow cache.
>>> It's possible to do this only for vga memory, but not easy.
>>> 
>> If it's a page fault handled in the kernel, I would expect it to be
>> about as fast as those old VGA DOS-extender drivers which provide the
>> illusion of a single flat mapping, and bank switch on page faults -
>> multiplied by the speed of modern CPUs compared with then.  For many
>> graphics things those DOS-extender drivers worked perfectly well.
>> 
>> If it's a trap out to qemu on every vga window change, perhaps not
>> quite so well.
>> 
>
> It's much more complicated.
>

Can you explain which KVM code files/functions are involved in handling 
the VGA memory window and page switching through the port write to the VGA 
window register (or is that part handled by QEMU)? A little bit of 
architecture explanation would be nice.

BTW: Which parts of the KVM code decide whether "direct code" or 
"emulated device code" is used?

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] Re: QEMU-KVM and video performance
  2010-04-21 18:09   ` Gerhard Wiesinger
@ 2010-04-21 18:33       ` Jamie Lokier
  0 siblings, 0 replies; 52+ messages in thread
From: Jamie Lokier @ 2010-04-21 18:33 UTC (permalink / raw)
  To: Gerhard Wiesinger; +Cc: Avi Kivity, qemu-devel, kvm

Gerhard Wiesinger wrote:
> I'm using VESA mode 0x101 (640x480, 256 colors), but performance there is 
> very low (~1MB/s). The test also runs WITHOUT any vga window change, so 
> there isn't any page switching overhead involved in this test case.
> 
> >>Any ideas for improvement?
> >
> >Currently when the physical memory map changes (which is what happens 
> >when the vga window is updated), kvm drops the entire shadow cache.  It's 
> >possible to do this only for vga memory, but not easy.
> 
> I don't think changing the VGA window is a problem, because 
> 500,000 to 1 million changes/s are possible.

1MB/s, 500k-1M changes/s.... Coincidence?  Is it taking a page fault
or trap on every write?
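One way to probe the question is to convert the QEMU-KVM MEMPERF results into time per store. This sketch assumes each test issues one store instruction of the stated width (BYTE=1, WORD=2, DWORD=4, QWORD=8 bytes), which the thread does not confirm.

```python
# QEMU-KVM MEMPERF results from the first message: width in bytes -> (seconds, bytes written).
KVM_RUNS = {1: (13, 13107200), 2: (13, 13107200), 4: (12, 13107200), 8: (13, 13107200)}


def microseconds_per_store(width):
    """Average wall-clock time per guest store of `width` bytes."""
    seconds, nbytes = KVM_RUNS[width]
    return seconds / (nbytes // width) * 1e6


def microseconds_per_byte(width):
    """Average wall-clock time per byte written, for comparison across widths."""
    seconds, nbytes = KVM_RUNS[width]
    return seconds / nbytes * 1e6


if __name__ == "__main__":
    for width in KVM_RUNS:
        print(f"{width}-byte stores: {microseconds_per_store(width):.2f} us/store, "
              f"{microseconds_per_byte(width):.2f} us/byte")
```

Under that assumption the cost per store grows with the width while the cost per byte stays near 1 µs, so the time tracks bytes written rather than the number of stores; these numbers alone do not say whether that is a fault, an exit to userspace, or something else.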

> Would it be possible to handle these writes directly in QEMU (without 
> KVM)? Performance there is very good (looking at the code, there is 
> just some pointer arithmetic and a memory write).

I've noticed extremely slow VGA performance too, when installing OSes.
It makes the difference between installing in a few minutes, and
installing taking hours - just because of the slow VGA.

So generally I use qemu for installing old versions of Windows, then
change to KVM to run them after installing.

Switching between KVM and qemu automatically based on guest code
behaviour, and making both memory models and device models compatible
at run time, is a difficult thing.  I guess it's not worth the
difficulty just to speed up VGA.

-- Jamie

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] Re: QEMU-KVM and video performance
  2010-04-21 10:49     ` Avi Kivity
  2010-04-21 18:14         ` [Qemu-devel] " Gerhard Wiesinger
@ 2010-04-21 18:39       ` Jamie Lokier
  2010-04-21 20:51         ` Avi Kivity
  1 sibling, 1 reply; 52+ messages in thread
From: Jamie Lokier @ 2010-04-21 18:39 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Gerhard Wiesinger, qemu-devel, kvm

Avi Kivity wrote:
> Writes to vga in 16-color mode don't set a memory location to a 
> value; instead they change multiple memory locations.

While code is just writing to the VGA memory, not reading(*) and not
touching the VGA I/O registers that control the write latches, is it
possible in principle to swizzle the format around in memory to make
regular writes work?

(*) Reading should be ok for some settings of the write latches, I
think.

I wonder if guests of interest behave like that.

> >Is this a case where TCG would run significantly faster for code blocks
> >that have been detected to access the VGA memory?
> 
> Yes.

$ date
Wed Apr 21 19:37:38 2015
$ modprobe ktcg
;-)

-- Jamie

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] Re: QEMU-KVM and video performance
  2010-04-21 18:33       ` Jamie Lokier
@ 2010-04-21 18:50         ` Gerhard Wiesinger
  -1 siblings, 0 replies; 52+ messages in thread
From: Gerhard Wiesinger @ 2010-04-21 18:50 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Avi Kivity, qemu-devel, kvm

On Wed, 21 Apr 2010, Jamie Lokier wrote:

> Gerhard Wiesinger wrote:
>> I'm using VESA mode 0x101 (640x480, 256 colors), but performance there is
>> very low (~1MB/s). The test also runs WITHOUT any vga window change, so
>> there isn't any page switching overhead involved in this test case.
>>
>>>> Any ideas for improvement?
>>>
>>> Currently when the physical memory map changes (which is what happens
>>> when the vga window is updated), kvm drops the entire shadow cache.  It's
>>> possible to do this only for vga memory, but not easy.
>>
>> I don't think changing the VGA window is a problem, because
>> 500,000 to 1 million changes/s are possible.
>
> 1MB/s, 500k-1M changes/s.... Coincidence?  Is it taking a page fault
> or trap on every write?

To clarify:
Memory performance writing to segment A000 is about 1MB/s.
Calling the INT 10 set/get window function with different windows (e.g. 
toggling between window page 0 and 1) achieves about 500,000 to 1 million 
function calls per second.

To get really good VGA performance, both parameters should be:
About >50MB/s for writes to segment A000
~500,000 bank switches per second.

>> Would it be possible to handle these writes directly in QEMU (without
>> KVM)? Performance there is very good (looking at the code, there is
>> just some pointer arithmetic and a memory write).
>
> I've noticed extremely slow VGA performance too, when installing OSes.
> It makes the difference between installing in a few minutes, and
> installing taking hours - just because of the slow VGA.
>
> So generally I use qemu for installing old versions of Windows, then
> change to KVM to run them after installing.
>
> Switching between KVM and qemu automatically based on guest code
> behaviour, and making both memory models and device models compatible
> at run time, is a difficult thing.  I guess it's not worth the
> difficulty just to speed up VGA.

I think this is very easy to distinguish:
1.) VGA segment A000 is legacy and should be handled through QEMU 
and not through KVM (because it is much faster there). 16-color modes 
should also be fast enough there.
2.) All other flat PCI memory accesses should be handled through KVM 
(a specialized driver for that PCI device is loaded in the non-legacy OS).

Is that easily possible?

Thnx.

Ciao,
Gerhard

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] Re: QEMU-KVM and video performance
  2010-04-21 18:50         ` Gerhard Wiesinger
@ 2010-04-21 18:53           ` Jamie Lokier
  -1 siblings, 0 replies; 52+ messages in thread
From: Jamie Lokier @ 2010-04-21 18:53 UTC (permalink / raw)
  To: Gerhard Wiesinger; +Cc: Avi Kivity, qemu-devel, kvm

Gerhard Wiesinger wrote:
> >>Would it be possible to handle these writes through QEMU directly
> >>(without KVM), because performance is very good there (looking at the
> >>code, there is only some pointer arithmetic and a memory write)?
> >
> >I've noticed extremely slow VGA performance too, when installing OSes.
> >It makes the difference between installing in a few minutes, and
> >installing taking hours - just because of the slow VGA.
> >
> >So generally I use qemu for installing old versions of Windows, then
> >change to KVM to run them after installing.
> >
> >Switching between KVM and qemu automatically based on guest code
> >behaviour, and making both memory models and device models compatible
> >at run time, is a difficult thing.  I guess it's not worth the
> >difficulty just to speed up VGA.
> 
> I think this is very easy to distinguish:
> 1.) VGA segment A000 is legacy and should be handled through QEMU
> and not through KVM (because it is much faster there). 16-color modes
> should also be fast enough there.
> 2.) All other flat PCI memory accesses should be handled through KVM
> (a specialized driver is loaded for that PCI device in a non-legacy
> OS).
> 
> Is that easily possible?

No, it isn't.  Distinguishing addresses is trivial.  You've ignored the
hard part, which is switching between different virtualisation
architectures...

-- Jamie

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] Re: QEMU-KVM and video performance
  2010-04-21 18:53           ` Jamie Lokier
@ 2010-04-21 19:08             ` Gerhard Wiesinger
  -1 siblings, 0 replies; 52+ messages in thread
From: Gerhard Wiesinger @ 2010-04-21 19:08 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Avi Kivity, qemu-devel, kvm

On Wed, 21 Apr 2010, Jamie Lokier wrote:

> Gerhard Wiesinger wrote:
>>>> Would it be possible to handle these writes through QEMU directly
>>>> (without KVM), because performance is very good there (looking at the
>>>> code, there is only some pointer arithmetic and a memory write)?
>>>
>>> I've noticed extremely slow VGA performance too, when installing OSes.
>>> It makes the difference between installing in a few minutes, and
>>> installing taking hours - just because of the slow VGA.
>>>
>>> So generally I use qemu for installing old versions of Windows, then
>>> change to KVM to run them after installing.
>>>
>>> Switching between KVM and qemu automatically based on guest code
>>> behaviour, and making both memory models and device models compatible
>>> at run time, is a difficult thing.  I guess it's not worth the
>>> difficulty just to speed up VGA.
>>
>> I think this is very easy to distinguish:
>> 1.) VGA segment A000 is legacy and should be handled through QEMU
>> and not through KVM (because it is much faster there). 16-color modes
>> should also be fast enough there.
>> 2.) All other flat PCI memory accesses should be handled through KVM
>> (a specialized driver is loaded for that PCI device in a non-legacy
>> OS).
>>
>> Is that easily possible?
>
> No, it isn't.  Distinguishing addresses is trivial.  You've ignored the
> hard part, which is switching between different virtualisation
> architectures...

Hmmm. I'm very new to QEMU and KVM, but accessing QEMU's virtual hardware
from KVM must already be possible (memory and port accesses go to nearly
every virtual device), and those accesses end up in the C code in QEMU's
hw/*.c directory. So the VGA memory area should also be accessible from
KVM, but with QEMU's specialized and fast memory access.
Am I missing something?

BTW: It's still not clear to me why performance is so low with KVM, since
the test case involves no window changes that could cause a (slow) page
fault.

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] Re: QEMU-KVM and video performance
  2010-04-21 18:14         ` [Qemu-devel] " Gerhard Wiesinger
@ 2010-04-21 20:49           ` Avi Kivity
  -1 siblings, 0 replies; 52+ messages in thread
From: Avi Kivity @ 2010-04-21 20:49 UTC (permalink / raw)
  To: Gerhard Wiesinger; +Cc: Jamie Lokier, qemu-devel, kvm

On 04/21/2010 09:14 PM, Gerhard Wiesinger wrote:
>
> Can you explain which KVM code files/functions are involved in
> handling the VGA memory window and page switching through the port write
> to the VGA window register (or is that part handled by QEMU)? A little
> bit of architecture explanation would be nice.

qemu hw/vga.c and hw/cirrus_vga.c.  Boring functions like 
vbe_ioport_write_data() and vga_ioport_write().

>
> BTW: Where in the KVM code is it decided whether "direct code" or
> "emulated device code" is used?
>

Same place.  Look for calls to cpu_register_physical_memory().  If the 
last argument was obtained by a call to cpu_register_io_memory(), then 
all writes trap.  Otherwise, it was obtained by qemu_ram_alloc() and 
writes will not trap (except the first write to a page in a 30ms window, 
used to note that the page is dirty and needs redrawing).


-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] Re: QEMU-KVM and video performance
  2010-04-21 18:39       ` Jamie Lokier
@ 2010-04-21 20:51         ` Avi Kivity
  2010-04-21 21:19           ` Jamie Lokier
  2010-04-22  5:44             ` Gerhard Wiesinger
  0 siblings, 2 replies; 52+ messages in thread
From: Avi Kivity @ 2010-04-21 20:51 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Gerhard Wiesinger, qemu-devel, kvm

On 04/21/2010 09:39 PM, Jamie Lokier wrote:
> Avi Kivity wrote:
>    
>> Writes to vga in 16-color mode don't just set a memory location to a
>> value; instead they change multiple memory locations.
>>      
> While code is just writing to the VGA memory, not reading(*) and not
> touching the VGA I/O registers that control the write latches, is it
> possible in principle to swizzle the format around in memory to make
> regular writes work?
>    

Not in software.  We can map pages, not cross address lines.

> (*) Reading should be ok for some settings of the write latches, I
> think.
>
> I wonder if guests of interest behave like that.
>    

Guests that use 16 color vga are usually of little interest.

>>> Is this a case where TCG would run significantly faster for code blocks
>>> that have been detected to access the VGA memory?
>>>        
>> Yes.
>>      
> $ date
> Wed Apr 21 19:37:38 2015
> $ modprobe ktcg
>    

That's why VMware's software VMM was faster than its hardware VMM on 
the initial iterations of VMX.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] Re: QEMU-KVM and video performance
  2010-04-21 18:50         ` Gerhard Wiesinger
@ 2010-04-21 20:56           ` Avi Kivity
  -1 siblings, 0 replies; 52+ messages in thread
From: Avi Kivity @ 2010-04-21 20:56 UTC (permalink / raw)
  To: Gerhard Wiesinger; +Cc: Jamie Lokier, qemu-devel, kvm

On 04/21/2010 09:50 PM, Gerhard Wiesinger wrote:
>>> I don't think changing the VGA window is a problem, because
>>> 500,000-1,000,000 changes/s are possible.
>>
>> 1MB/s, 500k-1M changes/s.... Coincidence?  Is it taking a page fault
>> or trap on every write?
>
>
> To clarify:
> Memory performance writing to segment A000 is about 1 MB/s.

That indicates a fault every write (assuming 8-16 bit writes).  If 
you're using 256 color vga and not switching banks, this indicates a bug.

> Calling the INT 10h set/get window function with different windows (e.g.
> toggling between window pages 0 and 1) reaches about 500,000 to 1 million
> function calls per second.

That's surprisingly fast. I'd expect 100-200k/sec.

Please run kvm_stat and report output for both tests to confirm.

>
> To get really good VGA performance, both parameters should be:
> more than 50 MB/s for writes to segment A000
> ~500,000 bank switches per second.

First should be doable easily, second is borderline.

> I think this is very easy to distinguish:
> 1.) VGA segment A000 is legacy and should be handled through QEMU and
> not through KVM (because it is much faster there). 16-color modes
> should also be fast enough there.
> 2.) All other flat PCI memory accesses should be handled through KVM
> (a specialized driver is loaded for that PCI device in a non-legacy
> OS).
>
> Is that easily possible?

No.  Code can run in either qemu or kvm, not both.  You can switch 
between them based on access statistics (early versions of qemu-kvm did 
that, without the statistics part), but this isn't trivial.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] Re: QEMU-KVM and video performance
  2010-04-21 20:51         ` Avi Kivity
@ 2010-04-21 21:19           ` Jamie Lokier
  2010-04-22  5:44             ` Gerhard Wiesinger
  1 sibling, 0 replies; 52+ messages in thread
From: Jamie Lokier @ 2010-04-21 21:19 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Gerhard Wiesinger, qemu-devel, kvm

Avi Kivity wrote:
> On 04/21/2010 09:39 PM, Jamie Lokier wrote:
> >Avi Kivity wrote:
> >   
> >>Writes to vga in 16-color mode don't just set a memory location to a
> >>value; instead they change multiple memory locations.
> >>     
> >While code is just writing to the VGA memory, not reading(*) and not
> >touching the VGA I/O registers that control the write latches, is it
> >possible in principle to swizzle the format around in memory to make
> >regular writes work?
> >   
> 
> Not in software.  We can map pages, not cross address lines.

Hence "swizzle".  You rearrange the data inside the page for the
crossed address lines, and undo the swizzle later on demand.  That
doesn't work for other VGA magic though.

> Guests that use 16 color vga are usually of little interest.

Fair enough.  We can move on :-)

It's been said that the super-slow VGA writes triggering this thread
are in 256-colour mode, so there's a different problem.  That should
be fast, shouldn't it?

I vaguely recall that the extremely slow OS installs I've seen in KVM,
which were fast in QEMU (and fast in KVM after installing), were using
text mode.  Possibly it was Windows 2000, or Windows Server 2003.  Text
mode should be fast too, shouldn't it?  I suppose it's possible that
it just looked like text mode and was really 16-colour mode.

-- Jamie

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] Re: QEMU-KVM and video performance
  2010-04-21 19:08             ` Gerhard Wiesinger
@ 2010-04-21 21:30               ` Jamie Lokier
  -1 siblings, 0 replies; 52+ messages in thread
From: Jamie Lokier @ 2010-04-21 21:30 UTC (permalink / raw)
  To: Gerhard Wiesinger; +Cc: Avi Kivity, qemu-devel, kvm

Gerhard Wiesinger wrote:
> Hmmm. I'm very new to QEMU and KVM, but accessing QEMU's virtual hardware
> from KVM must already be possible (memory and port accesses go to nearly
> every virtual device), and those accesses end up in the C code in QEMU's
> hw/*.c directory. So the VGA memory area should also be accessible from
> KVM, but with QEMU's specialized and fast memory access.  Am I missing
> something?

What you're missing is that when KVM calls out to QEMU to handle
hw/*.c traps, that call is very slow.  The hardware-VM support is a bit
slow when the trap (VM exit) happens, and then the call from KVM in the
kernel up to QEMU in userspace is a bit slow again.  Then the whole path
is traversed back.  It adds up to a lot, for every I/O operation.

When QEMU does the same thing, it's fast because it's inside the same
process; it's just a function call.

That's why the most often called devices are emulated separately in
KVM's kernel code, things like the interrupt controller, timer chip
etc.  It's also why individual instructions that need help are
emulated in KVM's kernel code, instead of passing control up to QEMU
just for one instruction.

> BTW: It's still not clear to me why performance is so low with KVM, since
> the test case involves no window changes that could cause a (slow) page
> fault.

It sounds like a bug.  Avi gave suggestions about what to look for.
If it fixes my OS install speeds too, I'll be very happy :-)

In 256-colour mode, KVM should be writing to the VGA memory at high
speed a lot like normal RAM, not trapping at the hardware-VM level,
and not calling up to the code in hw/*.c for every byte.

You might double-check if your guest is using VGA "Mode X".  (See Wikipedia.)

That was a way to accelerate VGA on real PCs, but it will be slow in
KVM for the same reasons as 16-colour mode.

-- Jamie

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] Re: QEMU-KVM and video performance
  2010-04-21 20:49           ` Avi Kivity
@ 2010-04-22  5:37             ` Gerhard Wiesinger
  -1 siblings, 0 replies; 52+ messages in thread
From: Gerhard Wiesinger @ 2010-04-22  5:37 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Jamie Lokier, qemu-devel, kvm

On Wed, 21 Apr 2010, Avi Kivity wrote:

> On 04/21/2010 09:14 PM, Gerhard Wiesinger wrote:
>> 
>> Can you explain which KVM code files/functions are involved in handling 
>> the VGA memory window and page switching through the port write to the VGA 
>> window register (or is that part handled by QEMU)? A little bit of 
>> architecture explanation would be nice.
>
> qemu hw/vga.c and hw/cirrus_vga.c.  Boring functions like 
> vbe_ioport_write_data() and vga_ioport_write().
>

Yes, I had already looked at that code. As explained, those are very 
simple functions, which is why they are very fast in plain QEMU. But what I 
meant was: what is the call path from the KVM guest OS to hw/vga.c for 
memory and I/O accesses, and which parts are handled directly in hardware 
(to understand the speed gap and maybe to find a solution)?

>> 
>> BTW: Where in the KVM code is it decided whether "direct code" or 
>> "emulated device code" is used?
>> 
>
> Same place.  Look for calls to cpu_register_physical_memory().  If the last 
> argument was obtained by a call to cpu_register_io_memory(), then all writes 
> trap.  Otherwise, it was obtained by qemu_ram_alloc() and writes will not 
> trap (except the first write to a page in a 30ms window, used to note that 
> the page is dirty and needs redrawing).

Ok, that finally ends in:
cpu_register_physical_memory_offset()
...
// 0.12.3
     if (kvm_enabled())
         kvm_set_phys_mem(start_addr, size, phys_offset);
// KVM
     cpu_notify_set_memory(start_addr, size, phys_offset);
...

I/O is always done through:
cpu_register_io_memory => cpu_register_io_memory_fixed
cpu_register_io_memory_fixed()
...
No call to KVM?
...

Where is the trap from KVM to QEMU?

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] Re: QEMU-KVM and video performance
  2010-04-21 20:51         ` Avi Kivity
@ 2010-04-22  5:44             ` Gerhard Wiesinger
  2010-04-22  5:44             ` Gerhard Wiesinger
  1 sibling, 0 replies; 52+ messages in thread
From: Gerhard Wiesinger @ 2010-04-22  5:44 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Jamie Lokier, qemu-devel, kvm

On Wed, 21 Apr 2010, Avi Kivity wrote:

> On 04/21/2010 09:39 PM, Jamie Lokier wrote:
>> Avi Kivity wrote:
>> 
>>> Writes to vga in 16-color mode don't just set a memory location to a
>>> value; instead they change multiple memory locations.
>>> 
>> While code is just writing to the VGA memory, not reading(*) and not
>> touching the VGA I/O registers that control the write latches, is it
>> possible in principle to swizzle the format around in memory to make
>> regular writes work?
>> 
>
> Not in software.  We can map pages, not cross address lines.
>
>> (*) Reading should be ok for some settings of the write latches, I
>> think.
>> 
>> I wonder if guests of interest behave like that.
>> 
>
> Guests that use 16 color vga are usually of little interest.
>

I tested 256 color modes.

>>>> Is this a case where TCG would run significantly faster for code blocks
>>>> that have been detected to access the VGA memory?
>>>> 
>>> Yes.
>>> 
>> $ date
>> Wed Apr 21 19:37:38 2015
>> $ modprobe ktcg
>> 
>
> That's why VMware's software VMM was faster than its hardware VMM on the 
> initial iterations of VMX.
>

VMware Server 2.0 shows the same picture: calling INT 10h is fast, but 
writing to VGA memory is also very slow (1.0 MB/s). Can one switch to the 
old software VMM in VMware?

That was one of the reasons why I was looking for alternatives for running 
graphical DOS programs. Overall summary so far:
1.) QEMU without KVM: problems with the 286 DOS Extender instruction set, 
but fast VGA
2.) QEMU with KVM: 286 DOS Extender apps OK, but slow VGA memory 
performance
3.) VMware Server 2.0 under Linux: application OK, but slow VGA memory 
performance
4.) Virtual PC: problems with the 286 DOS Extender
5.) Bochs: works well, but very slow

It looks like VMware Server and QEMU with KVM may have the same 
architectural problem: every VGA write goes through the whole slow chain 
from the guest OS to the virtualization layer.

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] Re: QEMU-KVM and video performance
@ 2010-04-22  5:44             ` Gerhard Wiesinger
  0 siblings, 0 replies; 52+ messages in thread
From: Gerhard Wiesinger @ 2010-04-22  5:44 UTC (permalink / raw)
  To: Avi Kivity; +Cc: qemu-devel, kvm

On Wed, 21 Apr 2010, Avi Kivity wrote:

> On 04/21/2010 09:39 PM, Jamie Lokier wrote:
>> Avi Kivity wrote:
>> 
>>> Writes to vga in 16-color mode don't change set a memory location to a
>>> value, instead they change multiple memory locations.
>>> 
>> While code is just writing to the VGA memory, not reading(*) and not
>> touching the VGA I/O register that control the write latches, is it
>> possible in principle to swizzle the format around in memory to make
>> regular writes work?
>> 
>
> Not in software.  We can map pages, not cross address lines.
>
>> (*) Reading should be ok for some settings of the write latches, I
>> think.
>> 
>> I wonder if guests of interest behave like that.
>> 
>
> Guests that use 16 color vga are usually of little interest.
>

I tested 256 color modes.

>>>> Is this a case where TCG would run significantly faster for code blocks
>>>> that have been detected to access the VGA memory?
>>>> 
>>> Yes.
>>> 
>> $ date
>> Wed Apr 21 19:37:38 2015
>> $ modprobe ktcg
>> 
>
> That's why the vmware software vmm was faster than the hardware vmm for the 
> initial iterations of vmx.
>

On VMWare Server 2.0: same picture:
Calling INT10h interrupts is fast, Writing to VGA Memory is also very slow 
(1.0MB/s). Can one switch to the old software vmm in VMWare?

That was one of the reasons why I was looking for alternatives for 
graphical DOS programs. Overall summary so far:
1.) QEMU without KVM: problems with the 286 DOS Extender instruction set, 
but fast VGA
2.) QEMU with KVM: 286 DOS Extender apps OK, but slow VGA memory 
performance
3.) VMware Server 2.0 under Linux: applications OK, but slow VGA memory 
performance
4.) Virtual PC: problems with the 286 DOS Extender
5.) Bochs: works well, but very slow

It looks like VMware Server and QEMU with KVM may have the same 
architectural problem: every VGA write goes through the whole slow chain 
from the guest OS to the virtualization layer.

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/

* Re: [Qemu-devel] Re: QEMU-KVM and video performance
  2010-04-21 20:56           ` Avi Kivity
@ 2010-04-22  6:04             ` Gerhard Wiesinger
  -1 siblings, 0 replies; 52+ messages in thread
From: Gerhard Wiesinger @ 2010-04-22  6:04 UTC
  To: Avi Kivity; +Cc: Jamie Lokier, qemu-devel, kvm

On Wed, 21 Apr 2010, Avi Kivity wrote:

> On 04/21/2010 09:50 PM, Gerhard Wiesinger wrote:
>>>> I don't think changing VGA window is a problem because there are
>>>> 500.000-1Mio changes/s possible.
>>> 
>>> 1MB/s, 500k-1M changes/s.... Coincidence?  Is it taking a page fault
>>> or trap on every write?
>> 
>> 
>> To clarify:
>> Memory Performance writing to segment A000 is about 1 MB/s.
>
> That indicates a fault every write (assuming 8-16 bit writes).  If you're 
> using 256 color vga and not switching banks, this indicates a bug.
>

Yes, 256 color VGA and no bank switches involved.

>> Calling INT 10 set/get window function with different windows (e.g. 
>> toggling between window page 0 and 1) is about 500.000 to 1Mio function 
>> calls per second.
>
> That's surprisingly fast. I'd expect 100-200k/sec.
>

Sorry, I mixed up the numbers (INT 10h set/get window calls per second):
1.) QEMU-KVM: ~111k
2.) QEMU only: 500k-1Mio

> Please run kvm_stat and report output for both tests to confirm.
>

See below. The 2nd column is the per-second rate while the test is running.

>> 
>> To get real good VGA performance both parameters should be:
>> About >50MB/s for writes to segment A000
>> ~500.000 bank switches per second.
>
> First should be doable easily, second is borderline.
>
>> I think this is very easy to distinguish:
>> 1.) VGA Segment A000 is legacy and should be handled through QEMU and not 
>> through KVM (because it is much faster). Also 16 color modes should be 
>> fast enough there.
>> 2.) All other flat PCI memory accesses should be handled through KVM (there 
>> is a specialized driver loaded for that PCI device in the non legacy OS).
>> 
>> Is that easily possible?
>
> No.  Code can run in either qemu or kvm, not both.  You can switch between 
> them based on access statistics (early versions of qemu-kvm did that, without 
> the statistics part), but this isn't trivial.

Hmmm. OK, so there are 2 different opinions about the memory write performance:
is it easily possible or not possible?

Thnx for your help so far.

Ciao,
Gerhard

-- 
http://www.wiesinger.com/

-------------
kvm_stat
Please mount debugfs ('mount -t debugfs debugfs /sys/kernel/debug')
and ensure the kvm modules are loaded

lsmod|grep -i kvm
kvm_amd                38276  0
kvm                   162288  1 kvm_amd

mount|grep -i debug

=>
mount -t debugfs debugfs /sys/kernel/debug

int10perf: INT10h Performance tests:
kvm statistics
  efer_reload                  0       0
  exits                 37648629  456206
  fpu_reload             8512535  455983
  halt_exits                2084       0
  halt_wakeup               2047       0
  host_state_reload      8513213  456011
  hypercalls                   0       0
  insn_emulation        29182065       0
  insn_emulation_fail          0       0
  invlpg                       0       0
  io_exits               8386082  455975
  irq_exits                51713     214
  irq_injections           21797      36
  irq_window                   0       0
  largepages                   0       0
  mmio_exits              242781       0
  mmu_cache_miss             150       0
  mmu_flooded                  0       0
  mmu_pde_zapped               0       0
  mmu_pte_updated              0       0
  mmu_pte_write             8192       0
  mmu_recycled                 0       0
  mmu_shadow_zapped          151       0
  mmu_unsync                   0       0
  mmu_unsync_global            0       0
  nmi_injections               0       0
  nmi_window                   0       0
  pf_fixed                 16935       0
  pf_guest                     0       0
  remote_tlb_flush             2       0
  request_irq                  0       0
  request_nmi                  0       0
  signal_exits                 1       0
  tlb_flush                 2251       0

Running VGA memory tests within the same VGA page in VESA video mode 101h:
kvm statistics

  efer_reload                  0       0
  exits                 18470836  554582
  fpu_reload             2147833    3469
  halt_exits                2083       0
  halt_wakeup               2047       0
  host_state_reload      2148186    3470
  hypercalls                   0       0
  insn_emulation         7688203  554244
  insn_emulation_fail          0       0
  invlpg                       0       0
  io_exits              10701583      18
  irq_exits                50781     321
  irq_injections           25251      18
  irq_window                   0       0
  largepages                   0       0
  mmio_exits              162847    3241
  mmu_cache_miss             154       0
  mmu_flooded                  0       0
  mmu_pde_zapped               0       0
  mmu_pte_updated              0       0
  mmu_pte_write             8192       0
  mmu_recycled                 0       0
  mmu_shadow_zapped          155       0
  mmu_unsync                   0       0
  mmu_unsync_global            0       0
  nmi_injections               0       0
  nmi_window                   0       0
  pf_fixed                 16936       0
  pf_guest                     0       0
  remote_tlb_flush             5       0
  request_irq                  0       0
  request_nmi                  0       0
  signal_exits                 1       0
  tlb_flush                  112       0

* Re: [Qemu-devel] Re: QEMU-KVM and video performance
  2010-04-21 21:30               ` Jamie Lokier
@ 2010-04-22  6:12                 ` Gerhard Wiesinger
  -1 siblings, 0 replies; 52+ messages in thread
From: Gerhard Wiesinger @ 2010-04-22  6:12 UTC
  To: Jamie Lokier; +Cc: Avi Kivity, qemu-devel, kvm

On Wed, 21 Apr 2010, Jamie Lokier wrote:

> Gerhard Wiesinger wrote:
>> Hmmm. I'm very new to QEMU and KVM but at least accessing the virtual HW
>> of QEMU even from KVM must be possible (e.g. memory and port accesses are
>> done on nearly every virtual device) and therefore I'm ending in C code in
>> the QEMU hw/*.c directory. Therefore also the VGA memory area should be
>> able to be accessible from KVM but with the specialized and fast memory
>> access of QEMU.  Am I missing something?
>
> What you're missing is that when KVM calls out to QEMU to handle
> hw/*.c traps, that call is very slow.  It's because the hardware-VM
> support is a bit slow when the trap happens, and then the call
> from KVM in the kernel up to QEMU is a bit slow again.  Then all the
> way back.  It adds up to a lot, for every I/O operation.

Isn't that a general problem of KVM virtualization (or of hardware 
virtualization in general)? Is it CPU dependent (AMD vs. Intel)?

> When QEMU does the same thing, it's fast because it's inside the same
> process; it's just a function call.

Yes, that's clear to me.

> That's why the most often called devices are emulated separately in
> KVM's kernel code, things like the interrupt controller, timer chip
> etc.  It's also why individual instructions that need help are
> emulated in KVM's kernel code, instead of passing control up to QEMU
> just for one instruction.

>> BTW: Still not clear why performance is low with KVM since there are
>> no window changes in the testcase involved which could cause a (slow) page
>> fault.
>
> It sounds like a bug.  Avi gave suggestions about what to look for.
> If it fixes my OS install speeds too, I'll be very happy :-)
>

See other post for details.

> In 256-colour mode, KVM should be writing to the VGA memory at high
> speed a lot like normal RAM, not trapping at the hardware-VM level,
> and not calling up to the code in hw/*.c for every byte.
>


Yes, that matches my picture: 256 color mode should be only a memory write (16 
color mode is more difficult because the pixel/byte mapping is not the same).
But it looks like this isn't the case in this test scenario.

> You might double-check if your guest is using VGA "Mode X".  (See Wikipedia.)
>

Code:
    inregs.x.ax = 0x4F02;                /* VBE function 02h: set mode */
    inregs.x.bx = 0xC000 | 0x101;        /* mode 101h; 0xC000 sets bit 15 (don't clear) and bit 14 (linear FB) */
    int86x(INT_SCREEN, &inregs, &outregs, &outsregs);   /* Call INT 10h */

I can post the whole source code/executables if you want (I already planned 
to publish my whole set of tools, but I have to do some cleanup before 
publishing the whole package).

> That was a way to accelerate VGA on real PCs, but it will be slow in
> KVM for the same reasons as 16-colour mode.

Which way do you mean?

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/

* Re: [Qemu-devel] Re: QEMU-KVM and video performance
  2010-04-22  5:37             ` Gerhard Wiesinger
@ 2010-04-22  6:57               ` Avi Kivity
  -1 siblings, 0 replies; 52+ messages in thread
From: Avi Kivity @ 2010-04-22  6:57 UTC
  To: Gerhard Wiesinger; +Cc: Jamie Lokier, qemu-devel, kvm

On 04/22/2010 08:37 AM, Gerhard Wiesinger wrote:
> On Wed, 21 Apr 2010, Avi Kivity wrote:
>
>> On 04/21/2010 09:14 PM, Gerhard Wiesinger wrote:
>>>
>>> Can you explain which code files/functions of KVM are involved in 
>>> handling VGA memory window and page switching through the port write 
>>> to the VGA window register (or is that part handled through QEMU), 
>>> so a little bit of architecture explanation would be nice?
>>
>> qemu hw/vga.c and hw/cirrus_vga.c.  Boring functions like 
>> vbe_ioport_write_data() and vga_ioport_write().
>>
>
> Yes, I was already in that part of the code, and those are very simple 
> functions, as already explained, and are therefore very fast in plain 
> QEMU. But I meant: How is the calling path from KVM guest OS to 
> hw/vga.c for memory and I/O accesses, and which parts are done in 
> hardware directly (to understand the speed gap and maybe to find a 
> solution)?

The speed gap is mostly due to hardware constraints (it takes ~2000 
cycles for an exit from guest mode, plus we need to switch a few msrs to 
get to userspace).

See vmx_vcpu_run(); the vmresume instruction is where an exit starts.

>
>>>
>>> BTW: In which KVM code parts is decided where "direct code" or an 
>>> "emulated device code" is used?
>>>
>>
>> Same place.  Look for calls to cpu_register_physical_memory().  If 
>> the last argument was obtained by a call to cpu_register_io_memory(), 
>> then all writes trap.  Otherwise, it was obtained by qemu_ram_alloc() 
>> and writes will not trap (except the first write to a page in a 30ms 
>> window, used to note that the page is dirty and needs redrawing).
>
> Ok, that finally ends in:
> cpu_register_physical_memory_offset()
> ...
> // 0.12.3
>     if (kvm_enabled())
>         kvm_set_phys_mem(start_addr, size, phys_offset);
> // KVM
>     cpu_notify_set_memory(start_addr, size, phys_offset);
> ...
>
> I/O is always done through:
> cpu_register_io_memory => cpu_register_io_memory_fixed
> cpu_register_io_memory_fixed()
> ...
> No call to KVM?

kvm_set_phys_mem() is a call to kvm.

> ...
>
> Where is the trap from KVM to QEMU?

See kvm_cpu_exec().

-- 
error compiling committee.c: too many arguments to function


* Re: [Qemu-devel] Re: QEMU-KVM and video performance
  2010-04-22  6:04             ` Gerhard Wiesinger
@ 2010-04-22  7:03               ` Avi Kivity
  -1 siblings, 0 replies; 52+ messages in thread
From: Avi Kivity @ 2010-04-22  7:03 UTC
  To: Gerhard Wiesinger; +Cc: Jamie Lokier, qemu-devel, kvm

On 04/22/2010 09:04 AM, Gerhard Wiesinger wrote:
> On Wed, 21 Apr 2010, Avi Kivity wrote:
>
>> On 04/21/2010 09:50 PM, Gerhard Wiesinger wrote:
>>>>> I don't think changing VGA window is a problem because there are
>>>>> 500.000-1Mio changes/s possible.
>>>>
>>>> 1MB/s, 500k-1M changes/s.... Coincidence?  Is it taking a page fault
>>>> or trap on every write?
>>>
>>>
>>> To clarify:
>>> Memory Performance writing to segment A000 is about 1 MB/s.
>>
>> That indicates a fault every write (assuming 8-16 bit writes).  If 
>> you're using 256 color vga and not switching banks, this indicates a 
>> bug.
>>
>
> Yes, 256 color VGA and no bank switches involved.
>
>>> Calling INT 10 set/get window function with different windows (e.g. 
>>> toggling between window page 0 and 1) is about 500.000 to 1Mio 
>>> function calls per second.
>>
>> That's surprisingly fast. I'd expect 100-200k/sec.
>>
>
> Sorry, I mixed up the numbers:
> 1.) QEMU-KVM: ~111k
> 2.) QEMU only: 500k-1Mio
>
>> Please run kvm_stat and report output for both tests to confirm.
>>
>
> See below. 2nd column is per second statistic when running the test.

  efer_reload                  0       0
  exits                 18470836  554582
  fpu_reload             2147833    3469
  halt_exits                2083       0
  halt_wakeup               2047       0
  host_state_reload      2148186    3470
  hypercalls                   0       0
  insn_emulation         7688203  554244

This indicates that kvm is emulating instead of direct mapping.  That's 
probably a bug.  If you fix it, performance will increase dramatically.

>
>>>
>>> To get real good VGA performance both parameters should be:
>>> About >50MB/s for writes to segment A000
>>> ~500.000 bank switches per second.
>>
>> First should be doable easily, second is borderline.
>>
>>> I think this is very easy to distinguish:
>>> 1.) VGA Segment A000 is legacy and should be handled through QEMU 
>>> and not through KVM (because it is much faster). Also 16 color 
>>> modes should be fast enough there.
>>> 2.) All other flat PCI memory accesses should be handled through KVM 
>>> (there is a specialized driver loaded for that PCI device in the non 
>>> legacy OS).
>>>
>>> Is that easily possible?
>>
>> No.  Code can run in either qemu or kvm, not both.  You can switch 
>> between them based on access statistics (early versions of qemu-kvm 
>> did that, without the statistics part), but this isn't trivial.
>
> Hmmm. Ok, 2 different opinions about the memory write performance:
> Easily or not possible?

Switching between tcg and kvm is hard, but not needed.  For 256 color 
modes, direct map is possible and should yield good performance.  Bank 
switching can be improved perhaps 3x, but will never be fast.

-- 
error compiling committee.c: too many arguments to function


* Re: [Qemu-devel] Re: QEMU-KVM and video performance
  2010-04-22  7:03               ` Avi Kivity
@ 2010-05-09 19:35                 ` Gerhard Wiesinger
  -1 siblings, 0 replies; 52+ messages in thread
From: Gerhard Wiesinger @ 2010-05-09 19:35 UTC
  To: Avi Kivity; +Cc: Jamie Lokier, qemu-devel, kvm

On Thu, 22 Apr 2010, Avi Kivity wrote:

> On 04/22/2010 09:04 AM, Gerhard Wiesinger wrote:
>> On Wed, 21 Apr 2010, Avi Kivity wrote:
>> 
>>> On 04/21/2010 09:50 PM, Gerhard Wiesinger wrote:
>>>>>> I don't think changing VGA window is a problem because there are
>>>>>> 500.000-1Mio changes/s possible.
>>>>> 
>>>>> 1MB/s, 500k-1M changes/s.... Coincidence?  Is it taking a page fault
>>>>> or trap on every write?
>>>> 
>>>> 
>>>> To clarify:
>>>> Memory Performance writing to segment A000 is about 1 MB/s.
>>> 
>>> That indicates a fault every write (assuming 8-16 bit writes).  If you're 
>>> using 256 color vga and not switching banks, this indicates a bug.
>>> 
>> 
>> Yes, 256 color VGA and no bank switches involved.
>> 
>>>> Calling INT 10 set/get window function with different windows (e.g. 
>>>> toggling between window page 0 and 1) is about 500.000 to 1Mio function 
>>>> calls per second.
>>> 
>>> That's surprisingly fast. I'd expect 100-200k/sec.
>>> 
>> 
>> Sorry, I mixed up the numbers:
>> 1.) QEMU-KVM: ~111k
>> 2.) QEMU only: 500k-1Mio
>> 
>>> Please run kvm_stat and report output for both tests to confirm.
>>> 
>> 
>> See below. 2nd column is per second statistic when running the test.
>
> efer_reload                  0       0
> exits                 18470836  554582
> fpu_reload             2147833    3469
> halt_exits                2083       0
> halt_wakeup               2047       0
> host_state_reload      2148186    3470
> hypercalls                   0       0
> insn_emulation         7688203  554244
>
> This indicates that kvm is emulating instead of direct mapping.  That's 
> probably a bug.  If you fix it, performance will increase dramatically.

Where should I start here?
Any ideas on how to do it?

One idea: move the following hw/vga.c functions
vga_mem_readb
vga_mem_readw
vga_mem_readl
vga_mem_writeb
vga_mem_writew
vga_mem_writel
into KVM to avoid switching from KVM to QEMU (I can write C code, even 
kernel code, but I'm not familiar with KVM). How would I do that?

>>>> To get real good VGA performance both parameters should be:
>>>> About >50MB/s for writes to segment A000
>>>> ~500.000 bank switches per second.
>>> 
>>> First should be doable easily, second is borderline.
>>> 
>>>> I think this is very easy to distinguish:
>>>> 1.) VGA Segment A000 is legacy and should be handled through QEMU and not 
>>>> through KVM (because it is much faster). Also 16 color modes should 
>>>> be fast enough there.
>>>> 2.) All other flat PCI memory accesses should be handled through KVM 
>>>> (there is a specialized driver loaded for that PCI device in the non 
>>>> legacy OS).
>>>> 
>>>> Is that easily possible?
>>> 
>>> No.  Code can run in either qemu or kvm, not both.  You can switch between 
>>> them based on access statistics (early versions of qemu-kvm did that, 
>>> without the statistics part), but this isn't trivial.
>> 
>> Hmm. OK, two different opinions about the memory write performance:
>> easy, or not possible at all?
>
> Switching between tcg and kvm is hard, but not needed.  For 256 color modes, 
> direct map is possible and should yield good performance.  Bank switching can 
> be improved perhaps 3x, but will never be fast.

Where should I start on KVM performance for bank switching (256-color 
mode, e.g. the BIOS writing to the VGA window I/O port to switch banks)?
Any ideas how to improve it (architecture for the change)?
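
For context on why bank switching stays comparatively expensive: the window itself is only an address translation, but every bank change moves where A000 points in VRAM, and with direct mapping that presumably means remapping guest memory (the cirrus code in hw/cirrus_vga.c does this with cpu_register_physical_memory in map_linear_vram). A minimal sketch of the translation, assuming the two 32 KiB windows at 0xA0000 and 0xA8000 with one bank base each that the cirrus code uses; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define WINDOW_BASE 0xA0000u /* start of the legacy VGA window */
#define WINDOW_SIZE 0x8000u  /* two 32 KiB windows: 0xA0000 and 0xA8000 */

/* Hypothetical banking state: one VRAM bank base per 32 KiB window. */
struct bank_state {
    uint32_t bank_base[2];
    uint32_t vram_size;
};

/* Translate a guest physical address in 0xA0000-0xAFFFF to a VRAM offset.
 * Returns -1 if the address is outside the window or beyond VRAM. */
int64_t bank_translate(const struct bank_state *s, uint32_t gpa)
{
    if (gpa < WINDOW_BASE || gpa >= WINDOW_BASE + 2 * WINDOW_SIZE)
        return -1;
    uint32_t off = gpa - WINDOW_BASE;
    uint32_t win = off / WINDOW_SIZE; /* 0 for A0000, 1 for A8000 */
    uint64_t vram_off = (uint64_t)s->bank_base[win] + off % WINDOW_SIZE;
    if (vram_off >= s->vram_size)
        return -1;
    return (int64_t)vram_off;
}

/* In this model a bank switch is just a store; in a VM it also costs an
 * exit for the port write and, with direct mapping, presumably a remap. */
void bank_switch(struct bank_state *s, int win, uint32_t new_base)
{
    assert(win == 0 || win == 1);
    s->bank_base[win] = new_base;
}
```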

Thnx, and sorry for the long delay, I was busy.

Ciao,
Gerhard

--
http://www.wiesinger.com/

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] Re: QEMU-KVM and video performance
  2010-05-09 19:35                 ` Gerhard Wiesinger
@ 2010-05-10  7:32                   ` Avi Kivity
  -1 siblings, 0 replies; 52+ messages in thread
From: Avi Kivity @ 2010-05-10  7:32 UTC (permalink / raw)
  To: Gerhard Wiesinger; +Cc: Jamie Lokier, qemu-devel, kvm

On 05/09/2010 10:35 PM, Gerhard Wiesinger wrote:
>>>
>>>> Please run kvm_stat and report output for both tests to confirm.
>>>>
>>>
>>> See below. The 2nd column is the per-second rate while the test is running.
>>
>> efer_reload                  0       0
>> exits                 18470836  554582
>> fpu_reload             2147833    3469
>> halt_exits                2083       0
>> halt_wakeup               2047       0
>> host_state_reload      2148186    3470
>> hypercalls                   0       0
>> insn_emulation         7688203  554244
>>
>> This indicates that kvm is emulating instead of direct mapping.  
>> That's probably a bug.  If you fix it, performance will increase 
>> dramatically.
>
>
> Where should I start?
> Any ideas on how to approach this?
>
> One idea: move the hw/vga.c functions
> vga_mem_readb
> vga_mem_readw
> vga_mem_readl
> vga_mem_writeb
> vga_mem_writew
> vga_mem_writel
> into KVM to avoid switching from KVM to QEMU (I can write C code, even 
> kernel code, but I'm not familiar with KVM). How would I do that?

That is already done (generically); it's called coalesced MMIO.  You 
only have 3470 qemu exits/sec compared to 554244 kvm writes/sec.
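
Roughly, coalesced MMIO means that for registered regions the kernel appends each guest write to a ring buffer shared with userspace instead of returning to QEMU, and QEMU replays the queued writes the next time it does run. A simplified single-threaded model of that idea follows; the layout only loosely mirrors struct kvm_coalesced_mmio_ring in <linux/kvm.h>, and the ring size, type names, and function names are illustrative assumptions, not the kernel's API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RING_ENTRIES 64 /* illustrative; the real ring fills one shared page */

struct mmio_entry {
    uint64_t phys_addr;
    uint32_t len;
    uint8_t data[8];
};

/* Single-producer ("kernel") / single-consumer ("QEMU") ring. */
struct mmio_ring {
    uint32_t first; /* next entry userspace will consume */
    uint32_t last;  /* next slot the kernel will fill */
    struct mmio_entry entries[RING_ENTRIES];
};

/* "Kernel" side: queue a write instead of returning to userspace.
 * Returns 0 when the ring is full, i.e. a real userspace exit is needed. */
int ring_push(struct mmio_ring *r, uint64_t addr, const void *data, uint32_t len)
{
    uint32_t next = (r->last + 1) % RING_ENTRIES;
    if (next == r->first || len > sizeof r->entries[0].data)
        return 0;
    r->entries[r->last].phys_addr = addr;
    r->entries[r->last].len = len;
    memcpy(r->entries[r->last].data, data, len);
    r->last = next;
    return 1;
}

/* "QEMU" side: replay all queued writes into a flat byte array standing in
 * for VGA memory that starts at guest address base. Returns entries drained. */
uint32_t ring_drain(struct mmio_ring *r, uint8_t *vmem, uint64_t base, uint64_t size)
{
    uint32_t n = 0;
    while (r->first != r->last) {
        const struct mmio_entry *e = &r->entries[r->first];
        assert(e->phys_addr >= base && e->phys_addr + e->len <= base + size);
        memcpy(vmem + (e->phys_addr - base), e->data, e->len);
        r->first = (r->first + 1) % RING_ENTRIES;
        n++;
    }
    return n;
}
```

Note what this does and does not save: the trip to userspace is avoided, but each write still causes a VM exit and goes through the in-kernel emulator, which fits the ~554k exits/s against ~3.5k host state reloads/s quoted above. Only a direct mapping removes the exits themselves.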

>> Switching between tcg and kvm is hard, but not needed.  For 256 color 
>> modes, direct map is possible and should yield good performance.  
>> Bank switching can be improved perhaps 3x, but will never be fast.
>
> Where should I start on KVM performance for bank switching (256-color 
> mode, e.g. the BIOS writing to the VGA window I/O port to switch banks)?
> Any ideas how to improve it (architecture for the change)?

For 256 color mode the first priority is to find out why direct mapping 
is not used.  I'd suggest tracing the code that makes this decision (in 
hw/*vga.c) and seeing if it's right or not.
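
As a starting point for that tracing, here is roughly the register state under which mapping the A000 window straight onto VRAM would be safe for 256-color (chain-4) modes. This is a sketch under stated assumptions: it looks only at the standard VGA bits (Sequencer index 04h bit 3 = Chain-4, Sequencer index 02h = map mask, Graphics Controller index 06h bits 3:2 = memory map select), the helper name is hypothetical, and it is not the actual decision logic in hw/vga.c or hw/cirrus_vga.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SR_MAP_MASK    0x02 /* Sequencer: plane write enable, low 4 bits */
#define SR_MEMORY_MODE 0x04 /* Sequencer: bit 3 = Chain-4 */
#define GR_MISC        0x06 /* Graphics Controller: bits 3:2 = memory map select */

/* Hypothetical helper: could the A000 window be mapped straight to VRAM,
 * so that guest writes no longer need to be trapped and emulated? */
bool vga_window_direct_mappable(const uint8_t sr[8], const uint8_t gr[16])
{
    bool chain4 = (sr[SR_MEMORY_MODE] & 0x08) != 0;
    bool all_planes = (sr[SR_MAP_MASK] & 0x0f) == 0x0f;
    unsigned map_select = (gr[GR_MISC] >> 2) & 3;
    /* 0: A0000-BFFFF, 1: A0000-AFFFF; 2 and 3 are the B000/B800 windows. */
    bool window_at_a000 = map_select == 0 || map_select == 1;
    return chain4 && all_planes && window_at_a000;
}
```

For mode 13h the usual values (SR04 = 0Eh, SR02 = 0Fh, GR06 = 05h) pass the check, while planar modes such as 12h (SR04 = 06h) do not. A real mapping would also need dirty logging on the window so the display code sees the writes, which is what vga_dirty_log_start sets up.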

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: Re: QEMU-KVM and video performance
  2010-05-10  7:32                   ` Avi Kivity
@ 2010-05-12  6:14                     ` Gerhard Wiesinger
  -1 siblings, 0 replies; 52+ messages in thread
From: Gerhard Wiesinger @ 2010-05-12  6:14 UTC (permalink / raw)
  To: Avi Kivity; +Cc: qemu-devel, kvm

[-- Attachment #1: Type: TEXT/PLAIN, Size: 11246 bytes --]

On Mon, 10 May 2010, Avi Kivity wrote:

> On 05/09/2010 10:35 PM, Gerhard Wiesinger wrote:
>>>> 
>
> For 256 color mode the first priority is to find out why direct mapping is 
> not used.  I'd suggest tracing the code that makes this decision (in 
> hw/*vga.c) and seeing if it's right or not.

I think this is because A000 is not initialized for KVM (see the log below 
and the attached logging patch).

Command-line switches tried, without success:
-vga std (log is from this one)
-vga cirrus
-vga vmware

I also tried to force the mapping with the following line (commented out 
in the patch), but some errors occur (see the 2nd log below) and 
performance is still low at ~1 MB/s:
s->lfb_vram_mapped = 1;
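
One plausible reading of the BUG lines in the 2nd log: dirty logging can only be enabled on a range that KVM knows as a RAM memory slot, and as long as 0xA0000-0xAFFFF is registered as emulated MMIO there is no slot for it, so forcing lfb_vram_mapped = 1 only hits the error path. The sketch below models such an exact-match slot lookup; the slot table, its contents, and the function names are hypothetical and only loosely follow the check in qemu-kvm's kvm_dirty_pages_log_change.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the RAM slots KVM knows about. */
struct mem_slot {
    uint64_t start;
    uint64_t size;
};

/* Return the slot covering exactly [start, start + size), or NULL.
 * Emulated MMIO regions have no slot at all. */
const struct mem_slot *lookup_matching_slot(const struct mem_slot *slots, size_t n,
                                            uint64_t start, uint64_t size)
{
    for (size_t i = 0; i < n; i++)
        if (slots[i].start == start && slots[i].size == size)
            return &slots[i];
    return NULL;
}

/* Enabling dirty logging fails (-1) when no slot matches; in this model that
 * is the case behind "BUG: kvm_dirty_pages_log_change: invalid parameters". */
int log_start(const struct mem_slot *slots, size_t n, uint64_t start, uint64_t size)
{
    return lookup_matching_slot(slots, n, start, size) ? 0 : -1;
}
```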

During testing, the following line occurs repeatedly:
vga_dirty_log_start
vga_dirty_log_start
vga_dirty_log_start
vga_dirty_log_start
vga_dirty_log_start
vga_dirty_log_start
...

Any ideas? Can you reproduce it?

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/

vga_dirty_log_start
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE0000000, len=0x01000000
vga_dirty_log_start
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE0000000, len=0x01000000
vga_dirty_log_start
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE0000000, len=0x01000000
vga_dirty_log_start
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE0000000, len=0x01000000
vga_dirty_log_start
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE0000000, len=0x01000000
vga_dirty_log_start
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE0000000, len=0x01000000
vga_dirty_log_start
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE0000000, len=0x01000000
vga_dirty_log_start
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE0000000, len=0x01000000
vga_dirty_log_start
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE0000000, len=0x01000000
vga_dirty_log_start
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE0000000, len=0x01000000
vga_dirty_log_start
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE0000000, len=0x01000000
vga_dirty_log_start
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE0000000, len=0x01000000
vga_dirty_log_start
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE0000000, len=0x01000000
vga_dirty_log_start
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE0000000, len=0x01000000
vga_dirty_log_start
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE0000000, len=0x01000000
vga_dirty_log_start
vga_dirty_log_start_mapping_map_addr, start=0xF0000000, len=0x01000000
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE0000000, len=0x01000000
vga_dirty_log_start
vga_dirty_log_start
vga_dirty_log_start_mapping_map_addr, start=0xF0000000, len=0x01000000
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE0000000, len=0x01000000
vga_dirty_log_start
vga_dirty_log_start_mapping_map_addr, start=0xF0000000, len=0x01000000
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE0000000, len=0x01000000
--------------------------------------------------------------------------------------
vga_dirty_log_start
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x000A0000, len=0x00008000
BUG: kvm_dirty_pages_log_change: invalid parameters 00000000000a0000-00000000000a7fff
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x000A8000, len=0x00008000
BUG: kvm_dirty_pages_log_change: invalid parameters 00000000000a8000-00000000000affff
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE0000000, len=0x01000000
vga_dirty_log_start
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x000A0000, len=0x00008000
BUG: kvm_dirty_pages_log_change: invalid parameters 00000000000a0000-00000000000a7fff
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x000A8000, len=0x00008000
BUG: kvm_dirty_pages_log_change: invalid parameters 00000000000a8000-00000000000affff
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE0000000, len=0x01000000
vga_dirty_log_start
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x000A0000, len=0x00008000
BUG: kvm_dirty_pages_log_change: invalid parameters 00000000000a0000-00000000000a7fff
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x000A8000, len=0x00008000
BUG: kvm_dirty_pages_log_change: invalid parameters 00000000000a8000-00000000000affff
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE0000000, len=0x01000000
vga_dirty_log_start
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x000A0000, len=0x00008000
BUG: kvm_dirty_pages_log_change: invalid parameters 00000000000a0000-00000000000a7fff
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x000A8000, len=0x00008000
BUG: kvm_dirty_pages_log_change: invalid parameters 00000000000a8000-00000000000affff
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE0000000, len=0x01000000
vga_dirty_log_start
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x000A0000, len=0x00008000
BUG: kvm_dirty_pages_log_change: invalid parameters 00000000000a0000-00000000000a7fff
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x000A8000, len=0x00008000
BUG: kvm_dirty_pages_log_change: invalid parameters 00000000000a8000-00000000000affff
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE0000000, len=0x01000000
vga_dirty_log_start
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x000A0000, len=0x00008000
BUG: kvm_dirty_pages_log_change: invalid parameters 00000000000a0000-00000000000a7fff
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x000A8000, len=0x00008000
BUG: kvm_dirty_pages_log_change: invalid parameters 00000000000a8000-00000000000affff
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE0000000, len=0x01000000
vga_dirty_log_start
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x000A0000, len=0x00008000
BUG: kvm_dirty_pages_log_change: invalid parameters 00000000000a0000-00000000000a7fff
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x000A8000, len=0x00008000
BUG: kvm_dirty_pages_log_change: invalid parameters 00000000000a8000-00000000000affff
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE0000000, len=0x01000000
vga_dirty_log_start
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x000A0000, len=0x00008000
BUG: kvm_dirty_pages_log_change: invalid parameters 00000000000a0000-00000000000a7fff
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x000A8000, len=0x00008000
BUG: kvm_dirty_pages_log_change: invalid parameters 00000000000a8000-00000000000affff
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE0000000, len=0x01000000
vga_dirty_log_start
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x000A0000, len=0x00008000
BUG: kvm_dirty_pages_log_change: invalid parameters 00000000000a0000-00000000000a7fff
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x000A8000, len=0x00008000
BUG: kvm_dirty_pages_log_change: invalid parameters 00000000000a8000-00000000000affff
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE0000000, len=0x01000000
vga_dirty_log_start
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x000A0000, len=0x00008000
BUG: kvm_dirty_pages_log_change: invalid parameters 00000000000a0000-00000000000a7fff
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x000A8000, len=0x00008000
BUG: kvm_dirty_pages_log_change: invalid parameters 00000000000a8000-00000000000affff
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE0000000, len=0x01000000
vga_dirty_log_start
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x000A0000, len=0x00008000
BUG: kvm_dirty_pages_log_change: invalid parameters 00000000000a0000-00000000000a7fff
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x000A8000, len=0x00008000
BUG: kvm_dirty_pages_log_change: invalid parameters 00000000000a8000-00000000000affff
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE0000000, len=0x01000000
vga_dirty_log_start
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x000A0000, len=0x00008000
BUG: kvm_dirty_pages_log_change: invalid parameters 00000000000a0000-00000000000a7fff
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x000A8000, len=0x00008000
BUG: kvm_dirty_pages_log_change: invalid parameters 00000000000a8000-00000000000affff
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE0000000, len=0x01000000
vga_dirty_log_start
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x000A0000, len=0x00008000
BUG: kvm_dirty_pages_log_change: invalid parameters 00000000000a0000-00000000000a7fff
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x000A8000, len=0x00008000
BUG: kvm_dirty_pages_log_change: invalid parameters 00000000000a8000-00000000000affff
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE0000000, len=0x01000000
vga_dirty_log_start
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x000A0000, len=0x00008000
BUG: kvm_dirty_pages_log_change: invalid parameters 00000000000a0000-00000000000a7fff
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x000A8000, len=0x00008000
BUG: kvm_dirty_pages_log_change: invalid parameters 00000000000a8000-00000000000affff
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE0000000, len=0x01000000
vga_dirty_log_start
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x000A0000, len=0x00008000
BUG: kvm_dirty_pages_log_change: invalid parameters 00000000000a0000-00000000000a7fff
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x000A8000, len=0x00008000
BUG: kvm_dirty_pages_log_change: invalid parameters 00000000000a8000-00000000000affff
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE0000000, len=0x01000000
vga_dirty_log_start
vga_dirty_log_start_mapping_map_addr, start=0xF0000000, len=0x01000000
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x000A0000, len=0x00008000
BUG: kvm_dirty_pages_log_change: invalid parameters 00000000000a0000-00000000000a7fff
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x000A8000, len=0x00008000
BUG: kvm_dirty_pages_log_change: invalid parameters 00000000000a8000-00000000000affff
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE0000000, len=0x01000000
vga_dirty_log_start
BUG: kvm_dirty_pages_log_change: invalid parameters 00000000000a0000-00000000000a7fff
BUG: kvm_dirty_pages_log_change: invalid parameters 00000000000a8000-00000000000affff
vga_dirty_log_start
vga_dirty_log_start_mapping_map_addr, start=0xF0000000, len=0x01000000
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x000A0000, len=0x00008000
BUG: kvm_dirty_pages_log_change: invalid parameters 00000000000a0000-00000000000a7fff
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x000A8000, len=0x00008000
BUG: kvm_dirty_pages_log_change: invalid parameters 00000000000a8000-00000000000affff
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE0000000, len=0x01000000
BUG: kvm_dirty_pages_log_change: invalid parameters 00000000000a0000-00000000000a7fff
BUG: kvm_dirty_pages_log_change: invalid parameters 00000000000a8000-00000000000affff
vga_dirty_log_start
vga_dirty_log_start_mapping_map_addr, start=0xF0000000, len=0x01000000
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x000A0000, len=0x00008000
BUG: kvm_dirty_pages_log_change: invalid parameters 00000000000a0000-00000000000a7fff
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x000A8000, len=0x00008000
BUG: kvm_dirty_pages_log_change: invalid parameters 00000000000a8000-00000000000affff
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE0000000, len=0x01000000

[-- Attachment #2: Type: TEXT/PLAIN, Size: 2320 bytes --]

diff --git a/hw/cirrus_vga.c b/hw/cirrus_vga.c
index 571044f..68c6083 100644
--- a/hw/cirrus_vga.c
+++ b/hw/cirrus_vga.c
@@ -2581,6 +2581,7 @@ static void map_linear_vram(CirrusVGAState *s)
         cpu_register_physical_memory(isa_mem_base + 0xa8000, 0x8000,
                                     (s->vga.vram_offset + s->cirrus_bank_base[1]) | IO_MEM_RAM);
 
+        printf("Cirrus VGA: lfb_vram_mapped=1\n");
         s->vga.lfb_vram_mapped = 1;
     }
     else {
diff --git a/hw/vga.c b/hw/vga.c
index a5e2387..cb8a209 100644
--- a/hw/vga.c
+++ b/hw/vga.c
@@ -1612,15 +1612,19 @@ static void mark_dirty(target_phys_addr_t start, target_phys_addr_t len)
 
 void vga_dirty_log_start(VGACommonState *s)
 {
+    printf("vga_dirty_log_start\n");
     if (kvm_enabled() && s->map_addr)
         if (!s1) {
+            printf("vga_dirty_log_start_mapping_map_addr, start=0x%08X, len=0x%08X\n", (unsigned int)s->map_addr, (unsigned int)(s->map_end - s->map_addr));
             kvm_log_start(s->map_addr, s->map_end - s->map_addr);
             mark_dirty(s->map_addr, s->map_end - s->map_addr);
             s1 = 1;
         }
     if (kvm_enabled() && s->lfb_vram_mapped) {
         if (!s2) {
+            printf("vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x%08X, len=0x%08X\n", (unsigned int)(isa_mem_base + 0xa0000), 0x8000);
             kvm_log_start(isa_mem_base + 0xa0000, 0x8000);
+            printf("vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x%08X, len=0x%08X\n", (unsigned int)(isa_mem_base + 0xa8000), 0x8000);
             kvm_log_start(isa_mem_base + 0xa8000, 0x8000);
             mark_dirty(isa_mem_base + 0xa0000, 0x10000);
         }
@@ -1630,6 +1634,7 @@ void vga_dirty_log_start(VGACommonState *s)
 #ifdef CONFIG_BOCHS_VBE
     if (kvm_enabled() && s->vbe_mapped) {
         if (!s3) {
+            printf("vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x%08X, len=0x%08X\n", VBE_DISPI_LFB_PHYSICAL_ADDRESS, s->vram_size);
             kvm_log_start(VBE_DISPI_LFB_PHYSICAL_ADDRESS, s->vram_size);
         }
         s3 = 1;
@@ -1965,6 +1970,7 @@ void vga_common_reset(VGACommonState *s)
     s->map_addr = 0;
     s->map_end = 0;
     s->lfb_vram_mapped = 0;
+//    s->lfb_vram_mapped = 1;
     s->bios_offset = 0;
     s->bios_size = 0;
     s->sr_index = 0;

^ permalink raw reply related	[flat|nested] 52+ messages in thread

BUG: kvm_dirty_pages_log_change: invalid parameters 00000000000a8000-00000000000affff
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE0000000, len=0x01000000
BUG: kvm_dirty_pages_log_change: invalid parameters 00000000000a0000-00000000000a7fff
BUG: kvm_dirty_pages_log_change: invalid parameters 00000000000a8000-00000000000affff
vga_dirty_log_start
vga_dirty_log_start_mapping_map_addr, start=0xF0000000, len=0x01000000
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x000A0000, len=0x00008000
BUG: kvm_dirty_pages_log_change: invalid parameters 00000000000a0000-00000000000a7fff
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x000A8000, len=0x00008000
BUG: kvm_dirty_pages_log_change: invalid parameters 00000000000a8000-00000000000affff
vga_dirty_log_start_mapping_lfb_vram_mapped, start=0xE0000000, len=0x01000000

[-- Attachment #2: Type: TEXT/PLAIN, Size: 2320 bytes --]

diff --git a/hw/cirrus_vga.c b/hw/cirrus_vga.c
index 571044f..68c6083 100644
--- a/hw/cirrus_vga.c
+++ b/hw/cirrus_vga.c
@@ -2581,6 +2581,7 @@ static void map_linear_vram(CirrusVGAState *s)
         cpu_register_physical_memory(isa_mem_base + 0xa8000, 0x8000,
                                     (s->vga.vram_offset + s->cirrus_bank_base[1]) | IO_MEM_RAM);
 
+	printf("Cirrus VGA: lfb_vram_mapped=1\n");
         s->vga.lfb_vram_mapped = 1;
     }
     else {
diff --git a/hw/vga.c b/hw/vga.c
index a5e2387..cb8a209 100644
--- a/hw/vga.c
+++ b/hw/vga.c
@@ -1612,15 +1612,19 @@ static void mark_dirty(target_phys_addr_t start, target_phys_addr_t len)
 
 void vga_dirty_log_start(VGACommonState *s)
 {
+    printf("vga_dirty_log_start\n");
     if (kvm_enabled() && s->map_addr)
         if (!s1) {
+            printf("vga_dirty_log_start_mapping_map_addr, start=0x%08X, len=0x%08X\n", s->map_addr, s->map_end - s->map_addr);
             kvm_log_start(s->map_addr, s->map_end - s->map_addr);
             mark_dirty(s->map_addr, s->map_end - s->map_addr);
             s1 = 1;
         }
     if (kvm_enabled() && s->lfb_vram_mapped) {
         if (!s2) {
+            printf("vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x%08X, len=0x%08X\n", (unsigned int)(isa_mem_base + 0xa0000), 0x8000);
             kvm_log_start(isa_mem_base + 0xa0000, 0x8000);
+            printf("vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x%08X, len=0x%08X\n", (unsigned int)(isa_mem_base + 0xa8000), 0x8000);
             kvm_log_start(isa_mem_base + 0xa8000, 0x8000);
             mark_dirty(isa_mem_base + 0xa0000, 0x10000);
         }
@@ -1630,6 +1634,7 @@ void vga_dirty_log_start(VGACommonState *s)
 #ifdef CONFIG_BOCHS_VBE
     if (kvm_enabled() && s->vbe_mapped) {
         if (!s3) {
+            printf("vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x%08X, len=0x%08X\n", VBE_DISPI_LFB_PHYSICAL_ADDRESS, s->vram_size);
             kvm_log_start(VBE_DISPI_LFB_PHYSICAL_ADDRESS, s->vram_size);
         }
         s3 = 1;
@@ -1965,6 +1970,7 @@ void vga_common_reset(VGACommonState *s)
     s->map_addr = 0;
     s->map_end = 0;
     s->lfb_vram_mapped = 0;
+//    s->lfb_vram_mapped = 1;
     s->bios_offset = 0;
     s->bios_size = 0;
     s->sr_index = 0;


* Re: [Qemu-devel] Re: QEMU-KVM and video performance
  2010-05-12  6:14                     ` [Qemu-devel] " Gerhard Wiesinger
@ 2010-05-12  6:39                       ` Avi Kivity
  -1 siblings, 0 replies; 52+ messages in thread
From: Avi Kivity @ 2010-05-12  6:39 UTC (permalink / raw)
  To: Gerhard Wiesinger; +Cc: Jamie Lokier, qemu-devel, kvm

On 05/12/2010 09:14 AM, Gerhard Wiesinger wrote:
> On Mon, 10 May 2010, Avi Kivity wrote:
>
>> On 05/09/2010 10:35 PM, Gerhard Wiesinger wrote:
>>>>>
>>
>> For 256 color mode, the first priority is to find out why direct 
>> mapping is not used.  I'd suggest tracing the code that makes this 
>> decision (in hw/*vga.c) and seeing if it's right or not.
>
> I think this is because A000 is not initialized for KVM (see log below 
> and logging patch attached).

Why isn't it initialized?

Did the guest configure things such that it is impossible to map it 
directly?  Or does the configuration allow direct mapping and qemu 
incorrectly decides that it cannot direct map?

Best would be to print out all the configuration registers and interpret 
them according to the specification.
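As a starting point for such a register dump, the sketch below decodes two standard VGA registers that determine how the A0000h window is laid out: the Graphics Controller Miscellaneous register (GR06, index 6 at ports 3CEh/3CFh) and the Sequencer Memory Mode register (SR04, index 4 at ports 3C4h/3C5h). It only interprets byte values that have already been captured (for example with extra printf calls in hw/vga.c); the struct and function names are hypothetical, and Cirrus-specific extension registers are deliberately ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of the VGA registers that control the host memory window.
 * Hypothetical helper type, not part of QEMU. */
typedef struct {
    uint32_t window_start;  /* first guest-physical byte of the window */
    uint32_t window_size;   /* window length in bytes */
    bool graphics;          /* GR06 bit 0: graphics (not text) mode */
    bool chain4;            /* SR04 bit 3: chain-4, linear 256-colour access */
    bool odd_even_disabled; /* SR04 bit 2: odd/even host addressing disabled */
} vga_window_info;

/* Interpret GR06 (Graphics Controller Misc) and SR04 (Sequencer Memory Mode)
 * according to the standard VGA register definitions. */
static vga_window_info vga_decode_window(uint8_t gr06, uint8_t sr04)
{
    /* GR06 bits 3-2 (memory map select):
     * 00 = A0000h-BFFFFh, 01 = A0000h-AFFFFh,
     * 10 = B0000h-B7FFFh, 11 = B8000h-BFFFFh */
    static const uint32_t starts[4] = { 0xA0000, 0xA0000, 0xB0000, 0xB8000 };
    static const uint32_t sizes[4]  = { 0x20000, 0x10000, 0x08000, 0x08000 };
    unsigned map = (gr06 >> 2) & 3u;

    vga_window_info info;
    info.window_start = starts[map];
    info.window_size = sizes[map];
    info.graphics = (gr06 & 0x01) != 0;
    info.chain4 = (sr04 & 0x08) != 0;
    info.odd_even_disabled = (sr04 & 0x04) != 0;
    return info;
}
```

With the usual mode 13h values (GR06 = 05h, SR04 = 0Eh) this reports a 64 KiB graphics window at A0000h with chain-4 enabled; a guest that clears chain-4 (SR04 = 06h) is using an unchained "Mode X" style layout instead.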


>
> vga_dirty_log_start
> vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x000A0000, 
> len=0x00008000
> BUG: kvm_dirty_pages_log_change: invalid parameters 
> 00000000000a0000-00000000000a7fff

Why does this happen?
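One plausible reading of this message, assuming this tree behaves like upstream QEMU's kvm-all.c of the same period: dirty logging is switched per KVM memory slot, and the request is rejected unless a slot with exactly the requested start address and size exists. The hypothetical model below shows how a 32 KiB request at A0000h fails when that area is not registered as its own slot of exactly that size; the slot table and function names are illustrative, not QEMU's actual data structures.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified model of a KVM memory-slot table. */
typedef struct {
    uint64_t start;
    uint64_t size;
} mem_slot;

/* Return the slot that covers exactly [start, start + size), or NULL. */
static const mem_slot *lookup_matching_slot(const mem_slot *slots, size_t n,
                                            uint64_t start, uint64_t size)
{
    for (size_t i = 0; i < n; i++) {
        if (slots[i].start == start && slots[i].size == size)
            return &slots[i];
    }
    return NULL;
}

/* Model of the dirty-log switch: only a whole, exactly matching slot
 * qualifies; anything else is reported in the same style as the log. */
static int dirty_log_change(const mem_slot *slots, size_t n,
                            uint64_t start, uint64_t size)
{
    if (lookup_matching_slot(slots, n, start, size) == NULL) {
        fprintf(stderr, "BUG: %s: invalid parameters %016llx-%016llx\n",
                __func__, (unsigned long long)start,
                (unsigned long long)(start + size - 1));
        return -1;
    }
    return 0;
}
```

Under that assumption, one thing to check is whether the two 32 KiB banks registered by map_linear_vram() actually exist as separate KVM slots at the moment vga_dirty_log_start() runs.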

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.



* Re: [Qemu-devel] Re: QEMU-KVM and video performance
  2010-04-22  6:12                 ` Gerhard Wiesinger
@ 2010-05-12 10:23                   ` Jamie Lokier
  -1 siblings, 0 replies; 52+ messages in thread
From: Jamie Lokier @ 2010-05-12 10:23 UTC (permalink / raw)
  To: Gerhard Wiesinger; +Cc: Avi Kivity, qemu-devel, kvm

Gerhard Wiesinger wrote:
> On Wed, 21 Apr 2010, Jamie Lokier wrote:
> 
> >Gerhard Wiesinger wrote:
> >>Hmmm. I'm very new to QEMU and KVM, but accessing QEMU's virtual HW
> >>from KVM must be possible (e.g. memory and port accesses are done on
> >>nearly every virtual device), and therefore I end up in C code in the
> >>QEMU hw/*.c directory. So the VGA memory area should also be
> >>accessible from KVM, but with QEMU's specialized and fast memory
> >>access.  Am I missing something?
> >
> >What you're missing is that when KVM calls out to QEMU to handle
> >hw/*.c traps, that call is very slow.  It's because the hardware-VM
> >support is a bit slow when the trap happens, and then the call
> >from KVM in the kernel up to QEMU is a bit slow again.  Then all the
> >way back.  It adds up to a lot, for every I/O operation.
> 
> Isn't that then a general problem of KVM virtualization (or hardware 
> virtualization)? Is it CPU dependent (AMD vs. Intel)?

Yes, it is a general problem, but KVM emulates some time-critical
things in the kernel (like the APIC and CPU instructions), so it's not too bad.

KVM is about 5x faster than TCG for most things, and slower for a few
things, so on balance it is usually faster.
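To relate throughput figures to per-access cost, a rough conversion helps: nanoseconds per access = 10^9 x access size / bytes per second. The sketch below applies it to the TCG BYTE result from the first message in this thread (930,611,200 bytes in 10 s) and to a rate 100 times lower, the slowdown reported there for KVM. Treating every byte write as one trapped access is an assumption for illustration, not a measurement.

```c
/* Average time per access, in nanoseconds, for a measured throughput.
 * bytes: total bytes transferred, seconds: measurement duration,
 * access_size: bytes per individual access (1 = BYTE, 2 = WORD, ...). */
static double ns_per_access(double bytes, double seconds, unsigned access_size)
{
    double bytes_per_second = bytes / seconds;
    return 1e9 * (double)access_size / bytes_per_second;
}
```

For the TCG figure this gives roughly 10.7 ns per byte; at a 100x slowdown it is roughly 1075 ns, i.e. about a microsecond per write, which is at least consistent with each write taking the trap-and-return path described above.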

The slow 256-colour mode writes sound like just a simple bug, though.
No need for complicated changes.

> >In 256-colour mode, KVM should be writing to the VGA memory at high
> >speed a lot like normal RAM, not trapping at the hardware-VM level,
> >and not calling up to the code in hw/*.c for every byte.
> 
> Yes, I see it the same way: in 256 color mode a pixel write should be just a 
> memory write (16 color mode is more difficult, as the pixel/byte mapping is 
> not the same). But it looks like this isn't the case in this test scenario.
> 
> >You might double-check if your guest is using VGA "Mode X".  (See 
> >Wikipedia.)
> >
> >That was a way to accelerate VGA on real PCs, but it will be slow in
> >KVM for the same reasons as 16-colour mode.
> 
> Which way do you mean?

Look up Mode X on Wikipedia if you're interested, but it isn't
relevant to the problem you've reported.  Mode X cannot be enabled
with a BIOS call; it's a VGA hardware programming trick.  It would not
be useful in a VM environment.
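The difference between these modes comes down to how a pixel coordinate maps to video memory. The sketch below assumes 320-pixel-wide 256-colour modes and a 640-pixel-wide 16-colour planar mode (such as mode 12h). In chained mode 13h a pixel is one byte at a linear offset; in Mode X and in 16-colour modes the write also depends on plane selection (Sequencer Map Mask) and, for planar modes, on the Graphics Controller Bit Mask. The struct and function names are illustrative only.

```c
#include <stdint.h>

/* Where a pixel lives in VGA memory (hypothetical helper type). */
typedef struct {
    uint32_t offset;   /* byte offset from the start of the A0000h window */
    uint8_t map_mask;  /* Map Mask value software must program, 0 = none needed */
    uint8_t bit_mask;  /* bits of the addressed byte that belong to the pixel */
} vga_pixel_addr;

/* Mode 13h (320x200x256, chain-4): one byte per pixel, linear addressing;
 * the hardware selects the plane from the address, so no Map Mask setup. */
static vga_pixel_addr mode13h_addr(unsigned x, unsigned y)
{
    vga_pixel_addr a = { y * 320u + x, 0x00, 0xFF };
    return a;
}

/* Unchained 320-wide 256-colour ("Mode X"): four pixels share one offset,
 * one per plane, so the Map Mask must select plane x % 4 before writing. */
static vga_pixel_addr modex_addr(unsigned x, unsigned y)
{
    vga_pixel_addr a = { y * 80u + x / 4u, (uint8_t)(1u << (x & 3u)), 0xFF };
    return a;
}

/* 640-wide 16-colour planar: eight pixels per byte, one colour bit per
 * plane; all four planes are enabled and the Bit Mask picks the pixel. */
static vga_pixel_addr planar16_addr(unsigned x, unsigned y)
{
    vga_pixel_addr a = { y * 80u + x / 8u, 0x0F, (uint8_t)(0x80u >> (x & 7u)) };
    return a;
}
```

Only the first mapping is a plain store into a linear buffer, which is what makes direct-mapping the A0000h window plausible for chained 256-colour modes; the other two depend on register state that the emulated VGA has to apply to every access.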

-- Jamie


* Re: [Qemu-devel] Re: QEMU-KVM and video performance
  2010-04-22  5:44             ` Gerhard Wiesinger
@ 2010-05-12 10:34               ` Jamie Lokier
  -1 siblings, 0 replies; 52+ messages in thread
From: Jamie Lokier @ 2010-05-12 10:34 UTC (permalink / raw)
  To: Gerhard Wiesinger; +Cc: Avi Kivity, qemu-devel, kvm

Gerhard Wiesinger wrote:
> Can one switch to the old software VMM in VMWare?

Perhaps you can install a very old version of VMWare.
Maybe run it under KVM ;-)

> That was one of the reasons why I was looking for alternatives for 
> graphical DOS programs. Overall summary so far:
> 1.) QEMU without KVM: Problem with 286 DOS Extender instruction set, but 
> fast VGA
> 2.) QEMU with KVM: 286 DOS Extender apps ok, but slow VGA memory 
> performance
> 3.) VMWare Server 2.0 under Linux, application ok, but slow VGA memory 
> performance
> 4.) Virtual PC: Problems with 286 DOS Extender
> 5.) Bochs: Works well, but very slow.

I would be interested in the 286 DOS Extender issue, as I'd like to
use some 286 programs in QEMU at some point.

There were some changes to KVM in the kernel recently.  Were those
needed to get the 286 apps working?

> It looks like VMWare Server and QEMU with KVM may have the same 
> architectural problem: every VGA write goes through the whole slow chain 
> from the guest OS to the virtualization layer.

They do have a similar architecture.

The VGA write speed is a bit surprising, as it should be fast in
256-colour non-Mode X modes for both.  But maybe there's something
we've missed that makes it architecturally slow.  It will be
interesting to see what you find :-)
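When the window is mapped directly, guest stores land in RAM without exits, and the display code instead asks KVM which pages changed since the last refresh; that is what the kvm_log_start() calls in the logging patch are meant to enable. KVM reports this per memory slot as a bitmap with one bit per guest page (the KVM_GET_DIRTY_LOG ioctl). The sketch below shows only the consumer side, turning such a bitmap into the list of dirty page addresses for a region; it assumes 4 KiB pages and 64-bit bitmap words, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define GUEST_PAGE_SIZE 4096u  /* assumption: 4 KiB guest pages */

/* Collect the guest-physical addresses of dirty pages in a region.
 * Bit i of the bitmap (word i / 64, bit i % 64) stands for page i counted
 * from region_start. Stores at most out_cap addresses in out and returns
 * how many were stored. */
static size_t collect_dirty_pages(const uint64_t *bitmap, size_t npages,
                                  uint64_t region_start,
                                  uint64_t *out, size_t out_cap)
{
    size_t count = 0;
    for (size_t i = 0; i < npages && count < out_cap; i++) {
        if (bitmap[i / 64] & (UINT64_C(1) << (i % 64)))
            out[count++] = region_start + (uint64_t)i * GUEST_PAGE_SIZE;
    }
    return count;
}
```

The cost then becomes one bitmap fetch and scan per display refresh rather than one exit per byte written, which is why the 256-colour case should be fast once the direct mapping is actually in place.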

Thanks,
-- Jamie


* Re: [Qemu-devel] Re: QEMU-KVM and video performance - Update
  2010-05-12  6:39                       ` Avi Kivity
@ 2011-02-18  7:32                         ` Gerhard Wiesinger
  -1 siblings, 0 replies; 52+ messages in thread
From: Gerhard Wiesinger @ 2011-02-18  7:32 UTC (permalink / raw)
  To: Avi Kivity, seabios, qemu-devel, kvm; +Cc: Jan Kiszka

Hello,

Here is an update on this issue; the earlier thread is archived at: 
http://www.mail-archive.com/kvm@vger.kernel.org/msg32600.html

It seems that Cirrus VGA is now OK (>1000 MB/s, up to 2000 MB/s). However, 
only the 320x200x256 colors (Mode 13h) mode is implemented for Cirrus in 
the VESA BIOS.

VMWare and std VGA still have the performance issue.

I guess the improvement is related to the following commit:
http://git.kernel.org/?p=virt/kvm/qemu-kvm.git;a=commitdiff;h=0d14905b5eb8aa1c2e195e13478bb7c74e1776db
In particular, I suspect the change in hw/cirrus_vga.c.

Any idea how to fix the following?
1.) More VESA modes in Cirrus VGA (is VESA emulation done by Seabios or by 
the KVM Cirrus BIOS?)
2.) The performance in VMWare and std VGA modes, too

Versions used are the latest development versions of the KVM user part and Seabios from Git.

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/


On Wed, 12 May 2010, Avi Kivity wrote:

> On 05/12/2010 09:14 AM, Gerhard Wiesinger wrote:
>> On Mon, 10 May 2010, Avi Kivity wrote:
>> 
>>> On 05/09/2010 10:35 PM, Gerhard Wiesinger wrote:
>>>>>> 
>>> 
>>> For 256 color mode, the first priority is to find out why direct mapping is 
>>> not used.  I'd suggest tracing the code that makes this decision (in 
>>> hw/*vga.c) and seeing if it's right or not.
>> 
>> I think this is because A000 is not initialized for KVM (see log below and 
>> logging patch attached).
>
> Why isn't it initialized?
>
> Did the guest configure things such that it is impossible to map it directly? 
> Or does the configuration allow direct mapping and qemu incorrectly decides 
> that it cannot direct map?
>
> Best would be to print out all the configuration registers and interpret them 
> according to the specification.
>
>
>> 
>> vga_dirty_log_start
>> vga_dirty_log_start_mapping_lfb_vram_mapped, start=0x000A0000, 
>> len=0x00008000
>> BUG: kvm_dirty_pages_log_change: invalid parameters 
>> 00000000000a0000-00000000000a7fff
>
> Why does this happen?
>
> -- 
> Do not meddle in the internals of kernels, for they are subtle and quick to 
> panic.
>
>
>


end of thread, other threads:[~2011-02-18  7:33 UTC | newest]

Thread overview: 52+ messages
2010-04-19 19:14 QEMU-KVM and video performance Gerhard Wiesinger
2010-04-19 19:14 ` [Qemu-devel] " Gerhard Wiesinger
2010-04-21  8:59 ` Avi Kivity
2010-04-21  8:59   ` [Qemu-devel] " Avi Kivity
2010-04-21 10:08   ` Jamie Lokier
2010-04-21 10:49     ` Avi Kivity
2010-04-21 18:14       ` Gerhard Wiesinger
2010-04-21 18:14         ` [Qemu-devel] " Gerhard Wiesinger
2010-04-21 20:49         ` Avi Kivity
2010-04-21 20:49           ` Avi Kivity
2010-04-22  5:37           ` Gerhard Wiesinger
2010-04-22  5:37             ` Gerhard Wiesinger
2010-04-22  6:57             ` Avi Kivity
2010-04-22  6:57               ` Avi Kivity
2010-04-21 18:39       ` Jamie Lokier
2010-04-21 20:51         ` Avi Kivity
2010-04-21 21:19           ` Jamie Lokier
2010-04-22  5:44           ` Gerhard Wiesinger
2010-04-22  5:44             ` Gerhard Wiesinger
2010-05-12 10:34             ` Jamie Lokier
2010-05-12 10:34               ` Jamie Lokier
2010-04-21 18:09   ` Gerhard Wiesinger
2010-04-21 18:33     ` Jamie Lokier
2010-04-21 18:33       ` Jamie Lokier
2010-04-21 18:50       ` Gerhard Wiesinger
2010-04-21 18:50         ` Gerhard Wiesinger
2010-04-21 18:53         ` Jamie Lokier
2010-04-21 18:53           ` Jamie Lokier
2010-04-21 19:08           ` Gerhard Wiesinger
2010-04-21 19:08             ` Gerhard Wiesinger
2010-04-21 21:30             ` Jamie Lokier
2010-04-21 21:30               ` Jamie Lokier
2010-04-22  6:12               ` Gerhard Wiesinger
2010-04-22  6:12                 ` Gerhard Wiesinger
2010-05-12 10:23                 ` Jamie Lokier
2010-05-12 10:23                   ` Jamie Lokier
2010-04-21 20:56         ` Avi Kivity
2010-04-21 20:56           ` Avi Kivity
2010-04-22  6:04           ` Gerhard Wiesinger
2010-04-22  6:04             ` Gerhard Wiesinger
2010-04-22  7:03             ` Avi Kivity
2010-04-22  7:03               ` Avi Kivity
2010-05-09 19:35               ` Gerhard Wiesinger
2010-05-09 19:35                 ` Gerhard Wiesinger
2010-05-10  7:32                 ` Avi Kivity
2010-05-10  7:32                   ` Avi Kivity
2010-05-12  6:14                   ` Gerhard Wiesinger
2010-05-12  6:14                     ` [Qemu-devel] " Gerhard Wiesinger
2010-05-12  6:39                     ` Avi Kivity
2010-05-12  6:39                       ` Avi Kivity
2011-02-18  7:32                       ` [Qemu-devel] Re: QEMU-KVM and video performance - Update Gerhard Wiesinger
2011-02-18  7:32                         ` Gerhard Wiesinger
