* [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
@ 2015-07-28  7:52 Dennis Luehring
  2015-07-28  9:54 ` Artyom Tarasenko
                   ` (2 more replies)
  0 siblings, 3 replies; 80+ messages in thread
From: Dennis Luehring @ 2015-07-28  7:52 UTC (permalink / raw)
  To: qemu-devel

(I already posted this question on qemu-discuss@nongnu.org but was told
this mailing list is the better place for it.)

I've prepared a Debian 7.8.0 image for SPARC64 emulation under qemu, so that
I can test C/C++ code for big-endian and unaligned-access problems before
moving to real hardware.

I benchmarked compiling a single file, pugixml.cpp
(https://github.com/zeux/pugixml/blob/master/src/pugixml.cpp)

qemu-system-sparc64: >180 s
x64 native: ~2 s

So my SPARC64 emulation is around 90 times slower than native x64.

My system:

I'm using the latest qemu git (2.3.x), with virtio for hard disk/network and
a qcow2 image:

https://depositfiles.com/files/sj20aqwp0 (~280 MB)
Press the "regular download" button, wait a few seconds, solve the
captcha, then choose "download file in regular mode by browser".

The 7z archive contains pugi_sparc.txt, which describes how to start and use
the image and what is installed in it.

qemu runs natively on Ubuntu 15.04 (x64) on a Core i7, 8 GB system that is
doing nothing but running qemu.

installed is

gcc/g++ 4.6
make
sshd running

Compiling cmake 2.3.2 took around 10 h.
Compiling pugixml also takes a very long time.

"top perf" from guest and host while compiling pugixml doesn't show any big
blockers over time:
http://pastebin.com/D2fUpPrM

Is there anything I can do to speed up the emulation?

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-07-28  7:52 [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation? Dennis Luehring
@ 2015-07-28  9:54 ` Artyom Tarasenko
  2015-07-29  6:20   ` Dennis Luehring
  2015-07-29  8:07   ` Dennis Luehring
  2015-07-29  9:17 ` Karel Gardas
  2015-08-27 15:29 ` Artyom Tarasenko
  2 siblings, 2 replies; 80+ messages in thread
From: Artyom Tarasenko @ 2015-07-28  9:54 UTC (permalink / raw)
  To: Dennis Luehring; +Cc: qemu-devel, Aurelien Jarno

On Tue, Jul 28, 2015 at 9:52 AM, Dennis Luehring <dl.soluz@gmx.net> wrote:
> anything i can do to speedup the emulation?

Maybe try the fresh tcg optimizer improvements from Aurelien:
https://lists.gnu.org/archive/html/qemu-devel/2015-07/msg05133.html

Artyom

-- 
Regards,
Artyom Tarasenko

SPARC and PPC PReP under qemu blog: http://tyom.blogspot.com/search/label/qemu

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-07-28  9:54 ` Artyom Tarasenko
@ 2015-07-29  6:20   ` Dennis Luehring
  2015-07-29  8:23     ` Artyom Tarasenko
  2015-07-29 15:01     ` Aurelien Jarno
  2015-07-29  8:07   ` Dennis Luehring
  1 sibling, 2 replies; 80+ messages in thread
From: Dennis Luehring @ 2015-07-29  6:20 UTC (permalink / raw)
  To: Artyom Tarasenko; +Cc: qemu-devel, Aurelien Jarno

On 28.07.2015 at 11:54, Artyom Tarasenko wrote:
> > anything i can do to speedup the emulation?
> Maybe try the fresh tcg optimizer improvements from Aurelien:
> https://lists.gnu.org/archive/html/qemu-devel/2015-07/msg05133.html

It doesn't seem to target sparc (or isn't ready); many of the comments are
x86/x64 only.

It seems that not many people are interested in sparc(64) emulation speed,
or they are on vacation :)
Or the emulation is already on par with a real sun4u (I don't own
one, so I can't test).

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-07-28  9:54 ` Artyom Tarasenko
  2015-07-29  6:20   ` Dennis Luehring
@ 2015-07-29  8:07   ` Dennis Luehring
  2015-07-29 15:03     ` Aurelien Jarno
  1 sibling, 1 reply; 80+ messages in thread
From: Dennis Luehring @ 2015-07-29  8:07 UTC (permalink / raw)
  To: Artyom Tarasenko; +Cc: qemu-devel, Aurelien Jarno

Currently qemu emulates a TI UltraSparc IIi (Sabre).
Does that mean qemu emulates the sparc at somewhere around 270-480 MHz?
(I can't find the running MHz in qemu.)

How can I find out the MHz the sparc is running at?
(cpuinfo and lscpu don't list MHz, dmidecode is not available,
/sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq is empty, and lshw
does not show anything.)

Is there a way to increase the clock speed of the cpu/fpu without
breaking other timings, or to switch to a different cpu?
(I do not need realistic speed behavior for my testing.)

another benchmark

sysbench (0.4.12) --num-threads=1 --test=cpu --cpu-max-prime=2000 run
   Host x64    :   1.3580s
   Qemu SPARC64: 184.2532s
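
Both benchmarks in this thread reduce to a simple ratio of wall-clock
times. A minimal sketch of that arithmetic follows; the function name
`slowdown` is made up for illustration. With the numbers quoted here it gives
180 / 2 = 90x for the pugixml compile and roughly 135.7x for sysbench.

```cpp
#include <cassert>

// Slowdown factor of the emulated run relative to the native run.
// Both arguments are wall-clock times in seconds and must be positive.
double slowdown(double emulated_seconds, double native_seconds)
{
    assert(emulated_seconds > 0.0 && native_seconds > 0.0);
    return emulated_seconds / native_seconds;
}
```

For example, `slowdown(184.2532, 1.3580)` evaluates to about 135.7.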

------------------------------
/proc/cpuinfo:

cpu             : TI UltraSparc IIi (Sabre)
fpu             : UltraSparc IIi integrated FPU
pmu             : ultra12
prom            : OBP 3.10.24 1999/01/01 01:01
type            : sun4u
ncpus probed    : 1
ncpus active    : 1
D$ parity tl1   : 0
I$ parity tl1   : 0
Cpu0ClkTck      : 0000000005f5e100
cpucaps         : flush,stbar,swap,muldiv,v9,mul32,div32,v8plus,vis
MMU Type        : Spitfire
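
A note on the Cpu0ClkTck line above: as far as I know, the Linux sparc64
kernel prints the CPU clock tick there in Hz as a hexadecimal number. Assuming
that reading is right, 0x05f5e100 is 100,000,000, so the guest is told a
nominal 100 MHz. That is only the value reported to the guest and says
nothing about the actual emulation speed. A small sketch of the conversion
(the function names are hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Convert the hexadecimal digits of a "Cpu0ClkTck" field to Hz.
// Assumption: the field holds the clock tick in Hz, printed in hex.
std::uint64_t clock_tick_hz(const std::string& hex_digits)
{
    std::size_t consumed = 0;
    const unsigned long long value = std::stoull(hex_digits, &consumed, 16);
    assert(consumed == hex_digits.size()); // the whole field must be hex digits
    return value;
}

// Same value expressed in MHz.
double clock_tick_mhz(const std::string& hex_digits)
{
    return static_cast<double>(clock_tick_hz(hex_digits)) / 1e6;
}
```

Calling `clock_tick_mhz("0000000005f5e100")` yields 100.0.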

lscpu:

Architecture:          sparc64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Big Endian
CPU(s):                1
On-line CPU(s) list:   0
Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             1

https://en.wikipedia.org/wiki/UltraSPARC_II

UltraSPARC IIi

The UltraSPARC IIi "Sabre" was a low-cost version introduced in 1997 
that operated at 270 to 360 MHz.
It was fabricated in a 0.35 µm process and possessed a die size of 156 mm².
It dissipated 21 W and used a 1.9 V power supply. It had a 256 KB to 2 
MB L2 cache.
In 1998, a version code-named Sapphire-Red, was fabricated in a 0.25 µm 
process, enabling
the microprocessor to operate at 333 to 480 MHz. It dissipated 21 W at 
440 MHz and used a 1.9 V power supply.

---------------------------------
my host machine is

lscpu

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 30
Model name:            Intel(R) Core(TM) i7 CPU       Q 740  @ 1.73GHz
Stepping:              5
CPU MHz:               933.000
CPU max MHz:           1734,0000
CPU min MHz:           933,0000
BogoMIPS:              3458.22
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              6144K
NUMA node0 CPU(s):     0-7


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-07-29  6:20   ` Dennis Luehring
@ 2015-07-29  8:23     ` Artyom Tarasenko
  2015-07-29 15:01     ` Aurelien Jarno
  1 sibling, 0 replies; 80+ messages in thread
From: Artyom Tarasenko @ 2015-07-29  8:23 UTC (permalink / raw)
  To: Dennis Luehring; +Cc: qemu-devel, Aurelien Jarno

On Wed, Jul 29, 2015 at 8:20 AM, Dennis Luehring <dl.soluz@gmx.net> wrote:
> On 28.07.2015 at 11:54, Artyom Tarasenko wrote:
>>>
>>> >anything i can do to speedup the emulation?
>>
>> Maybe try the fresh tcg optimizer improvements from Aurelien:
>> https://lists.gnu.org/archive/html/qemu-devel/2015-07/msg05133.html
>
>
> it don't seems to target sparc (or isn't ready) many x86/x64 only etc.
> comments

Which is exactly what you need, because I guess you run the sparc
emulation on an x86_64 host.
Aurelien's patch series is about TCG, and a "target" there is the host
on which the code is emulated.
So I'd expect it to be worth trying: if it brings a 7% performance
improvement to MIPS, it should bring a measurable performance
improvement to SPARC as well.


-- 
Regards,
Artyom Tarasenko

SPARC and PPC PReP under qemu blog: http://tyom.blogspot.com/search/label/qemu

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-07-28  7:52 [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation? Dennis Luehring
  2015-07-28  9:54 ` Artyom Tarasenko
@ 2015-07-29  9:17 ` Karel Gardas
  2015-07-29 10:20   ` Dennis Luehring
  2015-07-29 10:55   ` Dennis Luehring
  2015-08-27 15:29 ` Artyom Tarasenko
  2 siblings, 2 replies; 80+ messages in thread
From: Karel Gardas @ 2015-07-29  9:17 UTC (permalink / raw)
  To: Dennis Luehring; +Cc: qemu-devel

On Tue, Jul 28, 2015 at 9:52 AM, Dennis Luehring <dl.soluz@gmx.net> wrote:
> i've benchmarked compiling of single pugixml.cpp
> (https://github.com/zeux/pugixml/blob/master/src/pugixml.cpp)
>
> qemu-system-sparc64: >180sek
> x64 native : ~ 2sek

Artyom is interested in native SPARC speed, here are my bits:
Solaris 11.2 + GNU C++ 4.8.2 on both E5-2620 and T1 1GHz

- x64: 2.1s
- sparc: 17.7s

Everything is from your current git: I just cloned it, ran gmake and
copied the command line, which is:
g++ src/pugixml.cpp -g -Wall -Wextra -Werror -pedantic -std=c++0x -c
-MMD -MP -o build/make-g++-debug-standard/src/pugixml.cpp.o

The numbers are averages over 3-4 runs. Also, this is a *cheap* T1; with
current state-of-the-art SPARC you can get much better numbers, I'm sure,
and even a cheap M4000 should be much better. Anyway, this is just for
reference.

Last note: a few months ago I discussed qemu/sparc64 slowness with some
Qemu/SPARC64/Debian guys. I started the discussion after comparing numbers
(compilation performance with GNU C, like you do here) on qemu-sparc64 and
on qemu-aarch64, and they were very different. I used nbench2 for
benchmarking qemu emulation and "time make nbench2" for benchmarking
compilation speed. What was interesting was that nbench2 was comparable on
aarch64 and sparc64, but time make was completely off (sparc much slower).
I also provided a few profile runs, but the conclusion at that time was that
perhaps sparc is hard due to its MMU; I don't remember well. If
anybody is interested I can dig up those old emails. IIRC I used Qemu
2.2.0 for this benchmarking, so not that old.

Thanks,
Karel

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-07-29  9:17 ` Karel Gardas
@ 2015-07-29 10:20   ` Dennis Luehring
  2015-07-29 13:45     ` Karel Gardas
  2015-07-29 10:55   ` Dennis Luehring
  1 sibling, 1 reply; 80+ messages in thread
From: Dennis Luehring @ 2015-07-29 10:20 UTC (permalink / raw)
  To: Karel Gardas; +Cc: qemu-devel

On 29.07.2015 at 11:17, Karel Gardas wrote:
> If
> anybody is interested I can dig those old emails.

would be nice

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-07-29  9:17 ` Karel Gardas
  2015-07-29 10:20   ` Dennis Luehring
@ 2015-07-29 10:55   ` Dennis Luehring
  2015-07-29 12:34     ` Karel Gardas
  1 sibling, 1 reply; 80+ messages in thread
From: Dennis Luehring @ 2015-07-29 10:55 UTC (permalink / raw)
  To: Karel Gardas; +Cc: qemu-devel

On 29.07.2015 at 11:17, Karel Gardas wrote:
> What was interesting was that nbench2 was comparable on aarch64

Was the aarch64 system little- or big-endian, and were unaligned accesses possible?

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-07-29 10:55   ` Dennis Luehring
@ 2015-07-29 12:34     ` Karel Gardas
  2015-07-29 12:38       ` Karel Gardas
  2015-07-29 13:55       ` Dennis Luehring
  0 siblings, 2 replies; 80+ messages in thread
From: Karel Gardas @ 2015-07-29 12:34 UTC (permalink / raw)
  To: Dennis Luehring; +Cc: qemu-devel

On Wed, Jul 29, 2015 at 12:55 PM, Dennis Luehring <dl.soluz@gmx.net> wrote:
> On 29.07.2015 at 11:17, Karel Gardas wrote:
>>
>> What was interesting was that nbench2 was comparable on aarch64
>
>
> was aarch64 a little or big-endian system, with unaligned accesses possible?

This was Ubuntu with Linux kernel 3.13, so I guess little-endian with
unaligned accesses possible, to cause the least pain for developers. But I'm
just guessing since I can't boot that now; it was always a little bit tricky
to boot that, btw. Once it boots, tell me how to find the answers to your
questions. Will dmesg do, or cat /proc/cpuinfo, or just file /bin/bash?

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-07-29 12:34     ` Karel Gardas
@ 2015-07-29 12:38       ` Karel Gardas
  2015-07-29 13:55       ` Dennis Luehring
  1 sibling, 0 replies; 80+ messages in thread
From: Karel Gardas @ 2015-07-29 12:38 UTC (permalink / raw)
  To: Dennis Luehring; +Cc: qemu-devel

aarch64 booted, this is ubuntu 14.04.1 LTS, LSB:

ubuntu@ubuntu:~$ file /bin//bash
/bin//bash: ELF 64-bit LSB  executable, ARM aarch64, version 1 (SYSV),
dynamically linked (uses shared libs), for GNU/Linux 3.7.0,
BuildID[sha1]=36b671892d00161eaeb9f602049f9ce830e9056e, stripped
ubuntu@ubuntu:~$ cat /proc/cpuinfo
Processor    : AArch64 Processor rev 0 (aarch64)
processor    : 0
Features    : fp asimd evtstrm
CPU implementer    : 0x41
CPU architecture: AArch64
CPU variant    : 0x1
CPU part    : 0xd07
CPU revision    : 0

Hardware    : linux,dummy-virt


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-07-29 10:20   ` Dennis Luehring
@ 2015-07-29 13:45     ` Karel Gardas
  2015-07-29 15:13       ` Aurelien Jarno
  0 siblings, 1 reply; 80+ messages in thread
From: Karel Gardas @ 2015-07-29 13:45 UTC (permalink / raw)
  To: Dennis Luehring; +Cc: qemu-devel

On Wed, Jul 29, 2015 at 12:20 PM, Dennis Luehring <dl.soluz@gmx.net> wrote:
> On 29.07.2015 at 11:17, Karel Gardas wrote:
>>
>> If
>> anybody is interested I can dig those old emails.
>
>
> would be nice

Here is the speed comparison:
https://lists.debian.org/debian-sparc/2015/02/msg00001.html but the whole
thread started in January here:
https://lists.debian.org/debian-sparc/2015/01/msg00000.html

Mark then asked for profiles. I see I sent them privately due to the
attachments; the email is:

off-list as I'm attaching files which may be too big for the list. Also
I'm not sure if this is still relevant to debian-sparc@

Anyway, the difference in IO is negligible. When I compiled on SPARC on
tmpfs it still took 6m40s. On SPARC it's using -drive while on AArch64
it probably uses all the virtio optimizations.

Anyway, with gprof you've hit the point. Attached are two files (text
output from gprof). One shows the profile as a reference, just
boot/login/su root/poweroff/kill qemu, and the other is the same but with ~5
hours of compiling nbench2 in a shell loop.

reference shows:
   %  cumulative    self              self    total
 time   seconds   seconds    calls  ms/call  ms/call name
 42.9     145.84   145.84                            cpu_sparc_exec [1]
  7.8     172.44    26.60                            tcg_optimize [2]
  4.4     187.38    14.94                            tcg_reg_alloc_op [3]
  4.4     202.20    14.82                            get_physical_address_data [4]
  3.8     215.26    13.06                            tcg_liveness_analysis [5]


while compile loop shows:
   %  cumulative    self              self    total
 time   seconds   seconds    calls  ms/call  ms/call name
 21.2    1008.09  1008.09                            tlb_flush_page [1]
 15.2    1731.09   723.00                            cpu_sparc_exec [2]
 13.6    2374.79   643.70                            tb_flush_jmp_cache [3]
  9.5    2823.86   449.07                            tcg_optimize [4]
  4.2    3024.26   200.40                            tcg_liveness_analysis [5]


that's indeed a difference. -- I assume cpu_sparc_exec is what does
actual work here...
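
To put a number on that difference: in the compile-loop profile,
tlb_flush_page and tb_flush_jmp_cache together account for 21.2 + 13.6 =
34.8% of the time, while neither appears in the top five of the reference
run. A small sketch of that sum follows; the struct and function names are
made up for illustration, and the percentages are copied from the table
above.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct ProfileEntry {
    std::string name;   // function name from the gprof flat profile
    double percent;     // value of the "% time" column
};

// Top five entries of the compile-loop profile, copied from the table above.
std::vector<ProfileEntry> compile_loop_profile()
{
    return {
        {"tlb_flush_page", 21.2},
        {"cpu_sparc_exec", 15.2},
        {"tb_flush_jmp_cache", 13.6},
        {"tcg_optimize", 9.5},
        {"tcg_liveness_analysis", 4.2},
    };
}

// Sum the "% time" of every profile entry whose name appears in `names`.
double share_of(const std::vector<ProfileEntry>& profile,
                const std::vector<std::string>& names)
{
    double total = 0.0;
    for (const ProfileEntry& entry : profile) {
        assert(entry.percent >= 0.0);
        if (std::find(names.begin(), names.end(), entry.name) != names.end()) {
            total += entry.percent;
        }
    }
    return total;
}
```

Passing `{"tlb_flush_page", "tb_flush_jmp_cache"}` as the second argument
gives about 34.8.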

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-07-29 12:34     ` Karel Gardas
  2015-07-29 12:38       ` Karel Gardas
@ 2015-07-29 13:55       ` Dennis Luehring
  2015-07-29 14:41         ` Karel Gardas
  1 sibling, 1 reply; 80+ messages in thread
From: Dennis Luehring @ 2015-07-29 13:55 UTC (permalink / raw)
  To: Karel Gardas; +Cc: qemu-devel

On 29.07.2015 at 14:34, Karel Gardas wrote:
>   Once it boots, tell me how to find the asnwers to your
> questions.

compile with gcc test.cpp and run

-----------
#include <stdint.h>
#include <stdio.h>
#include <stddef.h>
#include <string.h>
#include <assert.h>

int main()
{
   uint16_t value = 0x1234;

   {
     volatile uint8_t* ptr = (uint8_t*)&value;
     printf("endianess: %s\n", ptr[0]==0x34 ? "little":"big");
   }

   uint8_t buffer[1+sizeof(value)]={0};
   uint8_t* ptr = buffer;
   if(ptrdiff_t(ptr) % 2 == 0)
   {
     ++ptr;
   }
   uint16_t* unaligned_word = (uint16_t*)ptr;

   ::memcpy(unaligned_word, &value, sizeof(value));

   printf("try to access unaligned word\n");
   // bus errors, exceptions, or whatever your architecture likes can happen here
   uint16_t read = *unaligned_word;
   // sometimes you get the value, but it's still wrong
   printf("  equal to 0x%04X: %s\n", value, read == value ? "YES":"!!NO!!");
   printf("  value: 0x%04X\n", read);
   printf("done\n");

   return 0;
}
-----------
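
The program above dereferences a misaligned pointer on purpose, to see what
the platform does. If the goal is instead to read such data without risking
a trap, the usual approach is to copy the bytes with memcpy or to assemble
the value byte by byte. A minimal sketch, with helper names that are made up
for illustration:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// True when the host stores the least significant byte first.
bool is_little_endian()
{
    const std::uint16_t probe = 0x1234;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 0x34;
}

// Read a 16-bit value in host byte order from any address, aligned or not.
std::uint16_t load_u16_unaligned(const unsigned char* p)
{
    assert(p != nullptr);
    std::uint16_t value;
    std::memcpy(&value, p, sizeof value);
    return value;
}

// Read a 16-bit big-endian value, independent of host byte order and alignment.
std::uint16_t load_u16_be(const unsigned char* p)
{
    assert(p != nullptr);
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}
```

Code written this way behaves the same on x86, aarch64 and sparc64, which is
exactly why it does not exercise the bus-error case the test above is meant
to provoke.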

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-07-29 13:55       ` Dennis Luehring
@ 2015-07-29 14:41         ` Karel Gardas
  2015-07-30  3:47           ` Dennis Luehring
  0 siblings, 1 reply; 80+ messages in thread
From: Karel Gardas @ 2015-07-29 14:41 UTC (permalink / raw)
  To: Dennis Luehring; +Cc: qemu-devel

ubuntu@ubuntu:~$ ./a.out
endianess: little
try to access unaligned word
  equal to 0x1234: YES
  value: 0x1234
done

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-07-29  6:20   ` Dennis Luehring
  2015-07-29  8:23     ` Artyom Tarasenko
@ 2015-07-29 15:01     ` Aurelien Jarno
  2015-07-30  3:52       ` Dennis Luehring
  1 sibling, 1 reply; 80+ messages in thread
From: Aurelien Jarno @ 2015-07-29 15:01 UTC (permalink / raw)
  To: Dennis Luehring; +Cc: qemu-devel, Artyom Tarasenko

On 2015-07-29 08:20, Dennis Luehring wrote:
> On 28.07.2015 at 11:54, Artyom Tarasenko wrote:
> >>>anything i can do to speedup the emulation?
> >Maybe try the fresh tcg optimizer improvements from Aurelien:
> >https://lists.gnu.org/archive/html/qemu-devel/2015-07/msg05133.html
> 
> it don't seems to target sparc (or isn't ready) many x86/x64 only etc.
> comments

It has been tested mostly on x86-64, but it contains generic
optimizations for all targets. It should bring even more improvements on
RISC hosts, as it avoids long sequences of instructions for constant
loading.

> it seems that not many people are interested in sparc(64) emulation speed
> or on vacation :)
> or the emulation is just perfect in par with a real sun4u (i don't own one
> so i can't test)

The point is that emulation has a cost, and it's quite difficult to
lower it and thus improve the emulation speed. From what I have been
able to see, the sparc (but also mips and sh4) targets would benefit
from MMU emulation improvements. But that's not easy to do either.

Aurelien

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-07-29  8:07   ` Dennis Luehring
@ 2015-07-29 15:03     ` Aurelien Jarno
  0 siblings, 0 replies; 80+ messages in thread
From: Aurelien Jarno @ 2015-07-29 15:03 UTC (permalink / raw)
  To: Dennis Luehring; +Cc: qemu-devel, Artyom Tarasenko

On 2015-07-29 10:07, Dennis Luehring wrote:
> currently qemu emulates an TI UltraSparc IIi (Sabre)
> does that mean that qemu emulates the sparc somwhere around 270-480Mhz (i
> can't find the running mhz in qemu)

No, it emulates the TI UltraSparc IIi as fast as it can. It mostly
depends on your host CPU.

> how can i get the Mhz the sparc is running?
> (cpuinfo and lscpu missing Mhz, dmidecode is not available,
> /sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq is empty, lshw does
> not show anything)

QEMU is not cycle accurate, so that would not mean anything.

> is there a way to increase the clock speed of the cpu/fpu without killing
> other timemings - or switch to a different cpu?
> (because i do not need real speed behavior for my testing)

The easiest (but not the cheapest) is probably to upgrade your host
machine to something faster.

Aurelien

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-07-29 13:45     ` Karel Gardas
@ 2015-07-29 15:13       ` Aurelien Jarno
  0 siblings, 0 replies; 80+ messages in thread
From: Aurelien Jarno @ 2015-07-29 15:13 UTC (permalink / raw)
  To: Karel Gardas; +Cc: qemu-devel, Dennis Luehring

On 2015-07-29 15:45, Karel Gardas wrote:
> On Wed, Jul 29, 2015 at 12:20 PM, Dennis Luehring <dl.soluz@gmx.net> wrote:
> > On 29.07.2015 at 11:17, Karel Gardas wrote:
> >>
> >> If
> >> anybody is interested I can dig those old emails.
> >
> >
> > would be nice
> 
> Here is speed comparison:
> https://lists.debian.org/debian-sparc/2015/02/msg00001.html but whole
> thread started in january here:
> https://lists.debian.org/debian-sparc/2015/01/msg00000.html
> 
> Mark then asked for profiles, I see I send them privately due to
> attachements, the email is:
> 
> off-list as I'm attaching files which may be too bit for list. Also
> I'm not sure if this is still relevant to debian-sparc@
> 
> Anyway, difference in IO is negligible. When I compile on SPARC on
> tmpfs it was still 6m40s. On SPARC it's using -drive while on AArch64
> it uses all the virtio optimization probably.
> 
> Anyway, with gprof you've hit the point. Attached two files (text
> output from gprof). One shows profiler as a reference, just
> boot/login/su root/poweroff/kill qemu and another is the same but ~5
> hours of compilation of nbench2 in shell loop.
> 
> reference shows:
>    %  cumulative    self              self    total
>  time   seconds   seconds    calls  ms/call  ms/call name
>  42.9     145.84   145.84                            cpu_sparc_exec [1]
>   7.8     172.44    26.60                            tcg_optimize [2]

tcg_optimize should be improved by the patchset I posted.

>   4.4     187.38    14.94                            tcg_reg_alloc_op [3]
>   4.4     202.20    14.82                            get_physical_address_data [4]
>   3.8     215.26    13.06                            tcg_liveness_analysis [5]
> 
> 
> while compile loop shows:
>    %  cumulative    self              self    total
>  time   seconds   seconds    calls  ms/call  ms/call name
>  21.2    1008.09  1008.09                            tlb_flush_page [1]
>  15.2    1731.09   723.00                            cpu_sparc_exec [2]
>  13.6    2374.79   643.70                            tb_flush_jmp_cache [3]
>   9.5    2823.86   449.07                            tcg_optimize [4]
>   4.2    3024.26   200.40                            tcg_liveness_analysis [5]
> 
> 
> that's indeed a difference. -- I assume cpu_sparc_exec is what does
> actual work here...

Depending on how your profiling is done, it might not be. It might be
that the time spent in cpu_sparc_exec is just the time needed to look
for the translated code in the TB cache.
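
To illustrate what "looking for the translated code" can involve, here is a
toy model of a translation-block cache: a small direct-mapped fast path in
front of a slower hash map, both keyed by guest PC. This is only a sketch
loosely inspired by the idea; all names are invented and it is not QEMU's
actual data structure. It shows why flushing the fast path (compare
tb_flush_jmp_cache in the profile) pushes later lookups onto the slower path.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

struct TranslatedBlock {
    std::uint64_t guest_pc;  // guest address this block was translated from
    int id;                  // stand-in for a pointer to generated host code
};

class TbCache {
public:
    static constexpr std::size_t kFastSlots = 256;

    void insert(std::uint64_t guest_pc, int id)
    {
        all_[guest_pc] = TranslatedBlock{guest_pc, id};
    }

    // Return the block for guest_pc, or nullptr if it must be translated first.
    const TranslatedBlock* lookup(std::uint64_t guest_pc)
    {
        const std::size_t slot = guest_pc % kFastSlots;
        const TranslatedBlock* fast = fast_[slot];
        if (fast != nullptr && fast->guest_pc == guest_pc) {
            ++fast_hits;               // cheap path: one array access
            return fast;
        }
        auto it = all_.find(guest_pc); // slower path: hash lookup
        if (it == all_.end()) {
            return nullptr;
        }
        ++slow_hits;
        fast_[slot] = &it->second;     // refill the fast path
        return &it->second;
    }

    // Drop only the fast-path entries; the translated blocks themselves survive.
    void flush_fast_path() { fast_.fill(nullptr); }

    int fast_hits = 0;
    int slow_hits = 0;

private:
    std::array<const TranslatedBlock*, kFastSlots> fast_{};
    std::unordered_map<std::uint64_t, TranslatedBlock> all_;
};
```

In this model, a guest that triggers frequent flushes keeps paying for the
hash lookup even though no code has to be retranslated.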

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-07-29 14:41         ` Karel Gardas
@ 2015-07-30  3:47           ` Dennis Luehring
  2015-07-30  7:12             ` Paolo Bonzini
  2015-07-30  7:55             ` Aurelien Jarno
  0 siblings, 2 replies; 80+ messages in thread
From: Dennis Luehring @ 2015-07-30  3:47 UTC (permalink / raw)
  To: Karel Gardas; +Cc: qemu-devel

So your aarch64 guest is simply less work for qemu: not EVERY memory access
of 16 bits or more needs byte swapping or a check for unaligned access to
emulate bus errors.
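
To make the two extra steps concrete, here is a sketch of what emulating a
16-bit load of a big-endian, strict-alignment guest involves in principle.
It is not QEMU's code; the fault type and function name are hypothetical,
and how much these steps really cost inside qemu is my assumption, not
something measured here.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical fault type standing in for the guest's alignment trap.
struct GuestAlignmentFault : std::runtime_error {
    GuestAlignmentFault() : std::runtime_error("unaligned guest access") {}
};

// Emulate a 16-bit load at guest address `addr` from guest RAM `ram`.
// The caller guarantees that addr and addr + 1 are inside the buffer.
std::uint16_t guest_load_u16(const unsigned char* ram, std::uint64_t addr)
{
    assert(ram != nullptr);
    if (addr % 2 != 0) {
        throw GuestAlignmentFault();  // step 1: alignment check
    }
    // step 2: assemble the big-endian value; on a little-endian host this
    // is equivalent to a plain load followed by a byte swap
    return static_cast<std::uint16_t>((ram[addr] << 8) | ram[addr + 1]);
}
```

A little-endian guest that tolerates unaligned accesses, running on a
little-endian host, needs neither step.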

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-07-29 15:01     ` Aurelien Jarno
@ 2015-07-30  3:52       ` Dennis Luehring
  2015-07-30  7:52         ` Aurelien Jarno
  0 siblings, 1 reply; 80+ messages in thread
From: Dennis Luehring @ 2015-07-30  3:52 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel, Artyom Tarasenko

On 29.07.2015 at 17:01, Aurelien Jarno wrote:
> The point is that emulation has a cost, and it's quite difficult to
> to lower it and thus improve the emulation speed.

So it's not strange for you to see 1/100 to 1/200 of the native x64
speed under qemu/SPARC64.
I had hoped that someone would jump up and shout "it's impossible, it must
be a bug" ... sadly not.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-07-30  3:47           ` Dennis Luehring
@ 2015-07-30  7:12             ` Paolo Bonzini
  2015-07-30  8:31               ` Artyom Tarasenko
  2015-07-30  7:55             ` Aurelien Jarno
  1 sibling, 1 reply; 80+ messages in thread
From: Paolo Bonzini @ 2015-07-30  7:12 UTC (permalink / raw)
  To: Dennis Luehring, Karel Gardas; +Cc: qemu-devel



On 30/07/2015 05:47, Dennis Luehring wrote:
> so your aarch64 is just less todo for qemu - not EVERY >= 16bit memory
> access needs swapping or needs check for unaligned access to emulate
> bus-erros

Not to mention register windows, which IIRC are the big source of pain
for SPARC.

Paolo

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-07-30  3:52       ` Dennis Luehring
@ 2015-07-30  7:52         ` Aurelien Jarno
  2015-07-30  8:16           ` Dennis Luehring
  0 siblings, 1 reply; 80+ messages in thread
From: Aurelien Jarno @ 2015-07-30  7:52 UTC (permalink / raw)
  To: Dennis Luehring; +Cc: qemu-devel, Artyom Tarasenko

On 2015-07-30 05:52, Dennis Luehring wrote:
> Am 29.07.2015 um 17:01 schrieb Aurelien Jarno:
> >The point is that emulation has a cost, and it's quite difficult to
> >to lower it and thus improve the emulation speed.
> 
> so its just not strange for you to see an 1/100...200 of the native x64
> speed under qemu/SPARC64
> i hoped that someone will jump up an shout "its impossible - it needs to be
> a bug" ...sadly not

Overall the ratio is more around 10, but in some specific cases where
the TB cache is inefficient and TB can't be linked or with an
inefficient MMU, a ratio of 100 is possible.

Also remember you are comparing apples and oranges there. A GCC compiler
for x86-64 and one for SPARC64 can't really be compared, even if the
front-end and middle-end parts are the same. The job of generating the
assembly code is not the same and might take more or less time depending
on the architecture complexity.

Finally, make sure you have enough RAM in your guest and that GCC isn't
swapping. Recent GCC versions need a lot of memory and the default 128MB
might not be enough.

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-07-30  3:47           ` Dennis Luehring
  2015-07-30  7:12             ` Paolo Bonzini
@ 2015-07-30  7:55             ` Aurelien Jarno
  2015-08-17 14:19               ` Artyom Tarasenko
  1 sibling, 1 reply; 80+ messages in thread
From: Aurelien Jarno @ 2015-07-30  7:55 UTC (permalink / raw)
  To: Dennis Luehring; +Cc: qemu-devel, Karel Gardas

On 2015-07-30 05:47, Dennis Luehring wrote:
> so your aarch64 is just less todo for qemu - not EVERY >= 16bit memory
> access needs swapping or needs check for unaligned access to emulate
> bus-erros

On recent Intel CPUs, byte swapping comes for free (MOVBE
instruction).

As for unaligned access, it's actually the reverse. The fact that
aarch64 allows unaligned accesses means they have to go through the slow
path (I have posted a patch to improve that). On sparc, given that all
accesses are aligned, there are more chances to go through the fast
path.

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-07-30  7:52         ` Aurelien Jarno
@ 2015-07-30  8:16           ` Dennis Luehring
  2015-07-30  8:42             ` Artyom Tarasenko
  2015-07-30  8:55             ` Aurelien Jarno
  0 siblings, 2 replies; 80+ messages in thread
From: Dennis Luehring @ 2015-07-30  8:16 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel, Artyom Tarasenko

Am 30.07.2015 um 09:52 schrieb Aurelien Jarno:
> On 2015-07-30 05:52, Dennis Luehring wrote:
> > Am 29.07.2015 um 17:01 schrieb Aurelien Jarno:
> > >The point is that emulation has a cost, and it's quite difficult to
> > >to lower it and thus improve the emulation speed.
> >
> > so its just not strange for you to see an 1/100...200 of the native x64
> > speed under qemu/SPARC64
> > i hoped that someone will jump up an shout "its impossible - it needs to be
> > a bug" ...sadly not
>
> Overall the ratio is more around 10, but in some specific cases where
> the TB cache is inefficient and TB can't be linked or with an
> inefficient MMU, a ratio of 100 is possible.


sysbench (0.4.12) --num-threads=1 --test=cpu --cpu-max-prime=2000 run
    Host x64    :   1.3580s
    Qemu SPARC64: 184.2532s

sysbench shows a ratio of about 136 (184.2532 / 1.3580)

>
> Also remember you are comparing apples and oranges there. A GCC compiler
> for x86-64 and for SPARC64 can't really be compared, even if the
> front-end and middle-end part are the same. The job of generating the
> assembly code is not the same and might take more or less time depending
> on the architecture complexity.

that is true - but it's slow even when i compile non-complex stuff: no
headers, simple integer math
i'll try to come up with a good example

>
> Finally make sure you have enough RAM in your guest and that GCC isn't
> swapped. Recent GCC versions needs a lot of memory and the default 128MB
> might not be enough.
>

during all my tests 70-80% of the RAM was free, no swapping ever

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-07-30  7:12             ` Paolo Bonzini
@ 2015-07-30  8:31               ` Artyom Tarasenko
  2015-08-02 19:12                 ` Alex Bennée
  0 siblings, 1 reply; 80+ messages in thread
From: Artyom Tarasenko @ 2015-07-30  8:31 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, Dennis Luehring, Karel Gardas

On Thu, Jul 30, 2015 at 9:12 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>
> On 30/07/2015 05:47, Dennis Luehring wrote:
>> so your aarch64 is just less todo for qemu - not EVERY >= 16bit memory
>> access needs swapping or needs check for unaligned access to emulate
>> bus-erros
>
> Not to mention register windows, which IIRC are the big source of pain
> for SPARC.

That's true. But then again, there is an emulator called tme [1] which
doesn't do binary translation and is still much faster than QEMU, at
least on an x86_64 host when emulating a sun4u machine.

1.  http://people.csail.mit.edu/fredette/tme/index.html

Artyom

-- 
Regards,
Artyom Tarasenko

SPARC and PPC PReP under qemu blog: http://tyom.blogspot.com/search/label/qemu

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-07-30  8:16           ` Dennis Luehring
@ 2015-07-30  8:42             ` Artyom Tarasenko
  2015-07-30  8:55             ` Aurelien Jarno
  1 sibling, 0 replies; 80+ messages in thread
From: Artyom Tarasenko @ 2015-07-30  8:42 UTC (permalink / raw)
  To: Dennis Luehring; +Cc: mttcg, qemu-devel, Aurelien Jarno

On Thu, Jul 30, 2015 at 10:16 AM, Dennis Luehring <dl.soluz@gmx.net> wrote:
> Am 30.07.2015 um 09:52 schrieb Aurelien Jarno:
>>
>> On 2015-07-30 05:52, Dennis Luehring wrote:
>> > Am 29.07.2015 um 17:01 schrieb Aurelien Jarno:
>> > >The point is that emulation has a cost, and it's quite difficult to
>> > >to lower it and thus improve the emulation speed.
>> >
>> > so its just not strange for you to see an 1/100...200 of the native x64
>> > speed under qemu/SPARC64
>> > i hoped that someone will jump up an shout "its impossible - it needs to
>> > be
>> > a bug" ...sadly not
>>
>> Overall the ratio is more around 10, but in some specific cases where
>> the TB cache is inefficient and TB can't be linked or with an
>> inefficient MMU, a ratio of 100 is possible.
>
>
>
> sysbench (0.4.12) --num-threads=1 --test=cpu --cpu-max-prime=2000 run
>    Host x64    :   1.3580s
>    Qemu SPARC64: 184.2532s
>
> sysbench shows nearly ration of 200

If you are not bound to Linux, try TME
(http://people.csail.mit.edu/fredette/tme/index.html).

Last time I looked at it (2010), it was quite a bit faster than QEMU, but
was only able to boot NetBSD (it's probably not hard to add Linux support;
afair it was just missing some ESP commands).
It doesn't do binary translation but utilizes more host CPU cores.

Otherwise maybe a multi-threaded TCG would help. If it is able to run
translation and execution of code in different threads, it would nearly
double the performance.
Adding mttcg to CC, in the hope of hearing from the authors whether their
thread model brings something when emulating a single CPU, or whether it
is only oriented to SMP machines.

Artyom

-- 
Regards,
Artyom Tarasenko

SPARC and PPC PReP under qemu blog: http://tyom.blogspot.com/search/label/qemu

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-07-30  8:16           ` Dennis Luehring
  2015-07-30  8:42             ` Artyom Tarasenko
@ 2015-07-30  8:55             ` Aurelien Jarno
  2015-07-30  9:35               ` Artyom Tarasenko
                                 ` (3 more replies)
  1 sibling, 4 replies; 80+ messages in thread
From: Aurelien Jarno @ 2015-07-30  8:55 UTC (permalink / raw)
  To: Dennis Luehring; +Cc: qemu-devel, Artyom Tarasenko

On 2015-07-30 10:16, Dennis Luehring wrote:
> Am 30.07.2015 um 09:52 schrieb Aurelien Jarno:
> >On 2015-07-30 05:52, Dennis Luehring wrote:
> >> Am 29.07.2015 um 17:01 schrieb Aurelien Jarno:
> >> >The point is that emulation has a cost, and it's quite difficult to
> >> >to lower it and thus improve the emulation speed.
> >>
> >> so its just not strange for you to see an 1/100...200 of the native x64
> >> speed under qemu/SPARC64
> >> i hoped that someone will jump up an shout "its impossible - it needs to be
> >> a bug" ...sadly not
> >
> >Overall the ratio is more around 10, but in some specific cases where
> >the TB cache is inefficient and TB can't be linked or with an
> >inefficient MMU, a ratio of 100 is possible.
> 
> 
> sysbench (0.4.12) --num-threads=1 --test=cpu --cpu-max-prime=2000 run
>    Host x64    :   1.3580s
>    Qemu SPARC64: 184.2532s
> 
> sysbench shows nearly ration of 200

Note that when you say SPARC64 here, it's actually only the kernel; you
are using a 32-bit userland. And that makes a difference. Here are my
results:

host (x86-64)                    0.8976s
sparc32 guest (sparc64 kernel)  99.6116s
sparc64 guest (sparc64 kernel)   4.4908s

So it looks like the 32-bit code is not QEMU friendly. I haven't looked
at it yet, but I guess it might be due to dynamic jumps, so that TBs
can't be chained.

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-07-30  8:55             ` Aurelien Jarno
@ 2015-07-30  9:35               ` Artyom Tarasenko
  2015-07-30 10:09                 ` Aurelien Jarno
  2015-07-30 15:50               ` Aurelien Jarno
                                 ` (2 subsequent siblings)
  3 siblings, 1 reply; 80+ messages in thread
From: Artyom Tarasenko @ 2015-07-30  9:35 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel, Dennis Luehring

On Thu, Jul 30, 2015 at 10:55 AM, Aurelien Jarno <aurelien@aurel32.net> wrote:
> On 2015-07-30 10:16, Dennis Luehring wrote:
>> Am 30.07.2015 um 09:52 schrieb Aurelien Jarno:
>> >On 2015-07-30 05:52, Dennis Luehring wrote:
>> >> Am 29.07.2015 um 17:01 schrieb Aurelien Jarno:
>> >> >The point is that emulation has a cost, and it's quite difficult to
>> >> >to lower it and thus improve the emulation speed.
>> >>
>> >> so its just not strange for you to see an 1/100...200 of the native x64
>> >> speed under qemu/SPARC64
>> >> i hoped that someone will jump up an shout "its impossible - it needs to be
>> >> a bug" ...sadly not
>> >
>> >Overall the ratio is more around 10, but in some specific cases where
>> >the TB cache is inefficient and TB can't be linked or with an
>> >inefficient MMU, a ratio of 100 is possible.
>>
>>
>> sysbench (0.4.12) --num-threads=1 --test=cpu --cpu-max-prime=2000 run
>>    Host x64    :   1.3580s
>>    Qemu SPARC64: 184.2532s
>>
>> sysbench shows nearly ration of 200
>
> Note that when you say SPARC64 here, it's actually only the kernel, you
> are using a 32-bit userland. And that makes a difference. Here are my
> tests here:
>
> host (x86-64)                    0.8976s
> sparc32 guest (sparc64 kernel)  99.6116s
> sparc64 guest (sparc64 kernel)   4.4908s

Wow. That's quite a difference. What have you used as a sparc64 guest?
Are there any ready-to-use distributions, or have you built it from scratch?

> So it looks like the 32-bit code is not QEMU friendly. I haven't looked
> at it yet, but I guess it might be due to dynamic jumps, so that TB
> can't be chained.
>
> --
> Aurelien Jarno                          GPG: 4096R/1DDD8C9B
> aurelien@aurel32.net                 http://www.aurel32.net



-- 
Regards,
Artyom Tarasenko

SPARC and PPC PReP under qemu blog: http://tyom.blogspot.com/search/label/qemu

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-07-30  9:35               ` Artyom Tarasenko
@ 2015-07-30 10:09                 ` Aurelien Jarno
  2015-07-30 18:21                   ` Dennis Luehring
  0 siblings, 1 reply; 80+ messages in thread
From: Aurelien Jarno @ 2015-07-30 10:09 UTC (permalink / raw)
  To: Artyom Tarasenko; +Cc: qemu-devel, Dennis Luehring

On 2015-07-30 11:35, Artyom Tarasenko wrote:
> On Thu, Jul 30, 2015 at 10:55 AM, Aurelien Jarno <aurelien@aurel32.net> wrote:
> > On 2015-07-30 10:16, Dennis Luehring wrote:
> >> Am 30.07.2015 um 09:52 schrieb Aurelien Jarno:
> >> >On 2015-07-30 05:52, Dennis Luehring wrote:
> >> >> Am 29.07.2015 um 17:01 schrieb Aurelien Jarno:
> >> >> >The point is that emulation has a cost, and it's quite difficult to
> >> >> >to lower it and thus improve the emulation speed.
> >> >>
> >> >> so its just not strange for you to see an 1/100...200 of the native x64
> >> >> speed under qemu/SPARC64
> >> >> i hoped that someone will jump up an shout "its impossible - it needs to be
> >> >> a bug" ...sadly not
> >> >
> >> >Overall the ratio is more around 10, but in some specific cases where
> >> >the TB cache is inefficient and TB can't be linked or with an
> >> >inefficient MMU, a ratio of 100 is possible.
> >>
> >>
> >> sysbench (0.4.12) --num-threads=1 --test=cpu --cpu-max-prime=2000 run
> >>    Host x64    :   1.3580s
> >>    Qemu SPARC64: 184.2532s
> >>
> >> sysbench shows nearly ration of 200
> >
> > Note that when you say SPARC64 here, it's actually only the kernel, you
> > are using a 32-bit userland. And that makes a difference. Here are my
> > tests here:
> >
> > host (x86-64)                    0.8976s
> > sparc32 guest (sparc64 kernel)  99.6116s
> > sparc64 guest (sparc64 kernel)   4.4908s
> 
> Wow. That's quite a difference. What have you used as a sparc64 guest?
> Are there any ready-to-use distributions, or have you built it from scratch?

I am using Debian SPARC64 from debian-ports. But it's not really
ready-to-use and often broken.

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-07-30  8:55             ` Aurelien Jarno
  2015-07-30  9:35               ` Artyom Tarasenko
@ 2015-07-30 15:50               ` Aurelien Jarno
  2015-07-31 15:31                 ` Artyom Tarasenko
  2015-08-03  7:58               ` Dennis Luehring
  2015-08-03 14:51               ` Dennis Luehring
  3 siblings, 1 reply; 80+ messages in thread
From: Aurelien Jarno @ 2015-07-30 15:50 UTC (permalink / raw)
  To: Dennis Luehring; +Cc: qemu-devel, Artyom Tarasenko

[-- Attachment #1: Type: text/plain, Size: 3889 bytes --]

On 2015-07-30 10:55, Aurelien Jarno wrote:
> On 2015-07-30 10:16, Dennis Luehring wrote:
> > Am 30.07.2015 um 09:52 schrieb Aurelien Jarno:
> > >On 2015-07-30 05:52, Dennis Luehring wrote:
> > >> Am 29.07.2015 um 17:01 schrieb Aurelien Jarno:
> > >> >The point is that emulation has a cost, and it's quite difficult to
> > >> >to lower it and thus improve the emulation speed.
> > >>
> > >> so its just not strange for you to see an 1/100...200 of the native x64
> > >> speed under qemu/SPARC64
> > >> i hoped that someone will jump up an shout "its impossible - it needs to be
> > >> a bug" ...sadly not
> > >
> > >Overall the ratio is more around 10, but in some specific cases where
> > >the TB cache is inefficient and TB can't be linked or with an
> > >inefficient MMU, a ratio of 100 is possible.
> > 
> > 
> > sysbench (0.4.12) --num-threads=1 --test=cpu --cpu-max-prime=2000 run
> >    Host x64    :   1.3580s
> >    Qemu SPARC64: 184.2532s
> > 
> > sysbench shows nearly ration of 200
> 
> Note that when you say SPARC64 here, it's actually only the kernel, you
> are using a 32-bit userland. And that makes a difference. Here are my
> tests here:
> 
> host (x86-64)                    0.8976s
> sparc32 guest (sparc64 kernel)  99.6116s
> sparc64 guest (sparc64 kernel)   4.4908s
> 
> So it looks like the 32-bit code is not QEMU friendly. I haven't looked
> at it yet, but I guess it might be due to dynamic jumps, so that TB
> can't be chained.

This is the corresponding C code from sysbench, which is run 10000
times.

| int cpu_execute_request(sb_request_t *r, int thread_id)
| { 
|   unsigned long long c;
|   unsigned long long l,t;
|   unsigned long long n=0;
|   log_msg_t           msg;
|   log_msg_oper_t      op_msg;
|   
|   (void)r; /* unused */
|   
|   /* Prepare log message */
|   msg.type = LOG_MSG_TYPE_OPER;
|   msg.data = &op_msg;
|   
|   /* So far we're using very simple test prime number tests in 64bit */
|   LOG_EVENT_START(msg, thread_id);
|   
|   for(c=3; c < max_prime; c++)
|   { 
|     t = sqrt(c); 
|     for(l = 2; l <= t; l++)
|       if (c % l == 0)
|         break;
|     if (l > t )
|       n++;
|   }
|   
|   LOG_EVENT_STOP(msg, thread_id);
|   
|   return 0;
| }

This is a very simple test, which is probably not a good representation
of CPU performance, even more so when emulated by QEMU. In addition to
that, given it mostly uses 64-bit integers, it's kind of expected that
the 32-bit version is slower.

Anyway I have extracted this code into a C file (see attached file) that
can more easily be compiled to 32 or 64 bit using -m32 or -m64. I observe
the same behavior as with sysbench, even with qemu-user (which is not
surprising as the above code doesn't really put pressure on the MMU).

Running it, I get the following times:
x86-64 host       0.877s
sparc guest -m32  1m39s
sparc guest -m64   3.5s
opensparc T1 -m32 1m59s
opensparc T1 -m64 1m12s

So overall QEMU is faster than some not-so-old real hardware. That said,
looking at it quickly, it seems that some of the FP instructions are
actually trapped and emulated by the kernel on the opensparc T1.

Now coming back to the QEMU problem, the issue is that the 64-bit code
is using the udivx instruction to compute the modulo, while the 32-bit
code calls the __umoddi3 GCC helper. That helper uses a lot of integer
instructions that depend on CPU flags, so most of the time is spent
computing them in helper_compute_psr.

So as said in my previous emails, QEMU is not cycle accurate, and you
can expect that some specific code can be emulated very quickly (x4
ratio in the 64-bit case) and some other specific code can be emulated
very slowly (x110 ratio in the 32-bit case). It appears that the
sysbench code is actually quite specific, which explains the
difference.

Aurelien

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net

[-- Attachment #2: prime.c --]
[-- Type: text/x-csrc, Size: 479 bytes --]

#include <math.h>

unsigned long long max_prime = 2000;

void prime_test()
{ 
  unsigned long long c;
  unsigned long long l,t;
  unsigned long long n=0;
  
  /* So far we're using very simple test prime number tests in 64bit */
  for(c=3; c < max_prime; c++)
  { 
    t = sqrt(c); 
    for(l = 2; l <= t; l++)
      if (c % l == 0)
        break;
    if (l > t )
      n++;
  }
}

int main()
{
  int i;

  for (i = 0 ; i < 10000 ; i++)
  {
    prime_test();
  }

  return 0;
}


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-07-30 10:09                 ` Aurelien Jarno
@ 2015-07-30 18:21                   ` Dennis Luehring
  0 siblings, 0 replies; 80+ messages in thread
From: Dennis Luehring @ 2015-07-30 18:21 UTC (permalink / raw)
  To: Artyom Tarasenko, qemu-devel, Aurelien Jarno

Am 30.07.2015 um 12:09 schrieb Aurelien Jarno:
> I am using Debian SPARC64 from debian-ports. But it's not really
> ready-to-use and often broken.

the current 6.5.1 NetBSD is available for SPARC and SPARC64 - with SPARC 
using 32bit Kernel/Userland and
SPARC64 with 64bit Kernel/Userland

and according to Artyom Tarasenko's blog: 
http://tyom.blogspot.de/2014/08/upstream-qemu-can-run-netbsdsparc64.html
it should boot under Qemu

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-07-30 15:50               ` Aurelien Jarno
@ 2015-07-31 15:31                 ` Artyom Tarasenko
  2015-07-31 15:43                   ` Aurelien Jarno
  0 siblings, 1 reply; 80+ messages in thread
From: Artyom Tarasenko @ 2015-07-31 15:31 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel, Dennis Luehring

On Thu, Jul 30, 2015 at 5:50 PM, Aurelien Jarno <aurelien@aurel32.net> wrote:
> On 2015-07-30 10:55, Aurelien Jarno wrote:
>> On 2015-07-30 10:16, Dennis Luehring wrote:
>> > Am 30.07.2015 um 09:52 schrieb Aurelien Jarno:
>> > >On 2015-07-30 05:52, Dennis Luehring wrote:
>> > >> Am 29.07.2015 um 17:01 schrieb Aurelien Jarno:
>> > >> >The point is that emulation has a cost, and it's quite difficult to
>> > >> >to lower it and thus improve the emulation speed.
>> > >>
>> > >> so its just not strange for you to see an 1/100...200 of the native x64
>> > >> speed under qemu/SPARC64
>> > >> i hoped that someone will jump up an shout "its impossible - it needs to be
>> > >> a bug" ...sadly not
>> > >
>> > >Overall the ratio is more around 10, but in some specific cases where
>> > >the TB cache is inefficient and TB can't be linked or with an
>> > >inefficient MMU, a ratio of 100 is possible.
>> >
>> >
>> > sysbench (0.4.12) --num-threads=1 --test=cpu --cpu-max-prime=2000 run
>> >    Host x64    :   1.3580s
>> >    Qemu SPARC64: 184.2532s
>> >
>> > sysbench shows nearly ration of 200
>>
>> Note that when you say SPARC64 here, it's actually only the kernel, you
>> are using a 32-bit userland. And that makes a difference. Here are my
>> tests here:
>>
>> host (x86-64)                    0.8976s
>> sparc32 guest (sparc64 kernel)  99.6116s
>> sparc64 guest (sparc64 kernel)   4.4908s
>>
>> So it looks like the 32-bit code is not QEMU friendly. I haven't looked
>> at it yet, but I guess it might be due to dynamic jumps, so that TB
>> can't be chained.
>
> This is the corresponding C code from sysbench, which is ran 10000
> times.
>
> | int cpu_execute_request(sb_request_t *r, int thread_id)
> | {
> |   unsigned long long c;
> |   unsigned long long l,t;
> |   unsigned long long n=0;
> |   log_msg_t           msg;
> |   log_msg_oper_t      op_msg;
> |
> |   (void)r; /* unused */
> |
> |   /* Prepare log message */
> |   msg.type = LOG_MSG_TYPE_OPER;
> |   msg.data = &op_msg;
> |
> |   /* So far we're using very simple test prime number tests in 64bit */
> |   LOG_EVENT_START(msg, thread_id);
> |
> |   for(c=3; c < max_prime; c++)
> |   {
> |     t = sqrt(c);
> |     for(l = 2; l <= t; l++)
> |       if (c % l == 0)
> |         break;
> |     if (l > t )
> |       n++;
> |   }
> |
> |   LOG_EVENT_STOP(msg, thread_id);
> |
> |   return 0;
> | }
>
> This is a very simple test, which is probably not a good representation
> of the CPU performances, even more when emulated by QEMU. In addition to
> that, given it mostly uses 64 bit integer, it's kind of expected that
> the 32-bit version is slower.
>
> Anyway I have extracted this code into a C file (see attached file) that
> can more easily compiled to 32 or 64 bit using -m32 or -m64. I observe
> the same behavior than sysbench, even with qemu-user (which is not
> surprising as the above code doesn't really put pressure the MMU.
>
> Running it in I get the following time:
> x86-64 host       0.877s
> sparc guest -m32  1m39s
> sparc guest -m64   3.5s
> opensparc T1 -m32 1m59s
> opensparc T1 -m64 1m12s
>
> So overall QEMU is faster than a not so old real hardware. That said
> looking at it quickly it seems that some of the FP instructions are
> actually trapped and emulated by the kernel on the opensparc T1.
>
> Now coming back to the QEMU problem, the issue is that the 64-bit code
> is using the udivx instruction to compute the modulo, while the 32-bit
> code calls the __umoddi3 GCC helper.

Actually this looks like a bug/missing feature in gcc. Why doesn't it use
the udivx instruction in "SPARC32PLUS, V8+ Required" code?

> It uses a lot of integer functions
> based on CPU flags, so most of the time is spent computing them in
> helper_compute_psr.

I wonder if this can be optimized. I guess most RISC CPUs would have a
similar problem. Unlike on x86, the compilers usually optimize
instructions for flag usage. If there is an instruction modifying flags
in the code, the flags will be used for sure, so it probably makes
little sense to postpone the flag computation?

Artyom

-- 
Regards,
Artyom Tarasenko

SPARC and PPC PReP under qemu blog: http://tyom.blogspot.com/search/label/qemu

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-07-31 15:31                 ` Artyom Tarasenko
@ 2015-07-31 15:43                   ` Aurelien Jarno
  2015-08-02 13:11                     ` Mark Cave-Ayland
                                       ` (2 more replies)
  0 siblings, 3 replies; 80+ messages in thread
From: Aurelien Jarno @ 2015-07-31 15:43 UTC (permalink / raw)
  To: Artyom Tarasenko; +Cc: qemu-devel, Dennis Luehring

On 2015-07-31 17:31, Artyom Tarasenko wrote:
> On Thu, Jul 30, 2015 at 5:50 PM, Aurelien Jarno <aurelien@aurel32.net> wrote:
> > On 2015-07-30 10:55, Aurelien Jarno wrote:
> >> On 2015-07-30 10:16, Dennis Luehring wrote:
> >> > Am 30.07.2015 um 09:52 schrieb Aurelien Jarno:
> >> > >On 2015-07-30 05:52, Dennis Luehring wrote:
> >> > >> Am 29.07.2015 um 17:01 schrieb Aurelien Jarno:
> >> > >> >The point is that emulation has a cost, and it's quite difficult to
> >> > >> >to lower it and thus improve the emulation speed.
> >> > >>
> >> > >> so its just not strange for you to see an 1/100...200 of the native x64
> >> > >> speed under qemu/SPARC64
> >> > >> i hoped that someone will jump up an shout "its impossible - it needs to be
> >> > >> a bug" ...sadly not
> >> > >
> >> > >Overall the ratio is more around 10, but in some specific cases where
> >> > >the TB cache is inefficient and TB can't be linked or with an
> >> > >inefficient MMU, a ratio of 100 is possible.
> >> >
> >> >
> >> > sysbench (0.4.12) --num-threads=1 --test=cpu --cpu-max-prime=2000 run
> >> >    Host x64    :   1.3580s
> >> >    Qemu SPARC64: 184.2532s
> >> >
> >> > sysbench shows nearly ration of 200
> >>
> >> Note that when you say SPARC64 here, it's actually only the kernel, you
> >> are using a 32-bit userland. And that makes a difference. Here are my
> >> tests here:
> >>
> >> host (x86-64)                    0.8976s
> >> sparc32 guest (sparc64 kernel)  99.6116s
> >> sparc64 guest (sparc64 kernel)   4.4908s
> >>
> >> So it looks like the 32-bit code is not QEMU friendly. I haven't looked
> >> at it yet, but I guess it might be due to dynamic jumps, so that TB
> >> can't be chained.
> >
> > This is the corresponding C code from sysbench, which is ran 10000
> > times.
> >
> > | int cpu_execute_request(sb_request_t *r, int thread_id)
> > | {
> > |   unsigned long long c;
> > |   unsigned long long l,t;
> > |   unsigned long long n=0;
> > |   log_msg_t           msg;
> > |   log_msg_oper_t      op_msg;
> > |
> > |   (void)r; /* unused */
> > |
> > |   /* Prepare log message */
> > |   msg.type = LOG_MSG_TYPE_OPER;
> > |   msg.data = &op_msg;
> > |
> > |   /* So far we're using very simple test prime number tests in 64bit */
> > |   LOG_EVENT_START(msg, thread_id);
> > |
> > |   for(c=3; c < max_prime; c++)
> > |   {
> > |     t = sqrt(c);
> > |     for(l = 2; l <= t; l++)
> > |       if (c % l == 0)
> > |         break;
> > |     if (l > t )
> > |       n++;
> > |   }
> > |
> > |   LOG_EVENT_STOP(msg, thread_id);
> > |
> > |   return 0;
> > | }
> >
> > This is a very simple test, which is probably not a good representation
> > of the CPU performances, even more when emulated by QEMU. In addition to
> > that, given it mostly uses 64 bit integer, it's kind of expected that
> > the 32-bit version is slower.
> >
> > Anyway I have extracted this code into a C file (see attached file) that
> > can more easily compiled to 32 or 64 bit using -m32 or -m64. I observe
> > the same behavior than sysbench, even with qemu-user (which is not
> > surprising as the above code doesn't really put pressure the MMU.
> >
> > Running it in I get the following time:
> > x86-64 host       0.877s
> > sparc guest -m32  1m39s
> > sparc guest -m64   3.5s
> > opensparc T1 -m32 1m59s
> > opensparc T1 -m64 1m12s
> >
> > So overall QEMU is faster than a not so old real hardware. That said
> > looking at it quickly it seems that some of the FP instructions are
> > actually trapped and emulated by the kernel on the opensparc T1.
> >
> > Now coming back to the QEMU problem, the issue is that the 64-bit code
> > is using the udivx instruction to compute the modulo, while the 32-bit
> > code calls the __umoddi3 GCC helper.
> 
> Actually this looks like a bug/missing feature in gcc. Why doesn't it use udivx
> instruction in "SPARC32PLUS, V8+ Required" code?

No idea.

> > It uses a lot of integer functions
> > based on CPU flags, so most of the time is spent computing them in
> > helper_compute_psr.
> 
> I wonder if this can be optimized. I guess most RISC CPUs would have a
> similar problem. Unlike x86, the compilers usually optimize
> instructions on flag usage. If there is an instruction modifying flags
> in a code, the flags will be used for sure, so it probably makes a
> little sense to pospone the flag computation?

Indeed. ARM and SH4 use one TCG temp per flag, and they can be computed
one by one using setcond. The optimizer and the liveness analysis then
get rid of the unused computations. However, while it allows intra-TB
optimization, it prevents any other flag optimization. Therefore the
only way to know whether it is a good idea is to implement it and
benchmark it, using a bit more than a single biased benchmark like
the one from sysbench.

Also note that the current implementation predates the introduction of
setcond, which is necessary to be able to compute the flags using TCG
code.

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-07-31 15:43                   ` Aurelien Jarno
@ 2015-08-02 13:11                     ` Mark Cave-Ayland
  2015-08-03  8:31                     ` Artyom Tarasenko
  2015-08-17 11:32                     ` Dennis Luehring
  2 siblings, 0 replies; 80+ messages in thread
From: Mark Cave-Ayland @ 2015-08-02 13:11 UTC (permalink / raw)
  To: Aurelien Jarno, Artyom Tarasenko, Dennis Luehring, qemu-devel

On 31/07/15 16:43, Aurelien Jarno wrote:

> On 2015-07-31 17:31, Artyom Tarasenko wrote:
>> On Thu, Jul 30, 2015 at 5:50 PM, Aurelien Jarno <aurelien@aurel32.net> wrote:
>>> On 2015-07-30 10:55, Aurelien Jarno wrote:
>>>> On 2015-07-30 10:16, Dennis Luehring wrote:
>>>>> Am 30.07.2015 um 09:52 schrieb Aurelien Jarno:
>>>>>> On 2015-07-30 05:52, Dennis Luehring wrote:
>>>>>>> Am 29.07.2015 um 17:01 schrieb Aurelien Jarno:
>>>>>>>> The point is that emulation has a cost, and it's quite difficult to
>>>>>>>> to lower it and thus improve the emulation speed.
>>>>>>>
>>>>>>> so its just not strange for you to see an 1/100...200 of the native x64
>>>>>>> speed under qemu/SPARC64
>>>>>>> i hoped that someone will jump up an shout "its impossible - it needs to be
>>>>>>> a bug" ...sadly not
>>>>>>
>>>>>> Overall the ratio is more around 10, but in some specific cases where
>>>>>> the TB cache is inefficient and TB can't be linked or with an
>>>>>> inefficient MMU, a ratio of 100 is possible.
>>>>>
>>>>>
>>>>> sysbench (0.4.12) --num-threads=1 --test=cpu --cpu-max-prime=2000 run
>>>>>    Host x64    :   1.3580s
>>>>>    Qemu SPARC64: 184.2532s
>>>>>
>>>>> sysbench shows nearly ration of 200
>>>>
>>>> Note that when you say SPARC64 here, it's actually only the kernel, you
>>>> are using a 32-bit userland. And that makes a difference. Here are my
>>>> tests here:
>>>>
>>>> host (x86-64)                    0.8976s
>>>> sparc32 guest (sparc64 kernel)  99.6116s
>>>> sparc64 guest (sparc64 kernel)   4.4908s
>>>>
>>>> So it looks like the 32-bit code is not QEMU friendly. I haven't looked
>>>> at it yet, but I guess it might be due to dynamic jumps, so that TB
>>>> can't be chained.
>>>
>>> This is the corresponding C code from sysbench, which is ran 10000
>>> times.
>>>
>>> | int cpu_execute_request(sb_request_t *r, int thread_id)
>>> | {
>>> |   unsigned long long c;
>>> |   unsigned long long l,t;
>>> |   unsigned long long n=0;
>>> |   log_msg_t           msg;
>>> |   log_msg_oper_t      op_msg;
>>> |
>>> |   (void)r; /* unused */
>>> |
>>> |   /* Prepare log message */
>>> |   msg.type = LOG_MSG_TYPE_OPER;
>>> |   msg.data = &op_msg;
>>> |
>>> |   /* So far we're using very simple test prime number tests in 64bit */
>>> |   LOG_EVENT_START(msg, thread_id);
>>> |
>>> |   for(c=3; c < max_prime; c++)
>>> |   {
>>> |     t = sqrt(c);
>>> |     for(l = 2; l <= t; l++)
>>> |       if (c % l == 0)
>>> |         break;
>>> |     if (l > t )
>>> |       n++;
>>> |   }
>>> |
>>> |   LOG_EVENT_STOP(msg, thread_id);
>>> |
>>> |   return 0;
>>> | }
>>>
>>> This is a very simple test, which is probably not a good representation
>>> of the CPU performances, even more when emulated by QEMU. In addition to
>>> that, given it mostly uses 64 bit integer, it's kind of expected that
>>> the 32-bit version is slower.
>>>
>>> Anyway I have extracted this code into a C file (see attached file) that
>>> can more easily compiled to 32 or 64 bit using -m32 or -m64. I observe
>>> the same behavior than sysbench, even with qemu-user (which is not
>>> surprising as the above code doesn't really put pressure the MMU.
>>>
>>> Running it in I get the following time:
>>> x86-64 host       0.877s
>>> sparc guest -m32  1m39s
>>> sparc guest -m64   3.5s
>>> opensparc T1 -m32 1m59s
>>> opensparc T1 -m64 1m12s
>>>
>>> So overall QEMU is faster than a not so old real hardware. That said
>>> looking at it quickly it seems that some of the FP instructions are
>>> actually trapped and emulated by the kernel on the opensparc T1.
>>>
>>> Now coming back to the QEMU problem, the issue is that the 64-bit code
>>> is using the udivx instruction to compute the modulo, while the 32-bit
>>> code calls the __umoddi3 GCC helper.
>>
>> Actually this looks like a bug/missing feature in gcc. Why doesn't it use udivx
>> instruction in "SPARC32PLUS, V8+ Required" code?
> 
> No idea.
> 
>>> It uses a lot of integer functions
>>> based on CPU flags, so most of the time is spent computing them in
>>> helper_compute_psr.
>>
>> I wonder if this can be optimized. I guess most RISC CPUs would have a
>> similar problem. Unlike x86, the compilers usually optimize
>> instructions on flag usage. If there is an instruction modifying flags
>> in a code, the flags will be used for sure, so it probably makes a
>> little sense to pospone the flag computation?
> 
> Indeed. ARM and SH4 use one TCG temp per flag, and they can be computed
> one by one using setcond. The optimizer and the liveness analysis then
> get rid of the unused computation. However while it allows intra-TB
> optimization, it prevent any other flags optimization. Therefore the
> only way to know if it is a good idea or not is to implement it and
> benchmark that, but using a bit more than a single biased benchmark like
> the one from sysbench.
> 
> Also note that the current implementation predates the introduction of
> setcond, which is necessary to be able to compute the flags using TCG
> code.

Aurelien - just to say thank you for looking into this. My focus for
SPARC64, as time allows, has been more on the emulation side, i.e. getting
to the point where it can start to run more OSs, which is gradually
happening over time. Once the basic emulation is complete, trying to
improve performance is definitely something I would like to work on,
although I will likely have many questions :)


ATB,

Mark.

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-07-30  8:31               ` Artyom Tarasenko
@ 2015-08-02 19:12                 ` Alex Bennée
  0 siblings, 0 replies; 80+ messages in thread
From: Alex Bennée @ 2015-08-02 19:12 UTC (permalink / raw)
  To: Artyom Tarasenko; +Cc: Paolo Bonzini, qemu-devel, Dennis Luehring, Karel Gardas


Artyom Tarasenko <atar4qemu@gmail.com> writes:

> On Thu, Jul 30, 2015 at 9:12 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>>
>> On 30/07/2015 05:47, Dennis Luehring wrote:
>>> so your aarch64 is just less todo for qemu - not EVERY >= 16bit memory
>>> access needs swapping or needs check for unaligned access to emulate
>>> bus-erros
>>
>> Not to mention register windows, which IIRC are the big source of pain
>> for SPARC.
>
> That's true. But then again there is an emulator called tme [1] which
> doesn't do binary translation,
> and still much faster then QEMU at least on a x86_64 host when
> emulating a sun4u machine.
>
> 1.  http://people.csail.mit.edu/fredette/tme/index.html

That does seem weird. FWIW with QuickTransit's SPARC emulation we
claimed "faster than native", i.e. we could get benchmarks to run
faster under translation on the latest Intel hardware than on the fastest
production SPARC chips at the time. We did have the advantage of full FPU
translation, which is an area where QEMU lags a little, but that doesn't
seem to be the problem here.

I notice udivx is a helper function, but I guess that's to make its
overflow and exception handling easier.
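
To make the udivx point concrete, here is a minimal sketch in plain C.
It is not QEMU code: udivx_sketch and umod_via_udivx are hypothetical
names, and the boolean return value merely stands in for the
division-by-zero trap the real instruction would raise. It shows why a
helper is convenient for the trap case, and how 64-bit code can get a
remainder from one divide, one multiply and one subtract instead of a
call to a library routine such as __umoddi3.

```c
#include <stdint.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for a udivx-style helper.
 * Returns false where the real instruction would trap on division by
 * zero; otherwise stores the 64-bit unsigned quotient in *q.
 */
static bool udivx_sketch(uint64_t n, uint64_t d, uint64_t *q)
{
    if (d == 0) {
        return false;           /* would be a division-by-zero trap */
    }
    *q = n / d;
    return true;
}

/*
 * Remainder built from the quotient: rem = n - (n / d) * d.
 * SPARC V9 has no remainder instruction, so 64-bit code typically
 * computes n % d this way.
 */
static bool umod_via_udivx(uint64_t n, uint64_t d, uint64_t *rem)
{
    uint64_t q;

    if (!udivx_sketch(n, d, &q)) {
        return false;
    }
    *rem = n - q * d;
    return true;
}
```

Keeping the zero check inside one function is what makes the helper
form simple; whether inlining it would pay off is a separate question
that only a benchmark can answer.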

>
> Artyom

-- 
Alex Bennée

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-07-30  8:55             ` Aurelien Jarno
  2015-07-30  9:35               ` Artyom Tarasenko
  2015-07-30 15:50               ` Aurelien Jarno
@ 2015-08-03  7:58               ` Dennis Luehring
  2015-08-03 14:51               ` Dennis Luehring
  3 siblings, 0 replies; 80+ messages in thread
From: Dennis Luehring @ 2015-08-03  7:58 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel, Artyom Tarasenko

On 30.07.2015 at 10:55, Aurelien Jarno wrote:
> Note that when you say SPARC64 here, it's actually only the kernel, you
> are using a 32-bit userland. And that makes a difference. Here are my
> tests here:

NetBSD 6.1.5 SPARC64 seems to perform much better (compared
to Debian 7.8.0 SPARC64) - even without using virtio at all

the kernel and userland in NetBSD are either pure SPARC or pure SPARC64

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-07-31 15:43                   ` Aurelien Jarno
  2015-08-02 13:11                     ` Mark Cave-Ayland
@ 2015-08-03  8:31                     ` Artyom Tarasenko
  2015-08-03  9:17                       ` Aurelien Jarno
  2015-08-17 11:32                     ` Dennis Luehring
  2 siblings, 1 reply; 80+ messages in thread
From: Artyom Tarasenko @ 2015-08-03  8:31 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel, Dennis Luehring

Hi Aurelien,

On Fri, Jul 31, 2015 at 5:43 PM, Aurelien Jarno <aurelien@aurel32.net> wrote:

>> > It uses a lot of integer functions
>> > based on CPU flags, so most of the time is spent computing them in
>> > helper_compute_psr.
>>
>> I wonder if this can be optimized. I guess most RISC CPUs would have a
>> similar problem. Unlike x86, the compilers usually optimize
>> instructions on flag usage. If there is an instruction modifying flags
>> in a code, the flags will be used for sure, so it probably makes a
>> little sense to pospone the flag computation?
>
> Indeed. ARM and SH4 use one TCG temp per flag, and they can be computed
> one by one using setcond. The optimizer and the liveness analysis then
> get rid of the unused computation. However while it allows intra-TB
> optimization, it prevent any other flags optimization. Therefore the
> only way to know if it is a good idea or not is to implement it and
> benchmark that, but using a bit more than a single biased benchmark like
> the one from sysbench.
>
> Also note that the current implementation predates the introduction of
> setcond, which is necessary to be able to compute the flags using TCG
> code.

Thanks for explaining it, the problem is much clearer now.
Moving to setcond is definitely worth a shot. I'd like to play with it.
What would be the minimal entity to change without reworking the complete TCG:
 a) one flag for one instruction,
 b) all flags for one instruction,
 c) one flag for all instructions,
or d) all flags for all instructions (gradually moving to setcond is
not possible) ?

Artyom

-- 
Regards,
Artyom Tarasenko

SPARC and PPC PReP under qemu blog: http://tyom.blogspot.com/search/label/qemu

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-08-03  8:31                     ` Artyom Tarasenko
@ 2015-08-03  9:17                       ` Aurelien Jarno
  2015-08-18  9:24                         ` Artyom Tarasenko
  0 siblings, 1 reply; 80+ messages in thread
From: Aurelien Jarno @ 2015-08-03  9:17 UTC (permalink / raw)
  To: Artyom Tarasenko; +Cc: qemu-devel, Dennis Luehring

On 2015-08-03 10:31, Artyom Tarasenko wrote:
> Hi Aurelien,
> 
> On Fri, Jul 31, 2015 at 5:43 PM, Aurelien Jarno <aurelien@aurel32.net> wrote:
> 
> >> > It uses a lot of integer functions
> >> > based on CPU flags, so most of the time is spent computing them in
> >> > helper_compute_psr.
> >>
> >> I wonder if this can be optimized. I guess most RISC CPUs would have a
> >> similar problem. Unlike x86, the compilers usually optimize
> >> instructions on flag usage. If there is an instruction modifying flags
> >> in a code, the flags will be used for sure, so it probably makes a
> >> little sense to pospone the flag computation?
> >
> > Indeed. ARM and SH4 use one TCG temp per flag, and they can be computed
> > one by one using setcond. The optimizer and the liveness analysis then
> > get rid of the unused computation. However while it allows intra-TB
> > optimization, it prevent any other flags optimization. Therefore the
> > only way to know if it is a good idea or not is to implement it and
> > benchmark that, but using a bit more than a single biased benchmark like
> > the one from sysbench.
> >
> > Also note that the current implementation predates the introduction of
> > setcond, which is necessary to be able to compute the flags using TCG
> > code.
> 
> Thanks for explaining it, the problem is much more clear now.
> Moving to setcond is definitely worth a shot. I'd like to play with it.
> What would be the minimal entity to change without reworking the complete TCG:
>  a) one flag for one instruction,
>  b) all flags for one instruction,
>  c) one flag for all instructions,
> or d) all flags for all instructions (gradually moving to setcond is
> not possible) ?

You should start with option c). You can look at how I did this for SH4,
starting with commit 5ed9a259c164bb9fd2a6fe8a363a4bda2e4a5461.
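
For readers unfamiliar with the two approaches discussed above, here is
a minimal sketch in plain C. It is not QEMU or TCG code, and every name
in it (Flags, LazyAdd, lazy_addcc, lazy_compute_flags, eager_addcc) is
hypothetical. It contrasts lazy evaluation, where a flag-setting add
only records its operands and one routine later derives all four SPARC
integer condition codes, with eager per-flag computation, where each
flag is a single comparison in the spirit of setcond.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical model of the SPARC icc flags: negative, zero,
 * overflow, carry. */
typedef struct {
    bool n, z, v, c;
} Flags;

/* Lazy scheme: the add only remembers its operands and result. */
typedef struct {
    uint32_t src1, src2, dst;
} LazyAdd;

static LazyAdd lazy_addcc(uint32_t a, uint32_t b)
{
    LazyAdd s = { a, b, a + b };
    return s;
}

/* Called only when some later instruction reads a flag; it always
 * computes all four flags, even if only one is needed. */
static Flags lazy_compute_flags(const LazyAdd *s)
{
    Flags f;

    f.n = (s->dst >> 31) & 1;
    f.z = (s->dst == 0);
    /* signed overflow: operands share a sign, result differs */
    f.v = ((~(s->src1 ^ s->src2) & (s->src1 ^ s->dst)) >> 31) & 1;
    f.c = (s->dst < s->src1);   /* unsigned wrap-around */
    return f;
}

/* Eager scheme: one independent expression per flag. Because the
 * flags are separate values, an optimizer can drop the ones that are
 * never read before being overwritten. */
static uint32_t eager_addcc(uint32_t a, uint32_t b, Flags *f)
{
    uint32_t r = a + b;

    f->n = (r >> 31) & 1;
    f->z = (r == 0);
    f->v = ((~(a ^ b) & (a ^ r)) >> 31) & 1;
    f->c = (r < a);
    return r;
}
```

Both schemes produce the same flags; the difference is only in when the
work is done and how much of it can be discarded, which is why the
thread recommends benchmarking rather than guessing.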

Aurelien

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-07-30  8:55             ` Aurelien Jarno
                                 ` (2 preceding siblings ...)
  2015-08-03  7:58               ` Dennis Luehring
@ 2015-08-03 14:51               ` Dennis Luehring
  2015-08-03 15:59                 ` Karel Gardas
  3 siblings, 1 reply; 80+ messages in thread
From: Dennis Luehring @ 2015-08-03 14:51 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel, Artyom Tarasenko

On 30.07.2015 at 10:55, Aurelien Jarno wrote:
> Note that when you say SPARC64 here, it's actually only the kernel, you
> are using a 32-bit userland.

ok - NetBSD 6.1.5 SPARC64 is blazing fast compared to Debian 7.8.0
SPARC64 - i installed the complete system (without X) in a few minutes,
while Debian needs >1h

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-08-03 14:51               ` Dennis Luehring
@ 2015-08-03 15:59                 ` Karel Gardas
  2015-08-03 19:51                   ` Dennis Luehring
  0 siblings, 1 reply; 80+ messages in thread
From: Karel Gardas @ 2015-08-03 15:59 UTC (permalink / raw)
  To: Dennis Luehring; +Cc: qemu-devel, Aurelien Jarno, Artyom Tarasenko

On Mon, Aug 3, 2015 at 4:51 PM, Dennis Luehring <dl.soluz@gmx.net> wrote:
> ok - NetBSD 6.5.1 SPARC64 is blasting fast compare to Debian 7.8.0 SPARC64 -
> i installed the complete system (without X) in a few minutes
> Debian needs >1h

I had the same experience with OpenBSD for sparc64, which is also fast to
install. The problem is that this is due to the simplicity and the small
amount of data/bins to install, and not due to the fact that qemu got
miraculously faster with *bsd sparc64 bins, I'm afraid. IIRC my compiled
nbench2 benchmark shows roughly the same performance on sparc64/openbsd
and on sparc64/sparc32-userland debian.

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-08-03 15:59                 ` Karel Gardas
@ 2015-08-03 19:51                   ` Dennis Luehring
  2015-08-06  9:00                     ` Karel Gardas
  0 siblings, 1 reply; 80+ messages in thread
From: Dennis Luehring @ 2015-08-03 19:51 UTC (permalink / raw)
  To: Karel Gardas; +Cc: qemu-devel, Aurelien Jarno, Artyom Tarasenko

On 03.08.2015 at 17:59, Karel Gardas wrote:
> On Mon, Aug 3, 2015 at 4:51 PM, Dennis Luehring <dl.soluz@gmx.net> wrote:
> > ok - NetBSD 6.5.1 SPARC64 is blasting fast compare to Debian 7.8.0 SPARC64 -
> > i installed the complete system (without X) in a few minutes
> > Debian needs >1h
>
> I had the same experience with OpenBSD for sparc64, also fast to
> install. The problem is that this is due to simplicity and amount of
> data/bins to install and not due to fact qemu got miraculously faster
> with *bsd sparc64 bins I'm afraid. IIRC mu compile nbench2 benchmark
> reveals +- the same performance on sparc64/openbsd and on
> sparc64/sparc32-userland debian.

i need to check the installation size - but NetBSD was "full install
(without x)" and debian was just "basic system utilities"

NetBSD comes with gcc in this installation type, debian does not

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-08-03 19:51                   ` Dennis Luehring
@ 2015-08-06  9:00                     ` Karel Gardas
  2015-08-06  9:21                       ` Dennis Luehring
  2015-08-18  4:25                       ` Dennis Luehring
  0 siblings, 2 replies; 80+ messages in thread
From: Karel Gardas @ 2015-08-06  9:00 UTC (permalink / raw)
  To: Dennis Luehring; +Cc: qemu-devel, Aurelien Jarno, Artyom Tarasenko

Dennis, if NetBSD is fast in qemu and if it provides a sparc64 userland,
then perhaps its GCC is also a sparc64 binary. If so, it would be
good if you could redo your original benchmark of compiling pugixml.cpp
and post the numbers here for comparison. I would certainly appreciate it,
since I won't get to this testing again in the foreseeable future.

Thanks! Karel

On Mon, Aug 3, 2015 at 9:51 PM, Dennis Luehring <dl.soluz@gmx.net> wrote:
> Am 03.08.2015 um 17:59 schrieb Karel Gardas:
>>
>> On Mon, Aug 3, 2015 at 4:51 PM, Dennis Luehring <dl.soluz@gmx.net> wrote:
>> > ok - NetBSD 6.5.1 SPARC64 is blasting fast compare to Debian 7.8.0
>> > SPARC64 -
>> > i installed the complete system (without X) in a few minutes
>> > Debian needs >1h
>>
>> I had the same experience with OpenBSD for sparc64, also fast to
>> install. The problem is that this is due to simplicity and amount of
>> data/bins to install and not due to fact qemu got miraculously faster
>> with *bsd sparc64 bins I'm afraid. IIRC mu compile nbench2 benchmark
>> reveals +- the same performance on sparc64/openbsd and on
>> sparc64/sparc32-userland debian.
>
>
> need to check the installation size - but NetBSD was "full install (without
> x)" and
> debian was just "basic system utilities"
>
> NetBSD comes with gcc with this installtion type, debian not
>

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-08-06  9:00                     ` Karel Gardas
@ 2015-08-06  9:21                       ` Dennis Luehring
  2015-08-06  9:27                         ` Dennis Luehring
  2015-08-18  4:25                       ` Dennis Luehring
  1 sibling, 1 reply; 80+ messages in thread
From: Dennis Luehring @ 2015-08-06  9:21 UTC (permalink / raw)
  To: Karel Gardas; +Cc: qemu-devel, Aurelien Jarno, Artyom Tarasenko

i don't know anything about NetBSD - the network isn't working (or dhcp
is inactive - i don't know) and i can't install wget etc. - so it will
take some time

On 06.08.2015 at 11:00, Karel Gardas wrote:
> Denis, if NetBSD is fast in qemu and if it provides sparc64 user-land,
> perhaps also its GCC is sparc64 binary and if so, then it would be
> good if you do your original benchmark of compiling pugixml.cpp and
> write the numbers here for comparison? I would certainly appreciate it
> since I'll not get to this testing in foreseeable future again.
>
> Thanks! Karel
>
> On Mon, Aug 3, 2015 at 9:51 PM, Dennis Luehring <dl.soluz@gmx.net> wrote:
> > Am 03.08.2015 um 17:59 schrieb Karel Gardas:
> >>
> >> On Mon, Aug 3, 2015 at 4:51 PM, Dennis Luehring <dl.soluz@gmx.net> wrote:
> >> > ok - NetBSD 6.5.1 SPARC64 is blasting fast compare to Debian 7.8.0
> >> > SPARC64 -
> >> > i installed the complete system (without X) in a few minutes
> >> > Debian needs >1h
> >>
> >> I had the same experience with OpenBSD for sparc64, also fast to
> >> install. The problem is that this is due to simplicity and amount of
> >> data/bins to install and not due to fact qemu got miraculously faster
> >> with *bsd sparc64 bins I'm afraid. IIRC mu compile nbench2 benchmark
> >> reveals +- the same performance on sparc64/openbsd and on
> >> sparc64/sparc32-userland debian.
> >
> >
> > need to check the installation size - but NetBSD was "full install (without
> > x)" and
> > debian was just "basic system utilities"
> >
> > NetBSD comes with gcc with this installtion type, debian not
> >

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-08-06  9:21                       ` Dennis Luehring
@ 2015-08-06  9:27                         ` Dennis Luehring
  2015-08-06 12:50                           ` Karel Gardas
  0 siblings, 1 reply; 80+ messages in thread
From: Dennis Luehring @ 2015-08-06  9:27 UTC (permalink / raw)
  To: Karel Gardas; +Cc: qemu-devel, Aurelien Jarno, Artyom Tarasenko

On 06.08.2015 at 11:21, Dennis Luehring wrote:
> if NetBSD is fast in qemu and if it provides sparc64 user-land,
> >perhaps also its GCC is sparc64 binary and if so

according to the docs it's a pure SPARC64 kernel and userland (no exceptions)

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-08-06  9:27                         ` Dennis Luehring
@ 2015-08-06 12:50                           ` Karel Gardas
  2015-08-06 16:35                             ` Dennis Luehring
  0 siblings, 1 reply; 80+ messages in thread
From: Karel Gardas @ 2015-08-06 12:50 UTC (permalink / raw)
  To: Dennis Luehring; +Cc: qemu-devel, Aurelien Jarno, Artyom Tarasenko

I use -net nic,model=i82551 -net user for OpenBSD, perhaps this will
also work for you? This is with QEMU 2.2.0, and the whole command line
is:
/opt/qemu-2.2.0/bin/qemu-system-sparc64 -hda openbsd_sparc64.img -m 1024 -nographic -net nic,model=i82551 -net user

On Thu, Aug 6, 2015 at 11:27 AM, Dennis Luehring <dl.soluz@gmx.net> wrote:
> Am 06.08.2015 um 11:21 schrieb Dennis Luehring:
>>
>> if NetBSD is fast in qemu and if it provides sparc64 user-land,
>> >perhaps also its GCC is sparc64 binary and if so
>
>
> according to the docs its pure SPARC64 kernel and userland (no exceptions)

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-08-06 12:50                           ` Karel Gardas
@ 2015-08-06 16:35                             ` Dennis Luehring
  0 siblings, 0 replies; 80+ messages in thread
From: Dennis Luehring @ 2015-08-06 16:35 UTC (permalink / raw)
  To: Karel Gardas; +Cc: qemu-devel, Aurelien Jarno, Artyom Tarasenko

using (later) git qemu 2.3.93

~/qemu/sparc64-softmmu/qemu-system-sparc64 -m 1024 -nographic -monitor 
telnet::4440,server,nowait -serial telnet::3000,server -hda 
./netbsd-615-sparc64.raw -cdrom ./NetBSD-6.1.5-sparc64.iso -boot d -net 
nic,model=i82551 -net user

gives me endless lines of the following (during installation or normal system start)
   data error type 32 sfsr=0 sfva=0 afsr=0 afva=0 tf=0x1c04f20
   data_access_error: no fault
   data error type 32 sfsr=0 sfva=0 afsr=0 afva=0 tf=0x1c04f20
   data_access_error: no fault
   data error type 32 sfsr=0 sfva=0 afsr=0 afva=0 tf=0x1c04f20
   data_access_error: no fault
   ....

installation worked before with "-net nic -net user" without ,model=i82551

On 06.08.2015 at 14:50, Karel Gardas wrote:
> I use -net nic,model=i82551 -net user for OpenBSD, perhaps this will
> also work for you? This is for Qemu 2.2.0 and whole command line
> looks: /opt/qemu-2.2.0/bin/qemu-system-sparc64 -hda
> openbsd_sparc64.img  -m 1024 -nographic -net nic,model=i82551 -net
> user
>
> On Thu, Aug 6, 2015 at 11:27 AM, Dennis Luehring <dl.soluz@gmx.net> wrote:
> > Am 06.08.2015 um 11:21 schrieb Dennis Luehring:
> >>
> >> if NetBSD is fast in qemu and if it provides sparc64 user-land,
> >> >perhaps also its GCC is sparc64 binary and if so
> >
> >
> > according to the docs its pure SPARC64 kernel and userland (no exceptions)

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-07-31 15:43                   ` Aurelien Jarno
  2015-08-02 13:11                     ` Mark Cave-Ayland
  2015-08-03  8:31                     ` Artyom Tarasenko
@ 2015-08-17 11:32                     ` Dennis Luehring
  2 siblings, 0 replies; 80+ messages in thread
From: Dennis Luehring @ 2015-08-17 11:32 UTC (permalink / raw)
  To: Artyom Tarasenko, qemu-devel

On 31.07.2015 at 17:43, Aurelien Jarno wrote:
> > >Anyway I have extracted this code into a C file (see attached file) that
> > >can more easily compiled to 32 or 64 bit using -m32 or -m64. I observe
> > >the same behavior than sysbench, even with qemu-user (which is not
> > >surprising as the above code doesn't really put pressure the MMU.
> > >
> > >Running it in I get the following time:
> > >x86-64 host       0.877s
> > >sparc guest -m32  1m39s
> > >sparc guest -m64   3.5s
> > >opensparc T1 -m32 1m59s
> > >opensparc T1 -m64 1m12s

i've redone the benchmarks with Debian and NetBSD SPARC64

host: Ubuntu 15.04 x64 (latest updates) i7, 8 Cores, 8 GB RAM
   uname -a
   Linux dl-Precision-M6500 3.19.0-25-generic #26-Ubuntu SMP Fri Jul 24 
21:17:31 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

   file /usr/bin/gcc
   /usr/bin/gcc: symbolic link to `gcc-4.9'
   file /usr/bin/gcc-4.9
   /usr/bin/gcc-4.9: ELF 64-bit LSB executable, x86-64, version 1 
(SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, 
BuildID[sha1]=f9897a3711d41df1d427f81bf3a60a60c377cd12, stripped

----------------

qemu: qemu 2.3.93 built from source

   file ~/qemu/sparc64-softmmu/qemu-system-sparc64
   /home/dl/qemu/sparc64-softmmu/qemu-system-sparc64: ELF 64-bit LSB 
shared object, x86-64, version 1 (SYSV), dynamically linked (uses shared 
libs), for GNU/Linux 2.6.32, 
BuildID[sha1]=8cae7ad397bb9beb12d1ad670c3170a8dceef139, not stripped

----------------

guest-debian: Debian 7.8.0 SPARC64 (mixed 32/64 bit kernel/userland)

uname -a
Linux debian 3.2.0-4-sparc64 #1 Debian 3.2.68-1+deb7u2 sparc64 GNU/Linux

32bit GCC

file /usr/bin/gcc
/usr/bin/gcc: symbolic link to `gcc-4.6'
file /usr/bin/gcc-4.6
/usr/bin/gcc-4.6: ELF 32-bit MSB executable, SPARC32PLUS, V8+ Required, 
version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 
2.6.26, BuildID[sha1]=0x64ad1bef0a0bfdb8780363e811c39b7c97d567ac, stripped

----------------

guest-netbsd: NetBSD 6.1.5 SPARC64
(according to the documentation + mailing list questions it's a pure 64bit
kernel and userland)

uname -a
NetBSD myhost.mydom 6.1.5 NetBSD 6.1.5 (GENERIC) sparc64

64bit GCC

file /usr/bin/gcc
/usr/bin/gcc: ELF 64-bit MSB executable, SPARC V9, relaxed memory 
ordering, (SYSV), dynamically linked (uses shared libs), for NetBSD 
6.1.5, not stripped

----------------

benchmarks:

compilation of pugixml 1.6 pugixml.cpp:
g++ src/pugixml.cpp -g -Wall -Wextra -Werror -pedantic -std=c++0x -c 
-MMD -MP

host: ~3 sec
guest-debian: ~3:52.6 (32bit gcc)
guest-netbsd: ~3:27.6 (64bit gcc)

runtime of Aurelien Jarno's prime.c
gcc prime.c -o prime.out -lm

host: ~2 sec
guest-debian(-m32): ~3:37.5
guest-debian(-m64): ~11 sec
guest-netbsd(only -m64): ~11 sec

Aurelien Jarno explained the "11 sec" boost when running prime.c built
with -m64, but the NetBSD 64bit gcc still needs 3:27.6 to compile
pugixml.cpp - it's just one file, with 1GB of RAM and no swapping
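
The prime.c attachment itself is not reproduced in this thread, so the
following is only a sketch of what such a standalone test could look
like, modelled on the sysbench loop quoted earlier. The function name
count_primes is hypothetical, and one deliberate change is made: the
bound t = sqrt(c) is replaced by the integer test l * l <= c, so the
sketch needs no -lm and no floating point. The 64-bit "c % l" is the
operation at issue in the -m32 versus -m64 comparison.

```c
#include <stdint.h>

/*
 * Count the primes in [3, max_prime) by trial division.
 * All arithmetic is 64-bit unsigned, as in the sysbench loop, so a
 * 32-bit build has to emulate the modulo with a library routine.
 */
static uint64_t count_primes(uint64_t max_prime)
{
    uint64_t n = 0;

    for (uint64_t c = 3; c < max_prime; c++) {
        uint64_t l;

        /* try every divisor up to the square root of c */
        for (l = 2; l * l <= c; l++) {
            if (c % l == 0) {
                break;          /* found a divisor: c is composite */
            }
        }
        if (l * l > c) {
            n++;                /* loop ran to completion: c is prime */
        }
    }
    return n;
}
```

Calling count_primes(2000) in a loop from a small main() and timing the
binary built once with -m32 and once with -m64 would be one way to
repeat the comparison above; the absolute numbers will of course differ
from the ones reported here.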

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-07-30  7:55             ` Aurelien Jarno
@ 2015-08-17 14:19               ` Artyom Tarasenko
  2015-08-17 15:40                 ` Richard Henderson
  0 siblings, 1 reply; 80+ messages in thread
From: Artyom Tarasenko @ 2015-08-17 14:19 UTC (permalink / raw)
  To: Aurelien Jarno
  Cc: Richard Henderson, alex.bennee, qemu-devel, Dennis Luehring,
	Karel Gardas

On Thu, Jul 30, 2015 at 9:55 AM, Aurelien Jarno <aurelien@aurel32.net> wrote:
> On 2015-07-30 05:47, Dennis Luehring wrote:
>> so your aarch64 is just less todo for qemu - not EVERY >= 16bit memory
>> access needs swapping or needs check for unaligned access to emulate
>> bus-erros
>
> On recent Intel CPU, the byteswapping comes for free (MOVBE
> instruction).
>
> About the unaligned access it's actually the reverse. The fact that
> aarch64 does unaligned access means they have to go through the slow
> path (I have posted a patch to improve that). On sparc given that all
> access are aligned means there are more chances to go through the fast
> path.

Well, on the other hand, every access goes via helper_check_align.
There is a comment /* XXX remove alignment check */.
I wonder how this can be done in a more efficient way?
Inlining the check and using a brcond instruction would require that
all TCG temps would have to be changed to locals, which in turn would
produce a performance impact. Are there other ways of doing it?

Artyom

-- 
Regards,
Artyom Tarasenko

SPARC and PPC PReP under qemu blog: http://tyom.blogspot.com/search/label/qemu

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-08-17 14:19               ` Artyom Tarasenko
@ 2015-08-17 15:40                 ` Richard Henderson
  2015-08-17 16:25                   ` Artyom Tarasenko
  0 siblings, 1 reply; 80+ messages in thread
From: Richard Henderson @ 2015-08-17 15:40 UTC (permalink / raw)
  To: Artyom Tarasenko, Aurelien Jarno
  Cc: alex.bennee, qemu-devel, Dennis Luehring, Karel Gardas

On 08/17/2015 07:19 AM, Artyom Tarasenko wrote:
> Well, on the other hand, every access goes via helper_check_align.
> There is a comment /* XXX remove alignment check */.
> I wonder how this can be done in a  more efficient way?

Not every access does so.  There are only 3 memory-related calls to
check_align.  The other three are for indirect branches.

For the 8 byte memory operations we can just remove the checks.  There, the
softmmu operation checks the alignment.  For usermode, we've typically ignored
the guest alignment (which also causes failures for a host that requires
alignment emulating a guest that does not).
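
As a reminder of what such a check amounts to, here is a minimal sketch
in plain C. It is not the body of helper_check_align; is_aligned is a
hypothetical name, and the sketch assumes access sizes that are powers
of two.

```c
#include <stdint.h>
#include <stdbool.h>

/*
 * True when addr is naturally aligned for an access of `size` bytes.
 * Assumes size is a power of two (1, 2, 4 or 8), so size - 1 is a
 * mask of the low address bits that must all be zero.
 */
static bool is_aligned(uint64_t addr, unsigned size)
{
    return (addr & (uint64_t)(size - 1)) == 0;
}
```

The test itself is a single AND and compare; the cost discussed in this
thread comes from where it runs (a helper call on each access) rather
than from the arithmetic.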


r~

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-08-17 15:40                 ` Richard Henderson
@ 2015-08-17 16:25                   ` Artyom Tarasenko
  2015-08-17 21:08                     ` Aurelien Jarno
  0 siblings, 1 reply; 80+ messages in thread
From: Artyom Tarasenko @ 2015-08-17 16:25 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Karel Gardas, alex.bennee, Dennis Luehring, Aurelien Jarno, qemu-devel

On Mon, Aug 17, 2015 at 5:40 PM, Richard Henderson <rth@twiddle.net> wrote:
> On 08/17/2015 07:19 AM, Artyom Tarasenko wrote:
>> Well, on the other hand, every access goes via helper_check_align.
>> There is a comment /* XXX remove alignment check */.
>> I wonder how this can be done in a  more efficient way?
>
> Not ever access does so.  There are only 3 memory related calls to check_align.
>  The other three are for indirect branches.

Yes, but I think it's the 3 most used ones.

> For the 8 byte memory operations we can just remove the checks.  There, the
> softmmu operation checks the alignment.

This is good news. Where does it happen?

> For usermode, we've typically ignored
> the guest alignment (which also causes failures for a host that requires
> alignment emulating a guest that does not).

Good to know. But I think it's a good compromise between
performance and accuracy.

Artyom

-- 
Regards,
Artyom Tarasenko

SPARC and PPC PReP under qemu blog: http://tyom.blogspot.com/search/label/qemu

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-08-17 16:25                   ` Artyom Tarasenko
@ 2015-08-17 21:08                     ` Aurelien Jarno
  0 siblings, 0 replies; 80+ messages in thread
From: Aurelien Jarno @ 2015-08-17 21:08 UTC (permalink / raw)
  To: Artyom Tarasenko
  Cc: Karel Gardas, alex.bennee, qemu-devel, Dennis Luehring,
	Richard Henderson

On 2015-08-17 18:25, Artyom Tarasenko wrote:
> On Mon, Aug 17, 2015 at 5:40 PM, Richard Henderson <rth@twiddle.net> wrote:
> > On 08/17/2015 07:19 AM, Artyom Tarasenko wrote:
> >> Well, on the other hand, every access goes via helper_check_align.
> >> There is a comment /* XXX remove alignment check */.
> >> I wonder how this can be done in a  more efficient way?
> >
> > Not every access does so.  There are only 3 memory-related calls to check_align.
> > The other three are for indirect branches.
> 
> Yes, but I think it's the 3 most used ones.
> 
> > For the 8 byte memory operations we can just remove the checks.  There, the
> > softmmu operation checks the alignment.
> 
> This is good news. Where does it happen?
> 
> > For usermode, we've typically ignored
> > the guest alignment (which also causes failures for a host that requires
> > alignment emulating a guest that does not).

A tiny bit off topic, but couldn't we force the use of unaligned access 
load/store instructions in user mode instead? For example in QEMU we
can use the LWL/LWR pair instead of LW. I doubt it will make any
measurable difference in speed. For the MIPS case, that doesn't work
for 16-bit loads/stores though.

The best would indeed be to switch to softmmu for the user mode. I know
there are people working on that, but given that it might take time,
forcing unaligned access instructions could be a simple temporary solution.
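
To illustrate the general idea of an unaligned-safe load (assembling a word
from smaller pieces so that no single access is misaligned), here is a
minimal Python sketch. It does not model the exact LWL/LWR semantics, and
the function name is invented for the example:

```python
def load_be32_unaligned(mem: bytes, addr: int) -> int:
    """Read a big-endian 32-bit value at any address using byte accesses only."""
    value = 0
    for i in range(4):
        # Shift in one byte at a time, most significant byte first.
        value = (value << 8) | mem[addr + i]
    return value


mem = bytes([0x00, 0x11, 0x22, 0x33, 0x44, 0x55])
# Address 1 is not 4-byte aligned, but the byte-wise load still works.
print(hex(load_be32_unaligned(mem, 1)))
```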

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-08-06  9:00                     ` Karel Gardas
  2015-08-06  9:21                       ` Dennis Luehring
@ 2015-08-18  4:25                       ` Dennis Luehring
  2015-08-18  8:19                         ` Aurelien Jarno
       [not found]                         ` <CAMO55fkcW1eOaZSz2MJgqZEP29pTuHvTLe0Kna5eHYfg7cFyPA@mail.gmail.com>
  1 sibling, 2 replies; 80+ messages in thread
From: Dennis Luehring @ 2015-08-18  4:25 UTC (permalink / raw)
  To: Karel Gardas; +Cc: qemu-devel, Aurelien Jarno, Artyom Tarasenko

Am 06.08.2015 um 11:00 schrieb Karel Gardas:
> Denis, if NetBSD is fast in qemu and if it provides sparc64 user-land,
> perhaps also its GCC is sparc64 binary and if so, then it would be
> good if you do your original benchmark of compiling pugixml.cpp and
> write the numbers here for comparison? I would certainly appreciate it
> since I'll not get to this testing in foreseeable future again.

i've re-redone the benchmarks with Debian and NetBSD SPARC64

benchmarks:

compilation pugixml 1.6 pugixml.cpp:
g++ src/pugixml.cpp -g -Wall -Wextra -Werror -pedantic -std=c++0x -c
-MMD -MP

host: ~3 sec
guest-debian: ~3:52.6 (32bit gcc, virtio)
guest-debian: ~3:01.7 (32bit gcc, virtio, using the qcow2 image from a 
ramfs ramdisk)
guest-netbsd: ~3:27.6 (64bit gcc, non-virtio)
guest-netbsd: ~2:51.6 (64bit gcc, non-virtio, using the qcow2 image from 
a ramfs ramdisk)

runtime of Aurelien Jarno's prime.c
gcc prime.c -o prime.out -lm

host: ~2 sec
guest-debian(-m32): ~3:37.5
guest-debian(-m64): ~11 sec
guest-netbsd(only -m64): ~11 sec

Aurelien Jarno explained the "11 sec" boost from running prime.c with -m64,
but the NetBSD 64bit gcc still needs 3:27.6 to compile pugixml.cpp - it's
just one file, 1GB of RAM, no swapping

using a ramdisk gives a 50 sec speedup even under debian (with virtio), 
netbsd (without virtio) gains just 30 sec
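
For comparison, the compile timings above can be turned into slowdown
factors relative to the host. This is only a small arithmetic sketch: the
host figure is the rounded "~3 sec" from above, so the factors are rough,
and the helper name is made up:

```python
def to_seconds(t: str) -> float:
    """Convert 'M:SS.s' (or plain seconds) into seconds."""
    if ":" in t:
        minutes, seconds = t.split(":")
        return int(minutes) * 60 + float(seconds)
    return float(t)


HOST_COMPILE_SEC = 3.0  # "~3 sec" on the host, so only approximate

guest_compile = {
    "debian (32bit gcc, virtio)": "3:52.6",
    "debian (32bit gcc, virtio, ramdisk)": "3:01.7",
    "netbsd (64bit gcc, non-virtio)": "3:27.6",
    "netbsd (64bit gcc, non-virtio, ramdisk)": "2:51.6",
}

slowdown = {
    name: to_seconds(t) / HOST_COMPILE_SEC for name, t in guest_compile.items()
}

for name, factor in slowdown.items():
    print(f"{name}: roughly {factor:.0f}x slower than host")
```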

-----------------

host: Ubuntu 15.04 x64 (latest updates) i7, 8 Cores, 8 GB RAM
    uname -a
    Linux dl-Precision-M6500 3.19.0-25-generic #26-Ubuntu SMP Fri Jul 24 
21:17:31 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

    file /usr/bin/gcc
    /usr/bin/gcc: symbolic link to `gcc-4.9'
    file /usr/bin/gcc-4.9
    /usr/bin/gcc-4.9: ELF 64-bit LSB executable, x86-64, version 1 
(SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, 
BuildID[sha1]=f9897a3711d41df1d427f81bf3a60a60c377cd12, stripped

----------------

qemu: qemu 2.4.50 built from source (the previously posted 2.3.93 was the 
wrong version)

    file ~/qemu/sparc64-softmmu/qemu-system-sparc64
    /home/dl/qemu/sparc64-softmmu/qemu-system-sparc64: ELF 64-bit LSB 
shared object, x86-64, version 1 (SYSV), dynamically linked (uses shared 
libs), for GNU/Linux 2.6.32, 
BuildID[sha1]=8cae7ad397bb9beb12d1ad670c3170a8dceef139, not stripped

----------------

guest-debian: Debian 7.8.0 SPARC64 (mixed 32/64 bit kernel/userland)

uname -a
Linux debian 3.2.0-4-sparc64 #1 Debian 3.2.68-1+deb7u2 sparc64 GNU/Linux

32bit GCC

file /usr/bin/gcc
/usr/bin/gcc: symbolic link to `gcc-4.6'
file /usr/bin/gcc-4.6
/usr/bin/gcc-4.6: ELF 32-bit MSB executable, SPARC32PLUS, V8+ Required, 
version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 
2.6.26, BuildID[sha1]=0x64ad1bef0a0bfdb8780363e811c39b7c97d567ac, stripped

----------------

guest-netbsd: NetBSD 6.1.5 SPARC64
(according to the documentation + mailing list questions it's a pure 64bit
kernel and userland)

uname -a
NetBSD myhost.mydom 6.1.5 NetBSD 6.1.5 (GENERIC) sparc64

64bit GCC

file /usr/bin/gcc
/usr/bin/gcc: ELF 64-bit MSB executable, SPARC V9, relaxed memory 
ordering, (SYSV), dynamically linked (uses shared libs), for NetBSD 
6.1.5, not stripped

----------------

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-08-18  4:25                       ` Dennis Luehring
@ 2015-08-18  8:19                         ` Aurelien Jarno
  2015-08-18 10:39                           ` Dennis Luehring
  2015-08-18 11:21                           ` Dennis Luehring
       [not found]                         ` <CAMO55fkcW1eOaZSz2MJgqZEP29pTuHvTLe0Kna5eHYfg7cFyPA@mail.gmail.com>
  1 sibling, 2 replies; 80+ messages in thread
From: Aurelien Jarno @ 2015-08-18  8:19 UTC (permalink / raw)
  To: Dennis Luehring; +Cc: qemu-devel, Artyom Tarasenko, Karel Gardas

On 2015-08-18 06:25, Dennis Luehring wrote:
> Am 06.08.2015 um 11:00 schrieb Karel Gardas:
> >Denis, if NetBSD is fast in qemu and if it provides sparc64 user-land,
> >perhaps also its GCC is sparc64 binary and if so, then it would be
> >good if you do your original benchmark of compiling pugixml.cpp and
> >write the numbers here for comparison? I would certainly appreciate it
> >since I'll not get to this testing in foreseeable future again.
> 
> i've re-redone the benchmarks with Debian and NetBSD SPARC64
> 
> benchmarks:
> 
> compilation pugixml 1.6 pugixml.cpp:
> g++ src/pugixml.cpp -g -Wall -Wextra -Werror -pedantic -std=c++0x -c
> -MMD -MP
> 
> host: ~3 sec
> guest-debian: ~3:52.6 (32bit gcc, virtio)
> guest-debian: ~3:01.7 (32bit gcc, virtio, using the qcow2 image from a
> ramfs ramdisk)
> guest-netbsd: ~3:27.6 (64bit gcc, non-virtio)
> guest-netbsd: ~2:51.6 (64bit gcc, non-virtio, using the qcow2 image from a
> ramfs ramdisk)
> 
> runtime of Aurelien Jarno's prime.c
> gcc prime.c -o prime.out -lm
> 
> host: ~2 sec
> guest-debian(-m32): ~3:37.5
> guest-debian(-m64): ~11 sec
> guest-netbsd(only -m64): ~11 sec
> 
> Aurelien Jarno explained the "11 sec" boost from running prime.c with -m64,
> but the NetBSD 64bit gcc still needs 3:27.6 to compile pugixml.cpp - it's
> just one file, 1GB of RAM, no swapping
> 
> using a ramdisk gives a 50 sec speedup even under debian (with virtio), netbsd
> (without virtio) gains just 30 sec

How big are the source file and the output file? I find it strange that I/O
has so much impact on a compilation which should be CPU bound. Maybe try
adding the -pipe argument to g++.

Aurelien

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-08-03  9:17                       ` Aurelien Jarno
@ 2015-08-18  9:24                         ` Artyom Tarasenko
  2015-08-18 17:55                           ` Richard Henderson
  0 siblings, 1 reply; 80+ messages in thread
From: Artyom Tarasenko @ 2015-08-18  9:24 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel, Dennis Luehring, Richard Henderson

On Mon, Aug 3, 2015 at 11:17 AM, Aurelien Jarno <aurelien@aurel32.net> wrote:
> On 2015-08-03 10:31, Artyom Tarasenko wrote:
>> Hi Aurelien,
>>
>> On Fri, Jul 31, 2015 at 5:43 PM, Aurelien Jarno <aurelien@aurel32.net> wrote:
>>
>> >> > It uses a lot of integer functions
>> >> > based on CPU flags, so most of the time is spent computing them in
>> >> > helper_compute_psr.
>> >>
>> >> I wonder if this can be optimized. I guess most RISC CPUs would have a
>> >> similar problem. Unlike x86, the compilers usually optimize
>> >> instructions on flag usage. If there is an instruction modifying flags
>> >> in the code, the flags will be used for sure, so it probably makes
>> >> little sense to postpone the flag computation?
>> >
>> > Indeed. ARM and SH4 use one TCG temp per flag, and they can be computed
>> > one by one using setcond. The optimizer and the liveness analysis then
>> > get rid of the unused computation. However, while it allows intra-TB
>> > optimization, it prevents any other flag optimization. Therefore the
>> > only way to know if it is a good idea or not is to implement it and
>> > benchmark that, but using a bit more than a single biased benchmark like
>> > the one from sysbench.
>> >
>> > Also note that the current implementation predates the introduction of
>> > setcond, which is necessary to be able to compute the flags using TCG
>> > code.
>>
>> Thanks for explaining it, the problem is much more clear now.
>> Moving to setcond is definitely worth a shot. I'd like to play with it.
>> What would be the minimal entity to change without reworking the complete TCG:
>>  a) one flag for one instruction,
>>  b) all flags for one instruction,
>>  c) one flag for all instructions,
>> or d) all flags for all instructions (gradually moving to setcond is
>> not possible) ?
>
> You should start with the c) option. You can look at how I did this for SH4,
> starting with commit 5ed9a259c164bb9fd2a6fe8a363a4bda2e4a5461.

FWIW I tried this for Z and N flags, but the resulting code was slower
than the current implementation.

Actually the current implementation is already very well optimized
intra-TB: in the case where a conditional branch/move follows a
compare operation, no external helpers are called.
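
The trade-off discussed here (computing every flag eagerly versus deriving a
flag only when something consumes it) can be sketched in plain Python. This
is an illustration under my own assumptions: the class and method names are
invented and it is not how target-sparc is actually written:

```python
MASK64 = (1 << 64) - 1


class LazyFlags:
    """Remember the operands/result of the last flag-setting op; derive flags on demand."""

    def __init__(self) -> None:
        self.src1 = 0
        self.src2 = 0
        self.dst = 0

    def subcc(self, a: int, b: int) -> int:
        # Only record the inputs and the result; no flag is computed here.
        self.src1, self.src2 = a & MASK64, b & MASK64
        self.dst = (a - b) & MASK64
        return self.dst

    def z(self) -> bool:
        # Zero flag, computed only when a consumer asks for it.
        return self.dst == 0

    def n(self) -> bool:
        # Negative flag: bit 63 of the 64-bit result.
        return (self.dst >> 63) & 1 == 1


flags = LazyFlags()
flags.subcc(3, 5)
print(flags.z(), flags.n())
```

With one temp per flag, every flag-setting instruction would instead compute
Z and N right away and rely on the optimizer to drop the unused ones.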

The unoptimized case is a sequence of multiple cmp and branch
operations (likely created by a "case" statement in the original
source code), especially where cmp is in a delay slot of a branch
instruction.

I wonder whether we always have to finish a TB on a conditional jump.
Maybe it would make sense to translate further if a destination of a
jump is not too far from dc->pc? The definition of "not too far" is
indeed tricky.

Artyom

-- 
Regards,
Artyom Tarasenko

SPARC and PPC PReP under qemu blog: http://tyom.blogspot.com/search/label/qemu

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-08-18  8:19                         ` Aurelien Jarno
@ 2015-08-18 10:39                           ` Dennis Luehring
  2015-08-18 11:21                           ` Dennis Luehring
  1 sibling, 0 replies; 80+ messages in thread
From: Dennis Luehring @ 2015-08-18 10:39 UTC (permalink / raw)
  To: Karel Gardas, qemu-devel, Artyom Tarasenko

Am 18.08.2015 um 10:19 schrieb Aurelien Jarno:
> How big are the source file and the output file? I find it strange that I/O
> has so much impact on a compilation which should be CPU bound. Maybe try
> adding the -pipe argument to g++.

NetBSD SPARC64, running from ramdisk, qemu 2.4.50

g++ src/pugixml.cpp -g -Wall -Wextra -Werror -pedantic -std=c++0x -c 
-MMD -MP
#1: 2:52.6
#2: 2:49.2
g++ src/pugixml.cpp -g -Wall -Wextra -Werror -pedantic -std=c++0x -c 
-MMD -MP -pipe
#3: 2:50.2
#4: 2:52.1

pugixml.cpp:  323,273 bytes
pugixml.o:  1,095,832 bytes
pugixml.d:        101 bytes

Stop after the preprocessing stage:
g++ src/pugixml.cpp -g -Wall -Wextra -Werror -pedantic -std=c++0x -c 
-MMD -MP -E > pugixml.prep.cpp

runtime: ~5 sec

pugixml.prep.cpp: 788,350 bytes

Stop after the stage of compilation proper:
g++ src/pugixml.cpp -g -Wall -Wextra -Werror -pedantic -std=c++0x -c 
-MMD -MP -S
runtime #1: 2:40.1
runtime #2: 2:41.0

Compile or assemble the source files, but do not link:
g++ src/pugixml.cpp -g -Wall -Wextra -Werror -pedantic -std=c++0x -c 
-MMD -MP -c
runtime: 2:52.6
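
A rough per-stage breakdown can be derived from these runs by subtracting
the cumulative timings. This is just arithmetic on the numbers above (the
two -S runs are averaged, and the variable names are mine), so treat the
result as an estimate:

```python
# Cumulative wall-clock times in seconds, taken from the runs above.
preprocess = 5.0                           # -E, "~5 sec"
through_compile = (160.1 + 161.0) / 2      # -S, average of 2:40.1 and 2:41.0
through_assemble = 172.6                   # -c, 2:52.6

stages = {
    "preprocess": preprocess,
    "compile proper": through_compile - preprocess,
    "assemble": through_assemble - through_compile,
}

for name, seconds in stages.items():
    share = 100 * seconds / through_assemble
    print(f"{name}: {seconds:.1f} s ({share:.0f}%)")
```

Nearly all of the time lands in the compilation proper, which fits a
CPU-bound picture rather than an I/O-bound one.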

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-08-18  8:19                         ` Aurelien Jarno
  2015-08-18 10:39                           ` Dennis Luehring
@ 2015-08-18 11:21                           ` Dennis Luehring
  1 sibling, 0 replies; 80+ messages in thread
From: Dennis Luehring @ 2015-08-18 11:21 UTC (permalink / raw)
  To: Karel Gardas, qemu-devel, Artyom Tarasenko, Aurelien Jarno

Am 18.08.2015 um 10:19 schrieb Aurelien Jarno:
> How big are the source file and the output file? I find it strange that I/O
> has so much impact on a compilation which should be CPU bound. Maybe try
> adding the -pipe argument to g++.

and the gcc -ftime-report

g++ src/pugixml.cpp -g -Wall -Wextra -Werror -pedantic -std=c++0x -c 
-MMD -MP -ftime-report

Execution times (seconds)
  callgraph construction:   3.95 ( 2%) usr   1.54 ( 2%) sys   5.47 ( 2%) wall    1344 kB ( 2%) ggc
  callgraph optimization:   2.77 ( 1%) usr   1.35 ( 2%) sys   4.02 ( 1%) wall    1155 kB ( 1%) ggc
  ipa free lang data    :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
  cfg cleanup           :   0.73 ( 0%) usr   0.17 ( 0%) sys   0.93 ( 0%) wall       7 kB ( 0%) ggc
  trivially dead code   :   0.57 ( 0%) usr   0.22 ( 0%) sys   0.95 ( 0%) wall       0 kB ( 0%) ggc
  df live regs          :   0.76 ( 0%) usr  -0.01 (-0%) sys   0.77 ( 0%) wall       0 kB ( 0%) ggc
  df reg dead/unused notes: 0.95 ( 0%) usr   0.07 ( 0%) sys   1.13 ( 0%) wall     839 kB ( 1%) ggc
  register information  :   0.60 ( 0%) usr   0.03 ( 0%) sys   0.64 ( 0%) wall       0 kB ( 0%) ggc
  alias analysis        :   0.37 ( 0%) usr   0.15 ( 0%) sys   0.49 ( 0%) wall     311 kB ( 0%) ggc
  rebuild jump labels   :   0.49 ( 0%) usr   0.11 ( 0%) sys   0.50 ( 0%) wall       0 kB ( 0%) ggc
  preprocessing         :   7.70 ( 4%) usr   8.02 ( 9%) sys  15.55 ( 5%) wall     801 kB ( 1%) ggc
  parser                :  42.51 (21%) usr  17.30 (20%) sys  61.96 (22%) wall   32967 kB (37%) ggc
  name lookup           :  27.33 (14%) usr  36.38 (43%) sys  61.97 (22%) wall    3781 kB ( 4%) ggc
  inline heuristics     :   1.37 ( 1%) usr   0.21 ( 0%) sys   1.66 ( 1%) wall       0 kB ( 0%) ggc
  tree gimplify         :   6.54 ( 3%) usr   0.50 ( 1%) sys   7.46 ( 3%) wall    8194 kB ( 9%) ggc
  tree eh               :   0.80 ( 0%) usr   0.23 ( 0%) sys   0.94 ( 0%) wall     469 kB ( 1%) ggc
  tree CFG construction :   0.96 ( 0%) usr   0.22 ( 0%) sys   1.24 ( 0%) wall    3391 kB ( 4%) ggc
  tree CFG cleanup      :   1.46 ( 1%) usr   0.22 ( 0%) sys   1.83 ( 1%) wall      11 kB ( 0%) ggc
  tree find ref. vars   :   0.34 ( 0%) usr   0.07 ( 0%) sys   0.36 ( 0%) wall     346 kB ( 0%) ggc
  tree PHI insertion    :   0.34 ( 0%) usr   0.09 ( 0%) sys   0.43 ( 0%) wall     409 kB ( 0%) ggc
  tree SSA rewrite      :   0.32 ( 0%) usr   0.18 ( 0%) sys   0.61 ( 0%) wall    1243 kB ( 1%) ggc
  tree SSA other        :   2.30 ( 1%) usr   1.20 ( 1%) sys   3.74 ( 1%) wall     152 kB ( 0%) ggc
  tree operand scan     :   1.24 ( 1%) usr   1.12 ( 1%) sys   2.35 ( 1%) wall    2174 kB ( 2%) ggc
  dominance frontiers   :   0.09 ( 0%) usr   0.04 ( 0%) sys   0.08 ( 0%) wall       0 kB ( 0%) ggc
  dominance computation :   1.15 ( 1%) usr   0.23 ( 0%) sys   1.11 ( 0%) wall       0 kB ( 0%) ggc
  expand                :  33.10 (17%) usr   6.31 ( 7%) sys  39.45 (14%) wall   16724 kB (19%) ggc
  varconst              :   0.27 ( 0%) usr   0.12 ( 0%) sys   0.60 ( 0%) wall      44 kB ( 0%) ggc
  jump                  :   0.16 ( 0%) usr   0.16 ( 0%) sys   0.34 ( 0%) wall      60 kB ( 0%) ggc
  integrated RA         :  13.32 ( 7%) usr   1.00 ( 1%) sys  13.84 ( 5%) wall    2521 kB ( 3%) ggc
  reload                :  11.42 ( 6%) usr   0.08 ( 0%) sys  11.68 ( 4%) wall    3016 kB ( 3%) ggc
  thread pro- & epilogue:   2.06 ( 1%) usr   0.11 ( 0%) sys   2.24 ( 1%) wall    1028 kB ( 1%) ggc
  final                 :  16.51 ( 8%) usr   0.36 ( 0%) sys  17.28 ( 6%) wall     609 kB ( 1%) ggc
  symout                :  11.74 ( 6%) usr   0.54 ( 1%) sys  12.19 ( 4%) wall    6948 kB ( 8%) ggc
  plugin execution      :   3.81 ( 2%) usr   6.63 ( 8%) sys   9.67 ( 3%) wall       0 kB ( 0%) ggc
  rest of compilation   :   0.02 ( 0%) usr   0.02 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
  TOTAL                 : 198.47            85.23            284.09              89326 kB
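
A couple of ratios can be read off this report. The snippet below is only
arithmetic on the figures printed above (the variable names are mine):

```python
# Totals and the three largest wall-clock phases from the -ftime-report output.
total_usr, total_sys, total_wall = 198.47, 85.23, 284.09
wall_by_phase = {"parser": 61.96, "name lookup": 61.97, "expand": 39.45}

# Fraction of wall time spent in system (kernel) time inside the guest.
sys_share = total_sys / total_wall
# Fraction of wall time covered by the three biggest phases.
top3_share = sum(wall_by_phase.values()) / total_wall

print(f"sys share of wall time: {100 * sys_share:.0f}%")
print(f"parser + name lookup + expand: {100 * top3_share:.0f}% of wall time")
```

So close to a third of the guest's time is reported as system time, and
name lookup alone accounts for 36.38 of the 85.23 sys seconds.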

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-08-18  9:24                         ` Artyom Tarasenko
@ 2015-08-18 17:55                           ` Richard Henderson
  2015-08-19 10:41                             ` Artyom Tarasenko
  0 siblings, 1 reply; 80+ messages in thread
From: Richard Henderson @ 2015-08-18 17:55 UTC (permalink / raw)
  To: Artyom Tarasenko, Aurelien Jarno; +Cc: qemu-devel, Dennis Luehring

On 08/18/2015 02:24 AM, Artyom Tarasenko wrote:
> The unoptimized case is a sequence of multiple cmp and branch
> operations (likely created by a "case" statement in the original
> source code), especially where cmp is in a delay slot of a branch
> instruction.

Interesting.

> I wonder whether we always have to finish a TB on a conditional jump.
> Maybe it would make sense to translate further if a destination of a
> jump is not too far from dc->pc? The definition of "not too far" is
> indeed tricky.

We can only handle two chained exits from a TB.  If we continue past
a conditional branch, we may well encounter a second conditional branch, which
would leave us with three different exits from the TB.

Something that may be interesting to play with, however, is to change the TB
with which the insn in a delay slot is connected.

For instance, we currently spend some amount of effort computing and saving the
branch condition, so that we can then execute the delay slot, and afterwards
use the saved branch condition to perform the branch.

Another way of doing this is to immediately branch, exiting the TB.  But we set
up PC+NPC for the next TB such that the delay slot is the first insn that is
executed within the next TB.  In that way, the compare in the delay slot that
you mention *is* in the same TB as the branch that uses it, allowing
the case to be optimized.

This could wind up creating more TBs than the current solution, so it's not
clear that it would be a win.  One can mitigate that somewhat by noticing the
case where the delay slot is a nop.  I do think it's worth an experiment.
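
For readers less familiar with delay slots, here is a toy Python model of
the PC/NPC pair. It is a deliberately simplified sketch (annulled branches
and conditions are ignored, and the tuple-based instruction format is
invented), not a description of the translator:

```python
def run(program, start_pc=0, max_steps=20):
    """Execute a toy program keyed by address and return the executed PCs.

    Each entry is ("op",) or ("ba", target). A taken branch only changes
    nPC, so the instruction right after the branch (the delay slot) still
    executes before control reaches the target.
    """
    pc, npc = start_pc, start_pc + 4
    trace = []
    for _ in range(max_steps):
        if pc not in program:
            break
        insn = program[pc]
        trace.append(pc)
        if insn[0] == "ba":
            pc, npc = npc, insn[1]    # delay slot next, then the target
        else:
            pc, npc = npc, npc + 4    # ordinary sequential step
    return trace


program = {
    0x00: ("op",),
    0x04: ("ba", 0x20),
    0x08: ("op",),   # delay slot: executed even though the branch is taken
    0x0C: ("op",),   # skipped
    0x20: ("op",),   # branch target
}
print([hex(pc) for pc in run(program)])
```

In this model, starting the next TB with PC pointing at the delay slot and
NPC at the branch target gives the same execution order as the current
scheme; only the TB boundary moves.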


r~

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
       [not found]                         ` <CAMO55fkcW1eOaZSz2MJgqZEP29pTuHvTLe0Kna5eHYfg7cFyPA@mail.gmail.com>
@ 2015-08-19  4:28                           ` Dennis Luehring
  0 siblings, 0 replies; 80+ messages in thread
From: Dennis Luehring @ 2015-08-19  4:28 UTC (permalink / raw)
  To: Karel Gardas, qemu-devel, Artyom Tarasenko, Aurelien Jarno

Am 18.08.2015 um 21:06 schrieb Karel Gardas:
> Thanks a lot for doing this. It looks like g++ is memory-bound in this
> case, isn't it? What does stream[1] benchmark tell on host and
> emulated as 32/64bit sparc binary? Let's see if the ratio is kind of
> similar to the time you get...
>
> [1]:https://www.cs.virginia.edu/stream/

-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------

==>host Ubuntu 15.04 x64

-------------------------------------------------------------
Array size = 10000000 (elements), Offset = 0 (elements)
Memory per array = 76.3 MiB (= 0.1 GiB).
Total memory required = 228.9 MiB (= 0.2 GiB).
Each kernel will be executed 10 times.
  The *best* time for each kernel (excluding the first iteration)
  will be used to compute the reported bandwidth.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 14147 microseconds.
    (= 14147 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            8877.1     0.018049     0.018024     0.018074
Scale:           8842.7     0.018206     0.018094     0.018749
Add:            10312.9     0.023367     0.023272     0.023901
Triad:          10114.3     0.023758     0.023729     0.023871
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------

qemu 2.4.50 x64 build

==>netbsd-guest NetBSD 6.1.5 SPARC64 (pure 64bit) running from ramdisk

-------------------------------------------------------------
Array size = 10000000 (elements), Offset = 0 (elements)
Memory per array = 76.3 MiB (= 0.1 GiB).
Total memory required = 228.9 MiB (= 0.2 GiB).
Each kernel will be executed 10 times.
  The *best* time for each kernel (excluding the first iteration)
  will be used to compute the reported bandwidth.
-------------------------------------------------------------
Your clock granularity/precision appears to be 42 microseconds.
Each test below will take on the order of 330428 microseconds.
    (= 7867 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:             771.5     0.214717     0.207377     0.244214
Scale:            288.1     0.573320     0.555401     0.660161
Add:              423.5     0.633523     0.566661     1.092067
Triad:            242.9     1.053032     0.987970     1.499563
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------

==>debian-guest 7.8.0 SPARC64 (mixed 32/64bit) running from ramdisk

!!32bit version!!

-------------------------------------------------------------
Array size = 10000000 (elements), Offset = 0 (elements)
Memory per array = 76.3 MiB (= 0.1 GiB).
Total memory required = 228.9 MiB (= 0.2 GiB).
Each kernel will be executed 10 times.
  The *best* time for each kernel (excluding the first iteration)
  will be used to compute the reported bandwidth.
-------------------------------------------------------------
Your clock granularity/precision appears to be 41 microseconds.
Each test below will take on the order of 394519 microseconds.
    (= 9622 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:             629.4     0.280860     0.254224     0.401105
Scale:            231.7     0.733338     0.690452     0.868741
Add:              346.9     0.747893     0.691890     0.889102
Triad:            201.4     1.239293     1.191786     1.394918
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------

!!64bit version!!

-------------------------------------------------------------
Array size = 10000000 (elements), Offset = 0 (elements)
Memory per array = 76.3 MiB (= 0.1 GiB).
Total memory required = 228.9 MiB (= 0.2 GiB).
Each kernel will be executed 10 times.
  The *best* time for each kernel (excluding the first iteration)
  will be used to compute the reported bandwidth.
-------------------------------------------------------------
Your clock granularity/precision appears to be 40 microseconds.
Each test below will take on the order of 395364 microseconds.
    (= 9884 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:             651.3     0.251320     0.245668     0.274346
Scale:            240.3     0.694808     0.665834     0.770982
Add:              353.0     0.690291     0.679792     0.715228
Triad:            201.5     1.207881     1.191054     1.256001
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
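
To make the ratio Karel asked about explicit, the host and NetBSD guest
"Best Rate" columns can be divided directly. This is just arithmetic on the
numbers above; the dictionary names are mine:

```python
# Best Rate MB/s from the STREAM runs above.
host = {"Copy": 8877.1, "Scale": 8842.7, "Add": 10312.9, "Triad": 10114.3}
netbsd_guest = {"Copy": 771.5, "Scale": 288.1, "Add": 423.5, "Triad": 242.9}

# How many times faster the host is for each STREAM kernel.
ratio = {kernel: host[kernel] / netbsd_guest[kernel] for kernel in host}

for kernel, factor in ratio.items():
    print(f"{kernel}: host is about {factor:.1f}x faster")
```

The memory bandwidth gap (roughly 12x to 42x depending on the kernel) is
smaller than the gap seen for the pugixml compile.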

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-08-18 17:55                           ` Richard Henderson
@ 2015-08-19 10:41                             ` Artyom Tarasenko
  2015-08-19 11:00                               ` Aurelien Jarno
  0 siblings, 1 reply; 80+ messages in thread
From: Artyom Tarasenko @ 2015-08-19 10:41 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Dennis Luehring, qemu-devel, Aurelien Jarno

Hi Richard,

On Tue, Aug 18, 2015 at 7:55 PM, Richard Henderson <rth@twiddle.net> wrote:
> On 08/18/2015 02:24 AM, Artyom Tarasenko wrote:
>> The unoptimized case is a sequence of multiple cmp and branch
>> operations (likely created by a "case" statement in the original
>> source code), especially where cmp is in a delay slot of a branch
>> instruction.
>
> Interesting.
>
>> I wonder whether we always have to finish a TB on a conditional jump.
>> Maybe it would make sense to translate further if a destination of a
>> jump is not too far from dc->pc? The definition of "not too far" is
>> indeed tricky.
>
> We can only handle two chained exits from a TB.  If we continue past
> a conditional branch, we may well encounter a second conditional branch, which
> would leave us with three different exits from the TB.
>
> Something that may be interesting to play with, however, is to change the TB
> with which the insn in a delay slot is connected.
>
> For instance, we currently spend some amount of effort computing and saving the
> branch condition, so that we can then execute the delay slot, and afterwards
> use the saved branch condition to perform the branch.
>
> Another way of doing this is to immediately branch, exiting the TB.  But we set
> up PC+NPC for the next TB such that the delay slot is the first insn that is
> executed within the next TB.  In that way, the compare in the delay slot that
> you mention *is* in the same TB as the branch that uses it, allowing
> the case to be optimized.
>
> This could wind up creating more TBs than the current solution, so it's not
> clear that it would be a win.  One can mitigate that somewhat by noticing the
> case where the delay slot is a nop.  I do think it's worth an experiment.

So is it possible to make a TB with non-sequential instructions?
The instruction in the delay slot would most likely be located
elsewhere than the following instructions.

But I think I've been chasing a red herring. I see those helpers in
perf top when running sysbench, but not when running g++ (and in the
end g++ is a much more relevant benchmark for me):


Samples: 83K of event 'cpu-clock', Event count (approx.): 15333243164,
Thread: qemu-system-spa(2743)
 27.10%  [kernel]                 [k] retint_signal
 12.66%  qemu-system-sparc64      [.] tcg_optimize
  9.18%  [vdso]                   [.] 0x0000000000000998
  8.39%  [kernel]                 [k] _raw_spin_unlock_irqrestore
  4.76%  qemu-system-sparc64      [.] tcg_liveness_analysis
  3.89%  qemu-system-sparc64      [.] tcg_reg_alloc_op
  2.80%  qemu-system-sparc64      [.] tcg_out_opc
  2.45%  qemu-system-sparc64      [.] get_physical_address_data
  1.86%  [kernel]                 [k] native_read_tsc
  1.62%  qemu-system-sparc64      [.] tlb_flush_page
  1.55%  qemu-system-sparc64      [.] tcg_out_modrm_sib_offset.constprop.42
  1.45%  [unknown]                [.] 0x00000000451c5cae
  1.43%  qemu-system-sparc64      [.] gen_intermediate_code_pc
  1.39%  qemu-system-sparc64      [.] tcg_temp_new_internal_i64
  1.24%  qemu-system-sparc64      [.] tb_flush_jmp_cache
  1.11%  qemu-system-sparc64      [.] disas_sparc_insn
  1.08%  qemu-system-sparc64      [.] tcg_out_modrm
  0.97%  qemu-system-sparc64      [.] tcg_reg_alloc_start
  0.77%  qemu-system-sparc64      [.] cpu_sparc_exec
  0.73%  qemu-system-sparc64      [.] replace_tlb_1bit_lru.isra.3
  0.72%  qemu-system-sparc64      [.] tcg_gen_code_search_pc
  0.72%  qemu-system-sparc64      [.] tcg_opt_gen_mov
  0.70%  qemu-system-sparc64      [.] reset_temp

I'm not sure why I still see kernel functions when I zoom into the qemu
thread. Is this qemu signal handling?
And then it would be interesting to know where in this listing the
generated code is. Is it [vdso], [unknown] or is it hidden behind
retint_signal?

Ironically a good optimization target seems to be the tcg_optimize
function. If I zoom in, I see it spends most of the time in
reset_all_temps.

Any suggestions on how to improve it?
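
One generic way to make a "reset everything" step cheap is a generation
counter, so the reset is O(1) and stale entries are detected lazily. The
Python sketch below shows only that general technique; the class and its
methods are hypothetical, and I am not claiming this is how tcg_optimize
does or should track its temps:

```python
class TempState:
    """Per-temp info with an O(1) 'reset all' based on a generation stamp."""

    def __init__(self, count: int) -> None:
        self.info = [None] * count
        self.stamp = [0] * count
        self.generation = 1

    def reset_all(self) -> None:
        # Instead of clearing every slot, invalidate them all at once.
        self.generation += 1

    def set(self, index: int, value) -> None:
        self.info[index] = value
        self.stamp[index] = self.generation

    def get(self, index: int):
        # Entries written before the last reset are treated as empty.
        if self.stamp[index] == self.generation:
            return self.info[index]
        return None


state = TempState(4)
state.set(1, "const 5")
state.reset_all()
print(state.get(1))
```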

Artyom

-- 
Regards,
Artyom Tarasenko

SPARC and PPC PReP under qemu blog: http://tyom.blogspot.com/search/label/qemu

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-08-19 10:41                             ` Artyom Tarasenko
@ 2015-08-19 11:00                               ` Aurelien Jarno
  2015-08-19 14:41                                 ` Artyom Tarasenko
  0 siblings, 1 reply; 80+ messages in thread
From: Aurelien Jarno @ 2015-08-19 11:00 UTC (permalink / raw)
  To: Artyom Tarasenko; +Cc: qemu-devel, Dennis Luehring, Richard Henderson

On 2015-08-19 12:41, Artyom Tarasenko wrote:
> Hi Richard,
> 
> On Tue, Aug 18, 2015 at 7:55 PM, Richard Henderson <rth@twiddle.net> wrote:
> > On 08/18/2015 02:24 AM, Artyom Tarasenko wrote:
> >> The unoptimized case is a sequence of multiple cmp and branch
> >> operations (likely created by a "case" statement in the original
> >> source code), especially where cmp is in a delay slot of a branch
> >> instruction.
> >
> > Interesting.
> >
> >> I wonder whether we always have to finish a TB on a conditional jump.
> >> Maybe it would make sense to translate further if a destination of a
> >> jump is not too far from dc->pc? The definition of "not too far" is
> >> indeed tricky.
> >
> > We can only handle two chained exits from a TB.  If we continue past
> > a conditional branch, we may well encounter a second conditional branch, which
> > would leave us with three different exits from the TB.
> >
> > Something that may be interesting to play with, however, is to change the TB
> > with which the insn in a delay slot is connected.
> >
> > For instance, we currently spend some amount of effort computing and saving the
> > branch condition, so that we can then execute the delay slot, and afterwards
> > use the saved branch condition to perform the branch.
> >
> > Another way of doing this is to immediately branch, exiting the TB.  But we set
> > up PC+NPC for the next TB such that the delay slot is the first insn that is
> > executed within the next TB.  In that way, the compare in the delay slot that
> > you mention *is* in the same TB as the branch that uses it, allowing
> > the case to be optimized.
> >
> > This could wind up creating more TBs than the current solution, so it's not
> > clear that it would be a win.  One can mitigate that somewhat by noticing the
> > case where the delay slot is a nop.  I do think it's worth an experiment.
> 
> So it is possible to make a TB with non sequential instructions?
> The instruction in the delay slot would be located most likely
> elsewhere than the following instructions.
> 
> But I think I've been chasing a red herring. I see those helpers in
> perf top when running sysbench, but not when running g++ (and at the
> end g++ is much more relevant benchmark for me):
> 
> 
> Samples: 83K of event 'cpu-clock', Event count (approx.): 15333243164,
> Thread: qemu-system-spa(2743)
>  27.10%  [kernel]                 [k] retint_signal
>  12.66%  qemu-system-sparc64      [.] tcg_optimize
>   9.18%  [vdso]                   [.] 0x0000000000000998
>   8.39%  [kernel]                 [k] _raw_spin_unlock_irqrestore
>   4.76%  qemu-system-sparc64      [.] tcg_liveness_analysis
>   3.89%  qemu-system-sparc64      [.] tcg_reg_alloc_op
>   2.80%  qemu-system-sparc64      [.] tcg_out_opc
>   2.45%  qemu-system-sparc64      [.] get_physical_address_data
>   1.86%  [kernel]                 [k] native_read_tsc
>   1.62%  qemu-system-sparc64      [.] tlb_flush_page
>   1.55%  qemu-system-sparc64      [.] tcg_out_modrm_sib_offset.constprop.42
>   1.45%  [unknown]                [.] 0x00000000451c5cae
>   1.43%  qemu-system-sparc64      [.] gen_intermediate_code_pc
>   1.39%  qemu-system-sparc64      [.] tcg_temp_new_internal_i64
>   1.24%  qemu-system-sparc64      [.] tb_flush_jmp_cache
>   1.11%  qemu-system-sparc64      [.] disas_sparc_insn
>   1.08%  qemu-system-sparc64      [.] tcg_out_modrm
>   0.97%  qemu-system-sparc64      [.] tcg_reg_alloc_start
>   0.77%  qemu-system-sparc64      [.] cpu_sparc_exec
>   0.73%  qemu-system-sparc64      [.] replace_tlb_1bit_lru.isra.3
>   0.72%  qemu-system-sparc64      [.] tcg_gen_code_search_pc
>   0.72%  qemu-system-sparc64      [.] tcg_opt_gen_mov
>   0.70%  qemu-system-sparc64      [.] reset_temp
> 
> I'm not sure why I still see kernel functions when I zoom into qemu
> thread. Is this qemu signal handling?
> And then it would be interesting to know where in this listing is the
> generated code. Is it [vdso], [unknown] or is it hidden behind
> retint_signal?
> 
> Ironically a good optimization target seems to be the tcg_optimize
> function. If I zoom I see it spends most of the time in
> reset_all_temps.
> 
> Any suggestions how to improve it?
> 

Try this patch:
http://lists.nongnu.org/archive/html/qemu-devel/2015-08/msg02042.html

Aurelien

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-08-19 11:00                               ` Aurelien Jarno
@ 2015-08-19 14:41                                 ` Artyom Tarasenko
  2015-08-20  5:22                                   ` Dennis Luehring
  2015-08-20 17:19                                   ` Richard Henderson
  0 siblings, 2 replies; 80+ messages in thread
From: Artyom Tarasenko @ 2015-08-19 14:41 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel, Dennis Luehring, Richard Henderson

On Wed, Aug 19, 2015 at 1:00 PM, Aurelien Jarno <aurelien@aurel32.net> wrote:
> On 2015-08-19 12:41, Artyom Tarasenko wrote:
>> Hi Richard,
>>
>> On Tue, Aug 18, 2015 at 7:55 PM, Richard Henderson <rth@twiddle.net> wrote:
>> > On 08/18/2015 02:24 AM, Artyom Tarasenko wrote:
>> >> The unoptimized case is a sequence of multiple cmp and branch
>> >> operations (likely created by a "case" statement in the original
>> >> source code), especially where cmp is in a delay slot of a branch
>> >> instruction.
>> >
>> > Interesting.
>> >
>> >> I wonder whether we always have to finish a TB on a conditional jump.
>> >> Maybe it would make sense to translate further if a destination of a
>> >> jump is not too far from dc->pc? The definition of "not too far" is
>> >> indeed tricky.
>> >
>> > We can only handle two chained exits from a TB.  If we continue past
>> > a conditional branch, we may well encounter a second conditional branch, which
>> > would leave us with three different exits from the TB.
>> >
>> > Something that may be interesting to play with, however, is to change the TB
>> > with which the insn in a delay slot is connected.
>> >
>> > For instance, we currently spend some amount of effort computing and saving the
>> > branch condition, so that we can then execute the delay slot, and afterwards
>> > use the saved branch condition to perform the branch.
>> >
>> > Another way of doing this is to immediately branch, exiting the TB.  But we set
>> > up PC+NPC for the next TB such that the delay slot is the first insn that is
>> > executed within the next TB.  In that way, the compare in the delay slot that
>> > you mention *is* in the same TB as the branch that uses it, allowing
>> > the case to be optimized.
>> >
>> > This could wind up creating more TBs than the current solution, so it's not
>> > clear that it would be a win.  One can mitigate that somewhat by noticing the
>> > case where the delay slot is a nop.  I do think it's worth an experiment.
>>
>> So it is possible to make a TB with non sequential instructions?
>> The instruction in the delay slot would be located most likely
>> elsewhere than the following instructions.
>>
>> But I think I've been chasing a red herring. I see those helpers in
>> perf top when running sysbench, but not when running g++ (and at the
>> end g++ is much more relevant benchmark for me):
>>
>>
>> Samples: 83K of event 'cpu-clock', Event count (approx.): 15333243164,
>> Thread: qemu-system-spa(2743)
>>  27.10%  [kernel]                 [k] retint_signal
>>  12.66%  qemu-system-sparc64      [.] tcg_optimize
>>   9.18%  [vdso]                   [.] 0x0000000000000998
>>   8.39%  [kernel]                 [k] _raw_spin_unlock_irqrestore
>>   4.76%  qemu-system-sparc64      [.] tcg_liveness_analysis
>>   3.89%  qemu-system-sparc64      [.] tcg_reg_alloc_op
>>   2.80%  qemu-system-sparc64      [.] tcg_out_opc
>>   2.45%  qemu-system-sparc64      [.] get_physical_address_data
>>   1.86%  [kernel]                 [k] native_read_tsc
>>   1.62%  qemu-system-sparc64      [.] tlb_flush_page
>>   1.55%  qemu-system-sparc64      [.] tcg_out_modrm_sib_offset.constprop.42
>>   1.45%  [unknown]                [.] 0x00000000451c5cae
>>   1.43%  qemu-system-sparc64      [.] gen_intermediate_code_pc
>>   1.39%  qemu-system-sparc64      [.] tcg_temp_new_internal_i64
>>   1.24%  qemu-system-sparc64      [.] tb_flush_jmp_cache
>>   1.11%  qemu-system-sparc64      [.] disas_sparc_insn
>>   1.08%  qemu-system-sparc64      [.] tcg_out_modrm
>>   0.97%  qemu-system-sparc64      [.] tcg_reg_alloc_start
>>   0.77%  qemu-system-sparc64      [.] cpu_sparc_exec
>>   0.73%  qemu-system-sparc64      [.] replace_tlb_1bit_lru.isra.3
>>   0.72%  qemu-system-sparc64      [.] tcg_gen_code_search_pc
>>   0.72%  qemu-system-sparc64      [.] tcg_opt_gen_mov
>>   0.70%  qemu-system-sparc64      [.] reset_temp
>>
>> I'm not sure why I still see kernel functions when I zoom into qemu
>> thread. Is this qemu signal handling?
>> And then it would be interesting to know where in this listing is the
>> generated code. Is it [vdso], [unknown] or is it hidden behind
>> retint_signal?
>>
>> Ironically a good optimization target seems to be the tcg_optimize
>> function. If I zoom I see it spends most of the time in
>> reset_all_temps.
>>
>> Any suggestions how to improve it?
>>
>
> Try this patch:
> http://lists.nongnu.org/archive/html/qemu-devel/2015-08/msg02042.html

Note: I use a different benchmark than Dennis: I compile the binutils
gold linker (which, by the way, is broken on sparc64).

Without the patch:

 time g++ -DHAVE_CONFIG_H -I. -I../binutils-gdb/gold
-I../binutils-gdb/gold -I../binutils-gdb/gold/../include
-I../binutils-gdb/gold/../elfcpp
-DLOCALEDIR="\"/usr/local/share/locale\""
-DBINDIR="\"/usr/local/bin\"" -DTOOLBINDIR="\"/usr/local//bin\""
-DTOOLLIBDIR="\"/usr/local//lib\""   -W -Wall    -Werror
-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -frandom-seed=tilegx.o
-I../binutils-gdb/gold/../zlib -g -O2 -MT tilegx.o -MD -MP -MF
.deps/tilegx.Tpo -c -o tilegx.o ../binutils-gdb/gold/tilegx.cc

real    18m31.407s
user    18m23.661s
sys     0m6.784s

The patch certainly improves the situation: tcg_optimize now takes ~7%
in perf top (instead of ~12%), and the only function marked red by
perf top is init_temp_info(). So with the patch:

real    17m46.380s
user    17m37.522s
sys     0m7.120s


And if I completely disable the optimizer (// #define
USE_TCG_OPTIMIZATIONS in tcg.c), it's considerably faster still:

real    14m17.668s
user    14m10.241s
sys     0m6.060s
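
As a rough cross-check of the three "real" times above, the relative
savings can be computed with a few lines of C. This is only a sketch:
the helper names (to_seconds, percent_saved, report) are made up for
illustration, and the inputs are copied from the runs above.

```c
#include <assert.h>
#include <stdio.h>

/* Convert a time as printed by time(1), e.g. 18m31.407s, to seconds. */
double to_seconds(int minutes, double seconds)
{
    return minutes * 60.0 + seconds;
}

/* Percentage of wall-clock time saved relative to the baseline run. */
double percent_saved(double baseline, double candidate)
{
    assert(baseline > 0.0);
    return 100.0 * (baseline - candidate) / baseline;
}

/* Print the comparison for the three runs quoted above. */
void report(void)
{
    double base    = to_seconds(18, 31.407); /* unpatched            */
    double patched = to_seconds(17, 46.380); /* with the patch       */
    double no_opt  = to_seconds(14, 17.668); /* optimizer disabled   */

    printf("patch:        %.1f%% less wall-clock time\n",
           percent_saved(base, patched));
    printf("no optimizer: %.1f%% less wall-clock time\n",
           percent_saved(base, no_opt));
}
```

Calling report() from main() gives roughly 4% for the patch and roughly
23% for the disabled optimizer, so on this workload the patch recovers
only a small part of what the optimizer costs.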

Artyom

-- 
Regards,
Artyom Tarasenko

SPARC and PPC PReP under qemu blog: http://tyom.blogspot.com/search/label/qemu

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-08-19 14:41                                 ` Artyom Tarasenko
@ 2015-08-20  5:22                                   ` Dennis Luehring
  2015-08-20 10:40                                     ` Artyom Tarasenko
  2015-08-20 17:19                                   ` Richard Henderson
  1 sibling, 1 reply; 80+ messages in thread
From: Dennis Luehring @ 2015-08-20  5:22 UTC (permalink / raw)
  To: Artyom Tarasenko, Aurelien Jarno; +Cc: qemu-devel, Richard Henderson

Am 19.08.2015 um 16:41 schrieb Artyom Tarasenko:
> And if I completely disable optimizer (// #define
> USE_TCG_OPTIMIZATIONS in tcg.c), it's still quite faster:
>
> real    14m17.668s
> user    14m10.241s
> sys     0m6.060s

i also ran my tests without USE_TCG_OPTIMIZATIONS

qemu 2.4.50, netbsd 6.1.5 SPARC64

without-optimization
//#define USE_TCG_OPTIMIZATIONS

pugixml compile: (without-optimization is faster)
with-optimization: ~2:51.2
without-optimization: ~2:14.1

prime.c runtime: (without-optimization is faster)
with-optimization: ~11 sec
without-optimization: ~9.9 sec

stream results (with-optimization gives better results)

with-optimization:

Your clock granularity/precision appears to be 42 microseconds.
Each test below will take on the order of 330428 microseconds.
     (= 7867 clock ticks)
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:             771.5     0.214717     0.207377     0.244214
Scale:            288.1     0.573320     0.555401     0.660161
Add:              423.5     0.633523     0.566661     1.092067
Triad:            242.9     1.053032     0.987970     1.499563

without-optimization:

Your clock granularity/precision appears to be 41 microseconds.
Each test below will take on the order of 745254 microseconds.
    (= 18176 clock ticks)
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:             316.6     0.524065     0.505313     0.580103
Scale:            200.5     0.813356     0.798024     0.840986
Add:              243.9     1.010247     0.984025     1.119149
Triad:            182.9     1.345601     1.312236     1.427459

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-08-20  5:22                                   ` Dennis Luehring
@ 2015-08-20 10:40                                     ` Artyom Tarasenko
  0 siblings, 0 replies; 80+ messages in thread
From: Artyom Tarasenko @ 2015-08-20 10:40 UTC (permalink / raw)
  To: Aurelien Jarno, Richard Henderson; +Cc: qemu-devel, Dennis Luehring

On Thu, Aug 20, 2015 at 7:22 AM, Dennis Luehring <dl.soluz@gmx.net> wrote:
> Am 19.08.2015 um 16:41 schrieb Artyom Tarasenko:
>>
>> And if I completely disable optimizer (// #define
>> USE_TCG_OPTIMIZATIONS in tcg.c), it's still quite faster:
>>
>> real    14m17.668s
>> user    14m10.241s
>> sys     0m6.060s
>
>
> my tests also without USE_TCG_OPTIMIZATIONS
>
> qemu 2.4.50, netbsd 6.1.5 SPARC64
>
> without-optimization
> //#define USE_TCG_OPTIMIZATIONS
>
> pugixml compile: (without-optimization is faster)
> with-optimization: ~2:51.2
> without-optimization: ~2:14.1
>
> prime.c runtime: (without-optimization is faster)
> with-optimization: ~11 sec
> without-optimization: ~9.9 sec
>
> stream results (with-optimization gives better results)

Ok, this makes sense. Optimized code performs better but requires more
time for the translation.
The question is whether TCG can translate less while running g++.
Maybe just increase the TB cache?

I see that it always uses the default TB buffer size (tcg_init in
accel.c is called with an uninitialized variable).
And the default is 25% of the machine memory (size_code_gen_buffer in
translate-all.c). I tried increasing this to 50% and observed that
tb_flushes don't happen during the g++ run. Nevertheless, QEMU is still
busy translating code.

Why does this happen? I'd expect the TBs to be mostly reused at some
point while running the same process.
Aurelien, Richard?
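
To make the sizing rule I mean concrete, here is a minimal sketch of
it, not the actual QEMU code. The function name, the divisor parameter
and both clamp limits are assumptions made up for illustration; only
the "no explicit size means a fraction of guest RAM" rule comes from
what I describe above.

```c
#include <assert.h>

#define SKETCH_MIB (1024ULL * 1024ULL)

/* Placeholder clamp limits, NOT QEMU's real constants. */
#define SKETCH_MIN_SIZE (1ULL * SKETCH_MIB)
#define SKETCH_MAX_SIZE (2048ULL * SKETCH_MIB)

/*
 * requested: explicit buffer size in bytes, 0 meaning "use the default"
 * ram_size:  guest RAM in bytes
 * divisor:   4 models the 25% default, 2 models my 50% experiment
 */
unsigned long long sketch_tb_buffer_size(unsigned long long requested,
                                         unsigned long long ram_size,
                                         unsigned divisor)
{
    unsigned long long size;

    assert(divisor > 0);
    size = requested != 0 ? requested : ram_size / divisor;

    /* Keep the result within the (hypothetical) sane range. */
    if (size < SKETCH_MIN_SIZE) {
        size = SKETCH_MIN_SIZE;
    }
    if (size > SKETCH_MAX_SIZE) {
        size = SKETCH_MAX_SIZE;
    }
    return size;
}
```

For a hypothetical guest with 1024 MiB of RAM this yields 256 MiB with
divisor 4 and 512 MiB with divisor 2. A request of 0 always falls back
to the RAM-based default, which matches what I see.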

Artyom

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-08-19 14:41                                 ` Artyom Tarasenko
  2015-08-20  5:22                                   ` Dennis Luehring
@ 2015-08-20 17:19                                   ` Richard Henderson
  2015-08-21  4:32                                     ` Dennis Luehring
  2015-08-22 16:45                                     ` Artyom Tarasenko
  1 sibling, 2 replies; 80+ messages in thread
From: Richard Henderson @ 2015-08-20 17:19 UTC (permalink / raw)
  To: Artyom Tarasenko, Aurelien Jarno; +Cc: qemu-devel, Dennis Luehring

On 08/19/2015 07:41 AM, Artyom Tarasenko wrote:
> Without the patch:
>
>   time g++ -DHAVE_CONFIG_H -I. -I../binutils-gdb/gold
> -I../binutils-gdb/gold -I../binutils-gdb/gold/../include
> -I../binutils-gdb/gold/../elfcpp
> -DLOCALEDIR="\"/usr/local/share/locale\""
> -DBINDIR="\"/usr/local/bin\"" -DTOOLBINDIR="\"/usr/local//bin\""
> -DTOOLLIBDIR="\"/usr/local//lib\""   -W -Wall    -Werror
> -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -frandom-seed=tilegx.o
> -I../binutils-gdb/gold/../zlib -g -O2 -MT tilegx.o -MD -MP -MF
> .deps/tilegx.Tpo -c -o tilegx.o ../binutils-gdb/gold/tilegx.cc
>
> real    18m31.407s
> user    18m23.661s
> sys     0m6.784s
>
> The patch surely improves the situation, tcg_optimize in the perf top
> takes ~7% (instead of~12%), and the only function marked red by
> perf-top is init_temp_info(). So with the patch:
>
> real    17m46.380s
> user    17m37.522s
> sys     0m7.120s
>
>
> And if I completely disable optimizer (// #define
> USE_TCG_OPTIMIZATIONS in tcg.c), it's still quite faster:
>
> real    14m17.668s
> user    14m10.241s
> sys     0m6.060s

This isn't surprising, because at the moment tcg optimizations are almost 
completely ineffective for sparc.  The way the register windows are implemented 
means that there are very few proper tcg temporaries to optimize.

I've just updated an old branch that attempts to cure this.  It creates proper 
tcg temporaries for the windowed registers, and uses a bit of recursion to find 
the place at which they should be stored.

   git://github.com/rth7680/qemu.git tcg-indirect

With a few quick unscientific tests, it appears to help.  It would be nice to 
put that branch side-by-side with your tests above.
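
For anyone following along, here is a toy model of the problem. It is
not QEMU's code: ToyCpu, toy_reg, toy_save, toy_restore, the window
count and the direction in which "save" moves are all assumptions of
the sketch. The point is that a windowed register number resolves to a
storage slot only through the run-time window pointer, so a translator
cannot easily treat it as one fixed variable the way it can a global.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NWINDOWS 8   /* illustrative window count, an assumption */

typedef struct {
    uint64_t globals[8];                   /* %g0..%g7, fixed storage */
    uint64_t windowed[TOY_NWINDOWS * 16];  /* 16 fresh slots per window */
    unsigned cwp;                          /* current window pointer */
} ToyCpu;

/*
 * Resolve architectural register r (0..31) to its storage slot.
 * r = 8..15 are outs, 16..23 locals, 24..31 ins. The ins of window w
 * land on the outs of window w + 1, which is the overlap that passes
 * arguments. (%g0 being hardwired to zero is ignored here.)
 */
uint64_t *toy_reg(ToyCpu *cpu, unsigned r)
{
    assert(r < 32);
    if (r < 8) {
        return &cpu->globals[r];
    }
    return &cpu->windowed[(cpu->cwp * 16 + (r - 8)) % (TOY_NWINDOWS * 16)];
}

/* Toy "save": step to the adjacent window; no spill trap is modelled. */
void toy_save(ToyCpu *cpu)
{
    cpu->cwp = (cpu->cwp + TOY_NWINDOWS - 1) % TOY_NWINDOWS;
}

/* Toy "restore": step back to the caller's window. */
void toy_restore(ToyCpu *cpu)
{
    cpu->cwp = (cpu->cwp + 1) % TOY_NWINDOWS;
}
```

Writing 42 to register 8 (%o0), calling toy_save() and then reading
register 24 (%i0) returns the same 42, although the register number
and the window pointer both changed. Every access goes through cwp,
which is the kind of indirection that leaves an optimizer with little
to work on.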


r~

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-08-20 17:19                                   ` Richard Henderson
@ 2015-08-21  4:32                                     ` Dennis Luehring
  2015-08-21  5:49                                       ` Richard Henderson
  2015-08-22 16:45                                     ` Artyom Tarasenko
  1 sibling, 1 reply; 80+ messages in thread
From: Dennis Luehring @ 2015-08-21  4:32 UTC (permalink / raw)
  To: Richard Henderson, Artyom Tarasenko, Aurelien Jarno; +Cc: qemu-devel

Am 20.08.2015 um 19:19 schrieb Richard Henderson:
> This isn't surprising, because at the moment tcg optimizations are almost
> completely ineffective for sparc.  The way the register windows are implemented
> means that there are very few proper tcg temporaries to optimize.
>
> I've just updated an old branch that attempts to cure this.  It creates proper
> tcg temporaries for the windowed registers, and uses a bit of recursion to find
> the place at which they should be stored.
>
>     git://github.com/rth7680/qemu.git  tcg-indirect
>
> With a few quick unscientific tests, it appears to help.  It would be nice to
> put that branch side-by-side with your tests above.

tcg-indirect does not seem to improve things (the stream test degrades
even more)

without-optimization means qemu.org-git with USE_TCG_OPTIMIZATIONS undefined

git clone git://github.com/rth7680/qemu.git
cd qemu
git checkout tcg-indirect

g++ src/pugixml.cpp -g -Wall -Wextra -Werror -pedantic -std=c++0x -c 
-MMD -MP

tcg-indirect: ~2:46.5
qemu.org-git: ~2:51.2 (worst result)
without-optimization: ~2:14.1 (best result)

gcc prime.c -o prime.out -lm

prime.out runtime

tcg-indirect: ~9.3 sec (best result)
qemu.org-git: ~11 sec
without-optimization: ~9.9 sec (worst result)

stream results (STREAM version $Revision: 5.10 $)

tcg-indirect: (worst result)

Your clock granularity/precision appears to be 41 microseconds.
Each test below will take on the order of 632527 microseconds.
    (= 15427 clock ticks)
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:             320.8     0.511297     0.498785     0.590214
Scale:            187.0     0.858693     0.855465     0.863527
Add:              218.2     1.104654     1.099698     1.110341
Triad:            169.5     1.433273     1.416321     1.502248

qemu.org-git: (best result)

Your clock granularity/precision appears to be 42 microseconds.
Each test below will take on the order of 330428 microseconds.
     (= 7867 clock ticks)
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:             771.5     0.214717     0.207377     0.244214
Scale:            288.1     0.573320     0.555401     0.660161
Add:              423.5     0.633523     0.566661     1.092067
Triad:            242.9     1.053032     0.987970     1.499563

without-optimization:

Your clock granularity/precision appears to be 41 microseconds.
Each test below will take on the order of 745254 microseconds.
    (= 18176 clock ticks)
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:             316.6     0.524065     0.505313     0.580103
Scale:            200.5     0.813356     0.798024     0.840986
Add:              243.9     1.010247     0.984025     1.119149
Triad:            182.9     1.345601     1.312236     1.427459

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-08-21  4:32                                     ` Dennis Luehring
@ 2015-08-21  5:49                                       ` Richard Henderson
  2015-08-21  6:05                                         ` Dennis Luehring
  0 siblings, 1 reply; 80+ messages in thread
From: Richard Henderson @ 2015-08-21  5:49 UTC (permalink / raw)
  To: Dennis Luehring, Artyom Tarasenko, Aurelien Jarno; +Cc: qemu-devel

On 08/20/2015 09:32 PM, Dennis Luehring wrote:
> gcc prime.c -o prime.out -lm
>
> prime.out runtime
>
> tcg-indirect: ~9.3 sec (best result)
> qemu.org-git: ~11 sec
> without-optimization: ~9.9 sec (worst result)

I presume this is integer prime factoring?

> g++ src/pugixml.cpp -g -Wall -Wextra -Werror -pedantic -std=c++0x -c -MMD -MP
>
> tcg-indirect: ~2:46.5
> qemu.org-git: ~2:51.2 (worst result)
> without-optimization: ~2:14.1 (best result)

No compiler optimization?  I wouldn't expect there to be much for tcg to 
optimize there -- dropping values to memory all the time doesn't leave much.

>
> stream results (STREAM version $Revision: 5.10 $)
>
> tcg-indirect: (worst result)
>
> Your clock granularity/precision appears to be 41 microseconds.
> Each test below will take on the order of 632527 microseconds.
>    (= 15427 clock ticks)
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:             320.8     0.511297     0.498785     0.590214
> Scale:            187.0     0.858693     0.855465     0.863527
> Add:              218.2     1.104654     1.099698     1.110341
> Triad:            169.5     1.433273     1.416321     1.502248
>
> qemu.org-git: (best result)
>
> Your clock granularity/precision appears to be 42 microseconds.
> Each test below will take on the order of 330428 microseconds.
>     (= 7867 clock ticks)
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:             771.5     0.214717     0.207377     0.244214
> Scale:            288.1     0.573320     0.555401     0.660161
> Add:              423.5     0.633523     0.566661     1.092067
> Triad:            242.9     1.053032     0.987970     1.499563
>
> without-optimization:
>
> Your clock granularity/precision appears to be 41 microseconds.
> Each test below will take on the order of 745254 microseconds.
>    (= 18176 clock ticks)
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:             316.6     0.524065     0.505313     0.580103
> Scale:            200.5     0.813356     0.798024     0.840986
> Add:              243.9     1.010247     0.984025     1.119149
> Triad:            182.9     1.345601     1.312236     1.427459

These results are weird.  Unoptimized less than half the speed of mainline? 
Improving optimization (with no extra work, mind) brings the results back down?


r~

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-08-21  5:49                                       ` Richard Henderson
@ 2015-08-21  6:05                                         ` Dennis Luehring
  2015-08-21 15:47                                           ` Richard Henderson
  0 siblings, 1 reply; 80+ messages in thread
From: Dennis Luehring @ 2015-08-21  6:05 UTC (permalink / raw)
  To: Richard Henderson, Artyom Tarasenko, Aurelien Jarno; +Cc: qemu-devel

Am 21.08.2015 um 07:49 schrieb Richard Henderson:
> On 08/20/2015 09:32 PM, Dennis Luehring wrote:
> > gcc prime.c -o prime.out -lm
> >
> > prime.out runtime
> >
> > tcg-indirect: ~9.3 sec (best result)
> > qemu.org-git: ~11 sec
> > without-optimization: ~9.9 sec (worst result)
>
> I presume this is integer prime factoring?


Aurelien Jarno extracted this code from sysbench (just for my qemu 
sparc64 tests)

#include <math.h>
unsigned long long max_prime = 2000;
void prime_test()
{
   unsigned long long c;
   unsigned long long l,t;
   unsigned long long n=0;
   /* So far we're using very simple test prime number tests in 64bit */
   for(c=3; c < max_prime; c++)
   {
     t = sqrt(c);
     for(l = 2; l <= t; l++)
       if (c % l == 0)
         break;
     if (l > t )
       n++;
   }
}
int main()
{
   int i;
   for (i = 0 ; i < 10000 ; i++)
   {
     prime_test();
   }
   return 0;
}
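
one caveat with the code above: the counter n is never used, so a
compiler with optimization enabled may be free to drop most of the
work. a minimal sketch of a variant that keeps the result live and
reports its own CPU time follows. the names count_primes and
run_benchmark are made up for this sketch, and it deliberately uses
l * l <= c instead of sqrt() so it builds without -lm, which makes it
a slightly different workload than the sysbench extract.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Count the primes in [3, max_prime) by trial division. */
unsigned long long count_primes(unsigned long long max_prime)
{
    unsigned long long c, l, n = 0;

    for (c = 3; c < max_prime; c++) {
        /* Integer stand-in for the original "l <= sqrt(c)" bound. */
        for (l = 2; l * l <= c; l++) {
            if (c % l == 0) {
                break;
            }
        }
        if (l * l > c) {   /* no divisor found: c is prime */
            n++;
        }
    }
    return n;
}

/* Repeat the count and print the CPU time; returns the summed count. */
unsigned long long run_benchmark(int iterations, unsigned long long max_prime)
{
    unsigned long long total = 0;
    clock_t start;
    double secs;
    int i;

    assert(iterations > 0);
    start = clock();
    for (i = 0; i < iterations; i++) {
        total += count_primes(max_prime);
    }
    secs = (double)(clock() - start) / CLOCKS_PER_SEC;

    printf("%d iterations: %llu primes counted, %.2f s CPU time\n",
           iterations, total, secs);
    return total;
}
```

calling run_benchmark(10000, 2000) from main() corresponds to the loop
counts used above. the time comes from the guest's clock(), so it is
only as good as the emulated clock.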



>
> > g++ src/pugixml.cpp -g -Wall -Wextra -Werror -pedantic -std=c++0x -c -MMD -MP
> >
> > tcg-indirect: ~2:46.5
> > qemu.org-git: ~2:51.2 (worst result)
> > without-optimization: ~2:14.1 (best result)
>
> No compiler optimization?  I wouldn't expect there to be much for tcg to
> optimize there -- dropping values to memory all the time doesn't leave much.


without-optimization means a qemu.org-git release build with
USE_TCG_OPTIMIZATIONS undefined in tcg/tcg.c.
or which compiler do you mean?


>
> >
> > stream results (STREAM version $Revision: 5.10 $)
> >
> > tcg-indirect: (worst result)
> >
> > Your clock granularity/precision appears to be 41 microseconds.
> > Each test below will take on the order of 632527 microseconds.
> >    (= 15427 clock ticks)
> > Function    Best Rate MB/s  Avg time     Min time     Max time
> > Copy:             320.8     0.511297     0.498785     0.590214
> > Scale:            187.0     0.858693     0.855465     0.863527
> > Add:              218.2     1.104654     1.099698     1.110341
> > Triad:            169.5     1.433273     1.416321     1.502248
> >
> > qemu.org-git: (best result)
> >
> > Your clock granularity/precision appears to be 42 microseconds.
> > Each test below will take on the order of 330428 microseconds.
> >     (= 7867 clock ticks)
> > Function    Best Rate MB/s  Avg time     Min time     Max time
> > Copy:             771.5     0.214717     0.207377     0.244214
> > Scale:            288.1     0.573320     0.555401     0.660161
> > Add:              423.5     0.633523     0.566661     1.092067
> > Triad:            242.9     1.053032     0.987970     1.499563
> >
> > without-optimization:
> >
> > Your clock granularity/precision appears to be 41 microseconds.
> > Each test below will take on the order of 745254 microseconds.
> >    (= 18176 clock ticks)
> > Function    Best Rate MB/s  Avg time     Min time     Max time
> > Copy:             316.6     0.524065     0.505313     0.580103
> > Scale:            200.5     0.813356     0.798024     0.840986
> > Add:              243.9     1.010247     0.984025     1.119149
> > Triad:            182.9     1.345601     1.312236     1.427459
>
> These results are weird.  Unoptimized less than half the speed of mainline?
> Improving optimization (with no extra work, mind) brings the results back down?


yep, they are. it seems that the involved developers' assumptions about
where speed can be improved, or where the slowness comes from, are not
correct. how are SPARC64 benchmarks usually done?

>
>
> r~

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-08-21  6:05                                         ` Dennis Luehring
@ 2015-08-21 15:47                                           ` Richard Henderson
  2015-08-21 16:13                                             ` Aurelien Jarno
  2015-08-21 16:41                                             ` Dennis Luehring
  0 siblings, 2 replies; 80+ messages in thread
From: Richard Henderson @ 2015-08-21 15:47 UTC (permalink / raw)
  To: Dennis Luehring, Artyom Tarasenko, Aurelien Jarno; +Cc: qemu-devel

On 08/20/2015 11:05 PM, Dennis Luehring wrote:
>> > g++ src/pugixml.cpp -g -Wall -Wextra -Werror -pedantic -std=c++0x -c -MMD -MP
>> >
>> > tcg-indirect: ~2:46.5
>> > qemu.org-git: ~2:51.2 (worst result)
>> > without-optimization: ~2:14.1 (best result)
>>
>> No compiler optimization?  I wouldn't expect there to be much for tcg to
>> optimize there -- dropping values to memory all the time doesn't leave much.
> 
> 
> without-optimization means qemu.org-git release build + undefine
> USE_TCG_OPTIMIZATIONS in tcg/tcg.c
> or what compiler do you mean?

The one for compiling the benchmark: g++ -O2.

>> These results are weird.  Unoptimized less than half the speed of mainline?
>> Improving optimization (with no extra work, mind) brings the results back down?
> 
> 
> yep they are - it seems that the assumption of the involved developers
> where speed can be improved / or slowbess comes from is not correct
> how are SPARC64 benchmarks done usually?

*shrug* No different than any other...


r~

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-08-21 15:47                                           ` Richard Henderson
@ 2015-08-21 16:13                                             ` Aurelien Jarno
  2015-08-21 16:41                                             ` Dennis Luehring
  1 sibling, 0 replies; 80+ messages in thread
From: Aurelien Jarno @ 2015-08-21 16:13 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Artyom Tarasenko, Dennis Luehring, qemu-devel

On 2015-08-21 08:47, Richard Henderson wrote:
> On 08/20/2015 11:05 PM, Dennis Luehring wrote:
> >> > g++ src/pugixml.cpp -g -Wall -Wextra -Werror -pedantic -std=c++0x -c -MMD -MP
> >> >
> >> > tcg-indirect: ~2:46.5
> >> > qemu.org-git: ~2:51.2 (worst result)
> >> > without-optimization: ~2:14.1 (best result)
> >>
> >> No compiler optimization?  I wouldn't expect there to be much for tcg to
> >> optimize there -- dropping values to memory all the time doesn't leave much.
> > 
> > 
> > without-optimization means qemu.org-git release build + undefine
> > USE_TCG_OPTIMIZATIONS in tcg/tcg.c
> > or what compiler do you mean?
> 
> The one for compiling the benchmark: g++ -O2.
> 
> >> These results are weird.  Unoptimized less than half the speed of mainline?
> >> Improving optimization (with no extra work, mind) brings the results back down?
> > 
> > 
> > yep they are - it seems that the assumption of the involved developers
> > where speed can be improved / or slowbess comes from is not correct
> > how are SPARC64 benchmarks done usually?
> 
> *shrug* No different than any other...

It would be interesting to know whether the time taken to generate code
is actually spent on code translation or on code re-translation. The way
the MMU is modelled might trigger plenty of costly retranslation. This
happens for example on SH4, and to a lesser extent on MIPS.

Aurelien

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-08-21 15:47                                           ` Richard Henderson
  2015-08-21 16:13                                             ` Aurelien Jarno
@ 2015-08-21 16:41                                             ` Dennis Luehring
  1 sibling, 0 replies; 80+ messages in thread
From: Dennis Luehring @ 2015-08-21 16:41 UTC (permalink / raw)
  To: Richard Henderson, Artyom Tarasenko, Aurelien Jarno; +Cc: qemu-devel

Am 21.08.2015 um 17:47 schrieb Richard Henderson:
> On 08/20/2015 11:05 PM, Dennis Luehring wrote:
> >> > g++ src/pugixml.cpp -g -Wall -Wextra -Werror -pedantic -std=c++0x -c -MMD -MP
> >> >
> >> > tcg-indirect: ~2:46.5
> >> > qemu.org-git: ~2:51.2 (worst result)
> >> > without-optimization: ~2:14.1 (best result)
> >>
> >> No compiler optimization?  I wouldn't expect there to be much for tcg to
> >> optimize there -- dropping values to memory all the time doesn't leave much.
> >
> >
> > without-optimization means qemu.org-git release build + undefine
> > USE_TCG_OPTIMIZATIONS in tcg/tcg.c
> > or what compiler do you mean?
>
> The one for compiling the benchmark: g++ -O2.


for the overall speed comparison it is not relevant whether it's -O0, -O2 or -O3,
as long as all my tests always use the same optimization level

>
> >> These results are weird.  Unoptimized less than half the speed of mainline?
> >> Improving optimization (with no extra work, mind) brings the results back down?
> >
> >
> > yep they are - it seems that the assumption of the involved developers
> > where speed can be improved / or slowbess comes from is not correct
> > how are SPARC64 benchmarks done usually?
>
> *shrug* No different than any other..

so what benchmarks are in use?
is there something downloadable/compilable/installable around,
some sort of default qemu performance tests?

> .
>
>
> r~
>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-08-20 17:19                                   ` Richard Henderson
  2015-08-21  4:32                                     ` Dennis Luehring
@ 2015-08-22 16:45                                     ` Artyom Tarasenko
  2015-08-22 17:47                                       ` Dennis Luehring
  2015-08-23  0:41                                       ` Richard Henderson
  1 sibling, 2 replies; 80+ messages in thread
From: Artyom Tarasenko @ 2015-08-22 16:45 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Dennis Luehring, Aurelien Jarno, qemu-devel

On Thu, Aug 20, 2015 at 7:19 PM, Richard Henderson <rth@twiddle.net> wrote:
> On 08/19/2015 07:41 AM, Artyom Tarasenko wrote:
>>
>> Without the patch:
>>
>>   time g++ -DHAVE_CONFIG_H -I. -I../binutils-gdb/gold
>> -I../binutils-gdb/gold -I../binutils-gdb/gold/../include
>> -I../binutils-gdb/gold/../elfcpp
>> -DLOCALEDIR="\"/usr/local/share/locale\""
>> -DBINDIR="\"/usr/local/bin\"" -DTOOLBINDIR="\"/usr/local//bin\""
>> -DTOOLLIBDIR="\"/usr/local//lib\""   -W -Wall    -Werror
>> -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -frandom-seed=tilegx.o
>> -I../binutils-gdb/gold/../zlib -g -O2 -MT tilegx.o -MD -MP -MF
>> .deps/tilegx.Tpo -c -o tilegx.o ../binutils-gdb/gold/tilegx.cc
>>
>> real    18m31.407s
>> user    18m23.661s
>> sys     0m6.784s
>>
>> The patch surely improves the situation, tcg_optimize in the perf top
>> takes ~7% (instead of~12%), and the only function marked red by
>> perf-top is init_temp_info(). So with the patch:
>>
>> real    17m46.380s
>> user    17m37.522s
>> sys     0m7.120s
>>
>>
>> And if I completely disable optimizer (// #define
>> USE_TCG_OPTIMIZATIONS in tcg.c), it's still quite faster:
>>
>> real    14m17.668s
>> user    14m10.241s
>> sys     0m6.060s
>
>
> This isn't surprising, because at the moment tcg optimizations are almost
> completely ineffective for sparc.  The way the register windows are
> implemented means that there are very few proper tcg temporaries to
> optimize.
>
> I've just updated an old branch that attempts to cure this.  It creates
> proper tcg temporaries for the windowed registers, and uses a bit of
> recursion to find the place at which they should be stored.
>
>   git://github.com/rth7680/qemu.git tcg-indirect
>
> With a few quick unscientific tests, it appears to help.  It would be nice
> to put that branch side-by-side with your tests above.

Sorry for the delay with testing.

For my test case tcg-indirect brings more performance gain than for Dennis:

git master: 18m31s
tcg-indirect: 16m50s
#undef  USE_TCG_OPTIMIZATIONS: 14m18s
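
To put the three wall-clock times above on one scale, here is a minimal sketch that converts them to seconds and expresses each run as a slowdown relative to the build without TCG optimizations. The helper names `to_seconds` and `slowdown_percent` are made up for this illustration; only the three timings come from the measurements above.

```python
def to_seconds(stamp):
    """Convert a wall-clock string such as '18m31s' to seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


def slowdown_percent(stamp, baseline_stamp):
    """How much slower 'stamp' is than 'baseline_stamp', in percent."""
    return (to_seconds(stamp) / to_seconds(baseline_stamp) - 1.0) * 100.0


# Timings as reported in this message (rounded to whole seconds).
runs = {
    "git master": "18m31s",
    "tcg-indirect": "16m50s",
    "no TCG optimizations": "14m18s",
}

if __name__ == "__main__":
    baseline = runs["no TCG optimizations"]
    for name, stamp in runs.items():
        print(f"{name:22s} {to_seconds(stamp):6.0f}s "
              f"{slowdown_percent(stamp, baseline):5.1f}% slower than no-opt")
```

With these inputs, master comes out roughly 30% slower than the build without TCG optimizations, and tcg-indirect roughly 18% slower.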


JIT statistic, before starting the test:
(qemu) info jit
Translation buffer state:
gen code size       31851136/314448896
TB count            128224/2457592
TB avg target size  18 max=704 bytes
TB avg host size    248 bytes (expansion ratio: 13.4)
cross page TB count 0 (0%)
direct jump count   83840 (65%) (2 jumps=64730 50%)

Statistics:
TB flush count      5
TB invalidate count 317160
TLB flush count     1180769
[TCG profiler not compiled]

After
(qemu) info jit
Translation buffer state:
gen code size       282903344/314448896
TB count            1139744/2457592
TB avg target size  17 max=704 bytes
TB avg host size    248 bytes (expansion ratio: 14.0)
cross page TB count 0 (0%)
direct jump count   739828 (64%) (2 jumps=569074 49%)

Statistics:
TB flush count      5
TB invalidate count 324362
TLB flush count     2050744

So, TB invalidate count grew by only ~7000 (317160 -> 324362).
Yet tcg_optimize is ~7% in the perf top, and tcg_liveness_analysis
~3%. Why do we translate so much?
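
The before/after comparison can be recomputed mechanically. The following is a rough sketch that parses the counter lines from the two `info jit` dumps above and prints the differences; the function names `parse_counters` and `deltas` are invented for this example, and the parsing only handles the simple "name value" and "name value/limit" lines copied here.

```python
import re

BEFORE = """\
gen code size       31851136/314448896
TB count            128224/2457592
TB flush count      5
TB invalidate count 317160
TLB flush count     1180769
"""

AFTER = """\
gen code size       282903344/314448896
TB count            1139744/2457592
TB flush count      5
TB invalidate count 324362
TLB flush count     2050744
"""


def parse_counters(text):
    """Map each counter name to its first integer (ignoring any '/limit')."""
    counters = {}
    for line in text.splitlines():
        match = re.match(r"(.+?)\s+(\d+)(?:/\d+)?\s*$", line)
        if match:
            counters[match.group(1).strip()] = int(match.group(2))
    return counters


def deltas(before_text, after_text):
    """Difference after - before for every counter present in both dumps."""
    before = parse_counters(before_text)
    after = parse_counters(after_text)
    return {name: after[name] - before[name] for name in before if name in after}


if __name__ == "__main__":
    for name, diff in deltas(BEFORE, AFTER).items():
        print(f"{name:20s} {diff:+d}")
```

On these numbers the TB count grows by about a million over the run while the TB invalidate count grows by a few thousand, which is the mismatch the question above is about.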


Artyom

-- 
Regards,
Artyom Tarasenko
SPARC and PPC PReP under qemu blog: http://tyom.blogspot.com/search/label/qemu

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-08-22 16:45                                     ` Artyom Tarasenko
@ 2015-08-22 17:47                                       ` Dennis Luehring
  2015-08-22 18:53                                         ` Artyom Tarasenko
  2015-08-23  0:41                                       ` Richard Henderson
  1 sibling, 1 reply; 80+ messages in thread
From: Dennis Luehring @ 2015-08-22 17:47 UTC (permalink / raw)
  To: Artyom Tarasenko, Richard Henderson; +Cc: qemu-devel, Aurelien Jarno

Am 22.08.2015 um 18:45 schrieb Artyom Tarasenko:
> git master: 18m31s
> tcg-indirect: 16m50s
> #undef  USE_TCG_OPTIMIZATIONS: 14m18s

my results are not totally different from yours: ~20-30% slowdown compared 
to #undef USE_TCG_OPTIMIZATIONS

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-08-22 17:47                                       ` Dennis Luehring
@ 2015-08-22 18:53                                         ` Artyom Tarasenko
  2015-08-23 12:11                                           ` Dennis Luehring
  0 siblings, 1 reply; 80+ messages in thread
From: Artyom Tarasenko @ 2015-08-22 18:53 UTC (permalink / raw)
  To: Dennis Luehring; +Cc: qemu-devel, Aurelien Jarno, Richard Henderson

On Sat, Aug 22, 2015 at 7:47 PM, Dennis Luehring <dl.soluz@gmx.net> wrote:
> Am 22.08.2015 um 18:45 schrieb Artyom Tarasenko:
>>
>> git master: 18m31s
>> tcg-indirect: 16m50s
>> #undef  USE_TCG_OPTIMIZATIONS: 14m18s
>
>
> my results are not totaly different to yours - ~20-30% slowdown compared to
> #undef USE_TCG_OPTIMIZATIONS

Compared with #undef USE_TCG_OPTIMIZATIONS , they are similar, yes.
Compared with vanilla master I get a more noticeable improvement.


-- 
Regards,
Artyom Tarasenko

SPARC and PPC PReP under qemu blog: http://tyom.blogspot.com/search/label/qemu

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-08-22 16:45                                     ` Artyom Tarasenko
  2015-08-22 17:47                                       ` Dennis Luehring
@ 2015-08-23  0:41                                       ` Richard Henderson
  2015-08-26 16:17                                         ` Artyom Tarasenko
  1 sibling, 1 reply; 80+ messages in thread
From: Richard Henderson @ 2015-08-23  0:41 UTC (permalink / raw)
  To: Artyom Tarasenko; +Cc: Dennis Luehring, qemu-devel, Aurelien Jarno

On Aug 22, 2015 9:45 AM, Artyom Tarasenko <atar4qemu@gmail.com> wrote:
> For my test case tcg-indirect brings more performance gain than for Dennis: 
>
> git master: 18m31s 
> tcg-indirect: 16m50s 
> #undef  USE_TCG_OPTIMIZATIONS: 14m18s 

Thanks.  That's useful.

>
>
> JIT statistic, before starting the test: 
> (qemu) info jit 
> Translation buffer state: 
> gen code size       31851136/314448896 
> TB count            128224/2457592 
> TB avg target size  18 max=704 bytes 
> TB avg host size    248 bytes (expansion ratio: 13.4) 
> cross page TB count 0 (0%) 
> direct jump count   83840 (65%) (2 jumps=64730 50%) 
>
> Statistics: 
> TB flush count      5 
> TB invalidate count 317160 
> TLB flush count     1180769 
> [TCG profiler not compiled] 
>
> After 
> (qemu) info jit 
> Translation buffer state: 
> gen code size       282903344/314448896 
> TB count            1139744/2457592 
> TB avg target size  17 max=704 bytes 
> TB avg host size    248 bytes (expansion ratio: 14.0) 
> cross page TB count 0 (0%) 
> direct jump count   739828 (64%) (2 jumps=569074 49%) 
>
> Statistics: 
> TB flush count      5 
> TB invalidate count 324362 
> TLB flush count     2050744 
>
> So, TB invalidate count gained only ~ 5000. 
> Yet tcg_optimize is ~7% in the perf top, and tcg_liveness_analysis 
> ~3%. Why do we translate so much? 

I don't know.  It must be something SPARC specific, as I don't see so much for alpha.

I'll try to think of good places to collect data.


r~

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-08-22 18:53                                         ` Artyom Tarasenko
@ 2015-08-23 12:11                                           ` Dennis Luehring
  0 siblings, 0 replies; 80+ messages in thread
From: Dennis Luehring @ 2015-08-23 12:11 UTC (permalink / raw)
  To: Artyom Tarasenko; +Cc: qemu-devel, Aurelien Jarno, Richard Henderson

Am 22.08.2015 um 20:53 schrieb Artyom Tarasenko:
> Compared with #undef USE_TCG_OPTIMIZATIONS , they are similar, yes.
> Compared with vanilla master I get a more noticeable improvement.

my test suffers less from the Sparc32->x86_64 "translation" that Aurelien 
Jarno described, which you still hit if you are using the default gcc and 
build tools of debian 7.8.0 SPARC64 (these are 32-bit) under a 64-bit host.
that's why i'm using NetBSD SPARC64 (a pure 64-bit version) and a ramdisk, 
to reduce the noise

what i would like to have is:

- a list of compile and runtime tests (like yours, pugixml.cpp, stream.c, 
prime.c etc.) that are "officially" approved :) as meaningful

- 32-bit/64-bit builds of qemu in all variants (master, patched, 
without-tcg-optimization)
- a NetBSD SPARC/SPARC64 installation running from ramdisk in all qemu variants

to get an overall feeling of what is/could be wrong/strange/weird, 
based on all the answers/ideas i got in response to my results

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-08-23  0:41                                       ` Richard Henderson
@ 2015-08-26 16:17                                         ` Artyom Tarasenko
  2015-08-26 19:47                                           ` Richard Henderson
  0 siblings, 1 reply; 80+ messages in thread
From: Artyom Tarasenko @ 2015-08-26 16:17 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Dennis Luehring, qemu-devel, Aurelien Jarno

Hi Richard,

On Sun, Aug 23, 2015 at 2:41 AM, Richard Henderson <rth@twiddle.net> wrote:
> On Aug 22, 2015 9:45 AM, Artyom Tarasenko <atar4qemu@gmail.com> wrote:
>> For my test case tcg-indirect brings more performance gain than for Dennis:
>>
>> git master: 18m31s
>> tcg-indirect: 16m50s
>> #undef  USE_TCG_OPTIMIZATIONS: 14m18s
>
> Thanks.  That's useful.
>
>>
>>
>> JIT statistic, before starting the test:
>> (qemu) info jit
>> Translation buffer state:
>> gen code size       31851136/314448896
>> TB count            128224/2457592
>> TB avg target size  18 max=704 bytes
>> TB avg host size    248 bytes (expansion ratio: 13.4)
>> cross page TB count 0 (0%)
>> direct jump count   83840 (65%) (2 jumps=64730 50%)
>>
>> Statistics:
>> TB flush count      5
>> TB invalidate count 317160
>> TLB flush count     1180769
>> [TCG profiler not compiled]
>>
>> After
>> (qemu) info jit
>> Translation buffer state:
>> gen code size       282903344/314448896
>> TB count            1139744/2457592
>> TB avg target size  17 max=704 bytes
>> TB avg host size    248 bytes (expansion ratio: 14.0)
>> cross page TB count 0 (0%)
>> direct jump count   739828 (64%) (2 jumps=569074 49%)
>>
>> Statistics:
>> TB flush count      5
>> TB invalidate count 324362
>> TLB flush count     2050744
>>
>> So, TB invalidate count gained only ~ 5000.
>> Yet tcg_optimize is ~7% in the perf top, and tcg_liveness_analysis
>> ~3%. Why do we translate so much?
>
> I don't know.  It must be something SPARC specific, as I don't see so much for alpha.

After some debugging I think it's caused by memory faults. On every
MMU miss / access fault the TB is re-translated multiple times until the
faulting instruction is found.

This happens in gen_intermediate_code_internal when it's called with spc==true.

AFAICT we produce data/access faults only on load/store instructions, i.e.
if GET_FIELD(insn, 0, 1)  == 3. Can this knowledge be used to reduce
the number of re-translations?
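
For readers unfamiliar with the check mentioned above, here is a small sketch of what `GET_FIELD(insn, 0, 1) == 3` tests. It assumes the macro numbers bits from the most significant end (bit 0 is the MSB), so the field is the top two bits of the 32-bit instruction word, which SPARC uses as the `op` field; `op == 3` is the load/store format. The Python function names are made up here, and the sample encodings were assembled by hand for illustration.

```python
def get_field(insn, first, last):
    """Extract bits first..last of a 32-bit word, counting bit 0 as the MSB."""
    width = last - first + 1
    return (insn >> (31 - last)) & ((1 << width) - 1)


def is_load_store(insn):
    """True when the top two bits (the SPARC 'op' field) equal 3."""
    return get_field(insn, 0, 1) == 3


# Hand-assembled sample words (illustrative only).
SAMPLES = {
    "ld [%g1], %g2": 0xC4004000,       # op = 3, memory access
    "add %g1, %g2, %g3": 0x86004002,   # op = 2, arithmetic
    "call (disp 0)": 0x40000000,       # op = 1
    "nop": 0x01000000,                 # op = 0, sethi 0, %g0
}

if __name__ == "__main__":
    for text, word in SAMPLES.items():
        print(f"{text:20s} op={get_field(word, 0, 1)} "
              f"load/store={is_load_store(word)}")
```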

Artyom

-- 
Regards,
Artyom Tarasenko

SPARC and PPC PReP under qemu blog: http://tyom.blogspot.com/search/label/qemu

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-08-26 16:17                                         ` Artyom Tarasenko
@ 2015-08-26 19:47                                           ` Richard Henderson
  2015-08-27  5:54                                             ` Dennis Luehring
  2015-08-27 15:58                                             ` Artyom Tarasenko
  0 siblings, 2 replies; 80+ messages in thread
From: Richard Henderson @ 2015-08-26 19:47 UTC (permalink / raw)
  To: Artyom Tarasenko; +Cc: Dennis Luehring, qemu-devel, Aurelien Jarno

On 08/26/2015 09:17 AM, Artyom Tarasenko wrote:
> After some debugging I think it's caused by memory faults. On every
> MMU miss / access fault
> TB is re-translated multiple times till the faulting instruction is found.

That shouldn't happen.  Are you certain it's not multiple MMU misses/faults?

> AFAICT we produce data/access faults only on load/store instructions, i.e.
> if GET_FIELD(insn, 0, 1)  == 3. Can this knowledge be used to reduce
> the number of re-translations?

No.

From the fault, we have a host address where the fault occurred.  We then 
retranslate the TB looking for what guest address corresponds to the code 
generated at the host address.  This is a one-pass process, not the multiple 
passes you seem to be imagining.  It also means we can't skip non-memory insns 
during retranslation, as the host addresses would no longer line up.
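
The one-pass lookup described above can be pictured with a toy model. This is only a sketch and not QEMU's actual code: it assumes a TB is a straight run of fixed-size guest instructions, that retranslation yields the number of host bytes emitted per guest instruction, and that the fault is given as a byte offset into the TB's host code. The name `find_guest_pc` and all sizes are hypothetical.

```python
def find_guest_pc(tb_start_pc, host_sizes, fault_offset, insn_bytes=4):
    """Walk the TB once and return the guest PC whose host code covers the fault.

    host_sizes[i] is the number of host bytes generated for guest insn i.
    """
    host_end = 0
    for index, size in enumerate(host_sizes):
        host_end += size
        if fault_offset < host_end:
            return tb_start_pc + index * insn_bytes
    raise ValueError("fault offset lies outside this TB")


if __name__ == "__main__":
    # Hypothetical TB: four guest insns expanding to 12, 40, 8 and 36 host bytes.
    sizes = [12, 40, 8, 36]
    print(hex(find_guest_pc(0x10000, sizes, 55)))
```

In this model, dropping the non-memory instructions from `sizes` would shift every later host offset, so offset 55 would no longer map to the same guest instruction. That is the "host addresses would no longer line up" point.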

That said, sun4u is a software managed tlb, which requires *lots* more extra 
faults than a hardware managed tlb.  In the latter case, we can perform the page 
table lookup and then continue the memory instruction without faulting.

I think that implementing sun4v, with (most of) the hypervisor actually within 
qemu, is the only way to get good performance for Sparc.

Anyway, this sort of setup is exactly what I did for Alpha.  The PALcode 
(hypervisor-ish) layer used for qemu looks nothing like the PALcode layer used 
for real hardware.


r~

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-08-26 19:47                                           ` Richard Henderson
@ 2015-08-27  5:54                                             ` Dennis Luehring
  2015-08-27 15:04                                               ` Richard Henderson
  2015-08-27 15:58                                             ` Artyom Tarasenko
  1 sibling, 1 reply; 80+ messages in thread
From: Dennis Luehring @ 2015-08-27  5:54 UTC (permalink / raw)
  To: Richard Henderson, Artyom Tarasenko; +Cc: qemu-devel, Aurelien Jarno

Am 26.08.2015 um 21:47 schrieb Richard Henderson:
> Anyway, this sort of setup is exactly what I did for Alpha.  The PALcode
> (hypervisor-ish) layer used for qemu looks nothing like the PALcode layer used
> for real hardware.

can you post your qemu parameters for installing/starting your alpha 
emulation? i want to do the same benchmarks on your
preferred :) platform but i just get PCI errors on boot

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-08-27  5:54                                             ` Dennis Luehring
@ 2015-08-27 15:04                                               ` Richard Henderson
  0 siblings, 0 replies; 80+ messages in thread
From: Richard Henderson @ 2015-08-27 15:04 UTC (permalink / raw)
  To: Dennis Luehring, Artyom Tarasenko; +Cc: qemu-devel, Aurelien Jarno

[-- Attachment #1: Type: text/plain, Size: 1015 bytes --]

On 08/26/2015 10:54 PM, Dennis Luehring wrote:
> Am 26.08.2015 um 21:47 schrieb Richard Henderson:
>> Anyway, this sort of setup is exactly what I did for Alpha.  The PALcode
>> (hypervisor-ish) layer used for qemu looks nothing like the PALcode layer used
>> for real hardware.
>
> can post your qemu parameters for installing/starting your alpha emulation - i
> want to do the same benchmarks on your
> prefered :) platform but i just get PCI-Errors on boot

I use virtio for everything.

I've thought from time to time about improving the normal device emulation, just to 
make initial installs easier.  In the meantime you'll probably have to build 
your own kernel with virtio built-in.  I've attached a config file that I used 
once (it looks quite old, as if I've been failing to update it as kernel 
parameters change, but it should be good as a starting point).

I use gentoo, as one of the very few distros that still support alpha.

... or were you trying to use NetBSD?  I've never actually tried that.


r~

[-- Attachment #2: gen-install --]
[-- Type: text/plain, Size: 429 bytes --]

#!/bin/sh
exec ../bld-nat/alpha-softmmu/qemu-system-alpha -m 1G \
  -net nic,vlan=0,model=virtio -net user,vlan=0 \
  -drive file=gen-root.img,if=virtio,cache=none \
  -drive file=install-alpha-minimal-20130706.iso,if=virtio,readonly \
  -drive file=stage3-alpha-20130706.tar.bz2,if=virtio,readonly \
  -kernel ./vmlinux -initrd gentoo.igz \
  -append "root=/dev/ram0 init=/linuxrc looptype=squashfs loop=/image.squashfs cdroot"

[-- Attachment #3: gen-run --]
[-- Type: text/plain, Size: 281 bytes --]

#!/bin/sh
exec ../run/bin/qemu-system-alpha -m 1G \
  -net nic,vlan=0,model=virtio,macaddr=52:54:00:12:34:1 \
  -net bridge,vlan=0,br=virbr0,helper=/usr/libexec/qemu-bridge-helper \
  -drive file=./alpha1.img,if=virtio \
  -vnc :0 \
  -kernel ./vmlinux -append "root=/dev/vda2 ro"

[-- Attachment #4: config-qemu --]
[-- Type: text/plain, Size: 52129 bytes --]

#
# Automatically generated file; DO NOT EDIT.
# Linux/alpha 3.10.0 Kernel Configuration
#
CONFIG_ALPHA=y
CONFIG_64BIT=y
CONFIG_MMU=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ZONE_DMA=y
CONFIG_ARCH_DMA_ADDR_T_64BIT=y
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_IRQ_WORK=y

#
# General setup
#
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE="alphaev67-linux-"
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION="qemu-3"
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
CONFIG_FHANDLE=y
CONFIG_AUDIT=y
# CONFIG_AUDIT_LOGINUID_IMMUTABLE is not set
CONFIG_HAVE_GENERIC_HARDIRQS=y

#
# IRQ subsystem
#
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_IRQ_DOMAIN=y
# CONFIG_IRQ_DOMAIN_DEBUG is not set
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y

#
# Timers subsystem
#
CONFIG_HZ_PERIODIC=y
# CONFIG_NO_HZ_IDLE is not set
# CONFIG_NO_HZ is not set
# CONFIG_HIGH_RES_TIMERS is not set

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_TASK_XACCT=y
CONFIG_TASK_IO_ACCOUNTING=y

#
# RCU Subsystem
#
CONFIG_TINY_RCU=y
# CONFIG_PREEMPT_RCU is not set
# CONFIG_RCU_STALL_COMMON is not set
# CONFIG_TREE_RCU_TRACE is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=18
CONFIG_CGROUPS=y
# CONFIG_CGROUP_DEBUG is not set
CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CPUSETS=y
CONFIG_PROC_PID_CPUSET=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_RESOURCE_COUNTERS=y
CONFIG_MEMCG=y
CONFIG_MEMCG_SWAP=y
# CONFIG_MEMCG_SWAP_ENABLED is not set
CONFIG_MEMCG_KMEM=y
CONFIG_CGROUP_PERF=y
CONFIG_CGROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_CFS_BANDWIDTH=y
CONFIG_RT_GROUP_SCHED=y
CONFIG_BLK_CGROUP=y
# CONFIG_DEBUG_BLK_CGROUP is not set
# CONFIG_CHECKPOINT_RESTORE is not set
CONFIG_NAMESPACES=y
CONFIG_UTS_NS=y
CONFIG_IPC_NS=y
# CONFIG_USER_NS is not set
CONFIG_PID_NS=y
CONFIG_NET_NS=y
CONFIG_UIDGID_CONVERTED=y
# CONFIG_UIDGID_STRICT_TYPE_CHECKS is not set
CONFIG_SCHED_AUTOGROUP=y
CONFIG_MM_OWNER=y
# CONFIG_SYSFS_DEPRECATED is not set
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_RD_GZIP=y
CONFIG_RD_BZIP2=y
CONFIG_RD_LZMA=y
CONFIG_RD_XZ=y
CONFIG_RD_LZO=y
CONFIG_RD_LZ4=y
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
CONFIG_ANON_INODES=y
CONFIG_HAVE_PCSPKR_PLATFORM=y
# CONFIG_EXPERT is not set
# CONFIG_SYSCTL_SYSCALL is not set
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_PCSPKR_PLATFORM=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_AIO=y
CONFIG_PCI_QUIRKS=y
# CONFIG_EMBEDDED is not set
CONFIG_HAVE_PERF_EVENTS=y

#
# Kernel Performance Events And Counters
#
CONFIG_PERF_EVENTS=y
# CONFIG_DEBUG_PERF_USE_VMALLOC is not set
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLUB_DEBUG=y
# CONFIG_COMPAT_BRK is not set
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_PROFILING is not set
CONFIG_HAVE_OPROFILE=y
CONFIG_HAVE_64BIT_ALIGNED_ACCESS=y
CONFIG_HAVE_DMA_ATTRS=y
CONFIG_GENERIC_SMP_IDLE_THREAD=y
CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG=y
CONFIG_ARCH_WANT_IPC_PARSE_VERSION=y
CONFIG_HAVE_MOD_ARCH_SPECIFIC=y
CONFIG_MODULES_USE_ELF_RELA=y
CONFIG_ODD_RT_SIGACTION=y
CONFIG_OLD_SIGSUSPEND=y

#
# GCOV-based kernel profiling
#
# CONFIG_GCOV_KERNEL is not set
# CONFIG_HAVE_GENERIC_DMA_COHERENT is not set
CONFIG_SLABINFO=y
CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
# CONFIG_MODULE_FORCE_LOAD is not set
CONFIG_MODULE_UNLOAD=y
# CONFIG_MODULE_FORCE_UNLOAD is not set
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
CONFIG_MODULE_SIG=y
# CONFIG_MODULE_SIG_FORCE is not set
CONFIG_MODULE_SIG_ALL=y
# CONFIG_MODULE_SIG_SHA1 is not set
# CONFIG_MODULE_SIG_SHA224 is not set
CONFIG_MODULE_SIG_SHA256=y
# CONFIG_MODULE_SIG_SHA384 is not set
# CONFIG_MODULE_SIG_SHA512 is not set
CONFIG_MODULE_SIG_HASH="sha256"
CONFIG_BLOCK=y
CONFIG_BLK_DEV_BSG=y
CONFIG_BLK_DEV_BSGLIB=y
CONFIG_BLK_DEV_INTEGRITY=y
CONFIG_BLK_DEV_THROTTLING=y

#
# Partition Types
#
CONFIG_PARTITION_ADVANCED=y
# CONFIG_ACORN_PARTITION is not set
# CONFIG_AIX_PARTITION is not set
CONFIG_OSF_PARTITION=y
# CONFIG_AMIGA_PARTITION is not set
# CONFIG_ATARI_PARTITION is not set
# CONFIG_MAC_PARTITION is not set
CONFIG_MSDOS_PARTITION=y
CONFIG_BSD_DISKLABEL=y
# CONFIG_MINIX_SUBPARTITION is not set
# CONFIG_SOLARIS_X86_PARTITION is not set
# CONFIG_UNIXWARE_DISKLABEL is not set
# CONFIG_LDM_PARTITION is not set
# CONFIG_SGI_PARTITION is not set
# CONFIG_ULTRIX_PARTITION is not set
# CONFIG_SUN_PARTITION is not set
# CONFIG_KARMA_PARTITION is not set
CONFIG_EFI_PARTITION=y
# CONFIG_SYSV68_PARTITION is not set

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
CONFIG_CFQ_GROUP_IOSCHED=y
# CONFIG_DEFAULT_DEADLINE is not set
CONFIG_DEFAULT_CFQ=y
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="cfq"
CONFIG_ASN1=y
CONFIG_INLINE_SPIN_UNLOCK_IRQ=y
CONFIG_INLINE_READ_UNLOCK=y
CONFIG_INLINE_READ_UNLOCK_IRQ=y
CONFIG_INLINE_WRITE_UNLOCK=y
CONFIG_INLINE_WRITE_UNLOCK_IRQ=y
CONFIG_FREEZER=y

#
# System setup
#
# CONFIG_ALPHA_GENERIC is not set
# CONFIG_ALPHA_ALCOR is not set
# CONFIG_ALPHA_XL is not set
# CONFIG_ALPHA_BOOK1 is not set
# CONFIG_ALPHA_AVANTI_CH is not set
# CONFIG_ALPHA_CABRIOLET is not set
CONFIG_ALPHA_DP264=y
# CONFIG_ALPHA_EB164 is not set
# CONFIG_ALPHA_EB64P_CH is not set
# CONFIG_ALPHA_EB66 is not set
# CONFIG_ALPHA_EB66P is not set
# CONFIG_ALPHA_EIGER is not set
# CONFIG_ALPHA_JENSEN is not set
# CONFIG_ALPHA_LX164 is not set
# CONFIG_ALPHA_LYNX is not set
# CONFIG_ALPHA_MARVEL is not set
# CONFIG_ALPHA_MIATA is not set
# CONFIG_ALPHA_MIKASA is not set
# CONFIG_ALPHA_NAUTILUS is not set
# CONFIG_ALPHA_NONAME_CH is not set
# CONFIG_ALPHA_NORITAKE is not set
# CONFIG_ALPHA_PC164 is not set
# CONFIG_ALPHA_P2K is not set
# CONFIG_ALPHA_RAWHIDE is not set
# CONFIG_ALPHA_RUFFIAN is not set
# CONFIG_ALPHA_RX164 is not set
# CONFIG_ALPHA_SX164 is not set
# CONFIG_ALPHA_SABLE is not set
# CONFIG_ALPHA_SHARK is not set
# CONFIG_ALPHA_TAKARA is not set
# CONFIG_ALPHA_TITAN is not set
# CONFIG_ALPHA_WILDFIRE is not set
CONFIG_ISA=y
CONFIG_ISA_DMA_API=y
CONFIG_PCI=y
CONFIG_PCI_DOMAINS=y
CONFIG_PCI_SYSCALL=y
CONFIG_IOMMU_HELPER=y
CONFIG_ALPHA_EV6=y
CONFIG_ALPHA_TSUNAMI=y
CONFIG_ALPHA_EV67=y
CONFIG_VGA_HOSE=y
CONFIG_ALPHA_QEMU=y
CONFIG_ALPHA_SRM=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
# CONFIG_SMP is not set
# CONFIG_ARCH_DISCONTIGMEM_ENABLE is not set
CONFIG_FLATMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y
# CONFIG_HAVE_BOOTMEM_INFO_NODE is not set
CONFIG_PAGEFLAGS_EXTENDED=y
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_BALLOON_COMPACTION=y
CONFIG_COMPACTION=y
CONFIG_MIGRATION=y
CONFIG_PHYS_ADDR_T_64BIT=y
CONFIG_ZONE_DMA_FLAG=1
CONFIG_BOUNCE=y
CONFIG_VIRT_TO_BUS=y
CONFIG_KSM=y
CONFIG_DEFAULT_MMAP_MIN_ADDR=65536
CONFIG_CROSS_MEMORY_ATTACH=y
CONFIG_NEED_PER_CPU_KM=y
CONFIG_CLEANCACHE=y
CONFIG_FRONTSWAP=y
# CONFIG_ZBUD is not set
# CONFIG_ZSWAP is not set
# CONFIG_VERBOSE_MCHECK is not set
# CONFIG_HZ_32 is not set
CONFIG_HZ_64=y
# CONFIG_HZ_128 is not set
# CONFIG_HZ_256 is not set
# CONFIG_HZ_1024 is not set
# CONFIG_HZ_1200 is not set
CONFIG_HZ=64
# CONFIG_PCI_DEBUG is not set
# CONFIG_PCI_REALLOC_ENABLE_AUTO is not set
CONFIG_PCI_STUB=y
CONFIG_PCI_ATS=y
CONFIG_PCI_IOV=y
CONFIG_PCI_PRI=y
CONFIG_PCI_PASID=y

#
# PCI host controller drivers
#
# CONFIG_PCCARD is not set
CONFIG_SRM_ENV=y
CONFIG_BINFMT_ELF=y
CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS=y
CONFIG_BINFMT_SCRIPT=y
CONFIG_HAVE_AOUT=y
# CONFIG_BINFMT_AOUT is not set
# CONFIG_BINFMT_EM86 is not set
CONFIG_BINFMT_MISC=m
CONFIG_COREDUMP=y
CONFIG_NET=y

#
# Networking options
#
CONFIG_PACKET=y
CONFIG_PACKET_DIAG=m
CONFIG_UNIX=y
CONFIG_UNIX_DIAG=m
CONFIG_XFRM=y
CONFIG_XFRM_ALGO=y
CONFIG_XFRM_USER=y
CONFIG_XFRM_SUB_POLICY=y
CONFIG_XFRM_MIGRATE=y
CONFIG_XFRM_STATISTICS=y
CONFIG_XFRM_IPCOMP=m
CONFIG_NET_KEY=m
CONFIG_NET_KEY_MIGRATE=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_ADVANCED_ROUTER=y
CONFIG_IP_FIB_TRIE_STATS=y
CONFIG_IP_MULTIPLE_TABLES=y
CONFIG_IP_ROUTE_MULTIPATH=y
CONFIG_IP_ROUTE_VERBOSE=y
CONFIG_IP_ROUTE_CLASSID=y
# CONFIG_IP_PNP is not set
CONFIG_NET_IPIP=m
CONFIG_NET_IPGRE_DEMUX=m
CONFIG_NET_IP_TUNNEL=m
CONFIG_NET_IPGRE=m
CONFIG_NET_IPGRE_BROADCAST=y
CONFIG_IP_MROUTE=y
CONFIG_IP_MROUTE_MULTIPLE_TABLES=y
CONFIG_IP_PIMSM_V1=y
CONFIG_IP_PIMSM_V2=y
CONFIG_ARPD=y
CONFIG_SYN_COOKIES=y
CONFIG_NET_IPVTI=m
CONFIG_INET_AH=m
CONFIG_INET_ESP=m
CONFIG_INET_IPCOMP=m
CONFIG_INET_XFRM_TUNNEL=m
CONFIG_INET_TUNNEL=m
CONFIG_INET_XFRM_MODE_TRANSPORT=m
CONFIG_INET_XFRM_MODE_TUNNEL=m
CONFIG_INET_XFRM_MODE_BEET=m
CONFIG_INET_LRO=y
CONFIG_INET_DIAG=m
CONFIG_INET_TCP_DIAG=m
CONFIG_INET_UDP_DIAG=m
CONFIG_TCP_CONG_ADVANCED=y
CONFIG_TCP_CONG_BIC=m
CONFIG_TCP_CONG_CUBIC=y
CONFIG_TCP_CONG_WESTWOOD=m
CONFIG_TCP_CONG_HTCP=m
CONFIG_TCP_CONG_HSTCP=m
CONFIG_TCP_CONG_HYBLA=m
CONFIG_TCP_CONG_VEGAS=m
CONFIG_TCP_CONG_SCALABLE=m
CONFIG_TCP_CONG_LP=m
CONFIG_TCP_CONG_VENO=m
CONFIG_TCP_CONG_YEAH=m
CONFIG_TCP_CONG_ILLINOIS=m
CONFIG_DEFAULT_CUBIC=y
# CONFIG_DEFAULT_RENO is not set
CONFIG_DEFAULT_TCP_CONG="cubic"
CONFIG_TCP_MD5SIG=y
CONFIG_IPV6=y
CONFIG_IPV6_PRIVACY=y
CONFIG_IPV6_ROUTER_PREF=y
CONFIG_IPV6_ROUTE_INFO=y
CONFIG_IPV6_OPTIMISTIC_DAD=y
CONFIG_INET6_AH=m
CONFIG_INET6_ESP=m
CONFIG_INET6_IPCOMP=m
CONFIG_IPV6_MIP6=y
CONFIG_INET6_XFRM_TUNNEL=m
CONFIG_INET6_TUNNEL=m
CONFIG_INET6_XFRM_MODE_TRANSPORT=m
CONFIG_INET6_XFRM_MODE_TUNNEL=m
CONFIG_INET6_XFRM_MODE_BEET=m
CONFIG_INET6_XFRM_MODE_ROUTEOPTIMIZATION=m
CONFIG_IPV6_SIT=m
CONFIG_IPV6_SIT_6RD=y
CONFIG_IPV6_NDISC_NODETYPE=y
CONFIG_IPV6_TUNNEL=m
# CONFIG_IPV6_GRE is not set
CONFIG_IPV6_MULTIPLE_TABLES=y
CONFIG_IPV6_SUBTREES=y
CONFIG_IPV6_MROUTE=y
CONFIG_IPV6_MROUTE_MULTIPLE_TABLES=y
CONFIG_IPV6_PIMSM_V2=y
CONFIG_NETLABEL=y
CONFIG_NETWORK_SECMARK=y
CONFIG_NETWORK_PHY_TIMESTAMPING=y
CONFIG_NETFILTER=y
# CONFIG_NETFILTER_DEBUG is not set
CONFIG_NETFILTER_ADVANCED=y
CONFIG_BRIDGE_NETFILTER=y

#
# Core Netfilter Configuration
#
CONFIG_NETFILTER_NETLINK=m
CONFIG_NETFILTER_NETLINK_ACCT=m
CONFIG_NETFILTER_NETLINK_QUEUE=m
CONFIG_NETFILTER_NETLINK_LOG=m
CONFIG_NF_CONNTRACK=m
CONFIG_NF_CONNTRACK_MARK=y
CONFIG_NF_CONNTRACK_SECMARK=y
CONFIG_NF_CONNTRACK_ZONES=y
CONFIG_NF_CONNTRACK_PROCFS=y
CONFIG_NF_CONNTRACK_EVENTS=y
# CONFIG_NF_CONNTRACK_TIMEOUT is not set
CONFIG_NF_CONNTRACK_TIMESTAMP=y
CONFIG_NF_CONNTRACK_LABELS=y
CONFIG_NF_CT_PROTO_DCCP=m
CONFIG_NF_CT_PROTO_GRE=m
CONFIG_NF_CT_PROTO_SCTP=m
CONFIG_NF_CT_PROTO_UDPLITE=m
CONFIG_NF_CONNTRACK_AMANDA=m
CONFIG_NF_CONNTRACK_FTP=m
CONFIG_NF_CONNTRACK_H323=m
CONFIG_NF_CONNTRACK_IRC=m
CONFIG_NF_CONNTRACK_BROADCAST=m
CONFIG_NF_CONNTRACK_NETBIOS_NS=m
CONFIG_NF_CONNTRACK_SNMP=m
CONFIG_NF_CONNTRACK_PPTP=m
CONFIG_NF_CONNTRACK_SANE=m
CONFIG_NF_CONNTRACK_SIP=m
CONFIG_NF_CONNTRACK_TFTP=m
CONFIG_NF_CT_NETLINK=m
# CONFIG_NF_CT_NETLINK_TIMEOUT is not set
CONFIG_NF_CT_NETLINK_HELPER=m
CONFIG_NETFILTER_NETLINK_QUEUE_CT=y
CONFIG_NF_NAT=m
CONFIG_NF_NAT_NEEDED=y
CONFIG_NF_NAT_PROTO_DCCP=m
CONFIG_NF_NAT_PROTO_UDPLITE=m
CONFIG_NF_NAT_PROTO_SCTP=m
CONFIG_NF_NAT_AMANDA=m
CONFIG_NF_NAT_FTP=m
CONFIG_NF_NAT_IRC=m
CONFIG_NF_NAT_SIP=m
CONFIG_NF_NAT_TFTP=m
CONFIG_NETFILTER_TPROXY=m
CONFIG_NETFILTER_XTABLES=y

#
# Xtables combined modules
#
CONFIG_NETFILTER_XT_MARK=m
CONFIG_NETFILTER_XT_CONNMARK=m
CONFIG_NETFILTER_XT_SET=m

#
# Xtables targets
#
CONFIG_NETFILTER_XT_TARGET_AUDIT=m
CONFIG_NETFILTER_XT_TARGET_CHECKSUM=m
CONFIG_NETFILTER_XT_TARGET_CLASSIFY=m
CONFIG_NETFILTER_XT_TARGET_CONNMARK=m
CONFIG_NETFILTER_XT_TARGET_CONNSECMARK=m
CONFIG_NETFILTER_XT_TARGET_CT=m
CONFIG_NETFILTER_XT_TARGET_DSCP=m
CONFIG_NETFILTER_XT_TARGET_HL=m
CONFIG_NETFILTER_XT_TARGET_HMARK=m
CONFIG_NETFILTER_XT_TARGET_IDLETIMER=m
CONFIG_NETFILTER_XT_TARGET_LED=m
CONFIG_NETFILTER_XT_TARGET_LOG=m
CONFIG_NETFILTER_XT_TARGET_MARK=m
CONFIG_NETFILTER_XT_TARGET_NETMAP=m
CONFIG_NETFILTER_XT_TARGET_NFLOG=m
CONFIG_NETFILTER_XT_TARGET_NFQUEUE=m
CONFIG_NETFILTER_XT_TARGET_NOTRACK=m
CONFIG_NETFILTER_XT_TARGET_RATEEST=m
CONFIG_NETFILTER_XT_TARGET_REDIRECT=m
CONFIG_NETFILTER_XT_TARGET_TEE=m
CONFIG_NETFILTER_XT_TARGET_TPROXY=m
CONFIG_NETFILTER_XT_TARGET_TRACE=m
CONFIG_NETFILTER_XT_TARGET_SECMARK=m
CONFIG_NETFILTER_XT_TARGET_TCPMSS=m
CONFIG_NETFILTER_XT_TARGET_TCPOPTSTRIP=m

#
# Xtables matches
#
CONFIG_NETFILTER_XT_MATCH_ADDRTYPE=m
CONFIG_NETFILTER_XT_MATCH_BPF=m
CONFIG_NETFILTER_XT_MATCH_CLUSTER=m
CONFIG_NETFILTER_XT_MATCH_COMMENT=m
CONFIG_NETFILTER_XT_MATCH_CONNBYTES=m
CONFIG_NETFILTER_XT_MATCH_CONNLABEL=m
CONFIG_NETFILTER_XT_MATCH_CONNLIMIT=m
CONFIG_NETFILTER_XT_MATCH_CONNMARK=m
CONFIG_NETFILTER_XT_MATCH_CONNTRACK=m
CONFIG_NETFILTER_XT_MATCH_CPU=m
CONFIG_NETFILTER_XT_MATCH_DCCP=m
CONFIG_NETFILTER_XT_MATCH_DEVGROUP=m
CONFIG_NETFILTER_XT_MATCH_DSCP=m
CONFIG_NETFILTER_XT_MATCH_ECN=m
CONFIG_NETFILTER_XT_MATCH_ESP=m
CONFIG_NETFILTER_XT_MATCH_HASHLIMIT=m
CONFIG_NETFILTER_XT_MATCH_HELPER=m
CONFIG_NETFILTER_XT_MATCH_HL=m
CONFIG_NETFILTER_XT_MATCH_IPRANGE=m
CONFIG_NETFILTER_XT_MATCH_IPVS=m
CONFIG_NETFILTER_XT_MATCH_LENGTH=m
CONFIG_NETFILTER_XT_MATCH_LIMIT=m
CONFIG_NETFILTER_XT_MATCH_MAC=m
CONFIG_NETFILTER_XT_MATCH_MARK=m
CONFIG_NETFILTER_XT_MATCH_MULTIPORT=m
CONFIG_NETFILTER_XT_MATCH_NFACCT=m
CONFIG_NETFILTER_XT_MATCH_OSF=m
CONFIG_NETFILTER_XT_MATCH_OWNER=m
CONFIG_NETFILTER_XT_MATCH_POLICY=m
CONFIG_NETFILTER_XT_MATCH_PHYSDEV=m
CONFIG_NETFILTER_XT_MATCH_PKTTYPE=m
CONFIG_NETFILTER_XT_MATCH_QUOTA=m
CONFIG_NETFILTER_XT_MATCH_RATEEST=m
CONFIG_NETFILTER_XT_MATCH_REALM=m
CONFIG_NETFILTER_XT_MATCH_RECENT=m
CONFIG_NETFILTER_XT_MATCH_SCTP=m
CONFIG_NETFILTER_XT_MATCH_SOCKET=m
CONFIG_NETFILTER_XT_MATCH_STATE=m
CONFIG_NETFILTER_XT_MATCH_STATISTIC=m
CONFIG_NETFILTER_XT_MATCH_STRING=m
CONFIG_NETFILTER_XT_MATCH_TCPMSS=m
CONFIG_NETFILTER_XT_MATCH_TIME=m
CONFIG_NETFILTER_XT_MATCH_U32=m
CONFIG_IP_SET=m
CONFIG_IP_SET_MAX=256
CONFIG_IP_SET_BITMAP_IP=m
CONFIG_IP_SET_BITMAP_IPMAC=m
CONFIG_IP_SET_BITMAP_PORT=m
CONFIG_IP_SET_HASH_IP=m
CONFIG_IP_SET_HASH_IPPORT=m
CONFIG_IP_SET_HASH_IPPORTIP=m
CONFIG_IP_SET_HASH_IPPORTNET=m
CONFIG_IP_SET_HASH_NET=m
CONFIG_IP_SET_HASH_NETPORT=m
CONFIG_IP_SET_HASH_NETIFACE=m
CONFIG_IP_SET_LIST_SET=m
CONFIG_IP_VS=m
CONFIG_IP_VS_IPV6=y
# CONFIG_IP_VS_DEBUG is not set
CONFIG_IP_VS_TAB_BITS=12

#
# IPVS transport protocol load balancing support
#
CONFIG_IP_VS_PROTO_TCP=y
CONFIG_IP_VS_PROTO_UDP=y
CONFIG_IP_VS_PROTO_AH_ESP=y
CONFIG_IP_VS_PROTO_ESP=y
CONFIG_IP_VS_PROTO_AH=y
CONFIG_IP_VS_PROTO_SCTP=y

#
# IPVS scheduler
#
CONFIG_IP_VS_RR=m
CONFIG_IP_VS_WRR=m
CONFIG_IP_VS_LC=m
CONFIG_IP_VS_WLC=m
CONFIG_IP_VS_LBLC=m
CONFIG_IP_VS_LBLCR=m
CONFIG_IP_VS_DH=m
CONFIG_IP_VS_SH=m
CONFIG_IP_VS_SED=m
CONFIG_IP_VS_NQ=m

#
# IPVS SH scheduler
#
CONFIG_IP_VS_SH_TAB_BITS=8

#
# IPVS application helper
#
CONFIG_IP_VS_FTP=m
CONFIG_IP_VS_NFCT=y
CONFIG_IP_VS_PE_SIP=m

#
# IP: Netfilter Configuration
#
CONFIG_NF_DEFRAG_IPV4=m
CONFIG_NF_CONNTRACK_IPV4=m
# CONFIG_NF_CONNTRACK_PROC_COMPAT is not set
CONFIG_IP_NF_IPTABLES=y
CONFIG_IP_NF_MATCH_AH=m
CONFIG_IP_NF_MATCH_ECN=m
CONFIG_IP_NF_MATCH_RPFILTER=m
CONFIG_IP_NF_MATCH_TTL=m
CONFIG_IP_NF_FILTER=y
CONFIG_IP_NF_TARGET_REJECT=y
CONFIG_IP_NF_TARGET_ULOG=m
CONFIG_NF_NAT_IPV4=m
CONFIG_IP_NF_TARGET_MASQUERADE=m
CONFIG_IP_NF_TARGET_NETMAP=m
CONFIG_IP_NF_TARGET_REDIRECT=m
CONFIG_NF_NAT_SNMP_BASIC=m
CONFIG_NF_NAT_PROTO_GRE=m
CONFIG_NF_NAT_PPTP=m
CONFIG_NF_NAT_H323=m
CONFIG_IP_NF_MANGLE=m
CONFIG_IP_NF_TARGET_CLUSTERIP=m
CONFIG_IP_NF_TARGET_ECN=m
CONFIG_IP_NF_TARGET_TTL=m
CONFIG_IP_NF_RAW=m
CONFIG_IP_NF_SECURITY=m
CONFIG_IP_NF_ARPTABLES=m
CONFIG_IP_NF_ARPFILTER=m
CONFIG_IP_NF_ARP_MANGLE=m

#
# IPv6: Netfilter Configuration
#
CONFIG_NF_DEFRAG_IPV6=m
CONFIG_NF_CONNTRACK_IPV6=m
CONFIG_IP6_NF_IPTABLES=m
CONFIG_IP6_NF_MATCH_AH=m
CONFIG_IP6_NF_MATCH_EUI64=m
CONFIG_IP6_NF_MATCH_FRAG=m
CONFIG_IP6_NF_MATCH_OPTS=m
CONFIG_IP6_NF_MATCH_HL=m
CONFIG_IP6_NF_MATCH_IPV6HEADER=m
CONFIG_IP6_NF_MATCH_MH=m
CONFIG_IP6_NF_MATCH_RPFILTER=m
CONFIG_IP6_NF_MATCH_RT=m
CONFIG_IP6_NF_TARGET_HL=m
CONFIG_IP6_NF_FILTER=m
CONFIG_IP6_NF_TARGET_REJECT=m
CONFIG_IP6_NF_MANGLE=m
CONFIG_IP6_NF_RAW=m
CONFIG_IP6_NF_SECURITY=m
CONFIG_NF_NAT_IPV6=m
CONFIG_IP6_NF_TARGET_MASQUERADE=m
# CONFIG_IP6_NF_TARGET_NPT is not set
CONFIG_BRIDGE_NF_EBTABLES=m
CONFIG_BRIDGE_EBT_BROUTE=m
CONFIG_BRIDGE_EBT_T_FILTER=m
CONFIG_BRIDGE_EBT_T_NAT=m
CONFIG_BRIDGE_EBT_802_3=m
CONFIG_BRIDGE_EBT_AMONG=m
CONFIG_BRIDGE_EBT_ARP=m
CONFIG_BRIDGE_EBT_IP=m
CONFIG_BRIDGE_EBT_IP6=m
CONFIG_BRIDGE_EBT_LIMIT=m
CONFIG_BRIDGE_EBT_MARK=m
CONFIG_BRIDGE_EBT_PKTTYPE=m
CONFIG_BRIDGE_EBT_STP=m
CONFIG_BRIDGE_EBT_VLAN=m
CONFIG_BRIDGE_EBT_ARPREPLY=m
CONFIG_BRIDGE_EBT_DNAT=m
CONFIG_BRIDGE_EBT_MARK_T=m
CONFIG_BRIDGE_EBT_REDIRECT=m
CONFIG_BRIDGE_EBT_SNAT=m
CONFIG_BRIDGE_EBT_LOG=m
CONFIG_BRIDGE_EBT_ULOG=m
CONFIG_BRIDGE_EBT_NFLOG=m
CONFIG_IP_DCCP=m
CONFIG_INET_DCCP_DIAG=m

#
# DCCP CCIDs Configuration
#
# CONFIG_IP_DCCP_CCID2_DEBUG is not set
CONFIG_IP_DCCP_CCID3=y
# CONFIG_IP_DCCP_CCID3_DEBUG is not set
CONFIG_IP_DCCP_TFRC_LIB=y

#
# DCCP Kernel Hacking
#
# CONFIG_IP_DCCP_DEBUG is not set
CONFIG_IP_SCTP=m
# CONFIG_SCTP_DBG_OBJCNT is not set
# CONFIG_SCTP_DEFAULT_COOKIE_HMAC_MD5 is not set
CONFIG_SCTP_DEFAULT_COOKIE_HMAC_SHA1=y
# CONFIG_SCTP_DEFAULT_COOKIE_HMAC_NONE is not set
CONFIG_SCTP_COOKIE_HMAC_MD5=y
CONFIG_SCTP_COOKIE_HMAC_SHA1=y
CONFIG_RDS=m
CONFIG_RDS_TCP=m
# CONFIG_RDS_DEBUG is not set
CONFIG_TIPC=m
CONFIG_TIPC_PORTS=8192
CONFIG_ATM=m
CONFIG_ATM_CLIP=m
# CONFIG_ATM_CLIP_NO_ICMP is not set
CONFIG_ATM_LANE=m
# CONFIG_ATM_MPOA is not set
CONFIG_ATM_BR2684=m
# CONFIG_ATM_BR2684_IPFILTER is not set
CONFIG_L2TP=m
CONFIG_L2TP_DEBUGFS=m
CONFIG_L2TP_V3=y
CONFIG_L2TP_IP=m
CONFIG_L2TP_ETH=m
CONFIG_STP=m
CONFIG_GARP=m
CONFIG_MRP=m
CONFIG_BRIDGE=m
CONFIG_BRIDGE_IGMP_SNOOPING=y
CONFIG_BRIDGE_VLAN_FILTERING=y
CONFIG_HAVE_NET_DSA=y
CONFIG_NET_DSA=m
CONFIG_NET_DSA_TAG_DSA=y
CONFIG_NET_DSA_TAG_EDSA=y
CONFIG_NET_DSA_TAG_TRAILER=y
CONFIG_VLAN_8021Q=m
CONFIG_VLAN_8021Q_GVRP=y
CONFIG_VLAN_8021Q_MVRP=y
# CONFIG_DECNET is not set
CONFIG_LLC=m
# CONFIG_LLC2 is not set
CONFIG_IPX=m
# CONFIG_IPX_INTERN is not set
CONFIG_ATALK=m
CONFIG_DEV_APPLETALK=m
# CONFIG_LTPC is not set
# CONFIG_COPS is not set
CONFIG_IPDDP=m
CONFIG_IPDDP_ENCAP=y
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_PHONET is not set
CONFIG_IEEE802154=m
CONFIG_IEEE802154_6LOWPAN=m
CONFIG_MAC802154=m
CONFIG_NET_SCHED=y

#
# Queueing/Scheduling
#
CONFIG_NET_SCH_CBQ=m
CONFIG_NET_SCH_HTB=m
CONFIG_NET_SCH_HFSC=m
CONFIG_NET_SCH_ATM=m
CONFIG_NET_SCH_PRIO=m
CONFIG_NET_SCH_MULTIQ=m
CONFIG_NET_SCH_RED=m
CONFIG_NET_SCH_SFB=m
CONFIG_NET_SCH_SFQ=m
CONFIG_NET_SCH_TEQL=m
CONFIG_NET_SCH_TBF=m
CONFIG_NET_SCH_GRED=m
CONFIG_NET_SCH_DSMARK=m
CONFIG_NET_SCH_NETEM=m
CONFIG_NET_SCH_DRR=m
CONFIG_NET_SCH_MQPRIO=m
CONFIG_NET_SCH_CHOKE=m
CONFIG_NET_SCH_QFQ=m
CONFIG_NET_SCH_CODEL=m
CONFIG_NET_SCH_FQ_CODEL=m
CONFIG_NET_SCH_INGRESS=m
CONFIG_NET_SCH_PLUG=m

#
# Classification
#
CONFIG_NET_CLS=y
CONFIG_NET_CLS_BASIC=m
CONFIG_NET_CLS_TCINDEX=m
CONFIG_NET_CLS_ROUTE4=m
CONFIG_NET_CLS_FW=m
CONFIG_NET_CLS_U32=m
CONFIG_CLS_U32_PERF=y
CONFIG_CLS_U32_MARK=y
CONFIG_NET_CLS_RSVP=m
CONFIG_NET_CLS_RSVP6=m
CONFIG_NET_CLS_FLOW=m
CONFIG_NET_CLS_CGROUP=y
CONFIG_NET_EMATCH=y
CONFIG_NET_EMATCH_STACK=32
CONFIG_NET_EMATCH_CMP=m
CONFIG_NET_EMATCH_NBYTE=m
CONFIG_NET_EMATCH_U32=m
CONFIG_NET_EMATCH_META=m
CONFIG_NET_EMATCH_TEXT=m
CONFIG_NET_EMATCH_IPSET=m
CONFIG_NET_CLS_ACT=y
CONFIG_NET_ACT_POLICE=m
CONFIG_NET_ACT_GACT=m
CONFIG_GACT_PROB=y
CONFIG_NET_ACT_MIRRED=m
CONFIG_NET_ACT_IPT=m
CONFIG_NET_ACT_NAT=m
CONFIG_NET_ACT_PEDIT=m
CONFIG_NET_ACT_SIMP=m
CONFIG_NET_ACT_SKBEDIT=m
CONFIG_NET_ACT_CSUM=m
CONFIG_NET_CLS_IND=y
CONFIG_NET_SCH_FIFO=y
CONFIG_DCB=y
CONFIG_DNS_RESOLVER=m
CONFIG_BATMAN_ADV=m
CONFIG_BATMAN_ADV_BLA=y
CONFIG_BATMAN_ADV_DAT=y
# CONFIG_BATMAN_ADV_NC is not set
# CONFIG_BATMAN_ADV_DEBUG is not set
CONFIG_OPENVSWITCH=m
CONFIG_OPENVSWITCH_GRE=y
CONFIG_VSOCKETS=m
# CONFIG_NETLINK_MMAP is not set
# CONFIG_NETLINK_DIAG is not set
# CONFIG_NET_MPLS_GSO is not set
CONFIG_NETPRIO_CGROUP=m
CONFIG_NET_LL_RX_POLL=y
CONFIG_BQL=y

#
# Network testing
#
CONFIG_NET_PKTGEN=m
# CONFIG_HAMRADIO is not set
# CONFIG_CAN is not set
# CONFIG_IRDA is not set
# CONFIG_BT is not set
# CONFIG_AF_RXRPC is not set
CONFIG_FIB_RULES=y
# CONFIG_WIRELESS is not set
# CONFIG_WIMAX is not set
# CONFIG_RFKILL is not set
# CONFIG_NET_9P is not set
# CONFIG_CAIF is not set
# CONFIG_CEPH_LIB is not set
# CONFIG_NFC is not set

#
# Device Drivers
#

#
# Generic Driver Options
#
CONFIG_UEVENT_HELPER_PATH=""
CONFIG_DEVTMPFS=y
CONFIG_DEVTMPFS_MOUNT=y
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y
CONFIG_FW_LOADER=y
# CONFIG_FIRMWARE_IN_KERNEL is not set
CONFIG_EXTRA_FIRMWARE=""
# CONFIG_FW_LOADER_USER_HELPER is not set
# CONFIG_DEBUG_DRIVER is not set
CONFIG_DEBUG_DEVRES=y
# CONFIG_SYS_HYPERVISOR is not set
# CONFIG_GENERIC_CPU_DEVICES is not set
CONFIG_DMA_SHARED_BUFFER=y

#
# Bus devices
#
CONFIG_CONNECTOR=y
CONFIG_PROC_EVENTS=y
# CONFIG_MTD is not set
# CONFIG_PARPORT is not set
# CONFIG_PNP is not set
CONFIG_BLK_DEV=y
# CONFIG_BLK_DEV_FD is not set
# CONFIG_BLK_DEV_PCIESSD_MTIP32XX is not set
# CONFIG_BLK_CPQ_DA is not set
# CONFIG_BLK_CPQ_CISS_DA is not set
# CONFIG_BLK_DEV_DAC960 is not set
# CONFIG_BLK_DEV_UMEM is not set
# CONFIG_BLK_DEV_COW_COMMON is not set
CONFIG_BLK_DEV_LOOP=y
CONFIG_BLK_DEV_LOOP_MIN_COUNT=8
CONFIG_BLK_DEV_CRYPTOLOOP=m
# CONFIG_BLK_DEV_DRBD is not set
# CONFIG_BLK_DEV_NBD is not set
# CONFIG_BLK_DEV_NVME is not set
# CONFIG_BLK_DEV_SX8 is not set
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_COUNT=16
CONFIG_BLK_DEV_RAM_SIZE=16384
# CONFIG_BLK_DEV_XIP is not set
# CONFIG_CDROM_PKTCDVD is not set
# CONFIG_ATA_OVER_ETH is not set
CONFIG_VIRTIO_BLK=y
# CONFIG_BLK_DEV_HD is not set
# CONFIG_BLK_DEV_RBD is not set
# CONFIG_BLK_DEV_RSXX is not set

#
# Misc devices
#
# CONFIG_SENSORS_LIS3LV02D is not set
# CONFIG_AD525X_DPOT is not set
# CONFIG_DUMMY_IRQ is not set
# CONFIG_PHANTOM is not set
# CONFIG_SGI_IOC4 is not set
# CONFIG_TIFM_CORE is not set
# CONFIG_ICS932S401 is not set
# CONFIG_ATMEL_SSC is not set
# CONFIG_ENCLOSURE_SERVICES is not set
# CONFIG_HP_ILO is not set
# CONFIG_APDS9802ALS is not set
# CONFIG_ISL29003 is not set
# CONFIG_ISL29020 is not set
# CONFIG_SENSORS_TSL2550 is not set
# CONFIG_SENSORS_BH1780 is not set
# CONFIG_SENSORS_BH1770 is not set
# CONFIG_SENSORS_APDS990X is not set
# CONFIG_HMC6352 is not set
# CONFIG_DS1682 is not set
# CONFIG_BMP085_I2C is not set
# CONFIG_PCH_PHUB is not set
# CONFIG_USB_SWITCH_FSA9480 is not set
# CONFIG_SRAM is not set
# CONFIG_C2PORT is not set

#
# EEPROM support
#
CONFIG_EEPROM_AT24=m
CONFIG_EEPROM_LEGACY=m
CONFIG_EEPROM_MAX6875=m
CONFIG_EEPROM_93CX6=m
# CONFIG_CB710_CORE is not set

#
# Texas Instruments shared transport line discipline
#
# CONFIG_SENSORS_LIS3_I2C is not set

#
# Altera FPGA firmware download module
#
# CONFIG_ALTERA_STAPL is not set
CONFIG_HAVE_IDE=y
# CONFIG_IDE is not set

#
# SCSI device support
#
CONFIG_SCSI_MOD=y
# CONFIG_RAID_ATTRS is not set
# CONFIG_SCSI is not set
# CONFIG_SCSI_DMA is not set
# CONFIG_SCSI_NETLINK is not set
# CONFIG_ATA is not set
# CONFIG_MD is not set
# CONFIG_FUSION is not set

#
# IEEE 1394 (FireWire) support
#
# CONFIG_FIREWIRE is not set
# CONFIG_FIREWIRE_NOSY is not set
# CONFIG_I2O is not set
CONFIG_NETDEVICES=y
CONFIG_NET_CORE=y
CONFIG_BONDING=m
# CONFIG_DUMMY is not set
# CONFIG_EQUALIZER is not set
# CONFIG_IFB is not set
# CONFIG_NET_TEAM is not set
# CONFIG_MACVLAN is not set
# CONFIG_VXLAN is not set
# CONFIG_NETCONSOLE is not set
# CONFIG_NETPOLL is not set
# CONFIG_NET_POLL_CONTROLLER is not set
CONFIG_TUN=m
# CONFIG_VETH is not set
CONFIG_VIRTIO_NET=y
# CONFIG_NLMON is not set
# CONFIG_ARCNET is not set
# CONFIG_ATM_DRIVERS is not set

#
# CAIF transport drivers
#

#
# Distributed Switch Architecture drivers
#
CONFIG_NET_DSA_MV88E6XXX=m
CONFIG_NET_DSA_MV88E6060=m
CONFIG_NET_DSA_MV88E6XXX_NEED_PPU=y
CONFIG_NET_DSA_MV88E6131=m
CONFIG_NET_DSA_MV88E6123_61_65=m
# CONFIG_ETHERNET is not set
# CONFIG_FDDI is not set
# CONFIG_HIPPI is not set
CONFIG_PHYLIB=y

#
# MII PHY device drivers
#
# CONFIG_AT803X_PHY is not set
# CONFIG_AMD_PHY is not set
# CONFIG_MARVELL_PHY is not set
# CONFIG_DAVICOM_PHY is not set
# CONFIG_QSEMI_PHY is not set
# CONFIG_LXT_PHY is not set
# CONFIG_CICADA_PHY is not set
# CONFIG_VITESSE_PHY is not set
# CONFIG_SMSC_PHY is not set
# CONFIG_BROADCOM_PHY is not set
# CONFIG_BCM87XX_PHY is not set
# CONFIG_ICPLUS_PHY is not set
# CONFIG_REALTEK_PHY is not set
# CONFIG_NATIONAL_PHY is not set
# CONFIG_STE10XP is not set
# CONFIG_LSI_ET1011C_PHY is not set
# CONFIG_MICREL_PHY is not set
# CONFIG_FIXED_PHY is not set
# CONFIG_MDIO_BITBANG is not set
# CONFIG_PPP is not set
# CONFIG_SLIP is not set
# CONFIG_WLAN is not set

#
# Enable WiMAX (Networking options) to see the WiMAX drivers
#
# CONFIG_WAN is not set
# CONFIG_IEEE802154_DRIVERS is not set
# CONFIG_VMXNET3 is not set
# CONFIG_ISDN is not set

#
# Input device support
#
CONFIG_INPUT=y
CONFIG_INPUT_FF_MEMLESS=y
CONFIG_INPUT_POLLDEV=m
CONFIG_INPUT_SPARSEKMAP=m
# CONFIG_INPUT_MATRIXKMAP is not set

#
# Userland interfaces
#
CONFIG_INPUT_MOUSEDEV=y
# CONFIG_INPUT_MOUSEDEV_PSAUX is not set
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
# CONFIG_INPUT_JOYDEV is not set
CONFIG_INPUT_EVDEV=y
# CONFIG_INPUT_EVBUG is not set

#
# Input Device Drivers
#
CONFIG_INPUT_KEYBOARD=y
# CONFIG_KEYBOARD_ADP5588 is not set
# CONFIG_KEYBOARD_ADP5589 is not set
CONFIG_KEYBOARD_ATKBD=y
# CONFIG_KEYBOARD_QT1070 is not set
# CONFIG_KEYBOARD_QT2160 is not set
# CONFIG_KEYBOARD_LKKBD is not set
# CONFIG_KEYBOARD_TCA6416 is not set
# CONFIG_KEYBOARD_TCA8418 is not set
# CONFIG_KEYBOARD_LM8323 is not set
# CONFIG_KEYBOARD_LM8333 is not set
# CONFIG_KEYBOARD_MAX7359 is not set
# CONFIG_KEYBOARD_MCS is not set
# CONFIG_KEYBOARD_MPR121 is not set
# CONFIG_KEYBOARD_NEWTON is not set
# CONFIG_KEYBOARD_OPENCORES is not set
# CONFIG_KEYBOARD_STOWAWAY is not set
# CONFIG_KEYBOARD_SUNKBD is not set
# CONFIG_KEYBOARD_XTKBD is not set
CONFIG_INPUT_MOUSE=y
CONFIG_MOUSE_PS2=y
CONFIG_MOUSE_PS2_ALPS=y
CONFIG_MOUSE_PS2_LOGIPS2PP=y
CONFIG_MOUSE_PS2_SYNAPTICS=y
CONFIG_MOUSE_PS2_CYPRESS=y
CONFIG_MOUSE_PS2_TRACKPOINT=y
CONFIG_MOUSE_PS2_ELANTECH=y
# CONFIG_MOUSE_PS2_SENTELIC is not set
# CONFIG_MOUSE_PS2_TOUCHKIT is not set
# CONFIG_MOUSE_SERIAL is not set
# CONFIG_MOUSE_CYAPA is not set
# CONFIG_MOUSE_INPORT is not set
# CONFIG_MOUSE_LOGIBM is not set
# CONFIG_MOUSE_PC110PAD is not set
# CONFIG_MOUSE_VSXXXAA is not set
# CONFIG_MOUSE_SYNAPTICS_I2C is not set
# CONFIG_INPUT_JOYSTICK is not set
# CONFIG_INPUT_TABLET is not set
# CONFIG_INPUT_TOUCHSCREEN is not set
# CONFIG_INPUT_MISC is not set

#
# Hardware I/O ports
#
CONFIG_SERIO=y
CONFIG_SERIO_I8042=y
CONFIG_SERIO_SERPORT=y
# CONFIG_SERIO_PCIPS2 is not set
CONFIG_SERIO_LIBPS2=y
# CONFIG_SERIO_RAW is not set
# CONFIG_SERIO_ALTERA_PS2 is not set
# CONFIG_SERIO_PS2MULT is not set
# CONFIG_SERIO_ARC_PS2 is not set
# CONFIG_GAMEPORT is not set

#
# Character devices
#
CONFIG_TTY=y
CONFIG_VT=y
CONFIG_CONSOLE_TRANSLATIONS=y
CONFIG_VT_CONSOLE=y
CONFIG_HW_CONSOLE=y
CONFIG_VT_HW_CONSOLE_BINDING=y
CONFIG_UNIX98_PTYS=y
CONFIG_DEVPTS_MULTIPLE_INSTANCES=y
# CONFIG_LEGACY_PTYS is not set
# CONFIG_SERIAL_NONSTANDARD is not set
# CONFIG_NOZOMI is not set
# CONFIG_N_GSM is not set
# CONFIG_TRACE_SINK is not set
# CONFIG_DEVKMEM is not set

#
# Serial drivers
#
CONFIG_SERIAL_8250=y
# CONFIG_SERIAL_8250_DEPRECATED_OPTIONS is not set
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_SERIAL_8250_PCI=y
CONFIG_SERIAL_8250_NR_UARTS=32
CONFIG_SERIAL_8250_RUNTIME_UARTS=4
# CONFIG_SERIAL_8250_EXTENDED is not set
# CONFIG_SERIAL_8250_DW is not set

#
# Non-8250 serial port support
#
# CONFIG_SERIAL_MFD_HSU is not set
CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y
# CONFIG_SERIAL_JSM is not set
# CONFIG_SERIAL_SCCNXP is not set
# CONFIG_SERIAL_TIMBERDALE is not set
# CONFIG_SERIAL_ALTERA_JTAGUART is not set
# CONFIG_SERIAL_ALTERA_UART is not set
# CONFIG_SERIAL_PCH_UART is not set
# CONFIG_SERIAL_ARC is not set
# CONFIG_SERIAL_RP2 is not set
# CONFIG_SERIAL_FSL_LPUART is not set
CONFIG_HVC_DRIVER=y
CONFIG_VIRTIO_CONSOLE=y
# CONFIG_IPMI_HANDLER is not set
CONFIG_HW_RANDOM=y
# CONFIG_HW_RANDOM_TIMERIOMEM is not set
CONFIG_HW_RANDOM_VIRTIO=y
# CONFIG_DTLK is not set
# CONFIG_R3964 is not set
# CONFIG_APPLICOM is not set
CONFIG_RAW_DRIVER=y
CONFIG_MAX_RAW_DEVS=8192
# CONFIG_TCG_TPM is not set
CONFIG_DEVPORT=y
CONFIG_I2C=m
CONFIG_I2C_BOARDINFO=y
# CONFIG_I2C_COMPAT is not set
# CONFIG_I2C_CHARDEV is not set
# CONFIG_I2C_MUX is not set
CONFIG_I2C_HELPER_AUTO=y
CONFIG_I2C_ALGOBIT=m

#
# I2C Hardware Bus support
#

#
# PC SMBus host controller drivers
#
# CONFIG_I2C_ALI1535 is not set
# CONFIG_I2C_ALI1563 is not set
# CONFIG_I2C_ALI15X3 is not set
# CONFIG_I2C_AMD756 is not set
# CONFIG_I2C_AMD8111 is not set
# CONFIG_I2C_I801 is not set
# CONFIG_I2C_ISCH is not set
# CONFIG_I2C_PIIX4 is not set
# CONFIG_I2C_NFORCE2 is not set
# CONFIG_I2C_SIS5595 is not set
# CONFIG_I2C_SIS630 is not set
# CONFIG_I2C_SIS96X is not set
# CONFIG_I2C_VIA is not set
# CONFIG_I2C_VIAPRO is not set

#
# I2C system bus drivers (mostly embedded / system-on-chip)
#
# CONFIG_I2C_DESIGNWARE_PCI is not set
# CONFIG_I2C_EG20T is not set
# CONFIG_I2C_OCORES is not set
# CONFIG_I2C_PCA_PLATFORM is not set
# CONFIG_I2C_PXA_PCI is not set
# CONFIG_I2C_SIMTEC is not set
# CONFIG_I2C_XILINX is not set

#
# External I2C/SMBus adapter drivers
#
# CONFIG_I2C_PARPORT_LIGHT is not set
# CONFIG_I2C_TAOS_EVM is not set

#
# Other I2C/SMBus bus drivers
#
# CONFIG_I2C_ELEKTOR is not set
# CONFIG_I2C_PCA_ISA is not set
# CONFIG_I2C_STUB is not set
# CONFIG_I2C_DEBUG_CORE is not set
# CONFIG_I2C_DEBUG_ALGO is not set
# CONFIG_I2C_DEBUG_BUS is not set
# CONFIG_SPI is not set
# CONFIG_HSI is not set

#
# PPS support
#
CONFIG_PPS=m
# CONFIG_PPS_DEBUG is not set
# CONFIG_NTP_PPS is not set

#
# PPS clients support
#
# CONFIG_PPS_CLIENT_KTIMER is not set
# CONFIG_PPS_CLIENT_LDISC is not set
# CONFIG_PPS_CLIENT_GPIO is not set

#
# PPS generators support
#

#
# PTP clock support
#
# CONFIG_PTP_1588_CLOCK is not set
# CONFIG_DP83640_PHY is not set
CONFIG_ARCH_WANT_OPTIONAL_GPIOLIB=y
CONFIG_GPIO_DEVRES=y
# CONFIG_GPIOLIB is not set
# CONFIG_W1 is not set
CONFIG_POWER_SUPPLY=y
# CONFIG_POWER_SUPPLY_DEBUG is not set
# CONFIG_PDA_POWER is not set
# CONFIG_TEST_POWER is not set
# CONFIG_BATTERY_DS2780 is not set
# CONFIG_BATTERY_DS2781 is not set
# CONFIG_BATTERY_DS2782 is not set
# CONFIG_BATTERY_SBS is not set
# CONFIG_BATTERY_BQ27x00 is not set
# CONFIG_BATTERY_MAX17040 is not set
# CONFIG_BATTERY_MAX17042 is not set
# CONFIG_CHARGER_MAX8903 is not set
# CONFIG_CHARGER_LP8727 is not set
# CONFIG_CHARGER_BQ2415X is not set
# CONFIG_CHARGER_SMB347 is not set
# CONFIG_BATTERY_GOLDFISH is not set
CONFIG_POWER_RESET=y
# CONFIG_POWER_AVS is not set
# CONFIG_HWMON is not set
# CONFIG_THERMAL is not set
# CONFIG_WATCHDOG is not set
CONFIG_SSB_POSSIBLE=y

#
# Sonics Silicon Backplane
#
# CONFIG_SSB is not set
CONFIG_BCMA_POSSIBLE=y

#
# Broadcom specific AMBA
#
# CONFIG_BCMA is not set

#
# Multifunction device drivers
#
CONFIG_MFD_CORE=m
# CONFIG_MFD_CROS_EC is not set
# CONFIG_MFD_MC13XXX_I2C is not set
# CONFIG_HTC_PASIC3 is not set
# CONFIG_LPC_ICH is not set
# CONFIG_LPC_SCH is not set
# CONFIG_MFD_JANZ_CMODIO is not set
# CONFIG_MFD_KEMPLD is not set
# CONFIG_MFD_RETU is not set
# CONFIG_MFD_PCF50633 is not set
# CONFIG_MFD_RDC321X is not set
# CONFIG_MFD_RTSX_PCI is not set
# CONFIG_MFD_SI476X_CORE is not set
# CONFIG_MFD_SM501 is not set
# CONFIG_ABX500_CORE is not set
# CONFIG_MFD_SYSCON is not set
# CONFIG_MFD_TI_AM335X_TSCADC is not set
# CONFIG_TPS6105X is not set
# CONFIG_TPS6507X is not set
# CONFIG_MFD_TPS65217 is not set
CONFIG_MFD_WL1273_CORE=m
# CONFIG_MFD_LM3533 is not set
# CONFIG_MFD_TMIO is not set
# CONFIG_MFD_VX855 is not set
# CONFIG_MFD_ARIZONA_I2C is not set
# CONFIG_REGULATOR is not set
# CONFIG_MEDIA_SUPPORT is not set

#
# Graphics support
#
CONFIG_AGP=y
CONFIG_VGA_ARB=y
CONFIG_VGA_ARB_MAX_GPUS=16
CONFIG_DRM=m
CONFIG_DRM_KMS_HELPER=m
CONFIG_DRM_LOAD_EDID_FIRMWARE=y
CONFIG_DRM_TTM=m

#
# I2C encoder or helper chips
#
CONFIG_DRM_I2C_CH7006=m
CONFIG_DRM_I2C_SIL164=m
CONFIG_DRM_I2C_NXP_TDA998X=m
# CONFIG_DRM_TDFX is not set
# CONFIG_DRM_R128 is not set
# CONFIG_DRM_RADEON is not set
# CONFIG_DRM_NOUVEAU is not set
# CONFIG_DRM_MGA is not set
# CONFIG_DRM_SIS is not set
# CONFIG_DRM_VIA is not set
# CONFIG_DRM_SAVAGE is not set
# CONFIG_DRM_VMWGFX is not set
# CONFIG_DRM_AST is not set
# CONFIG_DRM_MGAG200 is not set
CONFIG_DRM_CIRRUS_QEMU=m
CONFIG_DRM_QXL=m
# CONFIG_VGASTATE is not set
CONFIG_VIDEO_OUTPUT_CONTROL=m
CONFIG_HDMI=y
CONFIG_FB=y
# CONFIG_FIRMWARE_EDID is not set
# CONFIG_FB_DDC is not set
# CONFIG_FB_BOOT_VESA_SUPPORT is not set
# CONFIG_FB_CFB_FILLRECT is not set
# CONFIG_FB_CFB_COPYAREA is not set
# CONFIG_FB_CFB_IMAGEBLIT is not set
# CONFIG_FB_CFB_REV_PIXELS_IN_BYTE is not set
CONFIG_FB_SYS_FILLRECT=m
CONFIG_FB_SYS_COPYAREA=m
CONFIG_FB_SYS_IMAGEBLIT=m
# CONFIG_FB_FOREIGN_ENDIAN is not set
CONFIG_FB_SYS_FOPS=m
CONFIG_FB_DEFERRED_IO=y
# CONFIG_FB_SVGALIB is not set
# CONFIG_FB_MACMODES is not set
# CONFIG_FB_BACKLIGHT is not set
# CONFIG_FB_MODE_HELPERS is not set
CONFIG_FB_TILEBLITTING=y

#
# Frame buffer hardware drivers
#
# CONFIG_FB_CIRRUS is not set
# CONFIG_FB_PM2 is not set
# CONFIG_FB_CYBER2000 is not set
# CONFIG_FB_ASILIANT is not set
# CONFIG_FB_IMSTT is not set
# CONFIG_FB_TGA is not set
# CONFIG_FB_UVESA is not set
# CONFIG_FB_S1D13XXX is not set
# CONFIG_FB_NVIDIA is not set
# CONFIG_FB_RIVA is not set
# CONFIG_FB_I740 is not set
# CONFIG_FB_MATROX is not set
# CONFIG_FB_RADEON is not set
# CONFIG_FB_ATY128 is not set
# CONFIG_FB_ATY is not set
# CONFIG_FB_S3 is not set
# CONFIG_FB_SAVAGE is not set
# CONFIG_FB_SIS is not set
# CONFIG_FB_NEOMAGIC is not set
# CONFIG_FB_KYRO is not set
# CONFIG_FB_3DFX is not set
# CONFIG_FB_VOODOO1 is not set
# CONFIG_FB_VT8623 is not set
# CONFIG_FB_TRIDENT is not set
# CONFIG_FB_ARK is not set
# CONFIG_FB_PM3 is not set
# CONFIG_FB_CARMINE is not set
# CONFIG_FB_TMIO is not set
# CONFIG_FB_GOLDFISH is not set
# CONFIG_FB_VIRTUAL is not set
# CONFIG_FB_METRONOME is not set
# CONFIG_FB_MB862XX is not set
# CONFIG_FB_BROADSHEET is not set
# CONFIG_FB_AUO_K190X is not set
# CONFIG_EXYNOS_VIDEO is not set
CONFIG_BACKLIGHT_LCD_SUPPORT=y
CONFIG_LCD_CLASS_DEVICE=m
CONFIG_LCD_PLATFORM=m
CONFIG_BACKLIGHT_CLASS_DEVICE=y
# CONFIG_BACKLIGHT_GENERIC is not set
# CONFIG_BACKLIGHT_ADP8860 is not set
# CONFIG_BACKLIGHT_ADP8870 is not set
# CONFIG_BACKLIGHT_LM3630 is not set
# CONFIG_BACKLIGHT_LM3639 is not set
CONFIG_BACKLIGHT_LP855X=m

#
# Console display driver support
#
CONFIG_VGA_CONSOLE=y
CONFIG_VGACON_SOFT_SCROLLBACK=y
CONFIG_VGACON_SOFT_SCROLLBACK_SIZE=64
# CONFIG_MDA_CONSOLE is not set
CONFIG_DUMMY_CONSOLE=y
CONFIG_FRAMEBUFFER_CONSOLE=y
CONFIG_FRAMEBUFFER_CONSOLE_DETECT_PRIMARY=y
CONFIG_FRAMEBUFFER_CONSOLE_ROTATION=y
CONFIG_LOGO=y
# CONFIG_LOGO_LINUX_MONO is not set
# CONFIG_LOGO_LINUX_VGA16 is not set
CONFIG_LOGO_LINUX_CLUT224=y
CONFIG_LOGO_DEC_CLUT224=y
# CONFIG_SOUND is not set

#
# HID support
#
CONFIG_HID=y
CONFIG_HID_BATTERY_STRENGTH=y
CONFIG_HIDRAW=y
CONFIG_UHID=m
CONFIG_HID_GENERIC=y

#
# Special HID drivers
#
CONFIG_HID_A4TECH=y
CONFIG_HID_ACRUX=m
CONFIG_HID_ACRUX_FF=y
CONFIG_HID_APPLE=y
CONFIG_HID_AUREAL=m
CONFIG_HID_BELKIN=y
CONFIG_HID_CHERRY=y
CONFIG_HID_CHICONY=y
CONFIG_HID_CYPRESS=y
CONFIG_HID_DRAGONRISE=m
CONFIG_DRAGONRISE_FF=y
CONFIG_HID_EMS_FF=m
CONFIG_HID_ELECOM=m
CONFIG_HID_EZKEY=y
CONFIG_HID_KEYTOUCH=m
CONFIG_HID_KYE=m
CONFIG_HID_UCLOGIC=m
CONFIG_HID_WALTOP=m
CONFIG_HID_GYRATION=m
CONFIG_HID_ICADE=m
CONFIG_HID_TWINHAN=m
CONFIG_HID_KENSINGTON=y
CONFIG_HID_LCPOWER=m
CONFIG_HID_LOGITECH=y
CONFIG_HID_LOGITECH_DJ=m
CONFIG_LOGITECH_FF=y
CONFIG_LOGIRUMBLEPAD2_FF=y
CONFIG_LOGIG940_FF=y
CONFIG_LOGIWHEELS_FF=y
CONFIG_HID_MAGICMOUSE=m
CONFIG_HID_MICROSOFT=y
CONFIG_HID_MONTEREY=y
CONFIG_HID_MULTITOUCH=m
CONFIG_HID_ORTEK=m
CONFIG_HID_PANTHERLORD=m
CONFIG_PANTHERLORD_FF=y
CONFIG_HID_PETALYNX=m
CONFIG_HID_PICOLCD=m
CONFIG_HID_PICOLCD_FB=y
CONFIG_HID_PICOLCD_BACKLIGHT=y
CONFIG_HID_PICOLCD_LCD=y
CONFIG_HID_PICOLCD_LEDS=y
CONFIG_HID_PRIMAX=m
CONFIG_HID_SAITEK=m
CONFIG_HID_SAMSUNG=m
CONFIG_HID_SPEEDLINK=m
CONFIG_HID_STEELSERIES=m
CONFIG_HID_SUNPLUS=m
CONFIG_HID_GREENASIA=m
CONFIG_GREENASIA_FF=y
CONFIG_HID_SMARTJOYPLUS=m
CONFIG_SMARTJOYPLUS_FF=y
CONFIG_HID_TIVO=m
CONFIG_HID_TOPSEED=m
CONFIG_HID_THINGM=m
CONFIG_HID_THRUSTMASTER=m
CONFIG_THRUSTMASTER_FF=y
CONFIG_HID_WACOM=m
CONFIG_HID_WIIMOTE=m
CONFIG_HID_ZEROPLUS=m
CONFIG_ZEROPLUS_FF=y
CONFIG_HID_ZYDACRON=m
# CONFIG_HID_SENSOR_HUB is not set

#
# I2C HID support
#
CONFIG_I2C_HID=m
# CONFIG_USB_SUPPORT is not set
# CONFIG_UWB is not set
# CONFIG_MMC is not set
# CONFIG_MEMSTICK is not set
CONFIG_NEW_LEDS=y
CONFIG_LEDS_CLASS=y

#
# LED drivers
#
CONFIG_LEDS_LM3530=m
# CONFIG_LEDS_LM3642 is not set
# CONFIG_LEDS_PCA9532 is not set
CONFIG_LEDS_LP3944=m
CONFIG_LEDS_LP55XX_COMMON=m
CONFIG_LEDS_LP5521=m
CONFIG_LEDS_LP5523=m
# CONFIG_LEDS_LP5562 is not set
# CONFIG_LEDS_PCA955X is not set
# CONFIG_LEDS_PCA9633 is not set
# CONFIG_LEDS_BD2802 is not set
# CONFIG_LEDS_TCA6507 is not set
# CONFIG_LEDS_LM355x is not set
# CONFIG_LEDS_OT200 is not set
CONFIG_LEDS_BLINKM=m

#
# LED Triggers
#
CONFIG_LEDS_TRIGGERS=y
CONFIG_LEDS_TRIGGER_TIMER=m
CONFIG_LEDS_TRIGGER_ONESHOT=m
CONFIG_LEDS_TRIGGER_HEARTBEAT=m
CONFIG_LEDS_TRIGGER_BACKLIGHT=m
# CONFIG_LEDS_TRIGGER_CPU is not set
CONFIG_LEDS_TRIGGER_DEFAULT_ON=m

#
# iptables trigger is under Netfilter config (LED target)
#
CONFIG_LEDS_TRIGGER_TRANSIENT=m
# CONFIG_LEDS_TRIGGER_CAMERA is not set
# CONFIG_ACCESSIBILITY is not set
# CONFIG_INFINIBAND is not set
CONFIG_RTC_LIB=y
CONFIG_RTC_CLASS=y
CONFIG_RTC_HCTOSYS=y
CONFIG_RTC_SYSTOHC=y
CONFIG_RTC_HCTOSYS_DEVICE="rtc0"
# CONFIG_RTC_DEBUG is not set

#
# RTC interfaces
#
CONFIG_RTC_INTF_SYSFS=y
CONFIG_RTC_INTF_PROC=y
CONFIG_RTC_INTF_DEV=y
# CONFIG_RTC_INTF_DEV_UIE_EMUL is not set
# CONFIG_RTC_DRV_TEST is not set

#
# I2C RTC drivers
#
# CONFIG_RTC_DRV_DS1307 is not set
# CONFIG_RTC_DRV_DS1374 is not set
# CONFIG_RTC_DRV_DS1672 is not set
# CONFIG_RTC_DRV_DS3232 is not set
# CONFIG_RTC_DRV_MAX6900 is not set
# CONFIG_RTC_DRV_RS5C372 is not set
# CONFIG_RTC_DRV_ISL1208 is not set
# CONFIG_RTC_DRV_ISL12022 is not set
# CONFIG_RTC_DRV_X1205 is not set
# CONFIG_RTC_DRV_PCF2127 is not set
# CONFIG_RTC_DRV_PCF8523 is not set
# CONFIG_RTC_DRV_PCF8563 is not set
# CONFIG_RTC_DRV_PCF8583 is not set
# CONFIG_RTC_DRV_M41T80 is not set
# CONFIG_RTC_DRV_BQ32K is not set
# CONFIG_RTC_DRV_S35390A is not set
# CONFIG_RTC_DRV_FM3130 is not set
# CONFIG_RTC_DRV_RX8581 is not set
# CONFIG_RTC_DRV_RX8025 is not set
# CONFIG_RTC_DRV_EM3027 is not set
# CONFIG_RTC_DRV_RV3029C2 is not set

#
# SPI RTC drivers
#

#
# Platform RTC drivers
#
CONFIG_RTC_DRV_ALPHA=y
CONFIG_RTC_DRV_ALPHA_QEMU=y
# CONFIG_RTC_DRV_DS1286 is not set
# CONFIG_RTC_DRV_DS1511 is not set
# CONFIG_RTC_DRV_DS1553 is not set
# CONFIG_RTC_DRV_DS1742 is not set
# CONFIG_RTC_DRV_STK17TA8 is not set
# CONFIG_RTC_DRV_M48T86 is not set
# CONFIG_RTC_DRV_M48T35 is not set
# CONFIG_RTC_DRV_M48T59 is not set
# CONFIG_RTC_DRV_MSM6242 is not set
# CONFIG_RTC_DRV_BQ4802 is not set
# CONFIG_RTC_DRV_RP5C01 is not set
# CONFIG_RTC_DRV_V3020 is not set
# CONFIG_RTC_DRV_DS2404 is not set

#
# on-CPU RTC drivers
#

#
# HID Sensor RTC drivers
#
# CONFIG_DMADEVICES is not set
# CONFIG_AUXDISPLAY is not set
# CONFIG_UIO is not set
CONFIG_VIRT_DRIVERS=y
CONFIG_VIRTIO=y

#
# Virtio drivers
#
CONFIG_VIRTIO_PCI=y
CONFIG_VIRTIO_BALLOON=y
CONFIG_VIRTIO_MMIO=y
# CONFIG_VIRTIO_MMIO_CMDLINE_DEVICES is not set

#
# Microsoft Hyper-V guest support
#
# CONFIG_STAGING is not set

#
# Hardware Spinlock drivers
#
CONFIG_I8253_LOCK=y
CONFIG_CLKBLD_I8253=y
# CONFIG_MAILBOX is not set
# CONFIG_IOMMU_SUPPORT is not set

#
# Remoteproc drivers
#
# CONFIG_STE_MODEM_RPROC is not set

#
# Rpmsg drivers
#
# CONFIG_PM_DEVFREQ is not set
# CONFIG_EXTCON is not set
# CONFIG_MEMORY is not set
# CONFIG_IIO is not set
# CONFIG_VME_BUS is not set
# CONFIG_PWM is not set
# CONFIG_IPACK_BUS is not set
# CONFIG_RESET_CONTROLLER is not set
# CONFIG_FMC is not set

#
# File systems
#
# CONFIG_EXT2_FS is not set
# CONFIG_EXT3_FS is not set
CONFIG_EXT4_FS=y
CONFIG_EXT4_USE_FOR_EXT23=y
CONFIG_EXT4_FS_POSIX_ACL=y
CONFIG_EXT4_FS_SECURITY=y
# CONFIG_EXT4_DEBUG is not set
CONFIG_JBD2=y
# CONFIG_JBD2_DEBUG is not set
CONFIG_FS_MBCACHE=y
# CONFIG_REISERFS_FS is not set
# CONFIG_JFS_FS is not set
# CONFIG_XFS_FS is not set
# CONFIG_GFS2_FS is not set
# CONFIG_OCFS2_FS is not set
CONFIG_BTRFS_FS=m
CONFIG_BTRFS_FS_POSIX_ACL=y
# CONFIG_BTRFS_FS_CHECK_INTEGRITY is not set
# CONFIG_BTRFS_FS_RUN_SANITY_TESTS is not set
# CONFIG_BTRFS_DEBUG is not set
# CONFIG_NILFS2_FS is not set
CONFIG_FS_POSIX_ACL=y
CONFIG_EXPORTFS=y
CONFIG_FILE_LOCKING=y
CONFIG_FSNOTIFY=y
CONFIG_DNOTIFY=y
CONFIG_INOTIFY_USER=y
CONFIG_FANOTIFY=y
CONFIG_FANOTIFY_ACCESS_PERMISSIONS=y
CONFIG_QUOTA=y
CONFIG_QUOTA_NETLINK_INTERFACE=y
# CONFIG_PRINT_QUOTA_WARNING is not set
# CONFIG_QUOTA_DEBUG is not set
CONFIG_QUOTA_TREE=y
# CONFIG_QFMT_V1 is not set
CONFIG_QFMT_V2=y
CONFIG_QUOTACTL=y
# CONFIG_AUTOFS4_FS is not set
# CONFIG_FUSE_FS is not set
CONFIG_GENERIC_ACL=y

#
# Caches
#
CONFIG_FSCACHE=m
CONFIG_FSCACHE_STATS=y
# CONFIG_FSCACHE_HISTOGRAM is not set
# CONFIG_FSCACHE_DEBUG is not set
CONFIG_FSCACHE_OBJECT_LIST=y
CONFIG_CACHEFILES=m
# CONFIG_CACHEFILES_DEBUG is not set
# CONFIG_CACHEFILES_HISTOGRAM is not set

#
# CD-ROM/DVD Filesystems
#
CONFIG_ISO9660_FS=y
CONFIG_JOLIET=y
CONFIG_ZISOFS=y
# CONFIG_UDF_FS is not set

#
# DOS/FAT/NT Filesystems
#
CONFIG_FAT_FS=m
# CONFIG_MSDOS_FS is not set
CONFIG_VFAT_FS=m
CONFIG_FAT_DEFAULT_CODEPAGE=437
CONFIG_FAT_DEFAULT_IOCHARSET="ascii"
# CONFIG_NTFS_FS is not set

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_PROC_SYSCTL=y
CONFIG_PROC_PAGE_MONITOR=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
CONFIG_TMPFS_POSIX_ACL=y
CONFIG_TMPFS_XATTR=y
# CONFIG_HUGETLB_PAGE is not set
CONFIG_CONFIGFS_FS=y
CONFIG_MISC_FILESYSTEMS=y
# CONFIG_ADFS_FS is not set
# CONFIG_AFFS_FS is not set
# CONFIG_ECRYPT_FS is not set
# CONFIG_HFS_FS is not set
# CONFIG_HFSPLUS_FS is not set
# CONFIG_BEFS_FS is not set
# CONFIG_BFS_FS is not set
# CONFIG_EFS_FS is not set
# CONFIG_LOGFS is not set
# CONFIG_CRAMFS is not set
CONFIG_SQUASHFS=y
CONFIG_SQUASHFS_XATTR=y
CONFIG_SQUASHFS_ZLIB=y
# CONFIG_SQUASHFS_LZO is not set
CONFIG_SQUASHFS_XZ=y
# CONFIG_SQUASHFS_4K_DEVBLK_SIZE is not set
# CONFIG_SQUASHFS_EMBEDDED is not set
CONFIG_SQUASHFS_FRAGMENT_CACHE_SIZE=3
# CONFIG_VXFS_FS is not set
# CONFIG_MINIX_FS is not set
# CONFIG_OMFS_FS is not set
# CONFIG_HPFS_FS is not set
# CONFIG_QNX4FS_FS is not set
# CONFIG_QNX6FS_FS is not set
# CONFIG_ROMFS_FS is not set
# CONFIG_PSTORE is not set
# CONFIG_SYSV_FS is not set
# CONFIG_UFS_FS is not set
# CONFIG_F2FS_FS is not set
# CONFIG_NETWORK_FILESYSTEMS is not set
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="utf8"
CONFIG_NLS_CODEPAGE_437=y
# CONFIG_NLS_CODEPAGE_737 is not set
# CONFIG_NLS_CODEPAGE_775 is not set
# CONFIG_NLS_CODEPAGE_850 is not set
# CONFIG_NLS_CODEPAGE_852 is not set
# CONFIG_NLS_CODEPAGE_855 is not set
# CONFIG_NLS_CODEPAGE_857 is not set
# CONFIG_NLS_CODEPAGE_860 is not set
# CONFIG_NLS_CODEPAGE_861 is not set
# CONFIG_NLS_CODEPAGE_862 is not set
# CONFIG_NLS_CODEPAGE_863 is not set
# CONFIG_NLS_CODEPAGE_864 is not set
# CONFIG_NLS_CODEPAGE_865 is not set
# CONFIG_NLS_CODEPAGE_866 is not set
# CONFIG_NLS_CODEPAGE_869 is not set
# CONFIG_NLS_CODEPAGE_936 is not set
# CONFIG_NLS_CODEPAGE_950 is not set
# CONFIG_NLS_CODEPAGE_932 is not set
# CONFIG_NLS_CODEPAGE_949 is not set
# CONFIG_NLS_CODEPAGE_874 is not set
# CONFIG_NLS_ISO8859_8 is not set
# CONFIG_NLS_CODEPAGE_1250 is not set
# CONFIG_NLS_CODEPAGE_1251 is not set
CONFIG_NLS_ASCII=y
# CONFIG_NLS_ISO8859_1 is not set
# CONFIG_NLS_ISO8859_2 is not set
# CONFIG_NLS_ISO8859_3 is not set
# CONFIG_NLS_ISO8859_4 is not set
# CONFIG_NLS_ISO8859_5 is not set
# CONFIG_NLS_ISO8859_6 is not set
# CONFIG_NLS_ISO8859_7 is not set
# CONFIG_NLS_ISO8859_9 is not set
# CONFIG_NLS_ISO8859_13 is not set
# CONFIG_NLS_ISO8859_14 is not set
# CONFIG_NLS_ISO8859_15 is not set
# CONFIG_NLS_KOI8_R is not set
# CONFIG_NLS_KOI8_U is not set
# CONFIG_NLS_MAC_ROMAN is not set
# CONFIG_NLS_MAC_CELTIC is not set
# CONFIG_NLS_MAC_CENTEURO is not set
# CONFIG_NLS_MAC_CROATIAN is not set
# CONFIG_NLS_MAC_CYRILLIC is not set
# CONFIG_NLS_MAC_GAELIC is not set
# CONFIG_NLS_MAC_GREEK is not set
# CONFIG_NLS_MAC_ICELAND is not set
# CONFIG_NLS_MAC_INUIT is not set
# CONFIG_NLS_MAC_ROMANIAN is not set
# CONFIG_NLS_MAC_TURKISH is not set
CONFIG_NLS_UTF8=y
# CONFIG_DLM is not set

#
# Kernel hacking
#

#
# printk and dmesg options
#
CONFIG_PRINTK_TIME=y
CONFIG_DEFAULT_MESSAGE_LOGLEVEL=4
CONFIG_BOOT_PRINTK_DELAY=y
CONFIG_DYNAMIC_DEBUG=y

#
# Compile-time checks and compiler options
#
CONFIG_DEBUG_INFO=y
# CONFIG_DEBUG_INFO_REDUCED is not set
# CONFIG_ENABLE_WARN_DEPRECATED is not set
CONFIG_ENABLE_MUST_CHECK=y
CONFIG_FRAME_WARN=2048
CONFIG_STRIP_ASM_SYMS=y
# CONFIG_READABLE_ASM is not set
CONFIG_UNUSED_SYMBOLS=y
CONFIG_DEBUG_FS=y
CONFIG_HEADERS_CHECK=y
# CONFIG_DEBUG_SECTION_MISMATCH is not set
# CONFIG_DEBUG_FORCE_WEAK_PER_CPU is not set
CONFIG_MAGIC_SYSRQ=y
CONFIG_DEBUG_KERNEL=y

#
# Memory Debugging
#
# CONFIG_DEBUG_PAGEALLOC is not set
# CONFIG_DEBUG_OBJECTS is not set
# CONFIG_SLUB_DEBUG_ON is not set
# CONFIG_SLUB_STATS is not set
# CONFIG_DEBUG_STACK_USAGE is not set
CONFIG_DEBUG_VM=y
# CONFIG_DEBUG_VM_RB is not set
CONFIG_DEBUG_MEMORY_INIT=y
CONFIG_DEBUG_SHIRQ=y

#
# Debug Lockups and Hangs
#
CONFIG_LOCKUP_DETECTOR=y
# CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC is not set
CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE=0
# CONFIG_DETECT_HUNG_TASK is not set
# CONFIG_PANIC_ON_OOPS is not set
CONFIG_PANIC_ON_OOPS_VALUE=0
CONFIG_SCHED_DEBUG=y
CONFIG_SCHEDSTATS=y
CONFIG_TIMER_STATS=y

#
# Lock Debugging (spinlocks, mutexes, etc...)
#
# CONFIG_DEBUG_RT_MUTEXES is not set
# CONFIG_RT_MUTEX_TESTER is not set
# CONFIG_DEBUG_SPINLOCK is not set
# CONFIG_DEBUG_MUTEXES is not set
# CONFIG_DEBUG_ATOMIC_SLEEP is not set
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
# CONFIG_DEBUG_KOBJECT is not set
# CONFIG_DEBUG_WRITECOUNT is not set
CONFIG_DEBUG_LIST=y
# CONFIG_DEBUG_SG is not set
# CONFIG_DEBUG_NOTIFIERS is not set
# CONFIG_DEBUG_CREDENTIALS is not set

#
# RCU Debugging
#
CONFIG_SPARSE_RCU_POINTER=y
# CONFIG_RCU_TORTURE_TEST is not set
# CONFIG_RCU_TRACE is not set
# CONFIG_DEBUG_BLOCK_EXT_DEVT is not set
# CONFIG_NOTIFIER_ERROR_INJECTION is not set
# CONFIG_FAULT_INJECTION is not set

#
# Runtime Testing
#
# CONFIG_LKDTM is not set
# CONFIG_TEST_LIST_SORT is not set
# CONFIG_BACKTRACE_SELF_TEST is not set
# CONFIG_RBTREE_TEST is not set
# CONFIG_INTERVAL_TREE_TEST is not set
CONFIG_ATOMIC64_SELFTEST=y
# CONFIG_TEST_STRING_HELPERS is not set
CONFIG_TEST_KSTRTOX=y
CONFIG_BUILD_DOCSRC=y
# CONFIG_SAMPLES is not set
CONFIG_EARLY_PRINTK=y
CONFIG_ALPHA_LEGACY_START_ADDRESS=y
CONFIG_MATHEMU=y

#
# Security options
#
CONFIG_KEYS=y
CONFIG_ENCRYPTED_KEYS=m
CONFIG_KEYS_DEBUG_PROC_KEYS=y
# CONFIG_SECURITY_DMESG_RESTRICT is not set
CONFIG_SECURITY=y
CONFIG_SECURITYFS=y
CONFIG_SECURITY_NETWORK=y
CONFIG_SECURITY_NETWORK_XFRM=y
# CONFIG_SECURITY_PATH is not set
CONFIG_LSM_MMAP_MIN_ADDR=65536
CONFIG_SECURITY_SELINUX=y
CONFIG_SECURITY_SELINUX_BOOTPARAM=y
CONFIG_SECURITY_SELINUX_BOOTPARAM_VALUE=1
CONFIG_SECURITY_SELINUX_DISABLE=y
CONFIG_SECURITY_SELINUX_DEVELOP=y
CONFIG_SECURITY_SELINUX_AVC_STATS=y
CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE=1
# CONFIG_SECURITY_SELINUX_POLICYDB_VERSION_MAX is not set
# CONFIG_SECURITY_SMACK is not set
# CONFIG_SECURITY_TOMOYO is not set
# CONFIG_SECURITY_APPARMOR is not set
# CONFIG_SECURITY_YAMA is not set
# CONFIG_IMA is not set
# CONFIG_EVM is not set
CONFIG_DEFAULT_SECURITY_SELINUX=y
# CONFIG_DEFAULT_SECURITY_DAC is not set
CONFIG_DEFAULT_SECURITY="selinux"
CONFIG_XOR_BLOCKS=m
CONFIG_CRYPTO=y

#
# Crypto core or helper
#
CONFIG_CRYPTO_FIPS=y
CONFIG_CRYPTO_ALGAPI=y
CONFIG_CRYPTO_ALGAPI2=y
CONFIG_CRYPTO_AEAD=y
CONFIG_CRYPTO_AEAD2=y
CONFIG_CRYPTO_BLKCIPHER=y
CONFIG_CRYPTO_BLKCIPHER2=y
CONFIG_CRYPTO_HASH=y
CONFIG_CRYPTO_HASH2=y
CONFIG_CRYPTO_RNG=y
CONFIG_CRYPTO_RNG2=y
CONFIG_CRYPTO_PCOMP=m
CONFIG_CRYPTO_PCOMP2=y
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_MANAGER2=y
CONFIG_CRYPTO_USER=m
# CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is not set
CONFIG_CRYPTO_GF128MUL=y
CONFIG_CRYPTO_NULL=m
CONFIG_CRYPTO_WORKQUEUE=y
CONFIG_CRYPTO_CRYPTD=y
CONFIG_CRYPTO_AUTHENC=m
CONFIG_CRYPTO_TEST=m

#
# Authenticated Encryption with Associated Data
#
CONFIG_CRYPTO_CCM=m
CONFIG_CRYPTO_GCM=m
CONFIG_CRYPTO_SEQIV=y

#
# Block modes
#
CONFIG_CRYPTO_CBC=y
CONFIG_CRYPTO_CTR=y
CONFIG_CRYPTO_CTS=m
CONFIG_CRYPTO_ECB=y
CONFIG_CRYPTO_LRW=y
CONFIG_CRYPTO_PCBC=m
CONFIG_CRYPTO_XTS=y

#
# Hash modes
#
CONFIG_CRYPTO_CMAC=m
CONFIG_CRYPTO_HMAC=y
CONFIG_CRYPTO_XCBC=m
CONFIG_CRYPTO_VMAC=m

#
# Digest
#
CONFIG_CRYPTO_CRC32C=y
CONFIG_CRYPTO_CRC32=m
CONFIG_CRYPTO_CRCT10DIF=y
CONFIG_CRYPTO_GHASH=m
CONFIG_CRYPTO_MD4=m
CONFIG_CRYPTO_MD5=y
CONFIG_CRYPTO_MICHAEL_MIC=m
CONFIG_CRYPTO_RMD128=m
CONFIG_CRYPTO_RMD160=m
CONFIG_CRYPTO_RMD256=m
CONFIG_CRYPTO_RMD320=m
CONFIG_CRYPTO_SHA1=y
CONFIG_CRYPTO_SHA256=y
CONFIG_CRYPTO_SHA512=m
CONFIG_CRYPTO_TGR192=m
CONFIG_CRYPTO_WP512=m

#
# Ciphers
#
CONFIG_CRYPTO_AES=y
CONFIG_CRYPTO_ANUBIS=m
CONFIG_CRYPTO_ARC4=m
CONFIG_CRYPTO_BLOWFISH=m
CONFIG_CRYPTO_BLOWFISH_COMMON=m
CONFIG_CRYPTO_CAMELLIA=m
CONFIG_CRYPTO_CAST_COMMON=m
CONFIG_CRYPTO_CAST5=m
CONFIG_CRYPTO_CAST6=m
CONFIG_CRYPTO_DES=m
CONFIG_CRYPTO_FCRYPT=m
CONFIG_CRYPTO_KHAZAD=m
CONFIG_CRYPTO_SALSA20=m
CONFIG_CRYPTO_SEED=m
CONFIG_CRYPTO_SERPENT=m
CONFIG_CRYPTO_TEA=m
CONFIG_CRYPTO_TWOFISH=m
CONFIG_CRYPTO_TWOFISH_COMMON=m

#
# Compression
#
CONFIG_CRYPTO_DEFLATE=m
CONFIG_CRYPTO_ZLIB=m
CONFIG_CRYPTO_LZO=m
# CONFIG_CRYPTO_LZ4 is not set
# CONFIG_CRYPTO_LZ4HC is not set

#
# Random Number Generation
#
CONFIG_CRYPTO_ANSI_CPRNG=m
CONFIG_CRYPTO_USER_API=y
CONFIG_CRYPTO_USER_API_HASH=y
CONFIG_CRYPTO_USER_API_SKCIPHER=y
CONFIG_CRYPTO_HW=y
CONFIG_ASYMMETRIC_KEY_TYPE=y
CONFIG_ASYMMETRIC_PUBLIC_KEY_SUBTYPE=y
CONFIG_PUBLIC_KEY_ALGO_RSA=y
CONFIG_X509_CERTIFICATE_PARSER=y
# CONFIG_BINARY_PRINTF is not set

#
# Library routines
#
CONFIG_RAID6_PQ=m
CONFIG_BITREVERSE=y
CONFIG_GENERIC_STRNCPY_FROM_USER=y
CONFIG_GENERIC_STRNLEN_USER=y
CONFIG_GENERIC_NET_UTILS=y
CONFIG_GENERIC_PCI_IOMAP=y
CONFIG_GENERIC_IO=y
CONFIG_CRC_CCITT=m
CONFIG_CRC16=y
CONFIG_CRC_T10DIF=y
CONFIG_CRC_ITU_T=m
CONFIG_CRC32=y
# CONFIG_CRC32_SELFTEST is not set
CONFIG_CRC32_SLICEBY8=y
# CONFIG_CRC32_SLICEBY4 is not set
# CONFIG_CRC32_SARWATE is not set
# CONFIG_CRC32_BIT is not set
# CONFIG_CRC7 is not set
CONFIG_LIBCRC32C=m
CONFIG_CRC8=m
CONFIG_AUDIT_GENERIC=y
CONFIG_ZLIB_INFLATE=y
CONFIG_ZLIB_DEFLATE=m
CONFIG_LZO_COMPRESS=m
CONFIG_LZO_DECOMPRESS=y
CONFIG_LZ4_DECOMPRESS=y
CONFIG_XZ_DEC=y
CONFIG_XZ_DEC_X86=y
# CONFIG_XZ_DEC_POWERPC is not set
# CONFIG_XZ_DEC_IA64 is not set
# CONFIG_XZ_DEC_ARM is not set
# CONFIG_XZ_DEC_ARMTHUMB is not set
# CONFIG_XZ_DEC_SPARC is not set
CONFIG_XZ_DEC_BCJ=y
# CONFIG_XZ_DEC_TEST is not set
CONFIG_DECOMPRESS_GZIP=y
CONFIG_DECOMPRESS_BZIP2=y
CONFIG_DECOMPRESS_LZMA=y
CONFIG_DECOMPRESS_XZ=y
CONFIG_DECOMPRESS_LZO=y
CONFIG_DECOMPRESS_LZ4=y
CONFIG_TEXTSEARCH=y
CONFIG_TEXTSEARCH_KMP=m
CONFIG_TEXTSEARCH_BM=m
CONFIG_TEXTSEARCH_FSM=m
CONFIG_HAS_IOMEM=y
CONFIG_HAS_IOPORT=y
CONFIG_HAS_DMA=y
CONFIG_DQL=y
CONFIG_NLATTR=y
CONFIG_ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE=y
CONFIG_AVERAGE=y
CONFIG_CLZ_TAB=y
CONFIG_CORDIC=m
# CONFIG_DDR is not set
CONFIG_MPILIB=y
CONFIG_OID_REGISTRY=y
CONFIG_FONT_SUPPORT=y
# CONFIG_FONTS is not set
CONFIG_FONT_8x8=y
CONFIG_FONT_8x16=y

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-07-28  7:52 [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation? Dennis Luehring
  2015-07-28  9:54 ` Artyom Tarasenko
  2015-07-29  9:17 ` Karel Gardas
@ 2015-08-27 15:29 ` Artyom Tarasenko
  2015-09-02  4:34   ` Dennis Luehring
  2 siblings, 1 reply; 80+ messages in thread
From: Artyom Tarasenko @ 2015-08-27 15:29 UTC (permalink / raw)
  To: Dennis Luehring; +Cc: qemu-devel, Aurelien Jarno, Richard Henderson

On Tue, Jul 28, 2015 at 9:52 AM, Dennis Luehring <dl.soluz@gmx.net> wrote:
> (i've posted the question already on qemu-discuss@nongnu.org but was told
> to better use this mailing list)
>
> i've prepared a Debian 7.8.0 image for SPARC64/qemu emulation for C/C++
> development before-real-hardware big-endian/unaligned tests
>
> i've benchmarked compiling a single pugixml.cpp
> (https://github.com/zeux/pugixml/blob/master/src/pugixml.cpp)
>
> qemu-system-sparc64: >180 sec
> x64 native : ~ 2 sec
>
> so my sparc64 emulation is around 90 times slower than native x64
>
> my system:
>
> using latest qemu git 2.3.x, with virtio for harddisk/network and qcow2
> image
>
> https://depositfiles.com/files/sj20aqwp0 (~280MB):
> press the "regular download" button, wait some seconds, solve the
> captcha, "download file in regular mode by browser"
>
> there is pugi_sparc.txt in the 7z which describes how to start,use and
> what is installed in the image
>
> qemu runs natively under a ubuntu 15.04 (x64), Core i7, 8GB system doing
> nothing but qemu
>

Since the guest g++ performance problems are caused by MMU emulation,
I think the fastest solution at the moment would be to use user-mode
emulation instead of full system emulation. You can try
mounting your Debian disk image with guestfish (or nbd) on your Ubuntu
host and chrooting into it with a statically built qemu-sparc32plus (for
the released Debian/sparc) or a statically built qemu-sparc64 (for the
unreleased Debian/sparc64), as described in [1] and [2]. I haven't
tried launching g++, but some /bin utilities used to work
with qemu-sparc32plus, at least back in 2011 [2].
NB: I think mixing sparc32plus and sparc64 binaries would not work,
but that should not be a problem, since the userspace of the released
Debian/sparc is pure sparc32plus and the userspace of the unreleased
Debian/sparc64 is pure sparc64.
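
The chroot route described above can be sketched as a short dry-run
script. This is only a sketch under assumptions, not a tested recipe: the
image name, the mount point, the /dev/nbd0p1 root partition and the
/usr/bin/qemu-sparc64-static path are all hypothetical, and the host is
assumed to have binfmt_misc already set up so that SPARC binaries are
handed to the copied emulator. The script only prints the commands; each
of them would need root to actually run.

```python
import shlex


def chroot_commands(image, mountpoint="/mnt/sparc", nbd="/dev/nbd0",
                    partition="p1",
                    qemu_static="/usr/bin/qemu-sparc64-static"):
    """Return the host commands (as argv lists) for entering the guest userland.

    All default paths are assumptions for illustration; adjust them to the
    actual image layout and to where the static emulator is installed.
    """
    return [
        # Load the network block device driver with partition support.
        ["modprobe", "nbd", "max_part=8"],
        # Expose the qcow2 image as a block device on the host.
        ["qemu-nbd", "--connect=" + nbd, image],
        # Mount the (assumed) root partition of the guest.
        ["mount", nbd + partition, mountpoint],
        # Make the static user-mode emulator visible inside the chroot.
        ["cp", qemu_static, mountpoint + "/usr/bin/"],
        # Enter the guest userland; guest binaries now run under the
        # user-mode emulator instead of a fully emulated machine.
        ["chroot", mountpoint, "/bin/sh"],
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for argv in chroot_commands("debian-sparc64.qcow2"):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

For the released Debian/sparc userspace the emulator path would point at a
static qemu-sparc32plus build instead. After leaving the chroot, the image
should be unmounted and detached again (umount, then qemu-nbd --disconnect)
before it is booted under qemu-system-sparc64.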

Artyom

1. https://wiki.debian.org/QemuUserEmulation
2. http://tyom.blogspot.de/2011/07/user-mode-emulation-for-linuxsparc64.html


-- 
Regards,
Artyom Tarasenko

SPARC and PPC PReP under qemu blog: http://tyom.blogspot.com/search/label/qemu

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-08-26 19:47                                           ` Richard Henderson
  2015-08-27  5:54                                             ` Dennis Luehring
@ 2015-08-27 15:58                                             ` Artyom Tarasenko
  1 sibling, 0 replies; 80+ messages in thread
From: Artyom Tarasenko @ 2015-08-27 15:58 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Dennis Luehring, qemu-devel, Aurelien Jarno

On Wed, Aug 26, 2015 at 9:47 PM, Richard Henderson <rth@twiddle.net> wrote:
> On 08/26/2015 09:17 AM, Artyom Tarasenko wrote:
>>
>> After some debugging I think it's caused by memory faults. On every
>> MMU miss / access fault the
>> TB is re-translated multiple times until the faulting instruction is found.
>
>
> That shouldn't happen.  Are you certain it's not multiple MMU misses/faults?

You are right. These are multiple faults.

>> AFAICT we produce data/access faults only on load/store instructions, i.e.
>> if GET_FIELD(insn, 0, 1)  == 3. Can this knowledge be used to reduce
>> the number of re-translations?
>
>
> No.
>
> From the fault, we have a host address where the fault occurred.  We then
> retranslate the TB looking for what guest address corresponds to the code
> generated at the host address.  This is a one-pass process, not the multiple
> passes you seem to be imagining.  It also means we can't skip non-memory
> insns during retranslation, as the host addresses would no longer line up.

Right, thanks for clarifying this. I was confused by the multiple
"Search PC..." messages.
But I see now that one message corresponds to one translated instruction,
not to one translation block. The log message should probably be moved
one level up. By the way, it's not SPARC-specific: this part seems to
have been copy-pasted into multiple targets.
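
The one-pass lookup described in the quoted explanation can be modelled
with a toy example. This is a simplified sketch, not QEMU's code: the
per-instruction host code sizes are made up, find_guest_pc is a
hypothetical helper, and get_field only mirrors the bit numbering (bit 0
is the most significant bit) implied by the GET_FIELD(insn, 0, 1) == 3
condition quoted above.

```python
def get_field(insn, frm, to):
    """Extract bits frm..to of a 32-bit word, counting bit 0 as the MSB."""
    return (insn >> (31 - to)) & ((1 << (to - frm + 1)) - 1)


def is_memory_insn(insn):
    # On SPARC, op == 3 in the top two bits marks the load/store format.
    return get_field(insn, 0, 1) == 3


def find_guest_pc(tb_start_pc, host_sizes, fault_offset):
    """Map a host code offset inside one block back to a guest PC.

    host_sizes[i] is the number of host bytes generated for the i-th
    guest instruction. The block is walked once, in order, and the walk
    stops at the first instruction whose host code covers the offset.
    """
    host_end = 0
    for index, size in enumerate(host_sizes):
        host_end += size
        if fault_offset < host_end:
            # SPARC instructions are 4 bytes wide.
            return tb_start_pc + 4 * index
    raise ValueError("fault offset lies outside this translation block")


# Hypothetical block: add, ld, nop, ld, with made-up host code sizes.
insns = [0x86004002, 0xC4004000, 0x01000000, 0xC4004000]
host_sizes = [6, 20, 2, 20]

guest_pc = find_guest_pc(0x1000, host_sizes, 22)
print(hex(guest_pc))                                  # 0x1004
print(is_memory_insn(insns[(guest_pc - 0x1000) // 4]))  # True
```

In this model the host offset only makes sense against the sizes of all
instructions in the block. Dropping the add and nop entries from
host_sizes would shift every later offset, so the same fault would be
attributed to the second load instead of the first, which is why the
non-memory instructions cannot be skipped during the search.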

> That said, sun4u is a software-managed TLB, which requires *lots* more extra
> faults than a hardware-managed TLB.  In the latter case, we can perform the
> page table lookup and then continue the memory instruction without faulting.
>
> I think that implementing sun4v, with (most of) the hypervisor actually
> within qemu, is the only way to get good performance for Sparc.

I thought about it. The guest can profit in multiple ways from knowing
that it is running virtualized.

The problem with this approach is that Linux/sparc64 is currently not
the primary target OS for me.
And the legacy OSes do not support sun4v. Even Solaris 8 has quite
limited support for it.

Artyom

-- 
Regards,
Artyom Tarasenko

SPARC and PPC PReP under qemu blog: http://tyom.blogspot.com/search/label/qemu

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
  2015-08-27 15:29 ` Artyom Tarasenko
@ 2015-09-02  4:34   ` Dennis Luehring
  0 siblings, 0 replies; 80+ messages in thread
From: Dennis Luehring @ 2015-09-02  4:34 UTC (permalink / raw)
  To: Artyom Tarasenko; +Cc: qemu-devel, Aurelien Jarno, Richard Henderson

I will try that, thanks.

On 27.08.2015 at 17:29, Artyom Tarasenko wrote:
> On Tue, Jul 28, 2015 at 9:52 AM, Dennis Luehring <dl.soluz@gmx.net> wrote:
> > (i've posted the question already on qemu-discuss@nongnu.org but was told
> > to better use this mailing list)
> >
> > i've prepared a Debian 7.8.0 image for SPARC64/qemu emulation for C/C++
> > development before-real-hardware big-endian/unaligned tests
> >
> > i've benchmarked compiling a single pugixml.cpp
> > (https://github.com/zeux/pugixml/blob/master/src/pugixml.cpp)
> >
> > qemu-system-sparc64: >180 sec
> > x64 native : ~ 2 sec
> >
> > so my sparc64 emulation is around 90 times slower than native x64
> >
> > my system:
> >
> > using latest qemu git 2.3.x, with virtio for harddisk/network and qcow2
> > image
> >
> > https://depositfiles.com/files/sj20aqwp0 (~280MB):
> > press the "regular download" button, wait some seconds, solve the
> > captcha, "download file in regular mode by browser"
> >
> > there is pugi_sparc.txt in the 7z which describes how to start,use and
> > what is installed in the image
> >
> > qemu runs natively under a ubuntu 15.04 (x64), Core i7, 8GB system doing
> > nothing but qemu
> >
>
> Since the guest g++ performance problems are caused by MMU emulation,
> I think the fastest solution at the moment would be to use user-mode
> emulation instead of full system emulation. You can try
> mounting your Debian disk image with guestfish (or nbd) on your Ubuntu
> host and chrooting into it with a statically built qemu-sparc32plus (for
> the released Debian/sparc) or a statically built qemu-sparc64 (for the
> unreleased Debian/sparc64), as described in [1] and [2]. I haven't
> tried launching g++, but some /bin utilities used to work
> with qemu-sparc32plus, at least back in 2011 [2].
> NB: I think mixing sparc32plus and sparc64 binaries would not work,
> but that should not be a problem, since the userspace of the released
> Debian/sparc is pure sparc32plus and the userspace of the unreleased
> Debian/sparc64 is pure sparc64.
>
> Artyom
>
> 1. https://wiki.debian.org/QemuUserEmulation
> 2. http://tyom.blogspot.de/2011/07/user-mode-emulation-for-linuxsparc64.html
>
>

^ permalink raw reply	[flat|nested] 80+ messages in thread

end of thread, other threads:[~2015-09-02  4:34 UTC | newest]

Thread overview: 80+ messages
2015-07-28  7:52 [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation? Dennis Luehring
2015-07-28  9:54 ` Artyom Tarasenko
2015-07-29  6:20   ` Dennis Luehring
2015-07-29  8:23     ` Artyom Tarasenko
2015-07-29 15:01     ` Aurelien Jarno
2015-07-30  3:52       ` Dennis Luehring
2015-07-30  7:52         ` Aurelien Jarno
2015-07-30  8:16           ` Dennis Luehring
2015-07-30  8:42             ` Artyom Tarasenko
2015-07-30  8:55             ` Aurelien Jarno
2015-07-30  9:35               ` Artyom Tarasenko
2015-07-30 10:09                 ` Aurelien Jarno
2015-07-30 18:21                   ` Dennis Luehring
2015-07-30 15:50               ` Aurelien Jarno
2015-07-31 15:31                 ` Artyom Tarasenko
2015-07-31 15:43                   ` Aurelien Jarno
2015-08-02 13:11                     ` Mark Cave-Ayland
2015-08-03  8:31                     ` Artyom Tarasenko
2015-08-03  9:17                       ` Aurelien Jarno
2015-08-18  9:24                         ` Artyom Tarasenko
2015-08-18 17:55                           ` Richard Henderson
2015-08-19 10:41                             ` Artyom Tarasenko
2015-08-19 11:00                               ` Aurelien Jarno
2015-08-19 14:41                                 ` Artyom Tarasenko
2015-08-20  5:22                                   ` Dennis Luehring
2015-08-20 10:40                                     ` Artyom Tarasenko
2015-08-20 17:19                                   ` Richard Henderson
2015-08-21  4:32                                     ` Dennis Luehring
2015-08-21  5:49                                       ` Richard Henderson
2015-08-21  6:05                                         ` Dennis Luehring
2015-08-21 15:47                                           ` Richard Henderson
2015-08-21 16:13                                             ` Aurelien Jarno
2015-08-21 16:41                                             ` Dennis Luehring
2015-08-22 16:45                                     ` Artyom Tarasenko
2015-08-22 17:47                                       ` Dennis Luehring
2015-08-22 18:53                                         ` Artyom Tarasenko
2015-08-23 12:11                                           ` Dennis Luehring
2015-08-23  0:41                                       ` Richard Henderson
2015-08-26 16:17                                         ` Artyom Tarasenko
2015-08-26 19:47                                           ` Richard Henderson
2015-08-27  5:54                                             ` Dennis Luehring
2015-08-27 15:04                                               ` Richard Henderson
2015-08-27 15:58                                             ` Artyom Tarasenko
2015-08-17 11:32                     ` Dennis Luehring
2015-08-03  7:58               ` Dennis Luehring
2015-08-03 14:51               ` Dennis Luehring
2015-08-03 15:59                 ` Karel Gardas
2015-08-03 19:51                   ` Dennis Luehring
2015-08-06  9:00                     ` Karel Gardas
2015-08-06  9:21                       ` Dennis Luehring
2015-08-06  9:27                         ` Dennis Luehring
2015-08-06 12:50                           ` Karel Gardas
2015-08-06 16:35                             ` Dennis Luehring
2015-08-18  4:25                       ` Dennis Luehring
2015-08-18  8:19                         ` Aurelien Jarno
2015-08-18 10:39                           ` Dennis Luehring
2015-08-18 11:21                           ` Dennis Luehring
     [not found]                         ` <CAMO55fkcW1eOaZSz2MJgqZEP29pTuHvTLe0Kna5eHYfg7cFyPA@mail.gmail.com>
2015-08-19  4:28                           ` Dennis Luehring
2015-07-29  8:07   ` Dennis Luehring
2015-07-29 15:03     ` Aurelien Jarno
2015-07-29  9:17 ` Karel Gardas
2015-07-29 10:20   ` Dennis Luehring
2015-07-29 13:45     ` Karel Gardas
2015-07-29 15:13       ` Aurelien Jarno
2015-07-29 10:55   ` Dennis Luehring
2015-07-29 12:34     ` Karel Gardas
2015-07-29 12:38       ` Karel Gardas
2015-07-29 13:55       ` Dennis Luehring
2015-07-29 14:41         ` Karel Gardas
2015-07-30  3:47           ` Dennis Luehring
2015-07-30  7:12             ` Paolo Bonzini
2015-07-30  8:31               ` Artyom Tarasenko
2015-08-02 19:12                 ` Alex Bennée
2015-07-30  7:55             ` Aurelien Jarno
2015-08-17 14:19               ` Artyom Tarasenko
2015-08-17 15:40                 ` Richard Henderson
2015-08-17 16:25                   ` Artyom Tarasenko
2015-08-17 21:08                     ` Aurelien Jarno
2015-08-27 15:29 ` Artyom Tarasenko
2015-09-02  4:34   ` Dennis Luehring
