* Sudden slowdown of ARM emulation in master
@ 2020-02-25 23:07 Niek Linnenbank
  2020-02-26  8:41 ` Peter Maydell
  2020-02-26  9:19 ` Igor Mammedov
  0 siblings, 2 replies; 15+ messages in thread
From: Niek Linnenbank @ 2020-02-25 23:07 UTC (permalink / raw)
  To: Igor Mammedov, Paolo Bonzini
  Cc: Peter Maydell, qemu-arm, Philippe Mathieu-Daudé, QEMU Developers


Hello Igor and Paolo,

Just now I was working on some small fixes for the cubieboard machine and
rebasing my Allwinner H3 branches.
While doing some testing, I noticed that suddenly the machines were much
slower than before.
I only see this happening when I rebase to this commit:
   ca6155c0f2bd39b4b4162533be401c98bd960820 ("Merge tag 'patchew/
20200219160953.13771-1-imammedo@redhat.com' of
https://github.com/patchew-project/qemu into HEAD")

Also the avocado tests I'm running started to time out:

+ AVOCADO_ALLOW_LARGE_STORAGE=yes avocado --show=app,console run -t
machine:cubieboard tests/acceptance/boot_linux_console.py
...
(1/2)
tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_arm_cubieboard_initrd:
|console: Uncompressing Linux... done, booting the kernel.
|console: Booting Linux on physical CPU 0x0
console: Linux version 4.20.7-sunxi (root@armbian.com) (gcc version 7.2.1
20171011 (Linaro GCC 7.2-2017.11)) #5.75 SMP Fri Feb 8 09:02:10 CET 2019
console: CPU: ARMv7 Processor [410fc080] revision 0 (ARMv7), cr=50c5387d
console: CPU: PIPT / VIPT nonaliasing data cache, VIPT nonaliasing
instruction cache
console: OF: fdt: Machine model: Cubietech Cubieboard
...
INTERRUPTED: Test interrupted by SIGTERM\nRunner error occurred: Timeout
reached\n
Original status: ERROR\n{'name':
'1-tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_arm_cubieboard_initrd',
'logdir': '/home/me/avocado/job-results/job-2020-02-25T23.58-d43884...
(90.41 s)
...
console: random: crng init done
/console: mount: mounting devtmpfs on /dev failed: Device or resource busy
-console: EXT4-fs (sda): re-mounted. Opts:
block_validity,barrier,user_xattr,acl
/console: Starting logging: OK
INTERRUPTED: Test interrupted by SIGTERM\nRunner error occurred: Timeout
reached\nOriginal status: ERROR\n{'name':
'2-tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_arm_cubieboard_sata',
'logdir': '/home/fox/avocado/job-results/job-2020-02-25T23.58-d438849/...
(90.53 s)
RESULTS    : PASS 0 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 2 |
CANCEL 0
JOB TIME   : 181.22 s
 ....

Have you noticed a similar performance change?
Do you have any idea whether something changed here that could
cause a slowdown?

Regards,
Niek


-- 
Niek Linnenbank


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Sudden slowdown of ARM emulation in master
  2020-02-25 23:07 Sudden slowdown of ARM emulation in master Niek Linnenbank
@ 2020-02-26  8:41 ` Peter Maydell
  2020-02-26  8:44   ` Philippe Mathieu-Daudé
  2020-02-26  8:45   ` Howard Spoelstra
  2020-02-26  9:19 ` Igor Mammedov
  1 sibling, 2 replies; 15+ messages in thread
From: Peter Maydell @ 2020-02-26  8:41 UTC (permalink / raw)
  To: Niek Linnenbank
  Cc: Richard Henderson, QEMU Developers, qemu-arm, Paolo Bonzini,
	Igor Mammedov, Philippe Mathieu-Daudé

On Tue, 25 Feb 2020 at 23:08, Niek Linnenbank <nieklinnenbank@gmail.com> wrote:

> Just now I was working on some small fixes for the cubieboard machine and rebasing my Allwinner H3 branches.
> While doing some testing, I noticed that suddenly the machines were much slower than before.
> I only see this happening when I rebase to this commit:
>    ca6155c0f2bd39b4b4162533be401c98bd960820 ("Merge tag 'patchew/20200219160953.13771-1-imammedo@redhat.com' of https://github.com/patchew-project/qemu into HEAD")

Yeah, I noticed a slowdown yesterday as well, but haven't tracked it down
as yet. The first thing would be to do a git bisect to try to narrow
down what commit caused it.

thanks
-- PMM



* Re: Sudden slowdown of ARM emulation in master
  2020-02-26  8:41 ` Peter Maydell
@ 2020-02-26  8:44   ` Philippe Mathieu-Daudé
  2020-02-26  8:48     ` Peter Maydell
  2020-02-26  8:45   ` Howard Spoelstra
  1 sibling, 1 reply; 15+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-02-26  8:44 UTC (permalink / raw)
  To: Peter Maydell, Niek Linnenbank
  Cc: Igor Mammedov, qemu-arm, Richard Henderson, QEMU Developers,
	Paolo Bonzini

On 2/26/20 9:41 AM, Peter Maydell wrote:
> On Tue, 25 Feb 2020 at 23:08, Niek Linnenbank <nieklinnenbank@gmail.com> wrote:
> 
>> Just now I was working on some small fixes for the cubieboard machine and rebasing my Allwinner H3 branches.
>> While doing some testing, I noticed that suddenly the machines were much slower than before.
>> I only see this happening when I rebase to this commit:
>>     ca6155c0f2bd39b4b4162533be401c98bd960820 ("Merge tag 'patchew/20200219160953.13771-1-imammedo@redhat.com' of https://github.com/patchew-project/qemu into HEAD")
> 
> Yeah, I noticed a slowdown yesterday as well, but haven't tracked it down
> as yet. The first thing would be to do a git bisect to try to narrow
> down what commit caused it.

My guess: biggest chunk of memory is the DRAM, registered as "fast RAM" 
by QEMU, but the SoCs provide SRAM which is supposed to be faster. Not 
anymore with QEMU. And Linux tries to use the SRAM when possible.




* Re: Sudden slowdown of ARM emulation in master
  2020-02-26  8:41 ` Peter Maydell
  2020-02-26  8:44   ` Philippe Mathieu-Daudé
@ 2020-02-26  8:45   ` Howard Spoelstra
  2020-02-26  9:11     ` Igor Mammedov
  1 sibling, 1 reply; 15+ messages in thread
From: Howard Spoelstra @ 2020-02-26  8:45 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Richard Henderson, QEMU Developers, Niek Linnenbank, qemu-arm,
	Igor Mammedov, Paolo Bonzini, Philippe Mathieu-Daudé


On Wed, Feb 26, 2020 at 9:42 AM Peter Maydell <peter.maydell@linaro.org>
wrote:

> On Tue, 25 Feb 2020 at 23:08, Niek Linnenbank <nieklinnenbank@gmail.com>
> wrote:
>
> > Just now I was working on some small fixes for the cubieboard machine
> and rebasing my Allwinner H3 branches.
> > While doing some testing, I noticed that suddenly the machines were much
> slower than before.
> > I only see this happening when I rebase to this commit:
> >    ca6155c0f2bd39b4b4162533be401c98bd960820 ("Merge tag 'patchew/
> 20200219160953.13771-1-imammedo@redhat.com' of
> https://github.com/patchew-project/qemu into HEAD")
>
> Yeah, I noticed a slowdown yesterday as well, but haven't tracked it down
> as yet. The first thing would be to do a git bisect to try to narrow
> down what commit caused it.
>
> thanks
> -- PMM
>


Perhaps related? I noticed a slow down on qemu-system-ppc and tracked it
down here:
https://lists.nongnu.org/archive/html/qemu-devel/2020-02/msg07262.html

Best,
Howard



* Re: Sudden slowdown of ARM emulation in master
  2020-02-26  8:44   ` Philippe Mathieu-Daudé
@ 2020-02-26  8:48     ` Peter Maydell
  2020-02-26  8:51       ` Philippe Mathieu-Daudé
  0 siblings, 1 reply; 15+ messages in thread
From: Peter Maydell @ 2020-02-26  8:48 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: Richard Henderson, QEMU Developers, Niek Linnenbank, qemu-arm,
	Paolo Bonzini, Igor Mammedov

On Wed, 26 Feb 2020 at 08:44, Philippe Mathieu-Daudé <philmd@redhat.com> wrote:
>
> On 2/26/20 9:41 AM, Peter Maydell wrote:
> > On Tue, 25 Feb 2020 at 23:08, Niek Linnenbank <nieklinnenbank@gmail.com> wrote:
> >
> >> Just now I was working on some small fixes for the cubieboard machine and rebasing my Allwinner H3 branches.
> >> While doing some testing, I noticed that suddenly the machines were much slower than before.
> >> I only see this happening when I rebase to this commit:
> >>     ca6155c0f2bd39b4b4162533be401c98bd960820 ("Merge tag 'patchew/20200219160953.13771-1-imammedo@redhat.com' of https://github.com/patchew-project/qemu into HEAD")
> >
> > Yeah, I noticed a slowdown yesterday as well, but haven't tracked it down
> > as yet. The first thing would be to do a git bisect to try to narrow
> > down what commit caused it.
>
> My guess: biggest chunk of memory is the DRAM, registered as "fast RAM"
> by QEMU, but the SoCs provide SRAM which is supposed to be faster. Not
> anymore with QEMU. And Linux tries to use the SRAM when possible.

Doesn't sound very likely to me: generally Linux doesn't use random small
lumps of SRAM, it just goes for whatever the dtb says is the main RAM,
usually DRAM. And I thought that all RAM blocks within QEMU performed
the same?

From the commit that Howard tracked down as the cause, it looks like
an ordering-of-actions issue in vl.c: something that assumed the
memory-size-related state was already set up is now running before those
variables/fields are set correctly, rather than after?

thanks
-- PMM



* Re: Sudden slowdown of ARM emulation in master
  2020-02-26  8:48     ` Peter Maydell
@ 2020-02-26  8:51       ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 15+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-02-26  8:51 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Richard Henderson, QEMU Developers, Niek Linnenbank, qemu-arm,
	Paolo Bonzini, Igor Mammedov

On 2/26/20 9:48 AM, Peter Maydell wrote:
> On Wed, 26 Feb 2020 at 08:44, Philippe Mathieu-Daudé <philmd@redhat.com> wrote:
>>
>> On 2/26/20 9:41 AM, Peter Maydell wrote:
>>> On Tue, 25 Feb 2020 at 23:08, Niek Linnenbank <nieklinnenbank@gmail.com> wrote:
>>>
>>>> Just now I was working on some small fixes for the cubieboard machine and rebasing my Allwinner H3 branches.
>>>> While doing some testing, I noticed that suddenly the machines were much slower than before.
>>>> I only see this happening when I rebase to this commit:
>>>>      ca6155c0f2bd39b4b4162533be401c98bd960820 ("Merge tag 'patchew/20200219160953.13771-1-imammedo@redhat.com' of https://github.com/patchew-project/qemu into HEAD")
>>>
>>> Yeah, I noticed a slowdown yesterday as well, but haven't tracked it down
>>> as yet. The first thing would be to do a git bisect to try to narrow
>>> down what commit caused it.
>>
>> My guess: biggest chunk of memory is the DRAM, registered as "fast RAM"
>> by QEMU, but the SoCs provide SRAM which is supposed to be faster. Not
>> anymore with QEMU. And Linux tries to use the SRAM when possible.
> 
> Doesn't sound very likely to me: generally Linux doesn't use random small
> lumps of SRAM, it just goes for whatever the dtb says is the main RAM,
> usually DRAM. And I thought that all RAM blocks within QEMU performed
> the same?
> 
>  From the commit that Howard tracked down as the cause it looks like
> an ordering-of-actions issue in vl.c where something that was assuming
> memory-size-related stuff was set up is now running before those
> variables/fields are set correctly, rather than after?

Yes, I just saw Howard's email and was about to reply to my own message 
with "I'm probably wrong, since the mac99 machine doesn't use SRAM".




* Re: Sudden slowdown of ARM emulation in master
  2020-02-26  8:45   ` Howard Spoelstra
@ 2020-02-26  9:11     ` Igor Mammedov
  0 siblings, 0 replies; 15+ messages in thread
From: Igor Mammedov @ 2020-02-26  9:11 UTC (permalink / raw)
  To: Howard Spoelstra
  Cc: Peter Maydell, Richard Henderson, QEMU Developers,
	Niek Linnenbank, qemu-arm, Paolo Bonzini,
	Philippe Mathieu-Daudé

On Wed, 26 Feb 2020 09:45:48 +0100
Howard Spoelstra <hsp.cat7@gmail.com> wrote:

> On Wed, Feb 26, 2020 at 9:42 AM Peter Maydell <peter.maydell@linaro.org>
> wrote:
> 
> > On Tue, 25 Feb 2020 at 23:08, Niek Linnenbank <nieklinnenbank@gmail.com>
> > wrote:
> >  
> > > Just now I was working on some small fixes for the cubieboard machine  
> > and rebasing my Allwinner H3 branches.  
> > > While doing some testing, I noticed that suddenly the machines were much  
> > slower than before.  
> > > I only see this happening when I rebase to this commit:
> > >    ca6155c0f2bd39b4b4162533be401c98bd960820 ("Merge tag 'patchew/  
> > 20200219160953.13771-1-imammedo@redhat.com' of
> > https://github.com/patchew-project/qemu into HEAD")
> >
> > Yeah, I noticed a slowdown yesterday as well, but haven't tracked it down
> > as yet. The first thing would be to do a git bisect to try to narrow
> > down what commit caused it.
> >
> > thanks
> > -- PMM
> >  
> 
> 
> Perhaps related? I noticed a slow down on qemu-system-ppc and tracked it
> down here:
> https://lists.nongnu.org/archive/html/qemu-devel/2020-02/msg07262.html

There might be some implicit dependency on ram_size in TCG.
I'm looking into what it could be.

> Best,
> Howard




* Re: Sudden slowdown of ARM emulation in master
  2020-02-25 23:07 Sudden slowdown of ARM emulation in master Niek Linnenbank
  2020-02-26  8:41 ` Peter Maydell
@ 2020-02-26  9:19 ` Igor Mammedov
  2020-02-26  9:32   ` Howard Spoelstra
  2020-02-26 10:03   ` Peter Maydell
  1 sibling, 2 replies; 15+ messages in thread
From: Igor Mammedov @ 2020-02-26  9:19 UTC (permalink / raw)
  To: Niek Linnenbank
  Cc: Peter Maydell, QEMU Developers, qemu-arm, Howard Spoelstra,
	Paolo Bonzini, Philippe Mathieu-Daudé

On Wed, 26 Feb 2020 00:07:55 +0100
Niek Linnenbank <nieklinnenbank@gmail.com> wrote:

> Hello Igor and Paolo,

does the following hack solve the issue?

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index a08ab11f65..ab2448c5aa 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -944,7 +944,7 @@ static inline size_t size_code_gen_buffer(size_t tb_size)
         /* ??? If we relax the requirement that CONFIG_USER_ONLY use the
            static buffer, we could size this on RESERVED_VA, on the text
            segment size of the executable, or continue to use the default.  */
-        tb_size = (unsigned long)(ram_size / 4);
+        tb_size = MAX_CODE_GEN_BUFFER_SIZE;
 #endif
     }
     if (tb_size < MIN_CODE_GEN_BUFFER_SIZE) {


> 
> Just now I was working on some small fixes for the cubieboard machine and
> rebasing my Allwinner H3 branches.
> While doing some testing, I noticed that suddenly the machines were much
> slower than before.
> I only see this happening when I rebase to this commit:
>    ca6155c0f2bd39b4b4162533be401c98bd960820 ("Merge tag 'patchew/
> 20200219160953.13771-1-imammedo@redhat.com' of
> https://github.com/patchew-project/qemu into HEAD")
> 
> Also the avocado tests I'm running started to time out:
> 
> + AVOCADO_ALLOW_LARGE_STORAGE=yes avocado --show=app,console run -t
> machine:cubieboard tests/acceptance/boot_linux_console.py
> ...
> (1/2)
> tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_arm_cubieboard_initrd:
> |console: Uncompressing Linux... done, booting the kernel.
> |console: Booting Linux on physical CPU 0x0
> console: Linux version 4.20.7-sunxi (root@armbian.com) (gcc version 7.2.1
> 20171011 (Linaro GCC 7.2-2017.11)) #5.75 SMP Fri Feb 8 09:02:10 CET 2019
> console: CPU: ARMv7 Processor [410fc080] revision 0 (ARMv7), cr=50c5387d
> console: CPU: PIPT / VIPT nonaliasing data cache, VIPT nonaliasing
> instruction cache
> console: OF: fdt: Machine model: Cubietech Cubieboard
> ...
> INTERRUPTED: Test interrupted by SIGTERM\nRunner error occurred: Timeout
> reached\n
> Original status: ERROR\n{'name':
> '1-tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_arm_cubieboard_initrd',
> 'logdir': '/home/me/avocado/job-results/job-2020-02-25T23.58-d43884...
> (90.41 s)
> ...
> console: random: crng init done
> /console: mount: mounting devtmpfs on /dev failed: Device or resource busy
> -console: EXT4-fs (sda): re-mounted. Opts:
> block_validity,barrier,user_xattr,acl
> /console: Starting logging: OK
> INTERRUPTED: Test interrupted by SIGTERM\nRunner error occurred: Timeout
> reached\nOriginal status: ERROR\n{'name':
> '2-tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_arm_cubieboard_sata',
> 'logdir': '/home/fox/avocado/job-results/job-2020-02-25T23.58-d438849/...
> (90.53 s)
> RESULTS    : PASS 0 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 2 |
> CANCEL 0
> JOB TIME   : 181.22 s
>  ....
> 
> Have you noticed a similar performance change?
> Do you have any idea whether something changed here that could
> cause a slowdown?
> 
> Regards,
> Niek
> 
> 




* Re: Sudden slowdown of ARM emulation in master
  2020-02-26  9:19 ` Igor Mammedov
@ 2020-02-26  9:32   ` Howard Spoelstra
  2020-02-26 10:14     ` Igor Mammedov
  2020-02-26 10:03   ` Peter Maydell
  1 sibling, 1 reply; 15+ messages in thread
From: Howard Spoelstra @ 2020-02-26  9:32 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Peter Maydell, QEMU Developers, Niek Linnenbank, qemu-arm,
	Paolo Bonzini, Philippe Mathieu-Daudé


On Wed, Feb 26, 2020 at 10:19 AM Igor Mammedov <imammedo@redhat.com> wrote:

> On Wed, 26 Feb 2020 00:07:55 +0100
> Niek Linnenbank <nieklinnenbank@gmail.com> wrote:
>
> > Hello Igor and Paolo,
>
> does the following hack solve the issue?
>
> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index a08ab11f65..ab2448c5aa 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -944,7 +944,7 @@ static inline size_t size_code_gen_buffer(size_t
> tb_size)
>          /* ??? If we relax the requirement that CONFIG_USER_ONLY use the
>             static buffer, we could size this on RESERVED_VA, on the text
>             segment size of the executable, or continue to use the
> default.  */
> -        tb_size = (unsigned long)(ram_size / 4);
> +        tb_size = MAX_CODE_GEN_BUFFER_SIZE;
>  #endif
>      }
>      if (tb_size < MIN_CODE_GEN_BUFFER_SIZE) {
>
>

Nice, for me, that brings qemu-system-ppc back up to speed. (applied to
ppc-for-5.0)

Best,
Howard



* Re: Sudden slowdown of ARM emulation in master
  2020-02-26  9:19 ` Igor Mammedov
  2020-02-26  9:32   ` Howard Spoelstra
@ 2020-02-26 10:03   ` Peter Maydell
  2020-02-26 10:36     ` Mark Cave-Ayland
  2020-02-26 14:13     ` Alex Bennée
  1 sibling, 2 replies; 15+ messages in thread
From: Peter Maydell @ 2020-02-26 10:03 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Richard Henderson, QEMU Developers, Niek Linnenbank, qemu-arm,
	Howard Spoelstra, Paolo Bonzini, Philippe Mathieu-Daudé

On Wed, 26 Feb 2020 at 09:19, Igor Mammedov <imammedo@redhat.com> wrote:
>
> On Wed, 26 Feb 2020 00:07:55 +0100
> Niek Linnenbank <nieklinnenbank@gmail.com> wrote:
>
> > Hello Igor and Paolo,
>
> does the following hack solve the issue?
>
> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index a08ab11f65..ab2448c5aa 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -944,7 +944,7 @@ static inline size_t size_code_gen_buffer(size_t tb_size)
>          /* ??? If we relax the requirement that CONFIG_USER_ONLY use the
>             static buffer, we could size this on RESERVED_VA, on the text
>             segment size of the executable, or continue to use the default.  */
> -        tb_size = (unsigned long)(ram_size / 4);
> +        tb_size = MAX_CODE_GEN_BUFFER_SIZE;
>  #endif
>      }
>      if (tb_size < MIN_CODE_GEN_BUFFER_SIZE) {

Cc'ing Richard to ask: does it still make sense for TCG
to pick a codegen buffer size based on the guest RAM size?
(We should fix the regression anyway, but it surprised me
slightly to find a config detail of the guest machine being
used here.)

thanks
-- PMM



* Re: Sudden slowdown of ARM emulation in master
  2020-02-26  9:32   ` Howard Spoelstra
@ 2020-02-26 10:14     ` Igor Mammedov
  0 siblings, 0 replies; 15+ messages in thread
From: Igor Mammedov @ 2020-02-26 10:14 UTC (permalink / raw)
  To: Howard Spoelstra
  Cc: Peter Maydell, QEMU Developers, Niek Linnenbank, qemu-arm,
	Paolo Bonzini, Philippe Mathieu-Daudé

On Wed, 26 Feb 2020 10:32:38 +0100
Howard Spoelstra <hsp.cat7@gmail.com> wrote:

> On Wed, Feb 26, 2020 at 10:19 AM Igor Mammedov <imammedo@redhat.com> wrote:
> 
> > On Wed, 26 Feb 2020 00:07:55 +0100
> > Niek Linnenbank <nieklinnenbank@gmail.com> wrote:
> >  
> > > Hello Igor and Paolo,  
> >
> > does the following hack solve the issue?
> >
> > diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> > index a08ab11f65..ab2448c5aa 100644
> > --- a/accel/tcg/translate-all.c
> > +++ b/accel/tcg/translate-all.c
> > @@ -944,7 +944,7 @@ static inline size_t size_code_gen_buffer(size_t
> > tb_size)
> >          /* ??? If we relax the requirement that CONFIG_USER_ONLY use the
> >             static buffer, we could size this on RESERVED_VA, on the text
> >             segment size of the executable, or continue to use the
> > default.  */
> > -        tb_size = (unsigned long)(ram_size / 4);
> > +        tb_size = MAX_CODE_GEN_BUFFER_SIZE;
> >  #endif
> >      }
> >      if (tb_size < MIN_CODE_GEN_BUFFER_SIZE) {
> >
> >  
> 
> Nice, for me, that brings qemu-system-ppc back up to speed. (applied to
> ppc-for-5.0)

thanks for confirming.

My patch a1b18df9a4 'vl.c: move -m parsing after memory backends has been processed'
moved ram_size parsing after accelerator init, but TCG allocates its
buffer based on the global ram_size, and since ram_size is still 0 at
that point it falls back to MIN_CODE_GEN_BUFFER_SIZE (see
size_code_gen_buffer); if ram_size were too large it would cap the
buffer at MAX_CODE_GEN_BUFFER_SIZE.

 *-user doesn't use ram_size; it uses DEFAULT_CODE_GEN_BUFFER_SIZE
and a static buffer, so it's not affected.

For softmmu it should be possible to postpone the buffer allocation
until accel_setup_post(current_machine) time and fetch ram_size
from the current machine, dropping the random access to the global
variable. That would put the buffer allocation after ram_size is parsed.

Does that look like a feasible approach?

> 
> Best,
> Howard




* Re: Sudden slowdown of ARM emulation in master
  2020-02-26 10:03   ` Peter Maydell
@ 2020-02-26 10:36     ` Mark Cave-Ayland
  2020-02-26 14:13     ` Alex Bennée
  1 sibling, 0 replies; 15+ messages in thread
From: Mark Cave-Ayland @ 2020-02-26 10:36 UTC (permalink / raw)
  To: Peter Maydell, Igor Mammedov
  Cc: Richard Henderson, QEMU Developers, Niek Linnenbank, qemu-arm,
	Howard Spoelstra, Paolo Bonzini, Philippe Mathieu-Daudé

On 26/02/2020 10:03, Peter Maydell wrote:

>> On Wed, 26 Feb 2020 00:07:55 +0100
>> Niek Linnenbank <nieklinnenbank@gmail.com> wrote:
>>
>>> Hello Igor and Paolo,
>>
>> does the following hack solve the issue?
>>
>> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
>> index a08ab11f65..ab2448c5aa 100644
>> --- a/accel/tcg/translate-all.c
>> +++ b/accel/tcg/translate-all.c
>> @@ -944,7 +944,7 @@ static inline size_t size_code_gen_buffer(size_t tb_size)
>>          /* ??? If we relax the requirement that CONFIG_USER_ONLY use the
>>             static buffer, we could size this on RESERVED_VA, on the text
>>             segment size of the executable, or continue to use the default.  */
>> -        tb_size = (unsigned long)(ram_size / 4);
>> +        tb_size = MAX_CODE_GEN_BUFFER_SIZE;
>>  #endif
>>      }
>>      if (tb_size < MIN_CODE_GEN_BUFFER_SIZE) {
> 
> Cc'ing Richard to ask: does it still make sense for TCG
> to pick a codegen buffer size based on the guest RAM size?
> (We should fix the regression anyway, but it surprised me
> slightly to find a config detail of the guest machine being
> used here.)

FWIW the NetBSD guys have been running their QEMU-based CI for some time now with an
extra -tb-size parameter to improve performance: http://gnats.netbsd.org/52184.


ATB,

Mark.



* Re: Sudden slowdown of ARM emulation in master
  2020-02-26 10:03   ` Peter Maydell
  2020-02-26 10:36     ` Mark Cave-Ayland
@ 2020-02-26 14:13     ` Alex Bennée
  2020-02-26 14:45       ` Igor Mammedov
  1 sibling, 1 reply; 15+ messages in thread
From: Alex Bennée @ 2020-02-26 14:13 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Howard Spoelstra, Richard Henderson, qemu-devel, Niek Linnenbank,
	qemu-arm, Paolo Bonzini, Igor Mammedov,
	Philippe Mathieu-Daudé


Peter Maydell <peter.maydell@linaro.org> writes:

> On Wed, 26 Feb 2020 at 09:19, Igor Mammedov <imammedo@redhat.com> wrote:
>>
>> On Wed, 26 Feb 2020 00:07:55 +0100
>> Niek Linnenbank <nieklinnenbank@gmail.com> wrote:
>>
>> > Hello Igor and Paolo,
>>
>> does the following hack solve the issue?
>>
>> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
>> index a08ab11f65..ab2448c5aa 100644
>> --- a/accel/tcg/translate-all.c
>> +++ b/accel/tcg/translate-all.c
>> @@ -944,7 +944,7 @@ static inline size_t size_code_gen_buffer(size_t tb_size)
>>          /* ??? If we relax the requirement that CONFIG_USER_ONLY use the
>>             static buffer, we could size this on RESERVED_VA, on the text
>>             segment size of the executable, or continue to use the default.  */
>> -        tb_size = (unsigned long)(ram_size / 4);
>> +        tb_size = MAX_CODE_GEN_BUFFER_SIZE;
>>  #endif
>>      }
>>      if (tb_size < MIN_CODE_GEN_BUFFER_SIZE) {
>
> Cc'ing Richard to ask: does it still make sense for TCG
> to pick a codegen buffer size based on the guest RAM size?

Arguably you would never get more than ram_size * tcg gen overhead of
active TBs at any one point although you can come up with pathological
patterns where only a subset of pages are flushed in and out at a time.

However the backing for the code is mmap'ed anyway so surely the kernel
can work out the kinks here. We will never allocate more than the code
generator can generate jumps for anyway.

Looking at the SoftMMU version of alloc_code_gen_buffer it looks like
everything now falls under the:

  # if defined(__PIE__) || defined(__PIC__)

leg so there is a bunch of code to be deleted there. The remaining
question is what to do for linux-user because there is a bit more logic
to deal with some corner cases on the static code generation buffer.

I'd be tempted to rename DEFAULT_CODE_GEN_BUFFER_SIZE to
SMALL_CODE_GEN_BUFFER_SIZE and only bother with a static allocation for
32 bit linux-user hosts. Otherwise why not default to
MAX_CODE_GEN_BUFFER_SIZE on 64 bit systems and let the kernel deal with
it?

> (We should fix the regression anyway, but it surprised me
> slightly to find a config detail of the guest machine being
> used here.)
>
> thanks
> -- PMM


-- 
Alex Bennée



* Re: Sudden slowdown of ARM emulation in master
  2020-02-26 14:13     ` Alex Bennée
@ 2020-02-26 14:45       ` Igor Mammedov
  2020-02-26 15:29         ` Alex Bennée
  0 siblings, 1 reply; 15+ messages in thread
From: Igor Mammedov @ 2020-02-26 14:45 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Peter Maydell, Richard Henderson, qemu-devel, Niek Linnenbank,
	qemu-arm, Howard Spoelstra, Paolo Bonzini,
	Philippe Mathieu-Daudé

On Wed, 26 Feb 2020 14:13:11 +0000
Alex Bennée <alex.bennee@linaro.org> wrote:

> Peter Maydell <peter.maydell@linaro.org> writes:
> 
> > On Wed, 26 Feb 2020 at 09:19, Igor Mammedov <imammedo@redhat.com> wrote:  
> >>
> >> On Wed, 26 Feb 2020 00:07:55 +0100
> >> Niek Linnenbank <nieklinnenbank@gmail.com> wrote:
> >>  
> >> > Hello Igor and Paolo,  
> >>
> >> does the following hack solve the issue?
> >>
> >> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> >> index a08ab11f65..ab2448c5aa 100644
> >> --- a/accel/tcg/translate-all.c
> >> +++ b/accel/tcg/translate-all.c
> >> @@ -944,7 +944,7 @@ static inline size_t size_code_gen_buffer(size_t tb_size)
> >>          /* ??? If we relax the requirement that CONFIG_USER_ONLY use the
> >>             static buffer, we could size this on RESERVED_VA, on the text
> >>             segment size of the executable, or continue to use the default.  */
> >> -        tb_size = (unsigned long)(ram_size / 4);
> >> +        tb_size = MAX_CODE_GEN_BUFFER_SIZE;
> >>  #endif
> >>      }
> >>      if (tb_size < MIN_CODE_GEN_BUFFER_SIZE) {  
> >
> > Cc'ing Richard to ask: does it still make sense for TCG
> > to pick a codegen buffer size based on the guest RAM size?  
> 
> Arguably you would never get more than ram_size * tcg gen overhead of
> active TBs at any one point although you can come up with pathological
> patterns where only a subset of pages are flushed in and out at a time.
> 
> However the backing for the code is mmap'ed anyway so surely the kernel
> can work out the kinks here. We will never allocate more than the code
> generator can generate jumps for anyway.
> 
> Looking at the SoftMMU version of alloc_code_gen_buffer it looks like
> everything now falls under the:
> 
>   # if defined(__PIE__) || defined(__PIC__)
> 
> leg so there is a bunch of code to be deleted there. The remaining
> question is what to do for linux-user because there is a bit more logic
> to deal with some corner cases on the static code generation buffer.
> 
> I'd be tempted to rename DEFAULT_CODE_GEN_BUFFER_SIZE to
> SMALL_CODE_GEN_BUFFER_SIZE and only bother with a static allocation for
> 32 bit linux-user hosts. Otherwise why not default to
> MAX_CODE_GEN_BUFFER_SIZE on 64 bit systems and let the kernel deal with
> it?

*-user calls
  tcg_exec_init(0);
which in the end results in
  DEFAULT_CODE_GEN_BUFFER_SIZE -> DEFAULT_CODE_GEN_BUFFER_SIZE_1

so for *-user cases we can just always call
   code_gen_alloc(DEFAULT_CODE_GEN_BUFFER_SIZE)

> > (We should fix the regression anyway, but it surprised me
> > slightly to find a config detail of the guest machine being
> > used here.)
> >
> > thanks
> > -- PMM  
> 
> 




* Re: Sudden slowdown of ARM emulation in master
  2020-02-26 14:45       ` Igor Mammedov
@ 2020-02-26 15:29         ` Alex Bennée
  0 siblings, 0 replies; 15+ messages in thread
From: Alex Bennée @ 2020-02-26 15:29 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Peter Maydell, Richard Henderson, qemu-devel, Niek Linnenbank,
	qemu-arm, Howard Spoelstra, Paolo Bonzini,
	Philippe Mathieu-Daudé


Igor Mammedov <imammedo@redhat.com> writes:

> On Wed, 26 Feb 2020 14:13:11 +0000
> Alex Bennée <alex.bennee@linaro.org> wrote:
>
>> Peter Maydell <peter.maydell@linaro.org> writes:
>> 
>> > On Wed, 26 Feb 2020 at 09:19, Igor Mammedov <imammedo@redhat.com> wrote:  
>> >>
>> >> On Wed, 26 Feb 2020 00:07:55 +0100
>> >> Niek Linnenbank <nieklinnenbank@gmail.com> wrote:
>> >>  
>> >> > Hello Igor and Paolo,  
>> >>
>> >> does the following hack solve the issue?
>> >>
>> >> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
>> >> index a08ab11f65..ab2448c5aa 100644
>> >> --- a/accel/tcg/translate-all.c
>> >> +++ b/accel/tcg/translate-all.c
>> >> @@ -944,7 +944,7 @@ static inline size_t size_code_gen_buffer(size_t tb_size)
>> >>          /* ??? If we relax the requirement that CONFIG_USER_ONLY use the
>> >>             static buffer, we could size this on RESERVED_VA, on the text
>> >>             segment size of the executable, or continue to use the default.  */
>> >> -        tb_size = (unsigned long)(ram_size / 4);
>> >> +        tb_size = MAX_CODE_GEN_BUFFER_SIZE;
>> >>  #endif
>> >>      }
>> >>      if (tb_size < MIN_CODE_GEN_BUFFER_SIZE) {  
>> >
>> > Cc'ing Richard to ask: does it still make sense for TCG
>> > to pick a codegen buffer size based on the guest RAM size?  
>> 
>> Arguably you would never get more than ram_size * tcg gen overhead of
>> active TBs at any one point although you can come up with pathological
>> patterns where only a subset of pages are flushed in and out at a time.
>> 
>> However the backing for the code is mmap'ed anyway so surely the kernel
>> can work out the kinks here. We will never allocate more than the code
>> generator can generate jumps for anyway.
>> 
>> Looking at the SoftMMU version of alloc_code_gen_buffer it looks like
>> everything now falls under the:
>> 
>>   # if defined(__PIE__) || defined(__PIC__)
>> 
>> leg so there is a bunch of code to be deleted there. The remaining
>> question is what to do for linux-user because there is a bit more logic
>> to deal with some corner cases on the static code generation buffer.
>> 
>> I'd be tempted to rename DEFAULT_CODE_GEN_BUFFER_SIZE to
>> SMALL_CODE_GEN_BUFFER_SIZE and only bother with a static allocation for
>> 32 bit linux-user hosts. Otherwise why not default to
>> MAX_CODE_GEN_BUFFER_SIZE on 64 bit systems and let the kernel deal with
>> it?
>
> *-user calls
>   tcg_exec_init(0);
> which in the end results in
>   DEFAULT_CODE_GEN_BUFFER_SIZE -> DEFAULT_CODE_GEN_BUFFER_SIZE_1
>
> so for *-user cases we can just always call
>    code_gen_alloc(DEFAULT_CODE_GEN_BUFFER_SIZE)
<snip>

I've gone for a variation of that, coming to a mailing list near you
real soon now ;-)

-- 
Alex Bennée


