* Re: [PATCH v2 0/6] KVM: arm64: VCPU preempted check support
From: Marc Zyngier @ 2020-01-15 14:14 UTC (permalink / raw)
To: Will Deacon
Cc: mark.rutland, daniel.lezcano, kvm, linux-doc, peterz,
catalin.marinas, suzuki.poulose, linux-kernel, virtualization,
Zengruan Ye, james.morse, julien.thierry.kdev, linux,
steven.price, kvmarm, linux-arm-kernel
On 2020-01-13 12:12, Will Deacon wrote:
> [+PeterZ]
>
> On Thu, Dec 26, 2019 at 09:58:27PM +0800, Zengruan Ye wrote:
>> This patch set aims to support the vcpu_is_preempted() functionality
>> under KVM/arm64, which allows the guest to determine whether a VCPU is
>> currently running or not. This will enhance lock performance on
>> overcommitted hosts (more runnable VCPUs than physical CPUs in the
>> system) as doing busy waits for preempted VCPUs will hurt system
>> performance far worse than early yielding.
>>
>> We have observed some performance improvements in UnixBench tests.
>>
>> unix benchmark result:
>> host: kernel 5.5.0-rc1, HiSilicon Kunpeng920, 8 CPUs
>> guest: kernel 5.5.0-rc1, 16 VCPUs
>>
>> test-case                               | after-patch       | before-patch
>> ----------------------------------------+-------------------+------------------
>> Dhrystone 2 using register variables    | 334600751.0 lps   | 335319028.3 lps
>> Double-Precision Whetstone              | 32856.1 MWIPS     | 32849.6 MWIPS
>> Execl Throughput                        | 3662.1 lps        | 2718.0 lps
>> File Copy 1024 bufsize 2000 maxblocks   | 432906.4 KBps     | 158011.8 KBps
>> File Copy 256 bufsize 500 maxblocks     | 116023.0 KBps     | 37664.0 KBps
>> File Copy 4096 bufsize 8000 maxblocks   | 1432769.8 KBps    | 441108.8 KBps
>> Pipe Throughput                         | 6405029.6 lps     | 6021457.6 lps
>> Pipe-based Context Switching            | 185872.7 lps      | 184255.3 lps
>> Process Creation                        | 4025.7 lps        | 3706.6 lps
>> Shell Scripts (1 concurrent)            | 6745.6 lpm        | 6436.1 lpm
>> Shell Scripts (8 concurrent)            | 998.7 lpm         | 931.1 lpm
>> System Call Overhead                    | 3913363.1 lps     | 3883287.8 lps
>> ----------------------------------------+-------------------+------------------
>> System Benchmarks Index Score           | 1835.1            | 1327.6
>
> Interesting, thanks for the numbers.
>
> So it looks like there is a decent improvement to be had from targeted vCPU
> wakeup, but I really dislike the explicit PV interface and it's already been
> shown to interact badly with the WFE-based polling in smp_cond_load_*().
>
> Rather than expose a divergent interface, I would instead like to explore an
> improvement to smp_cond_load_*() and see how that performs before we commit
> to something more intrusive. Marc and I looked at this very briefly in the
> past, and the basic idea is to register all of the WFE sites with the
> hypervisor, indicating which register contains the address being spun on
> and which register contains the "bad" value. That way, you don't bother
> rescheduling a vCPU if the value at the address is still bad, because you
> know it will exit immediately.
>
> Of course, the devil is in the details because when I say "address", that's
> a guest virtual address, so you need to play some tricks in the hypervisor
> so that you have a separate mapping for the lockword (it's enough to keep
> track of the physical address).
>
> Our hacks are here but we basically ran out of time to work on them beyond
> an unoptimised and hacky prototype:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/pvcy
>
> Marc -- how would you prefer to handle this?
Let me try and rebase this thing to a modern kernel (I doubt it applies without
conflicts to mainline). We can then have a discussion about its merit on the list
once I post it. It'd be good to have a pointer to the benchmarks that have been
used here.
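As a rough illustration of the idea Will describes (registering each WFE site
with the hypervisor so that a vCPU is not rescheduled while the value it spins
on is still "bad"), here is a toy Python model. The names WfeSite and
worth_rescheduling, the register-index encoding, and the dict standing in for
guest memory are all assumptions made for this sketch; they are not taken from
the pvcy prototype. The guest-virtual-to-physical translation that Will flags
as the hard part is skipped entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WfeSite:
    """Hypothetical record a guest registers for one WFE polling site."""
    addr_reg: int  # index of the register holding the address being spun on
    bad_reg: int   # index of the register holding the "bad" value


def worth_rescheduling(site: WfeSite, regs: dict, memory: dict) -> bool:
    """Return True if the waiting vCPU would make progress when scheduled.

    `regs` maps register index -> value for the descheduled vCPU, and
    `memory` maps address -> current value (a stand-in for the lockword
    mapping the hypervisor would have to maintain).
    """
    addr = regs[site.addr_reg]
    bad = regs[site.bad_reg]
    # If the lockword still holds the "bad" value, the vCPU would just
    # execute WFE again and exit immediately, so scheduling it is wasted.
    return memory[addr] != bad


if __name__ == "__main__":
    site = WfeSite(addr_reg=0, bad_reg=1)
    regs = {0: 0x1000, 1: 1}   # spinning on address 0x1000 while it equals 1
    memory = {0x1000: 1}
    print(worth_rescheduling(site, regs, memory))  # False: value still bad
    memory[0x1000] = 0                             # lock holder releases
    print(worth_rescheduling(site, regs, memory))  # True: vCPU can progress
```

The point of the model is only that the decision needs three pieces of
information per site: where the address lives, where the "bad" value lives,
and a way for the hypervisor to read the lockword itself.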
Thanks,
M.
--
Jazz is not dead. It just smells funny...
* Re: [PATCH v2 0/6] KVM: arm64: VCPU preempted check support
From: yezengruan @ 2020-12-16 8:45 UTC (permalink / raw)
To: Marc Zyngier, Will Deacon
Cc: daniel.lezcano, kvm, linux-doc, peterz, catalin.marinas,
linux-kernel, virtualization, linux, steven.price, kvmarm,
linux-arm-kernel
On 2020/1/15 22:14, Marc Zyngier wrote:
> On 2020-01-13 12:12, Will Deacon wrote:
>> [+PeterZ]
>>
>> On Thu, Dec 26, 2019 at 09:58:27PM +0800, Zengruan Ye wrote:
>>> This patch set aims to support the vcpu_is_preempted() functionality
>>> under KVM/arm64, which allows the guest to determine whether a VCPU is
>>> currently running or not. This will enhance lock performance on
>>> overcommitted hosts (more runnable VCPUs than physical CPUs in the
>>> system) as doing busy waits for preempted VCPUs will hurt system
>>> performance far worse than early yielding.
>>>
>>> We have observed some performance improvements in UnixBench tests.
>>>
>>> unix benchmark result:
>>> host: kernel 5.5.0-rc1, HiSilicon Kunpeng920, 8 CPUs
>>> guest: kernel 5.5.0-rc1, 16 VCPUs
>>>
>>> test-case | after-patch | before-patch
>>> ----------------------------------------+-------------------+------------------
>>> Dhrystone 2 using register variables | 334600751.0 lps | 335319028.3 lps
>>> Double-Precision Whetstone | 32856.1 MWIPS | 32849.6 MWIPS
>>> Execl Throughput | 3662.1 lps | 2718.0 lps
>>> File Copy 1024 bufsize 2000 maxblocks | 432906.4 KBps | 158011.8 KBps
>>> File Copy 256 bufsize 500 maxblocks | 116023.0 KBps | 37664.0 KBps
>>> File Copy 4096 bufsize 8000 maxblocks | 1432769.8 KBps | 441108.8 KBps
>>> Pipe Throughput | 6405029.6 lps | 6021457.6 lps
>>> Pipe-based Context Switching | 185872.7 lps | 184255.3 lps
>>> Process Creation | 4025.7 lps | 3706.6 lps
>>> Shell Scripts (1 concurrent) | 6745.6 lpm | 6436.1 lpm
>>> Shell Scripts (8 concurrent) | 998.7 lpm | 931.1 lpm
>>> System Call Overhead | 3913363.1 lps | 3883287.8 lps
>>> ----------------------------------------+-------------------+------------------
>>> System Benchmarks Index Score | 1835.1 | 1327.6
>>
>> Interesting, thanks for the numbers.
>>
>> So it looks like there is a decent improvement to be had from targeted vCPU
>> wakeup, but I really dislike the explicit PV interface and it's already been
>> shown to interact badly with the WFE-based polling in smp_cond_load_*().
>>
>> Rather than expose a divergent interface, I would instead like to explore an
>> improvement to smp_cond_load_*() and see how that performs before we commit
>> to something more intrusive. Marc and I looked at this very briefly in the
>> past, and the basic idea is to register all of the WFE sites with the
>> hypervisor, indicating which register contains the address being spun on
>> and which register contains the "bad" value. That way, you don't bother
>> rescheduling a vCPU if the value at the address is still bad, because you
>> know it will exit immediately.
>>
>> Of course, the devil is in the details because when I say "address", that's
>> a guest virtual address, so you need to play some tricks in the hypervisor
>> so that you have a separate mapping for the lockword (it's enough to keep
>> track of the physical address).
>>
>> Our hacks are here but we basically ran out of time to work on them beyond
>> an unoptimised and hacky prototype:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/pvcy
>>
>> Marc -- how would you prefer to handle this?
>
> Let me try and rebase this thing to a modern kernel (I doubt it applies without
> conflicts to mainline). We can then have a discussion about its merit on the list
> once I post it. It'd be good to have a pointer to the benchmarks that have been
> used here.
Hi Marc, Will,
My apologies for the slow reply. What is the latest status of the
PV cond yield prototype?
https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/pvcy
I recently re-ran the UnixBench comparison between the vCPU preempted check
and PV cond yield. The results are as follows:
UnixBench results:
host: kernel 5.10.0-rc6, HiSilicon Kunpeng920, 8 CPUs
guest: kernel 5.10.0-rc6, 16 VCPUs
                                       | 5.10.0-rc6 | pv_cond_yield | vcpu_is_preempted
System Benchmarks Index Values         | INDEX      | INDEX         | INDEX
---------------------------------------+------------+---------------+-------------------
Dhrystone 2 using register variables   | 29164.0    | 29156.9       | 29207.2
Double-Precision Whetstone             | 6807.6     | 6789.2        | 6912.1
Execl Throughput                       | 856.7      | 1195.6        | 863.1
File Copy 1024 bufsize 2000 maxblocks  | 189.9      | 923.5         | 1094.2
File Copy 256 bufsize 500 maxblocks    | 121.9      | 578.4         | 588.7
File Copy 4096 bufsize 8000 maxblocks  | 419.9      | 1992.0        | 2733.7
Pipe Throughput                        | 6727.2     | 6670.2        | 6743.2
Pipe-based Context Switching           | 486.9      | 547.0         | 471.9
Process Creation                       | 353.4      | 345.1         | 338.5
Shell Scripts (1 concurrent)           | 3187.2     | 1432.2        | 2798.7
Shell Scripts (8 concurrent)           | 3410.5     | 1360.1        | 2672.9
System Call Overhead                   | 2967.0     | 3273.9        | 3497.9
---------------------------------------+------------+---------------+-------------------
System Benchmarks Index Score          | 1410.0     | 1885.8        | 2128.5
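To put the index scores on a common footing, a small Python helper can express
each result as a percentage change against the unpatched 5.10.0-rc6 baseline.
This is only a convenience sketch; the function name relative_gain and the
dict layout are assumptions, and the numbers are copied from the last row of
the table.

```python
# Index scores copied from the "System Benchmarks Index Score" row.
BASELINE = "5.10.0-rc6"
index_scores = {
    "5.10.0-rc6": 1410.0,
    "pv_cond_yield": 1885.8,
    "vcpu_is_preempted": 2128.5,
}


def relative_gain(score: float, baseline: float) -> float:
    """Percentage change of `score` relative to `baseline`."""
    return (score - baseline) / baseline * 100.0


for name, score in index_scores.items():
    gain = relative_gain(score, index_scores[BASELINE])
    print(f"{name:18s} {score:8.1f} {gain:+6.1f}%")
```

The same helper can be applied to any individual row, for example the two
Shell Scripts rows, where both patched kernels score below the baseline.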
Thanks,
Zengruan
>
> Thanks,
>
> M.
* Re: [PATCH v2 0/6] KVM: arm64: VCPU preempted check support
From: yezengruan @ 2020-12-29 8:50 UTC (permalink / raw)
To: Marc Zyngier, Will Deacon
Cc: linux-kernel, linux-arm-kernel, kvmarm, kvm, linux-doc,
virtualization, james.morse, linux, suzuki.poulose,
julien.thierry.kdev, catalin.marinas, mark.rutland, steven.price,
daniel.lezcano, peterz, Wanghaibin (D)
On 2020/1/15 22:14, Marc Zyngier wrote:
> On 2020-01-13 12:12, Will Deacon wrote:
>> [+PeterZ]
>>
>> On Thu, Dec 26, 2019 at 09:58:27PM +0800, Zengruan Ye wrote:
>>> This patch set aims to support the vcpu_is_preempted() functionality
>>> under KVM/arm64, which allows the guest to determine whether a VCPU is
>>> currently running or not. This will enhance lock performance on
>>> overcommitted hosts (more runnable VCPUs than physical CPUs in the
>>> system) as doing busy waits for preempted VCPUs will hurt system
>>> performance far worse than early yielding.
>>>
>>> We have observed some performance improvements in UnixBench tests.
>>>
>>> unix benchmark result:
>>> host: kernel 5.5.0-rc1, HiSilicon Kunpeng920, 8 CPUs
>>> guest: kernel 5.5.0-rc1, 16 VCPUs
>>>
>>> test-case | after-patch | before-patch
>>> ----------------------------------------+-------------------+------------------
>>> Dhrystone 2 using register variables | 334600751.0 lps | 335319028.3 lps
>>> Double-Precision Whetstone | 32856.1 MWIPS | 32849.6 MWIPS
>>> Execl Throughput | 3662.1 lps | 2718.0 lps
>>> File Copy 1024 bufsize 2000 maxblocks | 432906.4 KBps | 158011.8 KBps
>>> File Copy 256 bufsize 500 maxblocks | 116023.0 KBps | 37664.0 KBps
>>> File Copy 4096 bufsize 8000 maxblocks | 1432769.8 KBps | 441108.8 KBps
>>> Pipe Throughput | 6405029.6 lps | 6021457.6 lps
>>> Pipe-based Context Switching | 185872.7 lps | 184255.3 lps
>>> Process Creation | 4025.7 lps | 3706.6 lps
>>> Shell Scripts (1 concurrent) | 6745.6 lpm | 6436.1 lpm
>>> Shell Scripts (8 concurrent) | 998.7 lpm | 931.1 lpm
>>> System Call Overhead | 3913363.1 lps | 3883287.8 lps
>>> ----------------------------------------+-------------------+------------------
>>> System Benchmarks Index Score | 1835.1 | 1327.6
>>
>> Interesting, thanks for the numbers.
>>
>> So it looks like there is a decent improvement to be had from targeted vCPU
>> wakeup, but I really dislike the explicit PV interface and it's already been
>> shown to interact badly with the WFE-based polling in smp_cond_load_*().
>>
>> Rather than expose a divergent interface, I would instead like to explore an
>> improvement to smp_cond_load_*() and see how that performs before we commit
>> to something more intrusive. Marc and I looked at this very briefly in the
>> past, and the basic idea is to register all of the WFE sites with the
>> hypervisor, indicating which register contains the address being spun on
>> and which register contains the "bad" value. That way, you don't bother
>> rescheduling a vCPU if the value at the address is still bad, because you
>> know it will exit immediately.
>>
>> Of course, the devil is in the details because when I say "address", that's
>> a guest virtual address, so you need to play some tricks in the hypervisor
>> so that you have a separate mapping for the lockword (it's enough to keep
>> track of the physical address).
>>
>> Our hacks are here but we basically ran out of time to work on them beyond
>> an unoptimised and hacky prototype:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/pvcy
>>
>> Marc -- how would you prefer to handle this?
>
> Let me try and rebase this thing to a modern kernel (I doubt it applies without
> conflicts to mainline). We can then have a discussion about its merit on the list
> once I post it. It'd be good to have a pointer to the benchmarks that have been
> used here.
>
> Thanks,
>
> M.
Hi Marc, Will,
My apologies for the slow reply. What is the latest status of the
PV cond yield prototype?
https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/pvcy
The following are the UnixBench results for the PV cond yield prototype:
UnixBench results:
host: kernel 5.10.0-rc6, HiSilicon Kunpeng920, 8 CPUs
guest: kernel 5.10.0-rc6, 16 VCPUs
                                       | 5.10.0-rc6 | pv_cond_yield | vcpu_is_preempted
System Benchmarks Index Values         | INDEX      | INDEX         | INDEX
---------------------------------------+------------+---------------+-------------------
Dhrystone 2 using register variables   | 29164.0    | 29156.9       | 29207.2
Double-Precision Whetstone             | 6807.6     | 6789.2        | 6912.1
Execl Throughput                       | 856.7      | 1195.6        | 863.1
File Copy 1024 bufsize 2000 maxblocks  | 189.9      | 923.5         | 1094.2
File Copy 256 bufsize 500 maxblocks    | 121.9      | 578.4         | 588.7
File Copy 4096 bufsize 8000 maxblocks  | 419.9      | 1992.0        | 2733.7
Pipe Throughput                        | 6727.2     | 6670.2        | 6743.2
Pipe-based Context Switching           | 486.9      | 547.0         | 471.9
Process Creation                       | 353.4      | 345.1         | 338.5
Shell Scripts (1 concurrent)           | 3187.2     | 1432.2        | 2798.7
Shell Scripts (8 concurrent)           | 3410.5     | 1360.1        | 2672.9
System Call Overhead                   | 2967.0     | 3273.9        | 3497.9
---------------------------------------+------------+---------------+-------------------
System Benchmarks Index Score          | 1410.0     | 1885.8        | 2128.5
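For context on the vcpu_is_preempted column: the approach in the original
patch set lets a waiter stop busy-waiting once it learns that the lock
holder's vCPU has been preempted. The toy Python model below illustrates that
decision only. The function spin_or_yield, its callback parameters and the
spin limit are hypothetical names chosen for this sketch and do not correspond
to kernel code.

```python
from typing import Callable, Tuple


def spin_or_yield(
    lock_is_free: Callable[[], bool],
    holder_is_preempted: Callable[[], bool],
    max_spins: int = 1000,
) -> Tuple[str, int]:
    """Model one waiter polling for a lock.

    Returns (outcome, spins), where outcome is "acquired" if the lock became
    free, "yield" if the holder's vCPU was found to be preempted (so further
    spinning would only waste the physical CPU), or "timeout" if neither
    happened within `max_spins` polls.
    """
    for spins in range(max_spins):
        if lock_is_free():
            return "acquired", spins
        if holder_is_preempted():
            # Early yield: give the physical CPU back instead of spinning.
            return "yield", spins
    return "timeout", max_spins


if __name__ == "__main__":
    # Holder is running and releases the lock on the fourth poll.
    polls = iter([False, False, False, True])
    print(spin_or_yield(lambda: next(polls), lambda: False))
    # Holder has been preempted: the waiter gives up immediately.
    print(spin_or_yield(lambda: False, lambda: True))
```

Without the preemption check, the second case would spin until the limit,
which is the behaviour the patch description says hurts overcommitted hosts.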
Thanks,
Zengruan
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 0/6] KVM: arm64: VCPU preempted check support
@ 2020-12-29 8:50 ` yezengruan
0 siblings, 0 replies; 18+ messages in thread
From: yezengruan @ 2020-12-29 8:50 UTC (permalink / raw)
To: Marc Zyngier, Will Deacon
Cc: mark.rutland, daniel.lezcano, kvm, linux-doc, peterz,
catalin.marinas, suzuki.poulose, linux-kernel, virtualization,
james.morse, julien.thierry.kdev, Wanghaibin (D),
linux, steven.price, kvmarm, linux-arm-kernel
On 2020/1/15 22:14, Marc Zyngier wrote:
> On 2020-01-13 12:12, Will Deacon wrote:
>> [+PeterZ]
>>
>> On Thu, Dec 26, 2019 at 09:58:27PM +0800, Zengruan Ye wrote:
>>> This patch set aims to support the vcpu_is_preempted() functionality
>>> under KVM/arm64, which allowing the guest to obtain the VCPU is
>>> currently running or not. This will enhance lock performance on
>>> overcommitted hosts (more runnable VCPUs than physical CPUs in the
>>> system) as doing busy waits for preempted VCPUs will hurt system
>>> performance far worse than early yielding.
>>>
>>> We have observed some performace improvements in uninx benchmark tests.
>>>
>>> unix benchmark result:
>>> host: kernel 5.5.0-rc1, HiSilicon Kunpeng920, 8 CPUs
>>> guest: kernel 5.5.0-rc1, 16 VCPUs
>>>
>>> test-case | after-patch | before-patch
>>> ----------------------------------------+-------------------+------------------
>>> Dhrystone 2 using register variables | 334600751.0 lps | 335319028.3 lps
>>> Double-Precision Whetstone | 32856.1 MWIPS | 32849.6 MWIPS
>>> Execl Throughput | 3662.1 lps | 2718.0 lps
>>> File Copy 1024 bufsize 2000 maxblocks | 432906.4 KBps | 158011.8 KBps
>>> File Copy 256 bufsize 500 maxblocks | 116023.0 KBps | 37664.0 KBps
>>> File Copy 4096 bufsize 8000 maxblocks | 1432769.8 KBps | 441108.8 KBps
>>> Pipe Throughput | 6405029.6 lps | 6021457.6 lps
>>> Pipe-based Context Switching | 185872.7 lps | 184255.3 lps
>>> Process Creation | 4025.7 lps | 3706.6 lps
>>> Shell Scripts (1 concurrent) | 6745.6 lpm | 6436.1 lpm
>>> Shell Scripts (8 concurrent) | 998.7 lpm | 931.1 lpm
>>> System Call Overhead | 3913363.1 lps | 3883287.8 lps
>>> ----------------------------------------+-------------------+------------------
>>> System Benchmarks Index Score | 1835.1 | 1327.6
>>
>> Interesting, thanks for the numbers.
>>
>> So it looks like there is a decent improvement to be had from targetted vCPU
>> wakeup, but I really dislike the explicit PV interface and it's already been
>> shown to interact badly with the WFE-based polling in smp_cond_load_*().
>>
>> Rather than expose a divergent interface, I would instead like to explore an
>> improvement to smp_cond_load_*() and see how that performs before we commit
>> to something more intrusive. Marc and I looked at this very briefly in the
>> past, and the basic idea is to register all of the WFE sites with the
>> hypervisor, indicating which register contains the address being spun on
>> and which register contains the "bad" value. That way, you don't bother
>> rescheduling a vCPU if the value at the address is still bad, because you
>> know it will exit immediately.
>>
>> Of course, the devil is in the details because when I say "address", that's
>> a guest virtual address, so you need to play some tricks in the hypervisor
>> so that you have a separate mapping for the lockword (it's enough to keep
>> track of the physical address).
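
A toy model of the idea quoted above may help picture it. This is only a sketch under stated assumptions, not the pvcy prototype: the names WfeWait, worth_scheduling and pick_runnable are hypothetical, and guest memory is reduced to a dict keyed by physical address. Each vCPU parked at a registered WFE site records the address it spins on and the "bad" value; the scheduler skips it while the word still holds that value.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class WfeWait:
    """Hypothetical record for a vCPU parked at a registered WFE site."""
    phys_addr: int  # physical address of the lock word (tracked instead of the guest VA)
    bad_value: int  # value for which the guest would go straight back to waiting


def worth_scheduling(wait: Optional[WfeWait], memory: Dict[int, int]) -> bool:
    """Return False while the watched word still holds the 'bad' value."""
    if wait is None:
        return True  # not waiting at a registered WFE site
    return memory.get(wait.phys_addr, 0) != wait.bad_value


def pick_runnable(waits: Dict[int, Optional[WfeWait]], memory: Dict[int, int]) -> List[int]:
    """List the vCPU ids this toy scheduler would consider running."""
    return [vcpu for vcpu, wait in sorted(waits.items())
            if worth_scheduling(wait, memory)]


memory = {0x1000: 1}  # lock word currently holds the "bad" (locked) value
waits = {0: None, 1: WfeWait(0x1000, 1), 2: WfeWait(0x1000, 1)}

before = pick_runnable(waits, memory)  # vCPUs 1 and 2 would exit immediately, so skip them
memory[0x1000] = 0                     # the lock holder releases the lock
after = pick_runnable(waits, memory)   # the waiters are now worth rescheduling
```

The real difficulty Will points out, keeping a usable host mapping for a guest virtual address, is deliberately left out here.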
>>
>> Our hacks are here but we basically ran out of time to work on them beyond
>> an unoptimised and hacky prototype:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/pvcy
>>
>> Marc -- how would you prefer to handle this?
>
> Let me try and rebase this thing to a modern kernel (I doubt it applies without
> conflicts to mainline). We can then have a discussion about its merit on the list
> once I post it. It'd be good to have a pointer to the benchmarks that have been
> used here.
>
> Thanks,
>
> M.

Hi Marc, Will,

My apologies for the slow reply. Just checking: what is the latest on the
PV cond yield prototype?

https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/pvcy

Below are the UnixBench results for the PV cond yield prototype, next to an
unmodified 5.10.0-rc6 and to vcpu_is_preempted:

unix benchmark result:
host: kernel 5.10.0-rc6, HiSilicon Kunpeng920, 8 CPUs
guest: kernel 5.10.0-rc6, 16 VCPUs

                                       | 5.10.0-rc6 | pv_cond_yield | vcpu_is_preempted
System Benchmarks Index Values         |      INDEX |         INDEX |             INDEX
---------------------------------------+------------+---------------+-------------------
Dhrystone 2 using register variables   |    29164.0 |       29156.9 |           29207.2
Double-Precision Whetstone             |     6807.6 |        6789.2 |            6912.1
Execl Throughput                       |      856.7 |        1195.6 |             863.1
File Copy 1024 bufsize 2000 maxblocks  |      189.9 |         923.5 |            1094.2
File Copy 256 bufsize 500 maxblocks    |      121.9 |         578.4 |             588.7
File Copy 4096 bufsize 8000 maxblocks  |      419.9 |        1992.0 |            2733.7
Pipe Throughput                        |     6727.2 |        6670.2 |            6743.2
Pipe-based Context Switching           |      486.9 |         547.0 |             471.9
Process Creation                       |      353.4 |         345.1 |             338.5
Shell Scripts (1 concurrent)           |     3187.2 |        1432.2 |            2798.7
Shell Scripts (8 concurrent)           |     3410.5 |        1360.1 |            2672.9
System Call Overhead                   |     2967.0 |        3273.9 |            3497.9
---------------------------------------+------------+---------------+-------------------
System Benchmarks Index Score          |     1410.0 |        1885.8 |            2128.5
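
For reference, the overall index scores above can be turned into relative changes against the unmodified kernel. A small sketch, using only the three totals from the table; the helper name relative_change is made up for illustration:

```python
# Overall "System Benchmarks Index Score" values copied from the table above.
scores = {
    "5.10.0-rc6": 1410.0,
    "pv_cond_yield": 1885.8,
    "vcpu_is_preempted": 2128.5,
}


def relative_change(score: float, baseline: float) -> float:
    """Percentage change of score relative to baseline."""
    return (score - baseline) / baseline * 100.0


baseline = scores["5.10.0-rc6"]
changes = {name: relative_change(value, baseline) for name, value in scores.items()}

for name, pct in changes.items():
    print(f"{name:<18} {scores[name]:>8.1f} {pct:+6.1f}%")
```

On these totals the prototype comes out roughly a third above the baseline and vcpu_is_preempted roughly a half above it. The totals hide per-test differences, such as the Shell Scripts rows where both variants score below the baseline.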
Thanks,
Zengruan
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel