* Experiments using perf support in arm kvm guest
@ 2013-09-23 15:53 William Cohen
  2013-09-24  0:06 ` David Ahern
  0 siblings, 1 reply; 3+ messages in thread
From: William Cohen @ 2013-09-23 15:53 UTC
  To: PAPI list, linux-perf-users

Hi All,

I was curious to see how well (or poorly) perf events work in a virtualized environment.  As a little experiment I built papi from the git repo in a Fedora rawhide guest VM running on an Intel Ivy Bridge machine.  I also ran things on the F19 host to compare the results of "make fulltest" between the raw and virtualized hardware.  Despite my attempt to copy the host processor information when setting up the guest, the guest VM thinks it is a Sandy Bridge rather than an Ivy Bridge, but it looks like the same events are used in papi_events.csv for both.  The papi "make fulltest" results look similar on x86.

There has been some work on the ARM Cortex-A15 to support hardware virtualization (http://osdir.com/ml/fedora-arm/2013-09/msg00011.html).  I have KVM hardware-accelerated virtualization running on my Samsung ARM Chromebook.  Both host and guest are running Fedora 19.  The host is running a 3.11 kernel with a patch so that the Samsung Exynos 5250 boots.  The guest is running a stock Fedora 19 3.10.1-200 kernel.  For ARM, the guest papi "make fulltest" results are not so good.  It appears that access to the perf counters in the ARM guest does not work well.  On the ARM guest it looks like only the cycle count event is working:

Performance counter stats for 'ls':

          4.043500 task-clock                #    0.799 CPUs utilized          
                 0 context-switches          #    0.000 K/sec                  
                 0 cpu-migrations            #    0.000 K/sec                  
               237 page-faults               #    0.059 M/sec                  
     2,147,483,647 cycles                    #  531.095 GHz                    
   <not supported> stalled-cycles-frontend 
   <not supported> stalled-cycles-backend  
     <not counted> instructions            
     <not counted> branches                
     <not counted> branch-misses           

       0.005059000 seconds time elapsed
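
To make the guest result easier to compare across runs, the output above can be sorted into events that were counted, not counted, and not supported.  This is only a minimal sketch: the regular expression and the classify() helper are my own and assume the human-readable perf stat layout shown above, not any stable perf interface.

```python
import re

# Guest counter lines copied from the perf stat output above.
GUEST_OUTPUT = """\
          4.043500 task-clock                #    0.799 CPUs utilized
                 0 context-switches          #    0.000 K/sec
                 0 cpu-migrations            #    0.000 K/sec
               237 page-faults               #    0.059 M/sec
     2,147,483,647 cycles                    #  531.095 GHz
   <not supported> stalled-cycles-frontend
   <not supported> stalled-cycles-backend
     <not counted> instructions
     <not counted> branches
     <not counted> branch-misses
"""

# A counter line is either a number or a <not ...> marker, then the event name.
LINE_RE = re.compile(r"^\s*(<not supported>|<not counted>|[\d.,]+)\s+(\S+)")


def classify(text):
    """Map each event name to 'counted', 'not counted' or 'not supported'."""
    status = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        value, event = match.groups()
        if value.startswith("<"):
            status[event] = value.strip("<>")
        else:
            status[event] = "counted"
    return status


if __name__ == "__main__":
    for event, state in classify(GUEST_OUTPUT).items():
        print(f"{event:26s} {state}")
```

Note that "counted" here only means perf printed a number; it says nothing about whether that number is plausible.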


On the ARM host I see:

 Performance counter stats for 'ls':

         19.259873 task-clock                #    0.777 CPUs utilized          
                 2 context-switches          #    0.104 K/sec                  
                 0 cpu-migrations            #    0.000 K/sec                  
               242 page-faults               #    0.013 M/sec                  
         6,242,062 cycles                    #    0.324 GHz                    
   <not supported> stalled-cycles-frontend 
   <not supported> stalled-cycles-backend  
         3,479,441 instructions              #    0.56  insns per cycle        
           644,120 branches                  #   33.444 M/sec                  
            37,372 branch-misses             #    5.80% of all branches        

       0.024776800 seconds time elapsed
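
As a sanity check on the two outputs above, the derived columns can be recomputed from the raw counts.  The following is a minimal sketch (the function names are my own, nothing here comes from perf itself).  It also shows that the guest cycle count is exactly 2^31 - 1, which may point to a saturated or bogus counter read rather than a real measurement, though that is only a guess on my part.

```python
def ghz(cycles, task_clock_ms):
    """Cycles per second of task-clock time, expressed in GHz."""
    return cycles / (task_clock_ms * 1e-3) / 1e9


def insns_per_cycle(instructions, cycles):
    """Instructions retired per cycle."""
    return instructions / cycles


def branch_miss_percent(branch_misses, branches):
    """Branch misses as a percentage of all branches."""
    return 100.0 * branch_misses / branches


# Raw values copied from the guest and host outputs above.
GUEST = {"task_clock_ms": 4.043500, "cycles": 2147483647}
HOST = {
    "task_clock_ms": 19.259873,
    "cycles": 6242062,
    "instructions": 3479441,
    "branches": 644120,
    "branch_misses": 37372,
}

if __name__ == "__main__":
    print("guest GHz:", round(ghz(GUEST["cycles"], GUEST["task_clock_ms"]), 3))
    print("host GHz:", round(ghz(HOST["cycles"], HOST["task_clock_ms"]), 3))
    print("host insns per cycle:",
          round(insns_per_cycle(HOST["instructions"], HOST["cycles"]), 2))
    print("host branch-miss %:",
          round(branch_miss_percent(HOST["branch_misses"], HOST["branches"]), 2))
    # The guest cycle count equals the largest signed 32-bit integer.
    print("guest cycles == 2**31 - 1:", GUEST["cycles"] == 2**31 - 1)
```

The host figures reproduce what perf printed (0.324 GHz, 0.56 insns per cycle, 5.80% branch misses), whereas the guest's 531 GHz is clearly not a physical clock rate.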

Are there reasons that the ARM hardware cannot virtualize the performance counters the way the x86 machines do? Or is this something that just hasn't been implemented yet in the kernel? Or is this supposed to work and there is a bug?


-Will


* Re: Experiments using perf support in arm kvm guest
  2013-09-23 15:53 Experiments using perf support in arm kvm guest William Cohen
@ 2013-09-24  0:06 ` David Ahern
  2013-10-02 15:33   ` Gleb Natapov
  0 siblings, 1 reply; 3+ messages in thread
From: David Ahern @ 2013-09-24  0:06 UTC
  To: William Cohen; +Cc: PAPI list, linux-perf-users, Gleb Natapov, KVM

[Added Gleb and kvm list]

On 9/23/13 9:53 AM, William Cohen wrote:
> There has been some work on the ARM Cortex-A15 to support hardware virtualization (http://osdir.com/ml/fedora-arm/2013-09/msg00011.html).  I have KVM hardware-accelerated virtualization running on my Samsung ARM Chromebook.  Both host and guest are running Fedora 19.  The host is running a 3.11 kernel with a patch so that the Samsung Exynos 5250 boots.  The guest is running a stock Fedora 19 3.10.1-200 kernel.  For ARM, the guest papi "make fulltest" results are not so good.  It appears that access to the perf counters in the ARM guest does not work well.  On the ARM guest it looks like only the cycle count event is working:

To my knowledge a vPMU is only supported for KVM on x86. Perhaps Gleb or
the kvm list knows otherwise.

David



* Re: Experiments using perf support in arm kvm guest
  2013-09-24  0:06 ` David Ahern
@ 2013-10-02 15:33   ` Gleb Natapov
  0 siblings, 0 replies; 3+ messages in thread
From: Gleb Natapov @ 2013-10-02 15:33 UTC
  To: David Ahern; +Cc: William Cohen, PAPI list, linux-perf-users, KVM

On Mon, Sep 23, 2013 at 06:06:46PM -0600, David Ahern wrote:
> [Added Gleb and kvm list]
> 
Sorry for the late answer.

> On 9/23/13 9:53 AM, William Cohen wrote:
> >There has been some work on the ARM Cortex-A15 to support hardware virtualization (http://osdir.com/ml/fedora-arm/2013-09/msg00011.html).  I have KVM hardware-accelerated virtualization running on my Samsung ARM Chromebook.  Both host and guest are running Fedora 19.  The host is running a 3.11 kernel with a patch so that the Samsung Exynos 5250 boots.  The guest is running a stock Fedora 19 3.10.1-200 kernel.  For ARM, the guest papi "make fulltest" results are not so good.  It appears that access to the perf counters in the ARM guest does not work well.  On the ARM guest it looks like only the cycle count event is working:
> 
> To my knowledge a vPMU is only supported for KVM on x86. Perhaps
> Gleb or the kvm list knows otherwise.
> 

On x86 only a very limited set of features (basically just the
architectural PMU) is supported, and only on Intel. Most of the PMU is not
virtualizable on x86. For ARM you can ask the KVM/ARM mailing list,
kvmarm@lists.cs.columbia.edu.
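
As a rough way to see whether an x86 guest has the architectural PMU exposed at all, one can look for the "arch_perfmon" CPU flag in the guest's /proc/cpuinfo.  This is only a sketch: has_arch_perfmon() is a hypothetical helper name, and a present flag does not guarantee that every event counts correctly.

```python
def has_arch_perfmon(cpuinfo_text):
    """Return True if the first 'flags' line in cpuinfo lists arch_perfmon.

    ARM kernels report a 'Features' line instead of 'flags', so this
    returns False there; the check is only meaningful on x86.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return "arch_perfmon" in value.split()
    return False


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as cpuinfo:
            print("arch_perfmon:", has_arch_perfmon(cpuinfo.read()))
    except OSError as exc:
        print("could not read /proc/cpuinfo:", exc)
```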


--
			Gleb.
