* [PATCH v2 0/2] Introduce runstate area registration with phys address
@ 2019-04-23  8:10 Andrii Anisov
  2019-04-23  8:10   ` [Xen-devel] " Andrii Anisov
                   ` (3 more replies)
  0 siblings, 4 replies; 83+ messages in thread
From: Andrii Anisov @ 2019-04-23  8:10 UTC (permalink / raw)
  To: xen-devel; +Cc: Julien Grall, Andrii Anisov, Jan Beulich, Roger Pau Monné

From: Andrii Anisov <andrii_anisov@epam.com>

Following the discussion [1], this series introduces and implements a runstate
registration interface which uses a guest physical address instead of a virtual
one. The new hypercall employs the same data structures as its predecessor, but
expects the vcpu_runstate_info structure to not cross a page boundary.
The interface is implemented so that the vcpu_runstate_info structure is mapped
into the hypervisor while the hypercall is processed and is then accessed
directly during runstate updates. This runstate area mapping follows the
vcpu_info structure registration approach.
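
For illustration, guest-side registration could look like the minimal sketch
below (hypothetical Linux-style code, not the actual change in [2];
register_runstate_area() is an invented helper, and how the guest obtains a
stable physical address for the area is exactly what is discussed later in
this thread):

    /* Sketch: register the per-vCPU runstate area by guest physical address,
     * falling back to the existing virtual-address op on older hypervisors.
     * 'ri' must be allocated so that it does not cross a page boundary. */
    static int register_runstate_area(unsigned int cpu,
                                      struct vcpu_runstate_info *ri)
    {
        struct vcpu_register_runstate_memory_area area = { };
        int rc;

        BUG_ON(offset_in_page(ri) + sizeof(*ri) > PAGE_SIZE);

        area.addr.p = virt_to_phys(ri);         /* new op: physical address */
        rc = HYPERVISOR_vcpu_op(VCPUOP_register_runstate_phys_memory_area,
                                cpu, &area);
        if (rc == -EOPNOTSUPP || rc == -ENOSYS) {
            area.addr.v = ri;                   /* old op: virtual address */
            rc = HYPERVISOR_vcpu_op(VCPUOP_register_runstate_memory_area,
                                    cpu, &area);
        }
        return rc;
    }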

Permanently mapping the runstate area consumes vmap space on arm32, which is
limited to 1G. However, it is assumed that ARM32 does not target the server
market, and the remaining likely applications will not host enough VCPUs for
this limitation to become an issue.
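
(For scale: each registered area fits within a single 4K page, so even 1024
VCPUs would pin roughly 4M of globally mapped pages out of that 1G vmap space,
assuming one mapped page per VCPU.)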

The series is tested on ARM64 and build-tested for x86. I'd appreciate it if
someone could check it on x86.
The Linux kernel patch is here [2], though it is against 4.14.

Changes in:

  v2: The new runstate interface implementation was reconsidered. The new
      interface is made independent of the old one: it does not share the
      runstate_area field, and consequently avoids excessive concurrency with
      the old runstate interface usage.
      Introduced locks in order to resolve possible concurrency between
      runstate area registration and use.
      Addressed comments from Jan Beulich [3][4] about coding style nits,
      though some of them became obsolete with the refactoring and a few are
      picked up in this thread for further discussion.

      Performance measurements were made for both approaches (runstate mapped
      on access vs. mapped on registration). The test setup is as follows:

      Thin Dom0 (Linux with initramfs) with DomD running a rich Yocto Linux.
      3D benchmark numbers in DomD are compared. The benchmark is GLMark2,
      run at different resolutions in order to produce different IRQ loads:
      320x240 produces a high IRQ load, while 1920x1080 produces a low one.
      The DomD benchmark run was additionally tested with a primitive Dom0
      CPU burn (dd) running in the background, in order to stimulate
      VCPU(dX)->VCPU(dY) switches rather than VCPU(dX)->idle->VCPU(dX),
      with the following results:

                            mapped               mapped
                            on access            on init
      GLMark2 320x240       2852                 2877          +0.8%
          +Dom0 CPUBurn     2088                 2094          +0.2%
      GLMark2 800x600       2368                 2375          +0.3%
          +Dom0 CPUBurn     1868                 1921          +2.8%
      GLMark2 1920x1080     931                  931            0%
          +Dom0 CPUBurn     892                  894           +0.2%

      Please note that "mapped on access" means using the old runstate
      registration interface. Runstate updates in that case still often fail
      to map the runstate area, as in [5], despite the fact that our Linux
      kernel does not have KPTI enabled. So the runstate area update in that
      case is effectively shortened.


      The IRQ latency difference was also checked using TBM in a setup similar
      to [5]. Please note that the IRQ rate is one per 30 seconds, and only the
      VCPU->idle->VCPU use case is considered. The results are as follows (in
      ns, with a timer granularity of 120 ns):

      mapped on access:
          max=9960 warm_max=8640 min=7200 avg=7626
      mapped on init:
          max=9480 warm_max=8400 min=7080 avg=7341

      Unfortunately there are no consistent results yet from profiling with
      Lauterbach PowerTrace. We are still in communication with the tracer
      vendor in order to set up the proper configuration.

[1] https://lists.xenproject.org/archives/html/xen-devel/2019-02/msg00416.html
[2] https://github.com/aanisov/linux/commit/ba34d2780f57ea43f81810cd695aace7b55c0f29
[3] https://lists.xenproject.org/archives/html/xen-devel/2019-03/msg00936.html
[4] https://lists.xenproject.org/archives/html/xen-devel/2019-03/msg00934.html
[5] https://lists.xenproject.org/archives/html/xen-devel/2019-01/msg02369.html
[6] https://lists.xenproject.org/archives/html/xen-devel/2018-12/msg02297.html


Andrii Anisov (2):
  xen: introduce VCPUOP_register_runstate_phys_memory_area hypercall
  xen: implement VCPUOP_register_runstate_phys_memory_area

 xen/arch/arm/domain.c        |  62 +++++++++++++++++--------
 xen/arch/x86/domain.c        | 105 +++++++++++++++++++++++++++++++------------
 xen/common/domain.c          |  81 +++++++++++++++++++++++++++++++++
 xen/include/asm-arm/domain.h |   2 +
 xen/include/public/vcpu.h    |  15 +++++++
 xen/include/xen/domain.h     |   2 +
 xen/include/xen/sched.h      |   8 ++++
 7 files changed, 227 insertions(+), 48 deletions(-)

-- 
2.7.4



* [PATCH v2 1/2] xen: introduce VCPUOP_register_runstate_phys_memory_area hypercall
@ 2019-04-23  8:10   ` Andrii Anisov
  0 siblings, 0 replies; 83+ messages in thread
From: Andrii Anisov @ 2019-04-23  8:10 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Andrii Anisov, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, Jan Beulich, xen-devel, Wei Liu,
	Roger Pau Monné

From: Andrii Anisov <andrii_anisov@epam.com>

The hypercall employs the same vcpu_register_runstate_memory_area
structure for the interface, but requires the registered area to not
cross a page boundary.

Signed-off-by: Andrii Anisov <andrii_anisov@epam.com>
---
 xen/common/domain.c       |  5 ++++-
 xen/include/public/vcpu.h | 15 +++++++++++++++
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index 88bbe98..ae22049 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1532,10 +1532,13 @@ long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg)
             vcpu_runstate_get(v, &runstate);
             __copy_to_guest(runstate_guest(v), &runstate, 1);
         }
-
         break;
     }
 
+    case VCPUOP_register_runstate_phys_memory_area:
+        rc = -EOPNOTSUPP;
+        break;
+
 #ifdef VCPU_TRAP_NMI
     case VCPUOP_send_nmi:
         if ( !guest_handle_is_null(arg) )
diff --git a/xen/include/public/vcpu.h b/xen/include/public/vcpu.h
index 3623af9..d7da4a3 100644
--- a/xen/include/public/vcpu.h
+++ b/xen/include/public/vcpu.h
@@ -235,6 +235,21 @@ struct vcpu_register_time_memory_area {
 typedef struct vcpu_register_time_memory_area vcpu_register_time_memory_area_t;
 DEFINE_XEN_GUEST_HANDLE(vcpu_register_time_memory_area_t);
 
+/*
+ * Register a shared memory area from which the guest may obtain its own
+ * runstate information without needing to execute a hypercall.
+ * Notes:
+ *  1. The registered address must be a guest physical address.
+ *  2. The registered runstate area must not cross a page boundary.
+ *  3. Only one shared area may be registered per VCPU. The shared area is
+ *     updated by the hypervisor each time the VCPU is scheduled. Thus
+ *     runstate.state will always be RUNSTATE_running and
+ *     runstate.state_entry_time will indicate the system time at which the
+ *     VCPU was last scheduled to run.
+ * @extra_arg == pointer to vcpu_register_runstate_memory_area structure.
+ */
+#define VCPUOP_register_runstate_phys_memory_area 14
+
 #endif /* __XEN_PUBLIC_VCPU_H__ */
 
 /*
-- 
2.7.4
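
As a usage note for the above: with the runstate_update_flag VM_ASSIST
enabled, a guest reading the registered area is expected to follow the usual
seqlock-like protocol on state_entry_time. A minimal, hypothetical reader
sketch (Linux-style primitives assumed; runstate_snapshot() is an invented
name):

    /* Sketch only: take a consistent snapshot of the shared runstate area.
     * Assumes the hypervisor sets XEN_RUNSTATE_UPDATE in state_entry_time
     * around its writes (VM_ASSIST runstate_update_flag). */
    static void runstate_snapshot(struct vcpu_runstate_info *res,
                                  const struct vcpu_runstate_info *shared)
    {
        uint64_t t;

        do {
            t = READ_ONCE(shared->state_entry_time);
            smp_rmb();          /* read the flag/time before the payload */
            *res = *shared;
            smp_rmb();          /* read the payload before the re-check */
        } while ((t & XEN_RUNSTATE_UPDATE) ||
                 t != READ_ONCE(shared->state_entry_time));
    }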



* [PATCH v2 2/2] xen: implement VCPUOP_register_runstate_phys_memory_area
@ 2019-04-23  8:10   ` Andrii Anisov
  0 siblings, 0 replies; 83+ messages in thread
From: Andrii Anisov @ 2019-04-23  8:10 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Andrii Anisov, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, Jan Beulich, xen-devel, Wei Liu,
	Roger Pau Monné

From: Andrii Anisov <andrii_anisov@epam.com>

VCPUOP_register_runstate_phys_memory_area is implemented via runstate
area mapping.

Signed-off-by: Andrii Anisov <andrii_anisov@epam.com>
---
 xen/arch/arm/domain.c        |  62 +++++++++++++++++--------
 xen/arch/x86/domain.c        | 105 +++++++++++++++++++++++++++++++------------
 xen/common/domain.c          |  80 ++++++++++++++++++++++++++++++++-
 xen/include/asm-arm/domain.h |   2 +
 xen/include/xen/domain.h     |   2 +
 xen/include/xen/sched.h      |   8 ++++
 6 files changed, 210 insertions(+), 49 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 6dc633e..8e24e63 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -275,32 +275,55 @@ static void ctxt_switch_to(struct vcpu *n)
 }
 
 /* Update per-VCPU guest runstate shared memory area (if registered). */
-static void update_runstate_area(struct vcpu *v)
+void update_runstate_area(struct vcpu *v)
 {
-    void __user *guest_handle = NULL;
+    if ( !guest_handle_is_null(runstate_guest(v)) )
+    {
+        void __user *guest_handle = NULL;
+        if ( VM_ASSIST(v->domain, runstate_update_flag) )
+        {
+            guest_handle = &v->runstate_guest.p->state_entry_time + 1;
+            guest_handle--;
+            v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
+            __raw_copy_to_guest(guest_handle,
+                                (void *)(&v->runstate.state_entry_time + 1) - 1,
+                                1);
+            smp_wmb();
+        }
 
-    if ( guest_handle_is_null(runstate_guest(v)) )
-        return;
+        __copy_to_guest(runstate_guest(v), &v->runstate, 1);
 
-    if ( VM_ASSIST(v->domain, runstate_update_flag) )
-    {
-        guest_handle = &v->runstate_guest.p->state_entry_time + 1;
-        guest_handle--;
-        v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
-        __raw_copy_to_guest(guest_handle,
-                            (void *)(&v->runstate.state_entry_time + 1) - 1, 1);
-        smp_wmb();
+        if ( guest_handle )
+        {
+            v->runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE;
+            smp_wmb();
+            __raw_copy_to_guest(guest_handle,
+                                (void *)(&v->runstate.state_entry_time + 1) - 1,
+                                1);
+        }
     }
 
-    __copy_to_guest(runstate_guest(v), &v->runstate, 1);
-
-    if ( guest_handle )
+    spin_lock(&v->mapped_runstate_lock);
+    if ( v->mapped_runstate )
     {
-        v->runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE;
-        smp_wmb();
-        __raw_copy_to_guest(guest_handle,
-                            (void *)(&v->runstate.state_entry_time + 1) - 1, 1);
+        if ( VM_ASSIST(v->domain, runstate_update_flag) )
+        {
+            v->mapped_runstate->state_entry_time |= XEN_RUNSTATE_UPDATE;
+            smp_wmb();
+            v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
+        }
+
+        memcpy(v->mapped_runstate, &v->runstate, sizeof(v->runstate));
+
+        if ( VM_ASSIST(v->domain, runstate_update_flag) )
+        {
+            v->mapped_runstate->state_entry_time &= ~XEN_RUNSTATE_UPDATE;
+            smp_wmb();
+            v->runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE;
+        }
     }
+    spin_unlock(&v->mapped_runstate_lock);
+
 }
 
 static void schedule_tail(struct vcpu *prev)
@@ -998,6 +1021,7 @@ long do_arm_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) a
     {
         case VCPUOP_register_vcpu_info:
         case VCPUOP_register_runstate_memory_area:
+        case VCPUOP_register_runstate_phys_memory_area:
             return do_vcpu_op(cmd, vcpuid, arg);
         default:
             return -EINVAL;
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 9eaa978..46c2219 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1558,51 +1558,98 @@ void paravirt_ctxt_switch_to(struct vcpu *v)
         wrmsr_tsc_aux(v->arch.msrs->tsc_aux);
 }
 
-/* Update per-VCPU guest runstate shared memory area (if registered). */
-bool update_runstate_area(struct vcpu *v)
+static void update_mapped_runstate_area_native(struct vcpu *v)
 {
-    bool rc;
-    struct guest_memory_policy policy = { .nested_guest_mode = false };
-    void __user *guest_handle = NULL;
-
-    if ( guest_handle_is_null(runstate_guest(v)) )
-        return true;
-
-    update_guest_memory_policy(v, &policy);
-
     if ( VM_ASSIST(v->domain, runstate_update_flag) )
     {
-        guest_handle = has_32bit_shinfo(v->domain)
-            ? &v->runstate_guest.compat.p->state_entry_time + 1
-            : &v->runstate_guest.native.p->state_entry_time + 1;
-        guest_handle--;
         v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
-        __raw_copy_to_guest(guest_handle,
-                            (void *)(&v->runstate.state_entry_time + 1) - 1, 1);
+        v->mapped_runstate.native->state_entry_time |= XEN_RUNSTATE_UPDATE;
         smp_wmb();
     }
 
-    if ( has_32bit_shinfo(v->domain) )
+    memcpy(v->mapped_runstate.native, &v->runstate, sizeof(v->runstate));
+
+    if ( VM_ASSIST(v->domain, runstate_update_flag) )
     {
-        struct compat_vcpu_runstate_info info;
+        v->runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE;
+        v->mapped_runstate.native->state_entry_time &= ~XEN_RUNSTATE_UPDATE;
+        smp_wmb();
+    }
+}
 
-        XLAT_vcpu_runstate_info(&info, &v->runstate);
-        __copy_to_guest(v->runstate_guest.compat, &info, 1);
-        rc = true;
+static void update_mapped_runstate_area_compat(struct vcpu *v)
+{
+    if ( VM_ASSIST(v->domain, runstate_update_flag) )
+    {
+        v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
+        v->mapped_runstate.compat->state_entry_time |= XEN_RUNSTATE_UPDATE;
+        smp_wmb();
     }
-    else
-        rc = __copy_to_guest(runstate_guest(v), &v->runstate, 1) !=
-             sizeof(v->runstate);
 
-    if ( guest_handle )
+    memcpy(v->mapped_runstate.compat, &v->runstate, sizeof(v->runstate));
+
+    if ( VM_ASSIST(v->domain, runstate_update_flag) )
     {
         v->runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE;
+        v->mapped_runstate.compat->state_entry_time &= ~XEN_RUNSTATE_UPDATE;
         smp_wmb();
-        __raw_copy_to_guest(guest_handle,
-                            (void *)(&v->runstate.state_entry_time + 1) - 1, 1);
     }
+}
 
-    update_guest_memory_policy(v, &policy);
+/* Update per-VCPU guest runstate shared memory area (if registered). */
+bool update_runstate_area(struct vcpu *v)
+{
+    bool rc = true;
+
+    if ( !guest_handle_is_null(runstate_guest(v)) )
+    {
+        struct guest_memory_policy policy = { .nested_guest_mode = false };
+        void __user *guest_handle = NULL;
+
+        update_guest_memory_policy(v, &policy);
+        if ( VM_ASSIST(v->domain, runstate_update_flag) )
+        {
+            guest_handle = has_32bit_shinfo(v->domain)
+                ? &v->runstate_guest.compat.p->state_entry_time + 1
+                : &v->runstate_guest.native.p->state_entry_time + 1;
+            guest_handle--;
+            v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
+            __raw_copy_to_guest(guest_handle,
+                                (void *)(&v->runstate.state_entry_time + 1) - 1, 1);
+            smp_wmb();
+        }
+
+        if ( has_32bit_shinfo(v->domain) )
+        {
+            struct compat_vcpu_runstate_info info;
+
+            XLAT_vcpu_runstate_info(&info, &v->runstate);
+            __copy_to_guest(v->runstate_guest.compat, &info, 1);
+            rc = true;
+        }
+        else
+            rc = __copy_to_guest(runstate_guest(v), &v->runstate, 1) !=
+                 sizeof(v->runstate);
+
+        if ( guest_handle )
+        {
+            v->runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE;
+            smp_wmb();
+            __raw_copy_to_guest(guest_handle,
+                                (void *)(&v->runstate.state_entry_time + 1) - 1, 1);
+        }
+        update_guest_memory_policy(v, &policy);
+    }
+
+    spin_lock(&v->mapped_runstate_lock);
+    if ( v->mapped_runstate )
+    {
+        if ( has_32bit_shinfo((v)->domain) )
+            update_mapped_runstate_area_compat(v);
+        else
+            update_mapped_runstate_area_native(v);
+    }
+    spin_unlock(&v->mapped_runstate_lock);
 
     return rc;
 }
diff --git a/xen/common/domain.c b/xen/common/domain.c
index ae22049..6df76c6 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -149,6 +149,7 @@ struct vcpu *vcpu_create(
     v->dirty_cpu = VCPU_CPU_CLEAN;
 
     spin_lock_init(&v->virq_lock);
+    spin_lock_init(&v->mapped_runstate_lock);
 
     tasklet_init(&v->continue_hypercall_tasklet, NULL, 0);
 
@@ -699,6 +700,69 @@ int rcu_lock_live_remote_domain_by_id(domid_t dom, struct domain **d)
     return 0;
 }
 
+static void _unmap_runstate_area(struct vcpu *v)
+{
+    mfn_t mfn;
+
+    if ( !v->mapped_runstate )
+        return;
+
+    mfn = _mfn(virt_to_mfn(runstate_guest(v).p));
+
+    unmap_domain_page_global((void *)
+                             ((unsigned long)v->mapped_runstate &
+                              PAGE_MASK));
+
+    v->mapped_runstate = NULL;
+    put_page_and_type(mfn_to_page(mfn));
+}
+
+static int map_runstate_area(struct vcpu *v,
+                      struct vcpu_register_runstate_memory_area *area)
+{
+    unsigned long offset = area->addr.p & ~PAGE_MASK;
+    gfn_t gfn = gaddr_to_gfn(area->addr.p);
+    struct domain *d = v->domain;
+    void *mapping;
+    struct page_info *page;
+    size_t size = sizeof(struct vcpu_runstate_info);
+
+    if ( offset > (PAGE_SIZE - size) )
+        return -EINVAL;
+
+    page = get_page_from_gfn(d, gfn_x(gfn), NULL, P2M_ALLOC);
+    if ( !page )
+        return -EINVAL;
+
+    if ( !get_page_type(page, PGT_writable_page) )
+    {
+        put_page(page);
+        return -EINVAL;
+    }
+
+    mapping = __map_domain_page_global(page);
+
+    if ( mapping == NULL )
+    {
+        put_page_and_type(page);
+        return -ENOMEM;
+    }
+
+    spin_lock(&v->mapped_runstate_lock);
+    _unmap_runstate_area(v);
+    v->mapped_runstate = mapping + offset;
+    spin_unlock(&v->mapped_runstate_lock);
+
+    return 0;
+}
+
+static void unmap_runstate_area(struct vcpu *v)
+{
+    spin_lock(&v->mapped_runstate_lock);
+    _unmap_runstate_area(v);
+    spin_unlock(&v->mapped_runstate_lock);
+}
+
 int domain_kill(struct domain *d)
 {
     int rc = 0;
@@ -737,7 +801,11 @@ int domain_kill(struct domain *d)
         if ( cpupool_move_domain(d, cpupool0) )
             return -ERESTART;
         for_each_vcpu ( d, v )
+        {
+            set_xen_guest_handle(runstate_guest(v), NULL);
+            unmap_runstate_area(v);
             unmap_vcpu_info(v);
+        }
         d->is_dying = DOMDYING_dead;
         /* Mem event cleanup has to go here because the rings 
          * have to be put before we call put_domain. */
@@ -1192,6 +1260,7 @@ int domain_soft_reset(struct domain *d)
     for_each_vcpu ( d, v )
     {
         set_xen_guest_handle(runstate_guest(v), NULL);
+        unmap_runstate_area(v);
         unmap_vcpu_info(v);
     }
 
@@ -1536,8 +1605,17 @@ long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg)
     }
 
     case VCPUOP_register_runstate_phys_memory_area:
-        rc = -EOPNOTSUPP;
+    {
+        struct vcpu_register_runstate_memory_area area;
+
+        rc = -EFAULT;
+        if ( copy_from_guest(&area, arg, 1) )
+            break;
+
+        rc = map_runstate_area(v, &area);
+
         break;
+    }
 
 #ifdef VCPU_TRAP_NMI
     case VCPUOP_send_nmi:
diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
index 312fec8..3fb6ea2 100644
--- a/xen/include/asm-arm/domain.h
+++ b/xen/include/asm-arm/domain.h
@@ -217,6 +217,8 @@ void vcpu_show_execution_state(struct vcpu *);
 void vcpu_show_registers(const struct vcpu *);
 void vcpu_switch_to_aarch64_mode(struct vcpu *);
 
+void update_runstate_area(struct vcpu *);
+
 /*
  * Due to the restriction of GICv3, the number of vCPUs in AFF0 is
  * limited to 16, thus only the first 4 bits of AFF0 are legal. We will
diff --git a/xen/include/xen/domain.h b/xen/include/xen/domain.h
index d1bfc82..ecddcfe 100644
--- a/xen/include/xen/domain.h
+++ b/xen/include/xen/domain.h
@@ -118,4 +118,6 @@ struct vnuma_info {
 
 void vnuma_destroy(struct vnuma_info *vnuma);
 
+struct vcpu_register_runstate_memory_area;
+
 #endif /* __XEN_DOMAIN_H__ */
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 748bb0f..2afe31c 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -163,15 +163,23 @@ struct vcpu
     void            *sched_priv;    /* scheduler-specific data */
 
     struct vcpu_runstate_info runstate;
+
+    spinlock_t      mapped_runstate_lock;
+
 #ifndef CONFIG_COMPAT
 # define runstate_guest(v) ((v)->runstate_guest)
     XEN_GUEST_HANDLE(vcpu_runstate_info_t) runstate_guest; /* guest address */
+    vcpu_runstate_info_t *mapped_runstate;
 #else
 # define runstate_guest(v) ((v)->runstate_guest.native)
     union {
         XEN_GUEST_HANDLE(vcpu_runstate_info_t) native;
         XEN_GUEST_HANDLE(vcpu_runstate_info_compat_t) compat;
     } runstate_guest; /* guest address */
+    union {
+        vcpu_runstate_info_t* native;
+        vcpu_runstate_info_compat_t* compat;
+    } mapped_runstate; /* guest address */
 #endif
 
     /* last time when vCPU is scheduled out */
-- 
2.7.4



* Re: [PATCH v2 1/2] xen: introduce VCPUOP_register_runstate_phys_memory_area hypercall
@ 2019-05-08 10:10     ` George Dunlap
  0 siblings, 0 replies; 83+ messages in thread
From: George Dunlap @ 2019-05-08 10:10 UTC (permalink / raw)
  To: Andrii Anisov, xen-devel
  Cc: Stefano Stabellini, Andrii Anisov, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, Jan Beulich, xen-devel, Wei Liu,
	Roger Pau Monné

On 4/23/19 9:10 AM, Andrii Anisov wrote:
> From: Andrii Anisov <andrii_anisov@epam.com>
> 
> The hypercall employs the same vcpu_register_runstate_memory_area
> structure for the interface, but requires the registered area to not
> cross a page boundary.

This needs a lot more information.

I use the following template when writing changelog entries:

* What's the current situation
* Why is that a problem
* How this patch fixes it
* Any other information needed to understand the patch.

You can drop things when it's obvious, but it's not at all obvious in
this case, at least to me.

This seems to be implementing a VCPUOP which duplicates another VCPUOP's
functionality, with some tweaks.  I can back-guess what the differences
are from the description (e.g., the other one must be a virtual address,
may cross a page boundary, &c), but I can't tell why that's a problem.

Also, I don't see the point of separating the definition from the
implementation -- why not squash these two patches together?

 -George


* Re: [PATCH v2 0/2] Introduce runstate area registration with phys address
@ 2019-05-08 13:39           ` Julien Grall
  0 siblings, 0 replies; 83+ messages in thread
From: Julien Grall @ 2019-05-08 13:39 UTC (permalink / raw)
  To: Andrii Anisov
  Cc: Juergen Gross, xen-devel, Boris Ostrovsky, Stefano Stabellini

(+xen-devel, juergen, boris)

On 08/05/2019 10:25, Andrii Anisov wrote:
> Hello Julien,
> 
> On 03.05.19 13:19, Julien Grall wrote:
>> Could you be a bit more specific about the failure? Do you see "Failed to walk 
>> page-table"?
> Sorry for a delayed answer.
> 
> Yes, I see the following, also with your patch (and replacing dprintk with 
> printk, 'cause it is more convenient to me to issue a non-debug build):
> 
> (XEN) d1v3 par 0x809
> (XEN) d1v1 par 0x809
> (XEN) d1v2 par 0x809
> (XEN) d1v0 par 0x809
> (XEN) d1v3: Failed to walk page-table va 0xffff80002ff7e348
> (XEN) d1v1: Failed to walk page-table va 0xffff80002ff4e348
> (XEN) d1v2: Failed to walk page-table va 0xffff80002ff66348
> (XEN) d1v0: Failed to walk page-table va 0xffff80002ff36348
> 
> 
> As I understand ARM ARM, EL1_PAR says it is the "translation fault level" (as 
> per "ISS encoding for an exception from a Data Abort", DFSC bits).

That's a translation fault, level 0, on a stage-1 page-table walk. To confirm:
you have KPTI disabled, right? Does it always fail, or only from time to time?
Could you dump the guest registers when this happens?
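
For reference, 0x809 decodes as follows under the Armv8 PAR_EL1 failure
layout (a sketch; bit positions are from the Arm ARM and worth double-checking
there):

    0x809 = 0b1000_0000_1001
      bit 0     F   = 1          -> the translation aborted
      bits 6:1  FST = 0b000100   -> translation fault, level 0
      bit 9     S   = 0          -> fault on the stage-1 walk
      bit 11                     -> RES1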

Even with KPTI disabled, you are still potentially facing issues when using a
virtual address (although they may be more difficult to trigger). Indeed, you
are relying on the guest OS not to unmap the region or touch the page-table
entries used for the walk.

Unfortunately we can't prevent a guest from playing with its page-tables. So at
best we can only work around it.

Cheers,

-- 
Julien Grall


* Re: [PATCH v2 0/2] Introduce runstate area registration with phys address
@ 2019-05-08 13:54             ` Andrii Anisov
  0 siblings, 0 replies; 83+ messages in thread
From: Andrii Anisov @ 2019-05-08 13:54 UTC (permalink / raw)
  To: Julien Grall
  Cc: Juergen Gross, xen-devel, Boris Ostrovsky, Stefano Stabellini

Hello Julien,

On 08.05.19 16:39, Julien Grall wrote:
> That's a translation fault, level 0, on a stage-1 page-table walk. To confirm: you have KPTI disabled, right?
Yes, KPTI is disabled. That's why I'm curious what is going wrong here.

> Does it always fail, or only from time to time?
It happens on boot, and those prints are constant and make the boot time enormous. I once waited until the user prompt (half an hour or so); they calmed down when the system was visibly idle, but still appeared from time to time (maybe on userspace activity).

> Could you dump the guest registers when this happens?
Could you please remind me how to do that?

> Even with KPTI disabled, you are still potentially facing issues when using a virtual address (although they may be more difficult to trigger). Indeed, you are relying on the guest OS not to unmap the region or touch the page-table entries used for the walk.

> Unfortunately we can't prevent a guest from playing with its page-tables. So at best we can only work around it.
And that's why we turned to using a guest physical address for the runstate area.

-- 
Sincerely,
Andrii Anisov.


* Re: [PATCH v2 0/2] Introduce runstate area registration with phys address
  2019-04-23  8:10 [PATCH v2 0/2] Introduce runstate area registration with phys address Andrii Anisov
                   ` (2 preceding siblings ...)
       [not found] ` <fa126315-31af-854e-817a-8640b431c82b@arm.com>
@ 2019-05-08 13:59 ` Julien Grall
  2019-05-13 10:15   ` Andrii Anisov
  3 siblings, 1 reply; 83+ messages in thread
From: Julien Grall @ 2019-05-08 13:59 UTC (permalink / raw)
  To: Andrii Anisov, xen-devel; +Cc: Andrii Anisov, Jan Beulich, Roger Pau Monné

Hi,

On 23/04/2019 09:10, Andrii Anisov wrote:
> From: Andrii Anisov <andrii_anisov@epam.com>
> 
> Following discussion [1] it is introduced and implemented a runstate
> registration interface which uses guest's phys address instead of a virtual one.
> The new hypercall employes the same data structures as a predecessor, but
> expects the vcpu_runstate_info structure to not cross a page boundary.
> The interface is implemented in a way vcpu_runstate_info structure is mapped to
> the hypervisor on the hypercall processing and is directly accessed during its
> updates. This runstate area mapping follows vcpu_info structure registration.
> 
> Permanent mapping of runstate area would consume vmap area on arm32 what is
> limited to 1G. Though it is assumed that ARM32 does not target the server market
> and the rest of possible applications will not host a huge number of VCPUs to
> render the limitation into the issue.

I am afraid I can't possibly back this assumption. As I pointed out in the 
previous version, I would be OK with the always-map solution on Arm32 (pending 
performance numbers) because it would be possible to increase the virtual 
address area by reworking the address space.

> 
> The series is tested for ARM64. Build tested for x86. I'd appreciate if someone
> could check it with x86.
> The Linux kernel patch is here [2]. Though it is for 4.14.

The patch looks wrong to me. You are using virt_to_phys() on a percpu area. What 
actually guarantees that the physical address will always be the same?

> 
> Changes in:
> 
>    v2: It was reconsidered the new runstate interface implementation. The new
>        interface is made independent of the old one. Do not share runstate_area
>        field, and consequently avoid excessive concurrency with the old runstate
>        interface usage.
>        Introduced locks in order to resolve possible concurrency between runstate
>        area registration and usage.
>        Addressed comments from Jan Beulich [3][4] about coding style nits. Though
>        some of them become obsolete with refactoring and few are picked into this
>        thread for further discussion.
> 
>        There were made performance measurements of approaches (runstate mapped on
>        access vs mapped on registration). The test setups are as following:
>       
>        Thin Dom0 (Linux with intiramfs) with DomD running rich Yocto Linux. In
>        DomD 3d benchmark numbers are compared. The benchmark is GlMark2. GLMark2
>        is ran with different resolutions in order to emit different irq load,
>        where 320x240 emits high IRQ load, but 1920x1080 emits low irq load.
>        Separately tested baking DomD benchmark run with primitive Dom0 CPU burn
>        (dd), in order to stimulate VCPU(dX)->VCPU(dY) switches rather than

Are you saying that the dd command is the CPUBurn? I am not sure how this could 
be considered a CPU burn. IMHO, it is more I/O related.

>        VCPU(dX)->idle->VCPU(dX).
>        with following results:
> 
>                              mapped               mapped
>                              on access            on init
>        GLMark2 320x240       2852                 2877          +0.8%
>            +Dom0 CPUBurn     2088                 2094          +0.2%
>        GLMark2 800x600       2368                 2375          +0.3%
>            +Dom0 CPUBurn     1868                 1921          +2.8%
>        GLMark2 1920x1080     931                  931            0%
>            +Dom0 CPUBurn     892                  894           +0.2%
> 
>        Please note that "mapped on access" means using the old runstate
>        registering interface. And runstate update in this case still often fails
>        to map runstate area like [5], despite the fact that our Linux kernel
>        does not have KPTI enabled. So runstate area update, in this case, is
>        really shortened.

We know that the old interface is broken, so telling us the new interface is 
faster is not entirely useful. What I am more interested in is how it performs 
if you use a guest physical address with the "mapped on access" version.
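
For comparison, a "mapped on access" variant driven by a guest physical
address could look roughly like the sketch below (an Arm-flavoured,
hypothetical helper reusing the primitives from patch 2/2; it omits the
writable page-type check and the XEN_RUNSTATE_UPDATE handling for brevity):

    static void update_runstate_by_gpa(struct vcpu *v, paddr_t gpa)
    {
        unsigned long offset = gpa & ~PAGE_MASK;
        struct page_info *page;
        void *va;

        /* Translate and take a reference on the single backing page... */
        page = get_page_from_gfn(v->domain, gfn_x(gaddr_to_gfn(gpa)),
                                 NULL, P2M_ALLOC);
        if ( !page )
            return;

        /* ...map it around this one update only... */
        va = __map_domain_page(page);
        memcpy(va + offset, &v->runstate, sizeof(v->runstate));

        /* ...then drop the mapping and the reference straight away. */
        unmap_domain_page(va);
        put_page(page);
    }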

> 
> 
>        Also it was checked IRQ latency difference using TBM in a setup similar to
>        [5]. Please note that the IRQ rate is one in 30 seconds, and only
>        VCPU->idle->VCPU use-case is considered. With following results (in ns,
>        the timer granularity 120ns):

How long did you run the benchmark?

> 
>        mapped on access:
>            max=9960 warm_max=8640 min=7200 avg=7626
>        mapped on init:
>            max=9480 warm_max=8400 min=7080 avg=7341
> 
>        Unfortunately there are no consitent results yet from profiling using
>        Lauterbach PowerTrace. Still in communication with the tracer vendor in
>        order to setup the proper configuration.
> 
> [1] https://lists.xenproject.org/archives/html/xen-devel/2019-02/msg00416.html
> [2] https://github.com/aanisov/linux/commit/ba34d2780f57ea43f81810cd695aace7b55c0f29
> [3] https://lists.xenproject.org/archives/html/xen-devel/2019-03/msg00936.html
> [4] https://lists.xenproject.org/archives/html/xen-devel/2019-03/msg00934.html
> [5] https://lists.xenproject.org/archives/html/xen-devel/2019-01/msg02369.html
> [6] https://lists.xenproject.org/archives/html/xen-devel/2018-12/msg02297.html
> 
> 
> Andrii Anisov (2):
>    xen: introduce VCPUOP_register_runstate_phys_memory_area hypercall
>    xen: implement VCPUOP_register_runstate_phys_memory_area
> 
>   xen/arch/arm/domain.c        |  62 +++++++++++++++++--------
>   xen/arch/x86/domain.c        | 105 +++++++++++++++++++++++++++++++------------
>   xen/common/domain.c          |  81 +++++++++++++++++++++++++++++++++
>   xen/include/asm-arm/domain.h |   2 +
>   xen/include/public/vcpu.h    |  15 +++++++
>   xen/include/xen/domain.h     |   2 +
>   xen/include/xen/sched.h      |   8 ++++
>   7 files changed, 227 insertions(+), 48 deletions(-)
> 

Cheers,

-- 
Julien Grall


* Re: [PATCH v2 0/2] Introduce runstate area registration with phys address
@ 2019-05-08 14:31               ` Julien Grall
  0 siblings, 0 replies; 83+ messages in thread
From: Julien Grall @ 2019-05-08 14:31 UTC (permalink / raw)
  To: Andrii Anisov
  Cc: Juergen Gross, xen-devel, Boris Ostrovsky, Stefano Stabellini



On 08/05/2019 14:54, Andrii Anisov wrote:
>> Does it always fail, or only from time to time?
> It happens on boot. And those prints are permanent and make boot time enormous. 
> I once waited until the user prompt (half an hour or so); they calmed down once 
> the system was visibly idle, but still appeared from time to time (maybe on 
> userspace activity).

I haven't seen them with nokpti platform so far. I am curious to know what is 
your configuration here.

> 
>> Could you dump the guest register when this happen?
> Could you please remember how to do that?

vcpu_show_execution_state(current) should do the job here.
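
For illustration only (the exact call site is an assumption, not something this
series defines), it can be dropped right next to the diagnostic you are seeing,
with va being the faulting address already available in the surrounding code:

    /* Illustration: dump the faulting vCPU state where the existing
     * "Failed to walk page-table" message is printed. */
    gprintk(XENLOG_ERR, "Failed to walk page-table va %#lx\n", va);
    vcpu_show_execution_state(current);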

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 2/2] xen: implement VCPUOP_register_runstate_phys_memory_area
@ 2019-05-08 15:40     ` Julien Grall
  0 siblings, 0 replies; 83+ messages in thread
From: Julien Grall @ 2019-05-08 15:40 UTC (permalink / raw)
  To: Andrii Anisov, xen-devel
  Cc: Stefano Stabellini, Andrii Anisov, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Jan Beulich, xen-devel, Wei Liu, Roger Pau Monné

Hi,

On 23/04/2019 09:10, Andrii Anisov wrote:
> From: Andrii Anisov <andrii_anisov@epam.com>
> 
> VCPUOP_register_runstate_phys_memory_area is implemented via runstate
> area mapping.
>
> Signed-off-by: Andrii Anisov <andrii_anisov@epam.com>
> ---
>   xen/arch/arm/domain.c        |  62 +++++++++++++++++--------
>   xen/arch/x86/domain.c        | 105 +++++++++++++++++++++++++++++++------------
>   xen/common/domain.c          |  80 ++++++++++++++++++++++++++++++++-
>   xen/include/asm-arm/domain.h |   2 +
>   xen/include/xen/domain.h     |   2 +
>   xen/include/xen/sched.h      |   8 ++++
>   6 files changed, 210 insertions(+), 49 deletions(-)

This patch is quite hard to read because you are reworking the code and at the 
same time implementing the new VCPUOP. How about moving the rework into a separate 
patch? The implementation can then be folded into the previous patch as suggested 
by George.

> 
> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> index 6dc633e..8e24e63 100644
> --- a/xen/arch/arm/domain.c
> +++ b/xen/arch/arm/domain.c
> @@ -275,32 +275,55 @@ static void ctxt_switch_to(struct vcpu *n)
>   }
>   
>   /* Update per-VCPU guest runstate shared memory area (if registered). */
> -static void update_runstate_area(struct vcpu *v)
> +void update_runstate_area(struct vcpu *v)

Why do you export update_runstate_area? The function does not seem to be called 
from outside this file.

>   {
> -    void __user *guest_handle = NULL;
> +    if ( !guest_handle_is_null(runstate_guest(v)) )
> +    {
> +        void __user *guest_handle = NULL;
> +        if ( VM_ASSIST(v->domain, runstate_update_flag) )
> +        {
> +            guest_handle = &v->runstate_guest.p->state_entry_time + 1;
> +            guest_handle--;
> +            v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
> +            __raw_copy_to_guest(guest_handle,
> +                                (void *)(&v->runstate.state_entry_time + 1) - 1,
> +                                1);
> +            smp_wmb();
> +        }
>   
> -    if ( guest_handle_is_null(runstate_guest(v)) )
> -        return;
> +        __copy_to_guest(runstate_guest(v), &v->runstate, 1);
>   
> -    if ( VM_ASSIST(v->domain, runstate_update_flag) )
> -    {
> -        guest_handle = &v->runstate_guest.p->state_entry_time + 1;
> -        guest_handle--;
> -        v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
> -        __raw_copy_to_guest(guest_handle,
> -                            (void *)(&v->runstate.state_entry_time + 1) - 1, 1);
> -        smp_wmb();
> +        if ( guest_handle )
> +        {
> +            v->runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE;
> +            smp_wmb();
> +            __raw_copy_to_guest(guest_handle,
> +                                (void *)(&v->runstate.state_entry_time + 1) - 1,
> +                                1);
> +        }
>       }
>   
> -    __copy_to_guest(runstate_guest(v), &v->runstate, 1);
> -
> -    if ( guest_handle )
> +    spin_lock(&v->mapped_runstate_lock);
> +    if ( v->mapped_runstate )

The code looks a bit odd to me: you seem to allow a guest to provide 2 runstate 
areas, one using a guest virtual address and the other using a guest physical 
address.

It would be best if we prevent a guest from mixing and matching them. IOW, if the 
guest provides a physical address first, then *all* subsequent calls should use a 
physical address. Alternatively this could be a per-vCPU decision.
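
Something along these lines in the registration paths (just a sketch, the exact
error value does not matter) would already rule out mixing:

    /* Sketch: refuse a guest-physical registration while the legacy
     * virtual-address interface is still registered (and similarly refuse
     * the legacy call once a physical address has been registered). */
    if ( !guest_handle_is_null(runstate_guest(v)) )
        return -EBUSY;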

>       {
> -        v->runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE;
> -        smp_wmb();
> -        __raw_copy_to_guest(guest_handle,
> -                            (void *)(&v->runstate.state_entry_time + 1) - 1, 1);
> +        if ( VM_ASSIST(v->domain, runstate_update_flag) )
> +        {
> +            v->mapped_runstate->state_entry_time |= XEN_RUNSTATE_UPDATE;
> +            smp_wmb();
> +            v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
> +        }
> +
> +        memcpy(v->mapped_runstate, &v->runstate, sizeof(v->runstate));
> +
> +        if ( VM_ASSIST(v->domain, runstate_update_flag) )
> +        {
> +            v->mapped_runstate->state_entry_time &= ~XEN_RUNSTATE_UPDATE;
> +            smp_wmb();
> +            v->runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE;
> +        }
>       }
> +    spin_unlock(&v->mapped_runstate_lock);
> +

NIT: The newline is not necessary here.

>   }
>   
>   static void schedule_tail(struct vcpu *prev)
> @@ -998,6 +1021,7 @@ long do_arm_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) a
>       {
>           case VCPUOP_register_vcpu_info:
>           case VCPUOP_register_runstate_memory_area:
> +        case VCPUOP_register_runstate_phys_memory_area:
>               return do_vcpu_op(cmd, vcpuid, arg);
>           default:
>               return -EINVAL;


[...]

> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index ae22049..6df76c6 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -149,6 +149,7 @@ struct vcpu *vcpu_create(
>       v->dirty_cpu = VCPU_CPU_CLEAN;
>   
>       spin_lock_init(&v->virq_lock);
> +    spin_lock_init(&v->mapped_runstate_lock);
>   
>       tasklet_init(&v->continue_hypercall_tasklet, NULL, 0);
>   
> @@ -699,6 +700,69 @@ int rcu_lock_live_remote_domain_by_id(domid_t dom, struct domain **d)
>       return 0;
>   }
>   
> +static void _unmap_runstate_area(struct vcpu *v)
A better name would be unmap_runstate_area_locked() so you avoid the reserved 
name and make the locking expectation clear.

> +{
> +    mfn_t mfn;
> +
> +    if ( !v->mapped_runstate )
> +        return;
> +
> +    mfn = _mfn(virt_to_mfn(runstate_guest(v).p));

As pointed out by Jan in the previous version:

The pointer is the result of __map_domain_page_global(), so I don't think you
can legitimately use virt_to_mfn() on it, at least not on x86;
domain_page_map_to_mfn() is what you want to use here.

> +
> +    unmap_domain_page_global((void *)
> +                             ((unsigned long)v->mapped_runstate &
> +                              PAGE_MASK));
> +
> +    v->mapped_runstate = NULL;
> +    put_page_and_type(mfn_to_page(mfn));
> +}

We seem to have this pattern in a few places now (see unmap_guest_page). It 
would be good to introduce helpers that can be used everywhere (probably lifted 
from common/event_fifo.c).
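
With the rename and Jan's point applied, the helper would look roughly like
this (a sketch only; the MFN derivation and the name are the only changes
from what you posted):

    static void unmap_runstate_area_locked(struct vcpu *v)
    {
        mfn_t mfn;

        if ( !v->mapped_runstate )
            return;

        /* Derive the MFN from the global mapping itself. */
        mfn = domain_page_map_to_mfn(v->mapped_runstate);

        unmap_domain_page_global((void *)
                                 ((unsigned long)v->mapped_runstate &
                                  PAGE_MASK));

        v->mapped_runstate = NULL;
        put_page_and_type(mfn_to_page(mfn));
    }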

> +
> +static int map_runstate_area(struct vcpu *v,
> +                      struct vcpu_register_runstate_memory_area *area)
> +{
> +    unsigned long offset = area->addr.p & ~PAGE_MASK;
> +    gfn_t gfn = gaddr_to_gfn(area->addr.p);
> +    struct domain *d = v->domain;
> +    void *mapping;
> +    struct page_info *page;
> +    size_t size = sizeof (struct vcpu_runstate_info );

The space before ) is not necessary.

But is the variable really necessary?
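
i.e. the check could simply be:

    if ( offset > (PAGE_SIZE - sizeof(struct vcpu_runstate_info)) )
        return -EINVAL;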

> +
> +    if ( offset > (PAGE_SIZE - size) )
> +        return -EINVAL;
> +
> +    page = get_page_from_gfn(d, gfn_x(gfn), NULL, P2M_ALLOC);
> +    if ( !page )
> +        return -EINVAL;
> +
> +    if ( !get_page_type(page, PGT_writable_page) )
> +    {
> +        put_page(page);
> +        return -EINVAL;
> +    }
> +
> +    mapping = __map_domain_page_global(page);
> +
> +    if ( mapping == NULL )
> +    {
> +        put_page_and_type(page);
> +        return -ENOMEM;
> +    }
> +
> +    spin_lock(&v->mapped_runstate_lock);
> +    _unmap_runstate_area(v);
> +    v->mapped_runstate = mapping + offset;
> +    spin_unlock(&v->mapped_runstate_lock);
> +
> +    return 0;
> +}
> +
> +static void unmap_runstate_area(struct vcpu *v)
> +{
> +    spin_lock(&v->mapped_runstate_lock);
> +    _unmap_runstate_area(v);
> +    spin_unlock(&v->mapped_runstate_lock);
> +}
> +
>   int domain_kill(struct domain *d)
>   {
>       int rc = 0;
> @@ -737,7 +801,11 @@ int domain_kill(struct domain *d)
>           if ( cpupool_move_domain(d, cpupool0) )
>               return -ERESTART;
>           for_each_vcpu ( d, v )
> +        {
> +            set_xen_guest_handle(runstate_guest(v), NULL);
> +            unmap_runstate_area(v);
>               unmap_vcpu_info(v);
> +        }
>           d->is_dying = DOMDYING_dead;
>           /* Mem event cleanup has to go here because the rings
>            * have to be put before we call put_domain. */
> @@ -1192,6 +1260,7 @@ int domain_soft_reset(struct domain *d)
>       for_each_vcpu ( d, v )
>       {
>           set_xen_guest_handle(runstate_guest(v), NULL);
> +        unmap_runstate_area(v);
>           unmap_vcpu_info(v);
>       }
>   
> @@ -1536,8 +1605,17 @@ long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg)
>       }
>   
>       case VCPUOP_register_runstate_phys_memory_area:
> -        rc = -EOPNOTSUPP;
> +    {
> +        struct vcpu_register_runstate_memory_area area;
> +
> +        rc = -EFAULT;
> +        if ( copy_from_guest(&area, arg, 1) )
> +            break;
> +
> +        rc = map_runstate_area(v, &area);
> +
>           break;
> +    }
>   
>   #ifdef VCPU_TRAP_NMI
>       case VCPUOP_send_nmi:
> diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
> index 312fec8..3fb6ea2 100644
> --- a/xen/include/asm-arm/domain.h
> +++ b/xen/include/asm-arm/domain.h
> @@ -217,6 +217,8 @@ void vcpu_show_execution_state(struct vcpu *);
>   void vcpu_show_registers(const struct vcpu *);
>   void vcpu_switch_to_aarch64_mode(struct vcpu *);
>   
> +void update_runstate_area(struct vcpu *);
> +
>   /*
>    * Due to the restriction of GICv3, the number of vCPUs in AFF0 is
>    * limited to 16, thus only the first 4 bits of AFF0 are legal. We will
> diff --git a/xen/include/xen/domain.h b/xen/include/xen/domain.h
> index d1bfc82..ecddcfe 100644
> --- a/xen/include/xen/domain.h
> +++ b/xen/include/xen/domain.h
> @@ -118,4 +118,6 @@ struct vnuma_info {
>   
>   void vnuma_destroy(struct vnuma_info *vnuma);
>   
> +struct vcpu_register_runstate_memory_area;
> +
>   #endif /* __XEN_DOMAIN_H__ */
> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
> index 748bb0f..2afe31c 100644
> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -163,15 +163,23 @@ struct vcpu
>       void            *sched_priv;    /* scheduler-specific data */
>   
>       struct vcpu_runstate_info runstate;
> +
> +    spinlock_t      mapped_runstate_lock;
> +
>   #ifndef CONFIG_COMPAT
>   # define runstate_guest(v) ((v)->runstate_guest)
>       XEN_GUEST_HANDLE(vcpu_runstate_info_t) runstate_guest; /* guest address */
> +    vcpu_runstate_info_t *mapped_runstate;
>   #else
>   # define runstate_guest(v) ((v)->runstate_guest.native)
>       union {
>           XEN_GUEST_HANDLE(vcpu_runstate_info_t) native;
>           XEN_GUEST_HANDLE(vcpu_runstate_info_compat_t) compat;
>       } runstate_guest; /* guest address */
> +    union {
> +        vcpu_runstate_info_t* native;
> +        vcpu_runstate_info_compat_t* compat;
> +    } mapped_runstate; /* guest address */

The combination of mapped_runstate and runstate_guest is a bit confusing. I 
think you want to rework the interface to show that only one is possible at a 
time and make clear which one is used by whom. Maybe:

union
{
    /* Legacy interface to be used when the guest provides a virtual address */
    union {
       XEN_GUEST_HANDLE(vcpu_runstate_info_t) native;
       ...
    } virt;

    /* Interface used when the guest provides a physical address */
    union {
    } phys;
} runstate_guest;

runstate_guest_type /* could be a bool or enum */
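
For concreteness, a possible full layout (all names below are only
illustrative) could be:

union
{
    /* Legacy interface: the guest registered a virtual address. */
    union {
        XEN_GUEST_HANDLE(vcpu_runstate_info_t) native;
        XEN_GUEST_HANDLE(vcpu_runstate_info_compat_t) compat;
    } virt;

    /* New interface: the guest registered a physical address and Xen keeps
     * a mapping of it. */
    union {
        vcpu_runstate_info_t *native;
        vcpu_runstate_info_compat_t *compat;
    } phys;
} runstate_guest;

bool runstate_guest_is_phys; /* or an enum, as said above */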

Jan what do you think?

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 0/2] Introduce runstate area registration with phys address
@ 2019-05-08 16:01                 ` Andrii Anisov
  0 siblings, 0 replies; 83+ messages in thread
From: Andrii Anisov @ 2019-05-08 16:01 UTC (permalink / raw)
  To: Julien Grall
  Cc: Juergen Gross, xen-devel, Boris Ostrovsky, Stefano Stabellini

Hello Julien,


On 08.05.19 17:31, Julien Grall wrote:
> I haven't seen them with nokpti platform so far. I am curious to know what is your configuration here.

XEN 4.12 with our patches. Thin Dom0 is a generic armv8 Linux, LK 4.14.75 with patches from Renesas and us.
DomD is LK 4.14.75 with HW assigned and the corresponding drivers. You can find the LK configs on my Google Drive [1].

Those faults fire only for DomD (on its start).

> vcpu_show_execution_state(current) should do the job here.

Here it is:

(XEN) d1v2 par 0x809
(XEN) d1v2: Failed to walk page-table va 0xffff80002ff66357
(XEN) *** Dumping Dom1 vcpu#2 state: ***
(XEN) ----[ Xen-4.12.0  arm64  debug=n   Not tainted ]----
(XEN) CPU:    2
(XEN) PC:     0000ffffbd28dc88
(XEN) LR:     0000ffffbd28e674
(XEN) SP_EL0: 0000ffffe9890410
(XEN) SP_EL1: ffff00000803c000
(XEN) CPSR:   40000000 MODE:64-bit EL0t (Guest User)
(XEN)      X0: 0000000000000000  X1: 0000ffffe98907bc  X2: 0000ffffe9890430
(XEN)      X3: 0000000000000000  X4: 0000000000000000  X5: 0000000000000000
(XEN)      X6: 0000000000000001  X7: 0000ffffe98907b8  X8: 0101010101010101
(XEN)      X9: 0000000000000000 X10: 0000ffffe9890570 X11: 0000000000000020
(XEN)     X12: 0000000000000000 X13: 0000000000000000 X14: 0000000000000015
(XEN)     X15: 000000000000000a X16: 0000ffffbd6502c8 X17: 0000ffffbd28e660
(XEN)     X18: 0000ffffe98902cf X19: 0000ffffe98907b8 X20: 0000ffffe98907b8
(XEN)     X21: 0000ffffbd653000 X22: 0000ffffe98906f8 X23: 000000000000001e
(XEN)     X24: 0000ffffe98906a8 X25: 0000ffffbd584000 X26: 0000aaaac8736f68
(XEN)     X27: 0000ffffbd584978 X28: 0000000000000002  FP: 0000ffffe9890410
(XEN)
(XEN)    ELR_EL1: 0000ffffbd2b4698
(XEN)    ESR_EL1: 56000000
(XEN)    FAR_EL1: 0000ffffbd525014
(XEN)
(XEN)  SCTLR_EL1: 34d5d91d
(XEN)    TCR_EL1: 34b5503510
(XEN)  TTBR0_EL1: 000000006c056000
(XEN)  TTBR1_EL1: 00030000412d8000
(XEN)
(XEN)   VTCR_EL2: 80023558
(XEN)  VTTBR_EL2: 000200073fed6000
(XEN)
(XEN)  SCTLR_EL2: 30cd183d
(XEN)    HCR_EL2: 000000008078663f
(XEN)  TTBR0_EL2: 0000000078184000
(XEN)
(XEN)    ESR_EL2: 5a000ea1
(XEN)  HPFAR_EL2: 0000000000030010
(XEN)    FAR_EL2: ffff000008005f00
(XEN)
(XEN) No stack trace for guest user-mode

BTW, we have another public holiday tomorrow, so I'll get back to you on Friday.

[1] https://drive.google.com/drive/folders/1SX0FhAOKrkFdbNELFuBUt7fmLZKb8XmK?usp=sharing

-- 
Sincerely,
Andrii Anisov.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 2/2] xen: implement VCPUOP_register_runstate_phys_memory_area
@ 2019-05-09  9:27       ` Jan Beulich
  0 siblings, 0 replies; 83+ messages in thread
From: Jan Beulich @ 2019-05-09  9:27 UTC (permalink / raw)
  To: Julien Grall, andrii.anisov
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan, xen-devel,
	andrii_anisov, Roger Pau Monne

>>> On 08.05.19 at 17:40, <julien.grall@arm.com> wrote:
> On 23/04/2019 09:10, Andrii Anisov wrote:
>> --- a/xen/include/xen/sched.h
>> +++ b/xen/include/xen/sched.h
>> @@ -163,15 +163,23 @@ struct vcpu
>>       void            *sched_priv;    /* scheduler-specific data */
>>   
>>       struct vcpu_runstate_info runstate;
>> +
>> +    spinlock_t      mapped_runstate_lock;
>> +
>>   #ifndef CONFIG_COMPAT
>>   # define runstate_guest(v) ((v)->runstate_guest)
>>       XEN_GUEST_HANDLE(vcpu_runstate_info_t) runstate_guest; /* guest address */
>> +    vcpu_runstate_info_t *mapped_runstate;
>>   #else
>>   # define runstate_guest(v) ((v)->runstate_guest.native)
>>       union {
>>           XEN_GUEST_HANDLE(vcpu_runstate_info_t) native;
>>           XEN_GUEST_HANDLE(vcpu_runstate_info_compat_t) compat;
>>       } runstate_guest; /* guest address */
>> +    union {
>> +        vcpu_runstate_info_t* native;
>> +        vcpu_runstate_info_compat_t* compat;
>> +    } mapped_runstate; /* guest address */
> 
> The combination of mapped_runstate and runstate_guest is a bit confusing. I 
> think you want to rework the interface to show that only one is possible at the 
> time and make clear which one is used by who. Maybe:
> 
> union
> {
>     /* Legacy interface to be used when the guest provides a virtual address */
>     union {
>        XEN_GUEST_HANDLE(vcpu_runstate_info_t) native;
>        ...
>     } virt;
> 
>     /* Interface used when the guest provides a physical address */
>     union {
>     } phys;
> } runstate_guest.
> 
> runstate_guest_type /* could be a bool or enum */
> 
> Jan what do you think?

I fully agree - no mixing of approaches here, if possible. However,
care needs to be taken that after a domain reset the new kernel
can choose the opposite model. The question is whether there are even
other cases where independent components (say boot loader and
OS) may need to be permitted to choose models independently of
one another.
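
I.e. (just a sketch, reusing the flag name suggested above) the reset paths
would then also need to forget the previously chosen model:

    for_each_vcpu ( d, v )
    {
        unmap_runstate_area(v);                        /* phys model */
        set_xen_guest_handle(runstate_guest(v), NULL); /* virt model */
        v->runstate_guest_type = RUNSTATE_NONE;        /* illustrative value */
    }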

As a side note, Andrii - would you please avoid sending double mail
to xen-devel (addresses differing just by domain used)?

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 0/2] Introduce runstate area registration with phys address
  2019-05-08 13:59 ` Julien Grall
@ 2019-05-13 10:15   ` Andrii Anisov
  2019-05-13 11:16     ` Julien Grall
  0 siblings, 1 reply; 83+ messages in thread
From: Andrii Anisov @ 2019-05-13 10:15 UTC (permalink / raw)
  To: Julien Grall, xen-devel; +Cc: Andrii Anisov, Jan Beulich, Roger Pau Monné

Hello Julien,

On 08.05.19 16:59, Julien Grall wrote:
> Hi,
> 
> On 23/04/2019 09:10, Andrii Anisov wrote:
>> From: Andrii Anisov <andrii_anisov@epam.com>
>>
>> Following discussion [1] it is introduced and implemented a runstate
>> registration interface which uses guest's phys address instead of a virtual one.
>> The new hypercall employes the same data structures as a predecessor, but
>> expects the vcpu_runstate_info structure to not cross a page boundary.
>> The interface is implemented in a way vcpu_runstate_info structure is mapped to
>> the hypervisor on the hypercall processing and is directly accessed during its
>> updates. This runstate area mapping follows vcpu_info structure registration.
>>
>> Permanent mapping of runstate area would consume vmap area on arm32 what is
>> limited to 1G. Though it is assumed that ARM32 does not target the server market
>> and the rest of possible applications will not host a huge number of VCPUs to
>> render the limitation into the issue.
> 
> I am afraid I can't possible back this assumption. As I pointed out in the previous version, I would be OK with the always map solution on Arm32 (pending performance) because it would be possible to increase the virtual address area by reworking the address space.

I'm sorry, but I'm not sure what I should do about that.

>>
>> The series is tested for ARM64. Build tested for x86. I'd appreciate if someone
>> could check it with x86.
>> The Linux kernel patch is here [2]. Though it is for 4.14.
> 
> The patch looks wrong to me. You are using virt_to_phys() on a percpu area. 
> What does actually promise you the physical address will always be the same?

Sorry for my ignorance here, but could you please elaborate on what is wrong?


> Are you saying that the dd command is the CPU burn? I am not sure how this could be considered a CPU burn. IMHO, this is more IO related.

Both /dev/null and /dev/zero are virtual devices; no actual IO is performed during their operation, so all the load is CPU (user and sys).

> 
>>        VCPU(dX)->idle->VCPU(dX).
>>        with following results:
>>
>>                              mapped               mapped
>>                              on access            on init
>>        GLMark2 320x240       2852                 2877          +0.8%
>>            +Dom0 CPUBurn     2088                 2094          +0.2%
>>        GLMark2 800x600       2368                 2375          +0.3%
>>            +Dom0 CPUBurn     1868                 1921          +2.8%
>>        GLMark2 1920x1080     931                  931            0%
>>            +Dom0 CPUBurn     892                  894           +0.2%
>>
>>        Please note that "mapped on access" means using the old runstate
>>        registering interface. And runstate update in this case still often fails
>>        to map runstate area like [5], despite the fact that our Linux kernel
>>        does not have KPTI enabled. So runstate area update, in this case, is
>>        really shortened.
> 
> We know that the old interface is broken, so telling us the new interface is faster is not entirely useful. What I am more interested in is how it performs if you use a guest physical address with the "mapped on access" version.

Hm, I see your point. Well, I can do that measurement for ARM to compare performance.

> 
>>
>>
>>        Also it was checked IRQ latency difference using TBM in a setup similar to
>>        [5]. Please note that the IRQ rate is one in 30 seconds, and only
>>        VCPU->idle->VCPU use-case is considered. With following results (in ns,
>>        the timer granularity 120ns):
> 
> How long did you run the benchmark?

I ran it until the avg more or less stabilized (2-3 minutes), then took the minimal avg (note, we have a moving average there).

> 
>>
>>        mapped on access:
>>            max=9960 warm_max=8640 min=7200 avg=7626
>>        mapped on init:
>>            max=9480 warm_max=8400 min=7080 avg=7341
>>
>>        Unfortunately there are no consitent results yet from profiling using
>>        Lauterbach PowerTrace. Still in communication with the tracer vendor in
>>        order to setup the proper configuration.

-- 
Sincerely,
Andrii Anisov.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 0/2] Introduce runstate area registration with phys address
@ 2019-05-13 10:50                   ` Julien Grall
  0 siblings, 0 replies; 83+ messages in thread
From: Julien Grall @ 2019-05-13 10:50 UTC (permalink / raw)
  To: Andrii Anisov
  Cc: Juergen Gross, xen-devel, Boris Ostrovsky, Stefano Stabellini

Hi,

On 5/8/19 5:01 PM, Andrii Anisov wrote:
> On 08.05.19 17:31, Julien Grall wrote:
>> I haven't seen them with nokpti platform so far. I am curious to know 
>> what is your configuration here.
> 
> XEN 4.12 with our patches. Thin Dom0 is a generic armv8 Linux, LK 
> 4.14.75 with patches from Renesas and us.
> DomD is LK 4.14.75 with HW assigned and drivers. LK configs you can find 
> on my google drive [1].
> 
> Those faults fire only for DomD (on its start).
> 
>> vcpu_show_execution_state(current) should do the job here.
> 
> Here it is:
> 
> (XEN) d1v2 par 0x809
> (XEN) d1v2: Failed to walk page-table va 0xffff80002ff66357
> (XEN) *** Dumping Dom1 vcpu#2 state: ***
> (XEN) ----[ Xen-4.12.0  arm64  debug=n   Not tainted ]----
> (XEN) CPU:    2
> (XEN) PC:     0000ffffbd28dc88
> (XEN) LR:     0000ffffbd28e674
> (XEN) SP_EL0: 0000ffffe9890410
> (XEN) SP_EL1: ffff00000803c000
> (XEN) CPSR:   40000000 MODE:64-bit EL0t (Guest User)

This one is happening when the guest was running in user mode. Is it 
always the case?

Also, your DomD .config has CONFIG_UNMAP_KERNEL_AT_EL0. So how do you 
disable kpti?

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 0/2] Introduce runstate area registration with phys address
  2019-05-13 10:15   ` Andrii Anisov
@ 2019-05-13 11:16     ` Julien Grall
  2019-05-13 14:14       ` Andrii Anisov
  0 siblings, 1 reply; 83+ messages in thread
From: Julien Grall @ 2019-05-13 11:16 UTC (permalink / raw)
  To: Andrii Anisov, xen-devel; +Cc: Andrii Anisov, Jan Beulich, Roger Pau Monné



On 5/13/19 11:15 AM, Andrii Anisov wrote:
> Hello Julien,
> 
> On 08.05.19 16:59, Julien Grall wrote:
>> Hi,
>>
>> On 23/04/2019 09:10, Andrii Anisov wrote:
>>> From: Andrii Anisov <andrii_anisov@epam.com>
>>>
>>> Following discussion [1] it is introduced and implemented a runstate
>>> registration interface which uses guest's phys address instead of a 
>>> virtual one.
>>> The new hypercall employes the same data structures as a predecessor, 
>>> but
>>> expects the vcpu_runstate_info structure to not cross a page boundary.
>>> The interface is implemented in a way vcpu_runstate_info structure is 
>>> mapped to
>>> the hypervisor on the hypercall processing and is directly accessed 
>>> during its
>>> updates. This runstate area mapping follows vcpu_info structure 
>>> registration.
>>>
>>> Permanent mapping of runstate area would consume vmap area on arm32 
>>> what is
>>> limited to 1G. Though it is assumed that ARM32 does not target the 
>>> server market
>>> and the rest of possible applications will not host a huge number of 
>>> VCPUs to
>>> render the limitation into the issue.
>>
>> I am afraid I can't possible back this assumption. As I pointed out in 
>> the previous version, I would be OK with the always map solution on 
>> Arm32 (pending performance) because it would be possible to increase 
>> the virtual address area by reworking the address space.
> 
> I'm sorry, I'm not sure what should be my actions about that.

There is no code modification involved so far. Just update your cover 
letter with what I said above.

> 
>>>
>>> The series is tested for ARM64. Build tested for x86. I'd appreciate 
>>> if someone
>>> could check it with x86.
>>> The Linux kernel patch is here [2]. Though it is for 4.14.
>>
>> The patch looks wrong to me. You are using virt_to_phys() on a percpu 
>> area. What does actually promise you the physical address will always 
>> be the same?
> 
> Sorry for my ignorance here, could you please elaborate more about what 
> is wrong here?

While the virtual address will never change over the life cycle of 
a variable, I am not entirely sure we can make the same assumption for 
the physical address.

I know that kmalloc() promises you that the physical address will 
not change. But percpu does not seem to use kmalloc(), so have you 
confirmed this assumption holds?
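
For reference, the guest-side registration under discussion looks roughly like
the following (illustrative only, this is not the posted patch [2]; the
function name is made up and arch details are omitted):

    /* Register the per-vCPU runstate area by guest physical address. The
     * question above is whether virt_to_phys() on a static percpu variable
     * is guaranteed to stay stable for the lifetime of the guest. */
    static DEFINE_PER_CPU(struct vcpu_runstate_info, xen_runstate);

    static void xen_register_runstate_phys(int cpu)
    {
        struct vcpu_register_runstate_memory_area area;

        area.addr.p = virt_to_phys(&per_cpu(xen_runstate, cpu));
        if (HYPERVISOR_vcpu_op(VCPUOP_register_runstate_phys_memory_area,
                               cpu, &area))
            pr_warn("xen: runstate registration failed for cpu %d\n", cpu);
    }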

> 
> 
>> Are you saying that the dd command is the CPU burn? I am not sure how 
>> this could be considered a CPU burn. IMHO, this is more IO related.
> 
> Both /dev/null and /dev/zero are virtual devices; no actual IO is 
> performed during their operation, so all the load is CPU (user and sys).

Thank you for the explanation. Shall I guess this is an existing 
benchmark [1]?

> 
>>
>>>        VCPU(dX)->idle->VCPU(dX).
>>>        with following results:
>>>
>>>                              mapped               mapped
>>>                              on access            on init
>>>        GLMark2 320x240       2852                 2877          +0.8%
>>>            +Dom0 CPUBurn     2088                 2094          +0.2%
>>>        GLMark2 800x600       2368                 2375          +0.3%
>>>            +Dom0 CPUBurn     1868                 1921          +2.8%
>>>        GLMark2 1920x1080     931                  931            0%
>>>            +Dom0 CPUBurn     892                  894           +0.2%
>>>
>>>        Please note that "mapped on access" means using the old runstate
>>>        registering interface. And runstate update in this case still 
>>> often fails
>>>        to map runstate area like [5], despite the fact that our Linux 
>>> kernel
>>>        does not have KPTI enabled. So runstate area update, in this 
>>> case, is
>>>        really shortened.
>>
>> We know that the old interface is broken, so telling us the new 
>> interface is faster is not entirely useful. What I am more interested in 
>> is how it performs if you use a guest physical address with the "mapped 
>> on access" version.
> 
> Hm, I see your point. Well, I can make it for ARM to compare performance.
> 
>>
>>>
>>>
>>>        Also it was checked IRQ latency difference using TBM in a 
>>> setup similar to
>>>        [5]. Please note that the IRQ rate is one in 30 seconds, and only
>>>        VCPU->idle->VCPU use-case is considered. With following 
>>> results (in ns,
>>>        the timer granularity 120ns):
>>
>> How long did you run the benchmark?
> 
> I ran it until the avg more or less stabilized (2-3 minutes), then took 
> the minimal avg (note, we have a moving average there).
Did you re-run it multiple times?

> 
>>
>>>
>>>        mapped on access:
>>>            max=9960 warm_max=8640 min=7200 avg=7626
>>>        mapped on init:
>>>            max=9480 warm_max=8400 min=7080 avg=7341
>>>
>>>        Unfortunately there are no consitent results yet from 
>>> profiling using
>>>        Lauterbach PowerTrace. Still in communication with the tracer 
>>> vendor in
>>>        order to setup the proper configuration.
> 

[1] https://patrickmn.com/projects/cpuburn/ ? If so, a link to the 
benchmark would be useful.

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 2/2] xen: implement VCPUOP_register_runstate_phys_memory_area
@ 2019-05-13 12:30       ` Andrii Anisov
  0 siblings, 0 replies; 83+ messages in thread
From: Andrii Anisov @ 2019-05-13 12:30 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: Stefano Stabellini, Andrii Anisov, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Jan Beulich, xen-devel, Wei Liu, Roger Pau Monné



On 08.05.19 18:40, Julien Grall wrote:
> This patch is quite hard to read because you are reworking the code and at the same time implementing the new VCPUOP. How about moving the rework in a separate patch? The implementation can then be fold in the previous patch as suggested by George.

OK.

> 
>>
>> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
>> index 6dc633e..8e24e63 100644
>> --- a/xen/arch/arm/domain.c
>> +++ b/xen/arch/arm/domain.c
>> @@ -275,32 +275,55 @@ static void ctxt_switch_to(struct vcpu *n)
>>   }
>>   /* Update per-VCPU guest runstate shared memory area (if registered). */
>> -static void update_runstate_area(struct vcpu *v)
>> +void update_runstate_area(struct vcpu *v)
> 
> Why do you export update_runstate_area? The function does not seem to be called outside.

Ouch, this was left over from one of the previous versions.

> 
>>   {
>> -    void __user *guest_handle = NULL;
>> +    if ( !guest_handle_is_null(runstate_guest(v)) )
>> +    {
>> +        void __user *guest_handle = NULL;
>> +        if ( VM_ASSIST(v->domain, runstate_update_flag) )
>> +        {
>> +            guest_handle = &v->runstate_guest.p->state_entry_time + 1;
>> +            guest_handle--;
>> +            v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
>> +            __raw_copy_to_guest(guest_handle,
>> +                                (void *)(&v->runstate.state_entry_time + 1) - 1,
>> +                                1);
>> +            smp_wmb();
>> +        }
>> -    if ( guest_handle_is_null(runstate_guest(v)) )
>> -        return;
>> +        __copy_to_guest(runstate_guest(v), &v->runstate, 1);
>> -    if ( VM_ASSIST(v->domain, runstate_update_flag) )
>> -    {
>> -        guest_handle = &v->runstate_guest.p->state_entry_time + 1;
>> -        guest_handle--;
>> -        v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
>> -        __raw_copy_to_guest(guest_handle,
>> -                            (void *)(&v->runstate.state_entry_time + 1) - 1, 1);
>> -        smp_wmb();
>> +        if ( guest_handle )
>> +        {
>> +            v->runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE;
>> +            smp_wmb();
>> +            __raw_copy_to_guest(guest_handle,
>> +                                (void *)(&v->runstate.state_entry_time + 1) - 1,
>> +                                1);
>> +        }
>>       }
>> -    __copy_to_guest(runstate_guest(v), &v->runstate, 1);
>> -
>> -    if ( guest_handle )
>> +    spin_lock(&v->mapped_runstate_lock);
>> +    if ( v->mapped_runstate )
> 
> The code looks a bit odd to me, you seem to allow a guest to provide 2 runstate areas: one using guest virtual address the other using guest physical address.
> 
> It would be best if we prevent a guest to mix match them. 

At first I tried implementing it that way, but the locking and decision code became really ugly and complex while trying to cover guest-misbehaviour scenarios.

> IOW, if the guest provide a physical address first, then *all* the call should be physical address. Alternatively this could be a per vCPU decision.

I guess we should agree on what to implement first.

> 
>>       {
>> -        v->runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE;
>> -        smp_wmb();
>> -        __raw_copy_to_guest(guest_handle,
>> -                            (void *)(&v->runstate.state_entry_time + 1) - 1, 1);
>> +        if ( VM_ASSIST(v->domain, runstate_update_flag) )
>> +        {
>> +            v->mapped_runstate->state_entry_time |= XEN_RUNSTATE_UPDATE;
>> +            smp_wmb();
>> +            v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
>> +        }
>> +
>> +        memcpy(v->mapped_runstate, &v->runstate, sizeof(v->runstate));
>> +
>> +        if ( VM_ASSIST(v->domain, runstate_update_flag) )
>> +        {
>> +            v->mapped_runstate->state_entry_time &= ~XEN_RUNSTATE_UPDATE;
>> +            smp_wmb();
>> +            v->runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE;
>> +        }
>>       }
>> +    spin_unlock(&v->mapped_runstate_lock);
>> +
> 
> NIT: The newline is not necessary here.

OK.

> 
>>   }
>>   static void schedule_tail(struct vcpu *prev)
>> @@ -998,6 +1021,7 @@ long do_arm_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) a
>>       {
>>           case VCPUOP_register_vcpu_info:
>>           case VCPUOP_register_runstate_memory_area:
>> +        case VCPUOP_register_runstate_phys_memory_area:
>>               return do_vcpu_op(cmd, vcpuid, arg);
>>           default:
>>               return -EINVAL;
> 
> 
> [...]
> 
>> diff --git a/xen/common/domain.c b/xen/common/domain.c
>> index ae22049..6df76c6 100644
>> --- a/xen/common/domain.c
>> +++ b/xen/common/domain.c
>> @@ -149,6 +149,7 @@ struct vcpu *vcpu_create(
>>       v->dirty_cpu = VCPU_CPU_CLEAN;
>>       spin_lock_init(&v->virq_lock);
>> +    spin_lock_init(&v->mapped_runstate_lock);
>>       tasklet_init(&v->continue_hypercall_tasklet, NULL, 0);
>> @@ -699,6 +700,69 @@ int rcu_lock_live_remote_domain_by_id(domid_t dom, struct domain **d)
>>       return 0;
>>   }
>> +static void _unmap_runstate_area(struct vcpu *v)
> A better name would be unmap_runstate_area_locked(), so you avoid the reserved name and make the intended use clear.

OK.

> 
>> +{
>> +    mfn_t mfn;
>> +
>> +    if ( !v->mapped_runstate )
>> +        return;
>> +
>> +    mfn = _mfn(virt_to_mfn(runstate_guest(v).p));
> 
> As pointed out by Jan in the previous version:
> 
> The pointer is the result of __map_domain_page_global(), so I don't think
> you can legitimately use virt_to_mfn() on it, at least not on x86;
> domain_page_map_to_mfn() is what you want to use here.

Yep.

> 
>> +
>> +    unmap_domain_page_global((void *)
>> +                             ((unsigned long)v->mapped_runstate &
>> +                              PAGE_MASK));
>> +
>> +    v->mapped_runstate = NULL;
>> +    put_page_and_type(mfn_to_page(mfn));
>> +}
> 
> We seem to have this pattern in a few places now (see unmap_guest_page). It would be good to introduce helpers that can be used everywhere (probably lifted from common/event_fifo.c).

I'll check.
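
Something along these lines, perhaps (a rough sketch only; helper names are tentative, and it also switches to domain_page_map_to_mfn() as Jan suggested):

/* Tentative generic helpers, mirroring the event_fifo pattern. */
static void *map_guest_area(struct domain *d, paddr_t gaddr)
{
    gfn_t gfn = gaddr_to_gfn(gaddr);
    struct page_info *page;
    void *mapping;

    /* Page-offset handling is left to the caller for brevity. */
    page = get_page_from_gfn(d, gfn_x(gfn), NULL, P2M_ALLOC);
    if ( !page )
        return NULL;

    if ( !get_page_type(page, PGT_writable_page) )
    {
        put_page(page);
        return NULL;
    }

    mapping = __map_domain_page_global(page);
    if ( !mapping )
        put_page_and_type(page);

    return mapping;
}

static void unmap_guest_area(void *mapping)
{
    mfn_t mfn;

    if ( !mapping )
        return;

    /* Not virt_to_mfn(): the mapping came from __map_domain_page_global(). */
    mfn = domain_page_map_to_mfn(mapping);
    unmap_domain_page_global(mapping);
    put_page_and_type(mfn_to_page(mfn));
}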

> 
>> +
>> +static int map_runstate_area(struct vcpu *v,
>> +                      struct vcpu_register_runstate_memory_area *area)
>> +{
>> +    unsigned long offset = area->addr.p & ~PAGE_MASK;
>> +    gfn_t gfn = gaddr_to_gfn(area->addr.p);
>> +    struct domain *d = v->domain;
>> +    void *mapping;
>> +    struct page_info *page;
>> +    size_t size = sizeof (struct vcpu_runstate_info );
> 
> space is not necessary before ).
> 
> But is the variable really necessary?

Well, I think it could be dropped.
> 
>> +
>> +    if ( offset > (PAGE_SIZE - size) )
>> +        return -EINVAL;
>> +
>> +    page = get_page_from_gfn(d, gfn_x(gfn), NULL, P2M_ALLOC);
>> +    if ( !page )
>> +        return -EINVAL;
>> +
>> +    if ( !get_page_type(page, PGT_writable_page) )
>> +    {
>> +        put_page(page);
>> +        return -EINVAL;
>> +    }
>> +
>> +    mapping = __map_domain_page_global(page);
>> +
>> +    if ( mapping == NULL )
>> +    {
>> +        put_page_and_type(page);
>> +        return -ENOMEM;
>> +    }
>> +
>> +    spin_lock(&v->mapped_runstate_lock);
>> +    _unmap_runstate_area(v);
>> +    v->mapped_runstate = mapping + offset;
>> +    spin_unlock(&v->mapped_runstate_lock);
>> +
>> +    return 0;
>> +}
>> +
>> +static void unmap_runstate_area(struct vcpu *v)
>> +{
>> +    spin_lock(&v->mapped_runstate_lock);
>> +    _unmap_runstate_area(v);
>> +    spin_unlock(&v->mapped_runstate_lock);
>> +}
>> +
>>   int domain_kill(struct domain *d)
>>   {
>>       int rc = 0;
>> @@ -737,7 +801,11 @@ int domain_kill(struct domain *d)
>>           if ( cpupool_move_domain(d, cpupool0) )
>>               return -ERESTART;
>>           for_each_vcpu ( d, v )
>> +        {
>> +            set_xen_guest_handle(runstate_guest(v), NULL);
>> +            unmap_runstate_area(v);
>>               unmap_vcpu_info(v);
>> +        }
>>           d->is_dying = DOMDYING_dead;
>>           /* Mem event cleanup has to go here because the rings
>>            * have to be put before we call put_domain. */
>> @@ -1192,6 +1260,7 @@ int domain_soft_reset(struct domain *d)
>>       for_each_vcpu ( d, v )
>>       {
>>           set_xen_guest_handle(runstate_guest(v), NULL);
>> +        unmap_runstate_area(v);
>>           unmap_vcpu_info(v);
>>       }
>> @@ -1536,8 +1605,17 @@ long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg)
>>       }
>>       case VCPUOP_register_runstate_phys_memory_area:
>> -        rc = -EOPNOTSUPP;
>> +    {
>> +        struct vcpu_register_runstate_memory_area area;
>> +
>> +        rc = -EFAULT;
>> +        if ( copy_from_guest(&area, arg, 1) )
>> +            break;
>> +
>> +        rc = map_runstate_area(v, &area);
>> +
>>           break;
>> +    }
>>   #ifdef VCPU_TRAP_NMI
>>       case VCPUOP_send_nmi:
>> diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
>> index 312fec8..3fb6ea2 100644
>> --- a/xen/include/asm-arm/domain.h
>> +++ b/xen/include/asm-arm/domain.h
>> @@ -217,6 +217,8 @@ void vcpu_show_execution_state(struct vcpu *);
>>   void vcpu_show_registers(const struct vcpu *);
>>   void vcpu_switch_to_aarch64_mode(struct vcpu *);
>> +void update_runstate_area(struct vcpu *);
>> +
>>   /*
>>    * Due to the restriction of GICv3, the number of vCPUs in AFF0 is
>>    * limited to 16, thus only the first 4 bits of AFF0 are legal. We will
>> diff --git a/xen/include/xen/domain.h b/xen/include/xen/domain.h
>> index d1bfc82..ecddcfe 100644
>> --- a/xen/include/xen/domain.h
>> +++ b/xen/include/xen/domain.h
>> @@ -118,4 +118,6 @@ struct vnuma_info {
>>   void vnuma_destroy(struct vnuma_info *vnuma);
>> +struct vcpu_register_runstate_memory_area;
>> +
>>   #endif /* __XEN_DOMAIN_H__ */
>> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
>> index 748bb0f..2afe31c 100644
>> --- a/xen/include/xen/sched.h
>> +++ b/xen/include/xen/sched.h
>> @@ -163,15 +163,23 @@ struct vcpu
>>       void            *sched_priv;    /* scheduler-specific data */
>>       struct vcpu_runstate_info runstate;
>> +
>> +    spinlock_t      mapped_runstate_lock;
>> +
>>   #ifndef CONFIG_COMPAT
>>   # define runstate_guest(v) ((v)->runstate_guest)
>>       XEN_GUEST_HANDLE(vcpu_runstate_info_t) runstate_guest; /* guest address */
>> +    vcpu_runstate_info_t *mapped_runstate;
>>   #else
>>   # define runstate_guest(v) ((v)->runstate_guest.native)
>>       union {
>>           XEN_GUEST_HANDLE(vcpu_runstate_info_t) native;
>>           XEN_GUEST_HANDLE(vcpu_runstate_info_compat_t) compat;
>>       } runstate_guest; /* guest address */
>> +    union {
>> +        vcpu_runstate_info_t* native;
>> +        vcpu_runstate_info_compat_t* compat;
>> +    } mapped_runstate; /* guest address */
> The combination of mapped_runstate and runstate_guest is a bit confusing. I think you want to rework the interface to show that only one is possible at a time and make clear which one is used by whom. Maybe:

As I said before, IMO coupling those interfaces makes the code complicated and ugly.

> 
> union
> {
>     /* Legacy interface to be used when the guest provides a virtual address */
>     union {
>        XEN_GUEST_HANDLE(vcpu_runstate_info_t) native;
>        ...
>     } virt;
> 
>     /* Interface used when the guest provides a physical address */
>     union {
>     } phys;
> } runstate_guest.
> 
> runstate_guest_type /* could be a bool or enum */
> 
> Jan what do you think?
> 
> Cheers,
> 

-- 
Sincerely,
Andrii Anisov.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 0/2] Introduce runstate area registration with phys address
  2019-05-13 11:16     ` Julien Grall
@ 2019-05-13 14:14       ` Andrii Anisov
  2019-05-13 14:34         ` Julien Grall
  0 siblings, 1 reply; 83+ messages in thread
From: Andrii Anisov @ 2019-05-13 14:14 UTC (permalink / raw)
  To: Julien Grall, xen-devel; +Cc: Andrii Anisov, Jan Beulich, Roger Pau Monné

Hello Julien,

On 13.05.19 14:16, Julien Grall wrote:
>>> I am afraid I can't possibly back this assumption. As I pointed out in the previous version, I would be OK with the always-map solution on Arm32 (pending performance) because it would be possible to increase the virtual address area by reworking the address space.
>>
>> I'm sorry, I'm not sure what should be my actions about that.
> 
> There no code modification involved so far. Just updating your cover letter with what I just said above.

OK, I'll take it for the next version.

>>> The patch looks wrong to me. You are using virt_to_phys() on a percpu area. What does actually promise you the physical address will always be the same?
>>
>> Sorry for my ignorance here, could you please elaborate more about what is wrong here?
> 
> While the virtual address will never change over the life cycle of a variable, I am not entirely sure we can make the same assumption for the physical address.
> 
> I know that kmalloc() is promising you that the physical address will not change. But percpu does not seem to use kmalloc() so have you confirmed this assumption can hold?

I need a bit more time to investigate.


>>> Are you saying that the command dd is the CPUBurn? I am not sure how this could be considered a CPUBurn. IMHO, this is more IO related.
>>
>> Both /dev/null and /dev/zero are virtual devices; no actual IO is performed during their operation, and all the load is CPU (user and sys).
> 
> Thank you for the explanation. Shall I guess this is an existing benchmark [1]?

Well, "dd if=/dev/zero of=/dev/null" is just a trivial way to get one CPU core busy. It works on (almost) any Linux system without any additional setup.
Using another trivial Go application for that purpose seems like overkill to me. Yet if you insist, I can use it.


>> I did run it until the avg more or less stabilizes (2-3 minutes), then took the minimal avg (note, we have a moving average there).
> Did you re-run multiple times?
Yes, sure. These are not one-try results.

-- 
Sincerely,
Andrii Anisov.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 0/2] Introduce runstate area registration with phys address
@ 2019-05-13 14:34                     ` Andrii Anisov
  0 siblings, 0 replies; 83+ messages in thread
From: Andrii Anisov @ 2019-05-13 14:34 UTC (permalink / raw)
  To: Julien Grall
  Cc: Juergen Gross, xen-devel, Boris Ostrovsky, Stefano Stabellini

Hello Julien,

On 13.05.19 13:50, Julien Grall wrote:
> Also, your DomD .config has CONFIG_UNMAP_KERNEL_AT_EL0. So how do you disable kpti?

Sorry for the mess; I was looking for "CONFIG_PAGE_TABLE_ISOLATION" and did not find it. That is what I had googled as the KPTI enable option, but it is x86-only, which I overlooked.

So I have KPTI enabled here. Thank you.

-- 
Sincerely,
Andrii Anisov.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 0/2] Introduce runstate area registration with phys address
  2019-05-13 14:14       ` Andrii Anisov
@ 2019-05-13 14:34         ` Julien Grall
  2019-05-13 15:29           ` Andrii Anisov
  0 siblings, 1 reply; 83+ messages in thread
From: Julien Grall @ 2019-05-13 14:34 UTC (permalink / raw)
  To: Andrii Anisov, xen-devel; +Cc: Andrii Anisov, Jan Beulich, Roger Pau Monné



On 5/13/19 3:14 PM, Andrii Anisov wrote:
> Hello Julien,

Hello,

> On 13.05.19 14:16, Julien Grall wrote:
>>>> I am afraid I can't possibly back this assumption. As I pointed out 
>>>> in the previous version, I would be OK with the always-map solution 
>>>> on Arm32 (pending performance) because it would be possible to 
>>>> increase the virtual address area by reworking the address space.
>>>
>>> I'm sorry, I'm not sure what should be my actions about that.
>>
>> There no code modification involved so far. Just updating your cover 
>> letter with what I just said above.
> 
> OK, I'll take it for the next version.
> 
>>>> The patch looks wrong to me. You are using virt_to_phys() on a 
>>>> percpu area. What does actually promise you the physical address 
>>>> will always be the same?
>>>
>>> Sorry for my ignorance here, could you please elaborate more about 
>>> what is wrong here?
>>
>> While the virtual address will never change over the life cycle 
>> of a variable, I am not entirely sure we can make the same assumption 
>> for the physical address.
>>
>> I know that kmalloc() is promising you that the physical address will 
>> not change. But percpu does not seem to use kmalloc() so have you 
>> confirmed this assumption can hold?
> 
> I need a bit more time to investigate.
> 
> 
>>>> Are you saying that the command dd is the CPUBurn? I am not sure how 
>>>> this could be considered a CPUBurn. IMHO, this is more IO related.
>>>
>>> Both /dev/null and /dev/zero are virtual devices; no actual IO is 
>>> performed during their operation, and all the load is CPU (user and sys).
>>
>> Thank you for the explanation. Shall I guess this is an existing 
>> benchmark [1]?
> 
> Well, "dd if=/dev/zero of=/dev/null" is just a trivial way to get one 
> CPU core busy. It works on (almost) any Linux system without any 
> additional setup.
> Using another trivial Go application for that purpose seems like 
> overkill to me. Yet if you insist, I can use it.

The point of my message is to understand the exact workload/setup you are 
using. "dd" was not an entirely obvious choice for CPUBurn, and Google 
didn't provide many websites backing this information.

Anyway, now that I understand the workload a bit better, a couple more 
questions:
    - How many vCPUs are you using in each domain?
    - What scheduler are you using?
    - What is the affinity?

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 0/2] Introduce runstate area registration with phys address
  2019-05-13 14:34         ` Julien Grall
@ 2019-05-13 15:29           ` Andrii Anisov
  2019-05-13 15:31             ` Julien Grall
  0 siblings, 1 reply; 83+ messages in thread
From: Andrii Anisov @ 2019-05-13 15:29 UTC (permalink / raw)
  To: Julien Grall, xen-devel; +Cc: Andrii Anisov, Jan Beulich, Roger Pau Monné



On 13.05.19 17:34, Julien Grall wrote:
> The point of my message is to understand the exact workload/setup you are using. "dd" was not an entirely obvious choice for CPUBurn, and Google didn't provide many websites backing this information.

> Anyway, now that I understand the workload a bit better, a couple more questions:
>     - How many vCPUs are you using in each domain?
>     - What scheduler are you using?
>     - What is the affinity?

For the test with glmark2: Dom0 (4 VCPUs), DomD (4 VCPUs), 4 PCPUs, no affinity specified.

For the test with TBM: Dom0 (2 VCPUs) pinned to PCPUs 0 and 1, and TBM domain with one VCPU pinned to PCPU 2.

The scheduler is the default (credit2).

-- 
Sincerely,
Andrii Anisov.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 0/2] Introduce runstate area registration with phys address
  2019-05-13 15:29           ` Andrii Anisov
@ 2019-05-13 15:31             ` Julien Grall
  2019-05-13 15:38               ` Andrii Anisov
  0 siblings, 1 reply; 83+ messages in thread
From: Julien Grall @ 2019-05-13 15:31 UTC (permalink / raw)
  To: Andrii Anisov, xen-devel; +Cc: Andrii Anisov, Jan Beulich, Roger Pau Monné

Hi,

On 5/13/19 4:29 PM, Andrii Anisov wrote:
> 
> 
> On 13.05.19 17:34, Julien Grall wrote:
>> My point of my message is to understand the exact workload/setup you 
>> are using. "dd" was not an entirely obvious choice for CPUBurn and 
>> Google didn't provide a lot of website backing this information.
> 
>> Anyway, now I understand a bit more the workload, a couple of more 
>> questions:
>>     - How many vCPUs are you using in each domain?
>>     - What scheduler are you using?
>>     - What is the affinity?
> 
> For the test with glmark2: Dom0 (4 VCPUs), DomD (4 VCPUs), 4 PCPUs, no 
> affinity specified.

So, are you running 4 dd (one for each core) in parallel? Are they pinned?

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 0/2] Introduce runstate area registration with phys address
  2019-05-13 15:31             ` Julien Grall
@ 2019-05-13 15:38               ` Andrii Anisov
  2019-05-13 15:40                 ` Julien Grall
  0 siblings, 1 reply; 83+ messages in thread
From: Andrii Anisov @ 2019-05-13 15:38 UTC (permalink / raw)
  To: Julien Grall, xen-devel; +Cc: Andrii Anisov, Jan Beulich, Roger Pau Monné



On 13.05.19 18:31, Julien Grall wrote:
> So, are you running 4 dd (one for each core) in parallel? Are they pinned?

Yes, sure, I run 4 dd's in parallel to get all VCPUs loaded. No, they are not pinned.

-- 
Sincerely,
Andrii Anisov.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 0/2] Introduce runstate area registration with phys address
  2019-05-13 15:38               ` Andrii Anisov
@ 2019-05-13 15:40                 ` Julien Grall
  2019-05-13 15:42                   ` Andrii Anisov
  0 siblings, 1 reply; 83+ messages in thread
From: Julien Grall @ 2019-05-13 15:40 UTC (permalink / raw)
  To: Andrii Anisov, xen-devel; +Cc: Andrii Anisov, Jan Beulich, Roger Pau Monné



On 5/13/19 4:38 PM, Andrii Anisov wrote:
> 
> 
> On 13.05.19 18:31, Julien Grall wrote:
>> So, are you running 4 dd (one for each core) in parallel? Are they 
>> pinned?
> 
> Yes, sure I run 4 dd's in parallel to get all VCPUs loaded. No they are 
> not pinned.

 From my understanding, if you want consistency, then setting the 
affinity will definitely help. Otherwise, you may have the scheduler 
kick in and rebalance things.

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 0/2] Introduce runstate area registration with phys address
  2019-05-13 15:40                 ` Julien Grall
@ 2019-05-13 15:42                   ` Andrii Anisov
  2019-05-13 15:45                     ` Julien Grall
  0 siblings, 1 reply; 83+ messages in thread
From: Andrii Anisov @ 2019-05-13 15:42 UTC (permalink / raw)
  To: Julien Grall, xen-devel; +Cc: Andrii Anisov, Jan Beulich, Roger Pau Monné


On 13.05.19 18:40, Julien Grall wrote:
>  From my understanding, if you want consistency, then setting the affinity will definitely help. Otherwise, you may have the scheduler kick in and rebalance things.
Sorry, do you mean setting affinity for dd processes, or Dom0 VCPUs, or both?

-- 
Sincerely,
Andrii Anisov.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 0/2] Introduce runstate area registration with phys address
  2019-05-13 15:42                   ` Andrii Anisov
@ 2019-05-13 15:45                     ` Julien Grall
  2019-05-13 16:05                       ` Andrii Anisov
  0 siblings, 1 reply; 83+ messages in thread
From: Julien Grall @ 2019-05-13 15:45 UTC (permalink / raw)
  To: Andrii Anisov, xen-devel; +Cc: Andrii Anisov, Jan Beulich, Roger Pau Monné

Hi.

On 5/13/19 4:42 PM, Andrii Anisov wrote:
> 
> On 13.05.19 18:40, Julien Grall wrote:
>>  From my understanding, if you want consistency, then setting the 
>> affinity will definitely help. Otherwise, you may have the scheduler 
>> to kick up and also balancing.
> Sorry, do you mean setting affinity for dd processes, or Dom0 VCPUs, or 
> both?

I was speaking about the dd processes. But Dom0 vCPUs might also be a good idea.

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 0/2] Introduce runstate area registration with phys address
  2019-05-13 15:45                     ` Julien Grall
@ 2019-05-13 16:05                       ` Andrii Anisov
  0 siblings, 0 replies; 83+ messages in thread
From: Andrii Anisov @ 2019-05-13 16:05 UTC (permalink / raw)
  To: Julien Grall, xen-devel; +Cc: Andrii Anisov, Jan Beulich, Roger Pau Monné



On 13.05.19 18:45, Julien Grall wrote:
> I was speaking about dd process. But Dom0 vCPUs might also be a good idea.

I see.

-- 
Sincerely,
Andrii Anisov.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 2/2] xen: implement VCPUOP_register_runstate_phys_memory_area
@ 2019-05-14  9:35         ` Julien Grall
  0 siblings, 0 replies; 83+ messages in thread
From: Julien Grall @ 2019-05-14  9:35 UTC (permalink / raw)
  To: Jan Beulich, andrii.anisov
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan, xen-devel,
	andrii_anisov, Roger Pau Monne

Hi Jan,

On 09/05/2019 10:27, Jan Beulich wrote:
>>>> On 08.05.19 at 17:40, <julien.grall@arm.com> wrote:
>> On 23/04/2019 09:10, Andrii Anisov wrote:
>>> --- a/xen/include/xen/sched.h
>>> +++ b/xen/include/xen/sched.h
>>> @@ -163,15 +163,23 @@ struct vcpu
>>>        void            *sched_priv;    /* scheduler-specific data */
>>>    
>>>        struct vcpu_runstate_info runstate;
>>> +
>>> +    spinlock_t      mapped_runstate_lock;
>>> +
>>>    #ifndef CONFIG_COMPAT
>>>    # define runstate_guest(v) ((v)->runstate_guest)
>>>        XEN_GUEST_HANDLE(vcpu_runstate_info_t) runstate_guest; /* guest address */
>>> +    vcpu_runstate_info_t *mapped_runstate;
>>>    #else
>>>    # define runstate_guest(v) ((v)->runstate_guest.native)
>>>        union {
>>>            XEN_GUEST_HANDLE(vcpu_runstate_info_t) native;
>>>            XEN_GUEST_HANDLE(vcpu_runstate_info_compat_t) compat;
>>>        } runstate_guest; /* guest address */
>>> +    union {
>>> +        vcpu_runstate_info_t* native;
>>> +        vcpu_runstate_info_compat_t* compat;
>>> +    } mapped_runstate; /* guest address */
>>
>> The combination of mapped_runstate and runstate_guest is a bit confusing. I
>> think you want to rework the interface to show that only one is possible at a
>> time and make clear which one is used by whom. Maybe:
>>
>> union
>> {
>>      /* Legacy interface to be used when the guest provides a virtual address */
>>      union {
>>         XEN_GUEST_HANDLE(vcpu_runstate_info_t) native;
>>         ...
>>      } virt;
>>
>>      /* Interface used when the guest provides a physical address */
>>      union {
>>      } phys;
>> } runstate_guest.
>>
>> runstate_guest_type /* could be a bool or enum */
>>
>> Jan what do you think?
> 
> I fully agree - no mixing of approaches here, if possible. However,
> care needs to be taken that after a domain reset the new kernel
> can choose the opposite model. The question is whether there are even
> other cases where independent components (say boot loader and
> OS) may need to be permitted to choose models independently of
> one another.
Good point. On a similar topic, how does kexec work on Xen? Do we reset the 
domain as well?

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 2/2] xen: implement VCPUOP_register_runstate_phys_memory_area
@ 2019-05-14  9:48           ` Jan Beulich
  0 siblings, 0 replies; 83+ messages in thread
From: Jan Beulich @ 2019-05-14  9:48 UTC (permalink / raw)
  To: Julien Grall
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, andrii.anisov,
	xen-devel, andrii_anisov, Roger Pau Monne

>>> On 14.05.19 at 11:35, <julien.grall@arm.com> wrote:
> On a similar topic, how does kexec work on Xen? Do we reset the 
> domain as well?

No, how could we? What gets invoked isn't Xen in the common case,
but some other (typically bare metal) OS like Linux.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 2/2] xen: implement VCPUOP_register_runstate_phys_memory_area
@ 2019-05-14  9:58         ` Julien Grall
  0 siblings, 0 replies; 83+ messages in thread
From: Julien Grall @ 2019-05-14  9:58 UTC (permalink / raw)
  To: Andrii Anisov, xen-devel
  Cc: Stefano Stabellini, Andrii Anisov, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Jan Beulich, xen-devel, Wei Liu, Roger Pau Monné



On 13/05/2019 13:30, Andrii Anisov wrote:
> 
> 
> On 08.05.19 18:40, Julien Grall wrote:
>>> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
>>> index 6dc633e..8e24e63 100644
>>
>>>   {
>>> -    void __user *guest_handle = NULL;
>>> +    if ( !guest_handle_is_null(runstate_guest(v)) )
>>> +    {
>>> +        void __user *guest_handle = NULL;
>>> +        if ( VM_ASSIST(v->domain, runstate_update_flag) )
>>> +        {
>>> +            guest_handle = &v->runstate_guest.p->state_entry_time + 1;
>>> +            guest_handle--;
>>> +            v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
>>> +            __raw_copy_to_guest(guest_handle,
>>> +                                (void *)(&v->runstate.state_entry_time + 1) 
>>> - 1,
>>> +                                1);
>>> +            smp_wmb();
>>> +        }
>>> -    if ( guest_handle_is_null(runstate_guest(v)) )
>>> -        return;
>>> +        __copy_to_guest(runstate_guest(v), &v->runstate, 1);
>>> -    if ( VM_ASSIST(v->domain, runstate_update_flag) )
>>> -    {
>>> -        guest_handle = &v->runstate_guest.p->state_entry_time + 1;
>>> -        guest_handle--;
>>> -        v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
>>> -        __raw_copy_to_guest(guest_handle,
>>> -                            (void *)(&v->runstate.state_entry_time + 1) - 1, 
>>> 1);
>>> -        smp_wmb();
>>> +        if ( guest_handle )
>>> +        {
>>> +            v->runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE;
>>> +            smp_wmb();
>>> +            __raw_copy_to_guest(guest_handle,
>>> +                                (void *)(&v->runstate.state_entry_time + 1) 
>>> - 1,
>>> +                                1);
>>> +        }
>>>       }
>>> -    __copy_to_guest(runstate_guest(v), &v->runstate, 1);
>>> -
>>> -    if ( guest_handle )
>>> +    spin_lock(&v->mapped_runstate_lock);
>>> +    if ( v->mapped_runstate )
>>
>> The code looks a bit odd to me: you seem to allow a guest to provide two 
>> runstate areas, one using a guest virtual address and the other using a 
>> guest physical address.
>>
>> It would be best if we prevent a guest from mixing and matching them.
> 
> Firstly I tried implementing it that way, but the locking and decision 
> code became really ugly and complex while trying to cover 'guest misbehavior' 
> scenarios.

I think it is possible to have a simple version that decides which 
method to use. You can either use the spin_lock to protect everything or use 
something like:

update_runstate_area():

if ( xchg(&v->runstate_in_use, 1) )
   return;

switch ( v->runstate_type )
{
GVADDR:
    update_runstate_by_gvaddr();
GPADDR:
    update_runstate_by_gpaddr();
}

xchg(&v->runstate_in_use, 0);

registering an area

while ( xchg(&v->runstate_in_use, 1) );
/* Clean-up and registering the area */
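
Fleshed out a bit (all names here are placeholders, not a proposal for the final interface), that would look roughly like:

/* Placeholder sketch only. Assumes struct vcpu grows:
 *   unsigned int runstate_in_use;
 *   enum runstate_guest_type runstate_type;
 */
enum runstate_guest_type {
    RUNSTATE_NONE,
    RUNSTATE_GVADDR,     /* legacy interface: guest virtual address */
    RUNSTATE_GPADDR,     /* new interface: guest physical address, kept mapped */
};

static void update_runstate_area(struct vcpu *v)
{
    /* If a (re-)registration is in flight, simply skip this update. */
    if ( xchg(&v->runstate_in_use, 1) )
        return;

    switch ( v->runstate_type )
    {
    case RUNSTATE_GVADDR:
        update_runstate_by_gvaddr(v);   /* the existing copy_to_guest() path */
        break;
    case RUNSTATE_GPADDR:
        update_runstate_by_gpaddr(v);   /* memcpy() into the mapped area */
        break;
    case RUNSTATE_NONE:
        break;
    }

    xchg(&v->runstate_in_use, 0);
}

static int register_runstate_area(struct vcpu *v, enum runstate_guest_type type,
                                  struct vcpu_register_runstate_memory_area *area)
{
    int rc;

    /* Wait for any in-progress update to drain before touching the state. */
    while ( xchg(&v->runstate_in_use, 1) )
        cpu_relax();

    /* Tear down whatever was registered before, then install the new area. */
    discard_runstate_area(v);           /* placeholder for the cleanup */
    rc = setup_runstate_area(v, type, area);
    v->runstate_type = rc ? RUNSTATE_NONE : type;

    xchg(&v->runstate_in_use, 0);

    return rc;
}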

> 
>> IOW, if the guest provides a physical address first, then *all* the calls 
>> should use a physical address. Alternatively this could be a per-vCPU decision.
> 
> I guess we should agree what to implement first.

I think there is an agreement that the two methods should not be used together.

[..]

>>> diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
>>> index 312fec8..3fb6ea2 100644
>>> --- a/xen/include/asm-arm/domain.h
>>> +++ b/xen/include/asm-arm/domain.h
>>> @@ -217,6 +217,8 @@ void vcpu_show_execution_state(struct vcpu *);
>>>   void vcpu_show_registers(const struct vcpu *);
>>>   void vcpu_switch_to_aarch64_mode(struct vcpu *);
>>> +void update_runstate_area(struct vcpu *);
>>> +
>>>   /*
>>>    * Due to the restriction of GICv3, the number of vCPUs in AFF0 is
>>>    * limited to 16, thus only the first 4 bits of AFF0 are legal. We will
>>> diff --git a/xen/include/xen/domain.h b/xen/include/xen/domain.h
>>> index d1bfc82..ecddcfe 100644
>>> --- a/xen/include/xen/domain.h
>>> +++ b/xen/include/xen/domain.h
>>> @@ -118,4 +118,6 @@ struct vnuma_info {
>>>   void vnuma_destroy(struct vnuma_info *vnuma);
>>> +struct vcpu_register_runstate_memory_area;
>>> +
>>>   #endif /* __XEN_DOMAIN_H__ */
>>> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
>>> index 748bb0f..2afe31c 100644
>>> --- a/xen/include/xen/sched.h
>>> +++ b/xen/include/xen/sched.h
>>> @@ -163,15 +163,23 @@ struct vcpu
>>>       void            *sched_priv;    /* scheduler-specific data */
>>>       struct vcpu_runstate_info runstate;
>>> +
>>> +    spinlock_t      mapped_runstate_lock;
>>> +
>>>   #ifndef CONFIG_COMPAT
>>>   # define runstate_guest(v) ((v)->runstate_guest)
>>>       XEN_GUEST_HANDLE(vcpu_runstate_info_t) runstate_guest; /* guest address */
>>> +    vcpu_runstate_info_t *mapped_runstate;
>>>   #else
>>>   # define runstate_guest(v) ((v)->runstate_guest.native)
>>>       union {
>>>           XEN_GUEST_HANDLE(vcpu_runstate_info_t) native;
>>>           XEN_GUEST_HANDLE(vcpu_runstate_info_compat_t) compat;
>>>       } runstate_guest; /* guest address */
>>> +    union {
>>> +        vcpu_runstate_info_t* native;
>>> +        vcpu_runstate_info_compat_t* compat;
>>> +    } mapped_runstate; /* guest address */
>> The combination of mapped_runstate and runstate_guest is a bit confusing. I 
>> think you want to rework the interface to show that only one is possible at 
>> a time and make clear which one is used by whom. Maybe:
> 
> As I said before, IMO coupling those interfaces makes the code complicated and 
> ugly.

Well, I can't see how it can be ugly (see my example above).

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 2/2] xen: implement VCPUOP_register_runstate_phys_memory_area
@ 2019-05-14 10:08           ` Andrii Anisov
  0 siblings, 0 replies; 83+ messages in thread
From: Andrii Anisov @ 2019-05-14 10:08 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: Stefano Stabellini, Andrii Anisov, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Jan Beulich, xen-devel, Wei Liu, Roger Pau Monné

Hello Julien,

On 14.05.19 12:58, Julien Grall wrote:
>> I guess we should agree what to implement first.
> 
> I think there is an agreement that the two methods should not be used together.

For a domain or for a particular VCPU?
How should we respond to a request to register a paddr when a vaddr is already registered, and vice versa?


-- 
Sincerely,
Andrii Anisov.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 2/2] xen: implement VCPUOP_register_runstate_phys_memory_area
@ 2019-05-14 11:23             ` Julien Grall
  0 siblings, 0 replies; 83+ messages in thread
From: Julien Grall @ 2019-05-14 11:23 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, andrii.anisov,
	xen-devel, andrii_anisov, Roger Pau Monne



On 14/05/2019 10:48, Jan Beulich wrote:
>>>> On 14.05.19 at 11:35, <julien.grall@arm.com> wrote:
>> On a similar topic, how does Kexec works on Xen? Do we reset the
>> domain as well?
> 
> No, how could we? What gets invoked isn't Xen in the common case,
> but some other (typically bare metal) OS like Linux.

That is not what I asked. What I asked is whether Xen is involved when a guest
kernel kexecs to another kernel.

I don't know enough about kexec to be able to answer that question myself.

Cheers,

-- 
Julien Grall


* Re: [PATCH v2 2/2] xen: implement VCPUOP_register_runstate_phys_memory_area
@ 2019-05-14 11:24             ` Julien Grall
  0 siblings, 0 replies; 83+ messages in thread
From: Julien Grall @ 2019-05-14 11:24 UTC (permalink / raw)
  To: Andrii Anisov, xen-devel
  Cc: Stefano Stabellini, Andrii Anisov, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Jan Beulich, xen-devel, Wei Liu, Roger Pau Monné



On 14/05/2019 11:08, Andrii Anisov wrote:
> Hello Julien,
> 
> On 14.05.19 12:58, Julien Grall wrote:
>>> I guess we should agree what to implement first.
>>
>> I think there are an agreement that the two methods should not be used together.
> 
> For a domain or for a particular VCPU?
> How should we response on the request to register paddr when we already have 
> registered vaddr and vice versa?

From the discussion with Jan, you would tear down the previous registration and
then register the new one. This allows cases like a bootloader and a kernel
using different versions of the interface.
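
Roughly, the registration paths could share a helper along these lines (a
sketch only, assuming the v2 patch's mapped_runstate field and a global
mapping; the helper name is made up here, and compat handling is elided):

    /* Forget whatever was registered before, whichever interface was used. */
    static void discard_runstate_area(struct vcpu *v)
    {
        /* Old interface: simply drop the recorded guest virtual address. */
        set_xen_guest_handle(runstate_guest(v), NULL);

        /* New interface: unmap the page mapped at registration time. */
        if ( v->mapped_runstate.native )
        {
            unmap_domain_page_global(v->mapped_runstate.native);
            v->mapped_runstate.native = NULL;
        }
    }

Each registration hypercall would call this first, so a kernel registering by
physical address transparently supersedes whatever its bootloader registered
by virtual address.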

Cheers,

-- 
Julien Grall


* Re: [PATCH v2 2/2] xen: implement VCPUOP_register_runstate_phys_memory_area
@ 2019-05-14 11:29               ` Jan Beulich
  0 siblings, 0 replies; 83+ messages in thread
From: Jan Beulich @ 2019-05-14 11:29 UTC (permalink / raw)
  To: Julien Grall
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, andrii.anisov,
	xen-devel, andrii_anisov, Roger Pau Monne

>>> On 14.05.19 at 13:23, <julien.grall@arm.com> wrote:
> On 14/05/2019 10:48, Jan Beulich wrote:
>>>>> On 14.05.19 at 11:35, <julien.grall@arm.com> wrote:
>>> On a similar topic, how does Kexec works on Xen? Do we reset the
>>> domain as well?
>> 
>> No, how could we? What gets invoked isn't Xen in the common case,
>> but some other (typically bare metal) OS like Linux.
> 
> It is not what I asked. What I asked is whether Xen is involved when a guest 
> kernel is kexecing to another kernel.

I don't think I've ever heard of such a thing (outside of perhaps
using domain reset), so I don't think I can answer the (originally
ambiguous) question then.

Jan




* Re: [PATCH v2 2/2] xen: implement VCPUOP_register_runstate_phys_memory_area
@ 2019-05-14 11:45               ` Andrii Anisov
  0 siblings, 0 replies; 83+ messages in thread
From: Andrii Anisov @ 2019-05-14 11:45 UTC (permalink / raw)
  To: Julien Grall, Jan Beulich, xen-devel
  Cc: Stefano Stabellini, Andrii Anisov, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan, Wei Liu,
	Roger Pau Monné



On 14.05.19 14:24, Julien Grall wrote:
>>> I think there are an agreement that the two methods should not be used together.
>>
>> For a domain or for a particular VCPU?
>> How should we response on the request to register paddr when we already have registered vaddr and vice versa?
> 
>  From the discussion with Jan, you would tear down the previous solution and then
> register the new version. So this allows cases like a bootloader and a kernel using different version of the interface.

I'm not sure Jan stated that; he rather questioned it.

Jan, could you please confirm you agree that an already registered runstate area (and, potentially, a changed interface version) should be dropped on a new registration request?

-- 
Sincerely,
Andrii Anisov.


* Re: [PATCH v2 2/2] xen: implement VCPUOP_register_runstate_phys_memory_area
@ 2019-05-14 12:02                 ` Jan Beulich
  0 siblings, 0 replies; 83+ messages in thread
From: Jan Beulich @ 2019-05-14 12:02 UTC (permalink / raw)
  To: andrii.anisov
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, xen-devel, andrii_anisov, Roger Pau Monne

>>> On 14.05.19 at 13:45, <andrii.anisov@gmail.com> wrote:
> On 14.05.19 14:24, Julien Grall wrote:
>>>> I think there are an agreement that the two methods should not be used together.
>>>
>>> For a domain or for a particular VCPU?
>>> How should we response on the request to register paddr when we already have registered vaddr and vice versa?
>> 
>>  From the discussion with Jan, you would tear down the previous solution and then
>> register the new version. So this allows cases like a bootloader and a 
> kernel using different version of the interface.
> 
> I'm not sure Jan stated that, he rather questioned that.
> 
> Jan, could you please confirm you agree that it should be dropped already 
> registered runstate and (potentially) changed version in case of the new 
> register request?

Well, I think Julien's implication was that we can't support in particular
the boot loader -> kernel handover case without extra measures (if
at all), and hence he added things together and said what results
from this. Of course ideally we'd reject mixed requests, but unless
someone can come up with a clever means of determining entity
boundaries, I'm afraid this is not going to be possible without breaking
certain setups.

Jan




* Re: [PATCH v2 2/2] xen: implement VCPUOP_register_runstate_phys_memory_area
@ 2019-05-14 13:05                   ` Andrii Anisov
  0 siblings, 0 replies; 83+ messages in thread
From: Andrii Anisov @ 2019-05-14 13:05 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, xen-devel, andrii_anisov, Roger Pau Monne



On 14.05.19 15:02, Jan Beulich wrote:
> Well, I think Julian's implication was that we can't support in particular
> the boot loader -> kernel handover case without extra measures (if
> at all), and hence he added things together and said what results
> from this. Of course ideally we'd reject mixed requests, but unless
> someone can come up with a clever means of how to determine entity
> boundaries I'm afraid this is not going to be possible without breaking
> certain setups.

From my understanding, if we are speaking of different entities in a domain and their boundaries, we have to define an unregister interface as well, so that those entities would be able to take care of themselves explicitly.

-- 
Sincerely,
Andrii Anisov.


* Re: [PATCH v2 2/2] xen: implement VCPUOP_register_runstate_phys_memory_area
@ 2019-05-14 13:49                     ` Julien Grall
  0 siblings, 0 replies; 83+ messages in thread
From: Julien Grall @ 2019-05-14 13:49 UTC (permalink / raw)
  To: Andrii Anisov, Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan, xen-devel,
	andrii_anisov, Roger Pau Monne



On 14/05/2019 14:05, Andrii Anisov wrote:
> 
> 
> On 14.05.19 15:02, Jan Beulich wrote:
>> Well, I think Julian's implication was that we can't support in particular
>> the boot loader -> kernel handover case without extra measures (if
>> at all), and hence he added things together and said what results
>> from this. Of course ideally we'd reject mixed requests, but unless
>> someone can come up with a clever means of how to determine entity
>> boundaries I'm afraid this is not going to be possible without breaking
>> certain setups.
> 
>  From my understanding, if we are speaking of different entities in a domain and 
> their boundaries, we have to define unregister interface as well. So that those 
> entities would be able to take care of themselves explicitly.

You have to keep in mind that existing OSes have to run on newer Xen without any
modification.

The existing hypercall allows you to:
    1) De-register the interface by passing the value 0.
    2) Replace the currently registered interface.

You probably can't use 2) for a bootloader -> kernel handover because we are
dealing with guest virtual addresses. There is a high chance the virtual address
space layout is going to be different, or the MMU may even be turned off for a
bit (done on Arm). So you have to use 1), otherwise you might write to a random
place in memory.

I am not entirely sure whether there is actual value in 2). The only reason I
can think of is if you want to move the runstate around in your virtual address
space. But that sounds a bit weird, at least on Arm.

For the new hypercall, I think we at least want 1) (with a magic value TBD). 2)
might be helpful in case the bootloader didn't do the right thing or we are
using kexec to boot a new kernel. This would also be safer, as a physical
address could be excluded more easily.

2) should not be too difficult to implement. It is just a matter of cleaning up
whatever was used previously before registering the new interface.
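
To make 1) and 2) concrete, here is a guest-side sketch (Linux-flavoured; the
argument structure accepted by the proposed
VCPUOP_register_runstate_phys_memory_area is an assumption here, not a settled
ABI):

    #include <xen/interface/vcpu.h>
    #include <asm/xen/hypercall.h>

    /* Register the runstate area by guest virtual address (existing interface). */
    static int register_runstate_vaddr(int cpu, struct vcpu_runstate_info *info)
    {
        struct vcpu_register_runstate_memory_area area = { .addr.v = info };

        return HYPERVISOR_vcpu_op(VCPUOP_register_runstate_memory_area, cpu, &area);
    }

    /* 1) De-register by passing 0/NULL, e.g. before handing over to a kernel. */
    static int unregister_runstate_vaddr(int cpu)
    {
        struct vcpu_register_runstate_memory_area area = { .addr.v = NULL };

        return HYPERVISOR_vcpu_op(VCPUOP_register_runstate_memory_area, cpu, &area);
    }

    /* Proposed: register by guest physical address; a magic value would
     * similarly mean "de-register" (exact value TBD in this thread). */
    static int register_runstate_paddr(int cpu, uint64_t paddr)
    {
        struct vcpu_register_runstate_memory_area area = { .addr.p = paddr };

        return HYPERVISOR_vcpu_op(VCPUOP_register_runstate_phys_memory_area, cpu, &area);
    }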

Cheers,

-- 
Julien Grall


* Re: [PATCH v2 2/2] xen: implement VCPUOP_register_runstate_phys_memory_area
@ 2019-05-14 13:49                     ` Jan Beulich
  0 siblings, 0 replies; 83+ messages in thread
From: Jan Beulich @ 2019-05-14 13:49 UTC (permalink / raw)
  To: andrii.anisov
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, xen-devel, andrii_anisov, Roger Pau Monne

>>> On 14.05.19 at 15:05, <andrii.anisov@gmail.com> wrote:
> On 14.05.19 15:02, Jan Beulich wrote:
>> Well, I think Julian's implication was that we can't support in particular
>> the boot loader -> kernel handover case without extra measures (if
>> at all), and hence he added things together and said what results
>> from this. Of course ideally we'd reject mixed requests, but unless
>> someone can come up with a clever means of how to determine entity
>> boundaries I'm afraid this is not going to be possible without breaking
>> certain setups.
> 
>  From my understanding, if we are speaking of different entities in a domain 
> and their boundaries, we have to define unregister interface as well. So that 
> those entities would be able to take care of themselves explicitly.

If this were a concern only for newly written code, this would be fine.
But you need to make sure all existing code also continues to work
with whatever new interface you implement. Just because a kernel
uses your new physical-address-based mechanism doesn't mean the
boot loader knows to unregister what it has registered.

Jan




* Re: [PATCH v2 2/2] xen: implement VCPUOP_register_runstate_phys_memory_area
@ 2019-05-15  8:44                       ` Andrii Anisov
  0 siblings, 0 replies; 83+ messages in thread
From: Andrii Anisov @ 2019-05-15  8:44 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, xen-devel, andrii_anisov, Roger Pau Monne

Hello Jan,

On 14.05.19 16:49, Jan Beulich wrote:
> If this was a concern only for newly written code, this would be fine.
> But you need to make sure all existing code also continues to work
> with whatever new interface you implement.

And that is one more reason why I tend to introduce the new interface in parallel, fully independent from the old one,
rather than the mixed implementation you and Julien suggest.

> Just because a kernel
> uses your new physical address based mechanism doesn't mean the
> boot loader knows to unregister what it has registered.

As Julien already said, the current interface has an implicit mechanism to unregister the runstate area.
Also, if the bootloader fails to unregister the runstate area with the current interface, we already have a broken system.
So it is really up to the guest system to take care of its own transitions.

-- 
Sincerely,
Andrii Anisov.


* Re: [PATCH v2 2/2] xen: implement VCPUOP_register_runstate_phys_memory_area
@ 2019-05-15  9:04                       ` Andrii Anisov
  0 siblings, 0 replies; 83+ messages in thread
From: Andrii Anisov @ 2019-05-15  9:04 UTC (permalink / raw)
  To: Julien Grall, Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan, xen-devel,
	andrii_anisov, Roger Pau Monne



On 14.05.19 16:49, Julien Grall wrote:
> You have to keep in mind that existing OS have to run on newer Xen without any modification.

As I have just written to Jan, it is one more reason to keep those interfaces living in parallel and not mix their implementations.

> The existing hypercall allows you to:
>     1) De-register an interface using the value 0.

My current implementation can easily be updated with the same behavior.

>     2) Replacing a current existing interface

> You probably can't use 2) for a bootloader -> kernel handover because we are dealing with guest virtual address. There is an high chance the virtual address space layout is going to be different or even turning off MMU for a bit (done on Arm). So you have to use 1) otherwise you might write in a random place in memory.

This is definitely not the way to handle transitions between systems in a guest domain.

> I am not entirely sure whether there are actual value for 2). The only reason I can think of is if you want to move around the runstate in your virtual address space. But that's sounds a bit weird at least on Arm.
> For the new hypercall, I think we at least want 1) (with a magic value TBD). 

The magic value 0x0 can easily be introduced.

>  2) might be helpful in the case the bootloader didn't do the right thing or we are using Kexec to boot a new kernel. This would also be safer as physical address could be excluded more easily.

But the new system has to learn that the previous phys addr is reserved (used by the hypervisor), and must not use it prior to registering a new runstate area.
Providing such knowledge is something the bootloader, for example, should take care of. But, IMO, it is better to require the bootloader (for example) to unregister its runstate area prior to switching to the new system.

-- 
Sincerely,
Andrii Anisov.


* Re: [PATCH v2 2/2] xen: implement VCPUOP_register_runstate_phys_memory_area
@ 2019-05-15 10:31                         ` Julien Grall
  0 siblings, 0 replies; 83+ messages in thread
From: Julien Grall @ 2019-05-15 10:31 UTC (permalink / raw)
  To: Andrii Anisov, Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan, xen-devel,
	andrii_anisov, Roger Pau Monne

Hi Andrii,

On 15/05/2019 10:04, Andrii Anisov wrote:
> 
> 
> On 14.05.19 16:49, Julien Grall wrote:
>> You have to keep in mind that existing OS have to run on newer Xen without any 
>> modification.
> 
> As I just written to Jan, it is one more reason to keep those interfaces living 
> in parallel and do not mix their implementation.

There is actually no good reason for a guest to register via the two interfaces
at the same time. All the more so as we want to encourage OS developers to
switch to the new interface.

I also provided in my previous e-mails a way to make the two work together
without much trouble.

> 
>> The existing hypercall allows you to:
>>     1) De-register an interface using the value 0.
> 
> My current implementation can easily be updated with the same behavior.
> 
>>     2) Replacing a current existing interface
> 
>> You probably can't use 2) for a bootloader -> kernel handover because we are 
>> dealing with guest virtual address. There is an high chance the virtual 
>> address space layout is going to be different or even turning off MMU for a 
>> bit (done on Arm). So you have to use 1) otherwise you might write in a random 
>> place in memory.
> 
> This definitely not the way to handle transitions between systems in a guest 
> domain.
> 
>> I am not entirely sure whether there are actual value for 2). The only reason 
>> I can think of is if you want to move around the runstate in your virtual 
>> address space. But that's sounds a bit weird at least on Arm.
>> For the new hypercall, I think we at least want 1) (with a magic value TBD). 
> 
> The magic value 0x0 can easily be introduced.

0x0 is not an option. It could be a valid physical address. We need a value that 
cannot be used by anyone.
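
As an illustration only (the actual choice is exactly what is being discussed
here), the magic could be an address no guest frame can ever have, e.g.:

    /* Hypothetical unregister magic for the phys-address interface. */
    #define RUNSTATE_PADDR_UNREGISTER   (~(uint64_t)0)   /* all-ones GPA */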

> 
>>  2) might be helpful in the case the bootloader didn't do the right thing or 
>> we are using Kexec to boot a new kernel. This would also be safer as physical 
>> address could be excluded more easily.
> 
> But the new system have to get some knowledge about the previous phys addr is 
> reserved (used by hypervisor), and do not use it prior to registering new 
> runstate area.
> Providing such a knowledge is something (e.g.) the bootloader should take care 
> of. But, IMO, it is better to require from (e.g.) the bootloader to unregister 
> its runstate area prior to switching to the new system.

Well, if a bootloader keeps some parts in memory (such as for handling runtime
services), it will usually mark those pages as reserved, so they can't be used
by the kernel.

But here, the point is it would not be difficult to handle 2). So why would you 
try to forbid it?

Cheers,

-- 
Julien Grall


* Re: [PATCH v2 2/2] xen: implement VCPUOP_register_runstate_phys_memory_area
@ 2019-05-15 11:59                         ` Jan Beulich
  0 siblings, 0 replies; 83+ messages in thread
From: Jan Beulich @ 2019-05-15 11:59 UTC (permalink / raw)
  To: andrii.anisov
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, xen-devel, andrii_anisov, Roger Pau Monne

>>> On 15.05.19 at 10:44, <andrii.anisov@gmail.com> wrote:
> On 14.05.19 16:49, Jan Beulich wrote:
>> If this was a concern only for newly written code, this would be fine.
>> But you need to make sure all existing code also continues to work
>> with whatever new interface you implement.
> 
> And that is one more reason why I tend to introduce the new interface in 
> parallel to be fully independent from the old one.
> But not do a mixed implementation as you and Julien suggest.

What behavior guests see and how it is implemented in the hypervisor
are two largely independent things. That is, we could choose to forbid
mixing of registration methods while still having some level of code
sharing in how both hypercall variants are actually processed.
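
A sketch of that separation (illustrative only; the runstate_reg field and the
helper are made-up names, not the patch):

    enum runstate_reg { RUNSTATE_REG_NONE, RUNSTATE_REG_VADDR, RUNSTATE_REG_PADDR };

    /* Common entry point for both hypercall variants. */
    static int runstate_register(struct vcpu *v, enum runstate_reg kind,
                                 uint64_t addr)
    {
        /* Guest-visible policy: no mixing of the two registration methods. */
        if ( v->runstate_reg != RUNSTATE_REG_NONE && v->runstate_reg != kind )
            return -EBUSY;

        /* Shared implementation: validation, (re)mapping, bookkeeping, ... */
        v->runstate_reg = kind;

        return 0;
    }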

Jan




* Re: [PATCH v2 2/2] xen: implement VCPUOP_register_runstate_phys_memory_area
@ 2019-05-16 12:09     ` Jan Beulich
  0 siblings, 0 replies; 83+ messages in thread
From: Jan Beulich @ 2019-05-16 12:09 UTC (permalink / raw)
  To: andrii.anisov
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, xen-devel,
	Julien Grall, xen-devel, andrii_anisov, Roger Pau Monne

>>> On 23.04.19 at 10:10, <andrii.anisov@gmail.com> wrote:
> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -163,15 +163,23 @@ struct vcpu
>      void            *sched_priv;    /* scheduler-specific data */
>  
>      struct vcpu_runstate_info runstate;
> +
> +    spinlock_t      mapped_runstate_lock;

Besides other comments given elsewhere already - does this
really need to be a per-vCPU lock? Guests aren't expected to
invoke the API frequently, so quite likely a per-domain lock
would suffice. Quite possibly domain_{,un}lock() could
actually be (re-)used.
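
A sketch of that suggestion (illustrative; map_runstate_area() is a made-up
helper name):

    /* Serialise the (rare) registration hypercall with the per-domain lock. */
    static long register_runstate(struct vcpu *v, paddr_t gaddr)
    {
        long rc;

        domain_lock(v->domain);
        rc = map_runstate_area(v, gaddr);
        domain_unlock(v->domain);

        return rc;
    }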

Jan




* Re: [PATCH v2 2/2] xen: implement VCPUOP_register_runstate_phys_memory_area
@ 2019-05-16 13:30       ` Andrii Anisov
  0 siblings, 0 replies; 83+ messages in thread
From: Andrii Anisov @ 2019-05-16 13:30 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, xen-devel,
	Julien Grall, xen-devel, andrii_anisov, Roger Pau Monne

Hello Jan,

On 16.05.19 15:09, Jan Beulich wrote:
>>>> On 23.04.19 at 10:10, <andrii.anisov@gmail.com> wrote:
>> --- a/xen/include/xen/sched.h
>> +++ b/xen/include/xen/sched.h
>> @@ -163,15 +163,23 @@ struct vcpu
>>       void            *sched_priv;    /* scheduler-specific data */
>>   
>>       struct vcpu_runstate_info runstate;
>> +
>> +    spinlock_t      mapped_runstate_lock;
> 
> Besides other comments given elsewhere already - does this
> really need to be a per-vCPU lock? Guests aren't expected to
> invoke the API frequently, so quite likely a per-domain lock
> would suffice. Quite possibly domain_{,un}lock() could
> actually be (re-)used.

I'd not use a per-domain lock here. This lock will be taken on every runstate area update, which happens on every context switch. And simultaneous switching of several vCPUs from the same domain is quite likely.
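
For context, a sketch of where the lock sits in the update path (simplified;
the XEN_RUNSTATE_UPDATE marking done by the real code is elided):

    static void update_runstate_area(struct vcpu *v)
    {
        spin_lock(&v->mapped_runstate_lock);

        if ( v->mapped_runstate.native )
            memcpy(v->mapped_runstate.native, &v->runstate,
                   sizeof(v->runstate));

        spin_unlock(&v->mapped_runstate_lock);
    }

With a per-domain lock the same critical section would serialise context
switches of all vCPUs of the domain.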

-- 
Sincerely,
Andrii Anisov.


* Re: [PATCH v2 2/2] xen: implement VCPUOP_register_runstate_phys_memory_area
@ 2019-05-16 13:48         ` Julien Grall
  0 siblings, 0 replies; 83+ messages in thread
From: Julien Grall @ 2019-05-16 13:48 UTC (permalink / raw)
  To: Andrii Anisov, Jan Beulich
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, xen-devel, xen-devel,
	andrii_anisov, Roger Pau Monne



On 16/05/2019 14:30, Andrii Anisov wrote:
> Hello Jan,
> 
> On 16.05.19 15:09, Jan Beulich wrote:
>>>>> On 23.04.19 at 10:10, <andrii.anisov@gmail.com> wrote:
>>> --- a/xen/include/xen/sched.h
>>> +++ b/xen/include/xen/sched.h
>>> @@ -163,15 +163,23 @@ struct vcpu
>>>       void            *sched_priv;    /* scheduler-specific data */
>>>       struct vcpu_runstate_info runstate;
>>> +
>>> +    spinlock_t      mapped_runstate_lock;
>>
>> Besides other comments given elsewhere already - does this
>> really need to be a per-vCPU lock? Guests aren't expected to
>> invoke the API frequently, so quite likely a per-domain lock
>> would suffice. Quite possibly domain_{,un}lock() could
>> actually be (re-)used.
> 
> I'd not use a per-domain lock here. This lock will be locked on every runstate 
> area update, what's happening on every context switch. And the event of 
> simultaneous switching of several vcpus from the same domain has quite high 
> probability.

The lock can be completely removed anyway. See my previous comments.

Cheers,

-- 
Julien Grall


* Re: [PATCH v2 2/2] xen: implement VCPUOP_register_runstate_phys_memory_area
@ 2019-05-16 14:25           ` Andrii Anisov
  0 siblings, 0 replies; 83+ messages in thread
From: Andrii Anisov @ 2019-05-16 14:25 UTC (permalink / raw)
  To: Julien Grall, Jan Beulich
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, xen-devel, xen-devel,
	andrii_anisov, Roger Pau Monne

Hello Julien,

On 16.05.19 16:48, Julien Grall wrote:
> The lock can be completely removed anyway. See my previous comments.

You suggested a kind of simplified try_lock, with the runstate update skipped in case of failure.
The question here is whether we are OK with not updating the runstate under some circumstances.
Even in this case, the question might be whether we need `runstate_in_use` per vCPU or per domain. My answer is: per vCPU.
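
For illustration, the skip-on-contention variant could look roughly like this
(a sketch only; the flag handling on the registration side is not shown, and
runstate_in_use is assumed to be an unsigned long in struct vcpu):

    static void update_runstate_area(struct vcpu *v)
    {
        /* Registration currently owns the area: skip this update. */
        if ( test_and_set_bit(0, &v->runstate_in_use) )
            return;

        if ( v->mapped_runstate.native )
            memcpy(v->mapped_runstate.native, &v->runstate,
                   sizeof(v->runstate));

        clear_bit(0, &v->runstate_in_use);
    }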

-- 
Sincerely,
Andrii Anisov.


* Re: [PATCH v2 2/2] xen: implement VCPUOP_register_runstate_phys_memory_area
@ 2019-05-16 14:28             ` Julien Grall
  0 siblings, 0 replies; 83+ messages in thread
From: Julien Grall @ 2019-05-16 14:28 UTC (permalink / raw)
  To: Andrii Anisov, Jan Beulich
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, xen-devel, xen-devel,
	andrii_anisov, Roger Pau Monne

Hi Andrii,

On 16/05/2019 15:25, Andrii Anisov wrote:
> Hello Julien,
> 
> On 16.05.19 16:48, Julien Grall wrote:
>> The lock can be completely removed anyway. See my previous comments.
> 
> You suggested kinda simplified try_lock with runstate update skipping in case of 
> fail.
> The question here is if we are OK with not updating runstate under circumstances?

Well, if the check fails it means someone is modifying the area behind your 
back. That someone can refresh the runstate with the current information once 
it is done. So I can't see an issue with not updating the runstate in the 
context switch (see the sketch after this message).

Cheers,

-- 
Julien Grall
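
Continuing the assumed names from the earlier sketch, the registration path
can simply republish the runstate once its new mapping is in place, so a
skipped context-switch update is not simply lost:

static int map_runstate_area(struct vcpu *v, paddr_t gaddr)
{
    while ( test_and_set_bool(v->runstate_in_use) )
        cpu_relax();

    /* ... switch the mapping to the page containing gaddr ... */

    /*
     * Publish the current runstate before releasing the flag, so any
     * context-switch update skipped while we held it is not lost.
     */
    memcpy(v->mapped_runstate, &v->runstate, sizeof(v->runstate));

    v->runstate_in_use = false;
    return 0;
}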

* Re: [PATCH v2 2/2] xen: implement VCPUOP_register_runstate_phys_memory_area
@ 2019-05-16 14:29               ` Andrii Anisov
  0 siblings, 0 replies; 83+ messages in thread
From: Andrii Anisov @ 2019-05-16 14:29 UTC (permalink / raw)
  To: Julien Grall, Jan Beulich
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, xen-devel, xen-devel,
	andrii_anisov, Roger Pau Monne



On 16.05.19 17:28, Julien Grall wrote:
> Well, if the check fails it means someone is modifying the area behind your back. That someone can refresh the runstate with the current information once it is done. So I can't see an issue with not updating the runstate in the context switch.

That's fair enough.

-- 
Sincerely,
Andrii Anisov.

end of thread, other threads:[~2019-05-16 14:29 UTC | newest]

Thread overview: 83+ messages
2019-04-23  8:10 [PATCH v2 0/2] Introduce runstate area registration with phys address Andrii Anisov
2019-04-23  8:10 ` [PATCH v2 1/2] xen: introduce VCPUOP_register_runstate_phys_memory_area hypercall Andrii Anisov
2019-04-23  8:10   ` [Xen-devel] " Andrii Anisov
2019-05-08 10:10   ` George Dunlap
2019-05-08 10:10     ` [Xen-devel] " George Dunlap
2019-04-23  8:10 ` [PATCH v2 2/2] xen: implement VCPUOP_register_runstate_phys_memory_area Andrii Anisov
2019-04-23  8:10   ` [Xen-devel] " Andrii Anisov
2019-05-08 15:40   ` Julien Grall
2019-05-08 15:40     ` [Xen-devel] " Julien Grall
2019-05-09  9:27     ` Jan Beulich
2019-05-09  9:27       ` [Xen-devel] " Jan Beulich
2019-05-14  9:35       ` Julien Grall
2019-05-14  9:35         ` [Xen-devel] " Julien Grall
2019-05-14  9:48         ` Jan Beulich
2019-05-14  9:48           ` [Xen-devel] " Jan Beulich
2019-05-14 11:23           ` Julien Grall
2019-05-14 11:23             ` [Xen-devel] " Julien Grall
2019-05-14 11:29             ` Jan Beulich
2019-05-14 11:29               ` [Xen-devel] " Jan Beulich
2019-05-13 12:30     ` Andrii Anisov
2019-05-13 12:30       ` [Xen-devel] " Andrii Anisov
2019-05-14  9:58       ` Julien Grall
2019-05-14  9:58         ` [Xen-devel] " Julien Grall
2019-05-14 10:08         ` Andrii Anisov
2019-05-14 10:08           ` [Xen-devel] " Andrii Anisov
2019-05-14 11:24           ` Julien Grall
2019-05-14 11:24             ` [Xen-devel] " Julien Grall
2019-05-14 11:45             ` Andrii Anisov
2019-05-14 11:45               ` [Xen-devel] " Andrii Anisov
2019-05-14 12:02               ` Jan Beulich
2019-05-14 12:02                 ` [Xen-devel] " Jan Beulich
2019-05-14 13:05                 ` Andrii Anisov
2019-05-14 13:05                   ` [Xen-devel] " Andrii Anisov
2019-05-14 13:49                   ` Julien Grall
2019-05-14 13:49                     ` [Xen-devel] " Julien Grall
2019-05-15  9:04                     ` Andrii Anisov
2019-05-15  9:04                       ` [Xen-devel] " Andrii Anisov
2019-05-15 10:31                       ` Julien Grall
2019-05-15 10:31                         ` [Xen-devel] " Julien Grall
2019-05-14 13:49                   ` Jan Beulich
2019-05-14 13:49                     ` [Xen-devel] " Jan Beulich
2019-05-15  8:44                     ` Andrii Anisov
2019-05-15  8:44                       ` [Xen-devel] " Andrii Anisov
2019-05-15 11:59                       ` Jan Beulich
2019-05-15 11:59                         ` [Xen-devel] " Jan Beulich
2019-05-16 12:09   ` Jan Beulich
2019-05-16 12:09     ` [Xen-devel] " Jan Beulich
2019-05-16 13:30     ` Andrii Anisov
2019-05-16 13:30       ` [Xen-devel] " Andrii Anisov
2019-05-16 13:30     ` Andrii Anisov
2019-05-16 13:30       ` [Xen-devel] " Andrii Anisov
2019-05-16 13:48       ` Julien Grall
2019-05-16 13:48         ` [Xen-devel] " Julien Grall
2019-05-16 14:25         ` Andrii Anisov
2019-05-16 14:25           ` [Xen-devel] " Andrii Anisov
2019-05-16 14:28           ` Julien Grall
2019-05-16 14:28             ` [Xen-devel] " Julien Grall
2019-05-16 14:29             ` Andrii Anisov
2019-05-16 14:29               ` [Xen-devel] " Andrii Anisov
     [not found] ` <fa126315-31af-854e-817a-8640b431c82b@arm.com>
     [not found]   ` <CAC1WxdiMzAq5hRC-mhRQuFDs7z_Hj5w7VAy52ec87SJQOGmp3w@mail.gmail.com>
     [not found]     ` <a28f95a1-d9da-2caf-f4b4-013100176b02@arm.com>
     [not found]       ` <090ce8cc-f329-fe54-4894-b7f12e3cd5a6@gmail.com>
2019-05-08 13:39         ` [PATCH v2 0/2] Introduce runstate area registration with phys address Julien Grall
2019-05-08 13:39           ` [Xen-devel] " Julien Grall
2019-05-08 13:54           ` Andrii Anisov
2019-05-08 13:54             ` [Xen-devel] " Andrii Anisov
2019-05-08 14:31             ` Julien Grall
2019-05-08 14:31               ` [Xen-devel] " Julien Grall
2019-05-08 16:01               ` Andrii Anisov
2019-05-08 16:01                 ` [Xen-devel] " Andrii Anisov
2019-05-13 10:50                 ` Julien Grall
2019-05-13 10:50                   ` [Xen-devel] " Julien Grall
2019-05-13 14:34                   ` Andrii Anisov
2019-05-13 14:34                     ` [Xen-devel] " Andrii Anisov
2019-05-08 13:59 ` Julien Grall
2019-05-13 10:15   ` Andrii Anisov
2019-05-13 11:16     ` Julien Grall
2019-05-13 14:14       ` Andrii Anisov
2019-05-13 14:34         ` Julien Grall
2019-05-13 15:29           ` Andrii Anisov
2019-05-13 15:31             ` Julien Grall
2019-05-13 15:38               ` Andrii Anisov
2019-05-13 15:40                 ` Julien Grall
2019-05-13 15:42                   ` Andrii Anisov
2019-05-13 15:45                     ` Julien Grall
2019-05-13 16:05                       ` Andrii Anisov
