From: Shunsuke Mie <mie@igel.co.jp>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>,
	Rusty Russell <rusty@rustcorp.com.au>,
	kvm@vger.kernel.org, virtualization@lists.linux-foundation.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 4/9] vringh: unify the APIs for all accessors
Date: Wed, 11 Jan 2023 13:10:15 +0900	[thread overview]
Message-ID: <28b421af-838d-e70a-ec95-2f14f21e3a90@igel.co.jp> (raw)
In-Reply-To: <20221228021354-mutt-send-email-mst@kernel.org>


On 2022/12/28 16:20, Michael S. Tsirkin wrote:
> On Wed, Dec 28, 2022 at 11:24:10AM +0900, Shunsuke Mie wrote:
>> 2022年12月27日(火) 23:37 Michael S. Tsirkin <mst@redhat.com>:
>>> On Tue, Dec 27, 2022 at 07:22:36PM +0900, Shunsuke Mie wrote:
>>>> 2022年12月27日(火) 16:49 Shunsuke Mie <mie@igel.co.jp>:
>>>>> 2022年12月27日(火) 16:04 Michael S. Tsirkin <mst@redhat.com>:
>>>>>> On Tue, Dec 27, 2022 at 11:25:26AM +0900, Shunsuke Mie wrote:
>>>>>>> Each of the vringh memory accessors for user, kern and iotlb has its own
>>>>>>> interface that calls common code, but some of that code is duplicated,
>>>>>>> which hurts extendability.
>>>>>>>
>>>>>>> Introduce a struct vringh_ops and provide a common API for all accessors.
>>>>>>> This makes it easy to extend the vringh code for new memory accessors and
>>>>>>> simplifies the caller code.
>>>>>>>
>>>>>>> Signed-off-by: Shunsuke Mie <mie@igel.co.jp>
>>>>>>> ---
>>>>>>>   drivers/vhost/vringh.c | 667 +++++++++++------------------------------
>>>>>>>   include/linux/vringh.h | 100 +++---
>>>>>>>   2 files changed, 225 insertions(+), 542 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
>>>>>>> index aa3cd27d2384..ebfd3644a1a3 100644
>>>>>>> --- a/drivers/vhost/vringh.c
>>>>>>> +++ b/drivers/vhost/vringh.c
>>>>>>> @@ -35,15 +35,12 @@ static __printf(1,2) __cold void vringh_bad(const char *fmt, ...)
>>>>>>>   }
>>>>>>>
>>>>>>>   /* Returns vring->num if empty, -ve on error. */
>>>>>>> -static inline int __vringh_get_head(const struct vringh *vrh,
>>>>>>> -                                 int (*getu16)(const struct vringh *vrh,
>>>>>>> -                                               u16 *val, const __virtio16 *p),
>>>>>>> -                                 u16 *last_avail_idx)
>>>>>>> +static inline int __vringh_get_head(const struct vringh *vrh, u16 *last_avail_idx)
>>>>>>>   {
>>>>>>>        u16 avail_idx, i, head;
>>>>>>>        int err;
>>>>>>>
>>>>>>> -     err = getu16(vrh, &avail_idx, &vrh->vring.avail->idx);
>>>>>>> +     err = vrh->ops.getu16(vrh, &avail_idx, &vrh->vring.avail->idx);
>>>>>>>        if (err) {
>>>>>>>                vringh_bad("Failed to access avail idx at %p",
>>>>>>>                           &vrh->vring.avail->idx);
>>>>>> I like that this patch removes more lines of code than it adds.
>>>>>>
>>>>>> However one of the design points of vringh abstractions is that they were
>>>>>> carefully written to be very low overhead.
>>>>>> This is why we are passing function pointers to inline functions -
>>>>>> compiler can optimize that out.
>>>>>>
>>>>>> I think that introducing indirect calls through an ops struct here is going
>>>>>> to break these assumptions and hurt performance.
>>>>>> Unless compiler can somehow figure it out and optimize?
>>>>>> I don't see how it's possible with ops pointer in memory
>>>>>> but maybe I'm wrong.
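
To make the concern concrete, here is a minimal sketch of the two calling
patterns being compared (the names and types are illustrative stand-ins,
not the actual vringh definitions):

#include <stdint.h>

struct vr;                                /* stand-in for struct vringh */
typedef int (*getu16_fn)(const struct vr *vr, uint16_t *val,
			 const uint16_t *p);

struct vr {
	struct {
		getu16_fn getu16;         /* accessor behind a pointer in memory */
	} ops;
};

/* Current pattern: the accessor is an argument to a static inline helper,
 * so when the caller passes a known static function the compiler can
 * inline it and no indirect call remains. */
static inline int get_idx_inline(const struct vr *vr, getu16_fn getu16,
				 uint16_t *val, const uint16_t *p)
{
	return getu16(vr, val, p);
}

/* Patched pattern: the accessor is loaded from vr->ops at run time, so
 * the compiler has to emit a real indirect call (a retpoline thunk once
 * retpolines are enabled). */
static inline int get_idx_ops(const struct vr *vr, uint16_t *val,
			      const uint16_t *p)
{
	return vr->ops.getu16(vr, val, p);
}

static int getu16_plain(const struct vr *vr, uint16_t *val,
			const uint16_t *p)
{
	(void)vr;
	*val = *p;                        /* pretend this is the real accessor */
	return 0;
}

int example(const struct vr *vr, uint16_t *val, const uint16_t *p)
{
	int a = get_idx_inline(vr, getu16_plain, val, p); /* usually inlined away */
	int b = get_idx_ops(vr, val, p);                  /* stays an indirect call */
	return a | b;
}

With the first form the compiler can typically resolve and inline the
accessor at compile time; with the second it cannot, which is where the
indirect-call (and, with retpolines, thunk) cost would come from.
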
>>>>> I think your concern is correct. I have to understand the compiler
>>>>> optimization and redesign this approach if it is needed.
>>>>>> Was any effort taken to test effect of these patches on performance?
>>>>> I just tested vringh_test and already saw a small performance reduction.
>>>>> I have to investigate that, as you said.
>>>> I attempted to test with perf. I found that the performance of the patched
>>>> code is almost the same as the upstream one. However, I have to investigate
>>>> why this patch leads to this result, and the profiling should also be run
>>>> on more powerful machines.
>>>>
>>>> environment:
>>>> $ grep 'model name' /proc/cpuinfo
>>>> model name      : Intel(R) Core(TM) i3-7020U CPU @ 2.30GHz
>>>> model name      : Intel(R) Core(TM) i3-7020U CPU @ 2.30GHz
>>>> model name      : Intel(R) Core(TM) i3-7020U CPU @ 2.30GHz
>>>> model name      : Intel(R) Core(TM) i3-7020U CPU @ 2.30GHz
>>>>
>>>> results:
>>>> * for patched code
>>>>   Performance counter stats for 'nice -n -20 ./vringh_test_patched --parallel --eventidx --fast-vringh --indirect --virtio-1' (20 runs):
>>>>
>>>>            3,028.05 msec task-clock                #    0.995 CPUs utilized            ( +-  0.12% )
>>>>              78,150      context-switches          #   25.691 K/sec                    ( +-  0.00% )
>>>>                   5      cpu-migrations            #    1.644 /sec                     ( +-  3.33% )
>>>>                 190      page-faults               #   62.461 /sec                     ( +-  0.41% )
>>>>       6,919,025,222      cycles                    #    2.275 GHz                      ( +-  0.13% )
>>>>       8,990,220,160      instructions              #    1.29  insn per cycle           ( +-  0.04% )
>>>>       1,788,326,786      branches                  #  587.899 M/sec                    ( +-  0.05% )
>>>>           4,557,398      branch-misses             #    0.25% of all branches          ( +-  0.43% )
>>>>
>>>>             3.04359 +- 0.00378 seconds time elapsed  ( +-  0.12% )
>>>>
>>>> * for upstream code
>>>>   Performance counter stats for 'nice -n -20 ./vringh_test_base --parallel --eventidx --fast-vringh --indirect --virtio-1' (10 runs):
>>>>
>>>>            3,058.41 msec task-clock                #    0.999 CPUs utilized            ( +-  0.14% )
>>>>              78,149      context-switches          #   25.545 K/sec                    ( +-  0.00% )
>>>>                   5      cpu-migrations            #    1.634 /sec                     ( +-  2.67% )
>>>>                 194      page-faults               #   63.414 /sec                     ( +-  0.43% )
>>>>       6,988,713,963      cycles                    #    2.284 GHz                      ( +-  0.14% )
>>>>       8,512,533,269      instructions              #    1.22  insn per cycle           ( +-  0.04% )
>>>>       1,638,375,371      branches                  #  535.549 M/sec                    ( +-  0.05% )
>>>>           4,428,866      branch-misses             #    0.27% of all branches          ( +- 22.57% )
>>>>
>>>>             3.06085 +- 0.00420 seconds time elapsed  ( +-  0.14% )
>>>
>>> How you compiled it also matters. ATM we don't enable retpolines
>>> and it did not matter since we didn't have indirect calls,
>>> but we should. Didn't yet investigate how to do that for virtio tools.
>> I think the retpolines certainly affect performance. Thank you for pointing
>> it out. I'd like to start investigating how to apply retpolines to the
>> virtio tools.
>>>>> Thank you for your comments.
>>>>>> Thanks!
>>>>>>
>>>>>>
>>>>> Best,
>>>>> Shunsuke.
> This isn't all that trivial if we want this at runtime.
> But compile time is kind of easy.
> See Documentation/admin-guide/hw-vuln/spectre.rst

Thank you for pointing me to it.

I followed the document and added the following options to CFLAGS in the
tools Makefile:

---

diff --git a/tools/virtio/Makefile b/tools/virtio/Makefile
index 1b25cc7c64bb..7b7139d97d74 100644
--- a/tools/virtio/Makefile
+++ b/tools/virtio/Makefile
@@ -4,7 +4,7 @@ test: virtio_test vringh_test
  virtio_test: virtio_ring.o virtio_test.o
  vringh_test: vringh_test.o vringh.o virtio_ring.o

-CFLAGS += -g -O2 -Werror -Wno-maybe-uninitialized -Wall -I. -I../include/ -I ../../usr/include/ -Wno-pointer-sign -fno-strict-overflow -fno-strict-aliasing -fno-common -MMD -U_FORTIFY_SOURCE -include ../../include/linux/kconfig.h
+CFLAGS += -g -O2 -Werror -Wno-maybe-uninitialized -Wall -I. -I../include/ -I ../../usr/include/ -Wno-pointer-sign -fno-strict-overflow -fno-strict-aliasing -fno-common -MMD -U_FORTIFY_SOURCE -include ../../include/linux/kconfig.h -mfunction-return=thunk -fcf-protection=none -mindirect-branch-register
  CFLAGS += -pthread
  LDFLAGS += -pthread
  vpath %.c ../../drivers/virtio ../../drivers/vhost
---
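
As a quick sanity check that the thunks really got emitted with these flags
(just a sketch, not part of the measured runs; the exact thunk symbol names
depend on the GCC version and on which thunk variant the flags select):

$ objdump -d ./vringh_test_retp_origin | grep -c thunk
$ nm ./vringh_test_retp_origin | grep -i thunk

A non-zero count and matching local symbols indicate the compiler generated
the return/indirect-branch thunks.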

The evaluation results are as follows:

- base with retpoline

$ sudo perf stat --repeat 20 -- nice -n -20 ./vringh_test_retp_origin --parallel --eventidx --fast-vringh
Using CPUS 0 and 3
Guest: notified 0, pinged 98040
Host: notified 98040, pinged 0
...

  Performance counter stats for 'nice -n -20 ./vringh_test_retp_origin --parallel --eventidx --fast-vringh' (20 runs):

           6,228.33 msec task-clock                #    1.004 CPUs utilized            ( +-  0.05% )
            196,110      context-switches          #   31.616 K/sec                    ( +-  0.00% )
                  6      cpu-migrations            #    0.967 /sec                     ( +-  2.39% )
                205      page-faults               #   33.049 /sec                     ( +-  0.46% )
     14,218,527,987      cycles                    #    2.292 GHz                      ( +-  0.05% )
     10,342,897,254      instructions              #    0.73  insn per cycle           ( +-  0.02% )
      2,310,572,989      branches                  #  372.500 M/sec                    ( +-  0.03% )
        178,273,068      branch-misses             #    7.72% of all branches          ( +-  0.04% )

            6.20406 +- 0.00308 seconds time elapsed  ( +-  0.05% )

- patched (unified APIs) with retpoline

$ sudo perf stat --repeat 20 -- nice -n -20 ./vringh_test_retp_patched --parallel --eventidx --fast-vringh
Using CPUS 0 and 3
Guest: notified 0, pinged 98040
Host: notified 98040, pinged 0
...

  Performance counter stats for 'nice -n -20 ./vringh_test_retp_patched --parallel --eventidx --fast-vringh' (20 runs):

           6,103.94 msec task-clock                #    1.001 CPUs utilized            ( +-  0.03% )
            196,125      context-switches          #   32.165 K/sec                    ( +-  0.00% )
                  7      cpu-migrations            #    1.148 /sec                     ( +-  1.56% )
                196      page-faults               #   32.144 /sec                     ( +-  0.41% )
     13,933,055,778      cycles                    #    2.285 GHz                      ( +-  0.03% )
     10,309,004,718      instructions              #    0.74  insn per cycle           ( +-  0.03% )
      2,368,447,519      branches                  #  388.425 M/sec                    ( +-  0.04% )
        211,364,886      branch-misses             #    8.94% of all branches          ( +-  0.05% )

            6.09888 +- 0.00155 seconds time elapsed  ( +-  0.03% )

As a result, the patched code shows more branch-misses, but its elapsed time
is slightly shorter (about 1.7% faster) than the base code. The number of
page-faults also differs a little, and I suspect the page-fault penalty is
what drives the overall result.

I think the data-access pattern is the same in both cases, but the
instruction-access pattern is different. In fact the code size (.text
segment) is a little smaller: 0x6a65 bytes for the base binary versus
0x63f5 bytes for the patched one (about 1.6 KiB less).

$ readelf -a ./vringh_test_retp_origin |grep .text -1
        0000000000000008  0000000000000008  AX       0     0     8
   [14] .text             PROGBITS         0000000000001230 00001230
        0000000000006a65  0000000000000000  AX       0     0     16
--
    02     .interp .note.gnu.build-id .note.ABI-tag .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt
    03     .init .plt .plt.got .text .fini
    04     .rodata .eh_frame_hdr .eh_frame


$ readelf -a ./vringh_test_retp_patched |grep .text -1
        0000000000000008  0000000000000008  AX       0     0     8
   [14] .text             PROGBITS         0000000000001230 00001230
        00000000000063f5  0000000000000000  AX       0     0     16
--
    02     .interp .note.gnu.build-id .note.ABI-tag .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt
    03     .init .plt .plt.got .text .fini
    04     .rodata .eh_frame_hdr .eh_frame
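
As a simpler cross-check of the two sizes (a sketch only; size(1) in its
default Berkeley format lumps .text together with the other read-only and
executable sections, so its "text" column is only roughly comparable to the
readelf numbers above):

$ size ./vringh_test_retp_origin ./vringh_test_retp_patched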

I'll keep investigating this. I would appreciate any comments you may have.

Best

>
