From: "Samudrala, Sridhar" <sridhar.samudrala@intel.com>
To: alexei.starovoitov@gmail.com
Cc: bjorn.topel@gmail.com, bjorn.topel@intel.com,
	bpf@vger.kernel.org, intel-wired-lan@lists.osuosl.org,
	jakub.kicinski@netronome.com, maciej.fijalkowski@intel.com,
	magnus.karlsson@intel.com, netdev@vger.kernel.org,
	sridhar.samudrala@intel.com, toke@redhat.com,
	tom.herbert@intel.com
Subject: Re: [Intel-wired-lan] FW: [PATCH bpf-next 2/4] xsk: allow AF_XDP sockets to receive packets directly from a queue
Date: Thu, 24 Oct 2019 11:12:55 -0700	[thread overview]
Message-ID: <68d6e154-8646-7904-f081-10ec32115496@intel.com> (raw)
In-Reply-To: <CAADnVQKwnMChzeGaC66A99cHn5szB4hPZaGXq8JAhd8sjrdGeA@mail.gmail.com>


> > OK. Here is another data point that shows the perf report with the same test but CPU mitigations
> > turned OFF. Here bpf_prog overhead goes down from almost (10.18 + 4.51)% to (3.23 + 1.44%).
> >
> >    21.40%  ksoftirqd/28     [i40e]                     [k] i40e_clean_rx_irq_zc
> >    14.13%  xdpsock          [i40e]                     [k] i40e_clean_rx_irq_zc
> >     8.33%  ksoftirqd/28     [kernel.vmlinux]           [k] xsk_rcv
> >     6.09%  ksoftirqd/28     [kernel.vmlinux]           [k] xdp_do_redirect
> >     5.19%  xdpsock          xdpsock                    [.] main
> >     3.48%  ksoftirqd/28     [kernel.vmlinux]           [k] bpf_xdp_redirect_map
> >     3.23%  ksoftirqd/28     bpf_prog_3c8251c7e0fef8db  [k] bpf_prog_3c8251c7e0fef8db
> >
> > So a major component of the bpf_prog overhead seems to be due to the CPU vulnerability mitigations.

> I feel that it's an incorrect conclusion because JIT is not doing
> any retpolines (because there are no indirect calls in bpf).
> There should be no difference in bpf_prog runtime with or without mitigations.
> Also you're running root, so no spectre mitigations either.

> This 3% seems like a lot for a function that does few loads that should
> hit d-cache and one direct call.
> Please investigate why you're seeing this 10% cpu cost when mitigations are on.
> perf report/annotate is the best.
> Also please double check that you're using the latest perf.
> Since bpf performance analysis was greatly improved several versions ago.
> I don't think old perf will be showing bogus numbers, but better to
> run the latest.

Here is the perf annotate output for bpf_prog_3c8251c7e0fef8db with and without CPU mitigations turned ON,
using perf built from the bpf-next tree:
   perf version 5.3.g4071324a76c1
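
(For anyone who wants to reproduce this, the per-symbol annotation can be pulled with something like the
commands below; the exact invocation used here may have differed, so treat them as illustrative.)

   perf record -e cycles -a -- sleep 10               # sample cycles system-wide while the xdpsock test runs
   perf annotate --stdio bpf_prog_3c8251c7e0fef8db    # annotate the JITed BPF program by symbol name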

With mitigations ON
-------------------
Samples: 6K of event 'cycles', 4000 Hz, Event count (approx.): 5646512726
bpf_prog_3c8251c7e0fef8db  bpf_prog_3c8251c7e0fef8db [Percent: local period]
  45.05      push   %rbp
   0.02      mov    %rsp,%rbp
   0.03      sub    $0x8,%rsp
  22.09      push   %rbx
   7.66      push   %r13
   1.08      push   %r14
   1.85      push   %r15
   0.63      pushq  $0x0
   1.13      mov    0x28(%rdi),%rsi
   0.47      mov    0x8(%rsi),%esi
   3.47      mov    %esi,-0x4(%rbp)
   0.02      movabs $0xffff8ab414a83e00,%rdi
   0.90      mov    $0x2,%edx
   2.85      callq  *ffffffffd149fc5f
   1.55      and    $0x6,%rax
             test   %rax,%rax
   1.48      jne    72
             mov    %rbp,%rsi
             add    $0xfffffffffffffffc,%rsi
             movabs $0xffff8ab414a83e00,%rdi
             callq  *ffffffffd0e5fd5f
             mov    %rax,%rdi
             mov    $0x2,%eax
             test   %rdi,%rdi
             je     72
             mov    -0x4(%rbp),%esi
             movabs $0xffff8ab414a83e00,%rdi
             xor    %edx,%edx
             callq  *ffffffffd149fc5f
        72:  pop    %rbx
             pop    %r15
   1.90      pop    %r14
   1.93      pop    %r13
             pop    %rbx
   3.63      leaveq
   2.27      retq

With mitigations OFF
--------------------
Samples: 2K of event 'cycles', 4000 Hz, Event count (approx.): 1872116166
bpf_prog_3c8251c7e0fef8db  bpf_prog_3c8251c7e0fef8db [Percent: local period]
   0.15      push   %rbp
             mov    %rsp,%rbp
  13.79      sub    $0x8,%rsp
   0.30      push   %rbx
   0.15      push   %r13
   0.20      push   %r14
  14.50      push   %r15
   0.20      pushq  $0x0
             mov    0x28(%rdi),%rsi
   0.25      mov    0x8(%rsi),%esi
  14.37      mov    %esi,-0x4(%rbp)
   0.25      movabs $0xffff8ea2c673b800,%rdi
             mov    $0x2,%edx
  13.60      callq  *ffffffffe50c2f38
  14.33      and    $0x6,%rax
             test   %rax,%rax
             jne    72
             mov    %rbp,%rsi
             add    $0xfffffffffffffffc,%rsi
             movabs $0xffff8ea2c673b800,%rdi
             callq  *ffffffffe4a83038
             mov    %rax,%rdi
             mov    $0x2,%eax
             test   %rdi,%rdi
             je     72
             mov    -0x4(%rbp),%esi
             movabs $0xffff8ea2c673b800,%rdi
             xor    %edx,%edx
             callq  *ffffffffe50c2f38
        72:  pop    %rbx
             pop    %r15
  13.97      pop    %r14
   0.10      pop    %r13
             pop    %rbx
  13.71      leaveq
   0.15      retq

Do you see any issues with this data? With mitigations ON, the overhead attributed to push %rbp and
push %rbx seems pretty high.
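
For reference, the XDP program itself isn't pasted in this thread. The default AF_XDP redirect program
that libbpf loads when an xsk socket is created looks roughly like the C below, and it lines up with the
redirect/lookup call pattern visible in the disassembly above; take it as a sketch, not necessarily the
exact program measured here.

/* Sketch of the default AF_XDP redirect program loaded by libbpf;
 * the exact program measured above may differ. xsks_map is an
 * XSKMAP with one entry per RX queue, populated from user space.
 */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_XSKMAP);
	__uint(max_entries, 64);
	__uint(key_size, sizeof(int));
	__uint(value_size, sizeof(int));
} xsks_map SEC(".maps");

SEC("xdp")
int xdp_sock_prog(struct xdp_md *ctx)
{
	int ret, index = ctx->rx_queue_index;

	/* Redirect to the AF_XDP socket bound to this queue; XDP_PASS
	 * in the flags argument is the action taken if the redirect
	 * fails, on kernels that support a default action there.
	 */
	ret = bpf_redirect_map(&xsks_map, index, XDP_PASS);
	if (ret > 0)
		return ret;

	/* Fallback: explicit lookup + redirect for kernels that don't
	 * support a default action in the flags argument.
	 */
	if (bpf_map_lookup_elem(&xsks_map, &index))
		return bpf_redirect_map(&xsks_map, index, 0);

	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";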

Thanks
Sridhar
