netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH bpf] nfp: bpf: fix latency bug when updating stack index register
@ 2019-08-24  2:00 Jakub Kicinski
  2019-08-26  5:36 ` Song Liu
  2019-08-26 21:34 ` Daniel Borkmann
  0 siblings, 2 replies; 7+ messages in thread
From: Jakub Kicinski @ 2019-08-24  2:00 UTC (permalink / raw)
  To: alexei.starovoitov, daniel
  Cc: bpf, netdev, oss-drivers, Jiong Wang, Jakub Kicinski

From: Jiong Wang <jiong.wang@netronome.com>

NFP is using Local Memory to model stack. LM_addr could be used as base of
a 16 32-bit word region of Local Memory. Then, if the stack offset is
beyond the current region, the local index needs to be updated. The update
needs at least three cycles to take effect, therefore the sequence normally
looks like:

  local_csr_wr[ActLMAddr3, gprB_5]
  nop
  nop
  nop

If the local index switch happens on a narrow loads, then the instruction
preparing value to zero high 32-bit of the destination register could be
counted as one cycle, the sequence then could be something like:

  local_csr_wr[ActLMAddr3, gprB_5]
  nop
  nop
  immed[gprB_5, 0]

However, we have zero extension optimization that zeroing high 32-bit could
be eliminated, therefore above IMMED insn won't be available for which case
the first sequence needs to be generated.

Fixes: 0b4de1ff19bf ("nfp: bpf: eliminate zero extension code-gen")
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/bpf/jit.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/jit.c b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
index 4054b70d7719..5afcb3c4c2ef 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/jit.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
@@ -1163,7 +1163,7 @@ mem_op_stack(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
 	     bool clr_gpr, lmem_step step)
 {
 	s32 off = nfp_prog->stack_frame_depth + meta->insn.off + ptr_off;
-	bool first = true, last;
+	bool first = true, narrow_ld, last;
 	bool needs_inc = false;
 	swreg stack_off_reg;
 	u8 prev_gpr = 255;
@@ -1209,13 +1209,22 @@ mem_op_stack(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
 
 		needs_inc = true;
 	}
+
+	narrow_ld = clr_gpr && size < 8;
+
 	if (lm3) {
+		unsigned int nop_cnt;
+
 		emit_csr_wr(nfp_prog, imm_b(nfp_prog), NFP_CSR_ACT_LM_ADDR3);
-		/* For size < 4 one slot will be filled by zeroing of upper. */
-		wrp_nops(nfp_prog, clr_gpr && size < 8 ? 2 : 3);
+		/* For size < 4 one slot will be filled by zeroing of upper,
+		 * but be careful, that zeroing could be eliminated by zext
+		 * optimization.
+		 */
+		nop_cnt = narrow_ld && meta->flags & FLAG_INSN_DO_ZEXT ? 2 : 3;
+		wrp_nops(nfp_prog, nop_cnt);
 	}
 
-	if (clr_gpr && size < 8)
+	if (narrow_ld)
 		wrp_zext(nfp_prog, meta, gpr);
 
 	while (size) {
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH bpf] nfp: bpf: fix latency bug when updating stack index register
  2019-08-24  2:00 [PATCH bpf] nfp: bpf: fix latency bug when updating stack index register Jakub Kicinski
@ 2019-08-26  5:36 ` Song Liu
  2019-08-26 15:57   ` Jakub Kicinski
  2019-08-26 21:34 ` Daniel Borkmann
  1 sibling, 1 reply; 7+ messages in thread
From: Song Liu @ 2019-08-26  5:36 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Alexei Starovoitov, Daniel Borkmann, bpf, Networking,
	oss-drivers, Jiong Wang

On Fri, Aug 23, 2019 at 7:04 PM Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
>
> From: Jiong Wang <jiong.wang@netronome.com>
>
> NFP is using Local Memory to model stack. LM_addr could be used as base of
> a 16 32-bit word region of Local Memory. Then, if the stack offset is
> beyond the current region, the local index needs to be updated. The update
> needs at least three cycles to take effect, therefore the sequence normally
> looks like:
>
>   local_csr_wr[ActLMAddr3, gprB_5]
>   nop
>   nop
>   nop
>
> If the local index switch happens on a narrow loads, then the instruction
> preparing value to zero high 32-bit of the destination register could be
> counted as one cycle, the sequence then could be something like:
>
>   local_csr_wr[ActLMAddr3, gprB_5]
>   nop
>   nop
>   immed[gprB_5, 0]
>
> However, we have zero extension optimization that zeroing high 32-bit could
> be eliminated, therefore above IMMED insn won't be available for which case
> the first sequence needs to be generated.
>
> Fixes: 0b4de1ff19bf ("nfp: bpf: eliminate zero extension code-gen")
> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
I haven't looked into the code yet. But ^^^ should be

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>

right?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH bpf] nfp: bpf: fix latency bug when updating stack index register
  2019-08-26  5:36 ` Song Liu
@ 2019-08-26 15:57   ` Jakub Kicinski
  2019-08-26 16:18     ` Alexei Starovoitov
  0 siblings, 1 reply; 7+ messages in thread
From: Jakub Kicinski @ 2019-08-26 15:57 UTC (permalink / raw)
  To: Song Liu
  Cc: Alexei Starovoitov, Daniel Borkmann, bpf, Networking,
	OSS Drivers, Jiong Wang

On Sun, Aug 25, 2019 at 10:37 PM Song Liu <liu.song.a23@gmail.com> wrote:
> On Fri, Aug 23, 2019 at 7:04 PM Jakub Kicinski wrote:
> > From: Jiong Wang <jiong.wang@netronome.com>
> >
> > NFP is using Local Memory to model stack. LM_addr could be used as base of
> > a 16 32-bit word region of Local Memory. Then, if the stack offset is
> > beyond the current region, the local index needs to be updated. The update
> > needs at least three cycles to take effect, therefore the sequence normally
> > looks like:
> >
> >   local_csr_wr[ActLMAddr3, gprB_5]
> >   nop
> >   nop
> >   nop
> >
> > If the local index switch happens on a narrow loads, then the instruction
> > preparing value to zero high 32-bit of the destination register could be
> > counted as one cycle, the sequence then could be something like:
> >
> >   local_csr_wr[ActLMAddr3, gprB_5]
> >   nop
> >   nop
> >   immed[gprB_5, 0]
> >
> > However, we have zero extension optimization that zeroing high 32-bit could
> > be eliminated, therefore above IMMED insn won't be available for which case
> > the first sequence needs to be generated.
> >
> > Fixes: 0b4de1ff19bf ("nfp: bpf: eliminate zero extension code-gen")
> > Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
> > Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> I haven't looked into the code yet. But ^^^ should be
>
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>
> right?

I prefer Review on code I review, ack on code I ack, and sign-off on
code I co-author.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH bpf] nfp: bpf: fix latency bug when updating stack index register
  2019-08-26 15:57   ` Jakub Kicinski
@ 2019-08-26 16:18     ` Alexei Starovoitov
  2019-08-26 16:25       ` Daniel Borkmann
  0 siblings, 1 reply; 7+ messages in thread
From: Alexei Starovoitov @ 2019-08-26 16:18 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Song Liu, Daniel Borkmann, bpf, Networking, OSS Drivers, Jiong Wang

On Mon, Aug 26, 2019 at 8:57 AM Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
>
> On Sun, Aug 25, 2019 at 10:37 PM Song Liu <liu.song.a23@gmail.com> wrote:
> > On Fri, Aug 23, 2019 at 7:04 PM Jakub Kicinski wrote:
> > > From: Jiong Wang <jiong.wang@netronome.com>
> > >
> > > NFP is using Local Memory to model stack. LM_addr could be used as base of
> > > a 16 32-bit word region of Local Memory. Then, if the stack offset is
> > > beyond the current region, the local index needs to be updated. The update
> > > needs at least three cycles to take effect, therefore the sequence normally
> > > looks like:
> > >
> > >   local_csr_wr[ActLMAddr3, gprB_5]
> > >   nop
> > >   nop
> > >   nop
> > >
> > > If the local index switch happens on a narrow loads, then the instruction
> > > preparing value to zero high 32-bit of the destination register could be
> > > counted as one cycle, the sequence then could be something like:
> > >
> > >   local_csr_wr[ActLMAddr3, gprB_5]
> > >   nop
> > >   nop
> > >   immed[gprB_5, 0]
> > >
> > > However, we have zero extension optimization that zeroing high 32-bit could
> > > be eliminated, therefore above IMMED insn won't be available for which case
> > > the first sequence needs to be generated.
> > >
> > > Fixes: 0b4de1ff19bf ("nfp: bpf: eliminate zero extension code-gen")
> > > Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
> > > Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> > I haven't looked into the code yet. But ^^^ should be
> >
> > Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> >
> > right?
>
> I prefer Review on code I review, ack on code I ack, and sign-off on
> code I co-author.

I believe if you're sending somebody else patch you have to add your SOB
in addition to their 'Author:' and their SOB fields.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH bpf] nfp: bpf: fix latency bug when updating stack index register
  2019-08-26 16:18     ` Alexei Starovoitov
@ 2019-08-26 16:25       ` Daniel Borkmann
  2019-08-26 16:41         ` Jakub Kicinski
  0 siblings, 1 reply; 7+ messages in thread
From: Daniel Borkmann @ 2019-08-26 16:25 UTC (permalink / raw)
  To: Alexei Starovoitov, Jakub Kicinski
  Cc: Song Liu, bpf, Networking, OSS Drivers, Jiong Wang

On 8/26/19 6:18 PM, Alexei Starovoitov wrote:
> On Mon, Aug 26, 2019 at 8:57 AM Jakub Kicinski
> <jakub.kicinski@netronome.com> wrote:
>> On Sun, Aug 25, 2019 at 10:37 PM Song Liu <liu.song.a23@gmail.com> wrote:
>>> On Fri, Aug 23, 2019 at 7:04 PM Jakub Kicinski wrote:
>>>> From: Jiong Wang <jiong.wang@netronome.com>
>>>>
>>>> NFP is using Local Memory to model stack. LM_addr could be used as base of
>>>> a 16 32-bit word region of Local Memory. Then, if the stack offset is
>>>> beyond the current region, the local index needs to be updated. The update
>>>> needs at least three cycles to take effect, therefore the sequence normally
>>>> looks like:
>>>>
>>>>    local_csr_wr[ActLMAddr3, gprB_5]
>>>>    nop
>>>>    nop
>>>>    nop
>>>>
>>>> If the local index switch happens on a narrow loads, then the instruction
>>>> preparing value to zero high 32-bit of the destination register could be
>>>> counted as one cycle, the sequence then could be something like:
>>>>
>>>>    local_csr_wr[ActLMAddr3, gprB_5]
>>>>    nop
>>>>    nop
>>>>    immed[gprB_5, 0]
>>>>
>>>> However, we have zero extension optimization that zeroing high 32-bit could
>>>> be eliminated, therefore above IMMED insn won't be available for which case
>>>> the first sequence needs to be generated.
>>>>
>>>> Fixes: 0b4de1ff19bf ("nfp: bpf: eliminate zero extension code-gen")
>>>> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
>>>> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>>> I haven't looked into the code yet. But ^^^ should be
>>>
>>> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>>>
>>> right?
>>
>> I prefer Review on code I review, ack on code I ack, and sign-off on
>> code I co-author.
> 
> I believe if you're sending somebody else patch you have to add your SOB
> in addition to their 'Author:' and their SOB fields.

+1, for co-authoring there's a 'Co-authored-by:' tag which seems to be frequently
used these days.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH bpf] nfp: bpf: fix latency bug when updating stack index register
  2019-08-26 16:25       ` Daniel Borkmann
@ 2019-08-26 16:41         ` Jakub Kicinski
  0 siblings, 0 replies; 7+ messages in thread
From: Jakub Kicinski @ 2019-08-26 16:41 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, Song Liu, bpf, Networking, OSS Drivers, Jiong Wang

On Mon, 26 Aug 2019 18:25:10 +0200, Daniel Borkmann wrote:
> On 8/26/19 6:18 PM, Alexei Starovoitov wrote:
> > On Mon, Aug 26, 2019 at 8:57 AM Jakub Kicinski
> > <jakub.kicinski@netronome.com> wrote:  
> >> On Sun, Aug 25, 2019 at 10:37 PM Song Liu <liu.song.a23@gmail.com> wrote:  
> >>> On Fri, Aug 23, 2019 at 7:04 PM Jakub Kicinski wrote:  
> >>>> From: Jiong Wang <jiong.wang@netronome.com>
> >>>>
> >>>> NFP is using Local Memory to model stack. LM_addr could be used as base of
> >>>> a 16 32-bit word region of Local Memory. Then, if the stack offset is
> >>>> beyond the current region, the local index needs to be updated. The update
> >>>> needs at least three cycles to take effect, therefore the sequence normally
> >>>> looks like:
> >>>>
> >>>>    local_csr_wr[ActLMAddr3, gprB_5]
> >>>>    nop
> >>>>    nop
> >>>>    nop
> >>>>
> >>>> If the local index switch happens on a narrow loads, then the instruction
> >>>> preparing value to zero high 32-bit of the destination register could be
> >>>> counted as one cycle, the sequence then could be something like:
> >>>>
> >>>>    local_csr_wr[ActLMAddr3, gprB_5]
> >>>>    nop
> >>>>    nop
> >>>>    immed[gprB_5, 0]
> >>>>
> >>>> However, we have zero extension optimization that zeroing high 32-bit could
> >>>> be eliminated, therefore above IMMED insn won't be available for which case
> >>>> the first sequence needs to be generated.
> >>>>
> >>>> Fixes: 0b4de1ff19bf ("nfp: bpf: eliminate zero extension code-gen")
> >>>> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
> >>>> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>  
> >>> I haven't looked into the code yet. But ^^^ should be
> >>>
> >>> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> >>>
> >>> right?  
> >>
> >> I prefer Review on code I review, ack on code I ack, and sign-off on
> >> code I co-author.  
> > 
> > I believe if you're sending somebody else patch you have to add your SOB
> > in addition to their 'Author:' and their SOB fields.  
> 
> +1, for co-authoring there's a 'Co-authored-by:' tag which seems to be frequently
> used these days.

Ack, there is a difference between co-author of code, and co-author as
step by step guidance. I've been doing this for 6 years now, and nobody
ever complained :)

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>

Is that enough or should I repost?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH bpf] nfp: bpf: fix latency bug when updating stack index register
  2019-08-24  2:00 [PATCH bpf] nfp: bpf: fix latency bug when updating stack index register Jakub Kicinski
  2019-08-26  5:36 ` Song Liu
@ 2019-08-26 21:34 ` Daniel Borkmann
  1 sibling, 0 replies; 7+ messages in thread
From: Daniel Borkmann @ 2019-08-26 21:34 UTC (permalink / raw)
  To: Jakub Kicinski, alexei.starovoitov; +Cc: bpf, netdev, oss-drivers, Jiong Wang

On 8/24/19 4:00 AM, Jakub Kicinski wrote:
> From: Jiong Wang <jiong.wang@netronome.com>
> 
> NFP is using Local Memory to model stack. LM_addr could be used as base of
> a 16 32-bit word region of Local Memory. Then, if the stack offset is
> beyond the current region, the local index needs to be updated. The update
> needs at least three cycles to take effect, therefore the sequence normally
> looks like:
> 
>    local_csr_wr[ActLMAddr3, gprB_5]
>    nop
>    nop
>    nop
> 
> If the local index switch happens on a narrow loads, then the instruction
> preparing value to zero high 32-bit of the destination register could be
> counted as one cycle, the sequence then could be something like:
> 
>    local_csr_wr[ActLMAddr3, gprB_5]
>    nop
>    nop
>    immed[gprB_5, 0]
> 
> However, we have zero extension optimization that zeroing high 32-bit could
> be eliminated, therefore above IMMED insn won't be available for which case
> the first sequence needs to be generated.
> 
> Fixes: 0b4de1ff19bf ("nfp: bpf: eliminate zero extension code-gen")
> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>

Applied, thanks!

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-08-26 21:35 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-24  2:00 [PATCH bpf] nfp: bpf: fix latency bug when updating stack index register Jakub Kicinski
2019-08-26  5:36 ` Song Liu
2019-08-26 15:57   ` Jakub Kicinski
2019-08-26 16:18     ` Alexei Starovoitov
2019-08-26 16:25       ` Daniel Borkmann
2019-08-26 16:41         ` Jakub Kicinski
2019-08-26 21:34 ` Daniel Borkmann

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).