* [PATCH bpf-next] bpf: doc: update answer for 32-bit subregister question
@ 2019-05-30  7:44 Jiong Wang
  2019-05-30 18:16 ` Song Liu
  0 siblings, 1 reply; 3+ messages in thread
From: Jiong Wang @ 2019-05-30  7:44 UTC
  To: alexei.starovoitov, daniel; +Cc: bpf, netdev, oss-drivers, Jiong Wang

There has been quite a bit of progress on the two steps mentioned in the
answer to the following question:

  Q: BPF 32-bit subregister requirements

This patch updates the answer to reflect what has been done.

v1:
 - Integrated rephrase from Quentin and Jakub.

Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
 Documentation/bpf/bpf_design_QA.rst | 30 +++++++++++++++++++++++++-----
 1 file changed, 25 insertions(+), 5 deletions(-)

diff --git a/Documentation/bpf/bpf_design_QA.rst b/Documentation/bpf/bpf_design_QA.rst
index cb402c5..5092a2a 100644
--- a/Documentation/bpf/bpf_design_QA.rst
+++ b/Documentation/bpf/bpf_design_QA.rst
@@ -172,11 +172,31 @@ registers which makes BPF inefficient virtual machine for 32-bit
 CPU architectures and 32-bit HW accelerators. Can true 32-bit registers
 be added to BPF in the future?
 
-A: NO. The first thing to improve performance on 32-bit archs is to teach
-LLVM to generate code that uses 32-bit subregisters. Then second step
-is to teach verifier to mark operations where zero-ing upper bits
-is unnecessary. Then JITs can take advantage of those markings and
-drastically reduce size of generated code and improve performance.
+A: NO
+
+But some optimizations on zero-ing the upper 32 bits for BPF registers are
+available, and can be leveraged to improve the performance of JIT compilers
+for 32-bit architectures.
+
+Starting with version 7, LLVM is able to generate instructions that operate
+on 32-bit subregisters, provided the option -mattr=+alu32 is passed for
+compiling a program. Furthermore, the verifier can now mark the
+instructions for which zero-ing the upper bits of the destination register
+is required, and insert an explicit zero-extension (zext) instruction
+(a mov32 variant). This means that for architectures without zext hardware
+support, the JIT back-ends do not need to clear the upper bits for
+subregisters written by alu32 instructions or narrow loads. Instead, the
+back-ends simply need to support code generation for that mov32 variant,
+and to override bpf_jit_needs_zext() to make it return "true" (in order to
+enable zext insertion in the verifier).
+
+Note that it is possible for a JIT back-end to have partial hardware
+support for zext. In that case, if verifier zext insertion is enabled,
+it could lead to the insertion of unnecessary zext instructions. Such
+instructions could be removed by creating a simple peephole inside the JIT
+back-end: if one instruction has hardware support for zext and if the next
+instruction is an explicit zext, then the latter can be skipped when doing
+the code generation.
 
 Q: Does BPF have a stable ABI?
 ------------------------------
-- 
2.7.4
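
The peephole described in the last paragraph of the new answer could look
roughly like the sketch below. It is a user-space illustration only: the
toy_* types and functions are hypothetical and are not kernel APIs, and the
choice of which operations zero-extend in hardware is an assumption made for
the example. The sketch also ignores control flow and only decides, per
instruction, whether code would be emitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified instruction model (not struct bpf_insn). */
enum toy_op {
	TOY_ALU32_ADD,	/* 32-bit ALU op */
	TOY_LOAD8,	/* narrow (1-byte) load */
	TOY_ZEXT,	/* explicit zero-extension inserted by the verifier */
	TOY_ALU64_ADD,	/* 64-bit ALU op */
};

struct toy_insn {
	enum toy_op op;
	int dst_reg;
};

/*
 * Assumption for this example: the imaginary target clears the upper
 * 32 bits for free on 32-bit ALU ops, but not on narrow loads. This is
 * the "partial hardware support for zext" case.
 */
static bool toy_hw_zero_extends(enum toy_op op)
{
	return op == TOY_ALU32_ADD;
}

/*
 * Fill emit[i] with whether code should be generated for prog[i] and
 * return the number of instructions that are emitted. An explicit zext
 * is skipped when the instruction right before it wrote the same
 * register and already zero-extends in hardware.
 */
static size_t toy_jit_plan(const struct toy_insn *prog, size_t len,
			   bool *emit)
{
	size_t emitted = 0;

	assert(prog != NULL && emit != NULL);

	for (size_t i = 0; i < len; i++) {
		bool redundant = prog[i].op == TOY_ZEXT && i > 0 &&
				 toy_hw_zero_extends(prog[i - 1].op) &&
				 prog[i - 1].dst_reg == prog[i].dst_reg;

		emit[i] = !redundant;
		if (emit[i])
			emitted++;
	}
	return emitted;
}
```

With a program such as "alu32 add r1; zext r1; load8 r2; zext r2", the zext
after the ALU op would be dropped under the assumption above, while the one
after the narrow load would still be emitted.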



* Re: [PATCH bpf-next] bpf: doc: update answer for 32-bit subregister question
  2019-05-30  7:44 [PATCH bpf-next] bpf: doc: update answer for 32-bit subregister question Jiong Wang
@ 2019-05-30 18:16 ` Song Liu
  2019-05-30 20:11   ` Jiong Wang
  0 siblings, 1 reply; 3+ messages in thread
From: Song Liu @ 2019-05-30 18:16 UTC
  To: Jiong Wang
  Cc: Alexei Starovoitov, Daniel Borkmann, bpf, Networking, oss-drivers

On Thu, May 30, 2019 at 12:46 AM Jiong Wang <jiong.wang@netronome.com> wrote:
>
> There has been quite a bit of progress on the two steps mentioned in the
> answer to the following question:
>
>   Q: BPF 32-bit subregister requirements
>
> This patch updates the answer to reflect what has been done.
>
> v1:
>  - Integrated rephrase from Quentin and Jakub.
>
> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
> ---
>  Documentation/bpf/bpf_design_QA.rst | 30 +++++++++++++++++++++++++-----
>  1 file changed, 25 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/bpf/bpf_design_QA.rst b/Documentation/bpf/bpf_design_QA.rst
> index cb402c5..5092a2a 100644
> --- a/Documentation/bpf/bpf_design_QA.rst
> +++ b/Documentation/bpf/bpf_design_QA.rst
> @@ -172,11 +172,31 @@ registers which makes BPF inefficient virtual machine for 32-bit
>  CPU architectures and 32-bit HW accelerators. Can true 32-bit registers
>  be added to BPF in the future?
>
> -A: NO. The first thing to improve performance on 32-bit archs is to teach
> -LLVM to generate code that uses 32-bit subregisters. Then second step
> -is to teach verifier to mark operations where zero-ing upper bits
> -is unnecessary. Then JITs can take advantage of those markings and
> -drastically reduce size of generated code and improve performance.
> +A: NO

Add period "."?

> +
> +But some optimizations on zero-ing the upper 32 bits for BPF registers are
> +available, and can be leveraged to improve the performance of JIT compilers
> +for 32-bit architectures.

I guess it should be "improve the performance of JITed BPF programs for 32-bit
architectures"?

Thanks,
Song



* Re: [PATCH bpf-next] bpf: doc: update answer for 32-bit subregister question
  2019-05-30 18:16 ` Song Liu
@ 2019-05-30 20:11   ` Jiong Wang
  0 siblings, 0 replies; 3+ messages in thread
From: Jiong Wang @ 2019-05-30 20:11 UTC
  To: Song Liu
  Cc: Jiong Wang, Alexei Starovoitov, Daniel Borkmann, bpf, Networking,
	oss-drivers


Song Liu writes:

> On Thu, May 30, 2019 at 12:46 AM Jiong Wang <jiong.wang@netronome.com> wrote:
>>
>> There has been quite a bit of progress on the two steps mentioned in the
>> answer to the following question:
>>
>>   Q: BPF 32-bit subregister requirements
>>
>> This patch updates the answer to reflect what has been done.
>>
>> v1:
>>  - Integrated rephrase from Quentin and Jakub.
>>
>> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
>> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
>> ---
>>  Documentation/bpf/bpf_design_QA.rst | 30 +++++++++++++++++++++++++-----
>>  1 file changed, 25 insertions(+), 5 deletions(-)
>>
>> diff --git a/Documentation/bpf/bpf_design_QA.rst b/Documentation/bpf/bpf_design_QA.rst
>> index cb402c5..5092a2a 100644
>> --- a/Documentation/bpf/bpf_design_QA.rst
>> +++ b/Documentation/bpf/bpf_design_QA.rst
>> @@ -172,11 +172,31 @@ registers which makes BPF inefficient virtual machine for 32-bit
>>  CPU architectures and 32-bit HW accelerators. Can true 32-bit registers
>>  be added to BPF in the future?
>>
>> -A: NO. The first thing to improve performance on 32-bit archs is to teach
>> -LLVM to generate code that uses 32-bit subregisters. Then second step
>> -is to teach verifier to mark operations where zero-ing upper bits
>> -is unnecessary. Then JITs can take advantage of those markings and
>> -drastically reduce size of generated code and improve performance.
>> +A: NO
>
> Add period "."?

Ack

>
>> +
>> +But some optimizations on zero-ing the upper 32 bits for BPF registers are
>> +available, and can be leveraged to improve the performance of JIT compilers
>> +for 32-bit architectures.
>
> I guess it should be "improve the performance of JITed BPF programs for 32-bit
> architectures"?

Ack, that is more accurate.

Will respin.

Thanks.

Regards,
Jiong



