All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH man v4] bpf.2: various updates/follow-ups to address some fixmes
@ 2015-07-29 22:25 Daniel Borkmann
       [not found] ` <e56dadb710e4beec76c02823933c232e19af025b.1438208478.git.daniel-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Daniel Borkmann @ 2015-07-29 22:25 UTC (permalink / raw)
  To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w
  Cc: ast-uqk4Ao+rVK5Wk0Htik3J/w, linux-man-u79uwXL29TY76Z2rM5mHXA,
	Daniel Borkmann

A couple of follow-ups to the bpf(2) man-page, besides others:

 * Description of map data types
 * Explanation on eBPF tail calls and program arrays
 * Paragraph on tc holding ref of the eBPF program in the kernel
 * Updated ASCII image with tc ingress and egress invocations
 * __sync_fetch_and_add() and example usage mentioned on arrays
 * minor reword on the licensing and other minor fixups

Signed-off-by: Daniel Borkmann <daniel-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>
---
 v3->v4:
  - s/bpf (2)/bpf ()/, thanks Mike!
 v2->v3:
  - Feedback from Michael, thanks!
 v1->v2:
  - Reworded __sync_fetch_and_add sentence, hope that's better.

 man2/bpf.2 | 147 +++++++++++++++++++++++++++++++++++++------------------------
 1 file changed, 89 insertions(+), 58 deletions(-)

diff --git a/man2/bpf.2 b/man2/bpf.2
index 2b96ebc..85b1631 100644
--- a/man2/bpf.2
+++ b/man2/bpf.2
@@ -51,42 +51,45 @@ opcode extension provided by eBPF)
 and access shared data structures such as eBPF maps.
 .\"
 .SS Extended BPF Design/Architecture
-.\"
-.\" FIXME In the following line, what does "different data types" mean?
-.\"       Are the values in a map not just blobs?
-.\" Daniel Borkmann commented:
-.\"     Sort of, currently, these blobs can have different sizes of keys
-.\"     and values (you can even have structs as keys). For the map itself
-.\"     they are treated as blob internally. However, recently, bpf tail call
-.\"     got added where you can lookup another program from an array map and
-.\"     call into it. Here, that particular type of map can only have entries
-.\"     of type of eBPF program fd. I think, if needed, adding a paragraph to
-.\"     the tail call could be done as follow-up after we have an initial man
-.\"     page in the tree included.
-.\"
 eBPF maps are a generic data structure for storage of different data types.
+Data types are generally treated as binary blobs, so a user just specifies
+the size of the key and the size of the value at map creation time.
+In other words, a key/value for a given map can have an arbitrary structure.
+
 A user process can create multiple maps (with key/value-pairs being
 opaque bytes of data) and access them via file descriptors.
 Different eBPF programs can access the same maps in parallel.
 It's up to the user process and eBPF program to decide what they store
 inside maps.
+
+There's one special map type which is a program array.
+This map stores file descriptors to other eBPF programs.
+Thus, when a lookup in that map is performed, the program flow is
+redirected in-place to the beginning of the new eBPF program without
+returning back.
+The level of nesting has a fixed limit of 32, so that infinite loops cannot
+be crafted.
+During runtime, the program file descriptors stored in that map can be modified,
+so program functionality can be altered based on specific requirements.
+All programs stored in such a map have been loaded into the kernel via
+.BR bpf ()
+as well.
+In case a lookup has failed, the current program continues its execution.
+See BPF_MAP_TYPE_PROG_ARRAY below for further details.
 .P
-eBPF programs are loaded by the user
-process and automatically unloaded when the process exits.
-.\"
-.\" FIXME Daniel Borkmann commented about the preceding sentence:
-.\"
-.\"     Generally that's true. Btw, in 4.1 kernel, tc(8) also got support for
-.\"     eBPF classifier and actions, and here it's slightly different: in tc,
-.\"     we load the programs, maps etc, and push down the eBPF program fd in
-.\"     order to let the kernel hold reference on the program itself.
-.\"
-.\"     Thus, there, the program fd that the application owns is gone when the
-.\"     application terminates, but the eBPF program itself still lives on
-.\"     inside the kernel.
-.\"
-.\" Probably something should be said about this in this man page.
-.\"
+Generally, eBPF programs are loaded by the user process and automatically
+unloaded when the process exits. In some cases, for example,
+.BR tc-bpf (8),
+the program will continue to stay alive inside the kernel even after the
+the process that loaded the program exits.
+In that case, the tc subsystem holds a reference to the program after the
+file descriptor has been dropped by the user.
+Thus, whether a specific program continues to live inside the kernel
+depends on how it is further attached to a given kernel subsystem
+after it was loaded via
+.BR bpf ()
+\.
+
 Each program is a set of instructions that is safe to run until
 its completion.
 An in-kernel verifier statically determines that the eBPF program
@@ -105,20 +108,21 @@ A new event triggers execution of the eBPF program, which
 may store information about the event in eBPF maps.
 Beyond storing data, eBPF programs may call a fixed set of
 in-kernel helper functions.
+
 The same eBPF program can be attached to multiple events and different
 eBPF programs can access the same map:
 
 .in +4n
 .nf
-tracing     tracing     tracing     packet      packet
-event A     event B     event C     on eth0     on eth1
- |             |          |           |           |
- |             |          |           |           |
- --> tracing <--      tracing       socket    tc ingress
-      prog_1           prog_2       prog_3    classifier
-      |  |               |            |         prog_4
-   |---  -----|  |-------|           map_3
- map_1       map_2
+tracing     tracing     tracing     packet      packet     packet
+event A     event B     event C     on eth0     on eth1    on eth2
+ |             |          |           |           |          ^
+ |             |          |           |           v          |
+ --> tracing <--      tracing       socket    tc ingress   tc egress
+      prog_1           prog_2       prog_3    classifier    action
+      |  |               |            |         prog_4      prog_5
+   |---  -----|  |-------|           map_3        |           |
+ map_1       map_2                                --| map_4 |--
 .fi
 .in
 .\"
@@ -612,10 +616,15 @@ since elements cannot be deleted.
 replaces elements in a
 .B nonatomic
 fashion;
-.\" FIXME
-.\" Daniel Borkmann: when you have a value_size of sizeof(long), you can
-.\" however use __sync_fetch_and_add() atomic builtin from the LLVM backend
-for atomic updates, a hash-table map should be used instead.
+for atomic updates, a hash-table map should be used instead. There is
+however one special case that can also be used with arrays: the atomic
+built-in
+.BR __sync_fetch_and_add()
+can be used on 32 and 64 bit atomic counters. For example, it can be
+applied on the whole value itself if it represents a single counter,
+or in case of a structure containing multiple counters, it could be
+used on individual ones. This is quite often useful for aggregation
+and accounting of events.
 .RE
 .IP
 Among the uses for array maps are the following:
@@ -626,11 +635,46 @@ and where the value is a collection of 'global' variables which
 eBPF programs can use to keep state between events.
 .IP *
 Aggregation of tracing events into a fixed set of buckets.
+.IP *
+Accounting of networking events, for example, number of packets and packet
+sizes.
 .RE
 .TP
 .BR BPF_MAP_TYPE_PROG_ARRAY " (since Linux 4.2)"
-.\" FIXME we need documentation of BPF_MAP_TYPE_PROG_ARRAY
-[To be completed]
+A program array map is a special kind of array map, whose map values only
+contain valid file descriptors to other eBPF programs. Thus both the
+key_size and value_size must be exactly four bytes.
+This map is used in conjunction with the
+.BR bpf_tail_call()
+helper.
+
+This means that an eBPF program with a program array map attached to it
+can call from kernel side into
+
+.in +4n
+.nf
+void bpf_tail_call(void *context, void *prog_map, unsigned int index);
+.fi
+.in
+
+and therefore replace its own program flow with the one from the program
+at the given program array slot if present. This can be regarded as kind
+of a jump table to a different eBPF program. The invoked program will then
+reuse the same stack. When a jump into the new program has been performed,
+it won't return to the old one anymore.
+
+If no eBPF program is found at the given index of the program array,
+execution continues with the current eBPF program.
+This can be used as a fall-through for default cases.
+
+A program array map is useful, for example, in tracing or networking, to
+handle individual system calls resp. protocols in its own sub-programs and
+use their identifiers as an individual map index. This approach may result
+in performance benefits, and also makes it possible to overcome the maximum
+instruction limit of a single program.
+In dynamic environments, a user space daemon may atomically replace individual
+sub-programs at run-time with newer versions to alter overall program
+behavior, for instance, when global policies might change.
 .\"
 .SS eBPF programs
 The
@@ -699,20 +743,7 @@ is a license string, which must be GPL compatible to call helper functions
 marked
 .IR gpl_only .
 (The licensing rules are the same as for kernel modules,
-so that dual licenses, such as "Dual BSD/GPL", may be used.)
-.\" Daniel Borkmann commented:
-.\"     Not strictly. So here, the same rules apply as with kernel modules.
-.\"     I.e. what the kernel checks for are the following license strings:
-.\"
-.\"     static inline int license_is_gpl_compatible(const char *license)
-.\"     {
-.\"     	return (strcmp(license, "GPL") == 0
-.\"     		|| strcmp(license, "GPL v2") == 0
-.\"     		|| strcmp(license, "GPL and additional rights") == 0
-.\"     		|| strcmp(license, "Dual BSD/GPL") == 0
-.\"     		|| strcmp(license, "Dual MIT/GPL") == 0
-.\"     		|| strcmp(license, "Dual MPL/GPL") == 0);
-.\"     }
+so that also dual licenses, such as "Dual BSD/GPL", may be used.)
 .IP *
 .I log_buf
 is a pointer to a caller-allocated buffer in which the in-kernel
-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH man v4] bpf.2: various updates/follow-ups to address some fixmes
       [not found] ` <e56dadb710e4beec76c02823933c232e19af025b.1438208478.git.daniel-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>
@ 2015-08-05 20:09   ` Michael Kerrisk (man-pages)
       [not found]     ` <55C26D97.6090408-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-08-05 20:09 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w, ast-uqk4Ao+rVK5Wk0Htik3J/w,
	linux-man-u79uwXL29TY76Z2rM5mHXA

Hi Daniel,

Thanks very much for this! I've applied this patch, but have a few
comments below.

On 07/30/2015 12:25 AM, Daniel Borkmann wrote:
> A couple of follow-ups to the bpf(2) man-page, besides others:
> 
>  * Description of map data types
>  * Explanation on eBPF tail calls and program arrays
>  * Paragraph on tc holding ref of the eBPF program in the kernel
>  * Updated ASCII image with tc ingress and egress invocations
>  * __sync_fetch_and_add() and example usage mentioned on arrays
>  * minor reword on the licensing and other minor fixups
> 
> Signed-off-by: Daniel Borkmann <daniel-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>
> ---
>  v3->v4:
>   - s/bpf (2)/bpf ()/, thanks Mike!
>  v2->v3:
>   - Feedback from Michael, thanks!
>  v1->v2:
>   - Reworded __sync_fetch_and_add sentence, hope that's better.
> 
>  man2/bpf.2 | 147 +++++++++++++++++++++++++++++++++++++------------------------
>  1 file changed, 89 insertions(+), 58 deletions(-)
> 
> diff --git a/man2/bpf.2 b/man2/bpf.2
> index 2b96ebc..85b1631 100644
> --- a/man2/bpf.2
> +++ b/man2/bpf.2
> @@ -51,42 +51,45 @@ opcode extension provided by eBPF)
>  and access shared data structures such as eBPF maps.
>  .\"
>  .SS Extended BPF Design/Architecture
> -.\"
> -.\" FIXME In the following line, what does "different data types" mean?
> -.\"       Are the values in a map not just blobs?
> -.\" Daniel Borkmann commented:
> -.\"     Sort of, currently, these blobs can have different sizes of keys
> -.\"     and values (you can even have structs as keys). For the map itself
> -.\"     they are treated as blob internally. However, recently, bpf tail call
> -.\"     got added where you can lookup another program from an array map and
> -.\"     call into it. Here, that particular type of map can only have entries
> -.\"     of type of eBPF program fd. I think, if needed, adding a paragraph to
> -.\"     the tail call could be done as follow-up after we have an initial man
> -.\"     page in the tree included.
> -.\"
>  eBPF maps are a generic data structure for storage of different data types.
> +Data types are generally treated as binary blobs, so a user just specifies
> +the size of the key and the size of the value at map creation time.
> +In other words, a key/value for a given map can have an arbitrary structure.
> +
>  A user process can create multiple maps (with key/value-pairs being
>  opaque bytes of data) and access them via file descriptors.
>  Different eBPF programs can access the same maps in parallel.
>  It's up to the user process and eBPF program to decide what they store
>  inside maps.
> +
> +There's one special map type which is a program array.
> +This map stores file descriptors to other eBPF programs.
> +Thus, when a lookup in that map is performed, the program flow is
> +redirected in-place to the beginning of the new eBPF program without

I changed "of the new" to "of another". Okay.

> +returning back.
> +The level of nesting has a fixed limit of 32, so that infinite loops cannot

Where is that limit of 32 defined, by the way?

> +be crafted.
> +During runtime, the program file descriptors stored in that map can be modified,
> +so program functionality can be altered based on specific requirements.
> +All programs stored in such a map have been loaded into the kernel via
> +.BR bpf ()
> +as well.
> +In case a lookup has failed, the current program continues its execution.

What are the possible causes of failure? It may be worth saying something about
this in the man page.

> +See BPF_MAP_TYPE_PROG_ARRAY below for further details.
>  .P
> -eBPF programs are loaded by the user
> -process and automatically unloaded when the process exits.
> -.\"
> -.\" FIXME Daniel Borkmann commented about the preceding sentence:
> -.\"
> -.\"     Generally that's true. Btw, in 4.1 kernel, tc(8) also got support for
> -.\"     eBPF classifier and actions, and here it's slightly different: in tc,
> -.\"     we load the programs, maps etc, and push down the eBPF program fd in
> -.\"     order to let the kernel hold reference on the program itself.
> -.\"
> -.\"     Thus, there, the program fd that the application owns is gone when the
> -.\"     application terminates, but the eBPF program itself still lives on
> -.\"     inside the kernel.
> -.\"
> -.\" Probably something should be said about this in this man page.
> -.\"
> +Generally, eBPF programs are loaded by the user process and automatically
> +unloaded when the process exits. In some cases, for example,

(For future patches, please start new sentences on new lines.)

> +.BR tc-bpf (8),
> +the program will continue to stay alive inside the kernel even after the
> +the process that loaded the program exits.
> +In that case, the tc subsystem holds a reference to the program after the
> +file descriptor has been dropped by the user.
> +Thus, whether a specific program continues to live inside the kernel
> +depends on how it is further attached to a given kernel subsystem
> +after it was loaded via
> +.BR bpf ()
> +\.
> +
>  Each program is a set of instructions that is safe to run until
>  its completion.
>  An in-kernel verifier statically determines that the eBPF program
> @@ -105,20 +108,21 @@ A new event triggers execution of the eBPF program, which
>  may store information about the event in eBPF maps.
>  Beyond storing data, eBPF programs may call a fixed set of
>  in-kernel helper functions.
> +
>  The same eBPF program can be attached to multiple events and different
>  eBPF programs can access the same map:
>  
>  .in +4n
>  .nf
> -tracing     tracing     tracing     packet      packet
> -event A     event B     event C     on eth0     on eth1
> - |             |          |           |           |
> - |             |          |           |           |
> - --> tracing <--      tracing       socket    tc ingress
> -      prog_1           prog_2       prog_3    classifier
> -      |  |               |            |         prog_4
> -   |---  -----|  |-------|           map_3
> - map_1       map_2
> +tracing     tracing     tracing     packet      packet     packet
> +event A     event B     event C     on eth0     on eth1    on eth2
> + |             |          |           |           |          ^
> + |             |          |           |           v          |
> + --> tracing <--      tracing       socket    tc ingress   tc egress
> +      prog_1           prog_2       prog_3    classifier    action
> +      |  |               |            |         prog_4      prog_5
> +   |---  -----|  |-------|           map_3        |           |
> + map_1       map_2                                --| map_4 |--
>  .fi
>  .in
>  .\"
> @@ -612,10 +616,15 @@ since elements cannot be deleted.
>  replaces elements in a
>  .B nonatomic
>  fashion;
> -.\" FIXME
> -.\" Daniel Borkmann: when you have a value_size of sizeof(long), you can
> -.\" however use __sync_fetch_and_add() atomic builtin from the LLVM backend
> -for atomic updates, a hash-table map should be used instead.
> +for atomic updates, a hash-table map should be used instead. There is
> +however one special case that can also be used with arrays: the atomic
> +built-in
> +.BR __sync_fetch_and_add()
> +can be used on 32 and 64 bit atomic counters. For example, it can be
> +applied on the whole value itself if it represents a single counter,
> +or in case of a structure containing multiple counters, it could be
> +used on individual ones. This is quite often useful for aggregation
> +and accounting of events.
>  .RE
>  .IP
>  Among the uses for array maps are the following:
> @@ -626,11 +635,46 @@ and where the value is a collection of 'global' variables which
>  eBPF programs can use to keep state between events.
>  .IP *
>  Aggregation of tracing events into a fixed set of buckets.
> +.IP *
> +Accounting of networking events, for example, number of packets and packet
> +sizes.
>  .RE
>  .TP
>  .BR BPF_MAP_TYPE_PROG_ARRAY " (since Linux 4.2)"
> -.\" FIXME we need documentation of BPF_MAP_TYPE_PROG_ARRAY
> -[To be completed]
> +A program array map is a special kind of array map, whose map values only
> +contain valid file descriptors to other eBPF programs. Thus both the
> +key_size and value_size must be exactly four bytes.
> +This map is used in conjunction with the
> +.BR bpf_tail_call()
> +helper.

Is this caller already in mainline? I could not find it in the current rc?

> +
> +This means that an eBPF program with a program array map attached to it
> +can call from kernel side into
> +
> +.in +4n
> +.nf
> +void bpf_tail_call(void *context, void *prog_map, unsigned int index);
> +.fi
> +.in
> +
> +and therefore replace its own program flow with the one from the program
> +at the given program array slot if present. This can be regarded as kind
> +of a jump table to a different eBPF program. The invoked program will then
> +reuse the same stack. 

Are there any implications of the fact that it uses the same stack
that should be mentioned in the man page?

> When a jump into the new program has been performed,
> +it won't return to the old one anymore.
> +
> +If no eBPF program is found at the given index of the program array,

I find this language a little unclear. The array does not contain eBPF
programs, but rather file descriptors. So, what does "not found" here
really mean? (I added a FIXME.)

> +execution continues with the current eBPF program.
> +This can be used as a fall-through for default cases.
> +
> +A program array map is useful, for example, in tracing or networking, to
> +handle individual system calls resp. protocols in its own sub-programs and
> +use their identifiers as an individual map index. This approach may result
> +in performance benefits, and also makes it possible to overcome the maximum
> +instruction limit of a single program.
> +In dynamic environments, a user space daemon may atomically replace individual
> +sub-programs at run-time with newer versions to alter overall program
> +behavior, for instance, when global policies might change.
>  .\"
>  .SS eBPF programs
>  The
> @@ -699,20 +743,7 @@ is a license string, which must be GPL compatible to call helper functions
>  marked
>  .IR gpl_only .
>  (The licensing rules are the same as for kernel modules,
> -so that dual licenses, such as "Dual BSD/GPL", may be used.)
> -.\" Daniel Borkmann commented:
> -.\"     Not strictly. So here, the same rules apply as with kernel modules.
> -.\"     I.e. what the kernel checks for are the following license strings:
> -.\"
> -.\"     static inline int license_is_gpl_compatible(const char *license)
> -.\"     {
> -.\"     	return (strcmp(license, "GPL") == 0
> -.\"     		|| strcmp(license, "GPL v2") == 0
> -.\"     		|| strcmp(license, "GPL and additional rights") == 0
> -.\"     		|| strcmp(license, "Dual BSD/GPL") == 0
> -.\"     		|| strcmp(license, "Dual MIT/GPL") == 0
> -.\"     		|| strcmp(license, "Dual MPL/GPL") == 0);
> -.\"     }
> +so that also dual licenses, such as "Dual BSD/GPL", may be used.)
>  .IP *
>  .I log_buf
>  is a pointer to a caller-allocated buffer in which the in-kernel

I made some minor tweaks after applying your patch, but
I think I've noted the more significant changes I made above.

Cheers,

Michael



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH man v4] bpf.2: various updates/follow-ups to address some fixmes
       [not found]     ` <55C26D97.6090408-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2015-08-05 20:33       ` Daniel Borkmann
       [not found]         ` <55C2730E.1090807-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Daniel Borkmann @ 2015-08-05 20:33 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: ast-uqk4Ao+rVK5Wk0Htik3J/w, linux-man-u79uwXL29TY76Z2rM5mHXA

On 08/05/2015 10:09 PM, Michael Kerrisk (man-pages) wrote:
...
>> +There's one special map type which is a program array.
>> +This map stores file descriptors to other eBPF programs.
>> +Thus, when a lookup in that map is performed, the program flow is
>> +redirected in-place to the beginning of the new eBPF program without
>
> I changed "of the new" to "of another". Okay.

Okay, thanks.

>> +returning back.
>> +The level of nesting has a fixed limit of 32, so that infinite loops cannot
>
> Where is that limit of 32 defined, by the way?

Currently, an implementation detail in the kernel, so not exposed under uapi.
It's MAX_TAIL_CALL_CNT in include/linux/bpf.h.

>> +be crafted.
>> +During runtime, the program file descriptors stored in that map can be modified,
>> +so program functionality can be altered based on specific requirements.
>> +All programs stored in such a map have been loaded into the kernel via
>> +.BR bpf ()
>> +as well.
>> +In case a lookup has failed, the current program continues its execution.
>
> What are the possible causes of failure? It may be worth saying something about
> this in the man page.

That the map slot doesn't contain a program file descriptor, that the
used lookup index/key is out of bounds, or that we've exceeded the maximum
nesting (MAX_TAIL_CALL_CNT).

> (For future patches, please start new sentences on new lines.)

Okay, I guess I still have to get used to it, sorry.

>> +A program array map is a special kind of array map, whose map values only
>> +contain valid file descriptors to other eBPF programs. Thus both the
>> +key_size and value_size must be exactly four bytes.
>> +This map is used in conjunction with the
>> +.BR bpf_tail_call()
>> +helper.
>
> Is this caller already in mainline? I could not find it in the current rc?

It's since commit 04fd61ab36ec ("bpf: allow bpf programs to tail-call other
bpf programs").

>> +and therefore replace its own program flow with the one from the program
>> +at the given program array slot if present. This can be regarded as kind
>> +of a jump table to a different eBPF program. The invoked program will then
>> +reuse the same stack.
>
> Are there any implications of the fact that it uses the same stack
> that should be mentioned in the man page?

Due to the stack reuse better performance.

>> When a jump into the new program has been performed,
>> +it won't return to the old one anymore.
>> +
>> +If no eBPF program is found at the given index of the program array,
>
> I find this language a little unclear. The array does not contain eBPF
> programs, but rather file descriptors. So, what does "not found" here
> really mean? (I added a FIXME.)

Ok, it's a bit more complicated to explain I guess. So from user space
side, a lookup on that map type is not allowed. When user space adds a
new file descriptor into the prog map, the kernel internally translates
that to the actual program holds reference, etc. The tail call helper is
mapped into a eBPF instruction, so no real helper call here. That instruction
will then have a register setting as if we'd have a function call, so it
has the lookup key and uses it directly to find the array slot. From
there, it has access to the actual underlying program. "Not found" means
conditions mentioned as earlier above.

Thanks,
Daniel
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH man v4] bpf.2: various updates/follow-ups to address some fixmes
       [not found]         ` <55C2730E.1090807-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>
@ 2015-08-07  9:40           ` Michael Kerrisk (man-pages)
       [not found]             ` <55C47D11.3040903-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-08-07  9:40 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w, ast-uqk4Ao+rVK5Wk0Htik3J/w,
	linux-man-u79uwXL29TY76Z2rM5mHXA

Hi Daniel,

Thanks for the follow up.

On 08/05/2015 10:33 PM, Daniel Borkmann wrote:
> On 08/05/2015 10:09 PM, Michael Kerrisk (man-pages) wrote:
> ...
>>> +returning back.
>>> +The level of nesting has a fixed limit of 32, so that infinite loops cannot
>>
>> Where is that limit of 32 defined, by the way?
> 
> Currently, an implementation detail in the kernel, so not exposed under uapi.
> It's MAX_TAIL_CALL_CNT in include/linux/bpf.h.

Thanks, I dropped a comment into the page source.

>>> +be crafted.
>>> +During runtime, the program file descriptors stored in that map can be modified,
>>> +so program functionality can be altered based on specific requirements.
>>> +All programs stored in such a map have been loaded into the kernel via
>>> +.BR bpf ()
>>> +as well.
>>> +In case a lookup has failed, the current program continues its execution.
>>
>> What are the possible causes of failure? It may be worth saying something about
>> this in the man page.
> 
> That the map slot doesn't contain a program file descriptor, that the
> used lookup index/key is out of bounds, or that we've exceeded the maximum
> nesting (MAX_TAIL_CALL_CNT).

Okay thanks. See below.

>> (For future patches, please start new sentences on new lines.)
> 
> Okay, I guess I still have to get used to it, sorry.
> 
>>> +A program array map is a special kind of array map, whose map values only
>>> +contain valid file descriptors to other eBPF programs. Thus both the
>>> +key_size and value_size must be exactly four bytes.
>>> +This map is used in conjunction with the
>>> +.BR bpf_tail_call()
>>> +helper.
>>
>> Is this caller already in mainline? I could not find it in the current rc?
> 
> It's since commit 04fd61ab36ec ("bpf: allow bpf programs to tail-call other
> bpf programs").

Got it. (I thought I was looking in the -rc before, but I was confused.)

>>> +and therefore replace its own program flow with the one from the program
>>> +at the given program array slot if present. This can be regarded as kind
>>> +of a jump table to a different eBPF program. The invoked program will then
>>> +reuse the same stack.
>>
>> Are there any implications of the fact that it uses the same stack
>> that should be mentioned in the man page?
> 
> Due to the stack reuse better performance.

Okay.

>>> When a jump into the new program has been performed,
>>> +it won't return to the old one anymore.
>>> +
>>> +If no eBPF program is found at the given index of the program array,
>>
>> I find this language a little unclear. The array does not contain eBPF
>> programs, but rather file descriptors. So, what does "not found" here
>> really mean? (I added a FIXME.)
> 
> Ok, it's a bit more complicated to explain I guess. So from user space
> side, a lookup on that map type is not allowed. When user space adds a
> new file descriptor into the prog map, the kernel internally translates
> that to the actual program holds reference, etc. The tail call helper is
> mapped into a eBPF instruction, so no real helper call here. That instruction
> will then have a register setting as if we'd have a function call, so it
> has the lookup key and uses it directly to find the array slot. From
> there, it has access to the actual underlying program. "Not found" means
> conditions mentioned as earlier above.

Okay I've changed that paragraph to now read:

              If no eBPF program is found at the given index of the pro‐
              gram  array  (because the map slot doesn't contain a valid
              program file descriptor, the specified lookup index/key is
              out  of  bounds,  or the limit of 32 nested calls has been
              exceed) execution continues with the current eBPF program.
              This can be used as a fall-through for default cases.

Okay?

Cheers,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH man v4] bpf.2: various updates/follow-ups to address some fixmes
       [not found]             ` <55C47D11.3040903-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2015-08-07  9:55               ` Daniel Borkmann
       [not found]                 ` <55C4809F.7020900-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Daniel Borkmann @ 2015-08-07  9:55 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: ast-uqk4Ao+rVK5Wk0Htik3J/w, linux-man-u79uwXL29TY76Z2rM5mHXA

On 08/07/2015 11:40 AM, Michael Kerrisk (man-pages) wrote:
> Hi Daniel,
>
> Thanks for the follow up.
>
> On 08/05/2015 10:33 PM, Daniel Borkmann wrote:
>> On 08/05/2015 10:09 PM, Michael Kerrisk (man-pages) wrote:
>> ...
>>>> +returning back.
>>>> +The level of nesting has a fixed limit of 32, so that infinite loops cannot
>>>
>>> Where is that limit of 32 defined, by the way?
>>
>> Currently, an implementation detail in the kernel, so not exposed under uapi.
>> It's MAX_TAIL_CALL_CNT in include/linux/bpf.h.
>
> Thanks, I dropped a comment into the page source.

Okay, cool.

>>>> +be crafted.
>>>> +During runtime, the program file descriptors stored in that map can be modified,
>>>> +so program functionality can be altered based on specific requirements.
>>>> +All programs stored in such a map have been loaded into the kernel via
>>>> +.BR bpf ()
>>>> +as well.
>>>> +In case a lookup has failed, the current program continues its execution.
>>>
>>> What are the possible causes of failure? It may be worth saying something about
>>> this in the man page.
>>
>> That the map slot doesn't contain a program file descriptor, that the
>> used lookup index/key is out of bounds, or that we've exceeded the maximum
>> nesting (MAX_TAIL_CALL_CNT).
>
> Okay thanks. See below.
>
>>> (For future patches, please start new sentences on new lines.)
>>
>> Okay, I guess I still have to get used to it, sorry.
>>
>>>> +A program array map is a special kind of array map, whose map values only
>>>> +contain valid file descriptors to other eBPF programs. Thus both the
>>>> +key_size and value_size must be exactly four bytes.
>>>> +This map is used in conjunction with the
>>>> +.BR bpf_tail_call()
>>>> +helper.
>>>
>>> Is this caller already in mainline? I could not find it in the current rc?
>>
>> It's since commit 04fd61ab36ec ("bpf: allow bpf programs to tail-call other
>> bpf programs").
>
> Got it. (I thought I was looking in the -rc before, but I was confused.)
>
>>>> +and therefore replace its own program flow with the one from the program
>>>> +at the given program array slot if present. This can be regarded as kind
>>>> +of a jump table to a different eBPF program. The invoked program will then
>>>> +reuse the same stack.
>>>
>>> Are there any implications of the fact that it uses the same stack
>>> that should be mentioned in the man page?
>>
>> Due to the stack reuse better performance.
>
> Okay.
>
>>>> When a jump into the new program has been performed,
>>>> +it won't return to the old one anymore.
>>>> +
>>>> +If no eBPF program is found at the given index of the program array,
>>>
>>> I find this language a little unclear. The array does not contain eBPF
>>> programs, but rather file descriptors. So, what does "not found" here
>>> really mean? (I added a FIXME.)
>>
>> Ok, it's a bit more complicated to explain I guess. So from user space
>> side, a lookup on that map type is not allowed. When user space adds a
>> new file descriptor into the prog map, the kernel internally translates
>> that to the actual program holds reference, etc. The tail call helper is
>> mapped into a eBPF instruction, so no real helper call here. That instruction
>> will then have a register setting as if we'd have a function call, so it
>> has the lookup key and uses it directly to find the array slot. From
>> there, it has access to the actual underlying program. "Not found" means
>> conditions mentioned as earlier above.
>
> Okay I've changed that paragraph to now read:
>
>                If no eBPF program is found at the given index of the pro‐
>                gram  array  (because the map slot doesn't contain a valid
>                program file descriptor, the specified lookup index/key is
>                out  of  bounds,  or the limit of 32 nested calls has been
>                exceed) execution continues with the current eBPF program.
>                This can be used as a fall-through for default cases.
>
> Okay?

Works for me, thanks!

I'll drop you some more patches on remaining items, at the very latest after
LinuxCon NA & Plumbers.

Cheers & thanks,
Daniel
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH man v4] bpf.2: various updates/follow-ups to address some fixmes
       [not found]                 ` <55C4809F.7020900-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>
@ 2015-08-07 12:31                   ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 6+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-08-07 12:31 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w, ast-uqk4Ao+rVK5Wk0Htik3J/w,
	linux-man-u79uwXL29TY76Z2rM5mHXA

Hi Daniel,

[...]
>>>>> When a jump into the new program has been performed,
>>>>> +it won't return to the old one anymore.
>>>>> +
>>>>> +If no eBPF program is found at the given index of the program array,
>>>>
>>>> I find this language a little unclear. The array does not contain eBPF
>>>> programs, but rather file descriptors. So, what does "not found" here
>>>> really mean? (I added a FIXME.)
>>>
>>> Ok, it's a bit more complicated to explain I guess. So from user space
>>> side, a lookup on that map type is not allowed. When user space adds a
>>> new file descriptor into the prog map, the kernel internally translates
>>> that to the actual program holds reference, etc. The tail call helper is
>>> mapped into a eBPF instruction, so no real helper call here. That instruction
>>> will then have a register setting as if we'd have a function call, so it
>>> has the lookup key and uses it directly to find the array slot. From
>>> there, it has access to the actual underlying program. "Not found" means
>>> conditions mentioned as earlier above.
>>
>> Okay I've changed that paragraph to now read:
>>
>>                If no eBPF program is found at the given index of the pro‐
>>                gram  array  (because the map slot doesn't contain a valid
>>                program file descriptor, the specified lookup index/key is
>>                out  of  bounds,  or the limit of 32 nested calls has been
>>                exceed) execution continues with the current eBPF program.
>>                This can be used as a fall-through for default cases.
>>
>> Okay?
> 
> Works for me, thanks!

Thanks for checking.

> I'll drop you some more patches on remaining items, at the very latest after
> LinuxCon NA & Plumbers.

Thanks. The page is already looking a lot better than the version that went
out with the last man-pages release.

Cheers,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2015-08-07 12:31 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-29 22:25 [PATCH man v4] bpf.2: various updates/follow-ups to address some fixmes Daniel Borkmann
     [not found] ` <e56dadb710e4beec76c02823933c232e19af025b.1438208478.git.daniel-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>
2015-08-05 20:09   ` Michael Kerrisk (man-pages)
     [not found]     ` <55C26D97.6090408-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2015-08-05 20:33       ` Daniel Borkmann
     [not found]         ` <55C2730E.1090807-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>
2015-08-07  9:40           ` Michael Kerrisk (man-pages)
     [not found]             ` <55C47D11.3040903-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2015-08-07  9:55               ` Daniel Borkmann
     [not found]                 ` <55C4809F.7020900-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>
2015-08-07 12:31                   ` Michael Kerrisk (man-pages)

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.