From mboxrd@z Thu Jan 1 00:00:00 1970 From: Daniel Borkmann Subject: Re: [PATCH bpf-next v3 6/8] bpf: add documentation for eBPF helpers (42-50) Date: Thu, 19 Apr 2018 14:40:05 +0200 Message-ID: <25596c19-9091-d46b-c323-cc1547dd3aeb@iogearbox.net> References: <20180417143438.7018-1-quentin.monnet@netronome.com> <20180417143438.7018-7-quentin.monnet@netronome.com> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <20180417143438.7018-7-quentin.monnet@netronome.com> Content-Language: en-US Sender: netdev-owner@vger.kernel.org To: Quentin Monnet , ast@kernel.org Cc: netdev@vger.kernel.org, oss-drivers@netronome.com, linux-doc@vger.kernel.org, linux-man@vger.kernel.org, Kaixu Xia , Martin KaFai Lau , Sargun Dhillon , Thomas Graf , Gianluca Borello , Chenbo Feng List-Id: linux-man@vger.kernel.org On 04/17/2018 04:34 PM, Quentin Monnet wrote: > Add documentation for eBPF helper functions to bpf.h user header file. > This documentation can be parsed with the Python script provided in > another commit of the patch series, in order to provide a RST document > that can later be converted into a man page. > > The objective is to make the documentation easily understandable and > accessible to all eBPF developers, including beginners. > > This patch contains descriptions for the following helper functions: > > Helper from Kaixu: > - bpf_perf_event_read() > > Helpers from Martin: > - bpf_skb_under_cgroup() > - bpf_xdp_adjust_head() > > Helpers from Sargun: > - bpf_probe_write_user() > - bpf_current_task_under_cgroup() > > Helper from Thomas: > - bpf_skb_change_head() > > Helper from Gianluca: > - bpf_probe_read_str() > > Helpers from Chenbo: > - bpf_get_socket_cookie() > - bpf_get_socket_uid() > > v3: > - bpf_perf_event_read(): Fix time of selection for perf event type in > description. Remove occurences of "cores" to avoid confusion with > "CPU". > > Cc: Kaixu Xia > Cc: Martin KaFai Lau > Cc: Sargun Dhillon > Cc: Thomas Graf > Cc: Gianluca Borello > Cc: Chenbo Feng > Signed-off-by: Quentin Monnet > --- > include/uapi/linux/bpf.h | 158 +++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 158 insertions(+) > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > index 3a40f5debac2..dd79a1c82adf 100644 > --- a/include/uapi/linux/bpf.h > +++ b/include/uapi/linux/bpf.h > @@ -753,6 +753,25 @@ union bpf_attr { > * Return > * 0 on success, or a negative error in case of failure. > * > + * u64 bpf_perf_event_read(struct bpf_map *map, u64 flags) > + * Description > + * Read the value of a perf event counter. This helper relies on a > + * *map* of type **BPF_MAP_TYPE_PERF_EVENT_ARRAY**. The nature of > + * the perf event counter is selected when *map* is updated with > + * perf event file descriptors. The *map* is an array whose size > + * is the number of available CPUs, and each cell contains a value > + * relative to one CPU. The value to retrieve is indicated by > + * *flags*, that contains the index of the CPU to look up, masked > + * with **BPF_F_INDEX_MASK**. Alternatively, *flags* can be set to > + * **BPF_F_CURRENT_CPU** to indicate that the value for the > + * current CPU should be retrieved. > + * > + * Note that before Linux 4.13, only hardware perf event can be > + * retrieved. > + * Return > + * The value of the perf event counter read from the map, or a > + * negative error code in case of failure. > + * > * int bpf_redirect(u32 ifindex, u64 flags) > * Description > * Redirect the packet to another net device of index *ifindex*. > @@ -965,6 +984,17 @@ union bpf_attr { > * Return > * 0 on success, or a negative error in case of failure. > * > + * int bpf_skb_under_cgroup(struct sk_buff *skb, struct bpf_map *map, u32 index) > + * Description > + * Check whether *skb* is a descendant of the cgroup2 held by > + * *map* of type **BPF_MAP_TYPE_CGROUP_ARRAY**, at *index*. > + * Return > + * The return value depends on the result of the test, and can be: > + * > + * * 0, if the *skb* failed the cgroup2 descendant test. > + * * 1, if the *skb* succeeded the cgroup2 descendant test. > + * * A negative error code, if an error occurred. > + * > * u32 bpf_get_hash_recalc(struct sk_buff *skb) > * Description > * Retrieve the hash of the packet, *skb*\ **->hash**. If it is > @@ -985,6 +1015,37 @@ union bpf_attr { > * Return > * A pointer to the current task struct. > * > + * int bpf_probe_write_user(void *dst, const void *src, u32 len) > + * Description > + * Attempt in a safe way to write *len* bytes from the buffer > + * *src* to *dst* in memory. It only works for threads that are in > + * user context. Plus the dst address must be a valid user space address. > + * This helper should not be used to implement any kind of > + * security mechanism because of TOC-TOU attacks, but rather to > + * debug, divert, and manipulate execution of semi-cooperative > + * processes. > + * > + * Keep in mind that this feature is meant for experiments, and it > + * has a risk of crashing the system and running programs. Ditto, crashing user space applications. > + * Therefore, when an eBPF program using this helper is attached, > + * a warning including PID and process name is printed to kernel > + * logs. > + * Return > + * 0 on success, or a negative error in case of failure. > + * > + * int bpf_current_task_under_cgroup(struct bpf_map *map, u32 index) > + * Description > + * Check whether the probe is being run is the context of a given > + * subset of the cgroup2 hierarchy. The cgroup2 to test is held by > + * *map* of type **BPF_MAP_TYPE_CGROUP_ARRAY**, at *index*. > + * Return > + * The return value depends on the result of the test, and can be: > + * > + * * 0, if the *skb* task belongs to the cgroup2. > + * * 1, if the *skb* task does not belong to the cgroup2. > + * * A negative error code, if an error occurred. > + * > * int bpf_skb_change_tail(struct sk_buff *skb, u32 len, u64 flags) > * Description > * Resize (trim or grow) the packet associated to *skb* to the > @@ -1069,6 +1130,103 @@ union bpf_attr { > * Return > * The id of current NUMA node. > * > + * int bpf_skb_change_head(struct sk_buff *skb, u32 len, u64 flags) > + * Description > + * Grows headroom of packet associated to *skb* and adjusts the > + * offset of the MAC header accordingly, adding *len* bytes of > + * space. It automatically extends and reallocates memory as > + * required. > + * > + * This helper can be used on a layer 3 *skb* to push a MAC header > + * for redirection into a layer 2 device. > + * > + * All values for *flags* are reserved for future usage, and must > + * be left at zero. > + * > + * A call to this helper is susceptible to change data from the > + * packet. Therefore, at load time, all checks on pointers > + * previously done by the verifier are invalidated and must be > + * performed again. > + * Return > + * 0 on success, or a negative error in case of failure. > + * > + * int bpf_xdp_adjust_head(struct xdp_buff *xdp_md, int delta) > + * Description > + * Adjust (move) *xdp_md*\ **->data** by *delta* bytes. Note that > + * it is possible to use a negative value for *delta*. This helper > + * can be used to prepare the packet for pushing or popping > + * headers. > + * > + * A call to this helper is susceptible to change data from the > + * packet. Therefore, at load time, all checks on pointers > + * previously done by the verifier are invalidated and must be > + * performed again. > + * Return > + * 0 on success, or a negative error in case of failure. > + * > + * int bpf_probe_read_str(void *dst, int size, const void *unsafe_ptr) > + * Description > + * Copy a NUL terminated string from an unsafe address > + * *unsafe_ptr* to *dst*. The *size* should include the > + * terminating NUL byte. In case the string length is smaller than > + * *size*, the target is not padded with further NUL bytes. If the > + * string length is larger than *size*, just *size*-1 bytes are > + * copied and the last byte is set to NUL. > + * > + * On success, the length of the copied string is returned. This > + * makes this helper useful in tracing programs for reading > + * strings, and more importantly to get its length at runtime. See > + * the following snippet: > + * > + * :: > + * > + * SEC("kprobe/sys_open") > + * void bpf_sys_open(struct pt_regs *ctx) > + * { > + * char buf[PATHLEN]; // PATHLEN is defined to 256 > + * int res = bpf_probe_read_str(buf, sizeof(buf), > + * ctx->di); > + * > + * // Consume buf, for example push it to > + * // userspace via bpf_perf_event_output(); we > + * // can use res (the string length) as event > + * // size, after checking its boundaries. > + * } > + * > + * In comparison, using **bpf_probe_read()** helper here instead > + * to read the string would require to estimate the length at > + * compile time, and would often result in copying more memory > + * than necessary. > + * > + * Another useful use case is when parsing individual process > + * arguments or individual environment variables navigating > + * *current*\ **->mm->arg_start** and *current*\ > + * **->mm->env_start**: using this helper and the return value, > + * one can quickly iterate at the right offset of the memory area. > + * Return > + * On success, the strictly positive length of the string, > + * including the trailing NUL character. On error, a negative > + * value. > + * > + * u64 bpf_get_socket_cookie(struct sk_buff *skb) > + * Description > + * Retrieve the socket cookie generated by the kernel from a > + * **struct sk_buff** with a known socket. If none has been set > + * yet, generate a new cookie. This helper can be useful for > + * monitoring per socket networking traffic statistics as it > + * provides a unique socket identifier per namespace. > + * Return > + * A 8-byte long non-decreasing number on success, or 0 if the > + * socket field is missing inside *skb*. > + * > + * u32 bpf_get_socket_uid(struct sk_buff *skb) > + * Return > + * The owner UID of the socket associated to *skb*. If the socket > + * is **NULL**, or if it is not a full socket (i.e. if it is a > + * time-wait or a request socket instead), **overflowuid** value > + * is returned (note that **overflowuid** might also be the actual > + * UID value for the socket). > + * > * u32 bpf_set_hash(struct sk_buff *skb, u32 hash) > * Description > * Set the full hash for *skb* (set the field *skb*\ **->hash**) >