netdev.vger.kernel.org archive mirror
From: Alan Maguire <alan.maguire@oracle.com>
To: David Ahern <dsahern@gmail.com>
Cc: Yonghong Song <yhs@fb.com>, Andrii Nakryiko <andriin@fb.com>,
	bpf@vger.kernel.org, Martin KaFai Lau <kafai@fb.com>,
	netdev@vger.kernel.org, Alexei Starovoitov <ast@fb.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	kernel-team@fb.com
Subject: Re: [RFC PATCH bpf-next v2 00/17] bpf: implement bpf based dumping of kernel data structures
Date: Fri, 17 Apr 2020 11:54:01 +0100 (BST)
Message-ID: <alpine.LRH.2.21.2004171106580.32559@localhost>
In-Reply-To: <40e427e2-5b15-e9aa-e2cb-42dc1b53d047@gmail.com>

On Wed, 15 Apr 2020, David Ahern wrote:

> On 4/15/20 1:27 PM, Yonghong Song wrote:
> > 
> > As there is some discussion regarding the kernel interface/steps to
> > create file/anonymous dumpers, I think it will be beneficial to post
> > this work in progress for discussion.
> > 
> > Motivation:
> >   The current ways to dump kernel data structures are mostly:
> >     1. /proc system
> >     2. various specific tools like "ss" which require kernel support.
> >     3. drgn
> >   The drawback of the first two is that whenever you want to dump more, you
> >   need to change the kernel. For example, Martin wants to dump socket local
> 
> If kernel support is needed for bpfdump of kernel data structures, you
> are not really solving the kernel support problem. i.e., to dump
> ipv4_route's you need to modify the relevant proc show function.
>

I need to dig into this patchset a bit more, but if there is
a need for in-kernel BTF-based structure dumping I've got a
work-in-progress patchset that does this by generalizing the code
that deals with seq output in the verifier. I've posted it
as an RFC in case it has anything useful to offer here:

https://lore.kernel.org/bpf/1587120160-3030-1-git-send-email-alan.maguire@oracle.com/T/#t

The idea is that by using different callback functions we can achieve
seq, snprintf or other output in-kernel using the kernel BTF data.
I created one consumer as a proof-of-concept; it's a printk pointer
format specifier.  Since the dump format is determined in the kernel
it's a bit constrained format-wise, but may be good enough for
some cases.
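
The callback idea can be sketched in userspace C. This is only an
illustration under stated assumptions, not the code in the RFC: every
name here (emit_fn, struct dumper, struct buf_sink, struct demo,
dump_demo) is hypothetical, and the type walk is hand-written for one
toy struct rather than driven by BTF. The same dump routine feeds either
a bounded buffer (snprintf-style) or a stream, depending only on which
callback is plugged in.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical output callback: formatted text plus a caller context. */
typedef void (*emit_fn)(void *ctx, const char *fmt, va_list args);

/* A dumper is just a callback and the context handed back to it. */
struct dumper {
	emit_fn emit;
	void *ctx;
};

void dump_emit(const struct dumper *d, const char *fmt, ...)
{
	va_list args;

	assert(d && d->emit);
	va_start(args, fmt);
	d->emit(d->ctx, fmt, args);
	va_end(args);
}

/* Sink 1: bounded buffer. The caller should start with buf[0] = '\0'. */
struct buf_sink {
	char *buf;
	size_t size;
	size_t len;
};

void buf_emit(void *ctx, const char *fmt, va_list args)
{
	struct buf_sink *s = ctx;
	size_t room;
	int n;

	if (s->len + 1 >= s->size)
		return;			/* no room left: output is truncated */
	room = s->size - s->len;
	n = vsnprintf(s->buf + s->len, room, fmt, args);
	if (n < 0)
		return;
	/* vsnprintf reports the untruncated length; clamp to what fit. */
	s->len = ((size_t)n >= room) ? s->size - 1 : s->len + (size_t)n;
}

/* Sink 2: any stdio stream, passed as the context. */
void stream_emit(void *ctx, const char *fmt, va_list args)
{
	vfprintf((FILE *)ctx, fmt, args);
}

/* Toy type standing in for a BTF-described kernel struct. */
struct demo {
	int len;
	unsigned int cloned;
};

/* One walk over the fields, independent of where the output goes. */
void dump_demo(const struct dumper *d, const struct demo *p)
{
	dump_emit(d, "{.len=%d,", p->len);
	dump_emit(d, ".cloned=0x%x}", p->cloned);
}
```

A caller picks the sink by filling in struct dumper: { buf_emit, &sink }
for a fixed-size buffer, or { stream_emit, stdout } to print directly.
When the buffer is too small the output is simply cut short, much like
the truncated printk output shown further down.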

To give a flavour of what the printed-out data looks like,
here we use pr_info() to display a struct sk_buff *.  Note
we specify the 'N' modifier to show type field names:

  struct sk_buff *skb = alloc_skb(64, GFP_KERNEL);

  pr_info("%pTN<struct sk_buff>", skb);

...gives us:

{{{.next=00000000c7916e9c,.prev=00000000c7916e9c,{.dev=00000000c7916e9c|.dev_scratch=0}}|.rbnode={.__rb_parent_color=0,.rb_right=00000000c7916e9c,.rb_left=00000000c7916e9c}|.list={.next=00000000c7916e9c,.prev=00000000c7916e9c}},{.sk=00000000c7916e9c|.ip_defrag_offset=0},{.tstamp=0|.skb_mstamp_ns=0},.cb=['\0'],{{._skb_refdst=0,.destructor=00000000c7916e9c}|.tcp_tsorted_anchor={.next=00000000c7916e9c,.prev=00000000c7916e9c}},._nfct=0,.len=0,.data_len=0,.mac_len=0,.hdr_len=0,.queue_mapping=0,.__cloned_offset=[],.cloned=0x0,.nohdr=0x0,.fclone=0x0,.peeked=0x0,.head_frag=0x0,.pfmemalloc=0x0,.active_extensions=0,.headers_start=[],.__pkt_type_offset=[],.pkt_type=0x0,.ignore_df=0x0,.nf_trace=0x0,.ip_summed=0x0,.ooo_okay=0x0,.l4_hash=0x0,.sw_hash=0x0,.wifi_acked_valid=0x0,.wifi_acked=0x0,.no_fcs=0x0,.encapsulation=0x0,.encap_hdr_csum=0x0,.csum_valid=0x0,.__pkt_vlan_present_offset=[],.vlan_present=0x0,.csum_complete_sw=0x0,.csum_level=0x0,.csum_not_inet=0x0,.dst_pending_co

[printk output is truncated at 1024 bytes, but more
compact output can be achieved by not specifying 'N'
for type names. I may need to add a specifier to avoid
pointer obfuscation]

With a printk format specifier, trace_printk() in BPF then
inherits this dumping behaviour for free, but I think it
would also be possible to add a helper so that the type
name doesn't have to be specified.  The verifier could insert
BTF ids, and type data could be dumped for tracing arguments
via a flavour of the bpf_perf_event_output() helper or similar.
To be clear, I haven't done any of that yet in the RFC patchset,
but it seems feasible at least.

Anyway perhaps there's something useful in it which can help
towards the goal of easier dumping of data structures.

I'll spend some time over the weekend looking at the
BTF dumper patchset; apologies I haven't got very far
with it yet.

Thanks!

Alan

> 
> >   storage with "ss". Kernel change is needed for it to work ([1]).
> >   This is also the direct motivation for this work.
> > 
> >   drgn ([2]) solves this problem nicely and no kernel change is needed.
> >   But since drgn is not able to verify the validity of a particular pointer value,
> >   it might present the wrong results in rare cases.
> > 
> >   In this patch set, we introduce bpf based dumping. Initial kernel changes are
> >   still needed, but a data structure change will not require kernel changes
> >   any more. The bpf program itself is used to adapt to new data structure
> >   changes. This will give certain flexibility with guaranteed correctness.
> > 
> >   Here, kernel seq_ops is used to facilitate dumping, similar to current
> >   /proc and many other lossless kernel dumping facilities.
> > 
> > User Interfaces:
> >   1. A new mount file system, bpfdump at /sys/kernel/bpfdump is introduced.
> >      Different from /sys/fs/bpf, this is a single user mount. Mount command
> >      can be:
> >         mount -t bpfdump bpfdump /sys/kernel/bpfdump
> >   2. Kernel bpf dumpable data structures are represented as directories
> >      under /sys/kernel/bpfdump, e.g.,
> >        /sys/kernel/bpfdump/ipv6_route/
> >        /sys/kernel/bpfdump/netlink/
> 
> The names of bpfdump fs entries do not match actual data structure names
> - e.g., there is no ipv6_route struct. On the one hand that is a good
> thing since structure names can change, but that also means a mapping is
> needed between the dumper filesystem entries and what you get for context.
> 
> Further, what is the expectation in terms of stable API for these fs
> entries? Entries in the context can change. Data structure names can
> change. Entries in the structs can change. All of that breaks the idea
> of stable programs that are compiled once and run for all future
> releases. When structs change, those programs will break - and
> structures will change.
> 
> What does bpfdumper provide that you can not do with a tracepoint on a
> relevant function and then putting a program on the tracepoint? i.e., why
> not just put a tracepoint in the relevant dump functions.
> 
