bpf.vger.kernel.org archive mirror
* tc_classid access in skb bpf context
From: Matthew Cover @ 2019-05-21 23:52 UTC
  To: Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, David S. Miller, netdev, bpf, linux-kernel
  Cc: Matthew Cover

__sk_buff has a member tc_classid which I'm interested in accessing from the skb bpf context.

A bpf program which accesses skb->tc_classid compiles, but fails verification; the specific failure is "invalid bpf_context access".

if (skb->tc_classid != 0)
 return 1;
return 0;

Some of the tests in tools/testing/selftests/bpf/verifier/ (those on tc_classid) further confirm that this is, in all likelihood, intentional behavior.

The very similar bpf program which instead accesses skb->mark works as desired.

if (skb->mark != 0)
 return 1;
return 0;

I built a kernel (v5.1) with 4 instances of the following line removed from net/core/filter.c to test the behavior when the instructions pass verification.

    switch (off) {
-    case bpf_ctx_range(struct __sk_buff, tc_classid):
...
        return false;
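
For readers following along, here is a rough userspace model of the kind of check that line feeds into. It is only a sketch: the names CTX_OVERLAPS and socket_filter_access_ok are made up for illustration, only the tc_classid rejection is modelled, and the real function in net/core/filter.c checks considerably more.

```c
#include <stdbool.h>
#include <stddef.h>
#include <linux/bpf.h>   /* struct __sk_buff (UAPI definition) */

/* Hypothetical helper: true when [off, off + size) overlaps MEMBER's bytes. */
#define CTX_OVERLAPS(off, size, member)                                  \
	((off) < (int)(offsetof(struct __sk_buff, member) +                  \
	               sizeof(((struct __sk_buff *)0)->member)) &&           \
	 (off) + (size) > (int)offsetof(struct __sk_buff, member))

/* Simplified model of a socket-filter context access check.
 * Assumption: everything except tc_classid is treated as readable,
 * which is NOT true of the real verifier callback. */
static bool socket_filter_access_ok(int off, int size)
{
	if (off < 0 || size <= 0 || off + size > (int)sizeof(struct __sk_buff))
		return false;
	if (CTX_OVERLAPS(off, size, tc_classid))
		return false;   /* would surface as "invalid bpf_context access" */
	return true;
}
```

Dropping the tc_classid case from the switch corresponds to deleting the second check in this model, which is why the load then passes verification.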

It appears skb->tc_classid is always zero within my bpf program, even when I verify by other means (e.g. netfilter) that the value is set non-zero.

I gather that sk_buff proper sometimes (i.e. at some layers) has qdisc_skb_cb stored in skb->cb, but not always.

I suspect that the tc_classid is available at l3 (and therefore to utils like netfilter, ip route, tc), but not at l2 (and not to AF_PACKET).

Is it impractical to make skb->tc_classid available in this bpf context or is there just some plumbing which hasn't been connected yet?

Is my suspicion that skb->cb no longer contains qdisc_skb_cb due to crossing a layer boundary well founded?

I'm willing to look into hooking things together as time permits if it's a feasible task.

It's trivial to have iptables match on tc_classid and set a mark which is available to bpf at l2, but I'd like to better understand this.

Thanks,
Matt C.


* Re: tc_classid access in skb bpf context
From: Daniel Borkmann @ 2019-05-22  8:37 UTC
  To: Matthew Cover, Alexei Starovoitov, Martin KaFai Lau, Song Liu,
	Yonghong Song, David S. Miller, netdev, bpf, linux-kernel
  Cc: Matthew Cover

On 05/22/2019 01:52 AM, Matthew Cover wrote:
> __sk_buff has a member tc_classid which I'm interested in accessing from the skb bpf context.
> 
> A bpf program which accesses skb->tc_classid compiles, but fails verification; the specific failure is "invalid bpf_context access".
> 
> if (skb->tc_classid != 0)
>  return 1;
> return 0;
> 
> Some of the tests in tools/testing/selftests/bpf/verifier/ (those on tc_classid) further confirm that this is, in all likelihood, intentional behavior.
> 
> The very similar bpf program which instead accesses skb->mark works as desired.
> 
> if (skb->mark != 0)
>  return 1;
> return 0;

You should be able to access skb->tc_classid; perhaps you're using the wrong program
type? BPF_PROG_TYPE_SCHED_CLS is supposed to work (if not, we'd have a regression).

> I built a kernel (v5.1) with 4 instances of the following line removed from net/core/filter.c to test the behavior when the instructions pass verification.
> 
>     switch (off) {
> -    case bpf_ctx_range(struct __sk_buff, tc_classid):
> ...
>         return false;
> 
> It appears skb->tc_classid is always zero within my bpf program, even when I verify by other means (e.g. netfilter) that the value is set non-zero.
> 
> I gather that sk_buff proper sometimes (i.e. at some layers) has qdisc_skb_cb stored in skb->cb, but not always.
> 
> I suspect that the tc_classid is available at l3 (and therefore to utils like netfilter, ip route, tc), but not at l2 (and not to AF_PACKET).

From tc/BPF context you can use it. It's been a long time, but I think back then
we mapped it into cb[] so that it can be used within the BPF context to pass skb
data around, e.g. between tail calls. cls_bpf_classify(), when in direct-action
mode (which likely everyone is, or should be, using), then maps that
skb->tc_classid u16 cb[] value to res->classid on program return, which in either
sch_handle_ingress() or sch_handle_egress() is then transferred into skb->tc_index.
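
A rough userspace model of that flow, not the kernel code itself. The function names da_result_classid and tc_index_from_classid are hypothetical, and the combination with the filter's major handle is an assumption based on my recollection of cls_bpf; only the TC_H_MAJ/TC_H_MIN macros come from the UAPI header.

```c
#include <stdint.h>
#include <linux/pkt_sched.h>   /* TC_H_MAJ, TC_H_MIN */

/* Models the direct-action step: the 16-bit value the program left in
 * tc_classid becomes the minor part of the classification result.
 * Assumption: the major part is taken from the filter's own classid. */
static uint32_t da_result_classid(uint32_t filter_classid, uint16_t prog_tc_classid)
{
	return TC_H_MAJ(filter_classid) | prog_tc_classid;
}

/* Models the ingress/egress hook: only the minor part of the result
 * ends up in the 16-bit tc_index. */
static uint16_t tc_index_from_classid(uint32_t res_classid)
{
	return (uint16_t)TC_H_MIN(res_classid);
}
```

In this model a program that writes 0x20 to tc_classid under a filter with major handle 1: yields a result classid of 0x00010020 and a tc_index of 0x20.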


* Re: tc_classid access in skb bpf context
From: Matthew Cover @ 2019-05-22 17:26 UTC
  To: Daniel Borkmann, Alexei Starovoitov, Martin KaFai Lau, Song Liu,
	Yonghong Song, David S. Miller, netdev, bpf, linux-kernel
  Cc: Matthew Cover


On 05/22/2019 01:52 AM, Matthew Cover wrote:
> > __sk_buff has a member tc_classid which I'm interested in accessing from the skb bpf context.
> > 
> > A bpf program which accesses skb->tc_classid compiles, but fails verification; the specific failure is "invalid bpf_context access".
> > 
> > if (skb->tc_classid != 0)
> >  return 1;
> > return 0;
> > 
> > Some of the tests in tools/testing/selftests/bpf/verifier/ (those on tc_classid) further confirm that this is, in all likelihood, intentional behavior.
> > 
> > The very similar bpf program which instead accesses skb->mark works as desired.
> > 
> > if (skb->mark != 0)
> >  return 1;
> > return 0;
> 
> You should be able to access skb->tc_classid, perhaps you're using the wrong program
> type? BPF_PROG_TYPE_SCHED_CLS is supposed to work (if not we'd have a regression).
> 

I am in fact using BPF_PROG_TYPE_SOCKET_FILTER and using the program as PACKET_FANOUT_DATA with PACKET_FANOUT_EBPF.
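
For context, a minimal sketch of that setup, assuming an AF_PACKET socket that is already open and a BPF_PROG_TYPE_SOCKET_FILTER program that is already loaded. The helper names fanout_ebpf_arg and join_fanout_ebpf are made up for illustration; the socket options are the ones described in packet(7).

```c
#include <stdint.h>
#include <sys/socket.h>        /* setsockopt, SOL_PACKET */
#include <linux/if_packet.h>   /* PACKET_FANOUT, PACKET_FANOUT_DATA, PACKET_FANOUT_EBPF */

/* Builds the PACKET_FANOUT argument: group id in the low 16 bits,
 * fanout mode in the high 16 bits. */
static int fanout_ebpf_arg(uint16_t group_id)
{
	return (int)group_id | (PACKET_FANOUT_EBPF << 16);
}

/* Joins sock_fd to the fanout group, then attaches the program that
 * picks the destination socket. Returns 0 on success, -1 on error
 * (errno is left as set by setsockopt). */
static int join_fanout_ebpf(int sock_fd, uint16_t group_id, int prog_fd)
{
	int arg = fanout_ebpf_arg(group_id);

	if (setsockopt(sock_fd, SOL_PACKET, PACKET_FANOUT, &arg, sizeof(arg)) < 0)
		return -1;
	/* The group must exist before the program can be attached to it. */
	return setsockopt(sock_fd, SOL_PACKET, PACKET_FANOUT_DATA,
			  &prog_fd, sizeof(prog_fd));
}
```

Each socket that should receive a share of the traffic would call join_fanout_ebpf() with the same group id; the program's return value then selects among the group's sockets.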

I have been working on a series of utils which leverage PACKET_FANOUT to provide various per-socket-fd (per-cpu, per-queue, per-rx-flow-hash-indirection-table-idx) statistics and pcap files. While playing with PACKET_FANOUT_EBPF, I realized that I could use the bpf program to categorize packets in ways packet-filter(7) does not provide.

As a concrete example, I plan to build a util `rxtxmark` which could be passed something like `--mark-list 42,88`. This would be translated to a bpf program whose return code is the 1-based position of the mark in the list, or 0 when the mark is not listed.

if (skb->mark == 42)
 return 1;
if (skb->mark == 88)
 return 2;
return 0;

Packets enqueued to fd0 are simply ignored. Packets enqueued to the other fds are processed into pcaps and statistics.
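
The mapping itself can be sketched in plain C, which is handy for testing the logic in userspace before generating the bpf instructions. This is only an illustration: mark_ordinal is a hypothetical name, and a real bpf program would have the comparisons unrolled as in the fragment above rather than loop over an array.

```c
#include <stddef.h>
#include <stdint.h>
#include <linux/bpf.h>   /* struct __sk_buff (UAPI definition) */

/* Returns the 1-based position of skb->mark in marks[], or 0 when the
 * mark is not listed. If a mark appears twice, the first position wins. */
static uint32_t mark_ordinal(const struct __sk_buff *skb,
			     const uint32_t *marks, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (skb->mark == marks[i])
			return (uint32_t)(i + 1);
	return 0;
}
```

With `--mark-list 42,88` this gives 1 for mark 42, 2 for mark 88, and 0 (fd0, ignored) for everything else, so the fanout group needs one more socket than there are marks.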

While I may build a util for tc_classid which does per-user-requested-classid pcaps and statistics like `rxtxmark` does for marks, I'm also interested in using tc_classid as a simple way to capture tx packets from a long running program on the fly.

The program under inspection would simply be added to a net_cls cgroup which has a unique classid defined. A bpf program would be attached to map packets with that classid to fd1. While I can do this already by using iptables to translate the tc_classid to a mark, that complicates the implementation greatly since the firewall has to be touched (which is probably overreaching for a packet capture util and would most likely be left to the user to configure).
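
As a small aside on the classid itself, this is how I understand the net_cls value to be encoded (0xAAAABBBB, major handle in the high 16 bits, minor in the low 16 bits, per the cgroup-v1 net_cls documentation). The helper name net_cls_classid is hypothetical, and the sketch says nothing about how the value would reach a socket filter program.

```c
#include <stdint.h>

/* Packs a tc handle "major:minor" into the 32-bit value written to a
 * cgroup's net_cls.classid file, e.g. 10:1 (hex) -> 0x00100001. */
static uint32_t net_cls_classid(uint16_t major, uint16_t minor)
{
	return ((uint32_t)major << 16) | minor;
}
```

A capture util could derive the value to match on from a user-supplied `major:minor` pair this way.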

> > I built a kernel (v5.1) with 4 instances of the following line removed from net/core/filter.c to test the behavior when the instructions pass verification.
> > 
> >     switch (off) {
> > -    case bpf_ctx_range(struct __sk_buff, tc_classid):
> > ...
> >         return false;
> > 
> > It appears skb->tc_classid is always zero within my bpf program, even when I verify by other means (e.g. netfilter) that the value is set non-zero.
> > 
> > I gather that sk_buff proper sometimes (i.e. at some layers) has qdisc_skb_cb stored in skb->cb, but not always.
> > 
> > I suspect that the tc_classid is available at l3 (and therefore to utils like netfilter, ip route, tc), but not at l2 (and not to AF_PACKET).
> 
> From tc/BPF context you can use it; it's been long time, but I think back then
> we mapped it into cb[] so it can be used within the BPF context to pass skb data
> around e.g. between tail calls, and cls_bpf_classify() when in direct-action mode
> which likely everyone is/should-be using then maps that skb->tc_classid u16 cb[]
> value to res->classid on program return which then in either sch_handle_ingress()
> or sch_handle_egress() is transferred into the skb->tc_index.
> 

It sounds like tc_classid is placed in skb->cb just before a BPF_PROG_TYPE_SCHED_CLS bpf program starts. The missing plumbing to support my use case is probably the same thing, but for BPF_PROG_TYPE_SOCKET_FILTER.

I'll see about familiarizing myself with both as time permits and perhaps I can get tc_classid working for a BPF_PROG_TYPE_SOCKET_FILTER program; it certainly sounds like it's doable.


* Re: tc_classid access in skb bpf context
From: Matthew Cover @ 2019-05-22 21:28 UTC
  To: Daniel Borkmann, Alexei Starovoitov, Martin KaFai Lau, Song Liu,
	Yonghong Song, David S. Miller, netdev, bpf, linux-kernel
  Cc: Matthew Cover


On 05/22/2019 01:52 AM, Matthew Cover wrote:
> > > __sk_buff has a member tc_classid which I'm interested in accessing from the skb bpf context.
> > > 
> > > A bpf program which accesses skb->tc_classid compiles, but fails verification; the specific failure is "invalid bpf_context access".
> > > 
> > > if (skb->tc_classid != 0)
> > >  return 1;
> > > return 0;
> > > 
> > > Some of the tests in tools/testing/selftests/bpf/verifier/ (those on tc_classid) further confirm that this is, in all likelihood, intentional behavior.
> > > 
> > > The very similar bpf program which instead accesses skb->mark works as desired.
> > > 
> > > if (skb->mark != 0)
> > >  return 1;
> > > return 0;
> > 
> > You should be able to access skb->tc_classid, perhaps you're using the wrong program
> > type? BPF_PROG_TYPE_SCHED_CLS is supposed to work (if not we'd have a regression).
> > 
> 
> I am in fact using BPF_PROG_TYPE_SOCKET_FILTER and using the program as PACKET_FANOUT_DATA with PACKET_FANOUT_EBPF.
> 
> I have been working on a series of utils which leverage PACKET_FANOUT to provide various per-socket-fd (per-cpu, per-queue, per-rx-flow-hash-indirection-table-idx) statistics and pcap files. While playing with PACKET_FANOUT_EBPF, I realized that I could use the bpf program to categorize packets in ways packet-filter(7) does not provide.
> 
> As a concrete example, I plan to build a util `rxtxmark` which could be passed something like `--mark-list 42,88`. This would be translated to a bpf program where the return code is the ordinality of the mark in the list.
> 
> if (skb->mark == 42)
>  return 1;
> if (skb->mark == 88)
>  return 2;
> return 0;
> 
> Packets enqueued to fd0 are simply ignored. Packets enqueued to the other fds are processed into pcaps and statistics.
> 
> While I may build a util for tc_classid which does per-user-requested-classid pcaps and statistics like `rxtxmark` does for marks, I'm also interested in using tc_classid as a simple way to capture tx packets from a long running program on the fly.
> 
> The program under inspection would simply be added to a net_cls cgroup which has a unique classid defined. A bpf program would be attached to map packets with that classid to fd1. While I can do this already by using iptables to translate the tc_classid to a mark, that complicates the implementation greatly since the firewall has to be touched (which is probably overreaching for a packet capture util and would most likely be left to the user to configure).
> 

And only now do I discover netsniff-ng, a seriously cool set of utils! Thank you for your efforts there, Daniel!

I still plan to continue advancing my various PACKET_FANOUT utils and eventually to see how much, if any, of the common code would be of interest to the libpcap maintainers. But it is very cool that a quick look at the netsniff-ng help file shows that rxtxcpu et al. could be accomplished with the right number of concurrent invocations of netsniff-ng.

> > > I built a kernel (v5.1) with 4 instances of the following line removed from net/core/filter.c to test the behavior when the instructions pass verification.
> > > 
> > >     switch (off) {
> > > -    case bpf_ctx_range(struct __sk_buff, tc_classid):
> > > ...
> > >         return false;
> > > 
> > > It appears skb->tc_classid is always zero within my bpf program, even when I verify by other means (e.g. netfilter) that the value is set non-zero.
> > > 
> > > I gather that sk_buff proper sometimes (i.e. at some layers) has qdisc_skb_cb stored in skb->cb, but not always.
> > > 
> > > I suspect that the tc_classid is available at l3 (and therefore to utils like netfilter, ip route, tc), but not at l2 (and not to AF_PACKET).
> > 
> > From tc/BPF context you can use it; it's been long time, but I think back then
> > we mapped it into cb[] so it can be used within the BPF context to pass skb data
> > around e.g. between tail calls, and cls_bpf_classify() when in direct-action mode
> > which likely everyone is/should-be using then maps that skb->tc_classid u16 cb[]
> > value to res->classid on program return which then in either sch_handle_ingress()
> > or sch_handle_egress() is transferred into the skb->tc_index.
> > 
> 
> It sounds like just before the start of a BPF_PROG_TYPE_SCHED_CLS bpf program tc_classid is placed in skb->cb. The missing plumbing to support my use case is probably the same thing, but for BPF_PROG_TYPE_SOCKET_FILTER.
> 
> I'll see about familiarizing myself with both as time permits and perhaps I can get tc_classid working for a BPF_PROG_TYPE_SOCKET_FILTER program; it certainly sounds like it's doable.
> 