All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: Best userspace programming API for XDP features query to kernel?
       [not found] ` <20180404142856.284ee8c9-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2018-04-05 10:37   ` Daniel Borkmann via iovisor-dev
       [not found]     ` <95ac793b-66aa-b53c-a9bc-c86cbc47ee1a-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Daniel Borkmann via iovisor-dev @ 2018-04-05 10:37 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Eric Leblond, Victor Julien, Peter Manev
  Cc: oisf-devel-ZwoEplunGu2j570ONfqVQLVmwVP6tfMwSoIsB4E12gc,
	Jakub Kicinski, Alexei Starovoitov,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	iovisor-dev-9jONkmmOlFHEE9lA1F8Ukti2O/JbrIOy, Saeed Mahameed,
	Daniel Borkmann

On 04/04/2018 02:28 PM, Jesper Dangaard Brouer via iovisor-dev wrote:
> Hi Suricata people,
> 
> When Eric Leblond (and I helped) integrated XDP in Suricata, we ran
> into the issue, that at Suricata load/start time, we cannot determine
> if the chosen XDP config options, like xdp-cpu-redirect[1], is valid on
> this HW (e.g require driver XDP_REDIRECT support and bpf cpumap).
> 
> We would have liked a way to report that suricata.yaml config was
> invalid for this hardware/setup.  Now, it just loads, and packets gets
> silently dropped by XDP (well a WARN_ONCE and catchable via tracepoints).
> 
> My question to suricata developers: (Q1) Do you already have code that
> query the kernel or drivers for features?
> 
> At the IOvisor call (2 weeks ago), we discussed two options of exposing
> XDP features avail in a given driver.
> 
> Option#1: Extend existing ethtool -k/-K "offload and other features"
> with some XDP features, that userspace can query. (Do you already query
> offloads, regarding Q1)
> 
> Option#2: Invent a new 'ip link set xdp' netlink msg with a query option.

I don't really mind if you go via ethtool, as long as we handle this
generically from there and e.g. call the dev's ndo_bpf handler such that
we keep all the information in one place. This can be a new ndo_bpf command
e.g. XDP_QUERY_FEATURES or such.

More specifically, how would such feature mask look like? How fine grained
would this be? When you add a new minor feature to, say, cpumap that not
all drivers support yet, we'd need a new flag each time, no? Same for meta data,
then potentially for the redirect memory return work, or the af_xdp bits, the
xdp_rxq_info would have needed it, etc. What about nfp in terms of XDP
offload capabilities, should they be included as well or is probing to load
the program and see if it loads/JITs as we do today just fine (e.g. you'd
otherwise end up with extra flags on a per BPF helper basis)? To make a
somewhat reliable assertion whether feature xyz would work, this would
explode in new feature bits long term. Additionally, if we end up with a
lot of feature flags, it will be very hard for users to determine whether
this particular set of features a driver supports actually represents a
fully supported native XDP driver.

What about keeping this high level to users? E.g. say you have 2 options
that drivers can expose as netdev_features_strings 'xdp-native-full' or
'xdp-native-partial'. If a driver truly supports all XDP features for a
given kernel e.g. v4.16, then a query like 'ethtool -k foo' will say
'xdp-native-full', if at least one feature is missing to be feature complete
from e.g. above list, then ethtool will tell 'xdp-native-partial', and if
not even ndo_bpf callback exists then no 'xdp-native-*' is reported.

Side-effect might be that it would give incentive to keep drivers in state
'xdp-native-full' instead of being downgraded to 'xdp-native-partial'.
Potentially, in the 'xdp-native-partial' state, we can expose a high-level
list of missing features that the driver does not support yet, which would
over time converge towards 'zero' and thus 'xdp-native-full' again. ethtool
itself could get a new XDP specific query option that, based on this info,
can then dump the full list of supported and not supported features. In order
for this to not explode, such features would need to be kept on a high-level
basis, meaning if e.g. cpumap gets extended along with support for a number
of drivers, then those that missed out would need to be temporarily
re-flagged with e.g. 'cpumap not supported' until it gets also implemented
there. That way, we don't explode in adding too fine-grained feature bit
combinations long term and make it easier to tell whether a driver supports
the full set in native XDP or not. Thoughts?

> (Q2) Do Suricata devs have any preference (or other options/ideas) for
> the way the kernel expose this info to userspace?
> 
> [1] http://suricata.readthedocs.io/en/latest/capture-hardware/ebpf-xdp.html#the-xdp-cpu-redirect-case

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Best userspace programming API for XDP features query to kernel?
       [not found]     ` <95ac793b-66aa-b53c-a9bc-c86cbc47ee1a-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>
@ 2018-04-05 20:51       ` Jesper Dangaard Brouer via iovisor-dev
       [not found]         ` <20180405225133.18a09883-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Jesper Dangaard Brouer via iovisor-dev @ 2018-04-05 20:51 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: oisf-devel-ZwoEplunGu2j570ONfqVQLVmwVP6tfMwSoIsB4E12gc,
	Jakub Kicinski, Alexei Starovoitov, Victor Julien,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	iovisor-dev-9jONkmmOlFHEE9lA1F8Ukti2O/JbrIOy, Peter Manev,
	Jiri Benc, Saeed Mahameed, Eric Leblond, Daniel Borkmann

On Thu, 5 Apr 2018 12:37:19 +0200
Daniel Borkmann <daniel-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org> wrote:

> On 04/04/2018 02:28 PM, Jesper Dangaard Brouer via iovisor-dev wrote:
> > Hi Suricata people,
> > 
> > When Eric Leblond (and I helped) integrated XDP in Suricata, we ran
> > into the issue, that at Suricata load/start time, we cannot determine
> > if the chosen XDP config options, like xdp-cpu-redirect[1], is valid on
> > this HW (e.g require driver XDP_REDIRECT support and bpf cpumap).
> > 
> > We would have liked a way to report that suricata.yaml config was
> > invalid for this hardware/setup.  Now, it just loads, and packets gets
> > silently dropped by XDP (well a WARN_ONCE and catchable via tracepoints).
> > 
> > My question to suricata developers: (Q1) Do you already have code that
> > query the kernel or drivers for features?
> > 
> > At the IOvisor call (2 weeks ago), we discussed two options of exposing
> > XDP features avail in a given driver.
> > 
> > Option#1: Extend existing ethtool -k/-K "offload and other features"
> > with some XDP features, that userspace can query. (Do you already query
> > offloads, regarding Q1)
> > 
> > Option#2: Invent a new 'ip link set xdp' netlink msg with a query option.  
> 
> I don't really mind if you go via ethtool, as long as we handle this
> generically from there and e.g. call the dev's ndo_bpf handler such that
> we keep all the information in one place. This can be a new ndo_bpf command
> e.g. XDP_QUERY_FEATURES or such.

Just to be clear: notice as Victor points out[2], they are programmable
going though the IOCTL (SIOCETHTOOL) and not using cmdline tools.

[2] https://github.com/OISF/suricata/blob/master/src/util-ioctl.c#L326

If you want everything to go through the drivers ndo_bpf call anyway
(which userspace API is netlink based) then at what point to you
want drivers to call their own ndo_bpf, when activated though their
ethtool_ops ? (Sorry, but I don't follow the flow you are proposing)

Notice, I'm not directly against using the drivers ndo_bpf call.  I can
see it does provide kernel more flexibility than the ethtool IOCTL.


> More specifically, how would such feature mask look like? How fine grained
> would this be? When you add a new minor feature to, say, cpumap that not
> all drivers support yet, we'd need a new flag each time, no?

No, CPUMAP is not a driver level feature, and thus does not require a
features flag exposed by the driver.  CPUMAP depends on the driver
feature XDP_REDIRECT.

It is important we separate driver-level features and BPF/XDP core
features.  I feel that we constantly talk past each-other when we mix
that up.

It is true that Suricata _also_ need to detect if the running kernel
support the map type called: BPF_MAP_TYPE_CPUMAP.  *BUT* that is a
completely separate mechanism.  It is a core kernel bpf feature, and I
have accepted that this can only be done via probing the kernel (simply
use the bpf syscall and try to create this map type).

Here, I want to discuss how drivers expose/tell userspace that they
support a given feature: Specifically a bit for: XDP_REDIRECT action
support.


> Same for meta data,

Well, not really.  It would be a "nice-to-have", but not strictly
needed as a feature bit.  XDP meta-data is controlled via a helper.
And the BPF-prog can detect/see runtime, that the helper bpf_xdp_adjust_meta
returns -ENOTSUPP (and need to check the ret value anyhow).  Thus,
there is that not much gained by exposing this to be detected setup
time, as all drivers should eventually support this, and we can detect
it runtime.

The missing XDP_REDIRECT action features bit it different, as the
BPF-prog cannot detect runtime that this is an unsupported action.
Plus, setup time we cannot query the driver for supported XDP actions.


> then potentially for the redirect memory return work,

I'm not sure the redirect memory return types, belong here.  First of
all it is per RX-ring.  Second, some other config method will likely be
config interface.  Like with AF_XDP-zero-copy it is a new NDO. So, it
is more losely coupled.

> or the af_xdp bits, 

No need for bit for AF_XDP in copy-mode (current RFC), as this only
depend on driver supporting XDP_REDIRECT action.

For AF_XDP in zero-copy mode, then yes we need a way for userspace to
"see" if this mode is supported by the driver.  But it might not need a
feature bit here... as the bind() call (which knows the ifindex) could
fail when it tried to enable this ZC mode.  It would make userspace's
live easier to add ZC as a driver feature bit.


> the xdp_rxq_info would have needed it, etc. 

Same comment as mem-type, not necessarily, as it is more losely coupled
to XDP.

> What about nfp in terms of XDP
> offload capabilities, should they be included as well or is probing to load
> the program and see if it loads/JITs as we do today just fine (e.g. you'd
> otherwise end up with extra flags on a per BPF helper basis)?

No, flags per BPF helper basis. As I've described above, helper belong
to the BPF core, not the driver.  Here I want to know what the specific
driver support.

> To make a
> somewhat reliable assertion whether feature xyz would work, this would
> explode in new feature bits long term. Additionally, if we end up with a
> lot of feature flags, it will be very hard for users to determine whether
> this particular set of features a driver supports actually represents a
> fully supported native XDP driver.

Think about it, what does a "fully supported native XDP driver" mean,
when the kernel evolve and new features gets added.  How will the
end-user know what XDP features are in their running kernel release?

> What about keeping this high level to users? E.g. say you have 2 options
> that drivers can expose as netdev_features_strings 'xdp-native-full' or
> 'xdp-native-partial'. If a driver truly supports all XDP features for a
> given kernel e.g. v4.16, then a query like 'ethtool -k foo' will say
> 'xdp-native-full', if at least one feature is missing to be feature complete
> from e.g. above list, then ethtool will tell 'xdp-native-partial', and if
> not even ndo_bpf callback exists then no 'xdp-native-*' is reported.

I use-to-be, an advocate for this.  I even think I send patches
implementing this. Later, I've realized that this model is flawed.

When e.g. suricata loads it need to look at both "xdp-native-full" and
the kernel version, to determine if XDP_REDIRECT action is avail.
Later when a new kernel version gets released, the driver is missing a
new XDP feature.  Then suricata, which doesn't use/need the new
feature, need to be updated, to check that kernel below this version,
with 'xdp-native-partial' and this NIC driver is still okay.  Can you
see the problem?

Even if Suricate goes though the pain of keeping track of kernel
version vs drivers vs xdp-native-full/partial.  Then, they also want to
run their product on distro kernels.  They might convince some distro,
to backport some XDP features they need.  So, now they also need to
keep track of distro kernel minor versions... and all they really
wanted as a single feature bit saying if the running NIC driver
supports the XDP_REDIRECT action code.

 
> Side-effect might be that it would give incentive to keep drivers in
> state 'xdp-native-full' instead of being downgraded to
> 'xdp-native-partial'. Potentially, in the 'xdp-native-partial' state,
> we can expose a high-level list of missing features that the driver
> does not support yet, which would over time converge towards 'zero'
> and thus 'xdp-native-full' again. ethtool itself could get a new XDP
> specific query option that, based on this info, can then dump the
> full list of supported and not supported features. In order for this
> to not explode, such features would need to be kept on a high-level
> basis, meaning if e.g. cpumap gets extended along with support for a
> number of drivers, then those that missed out would need to be
> temporarily re-flagged with e.g. 'cpumap not supported' until it gets
> also implemented there. That way, we don't explode in adding too
> fine-grained feature bit combinations long term and make it easier to
> tell whether a driver supports the full set in native XDP or not.
> Thoughts?

(I really liked creating an incentive for driver vendors)
Thoughs inlined above...
 
> > (Q2) Do Suricata devs have any preference (or other options/ideas)
> > for the way the kernel expose this info to userspace?
> > 
> > [1]
> > http://suricata.readthedocs.io/en/latest/capture-hardware/ebpf-xdp.html#the-xdp-cpu-redirect-case  

Regarding how fine grained features bits I want.  I also want to mention
that, I want the drivers XDP_REDIRECT action support to be decoupled
from whether the driver support ndo_xdp_xmit, and if ndo_xdp_xmit is
enabled or disabled.
  E.g. for the macvlan driver, I don't see much performance gain in
implementing the native XDP-RX actions "side", while there will be a
huge performance gain in implemeting ndo_xdp_xmit.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Best userspace programming API for XDP features query to kernel?
       [not found]         ` <20180405225133.18a09883-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2018-04-05 21:47           ` Jakub Kicinski via iovisor-dev
       [not found]             ` <20180405144716.2bdd4214-68UzVGuGftmUSpRRplVxJ1aTQe2KTcn/@public.gmane.org>
  2018-04-06 10:36           ` Daniel Borkmann via iovisor-dev
  1 sibling, 1 reply; 7+ messages in thread
From: Jakub Kicinski via iovisor-dev @ 2018-04-05 21:47 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: oisf-devel-ZwoEplunGu2j570ONfqVQLVmwVP6tfMwSoIsB4E12gc,
	Alexei Starovoitov, Victor Julien, netdev-u79uwXL29TY76Z2rM5mHXA,
	iovisor-dev-9jONkmmOlFHEE9lA1F8Ukti2O/JbrIOy, Peter Manev,
	Jiri Benc, Saeed Mahameed, Eric Leblond, Daniel Borkmann

On Thu, 5 Apr 2018 22:51:33 +0200, Jesper Dangaard Brouer wrote:
> > What about nfp in terms of XDP
> > offload capabilities, should they be included as well or is probing to load
> > the program and see if it loads/JITs as we do today just fine (e.g. you'd
> > otherwise end up with extra flags on a per BPF helper basis)?  
> 
> No, flags per BPF helper basis. As I've described above, helper belong
> to the BPF core, not the driver.  Here I want to know what the specific
> driver support.

I think Daniel meant for nfp offload.  The offload restrictions are
quite involved, are we going to be able to express those?

This is a bit simpler but reminds me of the TC flower capability
discussion.  Expressing features and capabilities gets messy quickly.

I have a gut feeling that a good starting point would be defining and
building a test suite or a set of probing tests to check things work at
system level (incl. redirects to different ports etc.)  I think having
a concrete set of litmus tests that confirm the meaning of a given
feature/capability would go a long way in making people more comfortable
with accepting any form of BPF driver capability.  And serious BPF
projects already do probing so it's just centralizing this in the
kernel.

That's my two cents.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Best userspace programming API for XDP features query to kernel?
       [not found]             ` <20180405144716.2bdd4214-68UzVGuGftmUSpRRplVxJ1aTQe2KTcn/@public.gmane.org>
@ 2018-04-06  8:47               ` Jesper Dangaard Brouer via iovisor-dev
  0 siblings, 0 replies; 7+ messages in thread
From: Jesper Dangaard Brouer via iovisor-dev @ 2018-04-06  8:47 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: oisf-devel-ZwoEplunGu2j570ONfqVQLVmwVP6tfMwSoIsB4E12gc,
	Alexei Starovoitov, Victor Julien, netdev-u79uwXL29TY76Z2rM5mHXA,
	iovisor-dev-9jONkmmOlFHEE9lA1F8Ukti2O/JbrIOy, Peter Manev,
	Jiri Benc, Saeed Mahameed, Eric Leblond, Daniel Borkmann

On Thu, 5 Apr 2018 14:47:16 -0700
Jakub Kicinski <jakub.kicinski-wFxRvT7yatFl57MIdRCFDg@public.gmane.org> wrote:

> On Thu, 5 Apr 2018 22:51:33 +0200, Jesper Dangaard Brouer wrote:
> > > What about nfp in terms of XDP
> > > offload capabilities, should they be included as well or is probing to load
> > > the program and see if it loads/JITs as we do today just fine (e.g. you'd
> > > otherwise end up with extra flags on a per BPF helper basis)?    
> > 
> > No, flags per BPF helper basis. As I've described above, helper belong
> > to the BPF core, not the driver.  Here I want to know what the specific
> > driver support.  
> 
> I think Daniel meant for nfp offload.  The offload restrictions are
> quite involved, are we going to be able to express those?

Let's keep thing separate.

I'm requesting something really simple.  I want the driver to tell me
what XDP actions it supports.  We/I can implement an XDP_QUERY_ACTIONS
via ndo_bpf, problem solved.  It is small, specific and simple.

For my other use-case of enabling XDP-xmit on a device, I can
implement another ndo_bpf extension. Current approach today is loading
a dummy XDP prog via ndo_bpf anyway (which is awkward). Again a
specific change that let us move one-step further.


For your nfp offload use-case, you/we have to find a completely
different solution.  You have hit a design choice made by BPF.
Which is that BPF is part of the core kernel, and helpers cannot be
loaded as kernel modules.  As we cannot remove or add helpers after the
verifier certified the program.  And basically your nfp offload driver
comes as a kernel module.
 (Details: and you basically already solved your issue by modifying the
core verifier to do a call back to bpf_prog_offload_verify_insn()).
Point being this is very different from what I'm requesting.  Thus, for
offloading you already have a solution, to my setup time detect
problem, as your program gets rejected setup/load time by the verifier.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Best userspace programming API for XDP features query to kernel?
       [not found]         ` <20180405225133.18a09883-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2018-04-05 21:47           ` Jakub Kicinski via iovisor-dev
@ 2018-04-06 10:36           ` Daniel Borkmann via iovisor-dev
  2018-04-09 11:26             ` [iovisor-dev] " Jesper Dangaard Brouer
  1 sibling, 1 reply; 7+ messages in thread
From: Daniel Borkmann via iovisor-dev @ 2018-04-06 10:36 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: oisf-devel-ZwoEplunGu2j570ONfqVQLVmwVP6tfMwSoIsB4E12gc,
	Jakub Kicinski, Alexei Starovoitov, Victor Julien,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	iovisor-dev-9jONkmmOlFHEE9lA1F8Ukti2O/JbrIOy, Peter Manev,
	Jiri Benc, Saeed Mahameed, Eric Leblond, Daniel Borkmann

On 04/05/2018 10:51 PM, Jesper Dangaard Brouer wrote:
> On Thu, 5 Apr 2018 12:37:19 +0200
> Daniel Borkmann <daniel-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org> wrote:
> 
>> On 04/04/2018 02:28 PM, Jesper Dangaard Brouer via iovisor-dev wrote:
>>> Hi Suricata people,
>>>
>>> When Eric Leblond (and I helped) integrated XDP in Suricata, we ran
>>> into the issue, that at Suricata load/start time, we cannot determine
>>> if the chosen XDP config options, like xdp-cpu-redirect[1], is valid on
>>> this HW (e.g require driver XDP_REDIRECT support and bpf cpumap).
>>>
>>> We would have liked a way to report that suricata.yaml config was
>>> invalid for this hardware/setup.  Now, it just loads, and packets gets
>>> silently dropped by XDP (well a WARN_ONCE and catchable via tracepoints).
>>>
>>> My question to suricata developers: (Q1) Do you already have code that
>>> query the kernel or drivers for features?
>>>
>>> At the IOvisor call (2 weeks ago), we discussed two options of exposing
>>> XDP features avail in a given driver.
>>>
>>> Option#1: Extend existing ethtool -k/-K "offload and other features"
>>> with some XDP features, that userspace can query. (Do you already query
>>> offloads, regarding Q1)
>>>
>>> Option#2: Invent a new 'ip link set xdp' netlink msg with a query option.  
>>
>> I don't really mind if you go via ethtool, as long as we handle this
>> generically from there and e.g. call the dev's ndo_bpf handler such that
>> we keep all the information in one place. This can be a new ndo_bpf command
>> e.g. XDP_QUERY_FEATURES or such.
> 
> Just to be clear: notice as Victor points out[2], they are programmable
> going though the IOCTL (SIOCETHTOOL) and not using cmdline tools.

Sure, that was perfectly clear. (But at the same time if you extend the
ioctl, it's obvious to also add support to actual ethtool cmdline tool.)

> [2] https://github.com/OISF/suricata/blob/master/src/util-ioctl.c#L326
> 
> If you want everything to go through the drivers ndo_bpf call anyway
> (which userspace API is netlink based) then at what point to you

Not really, that's the front end. ndo_bpf itself is a plain netdev op
and has no real tie to netlink.

> want drivers to call their own ndo_bpf, when activated though their
> ethtool_ops ? (Sorry, but I don't follow the flow you are proposing)
> 
> Notice, I'm not directly against using the drivers ndo_bpf call.  I can
> see it does provide kernel more flexibility than the ethtool IOCTL.

What I was saying is that even if you go via ethtool ioctl api, where
you end up in dev_ethtool() and have some new ETHTOOL_* query command,
then instead of adding a new ethtool_ops callback, we can and should
reuse ndo_bpf from there.

[...]
> Here, I want to discuss how drivers expose/tell userspace that they
> support a given feature: Specifically a bit for: XDP_REDIRECT action
> support.
> 
>> Same for meta data,
> 
> Well, not really.  It would be a "nice-to-have", but not strictly
> needed as a feature bit.  XDP meta-data is controlled via a helper.
> And the BPF-prog can detect/see runtime, that the helper bpf_xdp_adjust_meta
> returns -ENOTSUPP (and need to check the ret value anyhow).  Thus,
> there is that not much gained by exposing this to be detected setup
> time, as all drivers should eventually support this, and we can detect
> it runtime.
> 
> The missing XDP_REDIRECT action features bit it different, as the
> BPF-prog cannot detect runtime that this is an unsupported action.
> Plus, setup time we cannot query the driver for supported XDP actions.

Ok, so with the example of meta data, you're arguing that it's okay
to load a native XDP program onto a driver, and run actual traffic on
the NIC in order probe for the availability of the feature when you're
saying that it "can detect/see [at] runtime". I totally agree with you
that all drivers should eventually support this (same with XDP_REDIRECT),
but today there are even differences in drivers on bpf_xdp_adjust_meta()/
bpf_xdp_adjust_head() with regards to how much headroom they have available,
etc (e.g. some of them have none), so right now you can either go and
read the code or do a runtime test with running actual traffic through
the NIC to check whether your BPF prog is supported or not. Theoretically,
you can do the same runtime test with XDP_REDIRECT (taking the warn in
bpf_warn_invalid_xdp_action() aside for a moment), but you do have the
trace_xdp_exception() tracepoint to figure it out, yes, it's a painful
hassle, but overall, it's not that different as you were trying to argue
here. For /both/ cases it would be nice to know at setup time whether
this would be supported or not. Hence, such query is not just limited to
XDP_REDIRECT alone. Eventually once such interface is agreed upon,
undoubtedly the list of feature bits will grow is what I'm trying to say;
only arguing on the XDP_REDIRECT here would be short term.

[...]
>> What about keeping this high level to users? E.g. say you have 2 options
>> that drivers can expose as netdev_features_strings 'xdp-native-full' or
>> 'xdp-native-partial'. If a driver truly supports all XDP features for a
>> given kernel e.g. v4.16, then a query like 'ethtool -k foo' will say
>> 'xdp-native-full', if at least one feature is missing to be feature complete
>> from e.g. above list, then ethtool will tell 'xdp-native-partial', and if
>> not even ndo_bpf callback exists then no 'xdp-native-*' is reported.
> 
> I use-to-be, an advocate for this.  I even think I send patches
> implementing this. Later, I've realized that this model is flawed.
> 
> When e.g. suricata loads it need to look at both "xdp-native-full" and
> the kernel version, to determine if XDP_REDIRECT action is avail.
> Later when a new kernel version gets released, the driver is missing a
> new XDP feature.  Then suricata, which doesn't use/need the new
> feature, need to be updated, to check that kernel below this version,
> with 'xdp-native-partial' and this NIC driver is still okay.  Can you
> see the problem?
> 
> Even if Suricate goes though the pain of keeping track of kernel
> version vs drivers vs xdp-native-full/partial.  Then, they also want to
> run their product on distro kernels.  They might convince some distro,
> to backport some XDP features they need.  So, now they also need to
> keep track of distro kernel minor versions... and all they really
> wanted as a single feature bit saying if the running NIC driver
> supports the XDP_REDIRECT action code.

Yep, agree it's not pretty, not claiming any of this is. You kind of
need to be aware of the underlying kernel, similar to the tracing case.
The underlying problem is effectively the decoupling of program verification
that doesn't have/know the context of where it is being attached to in
this case. Thinking out loud for a sec on couple of other options aside
from feature bits, what about i) providing the target ifindex to the
verifier for XDP programs, such that at verification time you have the
full context similar to nfp offload case today, or ii) populating some
XDP specific auxillary data to the BPF program at verification time such
that the driver can check at program attach time whether the requested
features are possible and if not it will reject and respond with netlink
extack message to the user (as we do in various XDP attach cases already
through XDP_SETUP_PROG{,_HW}).

This would, for example, avoid the need for feature bits, and do actual
rejection of the program while retaining flexibility (and avoiding to
expose bits that over time hopefully will deprecate anyway due to all
XDP aware drivers implementing them). For both cases i) and ii), it
would mean we make the verifier a bit smarter with regards to keeping
track of driver related (XDP) requirements. Program return code checking
is already present in the verifier's check_return_code() and we could
extend it for XDP as well, for example. Seems cleaner and more extensible
than feature bits, imho.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [iovisor-dev] Best userspace programming API for XDP features query to kernel?
  2018-04-06 10:36           ` Daniel Borkmann via iovisor-dev
@ 2018-04-09 11:26             ` Jesper Dangaard Brouer
  2018-04-10 10:13               ` Daniel Borkmann
  0 siblings, 1 reply; 7+ messages in thread
From: Jesper Dangaard Brouer @ 2018-04-09 11:26 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Eric Leblond, Victor Julien, Peter Manev, oisf-devel,
	Jakub Kicinski, Alexei Starovoitov, iovisor-dev, Saeed Mahameed,
	Daniel Borkmann, netdev, davem, Jiri Benc, brouer

On Fri, 6 Apr 2018 12:36:18 +0200
Daniel Borkmann <daniel@iogearbox.net> wrote:

> On 04/05/2018 10:51 PM, Jesper Dangaard Brouer wrote:
> > On Thu, 5 Apr 2018 12:37:19 +0200
> > Daniel Borkmann <daniel@iogearbox.net> wrote:
> >   
> >> On 04/04/2018 02:28 PM, Jesper Dangaard Brouer via iovisor-dev wrote:  
> >>> Hi Suricata people,
> >>>
> >>> When Eric Leblond (and I helped) integrated XDP in Suricata, we ran
> >>> into the issue, that at Suricata load/start time, we cannot determine
> >>> if the chosen XDP config options, like xdp-cpu-redirect[1], is valid on
> >>> this HW (e.g require driver XDP_REDIRECT support and bpf cpumap).
> >>>
> >>> We would have liked a way to report that suricata.yaml config was
> >>> invalid for this hardware/setup.  Now, it just loads, and packets gets
> >>> silently dropped by XDP (well a WARN_ONCE and catchable via tracepoints).
> >>>
> >>> My question to suricata developers: (Q1) Do you already have code that
> >>> query the kernel or drivers for features?
> >>>
> >>> At the IOvisor call (2 weeks ago), we discussed two options of exposing
> >>> XDP features avail in a given driver.
> >>>
> >>> Option#1: Extend existing ethtool -k/-K "offload and other features"
> >>> with some XDP features, that userspace can query. (Do you already query
> >>> offloads, regarding Q1)
> >>>
> >>> Option#2: Invent a new 'ip link set xdp' netlink msg with a query option.    
> >>
> >> I don't really mind if you go via ethtool, as long as we handle this
> >> generically from there and e.g. call the dev's ndo_bpf handler such that
> >> we keep all the information in one place. This can be a new ndo_bpf command
> >> e.g. XDP_QUERY_FEATURES or such.  
> > 
> > Just to be clear: notice as Victor points out[2], they are programmable
> > going though the IOCTL (SIOCETHTOOL) and not using cmdline tools.  
> 
> Sure, that was perfectly clear. (But at the same time if you extend the
> ioctl, it's obvious to also add support to actual ethtool cmdline tool.)
> 
> > [2] https://github.com/OISF/suricata/blob/master/src/util-ioctl.c#L326
> > 
> > If you want everything to go through the drivers ndo_bpf call anyway
> > (which userspace API is netlink based) then at what point to you  
> 
> Not really, that's the front end. ndo_bpf itself is a plain netdev op
> and has no real tie to netlink.
> 
> > want drivers to call their own ndo_bpf, when activated though their
> > ethtool_ops ? (Sorry, but I don't follow the flow you are proposing)
> > 
> > Notice, I'm not directly against using the drivers ndo_bpf call.  I can
> > see it does provide kernel more flexibility than the ethtool IOCTL.  
> 
> What I was saying is that even if you go via ethtool ioctl api, where
> you end up in dev_ethtool() and have some new ETHTOOL_* query command,
> then instead of adding a new ethtool_ops callback, we can and should
> reuse ndo_bpf from there.

Okay, you want to catch this in dev_ethtool(), that connected the dots
for me, thanks.  BUT it does feel a little fake and confusing to do
this, as the driver developers seeing the info via ethtool, would
expect they need to implement an ethtool_ops callback.

My original idea behind going through the ethtool APIs, were that every
driver is already using this, and it's familiar to the driver
developers.  And there are existing userspace tools that sysadm's
already know, that we can leverage.

Thinking about this over the weekend.  I have changed my mind a bit,
based on your feedback. I do buy into your idea of forcing things
through the drivers ndo_bpf call.  I'm no-longer convinced this should
go through ethtool, but as (you desc) we can, if we find that this is
the easiest ways to export and leverage an existing userspace tool.



> [...]
> > Here, I want to discuss how drivers expose/tell userspace that they
> > support a given feature: Specifically a bit for: XDP_REDIRECT action
> > support.
> >   
> >> Same for meta data,  
> > 
> > Well, not really.  It would be a "nice-to-have", but not strictly
> > needed as a feature bit.  XDP meta-data is controlled via a helper.
> > And the BPF-prog can detect/see runtime, that the helper bpf_xdp_adjust_meta
> > returns -ENOTSUPP (and need to check the ret value anyhow).  Thus,
> > there is that not much gained by exposing this to be detected setup
> > time, as all drivers should eventually support this, and we can detect
> > it runtime.
> > 
> > The missing XDP_REDIRECT action features bit it different, as the
> > BPF-prog cannot detect runtime that this is an unsupported action.
> > Plus, setup time we cannot query the driver for supported XDP actions.  
> 
> Ok, so with the example of meta data, you're arguing that it's okay
> to load a native XDP program onto a driver, and run actual traffic on
> the NIC in order probe for the availability of the feature when you're
> saying that it "can detect/see [at] runtime". I totally agree with you
> that all drivers should eventually support this (same with XDP_REDIRECT),
> but today there are even differences in drivers on bpf_xdp_adjust_meta()/
> bpf_xdp_adjust_head() with regards to how much headroom they have available,
> etc (e.g. some of them have none), so right now you can either go and
> read the code or do a runtime test with running actual traffic through
> the NIC to check whether your BPF prog is supported or not. Theoretically,
> you can do the same runtime test with XDP_REDIRECT (taking the warn in
> bpf_warn_invalid_xdp_action() aside for a moment), but you do have the
> trace_xdp_exception() tracepoint to figure it out, yes, it's a painful
> hassle, but overall, it's not that different as you were trying to argue
> here. For /both/ cases it would be nice to know at setup time whether
> this would be supported or not. Hence, such query is not just limited to
> XDP_REDIRECT alone. Eventually once such interface is agreed upon,
> undoubtedly the list of feature bits will grow is what I'm trying to say;
> only arguing on the XDP_REDIRECT here would be short term.
> 
> [...]
> The underlying problem is effectively the decoupling of program
> verification that doesn't have/know the context of where it is being
> attached to in this case. 

Yes, exactly, that the underlying problem. (plus the first XDP prog
gets verified and driver attached, and subsequent added bpf tail calls,
cannot be "rejected" (at "driver attachment time") as its too late).

> Thinking out loud for a sec on couple of other options aside
> from feature bits, what about i) providing the target ifindex to the
> verifier for XDP programs, such that at verification time you have the
> full context similar to nfp offload case today, 

I do like that we could provide the target ifindex earlier.  But
userspace still need some way to express that it e.g. need the
XDP_REDIRECT features (as verifier cannot reliability detect the action
return codes used, as discussed before, due to bpf tail calls and maps
used as return values).

> or ii) populating some
> XDP specific auxillary data to the BPF program at verification time such
> that the driver can check at program attach time whether the requested
> features are possible and if not it will reject and respond with netlink
> extack message to the user (as we do in various XDP attach cases already
> through XDP_SETUP_PROG{,_HW}).

I like proposal ii) better.  But how do I specify/input that I need
e.g. the XDP_REDIRECT feature, such that is it avail to "the BPF
program at verification time"?


My proposal iii), what about at XDP attachment time, create a new
netlink attribute that describe XDP action codes the program
needs/wants. If the info is provided, the ndo_bpf call check and
reject, and respond with netlink extack message.
  If I want to query for avail action codes, then I can use the same
netlink attribute format, and kernel will return it "populated" with
the info.


It is very useful that the program gets rejected at attachment time (to
avoid loading an XDP prog that silently drops packets). BUT I also
want a query option/functionality (reuse netlink attr format).

Specifically for Suricata the same "bypass" features can be implemented
at different levels (XDP, XDP-generic or clsbpf), and the XDP program
contains multiple features.  Thus, depending on what NIC driver
supports, we want to load different XDP and/or clsbpf/TC BPF-programs.
Thus, still support the same user requested feature/functionality, even
if XDP_REDIRECT is not avail, just slightly slower.


> This would, for example, avoid the need for feature bits, and do actual
> rejection of the program while retaining flexibility (and avoiding to
> expose bits that over time hopefully will deprecate anyway due to all
> XDP aware drivers implementing them). For both cases i) and ii), it
> would mean we make the verifier a bit smarter with regards to keeping
> track of driver related (XDP) requirements. Program return code checking
> is already present in the verifier's check_return_code() and we could
> extend it for XDP as well, for example. Seems cleaner and more extensible
> than feature bits, imho.

I thought the verifier's check_return_code() couldn't see/verify if the
XDP return action code is provided as a map lookup result(?). How is
that handled?

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [iovisor-dev] Best userspace programming API for XDP features query to kernel?
  2018-04-09 11:26             ` [iovisor-dev] " Jesper Dangaard Brouer
@ 2018-04-10 10:13               ` Daniel Borkmann
  0 siblings, 0 replies; 7+ messages in thread
From: Daniel Borkmann @ 2018-04-10 10:13 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Daniel Borkmann
  Cc: Eric Leblond, Victor Julien, Peter Manev, oisf-devel,
	Jakub Kicinski, Alexei Starovoitov, iovisor-dev, Saeed Mahameed,
	netdev, davem, Jiri Benc

On 04/09/2018 01:26 PM, Jesper Dangaard Brouer wrote:
> On Fri, 6 Apr 2018 12:36:18 +0200
> Daniel Borkmann <daniel@iogearbox.net> wrote:
[...]
>> [...]
>> The underlying problem is effectively the decoupling of program
>> verification that doesn't have/know the context of where it is being
>> attached to in this case. 
> 
> Yes, exactly, that the underlying problem. (plus the first XDP prog
> gets verified and driver attached, and subsequent added bpf tail calls,
> cannot be "rejected" (at "driver attachment time") as its too late).
> 
>> Thinking out loud for a sec on couple of other options aside
>> from feature bits, what about i) providing the target ifindex to the
>> verifier for XDP programs, such that at verification time you have the
>> full context similar to nfp offload case today, 
> 
> I do like that we could provide the target ifindex earlier.  But
> userspace still need some way to express that it e.g. need the
> XDP_REDIRECT features (as verifier cannot reliability detect the action
> return codes used, as discussed before, due to bpf tail calls and maps
> used as return values).

(See below on the detection of it.)

>> or ii) populating some
>> XDP specific auxillary data to the BPF program at verification time such
>> that the driver can check at program attach time whether the requested
>> features are possible and if not it will reject and respond with netlink
>> extack message to the user (as we do in various XDP attach cases already
>> through XDP_SETUP_PROG{,_HW}).
> 
> I like proposal ii) better.  But how do I specify/input that I need
> e.g. the XDP_REDIRECT feature, such that is it avail to "the BPF
> program at verification time"?
> 
> My proposal iii), what about at XDP attachment time, create a new
> netlink attribute that describe XDP action codes the program
> needs/wants. If the info is provided, the ndo_bpf call check and
> reject, and respond with netlink extack message.
>   If I want to query for avail action codes, then I can use the same
> netlink attribute format, and kernel will return it "populated" with
> the info.
> 
> It is very useful that the program gets rejected at attachment time (to
> avoid loading an XDP prog that silently drops packets). BUT I also
> want a query option/functionality (reuse netlink attr format).
> 
> Specifically for Suricata the same "bypass" features can be implemented
> at different levels (XDP, XDP-generic or clsbpf), and the XDP program
> contains multiple features.  Thus, depending on what NIC driver
> supports, we want to load different XDP and/or clsbpf/TC BPF-programs.
> Thus, still support the same user requested feature/functionality, even
> if XDP_REDIRECT is not avail, just slightly slower.

Makes sense to have such fallbacks.

>> This would, for example, avoid the need for feature bits, and do actual
>> rejection of the program while retaining flexibility (and avoiding to
>> expose bits that over time hopefully will deprecate anyway due to all
>> XDP aware drivers implementing them). For both cases i) and ii), it
>> would mean we make the verifier a bit smarter with regards to keeping
>> track of driver related (XDP) requirements. Program return code checking
>> is already present in the verifier's check_return_code() and we could
>> extend it for XDP as well, for example. Seems cleaner and more extensible
>> than feature bits, imho.
> 
> I thought the verifier's check_return_code() couldn't see/verify if the
> XDP return action code is provided as a map lookup result(?). How is
> that handled?

For the latter, I would just add a BPF_F_STRICT_CONST_VERDICT load flag
which e.g. enforces a constant return code in all prog types. It also needs
to check for helpers like bpf_xdp_redirect() and track R0 from there that
it either contains XDP_REDIRECT or XDP_ABORTED.

For the bpf_prog_aux, we need a prog dependent void *private_data area
pointer in bpf_prog_aux that verifier populates; e.g. we could migrate
already some of the prog type specific fields into that like kprobe_override,
cb_access, dst_needed that are non-generic anyway. For XDP, verifier would
in your case record all seen return codes into private_data. When the flag
BPF_F_STRICT_CONST_VERDICT is not used and we noticed there were cases
where the verdict was not a verifiable constant, then we e.g. mark all
XDP return codes as seen. Potentially the same is needed for tail calls.

We can add a ndo_bpf query to drivers that return their supported XDP return
codes and compare them in e.g. dev_change_xdp_fd() out of generic code and
reject with extack.

For tail calls, the only way that comes to mind right now where you could
lift that requirement with having to mark all return codes as seen is that
you'd need to pass the ifindex as in offload case at load time, such that
you the program becomes tied to the device. Then you also need to record
dev pointer in the private_data such that you can add a new callback to
bpf_prog_ops for prog dependent compare in bpf_prog_array_compatible() to
make sure both would have the same target owner dev where the return code
check was previously asserted.

If you want to expose that internal ndo_bpf query which specifically returns
the set of supported XDP return codes I wouldn't mind, that way it's not
some sort of generic feature bit query, but a specific query for the set of
return codes the driver supports, thus keeping this very limited and avoiding
mixing this with other future feature bits that could turn this into a big
mess; I'm not sure right now though what would be the best uapi to query
that info from.

Cheers,
Daniel

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2018-04-10 10:13 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20180404142856.284ee8c9@redhat.com>
     [not found] ` <20180404142856.284ee8c9-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2018-04-05 10:37   ` Best userspace programming API for XDP features query to kernel? Daniel Borkmann via iovisor-dev
     [not found]     ` <95ac793b-66aa-b53c-a9bc-c86cbc47ee1a-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>
2018-04-05 20:51       ` Jesper Dangaard Brouer via iovisor-dev
     [not found]         ` <20180405225133.18a09883-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2018-04-05 21:47           ` Jakub Kicinski via iovisor-dev
     [not found]             ` <20180405144716.2bdd4214-68UzVGuGftmUSpRRplVxJ1aTQe2KTcn/@public.gmane.org>
2018-04-06  8:47               ` Jesper Dangaard Brouer via iovisor-dev
2018-04-06 10:36           ` Daniel Borkmann via iovisor-dev
2018-04-09 11:26             ` [iovisor-dev] " Jesper Dangaard Brouer
2018-04-10 10:13               ` Daniel Borkmann

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.