From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Neil Horman <nhorman@tuxdriver.com>, Ido Schimmel <idosch@idosch.org>
Cc: David Miller <davem@davemloft.net>,
netdev@vger.kernel.org, jiri@mellanox.com, mlxsw@mellanox.com,
dsahern@gmail.com, roopa@cumulusnetworks.com,
nikolay@cumulusnetworks.com, andy@greyhouse.net,
pablo@netfilter.org, jakub.kicinski@netronome.com,
pieter.jansenvanvuuren@netronome.com, andrew@lunn.ch,
f.fainelli@gmail.com, vivien.didelot@gmail.com,
idosch@mellanox.com
Subject: Re: [PATCH net-next 00/11] Add drop monitor for offloaded data paths
Date: Fri, 12 Jul 2019 11:27:55 +0200 [thread overview]
Message-ID: <87r26vvbz8.fsf@toke.dk> (raw)
In-Reply-To: <20190711235354.GA30396@hmswarspite.think-freely.org>
Neil Horman <nhorman@tuxdriver.com> writes:
> On Thu, Jul 11, 2019 at 03:39:09PM +0300, Ido Schimmel wrote:
>> On Sun, Jul 07, 2019 at 12:45:41PM -0700, David Miller wrote:
>> > From: Ido Schimmel <idosch@idosch.org>
>> > Date: Sun, 7 Jul 2019 10:58:17 +0300
>> >
>> > > Users have several ways to debug the kernel and understand why a packet
>> > > was dropped. For example, using "drop monitor" and "perf". Both
>> > > utilities trace kfree_skb(), which is the function called when a packet
>> > > is freed as part of a failure. The information provided by these tools
>> > > is invaluable when trying to understand the cause of a packet loss.
>> > >
>> > > In recent years, large portions of the kernel data path were offloaded
>> > > to capable devices. Today, it is possible to perform L2 and L3
>> > > forwarding in hardware, as well as tunneling (IP-in-IP and VXLAN).
>> > > Different TC classifiers and actions are also offloaded to capable
>> > > devices, at both ingress and egress.
>> > >
>> > > However, when the data path is offloaded it is not possible to achieve
>> > > the same level of introspection as tools such "perf" and "drop monitor"
>> > > become irrelevant.
>> > >
>> > > This patchset aims to solve this by allowing users to monitor packets
>> > > that the underlying device decided to drop along with relevant metadata
>> > > such as the drop reason and ingress port.
>> >
>> > We are now going to have 5 or so ways to capture packets passing through
>> > the system, this is nonsense.
>> >
>> > AF_PACKET, kfree_skb drop monitor, perf, XDP perf events, and now this
>> > devlink thing.
>> >
>> > This is insanity, too many ways to do the same thing and therefore the
>> > worst possible user experience.
>> >
>> > Pick _ONE_ method to trap packets and forward normal kfree_skb events,
>> > XDP perf events, and these taps there too.
>> >
>> > I mean really, think about it from the average user's perspective. To
>> > see all drops/pkts I have to attach a kfree_skb tracepoint, and not just
>> > listen on devlink but configure a special tap thing beforehand and then
>> > if someone is using XDP I gotta setup another perf event buffer capture
>> > thing too.
>>
>> Dave,
>>
>> Before I start working on v2, I would like to get your feedback on the
>> high level plan. Also adding Neil who is the maintainer of drop_monitor
>> (and counterpart DropWatch tool [1]).
>>
>> IIUC, the problem you point out is that users need to use different
>> tools to monitor packet drops based on where these drops occur
>> (SW/HW/XDP).
>>
>> Therefore, my plan is to extend the existing drop_monitor netlink
>> channel to also cover HW drops. I will add a new message type and a new
>> multicast group for HW drops and encode in the message what is currently
>> encoded in the devlink events.
>>
> A few things here:
> IIRC we don't announce individual hardware drops, drivers record them in
> internal structures, and they are retrieved on demand via ethtool calls, so you
> will either need to include some polling (probably not a very performant idea),
> or some sort of flagging mechanism to indicate that on the next message sent to
> user space you should go retrieve hw stats from a given interface. I certainly
> wouldn't mind seeing this happen, but its more work than just adding a new
> netlink message.
>
> Also, regarding XDP drops, we wont see them if the xdp program is offloaded to
> hardware (you'll need your hw drop gathering mechanism for that), but for xdp
> programs run on the cpu, dropwatch should alrady catch those. I.e. if the xdp
> program returns a DROP result for a packet being processed, the OS will call
> kfree_skb on its behalf, and dropwatch wil call that.
There is no skb by the time an XDP program runs, so this is not true. As
I mentioned upthread, there's a tracepoint that will get called if an
error occurs (or the program returns XDP_ABORTED), but in most cases,
XDP_DROP just means that the packet silently disappears...
-Toke
next prev parent reply other threads:[~2019-07-12 9:28 UTC|newest]
Thread overview: 48+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-07-07 7:58 [PATCH net-next 00/11] Add drop monitor for offloaded data paths Ido Schimmel
2019-07-07 7:58 ` [PATCH net-next 01/11] devlink: Create helper to fill port type information Ido Schimmel
2019-07-07 7:58 ` [PATCH net-next 02/11] devlink: Add packet trap infrastructure Ido Schimmel
2019-07-07 7:58 ` [PATCH net-next 03/11] devlink: Add generic packet traps and groups Ido Schimmel
2019-07-07 7:58 ` [PATCH net-next 04/11] Documentation: Add devlink-trap documentation Ido Schimmel
2019-07-07 7:58 ` [PATCH net-next 05/11] netdevsim: Add devlink-trap support Ido Schimmel
2019-07-07 7:58 ` [PATCH net-next 06/11] Documentation: Add description of netdevsim traps Ido Schimmel
2019-07-07 7:58 ` [PATCH net-next 07/11] mlxsw: core: Add API to set trap action Ido Schimmel
2019-07-07 7:58 ` [PATCH net-next 08/11] mlxsw: reg: Add new " Ido Schimmel
2019-07-07 7:58 ` [PATCH net-next 09/11] mlxsw: Add layer 2 discard trap IDs Ido Schimmel
2019-07-07 7:58 ` [PATCH net-next 10/11] mlxsw: Add trap group for layer 2 discards Ido Schimmel
2019-07-07 7:58 ` [PATCH net-next 11/11] mlxsw: spectrum: Add devlink-trap support Ido Schimmel
2019-07-07 8:01 ` [PATCH iproute2-next 0/7] " Ido Schimmel
2019-07-07 8:01 ` [PATCH iproute2-next 1/7] devlink: Increase number of supported options Ido Schimmel
2019-07-07 8:01 ` [PATCH iproute2-next 2/7] devlink: Add devlink trap set and show commands Ido Schimmel
2019-07-07 8:01 ` [PATCH iproute2-next 3/7] devlink: Add devlink trap group " Ido Schimmel
2019-07-07 8:01 ` [PATCH iproute2-next 4/7] devlink: Add devlink trap monitor support Ido Schimmel
2019-07-07 8:01 ` [PATCH iproute2-next 5/7] devlink: Set NETLINK_NO_ENOBUFS when monitoring events Ido Schimmel
2019-07-07 8:01 ` [PATCH iproute2-next 6/7] devlink: Add fflush() to print functions Ido Schimmel
2019-07-07 8:02 ` [PATCH iproute2-next 7/7] devlink: Add man page for devlink-trap Ido Schimmel
2019-07-07 8:03 ` [RFC PATCH net-next 0/5] selftests: Add devlink-trap selftests Ido Schimmel
2019-07-07 8:03 ` [RFC PATCH net-next 1/5] selftests: devlink_trap: Add test cases for devlink-trap Ido Schimmel
2019-07-07 8:03 ` [RFC PATCH net-next 2/5] Documentation: Add a section for devlink-trap testing Ido Schimmel
2019-07-07 8:03 ` [RFC PATCH net-next 3/5] selftests: forwarding: devlink_lib: Add devlink-trap helpers Ido Schimmel
2019-07-07 8:03 ` [RFC PATCH net-next 4/5] selftests: mlxsw: Add test cases for devlink-trap L2 drops Ido Schimmel
2019-07-07 8:03 ` [RFC PATCH net-next 5/5] selftests: mlxsw: Add a test case for devlink-trap Ido Schimmel
2019-07-07 8:15 ` [PATCH net-next 00/11] Add drop monitor for offloaded data paths Ido Schimmel
2019-07-07 19:45 ` David Miller
2019-07-08 13:19 ` Ido Schimmel
2019-07-08 22:51 ` Jakub Kicinski
2019-07-09 12:38 ` Ido Schimmel
2019-07-09 22:34 ` Jakub Kicinski
2019-07-10 11:20 ` Ido Schimmel
2019-07-10 11:39 ` Toke Høiland-Jørgensen
2019-07-11 12:39 ` Ido Schimmel
2019-07-11 19:02 ` David Miller
2019-07-11 23:53 ` Neil Horman
2019-07-12 3:40 ` Florian Fainelli
2019-07-12 12:05 ` Neil Horman
2019-07-12 9:27 ` Toke Høiland-Jørgensen [this message]
2019-07-12 12:18 ` Neil Horman
2019-07-12 12:33 ` Toke Høiland-Jørgensen
2019-07-13 0:40 ` Neil Horman
2019-07-13 8:07 ` Toke Høiland-Jørgensen
2019-07-12 13:52 ` Ido Schimmel
2019-07-14 11:29 ` Neil Horman
2019-07-14 12:43 ` Ido Schimmel
2019-07-14 2:38 ` David Miller
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87r26vvbz8.fsf@toke.dk \
--to=toke@redhat.com \
--cc=andrew@lunn.ch \
--cc=andy@greyhouse.net \
--cc=davem@davemloft.net \
--cc=dsahern@gmail.com \
--cc=f.fainelli@gmail.com \
--cc=idosch@idosch.org \
--cc=idosch@mellanox.com \
--cc=jakub.kicinski@netronome.com \
--cc=jiri@mellanox.com \
--cc=mlxsw@mellanox.com \
--cc=netdev@vger.kernel.org \
--cc=nhorman@tuxdriver.com \
--cc=nikolay@cumulusnetworks.com \
--cc=pablo@netfilter.org \
--cc=pieter.jansenvanvuuren@netronome.com \
--cc=roopa@cumulusnetworks.com \
--cc=vivien.didelot@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).