netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Neil Horman <nhorman@tuxdriver.com>
To: "Toke Høiland-Jørgensen" <toke@redhat.com>
Cc: Ido Schimmel <idosch@idosch.org>,
	David Miller <davem@davemloft.net>,
	netdev@vger.kernel.org, jiri@mellanox.com, mlxsw@mellanox.com,
	dsahern@gmail.com, roopa@cumulusnetworks.com,
	nikolay@cumulusnetworks.com, andy@greyhouse.net,
	pablo@netfilter.org, jakub.kicinski@netronome.com,
	pieter.jansenvanvuuren@netronome.com, andrew@lunn.ch,
	f.fainelli@gmail.com, vivien.didelot@gmail.com,
	idosch@mellanox.com
Subject: Re: [PATCH net-next 00/11] Add drop monitor for offloaded data paths
Date: Fri, 12 Jul 2019 08:18:59 -0400	[thread overview]
Message-ID: <20190712121859.GB13696@hmswarspite.think-freely.org> (raw)
In-Reply-To: <87r26vvbz8.fsf@toke.dk>

On Fri, Jul 12, 2019 at 11:27:55AM +0200, Toke Høiland-Jørgensen wrote:
> Neil Horman <nhorman@tuxdriver.com> writes:
> 
> > On Thu, Jul 11, 2019 at 03:39:09PM +0300, Ido Schimmel wrote:
> >> On Sun, Jul 07, 2019 at 12:45:41PM -0700, David Miller wrote:
> >> > From: Ido Schimmel <idosch@idosch.org>
> >> > Date: Sun,  7 Jul 2019 10:58:17 +0300
> >> > 
> >> > > Users have several ways to debug the kernel and understand why a packet
> >> > > was dropped. For example, using "drop monitor" and "perf". Both
> >> > > utilities trace kfree_skb(), which is the function called when a packet
> >> > > is freed as part of a failure. The information provided by these tools
> >> > > is invaluable when trying to understand the cause of a packet loss.
> >> > > 
> >> > > In recent years, large portions of the kernel data path were offloaded
> >> > > to capable devices. Today, it is possible to perform L2 and L3
> >> > > forwarding in hardware, as well as tunneling (IP-in-IP and VXLAN).
> >> > > Different TC classifiers and actions are also offloaded to capable
> >> > > devices, at both ingress and egress.
> >> > > 
> >> > > However, when the data path is offloaded it is not possible to achieve
> >> > > the same level of introspection as tools such "perf" and "drop monitor"
> >> > > become irrelevant.
> >> > > 
> >> > > This patchset aims to solve this by allowing users to monitor packets
> >> > > that the underlying device decided to drop along with relevant metadata
> >> > > such as the drop reason and ingress port.
> >> > 
> >> > We are now going to have 5 or so ways to capture packets passing through
> >> > the system, this is nonsense.
> >> > 
> >> > AF_PACKET, kfree_skb drop monitor, perf, XDP perf events, and now this
> >> > devlink thing.
> >> > 
> >> > This is insanity, too many ways to do the same thing and therefore the
> >> > worst possible user experience.
> >> > 
> >> > Pick _ONE_ method to trap packets and forward normal kfree_skb events,
> >> > XDP perf events, and these taps there too.
> >> > 
> >> > I mean really, think about it from the average user's perspective.  To
> >> > see all drops/pkts I have to attach a kfree_skb tracepoint, and not just
> >> > listen on devlink but configure a special tap thing beforehand and then
> >> > if someone is using XDP I gotta setup another perf event buffer capture
> >> > thing too.
> >> 
> >> Dave,
> >> 
> >> Before I start working on v2, I would like to get your feedback on the
> >> high level plan. Also adding Neil who is the maintainer of drop_monitor
> >> (and counterpart DropWatch tool [1]).
> >> 
> >> IIUC, the problem you point out is that users need to use different
> >> tools to monitor packet drops based on where these drops occur
> >> (SW/HW/XDP).
> >> 
> >> Therefore, my plan is to extend the existing drop_monitor netlink
> >> channel to also cover HW drops. I will add a new message type and a new
> >> multicast group for HW drops and encode in the message what is currently
> >> encoded in the devlink events.
> >> 
> > A few things here:
> > IIRC we don't announce individual hardware drops, drivers record them in
> > internal structures, and they are retrieved on demand via ethtool calls, so you
> > will either need to include some polling (probably not a very performant idea),
> > or some sort of flagging mechanism to indicate that on the next message sent to
> > user space you should go retrieve hw stats from a given interface.  I certainly
> > wouldn't mind seeing this happen, but its more work than just adding a new
> > netlink message.
> >
> > Also, regarding XDP drops, we wont see them if the xdp program is offloaded to
> > hardware (you'll need your hw drop gathering mechanism for that), but for xdp
> > programs run on the cpu, dropwatch should alrady catch those.  I.e. if the xdp
> > program returns a DROP result for a packet being processed, the OS will call
> > kfree_skb on its behalf, and dropwatch wil call that.
> 
> There is no skb by the time an XDP program runs, so this is not true. As
> I mentioned upthread, there's a tracepoint that will get called if an
> error occurs (or the program returns XDP_ABORTED), but in most cases,
> XDP_DROP just means that the packet silently disappears...
> 
As I noted, thats only true for xdp programs that are offloaded to hardware, I
was only speaking for XDP programs that run on the cpu.  For the former case, we
obviously need some other mechanism to detect drops, but for cpu executed xdp
programs, the OS is responsible for freeing skbs associated with programs the
return XDP_DROP.

Neil

> -Toke
> 

  reply	other threads:[~2019-07-12 12:19 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-07  7:58 [PATCH net-next 00/11] Add drop monitor for offloaded data paths Ido Schimmel
2019-07-07  7:58 ` [PATCH net-next 01/11] devlink: Create helper to fill port type information Ido Schimmel
2019-07-07  7:58 ` [PATCH net-next 02/11] devlink: Add packet trap infrastructure Ido Schimmel
2019-07-07  7:58 ` [PATCH net-next 03/11] devlink: Add generic packet traps and groups Ido Schimmel
2019-07-07  7:58 ` [PATCH net-next 04/11] Documentation: Add devlink-trap documentation Ido Schimmel
2019-07-07  7:58 ` [PATCH net-next 05/11] netdevsim: Add devlink-trap support Ido Schimmel
2019-07-07  7:58 ` [PATCH net-next 06/11] Documentation: Add description of netdevsim traps Ido Schimmel
2019-07-07  7:58 ` [PATCH net-next 07/11] mlxsw: core: Add API to set trap action Ido Schimmel
2019-07-07  7:58 ` [PATCH net-next 08/11] mlxsw: reg: Add new " Ido Schimmel
2019-07-07  7:58 ` [PATCH net-next 09/11] mlxsw: Add layer 2 discard trap IDs Ido Schimmel
2019-07-07  7:58 ` [PATCH net-next 10/11] mlxsw: Add trap group for layer 2 discards Ido Schimmel
2019-07-07  7:58 ` [PATCH net-next 11/11] mlxsw: spectrum: Add devlink-trap support Ido Schimmel
2019-07-07  8:01 ` [PATCH iproute2-next 0/7] " Ido Schimmel
2019-07-07  8:01   ` [PATCH iproute2-next 1/7] devlink: Increase number of supported options Ido Schimmel
2019-07-07  8:01   ` [PATCH iproute2-next 2/7] devlink: Add devlink trap set and show commands Ido Schimmel
2019-07-07  8:01   ` [PATCH iproute2-next 3/7] devlink: Add devlink trap group " Ido Schimmel
2019-07-07  8:01   ` [PATCH iproute2-next 4/7] devlink: Add devlink trap monitor support Ido Schimmel
2019-07-07  8:01   ` [PATCH iproute2-next 5/7] devlink: Set NETLINK_NO_ENOBUFS when monitoring events Ido Schimmel
2019-07-07  8:01   ` [PATCH iproute2-next 6/7] devlink: Add fflush() to print functions Ido Schimmel
2019-07-07  8:02   ` [PATCH iproute2-next 7/7] devlink: Add man page for devlink-trap Ido Schimmel
2019-07-07  8:03 ` [RFC PATCH net-next 0/5] selftests: Add devlink-trap selftests Ido Schimmel
2019-07-07  8:03   ` [RFC PATCH net-next 1/5] selftests: devlink_trap: Add test cases for devlink-trap Ido Schimmel
2019-07-07  8:03   ` [RFC PATCH net-next 2/5] Documentation: Add a section for devlink-trap testing Ido Schimmel
2019-07-07  8:03   ` [RFC PATCH net-next 3/5] selftests: forwarding: devlink_lib: Add devlink-trap helpers Ido Schimmel
2019-07-07  8:03   ` [RFC PATCH net-next 4/5] selftests: mlxsw: Add test cases for devlink-trap L2 drops Ido Schimmel
2019-07-07  8:03   ` [RFC PATCH net-next 5/5] selftests: mlxsw: Add a test case for devlink-trap Ido Schimmel
2019-07-07  8:15 ` [PATCH net-next 00/11] Add drop monitor for offloaded data paths Ido Schimmel
2019-07-07 19:45 ` David Miller
2019-07-08 13:19   ` Ido Schimmel
2019-07-08 22:51     ` Jakub Kicinski
2019-07-09 12:38       ` Ido Schimmel
2019-07-09 22:34         ` Jakub Kicinski
2019-07-10 11:20           ` Ido Schimmel
2019-07-10 11:39             ` Toke Høiland-Jørgensen
2019-07-11 12:39   ` Ido Schimmel
2019-07-11 19:02     ` David Miller
2019-07-11 23:53     ` Neil Horman
2019-07-12  3:40       ` Florian Fainelli
2019-07-12 12:05         ` Neil Horman
2019-07-12  9:27       ` Toke Høiland-Jørgensen
2019-07-12 12:18         ` Neil Horman [this message]
2019-07-12 12:33           ` Toke Høiland-Jørgensen
2019-07-13  0:40             ` Neil Horman
2019-07-13  8:07               ` Toke Høiland-Jørgensen
2019-07-12 13:52       ` Ido Schimmel
2019-07-14 11:29         ` Neil Horman
2019-07-14 12:43           ` Ido Schimmel
2019-07-14  2:38     ` David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190712121859.GB13696@hmswarspite.think-freely.org \
    --to=nhorman@tuxdriver.com \
    --cc=andrew@lunn.ch \
    --cc=andy@greyhouse.net \
    --cc=davem@davemloft.net \
    --cc=dsahern@gmail.com \
    --cc=f.fainelli@gmail.com \
    --cc=idosch@idosch.org \
    --cc=idosch@mellanox.com \
    --cc=jakub.kicinski@netronome.com \
    --cc=jiri@mellanox.com \
    --cc=mlxsw@mellanox.com \
    --cc=netdev@vger.kernel.org \
    --cc=nikolay@cumulusnetworks.com \
    --cc=pablo@netfilter.org \
    --cc=pieter.jansenvanvuuren@netronome.com \
    --cc=roopa@cumulusnetworks.com \
    --cc=toke@redhat.com \
    --cc=vivien.didelot@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).