All of lore.kernel.org
 help / color / mirror / Atom feed
* How to send nf trace notifications to userspace?
@ 2015-11-06 23:29 Florian Westphal
  2015-11-07 16:38 ` Pablo Neira Ayuso
  0 siblings, 1 reply; 10+ messages in thread
From: Florian Westphal @ 2015-11-06 23:29 UTC (permalink / raw)
  To: netfilter-devel

$subject says most of it.

I started to resurrect the 'nft trace' patch set from

https://patchwork.ozlabs.org/patch/444772/

It works ok with couple of changes to nf_tables_core and nfnetlink
to propagate trace type (return, policy, ...) and the handle of the
rule currently processed.  Also saves us from the 'rule counting'
that we're currently doing in kernel.

$ src/nft trace
SRC=192.168.7.1 DST=192.168.7.10 LEN=88 TTL=64 ID=39366 PROTO=6 SPT=47028 DPT=22 raw prerouting (# handle 6)
        ip protocol tcp limit rate 1/second nftrace set 1
RETURN raw prerouting
ACCEPT raw prerouting
filter input (# handle 7)
        tcp dport ssh jump foo
filter foo (# handle 5)
        ip daddr 192.168.7.10
filter foo (# handle 6)
        counter packets 912 bytes 63619
RETURN filter foo
filter input (# handle 8)
        ip protocol tcp ip daddr 192.168.7.10
filter input (# handle 9)
        tcp dport ssh accept

So basic functionality works on current nftables head.

However, the code uses nfnetlink_log.  It appears that its
unicast only, i.e. its not possible to run another 'nft trace' in parallel
and also clashes with ulogd or other running logger.

Thats a non-starter.

So I'm inclined to toss the nfnetlink_log based changes and
instead (re-)use the infrastructure that we have for chain, rule, ... notifications.

That would mean i'd also just plug the tracing notifications decoding into nft monitor.

Is that ok?
Any better ideas?

Do you think its useful to also dump skb data (i.e. in/outdev,
packet payload, etc?)

Thanks,
Florian

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: How to send nf trace notifications to userspace?
  2015-11-06 23:29 How to send nf trace notifications to userspace? Florian Westphal
@ 2015-11-07 16:38 ` Pablo Neira Ayuso
  2015-11-07 20:49   ` Florian Westphal
  0 siblings, 1 reply; 10+ messages in thread
From: Pablo Neira Ayuso @ 2015-11-07 16:38 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

On Sat, Nov 07, 2015 at 12:29:00AM +0100, Florian Westphal wrote:
[...]
> However, the code uses nfnetlink_log.  It appears that its
> unicast only, i.e. its not possible to run another 'nft trace' in parallel
> and also clashes with ulogd or other running logger.
> 
> Thats a non-starter.
> 
> So I'm inclined to toss the nfnetlink_log based changes and
> instead (re-)use the infrastructure that we have for chain, rule, ... notifications.

Something that integrates tightly with the infrastructure is the way
to go. I'd suggest you add a new NFNLGRP_NFTABLES_TRACE nfnetlink
group.

> [...] Also saves us from the 'rule counting' that we're currently
> doing in kernel.

I think we'll have to keep that counting around to provide the legacy
output from the compat layer.

> $ src/nft trace
> SRC=192.168.7.1 DST=192.168.7.10 LEN=88 TTL=64 ID=39366 PROTO=6 SPT=47028 DPT=22 raw prerouting (# handle 6)
> ACCEPT raw prerouting
> filter input (# handle 7)
>         tcp dport ssh jump foo
> filter foo (# handle 5)
>         ip daddr 192.168.7.10
> filter foo (# handle 6)
>         counter packets 912 bytes 63619
> RETURN filter foo
> filter input (# handle 8)
>         ip protocol tcp ip daddr 192.168.7.10
> filter input (# handle 9)
>         tcp dport ssh accept

I guess this is a draft proposal, but I think it would be good to have
a more compact output that can be easily read, eg.

matches: filter input tcp dport ssh accept # handle 9

> Do you think its useful to also dump skb data (i.e. in/outdev,
> packet payload, etc?)

Such information is useful to know what packet is matching what rules.

If we have a way to uniquely identify the packet (probably part of the
skbuff address as we do with conntrack ID from ctnetlink?), we can
just dump this packet data and metadata only once when setting
skb->nf_trace to 1, then from the trace event we only include the
packet ID that results from matching a rule, so userspace can
correlate.

On a different front but related, I think this infrastructure may
allow us to implement the packet path test infrastructure that we
discussed many times before during netfilter workshops at some point.
Kind of similar thing that Rusty made time ago with nfsim.

Thanks for taking over this.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: How to send nf trace notifications to userspace?
  2015-11-07 16:38 ` Pablo Neira Ayuso
@ 2015-11-07 20:49   ` Florian Westphal
  2015-11-09 13:36     ` Patrick McHardy
  0 siblings, 1 reply; 10+ messages in thread
From: Florian Westphal @ 2015-11-07 20:49 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Florian Westphal, netfilter-devel

Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > However, the code uses nfnetlink_log.  It appears that its
> > unicast only, i.e. its not possible to run another 'nft trace' in parallel
> > and also clashes with ulogd or other running logger.
> > 
> > Thats a non-starter.
> > 
> > So I'm inclined to toss the nfnetlink_log based changes and
> > instead (re-)use the infrastructure that we have for chain, rule, ... notifications.
> 
> Something that integrates tightly with the infrastructure is the way
> to go. I'd suggest you add a new NFNLGRP_NFTABLES_TRACE nfnetlink
> group.

Okay, can do.

> > [...] Also saves us from the 'rule counting' that we're currently
> > doing in kernel.
> 
> I think we'll have to keep that counting around to provide the legacy
> output from the compat layer.

Can you elaborate?  I find scavenging dmesg output rather painful so I
see no reason to keep the current code once the new facility is available.

In any case, i can keep it since I planned to see if we can add static
key to avoid all the 'if (skb->nf_trace)' stuff for the normal case.

But I'd want to see a convincing argument to keep the existing trace
stuff first :-)

> > $ src/nft trace
> > SRC=192.168.7.1 DST=192.168.7.10 LEN=88 TTL=64 ID=39366 PROTO=6 SPT=47028 DPT=22 raw prerouting (# handle 6)
> > ACCEPT raw prerouting
> > filter input (# handle 7)
> >         tcp dport ssh jump foo
> > filter foo (# handle 5)
> >         ip daddr 192.168.7.10
> > filter foo (# handle 6)
> >         counter packets 912 bytes 63619
> > RETURN filter foo
> > filter input (# handle 8)
> >         ip protocol tcp ip daddr 192.168.7.10
> > filter input (# handle 9)
> >         tcp dport ssh accept
> 
> I guess this is a draft proposal, but I think it would be good to have
> a more compact output that can be easily read, eg.

Yes, I have not thought about representation in userspace so far.

On a related note, should that be part of 'nft monitor' or a distinct
command?

My gut feeling was to re-use monitor mode, re-use NFNLGRP_NFTABLES and
just add a new message type (NFT_MSG_TRACE) for it, on the grounds that
tracing is usually off on a production system so I don't see why we would
have to prevent normal event listeners from seeing trace messages.

But, your call.

> matches: filter input tcp dport ssh accept # handle 9

looks more similar to standard '--handle' format, so its definitely
better.  I'll deal with this after all the interface details are hashed
out.

> > Do you think its useful to also dump skb data (i.e. in/outdev,
> > packet payload, etc?)
> 
> Such information is useful to know what packet is matching what rules.
> 
> If we have a way to uniquely identify the packet (probably part of the
> skbuff address as we do with conntrack ID from ctnetlink?), we can
> just dump this packet data and metadata only once when setting
> skb->nf_trace to 1, then from the trace event we only include the
> packet ID that results from matching a rule, so userspace can
> correlate.

Ok, that seems sensible.
What would you suggest?

I'm inclined to go with nested attributes for this, i.e.:

nest = nla_nest_start(skb, NFTA_TRACE_RULE);
..
nla_put_be64(skb, NFTA_RULE_HANDLE, cpu_to_be64(rule->handle)))
nla_put_string( NFTA_RULE_CHAIN, TABLE, etc...
nla_nest_end()

For the first trace message (when skb->nf_trace changes to 1),
this would mean we'd have something like:

nest = nla_nest_start(skb, NFTA_TRACE_PACKET)
 -> NFULA_PACKET_HDR,
 -> NFULA_PREFIX

etc.

I'd see if we can do this by refactoring nfnetlink_log.c to provide
a helper to fill in this nested attibute data to avoid copy&paste of
__build_packet_message().

It could then be called by nft_meta.c when nf_trace is set on an skb.
Downside is that this would create module dependency on nfnetlink_log.

What do you think?
Alternative is to copy&paste, but trim it down to not include all
the info nfnetlink_log offers, f.e. i'm not sure iif, oif, etc are
needed given this can be filtered by setting the nftables meta nftrace
rules appropriately.

This also has implications on nft dependencies -- should it use
libnfnetlink_log?  With reduced attributes-we-log that could be avoided.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: How to send nf trace notifications to userspace?
  2015-11-07 20:49   ` Florian Westphal
@ 2015-11-09 13:36     ` Patrick McHardy
  2015-11-12 15:46       ` [PATCH RFC] netfilter: nf_tables: extend tracing infrastructure Florian Westphal
  0 siblings, 1 reply; 10+ messages in thread
From: Patrick McHardy @ 2015-11-09 13:36 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Pablo Neira Ayuso, netfilter-devel

On 07.11, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > > Do you think its useful to also dump skb data (i.e. in/outdev,
> > > packet payload, etc?)
> > 
> > Such information is useful to know what packet is matching what rules.
> > 
> > If we have a way to uniquely identify the packet (probably part of the
> > skbuff address as we do with conntrack ID from ctnetlink?), we can
> > just dump this packet data and metadata only once when setting
> > skb->nf_trace to 1, then from the trace event we only include the
> > packet ID that results from matching a rule, so userspace can
> > correlate.
> 
> Ok, that seems sensible.
> What would you suggest?
> 
> I'm inclined to go with nested attributes for this, i.e.:
> 
> nest = nla_nest_start(skb, NFTA_TRACE_RULE);
> ..
> nla_put_be64(skb, NFTA_RULE_HANDLE, cpu_to_be64(rule->handle)))
> nla_put_string( NFTA_RULE_CHAIN, TABLE, etc...
> nla_nest_end()
> 
> For the first trace message (when skb->nf_trace changes to 1),
> this would mean we'd have something like:
> 
> nest = nla_nest_start(skb, NFTA_TRACE_PACKET)
>  -> NFULA_PACKET_HDR,
>  -> NFULA_PREFIX
> 
> etc.
> 
> I'd see if we can do this by refactoring nfnetlink_log.c to provide
> a helper to fill in this nested attibute data to avoid copy&paste of
> __build_packet_message().
> 
> It could then be called by nft_meta.c when nf_trace is set on an skb.
> Downside is that this would create module dependency on nfnetlink_log.

On dumping the packet meta data - one idea would be to have the user
specify what he is interested in using the existing expressions. We
could simply dump that data to the registers and include the raw data
in the message, nft has enough information to decode it. The upside
would be that we don't have to unconditionally include all data and
the user can also dump "unusual" parts in case he's interested in them.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH RFC] netfilter: nf_tables: extend tracing infrastructure
  2015-11-09 13:36     ` Patrick McHardy
@ 2015-11-12 15:46       ` Florian Westphal
  2015-11-14 18:50         ` Patrick McHardy
  2015-11-16 21:56         ` Pablo Neira Ayuso
  0 siblings, 2 replies; 10+ messages in thread
From: Florian Westphal @ 2015-11-12 15:46 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

RFC for the kernel-side of the tracing changes.

Caveats+Questions are:
 - should kernel dump entire rule instead of just the handle?
 PRO of dumping entire rule matched:
  * Userspace doesn't need to dump rules first to display
  the rule that matched
  * nft trace doesn't need to also monitor ruleset for changes
  * would make it possible to use a new nfnetlink group as Pablo
  suggested

 CON:
  * needs more memory in kernel space (can't use NLMSG_GOODSIZE -
    allocation uses GFP_ATOMIC)

 This patch only dumps the rule handle.

 As you can see, I reimplemented parts of nfqueue/nfnetlink_log attributes.
 Problem is that I don't want to add dependencies on nfetlink_log for this,
 but things like iif and oif are rather imporant, esp. when debugging
 packets being forwarded.

 The first 128 bytes of the packet data is also dumped to userspace, currently
 limited to when we do the initial 'skb->nf_trace=1' assignment in nft_meta.c

 Patrick, I saw your idea wrt. dumping register contents instead, but it
 seems both complex and 'backwards' to me -- isn't the 'meta set nftrace'
 rule already a selector, i.e. user would say something like

 'tcp port 22 limit rate 1/second meta nftrace set 1'

 So I'm not sure its right to extend nftrace with additional selectors (its
 also possible that I failed to understand what you were suggesting 8-} )

The userspace side currently lives in libnftnl, with new api functions:

struct nftnl_trace *nftnl_trace_alloc(void);
void nftnl_trace_free(struct nftnl_trace *trace);
bool nftnl_trace_is_set(const struct nftnl_trace *trace, uint16_t type);

const void *nftnl_trace_get_data(const struct nftnl_trace *trace, uint16_t type, uint32_t *data_len);
uint32_t nftnl_trace_get_u32(const struct nftnl_trace *trace, uint16_t type);
uint64_t nftnl_trace_get_u64(const struct nftnl_trace *trace, uint16_t type);
const char *nftnl_trace_get_str(const struct nftnl_trace *trace, uint16_t type);

int nftnl_trace_nlmsg_parse(const struct nlmsghdr *nlh, struct nftnl_trace *t);
int nftnl_trace_snprintf(char *buf, size_t size, const struct nftnl_trace *t, uint32_t type);
int nftnl_trace_fprintf(FILE *fh, const struct nftnl_trace *t, uint32_t type);


The nftnl_trace struct is readonly (thus no setters).
I'd then extend nft to use those helpers.

Cheers,
Florian
--
 include/net/netfilter/nf_tables.h        |   6 ++
 include/uapi/linux/netfilter/nf_tables.h |  27 ++++++++
 net/netfilter/nf_tables_api.c            | 114 +++++++++++++++++++++++++++++++
 net/netfilter/nf_tables_core.c           |  63 +++++------------
 net/netfilter/nft_meta.c                 |   3 +
 5 files changed, 166 insertions(+), 47 deletions(-)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index c9149cc..a92be10 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -878,6 +878,12 @@ void nft_unregister_chain_type(const struct nf_chain_type *);
 int nft_register_expr(struct nft_expr_type *);
 void nft_unregister_expr(struct nft_expr_type *);
 
+void nf_tables_trace_notify(const struct nft_pktinfo *pkt,
+			    const struct nft_chain *chain,
+			    const struct nft_rule *rule,
+			    u32 verdict,
+			    enum nft_trace_types type);
+
 #define nft_dereference(p)					\
 	nfnl_dereference(p, NFNL_SUBSYS_NFTABLES)
 
diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index d8c8a7c..c7584dc 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -83,6 +83,7 @@ enum nft_verdicts {
  * @NFT_MSG_DELSETELEM: delete a set element (enum nft_set_elem_attributes)
  * @NFT_MSG_NEWGEN: announce a new generation, only for events (enum nft_gen_attributes)
  * @NFT_MSG_GETGEN: get the rule-set generation (enum nft_gen_attributes)
+ * @NFT_MSG_TRACE: trace event (enum nft_trace_attributes)
  */
 enum nf_tables_msg_types {
 	NFT_MSG_NEWTABLE,
@@ -102,6 +103,7 @@ enum nf_tables_msg_types {
 	NFT_MSG_DELSETELEM,
 	NFT_MSG_NEWGEN,
 	NFT_MSG_GETGEN,
+	NFT_MSG_TRACE,
 	NFT_MSG_MAX,
 };
 
@@ -970,4 +972,29 @@ enum nft_gen_attributes {
 };
 #define NFTA_GEN_MAX		(__NFTA_GEN_MAX - 1)
 
+enum nft_trace_attibutes {
+	NFTA_TRACE_UNSPEC,
+	NFTA_TRACE_CHAIN,
+	NFTA_TRACE_ID,
+	NFTA_TRACE_IIF,
+	NFTA_TRACE_OIF,
+	NFTA_TRACE_MARK,
+	NFTA_TRACE_PAYLOAD,
+	NFTA_TRACE_TABLE,
+	NFTA_TRACE_TYPE,
+	NFTA_TRACE_RULE_HANDLE,
+	NFTA_TRACE_VERDICT,
+	__NFTA_TRACE_MAX
+};
+#define NFTA_TRACE_MAX (__NFTA_TRACE_MAX - 1)
+
+enum nft_trace_types {
+	NFT_TRACETYPE_UNSPEC,
+	NFT_TRACETYPE_PACKET,
+	NFT_TRACETYPE_POLICY,
+	NFT_TRACETYPE_RETURN,
+	NFT_TRACETYPE_RULE,
+	__NFT_TRACETYPE_MAX
+};
+#define NFT_TRACETYPE_MAX (__NFT_TRACETYPE_MAX - 1)
 #endif /* _LINUX_NF_TABLES_H */
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 93cc473..65a6635 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -9,6 +9,7 @@
  */
 
 #include <linux/module.h>
+#include <linux/hash.h>
 #include <linux/init.h>
 #include <linux/list.h>
 #include <linux/skbuff.h>
@@ -499,6 +500,119 @@ err:
 	return err;
 }
 
+void nf_tables_trace_notify(const struct nft_pktinfo *pkt,
+			    const struct nft_chain *chain,
+			    const struct nft_rule *rule,
+			    u32 verdict,
+			    enum nft_trace_types type)
+{
+	static const unsigned int hdrlen = 128;
+	struct nfgenmsg *nfmsg;
+	struct nlmsghdr *nlh;
+	struct sk_buff *skb;
+	unsigned int size;
+	int event = (NFNL_SUBSYS_NFTABLES << 8) | NFT_MSG_TRACE;
+
+	if (!nfnetlink_has_listeners(pkt->net, NFNLGRP_NFTABLES))
+		return;
+
+	/* Unlike other notifiers we need GFP_ATOMIC so use actual size
+	 * needed instead of NLMSG_GOODSIZE.
+	 */
+	size = nlmsg_total_size(sizeof(struct nfgenmsg))
+		+ nla_total_size(sizeof(__be32))	/* trace type */
+		+ nla_total_size(NFT_TABLE_MAXNAMELEN)
+		+ nla_total_size(NFT_CHAIN_MAXNAMELEN)
+		+ nla_total_size(sizeof(u32))	/* iif */
+		+ nla_total_size(sizeof(u32))	/* oif */
+		+ nla_total_size(sizeof(u32))	/* id */
+		+ nla_total_size(sizeof(u32))	/* mark */
+		+ nla_total_size(sizeof(u32))	/* verdict */
+		+ nla_total_size(sizeof(__be64)); /* rule handle */
+
+	switch (type) {
+	case NFT_TRACETYPE_PACKET:
+		size += nla_total_size(hdrlen);
+		break;
+	default:
+		break;
+	}
+
+	skb = nlmsg_new(size, GFP_ATOMIC);
+	if (skb == NULL)
+		return;
+
+	nlh = nlmsg_put(skb, 0, 0, event, sizeof(struct nfgenmsg), 0);
+	if (nlh == NULL)
+		goto nla_put_failure;
+
+	nfmsg = nlmsg_data(nlh);
+	nfmsg->nfgen_family	= pkt->pf;
+	nfmsg->version		= NFNETLINK_V0;
+	nfmsg->res_id		= htons(pkt->net->nft.base_seq & 0xffff);
+
+	if (nla_put_be32(skb, NFTA_TRACE_TYPE, htonl(type)))
+		goto nla_put_failure;
+
+	if (nla_put_be32(skb, NFTA_TRACE_ID, htonl(hash_ptr(skb, 32))))
+		goto nla_put_failure;
+
+	if (chain) {
+		if (nla_put_string(skb, NFTA_TRACE_TABLE, chain->table->name))
+			goto nla_put_failure;
+		if (nla_put_string(skb, NFTA_TRACE_CHAIN, chain->name))
+			goto nla_put_failure;
+	}
+
+	if (rule && nla_put_be64(skb, NFTA_TRACE_RULE_HANDLE,
+				 cpu_to_be64(rule->handle)))
+		goto nla_put_failure;
+
+	if (pkt->in &&
+	    nla_put_be32(skb, NFTA_TRACE_IIF, htonl(pkt->in->ifindex)))
+		goto nla_put_failure;
+	if (pkt->out &&
+	    nla_put_be32(skb, NFTA_TRACE_OIF, htonl(pkt->out->ifindex)))
+		goto nla_put_failure;
+	if (pkt->skb->mark &&
+	    nla_put_be32(skb, NFTA_TRACE_MARK, htonl(pkt->skb->mark)))
+		goto nla_put_failure;
+
+	switch (type) {
+	case NFT_TRACETYPE_POLICY:
+	case NFT_TRACETYPE_RETURN:
+	case NFT_TRACETYPE_RULE:
+		if (nla_put_be32(skb, NFTA_TRACE_VERDICT, htonl(verdict)))
+			goto nla_put_failure;
+		break;
+	case NFT_TRACETYPE_PACKET: {
+		unsigned int plen = min(hdrlen, pkt->skb->len);
+		struct nlattr *nla;
+
+		if (skb_tailroom(skb) < nla_total_size(plen))
+			goto nla_put_failure;
+
+		nla = (struct nlattr *)skb_put(skb, nla_total_size(plen));
+		nla->nla_type = NFTA_TRACE_PAYLOAD;
+		nla->nla_len = nla_attr_size(plen);
+
+		if (skb_copy_bits(pkt->skb, 0, nla_data(nla), plen))
+			goto nla_put_failure;
+		}
+		break;
+	default:
+		break;
+	}
+
+	nlmsg_end(skb, nlh);
+	nfnetlink_send(skb, pkt->net, 0, NFNLGRP_NFTABLES, 0, GFP_ATOMIC);
+	return;
+
+ nla_put_failure:
+	WARN_ON_ONCE(1);
+	kfree_skb(skb);
+}
+
 static int nf_tables_dump_tables(struct sk_buff *skb,
 				 struct netlink_callback *cb)
 {
diff --git a/net/netfilter/nf_tables_core.c b/net/netfilter/nf_tables_core.c
index f3695a4..731909d 100644
--- a/net/netfilter/nf_tables_core.c
+++ b/net/netfilter/nf_tables_core.c
@@ -22,44 +22,14 @@
 #include <net/netfilter/nf_tables.h>
 #include <net/netfilter/nf_log.h>
 
-enum nft_trace {
-	NFT_TRACE_RULE,
-	NFT_TRACE_RETURN,
-	NFT_TRACE_POLICY,
-};
-
-static const char *const comments[] = {
-	[NFT_TRACE_RULE]	= "rule",
-	[NFT_TRACE_RETURN]	= "return",
-	[NFT_TRACE_POLICY]	= "policy",
-};
-
-static struct nf_loginfo trace_loginfo = {
-	.type = NF_LOG_TYPE_LOG,
-	.u = {
-		.log = {
-			.level = LOGLEVEL_WARNING,
-			.logflags = NF_LOG_MASK,
-	        },
-	},
-};
-
-static void __nft_trace_packet(const struct nft_pktinfo *pkt,
-			       const struct nft_chain *chain,
-			       int rulenum, enum nft_trace type)
-{
-	nf_log_trace(pkt->net, pkt->pf, pkt->hook, pkt->skb, pkt->in,
-		     pkt->out, &trace_loginfo, "TRACE: %s:%s:%s:%u ",
-		     chain->table->name, chain->name, comments[type],
-		     rulenum);
-}
-
 static inline void nft_trace_packet(const struct nft_pktinfo *pkt,
 				    const struct nft_chain *chain,
-				    int rulenum, enum nft_trace type)
+				    const struct nft_rule *rule,
+				    u32 verdict,
+				    enum nft_trace_types type)
 {
 	if (unlikely(pkt->skb->nf_trace))
-		__nft_trace_packet(pkt, chain, rulenum, type);
+		nf_tables_trace_notify(pkt, chain, rule, verdict, type);
 }
 
 static void nft_cmp_fast_eval(const struct nft_expr *expr,
@@ -105,7 +75,6 @@ static bool nft_payload_fast_eval(const struct nft_expr *expr,
 struct nft_jumpstack {
 	const struct nft_chain	*chain;
 	const struct nft_rule	*rule;
-	int			rulenum;
 };
 
 unsigned int
@@ -119,11 +88,9 @@ nft_do_chain(struct nft_pktinfo *pkt, void *priv)
 	unsigned int stackptr = 0;
 	struct nft_jumpstack jumpstack[NFT_JUMP_STACK_SIZE];
 	struct nft_stats *stats;
-	int rulenum;
 	unsigned int gencursor = nft_genmask_cur(net);
 
 do_chain:
-	rulenum = 0;
 	rule = list_entry(&chain->rules, struct nft_rule, list);
 next_rule:
 	regs.verdict.code = NFT_CONTINUE;
@@ -133,8 +100,6 @@ next_rule:
 		if (unlikely(rule->genmask & (1 << gencursor)))
 			continue;
 
-		rulenum++;
-
 		nft_rule_for_each_expr(expr, last, rule) {
 			if (expr->ops == &nft_cmp_fast_ops)
 				nft_cmp_fast_eval(expr, &regs);
@@ -151,7 +116,8 @@ next_rule:
 			regs.verdict.code = NFT_CONTINUE;
 			continue;
 		case NFT_CONTINUE:
-			nft_trace_packet(pkt, chain, rulenum, NFT_TRACE_RULE);
+			nft_trace_packet(pkt, chain, rule,
+					 regs.verdict.code, NFT_TRACETYPE_RULE);
 			continue;
 		}
 		break;
@@ -161,7 +127,9 @@ next_rule:
 	case NF_ACCEPT:
 	case NF_DROP:
 	case NF_QUEUE:
-		nft_trace_packet(pkt, chain, rulenum, NFT_TRACE_RULE);
+		nft_trace_packet(pkt, chain, rule,
+				 regs.verdict.code & NF_VERDICT_MASK,
+				 NFT_TRACETYPE_RULE);
 		return regs.verdict.code;
 	}
 
@@ -170,19 +138,20 @@ next_rule:
 		BUG_ON(stackptr >= NFT_JUMP_STACK_SIZE);
 		jumpstack[stackptr].chain = chain;
 		jumpstack[stackptr].rule  = rule;
-		jumpstack[stackptr].rulenum = rulenum;
 		stackptr++;
 		/* fall through */
 	case NFT_GOTO:
-		nft_trace_packet(pkt, chain, rulenum, NFT_TRACE_RULE);
+		nft_trace_packet(pkt, chain, rule,
+				 regs.verdict.code, NFT_TRACETYPE_RULE);
 
 		chain = regs.verdict.chain;
 		goto do_chain;
 	case NFT_CONTINUE:
-		rulenum++;
 		/* fall through */
 	case NFT_RETURN:
-		nft_trace_packet(pkt, chain, rulenum, NFT_TRACE_RETURN);
+		if (stackptr)
+			nft_trace_packet(pkt, chain, rule,
+					 regs.verdict.code, NFT_TRACETYPE_RETURN);
 		break;
 	default:
 		WARN_ON(1);
@@ -192,11 +161,11 @@ next_rule:
 		stackptr--;
 		chain = jumpstack[stackptr].chain;
 		rule  = jumpstack[stackptr].rule;
-		rulenum = jumpstack[stackptr].rulenum;
 		goto next_rule;
 	}
 
-	nft_trace_packet(pkt, basechain, -1, NFT_TRACE_POLICY);
+	nft_trace_packet(pkt, basechain, NULL,
+			 nft_base_chain(basechain)->policy, NFT_TRACETYPE_POLICY);
 
 	rcu_read_lock_bh();
 	stats = this_cpu_ptr(rcu_dereference(nft_base_chain(basechain)->stats));
diff --git a/net/netfilter/nft_meta.c b/net/netfilter/nft_meta.c
index e4ad2c2..857a652 100644
--- a/net/netfilter/nft_meta.c
+++ b/net/netfilter/nft_meta.c
@@ -200,6 +200,9 @@ void nft_meta_set_eval(const struct nft_expr *expr,
 		skb->priority = value;
 		break;
 	case NFT_META_NFTRACE:
+		if (skb->nf_trace == 0)
+			nf_tables_trace_notify(pkt, NULL, NULL, 0,
+					       NFT_TRACETYPE_PACKET);
 		skb->nf_trace = 1;
 		break;
 	default:
-- 
2.4.10


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH RFC] netfilter: nf_tables: extend tracing infrastructure
  2015-11-12 15:46       ` [PATCH RFC] netfilter: nf_tables: extend tracing infrastructure Florian Westphal
@ 2015-11-14 18:50         ` Patrick McHardy
  2015-11-16 21:56         ` Pablo Neira Ayuso
  1 sibling, 0 replies; 10+ messages in thread
From: Patrick McHardy @ 2015-11-14 18:50 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

On 12.11, Florian Westphal wrote:
> RFC for the kernel-side of the tracing changes.
> 
>  The first 128 bytes of the packet data is also dumped to userspace, currently
>  limited to when we do the initial 'skb->nf_trace=1' assignment in nft_meta.c
> 
>  Patrick, I saw your idea wrt. dumping register contents instead, but it
>  seems both complex and 'backwards' to me -- isn't the 'meta set nftrace'
>  rule already a selector, i.e. user would say something like
> 
>  'tcp port 22 limit rate 1/second meta nftrace set 1'
> 
>  So I'm not sure its right to extend nftrace with additional selectors (its
>  also possible that I failed to understand what you were suggesting 8-} )

Yeah, it was about the data that we dump, to make that explicitly selectable.
The idea was that you could explicitly specify to dump lets say tcp dport, skuid
and cpu number. But it might not work since we'd have to propagate that
selection along with the packet, so I guess, please forget about it :)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH RFC] netfilter: nf_tables: extend tracing infrastructure
  2015-11-12 15:46       ` [PATCH RFC] netfilter: nf_tables: extend tracing infrastructure Florian Westphal
  2015-11-14 18:50         ` Patrick McHardy
@ 2015-11-16 21:56         ` Pablo Neira Ayuso
  2015-11-17  0:03           ` Florian Westphal
  1 sibling, 1 reply; 10+ messages in thread
From: Pablo Neira Ayuso @ 2015-11-16 21:56 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel, kaber

Hi Florian,

Cc'ing Patrick to shake him for ideas :)

On Thu, Nov 12, 2015 at 04:46:49PM +0100, Florian Westphal wrote:
> RFC for the kernel-side of the tracing changes.
> 
> Caveats+Questions are:
>  - should kernel dump entire rule instead of just the handle?
>  PRO of dumping entire rule matched:
>   * Userspace doesn't need to dump rules first to display
>   the rule that matched
>   * nft trace doesn't need to also monitor ruleset for changes
>   * would make it possible to use a new nfnetlink group as Pablo
>   suggested
> 
>  CON:
>   * needs more memory in kernel space (can't use NLMSG_GOODSIZE -
>     allocation uses GFP_ATOMIC)
> 
>  This patch only dumps the rule handle.

Agreed, only rule handle should be fine. We have limited bandwidth
delivering netlink event notifications from the packet path, so
keeping the message small there looks like a good idea to me.

>  As you can see, I reimplemented parts of nfqueue/nfnetlink_log attributes.
>  Problem is that I don't want to add dependencies on nfetlink_log for this,
>  but things like iif and oif are rather imporant, esp. when debugging
>  packets being forwarded.
> 
>  The first 128 bytes of the packet data is also dumped to userspace, currently
>  limited to when we do the initial 'skb->nf_trace=1' assignment in nft_meta.c
>
>  Patrick, I saw your idea wrt. dumping register contents instead, but it
>  seems both complex and 'backwards' to me -- isn't the 'meta set nftrace'
>  rule already a selector, i.e. user would say something like
> 
>  'tcp port 22 limit rate 1/second meta nftrace set 1'
> 
>  So I'm not sure its right to extend nftrace with additional selectors (its
>  also possible that I failed to understand what you were suggesting  8-} )

I can see room to extend this infrastructure later on if we need to
make it more flexible (eg. allow the user to specify what part of the
packets need to be dumped to userspace). We can probably even add a
specific 'trace' expression from the kernel to allow more specific
selection on what packet fields you want to dump to userspace.

BTW, any reason to remove the existing infrastructure? It's been there
since almost the beginning (this would break users that are expecting
the existing behaviour). Moreover, we'll have people using the
iptables-compat infrastructure also expecting a similar output to
native iptables.

You can probably generalize the trace infrastructure through static
key plus indirections to keep both tracing approaches around (allowing
only one at the same time should be enough).

>From userspace, we would need to have a way to indicate that we want
to use the new infrastructure, not the classic tracing. That syntax
would need some thinking. Probably we can introduce a 'trace'
statement, so the syntax looks compact like this:

'tcp port 22 limit rate 1/second trace'

Meaning in this case that the userspace is specifically requesting for
the new kind of tracing.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH RFC] netfilter: nf_tables: extend tracing infrastructure
  2015-11-16 21:56         ` Pablo Neira Ayuso
@ 2015-11-17  0:03           ` Florian Westphal
  2015-11-17 10:40             ` Pablo Neira Ayuso
  0 siblings, 1 reply; 10+ messages in thread
From: Florian Westphal @ 2015-11-17  0:03 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Florian Westphal, netfilter-devel, kaber

Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > 
> >  So I'm not sure its right to extend nftrace with additional selectors (its
> >  also possible that I failed to understand what you were suggesting  8-} )
> 
> I can see room to extend this infrastructure later on if we need to
> make it more flexible (eg. allow the user to specify what part of the
> packets need to be dumped to userspace). We can probably even add a
> specific 'trace' expression from the kernel to allow more specific
> selection on what packet fields you want to dump to userspace.
> 
> BTW, any reason to remove the existing infrastructure? It's been there
> since almost the beginning (this would break users that are expecting
> the existing behaviour).

No specific reason, but I question the need to keep it around.
Once we have native tooling there should be no reason whatsover
to wade through dmesg output + manual matching of rules...  its just a
PITA and thats why I removed it -- 'nft monitor trace' (or whatever,
I've not decided on what nft side should look like) would be much
simpler/easier to use.

> Moreover, we'll have people using the
> iptables-compat infrastructure also expecting a similar output to
> native iptables.

Hmm, is iptables-compat a worthy focus?

AFAIU even iptables-compat users would be able to use nft trace
infrastructure, right?

Wouldn't it make more sense to convince people to go with real nft
rather than the compat layer?

> You can probably generalize the trace infrastructure through static
> key plus indirections to keep both tracing approaches around (allowing
> only one at the same time should be enough).

I'd like to see a stronger argument than this.

Bottom line is that i want to keep kernel side as simple as possible,
so I'd much rather just get rid of the printk-based tracing
(I'm *NOT* talking about the ip(6)tables stuff, that remains as-is)
rather than add more conditionals or indirect calls to some
trace-backends...

> From userspace, we would need to have a way to indicate that we want
> to use the new infrastructure, not the classic tracing. That syntax
> would need some thinking. Probably we can introduce a 'trace'
> statement, so the syntax looks compact like this:
> 
> 'tcp port 22 limit rate 1/second trace'
> 
> Meaning in this case that the userspace is specifically requesting for
> the new kind of tracing.

Hmm.  I'm not convinced there is any gain in doing this.
Who in their right mind would want to manually decode rules...?

That being said, I am not overly zealous about this, and I can keep the
printk stuff around.
I just think its not needed anymore.

If you have new nft with old kernel:
- you need to look and dmesg output (or syslog output if ypu use
nfnetlink_log logging backend).

if you have new nft with new kernel:
- nft monitor trace (subject to change, I'm open to suggestions)

If you have old nft with new kernel:
- possibly a problem, but then again, if you can update kernel,
why not use a newer nft as well...?

If you don't mind, i would prefer to work on this patch + nft +
libnftables integration and then submit that formally.

If you're then still convinced that its too problematic to remove the printk
based trace i'd consider to rework the kernel patch.

[ I'm not in a hurry, we can discuss this at netdev/netconf if you like ]

Let me know.

Thanks && regards,
Florian

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH RFC] netfilter: nf_tables: extend tracing infrastructure
  2015-11-17  0:03           ` Florian Westphal
@ 2015-11-17 10:40             ` Pablo Neira Ayuso
  2015-11-17 11:04               ` Florian Westphal
  0 siblings, 1 reply; 10+ messages in thread
From: Pablo Neira Ayuso @ 2015-11-17 10:40 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel, kaber

On Tue, Nov 17, 2015 at 01:03:52AM +0100, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
[...]
> > 
> > BTW, any reason to remove the existing infrastructure? It's been there
> > since almost the beginning (this would break users that are expecting
> > the existing behaviour).
> 
> No specific reason, but I question the need to keep it around.
> Once we have native tooling there should be no reason whatsover
> to wade through dmesg output + manual matching of rules...

It is not only dmesg.

You can also configure this thing to send trace messages through
nfnetlink_log through group 0.

> its just a PITA and thats why I removed it -- 'nft monitor trace'
> (or whatever, I've not decided on what nft side should look like)
> would be much simpler/easier to use.

It's always a PITA to keep extra code around to keep things
compatible, but there is no other way to deprecate things.

Don't forget this tracing infrastructure has been there for nearly two
years.

[...]
> Wouldn't it make more sense to convince people to go with real nft
> rather than the compat layer?

I wish there could be a way to "convice" users to move in a quick way,
but there is not. We can just provide something better and wait for
quite some time to deprecate things.

> If you don't mind, i would prefer to work on this patch + nft +
> libnftables integration and then submit that formally.

I would like to see a nice netlink-based tracing infrastructure like
the one you're currently shaping, no question about that.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH RFC] netfilter: nf_tables: extend tracing infrastructure
  2015-11-17 10:40             ` Pablo Neira Ayuso
@ 2015-11-17 11:04               ` Florian Westphal
  0 siblings, 0 replies; 10+ messages in thread
From: Florian Westphal @ 2015-11-17 11:04 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Florian Westphal, netfilter-devel, kaber

Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Tue, Nov 17, 2015 at 01:03:52AM +0100, Florian Westphal wrote:
> > Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> [...]
> > > 
> > > BTW, any reason to remove the existing infrastructure? It's been there
> > > since almost the beginning (this would break users that are expecting
> > > the existing behaviour).
> > 
> > No specific reason, but I question the need to keep it around.
> > Once we have native tooling there should be no reason whatsover
> > to wade through dmesg output + manual matching of rules...
> 
> It is not only dmesg.
> 
> You can also configure this thing to send trace messages through
> nfnetlink_log through group 0.
> 
> > its just a PITA and thats why I removed it -- 'nft monitor trace'
> > (or whatever, I've not decided on what nft side should look like)
> > would be much simpler/easier to use.
> 
> It's always a PITA to keep extra code around to keep things
> compatible, but there is no other way to deprecate things.
> 
> Don't forget this tracing infrastructure has been there for nearly two
> years.

Hmm, I had hoped that we can just ignore that since its not an abi
and lack of these printks would apply some gentle pressure to switch to
native nft ;)

> > Wouldn't it make more sense to convince people to go with real nft
> > rather than the compat layer?
> 
> I wish there could be a way to "convice" users to move in a quick way,
> but there is not. We can just provide something better and wait for
> quite some time to deprecate things.

I have no idea how to make this nice :-(

I don't want to clutter dmesg/nfnetlink_log when someone is using nft trace.

So I'd propose to just check in the kernel trace event path if there is
a nfnl group, and if there is one, disable the printk forever:

if (old && nfnetlink_has_listeners(pkt->net, NFNLGRP_NFTABLES))
        old = false;

I see no other way to really make this play nice.  Any other ideas?
I'd rather not add yet another sysctl toggle...

On a different note, here is how userspace output looks like at the
moment:

# src/nft -a monitor trace
trace event ip packet src 192.168.7.1 dst 192.168.7.10 len 96 ttl 64 id 31765 proto 6 sport 53434 dport 22
trace event ip raw prerouting rule verdict continue
trace rule tcp dport ssh limit rate 1/second nftrace set 1 # handle 2
trace event ip raw prerouting policy verdict accept
trace event ip filter input rule verdict jump
trace rule ip saddr . tcp dport vmap { } # handle 7
trace event ip filter test rule verdict accept
trace rule accept # handle 4
add rule ip filter input ip protocol icmp accept # handle 15
trace event ip packet src 192.168.7.1 dst 192.168.7.10 len 52 ttl 64 id 28163 proto 6 sport 53420 dport 22
trace event ip raw prerouting rule verdict continue
trace rule tcp dport ssh limit rate 1/second nftrace set 1 # handle 2
trace event ip raw prerouting policy verdict accept
trace event ip filter input rule verdict jump
delete rule ip filter input handle 10
trace rule ip saddr . tcp dport vmap { } # handle 7
trace event ip filter test rule verdict accept
trace rule accept # handle 4
trace event ip packet src 192.168.7.1 dst 192.168.7.10 len 52 ttl 64 id
^C

Input on how to improve this welcome.
Most of the formatting is done via libnfnetlink helpers.

Main other point is (what we discussed earlier) -- how to match up
packet + the traced rules when several packets are traced at the same
time.

The kernel already provides a 32bit id value, based on skb address.
Simple solution would be to just print that id as well, e.g.

trace event packet 0x12345 src 192.168.7.1 dst 192.168.7.10 len 52 ttl 64 id 28163 proto 6 sport 53420 dport 22
trace rule packet 0x12345 tcp dport ssh ...
trace event packet 0x42 src 192.168.1.1 dst 10...

is that good enough or do you see a better/nicer solution?

Thanks!

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2015-11-17 11:04 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-06 23:29 How to send nf trace notifications to userspace? Florian Westphal
2015-11-07 16:38 ` Pablo Neira Ayuso
2015-11-07 20:49   ` Florian Westphal
2015-11-09 13:36     ` Patrick McHardy
2015-11-12 15:46       ` [PATCH RFC] netfilter: nf_tables: extend tracing infrastructure Florian Westphal
2015-11-14 18:50         ` Patrick McHardy
2015-11-16 21:56         ` Pablo Neira Ayuso
2015-11-17  0:03           ` Florian Westphal
2015-11-17 10:40             ` Pablo Neira Ayuso
2015-11-17 11:04               ` Florian Westphal

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.