* How to send nf trace notifications to userspace? @ 2015-11-06 23:29 Florian Westphal 2015-11-07 16:38 ` Pablo Neira Ayuso 0 siblings, 1 reply; 10+ messages in thread From: Florian Westphal @ 2015-11-06 23:29 UTC (permalink / raw) To: netfilter-devel $subject says most of it. I started to resurrect the 'nft trace' patch set from https://patchwork.ozlabs.org/patch/444772/ It works ok with couple of changes to nf_tables_core and nfnetlink to propagate trace type (return, policy, ...) and the handle of the rule currently processed. Also saves us from the 'rule counting' that we're currently doing in kernel. $ src/nft trace SRC=192.168.7.1 DST=192.168.7.10 LEN=88 TTL=64 ID=39366 PROTO=6 SPT=47028 DPT=22 raw prerouting (# handle 6) ip protocol tcp limit rate 1/second nftrace set 1 RETURN raw prerouting ACCEPT raw prerouting filter input (# handle 7) tcp dport ssh jump foo filter foo (# handle 5) ip daddr 192.168.7.10 filter foo (# handle 6) counter packets 912 bytes 63619 RETURN filter foo filter input (# handle 8) ip protocol tcp ip daddr 192.168.7.10 filter input (# handle 9) tcp dport ssh accept So basic functionality works on current nftables head. However, the code uses nfnetlink_log. It appears that its unicast only, i.e. its not possible to run another 'nft trace' in parallel and also clashes with ulogd or other running logger. Thats a non-starter. So I'm inclined to toss the nfnetlink_log based changes and instead (re-)use the infrastructure that we have for chain, rule, ... notifications. That would mean i'd also just plug the tracing notifications decoding into nft monitor. Is that ok? Any better ideas? Do you think its useful to also dump skb data (i.e. in/outdev, packet payload, etc?) Thanks, Florian ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: How to send nf trace notifications to userspace? 2015-11-06 23:29 How to send nf trace notifications to userspace? Florian Westphal @ 2015-11-07 16:38 ` Pablo Neira Ayuso 2015-11-07 20:49 ` Florian Westphal 0 siblings, 1 reply; 10+ messages in thread From: Pablo Neira Ayuso @ 2015-11-07 16:38 UTC (permalink / raw) To: Florian Westphal; +Cc: netfilter-devel On Sat, Nov 07, 2015 at 12:29:00AM +0100, Florian Westphal wrote: [...] > However, the code uses nfnetlink_log. It appears that its > unicast only, i.e. its not possible to run another 'nft trace' in parallel > and also clashes with ulogd or other running logger. > > Thats a non-starter. > > So I'm inclined to toss the nfnetlink_log based changes and > instead (re-)use the infrastructure that we have for chain, rule, ... notifications. Something that integrates tightly with the infrastructure is the way to go. I'd suggest you add a new NFNLGRP_NFTABLES_TRACE nfnetlink group. > [...] Also saves us from the 'rule counting' that we're currently > doing in kernel. I think we'll have to keep that counting around to provide the legacy output from the compat layer. > $ src/nft trace > SRC=192.168.7.1 DST=192.168.7.10 LEN=88 TTL=64 ID=39366 PROTO=6 SPT=47028 DPT=22 raw prerouting (# handle 6) > ACCEPT raw prerouting > filter input (# handle 7) > tcp dport ssh jump foo > filter foo (# handle 5) > ip daddr 192.168.7.10 > filter foo (# handle 6) > counter packets 912 bytes 63619 > RETURN filter foo > filter input (# handle 8) > ip protocol tcp ip daddr 192.168.7.10 > filter input (# handle 9) > tcp dport ssh accept I guess this is a draft proposal, but I think it would be good to have a more compact output that can be easily read, eg. matches: filter input tcp dport ssh accept # handle 9 > Do you think its useful to also dump skb data (i.e. in/outdev, > packet payload, etc?) Such information is useful to know what packet is matching what rules. If we have a way to uniquely identify the packet (probably part of the skbuff address as we do with conntrack ID from ctnetlink?), we can just dump this packet data and metadata only once when setting skb->nf_trace to 1, then from the trace event we only include the packet ID that results from matching a rule, so userspace can correlate. On a different front but related, I think this infrastructure may allow us to implement the packet path test infrastructure that we discussed many times before during netfilter workshops at some point. Kind of similar thing that Rusty made time ago with nfsim. Thanks for taking over this. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: How to send nf trace notifications to userspace? 2015-11-07 16:38 ` Pablo Neira Ayuso @ 2015-11-07 20:49 ` Florian Westphal 2015-11-09 13:36 ` Patrick McHardy 0 siblings, 1 reply; 10+ messages in thread From: Florian Westphal @ 2015-11-07 20:49 UTC (permalink / raw) To: Pablo Neira Ayuso; +Cc: Florian Westphal, netfilter-devel Pablo Neira Ayuso <pablo@netfilter.org> wrote: > > However, the code uses nfnetlink_log. It appears that its > > unicast only, i.e. its not possible to run another 'nft trace' in parallel > > and also clashes with ulogd or other running logger. > > > > Thats a non-starter. > > > > So I'm inclined to toss the nfnetlink_log based changes and > > instead (re-)use the infrastructure that we have for chain, rule, ... notifications. > > Something that integrates tightly with the infrastructure is the way > to go. I'd suggest you add a new NFNLGRP_NFTABLES_TRACE nfnetlink > group. Okay, can do. > > [...] Also saves us from the 'rule counting' that we're currently > > doing in kernel. > > I think we'll have to keep that counting around to provide the legacy > output from the compat layer. Can you elaborate? I find scavenging dmesg output rather painful so I see no reason to keep the current code once the new facility is available. In any case, i can keep it since I planned to see if we can add static key to avoid all the 'if (skb->nf_trace)' stuff for the normal case. But I'd want to see a convincing argument to keep the existing trace stuff first :-) > > $ src/nft trace > > SRC=192.168.7.1 DST=192.168.7.10 LEN=88 TTL=64 ID=39366 PROTO=6 SPT=47028 DPT=22 raw prerouting (# handle 6) > > ACCEPT raw prerouting > > filter input (# handle 7) > > tcp dport ssh jump foo > > filter foo (# handle 5) > > ip daddr 192.168.7.10 > > filter foo (# handle 6) > > counter packets 912 bytes 63619 > > RETURN filter foo > > filter input (# handle 8) > > ip protocol tcp ip daddr 192.168.7.10 > > filter input (# handle 9) > > tcp dport ssh accept > > I guess this is a draft proposal, but I think it would be good to have > a more compact output that can be easily read, eg. Yes, I have not thought about representation in userspace so far. On a related note, should that be part of 'nft monitor' or a distinct command? My gut feeling was to re-use monitor mode, re-use NFNLGRP_NFTABLES and just add a new message type (NFT_MSG_TRACE) for it, on the grounds that tracing is usually off on a production system so I don't see why we would have to prevent normal event listeners from seeing trace messages. But, your call. > matches: filter input tcp dport ssh accept # handle 9 looks more similar to standard '--handle' format, so its definitely better. I'll deal with this after all the interface details are hashed out. > > Do you think its useful to also dump skb data (i.e. in/outdev, > > packet payload, etc?) > > Such information is useful to know what packet is matching what rules. > > If we have a way to uniquely identify the packet (probably part of the > skbuff address as we do with conntrack ID from ctnetlink?), we can > just dump this packet data and metadata only once when setting > skb->nf_trace to 1, then from the trace event we only include the > packet ID that results from matching a rule, so userspace can > correlate. Ok, that seems sensible. What would you suggest? I'm inclined to go with nested attributes for this, i.e.: nest = nla_nest_start(skb, NFTA_TRACE_RULE); .. nla_put_be64(skb, NFTA_RULE_HANDLE, cpu_to_be64(rule->handle))) nla_put_string( NFTA_RULE_CHAIN, TABLE, etc... nla_nest_end() For the first trace message (when skb->nf_trace changes to 1), this would mean we'd have something like: nest = nla_nest_start(skb, NFTA_TRACE_PACKET) -> NFULA_PACKET_HDR, -> NFULA_PREFIX etc. I'd see if we can do this by refactoring nfnetlink_log.c to provide a helper to fill in this nested attibute data to avoid copy&paste of __build_packet_message(). It could then be called by nft_meta.c when nf_trace is set on an skb. Downside is that this would create module dependency on nfnetlink_log. What do you think? Alternative is to copy&paste, but trim it down to not include all the info nfnetlink_log offers, f.e. i'm not sure iif, oif, etc are needed given this can be filtered by setting the nftables meta nftrace rules appropriately. This also has implications on nft dependencies -- should it use libnfnetlink_log? With reduced attributes-we-log that could be avoided. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: How to send nf trace notifications to userspace? 2015-11-07 20:49 ` Florian Westphal @ 2015-11-09 13:36 ` Patrick McHardy 2015-11-12 15:46 ` [PATCH RFC] netfilter: nf_tables: extend tracing infrastructure Florian Westphal 0 siblings, 1 reply; 10+ messages in thread From: Patrick McHardy @ 2015-11-09 13:36 UTC (permalink / raw) To: Florian Westphal; +Cc: Pablo Neira Ayuso, netfilter-devel On 07.11, Florian Westphal wrote: > Pablo Neira Ayuso <pablo@netfilter.org> wrote: > > > Do you think its useful to also dump skb data (i.e. in/outdev, > > > packet payload, etc?) > > > > Such information is useful to know what packet is matching what rules. > > > > If we have a way to uniquely identify the packet (probably part of the > > skbuff address as we do with conntrack ID from ctnetlink?), we can > > just dump this packet data and metadata only once when setting > > skb->nf_trace to 1, then from the trace event we only include the > > packet ID that results from matching a rule, so userspace can > > correlate. > > Ok, that seems sensible. > What would you suggest? > > I'm inclined to go with nested attributes for this, i.e.: > > nest = nla_nest_start(skb, NFTA_TRACE_RULE); > .. > nla_put_be64(skb, NFTA_RULE_HANDLE, cpu_to_be64(rule->handle))) > nla_put_string( NFTA_RULE_CHAIN, TABLE, etc... > nla_nest_end() > > For the first trace message (when skb->nf_trace changes to 1), > this would mean we'd have something like: > > nest = nla_nest_start(skb, NFTA_TRACE_PACKET) > -> NFULA_PACKET_HDR, > -> NFULA_PREFIX > > etc. > > I'd see if we can do this by refactoring nfnetlink_log.c to provide > a helper to fill in this nested attibute data to avoid copy&paste of > __build_packet_message(). > > It could then be called by nft_meta.c when nf_trace is set on an skb. > Downside is that this would create module dependency on nfnetlink_log. On dumping the packet meta data - one idea would be to have the user specify what he is interested in using the existing expressions. We could simply dump that data to the registers and include the raw data in the message, nft has enough information to decode it. The upside would be that we don't have to unconditionally include all data and the user can also dump "unusual" parts in case he's interested in them. ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH RFC] netfilter: nf_tables: extend tracing infrastructure 2015-11-09 13:36 ` Patrick McHardy @ 2015-11-12 15:46 ` Florian Westphal 2015-11-14 18:50 ` Patrick McHardy 2015-11-16 21:56 ` Pablo Neira Ayuso 0 siblings, 2 replies; 10+ messages in thread From: Florian Westphal @ 2015-11-12 15:46 UTC (permalink / raw) To: netfilter-devel; +Cc: Florian Westphal RFC for the kernel-side of the tracing changes. Caveats+Questions are: - should kernel dump entire rule instead of just the handle? PRO of dumping entire rule matched: * Userspace doesn't need to dump rules first to display the rule that matched * nft trace doesn't need to also monitor ruleset for changes * would make it possible to use a new nfnetlink group as Pablo suggested CON: * needs more memory in kernel space (can't use NLMSG_GOODSIZE - allocation uses GFP_ATOMIC) This patch only dumps the rule handle. As you can see, I reimplemented parts of nfqueue/nfnetlink_log attributes. Problem is that I don't want to add dependencies on nfetlink_log for this, but things like iif and oif are rather imporant, esp. when debugging packets being forwarded. The first 128 bytes of the packet data is also dumped to userspace, currently limited to when we do the initial 'skb->nf_trace=1' assignment in nft_meta.c Patrick, I saw your idea wrt. dumping register contents instead, but it seems both complex and 'backwards' to me -- isn't the 'meta set nftrace' rule already a selector, i.e. user would say something like 'tcp port 22 limit rate 1/second meta nftrace set 1' So I'm not sure its right to extend nftrace with additional selectors (its also possible that I failed to understand what you were suggesting 8-} ) The userspace side currently lives in libnftnl, with new api functions: struct nftnl_trace *nftnl_trace_alloc(void); void nftnl_trace_free(struct nftnl_trace *trace); bool nftnl_trace_is_set(const struct nftnl_trace *trace, uint16_t type); const void *nftnl_trace_get_data(const struct nftnl_trace *trace, uint16_t type, uint32_t *data_len); uint32_t nftnl_trace_get_u32(const struct nftnl_trace *trace, uint16_t type); uint64_t nftnl_trace_get_u64(const struct nftnl_trace *trace, uint16_t type); const char *nftnl_trace_get_str(const struct nftnl_trace *trace, uint16_t type); int nftnl_trace_nlmsg_parse(const struct nlmsghdr *nlh, struct nftnl_trace *t); int nftnl_trace_snprintf(char *buf, size_t size, const struct nftnl_trace *t, uint32_t type); int nftnl_trace_fprintf(FILE *fh, const struct nftnl_trace *t, uint32_t type); The nftnl_trace struct is readonly (thus no setters). I'd then extend nft to use those helpers. Cheers, Florian -- include/net/netfilter/nf_tables.h | 6 ++ include/uapi/linux/netfilter/nf_tables.h | 27 ++++++++ net/netfilter/nf_tables_api.c | 114 +++++++++++++++++++++++++++++++ net/netfilter/nf_tables_core.c | 63 +++++------------ net/netfilter/nft_meta.c | 3 + 5 files changed, 166 insertions(+), 47 deletions(-) diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h index c9149cc..a92be10 100644 --- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -878,6 +878,12 @@ void nft_unregister_chain_type(const struct nf_chain_type *); int nft_register_expr(struct nft_expr_type *); void nft_unregister_expr(struct nft_expr_type *); +void nf_tables_trace_notify(const struct nft_pktinfo *pkt, + const struct nft_chain *chain, + const struct nft_rule *rule, + u32 verdict, + enum nft_trace_types type); + #define nft_dereference(p) \ nfnl_dereference(p, NFNL_SUBSYS_NFTABLES) diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h index d8c8a7c..c7584dc 100644 --- a/include/uapi/linux/netfilter/nf_tables.h +++ b/include/uapi/linux/netfilter/nf_tables.h @@ -83,6 +83,7 @@ enum nft_verdicts { * @NFT_MSG_DELSETELEM: delete a set element (enum nft_set_elem_attributes) * @NFT_MSG_NEWGEN: announce a new generation, only for events (enum nft_gen_attributes) * @NFT_MSG_GETGEN: get the rule-set generation (enum nft_gen_attributes) + * @NFT_MSG_TRACE: trace event (enum nft_trace_attributes) */ enum nf_tables_msg_types { NFT_MSG_NEWTABLE, @@ -102,6 +103,7 @@ enum nf_tables_msg_types { NFT_MSG_DELSETELEM, NFT_MSG_NEWGEN, NFT_MSG_GETGEN, + NFT_MSG_TRACE, NFT_MSG_MAX, }; @@ -970,4 +972,29 @@ enum nft_gen_attributes { }; #define NFTA_GEN_MAX (__NFTA_GEN_MAX - 1) +enum nft_trace_attibutes { + NFTA_TRACE_UNSPEC, + NFTA_TRACE_CHAIN, + NFTA_TRACE_ID, + NFTA_TRACE_IIF, + NFTA_TRACE_OIF, + NFTA_TRACE_MARK, + NFTA_TRACE_PAYLOAD, + NFTA_TRACE_TABLE, + NFTA_TRACE_TYPE, + NFTA_TRACE_RULE_HANDLE, + NFTA_TRACE_VERDICT, + __NFTA_TRACE_MAX +}; +#define NFTA_TRACE_MAX (__NFTA_TRACE_MAX - 1) + +enum nft_trace_types { + NFT_TRACETYPE_UNSPEC, + NFT_TRACETYPE_PACKET, + NFT_TRACETYPE_POLICY, + NFT_TRACETYPE_RETURN, + NFT_TRACETYPE_RULE, + __NFT_TRACETYPE_MAX +}; +#define NFT_TRACETYPE_MAX (__NFT_TRACETYPE_MAX - 1) #endif /* _LINUX_NF_TABLES_H */ diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 93cc473..65a6635 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -9,6 +9,7 @@ */ #include <linux/module.h> +#include <linux/hash.h> #include <linux/init.h> #include <linux/list.h> #include <linux/skbuff.h> @@ -499,6 +500,119 @@ err: return err; } +void nf_tables_trace_notify(const struct nft_pktinfo *pkt, + const struct nft_chain *chain, + const struct nft_rule *rule, + u32 verdict, + enum nft_trace_types type) +{ + static const unsigned int hdrlen = 128; + struct nfgenmsg *nfmsg; + struct nlmsghdr *nlh; + struct sk_buff *skb; + unsigned int size; + int event = (NFNL_SUBSYS_NFTABLES << 8) | NFT_MSG_TRACE; + + if (!nfnetlink_has_listeners(pkt->net, NFNLGRP_NFTABLES)) + return; + + /* Unlike other notifiers we need GFP_ATOMIC so use actual size + * needed instead of NLMSG_GOODSIZE. + */ + size = nlmsg_total_size(sizeof(struct nfgenmsg)) + + nla_total_size(sizeof(__be32)) /* trace type */ + + nla_total_size(NFT_TABLE_MAXNAMELEN) + + nla_total_size(NFT_CHAIN_MAXNAMELEN) + + nla_total_size(sizeof(u32)) /* iif */ + + nla_total_size(sizeof(u32)) /* oif */ + + nla_total_size(sizeof(u32)) /* id */ + + nla_total_size(sizeof(u32)) /* mark */ + + nla_total_size(sizeof(u32)) /* verdict */ + + nla_total_size(sizeof(__be64)); /* rule handle */ + + switch (type) { + case NFT_TRACETYPE_PACKET: + size += nla_total_size(hdrlen); + break; + default: + break; + } + + skb = nlmsg_new(size, GFP_ATOMIC); + if (skb == NULL) + return; + + nlh = nlmsg_put(skb, 0, 0, event, sizeof(struct nfgenmsg), 0); + if (nlh == NULL) + goto nla_put_failure; + + nfmsg = nlmsg_data(nlh); + nfmsg->nfgen_family = pkt->pf; + nfmsg->version = NFNETLINK_V0; + nfmsg->res_id = htons(pkt->net->nft.base_seq & 0xffff); + + if (nla_put_be32(skb, NFTA_TRACE_TYPE, htonl(type))) + goto nla_put_failure; + + if (nla_put_be32(skb, NFTA_TRACE_ID, htonl(hash_ptr(skb, 32)))) + goto nla_put_failure; + + if (chain) { + if (nla_put_string(skb, NFTA_TRACE_TABLE, chain->table->name)) + goto nla_put_failure; + if (nla_put_string(skb, NFTA_TRACE_CHAIN, chain->name)) + goto nla_put_failure; + } + + if (rule && nla_put_be64(skb, NFTA_TRACE_RULE_HANDLE, + cpu_to_be64(rule->handle))) + goto nla_put_failure; + + if (pkt->in && + nla_put_be32(skb, NFTA_TRACE_IIF, htonl(pkt->in->ifindex))) + goto nla_put_failure; + if (pkt->out && + nla_put_be32(skb, NFTA_TRACE_OIF, htonl(pkt->out->ifindex))) + goto nla_put_failure; + if (pkt->skb->mark && + nla_put_be32(skb, NFTA_TRACE_MARK, htonl(pkt->skb->mark))) + goto nla_put_failure; + + switch (type) { + case NFT_TRACETYPE_POLICY: + case NFT_TRACETYPE_RETURN: + case NFT_TRACETYPE_RULE: + if (nla_put_be32(skb, NFTA_TRACE_VERDICT, htonl(verdict))) + goto nla_put_failure; + break; + case NFT_TRACETYPE_PACKET: { + unsigned int plen = min(hdrlen, pkt->skb->len); + struct nlattr *nla; + + if (skb_tailroom(skb) < nla_total_size(plen)) + goto nla_put_failure; + + nla = (struct nlattr *)skb_put(skb, nla_total_size(plen)); + nla->nla_type = NFTA_TRACE_PAYLOAD; + nla->nla_len = nla_attr_size(plen); + + if (skb_copy_bits(pkt->skb, 0, nla_data(nla), plen)) + goto nla_put_failure; + } + break; + default: + break; + } + + nlmsg_end(skb, nlh); + nfnetlink_send(skb, pkt->net, 0, NFNLGRP_NFTABLES, 0, GFP_ATOMIC); + return; + + nla_put_failure: + WARN_ON_ONCE(1); + kfree_skb(skb); +} + static int nf_tables_dump_tables(struct sk_buff *skb, struct netlink_callback *cb) { diff --git a/net/netfilter/nf_tables_core.c b/net/netfilter/nf_tables_core.c index f3695a4..731909d 100644 --- a/net/netfilter/nf_tables_core.c +++ b/net/netfilter/nf_tables_core.c @@ -22,44 +22,14 @@ #include <net/netfilter/nf_tables.h> #include <net/netfilter/nf_log.h> -enum nft_trace { - NFT_TRACE_RULE, - NFT_TRACE_RETURN, - NFT_TRACE_POLICY, -}; - -static const char *const comments[] = { - [NFT_TRACE_RULE] = "rule", - [NFT_TRACE_RETURN] = "return", - [NFT_TRACE_POLICY] = "policy", -}; - -static struct nf_loginfo trace_loginfo = { - .type = NF_LOG_TYPE_LOG, - .u = { - .log = { - .level = LOGLEVEL_WARNING, - .logflags = NF_LOG_MASK, - }, - }, -}; - -static void __nft_trace_packet(const struct nft_pktinfo *pkt, - const struct nft_chain *chain, - int rulenum, enum nft_trace type) -{ - nf_log_trace(pkt->net, pkt->pf, pkt->hook, pkt->skb, pkt->in, - pkt->out, &trace_loginfo, "TRACE: %s:%s:%s:%u ", - chain->table->name, chain->name, comments[type], - rulenum); -} - static inline void nft_trace_packet(const struct nft_pktinfo *pkt, const struct nft_chain *chain, - int rulenum, enum nft_trace type) + const struct nft_rule *rule, + u32 verdict, + enum nft_trace_types type) { if (unlikely(pkt->skb->nf_trace)) - __nft_trace_packet(pkt, chain, rulenum, type); + nf_tables_trace_notify(pkt, chain, rule, verdict, type); } static void nft_cmp_fast_eval(const struct nft_expr *expr, @@ -105,7 +75,6 @@ static bool nft_payload_fast_eval(const struct nft_expr *expr, struct nft_jumpstack { const struct nft_chain *chain; const struct nft_rule *rule; - int rulenum; }; unsigned int @@ -119,11 +88,9 @@ nft_do_chain(struct nft_pktinfo *pkt, void *priv) unsigned int stackptr = 0; struct nft_jumpstack jumpstack[NFT_JUMP_STACK_SIZE]; struct nft_stats *stats; - int rulenum; unsigned int gencursor = nft_genmask_cur(net); do_chain: - rulenum = 0; rule = list_entry(&chain->rules, struct nft_rule, list); next_rule: regs.verdict.code = NFT_CONTINUE; @@ -133,8 +100,6 @@ next_rule: if (unlikely(rule->genmask & (1 << gencursor))) continue; - rulenum++; - nft_rule_for_each_expr(expr, last, rule) { if (expr->ops == &nft_cmp_fast_ops) nft_cmp_fast_eval(expr, ®s); @@ -151,7 +116,8 @@ next_rule: regs.verdict.code = NFT_CONTINUE; continue; case NFT_CONTINUE: - nft_trace_packet(pkt, chain, rulenum, NFT_TRACE_RULE); + nft_trace_packet(pkt, chain, rule, + regs.verdict.code, NFT_TRACETYPE_RULE); continue; } break; @@ -161,7 +127,9 @@ next_rule: case NF_ACCEPT: case NF_DROP: case NF_QUEUE: - nft_trace_packet(pkt, chain, rulenum, NFT_TRACE_RULE); + nft_trace_packet(pkt, chain, rule, + regs.verdict.code & NF_VERDICT_MASK, + NFT_TRACETYPE_RULE); return regs.verdict.code; } @@ -170,19 +138,20 @@ next_rule: BUG_ON(stackptr >= NFT_JUMP_STACK_SIZE); jumpstack[stackptr].chain = chain; jumpstack[stackptr].rule = rule; - jumpstack[stackptr].rulenum = rulenum; stackptr++; /* fall through */ case NFT_GOTO: - nft_trace_packet(pkt, chain, rulenum, NFT_TRACE_RULE); + nft_trace_packet(pkt, chain, rule, + regs.verdict.code, NFT_TRACETYPE_RULE); chain = regs.verdict.chain; goto do_chain; case NFT_CONTINUE: - rulenum++; /* fall through */ case NFT_RETURN: - nft_trace_packet(pkt, chain, rulenum, NFT_TRACE_RETURN); + if (stackptr) + nft_trace_packet(pkt, chain, rule, + regs.verdict.code, NFT_TRACETYPE_RETURN); break; default: WARN_ON(1); @@ -192,11 +161,11 @@ next_rule: stackptr--; chain = jumpstack[stackptr].chain; rule = jumpstack[stackptr].rule; - rulenum = jumpstack[stackptr].rulenum; goto next_rule; } - nft_trace_packet(pkt, basechain, -1, NFT_TRACE_POLICY); + nft_trace_packet(pkt, basechain, NULL, + nft_base_chain(basechain)->policy, NFT_TRACETYPE_POLICY); rcu_read_lock_bh(); stats = this_cpu_ptr(rcu_dereference(nft_base_chain(basechain)->stats)); diff --git a/net/netfilter/nft_meta.c b/net/netfilter/nft_meta.c index e4ad2c2..857a652 100644 --- a/net/netfilter/nft_meta.c +++ b/net/netfilter/nft_meta.c @@ -200,6 +200,9 @@ void nft_meta_set_eval(const struct nft_expr *expr, skb->priority = value; break; case NFT_META_NFTRACE: + if (skb->nf_trace == 0) + nf_tables_trace_notify(pkt, NULL, NULL, 0, + NFT_TRACETYPE_PACKET); skb->nf_trace = 1; break; default: -- 2.4.10 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH RFC] netfilter: nf_tables: extend tracing infrastructure 2015-11-12 15:46 ` [PATCH RFC] netfilter: nf_tables: extend tracing infrastructure Florian Westphal @ 2015-11-14 18:50 ` Patrick McHardy 2015-11-16 21:56 ` Pablo Neira Ayuso 1 sibling, 0 replies; 10+ messages in thread From: Patrick McHardy @ 2015-11-14 18:50 UTC (permalink / raw) To: Florian Westphal; +Cc: netfilter-devel On 12.11, Florian Westphal wrote: > RFC for the kernel-side of the tracing changes. > > The first 128 bytes of the packet data is also dumped to userspace, currently > limited to when we do the initial 'skb->nf_trace=1' assignment in nft_meta.c > > Patrick, I saw your idea wrt. dumping register contents instead, but it > seems both complex and 'backwards' to me -- isn't the 'meta set nftrace' > rule already a selector, i.e. user would say something like > > 'tcp port 22 limit rate 1/second meta nftrace set 1' > > So I'm not sure its right to extend nftrace with additional selectors (its > also possible that I failed to understand what you were suggesting 8-} ) Yeah, it was about the data that we dump, to make that explicitly selectable. The idea was that you could explicitly specify to dump lets say tcp dport, skuid and cpu number. But it might not work since we'd have to propagate that selection along with the packet, so I guess, please forget about it :) ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH RFC] netfilter: nf_tables: extend tracing infrastructure 2015-11-12 15:46 ` [PATCH RFC] netfilter: nf_tables: extend tracing infrastructure Florian Westphal 2015-11-14 18:50 ` Patrick McHardy @ 2015-11-16 21:56 ` Pablo Neira Ayuso 2015-11-17 0:03 ` Florian Westphal 1 sibling, 1 reply; 10+ messages in thread From: Pablo Neira Ayuso @ 2015-11-16 21:56 UTC (permalink / raw) To: Florian Westphal; +Cc: netfilter-devel, kaber Hi Florian, Cc'ing Patrick to shake him for ideas :) On Thu, Nov 12, 2015 at 04:46:49PM +0100, Florian Westphal wrote: > RFC for the kernel-side of the tracing changes. > > Caveats+Questions are: > - should kernel dump entire rule instead of just the handle? > PRO of dumping entire rule matched: > * Userspace doesn't need to dump rules first to display > the rule that matched > * nft trace doesn't need to also monitor ruleset for changes > * would make it possible to use a new nfnetlink group as Pablo > suggested > > CON: > * needs more memory in kernel space (can't use NLMSG_GOODSIZE - > allocation uses GFP_ATOMIC) > > This patch only dumps the rule handle. Agreed, only rule handle should be fine. We have limited bandwidth delivering netlink event notifications from the packet path, so keeping the message small there looks like a good idea to me. > As you can see, I reimplemented parts of nfqueue/nfnetlink_log attributes. > Problem is that I don't want to add dependencies on nfetlink_log for this, > but things like iif and oif are rather imporant, esp. when debugging > packets being forwarded. > > The first 128 bytes of the packet data is also dumped to userspace, currently > limited to when we do the initial 'skb->nf_trace=1' assignment in nft_meta.c > > Patrick, I saw your idea wrt. dumping register contents instead, but it > seems both complex and 'backwards' to me -- isn't the 'meta set nftrace' > rule already a selector, i.e. user would say something like > > 'tcp port 22 limit rate 1/second meta nftrace set 1' > > So I'm not sure its right to extend nftrace with additional selectors (its > also possible that I failed to understand what you were suggesting 8-} ) I can see room to extend this infrastructure later on if we need to make it more flexible (eg. allow the user to specify what part of the packets need to be dumped to userspace). We can probably even add a specific 'trace' expression from the kernel to allow more specific selection on what packet fields you want to dump to userspace. BTW, any reason to remove the existing infrastructure? It's been there since almost the beginning (this would break users that are expecting the existing behaviour). Moreover, we'll have people using the iptables-compat infrastructure also expecting a similar output to native iptables. You can probably generalize the trace infrastructure through static key plus indirections to keep both tracing approaches around (allowing only one at the same time should be enough). >From userspace, we would need to have a way to indicate that we want to use the new infrastructure, not the classic tracing. That syntax would need some thinking. Probably we can introduce a 'trace' statement, so the syntax looks compact like this: 'tcp port 22 limit rate 1/second trace' Meaning in this case that the userspace is specifically requesting for the new kind of tracing. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH RFC] netfilter: nf_tables: extend tracing infrastructure 2015-11-16 21:56 ` Pablo Neira Ayuso @ 2015-11-17 0:03 ` Florian Westphal 2015-11-17 10:40 ` Pablo Neira Ayuso 0 siblings, 1 reply; 10+ messages in thread From: Florian Westphal @ 2015-11-17 0:03 UTC (permalink / raw) To: Pablo Neira Ayuso; +Cc: Florian Westphal, netfilter-devel, kaber Pablo Neira Ayuso <pablo@netfilter.org> wrote: > > > > So I'm not sure its right to extend nftrace with additional selectors (its > > also possible that I failed to understand what you were suggesting 8-} ) > > I can see room to extend this infrastructure later on if we need to > make it more flexible (eg. allow the user to specify what part of the > packets need to be dumped to userspace). We can probably even add a > specific 'trace' expression from the kernel to allow more specific > selection on what packet fields you want to dump to userspace. > > BTW, any reason to remove the existing infrastructure? It's been there > since almost the beginning (this would break users that are expecting > the existing behaviour). No specific reason, but I question the need to keep it around. Once we have native tooling there should be no reason whatsover to wade through dmesg output + manual matching of rules... its just a PITA and thats why I removed it -- 'nft monitor trace' (or whatever, I've not decided on what nft side should look like) would be much simpler/easier to use. > Moreover, we'll have people using the > iptables-compat infrastructure also expecting a similar output to > native iptables. Hmm, is iptables-compat a worthy focus? AFAIU even iptables-compat users would be able to use nft trace infrastructure, right? Wouldn't it make more sense to convince people to go with real nft rather than the compat layer? > You can probably generalize the trace infrastructure through static > key plus indirections to keep both tracing approaches around (allowing > only one at the same time should be enough). I'd like to see a stronger argument than this. Bottom line is that i want to keep kernel side as simple as possible, so I'd much rather just get rid of the printk-based tracing (I'm *NOT* talking about the ip(6)tables stuff, that remains as-is) rather than add more conditionals or indirect calls to some trace-backends... > From userspace, we would need to have a way to indicate that we want > to use the new infrastructure, not the classic tracing. That syntax > would need some thinking. Probably we can introduce a 'trace' > statement, so the syntax looks compact like this: > > 'tcp port 22 limit rate 1/second trace' > > Meaning in this case that the userspace is specifically requesting for > the new kind of tracing. Hmm. I'm not convinced there is any gain in doing this. Who in their right mind would want to manually decode rules...? That being said, I am not overly zealous about this, and I can keep the printk stuff around. I just think its not needed anymore. If you have new nft with old kernel: - you need to look and dmesg output (or syslog output if ypu use nfnetlink_log logging backend). if you have new nft with new kernel: - nft monitor trace (subject to change, I'm open to suggestions) If you have old nft with new kernel: - possibly a problem, but then again, if you can update kernel, why not use a newer nft as well...? If you don't mind, i would prefer to work on this patch + nft + libnftables integration and then submit that formally. If you're then still convinced that its too problematic to remove the printk based trace i'd consider to rework the kernel patch. [ I'm not in a hurry, we can discuss this at netdev/netconf if you like ] Let me know. Thanks && regards, Florian ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH RFC] netfilter: nf_tables: extend tracing infrastructure 2015-11-17 0:03 ` Florian Westphal @ 2015-11-17 10:40 ` Pablo Neira Ayuso 2015-11-17 11:04 ` Florian Westphal 0 siblings, 1 reply; 10+ messages in thread From: Pablo Neira Ayuso @ 2015-11-17 10:40 UTC (permalink / raw) To: Florian Westphal; +Cc: netfilter-devel, kaber On Tue, Nov 17, 2015 at 01:03:52AM +0100, Florian Westphal wrote: > Pablo Neira Ayuso <pablo@netfilter.org> wrote: [...] > > > > BTW, any reason to remove the existing infrastructure? It's been there > > since almost the beginning (this would break users that are expecting > > the existing behaviour). > > No specific reason, but I question the need to keep it around. > Once we have native tooling there should be no reason whatsover > to wade through dmesg output + manual matching of rules... It is not only dmesg. You can also configure this thing to send trace messages through nfnetlink_log through group 0. > its just a PITA and thats why I removed it -- 'nft monitor trace' > (or whatever, I've not decided on what nft side should look like) > would be much simpler/easier to use. It's always a PITA to keep extra code around to keep things compatible, but there is no other way to deprecate things. Don't forget this tracing infrastructure has been there for nearly two years. [...] > Wouldn't it make more sense to convince people to go with real nft > rather than the compat layer? I wish there could be a way to "convice" users to move in a quick way, but there is not. We can just provide something better and wait for quite some time to deprecate things. > If you don't mind, i would prefer to work on this patch + nft + > libnftables integration and then submit that formally. I would like to see a nice netlink-based tracing infrastructure like the one you're currently shaping, no question about that. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH RFC] netfilter: nf_tables: extend tracing infrastructure 2015-11-17 10:40 ` Pablo Neira Ayuso @ 2015-11-17 11:04 ` Florian Westphal 0 siblings, 0 replies; 10+ messages in thread From: Florian Westphal @ 2015-11-17 11:04 UTC (permalink / raw) To: Pablo Neira Ayuso; +Cc: Florian Westphal, netfilter-devel, kaber Pablo Neira Ayuso <pablo@netfilter.org> wrote: > On Tue, Nov 17, 2015 at 01:03:52AM +0100, Florian Westphal wrote: > > Pablo Neira Ayuso <pablo@netfilter.org> wrote: > [...] > > > > > > BTW, any reason to remove the existing infrastructure? It's been there > > > since almost the beginning (this would break users that are expecting > > > the existing behaviour). > > > > No specific reason, but I question the need to keep it around. > > Once we have native tooling there should be no reason whatsover > > to wade through dmesg output + manual matching of rules... > > It is not only dmesg. > > You can also configure this thing to send trace messages through > nfnetlink_log through group 0. > > > its just a PITA and thats why I removed it -- 'nft monitor trace' > > (or whatever, I've not decided on what nft side should look like) > > would be much simpler/easier to use. > > It's always a PITA to keep extra code around to keep things > compatible, but there is no other way to deprecate things. > > Don't forget this tracing infrastructure has been there for nearly two > years. Hmm, I had hoped that we can just ignore that since its not an abi and lack of these printks would apply some gentle pressure to switch to native nft ;) > > Wouldn't it make more sense to convince people to go with real nft > > rather than the compat layer? > > I wish there could be a way to "convice" users to move in a quick way, > but there is not. We can just provide something better and wait for > quite some time to deprecate things. I have no idea how to make this nice :-( I don't want to clutter dmesg/nfnetlink_log when someone is using nft trace. So I'd propose to just check in the kernel trace event path if there is a nfnl group, and if there is one, disable the printk forever: if (old && nfnetlink_has_listeners(pkt->net, NFNLGRP_NFTABLES)) old = false; I see no other way to really make this play nice. Any other ideas? I'd rather not add yet another sysctl toggle... On a different note, here is how userspace output looks like at the moment: # src/nft -a monitor trace trace event ip packet src 192.168.7.1 dst 192.168.7.10 len 96 ttl 64 id 31765 proto 6 sport 53434 dport 22 trace event ip raw prerouting rule verdict continue trace rule tcp dport ssh limit rate 1/second nftrace set 1 # handle 2 trace event ip raw prerouting policy verdict accept trace event ip filter input rule verdict jump trace rule ip saddr . tcp dport vmap { } # handle 7 trace event ip filter test rule verdict accept trace rule accept # handle 4 add rule ip filter input ip protocol icmp accept # handle 15 trace event ip packet src 192.168.7.1 dst 192.168.7.10 len 52 ttl 64 id 28163 proto 6 sport 53420 dport 22 trace event ip raw prerouting rule verdict continue trace rule tcp dport ssh limit rate 1/second nftrace set 1 # handle 2 trace event ip raw prerouting policy verdict accept trace event ip filter input rule verdict jump delete rule ip filter input handle 10 trace rule ip saddr . tcp dport vmap { } # handle 7 trace event ip filter test rule verdict accept trace rule accept # handle 4 trace event ip packet src 192.168.7.1 dst 192.168.7.10 len 52 ttl 64 id ^C Input on how to improve this welcome. Most of the formatting is done via libnfnetlink helpers. Main other point is (what we discussed earlier) -- how to match up packet + the traced rules when several packets are traced at the same time. The kernel already provides a 32bit id value, based on skb address. Simple solution would be to just print that id as well, e.g. trace event packet 0x12345 src 192.168.7.1 dst 192.168.7.10 len 52 ttl 64 id 28163 proto 6 sport 53420 dport 22 trace rule packet 0x12345 tcp dport ssh ... trace event packet 0x42 src 192.168.1.1 dst 10... is that good enough or do you see a better/nicer solution? Thanks! ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2015-11-17 11:04 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2015-11-06 23:29 How to send nf trace notifications to userspace? Florian Westphal 2015-11-07 16:38 ` Pablo Neira Ayuso 2015-11-07 20:49 ` Florian Westphal 2015-11-09 13:36 ` Patrick McHardy 2015-11-12 15:46 ` [PATCH RFC] netfilter: nf_tables: extend tracing infrastructure Florian Westphal 2015-11-14 18:50 ` Patrick McHardy 2015-11-16 21:56 ` Pablo Neira Ayuso 2015-11-17 0:03 ` Florian Westphal 2015-11-17 10:40 ` Pablo Neira Ayuso 2015-11-17 11:04 ` Florian Westphal
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.