* [PATCH net] netfilter: Use consistent ct id hash calculation @ 2019-08-07 0:25 Dirk Morris 2019-08-07 0:34 ` Florian Westphal 0 siblings, 1 reply; 9+ messages in thread From: Dirk Morris @ 2019-08-07 0:25 UTC (permalink / raw) To: netfilter-devel Currently the new ct id is a hash based on several attributes which do not change throughout lifetime of the connection. However, the hash includes the reply side tuple which does change during NAT on the first packet. This means the ct id associated with the first packet of a connection pre-NAT is different than it is post-NAT. This means if you are using nfqueue or nflog in userspace the associated ct id changes from pre-NAT on the first packet to post-NAT on the first packet or all subsequent packets. Below is a patch that (I think) provides the same level of uniqueness of the hash, but is consistent through the lifetime of the first packet and afterwards because it only uses the original direction tuple. Signed-off-by: Dirk Morris <dmorris@metaloft.com> --- nf_conntrack_core.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c index a542761..eae4898 100644 --- a/net/netfilter/nf_conntrack_core.c +++ b/net/netfilter/nf_conntrack_core.c @@ -471,7 +471,8 @@ u32 nf_ct_get_id(const struct nf_conn *ct) a = (unsigned long)ct; b = (unsigned long)ct->master ^ net_hash_mix(nf_ct_net(ct)); c = (unsigned long)ct->ext; - d = (unsigned long)siphash(&ct->tuplehash, sizeof(ct->tuplehash), + d = (unsigned long)siphash(&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple, + sizeof(ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple), &ct_id_seed); #ifdef CONFIG_64BIT return siphash_4u64((u64)a, (u64)b, (u64)c, (u64)d, &ct_id_seed); ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH net] netfilter: Use consistent ct id hash calculation 2019-08-07 0:25 [PATCH net] netfilter: Use consistent ct id hash calculation Dirk Morris @ 2019-08-07 0:34 ` Florian Westphal 2019-08-07 0:57 ` Dirk Morris 0 siblings, 1 reply; 9+ messages in thread From: Florian Westphal @ 2019-08-07 0:34 UTC (permalink / raw) To: Dirk Morris; +Cc: netfilter-devel Dirk Morris <dmorris@metaloft.com> wrote: > Currently the new ct id is a hash based on several attributes which do > not change throughout lifetime of the connection. However, the hash > includes the reply side tuple which does change during NAT on the > first packet. This means the ct id associated with the first packet of > a connection pre-NAT is different than it is post-NAT. This means if > you are using nfqueue or nflog in userspace the associated ct id > changes from pre-NAT on the first packet to post-NAT on the first > packet or all subsequent packets. > > Below is a patch that (I think) provides the same level of uniqueness > of the hash, but is consistent through the lifetime of the first > packet and afterwards because it only uses the original direction > tuple. This is unexpected, as the id function is only supposed to be called once the conntrack has been confirmed, at which point all NAT side effects are supposed to be done. In which case(s) does that assumption not hold? ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net] netfilter: Use consistent ct id hash calculation 2019-08-07 0:34 ` Florian Westphal @ 2019-08-07 0:57 ` Dirk Morris 2019-08-07 16:36 ` Pablo Neira Ayuso 0 siblings, 1 reply; 9+ messages in thread From: Dirk Morris @ 2019-08-07 0:57 UTC (permalink / raw) To: Florian Westphal; +Cc: netfilter-devel On 8/6/19 5:34 PM, Florian Westphal wrote: > > This is unexpected, as the id function is only supposed to be called > once the conntrack has been confirmed, at which point all NAT side > effects are supposed to be done. > > In which case(s) does that assumption not hold? > Yeah, I figured that might be the reasoning. In my case either from a queue event or log event in userspace. Which is always pre-confirmation for the first packet of any connection until late. psuedo-code: nft add table inet foo nft add chain inet foo prerouting "{ type filter hook prerouting priority -123 ; }" nft add chain inet foo postrouting "{ type filter hook postrouting priority -123 ; }" nft add rule inet foo prerouting queue num 1234 nft add rule inet foo postrouting queue num 1234 or nft add table inet foo nft add chain inet foo prerouting "{ type filter hook prerouting priority -123 ; }" nft add chain inet foo postrouting "{ type filter hook postrouting priority -123 ; }" nft add rule inet foo prerouting log prefix "prerouting" group 0 nft add rule inet foo postrouting log prefix "postrouting" group 0 and then in some userspace daemon something like: ct_len = nfq_get_ct_info(nfad, &ct_data); nfct_payload_parse((void *)ct_data,ct_len,l3num,ct) id = nfct_get_attr_u32(ct,ATTR_ID); or *data = nfnl_get_pointer_to_data(nfa->nfa,NFULA_CT,char) nfct_payload_parse((void *)data,ct_len,l3num,ct ) id = nfct_get_attr_u32(ct,ATTR_ID); Its just a convenience to have it not change if possible (assuming the proposal maintains the same level of uniqueness - not 100% sure on that). Listening to the conntrack events this does not happen, as you get the NEW event only after the ct is confirmed, but unfortunately you get the packet from queue and the log messages well before that. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net] netfilter: Use consistent ct id hash calculation 2019-08-07 0:57 ` Dirk Morris @ 2019-08-07 16:36 ` Pablo Neira Ayuso 2019-08-07 18:01 ` Florian Westphal 0 siblings, 1 reply; 9+ messages in thread From: Pablo Neira Ayuso @ 2019-08-07 16:36 UTC (permalink / raw) To: Dirk Morris; +Cc: Florian Westphal, netfilter-devel On Tue, Aug 06, 2019 at 05:57:50PM -0700, Dirk Morris wrote: > On 8/6/19 5:34 PM, Florian Westphal wrote: > > > > > This is unexpected, as the id function is only supposed to be called > > once the conntrack has been confirmed, at which point all NAT side > > effects are supposed to be done. > > > > In which case(s) does that assumption not hold? > > > > Yeah, I figured that might be the reasoning. > > In my case either from a queue event or log event in userspace. > Which is always pre-confirmation for the first packet of any connection until late. > > psuedo-code: > > nft add table inet foo > nft add chain inet foo prerouting "{ type filter hook prerouting priority -123 ; }" > nft add chain inet foo postrouting "{ type filter hook postrouting priority -123 ; }" > nft add rule inet foo prerouting queue num 1234 > nft add rule inet foo postrouting queue num 1234 > > or > > nft add table inet foo > nft add chain inet foo prerouting "{ type filter hook prerouting priority -123 ; }" > nft add chain inet foo postrouting "{ type filter hook postrouting priority -123 ; }" > nft add rule inet foo prerouting log prefix "prerouting" group 0 > nft add rule inet foo postrouting log prefix "postrouting" group 0 > > and then in some userspace daemon something like: > > ct_len = nfq_get_ct_info(nfad, &ct_data); > nfct_payload_parse((void *)ct_data,ct_len,l3num,ct) > id = nfct_get_attr_u32(ct,ATTR_ID); > > or > > *data = nfnl_get_pointer_to_data(nfa->nfa,NFULA_CT,char) > nfct_payload_parse((void *)data,ct_len,l3num,ct ) > id = nfct_get_attr_u32(ct,ATTR_ID); > > Its just a convenience to have it not change if possible (assuming the proposal > maintains the same level of uniqueness - not 100% sure on that). > > Listening to the conntrack events this does not happen, as you get the NEW event > only after the ct is confirmed, but unfortunately you get the packet from queue > and the log messages well before that. ct->ext also might change while packet is in transit, since new extension can be added too. &ct->tuplehash was not good either for event retransmission, since hnnode might change (when moving the object to the dying list), so ct->tuplehash[x].tuple looks good to me. @Florian: by mangling this patch not to use ct->ext, including Dirk's update, conntrackd works again (remember that bug we discussed during NFWS). @@ -470,8 +470,8 @@ u32 nf_ct_get_id(const struct nf_conn *ct) a = (unsigned long)ct; b = (unsigned long)ct->master ^ net_hash_mix(nf_ct_net(ct)); - c = (unsigned long)ct->ext; - d = (unsigned long)siphash(&ct->tuplehash[IP_CT_DIR_ORIGINAL], sizeof(ct->tuplehash[IP_CT_DIR_ORIGINAL]), + c = (unsigned long)0; + d = (unsigned long)siphash(&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple, sizeof(ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple); I think it's safe to turn this into: a = (unsigned long)ct; b = (unsigned long)ct->master; c = (unsigned long)nf_ct_net(ct)); d = (unsigned long)siphash(&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple, sizeof(ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple); ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net] netfilter: Use consistent ct id hash calculation 2019-08-07 16:36 ` Pablo Neira Ayuso @ 2019-08-07 18:01 ` Florian Westphal 2019-08-07 20:31 ` Pablo Neira Ayuso 0 siblings, 1 reply; 9+ messages in thread From: Florian Westphal @ 2019-08-07 18:01 UTC (permalink / raw) To: Pablo Neira Ayuso; +Cc: Dirk Morris, Florian Westphal, netfilter-devel Pablo Neira Ayuso <pablo@netfilter.org> wrote: > > Listening to the conntrack events this does not happen, as you get the NEW event > > only after the ct is confirmed, but unfortunately you get the packet from queue > > and the log messages well before that. > > ct->ext also might change while packet is in transit, since new > extension can be added too. Not after confirmation though. > &ct->tuplehash was not good either for event retransmission, since > hnnode might change (when moving the object to the dying list), so > ct->tuplehash[x].tuple looks good to me. Oh, thats a fair point (dying list), had not considered that. So this needs fixing in any case. > @Florian: by mangling this patch not to use ct->ext, including Dirk's > update, conntrackd works again (remember that bug we discussed during > NFWS). But conntrackd is still borken. It can't rely on id recycling -- it will just take a lot longer before it starts to fill up. > @@ -470,8 +470,8 @@ u32 nf_ct_get_id(const struct nf_conn *ct) > > a = (unsigned long)ct; > b = (unsigned long)ct->master ^ net_hash_mix(nf_ct_net(ct)); > - c = (unsigned long)ct->ext; > - d = (unsigned long)siphash(&ct->tuplehash[IP_CT_DIR_ORIGINAL], sizeof(ct->tuplehash[IP_CT_DIR_ORIGINAL]), > + c = (unsigned long)0; > + d = (unsigned long)siphash(&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple, sizeof(ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple); > I think it's safe to turn this into: > > a = (unsigned long)ct; > b = (unsigned long)ct->master; > c = (unsigned long)nf_ct_net(ct)); > d = (unsigned long)siphash(&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple, sizeof(ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple); No, not if we allow using the function before confirmation, the tuple can also change in original dir when e.g. queuing before NAT hooks. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net] netfilter: Use consistent ct id hash calculation 2019-08-07 18:01 ` Florian Westphal @ 2019-08-07 20:31 ` Pablo Neira Ayuso 2019-08-07 23:45 ` Florian Westphal 0 siblings, 1 reply; 9+ messages in thread From: Pablo Neira Ayuso @ 2019-08-07 20:31 UTC (permalink / raw) To: Florian Westphal; +Cc: Dirk Morris, netfilter-devel On Wed, Aug 07, 2019 at 08:01:57PM +0200, Florian Westphal wrote: > Pablo Neira Ayuso <pablo@netfilter.org> wrote: > > @Florian: by mangling this patch not to use ct->ext, including Dirk's > > update, conntrackd works again (remember that bug we discussed during > > NFWS). > > But conntrackd is still borken. > It can't rely on id recycling -- it will just take a lot > longer before it starts to fill up. Conntrackd does not rely on ID recycling. Conntrackd is in trouble because of event loss. It seems the event re-delivery routine is buggy, if the destroy event gets to userspace sooner or later, then this entry would not get stuck in the cache forever. I can just remove the check for the ID in userspace, so conntrackd would get rid of the stale entry by when a new entry with the same tuple shows up (lazy garbage collection). > > @@ -470,8 +470,8 @@ u32 nf_ct_get_id(const struct nf_conn *ct) > > > > a = (unsigned long)ct; > > b = (unsigned long)ct->master ^ net_hash_mix(nf_ct_net(ct)); > > - c = (unsigned long)ct->ext; > > - d = (unsigned long)siphash(&ct->tuplehash[IP_CT_DIR_ORIGINAL], sizeof(ct->tuplehash[IP_CT_DIR_ORIGINAL]), > > + c = (unsigned long)0; > > + d = (unsigned long)siphash(&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple, sizeof(ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple); > > > I think it's safe to turn this into: > > > > a = (unsigned long)ct; > > b = (unsigned long)ct->master; > > c = (unsigned long)nf_ct_net(ct)); > > d = (unsigned long)siphash(&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple, sizeof(ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple); > > No, not if we allow using the function before confirmation, the tuple > can also change in original dir when e.g. queuing before NAT hooks. Tuple could be artificially built from original source as source and reply source as destination, those never change IIRC. This hash-based ID calculation is a simple approach, but it looks weak / easy to break. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net] netfilter: Use consistent ct id hash calculation 2019-08-07 20:31 ` Pablo Neira Ayuso @ 2019-08-07 23:45 ` Florian Westphal 2019-08-08 5:01 ` Dirk Morris 0 siblings, 1 reply; 9+ messages in thread From: Florian Westphal @ 2019-08-07 23:45 UTC (permalink / raw) To: Pablo Neira Ayuso; +Cc: Florian Westphal, Dirk Morris, netfilter-devel Pablo Neira Ayuso <pablo@netfilter.org> wrote: > On Wed, Aug 07, 2019 at 08:01:57PM +0200, Florian Westphal wrote: > > Pablo Neira Ayuso <pablo@netfilter.org> wrote: > > > @Florian: by mangling this patch not to use ct->ext, including Dirk's > > > update, conntrackd works again (remember that bug we discussed during > > > NFWS). > > > > But conntrackd is still borken. > > It can't rely on id recycling -- it will just take a lot > > longer before it starts to fill up. > > Conntrackd does not rely on ID recycling. Conntrackd is in trouble > because of event loss. It seems the event re-delivery routine is > buggy, if the destroy event gets to userspace sooner or later, then > this entry would not get stuck in the cache forever. I can just remove > the check for the ID in userspace, so conntrackd would get rid of the > stale entry by when a new entry with the same tuple shows up (lazy > garbage collection). Pardon my ignorance, but why is the id compared at all in conntrackd? > > > @@ -470,8 +470,8 @@ u32 nf_ct_get_id(const struct nf_conn *ct) > > > > > > a = (unsigned long)ct; > > > b = (unsigned long)ct->master ^ net_hash_mix(nf_ct_net(ct)); > > > - c = (unsigned long)ct->ext; > > > - d = (unsigned long)siphash(&ct->tuplehash[IP_CT_DIR_ORIGINAL], sizeof(ct->tuplehash[IP_CT_DIR_ORIGINAL]), > > > + c = (unsigned long)0; > > > + d = (unsigned long)siphash(&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple, sizeof(ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple); > > > > > I think it's safe to turn this into: > > > > > > a = (unsigned long)ct; > > > b = (unsigned long)ct->master; > > > c = (unsigned long)nf_ct_net(ct)); > > > d = (unsigned long)siphash(&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple, sizeof(ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple); > > > > No, not if we allow using the function before confirmation, the tuple > > can also change in original dir when e.g. queuing before NAT hooks. My claim is bogus, only the reply tuple is altered of course. So Pablos suggestion above should work just fine. Dirk, can you spin a v2 with that change? > This hash-based ID calculation is a simple approach, but it looks weak > / easy to break. It works fine for what its intended, I did not consider the node pointer change after we link the ct to the dying list. Might also resolve the earlier reported bug wrt. conntrackd table exhaustion as the id changed after relink to dying list. Alternative is to use IDA and allocate ID based on object, thats very similar to the original (u32)addr but we need to store the additional identifier in the conntrack object which i tried to avoid. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net] netfilter: Use consistent ct id hash calculation 2019-08-07 23:45 ` Florian Westphal @ 2019-08-08 5:01 ` Dirk Morris 2019-08-08 10:28 ` Florian Westphal 0 siblings, 1 reply; 9+ messages in thread From: Dirk Morris @ 2019-08-08 5:01 UTC (permalink / raw) To: Florian Westphal, Pablo Neira Ayuso; +Cc: netfilter-devel On 8/7/19 4:45 PM, Florian Westphal wrote: > > So Pablos suggestion above should work just fine. > Dirk, can you spin a v2 with that change? > Yes, will do tomorrow. Also, just an idea, I also played around with just adding u32 id to struct nf_conn and just calculating the hash inside __nf_conntack_alloc when initialized or even lazily in nf_ct_get_id. This seems to work fine and you don't have to worry about anything changing and only calculate the hash once. I'm presuming this method was avoided for some reason, like keeping the struct size to a minimum. --- include/net/netfilter/nf_conntrack.h | 3 +++ net/netfilter/nf_conntrack_core.c | 30 +++++++++++++++--------------- 2 files changed, 18 insertions(+), 15 deletions(-) diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h index 93bbae8..9772ddc 100644 --- a/include/net/netfilter/nf_conntrack.h +++ b/include/net/netfilter/nf_conntrack.h @@ -74,6 +74,9 @@ struct nf_conn { /* jiffies32 when this ct is considered dead */ u32 timeout; + /* ct id */ + u32 id; + possible_net_t ct_net; #if IS_ENABLED(CONFIG_NF_NAT) diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c index ab73c5f..614fd86 100644 --- a/net/netfilter/nf_conntrack_core.c +++ b/net/netfilter/nf_conntrack_core.c @@ -312,21 +312,7 @@ EXPORT_SYMBOL_GPL(nf_ct_invert_tuple); */ u32 nf_ct_get_id(const struct nf_conn *ct) { - static __read_mostly siphash_key_t ct_id_seed; - unsigned long a, b, c, d; - - net_get_random_once(&ct_id_seed, sizeof(ct_id_seed)); - - a = (unsigned long)ct; - b = (unsigned long)ct->master ^ net_hash_mix(nf_ct_net(ct)); - c = (unsigned long)ct->ext; - d = (unsigned long)siphash(&ct->tuplehash, sizeof(ct->tuplehash), - &ct_id_seed); -#ifdef CONFIG_64BIT - return siphash_4u64((u64)a, (u64)b, (u64)c, (u64)d, &ct_id_seed); -#else - return siphash_4u32((u32)a, (u32)b, (u32)c, (u32)d, &ct_id_seed); -#endif + return ct->id; } EXPORT_SYMBOL_GPL(nf_ct_get_id); @@ -1178,6 +1164,7 @@ __nf_conntrack_alloc(struct net *net, gfp_t gfp, u32 hash) { struct nf_conn *ct; + static __read_mostly siphash_key_t ct_id_seed; /* We don't want any race condition at early drop stage */ atomic_inc(&net->ct.count); @@ -1215,6 +1202,19 @@ __nf_conntrack_alloc(struct net *net, nf_ct_zone_add(ct, zone); + unsigned long a, b, c; + net_get_random_once(&ct_id_seed, sizeof(ct_id_seed)); + a = (unsigned long)ct; + b = (unsigned long)ct->master ^ net_hash_mix(nf_ct_net(ct)); + c = (unsigned long)siphash(&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple, + sizeof(ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple), + &ct_id_seed); +#ifdef CONFIG_64BIT + ct->id = siphash_3u64((u64)a, (u64)b, (u64)c, &ct_id_seed); +#else + ct->id = siphash_3u32((u32)a, (u32)b, (u32)c, &ct_id_seed); +#endif + /* Because we use RCU lookups, we set ct_general.use to zero before * this is inserted in any list. */ ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH net] netfilter: Use consistent ct id hash calculation 2019-08-08 5:01 ` Dirk Morris @ 2019-08-08 10:28 ` Florian Westphal 0 siblings, 0 replies; 9+ messages in thread From: Florian Westphal @ 2019-08-08 10:28 UTC (permalink / raw) To: Dirk Morris; +Cc: Florian Westphal, Pablo Neira Ayuso, netfilter-devel Dirk Morris <dmorris@metaloft.com> wrote: > On 8/7/19 4:45 PM, Florian Westphal wrote: > > So Pablos suggestion above should work just fine. > > Dirk, can you spin a v2 with that change? > > > > Yes, will do tomorrow. Thanks. > Also, just an idea, I also played around with just adding > u32 id to struct nf_conn and just calculating the hash inside > __nf_conntack_alloc when initialized or even lazily in nf_ct_get_id. > This seems to work fine and you don't have to worry about anything changing > and only calculate the hash once. > > I'm presuming this method was avoided for some reason, like keeping the struct > size to a minimum. Yes, exactly. If we go for storing id in the struct we could also just use a random value rather than computing a hash. ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2019-08-08 10:28 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-08-07 0:25 [PATCH net] netfilter: Use consistent ct id hash calculation Dirk Morris 2019-08-07 0:34 ` Florian Westphal 2019-08-07 0:57 ` Dirk Morris 2019-08-07 16:36 ` Pablo Neira Ayuso 2019-08-07 18:01 ` Florian Westphal 2019-08-07 20:31 ` Pablo Neira Ayuso 2019-08-07 23:45 ` Florian Westphal 2019-08-08 5:01 ` Dirk Morris 2019-08-08 10:28 ` Florian Westphal
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).