All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH nf-next 00/14] get rid of skb->nf_bridge pointer
@ 2015-04-01 20:36 Florian Westphal
  2015-04-01 20:36 ` [PATCH nf-next 01/14] netfilter: bridge: really save frag_max_size between PRE and POST_ROUTING Florian Westphal
                   ` (13 more replies)
  0 siblings, 14 replies; 19+ messages in thread
From: Florian Westphal @ 2015-04-01 20:36 UTC (permalink / raw)
  To: netfilter-devel, netdev

[ netdev hackers are encouraged to look at patches #2 and #11 specifically,
  those are the ones with largest impact outside netfilter land ]

Remove skb->nf_bridge pointer.

Unfortunately we still need some way to decide if skb is bridged
or not, else kfree_skb, skb_clone etc. would have to do costly lookups
in bridge netfilter.

We use a 2 bit state field in the skb for this purpose.
If its zero, skb is not bridged (same as skb->nf_bridge == NULL
in current kernel).

nf_bridge_info is stored in an rhashtable; bridge netfilter
and the few other places (nfqueue, nflog, physdev match) that need
to access bridge netfilter data do on-demand lookups in an rhashtable
to access the data associated with a bridged skb.

skb_clone and skb_copy will call into netfilter core
helpers for bridged skbs to duplicate the information if needed.
Likewise, kfree_skb removes and frees the bridge netfilter meta data
as well if needed.

In order to avoid those lookups where we're dealing with non-bridged
skbs, we store 2 bit state field in the skb.

Tested, on host connected to kvm-bridge:

ping -s $bignum $ip_behind_bridge

on bridge:
-j REDIRECT
-j DNAT --to-destination $ip_behind_bridge
-m physdev match with in/outdev match in FORWARD and INPUT (indev only)
- same w. active -j NFQUEUE.

 Patch 11 substitutes the pointer for on-demand lookups, most of the
 other patches prepare for this change by adding helpers and splitting
 state information into 'public' and 'bridge netfilter private'.

 The alternative to the rhashtable is to store the bridge netfilter
 metadata in skb->cb[], but there are some caveats since we need
 such metadata to survive local delivery too (else we'd break use of
 physdev match in INPUT).

 This is why external store was chosen.

 Feedback and suggestions welcome.

 include/linux/netfilter.h                  |    8 
 include/linux/netfilter_bridge.h           |  104 +++++-
 include/linux/skbuff.h                     |   87 ++---
 include/net/ip.h                           |    4 
 net/bridge/br_device.c                     |   19 -
 net/bridge/br_netfilter.c                  |  482 ++++++++++++++++++++---------
 net/bridge/br_private.h                    |    2 
 net/core/skbuff.c                          |    5 
 net/ipv4/ip_output.c                       |   30 +
 net/ipv4/netfilter/nf_defrag_ipv4.c        |    3 
 net/ipv4/netfilter/nf_reject_ipv4.c        |    6 
 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c  |    3 
 net/ipv6/netfilter/nf_reject_ipv6.c        |    6 
 net/netfilter/core.c                       |   49 ++
 net/netfilter/ipset/ip_set_hash_netiface.c |   32 +
 net/netfilter/nf_log_common.c              |    7 
 net/netfilter/nf_queue.c                   |   22 -
 net/netfilter/nfnetlink_log.c              |   17 -
 net/netfilter/nfnetlink_queue_core.c       |   34 +-
 net/netfilter/xt_physdev.c                 |   36 +-
 20 files changed, 670 insertions(+), 286 deletions(-)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH nf-next 01/14] netfilter: bridge: really save frag_max_size between PRE and POST_ROUTING
  2015-04-01 20:36 [PATCH nf-next 00/14] get rid of skb->nf_bridge pointer Florian Westphal
@ 2015-04-01 20:36 ` Florian Westphal
  2015-04-02  8:53   ` Pablo Neira Ayuso
  2015-04-01 20:36 ` [PATCH nf-next 02/14] net: untangle ip_fragment and bridge netfilter Florian Westphal
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 19+ messages in thread
From: Florian Westphal @ 2015-04-01 20:36 UTC (permalink / raw)
  To: netfilter-devel, netdev; +Cc: Florian Westphal

We also need to save/store in forward, else br_parse_ip_options call
will zero frag_max_size as well.

Fixes: 93fdd47e5 ('bridge: Save frag_max_size between PRE_ROUTING and POST_ROUTING')
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/bridge/br_netfilter.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index f3884a1..282ed76 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -686,6 +686,13 @@ static int br_nf_forward_finish(struct sk_buff *skb)
 	struct net_device *in;
 
 	if (!IS_ARP(skb) && !IS_VLAN_ARP(skb)) {
+		int frag_max_size;
+
+		if (skb->protocol == htons(ETH_P_IP)) {
+			frag_max_size = IPCB(skb)->frag_max_size;
+			BR_INPUT_SKB_CB(skb)->frag_max_size = frag_max_size;
+		}
+
 		in = nf_bridge->physindev;
 		if (nf_bridge->mask & BRNF_PKT_TYPE) {
 			skb->pkt_type = PACKET_OTHERHOST;
@@ -745,8 +752,14 @@ static unsigned int br_nf_forward_ip(const struct nf_hook_ops *ops,
 		nf_bridge->mask |= BRNF_PKT_TYPE;
 	}
 
-	if (pf == NFPROTO_IPV4 && br_parse_ip_options(skb))
-		return NF_DROP;
+	if (pf == NFPROTO_IPV4) {
+		int frag_max = BR_INPUT_SKB_CB(skb)->frag_max_size;
+
+		if (br_parse_ip_options(skb))
+			return NF_DROP;
+
+		IPCB(skb)->frag_max_size = frag_max;
+	}
 
 	nf_bridge->physoutdev = skb->dev;
 	if (pf == NFPROTO_IPV4)
-- 
2.0.5

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH nf-next 02/14] net: untangle ip_fragment and bridge netfilter
  2015-04-01 20:36 [PATCH nf-next 00/14] get rid of skb->nf_bridge pointer Florian Westphal
  2015-04-01 20:36 ` [PATCH nf-next 01/14] netfilter: bridge: really save frag_max_size between PRE and POST_ROUTING Florian Westphal
@ 2015-04-01 20:36 ` Florian Westphal
  2015-04-02  3:09   ` David Miller
  2015-04-01 20:36 ` [PATCH nf-next 03/14] netfilter: bridge: don't use nf_bridge_info data to store mac header Florian Westphal
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 19+ messages in thread
From: Florian Westphal @ 2015-04-01 20:36 UTC (permalink / raw)
  To: netfilter-devel, netdev; +Cc: Florian Westphal

Long time ago it was possible for the netfilter ip_conntrack
core to call ip_fragment in POST_ROUTING hook.

This is no longer the case, so the only case where bridge netfilter
ends up calling ip_fragment is the direct call site in br_netfilter.c.

Add mtu arguments to ip_fragment and remove the bridge netfilter mtu helper.

bridge netfilter uses following MTU choice strategy:

A - frag_max_size is set:
   We're looking at skb where at least one fragment had the DF bit set.
   Use frag_max_size as the MTU to re-create the original fragments as
   close as possible.

   If this would bring us over the MTU, silently drop the skb.
   (This can happen with bridge where not all devices have same MTU,
    but its difficult to imagine such setup that doesn't already have
    worse problems)

B - frag_max_size not set.
    This means original packet did not have DF bit set,
    just use the output device MTU for refragmentation.
    This is safe since any router in the path is free to re-fragment
    as needed.

C - skb too big and ignore_df not set.  Means the input device mtu is bigger
    than outdevice mtu.  Drop skb, setup should be fixed instead.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 I looked at various other alternatives for 'please undo defrag on this ipv4 packet'.
 But this is the most sensible thing to do.

 Keeping original fragments doesn't fly since we might have performed NAT
 or other manipluations on the (defragmented) skb.
 Building defrag/frag into GRO/GSO doesn't work either since the purpose
 of GRO is a lot different (we also can't rely on all fragments arriving
 in same napi context).

 Instead we refragment using size of largest fragment seen, or using
 the bridge device mtu.

 include/linux/netfilter_bridge.h |  7 -------
 include/net/ip.h                 |  4 ++--
 net/bridge/br_netfilter.c        | 40 +++++++++++++++++++++++++++++++++-------
 net/ipv4/ip_output.c             | 30 ++++++++++++++++++------------
 4 files changed, 53 insertions(+), 28 deletions(-)

diff --git a/include/linux/netfilter_bridge.h b/include/linux/netfilter_bridge.h
index 2734977..b131613 100644
--- a/include/linux/netfilter_bridge.h
+++ b/include/linux/netfilter_bridge.h
@@ -23,13 +23,6 @@ enum nf_br_hook_priorities {
 #define BRNF_8021Q			0x10
 #define BRNF_PPPoE			0x20
 
-static inline unsigned int nf_bridge_mtu_reduction(const struct sk_buff *skb)
-{
-	if (unlikely(skb->nf_bridge->mask & BRNF_PPPoE))
-		return PPPOE_SES_HLEN;
-	return 0;
-}
-
 int br_handle_frame_finish(struct sk_buff *skb);
 
 static inline void br_drop_fake_rtable(struct sk_buff *skb)
diff --git a/include/net/ip.h b/include/net/ip.h
index d0808a3..1fe203a 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -108,8 +108,8 @@ int ip_local_deliver(struct sk_buff *skb);
 int ip_mr_input(struct sk_buff *skb);
 int ip_output(struct sock *sk, struct sk_buff *skb);
 int ip_mc_output(struct sock *sk, struct sk_buff *skb);
-int ip_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *));
-int ip_do_nat(struct sk_buff *skb);
+int ip_fragment(struct sk_buff *skb, unsigned int mtu,
+		int (*output)(struct sk_buff *));
 void ip_send_check(struct iphdr *ip);
 int __ip_local_out(struct sk_buff *skb);
 int ip_local_out_sk(struct sock *sk, struct sk_buff *skb);
diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index 282ed76..c0c8700 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -837,30 +837,56 @@ static int br_nf_push_frag_xmit(struct sk_buff *skb)
 	return br_dev_queue_push_xmit(skb);
 }
 
+static unsigned int nf_bridge_mtu_reduction(const struct sk_buff *skb)
+{
+	if (skb->nf_bridge->mask & BRNF_PPPoE)
+		return PPPOE_SES_HLEN;
+	return 0;
+}
+
 static int br_nf_dev_queue_xmit(struct sk_buff *skb)
 {
 	int ret;
 	int frag_max_size;
-	unsigned int mtu_reserved;
+	unsigned int mtu_reserved, mtu;
 
 	if (skb_is_gso(skb) || skb->protocol != htons(ETH_P_IP))
 		return br_dev_queue_push_xmit(skb);
 
 	mtu_reserved = nf_bridge_mtu_reduction(skb);
+	mtu = min(skb->dev->mtu, IP_MAX_MTU) - mtu_reserved;
+
 	/* This is wrong! We should preserve the original fragment
 	 * boundaries by preserving frag_list rather than refragmenting.
+	 *
+	 * However, the skb might have been subject to NAT.
+	 * We try to un-do the fragmentation, using largest received fragment
+	 * size as mtu if it is set.
 	 */
-	if (skb->len + mtu_reserved > skb->dev->mtu) {
+	if (skb->len > mtu) {
+		if (!skb->ignore_df) /* was not a defragmented */
+			goto err_out;
+
 		frag_max_size = BR_INPUT_SKB_CB(skb)->frag_max_size;
+		if (frag_max_size) {
+			if (frag_max_size > mtu) /* too big and DF set */
+				goto err_out;
+			mtu = frag_max_size;
+		}
+
 		if (br_parse_ip_options(skb))
-			/* Drop invalid packet */
-			return NF_DROP;
-		IPCB(skb)->frag_max_size = frag_max_size;
-		ret = ip_fragment(skb, br_nf_push_frag_xmit);
-	} else
+			goto err_out;
+
+		ret = ip_fragment(skb, mtu, br_nf_push_frag_xmit);
+	} else {
 		ret = br_dev_queue_push_xmit(skb);
+	}
 
 	return ret;
+ err_out:
+	IP_INC_STATS_BH(dev_net(skb->dev), IPSTATS_MIB_FRAGFAILS);
+	kfree_skb(skb);
+	return 0;
 }
 #else
 static int br_nf_dev_queue_xmit(struct sk_buff *skb)
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 8259e77..9054995 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -216,6 +216,7 @@ static int ip_finish_output_gso(struct sk_buff *skb)
 	netdev_features_t features;
 	struct sk_buff *segs;
 	int ret = 0;
+	unsigned int mtu;
 
 	/* common case: locally created skb or seglen is <= mtu */
 	if (((IPCB(skb)->flags & IPSKB_FORWARDED) == 0) ||
@@ -236,6 +237,7 @@ static int ip_finish_output_gso(struct sk_buff *skb)
 		return -ENOMEM;
 	}
 
+	mtu = ip_skb_dst_mtu(skb);
 	consume_skb(skb);
 
 	do {
@@ -243,7 +245,7 @@ static int ip_finish_output_gso(struct sk_buff *skb)
 		int err;
 
 		segs->next = NULL;
-		err = ip_fragment(segs, ip_finish_output2);
+		err = ip_fragment(segs, mtu, ip_finish_output2);
 
 		if (err && ret == 0)
 			ret = err;
@@ -255,6 +257,8 @@ static int ip_finish_output_gso(struct sk_buff *skb)
 
 static int ip_finish_output(struct sk_buff *skb)
 {
+	unsigned int mtu;
+
 #if defined(CONFIG_NETFILTER) && defined(CONFIG_XFRM)
 	/* Policy lookup after SNAT yielded a new policy */
 	if (skb_dst(skb)->xfrm != NULL) {
@@ -265,8 +269,9 @@ static int ip_finish_output(struct sk_buff *skb)
 	if (skb_is_gso(skb))
 		return ip_finish_output_gso(skb);
 
-	if (skb->len > ip_skb_dst_mtu(skb))
-		return ip_fragment(skb, ip_finish_output2);
+	mtu = ip_skb_dst_mtu(skb);
+	if (skb->len > mtu)
+		return ip_fragment(skb, mtu, ip_finish_output2);
 
 	return ip_finish_output2(skb);
 }
@@ -473,20 +478,26 @@ static void ip_copy_metadata(struct sk_buff *to, struct sk_buff *from)
 	skb_copy_secmark(to, from);
 }
 
-/*
+/**
+ *	ip_fragment - fragment IP datagram or send ICMP error
+ *
+ *	@skb: the skb to fragment
+ *	@mtu: mtu to use for fragmentation
+ *	@output: transmit function used to send fragments
+ *
  *	This IP datagram is too large to be sent in one piece.  Break it up into
  *	smaller pieces (each of size equal to IP header plus
  *	a block of the data of the original IP data part) that will yet fit in a
  *	single device frame, and queue such a frame for sending.
  */
-
-int ip_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
+int ip_fragment(struct sk_buff *skb, unsigned int mtu,
+		int (*output)(struct sk_buff *))
 {
 	struct iphdr *iph;
 	int ptr;
 	struct net_device *dev;
 	struct sk_buff *skb2;
-	unsigned int mtu, hlen, left, len, ll_rs;
+	unsigned int hlen, left, len, ll_rs;
 	int offset;
 	__be16 not_last_frag;
 	struct rtable *rt = skb_rtable(skb);
@@ -500,7 +511,6 @@ int ip_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
 
 	iph = ip_hdr(skb);
 
-	mtu = ip_skb_dst_mtu(skb);
 	if (unlikely(((iph->frag_off & htons(IP_DF)) && !skb->ignore_df) ||
 		     (IPCB(skb)->frag_max_size &&
 		      IPCB(skb)->frag_max_size > mtu))) {
@@ -517,10 +527,6 @@ int ip_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
 
 	hlen = iph->ihl * 4;
 	mtu = mtu - hlen;	/* Size of data space */
-#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
-	if (skb->nf_bridge)
-		mtu -= nf_bridge_mtu_reduction(skb);
-#endif
 	IPCB(skb)->flags |= IPSKB_FRAG_COMPLETE;
 
 	/* When frag_list is given, use it. First, check its validity:
-- 
2.0.5

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH nf-next 03/14] netfilter: bridge: don't use nf_bridge_info data to store mac header
  2015-04-01 20:36 [PATCH nf-next 00/14] get rid of skb->nf_bridge pointer Florian Westphal
  2015-04-01 20:36 ` [PATCH nf-next 01/14] netfilter: bridge: really save frag_max_size between PRE and POST_ROUTING Florian Westphal
  2015-04-01 20:36 ` [PATCH nf-next 02/14] net: untangle ip_fragment and bridge netfilter Florian Westphal
@ 2015-04-01 20:36 ` Florian Westphal
  2015-04-01 20:36 ` [PATCH nf-next 04/14] netfilter: bridge: start splitting mask into public/private chunks Florian Westphal
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Florian Westphal @ 2015-04-01 20:36 UTC (permalink / raw)
  To: netfilter-devel, netdev; +Cc: Florian Westphal

Currently br_netfilter maintains an extra state, nf_bridge_info,
which is attached to skb via skb->nf_bridge pointer.  Amongst
other things we use skb->nf_bridge->data to store the original
mac header for every processed skb.

This is required for ip refragmentation when using conntrack
on top of bridge, because ip_fragment doesn't copy it from original skb.

However there is no need anymore to do this unconditionally.

Move this to the one place where its needed -- when br_netfilter calls
ip_fragment().

Also switch to percpu storage for this so we can handle fragmenting
without accessing nf_bridge meta data.

After this change, only one user of skb->nf_bridge->data is left.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/bridge/br_netfilter.c | 57 ++++++++++++++++++++++++++---------------------
 1 file changed, 32 insertions(+), 25 deletions(-)

diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index c0c8700..6ccb1af 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -111,6 +111,19 @@ static inline __be16 pppoe_proto(const struct sk_buff *skb)
 	 pppoe_proto(skb) == htons(PPP_IPV6) && \
 	 brnf_filter_pppoe_tagged)
 
+/* largest possible L2 header, see br_nf_dev_queue_xmit() */
+#define NF_BRIDGE_MAX_MAC_HEADER_LENGTH (PPPOE_SES_HLEN + ETH_HLEN)
+
+#if IS_ENABLED(CONFIG_NF_DEFRAG_IPV4)
+struct brnf_frag_data {
+	char mac[NF_BRIDGE_MAX_MAC_HEADER_LENGTH];
+	u8 encap_size;
+	u8 size;
+};
+
+static DEFINE_PER_CPU(struct brnf_frag_data, brnf_frag_data_storage);
+#endif
+
 static inline struct rtable *bridge_parent_rtable(const struct net_device *dev)
 {
 	struct net_bridge_port *port;
@@ -189,14 +202,6 @@ static inline void nf_bridge_pull_encap_header_rcsum(struct sk_buff *skb)
 	skb->network_header += len;
 }
 
-static inline void nf_bridge_save_header(struct sk_buff *skb)
-{
-	int header_size = ETH_HLEN + nf_bridge_encap_header_len(skb);
-
-	skb_copy_from_linear_data_offset(skb, -header_size,
-					 skb->nf_bridge->data, header_size);
-}
-
 /* When handing a packet over to the IP layer
  * check whether we have a skb that is in the
  * expected format
@@ -810,30 +815,22 @@ static unsigned int br_nf_forward_arp(const struct nf_hook_ops *ops,
 }
 
 #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV4)
-static bool nf_bridge_copy_header(struct sk_buff *skb)
+static int br_nf_push_frag_xmit(struct sk_buff *skb)
 {
+	struct brnf_frag_data *data;
 	int err;
-	unsigned int header_size;
 
-	nf_bridge_update_protocol(skb);
-	header_size = ETH_HLEN + nf_bridge_encap_header_len(skb);
-	err = skb_cow_head(skb, header_size);
-	if (err)
-		return false;
-
-	skb_copy_to_linear_data_offset(skb, -header_size,
-				       skb->nf_bridge->data, header_size);
-	__skb_push(skb, nf_bridge_encap_header_len(skb));
-	return true;
-}
+	data = this_cpu_ptr(&brnf_frag_data_storage);
+	err = skb_cow_head(skb, data->size);
 
-static int br_nf_push_frag_xmit(struct sk_buff *skb)
-{
-	if (!nf_bridge_copy_header(skb)) {
+	if (err) {
 		kfree_skb(skb);
 		return 0;
 	}
 
+	skb_copy_to_linear_data_offset(skb, -data->size, data->mac, data->size);
+	__skb_push(skb, data->encap_size);
+
 	return br_dev_queue_push_xmit(skb);
 }
 
@@ -864,6 +861,8 @@ static int br_nf_dev_queue_xmit(struct sk_buff *skb)
 	 * size as mtu if it is set.
 	 */
 	if (skb->len > mtu) {
+		struct brnf_frag_data *data;
+
 		if (!skb->ignore_df) /* was not a defragmented */
 			goto err_out;
 
@@ -877,6 +876,15 @@ static int br_nf_dev_queue_xmit(struct sk_buff *skb)
 		if (br_parse_ip_options(skb))
 			goto err_out;
 
+		nf_bridge_update_protocol(skb);
+
+		data = this_cpu_ptr(&brnf_frag_data_storage);
+		data->encap_size = nf_bridge_encap_header_len(skb);
+		data->size = ETH_HLEN + data->encap_size;
+
+		skb_copy_from_linear_data_offset(skb, -data->size, data->mac,
+						 data->size);
+
 		ret = ip_fragment(skb, mtu, br_nf_push_frag_xmit);
 	} else {
 		ret = br_dev_queue_push_xmit(skb);
@@ -932,7 +940,6 @@ static unsigned int br_nf_post_routing(const struct nf_hook_ops *ops,
 	}
 
 	nf_bridge_pull_encap_header(skb);
-	nf_bridge_save_header(skb);
 	if (pf == NFPROTO_IPV4)
 		skb->protocol = htons(ETH_P_IP);
 	else
-- 
2.0.5

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH nf-next 04/14] netfilter: bridge: start splitting mask into public/private chunks
  2015-04-01 20:36 [PATCH nf-next 00/14] get rid of skb->nf_bridge pointer Florian Westphal
                   ` (2 preceding siblings ...)
  2015-04-01 20:36 ` [PATCH nf-next 03/14] netfilter: bridge: don't use nf_bridge_info data to store mac header Florian Westphal
@ 2015-04-01 20:36 ` Florian Westphal
  2015-04-01 20:36 ` [PATCH nf-next 05/14] netfilter: bridge: make BRNF_PKT_TYPE flag a bool Florian Westphal
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Florian Westphal @ 2015-04-01 20:36 UTC (permalink / raw)
  To: netfilter-devel, netdev; +Cc: Florian Westphal

->mask is a bit info field that mixes various use cases.

In particular, we have flags that are mutually exlusive, and flags that
are only used within br_netfilter while others need to be exposed to
other parts of the kernel.

Remove BRNF_8021Q/PPPoE flags.  They're mutually exclusive and only
needed within br_netfilter context.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/linux/netfilter_bridge.h |  2 --
 include/linux/skbuff.h           |  5 +++++
 net/bridge/br_netfilter.c        | 17 ++++++++++++-----
 3 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/include/linux/netfilter_bridge.h b/include/linux/netfilter_bridge.h
index b131613..848226e 100644
--- a/include/linux/netfilter_bridge.h
+++ b/include/linux/netfilter_bridge.h
@@ -20,8 +20,6 @@ enum nf_br_hook_priorities {
 #define BRNF_PKT_TYPE			0x01
 #define BRNF_BRIDGED_DNAT		0x02
 #define BRNF_NF_BRIDGE_PREROUTING	0x08
-#define BRNF_8021Q			0x10
-#define BRNF_PPPoE			0x20
 
 int br_handle_frame_finish(struct sk_buff *skb);
 
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 36f3f43..fe1b557 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -166,6 +166,11 @@ struct nf_conntrack {
 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
 struct nf_bridge_info {
 	atomic_t		use;
+	enum {
+		BRNF_PROTO_UNCHANGED,
+		BRNF_PROTO_8021Q,
+		BRNF_PROTO_PPPOE
+	} orig_proto;
 	unsigned int		mask;
 	struct net_device	*physindev;
 	struct net_device	*physoutdev;
diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index 6ccb1af..8c86f37 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -257,10 +257,16 @@ drop:
 
 static void nf_bridge_update_protocol(struct sk_buff *skb)
 {
-	if (skb->nf_bridge->mask & BRNF_8021Q)
+	switch (skb->nf_bridge->orig_proto) {
+	case BRNF_PROTO_8021Q:
 		skb->protocol = htons(ETH_P_8021Q);
-	else if (skb->nf_bridge->mask & BRNF_PPPoE)
+		break;
+	case BRNF_PROTO_PPPOE:
 		skb->protocol = htons(ETH_P_PPP_SES);
+		break;
+	case BRNF_PROTO_UNCHANGED:
+		break;
+	}
 }
 
 /* PF_BRIDGE/PRE_ROUTING *********************************************/
@@ -498,10 +504,11 @@ static struct net_device *setup_pre_routing(struct sk_buff *skb)
 	nf_bridge->mask |= BRNF_NF_BRIDGE_PREROUTING;
 	nf_bridge->physindev = skb->dev;
 	skb->dev = brnf_get_logical_dev(skb, skb->dev);
+
 	if (skb->protocol == htons(ETH_P_8021Q))
-		nf_bridge->mask |= BRNF_8021Q;
+		nf_bridge->orig_proto = BRNF_PROTO_8021Q;
 	else if (skb->protocol == htons(ETH_P_PPP_SES))
-		nf_bridge->mask |= BRNF_PPPoE;
+		nf_bridge->orig_proto = BRNF_PROTO_PPPOE;
 
 	/* Must drop socket now because of tproxy. */
 	skb_orphan(skb);
@@ -836,7 +843,7 @@ static int br_nf_push_frag_xmit(struct sk_buff *skb)
 
 static unsigned int nf_bridge_mtu_reduction(const struct sk_buff *skb)
 {
-	if (skb->nf_bridge->mask & BRNF_PPPoE)
+	if (skb->nf_bridge->orig_proto == BRNF_PROTO_PPPOE)
 		return PPPOE_SES_HLEN;
 	return 0;
 }
-- 
2.0.5

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH nf-next 05/14] netfilter: bridge: make BRNF_PKT_TYPE flag a bool
  2015-04-01 20:36 [PATCH nf-next 00/14] get rid of skb->nf_bridge pointer Florian Westphal
                   ` (3 preceding siblings ...)
  2015-04-01 20:36 ` [PATCH nf-next 04/14] netfilter: bridge: start splitting mask into public/private chunks Florian Westphal
@ 2015-04-01 20:36 ` Florian Westphal
  2015-04-01 20:36 ` [PATCH nf-next 06/14] netfilter: bridge: rename and resize 'data' field Florian Westphal
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Florian Westphal @ 2015-04-01 20:36 UTC (permalink / raw)
  To: netfilter-devel, netdev; +Cc: Florian Westphal

nf_bridge_info->mask is used for several things, for example to
remember if skb->pkt_type was set to OTHER_HOST.

For a bridge, OTHER_HOST is expected case. For ip forward its a non-starter
though -- routing expects PACKET_HOST.

Bridge netfilter thus changes OTHER_HOST to PACKET_HOST before hook
invocation and then un-does it after hook traversal.

This information is irrelevant outside of br_netfilter.

After this change, ->mask now only contains flags that need to be
known outside of br_netfilter in fast-path.

Next patch will change mask into a 2bit state field in sk_buff, so that
we can remove skb->nf_bridge pointer for good and consider all remaining
places that access nf_bridge info content a slowpath.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/linux/netfilter_bridge.h |  1 -
 include/linux/skbuff.h           |  1 +
 net/bridge/br_netfilter.c        | 18 +++++++++---------
 3 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/include/linux/netfilter_bridge.h b/include/linux/netfilter_bridge.h
index 848226e..66245b5 100644
--- a/include/linux/netfilter_bridge.h
+++ b/include/linux/netfilter_bridge.h
@@ -17,7 +17,6 @@ enum nf_br_hook_priorities {
 
 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
 
-#define BRNF_PKT_TYPE			0x01
 #define BRNF_BRIDGED_DNAT		0x02
 #define BRNF_NF_BRIDGE_PREROUTING	0x08
 
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index fe1b557..afa53e4 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -171,6 +171,7 @@ struct nf_bridge_info {
 		BRNF_PROTO_8021Q,
 		BRNF_PROTO_PPPOE
 	} orig_proto;
+	bool			pkt_otherhost;
 	unsigned int		mask;
 	struct net_device	*physindev;
 	struct net_device	*physoutdev;
diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index 8c86f37..02ccf38 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -277,9 +277,9 @@ static int br_nf_pre_routing_finish_ipv6(struct sk_buff *skb)
 	struct nf_bridge_info *nf_bridge = skb->nf_bridge;
 	struct rtable *rt;
 
-	if (nf_bridge->mask & BRNF_PKT_TYPE) {
+	if (nf_bridge->pkt_otherhost) {
 		skb->pkt_type = PACKET_OTHERHOST;
-		nf_bridge->mask ^= BRNF_PKT_TYPE;
+		nf_bridge->pkt_otherhost = false;
 	}
 	nf_bridge->mask ^= BRNF_NF_BRIDGE_PREROUTING;
 
@@ -410,9 +410,9 @@ static int br_nf_pre_routing_finish(struct sk_buff *skb)
 	frag_max_size = IPCB(skb)->frag_max_size;
 	BR_INPUT_SKB_CB(skb)->frag_max_size = frag_max_size;
 
-	if (nf_bridge->mask & BRNF_PKT_TYPE) {
+	if (nf_bridge->pkt_otherhost) {
 		skb->pkt_type = PACKET_OTHERHOST;
-		nf_bridge->mask ^= BRNF_PKT_TYPE;
+		nf_bridge->pkt_otherhost = false;
 	}
 	nf_bridge->mask ^= BRNF_NF_BRIDGE_PREROUTING;
 	if (dnat_took_place(skb)) {
@@ -498,7 +498,7 @@ static struct net_device *setup_pre_routing(struct sk_buff *skb)
 
 	if (skb->pkt_type == PACKET_OTHERHOST) {
 		skb->pkt_type = PACKET_HOST;
-		nf_bridge->mask |= BRNF_PKT_TYPE;
+		nf_bridge->pkt_otherhost = true;
 	}
 
 	nf_bridge->mask |= BRNF_NF_BRIDGE_PREROUTING;
@@ -706,9 +706,9 @@ static int br_nf_forward_finish(struct sk_buff *skb)
 		}
 
 		in = nf_bridge->physindev;
-		if (nf_bridge->mask & BRNF_PKT_TYPE) {
+		if (nf_bridge->pkt_otherhost) {
 			skb->pkt_type = PACKET_OTHERHOST;
-			nf_bridge->mask ^= BRNF_PKT_TYPE;
+			nf_bridge->pkt_otherhost = false;
 		}
 		nf_bridge_update_protocol(skb);
 	} else {
@@ -761,7 +761,7 @@ static unsigned int br_nf_forward_ip(const struct nf_hook_ops *ops,
 	nf_bridge = skb->nf_bridge;
 	if (skb->pkt_type == PACKET_OTHERHOST) {
 		skb->pkt_type = PACKET_HOST;
-		nf_bridge->mask |= BRNF_PKT_TYPE;
+		nf_bridge->pkt_otherhost = true;
 	}
 
 	if (pf == NFPROTO_IPV4) {
@@ -943,7 +943,7 @@ static unsigned int br_nf_post_routing(const struct nf_hook_ops *ops,
 	 * about the value of skb->pkt_type. */
 	if (skb->pkt_type == PACKET_OTHERHOST) {
 		skb->pkt_type = PACKET_HOST;
-		nf_bridge->mask |= BRNF_PKT_TYPE;
+		nf_bridge->pkt_otherhost = true;
 	}
 
 	nf_bridge_pull_encap_header(skb);
-- 
2.0.5

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH nf-next 06/14] netfilter: bridge: rename and resize 'data' field
  2015-04-01 20:36 [PATCH nf-next 00/14] get rid of skb->nf_bridge pointer Florian Westphal
                   ` (4 preceding siblings ...)
  2015-04-01 20:36 ` [PATCH nf-next 05/14] netfilter: bridge: make BRNF_PKT_TYPE flag a bool Florian Westphal
@ 2015-04-01 20:36 ` Florian Westphal
  2015-04-01 20:36 ` [PATCH nf-next 07/14] netfilter: bridge: add helpers for fetching physin/outdev Florian Westphal
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Florian Westphal @ 2015-04-01 20:36 UTC (permalink / raw)
  To: netfilter-devel, netdev; +Cc: Florian Westphal

Only user left is neigh resolution when DNAT is detected, to hold
the original source mac address (neigh resolution builds new mac header
using bridge mac).

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/linux/skbuff.h    | 2 +-
 net/bridge/br_netfilter.c | 9 ++++++---
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index afa53e4..0991259 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -175,7 +175,7 @@ struct nf_bridge_info {
 	unsigned int		mask;
 	struct net_device	*physindev;
 	struct net_device	*physoutdev;
-	unsigned long		data[32 / sizeof(unsigned long)];
+	char			neigh_header[8];
 };
 #endif
 
diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index 02ccf38..947a2f6 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -329,7 +329,7 @@ static int br_nf_pre_routing_finish_bridge(struct sk_buff *skb)
 			 */
 			skb_copy_from_linear_data_offset(skb,
 							 -(ETH_HLEN-ETH_ALEN),
-							 skb->nf_bridge->data,
+							 nf_bridge->neigh_header,
 							 ETH_HLEN-ETH_ALEN);
 			/* tell br_dev_xmit to continue with forwarding */
 			nf_bridge->mask |= BRNF_BRIDGED_DNAT;
@@ -991,8 +991,11 @@ static void br_nf_pre_routing_finish_bridge_slow(struct sk_buff *skb)
 	skb_pull(skb, ETH_HLEN);
 	nf_bridge->mask &= ~BRNF_BRIDGED_DNAT;
 
-	skb_copy_to_linear_data_offset(skb, -(ETH_HLEN-ETH_ALEN),
-				       skb->nf_bridge->data, ETH_HLEN-ETH_ALEN);
+	BUILD_BUG_ON(sizeof(nf_bridge->neigh_header) != (ETH_HLEN - ETH_ALEN));
+
+	skb_copy_to_linear_data_offset(skb, -(ETH_HLEN - ETH_ALEN),
+				       nf_bridge->neigh_header,
+				       ETH_HLEN - ETH_ALEN);
 	skb->dev = nf_bridge->physindev;
 	br_handle_frame_finish(skb);
 }
-- 
2.0.5

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH nf-next 07/14] netfilter: bridge: add helpers for fetching physin/outdev
  2015-04-01 20:36 [PATCH nf-next 00/14] get rid of skb->nf_bridge pointer Florian Westphal
                   ` (5 preceding siblings ...)
  2015-04-01 20:36 ` [PATCH nf-next 06/14] netfilter: bridge: rename and resize 'data' field Florian Westphal
@ 2015-04-01 20:36 ` Florian Westphal
  2015-04-01 20:36 ` [PATCH nf-next 08/14] netfilter: physdev: use helpers Florian Westphal
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Florian Westphal @ 2015-04-01 20:36 UTC (permalink / raw)
  To: netfilter-devel, netdev; +Cc: Florian Westphal

right now we store this in the nf_bridge_info struct, accessible
via skb->nf_bridge.  This patch prepares removal of this pointer from skb:

Instead of using skb->nf_bridge->x, we use helpers to obtain the in/out
device (or ifindexes).

Followup patches to netfilter will then allow nf_bridge_info to be
obtained by a call into the br_netfilter core, rather than keeping a
pointer to it in sk_buff.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/linux/netfilter_bridge.h           | 23 ++++++++++++++++++++-
 net/ipv4/netfilter/nf_reject_ipv4.c        |  4 +++-
 net/ipv6/netfilter/nf_reject_ipv6.c        |  4 +++-
 net/netfilter/ipset/ip_set_hash_netiface.c | 32 ++++++++++++++++++++++--------
 net/netfilter/nf_log_common.c              |  5 +++--
 net/netfilter/nf_queue.c                   | 18 +++++++++--------
 net/netfilter/nfnetlink_log.c              | 17 ++++++++++++----
 net/netfilter/nfnetlink_queue_core.c       | 28 +++++++++++++++++---------
 8 files changed, 97 insertions(+), 34 deletions(-)

diff --git a/include/linux/netfilter_bridge.h b/include/linux/netfilter_bridge.h
index 66245b5..61251e4 100644
--- a/include/linux/netfilter_bridge.h
+++ b/include/linux/netfilter_bridge.h
@@ -2,7 +2,7 @@
 #define __LINUX_BRIDGE_NETFILTER_H
 
 #include <uapi/linux/netfilter_bridge.h>
-
+#include <linux/skbuff.h>
 
 enum nf_br_hook_priorities {
 	NF_BR_PRI_FIRST = INT_MIN,
@@ -30,6 +30,27 @@ static inline void br_drop_fake_rtable(struct sk_buff *skb)
 		skb_dst_drop(skb);
 }
 
+static inline int nf_bridge_get_physinif(const struct sk_buff *skb)
+{
+	return skb->nf_bridge ? skb->nf_bridge->physindev->ifindex : 0;
+}
+
+static inline int nf_bridge_get_physoutif(const struct sk_buff *skb)
+{
+	return skb->nf_bridge ? skb->nf_bridge->physoutdev->ifindex : 0;
+}
+
+static inline struct net_device *
+nf_bridge_get_physindev(const struct sk_buff *skb)
+{
+	return skb->nf_bridge ? skb->nf_bridge->physindev : NULL;
+}
+
+static inline struct net_device *
+nf_bridge_get_physoutdev(const struct sk_buff *skb)
+{
+	return skb->nf_bridge ? skb->nf_bridge->physoutdev : NULL;
+}
 #else
 #define br_drop_fake_rtable(skb)	        do { } while (0)
 #endif /* CONFIG_BRIDGE_NETFILTER */
diff --git a/net/ipv4/netfilter/nf_reject_ipv4.c b/net/ipv4/netfilter/nf_reject_ipv4.c
index c5b794d..3262e41 100644
--- a/net/ipv4/netfilter/nf_reject_ipv4.c
+++ b/net/ipv4/netfilter/nf_reject_ipv4.c
@@ -13,6 +13,7 @@
 #include <net/dst.h>
 #include <net/netfilter/ipv4/nf_reject.h>
 #include <linux/netfilter_ipv4.h>
+#include <linux/netfilter_bridge.h>
 #include <net/netfilter/ipv4/nf_reject.h>
 
 const struct tcphdr *nf_reject_ip_tcphdr_get(struct sk_buff *oldskb,
@@ -146,7 +147,8 @@ void nf_send_reset(struct sk_buff *oldskb, int hook)
 	 */
 	if (oldskb->nf_bridge) {
 		struct ethhdr *oeth = eth_hdr(oldskb);
-		nskb->dev = oldskb->nf_bridge->physindev;
+
+		nskb->dev = nf_bridge_get_physindev(oldskb);
 		niph->tot_len = htons(nskb->len);
 		ip_send_check(niph);
 		if (dev_hard_header(nskb, nskb->dev, ntohs(nskb->protocol),
diff --git a/net/ipv6/netfilter/nf_reject_ipv6.c b/net/ipv6/netfilter/nf_reject_ipv6.c
index 3afdce0..94b4c6d 100644
--- a/net/ipv6/netfilter/nf_reject_ipv6.c
+++ b/net/ipv6/netfilter/nf_reject_ipv6.c
@@ -13,6 +13,7 @@
 #include <net/ip6_checksum.h>
 #include <net/netfilter/ipv6/nf_reject.h>
 #include <linux/netfilter_ipv6.h>
+#include <linux/netfilter_bridge.h>
 #include <net/netfilter/ipv6/nf_reject.h>
 
 const struct tcphdr *nf_reject_ip6_tcphdr_get(struct sk_buff *oldskb,
@@ -195,7 +196,8 @@ void nf_send_reset6(struct net *net, struct sk_buff *oldskb, int hook)
 	 */
 	if (oldskb->nf_bridge) {
 		struct ethhdr *oeth = eth_hdr(oldskb);
-		nskb->dev = oldskb->nf_bridge->physindev;
+
+		nskb->dev = nf_bridge_get_physindev(oldskb);
 		nskb->protocol = htons(ETH_P_IPV6);
 		ip6h->payload_len = htons(sizeof(struct tcphdr));
 		if (dev_hard_header(nskb, nskb->dev, ntohs(nskb->protocol),
diff --git a/net/netfilter/ipset/ip_set_hash_netiface.c b/net/netfilter/ipset/ip_set_hash_netiface.c
index 758b002..380ef51 100644
--- a/net/netfilter/ipset/ip_set_hash_netiface.c
+++ b/net/netfilter/ipset/ip_set_hash_netiface.c
@@ -19,6 +19,7 @@
 #include <net/netlink.h>
 
 #include <linux/netfilter.h>
+#include <linux/netfilter_bridge.h>
 #include <linux/netfilter/ipset/pfxlen.h>
 #include <linux/netfilter/ipset/ip_set.h>
 #include <linux/netfilter/ipset/ip_set_hash.h>
@@ -211,6 +212,22 @@ hash_netiface4_data_next(struct hash_netiface4_elem *next,
 #define HKEY_DATALEN	sizeof(struct hash_netiface4_elem_hashed)
 #include "ip_set_hash_gen.h"
 
+#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
+static const char *get_physindev_name(const struct sk_buff *skb)
+{
+	struct net_device *dev = nf_bridge_get_physindev(skb);
+
+	return dev ? dev->name : NULL;
+}
+
+static const char *get_phyoutdev_name(const struct sk_buff *skb)
+{
+	struct net_device *dev = nf_bridge_get_physoutdev(skb);
+
+	return dev ? dev->name : NULL;
+}
+#endif
+
 static int
 hash_netiface4_kadt(struct ip_set *set, const struct sk_buff *skb,
 		    const struct xt_action_param *par,
@@ -234,16 +251,15 @@ hash_netiface4_kadt(struct ip_set *set, const struct sk_buff *skb,
 	e.ip &= ip_set_netmask(e.cidr);
 
 #define IFACE(dir)	(par->dir ? par->dir->name : NULL)
-#define PHYSDEV(dir)	(nf_bridge->dir ? nf_bridge->dir->name : NULL)
 #define SRCDIR		(opt->flags & IPSET_DIM_TWO_SRC)
 
 	if (opt->cmdflags & IPSET_FLAG_PHYSDEV) {
 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
-		const struct nf_bridge_info *nf_bridge = skb->nf_bridge;
+		e.iface = SRCDIR ? get_physindev_name(skb) :
+				   get_phyoutdev_name(skb);
 
-		if (!nf_bridge)
+		if (!e.iface)
 			return -EINVAL;
-		e.iface = SRCDIR ? PHYSDEV(physindev) : PHYSDEV(physoutdev);
 		e.physdev = 1;
 #else
 		e.iface = NULL;
@@ -476,11 +492,11 @@ hash_netiface6_kadt(struct ip_set *set, const struct sk_buff *skb,
 
 	if (opt->cmdflags & IPSET_FLAG_PHYSDEV) {
 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
-		const struct nf_bridge_info *nf_bridge = skb->nf_bridge;
-
-		if (!nf_bridge)
+		e.iface = SRCDIR ? get_physindev_name(skb) :
+				   get_phyoutdev_name(skb);
+		if (!e.iface)
 			return -EINVAL;
-		e.iface = SRCDIR ? PHYSDEV(physindev) : PHYSDEV(physoutdev);
+
 		e.physdev = 1;
 #else
 		e.iface = NULL;
diff --git a/net/netfilter/nf_log_common.c b/net/netfilter/nf_log_common.c
index 2631876..a5aa596 100644
--- a/net/netfilter/nf_log_common.c
+++ b/net/netfilter/nf_log_common.c
@@ -17,6 +17,7 @@
 #include <net/route.h>
 
 #include <linux/netfilter.h>
+#include <linux/netfilter_bridge.h>
 #include <linux/netfilter/xt_LOG.h>
 #include <net/netfilter/nf_log.h>
 
@@ -163,10 +164,10 @@ nf_log_dump_packet_common(struct nf_log_buf *m, u_int8_t pf,
 		const struct net_device *physindev;
 		const struct net_device *physoutdev;
 
-		physindev = skb->nf_bridge->physindev;
+		physindev = nf_bridge_get_physindev(skb);
 		if (physindev && in != physindev)
 			nf_log_buf_add(m, "PHYSIN=%s ", physindev->name);
-		physoutdev = skb->nf_bridge->physoutdev;
+		physoutdev = nf_bridge_get_physoutdev(skb);
 		if (physoutdev && out != physoutdev)
 			nf_log_buf_add(m, "PHYSOUT=%s ", physoutdev->name);
 	}
diff --git a/net/netfilter/nf_queue.c b/net/netfilter/nf_queue.c
index 4c8b68e..fb045b4 100644
--- a/net/netfilter/nf_queue.c
+++ b/net/netfilter/nf_queue.c
@@ -10,6 +10,7 @@
 #include <linux/proc_fs.h>
 #include <linux/skbuff.h>
 #include <linux/netfilter.h>
+#include <linux/netfilter_bridge.h>
 #include <linux/seq_file.h>
 #include <linux/rcupdate.h>
 #include <net/protocol.h>
@@ -54,12 +55,14 @@ void nf_queue_entry_release_refs(struct nf_queue_entry *entry)
 		dev_put(entry->outdev);
 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
 	if (entry->skb->nf_bridge) {
-		struct nf_bridge_info *nf_bridge = entry->skb->nf_bridge;
+		struct net_device *physdev;
 
-		if (nf_bridge->physindev)
-			dev_put(nf_bridge->physindev);
-		if (nf_bridge->physoutdev)
-			dev_put(nf_bridge->physoutdev);
+		physdev = nf_bridge_get_physindev(entry->skb);
+		if (physdev)
+			dev_put(physdev);
+		physdev = nf_bridge_get_physoutdev(entry->skb);
+		if (physdev)
+			dev_put(physdev);
 	}
 #endif
 	/* Drop reference to owner of hook which queued us. */
@@ -79,13 +82,12 @@ bool nf_queue_entry_get_refs(struct nf_queue_entry *entry)
 		dev_hold(entry->outdev);
 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
 	if (entry->skb->nf_bridge) {
-		struct nf_bridge_info *nf_bridge = entry->skb->nf_bridge;
 		struct net_device *physdev;
 
-		physdev = nf_bridge->physindev;
+		physdev = nf_bridge_get_physindev(entry->skb);
 		if (physdev)
 			dev_hold(physdev);
-		physdev = nf_bridge->physoutdev;
+		physdev = nf_bridge_get_physoutdev(entry->skb);
 		if (physdev)
 			dev_hold(physdev);
 	}
diff --git a/net/netfilter/nfnetlink_log.c b/net/netfilter/nfnetlink_log.c
index 957b83a..51afea4 100644
--- a/net/netfilter/nfnetlink_log.c
+++ b/net/netfilter/nfnetlink_log.c
@@ -23,6 +23,7 @@
 #include <linux/ipv6.h>
 #include <linux/netdevice.h>
 #include <linux/netfilter.h>
+#include <linux/netfilter_bridge.h>
 #include <net/netlink.h>
 #include <linux/netfilter/nfnetlink.h>
 #include <linux/netfilter/nfnetlink_log.h>
@@ -448,14 +449,18 @@ __build_packet_message(struct nfnl_log_net *log,
 					 htonl(br_port_get_rcu(indev)->br->dev->ifindex)))
 				goto nla_put_failure;
 		} else {
+			struct net_device *physindev;
+
 			/* Case 2: indev is bridge group, we need to look for
 			 * physical device (when called from ipv4) */
 			if (nla_put_be32(inst->skb, NFULA_IFINDEX_INDEV,
 					 htonl(indev->ifindex)))
 				goto nla_put_failure;
-			if (skb->nf_bridge && skb->nf_bridge->physindev &&
+
+			physindev = nf_bridge_get_physindev(skb);
+			if (physindev &&
 			    nla_put_be32(inst->skb, NFULA_IFINDEX_PHYSINDEV,
-					 htonl(skb->nf_bridge->physindev->ifindex)))
+					 htonl(physindev->ifindex)))
 				goto nla_put_failure;
 		}
 #endif
@@ -479,14 +484,18 @@ __build_packet_message(struct nfnl_log_net *log,
 					 htonl(br_port_get_rcu(outdev)->br->dev->ifindex)))
 				goto nla_put_failure;
 		} else {
+			struct net_device *physoutdev;
+
 			/* Case 2: indev is a bridge group, we need to look
 			 * for physical device (when called from ipv4) */
 			if (nla_put_be32(inst->skb, NFULA_IFINDEX_OUTDEV,
 					 htonl(outdev->ifindex)))
 				goto nla_put_failure;
-			if (skb->nf_bridge && skb->nf_bridge->physoutdev &&
+
+			physoutdev = nf_bridge_get_physoutdev(skb);
+			if (physoutdev &&
 			    nla_put_be32(inst->skb, NFULA_IFINDEX_PHYSOUTDEV,
-					 htonl(skb->nf_bridge->physoutdev->ifindex)))
+					 htonl(physoutdev->ifindex)))
 				goto nla_put_failure;
 		}
 #endif
diff --git a/net/netfilter/nfnetlink_queue_core.c b/net/netfilter/nfnetlink_queue_core.c
index 86ee8b0..94e1aaf 100644
--- a/net/netfilter/nfnetlink_queue_core.c
+++ b/net/netfilter/nfnetlink_queue_core.c
@@ -25,6 +25,7 @@
 #include <linux/proc_fs.h>
 #include <linux/netfilter_ipv4.h>
 #include <linux/netfilter_ipv6.h>
+#include <linux/netfilter_bridge.h>
 #include <linux/netfilter/nfnetlink.h>
 #include <linux/netfilter/nfnetlink_queue.h>
 #include <linux/list.h>
@@ -396,14 +397,18 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue,
 					 htonl(br_port_get_rcu(indev)->br->dev->ifindex)))
 				goto nla_put_failure;
 		} else {
+			int physinif;
+
 			/* Case 2: indev is bridge group, we need to look for
 			 * physical device (when called from ipv4) */
 			if (nla_put_be32(skb, NFQA_IFINDEX_INDEV,
 					 htonl(indev->ifindex)))
 				goto nla_put_failure;
-			if (entskb->nf_bridge && entskb->nf_bridge->physindev &&
+
+			physinif = nf_bridge_get_physinif(entskb);
+			if (physinif &&
 			    nla_put_be32(skb, NFQA_IFINDEX_PHYSINDEV,
-					 htonl(entskb->nf_bridge->physindev->ifindex)))
+					 htonl(physinif)))
 				goto nla_put_failure;
 		}
 #endif
@@ -426,14 +431,18 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue,
 					 htonl(br_port_get_rcu(outdev)->br->dev->ifindex)))
 				goto nla_put_failure;
 		} else {
+			int physoutif;
+
 			/* Case 2: outdev is bridge group, we need to look for
 			 * physical output device (when called from ipv4) */
 			if (nla_put_be32(skb, NFQA_IFINDEX_OUTDEV,
 					 htonl(outdev->ifindex)))
 				goto nla_put_failure;
-			if (entskb->nf_bridge && entskb->nf_bridge->physoutdev &&
+
+			physoutif = nf_bridge_get_physoutif(entskb);
+			if (physoutif &&
 			    nla_put_be32(skb, NFQA_IFINDEX_PHYSOUTDEV,
-					 htonl(entskb->nf_bridge->physoutdev->ifindex)))
+					 htonl(physoutif)))
 				goto nla_put_failure;
 		}
 #endif
@@ -765,11 +774,12 @@ dev_cmp(struct nf_queue_entry *entry, unsigned long ifindex)
 			return 1;
 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
 	if (entry->skb->nf_bridge) {
-		if (entry->skb->nf_bridge->physindev &&
-		    entry->skb->nf_bridge->physindev->ifindex == ifindex)
-			return 1;
-		if (entry->skb->nf_bridge->physoutdev &&
-		    entry->skb->nf_bridge->physoutdev->ifindex == ifindex)
+		int physinif, physoutif;
+
+		physinif = nf_bridge_get_physinif(entry->skb);
+		physoutif = nf_bridge_get_physoutif(entry->skb);
+
+		if (physinif == ifindex || physoutif == ifindex)
 			return 1;
 	}
 #endif
-- 
2.0.5


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH nf-next 08/14] netfilter: physdev: use helpers
  2015-04-01 20:36 [PATCH nf-next 00/14] get rid of skb->nf_bridge pointer Florian Westphal
                   ` (6 preceding siblings ...)
  2015-04-01 20:36 ` [PATCH nf-next 07/14] netfilter: bridge: add helpers for fetching physin/outdev Florian Westphal
@ 2015-04-01 20:36 ` Florian Westphal
  2015-04-01 20:36 ` [PATCH nf-next 09/14] netfilter: bridge: add and use nf_bridge_info_get helper Florian Westphal
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Florian Westphal @ 2015-04-01 20:36 UTC (permalink / raw)
  To: netfilter-devel, netdev; +Cc: Florian Westphal

Avoid skb->nf_bridge accesses where possible.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/xt_physdev.c | 34 ++++++++++++++++++++++------------
 1 file changed, 22 insertions(+), 12 deletions(-)

diff --git a/net/netfilter/xt_physdev.c b/net/netfilter/xt_physdev.c
index 50a5204..1caaccb 100644
--- a/net/netfilter/xt_physdev.c
+++ b/net/netfilter/xt_physdev.c
@@ -25,16 +25,15 @@ MODULE_ALIAS("ip6t_physdev");
 static bool
 physdev_mt(const struct sk_buff *skb, struct xt_action_param *par)
 {
-	static const char nulldevname[IFNAMSIZ] __attribute__((aligned(sizeof(long))));
 	const struct xt_physdev_info *info = par->matchinfo;
+	const struct net_device *physdev;
 	unsigned long ret;
 	const char *indev, *outdev;
-	const struct nf_bridge_info *nf_bridge;
 
 	/* Not a bridged IP packet or no info available yet:
 	 * LOCAL_OUT/mangle and LOCAL_OUT/nat don't know if
 	 * the destination device will be a bridge. */
-	if (!(nf_bridge = skb->nf_bridge)) {
+	if (!skb->nf_bridge) {
 		/* Return MATCH if the invert flags of the used options are on */
 		if ((info->bitmask & XT_PHYSDEV_OP_BRIDGED) &&
 		    !(info->invert & XT_PHYSDEV_OP_BRIDGED))
@@ -54,30 +53,41 @@ physdev_mt(const struct sk_buff *skb, struct xt_action_param *par)
 		return true;
 	}
 
+	physdev = nf_bridge_get_physoutdev(skb);
+	outdev = physdev ? physdev->name : NULL;
+
 	/* This only makes sense in the FORWARD and POSTROUTING chains */
 	if ((info->bitmask & XT_PHYSDEV_OP_BRIDGED) &&
-	    (!!nf_bridge->physoutdev ^ !(info->invert & XT_PHYSDEV_OP_BRIDGED)))
+	    (!!outdev ^ !(info->invert & XT_PHYSDEV_OP_BRIDGED)))
 		return false;
 
+	physdev = nf_bridge_get_physindev(skb);
+	indev = physdev ? physdev->name : NULL;
+
 	if ((info->bitmask & XT_PHYSDEV_OP_ISIN &&
-	    (!nf_bridge->physindev ^ !!(info->invert & XT_PHYSDEV_OP_ISIN))) ||
+	    (!indev ^ !!(info->invert & XT_PHYSDEV_OP_ISIN))) ||
 	    (info->bitmask & XT_PHYSDEV_OP_ISOUT &&
-	    (!nf_bridge->physoutdev ^ !!(info->invert & XT_PHYSDEV_OP_ISOUT))))
+	    (!outdev ^ !!(info->invert & XT_PHYSDEV_OP_ISOUT))))
 		return false;
 
 	if (!(info->bitmask & XT_PHYSDEV_OP_IN))
 		goto match_outdev;
-	indev = nf_bridge->physindev ? nf_bridge->physindev->name : nulldevname;
-	ret = ifname_compare_aligned(indev, info->physindev, info->in_mask);
 
-	if (!ret ^ !(info->invert & XT_PHYSDEV_OP_IN))
-		return false;
+	if (indev) {
+		ret = ifname_compare_aligned(indev, info->physindev,
+					     info->in_mask);
+
+		if (!ret ^ !(info->invert & XT_PHYSDEV_OP_IN))
+			return false;
+	}
 
 match_outdev:
 	if (!(info->bitmask & XT_PHYSDEV_OP_OUT))
 		return true;
-	outdev = nf_bridge->physoutdev ?
-		 nf_bridge->physoutdev->name : nulldevname;
+
+	if (!outdev)
+		return false;
+
 	ret = ifname_compare_aligned(outdev, info->physoutdev, info->out_mask);
 
 	return (!!ret ^ !(info->invert & XT_PHYSDEV_OP_OUT));
-- 
2.0.5

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH nf-next 09/14] netfilter: bridge: add and use nf_bridge_info_get helper
  2015-04-01 20:36 [PATCH nf-next 00/14] get rid of skb->nf_bridge pointer Florian Westphal
                   ` (7 preceding siblings ...)
  2015-04-01 20:36 ` [PATCH nf-next 08/14] netfilter: physdev: use helpers Florian Westphal
@ 2015-04-01 20:36 ` Florian Westphal
  2015-04-01 20:36 ` [PATCH nf-next 10/14] netfilter: bridge: move bridge netfilter state into sk_buff Florian Westphal
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Florian Westphal @ 2015-04-01 20:36 UTC (permalink / raw)
  To: netfilter-devel, netdev; +Cc: Florian Westphal

Don't access skb->nf_bridge directly, this pointer will be removed soon.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/bridge/br_netfilter.c | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index 947a2f6..40009b1 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -124,6 +124,11 @@ struct brnf_frag_data {
 static DEFINE_PER_CPU(struct brnf_frag_data, brnf_frag_data_storage);
 #endif
 
+static struct nf_bridge_info *nf_bridge_info_get(const struct sk_buff *skb)
+{
+	return skb->nf_bridge;
+}
+
 static inline struct rtable *bridge_parent_rtable(const struct net_device *dev)
 {
 	struct net_bridge_port *port;
@@ -274,7 +279,7 @@ static void nf_bridge_update_protocol(struct sk_buff *skb)
  * bridge PRE_ROUTING hook. */
 static int br_nf_pre_routing_finish_ipv6(struct sk_buff *skb)
 {
-	struct nf_bridge_info *nf_bridge = skb->nf_bridge;
+	struct nf_bridge_info *nf_bridge = nf_bridge_info_get(skb);
 	struct rtable *rt;
 
 	if (nf_bridge->pkt_otherhost) {
@@ -306,7 +311,6 @@ static int br_nf_pre_routing_finish_ipv6(struct sk_buff *skb)
  */
 static int br_nf_pre_routing_finish_bridge(struct sk_buff *skb)
 {
-	struct nf_bridge_info *nf_bridge = skb->nf_bridge;
 	struct neighbour *neigh;
 	struct dst_entry *dst;
 
@@ -316,6 +320,7 @@ static int br_nf_pre_routing_finish_bridge(struct sk_buff *skb)
 	dst = skb_dst(skb);
 	neigh = dst_neigh_lookup_skb(dst, skb);
 	if (neigh) {
+		struct nf_bridge_info *nf_bridge = nf_bridge_info_get(skb);
 		int ret;
 
 		if (neigh->hh.hh_len) {
@@ -402,7 +407,7 @@ static int br_nf_pre_routing_finish(struct sk_buff *skb)
 {
 	struct net_device *dev = skb->dev;
 	struct iphdr *iph = ip_hdr(skb);
-	struct nf_bridge_info *nf_bridge = skb->nf_bridge;
+	struct nf_bridge_info *nf_bridge = nf_bridge_info_get(skb);
 	struct rtable *rt;
 	int err;
 	int frag_max_size;
@@ -494,7 +499,7 @@ static struct net_device *brnf_get_logical_dev(struct sk_buff *skb, const struct
 /* Some common code for IPv4/IPv6 */
 static struct net_device *setup_pre_routing(struct sk_buff *skb)
 {
-	struct nf_bridge_info *nf_bridge = skb->nf_bridge;
+	struct nf_bridge_info *nf_bridge = nf_bridge_info_get(skb);
 
 	if (skb->pkt_type == PACKET_OTHERHOST) {
 		skb->pkt_type = PACKET_HOST;
@@ -694,7 +699,7 @@ static unsigned int br_nf_local_in(const struct nf_hook_ops *ops,
 /* PF_BRIDGE/FORWARD *************************************************/
 static int br_nf_forward_finish(struct sk_buff *skb)
 {
-	struct nf_bridge_info *nf_bridge = skb->nf_bridge;
+	struct nf_bridge_info *nf_bridge = nf_bridge_info_get(skb);
 	struct net_device *in;
 
 	if (!IS_ARP(skb) && !IS_VLAN_ARP(skb)) {
@@ -745,6 +750,10 @@ static unsigned int br_nf_forward_ip(const struct nf_hook_ops *ops,
 	if (!nf_bridge_unshare(skb))
 		return NF_DROP;
 
+	nf_bridge = nf_bridge_info_get(skb);
+	if (!nf_bridge)
+		return NF_DROP;
+
 	parent = bridge_parent(out);
 	if (!parent)
 		return NF_DROP;
@@ -758,7 +767,6 @@ static unsigned int br_nf_forward_ip(const struct nf_hook_ops *ops,
 
 	nf_bridge_pull_encap_header(skb);
 
-	nf_bridge = skb->nf_bridge;
 	if (skb->pkt_type == PACKET_OTHERHOST) {
 		skb->pkt_type = PACKET_HOST;
 		nf_bridge->pkt_otherhost = true;
@@ -917,7 +925,7 @@ static unsigned int br_nf_post_routing(const struct nf_hook_ops *ops,
 				       const struct net_device *out,
 				       int (*okfn)(struct sk_buff *))
 {
-	struct nf_bridge_info *nf_bridge = skb->nf_bridge;
+	struct nf_bridge_info *nf_bridge = nf_bridge_info_get(skb);
 	struct net_device *realoutdev = bridge_parent(skb->dev);
 	u_int8_t pf;
 
@@ -986,7 +994,7 @@ static unsigned int ip_sabotage_in(const struct nf_hook_ops *ops,
  */
 static void br_nf_pre_routing_finish_bridge_slow(struct sk_buff *skb)
 {
-	struct nf_bridge_info *nf_bridge = skb->nf_bridge;
+	struct nf_bridge_info *nf_bridge = nf_bridge_info_get(skb);
 
 	skb_pull(skb, ETH_HLEN);
 	nf_bridge->mask &= ~BRNF_BRIDGED_DNAT;
-- 
2.0.5

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH nf-next 10/14] netfilter: bridge: move bridge netfilter state into sk_buff
  2015-04-01 20:36 [PATCH nf-next 00/14] get rid of skb->nf_bridge pointer Florian Westphal
                   ` (8 preceding siblings ...)
  2015-04-01 20:36 ` [PATCH nf-next 09/14] netfilter: bridge: add and use nf_bridge_info_get helper Florian Westphal
@ 2015-04-01 20:36 ` Florian Westphal
  2015-04-01 20:36 ` [PATCH nf-next 11/14] netfilter: bridge: remove skb->nf_bridge Florian Westphal
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Florian Westphal @ 2015-04-01 20:36 UTC (permalink / raw)
  To: netfilter-devel, netdev; +Cc: Florian Westphal

skb->nf_bridge is only used when interface is part of a bridge and
bridge netfilter is used with 'call-iptables' sysctl set to 1 and
the skb was received on a bridge port. IOW, almost noone uses it.

This moves the skb bridge netfilter state (2 bits) to the skb itself.

This means we can now determine if an skb is subject to bridge
netfilter by looking at skb->nf_bridge_state in addition to
skb->nf_bridge pointer.

This makes it possible to remove the skb->nf_bridge in a followup
patch, while still allowing fast-path users to termine if slow-path
operations are needed by looking at skb->nf_bridge_state.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/linux/netfilter_bridge.h          | 14 ++++++++++--
 include/linux/skbuff.h                    | 28 ++++++++++++++++-------
 net/bridge/br_device.c                    | 19 +++++++++++-----
 net/bridge/br_netfilter.c                 | 37 +++++++++++++------------------
 net/bridge/br_private.h                   |  2 +-
 net/core/skbuff.c                         |  3 ++-
 net/ipv4/netfilter/nf_defrag_ipv4.c       |  3 +--
 net/ipv4/netfilter/nf_reject_ipv4.c       |  2 +-
 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c |  3 +--
 net/ipv6/netfilter/nf_reject_ipv6.c       |  2 +-
 net/netfilter/nf_log_common.c             |  2 +-
 net/netfilter/nf_queue.c                  |  4 ++--
 net/netfilter/nfnetlink_queue_core.c      |  6 ++---
 net/netfilter/xt_physdev.c                |  2 +-
 14 files changed, 75 insertions(+), 52 deletions(-)

diff --git a/include/linux/netfilter_bridge.h b/include/linux/netfilter_bridge.h
index 61251e4..f2d7abc 100644
--- a/include/linux/netfilter_bridge.h
+++ b/include/linux/netfilter_bridge.h
@@ -16,9 +16,19 @@ enum nf_br_hook_priorities {
 };
 
 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
+enum brnf_state {
+	BRNF_STATE_NONE,
+	BRNF_STATE_SEEN = 1,
 
-#define BRNF_BRIDGED_DNAT		0x02
-#define BRNF_NF_BRIDGE_PREROUTING	0x08
+	/* IPV4/IPV6 PRE_ROUTING called from bridge netfilter */
+	BRNF_STATE_PREROUTING,
+
+	/* skb that is 'transmitted' via bridge must to be injected
+	 * back into br forwarding for delivery to the correct bridge output
+	 * port due to DNAT to a destination on the same (bridged) network.
+	 */
+	BRNF_STATE_BRIDGED_DNAT,
+};
 
 int br_handle_frame_finish(struct sk_buff *skb);
 
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 0991259..c060db5 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -172,7 +172,6 @@ struct nf_bridge_info {
 		BRNF_PROTO_PPPOE
 	} orig_proto;
 	bool			pkt_otherhost;
-	unsigned int		mask;
 	struct net_device	*physindev;
 	struct net_device	*physoutdev;
 	char			neigh_header[8];
@@ -474,6 +473,9 @@ static inline u32 skb_mstamp_us_delta(const struct skb_mstamp *t1,
  *	@pkt_type: Packet class
  *	@fclone: skbuff clone status
  *	@ipvs_property: skbuff is owned by ipvs
+ *	@inner_protocol_type: encapsulation type
+ *	@remcsum_offload: udp remote checksum offload
+ *	@nf_bridge_state: bridge netfilter skb state
  *	@peeked: this packet has been seen already, so stats have been
  *		done for it, don't do them again
  *	@nf_trace: netfilter packet trace flag
@@ -614,8 +616,8 @@ struct sk_buff {
 	__u8			ipvs_property:1;
 	__u8			inner_protocol_type:1;
 	__u8			remcsum_offload:1;
-	/* 3 or 5 bit hole */
-
+	__u8			nf_bridge_state:2;
+	/* 1 or 3 bit hole */
 #ifdef CONFIG_NET_SCHED
 	__u16			tc_index;	/* traffic control index */
 #ifdef CONFIG_NET_CLS_ACT
@@ -3180,8 +3182,11 @@ static inline void nf_reset(struct sk_buff *skb)
 	skb->nfct = NULL;
 #endif
 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
-	nf_bridge_put(skb->nf_bridge);
-	skb->nf_bridge = NULL;
+	if (skb->nf_bridge_state) {
+		nf_bridge_put(skb->nf_bridge);
+		skb->nf_bridge = NULL;
+		skb->nf_bridge_state = 0;
+	}
 #endif
 }
 
@@ -3203,8 +3208,12 @@ static inline void __nf_copy(struct sk_buff *dst, const struct sk_buff *src,
 		dst->nfctinfo = src->nfctinfo;
 #endif
 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
-	dst->nf_bridge  = src->nf_bridge;
-	nf_bridge_get(src->nf_bridge);
+	if (src->nf_bridge_state) {
+		dst->nf_bridge = src->nf_bridge;
+		nf_bridge_get(src->nf_bridge);
+	} else {
+		dst->nf_bridge = NULL;
+	}
 #endif
 #if IS_ENABLED(CONFIG_NETFILTER_XT_TARGET_TRACE) || defined(CONFIG_NF_TABLES)
 	if (copy)
@@ -3218,7 +3227,10 @@ static inline void nf_copy(struct sk_buff *dst, const struct sk_buff *src)
 	nf_conntrack_put(dst->nfct);
 #endif
 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
-	nf_bridge_put(dst->nf_bridge);
+	if (dst->nf_bridge_state) {
+		nf_bridge_put(dst->nf_bridge);
+		dst->nf_bridge = NULL;
+	}
 #endif
 	__nf_copy(dst, src, true);
 }
diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index 4ff77a1..99b78ff 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -36,16 +36,23 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev)
 	struct net_bridge_fdb_entry *dst;
 	struct net_bridge_mdb_entry *mdst;
 	struct pcpu_sw_netstats *brstats = this_cpu_ptr(br->stats);
-	const struct nf_br_ops *nf_ops;
 	u16 vid = 0;
 
 	rcu_read_lock();
-	nf_ops = rcu_dereference(nf_br_ops);
-	if (nf_ops && nf_ops->br_dev_xmit_hook(skb)) {
-		rcu_read_unlock();
-		return NETDEV_TX_OK;
+#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
+	if (skb->nf_bridge_state == BRNF_STATE_BRIDGED_DNAT) {
+		const struct nf_br_ops *nf_ops;
+
+		nf_ops = rcu_dereference(nf_br_ops);
+		if (nf_ops) {
+			nf_ops->br_dev_dnat_hook(skb);
+		} else {
+			/* br_netfilter module removed while skb in qdisc */
+			kfree_skb(skb);
+		}
+		goto out;
 	}
-
+#endif
 	u64_stats_update_begin(&brstats->syncp);
 	brstats->tx_packets++;
 	brstats->tx_bytes += skb->len;
diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index 40009b1..832164e 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -148,8 +148,11 @@ static inline struct net_device *bridge_parent(const struct net_device *dev)
 static inline struct nf_bridge_info *nf_bridge_alloc(struct sk_buff *skb)
 {
 	skb->nf_bridge = kzalloc(sizeof(struct nf_bridge_info), GFP_ATOMIC);
-	if (likely(skb->nf_bridge))
+
+	if (likely(skb->nf_bridge)) {
 		atomic_set(&(skb->nf_bridge->use), 1);
+		skb->nf_bridge_state = BRNF_STATE_SEEN;
+	}
 
 	return skb->nf_bridge;
 }
@@ -286,7 +289,7 @@ static int br_nf_pre_routing_finish_ipv6(struct sk_buff *skb)
 		skb->pkt_type = PACKET_OTHERHOST;
 		nf_bridge->pkt_otherhost = false;
 	}
-	nf_bridge->mask ^= BRNF_NF_BRIDGE_PREROUTING;
+	skb->nf_bridge_state = BRNF_STATE_SEEN;
 
 	rt = bridge_parent_rtable(nf_bridge->physindev);
 	if (!rt) {
@@ -337,8 +340,7 @@ static int br_nf_pre_routing_finish_bridge(struct sk_buff *skb)
 							 nf_bridge->neigh_header,
 							 ETH_HLEN-ETH_ALEN);
 			/* tell br_dev_xmit to continue with forwarding */
-			nf_bridge->mask |= BRNF_BRIDGED_DNAT;
-			/* FIXME Need to refragment */
+			skb->nf_bridge_state = BRNF_STATE_BRIDGED_DNAT;
 			ret = neigh->output(neigh, skb);
 		}
 		neigh_release(neigh);
@@ -419,7 +421,7 @@ static int br_nf_pre_routing_finish(struct sk_buff *skb)
 		skb->pkt_type = PACKET_OTHERHOST;
 		nf_bridge->pkt_otherhost = false;
 	}
-	nf_bridge->mask ^= BRNF_NF_BRIDGE_PREROUTING;
+	skb->nf_bridge_state = BRNF_STATE_SEEN;
 	if (dnat_took_place(skb)) {
 		if ((err = ip_route_input(skb, iph->daddr, iph->saddr, iph->tos, dev))) {
 			struct in_device *in_dev = __in_dev_get_rcu(dev);
@@ -506,7 +508,7 @@ static struct net_device *setup_pre_routing(struct sk_buff *skb)
 		nf_bridge->pkt_otherhost = true;
 	}
 
-	nf_bridge->mask |= BRNF_NF_BRIDGE_PREROUTING;
+	skb->nf_bridge_state = BRNF_STATE_PREROUTING;
 	nf_bridge->physindev = skb->dev;
 	skb->dev = brnf_get_logical_dev(skb, skb->dev);
 
@@ -742,7 +744,7 @@ static unsigned int br_nf_forward_ip(const struct nf_hook_ops *ops,
 	struct net_device *parent;
 	u_int8_t pf;
 
-	if (!skb->nf_bridge)
+	if (!skb->nf_bridge_state)
 		return NF_ACCEPT;
 
 	/* Need exclusive nf_bridge_info since we might have multiple
@@ -862,6 +864,8 @@ static int br_nf_dev_queue_xmit(struct sk_buff *skb)
 	int frag_max_size;
 	unsigned int mtu_reserved, mtu;
 
+	skb->nf_bridge_state = BRNF_STATE_NONE;
+
 	if (skb_is_gso(skb) || skb->protocol != htons(ETH_P_IP))
 		return br_dev_queue_push_xmit(skb);
 
@@ -914,6 +918,8 @@ static int br_nf_dev_queue_xmit(struct sk_buff *skb)
 #else
 static int br_nf_dev_queue_xmit(struct sk_buff *skb)
 {
+	skb->nf_bridge_state = BRNF_STATE_NONE;
+
         return br_dev_queue_push_xmit(skb);
 }
 #endif
@@ -975,10 +981,8 @@ static unsigned int ip_sabotage_in(const struct nf_hook_ops *ops,
 				   const struct net_device *out,
 				   int (*okfn)(struct sk_buff *))
 {
-	if (skb->nf_bridge &&
-	    !(skb->nf_bridge->mask & BRNF_NF_BRIDGE_PREROUTING)) {
+	if (skb->nf_bridge_state != BRNF_STATE_PREROUTING)
 		return NF_STOP;
-	}
 
 	return NF_ACCEPT;
 }
@@ -997,7 +1001,7 @@ static void br_nf_pre_routing_finish_bridge_slow(struct sk_buff *skb)
 	struct nf_bridge_info *nf_bridge = nf_bridge_info_get(skb);
 
 	skb_pull(skb, ETH_HLEN);
-	nf_bridge->mask &= ~BRNF_BRIDGED_DNAT;
+	skb->nf_bridge_state = BRNF_STATE_SEEN;
 
 	BUILD_BUG_ON(sizeof(nf_bridge->neigh_header) != (ETH_HLEN - ETH_ALEN));
 
@@ -1008,17 +1012,8 @@ static void br_nf_pre_routing_finish_bridge_slow(struct sk_buff *skb)
 	br_handle_frame_finish(skb);
 }
 
-static int br_nf_dev_xmit(struct sk_buff *skb)
-{
-	if (skb->nf_bridge && (skb->nf_bridge->mask & BRNF_BRIDGED_DNAT)) {
-		br_nf_pre_routing_finish_bridge_slow(skb);
-		return 1;
-	}
-	return 0;
-}
-
 static const struct nf_br_ops br_ops = {
-	.br_dev_xmit_hook =	br_nf_dev_xmit,
+	.br_dev_dnat_hook = br_nf_pre_routing_finish_bridge_slow,
 };
 
 void br_netfilter_enable(void)
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index b46fa0c..7f22f09 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -764,7 +764,7 @@ static inline int br_vlan_enabled(struct net_bridge *br)
 #endif
 
 struct nf_br_ops {
-	int (*br_dev_xmit_hook)(struct sk_buff *skb);
+	void (*br_dev_dnat_hook)(struct sk_buff *skb);
 };
 extern const struct nf_br_ops __rcu *nf_br_ops;
 
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index cdb939b..16ccbec 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -677,7 +677,8 @@ static void skb_release_head_state(struct sk_buff *skb)
 	nf_conntrack_put(skb->nfct);
 #endif
 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
-	nf_bridge_put(skb->nf_bridge);
+	if (skb->nf_bridge_state)
+		nf_bridge_put(skb->nf_bridge);
 #endif
 }
 
diff --git a/net/ipv4/netfilter/nf_defrag_ipv4.c b/net/ipv4/netfilter/nf_defrag_ipv4.c
index 7e5ca6f..5b2096e 100644
--- a/net/ipv4/netfilter/nf_defrag_ipv4.c
+++ b/net/ipv4/netfilter/nf_defrag_ipv4.c
@@ -51,8 +51,7 @@ static enum ip_defrag_users nf_ct_defrag_user(unsigned int hooknum,
 #endif
 
 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
-	if (skb->nf_bridge &&
-	    skb->nf_bridge->mask & BRNF_NF_BRIDGE_PREROUTING)
+	if (skb->nf_bridge_state == BRNF_STATE_PREROUTING)
 		return IP_DEFRAG_CONNTRACK_BRIDGE_IN + zone;
 #endif
 	if (hooknum == NF_INET_PRE_ROUTING)
diff --git a/net/ipv4/netfilter/nf_reject_ipv4.c b/net/ipv4/netfilter/nf_reject_ipv4.c
index 3262e41..3be3f1a 100644
--- a/net/ipv4/netfilter/nf_reject_ipv4.c
+++ b/net/ipv4/netfilter/nf_reject_ipv4.c
@@ -145,7 +145,7 @@ void nf_send_reset(struct sk_buff *oldskb, int hook)
 	 * build the eth header using the original destination's MAC as the
 	 * source, and send the RST packet directly.
 	 */
-	if (oldskb->nf_bridge) {
+	if (oldskb->nf_bridge_state) {
 		struct ethhdr *oeth = eth_hdr(oldskb);
 
 		nskb->dev = nf_bridge_get_physindev(oldskb);
diff --git a/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c b/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c
index e70382e..01706d3 100644
--- a/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c
+++ b/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c
@@ -41,8 +41,7 @@ static enum ip6_defrag_users nf_ct6_defrag_user(unsigned int hooknum,
 #endif
 
 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
-	if (skb->nf_bridge &&
-	    skb->nf_bridge->mask & BRNF_NF_BRIDGE_PREROUTING)
+	if (skb->nf_bridge_state == BRNF_STATE_PREROUTING)
 		return IP6_DEFRAG_CONNTRACK_BRIDGE_IN + zone;
 #endif
 	if (hooknum == NF_INET_PRE_ROUTING)
diff --git a/net/ipv6/netfilter/nf_reject_ipv6.c b/net/ipv6/netfilter/nf_reject_ipv6.c
index 94b4c6d..c313bb7 100644
--- a/net/ipv6/netfilter/nf_reject_ipv6.c
+++ b/net/ipv6/netfilter/nf_reject_ipv6.c
@@ -194,7 +194,7 @@ void nf_send_reset6(struct net *net, struct sk_buff *oldskb, int hook)
 	 * build the eth header using the original destination's MAC as the
 	 * source, and send the RST packet directly.
 	 */
-	if (oldskb->nf_bridge) {
+	if (oldskb->nf_bridge_state) {
 		struct ethhdr *oeth = eth_hdr(oldskb);
 
 		nskb->dev = nf_bridge_get_physindev(oldskb);
diff --git a/net/netfilter/nf_log_common.c b/net/netfilter/nf_log_common.c
index a5aa596..f433262 100644
--- a/net/netfilter/nf_log_common.c
+++ b/net/netfilter/nf_log_common.c
@@ -160,7 +160,7 @@ nf_log_dump_packet_common(struct nf_log_buf *m, u_int8_t pf,
 	       in ? in->name : "",
 	       out ? out->name : "");
 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
-	if (skb->nf_bridge) {
+	if (skb->nf_bridge_state) {
 		const struct net_device *physindev;
 		const struct net_device *physoutdev;
 
diff --git a/net/netfilter/nf_queue.c b/net/netfilter/nf_queue.c
index fb045b4..2dba75c 100644
--- a/net/netfilter/nf_queue.c
+++ b/net/netfilter/nf_queue.c
@@ -54,7 +54,7 @@ void nf_queue_entry_release_refs(struct nf_queue_entry *entry)
 	if (entry->outdev)
 		dev_put(entry->outdev);
 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
-	if (entry->skb->nf_bridge) {
+	if (entry->skb->nf_bridge_state) {
 		struct net_device *physdev;
 
 		physdev = nf_bridge_get_physindev(entry->skb);
@@ -81,7 +81,7 @@ bool nf_queue_entry_get_refs(struct nf_queue_entry *entry)
 	if (entry->outdev)
 		dev_hold(entry->outdev);
 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
-	if (entry->skb->nf_bridge) {
+	if (entry->skb->nf_bridge_state) {
 		struct net_device *physdev;
 
 		physdev = nf_bridge_get_physindev(entry->skb);
diff --git a/net/netfilter/nfnetlink_queue_core.c b/net/netfilter/nfnetlink_queue_core.c
index 94e1aaf..e26abe5 100644
--- a/net/netfilter/nfnetlink_queue_core.c
+++ b/net/netfilter/nfnetlink_queue_core.c
@@ -585,13 +585,13 @@ nf_queue_entry_dup(struct nf_queue_entry *e)
  */
 static void nf_bridge_adjust_skb_data(struct sk_buff *skb)
 {
-	if (skb->nf_bridge)
+	if (skb->nf_bridge_state)
 		__skb_push(skb, skb->network_header - skb->mac_header);
 }
 
 static void nf_bridge_adjust_segmented_data(struct sk_buff *skb)
 {
-	if (skb->nf_bridge)
+	if (skb->nf_bridge_state)
 		__skb_pull(skb, skb->network_header - skb->mac_header);
 }
 #else
@@ -773,7 +773,7 @@ dev_cmp(struct nf_queue_entry *entry, unsigned long ifindex)
 		if (entry->outdev->ifindex == ifindex)
 			return 1;
 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
-	if (entry->skb->nf_bridge) {
+	if (entry->skb->nf_bridge_state) {
 		int physinif, physoutif;
 
 		physinif = nf_bridge_get_physinif(entry->skb);
diff --git a/net/netfilter/xt_physdev.c b/net/netfilter/xt_physdev.c
index 1caaccb..535e4be 100644
--- a/net/netfilter/xt_physdev.c
+++ b/net/netfilter/xt_physdev.c
@@ -33,7 +33,7 @@ physdev_mt(const struct sk_buff *skb, struct xt_action_param *par)
 	/* Not a bridged IP packet or no info available yet:
 	 * LOCAL_OUT/mangle and LOCAL_OUT/nat don't know if
 	 * the destination device will be a bridge. */
-	if (!skb->nf_bridge) {
+	if (skb->nf_bridge_state == 0) {
 		/* Return MATCH if the invert flags of the used options are on */
 		if ((info->bitmask & XT_PHYSDEV_OP_BRIDGED) &&
 		    !(info->invert & XT_PHYSDEV_OP_BRIDGED))
-- 
2.0.5

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH nf-next 11/14] netfilter: bridge: remove skb->nf_bridge
  2015-04-01 20:36 [PATCH nf-next 00/14] get rid of skb->nf_bridge pointer Florian Westphal
                   ` (9 preceding siblings ...)
  2015-04-01 20:36 ` [PATCH nf-next 10/14] netfilter: bridge: move bridge netfilter state into sk_buff Florian Westphal
@ 2015-04-01 20:36 ` Florian Westphal
  2015-04-01 20:36 ` [PATCH nf-next 12/14] netfilter: bridge: discard nf_bridge info on xmit Florian Westphal
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Florian Westphal @ 2015-04-01 20:36 UTC (permalink / raw)
  To: netfilter-devel, netdev; +Cc: Florian Westphal

Instead of carrying around the nf_bridge metadata using
sk_buff, add a br_netfilter private storage table and do lookups
on demand.

sk_buff now only stores a 2bit bridge netfilter state to avoid the
nf_bridge_info lookups in the hot-path (i.e., if state field is zero
we know that there is no nf_bridge_info structure since skb isn't
bridged).

The nf_bridge_info data is stored in an rhashtable.
Key is the address of the skb that the nf_bridge_info is storing
data for.  Each clone has its own nf_bridge_info copy.

The deep copy on skb_clone() is best-effort: if allocation fails
the next processing step in the bridge netfilter chain will drop
the skb if it cannot find the needed metadata supposedly associated
with the skb.

SLAB_DESTROY_BY_RCU is used to avoid excess memory usage, we cannot
wait for rcu grace periods in packet processing path.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/linux/netfilter.h        |   8 ++
 include/linux/netfilter_bridge.h |  49 ++++++++-
 include/linux/skbuff.h           |  51 ++-------
 net/bridge/br_netfilter.c        | 231 ++++++++++++++++++++++++++++++---------
 net/core/skbuff.c                |   2 +-
 net/netfilter/core.c             |  49 +++++++++
 6 files changed, 289 insertions(+), 101 deletions(-)

diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
index 2517ece..2385135 100644
--- a/include/linux/netfilter.h
+++ b/include/linux/netfilter.h
@@ -339,4 +339,12 @@ extern struct nfq_ct_hook __rcu *nfq_ct_hook;
 static inline void nf_ct_attach(struct sk_buff *new, struct sk_buff *skb) {}
 #endif
 
+#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
+struct nf_br_hook {
+	void (*nf_br_destroy)(struct sk_buff *);
+	struct nf_bridge_info* (*nf_br_find)(const struct sk_buff *);
+	bool (*nf_br_copy)(struct sk_buff *dst, const struct sk_buff *src);
+};
+#endif
+
 #endif /*__LINUX_NETFILTER_H*/
diff --git a/include/linux/netfilter_bridge.h b/include/linux/netfilter_bridge.h
index f2d7abc..621a2e4 100644
--- a/include/linux/netfilter_bridge.h
+++ b/include/linux/netfilter_bridge.h
@@ -3,6 +3,7 @@
 
 #include <uapi/linux/netfilter_bridge.h>
 #include <linux/skbuff.h>
+#include <linux/rhashtable.h>
 
 enum nf_br_hook_priorities {
 	NF_BR_PRI_FIRST = INT_MIN,
@@ -30,6 +31,22 @@ enum brnf_state {
 	BRNF_STATE_BRIDGED_DNAT,
 };
 
+struct nf_bridge_info {
+	struct rhash_head	node;
+	enum {
+		BRNF_PROTO_UNCHANGED,
+		BRNF_PROTO_8021Q,
+		BRNF_PROTO_PPPOE
+	} orig_proto;
+	bool			pkt_otherhost;
+	unsigned long		owner;
+	struct net_device	*physindev;
+	struct net_device	*physoutdev;
+	char			neigh_header[8];
+};
+
+struct nf_bridge_info *nf_bridge_find(const struct sk_buff *skb);
+
 int br_handle_frame_finish(struct sk_buff *skb);
 
 static inline void br_drop_fake_rtable(struct sk_buff *skb)
@@ -42,24 +59,48 @@ static inline void br_drop_fake_rtable(struct sk_buff *skb)
 
 static inline int nf_bridge_get_physinif(const struct sk_buff *skb)
 {
-	return skb->nf_bridge ? skb->nf_bridge->physindev->ifindex : 0;
+	struct nf_bridge_info *nf_bridge;
+
+	if (skb->nf_bridge_state == 0)
+		return 0;
+
+	nf_bridge = nf_bridge_find(skb);
+	return nf_bridge ? nf_bridge->physindev->ifindex : 0;
 }
 
 static inline int nf_bridge_get_physoutif(const struct sk_buff *skb)
 {
-	return skb->nf_bridge ? skb->nf_bridge->physoutdev->ifindex : 0;
+	struct nf_bridge_info *nf_bridge;
+
+	if (skb->nf_bridge_state == 0)
+		return 0;
+
+	nf_bridge = nf_bridge_find(skb);
+	return nf_bridge ? nf_bridge->physoutdev->ifindex : 0;
 }
 
 static inline struct net_device *
 nf_bridge_get_physindev(const struct sk_buff *skb)
 {
-	return skb->nf_bridge ? skb->nf_bridge->physindev : NULL;
+	struct nf_bridge_info *nf_bridge;
+
+	if (skb->nf_bridge_state == 0)
+		return NULL;
+
+	nf_bridge = nf_bridge_find(skb);
+	return nf_bridge ? nf_bridge->physindev : NULL;
 }
 
 static inline struct net_device *
 nf_bridge_get_physoutdev(const struct sk_buff *skb)
 {
-	return skb->nf_bridge ? skb->nf_bridge->physoutdev : NULL;
+	struct nf_bridge_info *nf_bridge;
+
+	if (skb->nf_bridge_state == 0)
+		return NULL;
+
+	nf_bridge = nf_bridge_find(skb);
+	return nf_bridge ? nf_bridge->physoutdev : NULL;
 }
 #else
 #define br_drop_fake_rtable(skb)	        do { } while (0)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index c060db5..c596e4e 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -163,21 +163,6 @@ struct nf_conntrack {
 };
 #endif
 
-#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
-struct nf_bridge_info {
-	atomic_t		use;
-	enum {
-		BRNF_PROTO_UNCHANGED,
-		BRNF_PROTO_8021Q,
-		BRNF_PROTO_PPPOE
-	} orig_proto;
-	bool			pkt_otherhost;
-	struct net_device	*physindev;
-	struct net_device	*physoutdev;
-	char			neigh_header[8];
-};
-#endif
-
 struct sk_buff_head {
 	/* These two members must be first. */
 	struct sk_buff	*next;
@@ -482,7 +467,6 @@ static inline u32 skb_mstamp_us_delta(const struct skb_mstamp *t1,
  *	@protocol: Packet protocol from driver
  *	@destructor: Destruct function
  *	@nfct: Associated connection, if any
- *	@nf_bridge: Saved data about a bridged frame - see br_netfilter.c
  *	@skb_iif: ifindex of device we arrived on
  *	@tc_index: Traffic control index
  *	@tc_verd: traffic control verdict
@@ -550,9 +534,6 @@ struct sk_buff {
 #if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
 	struct nf_conntrack	*nfct;
 #endif
-#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
-	struct nf_bridge_info	*nf_bridge;
-#endif
 	unsigned int		len,
 				data_len;
 	__u16			mac_len,
@@ -3164,17 +3145,10 @@ static inline void nf_conntrack_get(struct nf_conntrack *nfct)
 }
 #endif
 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
-static inline void nf_bridge_put(struct nf_bridge_info *nf_bridge)
-{
-	if (nf_bridge && atomic_dec_and_test(&nf_bridge->use))
-		kfree(nf_bridge);
-}
-static inline void nf_bridge_get(struct nf_bridge_info *nf_bridge)
-{
-	if (nf_bridge)
-		atomic_inc(&nf_bridge->use);
-}
-#endif /* CONFIG_BRIDGE_NETFILTER */
+void nf_bridge_destroy(struct sk_buff *skb);
+bool nf_bridge_copy(struct sk_buff *dst, const struct sk_buff *src);
+#endif
+
 static inline void nf_reset(struct sk_buff *skb)
 {
 #if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
@@ -3183,8 +3157,7 @@ static inline void nf_reset(struct sk_buff *skb)
 #endif
 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
 	if (skb->nf_bridge_state) {
-		nf_bridge_put(skb->nf_bridge);
-		skb->nf_bridge = NULL;
+		nf_bridge_destroy(skb);
 		skb->nf_bridge_state = 0;
 	}
 #endif
@@ -3208,12 +3181,8 @@ static inline void __nf_copy(struct sk_buff *dst, const struct sk_buff *src,
 		dst->nfctinfo = src->nfctinfo;
 #endif
 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
-	if (src->nf_bridge_state) {
-		dst->nf_bridge = src->nf_bridge;
-		nf_bridge_get(src->nf_bridge);
-	} else {
-		dst->nf_bridge = NULL;
-	}
+	if (src->nf_bridge_state)
+		nf_bridge_copy(dst, src);
 #endif
 #if IS_ENABLED(CONFIG_NETFILTER_XT_TARGET_TRACE) || defined(CONFIG_NF_TABLES)
 	if (copy)
@@ -3227,10 +3196,8 @@ static inline void nf_copy(struct sk_buff *dst, const struct sk_buff *src)
 	nf_conntrack_put(dst->nfct);
 #endif
 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
-	if (dst->nf_bridge_state) {
-		nf_bridge_put(dst->nf_bridge);
-		dst->nf_bridge = NULL;
-	}
+	if (dst->nf_bridge_state)
+		nf_bridge_destroy(dst);
 #endif
 	__nf_copy(dst, src, true);
 }
diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index 832164e..3f1f920 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -25,12 +25,14 @@
 #include <linux/if_vlan.h>
 #include <linux/if_pppox.h>
 #include <linux/ppp_defs.h>
+#include <linux/netfilter.h>
 #include <linux/netfilter_bridge.h>
 #include <linux/netfilter_ipv4.h>
 #include <linux/netfilter_ipv6.h>
 #include <linux/netfilter_arp.h>
 #include <linux/in_route.h>
 #include <linux/inetdevice.h>
+#include <linux/rhashtable.h>
 
 #include <net/ip.h>
 #include <net/ipv6.h>
@@ -124,9 +126,77 @@ struct brnf_frag_data {
 static DEFINE_PER_CPU(struct brnf_frag_data, brnf_frag_data_storage);
 #endif
 
+extern const struct nf_br_hook __rcu *nf_br_hook;
+static struct kmem_cache *nf_bridge_cachep __read_mostly;
+static struct rhashtable nf_bridge_info_table;
+
+static const struct rhashtable_params nf_bridge_info_params = {
+	.head_offset = offsetof(struct nf_bridge_info, node),
+	.key_offset = offsetof(struct nf_bridge_info, owner),
+	.key_len = FIELD_SIZEOF(struct nf_bridge_info, owner),
+	.hashfn = jhash,
+	.nulls_base = (3U << RHT_BASE_SHIFT),
+};
+
+/* must be called with rcu read lock held */
 static struct nf_bridge_info *nf_bridge_info_get(const struct sk_buff *skb)
 {
-	return skb->nf_bridge;
+	unsigned long key = (unsigned long)skb;
+	struct nf_bridge_info *nf_bridge;
+
+	nf_bridge = rhashtable_lookup_fast(&nf_bridge_info_table, &key,
+					   nf_bridge_info_params);
+
+	WARN_ON_ONCE(!nf_bridge && skb->nf_bridge_state);
+
+	return nf_bridge;
+}
+
+static void nf_bridge_info_free(struct nf_bridge_info *info)
+{
+	kmem_cache_free(nf_bridge_cachep, info);
+}
+
+/* must be called with rcu read lock held */
+static bool nf_bridge_info_copy(struct sk_buff *dst, const struct sk_buff *src)
+{
+	struct nf_bridge_info *info, *newinfo;
+
+	info = nf_bridge_info_get(src);
+	if (WARN_ON_ONCE(!info))
+		return false;
+
+	newinfo = kmem_cache_alloc(nf_bridge_cachep, GFP_ATOMIC);
+	if (!newinfo)
+		return false;
+
+	memcpy(newinfo, info, sizeof(*newinfo));
+
+	newinfo->owner = (unsigned long)dst;
+
+	if (rhashtable_insert_fast(&nf_bridge_info_table, &newinfo->node,
+				   nf_bridge_info_params) == 0)
+		return true;
+
+	nf_bridge_info_free(newinfo);
+	return false;
+}
+
+static void nf_bridge_info_del(struct sk_buff *skb)
+{
+	struct nf_bridge_info *nf_bridge = nf_bridge_info_get(skb);
+
+	if (nf_bridge) {
+		int err = rhashtable_remove_fast(&nf_bridge_info_table,
+						 &nf_bridge->node,
+						 nf_bridge_info_params);
+		WARN_ON(err);
+
+		if (err == 0)
+			nf_bridge_info_free(nf_bridge);
+	} else {
+		WARN_ON_ONCE(1);
+	}
 }
 
 static inline struct rtable *bridge_parent_rtable(const struct net_device *dev)
@@ -147,31 +217,25 @@ static inline struct net_device *bridge_parent(const struct net_device *dev)
 
 static inline struct nf_bridge_info *nf_bridge_alloc(struct sk_buff *skb)
 {
-	skb->nf_bridge = kzalloc(sizeof(struct nf_bridge_info), GFP_ATOMIC);
+	struct nf_bridge_info *nf_bridge;
 
-	if (likely(skb->nf_bridge)) {
-		atomic_set(&(skb->nf_bridge->use), 1);
-		skb->nf_bridge_state = BRNF_STATE_SEEN;
-	}
+	nf_bridge = kmem_cache_alloc(nf_bridge_cachep, GFP_ATOMIC);
+	if (!nf_bridge)
+		return NULL;
 
-	return skb->nf_bridge;
-}
+	skb->nf_bridge_state = BRNF_STATE_SEEN;
 
-static inline struct nf_bridge_info *nf_bridge_unshare(struct sk_buff *skb)
-{
-	struct nf_bridge_info *nf_bridge = skb->nf_bridge;
+	nf_bridge->orig_proto = BRNF_PROTO_UNCHANGED;
+	nf_bridge->pkt_otherhost = false;
+	nf_bridge->owner = (unsigned long)skb;
+	nf_bridge->physoutdev = NULL;
 
-	if (atomic_read(&nf_bridge->use) > 1) {
-		struct nf_bridge_info *tmp = nf_bridge_alloc(skb);
+	if (rhashtable_insert_fast(&nf_bridge_info_table, &nf_bridge->node,
+				   nf_bridge_info_params) == 0)
+		return nf_bridge;
 
-		if (tmp) {
-			memcpy(tmp, nf_bridge, sizeof(struct nf_bridge_info));
-			atomic_set(&tmp->use, 1);
-		}
-		nf_bridge_put(nf_bridge);
-		nf_bridge = tmp;
-	}
-	return nf_bridge;
+	kmem_cache_free(nf_bridge_cachep, nf_bridge);
+	return NULL;
 }
 
 static unsigned int nf_bridge_encap_header_len(const struct sk_buff *skb)
@@ -263,9 +327,10 @@ drop:
 	return -1;
 }
 
-static void nf_bridge_update_protocol(struct sk_buff *skb)
+static void nf_bridge_update_protocol(struct sk_buff *skb,
+				      const struct nf_bridge_info *nf_bridge)
 {
-	switch (skb->nf_bridge->orig_proto) {
+	switch (nf_bridge->orig_proto) {
 	case BRNF_PROTO_8021Q:
 		skb->protocol = htons(ETH_P_8021Q);
 		break;
@@ -285,6 +350,11 @@ static int br_nf_pre_routing_finish_ipv6(struct sk_buff *skb)
 	struct nf_bridge_info *nf_bridge = nf_bridge_info_get(skb);
 	struct rtable *rt;
 
+	if (WARN_ON_ONCE(!nf_bridge)) {
+		kfree_skb(skb);
+		return 0;
+	}
+
 	if (nf_bridge->pkt_otherhost) {
 		skb->pkt_type = PACKET_OTHERHOST;
 		nf_bridge->pkt_otherhost = false;
@@ -299,7 +369,7 @@ static int br_nf_pre_routing_finish_ipv6(struct sk_buff *skb)
 	skb_dst_set_noref(skb, &rt->dst);
 
 	skb->dev = nf_bridge->physindev;
-	nf_bridge_update_protocol(skb);
+	nf_bridge_update_protocol(skb, nf_bridge);
 	nf_bridge_push_encap_header(skb);
 	NF_HOOK_THRESH(NFPROTO_BRIDGE, NF_BR_PRE_ROUTING, skb, skb->dev, NULL,
 		       br_handle_frame_finish, 1);
@@ -326,6 +396,9 @@ static int br_nf_pre_routing_finish_bridge(struct sk_buff *skb)
 		struct nf_bridge_info *nf_bridge = nf_bridge_info_get(skb);
 		int ret;
 
+		if (!nf_bridge)
+			goto free_skb;
+
 		if (neigh->hh.hh_len) {
 			neigh_hh_bridge(&neigh->hh, skb);
 			skb->dev = nf_bridge->physindev;
@@ -414,6 +487,9 @@ static int br_nf_pre_routing_finish(struct sk_buff *skb)
 	int err;
 	int frag_max_size;
 
+	if (WARN_ON_ONCE(!nf_bridge))
+		goto free_skb;
+
 	frag_max_size = IPCB(skb)->frag_max_size;
 	BR_INPUT_SKB_CB(skb)->frag_max_size = frag_max_size;
 
@@ -454,7 +530,7 @@ free_skb:
 			if (skb_dst(skb)->dev == dev) {
 bridged_dnat:
 				skb->dev = nf_bridge->physindev;
-				nf_bridge_update_protocol(skb);
+				nf_bridge_update_protocol(skb, nf_bridge);
 				nf_bridge_push_encap_header(skb);
 				NF_HOOK_THRESH(NFPROTO_BRIDGE,
 					       NF_BR_PRE_ROUTING,
@@ -476,7 +552,7 @@ bridged_dnat:
 	}
 
 	skb->dev = nf_bridge->physindev;
-	nf_bridge_update_protocol(skb);
+	nf_bridge_update_protocol(skb, nf_bridge);
 	nf_bridge_push_encap_header(skb);
 	NF_HOOK_THRESH(NFPROTO_BRIDGE, NF_BR_PRE_ROUTING, skb, skb->dev, NULL,
 		       br_handle_frame_finish, 1);
@@ -503,6 +579,12 @@ static struct net_device *setup_pre_routing(struct sk_buff *skb)
 {
 	struct nf_bridge_info *nf_bridge = nf_bridge_info_get(skb);
 
+	if (!nf_bridge) {
+		nf_bridge = nf_bridge_alloc(skb);
+		if (!nf_bridge)
+			return NULL;
+	}
+
 	if (skb->pkt_type == PACKET_OTHERHOST) {
 		skb->pkt_type = PACKET_HOST;
 		nf_bridge->pkt_otherhost = true;
@@ -610,9 +692,6 @@ static unsigned int br_nf_pre_routing_ipv6(const struct nf_hook_ops *ops,
 	if (hdr->nexthdr == NEXTHDR_HOP && check_hbh_len(skb))
 		return NF_DROP;
 
-	nf_bridge_put(skb->nf_bridge);
-	if (!nf_bridge_alloc(skb))
-		return NF_DROP;
 	if (!setup_pre_routing(skb))
 		return NF_DROP;
 
@@ -666,9 +745,6 @@ static unsigned int br_nf_pre_routing(const struct nf_hook_ops *ops,
 	if (br_parse_ip_options(skb))
 		return NF_DROP;
 
-	nf_bridge_put(skb->nf_bridge);
-	if (!nf_bridge_alloc(skb))
-		return NF_DROP;
 	if (!setup_pre_routing(skb))
 		return NF_DROP;
 
@@ -701,12 +777,17 @@ static unsigned int br_nf_local_in(const struct nf_hook_ops *ops,
 /* PF_BRIDGE/FORWARD *************************************************/
 static int br_nf_forward_finish(struct sk_buff *skb)
 {
-	struct nf_bridge_info *nf_bridge = nf_bridge_info_get(skb);
 	struct net_device *in;
 
 	if (!IS_ARP(skb) && !IS_VLAN_ARP(skb)) {
+		struct nf_bridge_info *nf_bridge = nf_bridge_info_get(skb);
 		int frag_max_size;
 
+		if (WARN_ON_ONCE(!nf_bridge)) {
+			kfree_skb(skb);
+			return 0;
+		}
+
 		if (skb->protocol == htons(ETH_P_IP)) {
 			frag_max_size = IPCB(skb)->frag_max_size;
 			BR_INPUT_SKB_CB(skb)->frag_max_size = frag_max_size;
@@ -717,7 +798,7 @@ static int br_nf_forward_finish(struct sk_buff *skb)
 			skb->pkt_type = PACKET_OTHERHOST;
 			nf_bridge->pkt_otherhost = false;
 		}
-		nf_bridge_update_protocol(skb);
+		nf_bridge_update_protocol(skb, nf_bridge);
 	} else {
 		in = *((struct net_device **)(skb->cb));
 	}
@@ -747,11 +828,6 @@ static unsigned int br_nf_forward_ip(const struct nf_hook_ops *ops,
 	if (!skb->nf_bridge_state)
 		return NF_ACCEPT;
 
-	/* Need exclusive nf_bridge_info since we might have multiple
-	 * different physoutdevs. */
-	if (!nf_bridge_unshare(skb))
-		return NF_DROP;
-
 	nf_bridge = nf_bridge_info_get(skb);
 	if (!nf_bridge)
 		return NF_DROP;
@@ -851,9 +927,10 @@ static int br_nf_push_frag_xmit(struct sk_buff *skb)
 	return br_dev_queue_push_xmit(skb);
 }
 
-static unsigned int nf_bridge_mtu_reduction(const struct sk_buff *skb)
+static unsigned int
+nf_bridge_mtu_reduction(const struct nf_bridge_info *nf_bridge)
 {
-	if (skb->nf_bridge->orig_proto == BRNF_PROTO_PPPOE)
+	if (nf_bridge->orig_proto == BRNF_PROTO_PPPOE)
 		return PPPOE_SES_HLEN;
 	return 0;
 }
@@ -863,13 +940,16 @@ static int br_nf_dev_queue_xmit(struct sk_buff *skb)
 	int ret;
 	int frag_max_size;
 	unsigned int mtu_reserved, mtu;
-
-	skb->nf_bridge_state = BRNF_STATE_NONE;
+	struct nf_bridge_info *nf_bridge;
 
 	if (skb_is_gso(skb) || skb->protocol != htons(ETH_P_IP))
 		return br_dev_queue_push_xmit(skb);
 
-	mtu_reserved = nf_bridge_mtu_reduction(skb);
+	nf_bridge = nf_bridge_info_get(skb);
+	if (!nf_bridge)
+		goto err_out;
+
+	mtu_reserved = nf_bridge_mtu_reduction(nf_bridge);
 	mtu = min(skb->dev->mtu, IP_MAX_MTU) - mtu_reserved;
 
 	/* This is wrong! We should preserve the original fragment
@@ -895,7 +975,7 @@ static int br_nf_dev_queue_xmit(struct sk_buff *skb)
 		if (br_parse_ip_options(skb))
 			goto err_out;
 
-		nf_bridge_update_protocol(skb);
+		nf_bridge_update_protocol(skb, nf_bridge);
 
 		data = this_cpu_ptr(&brnf_frag_data_storage);
 		data->encap_size = nf_bridge_encap_header_len(skb);
@@ -918,9 +998,7 @@ static int br_nf_dev_queue_xmit(struct sk_buff *skb)
 #else
 static int br_nf_dev_queue_xmit(struct sk_buff *skb)
 {
-	skb->nf_bridge_state = BRNF_STATE_NONE;
-
-        return br_dev_queue_push_xmit(skb);
+	return br_dev_queue_push_xmit(skb);
 }
 #endif
 
@@ -931,16 +1009,22 @@ static unsigned int br_nf_post_routing(const struct nf_hook_ops *ops,
 				       const struct net_device *out,
 				       int (*okfn)(struct sk_buff *))
 {
-	struct nf_bridge_info *nf_bridge = nf_bridge_info_get(skb);
+	struct nf_bridge_info *nf_bridge;
 	struct net_device *realoutdev = bridge_parent(skb->dev);
 	u_int8_t pf;
 
-	/* if nf_bridge is set, but ->physoutdev is NULL, this packet came in
-	 * on a bridge, but was delivered locally and is now being routed:
-	 *
+	if (!skb->nf_bridge_state) /* locally generated */
+		return NF_ACCEPT;
+
+	nf_bridge = nf_bridge_info_get(skb);
+	if (!nf_bridge)
+		return NF_DROP;
+
+	/* if ->physoutdev is NULL, this packet came in on a bridge, but
+	 * was delivered locally and is now being routed.
 	 * POST_ROUTING was already invoked from the ip stack.
 	 */
-	if (!nf_bridge || !nf_bridge->physoutdev)
+	if (!nf_bridge->physoutdev)
 		return NF_ACCEPT;
 
 	if (!realoutdev)
@@ -1000,6 +1084,11 @@ static void br_nf_pre_routing_finish_bridge_slow(struct sk_buff *skb)
 {
 	struct nf_bridge_info *nf_bridge = nf_bridge_info_get(skb);
 
+	if (!nf_bridge) {
+		kfree_skb(skb);
+		return;
+	}
+
 	skb_pull(skb, ETH_HLEN);
 	skb->nf_bridge_state = BRNF_STATE_SEEN;
 
@@ -1016,6 +1105,12 @@ static const struct nf_br_ops br_ops = {
 	.br_dev_dnat_hook = br_nf_pre_routing_finish_bridge_slow,
 };
 
+static const struct nf_br_hook br_hook = {
+	.nf_br_destroy = nf_bridge_info_del,
+	.nf_br_find = nf_bridge_info_get,
+	.nf_br_copy = nf_bridge_info_copy,
+};
+
 void br_netfilter_enable(void)
 {
 }
@@ -1140,24 +1235,45 @@ static int __init br_netfilter_init(void)
 {
 	int ret;
 
+	nf_bridge_cachep = kmem_cache_create("nf_bridge_info",
+					     sizeof(struct nf_bridge_info),
+					     0, SLAB_DESTROY_BY_RCU, NULL);
+
+	ret = rhashtable_init(&nf_bridge_info_table, &nf_bridge_info_params);
+	if (ret < 0) {
+		kmem_cache_destroy(nf_bridge_cachep);
+		return ret;
+	}
+
 	ret = nf_register_hooks(br_nf_ops, ARRAY_SIZE(br_nf_ops));
-	if (ret < 0)
+	if (ret < 0) {
+		rhashtable_destroy(&nf_bridge_info_table);
+		kmem_cache_destroy(nf_bridge_cachep);
 		return ret;
+	}
 
 #ifdef CONFIG_SYSCTL
 	brnf_sysctl_header = register_net_sysctl(&init_net, "net/bridge", brnf_table);
 	if (brnf_sysctl_header == NULL) {
 		printk(KERN_WARNING
 		       "br_netfilter: can't register to sysctl.\n");
+		rhashtable_destroy(&nf_bridge_info_table);
+		kmem_cache_destroy(nf_bridge_cachep);
 		nf_unregister_hooks(br_nf_ops, ARRAY_SIZE(br_nf_ops));
 		return -ENOMEM;
 	}
 #endif
 	RCU_INIT_POINTER(nf_br_ops, &br_ops);
+	RCU_INIT_POINTER(nf_br_hook, &br_hook);
 	printk(KERN_NOTICE "Bridge firewalling registered\n");
 	return 0;
 }
 
+static void __exit nf_bridge_info_free_destroy(void *a, void *unused)
+{
+	nf_bridge_info_free(a);
+}
+
 static void __exit br_netfilter_fini(void)
 {
 	RCU_INIT_POINTER(nf_br_ops, NULL);
@@ -1165,6 +1281,13 @@ static void __exit br_netfilter_fini(void)
 #ifdef CONFIG_SYSCTL
 	unregister_net_sysctl_table(brnf_sysctl_header);
 #endif
+	RCU_INIT_POINTER(nf_br_hook, NULL);
+
+	synchronize_net();
+
+	rhashtable_free_and_destroy(&nf_bridge_info_table,
+				    nf_bridge_info_free_destroy, NULL);
+	kmem_cache_destroy(nf_bridge_cachep);
 }
 
 module_init(br_netfilter_init);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 16ccbec..ae249048 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -678,7 +678,7 @@ static void skb_release_head_state(struct sk_buff *skb)
 #endif
 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
 	if (skb->nf_bridge_state)
-		nf_bridge_put(skb->nf_bridge);
+		nf_bridge_destroy(skb);
 #endif
 }
 
diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index fea9ef5..85efbfc 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -269,6 +269,55 @@ EXPORT_SYMBOL_GPL(nfq_ct_nat_hook);
 
 #endif /* CONFIG_NF_CONNTRACK */
 
+#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
+struct nf_br_hook __rcu *nf_br_hook __read_mostly;
+EXPORT_SYMBOL_GPL(nf_br_hook);
+
+void nf_bridge_destroy(struct sk_buff *skb)
+{
+	struct nf_br_hook *h;
+
+	rcu_read_lock();
+	h = rcu_dereference(nf_br_hook);
+	if (h)
+		h->nf_br_destroy(skb);
+	rcu_read_unlock();
+}
+EXPORT_SYMBOL(nf_bridge_destroy);
+
+struct nf_bridge_info *nf_bridge_find(const struct sk_buff *skb)
+{
+	struct nf_bridge_info *info = NULL;
+	struct nf_br_hook *h;
+
+	rcu_read_lock();
+	h = rcu_dereference(nf_br_hook);
+	if (h)
+		info = h->nf_br_find(skb);
+	rcu_read_unlock();
+
+	return info;
+}
+EXPORT_SYMBOL_GPL(nf_bridge_find);
+
+bool nf_bridge_copy(struct sk_buff *dst, const struct sk_buff *src)
+{
+	struct nf_br_hook *h;
+	bool ret = false;
+
+	rcu_read_lock();
+	h = rcu_dereference(nf_br_hook);
+	if (!h)
+		goto out;
+
+	ret = h->nf_br_copy(dst, src);
+ out:
+	rcu_read_unlock();
+	return ret;
+}
+EXPORT_SYMBOL(nf_bridge_copy);
+#endif
+
 #ifdef CONFIG_NF_NAT_NEEDED
 void (*nf_nat_decode_session_hook)(struct sk_buff *, struct flowi *);
 EXPORT_SYMBOL(nf_nat_decode_session_hook);
-- 
2.0.5


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH nf-next 12/14] netfilter: bridge: discard nf_bridge info on xmit
  2015-04-01 20:36 [PATCH nf-next 00/14] get rid of skb->nf_bridge pointer Florian Westphal
                   ` (10 preceding siblings ...)
  2015-04-01 20:36 ` [PATCH nf-next 11/14] netfilter: bridge: remove skb->nf_bridge Florian Westphal
@ 2015-04-01 20:36 ` Florian Westphal
  2015-04-01 20:36 ` [PATCH nf-next 13/14] netfilter: bridge: neigh_head and physoutdev can't be used at same time Florian Westphal
  2015-04-01 20:36 ` [PATCH nf-next 14/14] netfilter: bridge: hold physinport ref during neigh resolution Florian Westphal
  13 siblings, 0 replies; 19+ messages in thread
From: Florian Westphal @ 2015-04-01 20:36 UTC (permalink / raw)
  To: netfilter-devel, netdev; +Cc: Florian Westphal

nf_bridge information is only needed for -m physdev, so we can always
dismantle this information after POST_ROUTING if we know we're the only
owner.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/bridge/br_netfilter.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index 3f1f920..715157c 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -199,6 +199,14 @@ static void nf_bridge_info_del(struct sk_buff *skb)
 	}
 }
 
+static void nf_bridge_info_drop(struct sk_buff *skb)
+{
+	if (!skb_shared(skb)) {
+		nf_bridge_info_del(skb);
+		skb->nf_bridge_state = BRNF_STATE_NONE;
+	}
+}
+
 static inline struct rtable *bridge_parent_rtable(const struct net_device *dev)
 {
 	struct net_bridge_port *port;
@@ -924,6 +932,7 @@ static int br_nf_push_frag_xmit(struct sk_buff *skb)
 	skb_copy_to_linear_data_offset(skb, -data->size, data->mac, data->size);
 	__skb_push(skb, data->encap_size);
 
+	nf_bridge_info_drop(skb);
 	return br_dev_queue_push_xmit(skb);
 }
 
@@ -942,8 +951,10 @@ static int br_nf_dev_queue_xmit(struct sk_buff *skb)
 	unsigned int mtu_reserved, mtu;
 	struct nf_bridge_info *nf_bridge;
 
-	if (skb_is_gso(skb) || skb->protocol != htons(ETH_P_IP))
+	if (skb_is_gso(skb) || skb->protocol != htons(ETH_P_IP)) {
+		nf_bridge_info_drop(skb);
 		return br_dev_queue_push_xmit(skb);
+	}
 
 	nf_bridge = nf_bridge_info_get(skb);
 	if (!nf_bridge)
@@ -986,6 +997,7 @@ static int br_nf_dev_queue_xmit(struct sk_buff *skb)
 
 		ret = ip_fragment(skb, mtu, br_nf_push_frag_xmit);
 	} else {
+		nf_bridge_info_drop(skb);
 		ret = br_dev_queue_push_xmit(skb);
 	}
 
@@ -998,6 +1010,7 @@ static int br_nf_dev_queue_xmit(struct sk_buff *skb)
 #else
 static int br_nf_dev_queue_xmit(struct sk_buff *skb)
 {
+	nf_bridge_info_drop(skb);
 	return br_dev_queue_push_xmit(skb);
 }
 #endif
-- 
2.0.5


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH nf-next 13/14] netfilter: bridge: neigh_head and physoutdev can't be used at same time
  2015-04-01 20:36 [PATCH nf-next 00/14] get rid of skb->nf_bridge pointer Florian Westphal
                   ` (11 preceding siblings ...)
  2015-04-01 20:36 ` [PATCH nf-next 12/14] netfilter: bridge: discard nf_bridge info on xmit Florian Westphal
@ 2015-04-01 20:36 ` Florian Westphal
  2015-04-01 20:36 ` [PATCH nf-next 14/14] netfilter: bridge: hold physinport ref during neigh resolution Florian Westphal
  13 siblings, 0 replies; 19+ messages in thread
From: Florian Westphal @ 2015-04-01 20:36 UTC (permalink / raw)
  To: netfilter-devel, netdev; +Cc: Florian Westphal

The neigh_header is only needed when we detect DNAT after prerouting
and neigh cache didn't have a mac address for us.

The output port has not been chosen yet so we can re-use its memory,
bringing size of meta data down to 40 bytes on x86_64.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/linux/netfilter_bridge.h | 7 +++++--
 net/bridge/br_netfilter.c        | 2 ++
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/linux/netfilter_bridge.h b/include/linux/netfilter_bridge.h
index 621a2e4..c31024a 100644
--- a/include/linux/netfilter_bridge.h
+++ b/include/linux/netfilter_bridge.h
@@ -41,8 +41,11 @@ struct nf_bridge_info {
 	bool			pkt_otherhost;
 	unsigned long		owner;
 	struct net_device	*physindev;
-	struct net_device	*physoutdev;
-	char			neigh_header[8];
+
+	union {
+		struct net_device *physoutdev;
+		char neigh_header[8];
+	};
 };
 
 struct nf_bridge_info *nf_bridge_find(const struct sk_buff *skb);
diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index 715157c..0730be6 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -1111,6 +1111,8 @@ static void br_nf_pre_routing_finish_bridge_slow(struct sk_buff *skb)
 				       nf_bridge->neigh_header,
 				       ETH_HLEN - ETH_ALEN);
 	skb->dev = nf_bridge->physindev;
+	nf_bridge->physoutdev = NULL;
+
 	br_handle_frame_finish(skb);
 }
 
-- 
2.0.5

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH nf-next 14/14] netfilter: bridge: hold physinport ref during neigh resolution
  2015-04-01 20:36 [PATCH nf-next 00/14] get rid of skb->nf_bridge pointer Florian Westphal
                   ` (12 preceding siblings ...)
  2015-04-01 20:36 ` [PATCH nf-next 13/14] netfilter: bridge: neigh_head and physoutdev can't be used at same time Florian Westphal
@ 2015-04-01 20:36 ` Florian Westphal
  13 siblings, 0 replies; 19+ messages in thread
From: Florian Westphal @ 2015-04-01 20:36 UTC (permalink / raw)
  To: netfilter-devel, netdev; +Cc: Florian Westphal

This problem existed forever: when performing DNAT resolution we keep
physinport around for later use as skb->dev .

We should ensure it cannot go away.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/linux/netfilter_bridge.h |  1 +
 net/bridge/br_netfilter.c        | 15 +++++++++++++++
 2 files changed, 16 insertions(+)

diff --git a/include/linux/netfilter_bridge.h b/include/linux/netfilter_bridge.h
index c31024a..8a6a4af 100644
--- a/include/linux/netfilter_bridge.h
+++ b/include/linux/netfilter_bridge.h
@@ -39,6 +39,7 @@ struct nf_bridge_info {
 		BRNF_PROTO_PPPOE
 	} orig_proto;
 	bool			pkt_otherhost;
+	bool			physin_ref;
 	unsigned long		owner;
 	struct net_device	*physindev;
 
diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index 0730be6..ae03117 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -154,6 +154,8 @@ static struct nf_bridge_info *nf_bridge_info_get(const struct sk_buff *skb)
 
 static void nf_bridge_info_free(struct nf_bridge_info *info)
 {
+	if (info->physin_ref)
+		dev_put(info->physindev);
 	kmem_cache_free(nf_bridge_cachep, info);
 }
 
@@ -174,6 +176,9 @@ static bool nf_bridge_info_copy(struct sk_buff *dst, const struct sk_buff *src)
 
 	newinfo->owner = (unsigned long)dst;
 
+	if (newinfo->physin_ref)
+		dev_hold(newinfo->physindev);
+
 	if (rhashtable_insert_fast(&nf_bridge_info_table, &newinfo->node,
 				   nf_bridge_info_params) == 0)
 		return true;
@@ -235,6 +240,7 @@ static inline struct nf_bridge_info *nf_bridge_alloc(struct sk_buff *skb)
 
 	nf_bridge->orig_proto = BRNF_PROTO_UNCHANGED;
 	nf_bridge->pkt_otherhost = false;
+	nf_bridge->physin_ref = false;
 	nf_bridge->owner = (unsigned long)skb;
 	nf_bridge->physoutdev = NULL;
 
@@ -422,6 +428,10 @@ static int br_nf_pre_routing_finish_bridge(struct sk_buff *skb)
 							 ETH_HLEN-ETH_ALEN);
 			/* tell br_dev_xmit to continue with forwarding */
 			skb->nf_bridge_state = BRNF_STATE_BRIDGED_DNAT;
+
+			nf_bridge->physin_ref = true;
+			dev_hold(nf_bridge->physindev);
+
 			ret = neigh->output(neigh, skb);
 		}
 		neigh_release(neigh);
@@ -1113,6 +1123,11 @@ static void br_nf_pre_routing_finish_bridge_slow(struct sk_buff *skb)
 	skb->dev = nf_bridge->physindev;
 	nf_bridge->physoutdev = NULL;
 
+	WARN_ON_ONCE(!nf_bridge->physin_ref);
+
+	dev_put(nf_bridge->physindev);
+	nf_bridge->physin_ref = false;
+
 	br_handle_frame_finish(skb);
 }
 
-- 
2.0.5


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH nf-next 02/14] net: untangle ip_fragment and bridge netfilter
  2015-04-01 20:36 ` [PATCH nf-next 02/14] net: untangle ip_fragment and bridge netfilter Florian Westphal
@ 2015-04-02  3:09   ` David Miller
  2015-04-02 12:16     ` Florian Westphal
  0 siblings, 1 reply; 19+ messages in thread
From: David Miller @ 2015-04-02  3:09 UTC (permalink / raw)
  To: fw; +Cc: netfilter-devel, netdev

From: Florian Westphal <fw@strlen.de>
Date: Wed,  1 Apr 2015 22:36:28 +0200

> Add mtu arguments to ip_fragment and remove the bridge netfilter mtu
> helper.

I told you I disagree with this approach.  Anything that adds
an 'mtu' argument to ip_fragment() I am not even going to look
at seriously, there must be device context when you call that
function.

Furthermore, and even more importantly, right now what bridge
netfilter does with fragmentation is _terminally_ broken.  It
absolutely does not guarantee to preserve the geometry of the incoming
fragment stream.

This is why you must use something like GRO/GSO, which is built
to positively and provably preserve the geometry of SKBs as they
are packed and unpacked.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH nf-next 01/14] netfilter: bridge: really save frag_max_size between PRE and POST_ROUTING
  2015-04-01 20:36 ` [PATCH nf-next 01/14] netfilter: bridge: really save frag_max_size between PRE and POST_ROUTING Florian Westphal
@ 2015-04-02  8:53   ` Pablo Neira Ayuso
  2015-04-02  8:54     ` Pablo Neira Ayuso
  0 siblings, 1 reply; 19+ messages in thread
From: Pablo Neira Ayuso @ 2015-04-02  8:53 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel, netdev

On Wed, Apr 01, 2015 at 10:36:27PM +0200, Florian Westphal wrote:
> We also need to save/store in forward, else br_parse_ip_options call
> will zero frag_max_size as well.

I have only applied this fix to nf-next. Thanks.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH nf-next 01/14] netfilter: bridge: really save frag_max_size between PRE and POST_ROUTING
  2015-04-02  8:53   ` Pablo Neira Ayuso
@ 2015-04-02  8:54     ` Pablo Neira Ayuso
  0 siblings, 0 replies; 19+ messages in thread
From: Pablo Neira Ayuso @ 2015-04-02  8:54 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel, netdev, David Miller

On Thu, Apr 02, 2015 at 10:53:22AM +0200, Pablo Neira Ayuso wrote:
> On Wed, Apr 01, 2015 at 10:36:27PM +0200, Florian Westphal wrote:
> > We also need to save/store in forward, else br_parse_ip_options call
> > will zero frag_max_size as well.
> 
> I have only applied this fix to nf-next. Thanks.

BTW, you can resubmit patches in case they are independent of the
ip_fragment() change in first place. Thanks.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH nf-next 02/14] net: untangle ip_fragment and bridge netfilter
  2015-04-02  3:09   ` David Miller
@ 2015-04-02 12:16     ` Florian Westphal
  0 siblings, 0 replies; 19+ messages in thread
From: Florian Westphal @ 2015-04-02 12:16 UTC (permalink / raw)
  To: David Miller; +Cc: fw, netfilter-devel, netdev

David Miller <davem@davemloft.net> wrote:
> From: Florian Westphal <fw@strlen.de>
> Date: Wed,  1 Apr 2015 22:36:28 +0200
> 
> > Add mtu arguments to ip_fragment and remove the bridge netfilter mtu
> > helper.
> 
> I told you I disagree with this approach.  Anything that adds
> an 'mtu' argument to ip_fragment() I am not even going to look
> at seriously, there must be device context when you call that
> function.

Not sure.  There is one case where we must not use device mtu:
if DF was set on one of the fragments, we must not increase fragment
size, thats why I added the MTU argument to make sure that largest
fragment size will be the upper boundary.

I don't see where we break PMTUD or induce any other kind of
breakage after this change.

For the non-df case the original sizes of the fragments
don't matter since any device in the path will (re)fragment.

> Furthermore, and even more importantly, right now what bridge
> netfilter does with fragmentation is _terminally_ broken.

I didn't claim it wasn't :-]

> This is why you must use something like GRO/GSO, which is built
> to positively and provably preserve the geometry of SKBs as they
> are packed and unpacked.

Thats not as trivial as it sounds.
GRO only aggregates same-size packets.

All the fragments might have different sizes.

Futhermore we need to handle overlapping fragments, duplicates and
arrival of fragments in any order.

Preserving the original skbs and sending those instead of
refragmentation doesn't work either since the reassembled skb might have
been subject to NAT.

So the only possible compromise I see is to record/store all fragment sizes
in the correct logical order (according to offset) and use that as
'replay split points', i.e. we'd still end up not using the device mtu
when undoing the defrag operation.

We'd also not have full transparency unless we would also re-create
the overlapping and duplicate packets that we might have seen, and
send the refragmented skbs in the same order as we received them.

I don't think thats something we want in ip stack, so we'd have
to (re)implement ip (de|re)-fragmentation for the bridge in bridge
netfilter.

Is that what you had in mind?

Or do you see some way to re-use existing implementation without
fuglifying normal defrag operations?

Thanks.

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2015-04-02 12:16 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-01 20:36 [PATCH nf-next 00/14] get rid of skb->nf_bridge pointer Florian Westphal
2015-04-01 20:36 ` [PATCH nf-next 01/14] netfilter: bridge: really save frag_max_size between PRE and POST_ROUTING Florian Westphal
2015-04-02  8:53   ` Pablo Neira Ayuso
2015-04-02  8:54     ` Pablo Neira Ayuso
2015-04-01 20:36 ` [PATCH nf-next 02/14] net: untangle ip_fragment and bridge netfilter Florian Westphal
2015-04-02  3:09   ` David Miller
2015-04-02 12:16     ` Florian Westphal
2015-04-01 20:36 ` [PATCH nf-next 03/14] netfilter: bridge: don't use nf_bridge_info data to store mac header Florian Westphal
2015-04-01 20:36 ` [PATCH nf-next 04/14] netfilter: bridge: start splitting mask into public/private chunks Florian Westphal
2015-04-01 20:36 ` [PATCH nf-next 05/14] netfilter: bridge: make BRNF_PKT_TYPE flag a bool Florian Westphal
2015-04-01 20:36 ` [PATCH nf-next 06/14] netfilter: bridge: rename and resize 'data' field Florian Westphal
2015-04-01 20:36 ` [PATCH nf-next 07/14] netfilter: bridge: add helpers for fetching physin/outdev Florian Westphal
2015-04-01 20:36 ` [PATCH nf-next 08/14] netfilter: physdev: use helpers Florian Westphal
2015-04-01 20:36 ` [PATCH nf-next 09/14] netfilter: bridge: add and use nf_bridge_info_get helper Florian Westphal
2015-04-01 20:36 ` [PATCH nf-next 10/14] netfilter: bridge: move bridge netfilter state into sk_buff Florian Westphal
2015-04-01 20:36 ` [PATCH nf-next 11/14] netfilter: bridge: remove skb->nf_bridge Florian Westphal
2015-04-01 20:36 ` [PATCH nf-next 12/14] netfilter: bridge: discard nf_bridge info on xmit Florian Westphal
2015-04-01 20:36 ` [PATCH nf-next 13/14] netfilter: bridge: neigh_head and physoutdev can't be used at same time Florian Westphal
2015-04-01 20:36 ` [PATCH nf-next 14/14] netfilter: bridge: hold physinport ref during neigh resolution Florian Westphal

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.