All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] RCU: introduce noref debug
@ 2017-10-06 12:57 Paolo Abeni
  2017-10-06 12:57 ` [PATCH 1/4] rcu: " Paolo Abeni
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Paolo Abeni @ 2017-10-06 12:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Paul E. McKenney, Josh Triplett, Steven Rostedt, David S. Miller,
	Eric Dumazet, Hannes Frederic Sowa, netdev

The networking subsystem is currently using some kind of long-lived
RCU-protected, references to avoid the overhead of full book-keeping.

Such references - skb_dst() noref - are stored inside the skbs and can be
moved across relevant slices of the network stack, with the users
being in charge of properly clearing the relevant skb - or properly refcount
the related dst references - before the skb escapes the RCU section.

We currently don't have any deterministic debug infrastructure to check
the dst noref usages - and the introduction of others noref artifact is
currently under discussion.

This series tries to tackle the above introducing an RCU debug infrastructure
aimed at spotting incorrect noref pointer usage, in patch one. The
infrastructure is small and must be explicitly enabled via a newly introduced
build option.

Patch two uses such infrastructure to track dst noref usage in the networking
stack.

Patch 3 and 4 are bugfixes for small buglet found running this infrastructure
on basic scenarios.

Paolo Abeni (4):
  rcu: introduce noref debug
  net: use RCU noref infrastructure to track dst noref
  ipv4: drop unneeded and misleading RCU lock in ip_route_input_noref()
  tcp: avoid noref dst leak on input path

 include/linux/rcupdate.h | 11 ++++++
 include/linux/skbuff.h   |  1 +
 include/net/dst.h        |  5 +++
 kernel/rcu/Kconfig.debug | 15 ++++++++
 kernel/rcu/Makefile      |  1 +
 kernel/rcu/noref_debug.c | 89 ++++++++++++++++++++++++++++++++++++++++++++++++
 net/ipv4/route.c         |  7 +---
 net/ipv4/tcp_input.c     |  5 ++-
 8 files changed, 127 insertions(+), 7 deletions(-)
 create mode 100644 kernel/rcu/noref_debug.c

-- 
2.13.6

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/4] rcu: introduce noref debug
  2017-10-06 12:57 [PATCH 0/4] RCU: introduce noref debug Paolo Abeni
@ 2017-10-06 12:57 ` Paolo Abeni
  2017-10-06 14:13   ` Steven Rostedt
  2017-10-06 12:57 ` [PATCH 2/4] net: use RCU noref infrastructure to track dst noref Paolo Abeni
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Paolo Abeni @ 2017-10-06 12:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Paul E. McKenney, Josh Triplett, Steven Rostedt, David S. Miller,
	Eric Dumazet, Hannes Frederic Sowa, netdev

We currently lack a debugging infrastructure to ensure that
long-lived noref dst are properly handled - namely dropped
or converted to a refcounted version before escaping the current
RCU section.

This changeset implements such infra, tracking the noref pointer
on a per CPU store and checking that all noref tracked at any
given RCU nesting level are cleared before leaving such RCU
section.

Each noref entity is identified using the entity pointer and an
additional, optional, key/opaque pointer. This is needed to cope
with usage case scenario where the same noref entity is stored
in different places (e.g. noref dst on skb_clone()).

To keep the patch/implementation simple RCU_NOREF_DEBUG depends
on PREEMPT_RCU=n in Kconfig.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/linux/rcupdate.h | 11 ++++++
 kernel/rcu/Kconfig.debug | 15 ++++++++
 kernel/rcu/Makefile      |  1 +
 kernel/rcu/noref_debug.c | 89 ++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 116 insertions(+)
 create mode 100644 kernel/rcu/noref_debug.c

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index de50d8a4cf41..20c1ce08e3eb 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -60,6 +60,15 @@ void call_rcu_sched(struct rcu_head *head, rcu_callback_t func);
 void synchronize_sched(void);
 void rcu_barrier_tasks(void);
 
+#ifdef CONFIG_RCU_NOREF_DEBUG
+void rcu_track_noref(void *key, void *noref, bool track);
+void __rcu_check_noref(void);
+
+#else
+static inline void rcu_track_noref(void *key, void *noref, bool add) { }
+static inline void __rcu_check_noref(void) { }
+#endif
+
 #ifdef CONFIG_PREEMPT_RCU
 
 void __rcu_read_lock(void);
@@ -85,6 +94,7 @@ static inline void __rcu_read_lock(void)
 
 static inline void __rcu_read_unlock(void)
 {
+	__rcu_check_noref();
 	if (IS_ENABLED(CONFIG_PREEMPT_COUNT))
 		preempt_enable();
 }
@@ -723,6 +733,7 @@ static inline void rcu_read_unlock_bh(void)
 			 "rcu_read_unlock_bh() used illegally while idle");
 	rcu_lock_release(&rcu_bh_lock_map);
 	__release(RCU_BH);
+	__rcu_check_noref();
 	local_bh_enable();
 }
 
diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
index 0ec7d1d33a14..6c7f52a3e809 100644
--- a/kernel/rcu/Kconfig.debug
+++ b/kernel/rcu/Kconfig.debug
@@ -68,6 +68,21 @@ config RCU_TRACE
 	  Say Y here if you want to enable RCU tracing
 	  Say N if you are unsure.
 
+config RCU_NOREF_DEBUG
+	bool "Debugging for RCU-protected noref entries"
+	depends on PREEMPT_RCU=n
+	select PREEMPT_COUNT
+	default n
+	help
+	  In case of long lasting rcu_read_lock sections this debug
+	  feature enables one to ensure that no rcu managed dereferenced
+	  data leaves the locked section. One use case is the tracking
+	  of dst_entries in struct sk_buff ->dst, which is used to pass
+	  the dst_entry around in the whole stack while under RCU lock.
+
+	  Say Y here if you want to enable noref RCU debugging
+	  Say N if you are unsure.
+
 config RCU_EQS_DEBUG
 	bool "Provide debugging asserts for adding NO_HZ support to an arch"
 	depends on DEBUG_KERNEL
diff --git a/kernel/rcu/Makefile b/kernel/rcu/Makefile
index 13c0fc852767..c67d7c65c582 100644
--- a/kernel/rcu/Makefile
+++ b/kernel/rcu/Makefile
@@ -11,3 +11,4 @@ obj-$(CONFIG_TREE_RCU) += tree.o
 obj-$(CONFIG_PREEMPT_RCU) += tree.o
 obj-$(CONFIG_TINY_RCU) += tiny.o
 obj-$(CONFIG_RCU_NEED_SEGCBLIST) += rcu_segcblist.o
+obj-$(CONFIG_RCU_NOREF_DEBUG) += noref_debug.o
\ No newline at end of file
diff --git a/kernel/rcu/noref_debug.c b/kernel/rcu/noref_debug.c
new file mode 100644
index 000000000000..ae2e104b11d6
--- /dev/null
+++ b/kernel/rcu/noref_debug.c
@@ -0,0 +1,89 @@
+#include <linux/bug.h>
+#include <linux/percpu.h>
+#include <linux/skbuff.h>
+#include <linux/bitops.h>
+
+#define NOREF_MAX 7
+struct noref_entry {
+	void *noref;
+	void *key;
+	int nesting;
+};
+
+struct noref_cache {
+	struct noref_entry store[NOREF_MAX];
+};
+
+static DEFINE_PER_CPU(struct noref_cache, per_cpu_noref_cache);
+
+static int noref_cache_lookup(struct noref_cache *cache, void *noref, void *key)
+{
+	int i;
+
+	for (i = 0; i < NOREF_MAX; ++i)
+		if (cache->store[i].noref == noref &&
+		    cache->store[i].key == key)
+			return i;
+
+	return -1;
+}
+
+void rcu_track_noref(void *key, void *noref, bool track)
+{
+	struct noref_cache *cache = this_cpu_ptr(&per_cpu_noref_cache);
+	int i;
+
+	if (track) {
+		/* find the first empty slot */
+		i = noref_cache_lookup(cache, NULL, 0);
+		if (i < 0) {
+			WARN_ONCE(1, "can't find empty an slot to track a noref"
+				  " noref tracking will be inaccurate");
+			return;
+		}
+
+		cache->store[i].noref = noref;
+		cache->store[i].key = key;
+		cache->store[i].nesting = preempt_count();
+		return;
+	}
+
+	i = noref_cache_lookup(cache, noref, key);
+	if (i == -1) {
+		WARN_ONCE(1, "to-be-untracked noref entity %p not found in "
+			  "cache\n", noref);
+		return;
+	}
+
+	cache->store[i].noref = NULL;
+	cache->store[i].key = NULL;
+	cache->store[i].nesting = 0;
+}
+EXPORT_SYMBOL_GPL(rcu_track_noref);
+
+void __rcu_check_noref(void)
+{
+	struct noref_cache *cache = this_cpu_ptr(&per_cpu_noref_cache);
+	char *cur, buf[strlen("0xffffffffffffffff") * NOREF_MAX + 1];
+	int nesting = preempt_count();
+	int i, ret, cnt = 0;
+
+	cur = buf;
+	for (i = 0; i < NOREF_MAX; ++i) {
+		if (!cache->store[i].noref ||
+		    cache->store[i].nesting != nesting)
+			continue;
+
+		cnt++;
+		ret = sprintf(cur, " %p", cache->store[i].noref);
+		if (ret >= 0)
+			cur += ret;
+		cache->store[i].noref = NULL;
+		cache->store[i].key = NULL;
+		cache->store[i].nesting = 0;
+	}
+
+	WARN_ONCE(cnt, "%d noref entities escaped an RCU section, "
+		  "nesting %d, leaked noref list %s", cnt, nesting, buf);
+}
+EXPORT_SYMBOL_GPL(__rcu_check_noref);
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/4] net: use RCU noref infrastructure to track dst noref
  2017-10-06 12:57 [PATCH 0/4] RCU: introduce noref debug Paolo Abeni
  2017-10-06 12:57 ` [PATCH 1/4] rcu: " Paolo Abeni
@ 2017-10-06 12:57 ` Paolo Abeni
  2017-10-06 12:57 ` [PATCH 3/4] ipv4: drop unneeded and misleading RCU lock in ip_route_input_noref() Paolo Abeni
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 16+ messages in thread
From: Paolo Abeni @ 2017-10-06 12:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Paul E. McKenney, Josh Triplett, Steven Rostedt, David S. Miller,
	Eric Dumazet, Hannes Frederic Sowa, netdev

We can now ensure noref dsts do no escape the relevant RCU
section. This does not introduce any functional changes, it adds
the relevant debug code enabled by CONFIG_RCU_NOREF_DEBUG.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/linux/skbuff.h | 1 +
 include/net/dst.h      | 5 +++++
 2 files changed, 6 insertions(+)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 72299ef00061..5df77c8a0319 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -905,6 +905,7 @@ static inline void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)
 static inline void skb_dst_set_noref(struct sk_buff *skb, struct dst_entry *dst)
 {
 	WARN_ON(!rcu_read_lock_held() && !rcu_read_lock_bh_held());
+	rcu_track_noref(skb, dst, true);
 	skb->_skb_refdst = (unsigned long)dst | SKB_DST_NOREF;
 }
 
diff --git a/include/net/dst.h b/include/net/dst.h
index 06a6765da074..6bc6fefe178e 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -294,6 +294,8 @@ static inline void refdst_drop(unsigned long refdst)
 static inline void skb_dst_drop(struct sk_buff *skb)
 {
 	if (skb->_skb_refdst) {
+		if (skb->_skb_refdst & SKB_DST_NOREF)
+			rcu_track_noref(skb, skb_dst(skb), false);
 		refdst_drop(skb->_skb_refdst);
 		skb->_skb_refdst = 0UL;
 	}
@@ -304,6 +306,8 @@ static inline void __skb_dst_copy(struct sk_buff *nskb, unsigned long refdst)
 	nskb->_skb_refdst = refdst;
 	if (!(nskb->_skb_refdst & SKB_DST_NOREF))
 		dst_clone(skb_dst(nskb));
+	else
+		rcu_track_noref(nskb, skb_dst(nskb), true);
 }
 
 static inline void skb_dst_copy(struct sk_buff *nskb, const struct sk_buff *oskb)
@@ -335,6 +339,7 @@ static inline void skb_dst_force(struct sk_buff *skb)
 		struct dst_entry *dst = skb_dst(skb);
 
 		WARN_ON(!rcu_read_lock_held());
+		rcu_track_noref(skb, dst, false);
 		if (!dst_hold_safe(dst))
 			dst = NULL;
 
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/4] ipv4: drop unneeded and misleading RCU lock in ip_route_input_noref()
  2017-10-06 12:57 [PATCH 0/4] RCU: introduce noref debug Paolo Abeni
  2017-10-06 12:57 ` [PATCH 1/4] rcu: " Paolo Abeni
  2017-10-06 12:57 ` [PATCH 2/4] net: use RCU noref infrastructure to track dst noref Paolo Abeni
@ 2017-10-06 12:57 ` Paolo Abeni
  2017-10-06 12:57 ` [PATCH 4/4] tcp: avoid noref dst leak on input path Paolo Abeni
  2017-10-06 13:34 ` [PATCH 0/4] RCU: introduce noref debug Paul E. McKenney
  4 siblings, 0 replies; 16+ messages in thread
From: Paolo Abeni @ 2017-10-06 12:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Paul E. McKenney, Josh Triplett, Steven Rostedt, David S. Miller,
	Eric Dumazet, Hannes Frederic Sowa, netdev

Enabling CONFIG_RCU_NOREF_DEBUG gives the following splat on the
first ingress IPv4 packet:

       1 noref entities escaped an RCU section, nesting 259, leaked noref list  ffff8edcefb1dc00
       ------------[ cut here ]------------
       WARNING: CPU: 0 PID: 0 at kernel/rcu/noref_debug.c:87 __rcu_check_noref+0xf8/0x100
       Modules linked in: intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel crypto_simd glue_helper cryptd mei_me ipmi_ssif sg mei mxm_wmi iTCO_wdt iTCO_vendor_support dcdbas lpc_ich ipmi_si pcspkr ipmi_devintf ipmi_msghandler shpchp wmi acpi_power_meter nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm igb drm ixgbe mdio ahci ptp crc32c_intel i2c_algo_bit libahci pps_core i2c_core libata dca dm_mirror dm_region_hash dm_log dm_mod
       CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc1.noref_3+ #1609
       Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.4.3 01/17/2017
       task: ffffffffae019500 task.stack: ffffffffae000000
       RIP: 0010:__rcu_check_noref+0x7b/0xd0
       RSP: 0018:ffff900afbe03b30 EFLAGS: 00010246
       RAX: 0000000000000034 RBX: ffff900afbfd2500 RCX: 0000000000000000
       RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000202
       RBP: ffff900afbe03b48 R08: 0000000000000000 R09: 0000000000000000
       R10: 0000000000000001 R11: 00000000c388883e R12: 0000000000000103
       R13: 000000001d24100a R14: ffff900af13e0000 R15: 0000000000000000
       FS:  0000000000000000(0000) GS:ffff900afbe00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00005589ce876398 CR3: 0000001ff52f9005 CR4: 00000000003606f0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       Call Trace:
        <IRQ>
        ip_route_input_noref+0xa6/0x150
        ? ip_route_input_noref+0x5/0x150
        ip_rcv_finish+0x78/0x5e0
        ip_rcv+0x2a7/0x540
        ? packet_rcv+0x52/0x450
        __netif_receive_skb_core+0x3b9/0xe10
        ? netif_receive_skb_internal+0x40/0x390
        __netif_receive_skb+0x18/0x60
        netif_receive_skb_internal+0x8d/0x390
        ? netif_receive_skb_internal+0x40/0x390
        napi_gro_receive+0x15c/0x1f0
        igb_clean_rx_irq+0x36d/0x7f0 [igb]
        igb_poll+0x303/0x780 [igb]
        ? save_stack_trace+0x1b/0x20
        ? __lock_acquire+0xcf2/0x11c0
        ? net_rx_action+0xb4/0x520
        net_rx_action+0x27d/0x520
        __do_softirq+0xd1/0x4f5
        irq_exit+0xfb/0x110
        do_IRQ+0x67/0x120
        common_interrupt+0xa7/0xa7
        </IRQ>
       RIP: 0010:cpuidle_enter_state+0xd0/0x360
       RSP: 0018:ffffffffae003df8 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff79
       RAX: ffffffffae019500 RBX: ffffd7d67c604400 RCX: 0000000000000000
       RDX: ffffffffae019500 RSI: 0000000000000001 RDI: ffffffffae019500
       RBP: ffffffffae003e30 R08: 0000000000000000 R09: 0000000000000000
       R10: 0000000000000001 R11: 0000000000000018 R12: 0000000000000004
       R13: 0000000000000000 R14: ffffd7d67c604400 R15: 00000009c8eafd50
        ? cpuidle_enter_state+0xc9/0x360
        cpuidle_enter+0x17/0x20
        call_cpuidle+0x23/0x40
        do_idle+0x183/0x200
        cpu_startup_entry+0x73/0x80
        rest_init+0xc3/0xd0
        start_kernel+0x4f7/0x518
        ? set_init_arg+0x5a/0x5a
        x86_64_start_reservations+0x24/0x26
        x86_64_start_kernel+0x6f/0x72
        secondary_startup_64+0xa5/0xa5
       Code: f6 75 07 5b 41 5c 41 5d 5d c3 80 3d eb e4 ff 00 00 75 1a 44 89 e2 48 c7 c7 88 54 e7 ad 31 c0 c6 05 d6 e4 ff 00 01 e8 28 af fe ff <0f> ff 41 bd 07 00 00 00 48 8b 33 48 85 f6 74 06 44 39 63 10 74

The rcu protection in ip_route_input_noref() is unneeded and
misleading: the caller still needs to acquire and retain the rcu
lock until the skb - carrying a noref dst on successful return -
is either dropped or the relevant dst is forced to a ref-counted
version.

This change just drops the unneeded lock.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/ipv4/route.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 94d4cd2d5ea4..5a6ca1f16d3f 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2069,14 +2069,9 @@ int ip_route_input_noref(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 			 u8 tos, struct net_device *dev)
 {
 	struct fib_result res;
-	int err;
 
 	tos &= IPTOS_RT_MASK;
-	rcu_read_lock();
-	err = ip_route_input_rcu(skb, daddr, saddr, tos, dev, &res);
-	rcu_read_unlock();
-
-	return err;
+	return ip_route_input_rcu(skb, daddr, saddr, tos, dev, &res);
 }
 EXPORT_SYMBOL(ip_route_input_noref);
 
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 4/4] tcp: avoid noref dst leak on input path
  2017-10-06 12:57 [PATCH 0/4] RCU: introduce noref debug Paolo Abeni
                   ` (2 preceding siblings ...)
  2017-10-06 12:57 ` [PATCH 3/4] ipv4: drop unneeded and misleading RCU lock in ip_route_input_noref() Paolo Abeni
@ 2017-10-06 12:57 ` Paolo Abeni
  2017-10-06 14:37   ` Eric Dumazet
  2017-10-06 13:34 ` [PATCH 0/4] RCU: introduce noref debug Paul E. McKenney
  4 siblings, 1 reply; 16+ messages in thread
From: Paolo Abeni @ 2017-10-06 12:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Paul E. McKenney, Josh Triplett, Steven Rostedt, David S. Miller,
	Eric Dumazet, Hannes Frederic Sowa, netdev

Enabling CONFIG_RCU_NOREF_DEBUG gives the following splat when
processing tcp packets:

           to-be-untracked noref entity ffff942cb71ea300 not found in cache
           ------------[ cut here ]------------
           WARNING: CPU: 24 PID: 178 at kernel/rcu/noref_debug.c:54 rcu_track_noref+0xa4/0xf0
           Modules linked in: intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel crypto_simd glue_helper cryptd iTCO_wdt ipmi_ssif mei_me iTCO_vendor_support mei dcdbas lpc_ich ipmi_si mxm_wmi sg pcspkr ipmi_devintf ipmi_msghandler acpi_power_meter shpchp wmi nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm igb drm ixgbe mdio crc32c_intel ahci ptp i2c_algo_bit libahci pps_core i2c_core libata dca dm_mirror dm_region_hash dm_log dm_mod
           CPU: 24 PID: 178 Comm: ksoftirqd/24 Not tainted 4.14.0-rc1.noref_route+ #1610
           Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.4.3 01/17/2017
           task: ffff940e48300000 task.stack: ffffaec406a20000
           RIP: 0010:rcu_track_noref+0xa4/0xf0
           RSP: 0018:ffffaec406a238e0 EFLAGS: 00010246
           RAX: 0000000000000040 RBX: 0000000000000000 RCX: 0000000000000002
           RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000292
           RBP: ffffaec406a238e0 R08: 0000000000000000 R09: 0000000000000000
           R10: 0000000000000001 R11: 0000000000000003 R12: 0000000000000000
           R13: ffff942cb5110000 R14: 000000000000fe88 R15: ffff942cb1a20200
           FS:  0000000000000000(0000) GS:ffff942cbee00000(0000) knlGS:0000000000000000
           CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
           CR2: 00007febc072d140 CR3: 0000001feebd6002 CR4: 00000000003606e0
           DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
           DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
           Call Trace:
            tcp_data_queue+0x82a/0xce0
            tcp_rcv_established+0x283/0x570
            tcp_v4_do_rcv+0x102/0x1e0
            tcp_v4_rcv+0xa9e/0xc10
            ip_local_deliver_finish+0x128/0x380
            ? ip_local_deliver_finish+0x41/0x380
            ip_local_deliver+0x74/0x230
            ip_rcv_finish+0x105/0x5e0
            ip_rcv+0x2a7/0x540
            __netif_receive_skb_core+0x3b9/0xe10
            ? netif_receive_skb_internal+0x40/0x390
            __netif_receive_skb+0x18/0x60
            netif_receive_skb_internal+0x8d/0x390
            ? netif_receive_skb_internal+0x40/0x390
            napi_gro_complete+0x127/0x1d0
            ? napi_gro_complete+0x2a/0x1d0
            napi_gro_flush+0x5f/0x80
            napi_complete_done+0x96/0x100
            ixgbe_poll+0x5f8/0x7c0 [ixgbe]
            net_rx_action+0x27d/0x520
            __do_softirq+0xd1/0x4f5
            run_ksoftirqd+0x25/0x70
            smpboot_thread_fn+0x11a/0x1f0
            kthread+0x155/0x190
            ? sort_range+0x30/0x30
            ? kthread_create_on_node+0x70/0x70
            ret_from_fork+0x2a/0x40
           Code: e9 83 c2 01 48 83 c0 18 83 fa 07 75 ef 80 3d b0 e5 ff 00 00 75 d2 48 c7 c7 50 54 07 9d 31 c0 c6 05 9e e5 ff 00 01 e8 ef af fe ff <0f> ff 5d c3 80 3d 8f e5 ff 00 00 75 b0 48 c7 c7 00 54 07 9d 31

we must clear the skb dst before enqueuing the skb somewhere,
but currently the dst entry for packets delivered
via tcp_rcv_established() -> tcp_queue_rcv() is not cleared.

Fix it by adding the explicit drop in tcp_queue_rcv() and moving
the current skb_dst_drop() just before the other enqueuing
operation, do avoid unneeded double skb_dst_drop() for some
path.

The leak itself is not harmful, because the tcp recvmsg() code
should not access such info.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/ipv4/tcp_input.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index c5d7656beeee..bf4e17edfe7a 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4422,6 +4422,9 @@ static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb)
 		return;
 	}
 
+	/* drop the -possibly noref - dst before delivery the skb to ofo tree */
+	skb_dst_drop(skb);
+
 	/* Stash tstamp to avoid being stomped on by rbnode */
 	if (TCP_SKB_CB(skb)->has_rxtstamp)
 		TCP_SKB_CB(skb)->swtstamp = skb->tstamp;
@@ -4560,6 +4563,7 @@ static int __must_check tcp_queue_rcv(struct sock *sk, struct sk_buff *skb, int
 				  skb, fragstolen)) ? 1 : 0;
 	tcp_rcv_nxt_update(tcp_sk(sk), TCP_SKB_CB(skb)->end_seq);
 	if (!eaten) {
+		skb_dst_drop(skb);
 		__skb_queue_tail(&sk->sk_receive_queue, skb);
 		skb_set_owner_r(skb, sk);
 	}
@@ -4626,7 +4630,6 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
 		__kfree_skb(skb);
 		return;
 	}
-	skb_dst_drop(skb);
 	__skb_pull(skb, tcp_hdr(skb)->doff * 4);
 
 	tcp_ecn_accept_cwr(tp, skb);
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/4] RCU: introduce noref debug
  2017-10-06 12:57 [PATCH 0/4] RCU: introduce noref debug Paolo Abeni
                   ` (3 preceding siblings ...)
  2017-10-06 12:57 ` [PATCH 4/4] tcp: avoid noref dst leak on input path Paolo Abeni
@ 2017-10-06 13:34 ` Paul E. McKenney
  2017-10-06 15:10   ` Paolo Abeni
  4 siblings, 1 reply; 16+ messages in thread
From: Paul E. McKenney @ 2017-10-06 13:34 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, David S. Miller,
	Eric Dumazet, Hannes Frederic Sowa, netdev

On Fri, Oct 06, 2017 at 02:57:45PM +0200, Paolo Abeni wrote:
> The networking subsystem is currently using some kind of long-lived
> RCU-protected, references to avoid the overhead of full book-keeping.
> 
> Such references - skb_dst() noref - are stored inside the skbs and can be
> moved across relevant slices of the network stack, with the users
> being in charge of properly clearing the relevant skb - or properly refcount
> the related dst references - before the skb escapes the RCU section.
> 
> We currently don't have any deterministic debug infrastructure to check
> the dst noref usages - and the introduction of others noref artifact is
> currently under discussion.
> 
> This series tries to tackle the above introducing an RCU debug infrastructure
> aimed at spotting incorrect noref pointer usage, in patch one. The
> infrastructure is small and must be explicitly enabled via a newly introduced
> build option.
> 
> Patch two uses such infrastructure to track dst noref usage in the networking
> stack.
> 
> Patch 3 and 4 are bugfixes for small buglet found running this infrastructure
> on basic scenarios.

This patchset does not look like it handles rcu_read_lock() nesting.
For example, given code like this:

	void foo(void)
	{
		rcu_read_lock();
		rcu_track_noref(&key2, &noref2, true);
		do_something();
		rcu_track_noref(&key2, &noref2, false);
		rcu_read_unlock();
	}

	void bar(void)
	{
		rcu_read_lock();
		rcu_track_noref(&key1, &noref1, true);
		do_something_more();
		foo();
		do_something_else();
		rcu_track_noref(&key1, &noref1, false);
		rcu_read_unlock();
	}

	void grill(void)
	{
		foo();
	}

It looks like foo()'s rcu_read_unlock() will complain about key1.
You could remove foo()'s rcu_read_lock() and rcu_read_unlock(), but
that will break the call from grill().

Or am I missing something subtle here?  Given patch 3/4, I suspect not...

							Thanx, Paul

> Paolo Abeni (4):
>   rcu: introduce noref debug
>   net: use RCU noref infrastructure to track dst noref
>   ipv4: drop unneeded and misleading RCU lock in ip_route_input_noref()
>   tcp: avoid noref dst leak on input path
> 
>  include/linux/rcupdate.h | 11 ++++++
>  include/linux/skbuff.h   |  1 +
>  include/net/dst.h        |  5 +++
>  kernel/rcu/Kconfig.debug | 15 ++++++++
>  kernel/rcu/Makefile      |  1 +
>  kernel/rcu/noref_debug.c | 89 ++++++++++++++++++++++++++++++++++++++++++++++++
>  net/ipv4/route.c         |  7 +---
>  net/ipv4/tcp_input.c     |  5 ++-
>  8 files changed, 127 insertions(+), 7 deletions(-)
>  create mode 100644 kernel/rcu/noref_debug.c
> 
> -- 
> 2.13.6
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/4] rcu: introduce noref debug
  2017-10-06 12:57 ` [PATCH 1/4] rcu: " Paolo Abeni
@ 2017-10-06 14:13   ` Steven Rostedt
  0 siblings, 0 replies; 16+ messages in thread
From: Steven Rostedt @ 2017-10-06 14:13 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: linux-kernel, Paul E. McKenney, Josh Triplett, David S. Miller,
	Eric Dumazet, Hannes Frederic Sowa, netdev

On Fri,  6 Oct 2017 14:57:46 +0200
Paolo Abeni <pabeni@redhat.com> wrote:

> We currently lack a debugging infrastructure to ensure that
> long-lived noref dst are properly handled - namely dropped
> or converted to a refcounted version before escaping the current
> RCU section.
> 
> This changeset implements such infra, tracking the noref pointer
> on a per CPU store and checking that all noref tracked at any
> given RCU nesting level are cleared before leaving such RCU
> section.
> 
> Each noref entity is identified using the entity pointer and an
> additional, optional, key/opaque pointer. This is needed to cope
> with usage case scenario where the same noref entity is stored
> in different places (e.g. noref dst on skb_clone()).
> 
> To keep the patch/implementation simple RCU_NOREF_DEBUG depends
> on PREEMPT_RCU=n in Kconfig.
> 
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>  include/linux/rcupdate.h | 11 ++++++
>  kernel/rcu/Kconfig.debug | 15 ++++++++
>  kernel/rcu/Makefile      |  1 +
>  kernel/rcu/noref_debug.c | 89 ++++++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 116 insertions(+)
>  create mode 100644 kernel/rcu/noref_debug.c
> 
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index de50d8a4cf41..20c1ce08e3eb 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -60,6 +60,15 @@ void call_rcu_sched(struct rcu_head *head, rcu_callback_t func);
>  void synchronize_sched(void);
>  void rcu_barrier_tasks(void);
>  
> +#ifdef CONFIG_RCU_NOREF_DEBUG
> +void rcu_track_noref(void *key, void *noref, bool track);
> +void __rcu_check_noref(void);
> +
> +#else
> +static inline void rcu_track_noref(void *key, void *noref, bool add) { }
> +static inline void __rcu_check_noref(void) { }
> +#endif
> +
>  #ifdef CONFIG_PREEMPT_RCU
>  
>  void __rcu_read_lock(void);
> @@ -85,6 +94,7 @@ static inline void __rcu_read_lock(void)
>  
>  static inline void __rcu_read_unlock(void)
>  {
> +	__rcu_check_noref();
>  	if (IS_ENABLED(CONFIG_PREEMPT_COUNT))
>  		preempt_enable();
>  }
> @@ -723,6 +733,7 @@ static inline void rcu_read_unlock_bh(void)
>  			 "rcu_read_unlock_bh() used illegally while idle");
>  	rcu_lock_release(&rcu_bh_lock_map);
>  	__release(RCU_BH);
> +	__rcu_check_noref();
>  	local_bh_enable();
>  }
>  
> diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
> index 0ec7d1d33a14..6c7f52a3e809 100644
> --- a/kernel/rcu/Kconfig.debug
> +++ b/kernel/rcu/Kconfig.debug
> @@ -68,6 +68,21 @@ config RCU_TRACE
>  	  Say Y here if you want to enable RCU tracing
>  	  Say N if you are unsure.
>  
> +config RCU_NOREF_DEBUG
> +	bool "Debugging for RCU-protected noref entries"
> +	depends on PREEMPT_RCU=n
> +	select PREEMPT_COUNT
> +	default n
> +	help
> +	  In case of long lasting rcu_read_lock sections this debug
> +	  feature enables one to ensure that no rcu managed dereferenced
> +	  data leaves the locked section. One use case is the tracking
> +	  of dst_entries in struct sk_buff ->dst, which is used to pass
> +	  the dst_entry around in the whole stack while under RCU lock.
> +
> +	  Say Y here if you want to enable noref RCU debugging
> +	  Say N if you are unsure.
> +
>  config RCU_EQS_DEBUG
>  	bool "Provide debugging asserts for adding NO_HZ support to an arch"
>  	depends on DEBUG_KERNEL
> diff --git a/kernel/rcu/Makefile b/kernel/rcu/Makefile
> index 13c0fc852767..c67d7c65c582 100644
> --- a/kernel/rcu/Makefile
> +++ b/kernel/rcu/Makefile
> @@ -11,3 +11,4 @@ obj-$(CONFIG_TREE_RCU) += tree.o
>  obj-$(CONFIG_PREEMPT_RCU) += tree.o
>  obj-$(CONFIG_TINY_RCU) += tiny.o
>  obj-$(CONFIG_RCU_NEED_SEGCBLIST) += rcu_segcblist.o
> +obj-$(CONFIG_RCU_NOREF_DEBUG) += noref_debug.o
> \ No newline at end of file
> diff --git a/kernel/rcu/noref_debug.c b/kernel/rcu/noref_debug.c
> new file mode 100644
> index 000000000000..ae2e104b11d6
> --- /dev/null
> +++ b/kernel/rcu/noref_debug.c
> @@ -0,0 +1,89 @@
> +#include <linux/bug.h>
> +#include <linux/percpu.h>
> +#include <linux/skbuff.h>
> +#include <linux/bitops.h>
> +
> +#define NOREF_MAX 7
> +struct noref_entry {
> +	void *noref;
> +	void *key;
> +	int nesting;
> +};
> +
> +struct noref_cache {
> +	struct noref_entry store[NOREF_MAX];
> +};
> +
> +static DEFINE_PER_CPU(struct noref_cache, per_cpu_noref_cache);
> +
> +static int noref_cache_lookup(struct noref_cache *cache, void *noref, void *key)
> +{
> +	int i;
> +
> +	for (i = 0; i < NOREF_MAX; ++i)
> +		if (cache->store[i].noref == noref &&
> +		    cache->store[i].key == key)
> +			return i;
> +
> +	return -1;
> +}
> +

Please add a comment above this function on how to use it.

-- Steve

> +void rcu_track_noref(void *key, void *noref, bool track)
> +{
> +	struct noref_cache *cache = this_cpu_ptr(&per_cpu_noref_cache);
> +	int i;
> +
> +	if (track) {
> +		/* find the first empty slot */
> +		i = noref_cache_lookup(cache, NULL, 0);
> +		if (i < 0) {
> +			WARN_ONCE(1, "can't find empty an slot to track a noref"
> +				  " noref tracking will be inaccurate");
> +			return;
> +		}
> +
> +		cache->store[i].noref = noref;
> +		cache->store[i].key = key;
> +		cache->store[i].nesting = preempt_count();
> +		return;
> +	}
> +
> +	i = noref_cache_lookup(cache, noref, key);
> +	if (i == -1) {
> +		WARN_ONCE(1, "to-be-untracked noref entity %p not found in "
> +			  "cache\n", noref);
> +		return;
> +	}
> +
> +	cache->store[i].noref = NULL;
> +	cache->store[i].key = NULL;
> +	cache->store[i].nesting = 0;
> +}
> +EXPORT_SYMBOL_GPL(rcu_track_noref);
> +
> +void __rcu_check_noref(void)
> +{
> +	struct noref_cache *cache = this_cpu_ptr(&per_cpu_noref_cache);
> +	char *cur, buf[strlen("0xffffffffffffffff") * NOREF_MAX + 1];
> +	int nesting = preempt_count();
> +	int i, ret, cnt = 0;
> +
> +	cur = buf;
> +	for (i = 0; i < NOREF_MAX; ++i) {
> +		if (!cache->store[i].noref ||
> +		    cache->store[i].nesting != nesting)
> +			continue;
> +
> +		cnt++;
> +		ret = sprintf(cur, " %p", cache->store[i].noref);
> +		if (ret >= 0)
> +			cur += ret;
> +		cache->store[i].noref = NULL;
> +		cache->store[i].key = NULL;
> +		cache->store[i].nesting = 0;
> +	}
> +
> +	WARN_ONCE(cnt, "%d noref entities escaped an RCU section, "
> +		  "nesting %d, leaked noref list %s", cnt, nesting, buf);
> +}
> +EXPORT_SYMBOL_GPL(__rcu_check_noref);

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/4] tcp: avoid noref dst leak on input path
  2017-10-06 12:57 ` [PATCH 4/4] tcp: avoid noref dst leak on input path Paolo Abeni
@ 2017-10-06 14:37   ` Eric Dumazet
  2017-10-06 15:21     ` Paolo Abeni
  0 siblings, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2017-10-06 14:37 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: linux-kernel, Paul E. McKenney, Josh Triplett, Steven Rostedt,
	David S. Miller, Eric Dumazet, Hannes Frederic Sowa, netdev

On Fri, 2017-10-06 at 14:57 +0200, Paolo Abeni wrote:
> Enabling CONFIG_RCU_NOREF_DEBUG gives the following splat when
> processing tcp packets:
> 
>            to-be-untracked noref entity ffff942cb71ea300 not found in cache
>            ------------[ cut here ]------------
>            WARNING: CPU: 24 PID: 178 at kernel/rcu/noref_debug.c:54 rcu_track_noref+0xa4/0xf0
>            Modules linked in: intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel crypto_simd glue_helper cryptd iTCO_wdt ipmi_ssif mei_me iTCO_vendor_support mei dcdbas lpc_ich ipmi_si mxm_wmi sg pcspkr ipmi_devintf ipmi_msghandler acpi_power_meter shpchp wmi nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm igb drm ixgbe mdio crc32c_intel ahci ptp i2c_algo_bit libahci pps_core i2c_core libata dca dm_mirror dm_region_hash dm_log dm_mod
>            CPU: 24 PID: 178 Comm: ksoftirqd/24 Not tainted 4.14.0-rc1.noref_route+ #1610
>            Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.4.3 01/17/2017
>            task: ffff940e48300000 task.stack: ffffaec406a20000
>            RIP: 0010:rcu_track_noref+0xa4/0xf0
>            RSP: 0018:ffffaec406a238e0 EFLAGS: 00010246
>            RAX: 0000000000000040 RBX: 0000000000000000 RCX: 0000000000000002
>            RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000292
>            RBP: ffffaec406a238e0 R08: 0000000000000000 R09: 0000000000000000
>            R10: 0000000000000001 R11: 0000000000000003 R12: 0000000000000000
>            R13: ffff942cb5110000 R14: 000000000000fe88 R15: ffff942cb1a20200
>            FS:  0000000000000000(0000) GS:ffff942cbee00000(0000) knlGS:0000000000000000
>            CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>            CR2: 00007febc072d140 CR3: 0000001feebd6002 CR4: 00000000003606e0
>            DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>            DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>            Call Trace:
>             tcp_data_queue+0x82a/0xce0

That is strange, since tcp_data_queue() starts with

	skb_dst_drop(skb); 

So this stack trace looks suspicious.

>             tcp_rcv_established+0x283/0x570
>             tcp_v4_do_rcv+0x102/0x1e0
>             tcp_v4_rcv+0xa9e/0xc10
>             ip_local_deliver_finish+0x128/0x380
>             ? ip_local_deliver_finish+0x41/0x380
>             ip_local_deliver+0x74/0x230
>             ip_rcv_finish+0x105/0x5e0
>             ip_rcv+0x2a7/0x540
>             __netif_receive_skb_core+0x3b9/0xe10
>             ? netif_receive_skb_internal+0x40/0x390
>             __netif_receive_skb+0x18/0x60
>             netif_receive_skb_internal+0x8d/0x390
>             ? netif_receive_skb_internal+0x40/0x390
>             napi_gro_complete+0x127/0x1d0
>             ? napi_gro_complete+0x2a/0x1d0
>             napi_gro_flush+0x5f/0x80
>             napi_complete_done+0x96/0x100
>             ixgbe_poll+0x5f8/0x7c0 [ixgbe]
>             net_rx_action+0x27d/0x520
>             __do_softirq+0xd1/0x4f5
>             run_ksoftirqd+0x25/0x70
>             smpboot_thread_fn+0x11a/0x1f0
>             kthread+0x155/0x190
>             ? sort_range+0x30/0x30
>             ? kthread_create_on_node+0x70/0x70
>             ret_from_fork+0x2a/0x40
>            Code: e9 83 c2 01 48 83 c0 18 83 fa 07 75 ef 80 3d b0 e5 ff 00 00 75 d2 48 c7 c7 50 54 07 9d 31 c0 c6 05 9e e5 ff 00 01 e8 ef af fe ff <0f> ff 5d c3 80 3d 8f e5 ff 00 00 75 b0 48 c7 c7 00 54 07 9d 31
> 
> we must clear the skb dst before enqueuing the skb somewhere,
> but currently the dst entry for packets delivered
> via tcp_rcv_established() -> tcp_queue_rcv() is not cleared.
> 
> Fix it by adding the explicit drop in tcp_queue_rcv() and moving
> the current skb_dst_drop() just before the other enqueuing
> operation, do avoid unneeded double skb_dst_drop() for some
> path.
> 
> The leak itself is not harmful, because the tcp recvmsg() code
> should not access such info.
> 
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>  net/ipv4/tcp_input.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index c5d7656beeee..bf4e17edfe7a 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -4422,6 +4422,9 @@ static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb)
>  		return;
>  	}
>  
> +	/* drop the -possibly noref - dst before delivery the skb to ofo tree */
> +	skb_dst_drop(skb);
> +
>  	/* Stash tstamp to avoid being stomped on by rbnode */
>  	if (TCP_SKB_CB(skb)->has_rxtstamp)
>  		TCP_SKB_CB(skb)->swtstamp = skb->tstamp;
> @@ -4560,6 +4563,7 @@ static int __must_check tcp_queue_rcv(struct sock *sk, struct sk_buff *skb, int
>  				  skb, fragstolen)) ? 1 : 0;
>  	tcp_rcv_nxt_update(tcp_sk(sk), TCP_SKB_CB(skb)->end_seq);
>  	if (!eaten) {
> +		skb_dst_drop(skb);
>  		__skb_queue_tail(&sk->sk_receive_queue, skb);
>  		skb_set_owner_r(skb, sk);
>  	}
> @@ -4626,7 +4630,6 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
>  		__kfree_skb(skb);
>  		return;
>  	}
> -	skb_dst_drop(skb);
>  	__skb_pull(skb, tcp_hdr(skb)->doff * 4);
>  
>  	tcp_ecn_accept_cwr(tp, skb);

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/4] RCU: introduce noref debug
  2017-10-06 13:34 ` [PATCH 0/4] RCU: introduce noref debug Paul E. McKenney
@ 2017-10-06 15:10   ` Paolo Abeni
  2017-10-06 16:34     ` Paul E. McKenney
  0 siblings, 1 reply; 16+ messages in thread
From: Paolo Abeni @ 2017-10-06 15:10 UTC (permalink / raw)
  To: paulmck
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, David S. Miller,
	Eric Dumazet, Hannes Frederic Sowa, netdev

Hi,

On Fri, 2017-10-06 at 06:34 -0700, Paul E. McKenney wrote:
> On Fri, Oct 06, 2017 at 02:57:45PM +0200, Paolo Abeni wrote:
> > The networking subsystem is currently using some kind of long-lived
> > RCU-protected, references to avoid the overhead of full book-keeping.
> > 
> > Such references - skb_dst() noref - are stored inside the skbs and can be
> > moved across relevant slices of the network stack, with the users
> > being in charge of properly clearing the relevant skb - or properly refcount
> > the related dst references - before the skb escapes the RCU section.
> > 
> > We currently don't have any deterministic debug infrastructure to check
> > the dst noref usages - and the introduction of others noref artifact is
> > currently under discussion.
> > 
> > This series tries to tackle the above introducing an RCU debug infrastructure
> > aimed at spotting incorrect noref pointer usage, in patch one. The
> > infrastructure is small and must be explicitly enabled via a newly introduced
> > build option.
> > 
> > Patch two uses such infrastructure to track dst noref usage in the networking
> > stack.
> > 
> > Patch 3 and 4 are bugfixes for small buglet found running this infrastructure
> > on basic scenarios.

Thank you for the prompt reply!
> 
> This patchset does not look like it handles rcu_read_lock() nesting.
> For example, given code like this:
> 
> 	void foo(void)
> 	{
> 		rcu_read_lock();
> 		rcu_track_noref(&key2, &noref2, true);
> 		do_something();
> 		rcu_track_noref(&key2, &noref2, false);
> 		rcu_read_unlock();
> 	}
> 
> 	void bar(void)
> 	{
> 		rcu_read_lock();
> 		rcu_track_noref(&key1, &noref1, true);
> 		do_something_more();
> 		foo();
> 		do_something_else();
> 		rcu_track_noref(&key1, &noref1, false);
> 		rcu_read_unlock();
> 	}
> 
> 	void grill(void)
> 	{
> 		foo();
> 	}
> 
> It looks like foo()'s rcu_read_unlock() will complain about key1.
> You could remove foo()'s rcu_read_lock() and rcu_read_unlock(), but
> that will break the call from grill().

Actually the code should cope correctly with your example; when foo()'s
rcu_read_unlock() is called, 'cache' contains:

{ { &key1, &noref1, 1},  // ...

and when the related __rcu_check_noref() is invoked preempt_count() is
2 - because the check is called before decreasing the preempt counter.

In the main loop inside __rcu_check_noref() we will hit always the
'continue' statement because 'cache->store[i].nesting != nesting', so
no warn will be triggered.

> Or am I missing something subtle here?  Given patch 3/4, I suspect not...

The problem with the code in patch 3/4 is different; currently
ip_route_input_noref() is basically doing:

rcu_read_lock();

rcu_track_noref(&key1, &noref1, true);

rcu_read_unlock();

So the rcu lock there silence any RCU based check inside
ip_route_input_noref() but does not really protect the noref dst.

Please let me know if the above clarify the scenario.

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/4] tcp: avoid noref dst leak on input path
  2017-10-06 14:37   ` Eric Dumazet
@ 2017-10-06 15:21     ` Paolo Abeni
  2017-10-06 15:32       ` Eric Dumazet
  0 siblings, 1 reply; 16+ messages in thread
From: Paolo Abeni @ 2017-10-06 15:21 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: linux-kernel, Paul E. McKenney, Josh Triplett, Steven Rostedt,
	David S. Miller, Eric Dumazet, Hannes Frederic Sowa, netdev

Hi,

On Fri, 2017-10-06 at 07:37 -0700, Eric Dumazet wrote:
> On Fri, 2017-10-06 at 14:57 +0200, Paolo Abeni wrote:
> > Enabling CONFIG_RCU_NOREF_DEBUG gives the following splat when
> > processing tcp packets:
> > 
> >            to-be-untracked noref entity ffff942cb71ea300 not found in cache
> >            ------------[ cut here ]------------
> >            WARNING: CPU: 24 PID: 178 at kernel/rcu/noref_debug.c:54 rcu_track_noref+0xa4/0xf0
> >            Modules linked in: intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel crypto_simd glue_helper cryptd iTCO_wdt ipmi_ssif mei_me iTCO_vendor_support mei dcdbas lpc_ich ipmi_si mxm_wmi sg pcspkr ipmi_devintf ipmi_msghandler acpi_power_meter shpchp wmi nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm igb drm ixgbe mdio crc32c_intel ahci ptp i2c_algo_bit libahci pps_core i2c_core libata dca dm_mirror dm_region_hash dm_log dm_mod
> >            CPU: 24 PID: 178 Comm: ksoftirqd/24 Not tainted 4.14.0-rc1.noref_route+ #1610
> >            Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.4.3 01/17/2017
> >            task: ffff940e48300000 task.stack: ffffaec406a20000
> >            RIP: 0010:rcu_track_noref+0xa4/0xf0
> >            RSP: 0018:ffffaec406a238e0 EFLAGS: 00010246
> >            RAX: 0000000000000040 RBX: 0000000000000000 RCX: 0000000000000002
> >            RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000292
> >            RBP: ffffaec406a238e0 R08: 0000000000000000 R09: 0000000000000000
> >            R10: 0000000000000001 R11: 0000000000000003 R12: 0000000000000000
> >            R13: ffff942cb5110000 R14: 000000000000fe88 R15: ffff942cb1a20200
> >            FS:  0000000000000000(0000) GS:ffff942cbee00000(0000) knlGS:0000000000000000
> >            CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >            CR2: 00007febc072d140 CR3: 0000001feebd6002 CR4: 00000000003606e0
> >            DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >            DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >            Call Trace:
> >             tcp_data_queue+0x82a/0xce0
> 
> That is strange, since tcp_data_queue() starts with
> 
> 	skb_dst_drop(skb); 
> 
> So this stack trace looks suspicious.

Thank you for the feedback.

I most probably messed-up while extracting the info from dmsg, as this
issue gives a couple of splats almost concurrently. Please let me re-do 
the test and post a more resonable dmsg.

The problem with the current code is that in the tcp_rcv_established()
-> tcp_queue_rcv() path, the skb_dst() is not cleared.

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/4] tcp: avoid noref dst leak on input path
  2017-10-06 15:21     ` Paolo Abeni
@ 2017-10-06 15:32       ` Eric Dumazet
  0 siblings, 0 replies; 16+ messages in thread
From: Eric Dumazet @ 2017-10-06 15:32 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Eric Dumazet, LKML, Paul E. McKenney, Josh Triplett,
	Steven Rostedt, David S. Miller, Hannes Frederic Sowa, netdev

On Fri, Oct 6, 2017 at 8:21 AM, Paolo Abeni <pabeni@redhat.com> wrote:
> Hi,
>
> On Fri, 2017-10-06 at 07:37 -0700, Eric Dumazet wrote:
>> On Fri, 2017-10-06 at 14:57 +0200, Paolo Abeni wrote:
>> > Enabling CONFIG_RCU_NOREF_DEBUG gives the following splat when
>> > processing tcp packets:
>> >
>> >            to-be-untracked noref entity ffff942cb71ea300 not found in cache
>> >            ------------[ cut here ]------------
>> >            WARNING: CPU: 24 PID: 178 at kernel/rcu/noref_debug.c:54 rcu_track_noref+0xa4/0xf0
>> >            Modules linked in: intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel crypto_simd glue_helper cryptd iTCO_wdt ipmi_ssif mei_me iTCO_vendor_support mei dcdbas lpc_ich ipmi_si mxm_wmi sg pcspkr ipmi_devintf ipmi_msghandler acpi_power_meter shpchp wmi nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm igb drm ixgbe mdio crc32c_intel ahci ptp i2c_algo_bit libahci pps_core i2c_core libata dca dm_mirror dm_region_hash dm_log dm_mod
>> >            CPU: 24 PID: 178 Comm: ksoftirqd/24 Not tainted 4.14.0-rc1.noref_route+ #1610
>> >            Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.4.3 01/17/2017
>> >            task: ffff940e48300000 task.stack: ffffaec406a20000
>> >            RIP: 0010:rcu_track_noref+0xa4/0xf0
>> >            RSP: 0018:ffffaec406a238e0 EFLAGS: 00010246
>> >            RAX: 0000000000000040 RBX: 0000000000000000 RCX: 0000000000000002
>> >            RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000292
>> >            RBP: ffffaec406a238e0 R08: 0000000000000000 R09: 0000000000000000

> Thank you for the feedback.
>
> I most probably messed-up while extracting the info from dmsg, as this
> issue gives a couple of splats almost concurrently. Please let me re-do
> the test and post a more resonable dmsg.
>
> The problem with the current code is that in the tcp_rcv_established()
> -> tcp_queue_rcv() path, the skb_dst() is not cleared.
>

In any case, I would rather put one skb_dst_drop() right after the
last possible use of skb dst in TCP stack,
probably after sk_rx_dst_set() call.

Trying to move it in multiple places has been error prone, even if
current code is not buggy.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/4] RCU: introduce noref debug
  2017-10-06 15:10   ` Paolo Abeni
@ 2017-10-06 16:34     ` Paul E. McKenney
  2017-10-09 16:53       ` Paolo Abeni
  0 siblings, 1 reply; 16+ messages in thread
From: Paul E. McKenney @ 2017-10-06 16:34 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, David S. Miller,
	Eric Dumazet, Hannes Frederic Sowa, netdev

On Fri, Oct 06, 2017 at 05:10:09PM +0200, Paolo Abeni wrote:
> Hi,
> 
> On Fri, 2017-10-06 at 06:34 -0700, Paul E. McKenney wrote:
> > On Fri, Oct 06, 2017 at 02:57:45PM +0200, Paolo Abeni wrote:
> > > The networking subsystem is currently using some kind of long-lived
> > > RCU-protected, references to avoid the overhead of full book-keeping.
> > > 
> > > Such references - skb_dst() noref - are stored inside the skbs and can be
> > > moved across relevant slices of the network stack, with the users
> > > being in charge of properly clearing the relevant skb - or properly refcount
> > > the related dst references - before the skb escapes the RCU section.
> > > 
> > > We currently don't have any deterministic debug infrastructure to check
> > > the dst noref usages - and the introduction of others noref artifact is
> > > currently under discussion.
> > > 
> > > This series tries to tackle the above introducing an RCU debug infrastructure
> > > aimed at spotting incorrect noref pointer usage, in patch one. The
> > > infrastructure is small and must be explicitly enabled via a newly introduced
> > > build option.
> > > 
> > > Patch two uses such infrastructure to track dst noref usage in the networking
> > > stack.
> > > 
> > > Patch 3 and 4 are bugfixes for small buglet found running this infrastructure
> > > on basic scenarios.
> 
> Thank you for the prompt reply!
> > 
> > This patchset does not look like it handles rcu_read_lock() nesting.
> > For example, given code like this:
> > 
> > 	void foo(void)
> > 	{
> > 		rcu_read_lock();
> > 		rcu_track_noref(&key2, &noref2, true);
> > 		do_something();
> > 		rcu_track_noref(&key2, &noref2, false);
> > 		rcu_read_unlock();
> > 	}
> > 
> > 	void bar(void)
> > 	{
> > 		rcu_read_lock();
> > 		rcu_track_noref(&key1, &noref1, true);
> > 		do_something_more();
> > 		foo();
> > 		do_something_else();
> > 		rcu_track_noref(&key1, &noref1, false);
> > 		rcu_read_unlock();
> > 	}
> > 
> > 	void grill(void)
> > 	{
> > 		foo();
> > 	}
> > 
> > It looks like foo()'s rcu_read_unlock() will complain about key1.
> > You could remove foo()'s rcu_read_lock() and rcu_read_unlock(), but
> > that will break the call from grill().
> 
> Actually the code should cope correctly with your example; when foo()'s
> rcu_read_unlock() is called, 'cache' contains:
> 
> { { &key1, &noref1, 1},  // ...
> 
> and when the related __rcu_check_noref() is invoked preempt_count() is
> 2 - because the check is called before decreasing the preempt counter.
> 
> In the main loop inside __rcu_check_noref() we will hit always the
> 'continue' statement because 'cache->store[i].nesting != nesting', so
> no warn will be triggered.

You are right, it was too early, and my example wasn't correct.  How
about this one?

	void foo(void (*f)(struct s *sp), struct s **spp)
	{
		rcu_read_lock();
		rcu_track_noref(&key2, &noref2, true);
		f(spp);
		rcu_track_noref(&key2, &noref2, false);
		rcu_read_unlock();
	}

	void barcb(struct s **spp)
	{
		*spp = &noref3;
		rcu_track_noref(&key3, *spp, true);
	}

	void bar(void)
	{
		struct s *sp;

		rcu_read_lock();
		rcu_track_noref(&key1, &noref1, true);
		do_something_more();
		foo(barcb, &sp);
		do_something_else(sp);
		rcu_track_noref(&key3, sp, false);
		rcu_track_noref(&key1, &noref1, false);
		rcu_read_unlock();
	}

	void grillcb(struct s **spp)
	{
		*spp
	}

	void grill(void)
	{
		foo();
	}

How does the user select the key argument?  It looks like someone
can choose to just pass in NULL -- is that the intent, or am I confused
about this as well?

> > Or am I missing something subtle here?  Given patch 3/4, I suspect not...
> 
> The problem with the code in patch 3/4 is different; currently
> ip_route_input_noref() is basically doing:
> 
> rcu_read_lock();
> 
> rcu_track_noref(&key1, &noref1, true);
> 
> rcu_read_unlock();
> 
> So the rcu lock there silence any RCU based check inside
> ip_route_input_noref() but does not really protect the noref dst.
> 
> Please let me know if the above clarify the scenario.

OK.

I am also concerned about false negatives when the user invokes
rcu_track_noref(..., false) but then leaks the pointer anyway.  Or is
there something you are doing that catches this that I am missing?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/4] RCU: introduce noref debug
  2017-10-06 16:34     ` Paul E. McKenney
@ 2017-10-09 16:53       ` Paolo Abeni
  2017-10-11  4:02         ` Paul E. McKenney
  0 siblings, 1 reply; 16+ messages in thread
From: Paolo Abeni @ 2017-10-09 16:53 UTC (permalink / raw)
  To: paulmck
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, David S. Miller,
	Eric Dumazet, Hannes Frederic Sowa, netdev

On Fri, 2017-10-06 at 09:34 -0700, Paul E. McKenney wrote:
> On Fri, Oct 06, 2017 at 05:10:09PM +0200, Paolo Abeni wrote:
> > Hi,
> > 
> > On Fri, 2017-10-06 at 06:34 -0700, Paul E. McKenney wrote:
> > > On Fri, Oct 06, 2017 at 02:57:45PM +0200, Paolo Abeni wrote:
> > > > The networking subsystem is currently using some kind of long-lived
> > > > RCU-protected, references to avoid the overhead of full book-keeping.
> > > > 
> > > > Such references - skb_dst() noref - are stored inside the skbs and can be
> > > > moved across relevant slices of the network stack, with the users
> > > > being in charge of properly clearing the relevant skb - or properly refcount
> > > > the related dst references - before the skb escapes the RCU section.
> > > > 
> > > > We currently don't have any deterministic debug infrastructure to check
> > > > the dst noref usages - and the introduction of others noref artifact is
> > > > currently under discussion.
> > > > 
> > > > This series tries to tackle the above introducing an RCU debug infrastructure
> > > > aimed at spotting incorrect noref pointer usage, in patch one. The
> > > > infrastructure is small and must be explicitly enabled via a newly introduced
> > > > build option.
> > > > 
> > > > Patch two uses such infrastructure to track dst noref usage in the networking
> > > > stack.
> > > > 
> > > > Patch 3 and 4 are bugfixes for small buglet found running this infrastructure
> > > > on basic scenarios.
> > 
> > Thank you for the prompt reply!
> > > 
> > > This patchset does not look like it handles rcu_read_lock() nesting.
> > > For example, given code like this:
> > > 
> > > 	void foo(void)
> > > 	{
> > > 		rcu_read_lock();
> > > 		rcu_track_noref(&key2, &noref2, true);
> > > 		do_something();
> > > 		rcu_track_noref(&key2, &noref2, false);
> > > 		rcu_read_unlock();
> > > 	}
> > > 
> > > 	void bar(void)
> > > 	{
> > > 		rcu_read_lock();
> > > 		rcu_track_noref(&key1, &noref1, true);
> > > 		do_something_more();
> > > 		foo();
> > > 		do_something_else();
> > > 		rcu_track_noref(&key1, &noref1, false);
> > > 		rcu_read_unlock();
> > > 	}
> > > 
> > > 	void grill(void)
> > > 	{
> > > 		foo();
> > > 	}
> > > 
> > > It looks like foo()'s rcu_read_unlock() will complain about key1.
> > > You could remove foo()'s rcu_read_lock() and rcu_read_unlock(), but
> > > that will break the call from grill().
> > 
> > Actually the code should cope correctly with your example; when foo()'s
> > rcu_read_unlock() is called, 'cache' contains:
> > 
> > { { &key1, &noref1, 1},  // ...
> > 
> > and when the related __rcu_check_noref() is invoked preempt_count() is
> > 2 - because the check is called before decreasing the preempt counter.
> > 
> > In the main loop inside __rcu_check_noref() we will hit always the
> > 'continue' statement because 'cache->store[i].nesting != nesting', so
> > no warn will be triggered.
> 
> You are right, it was too early, and my example wasn't correct.  How
> about this one?
> 
> 	void foo(void (*f)(struct s *sp), struct s **spp)
> 	{
> 		rcu_read_lock();
> 		rcu_track_noref(&key2, &noref2, true);
> 		f(spp);
> 		rcu_track_noref(&key2, &noref2, false);
> 		rcu_read_unlock();
> 	}
> 
> 	void barcb(struct s **spp)
> 	{
> 		*spp = &noref3;
> 		rcu_track_noref(&key3, *spp, true);
> 	}
> 
> 	void bar(void)
> 	{
> 		struct s *sp;
> 
> 		rcu_read_lock();
> 		rcu_track_noref(&key1, &noref1, true);
> 		do_something_more();
> 		foo(barcb, &sp);
> 		do_something_else(sp);
> 		rcu_track_noref(&key3, sp, false);
> 		rcu_track_noref(&key1, &noref1, false);
> 		rcu_read_unlock();
> 	}
> 
> 	void grillcb(struct s **spp)
> 	{
> 		*spp
> 	}
> 
> 	void grill(void)
> 	{
> 		foo();
> 	}

You are right: this will generate a splat, even if the code it safe.
The false positive can be avoided looking for leaked references only in
the outermost rcu unlook. I did a previous implementation performing
such check, but it emitted very generic splat so I tried to be more
strict. The latter choice allowed to find/do 3/4.

What about using save_stack_trace() in rcu_track_noref(, true) and
reporting such stack trace when the check in the outer most rcu fails?

the current strict/false-postive-prone check could be enabled under an
additional build flag.

> How does the user select the key argument?  It looks like someone
> can choose to just pass in NULL -- is that the intent, or am I confused
> about this as well?

The 'key' argument is intented to discriminate the scope of the same
noref dst attached to different skbs, which happens e.g. as a result of
as skb_dst_copy().

In a generic use case, it can be NULL, too.

> I am also concerned about false negatives when the user invokes
> rcu_track_noref(..., false) but then leaks the pointer anyway.  Or is
> there something you are doing that catches this that I am missing?

If the rcu_track_noref(..., false) is misplaced or missing, yes we can
have false negative. 

In the noref rcu use-case the rcu_track_noref(, false) call is/must be
placed side-by-side with the code clearing the the noref pointer, so
is/should be quite easy avoiding such mistakes.

Cheers,

Paolo

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/4] RCU: introduce noref debug
  2017-10-09 16:53       ` Paolo Abeni
@ 2017-10-11  4:02         ` Paul E. McKenney
  2017-10-11 14:50           ` Paolo Abeni
  0 siblings, 1 reply; 16+ messages in thread
From: Paul E. McKenney @ 2017-10-11  4:02 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, David S. Miller,
	Eric Dumazet, Hannes Frederic Sowa, netdev

On Mon, Oct 09, 2017 at 06:53:12PM +0200, Paolo Abeni wrote:
> On Fri, 2017-10-06 at 09:34 -0700, Paul E. McKenney wrote:
> > On Fri, Oct 06, 2017 at 05:10:09PM +0200, Paolo Abeni wrote:
> > > Hi,
> > > 
> > > On Fri, 2017-10-06 at 06:34 -0700, Paul E. McKenney wrote:
> > > > On Fri, Oct 06, 2017 at 02:57:45PM +0200, Paolo Abeni wrote:
> > > > > The networking subsystem is currently using some kind of long-lived
> > > > > RCU-protected, references to avoid the overhead of full book-keeping.
> > > > > 
> > > > > Such references - skb_dst() noref - are stored inside the skbs and can be
> > > > > moved across relevant slices of the network stack, with the users
> > > > > being in charge of properly clearing the relevant skb - or properly refcount
> > > > > the related dst references - before the skb escapes the RCU section.
> > > > > 
> > > > > We currently don't have any deterministic debug infrastructure to check
> > > > > the dst noref usages - and the introduction of others noref artifact is
> > > > > currently under discussion.
> > > > > 
> > > > > This series tries to tackle the above introducing an RCU debug infrastructure
> > > > > aimed at spotting incorrect noref pointer usage, in patch one. The
> > > > > infrastructure is small and must be explicitly enabled via a newly introduced
> > > > > build option.
> > > > > 
> > > > > Patch two uses such infrastructure to track dst noref usage in the networking
> > > > > stack.
> > > > > 
> > > > > Patch 3 and 4 are bugfixes for small buglet found running this infrastructure
> > > > > on basic scenarios.
> > > 
> > > Thank you for the prompt reply!
> > > > 
> > > > This patchset does not look like it handles rcu_read_lock() nesting.
> > > > For example, given code like this:
> > > > 
> > > > 	void foo(void)
> > > > 	{
> > > > 		rcu_read_lock();
> > > > 		rcu_track_noref(&key2, &noref2, true);
> > > > 		do_something();
> > > > 		rcu_track_noref(&key2, &noref2, false);
> > > > 		rcu_read_unlock();
> > > > 	}
> > > > 
> > > > 	void bar(void)
> > > > 	{
> > > > 		rcu_read_lock();
> > > > 		rcu_track_noref(&key1, &noref1, true);
> > > > 		do_something_more();
> > > > 		foo();
> > > > 		do_something_else();
> > > > 		rcu_track_noref(&key1, &noref1, false);
> > > > 		rcu_read_unlock();
> > > > 	}
> > > > 
> > > > 	void grill(void)
> > > > 	{
> > > > 		foo();
> > > > 	}
> > > > 
> > > > It looks like foo()'s rcu_read_unlock() will complain about key1.
> > > > You could remove foo()'s rcu_read_lock() and rcu_read_unlock(), but
> > > > that will break the call from grill().
> > > 
> > > Actually the code should cope correctly with your example; when foo()'s
> > > rcu_read_unlock() is called, 'cache' contains:
> > > 
> > > { { &key1, &noref1, 1},  // ...
> > > 
> > > and when the related __rcu_check_noref() is invoked preempt_count() is
> > > 2 - because the check is called before decreasing the preempt counter.
> > > 
> > > In the main loop inside __rcu_check_noref() we will hit always the
> > > 'continue' statement because 'cache->store[i].nesting != nesting', so
> > > no warn will be triggered.
> > 
> > You are right, it was too early, and my example wasn't correct.  How
> > about this one?
> > 
> > 	void foo(void (*f)(struct s *sp), struct s **spp)
> > 	{
> > 		rcu_read_lock();
> > 		rcu_track_noref(&key2, &noref2, true);
> > 		f(spp);
> > 		rcu_track_noref(&key2, &noref2, false);
> > 		rcu_read_unlock();
> > 	}
> > 
> > 	void barcb(struct s **spp)
> > 	{
> > 		*spp = &noref3;
> > 		rcu_track_noref(&key3, *spp, true);
> > 	}
> > 
> > 	void bar(void)
> > 	{
> > 		struct s *sp;
> > 
> > 		rcu_read_lock();
> > 		rcu_track_noref(&key1, &noref1, true);
> > 		do_something_more();
> > 		foo(barcb, &sp);
> > 		do_something_else(sp);
> > 		rcu_track_noref(&key3, sp, false);
> > 		rcu_track_noref(&key1, &noref1, false);
> > 		rcu_read_unlock();
> > 	}
> > 
> > 	void grillcb(struct s **spp)
> > 	{
> > 		*spp
> > 	}
> > 
> > 	void grill(void)
> > 	{
> > 		foo();
> > 	}
> 
> You are right: this will generate a splat, even if the code it safe.
> The false positive can be avoided looking for leaked references only in
> the outermost rcu unlook. I did a previous implementation performing
> such check, but it emitted very generic splat so I tried to be more
> strict. The latter choice allowed to find/do 3/4.
> 
> What about using save_stack_trace() in rcu_track_noref(, true) and
> reporting such stack trace when the check in the outer most rcu fails?
> 
> the current strict/false-postive-prone check could be enabled under an
> additional build flag.

Linus and Ingo will ask me how users decide how they should set that
additional build flag.  Especially given that if there is code that
requires non-strict checking, isn't everyone required to set up non-strict
checking to avoid false positives?  Unless you can also configure out
all the code that requires non-strict checking, I suppose, but how
would you keep track of what to configure out?

> > How does the user select the key argument?  It looks like someone
> > can choose to just pass in NULL -- is that the intent, or am I confused
> > about this as well?
> 
> The 'key' argument is intented to discriminate the scope of the same
> noref dst attached to different skbs, which happens e.g. as a result of
> as skb_dst_copy().
> 
> In a generic use case, it can be NULL, too.

OK.  There will probably be some discussion about the API in that case.

> > I am also concerned about false negatives when the user invokes
> > rcu_track_noref(..., false) but then leaks the pointer anyway.  Or is
> > there something you are doing that catches this that I am missing?
> 
> If the rcu_track_noref(..., false) is misplaced or missing, yes we can
> have false negative. 
> 
> In the noref rcu use-case the rcu_track_noref(, false) call is/must be
> placed side-by-side with the code clearing the the noref pointer, so
> is/should be quite easy avoiding such mistakes.

True enough.  Except that if people were good about always clearing the
pointer, then the pointer couldn't leak, right?  Or am I missing something
in your use cases?

Or to put it another way -- have you been able to catch any real
pointer-leak bugs with this?  The other RCU debug options have had
pretty long found-bug lists before we accepted them.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/4] RCU: introduce noref debug
  2017-10-11  4:02         ` Paul E. McKenney
@ 2017-10-11 14:50           ` Paolo Abeni
  2017-10-11 15:45             ` Paul E. McKenney
  0 siblings, 1 reply; 16+ messages in thread
From: Paolo Abeni @ 2017-10-11 14:50 UTC (permalink / raw)
  To: paulmck
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, David S. Miller,
	Eric Dumazet, Hannes Frederic Sowa, netdev

On Tue, 2017-10-10 at 21:02 -0700, Paul E. McKenney wrote:
> Linus and Ingo will ask me how users decide how they should set that
> additional build flag.  Especially given that if there is code that
> requires non-strict checking, isn't everyone required to set up non-strict
> checking to avoid false positives?  Unless you can also configure out
> all the code that requires non-strict checking, I suppose, but how
> would you keep track of what to configure out?

I'm working to a new version using a single compile flag - without
additional strict option.

I don't know of any other subsytem that stores rcu pointer in
datastructures for a longer amount of time. That having said, I wonder
if the tests should completely move to the networking subsystem for the
time being. The Kconfig option would thus be called NET_DEBUG or
something along the lines. For abstraction it would be possible to add
an atomic_notifier_chain to the rcu_read/unlock methods, where multiple
users or checkers could register for. That way we keep the users
seperate from the implementation for the cost of a bit more layering
and more indirect calls. But given that this will anyway slow down
execution a lot, it will anyway only be suitable in
testing/verification/debugging environments.

> OK.  There will probably be some discussion about the API in that case.

I'll drop noref parameter, the key will became mandatory - the exact
position of where the reference of RCU managed object is stored. In the
case of noref dst it is &skb->_skb_refdst. With this kind of API it
should suite more subsystems.

> True enough.  Except that if people were good about always clearing the
> pointer, then the pointer couldn't leak, right?  Or am I missing something
> in your use cases?

This is correct. The dst_entry checking in skb, which this patch series
implements there are strict brackets in the sense of skb_dst_set,
skb_dst_set_noref, skb_dst_force, etc., which form brackets around the
safe uses of those dst_entries. This patch series validates that the
correct skb_dst_* functions are being called before the sk_buff leaves
the rcu protected section. Thus we don't need to modify and review a
lot of code but we can just patch into those helpers already.

> Or to put it another way -- have you been able to catch any real
> pointer-leak bugs with thister-leak bugs with this?  The other RCU
> debug options have had pretty long found-bug lists before we accepted
> them.

There have been two problems found so far, one is a rather minor one
while the other one seems like a normal bug. The patches for those are
part of this series (3/4 and 4/4).

Regards,

Paolo

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/4] RCU: introduce noref debug
  2017-10-11 14:50           ` Paolo Abeni
@ 2017-10-11 15:45             ` Paul E. McKenney
  0 siblings, 0 replies; 16+ messages in thread
From: Paul E. McKenney @ 2017-10-11 15:45 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, David S. Miller,
	Eric Dumazet, Hannes Frederic Sowa, netdev

On Wed, Oct 11, 2017 at 04:50:36PM +0200, Paolo Abeni wrote:
> On Tue, 2017-10-10 at 21:02 -0700, Paul E. McKenney wrote:
> > Linus and Ingo will ask me how users decide how they should set that
> > additional build flag.  Especially given that if there is code that
> > requires non-strict checking, isn't everyone required to set up non-strict
> > checking to avoid false positives?  Unless you can also configure out
> > all the code that requires non-strict checking, I suppose, but how
> > would you keep track of what to configure out?
> 
> I'm working to a new version using a single compile flag - without
> additional strict option.
> 
> I don't know of any other subsytem that stores rcu pointer in
> datastructures for a longer amount of time. That having said, I wonder
> if the tests should completely move to the networking subsystem for the
> time being. The Kconfig option would thus be called NET_DEBUG or
> something along the lines. For abstraction it would be possible to add
> an atomic_notifier_chain to the rcu_read/unlock methods, where multiple
> users or checkers could register for. That way we keep the users
> seperate from the implementation for the cost of a bit more layering
> and more indirect calls. But given that this will anyway slow down
> execution a lot, it will anyway only be suitable in
> testing/verification/debugging environments.

I like this approach.  And if it does a good job of finding networking
bugs over the next year or so, I would be quite happy to consider
something for all RCU users.

> > OK.  There will probably be some discussion about the API in that case.
> 
> I'll drop noref parameter, the key will became mandatory - the exact
> position of where the reference of RCU managed object is stored. In the
> case of noref dst it is &skb->_skb_refdst. With this kind of API it
> should suite more subsystems.

Interesting.  Do you intend to allow rcu_track_noref() to refuse to
record a pointer?  Other than for the array-full case.

> > True enough.  Except that if people were good about always clearing the
> > pointer, then the pointer couldn't leak, right?  Or am I missing something
> > in your use cases?
> 
> This is correct. The dst_entry checking in skb, which this patch series
> implements there are strict brackets in the sense of skb_dst_set,
> skb_dst_set_noref, skb_dst_force, etc., which form brackets around the
> safe uses of those dst_entries. This patch series validates that the
> correct skb_dst_* functions are being called before the sk_buff leaves
> the rcu protected section. Thus we don't need to modify and review a
> lot of code but we can just patch into those helpers already.

Makes sense.  Those willing to strictly bracket uses gain some debug
assist.

> > Or to put it another way -- have you been able to catch any real
> > pointer-leak bugs with thister-leak bugs with this?  The other RCU
> > debug options have had pretty long found-bug lists before we accepted
> > them.
> 
> There have been two problems found so far, one is a rather minor one
> while the other one seems like a normal bug. The patches for those are
> part of this series (3/4 and 4/4).

I agree that you have started gathering evidence and that the early
indications are positive, if quite mildly so.  If this time next year
there are a few tens of such bugs found, preferably including some
serious ones, I would very likely look favorably on pulling this in to
allow others to use it.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2017-10-11 15:46 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-06 12:57 [PATCH 0/4] RCU: introduce noref debug Paolo Abeni
2017-10-06 12:57 ` [PATCH 1/4] rcu: " Paolo Abeni
2017-10-06 14:13   ` Steven Rostedt
2017-10-06 12:57 ` [PATCH 2/4] net: use RCU noref infrastructure to track dst noref Paolo Abeni
2017-10-06 12:57 ` [PATCH 3/4] ipv4: drop unneeded and misleading RCU lock in ip_route_input_noref() Paolo Abeni
2017-10-06 12:57 ` [PATCH 4/4] tcp: avoid noref dst leak on input path Paolo Abeni
2017-10-06 14:37   ` Eric Dumazet
2017-10-06 15:21     ` Paolo Abeni
2017-10-06 15:32       ` Eric Dumazet
2017-10-06 13:34 ` [PATCH 0/4] RCU: introduce noref debug Paul E. McKenney
2017-10-06 15:10   ` Paolo Abeni
2017-10-06 16:34     ` Paul E. McKenney
2017-10-09 16:53       ` Paolo Abeni
2017-10-11  4:02         ` Paul E. McKenney
2017-10-11 14:50           ` Paolo Abeni
2017-10-11 15:45             ` Paul E. McKenney

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.