All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next 0/5] net: add refcnt tracking for some common objects
@ 2021-12-07  4:02 Xin Long
  2021-12-07  4:02 ` [PATCH net-next 1/5] lib: add obj_cnt infrastructure Xin Long
                   ` (5 more replies)
  0 siblings, 6 replies; 10+ messages in thread
From: Xin Long @ 2021-12-07  4:02 UTC (permalink / raw)
  To: network dev
  Cc: Eric Dumazet, davem, kuba, Marcelo Ricardo Leitner,
	Davide Caratti, Paolo Abeni

This patchset provides a simple lib(obj_cnt) to count the operatings on any
objects, and saves them into a gobal hashtable. Each node in this hashtable
can be identified with a calltrace and an object pointer. A calltrace could
be a function called from somewhere, like dev_hold() called by:

    inetdev_init+0xff/0x1c0
    inetdev_event+0x4b7/0x600
    raw_notifier_call_chain+0x41/0x50
    register_netdevice+0x481/0x580

and an object pointer would be the dev that this function is accessing:

    dev_hold(dev).

When this call comes to this object, a node including calltrace + object +
counter will be created if it doesn't exist, and the counter in this node
will increment if it already exists. Pretty simple.

So naturally this lib can be used to track the refcnt of any objects, all
it has to do is put obj_cnt_track() to the place where this object is
held or put. It will count how many times this call has operated this
object after checking if this object and this type(hold/put) accessing
are being tracked.

After the 1st lib patch, the other patches add the refcnt tracking for
netdev, dst, in6_dev and xfrm_state, and each has example how to use
in the changelog. The common use is:

    # sysctl -w obj_cnt.control="clear" # clear the old result

    # sysctl -w obj_cnt.type=0x1     # track type 0x1 operating
    # sysctl -w obj_cnt.name=test    # match name == test or
    # sysctl -w obj_cnt.index=1      # match index == 1
    # sysctl -w obj_cnt.nr_entries=4 # save 4 frames' calltrace

    ... (reproduce the issue)

    # sysctl -w obj_cnt.control="scan"  # print the new result

Note that after seeing Eric's another patchset for refcnt tracking I
decided to post this patchset. As in this implemenation, it has some
benefits which I think worth sharing:

  - it runs fast:
    1. it doesn't create nodes for the repeatitive calls to the same
       objects, and it saves memory and time.
    2. the depth of the calltrace to record is configurable, at most
       time small calltrace also saves memory and time, but will not
       affect the analysis.
    3. kmem_cache used also contributes to the performance.

  - easy to use:
    1. it doesn't add any members to the object structure, just place
       an API to the hold/put functions, and it keep the kernel code
       clear and won't break any ABIs.
    2. three types of matching conditions for tracking can be set up,
       int, string by sysctl and API, and pointer by API.

This patchset has helped solve quite some refcnt leaks, from netdev to
dst, in6_dev, xfrm_dst. There are also some difficult cases that we've
addressed with this pathset:

  - some leaks were only reproduciable in customer's environment by running
    for a couple of months, "probe" data was even too huge to save and
    analyse, so saving memory is crucial.

  - some are not able to reproduce if the tracking patch worked slowly,
    like not using kmem cache, so running fast is important.

  - some leak was a chain, such as a leak we see it as a dev leak, but it
    was caused by a dst or in6_dev leak, and this dst or in6_dev leak was
    caused by another object, so tracking multiple types at the same time
    is effective.

Xin Long (5):
  lib: add obj_cnt infrastructure
  net: track netdev refcnt with obj_cnt
  net: track dst refcnt with obj_cnt
  net: track in6_dev refcnt with obj_cnt
  net: track xfrm_state refcnt with obj_cnt

 include/linux/netdevice.h |  11 ++
 include/linux/obj_cnt.h   |  20 +++
 include/net/addrconf.h    |   7 +-
 include/net/dst.h         |   8 +-
 include/net/sock.h        |   3 +-
 include/net/xfrm.h        |  11 ++
 lib/Kconfig.debug         |   7 +
 lib/Makefile              |   1 +
 lib/obj_cnt.c             | 285 ++++++++++++++++++++++++++++++++++++++
 net/core/dst.c            |   2 +
 10 files changed, 352 insertions(+), 3 deletions(-)
 create mode 100644 include/linux/obj_cnt.h
 create mode 100644 lib/obj_cnt.c

-- 
2.27.0


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH net-next 1/5] lib: add obj_cnt infrastructure
  2021-12-07  4:02 [PATCH net-next 0/5] net: add refcnt tracking for some common objects Xin Long
@ 2021-12-07  4:02 ` Xin Long
  2021-12-07  4:02 ` [PATCH net-next 2/5] net: track netdev refcnt with obj_cnt Xin Long
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Xin Long @ 2021-12-07  4:02 UTC (permalink / raw)
  To: network dev
  Cc: Eric Dumazet, davem, kuba, Marcelo Ricardo Leitner,
	Davide Caratti, Paolo Abeni

This patch is to create a lib to counter any operatings to objects,
the same call to operate the same object will be saved as one node
which includes the call trace and object pointer, and the counter
in the node will increase next time when the same call comes to
this object.

There are a few sysctl parameters are exposed to users:

  1. Three of them are used to filter the calls with the objects:
    - index
      a 'int' type that can be used to match objects, such as netdev's
      index for netdev, and xfrm_state's spi for xfrm_state;
    - name
      a 'char *' type that can be used to match objects, such as netdev's
      name for netdev, and dst->dev's name for dst.
    - type
      a 'bitmap' that can be used to mark which operating is allowed to
      be counted, such as dev_hold, dev_put, inet6_hold, inet6_put.

  2. One is used to 'clear' or 'scan' the test result:
     - control
       it uses 'clear' to drop all nodes from the hashtable, and 'scan'
       to print out the details of all counter nodes.

  3. Another one is used to set up the stack depth we want to save:
     - nr_entries
       how detailed is a call trace it wants to use (1 to 16), the bigger
       it set to, the more nodes might be created, as the call might come
       with different call traces.

There are 2 APIs are exported for developers to count any calls to operate
objects they want:

  1. void obj_cnt_track(type, index, name, obj);
     check if this call can be matched with type and this object can be
     matched with any of index, name and obj. If yes, record this call
     to operate one obj, and create a node including this obj and calli
     trace if it doesn't exist in the hashtable, and increment the
     count of this node if it exists.

  2. void obj_cnt_set(index, name, obj);
     this won't be used unless a developer want to set the match condition
     somewhere in kernel code, especially to match with obj.

This lib can typically be used to track one object's refcnt, and all
we have to do is put obj_cnt_track() into this object's hold and put
functions. More details, see the following patches.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 include/linux/obj_cnt.h |  12 ++
 lib/Kconfig.debug       |   7 +
 lib/Makefile            |   1 +
 lib/obj_cnt.c           | 277 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 297 insertions(+)
 create mode 100644 include/linux/obj_cnt.h
 create mode 100644 lib/obj_cnt.c

diff --git a/include/linux/obj_cnt.h b/include/linux/obj_cnt.h
new file mode 100644
index 000000000000..e5185f7022d1
--- /dev/null
+++ b/include/linux/obj_cnt.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef _LINUX_OBJ_CNT_H
+#define _LINUX_OBJ_CNT_H
+
+enum {
+	OBJ_CNT_TYPE_MAX
+};
+
+void obj_cnt_track(int type, int index, char *name, void *obj);
+void obj_cnt_set(int index, char *name, void *obj);
+
+#endif /* _LINUX_OBJ_CNT_H */
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 6504b97f8dfd..2c9d14b98783 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2665,6 +2665,13 @@ config HYPERV_TESTING
 
 endmenu # "Kernel Testing and Coverage"
 
+config OBJ_CNT
+	bool "object operating counter"
+	select STACKTRACE
+	help
+	  This lib provide several APIs to count some objects' operating,
+	  and can be used to track refcnt.
+
 source "Documentation/Kconfig"
 
 endmenu # Kernel hacking
diff --git a/lib/Makefile b/lib/Makefile
index b213a7bbf3fd..5dc53da2c0b2 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -271,6 +271,7 @@ KASAN_SANITIZE_stackdepot.o := n
 KCOV_INSTRUMENT_stackdepot.o := n
 
 obj-$(CONFIG_REF_TRACKER) += ref_tracker.o
+obj-$(CONFIG_OBJ_CNT) += obj_cnt.o
 
 libfdt_files = fdt.o fdt_ro.o fdt_wip.o fdt_rw.o fdt_sw.o fdt_strerror.o \
 	       fdt_empty_tree.o fdt_addresses.o
diff --git a/lib/obj_cnt.c b/lib/obj_cnt.c
new file mode 100644
index 000000000000..19ced2303452
--- /dev/null
+++ b/lib/obj_cnt.c
@@ -0,0 +1,277 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+#include <linux/spinlock.h>
+#include <linux/slab.h>
+#include <linux/list.h>
+#include <linux/stacktrace.h>
+#include <linux/sysctl.h>
+#include <linux/obj_cnt.h>
+
+#define OBJ_CNT_HASHENTRIES	(1 << 8)
+#define OBJ_CNT_MAX		UINT_MAX
+#define OBJ_CNT_NRENTRIES	16
+static struct kmem_cache	*obj_cnt_cache	__read_mostly;
+static struct hlist_head	*obj_cnt_head;
+static unsigned int		obj_cnt_num;
+static spinlock_t		obj_cnt_lock;
+
+static char *obj_cnt_str[OBJ_CNT_TYPE_MAX] = {
+};
+
+struct obj_cnt {
+	struct hlist_node	hlist;
+	void			*obj;	/* the obj to count its operations */
+	u64			cnt;	/* how many times it's been operated */
+	int			type;	/* operation to the obj like get put */
+	unsigned long		entries[OBJ_CNT_NRENTRIES];	/* the stack */
+	unsigned int		nr_entries;
+};
+
+static int	obj_cnt_index;
+static int	obj_cnt_type;
+static char	obj_cnt_name[16];
+static void	*obj_cnt_obj;
+static int	obj_cnt_nr_entries;
+static int	obj_cnt_max_nr_entries = OBJ_CNT_NRENTRIES;
+
+static inline struct hlist_head *obj_cnt_hash(unsigned long hash)
+{
+	return &obj_cnt_head[hash & (OBJ_CNT_HASHENTRIES - 1)];
+}
+
+static struct obj_cnt *obj_cnt_lookup(void *obj, int type, unsigned long entries[],
+				      unsigned int nr_entries)
+{
+	struct hlist_head *head;
+	struct obj_cnt *oc;
+
+	head = obj_cnt_hash(entries[0]);
+	hlist_for_each_entry(oc, head, hlist)
+		if (oc->obj == obj && oc->type == type &&
+		    oc->nr_entries == nr_entries &&
+		    !memcmp(oc->entries, entries, nr_entries * sizeof(unsigned long)))
+			return oc;
+
+	return NULL;
+}
+
+static struct obj_cnt *obj_cnt_create(void *obj, int type, unsigned long entries[],
+				      unsigned int nr_entries)
+{
+	struct obj_cnt *oc;
+
+	if (obj_cnt_num == OBJ_CNT_MAX - 1) {
+		pr_err("OBJ_CNT: %s: too many obj_cnt added\n", __func__);
+		return NULL;
+	}
+	oc = kmem_cache_alloc(obj_cnt_cache, GFP_ATOMIC);
+	if (!oc) {
+		pr_err("OBJ_CNT: %s: no memory\n", __func__);
+		return NULL;
+	}
+	oc->nr_entries = nr_entries;
+	memcpy(oc->entries, entries, nr_entries * sizeof(unsigned long));
+	oc->obj = obj;
+	oc->cnt = 0;
+	oc->type = type;
+	hlist_add_head(&oc->hlist, obj_cnt_hash(oc->entries[0]));
+	obj_cnt_num++;
+
+	return oc;
+}
+
+static bool obj_cnt_allowed(int type, int index, char *name, void *obj)
+{
+	if (!(obj_cnt_type & (1 << type)))
+		return false;
+	if (index && index == obj_cnt_index)
+		return true;
+	if (name && !strcmp(name, obj_cnt_name))
+		return true;
+	if (obj && obj == obj_cnt_obj)
+		return true;
+	return false;
+}
+
+void obj_cnt_track(int type, int index, char *name, void *obj)
+{
+	unsigned long entries[OBJ_CNT_NRENTRIES];
+	unsigned int nr_entries;
+	unsigned long flags;
+	struct obj_cnt *oc;
+
+	if (!obj_cnt_allowed(type, index, name, obj))
+		return;
+
+	nr_entries = stack_trace_save(entries, obj_cnt_nr_entries, 1);
+	nr_entries = filter_irq_stacks(entries, nr_entries);
+	spin_lock_irqsave(&obj_cnt_lock, flags);  /* TODO: use rcu lock for lookup */
+	oc = obj_cnt_lookup(obj, type, entries, nr_entries);
+	if (!oc)
+		oc = obj_cnt_create(obj, type, entries, nr_entries);
+	if (oc) {
+		oc->cnt++;
+		WARN_ONCE(!oc->cnt, "OBJ_CNT: %s, the counter overflows\n", __func__);
+		pr_debug("OBJ_CNT: %s: obj: %px, type: %s, cnt: %llu, caller: %pS\n",
+			 __func__, oc->obj, obj_cnt_str[oc->type], oc->cnt, (void *)oc->entries[0]);
+	}
+	spin_unlock_irqrestore(&obj_cnt_lock, flags);
+}
+EXPORT_SYMBOL(obj_cnt_track);
+
+static void obj_cnt_dump(void *obj, int type)
+{
+	struct hlist_head *head;
+	struct obj_cnt *oc;
+	int h, first = 1;
+
+	spin_lock_bh(&obj_cnt_lock);
+	for (h = 0; h < OBJ_CNT_HASHENTRIES; h++) {
+		head = &obj_cnt_head[h];
+		hlist_for_each_entry(oc, head, hlist) {
+			if ((type && oc->type != type) || (obj && oc->obj != obj))
+				continue;
+			if (first) {
+				pr_info("OBJ_CNT: results =>\n");
+				first = 0;
+			}
+			pr_info("OBJ_CNT: %s: obj: %px, type: %s, cnt: %llu, caller: %pS, calltrace:\n",
+				__func__, oc->obj, obj_cnt_str[oc->type], oc->cnt, (void *)oc->entries[0]);
+			if (oc->nr_entries > 1)
+				stack_trace_print(oc->entries, oc->nr_entries, 4);
+		}
+	}
+	spin_unlock_bh(&obj_cnt_lock);
+}
+
+void obj_cnt_set(int index, char *name, void *obj)
+{
+	if (name)
+		strncpy(obj_cnt_name, name, min_t(size_t, 16, strlen(name)));
+	if (index)
+		obj_cnt_index = index;
+	if (obj)
+		obj_cnt_obj = obj;
+}
+EXPORT_SYMBOL(obj_cnt_set);
+
+static void obj_cnt_free(void)
+{
+	struct hlist_head *head;
+	struct hlist_node *tmp;
+	struct obj_cnt *oc;
+	int h;
+
+	spin_lock_bh(&obj_cnt_lock);
+	for (h = 0; h < OBJ_CNT_HASHENTRIES; h++) {
+		head = &obj_cnt_head[h];
+		hlist_for_each_entry_safe(oc, tmp, head, hlist) {
+			hlist_del(&oc->hlist);
+			kmem_cache_free(obj_cnt_cache, oc);
+		}
+	}
+	obj_cnt_num = 0;
+	spin_unlock_bh(&obj_cnt_lock);
+}
+
+static int proc_docntcmd(struct ctl_table *table, int write, void *buffer,
+			 size_t *lenp, loff_t *ppos)
+{
+	struct ctl_table tbl;
+	char cmd[8] = {0};
+	int ret;
+
+	if (!write)
+		return -EINVAL;
+
+	memset(&tbl, 0, sizeof(struct ctl_table));
+	tbl.data = cmd;
+	tbl.maxlen = sizeof(cmd);
+	ret = proc_dostring(&tbl, write, buffer, lenp, ppos);
+	if (ret)
+		return ret;
+
+	if (!strcmp(cmd, "clear"))
+		obj_cnt_free();
+	else if (!strcmp(cmd, "scan"))
+		obj_cnt_dump(NULL, 0);
+	else
+		return -EINVAL;
+	return 0;
+}
+
+static struct ctl_table obj_cnt_table[] = {
+	{
+		.procname	= "index",
+		.data		= &obj_cnt_index,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec
+	},
+	{
+		.procname	= "name",
+		.data		= obj_cnt_name,
+		.maxlen		= 16,
+		.mode		= 0644,
+		.proc_handler	= proc_dostring
+	},
+	{
+		.procname	= "type",
+		.data		= &obj_cnt_type,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec
+	},
+	{
+		.procname	= "control",
+		.maxlen		= 16,
+		.mode		= 0200,
+		.proc_handler	= proc_docntcmd
+	},
+	{
+		.procname	= "nr_entries",
+		.data		= &obj_cnt_nr_entries,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ONE,
+		.extra2         = &obj_cnt_max_nr_entries
+	},
+	{ }
+};
+
+static __init int obj_cnt_init(void)
+{
+	static struct ctl_table_header *hdr;
+	int i;
+
+	memset(obj_cnt_name, 0, sizeof(obj_cnt_name));
+	obj_cnt_index = 0;
+	obj_cnt_type = 0;
+	obj_cnt_obj = NULL;
+	obj_cnt_nr_entries = 8;
+	hdr = register_sysctl("obj_cnt", obj_cnt_table);
+	if (!hdr)
+		return -ENOMEM;
+
+	obj_cnt_cache = kmem_cache_create("obj_cnt_cache", sizeof(struct obj_cnt),
+					  0, SLAB_HWCACHE_ALIGN, NULL);
+	if (!obj_cnt_cache) {
+		unregister_sysctl_table(hdr);
+		return -ENOMEM;
+	}
+
+	obj_cnt_head = kmalloc_array(OBJ_CNT_HASHENTRIES, sizeof(*obj_cnt_head),
+				     GFP_KERNEL);
+	if (!obj_cnt_head) {
+		kmem_cache_destroy(obj_cnt_cache);
+		unregister_sysctl_table(hdr);
+		return -ENOMEM;
+	}
+	for (i = 0; i < OBJ_CNT_HASHENTRIES; i++)
+		INIT_HLIST_HEAD(&obj_cnt_head[i]);
+
+	spin_lock_init(&obj_cnt_lock);
+	return 0;
+}
+
+subsys_initcall(obj_cnt_init);
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH net-next 2/5] net: track netdev refcnt with obj_cnt
  2021-12-07  4:02 [PATCH net-next 0/5] net: add refcnt tracking for some common objects Xin Long
  2021-12-07  4:02 ` [PATCH net-next 1/5] lib: add obj_cnt infrastructure Xin Long
@ 2021-12-07  4:02 ` Xin Long
  2021-12-07  4:02 ` [PATCH net-next 3/5] net: track dst " Xin Long
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Xin Long @ 2021-12-07  4:02 UTC (permalink / raw)
  To: network dev
  Cc: Eric Dumazet, davem, kuba, Marcelo Ricardo Leitner,
	Davide Caratti, Paolo Abeni

Two types are added into obj_cnt to count dev_hold and dev_put,
and all it does is put obj_cnt_track_by_dev() into these two
functions.

Here is an example to track the refcnt of a netdev named dummy0:

  # sysctl -w obj_cnt.control="clear" # clear the old result

  # sysctl -w obj_cnt.type=0x3     # enable dev_hold/put track
  # sysctl -w obj_cnt.name=dummy0  # count dev_hold/put(dummy0)
  # sysctl -w obj_cnt.nr_entries=4 # save 4 frames' call trace

  # ip link add dummy0 type dummy
  # ip link set dummy0 up
  # ip link set dummy0 down
  # ip link del dummy0

  # sysctl -w obj_cnt.control="scan"  # print the new result
  # dmesg
  OBJ_CNT: obj_cnt_dump: obj: ffff894402397000, type: dev_put, cnt: 1,:
       in_dev_finish_destroy+0x6a/0x80
       rcu_do_batch+0x164/0x4b0
       rcu_core+0x249/0x350
       __do_softirq+0xf5/0x2ea
  OBJ_CNT: obj_cnt_dump: obj: ffff894402397000, type: dev_hold, cnt: 1,:
       inetdev_init+0xff/0x1c0
       inetdev_event+0x4b7/0x600
       raw_notifier_call_chain+0x41/0x50
       register_netdevice+0x481/0x580
  ...
  OBJ_CNT: obj_cnt_dump: obj: ffff894402397000, type: dev_put, cnt: 1,:
       rx_queue_release+0xa8/0xb0
       kobject_release+0x43/0x140
       net_rx_queue_update_kobjects+0x13c/0x190
       netdev_unregister_kobject+0x4a/0x80
  OBJ_CNT: obj_cnt_dump: obj: ffff894402397000, type: dev_put, cnt: 3,:
       fib_nh_common_release+0x10f/0x120
       fib6_info_destroy_rcu+0x73/0xc0
       rcu_do_batch+0x164/0x4b0
       rcu_core+0x249/0x350
  ...

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 include/linux/netdevice.h | 11 +++++++++++
 include/linux/obj_cnt.h   |  2 ++
 lib/obj_cnt.c             |  2 ++
 3 files changed, 15 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 69dca1edd5a6..4cbcd34829da 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -49,6 +49,7 @@
 #include <linux/hashtable.h>
 #include <linux/rbtree.h>
 #include <linux/ref_tracker.h>
+#include <linux/obj_cnt.h>
 
 struct netpoll_info;
 struct device;
@@ -3815,6 +3816,14 @@ extern unsigned int	netdev_budget_usecs;
 /* Called by rtnetlink.c:rtnl_unlock() */
 void netdev_run_todo(void);
 
+static inline void obj_cnt_track_by_dev(void *obj, struct net_device *dev, int type)
+{
+#ifdef CONFIG_OBJ_CNT
+	if (dev)
+		obj_cnt_track(type, dev->ifindex, dev->name, obj);
+#endif
+}
+
 /**
  *	dev_put - release reference to device
  *	@dev: network device
@@ -3825,6 +3834,7 @@ void netdev_run_todo(void);
 static inline void dev_put(struct net_device *dev)
 {
 	if (dev) {
+		obj_cnt_track_by_dev(dev, dev, OBJ_CNT_DEV_PUT);
 #ifdef CONFIG_PCPU_DEV_REFCNT
 		this_cpu_dec(*dev->pcpu_refcnt);
 #else
@@ -3843,6 +3853,7 @@ static inline void dev_put(struct net_device *dev)
 static inline void dev_hold(struct net_device *dev)
 {
 	if (dev) {
+		obj_cnt_track_by_dev(dev, dev, OBJ_CNT_DEV_HOLD);
 #ifdef CONFIG_PCPU_DEV_REFCNT
 		this_cpu_inc(*dev->pcpu_refcnt);
 #else
diff --git a/include/linux/obj_cnt.h b/include/linux/obj_cnt.h
index e5185f7022d1..bb2d37484a32 100644
--- a/include/linux/obj_cnt.h
+++ b/include/linux/obj_cnt.h
@@ -3,6 +3,8 @@
 #define _LINUX_OBJ_CNT_H
 
 enum {
+	OBJ_CNT_DEV_HOLD,
+	OBJ_CNT_DEV_PUT,
 	OBJ_CNT_TYPE_MAX
 };
 
diff --git a/lib/obj_cnt.c b/lib/obj_cnt.c
index 19ced2303452..12a1fdafd632 100644
--- a/lib/obj_cnt.c
+++ b/lib/obj_cnt.c
@@ -15,6 +15,8 @@ static unsigned int		obj_cnt_num;
 static spinlock_t		obj_cnt_lock;
 
 static char *obj_cnt_str[OBJ_CNT_TYPE_MAX] = {
+	"dev_hold",
+	"dev_put"
 };
 
 struct obj_cnt {
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH net-next 3/5] net: track dst refcnt with obj_cnt
  2021-12-07  4:02 [PATCH net-next 0/5] net: add refcnt tracking for some common objects Xin Long
  2021-12-07  4:02 ` [PATCH net-next 1/5] lib: add obj_cnt infrastructure Xin Long
  2021-12-07  4:02 ` [PATCH net-next 2/5] net: track netdev refcnt with obj_cnt Xin Long
@ 2021-12-07  4:02 ` Xin Long
  2021-12-07  4:02 ` [PATCH net-next 4/5] net: track in6_dev " Xin Long
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Xin Long @ 2021-12-07  4:02 UTC (permalink / raw)
  To: network dev
  Cc: Eric Dumazet, davem, kuba, Marcelo Ricardo Leitner,
	Davide Caratti, Paolo Abeni

Two types are added into obj_cnt to count dst_hold* and dst_release*,
and all it does is put obj_cnt_track_by_dev() into these two
functions.

Here is an example to track the refcnt of a dst whose dev is dummy0:

  # sysctl -w obj_cnt.control="clear" # clear the old result

  # sysctl -w obj_cnt.type=0xc     # enable dst_hold/put track
  # sysctl -w obj_cnt.name=dummy0  # count dst_hold/put(dst)
  # sysctl -w obj_cnt.nr_entries=4 # save 4 frames' call trace
  #
  # ip link add dummy0 type dummy
  # ip link set dummy0 up
  # ip addr add 1.1.1.1/24 dev dummy0
  # ping 1.1.1.2 -c 2
  # ip link set dummy0 down
  # ip link del dummy0

  # sysctl -w obj_cnt.control="scan"  # print the new result
  # dmesg
  OBJ_CNT: obj_cnt_dump: obj: ffff9e45d7e8b780, type: dst_hold, cnt: 1,:
       rt_cache_route+0x45/0xc0
       rt_set_nexthop.constprop.63+0x143/0x3c0
       ip_route_output_key_hash_rcu+0x256/0x9a0
       ip_route_output_key_hash+0x72/0xa0
  OBJ_CNT: obj_cnt_dump: obj: ffff9e45cbef9100, type: dst_put, cnt: 1,:
       dst_release+0x2a/0x90
       __dev_queue_xmit+0x72c/0xc90
       ip6_finish_output2+0x2d2/0x660
       ip6_output+0x6e/0x130
  ...
  OBJ_CNT: obj_cnt_dump: obj: ffff9e45ca463d00, type: dst_put, cnt: 1,:
       dst_release+0x2a/0x90
       __dev_queue_xmit+0x72c/0xc90
       ip6_finish_output2+0x1e8/0x660
       ip6_output+0x6e/0x130
  OBJ_CNT: obj_cnt_dump: obj: ffff9e45d7e8b780, type: dst_hold, cnt: 2,:
       ip_route_output_key_hash_rcu+0x88e/0x9a0
       ip_route_output_key_hash+0x72/0xa0
       ip_route_output_flow+0x19/0x50
       raw_sendmsg+0x32b/0xe40
  ...

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 include/linux/obj_cnt.h | 2 ++
 include/net/dst.h       | 8 +++++++-
 include/net/sock.h      | 3 ++-
 lib/obj_cnt.c           | 4 +++-
 net/core/dst.c          | 2 ++
 5 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/include/linux/obj_cnt.h b/include/linux/obj_cnt.h
index bb2d37484a32..ae4c12beb876 100644
--- a/include/linux/obj_cnt.h
+++ b/include/linux/obj_cnt.h
@@ -5,6 +5,8 @@
 enum {
 	OBJ_CNT_DEV_HOLD,
 	OBJ_CNT_DEV_PUT,
+	OBJ_CNT_DST_HOLD,
+	OBJ_CNT_DST_PUT,
 	OBJ_CNT_TYPE_MAX
 };
 
diff --git a/include/net/dst.h b/include/net/dst.h
index 6aa252c3fc55..e2704495d32f 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -227,6 +227,7 @@ static inline void dst_hold(struct dst_entry *dst)
 	 * If your kernel compilation stops here, please check
 	 * the placement of __refcnt in struct dst_entry
 	 */
+	obj_cnt_track_by_dev(dst, dst->dev, OBJ_CNT_DST_HOLD);
 	BUILD_BUG_ON(offsetof(struct dst_entry, __refcnt) & 63);
 	WARN_ON(atomic_inc_not_zero(&dst->__refcnt) == 0);
 }
@@ -298,7 +299,12 @@ static inline void skb_dst_copy(struct sk_buff *nskb, const struct sk_buff *oskb
  */
 static inline bool dst_hold_safe(struct dst_entry *dst)
 {
-	return atomic_inc_not_zero(&dst->__refcnt);
+	if (atomic_inc_not_zero(&dst->__refcnt)) {
+		obj_cnt_track_by_dev(dst, dst->dev, OBJ_CNT_DST_HOLD);
+		return true;
+	}
+
+	return false;
 }
 
 /**
diff --git a/include/net/sock.h b/include/net/sock.h
index ae61cd0b650d..07d59c27ac11 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2071,8 +2071,9 @@ sk_dst_get(struct sock *sk)
 
 	rcu_read_lock();
 	dst = rcu_dereference(sk->sk_dst_cache);
-	if (dst && !atomic_inc_not_zero(&dst->__refcnt))
+	if (dst && !dst_hold_safe(dst))
 		dst = NULL;
+
 	rcu_read_unlock();
 	return dst;
 }
diff --git a/lib/obj_cnt.c b/lib/obj_cnt.c
index 12a1fdafd632..648adc135080 100644
--- a/lib/obj_cnt.c
+++ b/lib/obj_cnt.c
@@ -16,7 +16,9 @@ static spinlock_t		obj_cnt_lock;
 
 static char *obj_cnt_str[OBJ_CNT_TYPE_MAX] = {
 	"dev_hold",
-	"dev_put"
+	"dev_put",
+	"dst_hold",
+	"dst_put"
 };
 
 struct obj_cnt {
diff --git a/net/core/dst.c b/net/core/dst.c
index d16c2c9bfebd..94ff1fe0fc09 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -169,6 +169,7 @@ void dst_release(struct dst_entry *dst)
 	if (dst) {
 		int newrefcnt;
 
+		obj_cnt_track_by_dev(dst, dst->dev, OBJ_CNT_DST_PUT);
 		newrefcnt = atomic_dec_return(&dst->__refcnt);
 		if (WARN_ONCE(newrefcnt < 0, "dst_release underflow"))
 			net_warn_ratelimited("%s: dst:%p refcnt:%d\n",
@@ -184,6 +185,7 @@ void dst_release_immediate(struct dst_entry *dst)
 	if (dst) {
 		int newrefcnt;
 
+		obj_cnt_track_by_dev(dst, dst->dev, OBJ_CNT_DST_PUT);
 		newrefcnt = atomic_dec_return(&dst->__refcnt);
 		if (WARN_ONCE(newrefcnt < 0, "dst_release_immediate underflow"))
 			net_warn_ratelimited("%s: dst:%p refcnt:%d\n",
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH net-next 4/5] net: track in6_dev refcnt with obj_cnt
  2021-12-07  4:02 [PATCH net-next 0/5] net: add refcnt tracking for some common objects Xin Long
                   ` (2 preceding siblings ...)
  2021-12-07  4:02 ` [PATCH net-next 3/5] net: track dst " Xin Long
@ 2021-12-07  4:02 ` Xin Long
  2021-12-07  4:02 ` [PATCH net-next 5/5] net: track xfrm_state " Xin Long
  2021-12-07  4:41 ` [PATCH net-next 0/5] net: add refcnt tracking for some common objects Eric Dumazet
  5 siblings, 0 replies; 10+ messages in thread
From: Xin Long @ 2021-12-07  4:02 UTC (permalink / raw)
  To: network dev
  Cc: Eric Dumazet, davem, kuba, Marcelo Ricardo Leitner,
	Davide Caratti, Paolo Abeni

Two types are added into obj_cnt to count in6_dev_hold/get and in6_dev_put,
and all it does is put obj_cnt_track_by_dev() into these two functions.

Here is an example to track the refcnt of a in6_dev which is attached
on a netdev named dummy0:

  # sysctl -w obj_cnt.control="clear" # clear the old result

  # sysctl -w obj_cnt.type=0x30    # enable in6_dev_hold/put track
  # sysctl -w obj_cnt.name=dummy0  # count in6_dev_hold/put(in6_dev)
  # sysctl -w obj_cnt.nr_entries=4 # save 4 frames' call trace

  # ip link add dummy0 type dummy
  # ip link set dummy0 up
  # ip addr add 2020::1/64 dev dummy0
  # ip link set dummy0 down
  # ip link del dummy0

  # sysctl -w obj_cnt.control="scan"  # print the new result
  # dmesg
  OBJ_CNT: obj_cnt_dump: obj: ffff8a1e17906000, type: in6_dev_put, cnt: 1,:
       ipv6_mc_down+0x11e/0x1a0
       addrconf_ifdown+0x53c/0x670
       addrconf_notify+0xb8/0x940
       raw_notifier_call_chain+0x41/0x50
  OBJ_CNT: obj_cnt_dump: obj: ffff8a1e17906000, type: in6_dev_put, cnt: 1,:
       ma_put+0x4f/0xb0
       ipv6_mc_destroy_dev+0x150/0x180
       addrconf_ifdown+0x478/0x670
       addrconf_notify+0xb8/0x940
  ...
  OBJ_CNT: obj_cnt_dump: obj: ffff8a1e17906000, type: in6_dev_hold, cnt: 2,:
       fib6_nh_init+0x6b4/0x8f0
       ip6_route_info_create+0x4f2/0x670
       ip6_route_add+0x18/0x90
       addrconf_prefix_route.isra.50+0x100/0x150
  OBJ_CNT: obj_cnt_dump: obj: ffff8a1e17906000, type: in6_dev_hold, cnt: 2,:
       fib6_nh_init+0x6b4/0x8f0
       ip6_route_info_create+0x4f2/0x670
       addrconf_f6i_alloc+0xe3/0x130
       ipv6_add_addr+0x16a/0x740
  ...

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 include/linux/obj_cnt.h | 2 ++
 include/net/addrconf.h  | 7 ++++++-
 lib/obj_cnt.c           | 4 +++-
 3 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/include/linux/obj_cnt.h b/include/linux/obj_cnt.h
index ae4c12beb876..f014b2e613d9 100644
--- a/include/linux/obj_cnt.h
+++ b/include/linux/obj_cnt.h
@@ -7,6 +7,8 @@ enum {
 	OBJ_CNT_DEV_PUT,
 	OBJ_CNT_DST_HOLD,
 	OBJ_CNT_DST_PUT,
+	OBJ_CNT_IN6_DEV_HOLD,
+	OBJ_CNT_IN6_DEV_PUT,
 	OBJ_CNT_TYPE_MAX
 };
 
diff --git a/include/net/addrconf.h b/include/net/addrconf.h
index 78ea3e332688..370a96b57dbd 100644
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -357,8 +357,10 @@ static inline struct inet6_dev *in6_dev_get(const struct net_device *dev)
 
 	rcu_read_lock();
 	idev = rcu_dereference(dev->ip6_ptr);
-	if (idev)
+	if (idev) {
+		obj_cnt_track_by_dev(idev, idev->dev, OBJ_CNT_IN6_DEV_HOLD);
 		refcount_inc(&idev->refcnt);
+	}
 	rcu_read_unlock();
 	return idev;
 }
@@ -374,6 +376,7 @@ void in6_dev_finish_destroy(struct inet6_dev *idev);
 
 static inline void in6_dev_put(struct inet6_dev *idev)
 {
+	obj_cnt_track_by_dev(idev, idev->dev, OBJ_CNT_IN6_DEV_PUT);
 	if (refcount_dec_and_test(&idev->refcnt))
 		in6_dev_finish_destroy(idev);
 }
@@ -390,11 +393,13 @@ static inline void in6_dev_put_clear(struct inet6_dev **pidev)
 
 static inline void __in6_dev_put(struct inet6_dev *idev)
 {
+	obj_cnt_track_by_dev(idev, idev->dev, OBJ_CNT_IN6_DEV_PUT);
 	refcount_dec(&idev->refcnt);
 }
 
 static inline void in6_dev_hold(struct inet6_dev *idev)
 {
+	obj_cnt_track_by_dev(idev, idev->dev, OBJ_CNT_IN6_DEV_HOLD);
 	refcount_inc(&idev->refcnt);
 }
 
diff --git a/lib/obj_cnt.c b/lib/obj_cnt.c
index 648adc135080..8756efc005ed 100644
--- a/lib/obj_cnt.c
+++ b/lib/obj_cnt.c
@@ -18,7 +18,9 @@ static char *obj_cnt_str[OBJ_CNT_TYPE_MAX] = {
 	"dev_hold",
 	"dev_put",
 	"dst_hold",
-	"dst_put"
+	"dst_put",
+	"in6_dev_hold",
+	"in6_dev_put"
 };
 
 struct obj_cnt {
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH net-next 5/5] net: track xfrm_state refcnt with obj_cnt
  2021-12-07  4:02 [PATCH net-next 0/5] net: add refcnt tracking for some common objects Xin Long
                   ` (3 preceding siblings ...)
  2021-12-07  4:02 ` [PATCH net-next 4/5] net: track in6_dev " Xin Long
@ 2021-12-07  4:02 ` Xin Long
  2021-12-07  4:41 ` [PATCH net-next 0/5] net: add refcnt tracking for some common objects Eric Dumazet
  5 siblings, 0 replies; 10+ messages in thread
From: Xin Long @ 2021-12-07  4:02 UTC (permalink / raw)
  To: network dev
  Cc: Eric Dumazet, davem, kuba, Marcelo Ricardo Leitner,
	Davide Caratti, Paolo Abeni

Two types are added into obj_cnt to count xfrm_state_hold and
xfrm_state_put*, and all it does is put obj_cnt_track_by_index()
into these two functions.

Here is a example to track the refcnt of a xfrm_state whose spi
is 0x100:

  # sysctl -w obj_cnt.control="clear" # clear the old result

  # sysctl -w obj_cnt.type=0xc0    # enable xfrm_state_hold/put track
  # sysctl -w obj_cnt.index=0x100  # count xfrm_state_hold/put(state)
  # sysctl -w obj_cnt.nr_entries=4 # save 4 frames' call trace

  # ip link add dummy0 type dummy
  # ip link set dummy0 up
  # ip addr add 1.1.1.1/24 dev dummy0
  # ip xfrm state add src 1.1.1.1 dst 1.1.1.2 spi 0x100 proto esp enc aes \
    0x0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f \
    mode tunnel sel src 1.1.1.1 dst 1.1.1.2
  # ip xfrm policy add dir out src 1.1.1.1 dst 1.1.1.2 tmpl src 1.1.1.1 \
    dst 1.1.1.2 proto esp mode tunnel
  # ping 1.1.1.2 -c 1
  # ip link set dummy0 down
  # ip link del dummy0

  # sysctl -w obj_cnt.control="scan"  # print the new result
  # dmesg
  OBJ_CNT: obj_cnt_dump: obj: ffff8ca629cb0f00, type: xfrm_state_hold, cnt: 1,:
       xfrm_add_sa+0x476/0x5f0
       xfrm_user_rcv_msg+0x13c/0x250
       netlink_rcv_skb+0x50/0x100
       xfrm_netlink_rcv+0x30/0x40
  OBJ_CNT: obj_cnt_dump: obj: ffff8ca629cb0f00, type: xfrm_state_put, cnt: 2,:
       xfrm4_dst_destroy+0x110/0x130
       dst_destroy+0x37/0xe0
       rcu_do_batch+0x164/0x4b0
       rcu_core+0x249/0x350
  OBJ_CNT: obj_cnt_dump: obj: ffff8ca629cb0f00, type: xfrm_state_put, cnt: 1,:
       xfrm_add_sa+0x497/0x5f0
       xfrm_user_rcv_msg+0x13c/0x250
       netlink_rcv_skb+0x50/0x100
       xfrm_netlink_rcv+0x30/0x40

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 include/linux/obj_cnt.h |  2 ++
 include/net/xfrm.h      | 11 +++++++++++
 lib/obj_cnt.c           |  4 +++-
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/linux/obj_cnt.h b/include/linux/obj_cnt.h
index f014b2e613d9..42611aa321c7 100644
--- a/include/linux/obj_cnt.h
+++ b/include/linux/obj_cnt.h
@@ -9,6 +9,8 @@ enum {
 	OBJ_CNT_DST_PUT,
 	OBJ_CNT_IN6_DEV_HOLD,
 	OBJ_CNT_IN6_DEV_PUT,
+	OBJ_CNT_XFRM_STATE_HOLD,
+	OBJ_CNT_XFRM_STATE_PUT,
 	OBJ_CNT_TYPE_MAX
 };
 
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 2308210793a0..543840fba068 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -772,25 +772,36 @@ static inline void xfrm_pols_put(struct xfrm_policy **pols, int npols)
 
 void __xfrm_state_destroy(struct xfrm_state *, bool);
 
+static inline void obj_cnt_track_by_index(void *obj, int index, int type)
+{
+#ifdef CONFIG_OBJ_CNT
+	obj_cnt_track(type, index, NULL, obj);
+#endif
+}
+
 static inline void __xfrm_state_put(struct xfrm_state *x)
 {
+	obj_cnt_track_by_index(x, ntohl(x->id.spi), OBJ_CNT_XFRM_STATE_PUT);
 	refcount_dec(&x->refcnt);
 }
 
 static inline void xfrm_state_put(struct xfrm_state *x)
 {
+	obj_cnt_track_by_index(x, ntohl(x->id.spi), OBJ_CNT_XFRM_STATE_PUT);
 	if (refcount_dec_and_test(&x->refcnt))
 		__xfrm_state_destroy(x, false);
 }
 
 static inline void xfrm_state_put_sync(struct xfrm_state *x)
 {
+	obj_cnt_track_by_index(x, ntohl(x->id.spi), OBJ_CNT_XFRM_STATE_PUT);
 	if (refcount_dec_and_test(&x->refcnt))
 		__xfrm_state_destroy(x, true);
 }
 
 static inline void xfrm_state_hold(struct xfrm_state *x)
 {
+	obj_cnt_track_by_index(x, ntohl(x->id.spi), OBJ_CNT_XFRM_STATE_HOLD);
 	refcount_inc(&x->refcnt);
 }
 
diff --git a/lib/obj_cnt.c b/lib/obj_cnt.c
index 8756efc005ed..f054455abe8d 100644
--- a/lib/obj_cnt.c
+++ b/lib/obj_cnt.c
@@ -20,7 +20,9 @@ static char *obj_cnt_str[OBJ_CNT_TYPE_MAX] = {
 	"dst_hold",
 	"dst_put",
 	"in6_dev_hold",
-	"in6_dev_put"
+	"in6_dev_put",
+	"xfrm_state_hold",
+	"xfrm_state_put"
 };
 
 struct obj_cnt {
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next 0/5] net: add refcnt tracking for some common objects
  2021-12-07  4:02 [PATCH net-next 0/5] net: add refcnt tracking for some common objects Xin Long
                   ` (4 preceding siblings ...)
  2021-12-07  4:02 ` [PATCH net-next 5/5] net: track xfrm_state " Xin Long
@ 2021-12-07  4:41 ` Eric Dumazet
  2021-12-07 18:17   ` Xin Long
  5 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2021-12-07  4:41 UTC (permalink / raw)
  To: Xin Long
  Cc: network dev, davem, kuba, Marcelo Ricardo Leitner,
	Davide Caratti, Paolo Abeni

On Mon, Dec 6, 2021 at 8:02 PM Xin Long <lucien.xin@gmail.com> wrote:
>
> This patchset provides a simple lib(obj_cnt) to count the operatings on any
> objects, and saves them into a gobal hashtable. Each node in this hashtable
> can be identified with a calltrace and an object pointer. A calltrace could
> be a function called from somewhere, like dev_hold() called by:
>
>     inetdev_init+0xff/0x1c0
>     inetdev_event+0x4b7/0x600
>     raw_notifier_call_chain+0x41/0x50
>     register_netdevice+0x481/0x580
>
> and an object pointer would be the dev that this function is accessing:
>
>     dev_hold(dev).
>
> When this call comes to this object, a node including calltrace + object +
> counter will be created if it doesn't exist, and the counter in this node
> will increment if it already exists. Pretty simple.
>
> So naturally this lib can be used to track the refcnt of any objects, all
> it has to do is put obj_cnt_track() to the place where this object is
> held or put. It will count how many times this call has operated this
> object after checking if this object and this type(hold/put) accessing
> are being tracked.
>
> After the 1st lib patch, the other patches add the refcnt tracking for
> netdev, dst, in6_dev and xfrm_state, and each has example how to use
> in the changelog. The common use is:
>
>     # sysctl -w obj_cnt.control="clear" # clear the old result
>
>     # sysctl -w obj_cnt.type=0x1     # track type 0x1 operating
>     # sysctl -w obj_cnt.name=test    # match name == test or
>     # sysctl -w obj_cnt.index=1      # match index == 1
>     # sysctl -w obj_cnt.nr_entries=4 # save 4 frames' calltrace
>
>     ... (reproduce the issue)
>
>     # sysctl -w obj_cnt.control="scan"  # print the new result
>
> Note that after seeing Eric's another patchset for refcnt tracking I
> decided to post this patchset. As in this implemenation, it has some
> benefits which I think worth sharing:
>

How can your code coexist with ref_tracker ?

>   - it runs fast:
>     1. it doesn't create nodes for the repeatitive calls to the same
>        objects, and it saves memory and time.
>     2. the depth of the calltrace to record is configurable, at most
>        time small calltrace also saves memory and time, but will not
>        affect the analysis.
>     3. kmem_cache used also contributes to the performance.

Points 2/3 can be implemented right away in the ref_tracker infra,
please send patches.

Quite frankly using a global hash table seems wrong, stack_depot
already has this logic, why reimplement it ?
stack_depot is damn fast (no spinlock in fast path)

Seeing that your patches add chunks in lib/obj_cnt.c, I do not see how
you can claim this is generic code.

I don't know, it seems very strange to send this patch series now I
have done about 60 patches on these issues.

And by doing this work, I found already two bugs in our stack.

You can be sure syzbot will send us many reports, most syzbot repros
use a very limited number of objects.

About performance : You use a single spinlock to protect your hash table.
In my implementation, there is a spinlock per 'directory (eg one
spinlock per struct net_device, one spinlock per struct net), it is
more scalable.

My tests have not shown a significant cost of the ref_tracker
(the major cost comes from stack_trace_save() which you also use)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next 0/5] net: add refcnt tracking for some common objects
  2021-12-07  4:41 ` [PATCH net-next 0/5] net: add refcnt tracking for some common objects Eric Dumazet
@ 2021-12-07 18:17   ` Xin Long
  2021-12-07 18:34     ` Eric Dumazet
  0 siblings, 1 reply; 10+ messages in thread
From: Xin Long @ 2021-12-07 18:17 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: network dev, davem, Jakub Kicinski, Marcelo Ricardo Leitner,
	Davide Caratti, Paolo Abeni

On Mon, Dec 6, 2021 at 11:41 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Mon, Dec 6, 2021 at 8:02 PM Xin Long <lucien.xin@gmail.com> wrote:
> >
> > This patchset provides a simple lib(obj_cnt) to count the operatings on any
> > objects, and saves them into a gobal hashtable. Each node in this hashtable
> > can be identified with a calltrace and an object pointer. A calltrace could
> > be a function called from somewhere, like dev_hold() called by:
> >
> >     inetdev_init+0xff/0x1c0
> >     inetdev_event+0x4b7/0x600
> >     raw_notifier_call_chain+0x41/0x50
> >     register_netdevice+0x481/0x580
> >
> > and an object pointer would be the dev that this function is accessing:
> >
> >     dev_hold(dev).
> >
> > When this call comes to this object, a node including calltrace + object +
> > counter will be created if it doesn't exist, and the counter in this node
> > will increment if it already exists. Pretty simple.
> >
> > So naturally this lib can be used to track the refcnt of any objects, all
> > it has to do is put obj_cnt_track() to the place where this object is
> > held or put. It will count how many times this call has operated this
> > object after checking if this object and this type(hold/put) accessing
> > are being tracked.
> >
> > After the 1st lib patch, the other patches add the refcnt tracking for
> > netdev, dst, in6_dev and xfrm_state, and each has example how to use
> > in the changelog. The common use is:
> >
> >     # sysctl -w obj_cnt.control="clear" # clear the old result
> >
> >     # sysctl -w obj_cnt.type=0x1     # track type 0x1 operating
> >     # sysctl -w obj_cnt.name=test    # match name == test or
> >     # sysctl -w obj_cnt.index=1      # match index == 1
> >     # sysctl -w obj_cnt.nr_entries=4 # save 4 frames' calltrace
> >
> >     ... (reproduce the issue)
> >
> >     # sysctl -w obj_cnt.control="scan"  # print the new result
> >
> > Note that after seeing Eric's another patchset for refcnt tracking I
> > decided to post this patchset. As in this implemenation, it has some
> > benefits which I think worth sharing:
> >
>
> How can your code coexist with ref_tracker ?
Hi, Eric, Thanks for your checking

It won't affect ref_tracker, one can even use both at the same time.

>
> >   - it runs fast:
> >     1. it doesn't create nodes for the repeatitive calls to the same
> >        objects, and it saves memory and time.
> >     2. the depth of the calltrace to record is configurable, at most
> >        time small calltrace also saves memory and time, but will not
> >        affect the analysis.
> >     3. kmem_cache used also contributes to the performance.
>
> Points 2/3 can be implemented right away in the ref_tracker infra,
> please send patches.
>
> Quite frankly using a global hash table seems wrong, stack_depot
> already has this logic, why reimplement it ?
> stack_depot is damn fast (no spinlock in fast path)
What this patchset is trying to add is a calltrace+object counter.
I was looking at stack_depot after seeing you patch, stack_depot saves
calltrace only, no object(I guess this is okay, I can save object to
to entries[0] if I want to use it), but also it's not a counter.

I'm not sure if it's allowed to do some change and add a counter to
the node of stack_depot, like when it's found in saving, the counter
increments. That will be perfect for this patchset.

This global spinlock will eventually be used only to protect the new
node's insertion. For the fast path (lookup), rcu_read_lock() will take
care of it. I haven't got time to add it. but this won't be a problem.

>
> Seeing that your patches add chunks in lib/obj_cnt.c, I do not see how
> you can claim this is generic code.
I planned it as a obj operating counter, it can be used for counting any
operatings, not just for the refcnt tracker which is only _put and _hold
operatings.

>
> I don't know, it seems very strange to send this patch series now I
> have done about 60 patches on these issues.
This patch is not to do exactly the same things as your patchset, I think your
patch saves more information into the objects in the kernel memory, it will
be useful for vmcore analysis.

This patchset is working in a different way, it's going to target a
specific object with index or name or pointer matched and some types
of function calls to it, we have to plan in advance after we know
which object (like it's name, index or string to match) is leaked.

>
> And by doing this work, I found already two bugs in our stack.
Great effects!
I can see that you must go over all networking stack for dev operations.

>
> You can be sure syzbot will send us many reports, most syzbot repros
> use a very limited number of objects.
>
> About performance : You use a single spinlock to protect your hash table.
> In my implementation, there is a spinlock per 'directory (eg one
> spinlock per struct net_device, one spinlock per struct net), it is
> more scalable.
I used per net spinlock at first, but I want to make the code more generic,
and not only for the network, then I decided to make it not related to net.

After using rcu_lock in the fast path, I think this single spinlock won't
affect much, besides, this single lock can be replaced by a per hlist lock
on each hlist_head, it will also save some.

>
> My tests have not shown a significant cost of the ref_tracker
> (the major cost comes from stack_trace_save() which you also use)
I added "run fast" in cover, mostly because it won't create many nodes
if dev_hold/put are called many times, it only increments the count if it's
the same call to the same object already existing in the hashtable.

dev could be fine, thinking about tracking dst, when sending packets, dst
can be hold/put too many times, creating nodes for each call is not a good
idea, especially for some leak only occurs once for few months which I've
seen quite a few times in our customer envs.


Other things are:
most net_dev leaks I've met are actually dst leak, some are in6_dev leak,
only tracking net_dev is not enough to address it, do you have plans to
add dst track for dst too? That may be a lot of changes too?

I think adding new members into core structures only for debugging
may not be a good choice, as it will bring troubles to downstream for
the backport because of kABI.

Thanks.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next 0/5] net: add refcnt tracking for some common objects
  2021-12-07 18:17   ` Xin Long
@ 2021-12-07 18:34     ` Eric Dumazet
  2021-12-07 21:58       ` Xin Long
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2021-12-07 18:34 UTC (permalink / raw)
  To: Xin Long
  Cc: network dev, davem, Jakub Kicinski, Marcelo Ricardo Leitner,
	Davide Caratti, Paolo Abeni

On Tue, Dec 7, 2021 at 10:17 AM Xin Long <lucien.xin@gmail.com> wrote:
>
> On Mon, Dec 6, 2021 at 11:41 PM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Mon, Dec 6, 2021 at 8:02 PM Xin Long <lucien.xin@gmail.com> wrote:
> > >
> > > This patchset provides a simple lib(obj_cnt) to count the operatings on any
> > > objects, and saves them into a gobal hashtable. Each node in this hashtable
> > > can be identified with a calltrace and an object pointer. A calltrace could
> > > be a function called from somewhere, like dev_hold() called by:
> > >
> > >     inetdev_init+0xff/0x1c0
> > >     inetdev_event+0x4b7/0x600
> > >     raw_notifier_call_chain+0x41/0x50
> > >     register_netdevice+0x481/0x580
> > >
> > > and an object pointer would be the dev that this function is accessing:
> > >
> > >     dev_hold(dev).
> > >
> > > When this call comes to this object, a node including calltrace + object +
> > > counter will be created if it doesn't exist, and the counter in this node
> > > will increment if it already exists. Pretty simple.
> > >
> > > So naturally this lib can be used to track the refcnt of any objects, all
> > > it has to do is put obj_cnt_track() to the place where this object is
> > > held or put. It will count how many times this call has operated this
> > > object after checking if this object and this type(hold/put) accessing
> > > are being tracked.
> > >
> > > After the 1st lib patch, the other patches add the refcnt tracking for
> > > netdev, dst, in6_dev and xfrm_state, and each has example how to use
> > > in the changelog. The common use is:
> > >
> > >     # sysctl -w obj_cnt.control="clear" # clear the old result
> > >
> > >     # sysctl -w obj_cnt.type=0x1     # track type 0x1 operating
> > >     # sysctl -w obj_cnt.name=test    # match name == test or
> > >     # sysctl -w obj_cnt.index=1      # match index == 1
> > >     # sysctl -w obj_cnt.nr_entries=4 # save 4 frames' calltrace
> > >
> > >     ... (reproduce the issue)
> > >
> > >     # sysctl -w obj_cnt.control="scan"  # print the new result
> > >
> > > Note that after seeing Eric's another patchset for refcnt tracking I
> > > decided to post this patchset. As in this implemenation, it has some
> > > benefits which I think worth sharing:
> > >
> >
> > How can your code coexist with ref_tracker ?
> Hi, Eric, Thanks for your checking
>
> It won't affect ref_tracker, one can even use both at the same time.
>
> >
> > >   - it runs fast:
> > >     1. it doesn't create nodes for the repeatitive calls to the same
> > >        objects, and it saves memory and time.
> > >     2. the depth of the calltrace to record is configurable, at most
> > >        time small calltrace also saves memory and time, but will not
> > >        affect the analysis.
> > >     3. kmem_cache used also contributes to the performance.
> >
> > Points 2/3 can be implemented right away in the ref_tracker infra,
> > please send patches.
> >
> > Quite frankly using a global hash table seems wrong, stack_depot
> > already has this logic, why reimplement it ?
> > stack_depot is damn fast (no spinlock in fast path)
> What this patchset is trying to add is a calltrace+object counter.
> I was looking at stack_depot after seeing you patch, stack_depot saves
> calltrace only, no object(I guess this is okay, I can save object to
> to entries[0] if I want to use it), but also it's not a counter.
>
> I'm not sure if it's allowed to do some change and add a counter to
> the node of stack_depot, like when it's found in saving, the counter
> increments. That will be perfect for this patchset.
>
> This global spinlock will eventually be used only to protect the new
> node's insertion. For the fast path (lookup), rcu_read_lock() will take
> care of it. I haven't got time to add it. but this won't be a problem.
>
> >
> > Seeing that your patches add chunks in lib/obj_cnt.c, I do not see how
> > you can claim this is generic code.
> I planned it as a obj operating counter, it can be used for counting any
> operatings, not just for the refcnt tracker which is only _put and _hold
> operatings.
>
> >
> > I don't know, it seems very strange to send this patch series now I
> > have done about 60 patches on these issues.
> This patch is not to do exactly the same things as your patchset, I think your
> patch saves more information into the objects in the kernel memory, it will
> be useful for vmcore analysis.
>
> This patchset is working in a different way, it's going to target a
> specific object with index or name or pointer matched and some types
> of function calls to it, we have to plan in advance after we know
> which object (like it's name, index or string to match) is leaked.
>
> >
> > And by doing this work, I found already two bugs in our stack.
> Great effects!
> I can see that you must go over all networking stack for dev operations.
>
> >
> > You can be sure syzbot will send us many reports, most syzbot repros
> > use a very limited number of objects.
> >
> > About performance : You use a single spinlock to protect your hash table.
> > In my implementation, there is a spinlock per 'directory (eg one
> > spinlock per struct net_device, one spinlock per struct net), it is
> > more scalable.
> I used per net spinlock at first, but I want to make the code more generic,
> and not only for the network, then I decided to make it not related to net.
>
> After using rcu_lock in the fast path, I think this single spinlock won't
> affect much, besides, this single lock can be replaced by a per hlist lock
> on each hlist_head, it will also save some.
>
> >
> > My tests have not shown a significant cost of the ref_tracker
> > (the major cost comes from stack_trace_save() which you also use)
> I added "run fast" in cover, mostly because it won't create many nodes
> if dev_hold/put are called many times, it only increments the count if it's
> the same call to the same object already existing in the hashtable.
>
> dev could be fine, thinking about tracking dst, when sending packets, dst
> can be hold/put too many times, creating nodes for each call is not a good
> idea, especially for some leak only occurs once for few months which I've
> seen quite a few times in our customer envs.

That is why I chose to not automatically track all dev_hold()/dev_put()

For the ones that are contained in a short function, with clear
contract, there is
no need for adding tracking.

But before taking this decision, I am looking at each case, very carefully.

>
>
> Other things are:
> most net_dev leaks I've met are actually dst leak, some are in6_dev leak,
> only tracking net_dev is not enough to address it, do you have plans to
> add dst track for dst too? That may be a lot of changes too?

Yes, the plan is to add trackers when we think there is a high
probability of having leaks.

I think you missed the one major point of the exercise :

By carefully doing the pairing, we can spot bugs already, even without
turning the CONFIG_ options.

Once this (patient) work done, it is done, the infra will catch future
bugs right away.

>
> I think adding new members into core structures only for debugging
> may not be a good choice, as it will bring troubles to downstream for
> the backport because of kABI.

The main point of this stuff, is to allow syzbot to simply turn on a
few CONFIG options and let
it discover past/new issues. No special sysctls magics.

I see you added introspection, to list all active references, I think
this could be added on a per object
basis, to help developers have ways to better understand the kernel behavior.

Again, I think there is no reason your work can not complement mine.
They serve different purposes.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next 0/5] net: add refcnt tracking for some common objects
  2021-12-07 18:34     ` Eric Dumazet
@ 2021-12-07 21:58       ` Xin Long
  0 siblings, 0 replies; 10+ messages in thread
From: Xin Long @ 2021-12-07 21:58 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: network dev, davem, Jakub Kicinski, Marcelo Ricardo Leitner,
	Davide Caratti, Paolo Abeni

On Tue, Dec 7, 2021 at 1:34 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Tue, Dec 7, 2021 at 10:17 AM Xin Long <lucien.xin@gmail.com> wrote:
> >
> > On Mon, Dec 6, 2021 at 11:41 PM Eric Dumazet <edumazet@google.com> wrote:
> > >
> > > On Mon, Dec 6, 2021 at 8:02 PM Xin Long <lucien.xin@gmail.com> wrote:
> > > >
> > > > This patchset provides a simple lib(obj_cnt) to count the operatings on any
> > > > objects, and saves them into a gobal hashtable. Each node in this hashtable
> > > > can be identified with a calltrace and an object pointer. A calltrace could
> > > > be a function called from somewhere, like dev_hold() called by:
> > > >
> > > >     inetdev_init+0xff/0x1c0
> > > >     inetdev_event+0x4b7/0x600
> > > >     raw_notifier_call_chain+0x41/0x50
> > > >     register_netdevice+0x481/0x580
> > > >
> > > > and an object pointer would be the dev that this function is accessing:
> > > >
> > > >     dev_hold(dev).
> > > >
> > > > When this call comes to this object, a node including calltrace + object +
> > > > counter will be created if it doesn't exist, and the counter in this node
> > > > will increment if it already exists. Pretty simple.
> > > >
> > > > So naturally this lib can be used to track the refcnt of any objects, all
> > > > it has to do is put obj_cnt_track() to the place where this object is
> > > > held or put. It will count how many times this call has operated this
> > > > object after checking if this object and this type(hold/put) accessing
> > > > are being tracked.
> > > >
> > > > After the 1st lib patch, the other patches add the refcnt tracking for
> > > > netdev, dst, in6_dev and xfrm_state, and each has example how to use
> > > > in the changelog. The common use is:
> > > >
> > > >     # sysctl -w obj_cnt.control="clear" # clear the old result
> > > >
> > > >     # sysctl -w obj_cnt.type=0x1     # track type 0x1 operating
> > > >     # sysctl -w obj_cnt.name=test    # match name == test or
> > > >     # sysctl -w obj_cnt.index=1      # match index == 1
> > > >     # sysctl -w obj_cnt.nr_entries=4 # save 4 frames' calltrace
> > > >
> > > >     ... (reproduce the issue)
> > > >
> > > >     # sysctl -w obj_cnt.control="scan"  # print the new result
> > > >
> > > > Note that after seeing Eric's another patchset for refcnt tracking I
> > > > decided to post this patchset. As in this implemenation, it has some
> > > > benefits which I think worth sharing:
> > > >
> > >
> > > How can your code coexist with ref_tracker ?
> > Hi, Eric, Thanks for your checking
> >
> > It won't affect ref_tracker, one can even use both at the same time.
> >
> > >
> > > >   - it runs fast:
> > > >     1. it doesn't create nodes for the repeatitive calls to the same
> > > >        objects, and it saves memory and time.
> > > >     2. the depth of the calltrace to record is configurable, at most
> > > >        time small calltrace also saves memory and time, but will not
> > > >        affect the analysis.
> > > >     3. kmem_cache used also contributes to the performance.
> > >
> > > Points 2/3 can be implemented right away in the ref_tracker infra,
> > > please send patches.
> > >
> > > Quite frankly using a global hash table seems wrong, stack_depot
> > > already has this logic, why reimplement it ?
> > > stack_depot is damn fast (no spinlock in fast path)
> > What this patchset is trying to add is a calltrace+object counter.
> > I was looking at stack_depot after seeing you patch, stack_depot saves
> > calltrace only, no object(I guess this is okay, I can save object to
> > to entries[0] if I want to use it), but also it's not a counter.
> >
> > I'm not sure if it's allowed to do some change and add a counter to
> > the node of stack_depot, like when it's found in saving, the counter
> > increments. That will be perfect for this patchset.
> >
> > This global spinlock will eventually be used only to protect the new
> > node's insertion. For the fast path (lookup), rcu_read_lock() will take
> > care of it. I haven't got time to add it. but this won't be a problem.
> >
> > >
> > > Seeing that your patches add chunks in lib/obj_cnt.c, I do not see how
> > > you can claim this is generic code.
> > I planned it as a obj operating counter, it can be used for counting any
> > operatings, not just for the refcnt tracker which is only _put and _hold
> > operatings.
> >
> > >
> > > I don't know, it seems very strange to send this patch series now I
> > > have done about 60 patches on these issues.
> > This patch is not to do exactly the same things as your patchset, I think your
> > patch saves more information into the objects in the kernel memory, it will
> > be useful for vmcore analysis.
> >
> > This patchset is working in a different way, it's going to target a
> > specific object with index or name or pointer matched and some types
> > of function calls to it, we have to plan in advance after we know
> > which object (like it's name, index or string to match) is leaked.
> >
> > >
> > > And by doing this work, I found already two bugs in our stack.
> > Great effects!
> > I can see that you must go over all networking stack for dev operations.
> >
> > >
> > > You can be sure syzbot will send us many reports, most syzbot repros
> > > use a very limited number of objects.
> > >
> > > About performance : You use a single spinlock to protect your hash table.
> > > In my implementation, there is a spinlock per 'directory (eg one
> > > spinlock per struct net_device, one spinlock per struct net), it is
> > > more scalable.
> > I used per net spinlock at first, but I want to make the code more generic,
> > and not only for the network, then I decided to make it not related to net.
> >
> > After using rcu_lock in the fast path, I think this single spinlock won't
> > affect much, besides, this single lock can be replaced by a per hlist lock
> > on each hlist_head, it will also save some.
> >
> > >
> > > My tests have not shown a significant cost of the ref_tracker
> > > (the major cost comes from stack_trace_save() which you also use)
> > I added "run fast" in cover, mostly because it won't create many nodes
> > if dev_hold/put are called many times, it only increments the count if it's
> > the same call to the same object already existing in the hashtable.
> >
> > dev could be fine, thinking about tracking dst, when sending packets, dst
> > can be hold/put too many times, creating nodes for each call is not a good
> > idea, especially for some leak only occurs once for few months which I've
> > seen quite a few times in our customer envs.
>
> That is why I chose to not automatically track all dev_hold()/dev_put()
>
> For the ones that are contained in a short function, with clear
> contract, there is
> no need for adding tracking.
>
> But before taking this decision, I am looking at each case, very carefully.
OK, that's quite a lot of effort.
I'm sure you have a strong code review skill on networking code.

>
> >
> >
> > Other things are:
> > most net_dev leaks I've met are actually dst leak, some are in6_dev leak,
> > only tracking net_dev is not enough to address it, do you have plans to
> > add dst track for dst too? That may be a lot of changes too?
>
> Yes, the plan is to add trackers when we think there is a high
> probability of having leaks.
>
> I think you missed the one major point of the exercise :
>
> By carefully doing the pairing, we can spot bugs already, even without
> turning the CONFIG_ options.
>
> Once this (patient) work done, it is done, the infra will catch future
> bugs right away.
It seems to me the phase of your patches' development is more fun.
For this patchset, it's the analysis part, we will do the pairing with the
results, and check which _hold and _put didn't come in pairs.

But I believe that refcnt track for netdev only is not enough, you might
eventually see the leak be caused by a missing dst_destroy() call, for
example.

>
> >
> > I think adding new members into core structures only for debugging
> > may not be a good choice, as it will bring troubles to downstream for
> > the backport because of kABI.
>
> The main point of this stuff, is to allow syzbot to simply turn on a
> few CONFIG options and let
> it discover past/new issues. No special sysctls magics.
I see, collecting data for syzbot, that's one thing I didn't know.

>
> I see you added introspection, to list all active references, I think
> this could be added on a per object
> basis, to help developers have ways to better understand the kernel behavior.
Printing the ‘counting' results will save a lot of time to address the leaks.
Tracking multiple types at the same time will allow to address the problem
by reproducing the issue only once.

>
> Again, I think there is no reason your work can not complement mine.
> They serve different purposes.
Yep, this implementation is used more by developers to address known problems,
not by bot to look for new problems.

Thanks.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-12-07 21:58 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-07  4:02 [PATCH net-next 0/5] net: add refcnt tracking for some common objects Xin Long
2021-12-07  4:02 ` [PATCH net-next 1/5] lib: add obj_cnt infrastructure Xin Long
2021-12-07  4:02 ` [PATCH net-next 2/5] net: track netdev refcnt with obj_cnt Xin Long
2021-12-07  4:02 ` [PATCH net-next 3/5] net: track dst " Xin Long
2021-12-07  4:02 ` [PATCH net-next 4/5] net: track in6_dev " Xin Long
2021-12-07  4:02 ` [PATCH net-next 5/5] net: track xfrm_state " Xin Long
2021-12-07  4:41 ` [PATCH net-next 0/5] net: add refcnt tracking for some common objects Eric Dumazet
2021-12-07 18:17   ` Xin Long
2021-12-07 18:34     ` Eric Dumazet
2021-12-07 21:58       ` Xin Long

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.