linux-kernel.vger.kernel.org archive mirror
* [RFC][PATCH 0/7] kref improvements
@ 2016-11-14 17:39 Peter Zijlstra
  2016-11-14 17:39 ` [RFC][PATCH 1/7] kref: Add KREF_INIT() Peter Zijlstra
                   ` (7 more replies)
  0 siblings, 8 replies; 96+ messages in thread
From: Peter Zijlstra @ 2016-11-14 17:39 UTC (permalink / raw)
  To: gregkh, keescook, will.deacon, elena.reshetova, arnd, tglx,
	mingo, hpa, dave
  Cc: linux-kernel

This series unfscks kref and then implements it in terms of refcount_t.

x86_64-allyesconfig compile tested and boot tested with my regular config.

refcount_t is as per the previous thread: it BUGs on over-/underflow and
saturates at UINT_MAX, such that if we ever overflow, we'll never free again.


* [RFC][PATCH 1/7] kref: Add KREF_INIT()
  2016-11-14 17:39 [RFC][PATCH 0/7] kref improvements Peter Zijlstra
@ 2016-11-14 17:39 ` Peter Zijlstra
  2016-11-14 17:39 ` [RFC][PATCH 2/7] kref: Add kref_read() Peter Zijlstra
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 96+ messages in thread
From: Peter Zijlstra @ 2016-11-14 17:39 UTC (permalink / raw)
  To: gregkh, keescook, will.deacon, elena.reshetova, arnd, tglx,
	mingo, hpa, dave
  Cc: linux-kernel, Peter Zijlstra (Intel)

[-- Attachment #1: peterz-ref-1.patch --]
[-- Type: text/plain, Size: 1943 bytes --]

Since we need to change the implementation, stop exposing internals.

Provide KREF_INIT() to allow static initialization of struct kref.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 drivers/block/drbd/drbd_bitmap.c |    2 +-
 fs/fuse/fuse_i.h                 |    2 +-
 include/linux/kref.h             |    2 ++
 init/version.c                   |    4 +---
 kernel/pid.c                     |    4 +---
 5 files changed, 6 insertions(+), 8 deletions(-)

--- a/drivers/block/drbd/drbd_bitmap.c
+++ b/drivers/block/drbd/drbd_bitmap.c
@@ -1070,7 +1070,7 @@ static int bm_rw(struct drbd_device *dev
 		.done = 0,
 		.flags = flags,
 		.error = 0,
-		.kref = { ATOMIC_INIT(2) },
+		.kref = KREF_INIT(2),
 	};
 
 	if (!get_ldev_if_state(device, D_ATTACHING)) {  /* put is in drbd_bm_aio_ctx_destroy() */
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -256,7 +256,7 @@ struct fuse_io_priv {
 
 #define FUSE_IO_PRIV_SYNC(f) \
 {					\
-	.refcnt = { ATOMIC_INIT(1) },	\
+	.refcnt = KREF_INIT(1),		\
 	.async = 0,			\
 	.file = f,			\
 }
--- a/include/linux/kref.h
+++ b/include/linux/kref.h
@@ -24,6 +24,8 @@ struct kref {
 	atomic_t refcount;
 };
 
+#define KREF_INIT(n)	{ .refcount = ATOMIC_INIT(n), }
+
 /**
  * kref_init - initialize object.
  * @kref: object in question.
--- a/init/version.c
+++ b/init/version.c
@@ -23,9 +23,7 @@ int version_string(LINUX_VERSION_CODE);
 #endif
 
 struct uts_namespace init_uts_ns = {
-	.kref = {
-		.refcount	= ATOMIC_INIT(2),
-	},
+	.kref = KREF_INIT(2),
 	.name = {
 		.sysname	= UTS_SYSNAME,
 		.nodename	= UTS_NODENAME,
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -68,9 +68,7 @@ static inline int mk_pid(struct pid_name
  * the scheme scales to up to 4 million PIDs, runtime.
  */
 struct pid_namespace init_pid_ns = {
-	.kref = {
-		.refcount       = ATOMIC_INIT(2),
-	},
+	.kref = KREF_INIT(2),
 	.pidmap = {
 		[ 0 ... PIDMAP_ENTRIES-1] = { ATOMIC_INIT(BITS_PER_PAGE), NULL }
 	},


* [RFC][PATCH 2/7] kref: Add kref_read()
  2016-11-14 17:39 [RFC][PATCH 0/7] kref improvements Peter Zijlstra
  2016-11-14 17:39 ` [RFC][PATCH 1/7] kref: Add KREF_INIT() Peter Zijlstra
@ 2016-11-14 17:39 ` Peter Zijlstra
  2016-11-14 18:16   ` Christoph Hellwig
  2016-11-15  7:33   ` [RFC][PATCH 2/7] kref: Add kref_read() Greg KH
  2016-11-14 17:39 ` [RFC][PATCH 3/7] kref: Kill kref_sub() Peter Zijlstra
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 96+ messages in thread
From: Peter Zijlstra @ 2016-11-14 17:39 UTC (permalink / raw)
  To: gregkh, keescook, will.deacon, elena.reshetova, arnd, tglx,
	mingo, hpa, dave
  Cc: linux-kernel, Peter Zijlstra (Intel)

[-- Attachment #1: peterz-ref-2.patch --]
[-- Type: text/plain, Size: 42970 bytes --]

Since we need to change the implementation, stop exposing internals.

Provide kref_read() to read the current reference count; typically
used for debug messages.

Kills two anti-patterns:

	atomic_read(&kref->refcount)
	kref->refcount.counter

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 drivers/block/drbd/drbd_req.c                |    2 -
 drivers/block/rbd.c                          |    8 ++---
 drivers/block/virtio_blk.c                   |    2 -
 drivers/gpu/drm/drm_gem_cma_helper.c         |    2 -
 drivers/gpu/drm/drm_info.c                   |    2 -
 drivers/gpu/drm/drm_mode_object.c            |    4 +-
 drivers/gpu/drm/etnaviv/etnaviv_gem.c        |    2 -
 drivers/gpu/drm/msm/msm_gem.c                |    2 -
 drivers/gpu/drm/nouveau/nouveau_fence.c      |    2 -
 drivers/gpu/drm/omapdrm/omap_gem.c           |    2 -
 drivers/gpu/drm/ttm/ttm_bo.c                 |    4 +-
 drivers/gpu/drm/ttm/ttm_object.c             |    2 -
 drivers/infiniband/hw/cxgb3/iwch_cm.h        |    6 ++--
 drivers/infiniband/hw/cxgb3/iwch_qp.c        |    2 -
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h       |    6 ++--
 drivers/infiniband/hw/cxgb4/qp.c             |    2 -
 drivers/infiniband/hw/usnic/usnic_ib_sysfs.c |    6 ++--
 drivers/infiniband/hw/usnic/usnic_ib_verbs.c |    4 +-
 drivers/misc/genwqe/card_dev.c               |    2 -
 drivers/misc/mei/debugfs.c                   |    2 -
 drivers/pci/hotplug/pnv_php.c                |    2 -
 drivers/pci/slot.c                           |    2 -
 drivers/scsi/bnx2fc/bnx2fc_io.c              |    8 ++---
 drivers/scsi/cxgbi/libcxgbi.h                |    4 +-
 drivers/scsi/lpfc/lpfc_debugfs.c             |    2 -
 drivers/scsi/lpfc/lpfc_els.c                 |    2 -
 drivers/scsi/lpfc/lpfc_hbadisc.c             |   40 +++++++++++++--------------
 drivers/scsi/lpfc/lpfc_init.c                |    3 --
 drivers/scsi/qla2xxx/tcm_qla2xxx.c           |    4 +-
 drivers/staging/android/ion/ion.c            |    2 -
 drivers/staging/comedi/comedi_buf.c          |    2 -
 drivers/target/target_core_pr.c              |   10 +++---
 drivers/target/tcm_fc/tfc_sess.c             |    2 -
 drivers/usb/gadget/function/f_fs.c           |    2 -
 fs/exofs/sys.c                               |    2 -
 fs/ocfs2/cluster/netdebug.c                  |    2 -
 fs/ocfs2/cluster/tcp.c                       |    2 -
 fs/ocfs2/dlm/dlmdebug.c                      |   12 ++++----
 fs/ocfs2/dlm/dlmdomain.c                     |    2 -
 fs/ocfs2/dlm/dlmmaster.c                     |    8 ++---
 fs/ocfs2/dlm/dlmunlock.c                     |    2 -
 include/drm/drm_framebuffer.h                |    2 -
 include/drm/ttm/ttm_bo_driver.h              |    4 +-
 include/linux/kref.h                         |    5 +++
 include/linux/sunrpc/cache.h                 |    2 -
 include/net/bluetooth/hci_core.h             |    4 +-
 net/bluetooth/6lowpan.c                      |    2 -
 net/bluetooth/a2mp.c                         |    4 +-
 net/bluetooth/amp.c                          |    4 +-
 net/bluetooth/l2cap_core.c                   |    4 +-
 net/ceph/messenger.c                         |    4 +-
 net/ceph/osd_client.c                        |   10 +++---
 net/sunrpc/cache.c                           |    2 -
 net/sunrpc/svc_xprt.c                        |    6 ++--
 net/sunrpc/xprtrdma/svc_rdma_transport.c     |    4 +-
 55 files changed, 120 insertions(+), 116 deletions(-)

--- a/drivers/block/drbd/drbd_req.c
+++ b/drivers/block/drbd/drbd_req.c
@@ -520,7 +520,7 @@ static void mod_rq_state(struct drbd_req
 		/* Completion does it's own kref_put.  If we are going to
 		 * kref_sub below, we need req to be still around then. */
 		int at_least = k_put + !!c_put;
-		int refcount = atomic_read(&req->kref.refcount);
+		int refcount = kref_read(&req->kref);
 		if (refcount < at_least)
 			drbd_err(device,
 				"mod_rq_state: Logic BUG: %x -> %x: refcount = %d, should be >= %d\n",
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -1535,7 +1535,7 @@ static bool obj_request_overlaps_parent(
 static void rbd_obj_request_get(struct rbd_obj_request *obj_request)
 {
 	dout("%s: obj %p (was %d)\n", __func__, obj_request,
-		atomic_read(&obj_request->kref.refcount));
+		kref_read(&obj_request->kref));
 	kref_get(&obj_request->kref);
 }
 
@@ -1544,14 +1544,14 @@ static void rbd_obj_request_put(struct r
 {
 	rbd_assert(obj_request != NULL);
 	dout("%s: obj %p (was %d)\n", __func__, obj_request,
-		atomic_read(&obj_request->kref.refcount));
+		kref_read(&obj_request->kref));
 	kref_put(&obj_request->kref, rbd_obj_request_destroy);
 }
 
 static void rbd_img_request_get(struct rbd_img_request *img_request)
 {
 	dout("%s: img %p (was %d)\n", __func__, img_request,
-	     atomic_read(&img_request->kref.refcount));
+	     kref_read(&img_request->kref));
 	kref_get(&img_request->kref);
 }
 
@@ -1562,7 +1562,7 @@ static void rbd_img_request_put(struct r
 {
 	rbd_assert(img_request != NULL);
 	dout("%s: img %p (was %d)\n", __func__, img_request,
-		atomic_read(&img_request->kref.refcount));
+		kref_read(&img_request->kref));
 	if (img_request_child_test(img_request))
 		kref_put(&img_request->kref, rbd_parent_request_destroy);
 	else
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -767,7 +767,7 @@ static void virtblk_remove(struct virtio
 	/* Stop all the virtqueues. */
 	vdev->config->reset(vdev);
 
-	refc = atomic_read(&disk_to_dev(vblk->disk)->kobj.kref.refcount);
+	refc = kref_read(&disk_to_dev(vblk->disk)->kobj.kref);
 	put_disk(vblk->disk);
 	vdev->config->del_vqs(vdev);
 	kfree(vblk->vqs);
--- a/drivers/gpu/drm/drm_gem_cma_helper.c
+++ b/drivers/gpu/drm/drm_gem_cma_helper.c
@@ -376,7 +376,7 @@ void drm_gem_cma_describe(struct drm_gem
 	off = drm_vma_node_start(&obj->vma_node);
 
 	seq_printf(m, "%2d (%2d) %08llx %pad %p %zu",
-			obj->name, obj->refcount.refcount.counter,
+			obj->name, kref_read(&obj->refcount),
 			off, &cma_obj->paddr, cma_obj->vaddr, obj->size);
 
 	seq_printf(m, "\n");
--- a/drivers/gpu/drm/drm_info.c
+++ b/drivers/gpu/drm/drm_info.c
@@ -118,7 +118,7 @@ static int drm_gem_one_name_info(int id,
 	seq_printf(m, "%6d %8zd %7d %8d\n",
 		   obj->name, obj->size,
 		   obj->handle_count,
-		   atomic_read(&obj->refcount.refcount));
+		   kref_read(&obj->refcount));
 	return 0;
 }
 
--- a/drivers/gpu/drm/drm_mode_object.c
+++ b/drivers/gpu/drm/drm_mode_object.c
@@ -159,7 +159,7 @@ EXPORT_SYMBOL(drm_mode_object_find);
 void drm_mode_object_unreference(struct drm_mode_object *obj)
 {
 	if (obj->free_cb) {
-		DRM_DEBUG("OBJ ID: %d (%d)\n", obj->id, atomic_read(&obj->refcount.refcount));
+		DRM_DEBUG("OBJ ID: %d (%d)\n", obj->id, kref_read(&obj->refcount));
 		kref_put(&obj->refcount, obj->free_cb);
 	}
 }
@@ -176,7 +176,7 @@ EXPORT_SYMBOL(drm_mode_object_unreferenc
 void drm_mode_object_reference(struct drm_mode_object *obj)
 {
 	if (obj->free_cb) {
-		DRM_DEBUG("OBJ ID: %d (%d)\n", obj->id, atomic_read(&obj->refcount.refcount));
+		DRM_DEBUG("OBJ ID: %d (%d)\n", obj->id, kref_read(&obj->refcount));
 		kref_get(&obj->refcount);
 	}
 }
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
@@ -491,7 +491,7 @@ static void etnaviv_gem_describe(struct
 
 	seq_printf(m, "%08x: %c %2d (%2d) %08lx %p %zd\n",
 			etnaviv_obj->flags, is_active(etnaviv_obj) ? 'A' : 'I',
-			obj->name, obj->refcount.refcount.counter,
+			obj->name, kref_read(&obj->refcount),
 			off, etnaviv_obj->vaddr, obj->size);
 
 	rcu_read_lock();
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -652,7 +652,7 @@ void msm_gem_describe(struct drm_gem_obj
 
 	seq_printf(m, "%08x: %c %2d (%2d) %08llx %p %zu%s\n",
 			msm_obj->flags, is_active(msm_obj) ? 'A' : 'I',
-			obj->name, obj->refcount.refcount.counter,
+			obj->name, kref_read(&obj->refcount),
 			off, msm_obj->vaddr, obj->size, madv);
 
 	rcu_read_lock();
--- a/drivers/gpu/drm/nouveau/nouveau_fence.c
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
@@ -527,7 +527,7 @@ static bool nouveau_fence_no_signaling(s
 	 * caller should have a reference on the fence,
 	 * else fence could get freed here
 	 */
-	WARN_ON(atomic_read(&fence->base.refcount.refcount) <= 1);
+	WARN_ON(kref_read(&fence->base.refcount) <= 1);
 
 	/*
 	 * This needs uevents to work correctly, but fence_add_callback relies on
--- a/drivers/gpu/drm/omapdrm/omap_gem.c
+++ b/drivers/gpu/drm/omapdrm/omap_gem.c
@@ -1035,7 +1035,7 @@ void omap_gem_describe(struct drm_gem_ob
 	off = drm_vma_node_start(&obj->vma_node);
 
 	seq_printf(m, "%08x: %2d (%2d) %08llx %pad (%2d) %p %4d",
-			omap_obj->flags, obj->name, obj->refcount.refcount.counter,
+			omap_obj->flags, obj->name, kref_read(&obj->refcount),
 			off, &omap_obj->paddr, omap_obj->paddr_cnt,
 			omap_obj->vaddr, omap_obj->roll);
 
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -140,8 +140,8 @@ static void ttm_bo_release_list(struct k
 	struct ttm_bo_device *bdev = bo->bdev;
 	size_t acc_size = bo->acc_size;
 
-	BUG_ON(atomic_read(&bo->list_kref.refcount));
-	BUG_ON(atomic_read(&bo->kref.refcount));
+	BUG_ON(kref_read(&bo->list_kref));
+	BUG_ON(kref_read(&bo->kref));
 	BUG_ON(atomic_read(&bo->cpu_writers));
 	BUG_ON(bo->mem.mm_node != NULL);
 	BUG_ON(!list_empty(&bo->lru));
--- a/drivers/gpu/drm/ttm/ttm_object.c
+++ b/drivers/gpu/drm/ttm/ttm_object.c
@@ -304,7 +304,7 @@ bool ttm_ref_object_exists(struct ttm_ob
 	 * Verify that the ref->obj pointer was actually valid!
 	 */
 	rmb();
-	if (unlikely(atomic_read(&ref->kref.refcount) == 0))
+	if (unlikely(kref_read(&ref->kref) == 0))
 		goto out_false;
 
 	rcu_read_unlock();
--- a/drivers/infiniband/hw/cxgb3/iwch_cm.h
+++ b/drivers/infiniband/hw/cxgb3/iwch_cm.h
@@ -55,14 +55,14 @@
 
 #define put_ep(ep) { \
 	PDBG("put_ep (via %s:%u) ep %p refcnt %d\n", __func__, __LINE__,  \
-	     ep, atomic_read(&((ep)->kref.refcount))); \
-	WARN_ON(atomic_read(&((ep)->kref.refcount)) < 1); \
+	     ep, kref_read(&((ep)->kref))); \
+	WARN_ON(kref_read(&((ep)->kref)) < 1); \
 	kref_put(&((ep)->kref), __free_ep); \
 }
 
 #define get_ep(ep) { \
 	PDBG("get_ep (via %s:%u) ep %p, refcnt %d\n", __func__, __LINE__, \
-	     ep, atomic_read(&((ep)->kref.refcount))); \
+	     ep, kref_read(&((ep)->kref))); \
 	kref_get(&((ep)->kref));  \
 }
 
--- a/drivers/infiniband/hw/cxgb3/iwch_qp.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c
@@ -961,7 +961,7 @@ int iwch_modify_qp(struct iwch_dev *rhp,
 	case IWCH_QP_STATE_RTS:
 		switch (attrs->next_state) {
 		case IWCH_QP_STATE_CLOSING:
-			BUG_ON(atomic_read(&qhp->ep->com.kref.refcount) < 2);
+			BUG_ON(kref_read(&qhp->ep->com.kref) < 2);
 			qhp->attr.state = IWCH_QP_STATE_CLOSING;
 			if (!internal) {
 				abort=0;
--- a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
+++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
@@ -654,14 +654,14 @@ enum c4iw_mmid_state {
 
 #define c4iw_put_ep(ep) { \
 	PDBG("put_ep (via %s:%u) ep %p refcnt %d\n", __func__, __LINE__,  \
-	     ep, atomic_read(&((ep)->kref.refcount))); \
-	WARN_ON(atomic_read(&((ep)->kref.refcount)) < 1); \
+	     ep, kref_read(&((ep)->kref))); \
+	WARN_ON(kref_read(&((ep)->kref)) < 1); \
 	kref_put(&((ep)->kref), _c4iw_free_ep); \
 }
 
 #define c4iw_get_ep(ep) { \
 	PDBG("get_ep (via %s:%u) ep %p, refcnt %d\n", __func__, __LINE__, \
-	     ep, atomic_read(&((ep)->kref.refcount))); \
+	     ep, kref_read(&((ep)->kref))); \
 	kref_get(&((ep)->kref));  \
 }
 void _c4iw_free_ep(struct kref *kref);
--- a/drivers/infiniband/hw/cxgb4/qp.c
+++ b/drivers/infiniband/hw/cxgb4/qp.c
@@ -1498,7 +1498,7 @@ int c4iw_modify_qp(struct c4iw_dev *rhp,
 	case C4IW_QP_STATE_RTS:
 		switch (attrs->next_state) {
 		case C4IW_QP_STATE_CLOSING:
-			BUG_ON(atomic_read(&qhp->ep->com.kref.refcount) < 2);
+			BUG_ON(kref_read(&qhp->ep->com.kref) < 2);
 			t4_set_wq_in_error(&qhp->wq);
 			set_state(qhp, C4IW_QP_STATE_CLOSING);
 			ep = qhp->ep;
--- a/drivers/infiniband/hw/usnic/usnic_ib_sysfs.c
+++ b/drivers/infiniband/hw/usnic/usnic_ib_sysfs.c
@@ -80,7 +80,7 @@ usnic_ib_show_config(struct device *devi
 	left = PAGE_SIZE;
 
 	mutex_lock(&us_ibdev->usdev_lock);
-	if (atomic_read(&us_ibdev->vf_cnt.refcount) > 0) {
+	if (kref_read(&us_ibdev->vf_cnt) > 0) {
 		char *busname;
 
 		/*
@@ -99,7 +99,7 @@ usnic_ib_show_config(struct device *devi
 			PCI_FUNC(us_ibdev->pdev->devfn),
 			netdev_name(us_ibdev->netdev),
 			us_ibdev->ufdev->mac,
-			atomic_read(&us_ibdev->vf_cnt.refcount));
+			kref_read(&us_ibdev->vf_cnt));
 		UPDATE_PTR_LEFT(n, ptr, left);
 
 		for (res_type = USNIC_VNIC_RES_TYPE_EOL;
@@ -147,7 +147,7 @@ usnic_ib_show_max_vf(struct device *devi
 	us_ibdev = container_of(device, struct usnic_ib_dev, ib_dev.dev);
 
 	return scnprintf(buf, PAGE_SIZE, "%u\n",
-			atomic_read(&us_ibdev->vf_cnt.refcount));
+			kref_read(&us_ibdev->vf_cnt));
 }
 
 static ssize_t
--- a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
+++ b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
@@ -291,11 +291,11 @@ int usnic_ib_query_device(struct ib_devi
 	qp_per_vf = max(us_ibdev->vf_res_cnt[USNIC_VNIC_RES_TYPE_WQ],
 			us_ibdev->vf_res_cnt[USNIC_VNIC_RES_TYPE_RQ]);
 	props->max_qp = qp_per_vf *
-		atomic_read(&us_ibdev->vf_cnt.refcount);
+		kref_read(&us_ibdev->vf_cnt);
 	props->device_cap_flags = IB_DEVICE_PORT_ACTIVE_EVENT |
 		IB_DEVICE_SYS_IMAGE_GUID | IB_DEVICE_BLOCK_MULTICAST_LOOPBACK;
 	props->max_cq = us_ibdev->vf_res_cnt[USNIC_VNIC_RES_TYPE_CQ] *
-		atomic_read(&us_ibdev->vf_cnt.refcount);
+		kref_read(&us_ibdev->vf_cnt);
 	props->max_pd = USNIC_UIOM_MAX_PD_CNT;
 	props->max_mr = USNIC_UIOM_MAX_MR_CNT;
 	props->local_ca_ack_delay = 0;
--- a/drivers/misc/genwqe/card_dev.c
+++ b/drivers/misc/genwqe/card_dev.c
@@ -1396,7 +1396,7 @@ int genwqe_device_remove(struct genwqe_d
 	 * application which will decrease this reference from
 	 * 1/unused to 0/illegal and not from 2/used 1/empty.
 	 */
-	rc = atomic_read(&cd->cdev_genwqe.kobj.kref.refcount);
+	rc = kref_read(&cd->cdev_genwqe.kobj.kref);
 	if (rc != 1) {
 		dev_err(&pci_dev->dev,
 			"[%s] err: cdev_genwqe...refcount=%d\n", __func__, rc);
--- a/drivers/misc/mei/debugfs.c
+++ b/drivers/misc/mei/debugfs.c
@@ -67,7 +67,7 @@ static ssize_t mei_dbgfs_read_meclients(
 				me_cl->props.max_number_of_connections,
 				me_cl->props.max_msg_length,
 				me_cl->props.single_recv_buf,
-				atomic_read(&me_cl->refcnt.refcount));
+				kref_read(&me_cl->refcnt));
 
 			mei_me_cl_put(me_cl);
 		}
--- a/drivers/pci/hotplug/pnv_php.c
+++ b/drivers/pci/hotplug/pnv_php.c
@@ -155,7 +155,7 @@ static void pnv_php_detach_device_nodes(
 		pnv_php_detach_device_nodes(dn);
 
 		of_node_put(dn);
-		refcount = atomic_read(&dn->kobj.kref.refcount);
+		refcount = kref_read(&dn->kobj.kref);
 		if (refcount != 1)
 			pr_warn("Invalid refcount %d on <%s>\n",
 				refcount, of_node_full_name(dn));
--- a/drivers/pci/slot.c
+++ b/drivers/pci/slot.c
@@ -345,7 +345,7 @@ EXPORT_SYMBOL_GPL(pci_create_slot);
 void pci_destroy_slot(struct pci_slot *slot)
 {
 	dev_dbg(&slot->bus->dev, "dev %02x, dec refcount to %d\n",
-		slot->number, atomic_read(&slot->kobj.kref.refcount) - 1);
+		slot->number, kref_read(&slot->kobj.kref) - 1);
 
 	mutex_lock(&pci_slot_mutex);
 	kobject_put(&slot->kobj);
--- a/drivers/scsi/bnx2fc/bnx2fc_io.c
+++ b/drivers/scsi/bnx2fc/bnx2fc_io.c
@@ -74,7 +74,7 @@ static void bnx2fc_cmd_timeout(struct wo
 				    &io_req->req_flags)) {
 			/* Handle internally generated ABTS timeout */
 			BNX2FC_IO_DBG(io_req, "ABTS timed out refcnt = %d\n",
-					io_req->refcount.refcount.counter);
+					kref_read(&io_req->refcount));
 			if (!(test_and_set_bit(BNX2FC_FLAG_ABTS_DONE,
 					       &io_req->req_flags))) {
 				/*
@@ -1141,7 +1141,7 @@ int bnx2fc_eh_abort(struct scsi_cmnd *sc
 		return SUCCESS;
 	}
 	BNX2FC_IO_DBG(io_req, "eh_abort - refcnt = %d\n",
-		      io_req->refcount.refcount.counter);
+		      kref_read(&io_req->refcount));
 
 	/* Hold IO request across abort processing */
 	kref_get(&io_req->refcount);
@@ -1299,7 +1299,7 @@ void bnx2fc_process_cleanup_compl(struct
 {
 	BNX2FC_IO_DBG(io_req, "Entered process_cleanup_compl "
 			      "refcnt = %d, cmd_type = %d\n",
-		   io_req->refcount.refcount.counter, io_req->cmd_type);
+		   kref_read(&io_req->refcount), io_req->cmd_type);
 	bnx2fc_scsi_done(io_req, DID_ERROR);
 	kref_put(&io_req->refcount, bnx2fc_cmd_release);
 	if (io_req->wait_for_comp)
@@ -1318,7 +1318,7 @@ void bnx2fc_process_abts_compl(struct bn
 	BNX2FC_IO_DBG(io_req, "Entered process_abts_compl xid = 0x%x"
 			      "refcnt = %d, cmd_type = %d\n",
 		   io_req->xid,
-		   io_req->refcount.refcount.counter, io_req->cmd_type);
+		   kref_read(&io_req->refcount), io_req->cmd_type);
 
 	if (test_and_set_bit(BNX2FC_FLAG_ABTS_DONE,
 				       &io_req->req_flags)) {
--- a/drivers/scsi/cxgbi/libcxgbi.h
+++ b/drivers/scsi/cxgbi/libcxgbi.h
@@ -300,7 +300,7 @@ static inline void __cxgbi_sock_put(cons
 {
 	log_debug(1 << CXGBI_DBG_SOCK,
 		"%s, put csk 0x%p, ref %u-1.\n",
-		fn, csk, atomic_read(&csk->refcnt.refcount));
+		fn, csk, kref_read(&csk->refcnt));
 	kref_put(&csk->refcnt, cxgbi_sock_free);
 }
 #define cxgbi_sock_put(csk)	__cxgbi_sock_put(__func__, csk)
@@ -309,7 +309,7 @@ static inline void __cxgbi_sock_get(cons
 {
 	log_debug(1 << CXGBI_DBG_SOCK,
 		"%s, get csk 0x%p, ref %u+1.\n",
-		fn, csk, atomic_read(&csk->refcnt.refcount));
+		fn, csk, kref_read(&csk->refcnt));
 	kref_get(&csk->refcnt);
 }
 #define cxgbi_sock_get(csk)	__cxgbi_sock_get(__func__, csk)
--- a/drivers/scsi/lpfc/lpfc_debugfs.c
+++ b/drivers/scsi/lpfc/lpfc_debugfs.c
@@ -607,7 +607,7 @@ lpfc_debugfs_nodelist_data(struct lpfc_v
 		len += snprintf(buf+len, size-len, "usgmap:%x ",
 			ndlp->nlp_usg_map);
 		len += snprintf(buf+len, size-len, "refcnt:%x",
-			atomic_read(&ndlp->kref.refcount));
+			kref_read(&ndlp->kref));
 		len +=  snprintf(buf+len, size-len, "\n");
 	}
 	spin_unlock_irq(shost->host_lock);
--- a/drivers/scsi/lpfc/lpfc_els.c
+++ b/drivers/scsi/lpfc/lpfc_els.c
@@ -3688,7 +3688,7 @@ lpfc_mbx_cmpl_dflt_rpi(struct lpfc_hba *
 		lpfc_printf_vlog(ndlp->vport, KERN_INFO, LOG_NODE,
 				 "0006 rpi%x DID:%x flg:%x %d map:%x %p\n",
 				 ndlp->nlp_rpi, ndlp->nlp_DID, ndlp->nlp_flag,
-				 atomic_read(&ndlp->kref.refcount),
+				 kref_read(&ndlp->kref),
 				 ndlp->nlp_usg_map, ndlp);
 		if (NLP_CHK_NODE_ACT(ndlp)) {
 			lpfc_nlp_put(ndlp);
--- a/drivers/scsi/lpfc/lpfc_hbadisc.c
+++ b/drivers/scsi/lpfc/lpfc_hbadisc.c
@@ -3440,7 +3440,7 @@ lpfc_mbx_cmpl_reg_login(struct lpfc_hba
 	lpfc_printf_vlog(vport, KERN_INFO, LOG_SLI,
 			 "0002 rpi:%x DID:%x flg:%x %d map:%x %p\n",
 			 ndlp->nlp_rpi, ndlp->nlp_DID, ndlp->nlp_flag,
-			 atomic_read(&ndlp->kref.refcount),
+			 kref_read(&ndlp->kref),
 			 ndlp->nlp_usg_map, ndlp);
 	if (ndlp->nlp_flag & NLP_REG_LOGIN_SEND)
 		ndlp->nlp_flag &= ~NLP_REG_LOGIN_SEND;
@@ -3861,7 +3861,7 @@ lpfc_mbx_cmpl_ns_reg_login(struct lpfc_h
 	lpfc_printf_vlog(vport, KERN_INFO, LOG_SLI,
 			 "0003 rpi:%x DID:%x flg:%x %d map%x %p\n",
 			 ndlp->nlp_rpi, ndlp->nlp_DID, ndlp->nlp_flag,
-			 atomic_read(&ndlp->kref.refcount),
+			 kref_read(&ndlp->kref),
 			 ndlp->nlp_usg_map, ndlp);
 
 	if (vport->port_state < LPFC_VPORT_READY) {
@@ -4238,7 +4238,7 @@ lpfc_enable_node(struct lpfc_vport *vpor
 				"0277 lpfc_enable_node: ndlp:x%p "
 				"usgmap:x%x refcnt:%d\n",
 				(void *)ndlp, ndlp->nlp_usg_map,
-				atomic_read(&ndlp->kref.refcount));
+				kref_read(&ndlp->kref));
 		return NULL;
 	}
 	/* The ndlp should not already be in active mode */
@@ -4248,7 +4248,7 @@ lpfc_enable_node(struct lpfc_vport *vpor
 				"0278 lpfc_enable_node: ndlp:x%p "
 				"usgmap:x%x refcnt:%d\n",
 				(void *)ndlp, ndlp->nlp_usg_map,
-				atomic_read(&ndlp->kref.refcount));
+				kref_read(&ndlp->kref));
 		return NULL;
 	}
 
@@ -4272,7 +4272,7 @@ lpfc_enable_node(struct lpfc_vport *vpor
 				 "0008 rpi:%x DID:%x flg:%x refcnt:%d "
 				 "map:%x %p\n", ndlp->nlp_rpi, ndlp->nlp_DID,
 				 ndlp->nlp_flag,
-				 atomic_read(&ndlp->kref.refcount),
+				 kref_read(&ndlp->kref),
 				 ndlp->nlp_usg_map, ndlp);
 	}
 
@@ -4546,7 +4546,7 @@ lpfc_unreg_rpi(struct lpfc_vport *vport,
 				    (bf_get(lpfc_sli_intf_if_type,
 				     &phba->sli4_hba.sli_intf) ==
 				      LPFC_SLI_INTF_IF_TYPE_2) &&
-				    (atomic_read(&ndlp->kref.refcount) > 0)) {
+				    (kref_read(&ndlp->kref) > 0)) {
 					mbox->context1 = lpfc_nlp_get(ndlp);
 					mbox->mbox_cmpl =
 						lpfc_sli4_unreg_rpi_cmpl_clr;
@@ -4695,14 +4695,14 @@ lpfc_cleanup_node(struct lpfc_vport *vpo
 				"0280 lpfc_cleanup_node: ndlp:x%p "
 				"usgmap:x%x refcnt:%d\n",
 				(void *)ndlp, ndlp->nlp_usg_map,
-				atomic_read(&ndlp->kref.refcount));
+				kref_read(&ndlp->kref));
 		lpfc_dequeue_node(vport, ndlp);
 	} else {
 		lpfc_printf_vlog(vport, KERN_WARNING, LOG_NODE,
 				"0281 lpfc_cleanup_node: ndlp:x%p "
 				"usgmap:x%x refcnt:%d\n",
 				(void *)ndlp, ndlp->nlp_usg_map,
-				atomic_read(&ndlp->kref.refcount));
+				kref_read(&ndlp->kref));
 		lpfc_disable_node(vport, ndlp);
 	}
 
@@ -4791,7 +4791,7 @@ lpfc_nlp_remove(struct lpfc_vport *vport
 		lpfc_printf_vlog(vport, KERN_INFO, LOG_NODE,
 				 "0005 rpi:%x DID:%x flg:%x %d map:%x %p\n",
 				 ndlp->nlp_rpi, ndlp->nlp_DID, ndlp->nlp_flag,
-				 atomic_read(&ndlp->kref.refcount),
+				 kref_read(&ndlp->kref),
 				 ndlp->nlp_usg_map, ndlp);
 		if ((mbox = mempool_alloc(phba->mbox_mem_pool, GFP_KERNEL))
 			!= NULL) {
@@ -5557,7 +5557,7 @@ lpfc_mbx_cmpl_fdmi_reg_login(struct lpfc
 	lpfc_printf_vlog(vport, KERN_INFO, LOG_SLI,
 			 "0004 rpi:%x DID:%x flg:%x %d map:%x %p\n",
 			 ndlp->nlp_rpi, ndlp->nlp_DID, ndlp->nlp_flag,
-			 atomic_read(&ndlp->kref.refcount),
+			 kref_read(&ndlp->kref),
 			 ndlp->nlp_usg_map, ndlp);
 	/*
 	 * Start issuing Fabric-Device Management Interface (FDMI) command to
@@ -5728,7 +5728,7 @@ lpfc_nlp_init(struct lpfc_vport *vport,
 				 "0007 rpi:%x DID:%x flg:%x refcnt:%d "
 				 "map:%x %p\n", ndlp->nlp_rpi, ndlp->nlp_DID,
 				 ndlp->nlp_flag,
-				 atomic_read(&ndlp->kref.refcount),
+				 kref_read(&ndlp->kref),
 				 ndlp->nlp_usg_map, ndlp);
 
 		ndlp->active_rrqs_xri_bitmap =
@@ -5767,7 +5767,7 @@ lpfc_nlp_release(struct kref *kref)
 			"0279 lpfc_nlp_release: ndlp:x%p did %x "
 			"usgmap:x%x refcnt:%d rpi:%x\n",
 			(void *)ndlp, ndlp->nlp_DID, ndlp->nlp_usg_map,
-			atomic_read(&ndlp->kref.refcount), ndlp->nlp_rpi);
+			kref_read(&ndlp->kref), ndlp->nlp_rpi);
 
 	/* remove ndlp from action. */
 	lpfc_nlp_remove(ndlp->vport, ndlp);
@@ -5804,7 +5804,7 @@ lpfc_nlp_get(struct lpfc_nodelist *ndlp)
 		lpfc_debugfs_disc_trc(ndlp->vport, LPFC_DISC_TRC_NODE,
 			"node get:        did:x%x flg:x%x refcnt:x%x",
 			ndlp->nlp_DID, ndlp->nlp_flag,
-			atomic_read(&ndlp->kref.refcount));
+			kref_read(&ndlp->kref));
 		/* The check of ndlp usage to prevent incrementing the
 		 * ndlp reference count that is in the process of being
 		 * released.
@@ -5817,7 +5817,7 @@ lpfc_nlp_get(struct lpfc_nodelist *ndlp)
 				"0276 lpfc_nlp_get: ndlp:x%p "
 				"usgmap:x%x refcnt:%d\n",
 				(void *)ndlp, ndlp->nlp_usg_map,
-				atomic_read(&ndlp->kref.refcount));
+				kref_read(&ndlp->kref));
 			return NULL;
 		} else
 			kref_get(&ndlp->kref);
@@ -5844,7 +5844,7 @@ lpfc_nlp_put(struct lpfc_nodelist *ndlp)
 	lpfc_debugfs_disc_trc(ndlp->vport, LPFC_DISC_TRC_NODE,
 	"node put:        did:x%x flg:x%x refcnt:x%x",
 		ndlp->nlp_DID, ndlp->nlp_flag,
-		atomic_read(&ndlp->kref.refcount));
+		kref_read(&ndlp->kref));
 	phba = ndlp->phba;
 	spin_lock_irqsave(&phba->ndlp_lock, flags);
 	/* Check the ndlp memory free acknowledge flag to avoid the
@@ -5857,7 +5857,7 @@ lpfc_nlp_put(struct lpfc_nodelist *ndlp)
 				"0274 lpfc_nlp_put: ndlp:x%p "
 				"usgmap:x%x refcnt:%d\n",
 				(void *)ndlp, ndlp->nlp_usg_map,
-				atomic_read(&ndlp->kref.refcount));
+				kref_read(&ndlp->kref));
 		return 1;
 	}
 	/* Check the ndlp inactivate log flag to avoid the possible
@@ -5870,7 +5870,7 @@ lpfc_nlp_put(struct lpfc_nodelist *ndlp)
 				"0275 lpfc_nlp_put: ndlp:x%p "
 				"usgmap:x%x refcnt:%d\n",
 				(void *)ndlp, ndlp->nlp_usg_map,
-				atomic_read(&ndlp->kref.refcount));
+				kref_read(&ndlp->kref));
 		return 1;
 	}
 	/* For last put, mark the ndlp usage flags to make sure no
@@ -5878,7 +5878,7 @@ lpfc_nlp_put(struct lpfc_nodelist *ndlp)
 	 * in between the process when the final kref_put has been
 	 * invoked on this ndlp.
 	 */
-	if (atomic_read(&ndlp->kref.refcount) == 1) {
+	if (kref_read(&ndlp->kref) == 1) {
 		/* Indicate ndlp is put to inactive state. */
 		NLP_SET_IACT_REQ(ndlp);
 		/* Acknowledge ndlp memory free has been seen. */
@@ -5906,8 +5906,8 @@ lpfc_nlp_not_used(struct lpfc_nodelist *
 	lpfc_debugfs_disc_trc(ndlp->vport, LPFC_DISC_TRC_NODE,
 		"node not used:   did:x%x flg:x%x refcnt:x%x",
 		ndlp->nlp_DID, ndlp->nlp_flag,
-		atomic_read(&ndlp->kref.refcount));
-	if (atomic_read(&ndlp->kref.refcount) == 1)
+		kref_read(&ndlp->kref));
+	if (kref_read(&ndlp->kref) == 1)
 		if (lpfc_nlp_put(ndlp))
 			return 1;
 	return 0;
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -2660,8 +2660,7 @@ lpfc_cleanup(struct lpfc_vport *vport)
 						"usgmap:x%x refcnt:%d\n",
 						ndlp->nlp_DID, (void *)ndlp,
 						ndlp->nlp_usg_map,
-						atomic_read(
-							&ndlp->kref.refcount));
+						kref_read(&ndlp->kref));
 			}
 			break;
 		}
--- a/drivers/scsi/qla2xxx/tcm_qla2xxx.c
+++ b/drivers/scsi/qla2xxx/tcm_qla2xxx.c
@@ -371,7 +371,7 @@ static int tcm_qla2xxx_write_pending(str
 		 */
 		pr_debug("write_pending aborted cmd[%p] refcount %d "
 			"transport_state %x, t_state %x, se_cmd_flags %x\n",
-			cmd,cmd->se_cmd.cmd_kref.refcount.counter,
+			cmd, kref_read(&cmd->se_cmd.cmd_kref),
 			cmd->se_cmd.transport_state,
 			cmd->se_cmd.t_state,
 			cmd->se_cmd.se_cmd_flags);
@@ -584,7 +584,7 @@ static int tcm_qla2xxx_queue_data_in(str
 		 */
 		pr_debug("queue_data_in aborted cmd[%p] refcount %d "
 			"transport_state %x, t_state %x, se_cmd_flags %x\n",
-			cmd,cmd->se_cmd.cmd_kref.refcount.counter,
+			cmd, kref_read(&cmd->se_cmd.cmd_kref),
 			cmd->se_cmd.transport_state,
 			cmd->se_cmd.t_state,
 			cmd->se_cmd.se_cmd_flags);
--- a/drivers/staging/android/ion/ion.c
+++ b/drivers/staging/android/ion/ion.c
@@ -1300,7 +1300,7 @@ static int ion_debug_heap_show(struct se
 			seq_printf(s, "%16s %16u %16zu %d %d\n",
 				   buffer->task_comm, buffer->pid,
 				   buffer->size, buffer->kmap_cnt,
-				   atomic_read(&buffer->ref.refcount));
+				   kref_read(&buffer->ref));
 			total_orphaned_size += buffer->size;
 		}
 	}
--- a/drivers/staging/comedi/comedi_buf.c
+++ b/drivers/staging/comedi/comedi_buf.c
@@ -188,7 +188,7 @@ bool comedi_buf_is_mmapped(struct comedi
 {
 	struct comedi_buf_map *bm = s->async->buf_map;
 
-	return bm && (atomic_read(&bm->refcount.refcount) > 1);
+	return bm && (kref_read(&bm->refcount) > 1);
 }
 
 int comedi_buf_alloc(struct comedi_device *dev, struct comedi_subdevice *s,
--- a/drivers/target/target_core_pr.c
+++ b/drivers/target/target_core_pr.c
@@ -787,7 +787,7 @@ static struct t10_pr_registration *__cor
 			 * __core_scsi3_add_registration()
 			 */
 			dest_lun = rcu_dereference_check(deve_tmp->se_lun,
-				atomic_read(&deve_tmp->pr_kref.refcount) != 0);
+				kref_read(&deve_tmp->pr_kref) != 0);
 
 			pr_reg_atp = __core_scsi3_do_alloc_registration(dev,
 						nacl_tmp, dest_lun, deve_tmp,
@@ -1462,7 +1462,7 @@ static int core_scsi3_lunacl_depend_item
 	 * For nacl->dynamic_node_acl=1
 	 */
 	lun_acl = rcu_dereference_check(se_deve->se_lun_acl,
-				atomic_read(&se_deve->pr_kref.refcount) != 0);
+				kref_read(&se_deve->pr_kref) != 0);
 	if (!lun_acl)
 		return 0;
 
@@ -1477,7 +1477,7 @@ static void core_scsi3_lunacl_undepend_i
 	 * For nacl->dynamic_node_acl=1
 	 */
 	lun_acl = rcu_dereference_check(se_deve->se_lun_acl,
-				atomic_read(&se_deve->pr_kref.refcount) != 0);
+				kref_read(&se_deve->pr_kref) != 0);
 	if (!lun_acl) {
 		kref_put(&se_deve->pr_kref, target_pr_kref_release);
 		return;
@@ -1758,7 +1758,7 @@ core_scsi3_decode_spec_i_port(
 		 * 2nd loop which will never fail.
 		 */
 		dest_lun = rcu_dereference_check(dest_se_deve->se_lun,
-				atomic_read(&dest_se_deve->pr_kref.refcount) != 0);
+				kref_read(&dest_se_deve->pr_kref) != 0);
 
 		dest_pr_reg = __core_scsi3_alloc_registration(cmd->se_dev,
 					dest_node_acl, dest_lun, dest_se_deve,
@@ -3465,7 +3465,7 @@ core_scsi3_emulate_pro_register_and_move
 					iport_ptr);
 	if (!dest_pr_reg) {
 		struct se_lun *dest_lun = rcu_dereference_check(dest_se_deve->se_lun,
-				atomic_read(&dest_se_deve->pr_kref.refcount) != 0);
+				kref_read(&dest_se_deve->pr_kref) != 0);
 
 		spin_unlock(&dev->dev_reservation_lock);
 		if (core_scsi3_alloc_registration(cmd->se_dev, dest_node_acl,
--- a/drivers/target/tcm_fc/tfc_sess.c
+++ b/drivers/target/tcm_fc/tfc_sess.c
@@ -454,7 +454,7 @@ static void ft_sess_free(struct kref *kr
 
 void ft_sess_put(struct ft_sess *sess)
 {
-	int sess_held = atomic_read(&sess->kref.refcount);
+	int sess_held = kref_read(&sess->kref);
 
 	BUG_ON(!sess_held);
 	kref_put(&sess->kref, ft_sess_free);
--- a/drivers/usb/gadget/function/f_fs.c
+++ b/drivers/usb/gadget/function/f_fs.c
@@ -3686,7 +3686,7 @@ static void ffs_closed(struct ffs_data *
 		goto done;
 
 	if (opts->no_configfs || !opts->func_inst.group.cg_item.ci_parent
-	    || !atomic_read(&opts->func_inst.group.cg_item.ci_kref.refcount))
+	    || !kref_read(&opts->func_inst.group.cg_item.ci_kref))
 		goto done;
 
 	unregister_gadget_item(ffs_obj->opts->
--- a/fs/exofs/sys.c
+++ b/fs/exofs/sys.c
@@ -122,7 +122,7 @@ void exofs_sysfs_dbg_print(void)
 	list_for_each_entry_safe(k_name, k_tmp, &exofs_kset->list, entry) {
 		printk(KERN_INFO "%s: name %s ref %d\n",
 			__func__, kobject_name(k_name),
-			(int)atomic_read(&k_name->kref.refcount));
+			(int)kref_read(&k_name->kref));
 	}
 #endif
 }
--- a/fs/ocfs2/cluster/netdebug.c
+++ b/fs/ocfs2/cluster/netdebug.c
@@ -349,7 +349,7 @@ static void sc_show_sock_container(struc
 		   "  func key:        0x%08x\n"
 		   "  func type:       %u\n",
 		   sc,
-		   atomic_read(&sc->sc_kref.refcount),
+		   kref_read(&sc->sc_kref),
 		   &saddr, inet ? ntohs(sport) : 0,
 		   &daddr, inet ? ntohs(dport) : 0,
 		   sc->sc_node->nd_name,
--- a/fs/ocfs2/cluster/tcp.c
+++ b/fs/ocfs2/cluster/tcp.c
@@ -97,7 +97,7 @@
 	typeof(sc) __sc = (sc);						\
 	mlog(ML_SOCKET, "[sc %p refs %d sock %p node %u page %p "	\
 	     "pg_off %zu] " fmt, __sc,					\
-	     atomic_read(&__sc->sc_kref.refcount), __sc->sc_sock,	\
+	     kref_read(&__sc->sc_kref), __sc->sc_sock,	\
 	    __sc->sc_node->nd_num, __sc->sc_page, __sc->sc_page_off ,	\
 	    ##args);							\
 } while (0)
--- a/fs/ocfs2/dlm/dlmdebug.c
+++ b/fs/ocfs2/dlm/dlmdebug.c
@@ -81,7 +81,7 @@ static void __dlm_print_lock(struct dlm_
 	       lock->ml.type, lock->ml.convert_type, lock->ml.node,
 	       dlm_get_lock_cookie_node(be64_to_cpu(lock->ml.cookie)),
 	       dlm_get_lock_cookie_seq(be64_to_cpu(lock->ml.cookie)),
-	       atomic_read(&lock->lock_refs.refcount),
+	       kref_read(&lock->lock_refs),
 	       (list_empty(&lock->ast_list) ? 'y' : 'n'),
 	       (lock->ast_pending ? 'y' : 'n'),
 	       (list_empty(&lock->bast_list) ? 'y' : 'n'),
@@ -106,7 +106,7 @@ void __dlm_print_one_lock_resource(struc
 	printk("lockres: %s, owner=%u, state=%u\n",
 	       buf, res->owner, res->state);
 	printk("  last used: %lu, refcnt: %u, on purge list: %s\n",
-	       res->last_used, atomic_read(&res->refs.refcount),
+	       res->last_used, kref_read(&res->refs),
 	       list_empty(&res->purge) ? "no" : "yes");
 	printk("  on dirty list: %s, on reco list: %s, "
 	       "migrating pending: %s\n",
@@ -298,7 +298,7 @@ static int dump_mle(struct dlm_master_li
 			mle_type, mle->master, mle->new_master,
 			!list_empty(&mle->hb_events),
 			!!mle->inuse,
-			atomic_read(&mle->mle_refs.refcount));
+			kref_read(&mle->mle_refs));
 
 	out += snprintf(buf + out, len - out, "Maybe=");
 	out += stringify_nodemap(mle->maybe_map, O2NM_MAX_NODES,
@@ -494,7 +494,7 @@ static int dump_lock(struct dlm_lock *lo
 		       lock->ast_pending, lock->bast_pending,
 		       lock->convert_pending, lock->lock_pending,
 		       lock->cancel_pending, lock->unlock_pending,
-		       atomic_read(&lock->lock_refs.refcount));
+		       kref_read(&lock->lock_refs));
 	spin_unlock(&lock->spinlock);
 
 	return out;
@@ -521,7 +521,7 @@ static int dump_lockres(struct dlm_lock_
 			!list_empty(&res->recovering),
 			res->inflight_locks, res->migration_pending,
 			atomic_read(&res->asts_reserved),
-			atomic_read(&res->refs.refcount));
+			kref_read(&res->refs));
 
 	/* refmap */
 	out += snprintf(buf + out, len - out, "RMAP:");
@@ -777,7 +777,7 @@ static int debug_state_print(struct dlm_
 	/* Purge Count: xxx  Refs: xxx */
 	out += snprintf(buf + out, len - out,
 			"Purge Count: %d  Refs: %d\n", dlm->purge_count,
-			atomic_read(&dlm->dlm_refs.refcount));
+			kref_read(&dlm->dlm_refs));
 
 	/* Dead Node: xxx */
 	out += snprintf(buf + out, len - out,
--- a/fs/ocfs2/dlm/dlmdomain.c
+++ b/fs/ocfs2/dlm/dlmdomain.c
@@ -2072,7 +2072,7 @@ static struct dlm_ctxt *dlm_alloc_ctxt(c
 	INIT_LIST_HEAD(&dlm->dlm_eviction_callbacks);
 
 	mlog(0, "context init: refcount %u\n",
-		  atomic_read(&dlm->dlm_refs.refcount));
+		  kref_read(&dlm->dlm_refs));
 
 leave:
 	if (ret < 0 && dlm) {
--- a/fs/ocfs2/dlm/dlmmaster.c
+++ b/fs/ocfs2/dlm/dlmmaster.c
@@ -233,7 +233,7 @@ static void __dlm_put_mle(struct dlm_mas
 
 	assert_spin_locked(&dlm->spinlock);
 	assert_spin_locked(&dlm->master_lock);
-	if (!atomic_read(&mle->mle_refs.refcount)) {
+	if (!kref_read(&mle->mle_refs)) {
 		/* this may or may not crash, but who cares.
 		 * it's a BUG. */
 		mlog(ML_ERROR, "bad mle: %p\n", mle);
@@ -1124,9 +1124,9 @@ static int dlm_wait_for_lock_mastery(str
 		unsigned long timeo = msecs_to_jiffies(DLM_MASTERY_TIMEOUT_MS);
 
 		/*
-		if (atomic_read(&mle->mle_refs.refcount) < 2)
+		if (kref_read(&mle->mle_refs) < 2)
 			mlog(ML_ERROR, "mle (%p) refs=%d, name=%.*s\n", mle,
-			atomic_read(&mle->mle_refs.refcount),
+			kref_read(&mle->mle_refs),
 			res->lockname.len, res->lockname.name);
 		*/
 		atomic_set(&mle->woken, 0);
@@ -1988,7 +1988,7 @@ int dlm_assert_master_handler(struct o2n
 		 * on this mle. */
 		spin_lock(&dlm->master_lock);
 
-		rr = atomic_read(&mle->mle_refs.refcount);
+		rr = kref_read(&mle->mle_refs);
 		if (mle->inuse > 0) {
 			if (extra_ref && rr < 3)
 				err = 1;
--- a/fs/ocfs2/dlm/dlmunlock.c
+++ b/fs/ocfs2/dlm/dlmunlock.c
@@ -251,7 +251,7 @@ static enum dlm_status dlmunlock_common(
 		mlog(0, "lock %u:%llu should be gone now! refs=%d\n",
 		     dlm_get_lock_cookie_node(be64_to_cpu(lock->ml.cookie)),
 		     dlm_get_lock_cookie_seq(be64_to_cpu(lock->ml.cookie)),
-		     atomic_read(&lock->lock_refs.refcount)-1);
+		     kref_read(&lock->lock_refs)-1);
 		dlm_lock_put(lock);
 	}
 	if (actions & DLM_UNLOCK_CALL_AST)
--- a/include/drm/drm_framebuffer.h
+++ b/include/drm/drm_framebuffer.h
@@ -247,7 +247,7 @@ static inline void drm_framebuffer_unref
  */
 static inline uint32_t drm_framebuffer_read_refcount(struct drm_framebuffer *fb)
 {
-	return atomic_read(&fb->base.refcount.refcount);
+	return kref_read(&fb->base.refcount);
 }
 
 /**
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -864,7 +864,7 @@ static inline int ttm_bo_reserve(struct
 {
 	int ret;
 
-	WARN_ON(!atomic_read(&bo->kref.refcount));
+	WARN_ON(!kref_read(&bo->kref));
 
 	ret = __ttm_bo_reserve(bo, interruptible, no_wait, ticket);
 	if (likely(ret == 0))
@@ -889,7 +889,7 @@ static inline int ttm_bo_reserve_slowpat
 {
 	int ret = 0;
 
-	WARN_ON(!atomic_read(&bo->kref.refcount));
+	WARN_ON(!kref_read(&bo->kref));
 
 	if (interruptible)
 		ret = ww_mutex_lock_slow_interruptible(&bo->resv->lock,
--- a/include/linux/kref.h
+++ b/include/linux/kref.h
@@ -35,6 +35,11 @@ static inline void kref_init(struct kref
 	atomic_set(&kref->refcount, 1);
 }
 
+static inline int kref_read(const struct kref *kref)
+{
+	return atomic_read(&kref->refcount);
+}
+
 /**
  * kref_get - increment refcount for object.
  * @kref: object.
--- a/include/linux/sunrpc/cache.h
+++ b/include/linux/sunrpc/cache.h
@@ -198,7 +198,7 @@ static inline struct cache_head  *cache_
 
 static inline void cache_put(struct cache_head *h, struct cache_detail *cd)
 {
-	if (atomic_read(&h->ref.refcount) <= 2 &&
+	if (kref_read(&h->ref) <= 2 &&
 	    h->expiry_time < cd->nextcheck)
 		cd->nextcheck = h->expiry_time;
 	kref_put(&h->ref, cd->cache_put);
--- a/include/net/bluetooth/hci_core.h
+++ b/include/net/bluetooth/hci_core.h
@@ -987,7 +987,7 @@ static inline void hci_conn_drop(struct
 static inline void hci_dev_put(struct hci_dev *d)
 {
 	BT_DBG("%s orig refcnt %d", d->name,
-	       atomic_read(&d->dev.kobj.kref.refcount));
+	       kref_read(&d->dev.kobj.kref));
 
 	put_device(&d->dev);
 }
@@ -995,7 +995,7 @@ static inline void hci_dev_put(struct hc
 static inline struct hci_dev *hci_dev_hold(struct hci_dev *d)
 {
 	BT_DBG("%s orig refcnt %d", d->name,
-	       atomic_read(&d->dev.kobj.kref.refcount));
+	       kref_read(&d->dev.kobj.kref));
 
 	get_device(&d->dev);
 	return d;
--- a/net/bluetooth/6lowpan.c
+++ b/net/bluetooth/6lowpan.c
@@ -920,7 +920,7 @@ static void chan_close_cb(struct l2cap_c
 			BT_DBG("dev %p removing %speer %p", dev,
 			       last ? "last " : "1 ", peer);
 			BT_DBG("chan %p orig refcnt %d", chan,
-			       atomic_read(&chan->kref.refcount));
+			       kref_read(&chan->kref));
 
 			l2cap_chan_put(chan);
 			break;
--- a/net/bluetooth/a2mp.c
+++ b/net/bluetooth/a2mp.c
@@ -810,7 +810,7 @@ static struct l2cap_chan *a2mp_chan_open
 /* AMP Manager functions */
 struct amp_mgr *amp_mgr_get(struct amp_mgr *mgr)
 {
-	BT_DBG("mgr %p orig refcnt %d", mgr, atomic_read(&mgr->kref.refcount));
+	BT_DBG("mgr %p orig refcnt %d", mgr, kref_read(&mgr->kref));
 
 	kref_get(&mgr->kref);
 
@@ -833,7 +833,7 @@ static void amp_mgr_destroy(struct kref
 
 int amp_mgr_put(struct amp_mgr *mgr)
 {
-	BT_DBG("mgr %p orig refcnt %d", mgr, atomic_read(&mgr->kref.refcount));
+	BT_DBG("mgr %p orig refcnt %d", mgr, kref_read(&mgr->kref));
 
 	return kref_put(&mgr->kref, &amp_mgr_destroy);
 }
--- a/net/bluetooth/amp.c
+++ b/net/bluetooth/amp.c
@@ -24,7 +24,7 @@
 void amp_ctrl_get(struct amp_ctrl *ctrl)
 {
 	BT_DBG("ctrl %p orig refcnt %d", ctrl,
-	       atomic_read(&ctrl->kref.refcount));
+	       kref_read(&ctrl->kref));
 
 	kref_get(&ctrl->kref);
 }
@@ -42,7 +42,7 @@ static void amp_ctrl_destroy(struct kref
 int amp_ctrl_put(struct amp_ctrl *ctrl)
 {
 	BT_DBG("ctrl %p orig refcnt %d", ctrl,
-	       atomic_read(&ctrl->kref.refcount));
+	       kref_read(&ctrl->kref));
 
 	return kref_put(&ctrl->kref, &amp_ctrl_destroy);
 }
--- a/net/bluetooth/l2cap_core.c
+++ b/net/bluetooth/l2cap_core.c
@@ -481,14 +481,14 @@ static void l2cap_chan_destroy(struct kr
 
 void l2cap_chan_hold(struct l2cap_chan *c)
 {
-	BT_DBG("chan %p orig refcnt %d", c, atomic_read(&c->kref.refcount));
+	BT_DBG("chan %p orig refcnt %d", c, kref_read(&c->kref));
 
 	kref_get(&c->kref);
 }
 
 void l2cap_chan_put(struct l2cap_chan *c)
 {
-	BT_DBG("chan %p orig refcnt %d", c, atomic_read(&c->kref.refcount));
+	BT_DBG("chan %p orig refcnt %d", c, kref_read(&c->kref));
 
 	kref_put(&c->kref, l2cap_chan_destroy);
 }
--- a/net/ceph/messenger.c
+++ b/net/ceph/messenger.c
@@ -3418,7 +3418,7 @@ static void ceph_msg_release(struct kref
 struct ceph_msg *ceph_msg_get(struct ceph_msg *msg)
 {
 	dout("%s %p (was %d)\n", __func__, msg,
-	     atomic_read(&msg->kref.refcount));
+	     kref_read(&msg->kref));
 	kref_get(&msg->kref);
 	return msg;
 }
@@ -3427,7 +3427,7 @@ EXPORT_SYMBOL(ceph_msg_get);
 void ceph_msg_put(struct ceph_msg *msg)
 {
 	dout("%s %p (was %d)\n", __func__, msg,
-	     atomic_read(&msg->kref.refcount));
+	     kref_read(&msg->kref));
 	kref_put(&msg->kref, ceph_msg_release);
 }
 EXPORT_SYMBOL(ceph_msg_put);
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -438,7 +438,7 @@ static void ceph_osdc_release_request(st
 void ceph_osdc_get_request(struct ceph_osd_request *req)
 {
 	dout("%s %p (was %d)\n", __func__, req,
-	     atomic_read(&req->r_kref.refcount));
+	     kref_read(&req->r_kref));
 	kref_get(&req->r_kref);
 }
 EXPORT_SYMBOL(ceph_osdc_get_request);
@@ -447,7 +447,7 @@ void ceph_osdc_put_request(struct ceph_o
 {
 	if (req) {
 		dout("%s %p (was %d)\n", __func__, req,
-		     atomic_read(&req->r_kref.refcount));
+		     kref_read(&req->r_kref));
 		kref_put(&req->r_kref, ceph_osdc_release_request);
 	}
 }
@@ -487,11 +487,11 @@ static void request_reinit(struct ceph_o
 	struct ceph_msg *reply_msg = req->r_reply;
 
 	dout("%s req %p\n", __func__, req);
-	WARN_ON(atomic_read(&req->r_kref.refcount) != 1);
+	WARN_ON(kref_read(&req->r_kref) != 1);
 	request_release_checks(req);
 
-	WARN_ON(atomic_read(&request_msg->kref.refcount) != 1);
-	WARN_ON(atomic_read(&reply_msg->kref.refcount) != 1);
+	WARN_ON(kref_read(&request_msg->kref) != 1);
+	WARN_ON(kref_read(&reply_msg->kref) != 1);
 	target_destroy(&req->r_t);
 
 	request_init(req);
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -1358,7 +1358,7 @@ static int c_show(struct seq_file *m, vo
 	ifdebug(CACHE)
 		seq_printf(m, "# expiry=%ld refcnt=%d flags=%lx\n",
 			   convert_to_wallclock(cp->expiry_time),
-			   atomic_read(&cp->ref.refcount), cp->flags);
+			   kref_read(&cp->ref), cp->flags);
 	cache_get(cp);
 	if (cache_check(cd, cp, NULL))
 		/* cache_check does a cache_put on failure */
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -490,7 +490,7 @@ static struct svc_xprt *svc_xprt_dequeue
 		svc_xprt_get(xprt);
 
 		dprintk("svc: transport %p dequeued, inuse=%d\n",
-			xprt, atomic_read(&xprt->xpt_ref.refcount));
+			xprt, kref_read(&xprt->xpt_ref));
 	}
 	spin_unlock_bh(&pool->sp_lock);
 out:
@@ -820,7 +820,7 @@ static int svc_handle_xprt(struct svc_rq
 		/* XPT_DATA|XPT_DEFERRED case: */
 		dprintk("svc: server %p, pool %u, transport %p, inuse=%d\n",
 			rqstp, rqstp->rq_pool->sp_id, xprt,
-			atomic_read(&xprt->xpt_ref.refcount));
+			kref_read(&xprt->xpt_ref));
 		rqstp->rq_deferred = svc_deferred_dequeue(xprt);
 		if (rqstp->rq_deferred)
 			len = svc_deferred_recv(rqstp);
@@ -978,7 +978,7 @@ static void svc_age_temp_xprts(unsigned
 		 * through, close it. */
 		if (!test_and_set_bit(XPT_OLD, &xprt->xpt_flags))
 			continue;
-		if (atomic_read(&xprt->xpt_ref.refcount) > 1 ||
+		if (kref_read(&xprt->xpt_ref) > 1 ||
 		    test_bit(XPT_BUSY, &xprt->xpt_flags))
 			continue;
 		list_del_init(le);
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -1222,9 +1222,9 @@ static void __svc_rdma_free(struct work_
 		ib_drain_qp(rdma->sc_qp);
 
 	/* We should only be called from kref_put */
-	if (atomic_read(&xprt->xpt_ref.refcount) != 0)
+	if (kref_read(&xprt->xpt_ref) != 0)
 		pr_err("svcrdma: sc_xprt still in use? (%d)\n",
-		       atomic_read(&xprt->xpt_ref.refcount));
+		       kref_read(&xprt->xpt_ref));
 
 	/*
 	 * Destroy queued, but not processed read completions. Note

^ permalink raw reply	[flat|nested] 96+ messages in thread
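[Editorial note: the patch above replaces direct pokes at `kref->refcount` with a `kref_read()` accessor. A minimal userspace sketch of that accessor — an illustrative model using C11 atomics, not the kernel's `<linux/kref.h>` — shows why the change matters: once callers go through `kref_read()`, the underlying counter type can later be swapped for `refcount_t` without touching every call site.]

```c
#include <stdatomic.h>

/* Userspace model of struct kref -- illustrative only, not the kernel
 * implementation. Callers that used to read kref->refcount.counter
 * directly now go through kref_read(), hiding the representation. */
struct kref {
	atomic_int refcount;
};

static void kref_init(struct kref *kref)
{
	atomic_store(&kref->refcount, 1);
}

static void kref_get(struct kref *kref)
{
	atomic_fetch_add(&kref->refcount, 1);
}

/* The new accessor: the only sanctioned way to observe the count. */
static int kref_read(struct kref *kref)
{
	return atomic_load(&kref->refcount);
}
```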

* [RFC][PATCH 3/7] kref: Kill kref_sub()
  2016-11-14 17:39 [RFC][PATCH 0/7] kref improvements Peter Zijlstra
  2016-11-14 17:39 ` [RFC][PATCH 1/7] kref: Add KREF_INIT() Peter Zijlstra
  2016-11-14 17:39 ` [RFC][PATCH 2/7] kref: Add kref_read() Peter Zijlstra
@ 2016-11-14 17:39 ` Peter Zijlstra
  2016-11-14 17:39 ` [RFC][PATCH 4/7] kref: Use kref_get_unless_zero() more Peter Zijlstra
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 96+ messages in thread
From: Peter Zijlstra @ 2016-11-14 17:39 UTC (permalink / raw)
  To: gregkh, keescook, will.deacon, elena.reshetova, arnd, tglx,
	mingo, hpa, dave
  Cc: linux-kernel, Peter Zijlstra (Intel)

[-- Attachment #1: peterz-ref-3.patch --]
[-- Type: text/plain, Size: 11979 bytes --]

By general sentiment, kref_sub() is a bad interface; make it go away.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 drivers/block/drbd/drbd_main.c         |    7 ++-
 drivers/block/drbd/drbd_req.c          |   31 +++++------------
 drivers/gpu/drm/ttm/ttm_bo.c           |   59 ++++++++-------------------------
 drivers/gpu/drm/ttm/ttm_execbuf_util.c |    4 --
 include/drm/ttm/ttm_bo_api.h           |   15 --------
 include/linux/kref.h                   |   32 ++---------------
 6 files changed, 36 insertions(+), 112 deletions(-)

--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -2948,7 +2948,6 @@ void drbd_delete_device(struct drbd_devi
 	struct drbd_resource *resource = device->resource;
 	struct drbd_connection *connection;
 	struct drbd_peer_device *peer_device;
-	int refs = 3;
 
 	/* move to free_peer_device() */
 	for_each_peer_device(peer_device, device)
@@ -2956,13 +2955,15 @@ void drbd_delete_device(struct drbd_devi
 	drbd_debugfs_device_cleanup(device);
 	for_each_connection(connection, resource) {
 		idr_remove(&connection->peer_devices, device->vnr);
-		refs++;
+		kref_put(&device->kref, drbd_destroy_device);
 	}
 	idr_remove(&resource->devices, device->vnr);
+	kref_put(&device->kref, drbd_destroy_device);
 	idr_remove(&drbd_devices, device_to_minor(device));
+	kref_put(&device->kref, drbd_destroy_device);
 	del_gendisk(device->vdisk);
 	synchronize_rcu();
-	kref_sub(&device->kref, refs, drbd_destroy_device);
+	kref_put(&device->kref, drbd_destroy_device);
 }
 
 static int __init drbd_init(void)
--- a/drivers/block/drbd/drbd_req.c
+++ b/drivers/block/drbd/drbd_req.c
@@ -421,7 +421,6 @@ static void mod_rq_state(struct drbd_req
 	struct drbd_peer_device *peer_device = first_peer_device(device);
 	unsigned s = req->rq_state;
 	int c_put = 0;
-	int k_put = 0;
 
 	if (drbd_suspended(device) && !((s | clear) & RQ_COMPLETION_SUSP))
 		set |= RQ_COMPLETION_SUSP;
@@ -437,6 +436,8 @@ static void mod_rq_state(struct drbd_req
 
 	/* intent: get references */
 
+	kref_get(&req->kref);
+
 	if (!(s & RQ_LOCAL_PENDING) && (set & RQ_LOCAL_PENDING))
 		atomic_inc(&req->completion_ref);
 
@@ -473,15 +474,12 @@ static void mod_rq_state(struct drbd_req
 
 	if (!(s & RQ_LOCAL_ABORTED) && (set & RQ_LOCAL_ABORTED)) {
 		D_ASSERT(device, req->rq_state & RQ_LOCAL_PENDING);
-		/* local completion may still come in later,
-		 * we need to keep the req object around. */
-		kref_get(&req->kref);
 		++c_put;
 	}
 
 	if ((s & RQ_LOCAL_PENDING) && (clear & RQ_LOCAL_PENDING)) {
 		if (req->rq_state & RQ_LOCAL_ABORTED)
-			++k_put;
+			kref_put(&req->kref, drbd_req_destroy);
 		else
 			++c_put;
 		list_del_init(&req->req_pending_local);
@@ -503,7 +501,7 @@ static void mod_rq_state(struct drbd_req
 		if (s & RQ_NET_SENT)
 			atomic_sub(req->i.size >> 9, &device->ap_in_flight);
 		if (s & RQ_EXP_BARR_ACK)
-			++k_put;
+			kref_put(&req->kref, drbd_req_destroy);
 		req->net_done_jif = jiffies;
 
 		/* in ahead/behind mode, or just in case,
@@ -516,25 +514,16 @@ static void mod_rq_state(struct drbd_req
 
 	/* potentially complete and destroy */
 
-	if (k_put || c_put) {
-		/* Completion does it's own kref_put.  If we are going to
-		 * kref_sub below, we need req to be still around then. */
-		int at_least = k_put + !!c_put;
-		int refcount = kref_read(&req->kref);
-		if (refcount < at_least)
-			drbd_err(device,
-				"mod_rq_state: Logic BUG: %x -> %x: refcount = %d, should be >= %d\n",
-				s, req->rq_state, refcount, at_least);
-	}
-
 	/* If we made progress, retry conflicting peer requests, if any. */
 	if (req->i.waiting)
 		wake_up(&device->misc_wait);
 
-	if (c_put)
-		k_put += drbd_req_put_completion_ref(req, m, c_put);
-	if (k_put)
-		kref_sub(&req->kref, k_put, drbd_req_destroy);
+	if (c_put) {
+		if (drbd_req_put_completion_ref(req, m, c_put))
+			kref_put(&req->kref, drbd_req_destroy);
+	} else {
+		kref_put(&req->kref, drbd_req_destroy);
+	}
 }
 
 static void drbd_report_io_error(struct drbd_device *device, struct drbd_request *req)
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -181,61 +181,46 @@ void ttm_bo_add_to_lru(struct ttm_buffer
 }
 EXPORT_SYMBOL(ttm_bo_add_to_lru);
 
-int ttm_bo_del_from_lru(struct ttm_buffer_object *bo)
+static void ttm_bo_ref_bug(struct kref *list_kref)
+{
+	BUG();
+}
+
+void ttm_bo_del_from_lru(struct ttm_buffer_object *bo)
 {
 	struct ttm_bo_device *bdev = bo->bdev;
-	int put_count = 0;
 
 	if (bdev->driver->lru_removal)
 		bdev->driver->lru_removal(bo);
 
 	if (!list_empty(&bo->swap)) {
 		list_del_init(&bo->swap);
-		++put_count;
+		kref_put(&bo->list_kref, ttm_bo_ref_bug);
 	}
 	if (!list_empty(&bo->lru)) {
 		list_del_init(&bo->lru);
-		++put_count;
+		kref_put(&bo->list_kref, ttm_bo_ref_bug);
 	}
-
-	return put_count;
-}
-
-static void ttm_bo_ref_bug(struct kref *list_kref)
-{
-	BUG();
-}
-
-void ttm_bo_list_ref_sub(struct ttm_buffer_object *bo, int count,
-			 bool never_free)
-{
-	kref_sub(&bo->list_kref, count,
-		 (never_free) ? ttm_bo_ref_bug : ttm_bo_release_list);
 }
 
 void ttm_bo_del_sub_from_lru(struct ttm_buffer_object *bo)
 {
-	int put_count;
-
 	spin_lock(&bo->glob->lru_lock);
-	put_count = ttm_bo_del_from_lru(bo);
+	ttm_bo_del_from_lru(bo);
 	spin_unlock(&bo->glob->lru_lock);
-	ttm_bo_list_ref_sub(bo, put_count, true);
 }
 EXPORT_SYMBOL(ttm_bo_del_sub_from_lru);
 
 void ttm_bo_move_to_lru_tail(struct ttm_buffer_object *bo)
 {
 	struct ttm_bo_device *bdev = bo->bdev;
-	int put_count = 0;
 
 	lockdep_assert_held(&bo->resv->lock.base);
 
 	if (bdev->driver->lru_removal)
 		bdev->driver->lru_removal(bo);
 
-	put_count = ttm_bo_del_from_lru(bo);
-	ttm_bo_list_ref_sub(bo, put_count, true);
+	ttm_bo_del_from_lru(bo);
 	ttm_bo_add_to_lru(bo);
 }
 EXPORT_SYMBOL(ttm_bo_move_to_lru_tail);
@@ -447,7 +432,6 @@ static void ttm_bo_cleanup_refs_or_queue
 {
 	struct ttm_bo_device *bdev = bo->bdev;
 	struct ttm_bo_global *glob = bo->glob;
-	int put_count;
 	int ret;
 
 	spin_lock(&glob->lru_lock);
@@ -455,13 +439,10 @@ static void ttm_bo_cleanup_refs_or_queue
 
 	if (!ret) {
 		if (!ttm_bo_wait(bo, false, true)) {
-			put_count = ttm_bo_del_from_lru(bo);
-
+			ttm_bo_del_from_lru(bo);
 			spin_unlock(&glob->lru_lock);
 			ttm_bo_cleanup_memtype_use(bo);
 
-			ttm_bo_list_ref_sub(bo, put_count, true);
-
 			return;
 		} else
 			ttm_bo_flush_all_fences(bo);
@@ -504,7 +485,6 @@ static int ttm_bo_cleanup_refs_and_unloc
 					  bool no_wait_gpu)
 {
 	struct ttm_bo_global *glob = bo->glob;
-	int put_count;
 	int ret;
 
 	ret = ttm_bo_wait(bo, false, true);
@@ -554,15 +534,13 @@ static int ttm_bo_cleanup_refs_and_unloc
 		return ret;
 	}
 
-	put_count = ttm_bo_del_from_lru(bo);
+	ttm_bo_del_from_lru(bo);
 	list_del_init(&bo->ddestroy);
-	++put_count;
+	kref_put(&bo->list_kref, ttm_bo_ref_bug);
 
 	spin_unlock(&glob->lru_lock);
 	ttm_bo_cleanup_memtype_use(bo);
 
-	ttm_bo_list_ref_sub(bo, put_count, true);
-
 	return 0;
 }
 
@@ -726,7 +704,7 @@ static int ttm_mem_evict_first(struct tt
 	struct ttm_bo_global *glob = bdev->glob;
 	struct ttm_mem_type_manager *man = &bdev->man[mem_type];
 	struct ttm_buffer_object *bo;
-	int ret = -EBUSY, put_count;
+	int ret = -EBUSY;
 
 	spin_lock(&glob->lru_lock);
 	list_for_each_entry(bo, &man->lru, lru) {
@@ -762,13 +740,11 @@ static int ttm_mem_evict_first(struct tt
 		return ret;
 	}
 
-	put_count = ttm_bo_del_from_lru(bo);
+	ttm_bo_del_from_lru(bo);
 	spin_unlock(&glob->lru_lock);
 
 	BUG_ON(ret != 0);
 
-	ttm_bo_list_ref_sub(bo, put_count, true);
-
 	ret = ttm_bo_evict(bo, interruptible, no_wait_gpu);
 	ttm_bo_unreserve(bo);
 
@@ -1653,7 +1629,6 @@ static int ttm_bo_swapout(struct ttm_mem
 	    container_of(shrink, struct ttm_bo_global, shrink);
 	struct ttm_buffer_object *bo;
 	int ret = -EBUSY;
-	int put_count;
 	uint32_t swap_placement = (TTM_PL_FLAG_CACHED | TTM_PL_FLAG_SYSTEM);
 
 	spin_lock(&glob->lru_lock);
@@ -1676,11 +1651,9 @@ static int ttm_bo_swapout(struct ttm_mem
 		return ret;
 	}
 
-	put_count = ttm_bo_del_from_lru(bo);
+	ttm_bo_del_from_lru(bo);
 	spin_unlock(&glob->lru_lock);
 
-	ttm_bo_list_ref_sub(bo, put_count, true);
-
 	/**
 	 * Move to system cached
 	 */
--- a/drivers/gpu/drm/ttm/ttm_execbuf_util.c
+++ b/drivers/gpu/drm/ttm/ttm_execbuf_util.c
@@ -48,9 +48,7 @@ static void ttm_eu_del_from_lru_locked(s
 
 	list_for_each_entry(entry, list, head) {
 		struct ttm_buffer_object *bo = entry->bo;
-		unsigned put_count = ttm_bo_del_from_lru(bo);
-
-		ttm_bo_list_ref_sub(bo, put_count, true);
+		ttm_bo_del_from_lru(bo);
 	}
 }
 
--- a/include/drm/ttm/ttm_bo_api.h
+++ b/include/drm/ttm/ttm_bo_api.h
@@ -330,19 +330,6 @@ extern int ttm_bo_validate(struct ttm_bu
  */
 extern void ttm_bo_unref(struct ttm_buffer_object **bo);
 
-
-/**
- * ttm_bo_list_ref_sub
- *
- * @bo: The buffer object.
- * @count: The number of references with which to decrease @bo::list_kref;
- * @never_free: The refcount should not reach zero with this operation.
- *
- * Release @count lru list references to this buffer object.
- */
-extern void ttm_bo_list_ref_sub(struct ttm_buffer_object *bo, int count,
-				bool never_free);
-
 /**
  * ttm_bo_add_to_lru
  *
@@ -365,7 +352,7 @@ extern void ttm_bo_add_to_lru(struct ttm
  * and is usually called just immediately after the bo has been reserved to
  * avoid recursive reservation from lru lists.
  */
-extern int ttm_bo_del_from_lru(struct ttm_buffer_object *bo);
+extern void ttm_bo_del_from_lru(struct ttm_buffer_object *bo);
 
 /**
  * ttm_bo_move_to_lru_tail
--- a/include/linux/kref.h
+++ b/include/linux/kref.h
@@ -54,9 +54,8 @@ static inline void kref_get(struct kref
 }
 
 /**
- * kref_sub - subtract a number of refcounts for object.
+ * kref_put - decrement refcount for object.
  * @kref: object.
- * @count: Number of recounts to subtract.
  * @release: pointer to the function that will clean up the object when the
  *	     last reference to the object is released.
  *	     This pointer is required, and it is not acceptable to pass kfree
@@ -65,46 +64,23 @@ static inline void kref_get(struct kref
  *	     maintainer, and anyone else who happens to notice it.  You have
  *	     been warned.
  *
- * Subtract @count from the refcount, and if 0, call release().
+ * Decrement the refcount, and if 0, call release().
  * Return 1 if the object was removed, otherwise return 0.  Beware, if this
  * function returns 0, you still can not count on the kref from remaining in
  * memory.  Only use the return value if you want to see if the kref is now
  * gone, not present.
  */
-static inline int kref_sub(struct kref *kref, unsigned int count,
-	     void (*release)(struct kref *kref))
+static inline int kref_put(struct kref *kref, void (*release)(struct kref *kref))
 {
 	WARN_ON(release == NULL);
 
-	if (atomic_sub_and_test((int) count, &kref->refcount)) {
+	if (atomic_dec_and_test(&kref->refcount)) {
 		release(kref);
 		return 1;
 	}
 	return 0;
 }
 
-/**
- * kref_put - decrement refcount for object.
- * @kref: object.
- * @release: pointer to the function that will clean up the object when the
- *	     last reference to the object is released.
- *	     This pointer is required, and it is not acceptable to pass kfree
- *	     in as this function.  If the caller does pass kfree to this
- *	     function, you will be publicly mocked mercilessly by the kref
- *	     maintainer, and anyone else who happens to notice it.  You have
- *	     been warned.
- *
- * Decrement the refcount, and if 0, call release().
- * Return 1 if the object was removed, otherwise return 0.  Beware, if this
- * function returns 0, you still can not count on the kref from remaining in
- * memory.  Only use the return value if you want to see if the kref is now
- * gone, not present.
- */
-static inline int kref_put(struct kref *kref, void (*release)(struct kref *kref))
-{
-	return kref_sub(kref, 1, release);
-}
-
 static inline int kref_put_mutex(struct kref *kref,
 				 void (*release)(struct kref *kref),
 				 struct mutex *lock)

^ permalink raw reply	[flat|nested] 96+ messages in thread
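[Editorial note: the drbd_delete_device() hunk above illustrates the conversion pattern this patch applies everywhere: instead of accumulating a drop count and issuing one `kref_sub(&kref, refs, release)`, each reference is dropped with a plain `kref_put()` at the site where it is logically released. A userspace sketch of that pattern, using a hypothetical `delete_object()` as a stand-in for the drbd function (illustrative model, not the kernel's `<linux/kref.h>`):]

```c
#include <stdatomic.h>

/* Illustrative userspace model of kref_put(); not the kernel code. */
struct kref {
	atomic_int refcount;
};

static int destroy_calls;

static void destroy(struct kref *kref)
{
	destroy_calls++;
}

static int kref_put(struct kref *kref, void (*release)(struct kref *kref))
{
	if (atomic_fetch_sub(&kref->refcount, 1) == 1) {
		release(kref);
		return 1;
	}
	return 0;
}

/* Before: count the drops, then kref_sub(&kref, refs, destroy);
 * After:  one kref_put() per site that gives up a reference. */
static void delete_object(struct kref *kref, int peer_devices)
{
	int i;

	for (i = 0; i < peer_devices; i++)
		kref_put(kref, destroy);	/* per-connection reference */
	kref_put(kref, destroy);		/* resource idr reference */
	kref_put(kref, destroy);		/* minor idr reference */
	kref_put(kref, destroy);		/* caller's own reference */
}
```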

* [RFC][PATCH 4/7] kref: Use kref_get_unless_zero() more
  2016-11-14 17:39 [RFC][PATCH 0/7] kref improvements Peter Zijlstra
                   ` (2 preceding siblings ...)
  2016-11-14 17:39 ` [RFC][PATCH 3/7] kref: Kill kref_sub() Peter Zijlstra
@ 2016-11-14 17:39 ` Peter Zijlstra
  2016-11-14 17:39 ` [RFC][PATCH 5/7] kref: Implement kref_put_lock() Peter Zijlstra
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 96+ messages in thread
From: Peter Zijlstra @ 2016-11-14 17:39 UTC (permalink / raw)
  To: gregkh, keescook, will.deacon, elena.reshetova, arnd, tglx,
	mingo, hpa, dave
  Cc: linux-kernel, Peter Zijlstra (Intel)

[-- Attachment #1: peterz-ref-4.patch --]
[-- Type: text/plain, Size: 1387 bytes --]

For some obscure reason apparmor thinks it needs to locally implement
kref primitives that already exist. Stop doing this.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 security/apparmor/include/apparmor.h |    6 ------
 security/apparmor/include/policy.h   |    4 ++--
 2 files changed, 2 insertions(+), 8 deletions(-)

--- a/security/apparmor/include/apparmor.h
+++ b/security/apparmor/include/apparmor.h
@@ -78,12 +78,6 @@ static inline void *kvzalloc(size_t size
 	return __aa_kvmalloc(size, __GFP_ZERO);
 }
 
-/* returns 0 if kref not incremented */
-static inline int kref_get_not0(struct kref *kref)
-{
-	return atomic_inc_not_zero(&kref->refcount);
-}
-
 /**
  * aa_strneq - compare null terminated @str to a non null terminated substring
  * @str: a null terminated string
--- a/security/apparmor/include/policy.h
+++ b/security/apparmor/include/policy.h
@@ -287,7 +287,7 @@ static inline struct aa_profile *aa_get_
  */
 static inline struct aa_profile *aa_get_profile_not0(struct aa_profile *p)
 {
-	if (p && kref_get_not0(&p->count))
+	if (p && kref_get_unless_zero(&p->count))
 		return p;
 
 	return NULL;
@@ -307,7 +307,7 @@ static inline struct aa_profile *aa_get_
 	rcu_read_lock();
 	do {
 		c = rcu_dereference(*p);
-	} while (c && !kref_get_not0(&c->count));
+	} while (c && !kref_get_unless_zero(&c->count));
 	rcu_read_unlock();
 
 	return c;

^ permalink raw reply	[flat|nested] 96+ messages in thread
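[Editorial note: the aa_get_profile_rcu() hunk above relies on the key property of `kref_get_unless_zero()`: a reference is taken only while the count is visibly non-zero, so an object already on its way to release cannot be resurrected during an RCU lookup. A userspace sketch of that primitive — an illustrative model, not the kernel implementation:]

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace model of struct kref; illustrative only. */
struct kref {
	atomic_int refcount;
};

/* Take a reference only if the object is still live (count != 0).
 * The CAS loop retries while other CPUs race on the counter, but
 * never increments once zero has been observed. */
static bool kref_get_unless_zero(struct kref *kref)
{
	int v = atomic_load(&kref->refcount);

	while (v != 0) {
		if (atomic_compare_exchange_weak(&kref->refcount, &v, v + 1))
			return true;
	}
	return false;
}
```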

* [RFC][PATCH 5/7] kref: Implement kref_put_lock()
  2016-11-14 17:39 [RFC][PATCH 0/7] kref improvements Peter Zijlstra
                   ` (3 preceding siblings ...)
  2016-11-14 17:39 ` [RFC][PATCH 4/7] kref: Use kref_get_unless_zero() more Peter Zijlstra
@ 2016-11-14 17:39 ` Peter Zijlstra
  2016-11-14 20:35   ` Kees Cook
  2016-11-14 17:39 ` [RFC][PATCH 6/7] kref: Avoid more abuse Peter Zijlstra
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 96+ messages in thread
From: Peter Zijlstra @ 2016-11-14 17:39 UTC (permalink / raw)
  To: gregkh, keescook, will.deacon, elena.reshetova, arnd, tglx,
	mingo, hpa, dave
  Cc: linux-kernel, Peter Zijlstra (Intel)

[-- Attachment #1: peterz-ref-4a.patch --]
[-- Type: text/plain, Size: 1810 bytes --]

Because home-rolling your own is _awesome_, stop doing it. Provide
kref_put_lock(), just like kref_put_mutex() but for a spinlock.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/kref.h |   21 +++++++++++++++------
 net/sunrpc/svcauth.c |   15 ++++++++++-----
 2 files changed, 25 insertions(+), 11 deletions(-)

--- a/include/linux/kref.h
+++ b/include/linux/kref.h
@@ -86,12 +86,21 @@ static inline int kref_put_mutex(struct
 				 struct mutex *lock)
 {
 	WARN_ON(release == NULL);
-	if (unlikely(!atomic_add_unless(&kref->refcount, -1, 1))) {
-		mutex_lock(lock);
-		if (unlikely(!atomic_dec_and_test(&kref->refcount))) {
-			mutex_unlock(lock);
-			return 0;
-		}
+
+	if (atomic_dec_and_mutex_lock(&kref->refcount, lock)) {
+		release(kref);
+		return 1;
+	}
+	return 0;
+}
+
+static inline int kref_put_lock(struct kref *kref,
+				void (*release)(struct kref *kref),
+				spinlock_t *lock)
+{
+	WARN_ON(release == NULL);
+
+	if (atomic_dec_and_lock(&kref->refcount, lock)) {
 		release(kref);
 		return 1;
 	}
--- a/net/sunrpc/svcauth.c
+++ b/net/sunrpc/svcauth.c
@@ -127,13 +127,18 @@ static struct hlist_head	auth_domain_tab
 static spinlock_t	auth_domain_lock =
 	__SPIN_LOCK_UNLOCKED(auth_domain_lock);
 
+static void auth_domain_release(struct kref *kref)
+{
+	struct auth_domain *dom = container_of(kref, struct auth_domain, ref);
+
+	hlist_del(&dom->hash);
+	dom->flavour->domain_release(dom);
+	spin_unlock(&auth_domain_lock);
+}
+
 void auth_domain_put(struct auth_domain *dom)
 {
-	if (atomic_dec_and_lock(&dom->ref.refcount, &auth_domain_lock)) {
-		hlist_del(&dom->hash);
-		dom->flavour->domain_release(dom);
-		spin_unlock(&auth_domain_lock);
-	}
+	kref_put_lock(&dom->ref, auth_domain_release, &auth_domain_lock);
 }
 EXPORT_SYMBOL_GPL(auth_domain_put);
 

^ permalink raw reply	[flat|nested] 96+ messages in thread

* [RFC][PATCH 6/7] kref: Avoid more abuse
  2016-11-14 17:39 [RFC][PATCH 0/7] kref improvements Peter Zijlstra
                   ` (4 preceding siblings ...)
  2016-11-14 17:39 ` [RFC][PATCH 5/7] kref: Implement kref_put_lock() Peter Zijlstra
@ 2016-11-14 17:39 ` Peter Zijlstra
  2016-11-14 17:39 ` [RFC][PATCH 7/7] kref: Implement using refcount_t Peter Zijlstra
  2016-11-15  7:27 ` [RFC][PATCH 0/7] kref improvements Greg KH
  7 siblings, 0 replies; 96+ messages in thread
From: Peter Zijlstra @ 2016-11-14 17:39 UTC (permalink / raw)
  To: gregkh, keescook, will.deacon, elena.reshetova, arnd, tglx,
	mingo, hpa, dave
  Cc: linux-kernel, Peter Zijlstra (Intel)

[-- Attachment #1: peterz-ref-4b.patch --]
[-- Type: text/plain, Size: 629 bytes --]

Leak references by unbalanced get, instead of poking at kref
implementation details.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 drivers/usb/mon/mon_main.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/usb/mon/mon_main.c
+++ b/drivers/usb/mon/mon_main.c
@@ -409,7 +409,7 @@ static void __exit mon_exit(void)
 			printk(KERN_ERR TAG
 			    ": Outstanding opens (%d) on usb%d, leaking...\n",
 			    mbus->nreaders, mbus->u_bus->busnum);
-			atomic_set(&mbus->ref.refcount, 2);	/* Force leak */
+			kref_get(&mbus->ref); /* Force leak */
 		}
 
 		mon_dissolve(mbus, mbus->u_bus);

^ permalink raw reply	[flat|nested] 96+ messages in thread

* [RFC][PATCH 7/7] kref: Implement using refcount_t
  2016-11-14 17:39 [RFC][PATCH 0/7] kref improvements Peter Zijlstra
                   ` (5 preceding siblings ...)
  2016-11-14 17:39 ` [RFC][PATCH 6/7] kref: Avoid more abuse Peter Zijlstra
@ 2016-11-14 17:39 ` Peter Zijlstra
  2016-11-15  8:40   ` Ingo Molnar
                     ` (4 more replies)
  2016-11-15  7:27 ` [RFC][PATCH 0/7] kref improvements Greg KH
  7 siblings, 5 replies; 96+ messages in thread
From: Peter Zijlstra @ 2016-11-14 17:39 UTC (permalink / raw)
  To: gregkh, keescook, will.deacon, elena.reshetova, arnd, tglx,
	mingo, hpa, dave
  Cc: linux-kernel, Peter Zijlstra (Intel)

[-- Attachment #1: peterz-ref-5.patch --]
[-- Type: text/plain, Size: 7981 bytes --]

Provide refcount_t, an atomic_t like primitive built just for
refcounting.

It provides overflow and underflow checks as well as saturation
semantics such that when it overflows, we'll never attempt to free it
again, ever.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/kref.h     |   29 ++----
 include/linux/refcount.h |  221 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 232 insertions(+), 18 deletions(-)

--- a/include/linux/kref.h
+++ b/include/linux/kref.h
@@ -15,16 +15,13 @@
 #ifndef _KREF_H_
 #define _KREF_H_
 
-#include <linux/bug.h>
-#include <linux/atomic.h>
-#include <linux/kernel.h>
-#include <linux/mutex.h>
+#include <linux/refcount.h>
 
 struct kref {
-	atomic_t refcount;
+	refcount_t refcount;
 };
 
-#define KREF_INIT(n)	{ .refcount = ATOMIC_INIT(n), }
+#define KREF_INIT(n)	{ .refcount = REFCOUNT_INIT(n), }
 
 /**
  * kref_init - initialize object.
@@ -32,12 +29,12 @@ struct kref {
  */
 static inline void kref_init(struct kref *kref)
 {
-	atomic_set(&kref->refcount, 1);
+	refcount_set(&kref->refcount, 1);
 }
 
-static inline int kref_read(const struct kref *kref)
+static inline unsigned int kref_read(const struct kref *kref)
 {
-	return atomic_read(&kref->refcount);
+	return refcount_read(&kref->refcount);
 }
 
 /**
@@ -46,11 +43,7 @@ static inline int kref_read(const struct
  */
 static inline void kref_get(struct kref *kref)
 {
-	/* If refcount was 0 before incrementing then we have a race
-	 * condition when this kref is freeing by some other thread right now.
-	 * In this case one should use kref_get_unless_zero()
-	 */
-	WARN_ON_ONCE(atomic_inc_return(&kref->refcount) < 2);
+	refcount_inc(&kref->refcount);
 }
 
 /**
@@ -74,7 +67,7 @@ static inline int kref_put(struct kref *
 {
 	WARN_ON(release == NULL);
 
-	if (atomic_dec_and_test(&kref->refcount)) {
+	if (refcount_dec_and_test(&kref->refcount)) {
 		release(kref);
 		return 1;
 	}
@@ -87,7 +80,7 @@ static inline int kref_put_mutex(struct
 {
 	WARN_ON(release == NULL);
 
-	if (atomic_dec_and_mutex_lock(&kref->refcount, lock)) {
+	if (refcount_dec_and_mutex_lock(&kref->refcount, lock)) {
 		release(kref);
 		return 1;
 	}
@@ -100,7 +93,7 @@ static inline int kref_put_lock(struct k
 {
 	WARN_ON(release == NULL);
 
-	if (atomic_dec_and_lock(&kref->refcount, lock)) {
+	if (refcount_dec_and_lock(&kref->refcount, lock)) {
 		release(kref);
 		return 1;
 	}
@@ -125,6 +118,6 @@ static inline int kref_put_lock(struct k
  */
 static inline int __must_check kref_get_unless_zero(struct kref *kref)
 {
-	return atomic_add_unless(&kref->refcount, 1, 0);
+	return refcount_inc_not_zero(&kref->refcount);
 }
 #endif /* _KREF_H_ */
--- /dev/null
+++ b/include/linux/refcount.h
@@ -0,0 +1,221 @@
+#ifndef _LINUX_REFCOUNT_H
+#define _LINUX_REFCOUNT_H
+
+/*
+ * Variant of atomic_t specialized for refcounting.
+ *
+ * The interface matches the atomic_t interface (to aid in porting) but only
+ * provides the few functions one should use for refcounting.
+ *
+ * They add explicit overflow and underflow tests; once a refcount hits
+ * UINT_MAX it stays there.
+ *
+ * Memory ordering rules are slightly relaxed wrt regular atomic_t functions
+ * and provide only what is strictly required for refcounts.
+ *
+ * The increments are fully relaxed; these will not provide ordering. The
+ * rationale is that whatever is used to obtain the object we're increasing the
+ * reference count on will provide the ordering. For locked data structures,
+ * it's the lock acquire; for RCU/lockless data structures it's the dependent
+ * read.
+ *
+ * Do note that inc_not_zero() provides a control dependency which will order
+ * future stores against the inc; this ensures we'll never modify the object
+ * if we did not in fact acquire a reference.
+ *
+ * The decrements will provide release order, such that all the prior loads and
+ * stores will be issued before we proceed with freeing the object.
+ *
+ */
+
+#include <linux/atomic.h>
+#include <linux/bug.h>
+#include <linux/mutex.h>
+#include <linux/spinlock.h>
+
+typedef struct refcount_struct {
+	atomic_t refs;
+} refcount_t;
+
+#define REFCOUNT_INIT(n)	{ .refs = ATOMIC_INIT(n), }
+
+static inline void refcount_set(refcount_t *r, int n)
+{
+	atomic_set(&r->refs, n);
+}
+
+static inline unsigned int refcount_read(const refcount_t *r)
+{
+	return atomic_read(&r->refs);
+}
+
+/*
+ * Similar to atomic_inc(), will BUG on overflow and saturate at UINT_MAX.
+ *
+ * Provides no memory ordering, it is assumed the caller already has a
+ * reference on the object, will WARN when this is not so.
+ */
+static inline void refcount_inc(refcount_t *r)
+{
+	unsigned int old, new, val = atomic_read(&r->refs);
+
+	for (;;) {
+		WARN_ON_ONCE(!val);
+
+		new = val + 1;
+		if (new < val)
+			BUG(); /* overflow */
+
+		old = atomic_cmpxchg_relaxed(&r->refs, val, new);
+		if (old == val)
+			break;
+
+		val = old;
+	}
+}
+
+/*
+ * Similar to atomic_inc_not_zero(), will BUG on overflow and saturate at UINT_MAX.
+ *
+ * Provides no memory ordering, it is assumed the caller has guaranteed the
+ * object memory to be stable (RCU, etc.). It does provide a control dependency
+ * and thereby orders future stores.
+ */
+static inline __must_check
+bool refcount_inc_not_zero(refcount_t *r)
+{
+	unsigned int old, new, val = atomic_read(&r->refs);
+
+	for (;;) {
+		if (!val)
+			return false;
+
+		new = val + 1;
+		if (new < val)
+			BUG(); /* overflow */
+
+		old = atomic_cmpxchg_relaxed(&r->refs, val, new);
+		if (old == val)
+			break;
+
+		val = old;
+	}
+
+	return true;
+}
+
+/*
+ * Similar to atomic_dec_and_test(), it will BUG on underflow and fail to
+ * decrement when saturated at UINT_MAX.
+ *
+ * Provides release memory ordering, such that prior loads and stores are done
+ * before a subsequent free.
+ */
+static inline __must_check
+bool refcount_dec_and_test(refcount_t *r)
+{
+	unsigned int old, new, val = atomic_read(&r->refs);
+
+	for (;;) {
+		if (val == UINT_MAX)
+			return false;
+
+		new = val - 1;
+		if (new > val)
+			BUG(); /* underflow */
+
+		old = atomic_cmpxchg_release(&r->refs, val, new);
+		if (old == val)
+			break;
+
+		val = old;
+	}
+
+	return !new;
+}
+
+/*
+ * Similar to atomic_dec_and_mutex_lock(), it will BUG on underflow and fail
+ * to decrement when saturated at UINT_MAX.
+ *
+ * Provides release memory ordering, such that prior loads and stores are done
+ * before a subsequent free. This allows free() while holding the mutex.
+ */
+static inline __must_check
+bool refcount_dec_and_mutex_lock(refcount_t *r, struct mutex *lock)
+{
+	unsigned int old, new, val = atomic_read(&r->refs);
+	bool locked = false;
+
+	for (;;) {
+		if (val == UINT_MAX)
+			return false;
+
+		if (val == 1 && !locked) {
+			locked = true;
+			mutex_lock(lock);
+		}
+
+		new = val - 1;
+		if (new > val) {
+			if (locked)
+				mutex_unlock(lock);
+			BUG(); /* underflow */
+		}
+
+		old = atomic_cmpxchg_release(&r->refs, val, new);
+		if (old == val)
+			break;
+
+		val = old;
+	}
+
+	if (new && locked)
+		mutex_unlock(lock);
+
+	return !new;
+}
+
+/*
+ * Similar to atomic_dec_and_lock(), it will BUG on underflow and fail
+ * to decrement when saturated at UINT_MAX.
+ *
+ * Provides release memory ordering, such that prior loads and stores are done
+ * before a subsequent free. This allows free() while holding the lock.
+ */
+static inline __must_check
+bool refcount_dec_and_lock(refcount_t *r, spinlock_t *lock)
+{
+	unsigned int old, new, val = atomic_read(&r->refs);
+	bool locked = false;
+
+	for (;;) {
+		if (val == UINT_MAX)
+			return false;
+
+		if (val == 1 && !locked) {
+			locked = true;
+			spin_lock(lock);
+		}
+
+		new = val - 1;
+		if (new > val) {
+			if (locked)
+				spin_unlock(lock);
+			BUG(); /* underflow */
+		}
+
+		old = atomic_cmpxchg_release(&r->refs, val, new);
+		if (old == val)
+			break;
+
+		val = old;
+	}
+
+	if (new && locked)
+		spin_unlock(lock);
+
+	return !new;
+}
+
+#endif /* _LINUX_REFCOUNT_H */

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC][PATCH 2/7] kref: Add kref_read()
  2016-11-14 17:39 ` [RFC][PATCH 2/7] kref: Add kref_read() Peter Zijlstra
@ 2016-11-14 18:16   ` Christoph Hellwig
  2016-11-15  7:28     ` Greg KH
  2016-11-15  7:33   ` [RFC][PATCH 2/7] kref: Add kref_read() Greg KH
  1 sibling, 1 reply; 96+ messages in thread
From: Christoph Hellwig @ 2016-11-14 18:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: gregkh, keescook, will.deacon, elena.reshetova, arnd, tglx,
	mingo, hpa, dave, linux-kernel

On Mon, Nov 14, 2016 at 06:39:48PM +0100, Peter Zijlstra wrote:
> Since we need to change the implementation, stop exposing internals.
> 
> Provide kref_read() to read the current reference count; typically
> used for debug messages.

Can we just provide a printk specifier for a kref value instead as
that is the only valid use case for reading the value?

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC][PATCH 5/7] kref: Implement kref_put_lock()
  2016-11-14 17:39 ` [RFC][PATCH 5/7] kref: Implement kref_put_lock() Peter Zijlstra
@ 2016-11-14 20:35   ` Kees Cook
  2016-11-15  7:50     ` Peter Zijlstra
  0 siblings, 1 reply; 96+ messages in thread
From: Kees Cook @ 2016-11-14 20:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Greg KH, Will Deacon, Reshetova, Elena, Arnd Bergmann,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, David Windsor,
	LKML

On Mon, Nov 14, 2016 at 9:39 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> Because home-rolling your own is _awesome_, stop doing it. Provide
> kref_put_lock(), just like kref_put_mutex() but for a spinlock.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  include/linux/kref.h |   21 +++++++++++++++------
>  net/sunrpc/svcauth.c |   15 ++++++++++-----
>  2 files changed, 25 insertions(+), 11 deletions(-)
>
> --- a/include/linux/kref.h
> +++ b/include/linux/kref.h
> @@ -86,12 +86,21 @@ static inline int kref_put_mutex(struct
>                                  struct mutex *lock)
>  {
>         WARN_ON(release == NULL);

This WARN_ON makes sense, yes, though it seems like it should be dealt
with differently. If it's NULL, we'll just Oops when we call release()
later... Seems like this should saturate the kref or something similar.

> -       if (unlikely(!atomic_add_unless(&kref->refcount, -1, 1))) {
> -               mutex_lock(lock);
> -               if (unlikely(!atomic_dec_and_test(&kref->refcount))) {
> -                       mutex_unlock(lock);
> -                       return 0;
> -               }
> +
> +       if (atomic_dec_and_mutex_lock(&kref->refcount, lock)) {
> +               release(kref);
> +               return 1;
> +       }
> +       return 0;
> +}
> +
> +static inline int kref_put_lock(struct kref *kref,
> +                               void (*release)(struct kref *kref),
> +                               spinlock_t *lock)
> +{
> +       WARN_ON(release == NULL);
> +
> +       if (atomic_dec_and_lock(&kref->refcount, lock)) {
>                 release(kref);
>                 return 1;
>         }
> --- a/net/sunrpc/svcauth.c
> +++ b/net/sunrpc/svcauth.c
> @@ -127,13 +127,18 @@ static struct hlist_head  auth_domain_tab
>  static spinlock_t      auth_domain_lock =
>         __SPIN_LOCK_UNLOCKED(auth_domain_lock);
>
> +static void auth_domain_release(struct kref *kref)
> +{
> +       struct auth_domain *dom = container_of(kref, struct auth_domain, ref);
> +
> +       hlist_del(&dom->hash);
> +       dom->flavour->domain_release(dom);
> +       spin_unlock(&auth_domain_lock);
> +}
> +
>  void auth_domain_put(struct auth_domain *dom)
>  {
> -       if (atomic_dec_and_lock(&dom->ref.refcount, &auth_domain_lock)) {
> -               hlist_del(&dom->hash);
> -               dom->flavour->domain_release(dom);
> -               spin_unlock(&auth_domain_lock);
> -       }
> +       kref_put_lock(&dom->ref, auth_domain_release, &auth_domain_lock);
>  }
>  EXPORT_SYMBOL_GPL(auth_domain_put);
>
>
>



-- 
Kees Cook
Nexus Security

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC][PATCH 0/7] kref improvements
  2016-11-14 17:39 [RFC][PATCH 0/7] kref improvements Peter Zijlstra
                   ` (6 preceding siblings ...)
  2016-11-14 17:39 ` [RFC][PATCH 7/7] kref: Implement using refcount_t Peter Zijlstra
@ 2016-11-15  7:27 ` Greg KH
  2016-11-15  7:42   ` Ingo Molnar
  2016-11-15  7:48   ` Peter Zijlstra
  7 siblings, 2 replies; 96+ messages in thread
From: Greg KH @ 2016-11-15  7:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: keescook, will.deacon, elena.reshetova, arnd, tglx, mingo, hpa,
	dave, linux-kernel

On Mon, Nov 14, 2016 at 06:39:46PM +0100, Peter Zijlstra wrote:
> This series unfscks kref and then implements it in terms of refcount_t.
> 
> x86_64-allyesconfig compile tested and boot tested with my regular config.
> 
> refcount_t is as per the previous thread, it BUGs on over-/underflow and
> saturates at UINT_MAX, such that if we ever overflow, we'll never free again.
> 
> 

Thanks so much for doing these, at the very least, I want to take the
kref-abuse-fixes now as those users shouldn't be doing those foolish
things.  Any objection for me taking some of them through my tree now?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC][PATCH 2/7] kref: Add kref_read()
  2016-11-14 18:16   ` Christoph Hellwig
@ 2016-11-15  7:28     ` Greg KH
  2016-11-15  7:47       ` Peter Zijlstra
  2016-11-15  8:37       ` [PATCH] printk, locking/atomics, kref: Introduce new %pAr and %pAk format string options for atomic_t and 'struct kref' Ingo Molnar
  0 siblings, 2 replies; 96+ messages in thread
From: Greg KH @ 2016-11-15  7:28 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Peter Zijlstra, keescook, will.deacon, elena.reshetova, arnd,
	tglx, mingo, hpa, dave, linux-kernel

On Mon, Nov 14, 2016 at 10:16:55AM -0800, Christoph Hellwig wrote:
> On Mon, Nov 14, 2016 at 06:39:48PM +0100, Peter Zijlstra wrote:
> > Since we need to change the implementation, stop exposing internals.
> > 
> > Provide kref_read() to read the current reference count; typically
> > used for debug messages.
> 
> Can we just provide a printk specifier for a kref value instead as
> that is the only valid use case for reading the value?

Yeah, that would be great as no one should be doing anything
logic-related based on the kref value.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC][PATCH 2/7] kref: Add kref_read()
  2016-11-14 17:39 ` [RFC][PATCH 2/7] kref: Add kref_read() Peter Zijlstra
  2016-11-14 18:16   ` Christoph Hellwig
@ 2016-11-15  7:33   ` Greg KH
  2016-11-15  8:03     ` Peter Zijlstra
  1 sibling, 1 reply; 96+ messages in thread
From: Greg KH @ 2016-11-15  7:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: keescook, will.deacon, elena.reshetova, arnd, tglx, mingo, hpa,
	dave, linux-kernel

On Mon, Nov 14, 2016 at 06:39:48PM +0100, Peter Zijlstra wrote:
> Since we need to change the implementation, stop exposing internals.
> 
> Provide kref_read() to read the current reference count; typically
> used for debug messages.
> 
> Kills two anti-patterns:
> 
> 	atomic_read(&kref->refcount)
> 	kref->refcount.counter
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  drivers/block/drbd/drbd_req.c                |    2 -
>  drivers/block/rbd.c                          |    8 ++---
>  drivers/block/virtio_blk.c                   |    2 -
>  drivers/gpu/drm/drm_gem_cma_helper.c         |    2 -
>  drivers/gpu/drm/drm_info.c                   |    2 -
>  drivers/gpu/drm/drm_mode_object.c            |    4 +-
>  drivers/gpu/drm/etnaviv/etnaviv_gem.c        |    2 -
>  drivers/gpu/drm/msm/msm_gem.c                |    2 -
>  drivers/gpu/drm/nouveau/nouveau_fence.c      |    2 -
>  drivers/gpu/drm/omapdrm/omap_gem.c           |    2 -
>  drivers/gpu/drm/ttm/ttm_bo.c                 |    4 +-
>  drivers/gpu/drm/ttm/ttm_object.c             |    2 -
>  drivers/infiniband/hw/cxgb3/iwch_cm.h        |    6 ++--
>  drivers/infiniband/hw/cxgb3/iwch_qp.c        |    2 -
>  drivers/infiniband/hw/cxgb4/iw_cxgb4.h       |    6 ++--
>  drivers/infiniband/hw/cxgb4/qp.c             |    2 -
>  drivers/infiniband/hw/usnic/usnic_ib_sysfs.c |    6 ++--
>  drivers/infiniband/hw/usnic/usnic_ib_verbs.c |    4 +-
>  drivers/misc/genwqe/card_dev.c               |    2 -
>  drivers/misc/mei/debugfs.c                   |    2 -
>  drivers/pci/hotplug/pnv_php.c                |    2 -
>  drivers/pci/slot.c                           |    2 -
>  drivers/scsi/bnx2fc/bnx2fc_io.c              |    8 ++---
>  drivers/scsi/cxgbi/libcxgbi.h                |    4 +-
>  drivers/scsi/lpfc/lpfc_debugfs.c             |    2 -
>  drivers/scsi/lpfc/lpfc_els.c                 |    2 -
>  drivers/scsi/lpfc/lpfc_hbadisc.c             |   40 +++++++++++++--------------
>  drivers/scsi/lpfc/lpfc_init.c                |    3 --
>  drivers/scsi/qla2xxx/tcm_qla2xxx.c           |    4 +-
>  drivers/staging/android/ion/ion.c            |    2 -
>  drivers/staging/comedi/comedi_buf.c          |    2 -
>  drivers/target/target_core_pr.c              |   10 +++---
>  drivers/target/tcm_fc/tfc_sess.c             |    2 -
>  drivers/usb/gadget/function/f_fs.c           |    2 -
>  fs/exofs/sys.c                               |    2 -
>  fs/ocfs2/cluster/netdebug.c                  |    2 -
>  fs/ocfs2/cluster/tcp.c                       |    2 -
>  fs/ocfs2/dlm/dlmdebug.c                      |   12 ++++----
>  fs/ocfs2/dlm/dlmdomain.c                     |    2 -
>  fs/ocfs2/dlm/dlmmaster.c                     |    8 ++---
>  fs/ocfs2/dlm/dlmunlock.c                     |    2 -
>  include/drm/drm_framebuffer.h                |    2 -
>  include/drm/ttm/ttm_bo_driver.h              |    4 +-
>  include/linux/kref.h                         |    5 +++
>  include/linux/sunrpc/cache.h                 |    2 -
>  include/net/bluetooth/hci_core.h             |    4 +-
>  net/bluetooth/6lowpan.c                      |    2 -
>  net/bluetooth/a2mp.c                         |    4 +-
>  net/bluetooth/amp.c                          |    4 +-
>  net/bluetooth/l2cap_core.c                   |    4 +-
>  net/ceph/messenger.c                         |    4 +-
>  net/ceph/osd_client.c                        |   10 +++---
>  net/sunrpc/cache.c                           |    2 -
>  net/sunrpc/svc_xprt.c                        |    6 ++--
>  net/sunrpc/xprtrdma/svc_rdma_transport.c     |    4 +-
>  55 files changed, 120 insertions(+), 116 deletions(-)
> 
> --- a/drivers/block/drbd/drbd_req.c
> +++ b/drivers/block/drbd/drbd_req.c
> @@ -520,7 +520,7 @@ static void mod_rq_state(struct drbd_req
>  		/* Completion does it's own kref_put.  If we are going to
>  		 * kref_sub below, we need req to be still around then. */
>  		int at_least = k_put + !!c_put;
> -		int refcount = atomic_read(&req->kref.refcount);
> +		int refcount = kref_read(&req->kref);
>  		if (refcount < at_least)
>  			drbd_err(device,
>  				"mod_rq_state: Logic BUG: %x -> %x: refcount = %d, should be >= %d\n",

As proof of "things you should never do", here is one such example.

ugh.


> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -767,7 +767,7 @@ static void virtblk_remove(struct virtio
>  	/* Stop all the virtqueues. */
>  	vdev->config->reset(vdev);
>  
> -	refc = atomic_read(&disk_to_dev(vblk->disk)->kobj.kref.refcount);
> +	refc = kref_read(&disk_to_dev(vblk->disk)->kobj.kref);
>  	put_disk(vblk->disk);
>  	vdev->config->del_vqs(vdev);
>  	kfree(vblk->vqs);

And this too, ugh, that's a huge abuse and is probably totally wrong...

thanks again for digging through this crap.  I wonder if we need to name
the kref reference variable "do_not_touch_this_ever" or some such thing
to catch all of the people who try to be "too smart".

greg k-h

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC][PATCH 0/7] kref improvements
  2016-11-15  7:27 ` [RFC][PATCH 0/7] kref improvements Greg KH
@ 2016-11-15  7:42   ` Ingo Molnar
  2016-11-15 15:05     ` Greg KH
  2016-11-15  7:48   ` Peter Zijlstra
  1 sibling, 1 reply; 96+ messages in thread
From: Ingo Molnar @ 2016-11-15  7:42 UTC (permalink / raw)
  To: Greg KH
  Cc: Peter Zijlstra, keescook, will.deacon, elena.reshetova, arnd,
	tglx, hpa, dave, linux-kernel


* Greg KH <gregkh@linuxfoundation.org> wrote:

> On Mon, Nov 14, 2016 at 06:39:46PM +0100, Peter Zijlstra wrote:
> > This series unfscks kref and then implements it in terms of refcount_t.
> > 
> > x86_64-allyesconfig compile tested and boot tested with my regular config.
> > 
> > refcount_t is as per the previous thread, it BUGs on over-/underflow and
> > saturates at UINT_MAX, such that if we ever overflow, we'll never free again.
> > 
> > 
> 
> Thanks so much for doing these, at the very least, I want to take the
> kref-abuse-fixes now as those users shouldn't be doing those foolish
> things.  Any objection for me taking some of them through my tree now?

Very nice series indeed!

We normally route atomics related patches through tip:locking/core (there's also 
tip:atomic/core), but this is a special case I think, given how broadly it 
interacts with driver code.

So both would work I think: we could concentrate these and only these patches into 
tip:atomic/core as an append-only tree, or you can carry them in the driver tree 
- whichever variant you prefer!

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC][PATCH 2/7] kref: Add kref_read()
  2016-11-15  7:28     ` Greg KH
@ 2016-11-15  7:47       ` Peter Zijlstra
  2016-11-15  8:37       ` [PATCH] printk, locking/atomics, kref: Introduce new %pAr and %pAk format string options for atomic_t and 'struct kref' Ingo Molnar
  1 sibling, 0 replies; 96+ messages in thread
From: Peter Zijlstra @ 2016-11-15  7:47 UTC (permalink / raw)
  To: Greg KH
  Cc: Christoph Hellwig, keescook, will.deacon, elena.reshetova, arnd,
	tglx, mingo, hpa, dave, linux-kernel

On Tue, Nov 15, 2016 at 08:28:55AM +0100, Greg KH wrote:
> On Mon, Nov 14, 2016 at 10:16:55AM -0800, Christoph Hellwig wrote:
> > On Mon, Nov 14, 2016 at 06:39:48PM +0100, Peter Zijlstra wrote:
> > > Since we need to change the implementation, stop exposing internals.
> > > 
> > > Provide kref_read() to read the current reference count; typically
> > > used for debug messages.
> > 
> > Can we just provide a printk specifier for a kref value instead as
> > that is the only valid use case for reading the value?
> 
> Yeah, that would be great as no one should be doing anything
> logic-related based on the kref value.

IIRC there are a few users that WARN_ON() the value with a minimum bound
or somesuch. Those would be left in the cold, but yes I too like the
idea of a printk() format specifier.

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC][PATCH 0/7] kref improvements
  2016-11-15  7:27 ` [RFC][PATCH 0/7] kref improvements Greg KH
  2016-11-15  7:42   ` Ingo Molnar
@ 2016-11-15  7:48   ` Peter Zijlstra
  1 sibling, 0 replies; 96+ messages in thread
From: Peter Zijlstra @ 2016-11-15  7:48 UTC (permalink / raw)
  To: Greg KH
  Cc: keescook, will.deacon, elena.reshetova, arnd, tglx, mingo, hpa,
	dave, linux-kernel

On Tue, Nov 15, 2016 at 08:27:42AM +0100, Greg KH wrote:
> On Mon, Nov 14, 2016 at 06:39:46PM +0100, Peter Zijlstra wrote:
> > This series unfscks kref and then implements it in terms of refcount_t.
> > 
> > x86_64-allyesconfig compile tested and boot tested with my regular config.
> > 
> > refcount_t is as per the previous thread, it BUGs on over-/underflow and
> > saturates at UINT_MAX, such that if we ever overflow, we'll never free again.
> > 
> > 
> 
> Thanks so much for doing these, at the very least, I want to take the
> kref-abuse-fixes now as those users shouldn't be doing those foolish
> things.  Any objection for me taking some of them through my tree now?

None at all, but please double check at least the 'kill kref_sub()' one,
I might have messed up drbd or something, that code isn't entirely
transparent.

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC][PATCH 5/7] kref: Implement kref_put_lock()
  2016-11-14 20:35   ` Kees Cook
@ 2016-11-15  7:50     ` Peter Zijlstra
  0 siblings, 0 replies; 96+ messages in thread
From: Peter Zijlstra @ 2016-11-15  7:50 UTC (permalink / raw)
  To: Kees Cook
  Cc: Greg KH, Will Deacon, Reshetova, Elena, Arnd Bergmann,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, David Windsor,
	LKML

On Mon, Nov 14, 2016 at 12:35:48PM -0800, Kees Cook wrote:
> On Mon, Nov 14, 2016 at 9:39 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > Because home-rolling your own is _awesome_, stop doing it. Provide
> > kref_put_lock(), just like kref_put_mutex() but for a spinlock.
> >
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> >  include/linux/kref.h |   21 +++++++++++++++------
> >  net/sunrpc/svcauth.c |   15 ++++++++++-----
> >  2 files changed, 25 insertions(+), 11 deletions(-)
> >
> > --- a/include/linux/kref.h
> > +++ b/include/linux/kref.h
> > @@ -86,12 +86,21 @@ static inline int kref_put_mutex(struct
> >                                  struct mutex *lock)
> >  {
> >         WARN_ON(release == NULL);
> 
> This WARN_ON makes sense, yes, though it seems like it should be dealt
> with differently. If it's NULL, we'll just Oops when we call release()
> later... Seems like this should saturate the kref or something similar.

So I simply took the pattern from the existing kref_put().

But I like it more in these kref_put_{lock,mutex}() variants, because
someone will need to unlock. If we simply crash/bug without unlock we'll
have broken state the rest of the kernel cannot fix up.

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC][PATCH 2/7] kref: Add kref_read()
  2016-11-15  7:33   ` [RFC][PATCH 2/7] kref: Add kref_read() Greg KH
@ 2016-11-15  8:03     ` Peter Zijlstra
  2016-11-15 20:53       ` Kees Cook
  0 siblings, 1 reply; 96+ messages in thread
From: Peter Zijlstra @ 2016-11-15  8:03 UTC (permalink / raw)
  To: Greg KH
  Cc: keescook, will.deacon, elena.reshetova, arnd, tglx, mingo, hpa,
	dave, linux-kernel

On Tue, Nov 15, 2016 at 08:33:22AM +0100, Greg KH wrote:
> On Mon, Nov 14, 2016 at 06:39:48PM +0100, Peter Zijlstra wrote:

> > --- a/drivers/block/drbd/drbd_req.c
> > +++ b/drivers/block/drbd/drbd_req.c
> > @@ -520,7 +520,7 @@ static void mod_rq_state(struct drbd_req
> >  		/* Completion does it's own kref_put.  If we are going to
> >  		 * kref_sub below, we need req to be still around then. */
> >  		int at_least = k_put + !!c_put;
> > -		int refcount = atomic_read(&req->kref.refcount);
> > +		int refcount = kref_read(&req->kref);
> >  		if (refcount < at_least)
> >  			drbd_err(device,
> >  				"mod_rq_state: Logic BUG: %x -> %x: refcount = %d, should be >= %d\n",
> 
> As proof of "things you should never do", here is one such example.
> 
> ugh.
> 
> 
> > --- a/drivers/block/virtio_blk.c
> > +++ b/drivers/block/virtio_blk.c
> > @@ -767,7 +767,7 @@ static void virtblk_remove(struct virtio
> >  	/* Stop all the virtqueues. */
> >  	vdev->config->reset(vdev);
> >  
> > -	refc = atomic_read(&disk_to_dev(vblk->disk)->kobj.kref.refcount);
> > +	refc = kref_read(&disk_to_dev(vblk->disk)->kobj.kref);
> >  	put_disk(vblk->disk);
> >  	vdev->config->del_vqs(vdev);
> >  	kfree(vblk->vqs);
> 
> And this too, ugh, that's a huge abuse and is probably totally wrong...
> 
> thanks again for digging through this crap.  I wonder if we need to name
> the kref reference variable "do_not_touch_this_ever" or some such thing
> to catch all of the people who try to be "too smart".

There's unimaginable bong hits involved in this stuff, in the end I
resorted to brute force and scripts to convert all this.

^ permalink raw reply	[flat|nested] 96+ messages in thread

* [PATCH] printk, locking/atomics, kref: Introduce new %pAr and %pAk format string options for atomic_t and 'struct kref'
  2016-11-15  7:28     ` Greg KH
  2016-11-15  7:47       ` Peter Zijlstra
@ 2016-11-15  8:37       ` Ingo Molnar
  2016-11-15  8:43         ` [PATCH v2] " Ingo Molnar
  2016-11-15 16:42         ` [PATCH] " Linus Torvalds
  1 sibling, 2 replies; 96+ messages in thread
From: Ingo Molnar @ 2016-11-15  8:37 UTC (permalink / raw)
  To: Greg KH
  Cc: Christoph Hellwig, Peter Zijlstra, keescook, will.deacon,
	elena.reshetova, arnd, tglx, hpa, Linus Torvalds, dave,
	linux-kernel


* Greg KH <gregkh@linuxfoundation.org> wrote:

> On Mon, Nov 14, 2016 at 10:16:55AM -0800, Christoph Hellwig wrote:
> > On Mon, Nov 14, 2016 at 06:39:48PM +0100, Peter Zijlstra wrote:
> > > Since we need to change the implementation, stop exposing internals.
> > > 
> > > Provide kref_read() to read the current reference count; typically
> > > used for debug messages.
> > 
> > Can we just provide a printk specifier for a kref value instead as
> > that is the only valid use case for reading the value?
> 
> Yeah, that would be great as no one should be doing anything
> logic-related based on the kref value.

Find below a patch that implements %pAk for 'struct kref' count printing and
%pAr for atomic_t counter printing.

This is against vanilla upstream.

Thanks,

	Ingo

============================>
Subject: printk, locking/atomics, kref: Introduce new %pAr and %pAk format string options for atomic_t and 'struct kref'
From: Ingo Molnar <mingo@kernel.org>
Date: Tue Nov 15 08:53:14 CET 2016

A decade of kref internals exposed to driver writers has proven that
exposing internals to them is a bad idea.

Make the bad patterns a bit easier to detect and allow cleaner
printouts by offering two new printk format string extensions:

	%pAr - print the atomic_t count in decimal
	%pAk - print the struct kref count in decimal

Also add printf testcases:

  [    0.353126] test_printf: all 268 tests passed

Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 Documentation/printk-formats.txt |   10 +++++++++
 lib/test_printf.c                |   28 ++++++++++++++++++++++++++
 lib/vsprintf.c                   |   42 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 80 insertions(+)

Index: tip/Documentation/printk-formats.txt
===================================================================
--- tip.orig/Documentation/printk-formats.txt
+++ tip/Documentation/printk-formats.txt
@@ -316,6 +316,16 @@ Flags bitfields such as page flags, gfp_
 
 	Passed by reference.
 
+atomic variables such as atomic_t or struct kref:
+
+	%pAr	atomic_t count
+	%pAk	struct kref count
+
+	For printing the current count value of atomic variables. This is
+	preferred to accessing the counts directly.
+
+	Passed by reference.
+
 Network device features:
 
 	%pNF	0x000000000000c000
Index: tip/lib/test_printf.c
===================================================================
--- tip.orig/lib/test_printf.c
+++ tip/lib/test_printf.c
@@ -20,6 +20,8 @@
 #include <linux/gfp.h>
 #include <linux/mm.h>
 
+#include <linux/kref.h>
+
 #define BUF_SIZE 256
 #define PAD_SIZE 16
 #define FILL_CHAR '$'
@@ -462,6 +464,31 @@ flags(void)
 	kfree(cmp_buffer);
 }
 
+/*
+ * Testcases for %pAr (atomic_t) and %pAk (struct kref) count printing:
+ */
+static void __init test_atomics__atomic_t(void)
+{
+	atomic_t count = ATOMIC_INIT(1);
+
+	test("1", "%pAr", &count);
+}
+
+static void __init test_atomics__kref(void)
+{
+	struct kref kref;
+
+	kref_init(&kref);
+
+	test("1", "%pAk", &kref);
+}
+
+static void __init test_atomics(void)
+{
+	test_atomics__atomic_t();
+	test_atomics__kref();
+}
+
 static void __init
 test_pointer(void)
 {
@@ -481,6 +508,7 @@ test_pointer(void)
 	bitmap();
 	netdev_features();
 	flags();
+	test_atomics();
 }
 
 static int __init
Index: tip/lib/vsprintf.c
===================================================================
--- tip.orig/lib/vsprintf.c
+++ tip/lib/vsprintf.c
@@ -38,6 +38,8 @@
 
 #include "../mm/internal.h"	/* For the trace_print_flags arrays */
 
+#include <linux/kref.h>
+
 #include <asm/page.h>		/* for PAGE_SIZE */
 #include <asm/sections.h>	/* for dereference_function_descriptor() */
 #include <asm/byteorder.h>	/* cpu_to_le16 */
@@ -1470,6 +1472,40 @@ char *flags_string(char *buf, char *end,
 	return format_flags(buf, end, flags, names);
 }
 
+static noinline_for_stack
+char *atomic_var(char *buf, char *end, void *atomic_ptr, const char *fmt)
+{
+	unsigned long num;
+	const struct printf_spec numspec = {
+		.flags		= SPECIAL|SMALL,
+		.field_width	= -1,
+		.precision	= -1,
+		.base		= 10,
+	};
+
+	switch (fmt[1]) {
+		case 'r':
+		{
+			atomic_t *count_p = (void *)atomic_ptr;
+
+			num = atomic_read(count_p);
+			break;
+		}
+		case 'k':
+		{
+			struct kref *kref_p = (void *)atomic_ptr;
+
+			num = atomic_read(&kref_p->refcount);
+			break;
+		}
+		default:
+			WARN_ONCE(1, "Unsupported atomics modifier: %c\n", fmt[1]);
+			return buf;
+	}
+
+	return number(buf, end, num, numspec);
+}
+
 int kptr_restrict __read_mostly;
 
 /*
@@ -1563,6 +1599,10 @@ int kptr_restrict __read_mostly;
  *       p page flags (see struct page) given as pointer to unsigned long
  *       g gfp flags (GFP_* and __GFP_*) given as pointer to gfp_t
  *       v vma flags (VM_*) given as pointer to unsigned long
+ * - 'A' For the count of atomic variables to be printed.
+ *       Supported flags given by option:
+ *        r atomic_t    ('r'aw count)
+ *        k struct kref ('k'ref count)
  *
  * ** Please update also Documentation/printk-formats.txt when making changes **
  *
@@ -1718,6 +1758,8 @@ char *pointer(const char *fmt, char *buf
 
 	case 'G':
 		return flags_string(buf, end, ptr, fmt);
+	case 'A':
+		return atomic_var(buf, end, ptr, fmt);
 	}
 	spec.flags |= SMALL;
 	if (spec.field_width == -1) {

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC][PATCH 7/7] kref: Implement using refcount_t
  2016-11-14 17:39 ` [RFC][PATCH 7/7] kref: Implement using refcount_t Peter Zijlstra
@ 2016-11-15  8:40   ` Ingo Molnar
  2016-11-15  9:47     ` Peter Zijlstra
  2016-11-15 12:33   ` Boqun Feng
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 96+ messages in thread
From: Ingo Molnar @ 2016-11-15  8:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: gregkh, keescook, will.deacon, elena.reshetova, arnd, tglx, hpa,
	dave, linux-kernel


* Peter Zijlstra <peterz@infradead.org> wrote:

> Provide refcount_t, an atomic_t like primitive built just for
> refcounting.
> 
> It provides overflow and underflow checks as well as saturation
> semantics such that when it overflows, we'll never attempt to free it
> again, ever.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  include/linux/kref.h     |   29 ++----
>  include/linux/refcount.h |  221 +++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 232 insertions(+), 18 deletions(-)

I'd suggest splitting this patch into two parts: first patch introduces the 
refcount.h facility, second patch changes over struct kref to the new facility.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 96+ messages in thread

* [PATCH v2] printk, locking/atomics, kref: Introduce new %pAr and %pAk format string options for atomic_t and 'struct kref'
  2016-11-15  8:37       ` [PATCH] printk, locking/atomics, kref: Introduce new %pAr and %pAk format string options for atomic_t and 'struct kref' Ingo Molnar
@ 2016-11-15  8:43         ` Ingo Molnar
  2016-11-15  9:21           ` Peter Zijlstra
  2016-11-15 10:10           ` [PATCH v2] printk, locking/atomics, kref: Introduce new %pAr " kbuild test robot
  2016-11-15 16:42         ` [PATCH] " Linus Torvalds
  1 sibling, 2 replies; 96+ messages in thread
From: Ingo Molnar @ 2016-11-15  8:43 UTC (permalink / raw)
  To: Greg KH
  Cc: Christoph Hellwig, Peter Zijlstra, keescook, will.deacon,
	elena.reshetova, arnd, tglx, hpa, Linus Torvalds, dave,
	linux-kernel


* Ingo Molnar <mingo@kernel.org> wrote:

> 
> * Greg KH <gregkh@linuxfoundation.org> wrote:
> 
> > On Mon, Nov 14, 2016 at 10:16:55AM -0800, Christoph Hellwig wrote:
> > > On Mon, Nov 14, 2016 at 06:39:48PM +0100, Peter Zijlstra wrote:
> > > > Since we need to change the implementation, stop exposing internals.
> > > > 
> > > > Provide kref_read() to read the current reference count; typically
> > > > used for debug messages.
> > > 
> > > Can we just provide a printk specifier for a kref value instead as
> > > that is the only valid use case for reading the value?
> > 
> > Yeah, that would be great as no one should be doing anything
> > logic-related based on the kref value.
> 
> Find below a patch that implements %pAk for 'struct kref' count printing and
> %pAr for atomic_t counter printing.
> 
> This is against vanilla upstream.

The patch below is against Peter's refcount series. Note that this patch depends 
on this patch in Peter's series:

  kref: Implement using refcount_t

Thanks,

	Ingo

==================================>
Subject: printk, locking/atomics, kref: Introduce new %pAr and %pAk format string options for atomic_t and 'struct kref'
From: Ingo Molnar <mingo@kernel.org>
Date: Tue Nov 15 08:53:14 CET 2016

A decade of kref internals exposed to driver writers has proven that
exposing internals to them is a bad idea.

Make the bad patterns a bit easier to detect and allow cleaner
printouts by offering two new printk format string extensions:

	%pAr - print the atomic_t count in decimal
	%pAk - print the struct kref count in decimal

Also add printf testcases:

  [    0.328495] test_printf: all 268 tests passed

Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 Documentation/printk-formats.txt |   10 +++++++++
 lib/test_printf.c                |   28 ++++++++++++++++++++++++++
 lib/vsprintf.c                   |   42 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 80 insertions(+)

Index: tip/Documentation/printk-formats.txt
===================================================================
--- tip.orig/Documentation/printk-formats.txt
+++ tip/Documentation/printk-formats.txt
@@ -316,6 +316,16 @@ Flags bitfields such as page flags, gfp_
 
 	Passed by reference.
 
+atomic variables such as atomic_t or struct kref:
+
+	%pAr	atomic_t count
+	%pAk	struct kref count
+
+	For printing the current count value of atomic variables. This is
+	preferred to accessing the counts directly.
+
+	Passed by reference.
+
 Network device features:
 
 	%pNF	0x000000000000c000
Index: tip/lib/test_printf.c
===================================================================
--- tip.orig/lib/test_printf.c
+++ tip/lib/test_printf.c
@@ -20,6 +20,8 @@
 #include <linux/gfp.h>
 #include <linux/mm.h>
 
+#include <linux/kref.h>
+
 #define BUF_SIZE 256
 #define PAD_SIZE 16
 #define FILL_CHAR '$'
@@ -462,6 +464,31 @@ flags(void)
 	kfree(cmp_buffer);
 }
 
+/*
+ * Testcases for %pAr (atomic_t) and %pAk (struct kref) count printing:
+ */
+static void __init test_atomics__atomic_t(void)
+{
+	atomic_t count = ATOMIC_INIT(1);
+
+	test("1", "%pAr", &count);
+}
+
+static void __init test_atomics__kref(void)
+{
+	struct kref kref;
+
+	kref_init(&kref);
+
+	test("1", "%pAk", &kref);
+}
+
+static void __init test_atomics(void)
+{
+	test_atomics__atomic_t();
+	test_atomics__kref();
+}
+
 static void __init
 test_pointer(void)
 {
@@ -481,6 +508,7 @@ test_pointer(void)
 	bitmap();
 	netdev_features();
 	flags();
+	test_atomics();
 }
 
 static int __init
Index: tip/lib/vsprintf.c
===================================================================
--- tip.orig/lib/vsprintf.c
+++ tip/lib/vsprintf.c
@@ -38,6 +38,8 @@
 
 #include "../mm/internal.h"	/* For the trace_print_flags arrays */
 
+#include <linux/kref.h>
+
 #include <asm/page.h>		/* for PAGE_SIZE */
 #include <asm/sections.h>	/* for dereference_function_descriptor() */
 #include <asm/byteorder.h>	/* cpu_to_le16 */
@@ -1470,6 +1472,40 @@ char *flags_string(char *buf, char *end,
 	return format_flags(buf, end, flags, names);
 }
 
+static noinline_for_stack
+char *atomic_var(char *buf, char *end, void *atomic_ptr, const char *fmt)
+{
+	unsigned long num;
+	const struct printf_spec numspec = {
+		.flags		= SPECIAL|SMALL,
+		.field_width	= -1,
+		.precision	= -1,
+		.base		= 10,
+	};
+
+	switch (fmt[1]) {
+		case 'r':
+		{
+			atomic_t *count_p = (void *)atomic_ptr;
+
+			num = atomic_read(count_p);
+			break;
+		}
+		case 'k':
+		{
+			struct kref *kref_p = (void *)atomic_ptr;
+
+			num = refcount_read(&kref_p->refcount);
+			break;
+		}
+		default:
+			WARN_ONCE(1, "Unsupported atomics modifier: %c\n", fmt[1]);
+			return buf;
+	}
+
+	return number(buf, end, num, numspec);
+}
+
 int kptr_restrict __read_mostly;
 
 /*
@@ -1563,6 +1599,10 @@ int kptr_restrict __read_mostly;
  *       p page flags (see struct page) given as pointer to unsigned long
  *       g gfp flags (GFP_* and __GFP_*) given as pointer to gfp_t
  *       v vma flags (VM_*) given as pointer to unsigned long
+ * - 'A' For the count of atomic variables to be printed.
+ *       Supported flags given by option:
+ *        r atomic_t    ('r'aw count)
+ *        k struct kref ('k'ref count)
  *
  * ** Please update also Documentation/printk-formats.txt when making changes **
  *
@@ -1718,6 +1758,8 @@ char *pointer(const char *fmt, char *buf
 
 	case 'G':
 		return flags_string(buf, end, ptr, fmt);
+	case 'A':
+		return atomic_var(buf, end, ptr, fmt);
 	}
 	spec.flags |= SMALL;
 	if (spec.field_width == -1) {

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2] printk, locking/atomics, kref: Introduce new %pAr and %pAk format string options for atomic_t and 'struct kref'
  2016-11-15  8:43         ` [PATCH v2] " Ingo Molnar
@ 2016-11-15  9:21           ` Peter Zijlstra
  2016-11-15  9:41             ` [PATCH v3] printk, locking/atomics, kref: Introduce new %pAa " Ingo Molnar
  2016-11-15 10:10           ` [PATCH v2] printk, locking/atomics, kref: Introduce new %pAr " kbuild test robot
  1 sibling, 1 reply; 96+ messages in thread
From: Peter Zijlstra @ 2016-11-15  9:21 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Greg KH, Christoph Hellwig, keescook, will.deacon,
	elena.reshetova, arnd, tglx, hpa, Linus Torvalds, dave,
	linux-kernel

On Tue, Nov 15, 2016 at 09:43:55AM +0100, Ingo Molnar wrote:
> +atomic variables such as atomic_t or struct kref:
> +
> +	%pAr	atomic_t count

Why 'r' for atomic_t ? I was expecting 'a' for atomic_t or something.
That then also leaves 'r' available for refcount_t.

> +	%pAk	struct kref count

^ permalink raw reply	[flat|nested] 96+ messages in thread

* [PATCH v3] printk, locking/atomics, kref: Introduce new %pAa and %pAk format string options for atomic_t and 'struct kref'
  2016-11-15  9:21           ` Peter Zijlstra
@ 2016-11-15  9:41             ` Ingo Molnar
  0 siblings, 0 replies; 96+ messages in thread
From: Ingo Molnar @ 2016-11-15  9:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Greg KH, Christoph Hellwig, keescook, will.deacon,
	elena.reshetova, arnd, tglx, hpa, Linus Torvalds, dave,
	linux-kernel


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Tue, Nov 15, 2016 at 09:43:55AM +0100, Ingo Molnar wrote:
> > +atomic variables such as atomic_t or struct kref:
> > +
> > +	%pAr	atomic_t count
> 
> Why 'r' for atomic_t ? I was expecting 'a' for atomic_t or something.
> That then also leaves 'r' available for refcount_t.

'r' was for 'raw atomic count', but you are right - new patch attached below.

Thanks,

	Ingo

==========
Subject: printk, locking/atomics, kref: Introduce new %pAa and %pAk format string options for atomic_t and 'struct kref'
From: Ingo Molnar <mingo@kernel.org>
Date: Tue, 15 Nov 2016 09:43:55 +0100

A decade of kref internals exposed to driver writers has proven that
exposing internals to them is a bad idea.

Make the bad patterns a bit easier to detect and allow cleaner
printouts by offering two new printk format string extensions:

	%pAa - print the atomic_t count in decimal
	%pAk - print the struct kref count in decimal

Also add printf testcases:

  [    0.334919] test_printf: all 268 tests passed

Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: arnd@arndb.de
Cc: dave@progbits.org
Cc: elena.reshetova@intel.com
Cc: keescook@chromium.org
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/20161115084355.GA24175@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 Documentation/printk-formats.txt |   10 +++++++++
 lib/test_printf.c                |   28 ++++++++++++++++++++++++++
 lib/vsprintf.c                   |   42 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 80 insertions(+)

Index: tip/Documentation/printk-formats.txt
===================================================================
--- tip.orig/Documentation/printk-formats.txt
+++ tip/Documentation/printk-formats.txt
@@ -316,6 +316,16 @@ Flags bitfields such as page flags, gfp_
 
 	Passed by reference.
 
+atomic variables such as atomic_t or struct kref:
+
+	%pAa	atomic_t count
+	%pAk	struct kref count
+
+	For printing the current count value of atomic variables. This is
+	preferred to accessing the counts directly.
+
+	Passed by reference.
+
 Network device features:
 
 	%pNF	0x000000000000c000
Index: tip/lib/test_printf.c
===================================================================
--- tip.orig/lib/test_printf.c
+++ tip/lib/test_printf.c
@@ -20,6 +20,8 @@
 #include <linux/gfp.h>
 #include <linux/mm.h>
 
+#include <linux/kref.h>
+
 #define BUF_SIZE 256
 #define PAD_SIZE 16
 #define FILL_CHAR '$'
@@ -462,6 +464,31 @@ flags(void)
 	kfree(cmp_buffer);
 }
 
+/*
+ * Testcases for %pAa (atomic_t) and %pAk (struct kref) count printing:
+ */
+static void __init test_atomics__atomic_t(void)
+{
+	atomic_t count = ATOMIC_INIT(1);
+
+	test("1", "%pAa", &count);
+}
+
+static void __init test_atomics__kref(void)
+{
+	struct kref kref;
+
+	kref_init(&kref);
+
+	test("1", "%pAk", &kref);
+}
+
+static void __init test_atomics(void)
+{
+	test_atomics__atomic_t();
+	test_atomics__kref();
+}
+
 static void __init
 test_pointer(void)
 {
@@ -481,6 +508,7 @@ test_pointer(void)
 	bitmap();
 	netdev_features();
 	flags();
+	test_atomics();
 }
 
 static int __init
Index: tip/lib/vsprintf.c
===================================================================
--- tip.orig/lib/vsprintf.c
+++ tip/lib/vsprintf.c
@@ -38,6 +38,8 @@
 
 #include "../mm/internal.h"	/* For the trace_print_flags arrays */
 
+#include <linux/kref.h>
+
 #include <asm/page.h>		/* for PAGE_SIZE */
 #include <asm/sections.h>	/* for dereference_function_descriptor() */
 #include <asm/byteorder.h>	/* cpu_to_le16 */
@@ -1470,6 +1472,40 @@ char *flags_string(char *buf, char *end,
 	return format_flags(buf, end, flags, names);
 }
 
+static noinline_for_stack
+char *atomic_var(char *buf, char *end, void *atomic_ptr, const char *fmt)
+{
+	unsigned long num;
+	const struct printf_spec numspec = {
+		.flags		= SPECIAL|SMALL,
+		.field_width	= -1,
+		.precision	= -1,
+		.base		= 10,
+	};
+
+	switch (fmt[1]) {
+		case 'a':
+		{
+			atomic_t *count_p = (void *)atomic_ptr;
+
+			num = atomic_read(count_p);
+			break;
+		}
+		case 'k':
+		{
+			struct kref *kref_p = (void *)atomic_ptr;
+
+			num = refcount_read(&kref_p->refcount);
+			break;
+		}
+		default:
+			WARN_ONCE(1, "Unsupported atomics modifier: %c\n", fmt[1]);
+			return buf;
+	}
+
+	return number(buf, end, num, numspec);
+}
+
 int kptr_restrict __read_mostly;
 
 /*
@@ -1563,6 +1599,10 @@ int kptr_restrict __read_mostly;
  *       p page flags (see struct page) given as pointer to unsigned long
  *       g gfp flags (GFP_* and __GFP_*) given as pointer to gfp_t
  *       v vma flags (VM_*) given as pointer to unsigned long
+ * - 'A' For the count of atomic variables to be printed.
+ *       Supported flags given by option:
+ *        a atomic_t    ('a'tomic count)
+ *        k struct kref ('k'ref count)
  *
  * ** Please update also Documentation/printk-formats.txt when making changes **
  *
@@ -1718,6 +1758,8 @@ char *pointer(const char *fmt, char *buf
 
 	case 'G':
 		return flags_string(buf, end, ptr, fmt);
+	case 'A':
+		return atomic_var(buf, end, ptr, fmt);
 	}
 	spec.flags |= SMALL;
 	if (spec.field_width == -1) {

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC][PATCH 7/7] kref: Implement using refcount_t
  2016-11-15  8:40   ` Ingo Molnar
@ 2016-11-15  9:47     ` Peter Zijlstra
  2016-11-15 10:03       ` Ingo Molnar
  0 siblings, 1 reply; 96+ messages in thread
From: Peter Zijlstra @ 2016-11-15  9:47 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: gregkh, keescook, will.deacon, elena.reshetova, arnd, tglx, hpa,
	dave, linux-kernel

On Tue, Nov 15, 2016 at 09:40:09AM +0100, Ingo Molnar wrote:
> 
> * Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > Provide refcount_t, an atomic_t like primitive built just for
> > refcounting.
> > 
> > It provides overflow and underflow checks as well as saturation
> > semantics such that when it overflows, we'll never attempt to free it
> > again, ever.
> > 
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> >  include/linux/kref.h     |   29 ++----
> >  include/linux/refcount.h |  221 +++++++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 232 insertions(+), 18 deletions(-)
> 
> I'd suggest splitting this patch into two parts: first patch introduces the 
> refcount.h facility, second patch changes over struct kref to the new facility.

You're right, I was just really glad I got it to compile and didn't want
to prod more at it.

Should I also make a CONFIG knob that implements refcount_t with the
'normal' atomic_t primitives?

And possibly another knob to toggle the BUG()s into WARN()s. With the
full saturation semantics WARN() is a lot safer and will not corrupt
kernel state as much.

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC][PATCH 7/7] kref: Implement using refcount_t
  2016-11-15  9:47     ` Peter Zijlstra
@ 2016-11-15 10:03       ` Ingo Molnar
  2016-11-15 10:46         ` Peter Zijlstra
  0 siblings, 1 reply; 96+ messages in thread
From: Ingo Molnar @ 2016-11-15 10:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: gregkh, keescook, will.deacon, elena.reshetova, arnd, tglx, hpa,
	dave, Linus Torvalds, linux-kernel


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Tue, Nov 15, 2016 at 09:40:09AM +0100, Ingo Molnar wrote:
> > 
> > * Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > > Provide refcount_t, an atomic_t like primitive built just for
> > > refcounting.
> > > 
> > > It provides overflow and underflow checks as well as saturation
> > > semantics such that when it overflows, we'll never attempt to free it
> > > again, ever.
> > > 
> > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > ---
> > >  include/linux/kref.h     |   29 ++----
> > >  include/linux/refcount.h |  221 +++++++++++++++++++++++++++++++++++++++++++++++
> > >  2 files changed, 232 insertions(+), 18 deletions(-)
> > 
> > I'd suggest splitting this patch into two parts: first patch introduces the 
> > refcount.h facility, second patch changes over struct kref to the new facility.
> 
> You're right, I was just really glad I got it to compile and didn't want
> to prod more at it.
>
> Should I also make a CONFIG knob that implements refcount_t with the
> 'normal' atomic_t primitives?

I'd suggest doing the saturation/safe-wrap semantics only for now (i.e. the 
current patch, split into two perhaps), and reconsider if there's any complaints?

> And possibly another knob to toggle the BUG()s into WARN()s. With the
> full saturation semantics WARN() is a lot safer and will not corrupt
> kernel state as much.

I'd suggest changing it to a WARN() straight away, no extra knobs.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2] printk, locking/atomics, kref: Introduce new %pAr and %pAk format string options for atomic_t and 'struct kref'
  2016-11-15  8:43         ` [PATCH v2] " Ingo Molnar
  2016-11-15  9:21           ` Peter Zijlstra
@ 2016-11-15 10:10           ` kbuild test robot
  1 sibling, 0 replies; 96+ messages in thread
From: kbuild test robot @ 2016-11-15 10:10 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: kbuild-all, Greg KH, Christoph Hellwig, Peter Zijlstra, keescook,
	will.deacon, elena.reshetova, arnd, tglx, hpa, Linus Torvalds,
	dave, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1325 bytes --]

Hi Ingo,

[auto build test ERROR on linus/master]
[also build test ERROR on v4.9-rc5 next-20161115]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Ingo-Molnar/printk-locking-atomics-kref-Introduce-new-pAr-and-pAk-format-string-options-for-atomic_t-and-struct-kref/20161115-174900
config: i386-randconfig-x006-201646 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   lib/vsprintf.c: In function 'atomic_var':
>> lib/vsprintf.c:1498:10: error: implicit declaration of function 'refcount_read' [-Werror=implicit-function-declaration]
       num = refcount_read(&kref_p->refcount);
             ^~~~~~~~~~~~~
   cc1: some warnings being treated as errors

vim +/refcount_read +1498 lib/vsprintf.c

  1492				break;
  1493			}
  1494			case 'k':
  1495			{
  1496				struct kref *kref_p = (void *)atomic_ptr;
  1497	
> 1498				num = refcount_read(&kref_p->refcount);
  1499				break;
  1500			}
  1501			default:

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 27870 bytes --]

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC][PATCH 7/7] kref: Implement using refcount_t
  2016-11-15 10:03       ` Ingo Molnar
@ 2016-11-15 10:46         ` Peter Zijlstra
  2016-11-15 13:03           ` Ingo Molnar
  0 siblings, 1 reply; 96+ messages in thread
From: Peter Zijlstra @ 2016-11-15 10:46 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: gregkh, keescook, will.deacon, elena.reshetova, arnd, tglx, hpa,
	dave, Linus Torvalds, linux-kernel

On Tue, Nov 15, 2016 at 11:03:59AM +0100, Ingo Molnar wrote:
> > Should I also make a CONFIG knob that implements refcount_t with the
> > 'normal' atomic_t primitives?
> 
> I'd suggest doing the saturation/safe-wrap semantics only for now (i.e. the 
> current patch, split into two perhaps), and reconsider if there's any complaints?
> 
> > And possibly another knob to toggle the BUG()s into WARN()s. With the
> > full saturation semantics WARN() is a lot safer and will not corrupt
> > kernel state as much.
> 
> I'd suggest changing it to a WARN() straight away, no extra knobs.

OK, a little like so then? Note that the overflow tests went away
because increments guarantee we saturate before we overflow.

---
Subject: refcount_t: A special purpose refcount type
From: Peter Zijlstra <peterz@infradead.org>
Date: Mon Nov 14 18:06:19 CET 2016

Provide refcount_t, an atomic_t like primitive built just for
refcounting.

It provides saturation semantics such that overflow becomes impossible
and thereby 'spurious' use-after-free is avoided.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/refcount.h |  229 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 229 insertions(+)

--- /dev/null
+++ b/include/linux/refcount.h
@@ -0,0 +1,229 @@
+#ifndef _LINUX_REFCOUNT_H
+#define _LINUX_REFCOUNT_H
+
+/*
+ * Variant of atomic_t specialized for refcounting.
+ *
+ * The interface matches the atomic_t interface (to aid in porting) but only
+ * provides the few functions one should use for refcounting.
+ *
+ * It differs in that the counter saturates at UINT_MAX and will not move once
+ * there. This avoids wrapping the counter and causing 'spurious'
+ * use-after-free issues.
+ *
+ * Memory ordering rules are slightly relaxed wrt regular atomic_t functions
+ * and provide only what is strictly required for refcounts.
+ *
+ * The increments are fully relaxed; these will not provide ordering. The
+ * rationale is that whatever is used to obtain the object we're increasing the
+ * reference count on will provide the ordering. For locked data structures,
+ * it's the lock acquire; for RCU/lockless data structures it's the dependent
+ * load.
+ *
+ * Do note that inc_not_zero() provides a control dependency which will order
+ * future stores against the inc; this ensures we'll never modify the object
+ * if we did not in fact acquire a reference.
+ *
+ * The decrements will provide release order, such that all the prior loads and
+ * stores will be issued before we proceed with freeing the object.
+ *
+ * Note: the implementation hard-relies on increments; additions bigger than 1
+ *       need explicit overflow -> saturation logic.
+ *
+ */
+
+#include <linux/atomic.h>
+#include <linux/bug.h>
+#include <linux/mutex.h>
+#include <linux/spinlock.h>
+
+typedef struct refcount_struct {
+	atomic_t refs;
+} refcount_t;
+
+#define REFCOUNT_INIT(n)	{ .refs = ATOMIC_INIT(n), }
+
+static inline void refcount_set(refcount_t *r, int n)
+{
+	atomic_set(&r->refs, n);
+}
+
+static inline unsigned int refcount_read(const refcount_t *r)
+{
+	return atomic_read(&r->refs);
+}
+
+/*
+ * Similar to atomic_inc(), will saturate at UINT_MAX and WARN.
+ *
+ * Provides no memory ordering, it is assumed the caller already has a
+ * reference on the object, will WARN when this is not so.
+ */
+static inline void refcount_inc(refcount_t *r)
+{
+	unsigned int old, new, val = atomic_read(&r->refs);
+
+	for (;;) {
+		WARN(!val, "refcount_t: increment on 0; use-after-free.\n");
+
+		if (unlikely(val == UINT_MAX))
+			return;
+
+		new = val + 1;
+		old = atomic_cmpxchg_relaxed(&r->refs, val, new);
+		if (old == val)
+			break;
+
+		val = old;
+	}
+
+	WARN(new == UINT_MAX, "refcount_t: saturated; leaking memory.\n");
+}
+
+/*
+ * Similar to atomic_inc_not_zero(), will saturate at UINT_MAX and WARN.
+ *
+ * Provides no memory ordering, it is assumed the caller has guaranteed the
+ * object memory to be stable (RCU, etc.). It does provide a control dependency
+ * and thereby orders future stores.
+ */
+static inline __must_check
+bool refcount_inc_not_zero(refcount_t *r)
+{
+	unsigned int old, new, val = atomic_read(&r->refs);
+
+	for (;;) {
+		if (!val)
+			return false;
+
+		if (unlikely(val == UINT_MAX))
+			return true;
+
+		new = val + 1;
+		old = atomic_cmpxchg_relaxed(&r->refs, val, new);
+		if (old == val)
+			break;
+
+		val = old;
+	}
+
+	WARN(new == UINT_MAX, "refcount_t: saturated; leaking memory.\n");
+
+	return true;
+}
+
+/*
+ * Similar to atomic_dec_and_test(), it will WARN on underflow and fail to
+ * decrement when saturated at UINT_MAX.
+ *
+ * Provides release memory ordering, such that prior loads and stores are done
+ * before a subsequent free.
+ */
+static inline __must_check
+bool refcount_dec_and_test(refcount_t *r)
+{
+	unsigned int old, new, val = atomic_read(&r->refs);
+
+	for (;;) {
+		if (val == UINT_MAX)
+			return false;
+
+		new = val - 1;
+		if (WARN(new > val, "refcount_t: underflow; use-after-free.\n"))
+			return false;
+
+		old = atomic_cmpxchg_release(&r->refs, val, new);
+		if (old == val)
+			break;
+
+		val = old;
+	}
+
+	return !new;
+}
+
+/*
+ * Similar to atomic_dec_and_mutex_lock(), it will WARN on underflow and fail
+ * to decrement when saturated at UINT_MAX.
+ *
+ * Provides release memory ordering, such that prior loads and stores are done
+ * before a subsequent free. This allows free() while holding the mutex.
+ */
+static inline __must_check
+bool refcount_dec_and_mutex_lock(refcount_t *r, struct mutex *lock)
+{
+	unsigned int old, new, val = atomic_read(&r->refs);
+	bool locked = false;
+
+	for (;;) {
+		if (val == UINT_MAX)
+			return false;
+
+		if (val == 1 && !locked) {
+			locked = true;
+			mutex_lock(lock);
+		}
+
+		new = val - 1;
+		if (WARN(new > val, "refcount_t: underflow; use-after-free.\n")) {
+			if (locked)
+				mutex_unlock(lock);
+			return false;
+		}
+
+		old = atomic_cmpxchg_release(&r->refs, val, new);
+		if (old == val)
+			break;
+
+		val = old;
+	}
+
+	if (new && locked)
+		mutex_unlock(lock);
+
+	return !new;
+}
+
+/*
+ * Similar to atomic_dec_and_lock(), it will WARN on underflow and fail to
+ * decrement when saturated at UINT_MAX.
+ *
+ * Provides release memory ordering, such that prior loads and stores are done
+ * before a subsequent free. This allows free() while holding the lock.
+ */
+static inline __must_check
+bool refcount_dec_and_lock(refcount_t *r, spinlock_t *lock)
+{
+	unsigned int old, new, val = atomic_read(&r->refs);
+	bool locked = false;
+
+	for (;;) {
+		if (val == UINT_MAX)
+			return false;
+
+		if (val == 1 && !locked) {
+			locked = true;
+			spin_lock(lock);
+		}
+
+		new = val - 1;
+		if (WARN(new > val, "refcount_t: underflow; use-after-free.\n")) {
+			if (locked)
+				mutex_unlock(lock);
+			return false;
+		}
+
+		old = atomic_cmpxchg_release(&r->refs, val, new);
+		if (old == val)
+			break;
+
+		val = old;
+	}
+
+	if (new && locked)
+		spin_unlock(lock);
+
+	return !new;
+}
+
+#endif /* _LINUX_REFCOUNT_H */

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC][PATCH 7/7] kref: Implement using refcount_t
  2016-11-14 17:39 ` [RFC][PATCH 7/7] kref: Implement using refcount_t Peter Zijlstra
  2016-11-15  8:40   ` Ingo Molnar
@ 2016-11-15 12:33   ` Boqun Feng
  2016-11-15 13:01     ` Peter Zijlstra
  2016-11-18 10:07   ` Reshetova, Elena
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 96+ messages in thread
From: Boqun Feng @ 2016-11-15 12:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: gregkh, keescook, will.deacon, elena.reshetova, arnd, tglx,
	mingo, hpa, dave, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3955 bytes --]

Hi Peter,

On Mon, Nov 14, 2016 at 06:39:53PM +0100, Peter Zijlstra wrote:
[...]
> +/*
> + * Similar to atomic_dec_and_test(), it will BUG on underflow and fail to
> + * decrement when saturated at UINT_MAX.
> + *
> + * Provides release memory ordering, such that prior loads and stores are done
> + * before a subsequent free.

I'm not sure this is correct: the RELEASE semantics apply only to the STORE
part of the cmpxchg, so semantically it does not guarantee that memory
operations after the cmpxchg won't be reordered upwards. For example, on
ARM64, the following code:

	WRITE_ONCE(x, 1)
	
	atomic_cmpxchg_release(&a, 1, 2);
	  r1 = ll(&a)
	  if (r1 == 1) {
	    sc_release(&a, 2);
	  }
	
	free()

could be reordered as, I think:

	atomic_cmpxchg_release(&a, 1, 2);
	  r1 = ll(&a)
	  if (r1 == 1) {
	    free()
	    WRITE_ONCE(x, 1)
	    sc_release(&a, 2);
	  }

Of course, we need to wait for Will to confirm about this. But if this
could happen, we'd better to use a smp_mb()+atomic_cmpxchg_relaxed()
here and for other refcount_dec_and_*().

That said, I think the real ordering guarantee we need here is that
two cmpxchg()s in refcount_dec_and_*() could be paired with each other
to ensure nobody observes a freed object inside a refcount critical
section. In that case, we need atomic_cmpxchg() here for ordering on
both sides. Or maybe replace atomic_read() with smp_load_acquire().

Regards,
Boqun


> + */
> +static inline __must_check
> +bool refcount_dec_and_test(refcount_t *r)
> +{
> +	unsigned int old, new, val = atomic_read(&r->refs);
> +
> +	for (;;) {
> +		if (val == UINT_MAX)
> +			return false;
> +
> +		new = val - 1;
> +		if (new > val)
> +			BUG(); /* underflow */
> +
> +		old = atomic_cmpxchg_release(&r->refs, val, new);
> +		if (old == val)
> +			break;
> +
> +		val = old;
> +	}
> +
> +	return !new;
> +}
> +
> +/*
> + * Similar to atomic_dec_and_mutex_lock(), it will BUG on underflow and fail
> + * to decrement when saturated at UINT_MAX.
> + *
> + * Provides release memory ordering, such that prior loads and stores are done
> + * before a subsequent free. This allows free() while holding the mutex.
> + */
> +static inline __must_check
> +bool refcount_dec_and_mutex_lock(refcount_t *r, struct mutex *lock)
> +{
> +	unsigned int old, new, val = atomic_read(&r->refs);
> +	bool locked = false;
> +
> +	for (;;) {
> +		if (val == UINT_MAX)
> +			return false;
> +
> +		if (val == 1 && !locked) {
> +			locked = true;
> +			mutex_lock(lock);
> +		}
> +
> +		new = val - 1;
> +		if (new > val) {
> +			if (locked)
> +				mutex_unlock(lock);
> +			BUG(); /* underflow */
> +		}
> +
> +		old = atomic_cmpxchg_release(&r->refs, val, new);
> +		if (old == val)
> +			break;
> +
> +		val = old;
> +	}
> +
> +	if (new && locked)
> +		mutex_unlock(lock);
> +
> +	return !new;
> +}
> +
> +/*
> + * Similar to atomic_dec_and_lock(), it will BUG on underflow and fail
> + * to decrement when saturated at UINT_MAX.
> + *
> + * Provides release memory ordering, such that prior loads and stores are done
> + * before a subsequent free. This allows free() while holding the lock.
> + */
> +static inline __must_check
> +bool refcount_dec_and_lock(refcount_t *r, spinlock_t *lock)
> +{
> +	unsigned int old, new, val = atomic_read(&r->refs);
> +	bool locked = false;
> +
> +	for (;;) {
> +		if (val == UINT_MAX)
> +			return false;
> +
> +		if (val == 1 && !locked) {
> +			locked = true;
> +			spin_lock(lock);
> +		}
> +
> +		new = val - 1;
> +		if (new > val) {
> +			if (locked)
> +				spin_unlock(lock);
> +			BUG(); /* underflow */
> +		}
> +
> +		old = atomic_cmpxchg_release(&r->refs, val, new);
> +		if (old == val)
> +			break;
> +
> +		val = old;
> +	}
> +
> +	if (new && locked)
> +		spin_unlock(lock);
> +
> +	return !new;
> +}
> +
> +#endif /* _LINUX_REFCOUNT_H */
> 
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC][PATCH 7/7] kref: Implement using refcount_t
  2016-11-15 12:33   ` Boqun Feng
@ 2016-11-15 13:01     ` Peter Zijlstra
  2016-11-15 14:19       ` Boqun Feng
  0 siblings, 1 reply; 96+ messages in thread
From: Peter Zijlstra @ 2016-11-15 13:01 UTC (permalink / raw)
  To: Boqun Feng
  Cc: gregkh, keescook, will.deacon, elena.reshetova, arnd, tglx,
	mingo, hpa, dave, linux-kernel

On Tue, Nov 15, 2016 at 08:33:37PM +0800, Boqun Feng wrote:
> Hi Peter,
> 
> On Mon, Nov 14, 2016 at 06:39:53PM +0100, Peter Zijlstra wrote:
> [...]
> > +/*
> > + * Similar to atomic_dec_and_test(), it will BUG on underflow and fail to
> > + * decrement when saturated at UINT_MAX.
> > + *
> > + * Provides release memory ordering, such that prior loads and stores are done
> > + * before a subsequent free.
> 
> I'm not sure this is correct: the RELEASE semantics apply only to the STORE
> part of the cmpxchg, so semantically it does not guarantee that memory
> operations after the cmpxchg won't be reordered upwards. For example, on
> ARM64, the following code:
> 
> 	WRITE_ONCE(x, 1)
> 	
> 	atomic_cmpxchg_release(&a, 1, 2);
> 	  r1 = ll(&a)
> 	  if (r1 == 1) {
> 	    sc_release(&a, 2);
> 	  }
> 	
> 	free()
> 
> could be reordered as, I think:
> 
> 	atomic_cmpxchg_release(&a, 1, 2);
> 	  r1 = ll(&a)
> 	  if (r1 == 1) {
> 	    free()
> 	    WRITE_ONCE(x, 1)
> 	    sc_release(&a, 2);
> 	  }
> 
> Of course, we need to wait for Will to confirm about this. But if this
> could happen, we'd better to use a smp_mb()+atomic_cmpxchg_relaxed()
> here and for other refcount_dec_and_*().

Can't happen I think because of the control dependency between
dec_and_test() and free().

That is, the cmpxchg_release() must complete to determine if it was
successful or it needs a retry. The success, combined with the state of
the variable will then determine if we call free().

So I don't think we can get free() (which very much includes stores) to
happen before the store-release.

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC][PATCH 7/7] kref: Implement using refcount_t
  2016-11-15 10:46         ` Peter Zijlstra
@ 2016-11-15 13:03           ` Ingo Molnar
  2016-11-15 18:06             ` Kees Cook
  0 siblings, 1 reply; 96+ messages in thread
From: Ingo Molnar @ 2016-11-15 13:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: gregkh, keescook, will.deacon, elena.reshetova, arnd, tglx, hpa,
	dave, Linus Torvalds, linux-kernel


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Tue, Nov 15, 2016 at 11:03:59AM +0100, Ingo Molnar wrote:
> > > Should I also make a CONFIG knob that implements refcount_t with the
> > > 'normal' atomic_t primitives?
> > 
> > I'd suggest doing the saturation/safe-wrap semantics only for now (i.e. the 
> > current patch, split into two perhaps), and reconsider if there's any complaints?
> > 
> > > And possibly another knob to toggle the BUG()s into WARN()s. With the
> > > full saturation semantics WARN() is a lot safer and will not corrupt
> > > kernel state as much.
> > 
> > I'd suggest changing it to a WARN() straight away, no extra knobs.
> 
> OK, a little like so then? Note that the overflow tests went away
> because increments guarantee we saturate before we overflow.

Looks good to me!

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC][PATCH 7/7] kref: Implement using refcount_t
  2016-11-15 13:01     ` Peter Zijlstra
@ 2016-11-15 14:19       ` Boqun Feng
  2016-11-17  9:28         ` Peter Zijlstra
  0 siblings, 1 reply; 96+ messages in thread
From: Boqun Feng @ 2016-11-15 14:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: gregkh, keescook, will.deacon, elena.reshetova, arnd, tglx,
	mingo, hpa, dave, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2764 bytes --]

On Tue, Nov 15, 2016 at 02:01:54PM +0100, Peter Zijlstra wrote:
> On Tue, Nov 15, 2016 at 08:33:37PM +0800, Boqun Feng wrote:
> > Hi Peter,
> > 
> > On Mon, Nov 14, 2016 at 06:39:53PM +0100, Peter Zijlstra wrote:
> > [...]
> > > +/*
> > > + * Similar to atomic_dec_and_test(), it will BUG on underflow and fail to
> > > + * decrement when saturated at UINT_MAX.
> > > + *
> > > + * Provides release memory ordering, such that prior loads and stores are done
> > > + * before a subsequent free.
> > 
> > I'm not sure this is correct: the RELEASE semantics apply only to the STORE
> > part of the cmpxchg, so semantically it does not guarantee that memory
> > operations after the cmpxchg won't be reordered upwards. For example, on
> > ARM64, the following code:
> > 
> > 	WRITE_ONCE(x, 1)
> > 	
> > 	atomic_cmpxchg_release(&a, 1, 2);
> > 	  r1 = ll(&a)
> > 	  if (r1 == 1) {
> > 	    sc_release(&a, 2);
> > 	  }
> > 	
> > 	free()
> > 
> > could be reordered as, I think:
> > 
> > 	atomic_cmpxchg_release(&a, 1, 2);
> > 	  r1 = ll(&a)
> > 	  if (r1 == 1) {
> > 	    free()
> > 	    WRITE_ONCE(x, 1)
> > 	    sc_release(&a, 2);
> > 	  }
> > 
> > Of course, we need to wait for Will to confirm about this. But if this
> > could happen, we'd better to use a smp_mb()+atomic_cmpxchg_relaxed()
> > here and for other refcount_dec_and_*().
> 
> Can't happen I think because of the control dependency between
> dec_and_test() and free().
> 
> That is, the cmpxchg_release() must complete to determine if it was
> successful or it needs a retry. The success, combined with the state of
> the variable will then determine if we call free().
> 

The thing is that determination of the variable's state (i.e.
store_release() succeeds) and the actual writeback to memory are two
separate events. So yes, free() won't execute before store_release()
commits successfully, but there is no barrier here to order the memory
effects of store_release() and free().

See a similar example:

https://marc.info/?l=linux-s390&m=146604339321723&w=2


But as I said, we actually only need the pairing of orderings:

1) load part of cmpxchg -> free() 
2) object accesses -> store part of cmpxchg

Ordering #1 can be achieved via control dependency as you pointed out
that free()s very much includes stores. And ordering #2 can be achieved
with RELEASE.

So the code is right, I just thought the comment may be misleading. The
reason we use cmpxchg_release() is just for achieving ordering #2, and
not to order "prior loads and stores" with "a subsequent free".

Am I missing some subtle orderings here?

Regards,
Boqun

> So I don't think we can get free() (which very much includes stores) to
> happen before the store-release.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC][PATCH 0/7] kref improvements
  2016-11-15  7:42   ` Ingo Molnar
@ 2016-11-15 15:05     ` Greg KH
  0 siblings, 0 replies; 96+ messages in thread
From: Greg KH @ 2016-11-15 15:05 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, keescook, will.deacon, elena.reshetova, arnd,
	tglx, hpa, dave, linux-kernel

On Tue, Nov 15, 2016 at 08:42:45AM +0100, Ingo Molnar wrote:
> * Greg KH <gregkh@linuxfoundation.org> wrote:
> > On Mon, Nov 14, 2016 at 06:39:46PM +0100, Peter Zijlstra wrote:
> > Thanks so much for doing these, at the very least, I want to take the
> > kref-abuse-fixes now as those users shouldn't be doing those foolish
> > things.  Any objection for me taking some of them through my tree now?
> 
> Very nice series indeed!
> 
> We normally route atomics related patches through tip:locking/core (there's also 
> tip:atomic/core), but this is a special case I think, given how broadly it 
> interacts with driver code.
> 
> So both would work I think: we could concentrate these and only these patches into 
> tip:atomic/core into an append-only tree, or you can carry them in the driver tree 
> - whichever variant you prefer!

I have no objection for them going through a tip:whataver tree/branch,
if you all want to do that, it makes it easier for me :)

I'm guessing another series will come out before they get merged there,
so I'll hold off on providing an Acked-by: until then.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH] printk, locking/atomics, kref: Introduce new %pAr and %pAk format string options for atomic_t and 'struct kref'
  2016-11-15  8:37       ` [PATCH] printk, locking/atomics, kref: Introduce new %pAr and %pAk format string options for atomic_t and 'struct kref' Ingo Molnar
  2016-11-15  8:43         ` [PATCH v2] " Ingo Molnar
@ 2016-11-15 16:42         ` Linus Torvalds
  2016-11-16  8:13           ` Ingo Molnar
  1 sibling, 1 reply; 96+ messages in thread
From: Linus Torvalds @ 2016-11-15 16:42 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Greg KH, Christoph Hellwig, Peter Zijlstra, Kees Cook,
	Will Deacon, elena.reshetova, Arnd Bergmann, Thomas Gleixner,
	Peter Anvin, dave, Linux Kernel Mailing List

On Tue, Nov 15, 2016 at 12:37 AM, Ingo Molnar <mingo@kernel.org> wrote:
> > +atomic variables such as atomic_t or struct kref:
> +
> +       %pAr    atomic_t count
> +       %pAk    struct kref count

Not a huge fan. That "r" makes sense to you ("raw" atomic), but it
makes no sense to a user. An atomic isn't "raw" to anybody else. It's
just an atomic.

Also, we have 'atomic64_t", which this doesn't cover at all.

I'd suggest just %pA, %pA64, %pAkref or something. Which leaves us the
choice to add more atomic versions later without having to make up
random one-letter things that make no sense.

           Linus

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC][PATCH 7/7] kref: Implement using refcount_t
  2016-11-15 13:03           ` Ingo Molnar
@ 2016-11-15 18:06             ` Kees Cook
  2016-11-15 19:16               ` Peter Zijlstra
  0 siblings, 1 reply; 96+ messages in thread
From: Kees Cook @ 2016-11-15 18:06 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Greg KH, Will Deacon, Reshetova, Elena,
	Arnd Bergmann, Thomas Gleixner, H. Peter Anvin, David Windsor,
	Linus Torvalds, LKML

On Tue, Nov 15, 2016 at 5:03 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Peter Zijlstra <peterz@infradead.org> wrote:
>
>> On Tue, Nov 15, 2016 at 11:03:59AM +0100, Ingo Molnar wrote:
>> > > Should I also make a CONFIG knob that implements refcount_t with the
>> > > 'normal' atomic_t primitives?
>> >
>> > I'd suggest doing the saturation/safe-wrap semantics only for now (i.e. the
>> > current patch, split into two perhaps), and reconsider if there's any complaints?
>> >
>> > > And possibly another knob to toggle the BUG()s into WARN()s. With the
>> > > full saturation semantics WARN() is a lot safer and will not corrupt
>> > > kernel state as much.

I'll want to modify this in the future; I have a config already doing
"Bug on data structure corruption" that makes the warn/bug choice.
It'll need some massaging to fit into the new refcount_t checks, but
it should be okay -- there needs to be a way to complete the
saturation, etc, but still kill the offending process group.

-Kees

-- 
Kees Cook
Nexus Security

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC][PATCH 7/7] kref: Implement using refcount_t
  2016-11-15 18:06             ` Kees Cook
@ 2016-11-15 19:16               ` Peter Zijlstra
  2016-11-15 19:23                 ` Kees Cook
  0 siblings, 1 reply; 96+ messages in thread
From: Peter Zijlstra @ 2016-11-15 19:16 UTC (permalink / raw)
  To: Kees Cook, Ingo Molnar
  Cc: Greg KH, Will Deacon, Reshetova, Elena, Arnd Bergmann,
	Thomas Gleixner, H. Peter Anvin, David Windsor, Linus Torvalds,
	LKML



On 15 November 2016 19:06:28 CET, Kees Cook <keescook@chromium.org> wrote:

>I'll want to modify this in the future; I have a config already doing
>"Bug on data structure corruption" that makes the warn/bug choice.
>It'll need some massaging to fit into the new refcount_t checks, but
>it should be okay -- there needs to be a way to complete the
>saturation, etc, but still kill the offending process group.

Ideally we'd create a new WARN-like construct that continues in kernel space and terminates the process on return to user. That way there would be minimal kernel state corruption.

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC][PATCH 7/7] kref: Implement using refcount_t
  2016-11-15 19:16               ` Peter Zijlstra
@ 2016-11-15 19:23                 ` Kees Cook
  2016-11-16  8:31                   ` Ingo Molnar
  0 siblings, 1 reply; 96+ messages in thread
From: Kees Cook @ 2016-11-15 19:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Greg KH, Will Deacon, Reshetova, Elena,
	Arnd Bergmann, Thomas Gleixner, H. Peter Anvin, David Windsor,
	Linus Torvalds, LKML

On Tue, Nov 15, 2016 at 11:16 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>
>
> On 15 November 2016 19:06:28 CET, Kees Cook <keescook@chromium.org> wrote:
>
>>I'll want to modify this in the future; I have a config already doing
>>"Bug on data structure corruption" that makes the warn/bug choice.
>>It'll need some massaging to fit into the new refcount_t checks, but
>>it should be okay -- there needs to be a way to complete the
>>saturation, etc, but still kill the offending process group.
>
> Ideally we'd create a new WARN like construct that continues in kernel space and terminates the process on return to user. That way there would be minimal kernel state corruption.

Right, though I'd like to be conservative about the kernel execution
continuing... I'll experiment with it.

-Kees

-- 
Kees Cook
Nexus Security

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC][PATCH 2/7] kref: Add kref_read()
  2016-11-15  8:03     ` Peter Zijlstra
@ 2016-11-15 20:53       ` Kees Cook
  2016-11-16  8:21         ` Greg KH
  2016-11-16 10:09         ` Peter Zijlstra
  0 siblings, 2 replies; 96+ messages in thread
From: Kees Cook @ 2016-11-15 20:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Greg KH, Will Deacon, Reshetova, Elena, Arnd Bergmann,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, David Windsor,
	LKML, Alexei Starovoitov, Daniel Borkmann

On Tue, Nov 15, 2016 at 12:03 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, Nov 15, 2016 at 08:33:22AM +0100, Greg KH wrote:
>> On Mon, Nov 14, 2016 at 06:39:48PM +0100, Peter Zijlstra wrote:
>
>> > --- a/drivers/block/drbd/drbd_req.c
>> > +++ b/drivers/block/drbd/drbd_req.c
>> > @@ -520,7 +520,7 @@ static void mod_rq_state(struct drbd_req
>> >             /* Completion does it's own kref_put.  If we are going to
>> >              * kref_sub below, we need req to be still around then. */
>> >             int at_least = k_put + !!c_put;
>> > -           int refcount = atomic_read(&req->kref.refcount);
>> > +           int refcount = kref_read(&req->kref);
>> >             if (refcount < at_least)
>> >                     drbd_err(device,
>> >                             "mod_rq_state: Logic BUG: %x -> %x: refcount = %d, should be >= %d\n",
>>
>> As proof of "things you should never do", here is one such example.
>>
>> ugh.
>>
>>
>> > --- a/drivers/block/virtio_blk.c
>> > +++ b/drivers/block/virtio_blk.c
>> > @@ -767,7 +767,7 @@ static void virtblk_remove(struct virtio
>> >     /* Stop all the virtqueues. */
>> >     vdev->config->reset(vdev);
>> >
>> > -   refc = atomic_read(&disk_to_dev(vblk->disk)->kobj.kref.refcount);
>> > +   refc = kref_read(&disk_to_dev(vblk->disk)->kobj.kref);
>> >     put_disk(vblk->disk);
>> >     vdev->config->del_vqs(vdev);
>> >     kfree(vblk->vqs);
>>
>> And this too, ugh, that's a huge abuse and is probably totally wrong...
>>
>> thanks again for digging through this crap.  I wonder if we need to name
>> the kref reference variable "do_not_touch_this_ever" or some such thing
>> to catch all of the people who try to be "too smart".
>
> There's unimaginable bong hits involved in this stuff, in the end I
> resorted to brute force and scripts to convert all this.

What should we do about things like this (bpf_prog_put() and callbacks
from kernel/bpf/syscall.c):


static void bpf_prog_uncharge_memlock(struct bpf_prog *prog)
{
        struct user_struct *user = prog->aux->user;

        atomic_long_sub(prog->pages, &user->locked_vm);
        free_uid(user);
}

static void __bpf_prog_put_rcu(struct rcu_head *rcu)
{
        struct bpf_prog_aux *aux = container_of(rcu, struct bpf_prog_aux, rcu);

        free_used_maps(aux);
        bpf_prog_uncharge_memlock(aux->prog);
        bpf_prog_free(aux->prog);
}

void bpf_prog_put(struct bpf_prog *prog)
{
        if (atomic_dec_and_test(&prog->aux->refcnt))
                call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu);
}


Not only do we want to protect prog->aux->refcnt, but I think we want
to protect user->locked_vm too ... I don't think it's sane for
user->locked_vm to be a stats_t ?

-Kees

-- 
Kees Cook
Nexus Security

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH] printk, locking/atomics, kref: Introduce new %pAr and %pAk format string options for atomic_t and 'struct kref'
  2016-11-15 16:42         ` [PATCH] " Linus Torvalds
@ 2016-11-16  8:13           ` Ingo Molnar
  0 siblings, 0 replies; 96+ messages in thread
From: Ingo Molnar @ 2016-11-16  8:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Greg KH, Christoph Hellwig, Peter Zijlstra, Kees Cook,
	Will Deacon, elena.reshetova, Arnd Bergmann, Thomas Gleixner,
	Peter Anvin, dave, Linux Kernel Mailing List


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Tue, Nov 15, 2016 at 12:37 AM, Ingo Molnar <mingo@kernel.org> wrote:
> > +atomic variables such as atomic_t or struct kref:
> > +
> > +       %pAr    atomic_t count
> > +       %pAk    struct kref count
> 
> Not a huge fan. That "r" makes sense to you ("raw" atomic), but it
> makes no sense to a user. An atomic isn't "raw" to anybody else. It's
> just an atomic.

So in the latest patch this has evolved to:

        %pAa - print the 'atomic_t'    count in decimal
        %pAk - print the 'struct kref' count in decimal
        %pAr - print the 'refcount_t'  count in decimal

... are you still hating it?

> Also, we have 'atomic64_t", which this doesn't cover at all.

We could use a somewhat logical letter for atomic64_t too:

	%pAA - print the 'atomic64_t'  count in decimal

... as 'A' is the bigger version of 'a', just like atomic64_t is the bigger 
version of atomic_t! ;-)

> I'd suggest just %pA, %pA64, %pAkref or something. Which leaves us the
> choice to add more atomic versions later without having to make up
> random one-letter things that make no sense.

It's a bit more work, but we could do that too, if you still don't like the above 
single letter abbreviations.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC][PATCH 2/7] kref: Add kref_read()
  2016-11-15 20:53       ` Kees Cook
@ 2016-11-16  8:21         ` Greg KH
  2016-11-16 10:10           ` Peter Zijlstra
  2016-11-16 10:11           ` Daniel Borkmann
  2016-11-16 10:09         ` Peter Zijlstra
  1 sibling, 2 replies; 96+ messages in thread
From: Greg KH @ 2016-11-16  8:21 UTC (permalink / raw)
  To: Kees Cook
  Cc: Peter Zijlstra, Will Deacon, Reshetova, Elena, Arnd Bergmann,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, David Windsor,
	LKML, Alexei Starovoitov, Daniel Borkmann

On Tue, Nov 15, 2016 at 12:53:35PM -0800, Kees Cook wrote:
> On Tue, Nov 15, 2016 at 12:03 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Tue, Nov 15, 2016 at 08:33:22AM +0100, Greg KH wrote:
> >> On Mon, Nov 14, 2016 at 06:39:48PM +0100, Peter Zijlstra wrote:
> >
> >> > --- a/drivers/block/drbd/drbd_req.c
> >> > +++ b/drivers/block/drbd/drbd_req.c
> >> > @@ -520,7 +520,7 @@ static void mod_rq_state(struct drbd_req
> >> >             /* Completion does it's own kref_put.  If we are going to
> >> >              * kref_sub below, we need req to be still around then. */
> >> >             int at_least = k_put + !!c_put;
> >> > -           int refcount = atomic_read(&req->kref.refcount);
> >> > +           int refcount = kref_read(&req->kref);
> >> >             if (refcount < at_least)
> >> >                     drbd_err(device,
> >> >                             "mod_rq_state: Logic BUG: %x -> %x: refcount = %d, should be >= %d\n",
> >>
> >> As proof of "things you should never do", here is one such example.
> >>
> >> ugh.
> >>
> >>
> >> > --- a/drivers/block/virtio_blk.c
> >> > +++ b/drivers/block/virtio_blk.c
> >> > @@ -767,7 +767,7 @@ static void virtblk_remove(struct virtio
> >> >     /* Stop all the virtqueues. */
> >> >     vdev->config->reset(vdev);
> >> >
> >> > -   refc = atomic_read(&disk_to_dev(vblk->disk)->kobj.kref.refcount);
> >> > +   refc = kref_read(&disk_to_dev(vblk->disk)->kobj.kref);
> >> >     put_disk(vblk->disk);
> >> >     vdev->config->del_vqs(vdev);
> >> >     kfree(vblk->vqs);
> >>
> >> And this too, ugh, that's a huge abuse and is probably totally wrong...
> >>
> >> thanks again for digging through this crap.  I wonder if we need to name
> >> the kref reference variable "do_not_touch_this_ever" or some such thing
> >> to catch all of the people who try to be "too smart".
> >
> > There's unimaginable bong hits involved in this stuff, in the end I
> > resorted to brute force and scripts to convert all this.
> 
> What should we do about things like this (bpf_prog_put() and callbacks
> from kernel/bpf/syscall.c):
> 
> 
> static void bpf_prog_uncharge_memlock(struct bpf_prog *prog)
> {
>         struct user_struct *user = prog->aux->user;
> 
>         atomic_long_sub(prog->pages, &user->locked_vm);

Oh that's scary.  Let's just make one reference count rely on another
one and not check things...

>         free_uid(user);
> }
> 
> static void __bpf_prog_put_rcu(struct rcu_head *rcu)
> {
>         struct bpf_prog_aux *aux = container_of(rcu, struct bpf_prog_aux, rcu);
> 
>         free_used_maps(aux);
>         bpf_prog_uncharge_memlock(aux->prog);
>         bpf_prog_free(aux->prog);
> }
> 
> void bpf_prog_put(struct bpf_prog *prog)
> {
>         if (atomic_dec_and_test(&prog->aux->refcnt))
>                 call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu);
> }
> 
> 
> Not only do we want to protect prog->aux->refcnt, but I think we want
> to protect user->locked_vm too ... I don't think it's sane for
> user->locked_vm to be a stats_t ?

I don't think this is sane code...

greg k-h

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC][PATCH 7/7] kref: Implement using refcount_t
  2016-11-15 19:23                 ` Kees Cook
@ 2016-11-16  8:31                   ` Ingo Molnar
  2016-11-16  8:51                     ` Greg KH
                                       ` (2 more replies)
  0 siblings, 3 replies; 96+ messages in thread
From: Ingo Molnar @ 2016-11-16  8:31 UTC (permalink / raw)
  To: Kees Cook
  Cc: Peter Zijlstra, Greg KH, Will Deacon, Reshetova, Elena,
	Arnd Bergmann, Thomas Gleixner, H. Peter Anvin, David Windsor,
	Linus Torvalds, LKML


* Kees Cook <keescook@chromium.org> wrote:

> On Tue, Nov 15, 2016 at 11:16 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> >
> > On 15 November 2016 19:06:28 CET, Kees Cook <keescook@chromium.org> wrote:
> >
> >>I'll want to modify this in the future; I have a config already doing
> >>"Bug on data structure corruption" that makes the warn/bug choice.
> >>It'll need some massaging to fit into the new refcount_t checks, but
> >>it should be okay -- there needs to be a way to complete the
> >>saturation, etc, but still kill the offending process group.
> >
> > Ideally we'd create a new WARN like construct that continues in kernel space 
> > and terminates the process on return to user. That way there would be minimal 
> > kernel state corruption.

Yeah, so the problem is that sometimes you are p0wned the moment you return to a 
corrupted stack, and some of these checks only detect corruption after the fact.

> Right, though I'd like to be conservative about the kernel execution 
> continuing... I'll experiment with it.

So what I'd love to see is to have a kernel option that re-introduces some 
historic root (and other) holes that can be exploited deterministically - 
obviously default disabled.

I'd restrict this to reasonably 'deterministic' holes, and the exploits themselves 
could be somewhere in tools/. (Obviously only where the maintainers agree to host 
the code.) They wouldn't give a root shell, they'd only test whether they reached 
uid0 (or some other elevated privilege).

The advantages of such a suite would be:

 - Uptodate tests on modern kernels: It would allow the (controlled) testing of 
   live kernel exploits even on the latest kernel - and would allow the testing of 
   various defensive measures.

 - It would also make sure that defensive measures _remain_ effective against 
   similar categories of bugs. We've had defensive measure regressions in the 
   past, which was only discovered when the next exploit came out ...

 - Testing of new defensive measures: It would help convert this whole 
   probabilistic and emotion driven "kernel protection" business into something 
   somewhat more rational. For example new protection mechanisms should have a 
   demonstrated ability to turn an existing exploit test into something less 
   dangerous.

 - Education: It would teach kernel developers the various patterns of holes, 
   right in the code. Maybe being more directly exposed to what can get you p0wned 
   is both a stronger education force plus it could give people ideas about how to 
   protect better.

 - I also think that collecting the various problems into a single place will give 
   us new insights into patterns, bug counts and various exploit techniques.

The disadvantages would be:

 - Maintenance: do we want to add extra (compiled out by default) code to the 
   kernel whose only purpose is to demonstrate certain types of bugs?

 - Exposing exploits: Do we want to host a powerful collection of almost-exploits 
   in tools/ ? I don't think we have a choice but to face the problem directly - 
   but others might disagree.

I think most of the negatives could be kept small by starting small, allowing 
maintainers to explicitly opt-in, and observing the effects as we go. But YMMV.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC][PATCH 7/7] kref: Implement using refcount_t
  2016-11-16  8:31                   ` Ingo Molnar
@ 2016-11-16  8:51                     ` Greg KH
  2016-11-16  9:07                       ` Ingo Molnar
  2016-11-16 10:15                     ` Peter Zijlstra
  2016-11-16 18:41                     ` Kees Cook
  2 siblings, 1 reply; 96+ messages in thread
From: Greg KH @ 2016-11-16  8:51 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kees Cook, Peter Zijlstra, Will Deacon, Reshetova, Elena,
	Arnd Bergmann, Thomas Gleixner, H. Peter Anvin, David Windsor,
	Linus Torvalds, LKML

On Wed, Nov 16, 2016 at 09:31:55AM +0100, Ingo Molnar wrote:
> So what I'd love to see is to have a kernel option that re-introduces some 
> historic root (and other) holes that can be exploited deterministically - 
> obviously default disabled.

Ick, I don't want to have to support nasty #ifdefs for
"CONFIG_TOTALLY_INSECURE" type options in code logic for the next 20+
years, do you?

> I'd restrict this to reasonably 'deterministic' holes, and the exploits themselves 
> could be somewhere in tools/. (Obviously only where the maintainers agree to host 
> the code.) They wouldn't give a root shell, they'd only test whether they reached 
> uid0 (or some other elevated privilege).

Having exploits in tools/ would be good, I would like to see that, as
then we can ensure that we don't ever introduce old problems that we
have fixed again in the future.  That I have no objection to.

thanks,

greg k-h


* Re: [RFC][PATCH 7/7] kref: Implement using refcount_t
  2016-11-16  8:51                     ` Greg KH
@ 2016-11-16  9:07                       ` Ingo Molnar
  2016-11-16  9:24                         ` Greg KH
  0 siblings, 1 reply; 96+ messages in thread
From: Ingo Molnar @ 2016-11-16  9:07 UTC (permalink / raw)
  To: Greg KH
  Cc: Kees Cook, Peter Zijlstra, Will Deacon, Reshetova, Elena,
	Arnd Bergmann, Thomas Gleixner, H. Peter Anvin, David Windsor,
	Linus Torvalds, LKML


* Greg KH <gregkh@linuxfoundation.org> wrote:

> On Wed, Nov 16, 2016 at 09:31:55AM +0100, Ingo Molnar wrote:
> > So what I'd love to see is to have a kernel option that re-introduces some 
> > historic root (and other) holes that can be exploited deterministically - 
> > obviously default disabled.
> 
> Ick, I don't want to have to support nasty #ifdefs for
> "CONFIG_TOTALLY_INSECURE" type options in code logic for the next 20+
> years, do you?

I'd write it in C, not CPP, so it would be an 'if', but yeah, it would be extra 
code otherwise.

So I'd restrict this strictly to cases:

 - Where the maintainer absolutely agrees to carry it.

 - Where it's still easy to do technically - for example a single unobtrusive 
   'if' condition or so, in cases where the current upstream code still has a 
   similar structure conducive to the re-introduction of the bug. Such testcases
   can be dropped the moment they interfere with active development.

 - Plus an additional approach could be that some of the typical holes can be
   reproduced in completely separate code that is not seen by anyone who doesn't 
   want to see it.

I doubt many bugs have 20-year lifetimes in the face of frequent code reorganization 
- and if code is static for 20 years then there won't be much extra maintenance 
overhead, right?

> > I'd restrict this to reasonably 'deterministic' holes, and the exploits themselves 
> > could be somewhere in tools/. (Obviously only where the maintainers agree to host 
> > the code.) They wouldn't give a root shell, they'd only test whether they reached 
> > uid0 (or some other elevated privilege).
> 
> Having exploits in tools/ would be good, I would like to see that, as
> then we can ensure that we don't ever introduce old problems that we
> have fixed again in the future.  That I have no objection to.

Heh, I actually guessed that this would be the more contentious part of my 
suggestion - go figure! ;-)

Thanks,

	Ingo


* Re: [RFC][PATCH 7/7] kref: Implement using refcount_t
  2016-11-16  9:07                       ` Ingo Molnar
@ 2016-11-16  9:24                         ` Greg KH
  0 siblings, 0 replies; 96+ messages in thread
From: Greg KH @ 2016-11-16  9:24 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kees Cook, Peter Zijlstra, Will Deacon, Reshetova, Elena,
	Arnd Bergmann, Thomas Gleixner, H. Peter Anvin, David Windsor,
	Linus Torvalds, LKML

On Wed, Nov 16, 2016 at 10:07:37AM +0100, Ingo Molnar wrote:
> 
> * Greg KH <gregkh@linuxfoundation.org> wrote:
> 
> > On Wed, Nov 16, 2016 at 09:31:55AM +0100, Ingo Molnar wrote:
> > > So what I'd love to see is to have a kernel option that re-introduces some 
> > > historic root (and other) holes that can be exploited deterministically - 
> > > obviously default disabled.
> > 
> > Ick, I don't want to have to support nasty #ifdefs for
> > "CONFIG_TOTALLY_INSECURE" type options in code logic for the next 20+
> > years, do you?
> 
> I'd write it in C, not CPP, so it would be an 'if', but yeah, it would be extra 
> code otherwise.
> 
> So I'd restrict this strictly to cases:
> 
>  - Where the maintainer absolutely agrees to carry it.
> 
>  - Where it's still easy to do technically - for example a single unobtrusive 
>    'if' condition or so, in cases where the current upstream code still has a 
>    similar structure conducive to the re-introduction of the bug. Such testcases
>    can be dropped the moment they interfere with active development.
> 
>  - Plus an additional approach could be that some of the typical holes can be
>    reproduced in completely separate code that is not seen by anyone who doesn't 
>    want to see it.

Ok, but in looking at a number of "security" fixes over the past year or
so, I don't think that many of them would really work well for this.
Just look at all of the "don't reference a NULL pointer" bugs for an
example of that.

> I doubt many bugs have 20-year lifetimes in the face of frequent code reorganization 
> - and if code is static for 20 years then there won't be much extra maintenance 
> overhead, right?

Hah, you obviously are not in charge of maintaining the tty layer :)

Anyway, if you want to try this for the next type of security "issue" in
your area of the kernel, be my guest, but I think it's going to be a lot
harder than you think.

thanks,

greg k-h


* Re: [RFC][PATCH 2/7] kref: Add kref_read()
  2016-11-15 20:53       ` Kees Cook
  2016-11-16  8:21         ` Greg KH
@ 2016-11-16 10:09         ` Peter Zijlstra
  2016-11-16 18:58           ` Kees Cook
  1 sibling, 1 reply; 96+ messages in thread
From: Peter Zijlstra @ 2016-11-16 10:09 UTC (permalink / raw)
  To: Kees Cook
  Cc: Greg KH, Will Deacon, Reshetova, Elena, Arnd Bergmann,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, David Windsor,
	LKML, Alexei Starovoitov, Daniel Borkmann

On Tue, Nov 15, 2016 at 12:53:35PM -0800, Kees Cook wrote:
> 
> What should we do about things like this (bpf_prog_put() and callbacks
> from kernel/bpf/syscall.c):
> 
> 
> static void bpf_prog_uncharge_memlock(struct bpf_prog *prog)
> {
>         struct user_struct *user = prog->aux->user;
> 
>         atomic_long_sub(prog->pages, &user->locked_vm);
>         free_uid(user);
> }
> 
> static void __bpf_prog_put_rcu(struct rcu_head *rcu)
> {
>         struct bpf_prog_aux *aux = container_of(rcu, struct bpf_prog_aux, rcu);
> 
>         free_used_maps(aux);
>         bpf_prog_uncharge_memlock(aux->prog);
>         bpf_prog_free(aux->prog);
> }
> 
> void bpf_prog_put(struct bpf_prog *prog)
> {
>         if (atomic_dec_and_test(&prog->aux->refcnt))
>                 call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu);
> }
> 
> 
> Not only do we want to protect prog->aux->refcnt, but I think we want
> to protect user->locked_vm too ... I don't think it's sane for
> user->locked_vm to be a stats_t ?

Why would you want to mess with locked_vm? You seem of the opinion that
everything atomic_t is broken; this isn't the case.


* Re: [RFC][PATCH 2/7] kref: Add kref_read()
  2016-11-16  8:21         ` Greg KH
@ 2016-11-16 10:10           ` Peter Zijlstra
  2016-11-16 10:18             ` Greg KH
  2016-11-16 10:11           ` Daniel Borkmann
  1 sibling, 1 reply; 96+ messages in thread
From: Peter Zijlstra @ 2016-11-16 10:10 UTC (permalink / raw)
  To: Greg KH
  Cc: Kees Cook, Will Deacon, Reshetova, Elena, Arnd Bergmann,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, David Windsor,
	LKML, Alexei Starovoitov, Daniel Borkmann

On Wed, Nov 16, 2016 at 09:21:51AM +0100, Greg KH wrote:
> > What should we do about things like this (bpf_prog_put() and callbacks
> > from kernel/bpf/syscall.c):
> > 
> > 
> > static void bpf_prog_uncharge_memlock(struct bpf_prog *prog)
> > {
> >         struct user_struct *user = prog->aux->user;
> > 
> >         atomic_long_sub(prog->pages, &user->locked_vm);
> 
> Oh that's scary.  Let's just make one reference count rely on another
> one and not check things...

It's not a reference count, it's a resource limit thingy. Also, isn't
stacking, or in general building an object graph, the entire point of
reference counts?

> >         free_uid(user);
> > }
> > 
> > static void __bpf_prog_put_rcu(struct rcu_head *rcu)
> > {
> >         struct bpf_prog_aux *aux = container_of(rcu, struct bpf_prog_aux, rcu);
> > 
> >         free_used_maps(aux);
> >         bpf_prog_uncharge_memlock(aux->prog);
> >         bpf_prog_free(aux->prog);
> > }
> > 
> > void bpf_prog_put(struct bpf_prog *prog)
> > {
> >         if (atomic_dec_and_test(&prog->aux->refcnt))
> >                 call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu);
> > }
> > 
> > 
> > Not only do we want to protect prog->aux->refcnt, but I think we want
> > to protect user->locked_vm too ... I don't think it's sane for
> > user->locked_vm to be a stats_t ?
> 
> I don't think this is sane code...

I once again fail to see any problems. That code is fine.
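
[Editor's note: the distinction Peter draws here can be sketched in plain C11. The snippet below is a userspace toy model, not kernel code; `struct user_model`, `charge_memlock()` and `uncharge_memlock()` are invented names for illustration. A resource counter like locked_vm is charged and uncharged in matched pairs against a limit; it does not control object lifetime, so it does not need refcount-style overflow protection.]

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy userspace model (NOT kernel code) of a locked_vm-style resource
 * limit: pages are charged against a limit on allocation and uncharged
 * exactly once on free, by the object's (separately reference-counted)
 * teardown path. */
struct user_model {
	atomic_long locked_vm;
};

/* Charge 'pages'; back out and fail if that would exceed 'limit'. */
static bool charge_memlock(struct user_model *u, long pages, long limit)
{
	long cur = atomic_fetch_add(&u->locked_vm, pages) + pages;

	if (cur > limit) {
		atomic_fetch_sub(&u->locked_vm, pages);	/* over limit: undo */
		return false;
	}
	return true;
}

static void uncharge_memlock(struct user_model *u, long pages)
{
	atomic_fetch_sub(&u->locked_vm, pages);
}
```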


* Re: [RFC][PATCH 2/7] kref: Add kref_read()
  2016-11-16  8:21         ` Greg KH
  2016-11-16 10:10           ` Peter Zijlstra
@ 2016-11-16 10:11           ` Daniel Borkmann
  2016-11-16 10:19             ` Greg KH
  1 sibling, 1 reply; 96+ messages in thread
From: Daniel Borkmann @ 2016-11-16 10:11 UTC (permalink / raw)
  To: Greg KH, Kees Cook
  Cc: Peter Zijlstra, Will Deacon, Reshetova, Elena, Arnd Bergmann,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, David Windsor,
	LKML, Alexei Starovoitov

On 11/16/2016 09:21 AM, Greg KH wrote:
> On Tue, Nov 15, 2016 at 12:53:35PM -0800, Kees Cook wrote:
>> On Tue, Nov 15, 2016 at 12:03 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>>> On Tue, Nov 15, 2016 at 08:33:22AM +0100, Greg KH wrote:
>>>> On Mon, Nov 14, 2016 at 06:39:48PM +0100, Peter Zijlstra wrote:
>>>
>>>>> --- a/drivers/block/drbd/drbd_req.c
>>>>> +++ b/drivers/block/drbd/drbd_req.c
>>>>> @@ -520,7 +520,7 @@ static void mod_rq_state(struct drbd_req
>>>>>              /* Completion does it's own kref_put.  If we are going to
>>>>>               * kref_sub below, we need req to be still around then. */
>>>>>              int at_least = k_put + !!c_put;
>>>>> -           int refcount = atomic_read(&req->kref.refcount);
>>>>> +           int refcount = kref_read(&req->kref);
>>>>>              if (refcount < at_least)
>>>>>                      drbd_err(device,
>>>>>                              "mod_rq_state: Logic BUG: %x -> %x: refcount = %d, should be >= %d\n",
>>>>
>>>> As proof of "things you should never do", here is one such example.
>>>>
>>>> ugh.
>>>>
>>>>> --- a/drivers/block/virtio_blk.c
>>>>> +++ b/drivers/block/virtio_blk.c
>>>>> @@ -767,7 +767,7 @@ static void virtblk_remove(struct virtio
>>>>>      /* Stop all the virtqueues. */
>>>>>      vdev->config->reset(vdev);
>>>>>
>>>>> -   refc = atomic_read(&disk_to_dev(vblk->disk)->kobj.kref.refcount);
>>>>> +   refc = kref_read(&disk_to_dev(vblk->disk)->kobj.kref);
>>>>>      put_disk(vblk->disk);
>>>>>      vdev->config->del_vqs(vdev);
>>>>>      kfree(vblk->vqs);
>>>>
>>>> And this too, ugh, that's a huge abuse and is probably totally wrong...
>>>>
>>>> thanks again for digging through this crap.  I wonder if we need to name
>>>> the kref reference variable "do_not_touch_this_ever" or some such thing
>>>> to catch all of the people who try to be "too smart".
>>>
>>> There's unimaginable bong hits involved in this stuff, in the end I
>>> resorted to brute force and scripts to convert all this.
>>
>> What should we do about things like this (bpf_prog_put() and callbacks
>> from kernel/bpf/syscall.c):

Just reading up on this series. Your question refers to converting bpf
prog and map ref counts to Peter's refcount_t eventually, right?

>> static void bpf_prog_uncharge_memlock(struct bpf_prog *prog)
>> {
>>          struct user_struct *user = prog->aux->user;
>>
>>          atomic_long_sub(prog->pages, &user->locked_vm);
>
> Oh that's scary.  Let's just make one reference count rely on another
> one and not check things...

Sorry, could you elaborate on what you mean by 'check things'? Do you mean
wrap-around? IIUC, back then the accounting was roughly modeled after perf
events', and in this case it accounts for pages used by progs and maps
during their lifetime. Are you suggesting that this approach is inherently
broken?

>>          free_uid(user);
>> }
>>
>> static void __bpf_prog_put_rcu(struct rcu_head *rcu)
>> {
>>          struct bpf_prog_aux *aux = container_of(rcu, struct bpf_prog_aux, rcu);
>>
>>          free_used_maps(aux);
>>          bpf_prog_uncharge_memlock(aux->prog);
>>          bpf_prog_free(aux->prog);
>> }
>>
>> void bpf_prog_put(struct bpf_prog *prog)
>> {
>>          if (atomic_dec_and_test(&prog->aux->refcnt))
>>                  call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu);
>> }
>>
>>
>> Not only do we want to protect prog->aux->refcnt, but I think we want
>> to protect user->locked_vm too ... I don't think it's sane for
>> user->locked_vm to be a stats_t ?
>
> I don't think this is sane code...
>
> greg k-h


* Re: [RFC][PATCH 7/7] kref: Implement using refcount_t
  2016-11-16  8:31                   ` Ingo Molnar
  2016-11-16  8:51                     ` Greg KH
@ 2016-11-16 10:15                     ` Peter Zijlstra
  2016-11-16 18:55                       ` Kees Cook
  2016-11-16 18:41                     ` Kees Cook
  2 siblings, 1 reply; 96+ messages in thread
From: Peter Zijlstra @ 2016-11-16 10:15 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kees Cook, Greg KH, Will Deacon, Reshetova, Elena, Arnd Bergmann,
	Thomas Gleixner, H. Peter Anvin, David Windsor, Linus Torvalds,
	LKML

On Wed, Nov 16, 2016 at 09:31:55AM +0100, Ingo Molnar wrote:
> 
> * Kees Cook <keescook@chromium.org> wrote:
> 
> > On Tue, Nov 15, 2016 at 11:16 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > >
> > > On 15 November 2016 19:06:28 CET, Kees Cook <keescook@chromium.org> wrote:
> > >
> > >>I'll want to modify this in the future; I have a config already doing
> > >>"Bug on data structure corruption" that makes the warn/bug choice.
> > >>It'll need some massaging to fit into the new refcount_t checks, but
> > >>it should be okay -- there needs to be a way to complete the
> > >>saturation, etc, but still kill the offending process group.
> > >
> > > Ideally we'd create a new WARN like construct that continues in kernel space 
> > > and terminates the process on return to user. That way there would be minimal 
> > > kernel state corruption.
> 
> Yeah, so the problem is that sometimes you are p0wned the moment you return to a 
> corrupted stack, and some of these checks only detect corruption after the fact.

So the case here is about refcounts, with the saturation semantics we
avoid the use-after-free case which is all this is about. So actually
continuation of execution is harmless vs the attack vector in question.

Corrupting the stack is another attack vector, one that refcount
overflow is entirely unrelated to and not one I think we should consider
here.

The problem with BUG and insta-killing the task is that refcounts are
typically done under locks; if you kill the task before the unlock,
you've wrecked kernel state in unrecoverable ways.
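
[Editor's note: a rough userspace sketch of the saturation semantics described above. This is a toy model in C11 atomics, not the actual refcount_t implementation, and the `sat_refcount_*` names are invented. Once the counter reaches UINT_MAX it is pinned there, so an object whose refcount overflowed is never freed, and the use-after-free window this attack class relies on disappears.]

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy userspace model (NOT the kernel's refcount_t) of saturating
 * refcount semantics. */
typedef struct {
	atomic_uint refs;
} sat_refcount_t;

static void sat_refcount_inc(sat_refcount_t *r)
{
	unsigned int old = atomic_load(&r->refs);

	do {
		if (old == UINT_MAX)	/* saturated: stay pinned forever */
			return;
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old + 1));
}

/* Returns true only when the caller dropped the last reference. */
static bool sat_refcount_dec_and_test(sat_refcount_t *r)
{
	unsigned int old = atomic_load(&r->refs);

	do {
		if (old == UINT_MAX || old == 0) /* saturated or underflow */
			return false;		 /* never free these */
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old - 1));

	return old == 1;
}
```

The leak of a saturated object is deliberate: leaking memory is a far smaller problem than freeing an object that still has live users.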


* Re: [RFC][PATCH 2/7] kref: Add kref_read()
  2016-11-16 10:10           ` Peter Zijlstra
@ 2016-11-16 10:18             ` Greg KH
  0 siblings, 0 replies; 96+ messages in thread
From: Greg KH @ 2016-11-16 10:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kees Cook, Will Deacon, Reshetova, Elena, Arnd Bergmann,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, David Windsor,
	LKML, Alexei Starovoitov, Daniel Borkmann

On Wed, Nov 16, 2016 at 11:10:42AM +0100, Peter Zijlstra wrote:
> On Wed, Nov 16, 2016 at 09:21:51AM +0100, Greg KH wrote:
> > > What should we do about things like this (bpf_prog_put() and callbacks
> > > from kernel/bpf/syscall.c):
> > > 
> > > 
> > > static void bpf_prog_uncharge_memlock(struct bpf_prog *prog)
> > > {
> > >         struct user_struct *user = prog->aux->user;
> > > 
> > >         atomic_long_sub(prog->pages, &user->locked_vm);
> > 
> > Oh that's scary.  Let's just make one reference count rely on another
> > one and not check things...
> 
> It's not a reference count, it's a resource limit thingy. Also, isn't
> stacking, or in general building an object graph, the entire point of
> reference counts?

Ah, that wasn't obvious, but yes, you are correct here, sorry for the
noise.

greg k-h


* Re: [RFC][PATCH 2/7] kref: Add kref_read()
  2016-11-16 10:11           ` Daniel Borkmann
@ 2016-11-16 10:19             ` Greg KH
  0 siblings, 0 replies; 96+ messages in thread
From: Greg KH @ 2016-11-16 10:19 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Kees Cook, Peter Zijlstra, Will Deacon, Reshetova, Elena,
	Arnd Bergmann, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	David Windsor, LKML, Alexei Starovoitov

On Wed, Nov 16, 2016 at 11:11:43AM +0100, Daniel Borkmann wrote:
> On 11/16/2016 09:21 AM, Greg KH wrote:
> > On Tue, Nov 15, 2016 at 12:53:35PM -0800, Kees Cook wrote:
> > > On Tue, Nov 15, 2016 at 12:03 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > > > On Tue, Nov 15, 2016 at 08:33:22AM +0100, Greg KH wrote:
> > > > > On Mon, Nov 14, 2016 at 06:39:48PM +0100, Peter Zijlstra wrote:
> > > > 
> > > > > > --- a/drivers/block/drbd/drbd_req.c
> > > > > > +++ b/drivers/block/drbd/drbd_req.c
> > > > > > @@ -520,7 +520,7 @@ static void mod_rq_state(struct drbd_req
> > > > > >              /* Completion does it's own kref_put.  If we are going to
> > > > > >               * kref_sub below, we need req to be still around then. */
> > > > > >              int at_least = k_put + !!c_put;
> > > > > > -           int refcount = atomic_read(&req->kref.refcount);
> > > > > > +           int refcount = kref_read(&req->kref);
> > > > > >              if (refcount < at_least)
> > > > > >                      drbd_err(device,
> > > > > >                              "mod_rq_state: Logic BUG: %x -> %x: refcount = %d, should be >= %d\n",
> > > > > 
> > > > > As proof of "things you should never do", here is one such example.
> > > > > 
> > > > > ugh.
> > > > > 
> > > > > > --- a/drivers/block/virtio_blk.c
> > > > > > +++ b/drivers/block/virtio_blk.c
> > > > > > @@ -767,7 +767,7 @@ static void virtblk_remove(struct virtio
> > > > > >      /* Stop all the virtqueues. */
> > > > > >      vdev->config->reset(vdev);
> > > > > > 
> > > > > > -   refc = atomic_read(&disk_to_dev(vblk->disk)->kobj.kref.refcount);
> > > > > > +   refc = kref_read(&disk_to_dev(vblk->disk)->kobj.kref);
> > > > > >      put_disk(vblk->disk);
> > > > > >      vdev->config->del_vqs(vdev);
> > > > > >      kfree(vblk->vqs);
> > > > > 
> > > > > And this too, ugh, that's a huge abuse and is probably totally wrong...
> > > > > 
> > > > > thanks again for digging through this crap.  I wonder if we need to name
> > > > > the kref reference variable "do_not_touch_this_ever" or some such thing
> > > > > to catch all of the people who try to be "too smart".
> > > > 
> > > > There's unimaginable bong hits involved in this stuff, in the end I
> > > > resorted to brute force and scripts to convert all this.
> > > 
> > > What should we do about things like this (bpf_prog_put() and callbacks
> > > from kernel/bpf/syscall.c):
> 
> Just reading up on this series. Your question refers to converting bpf
> prog and map ref counts to Peter's refcount_t eventually, right?
> 
> > > static void bpf_prog_uncharge_memlock(struct bpf_prog *prog)
> > > {
> > >          struct user_struct *user = prog->aux->user;
> > > 
> > >          atomic_long_sub(prog->pages, &user->locked_vm);
> > 
> > Oh that's scary.  Let's just make one reference count rely on another
> > one and not check things...
> 
> Sorry, could you elaborate what you mean by 'check things', you mean for
> wrap around? IIUC, back then accounting was roughly similar modeled after
> perf event's one, and in this case accounts for pages used by progs and
> maps during their life-time. Are you suggesting that this approach is
> inherently broken?

No, it is correct, I responded too quickly before my morning coffee had
kicked in, my apologies.

greg k-h


* Re: [RFC][PATCH 7/7] kref: Implement using refcount_t
  2016-11-16  8:31                   ` Ingo Molnar
  2016-11-16  8:51                     ` Greg KH
  2016-11-16 10:15                     ` Peter Zijlstra
@ 2016-11-16 18:41                     ` Kees Cook
  2 siblings, 0 replies; 96+ messages in thread
From: Kees Cook @ 2016-11-16 18:41 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Greg KH, Will Deacon, Reshetova, Elena,
	Arnd Bergmann, Thomas Gleixner, H. Peter Anvin, David Windsor,
	Linus Torvalds, LKML

On Wed, Nov 16, 2016 at 12:31 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Kees Cook <keescook@chromium.org> wrote:
>
>> On Tue, Nov 15, 2016 at 11:16 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>> >
>> >
>> > On 15 November 2016 19:06:28 CET, Kees Cook <keescook@chromium.org> wrote:
>> >
>> >>I'll want to modify this in the future; I have a config already doing
>> >>"Bug on data structure corruption" that makes the warn/bug choice.
>> >>It'll need some massaging to fit into the new refcount_t checks, but
>> >>it should be okay -- there needs to be a way to complete the
>> >>saturation, etc, but still kill the offending process group.
>> >
>> > Ideally we'd create a new WARN like construct that continues in kernel space
>> > and terminates the process on return to user. That way there would be minimal
>> > kernel state corruption.
>
> Yeah, so the problem is that sometimes you are p0wned the moment you return to a
> corrupted stack, and some of these checks only detect corruption after the fact.

Exactly.

>> Right, though I'd like to be conservative about the kernel execution
>> continuing... I'll experiment with it.
>
> So what I'd love to see is to have a kernel option that re-introduces some
> historic root (and other) holes that can be exploited deterministically -
> obviously default disabled.
>
> I'd restrict this to reasonably 'deterministic' holes, and the exploits themselves
> could be somewhere in tools/. (Obviously only where the maintainers agree to host
> the code.) They wouldn't give a root shell, they'd only test whether they reached
> uid0 (or some other elevated privilege).

Have you looked at what lkdtm (CONFIG_LKDTM) does? It is explicitly a
collection of specific bad behaviors designed to trigger kernel flaw
mitigations.

> The advantages of such a suite would be:
>
>  - Up-to-date tests on modern kernels: It would allow the (controlled) testing of
>    live kernel exploits even on the latest kernel - and would allow the testing of
>    various defensive measures.
>
>  - It would also make sure that defensive measures _remain_ effective against
>    similar categories of bugs. We've had defensive measure regressions in the
>    past, which were only discovered when the next exploit came out ...
>
>  - Testing of new defensive measures: It would help convert this whole
>    probabilistic and emotion driven "kernel protection" business into something
>    somewhat more rational. For example new protection mechanisms should have a
>    demonstrated ability to turn an existing exploit test into something less
>    dangerous.
>
>  - Education: It would teach kernel developers the various patterns of holes,
>    right in the code. Maybe being more directly exposed to what can get you p0wned
>    is both a stronger education force plus it could give people ideas about how to
>    protect better.
>
>  - I also think that collecting the various problems into a single place will give
>    us new insights into patterns, bug counts and various exploit techniques.

Unless I'm missing some detail of your idea, lkdtm already does all of this.

> The disadvantages would be:
>
>  - Maintenance: do we want to add extra (compiled out by default) code to the
>    kernel whose only purpose is to demonstrate certain types of bugs?
>
>  - Exposing exploits: Do we want to host a powerful collection of almost-exploits
>    in tools/ ? I don't think we have a choice but to face the problem directly -
>    but others might disagree.

They don't need to be exploits to test self-protection systems.

> I think most of the negatives could be kept small by starting small, allowing
> maintainers to explicitly opt-in, and observing the effects as we go. But YMMV.

I certainly think lkdtm could be further expanded, but I'd love to see
what you think is specifically missing...

-Kees

-- 
Kees Cook
Nexus Security


* Re: [RFC][PATCH 7/7] kref: Implement using refcount_t
  2016-11-16 10:15                     ` Peter Zijlstra
@ 2016-11-16 18:55                       ` Kees Cook
  2016-11-17  8:33                         ` Peter Zijlstra
  0 siblings, 1 reply; 96+ messages in thread
From: Kees Cook @ 2016-11-16 18:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Greg KH, Will Deacon, Reshetova, Elena,
	Arnd Bergmann, Thomas Gleixner, H. Peter Anvin, David Windsor,
	Linus Torvalds, LKML

On Wed, Nov 16, 2016 at 2:15 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, Nov 16, 2016 at 09:31:55AM +0100, Ingo Molnar wrote:
>>
>> * Kees Cook <keescook@chromium.org> wrote:
>>
>> > On Tue, Nov 15, 2016 at 11:16 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > >
>> > >
>> > > On 15 November 2016 19:06:28 CET, Kees Cook <keescook@chromium.org> wrote:
>> > >
>> > >>I'll want to modify this in the future; I have a config already doing
>> > >>"Bug on data structure corruption" that makes the warn/bug choice.
>> > >>It'll need some massaging to fit into the new refcount_t checks, but
>> > >>it should be okay -- there needs to be a way to complete the
>> > >>saturation, etc, but still kill the offending process group.
>> > >
>> > > Ideally we'd create a new WARN like construct that continues in kernel space
>> > > and terminates the process on return to user. That way there would be minimal
>> > > kernel state corruption.
>>
>> Yeah, so the problem is that sometimes you are p0wned the moment you return to a
>> corrupted stack, and some of these checks only detect corruption after the fact.
>
> So the case here is about refcounts, with the saturation semantics we
> avoid the use-after-free case which is all this is about. So actually
> continuation of execution is harmless vs the attack vector in question.
>
> Corrupting the stack is another attack vector, one that refcount
> overflow is entirely unrelated to and not one I think we should consider
> here.
>
> The problem with BUG and insta killing the task is that refcounts are
> typically done under locks, if you kill the task before the unlock,
> you've wrecked kernel state in unrecoverable ways.

My intention with what I'm designing is to couple the "panic_on_oops"
sysctl logic with a "kernel structure corruption has been detected"
warning. That way, one can select, at runtime, if the kernel should
panic instantly on hitting this, or just do its best to clean things
up and kill the process. There basically isn't a use-case for BUG in
this situation. Either you're risk-averse enough to want to take the
entire machine down, or you want to kill the offending process and
clean up to continue running.
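
[Editor's note: the "BUG while holding a lock" hazard discussed in this subthread can be made concrete with a single-threaded toy model in C. All names here are invented, and an early return stands in for BUG() killing the task; the point is only that the unlock is never reached, so the lock is leaked and every later user of the object wedges.]

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy lock: atomic_flag set = held. */
static atomic_flag obj_lock = ATOMIC_FLAG_INIT;

static bool obj_trylock(void)
{
	return !atomic_flag_test_and_set(&obj_lock);
}

static void obj_unlock(void)
{
	atomic_flag_clear(&obj_lock);
}

/* Returning false models BUG() killing the task mid-critical-section. */
static bool put_ref(int refcount)
{
	if (!obj_trylock())
		return false;	/* a previous "task" died holding the lock */
	if (refcount <= 0)	/* "corruption detected": BUG() fires here... */
		return false;	/* ...so the unlock below never runs */
	obj_unlock();
	return true;
}
```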

I'm still evolving how to best do it, and right now it's a rather
large hammer (now controlled by a CONFIG called
CONFIG_BUG_ON_DATA_CORRUPTION in -next but it will likely disappear
entirely as its design has evolved). I intend to improve it first and
then expand its coverage in the kernel. It requires extracting some of
the per-arch BUG logic into a real kernel API, and combining it with
existing pieces of the WARN API.

-Kees

-- 
Kees Cook
Nexus Security


* Re: [RFC][PATCH 2/7] kref: Add kref_read()
  2016-11-16 10:09         ` Peter Zijlstra
@ 2016-11-16 18:58           ` Kees Cook
  2016-11-17  8:34             ` Peter Zijlstra
  0 siblings, 1 reply; 96+ messages in thread
From: Kees Cook @ 2016-11-16 18:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Greg KH, Will Deacon, Reshetova, Elena, Arnd Bergmann,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, David Windsor,
	LKML, Alexei Starovoitov, Daniel Borkmann

On Wed, Nov 16, 2016 at 2:09 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, Nov 15, 2016 at 12:53:35PM -0800, Kees Cook wrote:
>>
>> What should we do about things like this (bpf_prog_put() and callbacks
>> from kernel/bpf/syscall.c):
>>
>>
>> static void bpf_prog_uncharge_memlock(struct bpf_prog *prog)
>> {
>>         struct user_struct *user = prog->aux->user;
>>
>>         atomic_long_sub(prog->pages, &user->locked_vm);
>>         free_uid(user);
>> }
>>
>> static void __bpf_prog_put_rcu(struct rcu_head *rcu)
>> {
>>         struct bpf_prog_aux *aux = container_of(rcu, struct bpf_prog_aux, rcu);
>>
>>         free_used_maps(aux);
>>         bpf_prog_uncharge_memlock(aux->prog);
>>         bpf_prog_free(aux->prog);
>> }
>>
>> void bpf_prog_put(struct bpf_prog *prog)
>> {
>>         if (atomic_dec_and_test(&prog->aux->refcnt))
>>                 call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu);
>> }
>>
>>
>> Not only do we want to protect prog->aux->refcnt, but I think we want
>> to protect user->locked_vm too ... I don't think it's sane for
>> user->locked_vm to be a stats_t ?
>
> Why would you want to mess with locked_vm? You seem of the opinion that
> everything atomic_t is broken, this isn't the case.

What I mean to say is that while the refcnt here should clearly be
converted to kref or refcount_t, it looks like locked_vm should become
a new stats_t. However, it seems weird for locked_vm to ever wrap
either...

-Kees

-- 
Kees Cook
Nexus Security


* Re: [RFC][PATCH 7/7] kref: Implement using refcount_t
  2016-11-16 18:55                       ` Kees Cook
@ 2016-11-17  8:33                         ` Peter Zijlstra
  2016-11-17 19:50                           ` Kees Cook
  0 siblings, 1 reply; 96+ messages in thread
From: Peter Zijlstra @ 2016-11-17  8:33 UTC (permalink / raw)
  To: Kees Cook
  Cc: Ingo Molnar, Greg KH, Will Deacon, Reshetova, Elena,
	Arnd Bergmann, Thomas Gleixner, H. Peter Anvin, David Windsor,
	Linus Torvalds, LKML

On Wed, Nov 16, 2016 at 10:55:16AM -0800, Kees Cook wrote:
> My intention with what I'm designing is to couple the "panic_on_oops"

There is a panic_on_warn knob too.


* Re: [RFC][PATCH 2/7] kref: Add kref_read()
  2016-11-16 18:58           ` Kees Cook
@ 2016-11-17  8:34             ` Peter Zijlstra
  2016-11-17 12:30               ` David Windsor
  2016-11-17 19:34               ` Kees Cook
  0 siblings, 2 replies; 96+ messages in thread
From: Peter Zijlstra @ 2016-11-17  8:34 UTC (permalink / raw)
  To: Kees Cook
  Cc: Greg KH, Will Deacon, Reshetova, Elena, Arnd Bergmann,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, David Windsor,
	LKML, Alexei Starovoitov, Daniel Borkmann

On Wed, Nov 16, 2016 at 10:58:38AM -0800, Kees Cook wrote:
> On Wed, Nov 16, 2016 at 2:09 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Tue, Nov 15, 2016 at 12:53:35PM -0800, Kees Cook wrote:
> >>
> >> What should we do about things like this (bpf_prog_put() and callbacks
> >> from kernel/bpf/syscall.c):
> >>
> >>
> >> static void bpf_prog_uncharge_memlock(struct bpf_prog *prog)
> >> {
> >>         struct user_struct *user = prog->aux->user;
> >>
> >>         atomic_long_sub(prog->pages, &user->locked_vm);
> >>         free_uid(user);
> >> }
> >>
> >> static void __bpf_prog_put_rcu(struct rcu_head *rcu)
> >> {
> >>         struct bpf_prog_aux *aux = container_of(rcu, struct bpf_prog_aux, rcu);
> >>
> >>         free_used_maps(aux);
> >>         bpf_prog_uncharge_memlock(aux->prog);
> >>         bpf_prog_free(aux->prog);
> >> }
> >>
> >> void bpf_prog_put(struct bpf_prog *prog)
> >> {
> >>         if (atomic_dec_and_test(&prog->aux->refcnt))
> >>                 call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu);
> >> }
> >>
> >>
> >> Not only do we want to protect prog->aux->refcnt, but I think we want
> >> to protect user->locked_vm too ... I don't think it's sane for
> >> user->locked_vm to be a stats_t ?
> >
> > Why would you want to mess with locked_vm? You seem of the opinion that
> > everything atomic_t is broken, this isn't the case.
> 
> What I mean to say is that while the refcnt here should clearly be
> converted to kref or refcount_t, it looks like locked_vm should become
> a new stats_t. However, it seems weird for locked_vm to ever wrap
> either...

No, it's not a statistic. Also, I'm far from convinced stats_t is an
actually useful thing to have.

refcount_t brought special semantics that clearly are different from
regular atomic_t; stats_t would not, so why would it need to exist?

Not to mention that you seem over eager to apply it, which doesn't
inspire confidence.


* Re: [RFC][PATCH 7/7] kref: Implement using refcount_t
  2016-11-15 14:19       ` Boqun Feng
@ 2016-11-17  9:28         ` Peter Zijlstra
  2016-11-17  9:48           ` Boqun Feng
  2016-11-17 12:08           ` Will Deacon
  0 siblings, 2 replies; 96+ messages in thread
From: Peter Zijlstra @ 2016-11-17  9:28 UTC (permalink / raw)
  To: Boqun Feng
  Cc: gregkh, keescook, will.deacon, elena.reshetova, arnd, tglx,
	mingo, hpa, dave, linux-kernel

On Tue, Nov 15, 2016 at 10:19:09PM +0800, Boqun Feng wrote:
> On Tue, Nov 15, 2016 at 02:01:54PM +0100, Peter Zijlstra wrote:
> > On Tue, Nov 15, 2016 at 08:33:37PM +0800, Boqun Feng wrote:
> > > Hi Peter,
> > > 
> > > On Mon, Nov 14, 2016 at 06:39:53PM +0100, Peter Zijlstra wrote:
> > > [...]
> > > > +/*
> > > > + * Similar to atomic_dec_and_test(), it will BUG on underflow and fail to
> > > > + * decrement when saturated at UINT_MAX.
> > > > + *
> > > > + * Provides release memory ordering, such that prior loads and stores are done
> > > > + * before a subsequent free.
> > > 
> > > I'm not sure this is correct, the RELEASE semantics is for the STORE
> > > part of cmpxchg, and semantically it will guarantee that memory
> > > operations after cmpxchg won't be reordered upwards, for example, on
> > > ARM64, the following code:
> > > 
> > > 	WRITE_ONCE(x, 1)
> > > 	
> > > 	atomic_cmpxchg_release(&a, 1, 2);
> > > 	  r1 = ll(&a)
> > > 	  if (r1 == 1) {
> > > 	    sc_release(&a, 2);
> > > 	  }
> > > 	
> > > 	free()
> > > 
> > > could be reordered as, I think:
> > > 
> > > 	atomic_cmpxchg_release(&a, 1, 2);
> > > 	  r1 = ll(&a)
> > > 	  if (r1 == 1) {
> > > 	    free()
> > > 	    WRITE_ONCE(x, 1)
> > > 	    sc_release(&a, 2);
> > > 	  }
> > > 
> > > Of course, we need to wait for Will to confirm about this. But if this
> > > could happen, we'd better to use a smp_mb()+atomic_cmpxchg_relaxed()
> > > here and for other refcount_dec_and_*().
> > 
> > Can't happen I think because of the control dependency between
> > dec_and_test() and free().
> > 
> > That is, the cmpxchg_release() must complete to determine if it was
> > successful or it needs a retry. The success, combined with the state of
> > the variable will then determine if we call free().
> > 
> 
> The thing is that determination of the variable's state(i.e.
> store_release() succeeds) and the actual writeback to memory are two
> separate events. So yes, free() won't execute before store_release()
> commits successfully, but there is no barrier here to order the memory
> effects of store_release() and free().

Doesn't matter. If we dropped the refcount to 0, nobody else will be
observing this memory anymore (unless ill-formed program). The only
thing we need is that the free() will not be speculated.

This is because all RmW on a specific variable, irrespective of their
memory ordering on other loads/stores, are totally ordered against one
another.

> But as I said, we actually only need the pairing of orderings:
> 
> 1) load part of cmpxchg -> free() 
> 2) object accesses -> store part of cmpxchg
> 
> Ordering #1 can be achieved via control dependency as you pointed out
> that free()s very much includes stores. And ordering #2 can be achieved
> with RELEASE.
> 
> So the code is right, I just thought the comment may be misleading. The
> reason we use cmpxchg_release() is just for achieving ordering #2, and
> not to order "prior loads and stores" with "a subsequent free".
> 
> Am I missing some subtle orderings here?

I would want to further qualify 1): it must be no earlier than the load
of the last / successful ll/sc round.

At that point we're guaranteed a reference count of 1 that _will_ drop
to 0, and thus nobody else (should) reference that memory anymore.

If we agree on this, I'll update the comment :-) Will, do you too agree?


* Re: [RFC][PATCH 7/7] kref: Implement using refcount_t
  2016-11-17  9:28         ` Peter Zijlstra
@ 2016-11-17  9:48           ` Boqun Feng
  2016-11-17 10:29             ` Peter Zijlstra
  2016-11-17 12:08           ` Will Deacon
  1 sibling, 1 reply; 96+ messages in thread
From: Boqun Feng @ 2016-11-17  9:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: gregkh, keescook, will.deacon, elena.reshetova, arnd, tglx,
	mingo, hpa, dave, linux-kernel


On Thu, Nov 17, 2016 at 10:28:00AM +0100, Peter Zijlstra wrote:
> On Tue, Nov 15, 2016 at 10:19:09PM +0800, Boqun Feng wrote:
> > On Tue, Nov 15, 2016 at 02:01:54PM +0100, Peter Zijlstra wrote:
> > > On Tue, Nov 15, 2016 at 08:33:37PM +0800, Boqun Feng wrote:
> > > > Hi Peter,
> > > > 
> > > > On Mon, Nov 14, 2016 at 06:39:53PM +0100, Peter Zijlstra wrote:
> > > > [...]
> > > > > +/*
> > > > > + * Similar to atomic_dec_and_test(), it will BUG on underflow and fail to
> > > > > + * decrement when saturated at UINT_MAX.
> > > > > + *
> > > > > + * Provides release memory ordering, such that prior loads and stores are done
> > > > > + * before a subsequent free.
> > > > 
> > > > I'm not sure this is correct, the RELEASE semantics is for the STORE
> > > > part of cmpxchg, and semantically it will guarantee that memory
> > > > operations after cmpxchg won't be reordered upwards, for example, on
> > > > ARM64, the following code:
> > > > 
> > > > 	WRITE_ONCE(x, 1)
> > > > 	
> > > > 	atomic_cmpxchg_release(&a, 1, 2);
> > > > 	  r1 = ll(&a)
> > > > 	  if (r1 == 1) {
> > > > 	    sc_release(&a, 2);
> > > > 	  }
> > > > 	
> > > > 	free()
> > > > 
> > > > could be reordered as, I think:
> > > > 
> > > > 	atomic_cmpxchg_release(&a, 1, 2);
> > > > 	  r1 = ll(&a)
> > > > 	  if (r1 == 1) {
> > > > 	    free()
> > > > 	    WRITE_ONCE(x, 1)
> > > > 	    sc_release(&a, 2);
> > > > 	  }
> > > > 
> > > > Of course, we need to wait for Will to confirm about this. But if this
> > > > could happen, we'd better to use a smp_mb()+atomic_cmpxchg_relaxed()
> > > > here and for other refcount_dec_and_*().
> > > 
> > > Can't happen I think because of the control dependency between
> > > dec_and_test() and free().
> > > 
> > > That is, the cmpxchg_release() must complete to determine if it was
> > > successful or it needs a retry. The success, combined with the state of
> > > the variable will then determine if we call free().
> > > 
> > 
> > The thing is that determination of the variable's state(i.e.
> > store_release() succeeds) and the actual writeback to memory are two
> > separate events. So yes, free() won't execute before store_release()
> > commits successfully, but there is no barrier here to order the memory
> > effects of store_release() and free().
> 
> Doesn't matter. If we dropped the refcount to 0, nobody else will be
> observing this memory anymore (unless ill-formed program). The only
> thing we need is that the free() will not be speculated.
> 
> This is because all RmW on a specific variable, irrespective of their
> memory ordering on other loads/stores, are totally ordered against one
> another.
> 

Fair enough ;-)

> > But as I said, we actually only need the pairing of orderings:
> > 
> > 1) load part of cmpxchg -> free() 
> > 2) object accesses -> store part of cmpxchg
> > 
> > Ordering #1 can be achieved via control dependency as you pointed out
> > that free()s very much includes stores. And ordering #2 can be achieved
> > with RELEASE.
> > 
> > So the code is right, I just thought the comment may be misleading. The
> > reason we use cmpxchg_release() is just for achieving ordering #2, and
> > not to order "prior loads and stores" with "a subsequent free".
> > 
> > Am I missing some subtle orderings here?
> 
> I would want to further quality 1), it must be no earlier than the load
> of the last / successful ll/sc round.
> 

Great, that's more accurate!

> At that point we're guaranteed a reference count of 1 that _will_ drop
> to 0, and thus nobody else (should) reference that memory anymore.
> 
> If we agree on this, I'll update the comment :-) Will, do you too agree?

Agreed ;-)

Control dependencies and RELEASE are totally enough for the internal
correctness of refcount_t along with its interaction with free().
People had better not rely on ordering guarantees other than this ;-)

Regards,
Boqun



* Re: [RFC][PATCH 7/7] kref: Implement using refcount_t
  2016-11-17  9:48           ` Boqun Feng
@ 2016-11-17 10:29             ` Peter Zijlstra
  2016-11-17 10:39               ` Peter Zijlstra
       [not found]               ` <CAL0jBu-GnREUPSX4kUDp-Cc8ZGp6+Cb2q0HVandswcLzPRnChQ@mail.gmail.com>
  0 siblings, 2 replies; 96+ messages in thread
From: Peter Zijlstra @ 2016-11-17 10:29 UTC (permalink / raw)
  To: Boqun Feng
  Cc: gregkh, keescook, will.deacon, elena.reshetova, arnd, tglx,
	mingo, hpa, dave, linux-kernel

On Thu, Nov 17, 2016 at 05:48:51PM +0800, Boqun Feng wrote:
> > > But as I said, we actually only need the pairing of orderings:
> > > 
> > > 1) load part of cmpxchg -> free() 
> > > 2) object accesses -> store part of cmpxchg
> > > 
> > > Ordering #1 can be achieved via control dependency as you pointed out
> > > that free()s very much includes stores. And ordering #2 can be achieved
> > > with RELEASE.
> > > 
> > > So the code is right, I just thought the comment may be misleading. The
> > > reason we use cmpxchg_release() is just for achieving ordering #2, and
> > > not to order "prior loads and stores" with "a subsequent free".
> > > 
> > > Am I missing some subtle orderings here?
> > 
> > I would want to further qualify 1), it must be no earlier than the load
> > of the last / successful ll/sc round.
> > 
> 
> Great, that's more accurate!
> 
> > At that point we're guaranteed a reference count of 1 that _will_ drop
> > to 0, and thus nobody else (should) reference that memory anymore.
> > 
> > If we agree on this, I'll update the comment :-) Will, do you too agree?
> 
> Agreed ;-)
> 
> Control dependencies and RELEASE are totally enough for the internal
> correctness of refcount_t along with its interactivity with free().
> People better not rely on order guarantees other than this ;-)

Hurm.. let me ruin my own argument.

Since the free() stores could leak upwards until that ll, and object
stores can be delayed until the sc, we still have a problem. Just not
with the thread that free()s or any other thread that knew about the
object.

The problem comes from any other thread doing an allocation: since it's
possible to observe the memory as freed while there are stores pending
to it, those delayed stores can trample on our freshly allocated
and initialized object.

The stores must really not be before the SC, so I fear we must either
add an smp_wmb() after the release, or punt and use the fully ordered
cmpxchg().


* Re: [RFC][PATCH 7/7] kref: Implement using refcount_t
  2016-11-17 10:29             ` Peter Zijlstra
@ 2016-11-17 10:39               ` Peter Zijlstra
  2016-11-17 11:03                 ` Greg KH
       [not found]               ` <CAL0jBu-GnREUPSX4kUDp-Cc8ZGp6+Cb2q0HVandswcLzPRnChQ@mail.gmail.com>
  1 sibling, 1 reply; 96+ messages in thread
From: Peter Zijlstra @ 2016-11-17 10:39 UTC (permalink / raw)
  To: Boqun Feng
  Cc: gregkh, keescook, will.deacon, elena.reshetova, arnd, tglx,
	mingo, hpa, dave, linux-kernel

On Thu, Nov 17, 2016 at 11:29:59AM +0100, Peter Zijlstra wrote:
> On Thu, Nov 17, 2016 at 05:48:51PM +0800, Boqun Feng wrote:
> > > > But as I said, we actually only need the pairing of orderings:
> > > > 
> > > > 1) load part of cmpxchg -> free() 
> > > > 2) object accesses -> store part of cmpxchg
> > > > 
> > > > Ordering #1 can be achieved via control dependency as you pointed out
> > > > that free()s very much includes stores. And ordering #2 can be achieved
> > > > with RELEASE.
> > > > 
> > > > So the code is right, I just thought the comment may be misleading. The
> > > > reason we use cmpxchg_release() is just for achieving ordering #2, and
> > > > not to order "prior loads and stores" with "a subsequent free".
> > > > 
> > > > Am I missing some subtle orderings here?
> > > 
> > > I would want to further qualify 1), it must be no earlier than the load
> > > of the last / successful ll/sc round.
> > > 
> > 
> > Great, that's more accurate!
> > 
> > > At that point we're guaranteed a reference count of 1 that _will_ drop
> > > to 0, and thus nobody else (should) reference that memory anymore.
> > > 
> > > If we agree on this, I'll update the comment :-) Will, do you too agree?
> > 
> > Agreed ;-)
> > 
> > Control dependencies and RELEASE are totally enough for the internal
> > correctness of refcount_t along with its interactivity with free().
> > People better not rely on order guarantees other than this ;-)
> 
> Hurm.. let me ruin my own argument.
> 
> Since the free() stores could leak upwards until that ll, and object
> stores can be delayed until the sc, we still have a problem. Just not
> with the thread that free()s or any other thread that knew about the
> object.
> 
> The problem comes from any other thread doing an allocation, since its
> possible to observe the memory as freed while there are stores pending
> to it, we can have those delayed stores trample on our freshly allocated
> and initialized object.
> 
> The stores must really not be before the SC, so I fear we must either
> add an smp_wmb() after the release, or punt and use the fully ordered
> cmpxchg().

And let me note here that RCU users can use a fully relaxed put, because
call_rcu() guarantees a grace-period between the call_rcu and the
free(), which in turn provides a full memory barrier that orders things.

We could actually expose that to driver writers by doing something like:

	kref_put_rcu(struct kref *kref, struct rcu_head *head, rcu_callback_t func)
	{
		if (refcount_dec_and_test_relaxed(&kref->refcount))
			call_rcu(head, func);
	}

Do we want to go there?


* Re: [RFC][PATCH 7/7] kref: Implement using refcount_t
  2016-11-17 10:39               ` Peter Zijlstra
@ 2016-11-17 11:03                 ` Greg KH
  2016-11-17 12:48                   ` Peter Zijlstra
  0 siblings, 1 reply; 96+ messages in thread
From: Greg KH @ 2016-11-17 11:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Boqun Feng, keescook, will.deacon, elena.reshetova, arnd, tglx,
	mingo, hpa, dave, linux-kernel

On Thu, Nov 17, 2016 at 11:39:27AM +0100, Peter Zijlstra wrote:
> On Thu, Nov 17, 2016 at 11:29:59AM +0100, Peter Zijlstra wrote:
> > On Thu, Nov 17, 2016 at 05:48:51PM +0800, Boqun Feng wrote:
> > > > > But as I said, we actually only need the pairing of orderings:
> > > > > 
> > > > > 1) load part of cmpxchg -> free() 
> > > > > 2) object accesses -> store part of cmpxchg
> > > > > 
> > > > > Ordering #1 can be achieved via control dependency as you pointed out
> > > > > that free()s very much includes stores. And ordering #2 can be achieved
> > > > > with RELEASE.
> > > > > 
> > > > > So the code is right, I just thought the comment may be misleading. The
> > > > > reason we use cmpxchg_release() is just for achieving ordering #2, and
> > > > > not to order "prior loads and stores" with "a subsequent free".
> > > > > 
> > > > > Am I missing some subtle orderings here?
> > > > 
> > > > I would want to further qualify 1), it must be no earlier than the load
> > > > of the last / successful ll/sc round.
> > > > 
> > > 
> > > Great, that's more accurate!
> > > 
> > > > At that point we're guaranteed a reference count of 1 that _will_ drop
> > > > to 0, and thus nobody else (should) reference that memory anymore.
> > > > 
> > > > If we agree on this, I'll update the comment :-) Will, do you too agree?
> > > 
> > > Agreed ;-)
> > > 
> > > Control dependencies and RELEASE are totally enough for the internal
> > > correctness of refcount_t along with its interactivity with free().
> > > People better not rely on order guarantees other than this ;-)
> > 
> > Hurm.. let me ruin my own argument.
> > 
> > Since the free() stores could leak upwards until that ll, and object
> > stores can be delayed until the sc, we still have a problem. Just not
> > with the thread that free()s or any other thread that knew about the
> > object.
> > 
> > The problem comes from any other thread doing an allocation, since its
> > possible to observe the memory as freed while there are stores pending
> > to it, we can have those delayed stores trample on our freshly allocated
> > and initialized object.
> > 
> > The stores must really not be before the SC, so I fear we must either
> > add an smp_wmb() after the release, or punt and use the fully ordered
> > cmpxchg().
> 
> And let me note here that RCU users can use a fully relaxed put, because
> call_rcu() guarantees a grace-period between the call_rcu and the
> free(), which in turn provides a full memory barrier that orders things.
> 
> We could actually expose that to driver writers by doing something like:
> 
> 	kref_put_rcu(struct kref *kref, struct rcu_head *head, rcu_callback_t func)
> 	{
> 		if (refcount_dec_and_test_relaxed(&kref->refcount))
> 			call_rcu(head, func);
> 	}
> 
> Do we want to go there?

No, please no...


* Re: [RFC][PATCH 7/7] kref: Implement using refcount_t
       [not found]               ` <CAL0jBu-GnREUPSX4kUDp-Cc8ZGp6+Cb2q0HVandswcLzPRnChQ@mail.gmail.com>
@ 2016-11-17 12:08                 ` Peter Zijlstra
  0 siblings, 0 replies; 96+ messages in thread
From: Peter Zijlstra @ 2016-11-17 12:08 UTC (permalink / raw)
  To: Boqun Feng
  Cc: H. Peter Anvin, Arnd Bergmann, linux-kernel, tglx, will.deacon,
	dave, keescook, mingo, elena.reshetova, gregkh

On Thu, Nov 17, 2016 at 06:54:32PM +0800, Boqun Feng wrote:
> 
> If our freshly allocated object got trampled by stores in free(), that's the
> problem of allocator and free(), right? Because in that case, it's them who
> mess up the synchronization.
> 

Oh right, duh.


* Re: [RFC][PATCH 7/7] kref: Implement using refcount_t
  2016-11-17  9:28         ` Peter Zijlstra
  2016-11-17  9:48           ` Boqun Feng
@ 2016-11-17 12:08           ` Will Deacon
  2016-11-17 16:11             ` Peter Zijlstra
  1 sibling, 1 reply; 96+ messages in thread
From: Will Deacon @ 2016-11-17 12:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Boqun Feng, gregkh, keescook, elena.reshetova, arnd, tglx, mingo,
	hpa, dave, linux-kernel

On Thu, Nov 17, 2016 at 10:28:00AM +0100, Peter Zijlstra wrote:
> On Tue, Nov 15, 2016 at 10:19:09PM +0800, Boqun Feng wrote:
> > But as I said, we actually only need the pairing of orderings:
> > 
> > 1) load part of cmpxchg -> free() 
> > 2) object accesses -> store part of cmpxchg
> > 
> > Ordering #1 can be achieved via control dependency as you pointed out
> > that free()s very much includes stores. And ordering #2 can be achieved
> > with RELEASE.
> > 
> > So the code is right, I just thought the comment may be misleading. The
> > reason we use cmpxchg_release() is just for achieving ordering #2, and
> > not to order "prior loads and stores" with "a subsequent free".
> > 
> > Am I missing some subtle orderings here?
> 
> I would want to further qualify 1), it must be no earlier than the load
> of the last / successful ll/sc round.
> 
> At that point we're guaranteed a reference count of 1 that _will_ drop
> to 0, and thus nobody else (should) reference that memory anymore.
> 
> If we agree on this, I'll update the comment :-) Will, do you too agree?

All sounds reasonable to me. It's worth pointing out that you can't create
order using a control dependency hanging off the status flag of a
store-conditional, but the code in question here has the dependency from
the loaded value, which is sufficient.

Will


* Re: [RFC][PATCH 2/7] kref: Add kref_read()
  2016-11-17  8:34             ` Peter Zijlstra
@ 2016-11-17 12:30               ` David Windsor
  2016-11-17 12:43                 ` Peter Zijlstra
  2016-11-17 19:34               ` Kees Cook
  1 sibling, 1 reply; 96+ messages in thread
From: David Windsor @ 2016-11-17 12:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kees Cook, Greg KH, Will Deacon, Reshetova, Elena, Arnd Bergmann,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, LKML,
	Alexei Starovoitov, Daniel Borkmann

On Thu, Nov 17, 2016 at 3:34 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, Nov 16, 2016 at 10:58:38AM -0800, Kees Cook wrote:
>> On Wed, Nov 16, 2016 at 2:09 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Tue, Nov 15, 2016 at 12:53:35PM -0800, Kees Cook wrote:
>> >>
>> >> What should we do about things like this (bpf_prog_put() and callbacks
>> >> from kernel/bpf/syscall.c):
>> >>
>> >>
>> >> static void bpf_prog_uncharge_memlock(struct bpf_prog *prog)
>> >> {
>> >>         struct user_struct *user = prog->aux->user;
>> >>
>> >>         atomic_long_sub(prog->pages, &user->locked_vm);
>> >>         free_uid(user);
>> >> }
>> >>
>> >> static void __bpf_prog_put_rcu(struct rcu_head *rcu)
>> >> {
>> >>         struct bpf_prog_aux *aux = container_of(rcu, struct bpf_prog_aux, rcu);
>> >>
>> >>         free_used_maps(aux);
>> >>         bpf_prog_uncharge_memlock(aux->prog);
>> >>         bpf_prog_free(aux->prog);
>> >> }
>> >>
>> >> void bpf_prog_put(struct bpf_prog *prog)
>> >> {
>> >>         if (atomic_dec_and_test(&prog->aux->refcnt))
>> >>                 call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu);
>> >> }
>> >>
>> >>
>> >> Not only do we want to protect prog->aux->refcnt, but I think we want
>> >> to protect user->locked_vm too ... I don't think it's sane for
>> >> user->locked_vm to be a stats_t ?
>> >
>> > Why would you want to mess with locked_vm? You seem of the opinion that
>> > everything atomic_t is broken, this isn't the case.
>>
>> What I mean to say is that while the refcnt here should clearly be
>> converted to kref or refcount_t, it looks like locked_vm should become
>> a new stats_t. However, it seems weird for locked_vm to ever wrap
>> either...
>
> No, its not a statistic. Also, I'm far from convinced stats_t is an
> actually useful thing to have.
>

Regarding this, has there been any thought given as to how stats_t
will meaningfully differ from atomic_t?  If refcount_t is semantically
"atomic_t with reference counter overflow protection," what
services/guarantees does stats_t provide?  I cannot think of any that
don't require implementing overflow detection of some sort, which
incurs a performance hit.

One conceivable service/guarantee would be to give stats_t the ability
to detect and report when an overflow has occurred, without the
offending process ultimately getting killed.  On x86, this could be
done by having stats_t overflows generate a different exception number
and corresponding handler than refcount_t-generated overflows.  It
would still contain the mechanisms for detecting and responding to
overflows, but the response to stats_t overflows would differ from
that of refcount_t overflows.  Semantically, this version of stats_t
would be "refcount_t minus 'kill the offending process'."  I'm not
sure if this abstraction is in fact useful, or indeed worth the
requisite performance hit; I'm just suggesting a possible semantic
difference between atomic_t and stats_t.

> refcount_t brought special semantics that clearly are different from
> regular atomic_t, stats_t would not, so why would it need to exist.
>
> Not to mention that you seem over eager to apply it, which doesn't
> inspire confidence.


* Re: [RFC][PATCH 2/7] kref: Add kref_read()
  2016-11-17 12:30               ` David Windsor
@ 2016-11-17 12:43                 ` Peter Zijlstra
  2016-11-17 13:01                   ` Reshetova, Elena
  0 siblings, 1 reply; 96+ messages in thread
From: Peter Zijlstra @ 2016-11-17 12:43 UTC (permalink / raw)
  To: David Windsor
  Cc: Kees Cook, Greg KH, Will Deacon, Reshetova, Elena, Arnd Bergmann,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, LKML,
	Alexei Starovoitov, Daniel Borkmann

On Thu, Nov 17, 2016 at 07:30:29AM -0500, David Windsor wrote:
> On Thu, Nov 17, 2016 at 3:34 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > No, its not a statistic. Also, I'm far from convinced stats_t is an
> > actually useful thing to have.
> >
> 
> Regarding this, has there been any thought given as to how stats_t
> will meaningfully differ from atomic_t?  If refcount_t is semantically
> "atomic_t with reference counter overflow protection," what
> services/guarantees does stats_t provide?  I cannot think of any that
> don't require implementing overflow detection of some sort, which
> incurs a performance hit.

Afaict the whole point of stats_t was to allow overflow, since it's only
stats, nobody cares, etc.

I think the sole motivator is a general distaste of atomic_t, which
isn't a good reason at all.


* Re: [RFC][PATCH 7/7] kref: Implement using refcount_t
  2016-11-17 11:03                 ` Greg KH
@ 2016-11-17 12:48                   ` Peter Zijlstra
  0 siblings, 0 replies; 96+ messages in thread
From: Peter Zijlstra @ 2016-11-17 12:48 UTC (permalink / raw)
  To: Greg KH
  Cc: Boqun Feng, keescook, will.deacon, elena.reshetova, arnd, tglx,
	mingo, hpa, dave, linux-kernel

On Thu, Nov 17, 2016 at 12:03:33PM +0100, Greg KH wrote:
> On Thu, Nov 17, 2016 at 11:39:27AM +0100, Peter Zijlstra wrote:

> > And let me note here that RCU users can use a fully relaxed put, because
> > call_rcu() guarantees a grace-period between the call_rcu and the
> > free(), which in turn provides a full memory barrier that orders things.
> > 
> > We could actually expose that to driver writers by doing something like:
> > 
> > 	kref_put_rcu(struct kref *kref, struct rcu_head *head, rcu_callback_t func)
> > 	{
> > 		if (refcount_dec_and_test_relaxed(&kref->refcount))
> > 			call_rcu(head, func);
> > 	}
> > 
> > Do we want to go there?
> 
> No, please no...

Hehe, fair enough. Although core kernel 'power' users might really want
that. The relaxed memory order stuff saves quite a few cycles on weak
archs.


* RE: [RFC][PATCH 2/7] kref: Add kref_read()
  2016-11-17 12:43                 ` Peter Zijlstra
@ 2016-11-17 13:01                   ` Reshetova, Elena
  2016-11-17 13:22                     ` Peter Zijlstra
  0 siblings, 1 reply; 96+ messages in thread
From: Reshetova, Elena @ 2016-11-17 13:01 UTC (permalink / raw)
  To: Peter Zijlstra, David Windsor
  Cc: Kees Cook, Greg KH, Will Deacon, Arnd Bergmann, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, LKML, Alexei Starovoitov,
	Daniel Borkmann


On Thu, Nov 17, 2016 at 07:30:29AM -0500, David Windsor wrote:
> On Thu, Nov 17, 2016 at 3:34 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > No, its not a statistic. Also, I'm far from convinced stats_t is an 
> > actually useful thing to have.
> >
> 
> Regarding this, has there been any thought given as to how stats_t 
> will meaningfully differ from atomic_t?  If refcount_t is semantically 
> "atomic_t with reference counter overflow protection," what 
> services/guarantees does stats_t provide?  I cannot think of any that 
> don't require implementing overflow detection of some sort, which 
> incurs a performance hit.

>Afaict the whole point of stats_t was to allow overflow, since its only stats, nobody cares etc..

>I think the sole motivator is a general distaste of atomic_t, which isn't a good reason at all.

I don't think anyone has this as motivation. But atomic_t is so powerful and flexible that it easily ends up being misused (as past CVEs have shown).
Even if we now find all occurrences of atomic_t used as a refcounter (which we cannot actually guarantee in any case unless someone manually reads every line)
and convert them to refcount_t, the atomic_t type is still present and new usage of it as a refcount will creep in. It is just a matter of time IMO.

So, this approach still doesn't solve the main problem: abuse of atomic_t as a refcounter and the security vulnerabilities that result.
What other mechanisms can we think we can utilize to prevent it? 
- Checkpatch? It would be hard to write enough rules to find all the possible patterns in which creative people might use atomic_t as a refcounter.
- People reviewing the code? Many kernel vulnerabilities live outside of the core kernel, where maintainers are careful about what gets in and what does not. The further you get from the core kernel (especially once you reach non-upstream drivers), the lower the code review quality and the higher the chance of a mistake, and on average that code has more vulnerabilities. We can't say "this is not upstream code, who cares", because we want the Linux kernel to follow the "secure by default" principle: to provide enough mechanisms in the kernel itself to minimize the risk of mistakes and vulnerabilities. I think atomic_t is a great example of such a case. We need to make it hard for people to make mistakes with overflows when overflows actually matter.
This was really the reason for our initial approach, which provided "security by default". Certainly it had some issues (we all agree on this), but let's think about how else we can provide "secure by default" protection for this.

 


* Re: [RFC][PATCH 2/7] kref: Add kref_read()
  2016-11-17 13:01                   ` Reshetova, Elena
@ 2016-11-17 13:22                     ` Peter Zijlstra
  2016-11-17 15:42                       ` Reshetova, Elena
  2016-11-17 18:02                       ` Reshetova, Elena
  0 siblings, 2 replies; 96+ messages in thread
From: Peter Zijlstra @ 2016-11-17 13:22 UTC (permalink / raw)
  To: Reshetova, Elena
  Cc: David Windsor, Kees Cook, Greg KH, Will Deacon, Arnd Bergmann,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, LKML,
	Alexei Starovoitov, Daniel Borkmann

On Thu, Nov 17, 2016 at 01:01:49PM +0000, Reshetova, Elena wrote:

> >I think the sole motivator is a general distaste of atomic_t, which isn't a good reason at all.
> 
> I don't think anyone has this as motivation. But atomic_t is so
> powerful and flexible that easily ends up being misused (as past CVEs
> shown). 

I don't think using atomic_t as a reference count is abuse. There simply
wasn't anything better. The proposed refcount_t cures this.

> Even if we now find all occurrences of atomic_t used as
> refcounter (which we cannot actually guarantee in any case unless
> someone manually reads every line) and convert it to refcount_t, we
> still have atomic_t type present and new usage of it as refount will
> crawl in. It is just a matter of time IMO.

Improve tooling. The patterns shouldn't be _that_ hard to find. Once the
tools are good, new code isn't a problem either.

Anything: atomic*_{{dec,sub}_and_test,{add,sub}_return,fetch_{add,sub}}
followed by a call_rcu()/free().

^ permalink raw reply	[flat|nested] 96+ messages in thread

* RE: [RFC][PATCH 2/7] kref: Add kref_read()
  2016-11-17 13:22                     ` Peter Zijlstra
@ 2016-11-17 15:42                       ` Reshetova, Elena
  2016-11-17 18:02                       ` Reshetova, Elena
  1 sibling, 0 replies; 96+ messages in thread
From: Reshetova, Elena @ 2016-11-17 15:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: David Windsor, Kees Cook, Greg KH, Will Deacon, Arnd Bergmann,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, LKML,
	Alexei Starovoitov, Daniel Borkmann

> Even if we now find all occurrences of atomic_t used as refcounter 
> (which we cannot actually guarantee in any case unless someone 
> manually reads every line) and convert it to refcount_t, we still have 
> atomic_t type present and new usage of it as refount will crawl in. It 
> is just a matter of time IMO.

>Improve tooling. The patterns shouldn't be _that_ hard to find. Once the tools are good, new code isn't a problem either.

>Anything: atomic*_{{dec,sub}_and_test,{add,sub}_return,fetch_{add,sub}}
>followed by a call_rcu()/free().

Unfortunately that does not find everything. Even if you add atomic*_add_unless() to the above, and also things like schedule_work(), I still fear we aren't covering everything. 
What is worse, I don't think there is a mechanism that can guarantee full coverage. 

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC][PATCH 7/7] kref: Implement using refcount_t
  2016-11-17 12:08           ` Will Deacon
@ 2016-11-17 16:11             ` Peter Zijlstra
  2016-11-17 16:36               ` Will Deacon
  0 siblings, 1 reply; 96+ messages in thread
From: Peter Zijlstra @ 2016-11-17 16:11 UTC (permalink / raw)
  To: Will Deacon
  Cc: Boqun Feng, gregkh, keescook, elena.reshetova, arnd, tglx, mingo,
	hpa, dave, linux-kernel

On Thu, Nov 17, 2016 at 12:08:36PM +0000, Will Deacon wrote:
> All sounds reasonable to me. It's worth pointing out that you can't create
> order using a control dependency hanging off the status flag of a
> store-conditional, but the code in question here has the dependency from
> the loaded value, which is sufficient.

Yeah, I'm always surprised by that 'feature'. Is that ARM specific? Or
do more LL/SC archs have this?

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC][PATCH 7/7] kref: Implement using refcount_t
  2016-11-17 16:11             ` Peter Zijlstra
@ 2016-11-17 16:36               ` Will Deacon
  2016-11-18  8:26                 ` Boqun Feng
  0 siblings, 1 reply; 96+ messages in thread
From: Will Deacon @ 2016-11-17 16:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Boqun Feng, gregkh, keescook, elena.reshetova, arnd, tglx, mingo,
	hpa, dave, linux-kernel

On Thu, Nov 17, 2016 at 05:11:10PM +0100, Peter Zijlstra wrote:
> On Thu, Nov 17, 2016 at 12:08:36PM +0000, Will Deacon wrote:
> > All sounds reasonable to me. It's worth pointing out that you can't create
> > order using a control dependency hanging off the status flag of a
> > store-conditional, but the code in question here has the dependency from
> > the loaded value, which is sufficient.
> 
> Yeah, I'm always surprised by that 'feature'. Is that ARM specific? Or
> so more LL/SC archs have this?

In general, I'm not sure, but I think PPC does allow for the control
dependency.

Will

^ permalink raw reply	[flat|nested] 96+ messages in thread

* RE: [RFC][PATCH 2/7] kref: Add kref_read()
  2016-11-17 13:22                     ` Peter Zijlstra
  2016-11-17 15:42                       ` Reshetova, Elena
@ 2016-11-17 18:02                       ` Reshetova, Elena
  2016-11-17 19:10                         ` Peter Zijlstra
  2016-11-17 19:29                         ` Peter Zijlstra
  1 sibling, 2 replies; 96+ messages in thread
From: Reshetova, Elena @ 2016-11-17 18:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: David Windsor, Kees Cook, Greg KH, Will Deacon, Arnd Bergmann,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, LKML,
	Alexei Starovoitov, Daniel Borkmann


> Even if we now find all occurrences of atomic_t used as refcounter 
> (which we cannot actually guarantee in any case unless someone 
> manually reads every line) and convert it to refcount_t, we still have 
> atomic_t type present and new usage of it as refount will crawl in. It 
> is just a matter of time IMO.

>Improve tooling. The patterns shouldn't be _that_ hard to find. Once the tools are good, new code isn't a problem either.

Moreover, thinking of out-of-tree drivers: do you think they would always run checkpatch or some of our tools for security checks?

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC][PATCH 2/7] kref: Add kref_read()
  2016-11-17 18:02                       ` Reshetova, Elena
@ 2016-11-17 19:10                         ` Peter Zijlstra
  2016-11-17 19:29                         ` Peter Zijlstra
  1 sibling, 0 replies; 96+ messages in thread
From: Peter Zijlstra @ 2016-11-17 19:10 UTC (permalink / raw)
  To: Reshetova, Elena
  Cc: David Windsor, Kees Cook, Greg KH, Will Deacon, Arnd Bergmann,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, LKML,
	Alexei Starovoitov, Daniel Borkmann

On Thu, Nov 17, 2016 at 06:02:33PM +0000, Reshetova, Elena wrote:
> 
> > Even if we now find all occurrences of atomic_t used as refcounter 
> > (which we cannot actually guarantee in any case unless someone 
> > manually reads every line) and convert it to refcount_t, we still have 
> > atomic_t type present and new usage of it as refount will crawl in. It 
> > is just a matter of time IMO.
> 
> >Improve tooling. The patterns shouldn't be _that_ hard to find. Once the tools are good, new code isn't a problem either.
> 
> Moreover, thinking of out of tree drivers: you think they would always
> do checkpatch or run some of our tools for security checks?

If they can't be arsed, neither can I. You can't fix the unfixable.

Like I said before, its chasing unicorns.

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC][PATCH 2/7] kref: Add kref_read()
  2016-11-17 18:02                       ` Reshetova, Elena
  2016-11-17 19:10                         ` Peter Zijlstra
@ 2016-11-17 19:29                         ` Peter Zijlstra
  1 sibling, 0 replies; 96+ messages in thread
From: Peter Zijlstra @ 2016-11-17 19:29 UTC (permalink / raw)
  To: Reshetova, Elena
  Cc: David Windsor, Kees Cook, Greg KH, Will Deacon, Arnd Bergmann,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, LKML,
	Alexei Starovoitov, Daniel Borkmann

On Thu, Nov 17, 2016 at 06:02:33PM +0000, Reshetova, Elena wrote:

> >Improve tooling. The patterns shouldn't be _that_ hard to find. Once the tools are good, new code isn't a problem either.
> 
> Moreover, thinking of out of tree drivers: you think they would always
> do checkpatch or run some of our tools for security checks?

Also, checkpatch is a horrid example. That's mostly meaningless and
menial noise. Nobody wants to run that, even if, between all the
gibberish, it lists a few sensible things.

Make an always enabled GCC plugin that generates build warns with a low
enough false positive rate and nobody will complain.

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC][PATCH 2/7] kref: Add kref_read()
  2016-11-17  8:34             ` Peter Zijlstra
  2016-11-17 12:30               ` David Windsor
@ 2016-11-17 19:34               ` Kees Cook
  1 sibling, 0 replies; 96+ messages in thread
From: Kees Cook @ 2016-11-17 19:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Greg KH, Will Deacon, Reshetova, Elena, Arnd Bergmann,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, David Windsor,
	LKML, Alexei Starovoitov, Daniel Borkmann

On Thu, Nov 17, 2016 at 12:34 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, Nov 16, 2016 at 10:58:38AM -0800, Kees Cook wrote:
>> What I mean to say is that while the refcnt here should clearly be
>> converted to kref or refcount_t, it looks like locked_vm should become
>> a new stats_t. However, it seems weird for locked_vm to ever wrap
>> either...
>
> No, its not a statistic. Also, I'm far from convinced stats_t is an
> actually useful thing to have.

It's useful because its introduction creates a type that can't be
trivially used for refcounting (i.e. hard to make the mistake of using
stats_t for refcounting), and replacing atomic_t statistic counters
with stats_t reduces the effort required to do the initial (and
ongoing) audit for misuse of atomic_t as a refcounter.

> refcount_t brought special semantics that clearly are different from
> regular atomic_t, stats_t would not, so why would it need to exist.

Your original suggestion about stats_t showed how its accessor API
would be a very small subset of the regular atomic_t set. I think that
reduction in accidental misuse has value.

> Not to mention that you seem over eager to apply it, which doesn't
> inspire confidence.

I'd like to get to the point where auditing for mistakes in this area
is tractable. :) If atomic_t is only used for non-stats and
non-refcount, it's much much easier to examine and reason about.

-Kees

-- 
Kees Cook
Nexus Security

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC][PATCH 7/7] kref: Implement using refcount_t
  2016-11-17  8:33                         ` Peter Zijlstra
@ 2016-11-17 19:50                           ` Kees Cook
  0 siblings, 0 replies; 96+ messages in thread
From: Kees Cook @ 2016-11-17 19:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Greg KH, Will Deacon, Reshetova, Elena,
	Arnd Bergmann, Thomas Gleixner, H. Peter Anvin, David Windsor,
	Linus Torvalds, LKML

On Thu, Nov 17, 2016 at 12:33 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, Nov 16, 2016 at 10:55:16AM -0800, Kees Cook wrote:
>> My intention with what I'm designing is to couple the "panic_on_oops"
>
> There is a panic_on_warn knob too.

Yes, and that tends to be "too much". There is a need to create
something between warn and bug.

-Kees

-- 
Kees Cook
Nexus Security

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC][PATCH 7/7] kref: Implement using refcount_t
  2016-11-17 16:36               ` Will Deacon
@ 2016-11-18  8:26                 ` Boqun Feng
  2016-11-18 10:16                   ` Will Deacon
  0 siblings, 1 reply; 96+ messages in thread
From: Boqun Feng @ 2016-11-18  8:26 UTC (permalink / raw)
  To: Will Deacon
  Cc: Peter Zijlstra, gregkh, keescook, elena.reshetova, arnd, tglx,
	mingo, hpa, dave, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1696 bytes --]

On Thu, Nov 17, 2016 at 04:36:24PM +0000, Will Deacon wrote:
> On Thu, Nov 17, 2016 at 05:11:10PM +0100, Peter Zijlstra wrote:
> > On Thu, Nov 17, 2016 at 12:08:36PM +0000, Will Deacon wrote:
> > > All sounds reasonable to me. It's worth pointing out that you can't create
> > > order using a control dependency hanging off the status flag of a
> > > store-conditional, but the code in question here has the dependency from
> > > the loaded value, which is sufficient.
> > 
> > Yeah, I'm always surprised by that 'feature'. Is that ARM specific? Or
> > so more LL/SC archs have this?
> 
> In general, I'm not sure, but I think PPC does allow for the control
> dependency.
> 

You guys mean the "control dependency" from an sc to a subsequent write,
like in the following litmus test?

	PPC sc-control
	""
	{
	0:r11=x;0:r12=y;0:r3=1;0:r10=0;
	1:r11=x;1:r12=y;
	}

	P0                   | P1             ;
	lwarx r2, r10, r11   | lwz r2, 0(r12) ;
	stwcx. r3, r10, r11  | lwsync         ;
	bne Fail             |                ;
	stw r3, 0(r12)       | lwz r1, 0(r11) ;
	Fail:                |                ;

	exists
	(1:r2 = 1 /\ x = 1 /\ 1:r1 = 0)

PPCMEM and herd both said the exists-clause could be triggered
"Sometimes".

And ISA said:

"""
Because a Store Conditional instruction may complete before its store
has been performed, a conditional Branch instruction that depends on the
CR0 value set by a Store Conditional instruction does not order the
Store Conditional's store with respect to storage accesses caused by
instructions that follow the Branch.
"""

So ppc doesn't honor this "control dependency". ;-)

Regards,
Boqun


> Will

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 96+ messages in thread

* RE: [RFC][PATCH 7/7] kref: Implement using refcount_t
  2016-11-14 17:39 ` [RFC][PATCH 7/7] kref: Implement using refcount_t Peter Zijlstra
  2016-11-15  8:40   ` Ingo Molnar
  2016-11-15 12:33   ` Boqun Feng
@ 2016-11-18 10:07   ` Reshetova, Elena
  2016-11-18 11:37     ` Peter Zijlstra
  2016-11-18 10:47   ` Reshetova, Elena
  2017-01-26 23:14   ` Kees Cook
  4 siblings, 1 reply; 96+ messages in thread
From: Reshetova, Elena @ 2016-11-18 10:07 UTC (permalink / raw)
  To: Peter Zijlstra, gregkh, keescook, will.deacon, arnd, tglx, mingo,
	hpa, dave
  Cc: linux-kernel

>Provide refcount_t, an atomic_t like primitive built just for refcounting.
>It provides overflow and underflow checks as well as saturation semantics such that when it overflows, we'll never attempt to free it again, ever.

Peter do you have the changes to the refcount_t interface compare to the version in this patch? 
We are now starting working on atomic_t --> refcount_t conversions and it would save a bit of work to have latest version from you that we can be based upon. 

Best Regards,
Elena.

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC][PATCH 7/7] kref: Implement using refcount_t
  2016-11-18  8:26                 ` Boqun Feng
@ 2016-11-18 10:16                   ` Will Deacon
  0 siblings, 0 replies; 96+ messages in thread
From: Will Deacon @ 2016-11-18 10:16 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Peter Zijlstra, gregkh, keescook, elena.reshetova, arnd, tglx,
	mingo, hpa, dave, linux-kernel

On Fri, Nov 18, 2016 at 04:26:34PM +0800, Boqun Feng wrote:
> On Thu, Nov 17, 2016 at 04:36:24PM +0000, Will Deacon wrote:
> > On Thu, Nov 17, 2016 at 05:11:10PM +0100, Peter Zijlstra wrote:
> > > On Thu, Nov 17, 2016 at 12:08:36PM +0000, Will Deacon wrote:
> > > > All sounds reasonable to me. It's worth pointing out that you can't create
> > > > order using a control dependency hanging off the status flag of a
> > > > store-conditional, but the code in question here has the dependency from
> > > > the loaded value, which is sufficient.
> > > 
> > > Yeah, I'm always surprised by that 'feature'. Is that ARM specific? Or
> > > so more LL/SC archs have this?
> > 
> > In general, I'm not sure, but I think PPC does allow for the control
> > dependency.
> > 
> 
> You guys mean the "control dependency" from a sc to subsequent WRITE,
> like in the following litmus?
> 
> 	PPC sc-control
> 	""
> 	{
> 	0:r11=x;0:r12=y;0:r3=1;0:r10=0;
> 	1:r11=x;1:r12=y;
> 	}
> 
> 	P0                   | P1             ;
> 	lwarx r2, r10, r11   | lwz r2, 0(r12) ;
> 	stwcx. r3, r10, r11  | lwsync           ;
> 	bne Fail             |                ;
> 	stw r3, 0(r12)       | lwz r1, 0(r11) ;
> 	Fail:                |                ;
> 
> 	exists
> 	(1:r2 = 1 /\ x = 1 /\ 1:r1 = 0)
> 
> PPCMEM and herd both said the exists-clause could be triggered
> "Sometimes".
> 
> And ISA said:
> 
> """
> Because a Store Conditional instruction may complete before its store
> has been performed, a conditional Branch instruction that depends on the
> CR0 value set by a Store Conditional instruction does not order the
> Store Conditional's store with respect to storage accesses caused by
> instructions that follow the Branch.
> """
> 
> So ppc doesn't honor this "control dependency". ;-)

Oh, wow! I was feeling like the odd duck after talking to paulmck and
Alan Stern about this at LPC, but now it looks like I have company on
the island of misfit memory models.

Will

^ permalink raw reply	[flat|nested] 96+ messages in thread

* RE: [RFC][PATCH 7/7] kref: Implement using refcount_t
  2016-11-14 17:39 ` [RFC][PATCH 7/7] kref: Implement using refcount_t Peter Zijlstra
                     ` (2 preceding siblings ...)
  2016-11-18 10:07   ` Reshetova, Elena
@ 2016-11-18 10:47   ` Reshetova, Elena
  2016-11-18 10:52     ` Peter Zijlstra
  2017-01-26 23:14   ` Kees Cook
  4 siblings, 1 reply; 96+ messages in thread
From: Reshetova, Elena @ 2016-11-18 10:47 UTC (permalink / raw)
  To: Peter Zijlstra, gregkh, keescook, will.deacon, arnd, tglx, mingo,
	hpa, dave
  Cc: linux-kernel

>Provide refcount_t, an atomic_t like primitive built just for refcounting.
>It provides overflow and underflow checks as well as saturation semantics such that when it overflows, we'll never attempt to free it again, ever.

>Peter do you have the changes to the refcount_t interface compare to the version in this patch? 
>We are now starting working on atomic_t --> refcount_t conversions and it would save a bit of work to have latest version from you that we can be based upon. 

Oh, and if we define refcount_t to be just atomic_t underneath, what about the other atomic_long_t, local_t and atomic64_t cases where it is used for refcounting? 
I don't feel good about simply changing them to become atomic_t under a refcount_t wrapper... 

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC][PATCH 7/7] kref: Implement using refcount_t
  2016-11-18 10:47   ` Reshetova, Elena
@ 2016-11-18 10:52     ` Peter Zijlstra
  2016-11-18 16:58       ` Reshetova, Elena
  0 siblings, 1 reply; 96+ messages in thread
From: Peter Zijlstra @ 2016-11-18 10:52 UTC (permalink / raw)
  To: Reshetova, Elena
  Cc: gregkh, keescook, will.deacon, arnd, tglx, mingo, hpa, dave,
	linux-kernel


Could you please fix you mailer to not unwrap the emails?

On Fri, Nov 18, 2016 at 10:47:40AM +0000, Reshetova, Elena wrote:
> >Provide refcount_t, an atomic_t like primitive built just for
> >refcounting.  It provides overflow and underflow checks as well as
> >saturation semantics such that when it overflows, we'll never attempt
> >to free it again, ever.
> 
> >Peter do you have the changes to the refcount_t interface compare to
> >the version in this patch?  We are now starting working on atomic_t
> >--> refcount_t conversions and it would save a bit of work to have
> >latest version from you that we can be based upon. 
> 
> Oh, and if we define refcount_t to be just atomic_t underneath, what
> about the other atomic_long_t, local_t and atomic64_t cases when it is
> used for recounting?  I don't feel good just simply changing them to
> become atomic_t under refcount_t wrapper..... 

Is there anybody using local_t? That seems 'creative' and highly
questionable.

As for atomic_long_t, there are very few; I'd leave them be for now, and I
couldn't find a single atomic64_t refcount user.

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC][PATCH 7/7] kref: Implement using refcount_t
  2016-11-18 10:07   ` Reshetova, Elena
@ 2016-11-18 11:37     ` Peter Zijlstra
  2016-11-18 17:06       ` Will Deacon
  2016-11-21  8:44       ` Boqun Feng
  0 siblings, 2 replies; 96+ messages in thread
From: Peter Zijlstra @ 2016-11-18 11:37 UTC (permalink / raw)
  To: Reshetova, Elena
  Cc: gregkh, keescook, will.deacon, arnd, tglx, mingo, hpa, dave,
	linux-kernel, boqun.feng

On Fri, Nov 18, 2016 at 10:07:26AM +0000, Reshetova, Elena wrote:
> 
> Peter do you have the changes to the refcount_t interface compare to
> the version in this patch? 

> We are now starting working on atomic_t --> refcount_t conversions and
> it would save a bit of work to have latest version from you that we
> can be based upon. 

The latestest version below, mostly just comment changes since last
time.

---
Subject: refcount_t: A special purpose refcount type
From: Peter Zijlstra <peterz@infradead.org>
Date: Mon Nov 14 18:06:19 CET 2016

Provide refcount_t, an atomic_t like primitive built just for
refcounting.

It provides saturation semantics such that overflow becomes impossible
and thereby 'spurious' use-after-free is avoided.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/refcount.h |  241 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 241 insertions(+)

--- /dev/null
+++ b/include/linux/refcount.h
@@ -0,0 +1,241 @@
+#ifndef _LINUX_REFCOUNT_H
+#define _LINUX_REFCOUNT_H
+
+/*
+ * Variant of atomic_t specialized for reference counts.
+ *
+ * The interface matches the atomic_t interface (to aid in porting) but only
+ * provides the few functions one should use for reference counting.
+ *
+ * It differs in that the counter saturates at UINT_MAX and will not move once
+ * there. This avoids wrapping the counter and causing 'spurious'
+ * use-after-free issues.
+ *
+ * Memory ordering rules are slightly relaxed wrt regular atomic_t functions
+ * and provide only what is strictly required for refcounts.
+ *
+ * The increments are fully relaxed; these will not provide ordering. The
+ * rationale is that whatever is used to obtain the object we're increasing the
+ * reference count on will provide the ordering. For locked data structures,
+ * its the lock acquire, for RCU/lockless data structures its the dependent
+ * load.
+ *
+ * Do note that inc_not_zero() provides a control dependency which will order
+ * future stores against the inc, this ensures we'll never modify the object
+ * if we did not in fact acquire a reference.
+ *
+ * The decrements will provide release order, such that all the prior loads and
+ * stores will be issued before, it also provides a control dependency, which
+ * will order us against the subsequent free().
+ *
+ * The control dependency is against the load of the cmpxchg (ll/sc) that
+ * succeeded. This means the stores aren't fully ordered, but this is fine
+ * because the 1->0 transition indicates no concurrency.
+ *
+ * Note that the allocator is responsible for ordering things between free()
+ * and alloc().
+ *
+ *
+ * Note: the implementation relies hard on increments of 1; additions
+ *       bigger than 1 need explicit overflow -> saturation logic.
+ *
+ */
+
+#include <linux/atomic.h>
+#include <linux/bug.h>
+#include <linux/mutex.h>
+#include <linux/spinlock.h>
+
+typedef struct refcount_struct {
+	atomic_t refs;
+} refcount_t;
+
+#define REFCOUNT_INIT(n)	{ .refs = ATOMIC_INIT(n), }
+
+static inline void refcount_set(refcount_t *r, int n)
+{
+	atomic_set(&r->refs, n);
+}
+
+static inline unsigned int refcount_read(const refcount_t *r)
+{
+	return atomic_read(&r->refs);
+}
+
+/*
+ * Similar to atomic_inc(), will saturate at UINT_MAX and WARN.
+ *
+ * Provides no memory ordering, it is assumed the caller already has a
+ * reference on the object, will WARN when this is not so.
+ */
+static inline void refcount_inc(refcount_t *r)
+{
+	unsigned int old, new, val = atomic_read(&r->refs);
+
+	for (;;) {
+		WARN(!val, "refcount_t: increment on 0; use-after-free.\n");
+
+		if (unlikely(val == UINT_MAX))
+			return;
+
+		new = val + 1;
+		old = atomic_cmpxchg_relaxed(&r->refs, val, new);
+		if (old == val)
+			break;
+
+		val = old;
+	}
+
+	WARN(new == UINT_MAX, "refcount_t: saturated; leaking memory.\n");
+}
+
+/*
+ * Similar to atomic_inc_not_zero(), will saturate at UINT_MAX and WARN.
+ *
+ * Provides no memory ordering, it is assumed the caller has guaranteed the
+ * object memory to be stable (RCU, etc.). It does provide a control dependency
+ * and thereby orders future stores. See the comment on top.
+ */
+static inline __must_check
+bool refcount_inc_not_zero(refcount_t *r)
+{
+	unsigned int old, new, val = atomic_read(&r->refs);
+
+	for (;;) {
+		if (!val)
+			return false;
+
+		if (unlikely(val == UINT_MAX))
+			return true;
+
+		new = val + 1;
+		old = atomic_cmpxchg_relaxed(&r->refs, val, new);
+		if (old == val)
+			break;
+
+		val = old;
+	}
+
+	WARN(new == UINT_MAX, "refcount_t: saturated; leaking memory.\n");
+
+	return true;
+}
+
+/*
+ * Similar to atomic_dec_and_test(), it will WARN on underflow and fail to
+ * decrement when saturated at UINT_MAX.
+ *
+ * Provides release memory ordering, such that prior loads and stores are done
+ * before, and provides a control dependency such that free() must come after.
+ * See the comment on top.
+ */
+static inline __must_check
+bool refcount_dec_and_test(refcount_t *r)
+{
+	unsigned int old, new, val = atomic_read(&r->refs);
+
+	for (;;) {
+		if (val == UINT_MAX)
+			return false;
+
+		new = val - 1;
+		if (WARN(new > val, "refcount_t: underflow; use-after-free.\n"))
+			return false;
+
+		old = atomic_cmpxchg_release(&r->refs, val, new);
+		if (old == val)
+			break;
+
+		val = old;
+	}
+
+	return !new;
+}
+
+/*
+ * Similar to atomic_dec_and_mutex_lock(), it will WARN on underflow and fail
+ * to decrement when saturated at UINT_MAX.
+ *
+ * Provides release memory ordering, such that prior loads and stores are done
+ * before, and provides a control dependency such that free() must come after.
+ * See the comment on top.
+ */
+static inline __must_check
+bool refcount_dec_and_mutex_lock(refcount_t *r, struct mutex *lock)
+{
+	unsigned int old, new, val = atomic_read(&r->refs);
+	bool locked = false;
+
+	for (;;) {
+		if (val == UINT_MAX)
+			return false;
+
+		if (val == 1 && !locked) {
+			locked = true;
+			mutex_lock(lock);
+		}
+
+		new = val - 1;
+		if (WARN(new > val, "refcount_t: underflow; use-after-free.\n")) {
+			if (locked)
+				mutex_unlock(lock);
+			return false;
+		}
+
+		old = atomic_cmpxchg_release(&r->refs, val, new);
+		if (old == val)
+			break;
+
+		val = old;
+	}
+
+	if (new && locked)
+		mutex_unlock(lock);
+
+	return !new;
+}
+
+/*
+ * Similar to atomic_dec_and_lock(), it will WARN on underflow and fail to
+ * decrement when saturated at UINT_MAX.
+ *
+ * Provides release memory ordering, such that prior loads and stores are done
+ * before, and provides a control dependency such that free() must come after.
+ * See the comment on top.
+ */
+static inline __must_check
+bool refcount_dec_and_lock(refcount_t *r, spinlock_t *lock)
+{
+	unsigned int old, new, val = atomic_read(&r->refs);
+	bool locked = false;
+
+	for (;;) {
+		if (val == UINT_MAX)
+			return false;
+
+		if (val == 1 && !locked) {
+			locked = true;
+			spin_lock(lock);
+		}
+
+		new = val - 1;
+		if (WARN(new > val, "refcount_t: underflow; use-after-free.\n")) {
+			if (locked)
+				spin_unlock(lock);
+			return false;
+		}
+
+		old = atomic_cmpxchg_release(&r->refs, val, new);
+		if (old == val)
+			break;
+
+		val = old;
+	}
+
+	if (new && locked)
+		spin_unlock(lock);
+
+	return !new;
+}
+
+#endif /* _LINUX_REFCOUNT_H */

^ permalink raw reply	[flat|nested] 96+ messages in thread

* RE: [RFC][PATCH 7/7] kref: Implement using refcount_t
  2016-11-18 10:52     ` Peter Zijlstra
@ 2016-11-18 16:58       ` Reshetova, Elena
  2016-11-18 18:53         ` Peter Zijlstra
  0 siblings, 1 reply; 96+ messages in thread
From: Reshetova, Elena @ 2016-11-18 16:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: gregkh, keescook, will.deacon, arnd, tglx, mingo, hpa, dave,
	linux-kernel

> Could you please fix you mailer to not unwrap the emails?

I wish I understood what you mean by "unwrap"...?

On Fri, Nov 18, 2016 at 10:47:40AM +0000, Reshetova, Elena wrote:
> >Provide refcount_t, an atomic_t like primitive built just for 
> >refcounting.  It provides overflow and underflow checks as well as 
> >saturation semantics such that when it overflows, we'll never attempt 
> >to free it again, ever.
> 
> >Peter do you have the changes to the refcount_t interface compare to 
> >the version in this patch?  We are now starting working on atomic_t
> >--> refcount_t conversions and it would save a bit of work to have
> >latest version from you that we can be based upon. 
> 
> Oh, and if we define refcount_t to be just atomic_t underneath, what 
> about the other atomic_long_t, local_t and atomic64_t cases when it is 
> used for recounting?  I don't feel good just simply changing them to 
> become atomic_t under refcount_t wrapper.....

> Is there anybody using local_t ? That seems 'creative' and highly questionable.
I am not yet sure about refcounts, but local_t itself is used in a couple of places. 

>As for atomic_long_t there's very few, I'd leave them be for now, 
Ok, I have started a list of them to keep track, but we need to do them too. There is no reason for them not to be refcounts; so far the ones I see are classical refcounts. 

>and I couldn't find a single atomic64_t refcount user.
I will check once I get through the atomic_t and atomic_long_t cases.

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC][PATCH 7/7] kref: Implement using refcount_t
  2016-11-18 11:37     ` Peter Zijlstra
@ 2016-11-18 17:06       ` Will Deacon
  2016-11-18 18:57         ` Peter Zijlstra
  2016-11-21  4:06         ` Boqun Feng
  2016-11-21  8:44       ` Boqun Feng
  1 sibling, 2 replies; 96+ messages in thread
From: Will Deacon @ 2016-11-18 17:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Reshetova, Elena, gregkh, keescook, arnd, tglx, mingo, hpa, dave,
	linux-kernel, boqun.feng

On Fri, Nov 18, 2016 at 12:37:18PM +0100, Peter Zijlstra wrote:
> On Fri, Nov 18, 2016 at 10:07:26AM +0000, Reshetova, Elena wrote:
> > 
> > Peter do you have the changes to the refcount_t interface compare to
> > the version in this patch? 
> 
> > We are now starting working on atomic_t --> refcount_t conversions and
> > it would save a bit of work to have latest version from you that we
> > can be based upon. 
> 
> The latestest version below, mostly just comment changes since last
> time.
> 
> ---
> Subject: refcount_t: A special purpose refcount type
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Mon Nov 14 18:06:19 CET 2016
> 
> Provide refcount_t, an atomic_t like primitive built just for
> refcounting.
> 
> It provides saturation semantics such that overflow becomes impossible
> and thereby 'spurious' use-after-free is avoided.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  include/linux/refcount.h |  241 +++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 241 insertions(+)
> 
> --- /dev/null
> +++ b/include/linux/refcount.h
> @@ -0,0 +1,241 @@
> +#ifndef _LINUX_REFCOUNT_H
> +#define _LINUX_REFCOUNT_H
> +
> +/*
> + * Variant of atomic_t specialized for reference counts.
> + *
> + * The interface matches the atomic_t interface (to aid in porting) but only
> + * provides the few functions one should use for reference counting.
> + *
> + * It differs in that the counter saturates at UINT_MAX and will not move once
> + * there. This avoids wrapping the counter and causing 'spurious'
> + * use-after-free issues.
> + *
> + * Memory ordering rules are slightly relaxed wrt regular atomic_t functions
> + * and provide only what is strictly required for refcounts.
> + *
> + * The increments are fully relaxed; these will not provide ordering. The
> + * rationale is that whatever is used to obtain the object we're increasing the
> + * reference count on will provide the ordering. For locked data structures,
> + * its the lock acquire, for RCU/lockless data structures its the dependent
> + * load.
> + *
> + * Do note that inc_not_zero() provides a control dependency which will order
> + * future stores against the inc, this ensures we'll never modify the object
> + * if we did not in fact acquire a reference.
> + *
> + * The decrements will provide release order, such that all the prior loads and
> + * stores will be issued before, it also provides a control dependency, which
> + * will order us against the subsequent free().
> + *
> + * The control dependency is against the load of the cmpxchg (ll/sc) that
> + * succeeded. This means the stores aren't fully ordered, but this is fine
> + * because the 1->0 transition indicates no concurrency.
> + *
> + * Note that the allocator is responsible for ordering things between free()
> + * and alloc().
> + *
> + *
> + * Note: the implementation hard relies on increments, bigger than 1 additions
> + *       need explicit overflow -> saturation logic.
> + *
> + */
> +
> +#include <linux/atomic.h>
> +#include <linux/bug.h>
> +#include <linux/mutex.h>
> +#include <linux/spinlock.h>
> +
> +typedef struct refcount_struct {
> +	atomic_t refs;
> +} refcount_t;
> +
> +#define REFCOUNT_INIT(n)	{ .refs = ATOMIC_INIT(n), }
> +
> +static inline void refcount_set(refcount_t *r, int n)
> +{
> +	atomic_set(&r->refs, n);
> +}
> +
> +static inline unsigned int refcount_read(const refcount_t *r)
> +{
> +	return atomic_read(&r->refs);
> +}

Minor nit, but it might be worth being consistent in our usage of int
(parameter to refcount_set) and unsigned int (return value of
refcount_read).

> +
> +/*
> + * Similar to atomic_inc(), will saturate at UINT_MAX and WARN.
> + *
> + * Provides no memory ordering, it is assumed the caller already has a
> + * reference on the object, will WARN when this is not so.
> + */
> +static inline void refcount_inc(refcount_t *r)
> +{
> +	unsigned int old, new, val = atomic_read(&r->refs);
> +
> +	for (;;) {
> +		WARN(!val, "refcount_t: increment on 0; use-after-free.\n");
> +
> +		if (unlikely(val == UINT_MAX))
> +			return;
> +
> +		new = val + 1;
> +		old = atomic_cmpxchg_relaxed(&r->refs, val, new);
> +		if (old == val)
> +			break;
> +
> +		val = old;
> +	}
> +
> +	WARN(new == UINT_MAX, "refcount_t: saturated; leaking memory.\n");
> +}
> +
> +/*
> + * Similar to atomic_inc_not_zero(), will saturate at UINT_MAX and WARN.
> + *
> + * Provides no memory ordering, it is assumed the caller has guaranteed the
> + * object memory to be stable (RCU, etc.). It does provide a control dependency
> + * and thereby orders future stores. See the comment on top.
> + */
> +static inline __must_check
> +bool refcount_inc_not_zero(refcount_t *r)
> +{
> +	unsigned int old, new, val = atomic_read(&r->refs);
> +
> +	for (;;) {
> +		if (!val)
> +			return false;
> +
> +		if (unlikely(val == UINT_MAX))
> +			return true;
> +
> +		new = val + 1;
> +		old = atomic_cmpxchg_relaxed(&r->refs, val, new);
> +		if (old == val)
> +			break;
> +
> +		val = old;

Hmm, it's a shame this code is duplicated from refcount_inc, but I suppose
you can actually be racing against the counter going to zero here and really
need to check it each time round the loop. Humph. That said, given that
refcount_inc WARNs if the thing is zero, maybe that could just call
refcount_inc_not_zero and warn if it returns false? Does it matter that
we don't actually do the increment?

> +	}
> +
> +	WARN(new == UINT_MAX, "refcount_t: saturated; leaking memory.\n");
> +
> +	return true;
> +}
> +
> +/*
> + * Similar to atomic_dec_and_test(), it will WARN on underflow and fail to
> + * decrement when saturated at UINT_MAX.

It also fails to decrement in the underflow case (which is fine, but not
obvious from the comment). Same thing below.

> + *
> + * Provides release memory ordering, such that prior loads and stores are done
> + * before, and provides a control dependency such that free() must come after.
> + * See the comment on top.
> + */
> +static inline __must_check
> +bool refcount_dec_and_test(refcount_t *r)
> +{
> +	unsigned int old, new, val = atomic_read(&r->refs);
> +
> +	for (;;) {
> +		if (val == UINT_MAX)
> +			return false;
> +
> +		new = val - 1;
> +		if (WARN(new > val, "refcount_t: underflow; use-after-free.\n"))
> +			return false;

Wouldn't it be clearer to compare val with 0 before doing the decrement?

Will
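
[Editor's note: Will's suggestion (checking val against 0 before the
subtraction) can be sketched in a self-contained userspace model of the
loop. This is illustration only, not the kernel code: C11 atomics stand
in for atomic_cmpxchg_relaxed(), and fprintf() stands in for WARN().]

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace model of the patch's refcount_t, for illustration only. */
typedef struct refcount_struct {
	atomic_uint refs;
} refcount_t;

/*
 * refcount_dec_and_test() with the explicit 'val == 0' check Will
 * suggests, instead of detecting underflow via 'new > val' after the
 * subtraction.
 */
static inline bool refcount_dec_and_test(refcount_t *r)
{
	unsigned int new, val = atomic_load_explicit(&r->refs, memory_order_relaxed);

	for (;;) {
		if (val == UINT_MAX)	/* saturated: never decrement, never free */
			return false;

		if (val == 0) {		/* underflow caught before the subtraction */
			fprintf(stderr, "refcount_t: underflow; use-after-free.\n");
			return false;
		}

		new = val - 1;
		/* on failure, 'val' is reloaded with the current counter value */
		if (atomic_compare_exchange_weak_explicit(&r->refs, &val, new,
							  memory_order_release,
							  memory_order_relaxed))
			break;
	}

	return new == 0;	/* true only on the 1->0 transition */
}
```

With the explicit zero test, the underflow case reads the same way as the
saturation case, and the 'new > val' wrap check becomes unnecessary.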


* Re: [RFC][PATCH 7/7] kref: Implement using refcount_t
  2016-11-18 16:58       ` Reshetova, Elena
@ 2016-11-18 18:53         ` Peter Zijlstra
  2016-11-19  7:14           ` Reshetova, Elena
  0 siblings, 1 reply; 96+ messages in thread
From: Peter Zijlstra @ 2016-11-18 18:53 UTC (permalink / raw)
  To: Reshetova, Elena
  Cc: gregkh, keescook, will.deacon, arnd, tglx, mingo, hpa, dave,
	linux-kernel

On Fri, Nov 18, 2016 at 04:58:52PM +0000, Reshetova, Elena wrote:
> > Could you please fix your mailer to not unwrap the emails?
> 
> I wish I understood what you mean by "unwrap"... ?

Where I always have lines wrapped at 78 characters, but often when I see
them back in your reply, they're unwrapped and go on forever.

For some reason your mailer reflows text and mucks with whitespace. I
know Outlook likes to do this by default.

> On Fri, Nov 18, 2016 at 10:47:40AM +0000, Reshetova, Elena wrote:

> > Oh, and if we define refcount_t to be just atomic_t underneath, what 
> > about the other atomic_long_t, local_t and atomic64_t cases when it is 
> > used for recounting?  I don't feel good just simply changing them to 
> > become atomic_t under refcount_t wrapper.....
> 
> > Is there anybody using local_t ? That seems 'creative' and highly questionable.
> I am not yet sure about refcounts, but local_t itself is used in a couple of places.

Sure, there's local_t usage, but I'd be very surprised if there's a
single refcount usage among them.

> >As for atomic_long_t there's very few, I'd leave them be for now, 

> Ok, I have started a list of them to keep track, but we need to do
> them also. There is no reason for them not to be refcounts, since so
> far the ones I see are classical refcounts. 

Well, if you get the tools (cocci script or whatever) to reliably work
for atomic_t, then converting the few atomic_long_t's later should be
trivial.


* Re: [RFC][PATCH 7/7] kref: Implement using refcount_t
  2016-11-18 17:06       ` Will Deacon
@ 2016-11-18 18:57         ` Peter Zijlstra
  2016-11-21  4:06         ` Boqun Feng
  1 sibling, 0 replies; 96+ messages in thread
From: Peter Zijlstra @ 2016-11-18 18:57 UTC (permalink / raw)
  To: Will Deacon
  Cc: Reshetova, Elena, gregkh, keescook, arnd, tglx, mingo, hpa, dave,
	linux-kernel, boqun.feng

On Fri, Nov 18, 2016 at 05:06:55PM +0000, Will Deacon wrote:
> On Fri, Nov 18, 2016 at 12:37:18PM +0100, Peter Zijlstra wrote:
> > +static inline void refcount_set(refcount_t *r, int n)
> > +{
> > +	atomic_set(&r->refs, n);
> > +}
> > +
> > +static inline unsigned int refcount_read(const refcount_t *r)
> > +{
> > +	return atomic_read(&r->refs);
> > +}
> 
> Minor nit, but it might be worth being consistent in our usage of int
> (parameter to refcount_set) and unsigned int (return value of
> refcount_read).

Duh, I actually spotted that once and still didn't fix that :/

> > +static inline __must_check
> > +bool refcount_inc_not_zero(refcount_t *r)
> > +{
> > +	unsigned int old, new, val = atomic_read(&r->refs);
> > +
> > +	for (;;) {
> > +		if (!val)
> > +			return false;
> > +
> > +		if (unlikely(val == UINT_MAX))
> > +			return true;
> > +
> > +		new = val + 1;
> > +		old = atomic_cmpxchg_relaxed(&r->refs, val, new);
> > +		if (old == val)
> > +			break;
> > +
> > +		val = old;
> 
> Hmm, it's a shame this code is duplicated from refcount_inc, but I suppose
> you can actually be racing against the counter going to zero here and really
> need to check it each time round the loop. Humph. That said, given that
> refcount_inc WARNs if the thing is zero, maybe that could just call
> refcount_inc_not_zero and warn if it returns false? Does it matter that
> we don't actually do the increment?

Dunno, it _should_ not, but then again, who knows.

I can certainly write it as WARN_ON(!refcount_inc_not_zero());

> > +static inline __must_check
> > +bool refcount_dec_and_test(refcount_t *r)
> > +{
> > +	unsigned int old, new, val = atomic_read(&r->refs);
> > +
> > +	for (;;) {
> > +		if (val == UINT_MAX)
> > +			return false;
> > +
> > +		new = val - 1;
> > +		if (WARN(new > val, "refcount_t: underflow; use-after-free.\n"))
> > +			return false;
> 
> Wouldn't it be clearer to compare val with 0 before doing the decrement?

Maybe, this way you can change the 1 and it'll keep working. Then again,
you can't do that with the inc side, so who cares.
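
[Editor's note: the WARN_ON(!refcount_inc_not_zero()) variant Peter
mentions can be sketched the same way, as a self-contained userspace
model. C11 atomics stand in for atomic_cmpxchg_relaxed() and fprintf()
for WARN(); this is illustration only, not the kernel code.]

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace model of the patch's refcount_t, for illustration only. */
typedef struct refcount_struct {
	atomic_uint refs;
} refcount_t;

/* Model of refcount_inc_not_zero(): refuses 0 -> 1, saturates at UINT_MAX. */
static inline bool refcount_inc_not_zero(refcount_t *r)
{
	unsigned int new, val = atomic_load_explicit(&r->refs, memory_order_relaxed);

	for (;;) {
		if (val == 0)
			return false;

		if (val == UINT_MAX)	/* saturated: stay pinned */
			return true;

		new = val + 1;
		/* on failure, 'val' is reloaded with the current counter value */
		if (atomic_compare_exchange_weak_explicit(&r->refs, &val, new,
							  memory_order_relaxed,
							  memory_order_relaxed))
			break;
	}

	if (new == UINT_MAX)
		fprintf(stderr, "refcount_t: saturated; leaking memory.\n");

	return true;
}

/*
 * refcount_inc() written the way suggested above: reuse the
 * inc-not-zero loop and only add the use-after-free warning.
 */
static inline void refcount_inc(refcount_t *r)
{
	if (!refcount_inc_not_zero(r))
		fprintf(stderr, "refcount_t: increment on 0; use-after-free.\n");
}
```

The trade-off from the discussion is visible here: when the counter is 0,
this version warns but never performs the increment, whereas the
open-coded refcount_inc() in the patch warns and then increments anyway.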


* RE: [RFC][PATCH 7/7] kref: Implement using refcount_t
  2016-11-18 18:53         ` Peter Zijlstra
@ 2016-11-19  7:14           ` Reshetova, Elena
  2016-11-19 11:45             ` Peter Zijlstra
  0 siblings, 1 reply; 96+ messages in thread
From: Reshetova, Elena @ 2016-11-19  7:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: gregkh, keescook, will.deacon, arnd, tglx, mingo, hpa, dave,
	linux-kernel


> On Fri, Nov 18, 2016 at 04:58:52PM +0000, Reshetova, Elena wrote:
> > > Could you please fix your mailer to not unwrap the emails?
> >
> > I wish I understood what you mean by "unwrap"... ?
> 
> Where I always have lines wrapped at 78 characters, but often when I see
> them back in your reply, they're unwrapped and go on forever.
> 
> For some reason your mailer reflows text and mucks with whitespace. I
> know Outlook likes to do this by default.

Ok, I think I managed to fix it. Hope it looks better now. 
 
> > On Fri, Nov 18, 2016 at 10:47:40AM +0000, Reshetova, Elena wrote:
> 
> > > Oh, and if we define refcount_t to be just atomic_t underneath, what
> > > about the other atomic_long_t, local_t and atomic64_t cases when it is
> > > used for refcounting?  I don't feel good just simply changing them to
> > > become atomic_t under refcount_t wrapper.....
> >
> > > Is there anybody using local_t ? That seems 'creative' and highly
> questionable.
> > I am not yet sure about refcounts, but local_t itself is used in a couple of places.
> 
> Sure, there's local_t usage, but I'd be very surprised if there's a
> single refcount usage among them.
> 
> > >As for atomic_long_t there's very few, I'd leave them be for now,
> 
> > Ok, I have started a list of them to keep track, but we need to do
> > them also. There is no reason for them not to be refcounts, since so
> > far the ones I see are classical refcounts.
> 
> Well, if you get the tools (cocci script or whatever) to reliably work
> for atomic_t, then converting the few atomic_long_t's later should be
> trivial.

I am using coccinelle to find all occurrences, but I do the changes only in a semi-automated fashion.
Each change needs a proper manual review anyway and often one variable usage is spread between different headers/source files,
so I prefer not to go to full automation and then not be sure what I have done.


* Re: [RFC][PATCH 7/7] kref: Implement using refcount_t
  2016-11-19  7:14           ` Reshetova, Elena
@ 2016-11-19 11:45             ` Peter Zijlstra
  0 siblings, 0 replies; 96+ messages in thread
From: Peter Zijlstra @ 2016-11-19 11:45 UTC (permalink / raw)
  To: Reshetova, Elena
  Cc: gregkh, keescook, will.deacon, arnd, tglx, mingo, hpa, dave,
	linux-kernel

On Sat, Nov 19, 2016 at 07:14:08AM +0000, Reshetova, Elena wrote:
> > Well, if you get the tools (cocci script or whatever) to reliably work
> > for atomic_t, then converting the few atomic_long_t's later should be
> > trivial.
> 
> I am using coccinelle to find all occurrences, but I do the changes
> only in a semi-automated fashion.

If you can get the detection solid, that's good enough.

> Each change needs a proper manual review anyway and often one variable
> usage is spread between different headers/source files, so I prefer
> not to go to full automation and then not be sure what I have done.

Sure, every patch needs review, regardless of how it came to be.


* Re: [RFC][PATCH 7/7] kref: Implement using refcount_t
  2016-11-18 17:06       ` Will Deacon
  2016-11-18 18:57         ` Peter Zijlstra
@ 2016-11-21  4:06         ` Boqun Feng
  2016-11-21  7:48           ` Ingo Molnar
  1 sibling, 1 reply; 96+ messages in thread
From: Boqun Feng @ 2016-11-21  4:06 UTC (permalink / raw)
  To: Will Deacon
  Cc: Peter Zijlstra, Reshetova, Elena, gregkh, keescook, arnd, tglx,
	mingo, hpa, dave, linux-kernel


On Fri, Nov 18, 2016 at 05:06:55PM +0000, Will Deacon wrote:
> On Fri, Nov 18, 2016 at 12:37:18PM +0100, Peter Zijlstra wrote:
> > On Fri, Nov 18, 2016 at 10:07:26AM +0000, Reshetova, Elena wrote:
> > > 
> > > Peter do you have the changes to the refcount_t interface compare to
> > > the version in this patch? 
> > 
> > > We are now starting to work on atomic_t --> refcount_t conversions and
> > > it would save a bit of work to have latest version from you that we
> > > can be based upon. 
> > 
> > The latestest version below, mostly just comment changes since last
> > time.
> > 
> > ---
> > Subject: refcount_t: A special purpose refcount type
> > From: Peter Zijlstra <peterz@infradead.org>
> > Date: Mon Nov 14 18:06:19 CET 2016
> > 
> > Provide refcount_t, an atomic_t like primitive built just for
> > refcounting.
> > 
> > It provides saturation semantics such that overflow becomes impossible
> > and thereby 'spurious' use-after-free is avoided.
> > 
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> >  include/linux/refcount.h |  241 +++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 241 insertions(+)
> > 
> > --- /dev/null
> > +++ b/include/linux/refcount.h
> > @@ -0,0 +1,241 @@
> > +#ifndef _LINUX_REFCOUNT_H
> > +#define _LINUX_REFCOUNT_H
> > +
> > +/*
> > + * Variant of atomic_t specialized for reference counts.
> > + *
> > + * The interface matches the atomic_t interface (to aid in porting) but only
> > + * provides the few functions one should use for reference counting.
> > + *
> > + * It differs in that the counter saturates at UINT_MAX and will not move once
> > + * there. This avoids wrapping the counter and causing 'spurious'
> > + * use-after-free issues.
> > + *
> > + * Memory ordering rules are slightly relaxed wrt regular atomic_t functions
> > + * and provide only what is strictly required for refcounts.
> > + *
> > + * The increments are fully relaxed; these will not provide ordering. The
> > + * rationale is that whatever is used to obtain the object we're increasing the
> > + * reference count on will provide the ordering. For locked data structures,
> > + * its the lock acquire, for RCU/lockless data structures its the dependent
> > + * load.
> > + *
> > + * Do note that inc_not_zero() provides a control dependency which will order
> > + * future stores against the inc, this ensures we'll never modify the object
> > + * if we did not in fact acquire a reference.
> > + *
> > + * The decrements will provide release order, such that all the prior loads and
> > + * stores will be issued before, it also provides a control dependency, which
> > + * will order us against the subsequent free().
> > + *
> > + * The control dependency is against the load of the cmpxchg (ll/sc) that
> > + * succeeded. This means the stores aren't fully ordered, but this is fine
> > + * because the 1->0 transition indicates no concurrency.
> > + *
> > + * Note that the allocator is responsible for ordering things between free()
> > + * and alloc().
> > + *
> > + *
> > + * Note: the implementation hard relies on increments, bigger than 1 additions
> > + *       need explicit overflow -> saturation logic.
> > + *
> > + */
> > +
> > +#include <linux/atomic.h>
> > +#include <linux/bug.h>
> > +#include <linux/mutex.h>
> > +#include <linux/spinlock.h>
> > +
> > +typedef struct refcount_struct {
> > +	atomic_t refs;
> > +} refcount_t;
> > +
> > +#define REFCOUNT_INIT(n)	{ .refs = ATOMIC_INIT(n), }
> > +
> > +static inline void refcount_set(refcount_t *r, int n)
> > +{
> > +	atomic_set(&r->refs, n);
> > +}
> > +
> > +static inline unsigned int refcount_read(const refcount_t *r)
> > +{
> > +	return atomic_read(&r->refs);
> > +}
> 
> Minor nit, but it might be worth being consistent in our usage of int
> (parameter to refcount_set) and unsigned int (return value of
> refcount_read).
> 
> > +
> > +/*
> > + * Similar to atomic_inc(), will saturate at UINT_MAX and WARN.
> > + *
> > + * Provides no memory ordering, it is assumed the caller already has a
> > + * reference on the object, will WARN when this is not so.
> > + */
> > +static inline void refcount_inc(refcount_t *r)
> > +{
> > +	unsigned int old, new, val = atomic_read(&r->refs);
> > +
> > +	for (;;) {
> > +		WARN(!val, "refcount_t: increment on 0; use-after-free.\n");
> > +
> > +		if (unlikely(val == UINT_MAX))
> > +			return;
> > +
> > +		new = val + 1;
> > +		old = atomic_cmpxchg_relaxed(&r->refs, val, new);
> > +		if (old == val)
> > +			break;
> > +
> > +		val = old;
> > +	}
> > +
> > +	WARN(new == UINT_MAX, "refcount_t: saturated; leaking memory.\n");
> > +}
> > +
> > +/*
> > + * Similar to atomic_inc_not_zero(), will saturate at UINT_MAX and WARN.
> > + *
> > + * Provides no memory ordering, it is assumed the caller has guaranteed the
> > + * object memory to be stable (RCU, etc.). It does provide a control dependency
> > + * and thereby orders future stores. See the comment on top.
> > + */
> > +static inline __must_check
> > +bool refcount_inc_not_zero(refcount_t *r)
> > +{
> > +	unsigned int old, new, val = atomic_read(&r->refs);
> > +
> > +	for (;;) {
> > +		if (!val)
> > +			return false;
> > +
> > +		if (unlikely(val == UINT_MAX))
> > +			return true;
> > +
> > +		new = val + 1;
> > +		old = atomic_cmpxchg_relaxed(&r->refs, val, new);
> > +		if (old == val)
> > +			break;
> > +
> > +		val = old;
> 
> Hmm, it's a shame this code is duplicated from refcount_inc, but I suppose
> you can actually be racing against the counter going to zero here and really
> need to check it each time round the loop. Humph. That said, given that
> refcount_inc WARNs if the thing is zero, maybe that could just call
> refcount_inc_not_zero and warn if it returns false? Does it matter that
> we don't actually do the increment?
> 
> > +	}
> > +
> > +	WARN(new == UINT_MAX, "refcount_t: saturated; leaking memory.\n");
> > +
> > +	return true;
> > +}
> > +
> > +/*
> > + * Similar to atomic_dec_and_test(), it will WARN on underflow and fail to
> > + * decrement when saturated at UINT_MAX.
> 
> It also fails to decrement in the underflow case (which is fine, but not
> obvious from the comment). Same thing below.
> 

Maybe a table in the comment like the following helps?

/*
 * T: return true, F: return fasle
 * W: trigger WARNING
 * N: no effect
 *
 *                      |       value before ops                  |
 *                      |   0   |   1   | UINT_MAX - 1 | UINT_MAX |
 * ---------------------+-------+-------+--------------+----------+
 * inc()                |  W    |       |      W       |      N   |
 * inc_not_zero()       |   FN  |   T   |      WT      |    WTN   |
 * dec_and_test()       |  WFN  |   T   |       F      |     FN   |
 * dec_and_mutex_lock() |  WFN  |   T   |       F      |     FN   |
 * dec_and_spin_lock()  |  WFN  |   T   |       F      |     FN   |
 */

Regards,
Boqun


> > + *
> > + * Provides release memory ordering, such that prior loads and stores are done
> > + * before, and provides a control dependency such that free() must come after.
> > + * See the comment on top.
> > + */
> > +static inline __must_check
> > +bool refcount_dec_and_test(refcount_t *r)
> > +{
> > +	unsigned int old, new, val = atomic_read(&r->refs);
> > +
> > +	for (;;) {
> > +		if (val == UINT_MAX)
> > +			return false;
> > +
> > +		new = val - 1;
> > +		if (WARN(new > val, "refcount_t: underflow; use-after-free.\n"))
> > +			return false;
> 
> Wouldn't it be clearer to compare val with 0 before doing the decrement?
> 
> Will



* Re: [RFC][PATCH 7/7] kref: Implement using refcount_t
  2016-11-21  4:06         ` Boqun Feng
@ 2016-11-21  7:48           ` Ingo Molnar
  2016-11-21  8:38             ` Boqun Feng
  0 siblings, 1 reply; 96+ messages in thread
From: Ingo Molnar @ 2016-11-21  7:48 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Will Deacon, Peter Zijlstra, Reshetova, Elena, gregkh, keescook,
	arnd, tglx, hpa, dave, linux-kernel


* Boqun Feng <boqun.feng@gmail.com> wrote:

> > It also fails to decrement in the underflow case (which is fine, but not
> > obvious from the comment). Same thing below.
> > 
> 
> Maybe a table in the comment like the following helps?
> 
> /*
>  * T: return true, F: return fasle
>  * W: trigger WARNING
>  * N: no effect
>  *
>  *                      |       value before ops                  |
>  *                      |   0   |   1   | UINT_MAX - 1 | UINT_MAX |
>  * ---------------------+-------+-------+--------------+----------+
>  * inc()                |  W    |       |      W       |      N   |
>  * inc_not_zero()       |   FN  |   T   |      WT      |    WTN   |
>  * dec_and_test()       |  WFN  |   T   |       F      |     FN   |
>  * dec_and_mutex_lock() |  WFN  |   T   |       F      |     FN   |
>  * dec_and_spin_lock()  |  WFN  |   T   |       F      |     FN   |
>  */

Yes!

nit: s/fasle/false

Also, I think we want to do a couple of other changes as well to make it more 
readable, extend the columns with 'normal' values (2 and UINT_MAX-2) and order the 
columns properly. I.e. something like:

/*
 * The before/after outcome of various atomic ops:
 *
 *   T: returns true
 *   F: returns false
 *   ----------------------------------
 *   W: op triggers kernel WARNING
 *   ----------------------------------
 *   0: no change to atomic var value
 *   +: atomic var value increases by 1
 *   -: atomic var value decreases by 1
 *   ----------------------------------
 *  -1: UINT_MAX
 *  -2: UINT_MAX-1
 *  -3: UINT_MAX-2
 *
 * ---------------------+-----+-----+-----+-----+-----+-----+
 * value before:        |  -3 |  -2 |  -1 |   0 |   1 |   2 |
 * ---------------------+-----+-----+-----+-----+-----+-----+
 * value+effect after:                                      |
 * ---------------------+     |     |     |     |     |     |
 * inc()                | ..+ | W.+ | ..0 | W.+ | ..+ | ..+ |
 * inc_not_zero()       | .T+ | WT+ | WT0 | .F0 | .T+ | .T+ |
 * dec_and_test()       | .F- | .F- | .F0 | WF0 | .T- | .F- |
 * dec_and_mutex_lock() | .F- | .F- | .F0 | WF0 | .T- | .F- |
 * dec_and_spin_lock()  | .F- | .F- | .F0 | WF0 | .T- | .F- |
 * ---------------------+-----+-----+-----+-----+-----+-----+
 *
 * So for example: 'WT+' in the inc_not_zero() row and '-2' column
 * means that when the atomic_inc_not_zero() function is called
 * with an atomic var that has a value of UINT_MAX-1, then the
 * atomic var's value will increase to the maximum overflow value
 * of UINT_MAX and will produce a warning. The function returns
 * 'true'.
 */

I think this table makes the overflow/underflow semantics pretty clear and also 
documents the regular behavior of these atomic ops pretty intuitively.

Agreed?

Thanks,

	Ingo
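
[Editor's note: Ingo's table above can be sanity-checked with a small
single-threaded model of the three primitives. Only the value
transitions and boolean results from the table are modelled here; the
cmpxchg loops, memory ordering, and WARN()s of the real implementation
are deliberately elided.]

```c
#include <limits.h>
#include <stdbool.h>

/* Single-threaded model of the value/return semantics in the table.
 * Each branch is annotated with the table cell it implements. */
typedef struct { unsigned int refs; } refcount_model_t;

static bool model_inc_not_zero(refcount_model_t *r)
{
	if (r->refs == 0)
		return false;		/* .F0: refuse to resurrect */
	if (r->refs == UINT_MAX)
		return true;		/* WT0: pinned at saturation */
	r->refs++;			/* .T+ (WT+ when reaching UINT_MAX) */
	return true;
}

static void model_inc(refcount_model_t *r)
{
	if (r->refs == UINT_MAX)
		return;			/* ..0: saturated, no change */
	r->refs++;			/* ..+ (W.+ when starting from 0) */
}

static bool model_dec_and_test(refcount_model_t *r)
{
	if (r->refs == UINT_MAX)
		return false;		/* .F0: saturated, never free */
	if (r->refs == 0)
		return false;		/* WF0: underflow, no change */
	r->refs--;
	return r->refs == 0;		/* .T- only on the 1->0 transition */
}
```

Each branch maps directly onto one column of the table, which makes
checking the overflow and underflow corner cases mechanical.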


* Re: [RFC][PATCH 7/7] kref: Implement using refcount_t
  2016-11-21  7:48           ` Ingo Molnar
@ 2016-11-21  8:38             ` Boqun Feng
  0 siblings, 0 replies; 96+ messages in thread
From: Boqun Feng @ 2016-11-21  8:38 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Will Deacon, Peter Zijlstra, Reshetova, Elena, gregkh, keescook,
	arnd, tglx, hpa, dave, linux-kernel


On Mon, Nov 21, 2016 at 08:48:26AM +0100, Ingo Molnar wrote:
> 
> * Boqun Feng <boqun.feng@gmail.com> wrote:
> 
> > > It also fails to decrement in the underflow case (which is fine, but not
> > > obvious from the comment). Same thing below.
> > > 
> > 
> > Maybe a table in the comment like the following helps?
> > 
> > /*
> >  * T: return true, F: return fasle
> >  * W: trigger WARNING
> >  * N: no effect
> >  *
> >  *                      |       value before ops                  |
> >  *                      |   0   |   1   | UINT_MAX - 1 | UINT_MAX |
> >  * ---------------------+-------+-------+--------------+----------+
> >  * inc()                |  W    |       |      W       |      N   |
> >  * inc_not_zero()       |   FN  |   T   |      WT      |    WTN   |
> >  * dec_and_test()       |  WFN  |   T   |       F      |     FN   |
> >  * dec_and_mutex_lock() |  WFN  |   T   |       F      |     FN   |
> >  * dec_and_spin_lock()  |  WFN  |   T   |       F      |     FN   |
> >  */
> 
> Yes!
> 
> nit: s/fasle/false
> 
> Also, I think we want to do a couple of other changes as well to make it more 
> readable, extend the columns with 'normal' values (2 and UINT_MAX-2) and order the 
> columns properly. I.e. something like:
> 
> /*
>  * The before/after outcome of various atomic ops:
>  *
>  *   T: returns true
>  *   F: returns false
>  *   ----------------------------------
>  *   W: op triggers kernel WARNING
>  *   ----------------------------------
>  *   0: no change to atomic var value
>  *   +: atomic var value increases by 1
>  *   -: atomic var value decreases by 1
>  *   ----------------------------------
>  *  -1: UINT_MAX
>  *  -2: UINT_MAX-1
>  *  -3: UINT_MAX-2
>  *
>  * ---------------------+-----+-----+-----+-----+-----+-----+
>  * value before:        |  -3 |  -2 |  -1 |   0 |   1 |   2 |
>  * ---------------------+-----+-----+-----+-----+-----+-----+
>  * value+effect after:                                      |
>  * ---------------------+     |     |     |     |     |     |
>  * inc()                | ..+ | W.+ | ..0 | W.+ | ..+ | ..+ |
>  * inc_not_zero()       | .T+ | WT+ | WT0 | .F0 | .T+ | .T+ |
>  * dec_and_test()       | .F- | .F- | .F0 | WF0 | .T- | .F- |
>  * dec_and_mutex_lock() | .F- | .F- | .F0 | WF0 | .T- | .F- |
>  * dec_and_spin_lock()  | .F- | .F- | .F0 | WF0 | .T- | .F- |
>  * ---------------------+-----+-----+-----+-----+-----+-----+
>  *
>  * So for example: 'WT+' in the inc_not_zero() row and '-2' column
>  * means that when the atomic_inc_not_zero() function is called
>  * with an atomic var that has a value of UINT_MAX-1, then the
>  * atomic var's value will increase to the maximum overflow value
>  * of UINT_MAX and will produce a warning. The function returns
>  * 'true'.
>  */
> 
> I think this table makes the overflow/underflow semantics pretty clear and also 
> documents the regular behavior of these atomic ops pretty intuitively.
> 
> Agreed?
> 

Sure, this looks pretty great! Much more informative and readable than
my version ;-) Thank you.

Regards,
Boqun

> Thanks,
> 
> 	Ingo



* Re: [RFC][PATCH 7/7] kref: Implement using refcount_t
  2016-11-18 11:37     ` Peter Zijlstra
  2016-11-18 17:06       ` Will Deacon
@ 2016-11-21  8:44       ` Boqun Feng
  2016-11-21  9:02         ` Peter Zijlstra
  1 sibling, 1 reply; 96+ messages in thread
From: Boqun Feng @ 2016-11-21  8:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Reshetova, Elena, gregkh, keescook, will.deacon, arnd, tglx,
	mingo, hpa, dave, linux-kernel


On Fri, Nov 18, 2016 at 12:37:18PM +0100, Peter Zijlstra wrote:
[snip]
> +
> +/*
> + * Similar to atomic_inc(), will saturate at UINT_MAX and WARN.
> + *
> + * Provides no memory ordering, it is assumed the caller already has a
> + * reference on the object, will WARN when this is not so.
> + */
> +static inline void refcount_inc(refcount_t *r)
> +{
> +	unsigned int old, new, val = atomic_read(&r->refs);
> +
> +	for (;;) {
> +		WARN(!val, "refcount_t: increment on 0; use-after-free.\n");
> +

Do we want to put the address of @r into the WARN information? Which
could help us locate the problematic object quickly.

Regards,
Boqun

> +		if (unlikely(val == UINT_MAX))
> +			return;
> +
> +		new = val + 1;
> +		old = atomic_cmpxchg_relaxed(&r->refs, val, new);
> +		if (old == val)
> +			break;
> +
> +		val = old;
> +	}
> +
> +	WARN(new == UINT_MAX, "refcount_t: saturated; leaking memory.\n");
> +}
[...]



* Re: [RFC][PATCH 7/7] kref: Implement using refcount_t
  2016-11-21  8:44       ` Boqun Feng
@ 2016-11-21  9:02         ` Peter Zijlstra
  2016-11-21  9:37           ` Boqun Feng
  0 siblings, 1 reply; 96+ messages in thread
From: Peter Zijlstra @ 2016-11-21  9:02 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Reshetova, Elena, gregkh, keescook, will.deacon, arnd, tglx,
	mingo, hpa, dave, linux-kernel

On Mon, Nov 21, 2016 at 04:44:28PM +0800, Boqun Feng wrote:
> On Fri, Nov 18, 2016 at 12:37:18PM +0100, Peter Zijlstra wrote:
> [snip]
> > +
> > +/*
> > + * Similar to atomic_inc(), will saturate at UINT_MAX and WARN.
> > + *
> > + * Provides no memory ordering, it is assumed the caller already has a
> > + * reference on the object, will WARN when this is not so.
> > + */
> > +static inline void refcount_inc(refcount_t *r)
> > +{
> > +	unsigned int old, new, val = atomic_read(&r->refs);
> > +
> > +	for (;;) {
> > +		WARN(!val, "refcount_t: increment on 0; use-after-free.\n");
> > +
> 
> Do we want to put the address of @r into the WARN information? Which
> could help us locate the problematic object quickly.

I explicitly didn't do that because printing kernel addresses is
generally frowned upon. Also, random heap addresses are just that,
random. In most cases the backtrace is more informative.


* Re: [RFC][PATCH 7/7] kref: Implement using refcount_t
  2016-11-21  9:02         ` Peter Zijlstra
@ 2016-11-21  9:37           ` Boqun Feng
  0 siblings, 0 replies; 96+ messages in thread
From: Boqun Feng @ 2016-11-21  9:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Reshetova, Elena, gregkh, keescook, will.deacon, arnd, tglx,
	mingo, hpa, dave, linux-kernel


On Mon, Nov 21, 2016 at 10:02:23AM +0100, Peter Zijlstra wrote:
> On Mon, Nov 21, 2016 at 04:44:28PM +0800, Boqun Feng wrote:
> > On Fri, Nov 18, 2016 at 12:37:18PM +0100, Peter Zijlstra wrote:
> > [snip]
> > > +
> > > +/*
> > > + * Similar to atomic_inc(), will saturate at UINT_MAX and WARN.
> > > + *
> > > + * Provides no memory ordering, it is assumed the caller already has a
> > > + * reference on the object, will WARN when this is not so.
> > > + */
> > > +static inline void refcount_inc(refcount_t *r)
> > > +{
> > > +	unsigned int old, new, val = atomic_read(&r->refs);
> > > +
> > > +	for (;;) {
> > > +		WARN(!val, "refcount_t: increment on 0; use-after-free.\n");
> > > +
> > 
> > Do we want to put the address of @r into the WARN information? That
> > could help us locate the problematic object quickly.
> 
> I explicitly didn't do that because printing kernel addresses is
> generally frowned upon. Also, random heap addresses are just that,
> random. In most cases the backtrace is more informative.

Fair enough ;-)

Regards,
Boqun

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC][PATCH 7/7] kref: Implement using refcount_t
  2016-11-14 17:39 ` [RFC][PATCH 7/7] kref: Implement using refcount_t Peter Zijlstra
                     ` (3 preceding siblings ...)
  2016-11-18 10:47   ` Reshetova, Elena
@ 2017-01-26 23:14   ` Kees Cook
  2017-01-27  9:58     ` Peter Zijlstra
  4 siblings, 1 reply; 96+ messages in thread
From: Kees Cook @ 2017-01-26 23:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Greg KH, Will Deacon, Reshetova, Elena, Arnd Bergmann,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, David Windsor,
	LKML, Hans Liljestrand

On Mon, Nov 14, 2016 at 9:39 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> Provide refcount_t, an atomic_t like primitive built just for
> refcounting.
>
> It provides overflow and underflow checks as well as saturation
> semantics such that when it overflows, we'll never attempt to free it
> again, ever.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Hi!

I see the other 6 patches from this series are in -next, but the
refcount_t implementation is still missing. What's needed to land this
in -next? It's blocking sending the atomic_t -> refcount_t patches,
which will likely all go through various maintainers, so we need to
have refcount_t first. :)

Thanks!

-Kees

-- 
Kees Cook
Nexus Security

^ permalink raw reply	[flat|nested] 96+ messages in thread
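The "never attempt to free it again" saturation semantics described above apply on the release side as well. A minimal userspace illustration (assumed names `myref_t`/`myref_dec_and_test`, C11 atomics; a sketch of the idea, not the in-tree code):

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for refcount_t. */
typedef struct { atomic_uint refs; } myref_t;

/*
 * Drop a reference; return true only when the caller dropped the last
 * one and may free the object. A saturated counter never reaches zero,
 * so a once-overflowed object is leaked rather than double-freed.
 */
static bool myref_dec_and_test(myref_t *r)
{
	unsigned int val = atomic_load(&r->refs);

	for (;;) {
		if (val == UINT_MAX)	/* saturated: never free again */
			return false;

		if (val == 0) {
			fprintf(stderr, "refcount: underflow; use-after-free.\n");
			return false;
		}

		/* On success, val still holds the pre-decrement value. */
		if (atomic_compare_exchange_weak(&r->refs, &val, val - 1))
			return val == 1;
	}
}
```

Trading a memory leak for a use-after-free is the whole point: leaking a saturated object is annoying, while freeing it while references remain is exploitable.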

* Re: [RFC][PATCH 7/7] kref: Implement using refcount_t
  2017-01-26 23:14   ` Kees Cook
@ 2017-01-27  9:58     ` Peter Zijlstra
  2017-01-27 21:07       ` Kees Cook
  0 siblings, 1 reply; 96+ messages in thread
From: Peter Zijlstra @ 2017-01-27  9:58 UTC (permalink / raw)
  To: Kees Cook
  Cc: Greg KH, Will Deacon, Reshetova, Elena, Arnd Bergmann,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, David Windsor,
	LKML, Hans Liljestrand

On Thu, Jan 26, 2017 at 03:14:41PM -0800, Kees Cook wrote:
> On Mon, Nov 14, 2016 at 9:39 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > Provide refcount_t, an atomic_t like primitive built just for
> > refcounting.
> >
> > It provides overflow and underflow checks as well as saturation
> > semantics such that when it overflows, we'll never attempt to free it
> > again, ever.
> >
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> 
> Hi!
> 
> I see the other 6 patches from this series are in -next, but the
> refcount_t implementation is still missing. What's needed to land this
> in -next? It's blocking sending the atomic_t -> refcount_t patches,
> which will likely all go through various maintainers, so we need to
> have refcount_t first. :)

Nothing much, except lack of time. I spent the last several days hunting
bugs, which trumps new features on my todo list.

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC][PATCH 7/7] kref: Implement using refcount_t
  2017-01-27  9:58     ` Peter Zijlstra
@ 2017-01-27 21:07       ` Kees Cook
  2017-01-30 13:40         ` Peter Zijlstra
  0 siblings, 1 reply; 96+ messages in thread
From: Kees Cook @ 2017-01-27 21:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Greg KH, Will Deacon, Reshetova, Elena, Arnd Bergmann,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, David Windsor,
	LKML, Hans Liljestrand

On Fri, Jan 27, 2017 at 1:58 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, Jan 26, 2017 at 03:14:41PM -0800, Kees Cook wrote:
>> On Mon, Nov 14, 2016 at 9:39 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > Provide refcount_t, an atomic_t like primitive built just for
>> > refcounting.
>> >
>> > It provides overflow and underflow checks as well as saturation
>> > semantics such that when it overflows, we'll never attempt to free it
>> > again, ever.
>> >
>> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>>
>> Hi!
>>
>> I see the other 6 patches from this series are in -next, but the
>> refcount_t implementation is still missing. What's needed to land this
>> in -next? It's blocking sending the atomic_t -> refcount_t patches,
>> which will likely all go through various maintainers, so we need to
>> have refcount_t first. :)
>
> Nothing much, except lack of time. I spent the last several days hunting
> bugs, which trumps new features on my todo list.

Totally understood. I was just trying to see if there was anything I
could help with for it. Are you expecting it to change much from your
original posted version?

-Kees

-- 
Kees Cook
Nexus Security

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC][PATCH 7/7] kref: Implement using refcount_t
  2017-01-27 21:07       ` Kees Cook
@ 2017-01-30 13:40         ` Peter Zijlstra
  0 siblings, 0 replies; 96+ messages in thread
From: Peter Zijlstra @ 2017-01-30 13:40 UTC (permalink / raw)
  To: Kees Cook
  Cc: Greg KH, Will Deacon, Reshetova, Elena, Arnd Bergmann,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, David Windsor,
	LKML, Hans Liljestrand

On Fri, Jan 27, 2017 at 01:07:35PM -0800, Kees Cook wrote:
> On Fri, Jan 27, 2017 at 1:58 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > Nothing much, except lack of time. I spent the last several days hunting
> > bugs, which trumps new features on my todo list.
> 
> Totally understood. I was just trying to see if there was anything I
> could help with for it. Are you expecting it to change much from your
> original posted version?

There are a few things I wanted to play with, but I suppose we can merge
as is and fix up later. I should hopefully get time to do so this week.

^ permalink raw reply	[flat|nested] 96+ messages in thread

end of thread, other threads:[~2017-01-30 13:41 UTC | newest]

Thread overview: 96+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-14 17:39 [RFC][PATCH 0/7] kref improvements Peter Zijlstra
2016-11-14 17:39 ` [RFC][PATCH 1/7] kref: Add KREF_INIT() Peter Zijlstra
2016-11-14 17:39 ` [RFC][PATCH 2/7] kref: Add kref_read() Peter Zijlstra
2016-11-14 18:16   ` Christoph Hellwig
2016-11-15  7:28     ` Greg KH
2016-11-15  7:47       ` Peter Zijlstra
2016-11-15  8:37       ` [PATCH] printk, locking/atomics, kref: Introduce new %pAr and %pAk format string options for atomic_t and 'struct kref' Ingo Molnar
2016-11-15  8:43         ` [PATCH v2] " Ingo Molnar
2016-11-15  9:21           ` Peter Zijlstra
2016-11-15  9:41             ` [PATCH v3] printk, locking/atomics, kref: Introduce new %pAa " Ingo Molnar
2016-11-15 10:10           ` [PATCH v2] printk, locking/atomics, kref: Introduce new %pAr " kbuild test robot
2016-11-15 16:42         ` [PATCH] " Linus Torvalds
2016-11-16  8:13           ` Ingo Molnar
2016-11-15  7:33   ` [RFC][PATCH 2/7] kref: Add kref_read() Greg KH
2016-11-15  8:03     ` Peter Zijlstra
2016-11-15 20:53       ` Kees Cook
2016-11-16  8:21         ` Greg KH
2016-11-16 10:10           ` Peter Zijlstra
2016-11-16 10:18             ` Greg KH
2016-11-16 10:11           ` Daniel Borkmann
2016-11-16 10:19             ` Greg KH
2016-11-16 10:09         ` Peter Zijlstra
2016-11-16 18:58           ` Kees Cook
2016-11-17  8:34             ` Peter Zijlstra
2016-11-17 12:30               ` David Windsor
2016-11-17 12:43                 ` Peter Zijlstra
2016-11-17 13:01                   ` Reshetova, Elena
2016-11-17 13:22                     ` Peter Zijlstra
2016-11-17 15:42                       ` Reshetova, Elena
2016-11-17 18:02                       ` Reshetova, Elena
2016-11-17 19:10                         ` Peter Zijlstra
2016-11-17 19:29                         ` Peter Zijlstra
2016-11-17 19:34               ` Kees Cook
2016-11-14 17:39 ` [RFC][PATCH 3/7] kref: Kill kref_sub() Peter Zijlstra
2016-11-14 17:39 ` [RFC][PATCH 4/7] kref: Use kref_get_unless_zero() more Peter Zijlstra
2016-11-14 17:39 ` [RFC][PATCH 5/7] kref: Implement kref_put_lock() Peter Zijlstra
2016-11-14 20:35   ` Kees Cook
2016-11-15  7:50     ` Peter Zijlstra
2016-11-14 17:39 ` [RFC][PATCH 6/7] kref: Avoid more abuse Peter Zijlstra
2016-11-14 17:39 ` [RFC][PATCH 7/7] kref: Implement using refcount_t Peter Zijlstra
2016-11-15  8:40   ` Ingo Molnar
2016-11-15  9:47     ` Peter Zijlstra
2016-11-15 10:03       ` Ingo Molnar
2016-11-15 10:46         ` Peter Zijlstra
2016-11-15 13:03           ` Ingo Molnar
2016-11-15 18:06             ` Kees Cook
2016-11-15 19:16               ` Peter Zijlstra
2016-11-15 19:23                 ` Kees Cook
2016-11-16  8:31                   ` Ingo Molnar
2016-11-16  8:51                     ` Greg KH
2016-11-16  9:07                       ` Ingo Molnar
2016-11-16  9:24                         ` Greg KH
2016-11-16 10:15                     ` Peter Zijlstra
2016-11-16 18:55                       ` Kees Cook
2016-11-17  8:33                         ` Peter Zijlstra
2016-11-17 19:50                           ` Kees Cook
2016-11-16 18:41                     ` Kees Cook
2016-11-15 12:33   ` Boqun Feng
2016-11-15 13:01     ` Peter Zijlstra
2016-11-15 14:19       ` Boqun Feng
2016-11-17  9:28         ` Peter Zijlstra
2016-11-17  9:48           ` Boqun Feng
2016-11-17 10:29             ` Peter Zijlstra
2016-11-17 10:39               ` Peter Zijlstra
2016-11-17 11:03                 ` Greg KH
2016-11-17 12:48                   ` Peter Zijlstra
     [not found]               ` <CAL0jBu-GnREUPSX4kUDp-Cc8ZGp6+Cb2q0HVandswcLzPRnChQ@mail.gmail.com>
2016-11-17 12:08                 ` Peter Zijlstra
2016-11-17 12:08           ` Will Deacon
2016-11-17 16:11             ` Peter Zijlstra
2016-11-17 16:36               ` Will Deacon
2016-11-18  8:26                 ` Boqun Feng
2016-11-18 10:16                   ` Will Deacon
2016-11-18 10:07   ` Reshetova, Elena
2016-11-18 11:37     ` Peter Zijlstra
2016-11-18 17:06       ` Will Deacon
2016-11-18 18:57         ` Peter Zijlstra
2016-11-21  4:06         ` Boqun Feng
2016-11-21  7:48           ` Ingo Molnar
2016-11-21  8:38             ` Boqun Feng
2016-11-21  8:44       ` Boqun Feng
2016-11-21  9:02         ` Peter Zijlstra
2016-11-21  9:37           ` Boqun Feng
2016-11-18 10:47   ` Reshetova, Elena
2016-11-18 10:52     ` Peter Zijlstra
2016-11-18 16:58       ` Reshetova, Elena
2016-11-18 18:53         ` Peter Zijlstra
2016-11-19  7:14           ` Reshetova, Elena
2016-11-19 11:45             ` Peter Zijlstra
2017-01-26 23:14   ` Kees Cook
2017-01-27  9:58     ` Peter Zijlstra
2017-01-27 21:07       ` Kees Cook
2017-01-30 13:40         ` Peter Zijlstra
2016-11-15  7:27 ` [RFC][PATCH 0/7] kref improvements Greg KH
2016-11-15  7:42   ` Ingo Molnar
2016-11-15 15:05     ` Greg KH
2016-11-15  7:48   ` Peter Zijlstra

This is a public inbox; see mirroring instructions
for how to clone and mirror all data and code used for this inbox,
as well as URLs for NNTP newsgroup(s).