All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/8] Introduce refcount_dec_and_lock_irqsave()
@ 2018-05-09 19:36 Sebastian Andrzej Siewior
  2018-05-09 19:36 ` [PATCH 1/8] bdi: use refcount_t for reference counting instead atomic_t Sebastian Andrzej Siewior
                   ` (8 more replies)
  0 siblings, 9 replies; 18+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-05-09 19:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: tglx, Peter Zijlstra, Ingo Molnar, linux-mm, Shaohua Li,
	linux-raid, Anna-Maria Gleixner

This series is a v2 of the atomic_dec_and_lock_irqsave(). Now refcount_*
is used instead of atomic_* as suggested by Peter Zijlstra.

Patch
- 1-3 converts the user from atomic_* API to refcount_* API
- 4 implements refcount_dec_and_lock_irqsave
- 5-8 converts the local_irq_save() + refcount_dec_and_lock() users to
  refcount_dec_and_lock_irqsave()

The whole series sits also at
  git://git.kernel.org/pub/scm/linux/kernel/git/bigeasy/staging.git refcount_t_irqsave

Sebastian

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/8] bdi: use refcount_t for reference counting instead atomic_t
  2018-05-09 19:36 [PATCH v2 0/8] Introduce refcount_dec_and_lock_irqsave() Sebastian Andrzej Siewior
@ 2018-05-09 19:36 ` Sebastian Andrzej Siewior
  2018-05-09 19:36 ` [PATCH 2/8] userns: " Sebastian Andrzej Siewior
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 18+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-05-09 19:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: tglx, Peter Zijlstra, Ingo Molnar, linux-mm, Shaohua Li,
	linux-raid, Anna-Maria Gleixner, Sebastian Andrzej Siewior

refcount_t type and corresponding API should be used instead of atomic_t when
the variable is used as a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free situations.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/linux/backing-dev-defs.h |  3 ++-
 include/linux/backing-dev.h      |  4 ++--
 mm/backing-dev.c                 | 12 ++++++------
 3 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
index 0bd432a4d7bd..81c75934ca5b 100644
--- a/include/linux/backing-dev-defs.h
+++ b/include/linux/backing-dev-defs.h
@@ -12,6 +12,7 @@
 #include <linux/timer.h>
 #include <linux/workqueue.h>
 #include <linux/kref.h>
+#include <linux/refcount.h>
 
 struct page;
 struct device;
@@ -76,7 +77,7 @@ enum wb_reason {
  */
 struct bdi_writeback_congested {
 	unsigned long state;		/* WB_[a]sync_congested flags */
-	atomic_t refcnt;		/* nr of attached wb's and blkg */
+	refcount_t refcnt;		/* nr of attached wb's and blkg */
 
 #ifdef CONFIG_CGROUP_WRITEBACK
 	struct backing_dev_info *__bdi;	/* the associated bdi, set to NULL
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 72ca0f3d39f3..c28a47cbe355 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -404,13 +404,13 @@ static inline bool inode_cgwb_enabled(struct inode *inode)
 static inline struct bdi_writeback_congested *
 wb_congested_get_create(struct backing_dev_info *bdi, int blkcg_id, gfp_t gfp)
 {
-	atomic_inc(&bdi->wb_congested->refcnt);
+	refcount_inc(&bdi->wb_congested->refcnt);
 	return bdi->wb_congested;
 }
 
 static inline void wb_congested_put(struct bdi_writeback_congested *congested)
 {
-	if (atomic_dec_and_test(&congested->refcnt))
+	if (refcount_dec_and_test(&congested->refcnt))
 		kfree(congested);
 }
 
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 7441bd93b732..7984a872073e 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -450,10 +450,10 @@ wb_congested_get_create(struct backing_dev_info *bdi, int blkcg_id, gfp_t gfp)
 	if (new_congested) {
 		/* !found and storage for new one already allocated, insert */
 		congested = new_congested;
-		new_congested = NULL;
 		rb_link_node(&congested->rb_node, parent, node);
 		rb_insert_color(&congested->rb_node, &bdi->cgwb_congested_tree);
-		goto found;
+		spin_unlock_irqrestore(&cgwb_lock, flags);
+		return congested;
 	}
 
 	spin_unlock_irqrestore(&cgwb_lock, flags);
@@ -463,13 +463,13 @@ wb_congested_get_create(struct backing_dev_info *bdi, int blkcg_id, gfp_t gfp)
 	if (!new_congested)
 		return NULL;
 
-	atomic_set(&new_congested->refcnt, 0);
+	refcount_set(&new_congested->refcnt, 1);
 	new_congested->__bdi = bdi;
 	new_congested->blkcg_id = blkcg_id;
 	goto retry;
 
 found:
-	atomic_inc(&congested->refcnt);
+	refcount_inc(&congested->refcnt);
 	spin_unlock_irqrestore(&cgwb_lock, flags);
 	kfree(new_congested);
 	return congested;
@@ -486,7 +486,7 @@ void wb_congested_put(struct bdi_writeback_congested *congested)
 	unsigned long flags;
 
 	local_irq_save(flags);
-	if (!atomic_dec_and_lock(&congested->refcnt, &cgwb_lock)) {
+	if (!refcount_dec_and_lock(&congested->refcnt, &cgwb_lock)) {
 		local_irq_restore(flags);
 		return;
 	}
@@ -794,7 +794,7 @@ static int cgwb_bdi_init(struct backing_dev_info *bdi)
 	if (!bdi->wb_congested)
 		return -ENOMEM;
 
-	atomic_set(&bdi->wb_congested->refcnt, 1);
+	refcount_set(&bdi->wb_congested->refcnt, 1);
 
 	err = wb_init(&bdi->wb, bdi, 1, GFP_KERNEL);
 	if (err) {
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/8] userns: use refcount_t for reference counting instead atomic_t
  2018-05-09 19:36 [PATCH v2 0/8] Introduce refcount_dec_and_lock_irqsave() Sebastian Andrzej Siewior
  2018-05-09 19:36 ` [PATCH 1/8] bdi: use refcount_t for reference counting instead atomic_t Sebastian Andrzej Siewior
@ 2018-05-09 19:36 ` Sebastian Andrzej Siewior
  2018-05-09 19:36 ` [PATCH 3/8] md: raid5: " Sebastian Andrzej Siewior
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 18+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-05-09 19:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: tglx, Peter Zijlstra, Ingo Molnar, linux-mm, Shaohua Li,
	linux-raid, Anna-Maria Gleixner, Sebastian Andrzej Siewior

refcount_t type and corresponding API should be used instead of atomic_t when
the variable is used as a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free situations.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/linux/sched/user.h | 5 +++--
 kernel/user.c              | 8 ++++----
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/include/linux/sched/user.h b/include/linux/sched/user.h
index 96fe289c4c6e..39ad98c09c58 100644
--- a/include/linux/sched/user.h
+++ b/include/linux/sched/user.h
@@ -4,6 +4,7 @@
 
 #include <linux/uidgid.h>
 #include <linux/atomic.h>
+#include <linux/refcount.h>
 #include <linux/ratelimit.h>
 
 struct key;
@@ -12,7 +13,7 @@ struct key;
  * Some day this will be a full-fledged user tracking system..
  */
 struct user_struct {
-	atomic_t __count;	/* reference count */
+	refcount_t __count;	/* reference count */
 	atomic_t processes;	/* How many processes does this user have? */
 	atomic_t sigpending;	/* How many pending signals does this user have? */
 #ifdef CONFIG_FANOTIFY
@@ -59,7 +60,7 @@ extern struct user_struct root_user;
 extern struct user_struct * alloc_uid(kuid_t);
 static inline struct user_struct *get_uid(struct user_struct *u)
 {
-	atomic_inc(&u->__count);
+	refcount_inc(&u->__count);
 	return u;
 }
 extern void free_uid(struct user_struct *);
diff --git a/kernel/user.c b/kernel/user.c
index 36288d840675..5f65ef195259 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -96,7 +96,7 @@ static DEFINE_SPINLOCK(uidhash_lock);
 
 /* root_user.__count is 1, for init task cred */
 struct user_struct root_user = {
-	.__count	= ATOMIC_INIT(1),
+	.__count	= REFCOUNT_INIT(1),
 	.processes	= ATOMIC_INIT(1),
 	.sigpending	= ATOMIC_INIT(0),
 	.locked_shm     = 0,
@@ -123,7 +123,7 @@ static struct user_struct *uid_hash_find(kuid_t uid, struct hlist_head *hashent)
 
 	hlist_for_each_entry(user, hashent, uidhash_node) {
 		if (uid_eq(user->uid, uid)) {
-			atomic_inc(&user->__count);
+			refcount_inc(&user->__count);
 			return user;
 		}
 	}
@@ -170,7 +170,7 @@ void free_uid(struct user_struct *up)
 		return;
 
 	local_irq_save(flags);
-	if (atomic_dec_and_lock(&up->__count, &uidhash_lock))
+	if (refcount_dec_and_lock(&up->__count, &uidhash_lock))
 		free_user(up, flags);
 	else
 		local_irq_restore(flags);
@@ -191,7 +191,7 @@ struct user_struct *alloc_uid(kuid_t uid)
 			goto out_unlock;
 
 		new->uid = uid;
-		atomic_set(&new->__count, 1);
+		refcount_set(&new->__count, 1);
 		ratelimit_state_init(&new->ratelimit, HZ, 100);
 		ratelimit_set_flags(&new->ratelimit, RATELIMIT_MSG_ON_RELEASE);
 
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/8] md: raid5: use refcount_t for reference counting instead atomic_t
  2018-05-09 19:36 [PATCH v2 0/8] Introduce refcount_dec_and_lock_irqsave() Sebastian Andrzej Siewior
  2018-05-09 19:36 ` [PATCH 1/8] bdi: use refcount_t for reference counting instead atomic_t Sebastian Andrzej Siewior
  2018-05-09 19:36 ` [PATCH 2/8] userns: " Sebastian Andrzej Siewior
@ 2018-05-09 19:36 ` Sebastian Andrzej Siewior
  2018-05-23 12:36   ` Peter Zijlstra
  2018-05-23 13:21   ` Matthew Wilcox
  2018-05-09 19:36 ` [PATCH 4/8] locking/refcount: implement refcount_dec_and_lock_irqsave() Sebastian Andrzej Siewior
                   ` (5 subsequent siblings)
  8 siblings, 2 replies; 18+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-05-09 19:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: tglx, Peter Zijlstra, Ingo Molnar, linux-mm, Shaohua Li,
	linux-raid, Anna-Maria Gleixner, Sebastian Andrzej Siewior

refcount_t type and corresponding API should be used instead of atomic_t when
the variable is used as a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free situations.

Most changes are 1:1 replacements except for
	BUG_ON(atomic_inc_return(&sh->count) != 1);

which has been turned into
        refcount_inc(&sh->count);
        BUG_ON(refcount_read(&sh->count) != 1);

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/md/raid5-cache.c |  8 ++---
 drivers/md/raid5-ppl.c   |  2 +-
 drivers/md/raid5.c       | 67 ++++++++++++++++++++--------------------
 drivers/md/raid5.h       |  4 +--
 4 files changed, 41 insertions(+), 40 deletions(-)

diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 3c65f52b68f5..532fdf56c117 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -1049,7 +1049,7 @@ int r5l_write_stripe(struct r5l_log *log, struct stripe_head *sh)
 	 * don't delay.
 	 */
 	clear_bit(STRIPE_DELAYED, &sh->state);
-	atomic_inc(&sh->count);
+	refcount_inc(&sh->count);
 
 	mutex_lock(&log->io_mutex);
 	/* meta + data */
@@ -1388,7 +1388,7 @@ static void r5c_flush_stripe(struct r5conf *conf, struct stripe_head *sh)
 	lockdep_assert_held(&conf->device_lock);
 
 	list_del_init(&sh->lru);
-	atomic_inc(&sh->count);
+	refcount_inc(&sh->count);
 
 	set_bit(STRIPE_HANDLE, &sh->state);
 	atomic_inc(&conf->active_stripes);
@@ -1491,7 +1491,7 @@ static void r5c_do_reclaim(struct r5conf *conf)
 			 */
 			if (!list_empty(&sh->lru) &&
 			    !test_bit(STRIPE_HANDLE, &sh->state) &&
-			    atomic_read(&sh->count) == 0) {
+			    refcount_read(&sh->count) == 0) {
 				r5c_flush_stripe(conf, sh);
 				if (count++ >= R5C_RECLAIM_STRIPE_GROUP)
 					break;
@@ -2912,7 +2912,7 @@ int r5c_cache_data(struct r5l_log *log, struct stripe_head *sh)
 	 * don't delay.
 	 */
 	clear_bit(STRIPE_DELAYED, &sh->state);
-	atomic_inc(&sh->count);
+	refcount_inc(&sh->count);
 
 	mutex_lock(&log->io_mutex);
 	/* meta + data */
diff --git a/drivers/md/raid5-ppl.c b/drivers/md/raid5-ppl.c
index 42890a08375b..87840cfe7a80 100644
--- a/drivers/md/raid5-ppl.c
+++ b/drivers/md/raid5-ppl.c
@@ -388,7 +388,7 @@ int ppl_write_stripe(struct r5conf *conf, struct stripe_head *sh)
 
 	set_bit(STRIPE_LOG_TRAPPED, &sh->state);
 	clear_bit(STRIPE_DELAYED, &sh->state);
-	atomic_inc(&sh->count);
+	refcount_inc(&sh->count);
 
 	if (ppl_log_stripe(log, sh)) {
 		spin_lock_irq(&ppl_conf->no_mem_stripes_lock);
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index be117d0a65a8..57ea0ae8c7ff 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -306,7 +306,7 @@ static void do_release_stripe(struct r5conf *conf, struct stripe_head *sh,
 static void __release_stripe(struct r5conf *conf, struct stripe_head *sh,
 			     struct list_head *temp_inactive_list)
 {
-	if (atomic_dec_and_test(&sh->count))
+	if (refcount_dec_and_test(&sh->count))
 		do_release_stripe(conf, sh, temp_inactive_list);
 }
 
@@ -398,7 +398,7 @@ void raid5_release_stripe(struct stripe_head *sh)
 
 	/* Avoid release_list until the last reference.
 	 */
-	if (atomic_add_unless(&sh->count, -1, 1))
+	if (refcount_dec_not_one(&sh->count))
 		return;
 
 	if (unlikely(!conf->mddev->thread) ||
@@ -411,7 +411,7 @@ void raid5_release_stripe(struct stripe_head *sh)
 slow_path:
 	local_irq_save(flags);
 	/* we are ok here if STRIPE_ON_RELEASE_LIST is set or not */
-	if (atomic_dec_and_lock(&sh->count, &conf->device_lock)) {
+	if (refcount_dec_and_lock(&sh->count, &conf->device_lock)) {
 		INIT_LIST_HEAD(&list);
 		hash = sh->hash_lock_index;
 		do_release_stripe(conf, sh, &list);
@@ -501,7 +501,7 @@ static void init_stripe(struct stripe_head *sh, sector_t sector, int previous)
 	struct r5conf *conf = sh->raid_conf;
 	int i, seq;
 
-	BUG_ON(atomic_read(&sh->count) != 0);
+	BUG_ON(refcount_read(&sh->count) != 0);
 	BUG_ON(test_bit(STRIPE_HANDLE, &sh->state));
 	BUG_ON(stripe_operations_active(sh));
 	BUG_ON(sh->batch_head);
@@ -678,11 +678,11 @@ raid5_get_active_stripe(struct r5conf *conf, sector_t sector,
 					  &conf->cache_state);
 			} else {
 				init_stripe(sh, sector, previous);
-				atomic_inc(&sh->count);
+				refcount_inc(&sh->count);
 			}
-		} else if (!atomic_inc_not_zero(&sh->count)) {
+		} else if (!refcount_inc_not_zero(&sh->count)) {
 			spin_lock(&conf->device_lock);
-			if (!atomic_read(&sh->count)) {
+			if (!refcount_read(&sh->count)) {
 				if (!test_bit(STRIPE_HANDLE, &sh->state))
 					atomic_inc(&conf->active_stripes);
 				BUG_ON(list_empty(&sh->lru) &&
@@ -698,7 +698,7 @@ raid5_get_active_stripe(struct r5conf *conf, sector_t sector,
 					sh->group = NULL;
 				}
 			}
-			atomic_inc(&sh->count);
+			refcount_inc(&sh->count);
 			spin_unlock(&conf->device_lock);
 		}
 	} while (sh == NULL);
@@ -760,9 +760,9 @@ static void stripe_add_to_batch_list(struct r5conf *conf, struct stripe_head *sh
 	hash = stripe_hash_locks_hash(head_sector);
 	spin_lock_irq(conf->hash_locks + hash);
 	head = __find_stripe(conf, head_sector, conf->generation);
-	if (head && !atomic_inc_not_zero(&head->count)) {
+	if (head && !refcount_inc_not_zero(&head->count)) {
 		spin_lock(&conf->device_lock);
-		if (!atomic_read(&head->count)) {
+		if (!refcount_read(&head->count)) {
 			if (!test_bit(STRIPE_HANDLE, &head->state))
 				atomic_inc(&conf->active_stripes);
 			BUG_ON(list_empty(&head->lru) &&
@@ -778,7 +778,7 @@ static void stripe_add_to_batch_list(struct r5conf *conf, struct stripe_head *sh
 				head->group = NULL;
 			}
 		}
-		atomic_inc(&head->count);
+		refcount_inc(&head->count);
 		spin_unlock(&conf->device_lock);
 	}
 	spin_unlock_irq(conf->hash_locks + hash);
@@ -847,7 +847,7 @@ static void stripe_add_to_batch_list(struct r5conf *conf, struct stripe_head *sh
 		sh->batch_head->bm_seq = seq;
 	}
 
-	atomic_inc(&sh->count);
+	refcount_inc(&sh->count);
 unlock_out:
 	unlock_two_stripes(head, sh);
 out:
@@ -1110,9 +1110,9 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s)
 			pr_debug("%s: for %llu schedule op %d on disc %d\n",
 				__func__, (unsigned long long)sh->sector,
 				bi->bi_opf, i);
-			atomic_inc(&sh->count);
+			refcount_inc(&sh->count);
 			if (sh != head_sh)
-				atomic_inc(&head_sh->count);
+				refcount_inc(&head_sh->count);
 			if (use_new_offset(conf, sh))
 				bi->bi_iter.bi_sector = (sh->sector
 						 + rdev->new_data_offset);
@@ -1174,9 +1174,9 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s)
 				 "replacement disc %d\n",
 				__func__, (unsigned long long)sh->sector,
 				rbi->bi_opf, i);
-			atomic_inc(&sh->count);
+			refcount_inc(&sh->count);
 			if (sh != head_sh)
-				atomic_inc(&head_sh->count);
+				refcount_inc(&head_sh->count);
 			if (use_new_offset(conf, sh))
 				rbi->bi_iter.bi_sector = (sh->sector
 						  + rrdev->new_data_offset);
@@ -1354,7 +1354,7 @@ static void ops_run_biofill(struct stripe_head *sh)
 		}
 	}
 
-	atomic_inc(&sh->count);
+	refcount_inc(&sh->count);
 	init_async_submit(&submit, ASYNC_TX_ACK, tx, ops_complete_biofill, sh, NULL);
 	async_trigger_callback(&submit);
 }
@@ -1432,7 +1432,7 @@ ops_run_compute5(struct stripe_head *sh, struct raid5_percpu *percpu)
 		if (i != target)
 			xor_srcs[count++] = sh->dev[i].page;
 
-	atomic_inc(&sh->count);
+	refcount_inc(&sh->count);
 
 	init_async_submit(&submit, ASYNC_TX_FENCE|ASYNC_TX_XOR_ZERO_DST, NULL,
 			  ops_complete_compute, sh, to_addr_conv(sh, percpu, 0));
@@ -1521,7 +1521,7 @@ ops_run_compute6_1(struct stripe_head *sh, struct raid5_percpu *percpu)
 	BUG_ON(!test_bit(R5_Wantcompute, &tgt->flags));
 	dest = tgt->page;
 
-	atomic_inc(&sh->count);
+	refcount_inc(&sh->count);
 
 	if (target == qd_idx) {
 		count = set_syndrome_sources(blocks, sh, SYNDROME_SRC_ALL);
@@ -1596,7 +1596,7 @@ ops_run_compute6_2(struct stripe_head *sh, struct raid5_percpu *percpu)
 	pr_debug("%s: stripe: %llu faila: %d failb: %d\n",
 		 __func__, (unsigned long long)sh->sector, faila, failb);
 
-	atomic_inc(&sh->count);
+	refcount_inc(&sh->count);
 
 	if (failb == syndrome_disks+1) {
 		/* Q disk is one of the missing disks */
@@ -1867,7 +1867,7 @@ ops_run_reconstruct5(struct stripe_head *sh, struct raid5_percpu *percpu,
 			break;
 	}
 	if (i >= sh->disks) {
-		atomic_inc(&sh->count);
+		refcount_inc(&sh->count);
 		set_bit(R5_Discard, &sh->dev[pd_idx].flags);
 		ops_complete_reconstruct(sh);
 		return;
@@ -1908,7 +1908,7 @@ ops_run_reconstruct5(struct stripe_head *sh, struct raid5_percpu *percpu,
 		flags = ASYNC_TX_ACK |
 			(prexor ? ASYNC_TX_XOR_DROP_DST : ASYNC_TX_XOR_ZERO_DST);
 
-		atomic_inc(&head_sh->count);
+		refcount_inc(&head_sh->count);
 		init_async_submit(&submit, flags, tx, ops_complete_reconstruct, head_sh,
 				  to_addr_conv(sh, percpu, j));
 	} else {
@@ -1950,7 +1950,7 @@ ops_run_reconstruct6(struct stripe_head *sh, struct raid5_percpu *percpu,
 			break;
 	}
 	if (i >= sh->disks) {
-		atomic_inc(&sh->count);
+		refcount_inc(&sh->count);
 		set_bit(R5_Discard, &sh->dev[sh->pd_idx].flags);
 		set_bit(R5_Discard, &sh->dev[sh->qd_idx].flags);
 		ops_complete_reconstruct(sh);
@@ -1974,7 +1974,7 @@ ops_run_reconstruct6(struct stripe_head *sh, struct raid5_percpu *percpu,
 				 struct stripe_head, batch_list) == head_sh;
 
 	if (last_stripe) {
-		atomic_inc(&head_sh->count);
+		refcount_inc(&head_sh->count);
 		init_async_submit(&submit, txflags, tx, ops_complete_reconstruct,
 				  head_sh, to_addr_conv(sh, percpu, j));
 	} else
@@ -2031,7 +2031,7 @@ static void ops_run_check_p(struct stripe_head *sh, struct raid5_percpu *percpu)
 	tx = async_xor_val(xor_dest, xor_srcs, 0, count, STRIPE_SIZE,
 			   &sh->ops.zero_sum_result, &submit);
 
-	atomic_inc(&sh->count);
+	refcount_inc(&sh->count);
 	init_async_submit(&submit, ASYNC_TX_ACK, tx, ops_complete_check, sh, NULL);
 	tx = async_trigger_callback(&submit);
 }
@@ -2050,7 +2050,7 @@ static void ops_run_check_pq(struct stripe_head *sh, struct raid5_percpu *percpu
 	if (!checkp)
 		srcs[count] = NULL;
 
-	atomic_inc(&sh->count);
+	refcount_inc(&sh->count);
 	init_async_submit(&submit, ASYNC_TX_ACK, NULL, ops_complete_check,
 			  sh, to_addr_conv(sh, percpu, 0));
 	async_syndrome_val(srcs, 0, count+2, STRIPE_SIZE,
@@ -2150,7 +2150,7 @@ static struct stripe_head *alloc_stripe(struct kmem_cache *sc, gfp_t gfp,
 		INIT_LIST_HEAD(&sh->lru);
 		INIT_LIST_HEAD(&sh->r5c);
 		INIT_LIST_HEAD(&sh->log_list);
-		atomic_set(&sh->count, 1);
+		refcount_set(&sh->count, 1);
 		sh->raid_conf = conf;
 		sh->log_start = MaxSector;
 		for (i = 0; i < disks; i++) {
@@ -2451,7 +2451,7 @@ static int drop_one_stripe(struct r5conf *conf)
 	spin_unlock_irq(conf->hash_locks + hash);
 	if (!sh)
 		return 0;
-	BUG_ON(atomic_read(&sh->count));
+	BUG_ON(refcount_read(&sh->count));
 	shrink_buffers(sh);
 	free_stripe(conf->slab_cache, sh);
 	atomic_dec(&conf->active_stripes);
@@ -2483,7 +2483,7 @@ static void raid5_end_read_request(struct bio * bi)
 			break;
 
 	pr_debug("end_read_request %llu/%d, count: %d, error %d.\n",
-		(unsigned long long)sh->sector, i, atomic_read(&sh->count),
+		(unsigned long long)sh->sector, i, refcount_read(&sh->count),
 		bi->bi_status);
 	if (i == disks) {
 		bio_reset(bi);
@@ -2620,7 +2620,7 @@ static void raid5_end_write_request(struct bio *bi)
 		}
 	}
 	pr_debug("end_write_request %llu/%d, count %d, error: %d.\n",
-		(unsigned long long)sh->sector, i, atomic_read(&sh->count),
+		(unsigned long long)sh->sector, i, refcount_read(&sh->count),
 		bi->bi_status);
 	if (i == disks) {
 		bio_reset(bi);
@@ -4687,7 +4687,7 @@ static void handle_stripe(struct stripe_head *sh)
 	pr_debug("handling stripe %llu, state=%#lx cnt=%d, "
 		"pd_idx=%d, qd_idx=%d\n, check:%d, reconstruct:%d\n",
 	       (unsigned long long)sh->sector, sh->state,
-	       atomic_read(&sh->count), sh->pd_idx, sh->qd_idx,
+	       refcount_read(&sh->count), sh->pd_idx, sh->qd_idx,
 	       sh->check_state, sh->reconstruct_state);
 
 	analyse_stripe(sh, &s);
@@ -5062,7 +5062,7 @@ static void activate_bit_delay(struct r5conf *conf,
 		struct stripe_head *sh = list_entry(head.next, struct stripe_head, lru);
 		int hash;
 		list_del_init(&sh->lru);
-		atomic_inc(&sh->count);
+		refcount_inc(&sh->count);
 		hash = sh->hash_lock_index;
 		__release_stripe(conf, sh, &temp_inactive_list[hash]);
 	}
@@ -5387,7 +5387,8 @@ static struct stripe_head *__get_priority_stripe(struct r5conf *conf, int group)
 		sh->group = NULL;
 	}
 	list_del_init(&sh->lru);
-	BUG_ON(atomic_inc_return(&sh->count) != 1);
+	refcount_inc(&sh->count);
+	BUG_ON(refcount_read(&sh->count) != 1);
 	return sh;
 }
 
diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
index 3f8da26032ac..bc2a24e99346 100644
--- a/drivers/md/raid5.h
+++ b/drivers/md/raid5.h
@@ -4,7 +4,7 @@
 
 #include <linux/raid/xor.h>
 #include <linux/dmaengine.h>
-
+#include <linux/refcount.h>
 /*
  *
  * Each stripe contains one buffer per device.  Each buffer can be in
@@ -208,7 +208,7 @@ struct stripe_head {
 	short			ddf_layout;/* use DDF ordering to calculate Q */
 	short			hash_lock_index;
 	unsigned long		state;		/* state flags */
-	atomic_t		count;	      /* nr of active thread/requests */
+	refcount_t		count;	      /* nr of active thread/requests */
 	int			bm_seq;	/* sequence number for bitmap flushes */
 	int			disks;		/* disks in stripe */
 	int			overwrite_disks; /* total overwrite disks in stripe,
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 4/8] locking/refcount: implement refcount_dec_and_lock_irqsave()
  2018-05-09 19:36 [PATCH v2 0/8] Introduce refcount_dec_and_lock_irqsave() Sebastian Andrzej Siewior
                   ` (2 preceding siblings ...)
  2018-05-09 19:36 ` [PATCH 3/8] md: raid5: " Sebastian Andrzej Siewior
@ 2018-05-09 19:36 ` Sebastian Andrzej Siewior
  2018-05-09 19:36 ` [PATCH 5/8] bdi: Use irqsave variant of refcount_dec_and_lock() Sebastian Andrzej Siewior
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 18+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-05-09 19:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: tglx, Peter Zijlstra, Ingo Molnar, linux-mm, Shaohua Li,
	linux-raid, Anna-Maria Gleixner, Sebastian Andrzej Siewior

From: Anna-Maria Gleixner <anna-maria@linutronix.de>

There are in-tree users of refcount_dec_and_lock() which must acquire the
spin lock with interrupts disabled. To workaround the lack of an irqsave
variant of refcount_dec_and_lock() they use local_irq_save() at the call
site. This causes extra code and creates in some places unneeded long
interrupt disabled times. These places need also extra treatment for
PREEMPT_RT due to the disconnect of the irq disabling and the lock
function.

Implement the missing irqsave variant of the function.

Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
[bigeasy: s@atomic_dec_and_lock@refcount_dec_and_lock@g]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/linux/refcount.h |  4 +++-
 lib/refcount.c           | 28 ++++++++++++++++++++++++++++
 2 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/include/linux/refcount.h b/include/linux/refcount.h
index 4193c41e383a..a685da2c4522 100644
--- a/include/linux/refcount.h
+++ b/include/linux/refcount.h
@@ -98,5 +98,7 @@ extern __must_check bool refcount_dec_if_one(refcount_t *r);
 extern __must_check bool refcount_dec_not_one(refcount_t *r);
 extern __must_check bool refcount_dec_and_mutex_lock(refcount_t *r, struct mutex *lock);
 extern __must_check bool refcount_dec_and_lock(refcount_t *r, spinlock_t *lock);
-
+extern __must_check bool refcount_dec_and_lock_irqsave(refcount_t *r,
+						       spinlock_t *lock,
+						       unsigned long *flags);
 #endif /* _LINUX_REFCOUNT_H */
diff --git a/lib/refcount.c b/lib/refcount.c
index 0eb48353abe3..d3b81cefce91 100644
--- a/lib/refcount.c
+++ b/lib/refcount.c
@@ -350,3 +350,31 @@ bool refcount_dec_and_lock(refcount_t *r, spinlock_t *lock)
 }
 EXPORT_SYMBOL(refcount_dec_and_lock);
 
+/**
+ * refcount_dec_and_lock_irqsave - return holding spinlock with disabled
+ *                                 interrupts if able to decrement refcount to 0
+ * @r: the refcount
+ * @lock: the spinlock to be locked
+ * @flags: saved IRQ-flags if the is acquired
+ *
+ * Same as refcount_dec_and_lock() above except that the spinlock is acquired
+ * with disabled interupts.
+ *
+ * Return: true and hold spinlock if able to decrement refcount to 0, false
+ *         otherwise
+ */
+bool refcount_dec_and_lock_irqsave(refcount_t *r, spinlock_t *lock,
+				   unsigned long *flags)
+{
+	if (refcount_dec_not_one(r))
+		return false;
+
+	spin_lock_irqsave(lock, *flags);
+	if (!refcount_dec_and_test(r)) {
+		spin_unlock_irqrestore(lock, *flags);
+		return false;
+	}
+
+	return true;
+}
+EXPORT_SYMBOL(refcount_dec_and_lock_irqsave);
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 5/8] bdi: Use irqsave variant of refcount_dec_and_lock()
  2018-05-09 19:36 [PATCH v2 0/8] Introduce refcount_dec_and_lock_irqsave() Sebastian Andrzej Siewior
                   ` (3 preceding siblings ...)
  2018-05-09 19:36 ` [PATCH 4/8] locking/refcount: implement refcount_dec_and_lock_irqsave() Sebastian Andrzej Siewior
@ 2018-05-09 19:36 ` Sebastian Andrzej Siewior
  2018-05-09 19:36 ` [PATCH 6/8] userns: " Sebastian Andrzej Siewior
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 18+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-05-09 19:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: tglx, Peter Zijlstra, Ingo Molnar, linux-mm, Shaohua Li,
	linux-raid, Anna-Maria Gleixner, Sebastian Andrzej Siewior

From: Anna-Maria Gleixner <anna-maria@linutronix.de>

The irqsave variant of refcount_dec_and_lock handles irqsave/restore when
taking/releasing the spin lock. With this variant the call of
local_irq_save/restore is no longer required.

Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
[bigeasy: s@atomic_dec_and_lock@refcount_dec_and_lock@g ]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 mm/backing-dev.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 7984a872073e..520aa092f7b2 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -485,11 +485,8 @@ void wb_congested_put(struct bdi_writeback_congested *congested)
 {
 	unsigned long flags;
 
-	local_irq_save(flags);
-	if (!refcount_dec_and_lock(&congested->refcnt, &cgwb_lock)) {
-		local_irq_restore(flags);
+	if (!refcount_dec_and_lock_irqsave(&congested->refcnt, &cgwb_lock, &flags))
 		return;
-	}
 
 	/* bdi might already have been destroyed leaving @congested unlinked */
 	if (congested->__bdi) {
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 6/8] userns: Use irqsave variant of refcount_dec_and_lock()
  2018-05-09 19:36 [PATCH v2 0/8] Introduce refcount_dec_and_lock_irqsave() Sebastian Andrzej Siewior
                   ` (4 preceding siblings ...)
  2018-05-09 19:36 ` [PATCH 5/8] bdi: Use irqsave variant of refcount_dec_and_lock() Sebastian Andrzej Siewior
@ 2018-05-09 19:36 ` Sebastian Andrzej Siewior
  2018-05-09 19:36 ` [PATCH 7/8] md: raid5: " Sebastian Andrzej Siewior
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 18+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-05-09 19:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: tglx, Peter Zijlstra, Ingo Molnar, linux-mm, Shaohua Li,
	linux-raid, Anna-Maria Gleixner, Sebastian Andrzej Siewior

From: Anna-Maria Gleixner <anna-maria@linutronix.de>

The irqsave variant of refcount_dec_and_lock handles irqsave/restore when
taking/releasing the spin lock. With this variant the call of
local_irq_save/restore is no longer required.

Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
[bigeasy: s@atomic_dec_and_lock@refcount_dec_and_lock@g ]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/user.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/kernel/user.c b/kernel/user.c
index 5f65ef195259..0df9b1640b2a 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -169,11 +169,8 @@ void free_uid(struct user_struct *up)
 	if (!up)
 		return;
 
-	local_irq_save(flags);
-	if (refcount_dec_and_lock(&up->__count, &uidhash_lock))
+	if (refcount_dec_and_lock_irqsave(&up->__count, &uidhash_lock, &flags))
 		free_user(up, flags);
-	else
-		local_irq_restore(flags);
 }
 
 struct user_struct *alloc_uid(kuid_t uid)
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 7/8] md: raid5: Use irqsave variant of refcount_dec_and_lock()
  2018-05-09 19:36 [PATCH v2 0/8] Introduce refcount_dec_and_lock_irqsave() Sebastian Andrzej Siewior
                   ` (5 preceding siblings ...)
  2018-05-09 19:36 ` [PATCH 6/8] userns: " Sebastian Andrzej Siewior
@ 2018-05-09 19:36 ` Sebastian Andrzej Siewior
  2018-05-09 19:36 ` [PATCH 8/8] md: raid5: Do not disable irq on release_inactive_stripe_list() call Sebastian Andrzej Siewior
  2018-05-23 13:01 ` [PATCH v2 0/8] Introduce refcount_dec_and_lock_irqsave() Peter Zijlstra
  8 siblings, 0 replies; 18+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-05-09 19:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: tglx, Peter Zijlstra, Ingo Molnar, linux-mm, Shaohua Li,
	linux-raid, Anna-Maria Gleixner, Sebastian Andrzej Siewior

From: Anna-Maria Gleixner <anna-maria@linutronix.de>

The irqsave variant of refcount_dec_and_lock handles irqsave/restore when
taking/releasing the spin lock. With this variant the call of
local_irq_save is no longer required.

Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
[bigeasy: s@atomic_dec_and_lock@refcount_dec_and_lock@g ]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/md/raid5.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 57ea0ae8c7ff..28453264c3eb 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -409,16 +409,15 @@ void raid5_release_stripe(struct stripe_head *sh)
 		md_wakeup_thread(conf->mddev->thread);
 	return;
 slow_path:
-	local_irq_save(flags);
 	/* we are ok here if STRIPE_ON_RELEASE_LIST is set or not */
-	if (refcount_dec_and_lock(&sh->count, &conf->device_lock)) {
+	if (refcount_dec_and_lock_irqsave(&sh->count, &conf->device_lock, &flags)) {
 		INIT_LIST_HEAD(&list);
 		hash = sh->hash_lock_index;
 		do_release_stripe(conf, sh, &list);
 		spin_unlock(&conf->device_lock);
 		release_inactive_stripe_list(conf, &list, hash);
+		local_irq_restore(flags);
 	}
-	local_irq_restore(flags);
 }
 
 static inline void remove_hash(struct stripe_head *sh)
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 8/8] md: raid5: Do not disable irq on release_inactive_stripe_list() call
  2018-05-09 19:36 [PATCH v2 0/8] Introduce refcount_dec_and_lock_irqsave() Sebastian Andrzej Siewior
                   ` (6 preceding siblings ...)
  2018-05-09 19:36 ` [PATCH 7/8] md: raid5: " Sebastian Andrzej Siewior
@ 2018-05-09 19:36 ` Sebastian Andrzej Siewior
  2018-05-23 13:01 ` [PATCH v2 0/8] Introduce refcount_dec_and_lock_irqsave() Peter Zijlstra
  8 siblings, 0 replies; 18+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-05-09 19:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: tglx, Peter Zijlstra, Ingo Molnar, linux-mm, Shaohua Li,
	linux-raid, Anna-Maria Gleixner, Sebastian Andrzej Siewior

From: Anna-Maria Gleixner <anna-maria@linutronix.de>

There is no need to invoke release_inactive_stripe_list() with interrupts
disabled. All call sites, except raid5_release_stripe(), unlock
->device_lock and enable interrupts before invoking the function.

Make it consistent.

Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
[bigeasy: s@atomic_dec_and_lock@refcount_dec_and_lock@g ]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/md/raid5.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 28453264c3eb..9433b2619006 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -414,9 +414,8 @@ void raid5_release_stripe(struct stripe_head *sh)
 		INIT_LIST_HEAD(&list);
 		hash = sh->hash_lock_index;
 		do_release_stripe(conf, sh, &list);
-		spin_unlock(&conf->device_lock);
+		spin_unlock_irqrestore(&conf->device_lock, flags);
 		release_inactive_stripe_list(conf, &list, hash);
-		local_irq_restore(flags);
 	}
 }
 
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/8] md: raid5: use refcount_t for reference counting instead atomic_t
  2018-05-09 19:36 ` [PATCH 3/8] md: raid5: " Sebastian Andrzej Siewior
@ 2018-05-23 12:36   ` Peter Zijlstra
  2018-05-23 12:50     ` Sebastian Andrzej Siewior
  2018-05-23 13:21   ` Matthew Wilcox
  1 sibling, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2018-05-23 12:36 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, tglx, Ingo Molnar, linux-mm, Shaohua Li,
	linux-raid, Anna-Maria Gleixner

> Most changes are 1:1 replacements except for
> 	BUG_ON(atomic_inc_return(&sh->count) != 1);

That doesn't look right, 'inc_return == 1' implies inc-from-zero, which
is not allowed by refcount.

> which has been turned into
>         refcount_inc(&sh->count);
>         BUG_ON(refcount_read(&sh->count) != 1);

And that is racy, you can have additional increments in between.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/8] md: raid5: use refcount_t for reference counting instead atomic_t
  2018-05-23 12:36   ` Peter Zijlstra
@ 2018-05-23 12:50     ` Sebastian Andrzej Siewior
  2018-05-23 12:55       ` Peter Zijlstra
  0 siblings, 1 reply; 18+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-05-23 12:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, tglx, Ingo Molnar, linux-mm, Shaohua Li,
	linux-raid, Anna-Maria Gleixner

On 2018-05-23 14:36:15 [+0200], Peter Zijlstra wrote:
> > Most changes are 1:1 replacements except for
> > 	BUG_ON(atomic_inc_return(&sh->count) != 1);
> 
> That doesn't look right, 'inc_return == 1' implies inc-from-zero, which
> is not allowed by refcount.
> 
> > which has been turned into
> >         refcount_inc(&sh->count);
> >         BUG_ON(refcount_read(&sh->count) != 1);
> 
> And that is racy, you can have additional increments in between.

so do we stay with the atomic* API here or do we extend refcount* API?

Sebastian

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/8] md: raid5: use refcount_t for reference counting instead atomic_t
  2018-05-23 12:50     ` Sebastian Andrzej Siewior
@ 2018-05-23 12:55       ` Peter Zijlstra
  0 siblings, 0 replies; 18+ messages in thread
From: Peter Zijlstra @ 2018-05-23 12:55 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, tglx, Ingo Molnar, linux-mm, Shaohua Li,
	linux-raid, Anna-Maria Gleixner

On Wed, May 23, 2018 at 02:50:07PM +0200, Sebastian Andrzej Siewior wrote:
> On 2018-05-23 14:36:15 [+0200], Peter Zijlstra wrote:
> > > Most changes are 1:1 replacements except for
> > > 	BUG_ON(atomic_inc_return(&sh->count) != 1);
> > 
> > That doesn't look right, 'inc_return == 1' implies inc-from-zero, which
> > is not allowed by refcount.
> > 
> > > which has been turned into
> > >         refcount_inc(&sh->count);
> > >         BUG_ON(refcount_read(&sh->count) != 1);
> > 
> > And that is racy, you can have additional increments in between.
> 
> so do we stay with the atomic* API here or do we extend refcount* API?

Stay with the atomic; I'll look at the rest of these patches, but raid5
looks like a usage-count, not a reference count.

I'll probably ack your initial set and parts of this.. but let me get to
the end of this first.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 0/8] Introduce refcount_dec_and_lock_irqsave()
  2018-05-09 19:36 [PATCH v2 0/8] Introduce refcount_dec_and_lock_irqsave() Sebastian Andrzej Siewior
                   ` (7 preceding siblings ...)
  2018-05-09 19:36 ` [PATCH 8/8] md: raid5: Do not disable irq on release_inactive_stripe_list() call Sebastian Andrzej Siewior
@ 2018-05-23 13:01 ` Peter Zijlstra
  2018-05-23 13:09   ` Sebastian Andrzej Siewior
  8 siblings, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2018-05-23 13:01 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, tglx, Ingo Molnar, linux-mm, Shaohua Li,
	linux-raid, Anna-Maria Gleixner

On Wed, May 09, 2018 at 09:36:37PM +0200, Sebastian Andrzej Siewior wrote:
> This series is a v2 of the atomic_dec_and_lock_irqsave(). Now refcount_*
> is used instead of atomic_* as suggested by Peter Zijlstra.
> 
> Patch
> - 1-3 converts the user from atomic_* API to refcount_* API
> - 4 implements refcount_dec_and_lock_irqsave
> - 5-8 converts the local_irq_save() + refcount_dec_and_lock() users to
>   refcount_dec_and_lock_irqsave()
> 
> The whole series sits also at
>   git://git.kernel.org/pub/scm/linux/kernel/git/bigeasy/staging.git refcount_t_irqsave
> 

1-2, 4-6:

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 0/8] Introduce refcount_dec_and_lock_irqsave()
  2018-05-23 13:01 ` [PATCH v2 0/8] Introduce refcount_dec_and_lock_irqsave() Peter Zijlstra
@ 2018-05-23 13:09   ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 18+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-05-23 13:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, tglx, Ingo Molnar, linux-mm, Shaohua Li,
	linux-raid, Anna-Maria Gleixner

On 2018-05-23 15:01:25 [+0200], Peter Zijlstra wrote:
> 1-2, 4-6:
> 
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Thank you.

Sebastian

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/8] md: raid5: use refcount_t for reference counting instead atomic_t
  2018-05-09 19:36 ` [PATCH 3/8] md: raid5: " Sebastian Andrzej Siewior
  2018-05-23 12:36   ` Peter Zijlstra
@ 2018-05-23 13:21   ` Matthew Wilcox
  2018-05-23 17:49     ` Peter Zijlstra
  1 sibling, 1 reply; 18+ messages in thread
From: Matthew Wilcox @ 2018-05-23 13:21 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, tglx, Peter Zijlstra, Ingo Molnar, linux-mm,
	Shaohua Li, linux-raid, Anna-Maria Gleixner

On Wed, May 09, 2018 at 09:36:40PM +0200, Sebastian Andrzej Siewior wrote:
> refcount_t type and corresponding API should be used instead of atomic_t when
> the variable is used as a reference counter. This allows to avoid accidental
> refcounter overflows that might lead to use-after-free situations.
> 
> Most changes are 1:1 replacements except for
> 	BUG_ON(atomic_inc_return(&sh->count) != 1);
> 
> which has been turned into
>         refcount_inc(&sh->count);
>         BUG_ON(refcount_read(&sh->count) != 1);

@@ -5387,7 +5387,8 @@ static struct stripe_head *__get_priority_stripe(struct
+r5conf *conf, int group)
                sh->group = NULL;
        }
        list_del_init(&sh->lru);
-       BUG_ON(atomic_inc_return(&sh->count) != 1);
+       refcount_inc(&sh->count);
+	BUG_ON(refcount_read(&sh->count) != 1);
        return sh;
 }


That's the only problematic usage.  And I think what it's really saying is:

	BUG_ON(refcount_read(&sh->count) != 0);
	refcount_set(&sh->count, 1);

With that, this looks like a reasonable use of refcount_t to me.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/8] md: raid5: use refcount_t for reference counting instead atomic_t
  2018-05-23 13:21   ` Matthew Wilcox
@ 2018-05-23 17:49     ` Peter Zijlstra
  2018-05-23 19:22       ` Shaohua Li
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2018-05-23 17:49 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Sebastian Andrzej Siewior, linux-kernel, tglx, Ingo Molnar,
	linux-mm, Shaohua Li, linux-raid, Anna-Maria Gleixner

On Wed, May 23, 2018 at 06:21:19AM -0700, Matthew Wilcox wrote:
> On Wed, May 09, 2018 at 09:36:40PM +0200, Sebastian Andrzej Siewior wrote:
> > refcount_t type and corresponding API should be used instead of atomic_t when
> > the variable is used as a reference counter. This allows to avoid accidental
> > refcounter overflows that might lead to use-after-free situations.
> > 
> > Most changes are 1:1 replacements except for
> > 	BUG_ON(atomic_inc_return(&sh->count) != 1);
> > 
> > which has been turned into
> >         refcount_inc(&sh->count);
> >         BUG_ON(refcount_read(&sh->count) != 1);
> 
> @@ -5387,7 +5387,8 @@ static struct stripe_head *__get_priority_stripe(struct
> +r5conf *conf, int group)
>                 sh->group = NULL;
>         }
>         list_del_init(&sh->lru);
> -       BUG_ON(atomic_inc_return(&sh->count) != 1);
> +       refcount_inc(&sh->count);
> +	BUG_ON(refcount_read(&sh->count) != 1);
>         return sh;
>  }
> 
> 
> That's the only problematic usage.  And I think what it's really saying is:
> 
> 	BUG_ON(refcount_read(&sh->count) != 0);
> 	refcount_set(&sh->count, 1);
> 
> With that, this looks like a reasonable use of refcount_t to me.

I'm not so sure, look at:

  r5c_do_reclaim():

	if (!list_empty(&sh->lru) &&
	    !test_bit(STRIPE_HANDLE, &sh->state) &&
	    atomic_read(&sh->count) == 0) {
	      r5c_flush_stripe(cond, sh)

Which does:

  r5c_flush_stripe():

	atomic_inc(&sh->count);

Which is another inc-from-zero. Also, having sh's with count==0 in a
list is counter to the concept of refcounts and smells like usage-counts
to me. For refcount 0 really means deads and gone.

If this really is supposed to be a refcount, someone more familiar with
the raid5 should do the patch and write a comprehensive changelog on it.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/8] md: raid5: use refcount_t for reference counting instead atomic_t
  2018-05-23 17:49     ` Peter Zijlstra
@ 2018-05-23 19:22       ` Shaohua Li
  2018-05-24  7:14         ` Peter Zijlstra
  0 siblings, 1 reply; 18+ messages in thread
From: Shaohua Li @ 2018-05-23 19:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Matthew Wilcox, Sebastian Andrzej Siewior, linux-kernel, tglx,
	Ingo Molnar, linux-mm, linux-raid, Anna-Maria Gleixner

On Wed, May 23, 2018 at 07:49:04PM +0200, Peter Zijlstra wrote:
> On Wed, May 23, 2018 at 06:21:19AM -0700, Matthew Wilcox wrote:
> > On Wed, May 09, 2018 at 09:36:40PM +0200, Sebastian Andrzej Siewior wrote:
> > > refcount_t type and corresponding API should be used instead of atomic_t when
> > > the variable is used as a reference counter. This allows to avoid accidental
> > > refcounter overflows that might lead to use-after-free situations.
> > > 
> > > Most changes are 1:1 replacements except for
> > > 	BUG_ON(atomic_inc_return(&sh->count) != 1);
> > > 
> > > which has been turned into
> > >         refcount_inc(&sh->count);
> > >         BUG_ON(refcount_read(&sh->count) != 1);
> > 
> > @@ -5387,7 +5387,8 @@ static struct stripe_head *__get_priority_stripe(struct
> > +r5conf *conf, int group)
> >                 sh->group = NULL;
> >         }
> >         list_del_init(&sh->lru);
> > -       BUG_ON(atomic_inc_return(&sh->count) != 1);
> > +       refcount_inc(&sh->count);
> > +	BUG_ON(refcount_read(&sh->count) != 1);
> >         return sh;
> >  }
> > 
> > 
> > That's the only problematic usage.  And I think what it's really saying is:
> > 
> > 	BUG_ON(refcount_read(&sh->count) != 0);
> > 	refcount_set(&sh->count, 1);
> > 
> > With that, this looks like a reasonable use of refcount_t to me.
> 
> I'm not so sure, look at:
> 
>   r5c_do_reclaim():
> 
> 	if (!list_empty(&sh->lru) &&
> 	    !test_bit(STRIPE_HANDLE, &sh->state) &&
> 	    atomic_read(&sh->count) == 0) {
> 	      r5c_flush_stripe(cond, sh)
> 
> Which does:
> 
>   r5c_flush_stripe():
> 
> 	atomic_inc(&sh->count);
> 
> Which is another inc-from-zero. Also, having sh's with count==0 in a
> list is counter to the concept of refcounts and smells like usage-counts
> to me. For refcount 0 really means deads and gone.
> 
> If this really is supposed to be a refcount, someone more familiar with
> the raid5 should do the patch and write a comprehensive changelog on it.

I don't know what is changed in the refcount, such raid5 change has attempted
before and didn't work. 0 for the stripe count is a valid usage and we do
inc-from-zero in several places.

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/8] md: raid5: use refcount_t for reference counting instead atomic_t
  2018-05-23 19:22       ` Shaohua Li
@ 2018-05-24  7:14         ` Peter Zijlstra
  0 siblings, 0 replies; 18+ messages in thread
From: Peter Zijlstra @ 2018-05-24  7:14 UTC (permalink / raw)
  To: Shaohua Li
  Cc: Matthew Wilcox, Sebastian Andrzej Siewior, linux-kernel, tglx,
	Ingo Molnar, linux-mm, linux-raid, Anna-Maria Gleixner

On Wed, May 23, 2018 at 12:22:39PM -0700, Shaohua Li wrote:
> I don't know what is changed in the refcount, such raid5 change has attempted
> before and didn't work. 0 for the stripe count is a valid usage and we do
> inc-from-zero in several places.

Nothing much has changed with refcount; and the above does indeed still
appear to be an issue. Thanks for confirming.

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2018-05-24  7:14 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-09 19:36 [PATCH v2 0/8] Introduce refcount_dec_and_lock_irqsave() Sebastian Andrzej Siewior
2018-05-09 19:36 ` [PATCH 1/8] bdi: use refcount_t for reference counting instead atomic_t Sebastian Andrzej Siewior
2018-05-09 19:36 ` [PATCH 2/8] userns: " Sebastian Andrzej Siewior
2018-05-09 19:36 ` [PATCH 3/8] md: raid5: " Sebastian Andrzej Siewior
2018-05-23 12:36   ` Peter Zijlstra
2018-05-23 12:50     ` Sebastian Andrzej Siewior
2018-05-23 12:55       ` Peter Zijlstra
2018-05-23 13:21   ` Matthew Wilcox
2018-05-23 17:49     ` Peter Zijlstra
2018-05-23 19:22       ` Shaohua Li
2018-05-24  7:14         ` Peter Zijlstra
2018-05-09 19:36 ` [PATCH 4/8] locking/refcount: implement refcount_dec_and_lock_irqsave() Sebastian Andrzej Siewior
2018-05-09 19:36 ` [PATCH 5/8] bdi: Use irqsave variant of refcount_dec_and_lock() Sebastian Andrzej Siewior
2018-05-09 19:36 ` [PATCH 6/8] userns: " Sebastian Andrzej Siewior
2018-05-09 19:36 ` [PATCH 7/8] md: raid5: " Sebastian Andrzej Siewior
2018-05-09 19:36 ` [PATCH 8/8] md: raid5: Do not disable irq on release_inactive_stripe_list() call Sebastian Andrzej Siewior
2018-05-23 13:01 ` [PATCH v2 0/8] Introduce refcount_dec_and_lock_irqsave() Peter Zijlstra
2018-05-23 13:09   ` Sebastian Andrzej Siewior

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.