All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/15] v5 kernel core pieces refcount conversions
@ 2017-08-30 12:22 Elena Reshetova
  2017-08-30 12:22 ` [PATCH 01/15] sched: convert sighand_struct.count to refcount_t Elena Reshetova
                   ` (15 more replies)
  0 siblings, 16 replies; 51+ messages in thread
From: Elena Reshetova @ 2017-08-30 12:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-fsdevel, peterz, gregkh, viro, tj, mingo, hannes, lizefan,
	acme, alexander.shishkin, eparis, akpm, arnd, luto, keescook,
	tglx, dvhart, ebiederm, Elena Reshetova

Now we have at least x86 support for ARCH_HAS_REFCOUNT merged and
arm and others on their way.

Changes in v5:
 * Kees catched that the following changes in
   perf_event_context.refcount and futex_pi_state.refcount
   are not correct now when ARCH_HAS_REFCOUNT is enabled:
    -	WARN_ON(!atomic_inc_not_zero(refcount));
    +	refcount_inc(refcount);
   So they are now changed back to using refcount_inc_not_zero. 

Changes in v4:
 * just rebase and corrections on linux-next/master

Changes in v3:
 * SoB chain corrected
 * minor corrections based on v2 feedback
 * rebase on linux-next/master as of today

Changes in v2:
 * dropped already merged patches
 * rebase on top of linux-next/master
 * Now by default refcount_t = atomic_t (*) and uses all atomic
   standard operations unless CONFIG_REFCOUNT_FULL is enabled.
   This is a compromise for the systems that are critical on
   performance (such as net) and cannot accept even slight delay
   on the refcounter operations.

This series, for core kernel components, replaces atomic_t reference
counters with the new refcount_t type and API (see include/linux/refcount.h).
By doing this we prevent intentional or accidental
underflows or overflows that can led to use-after-free vulnerabilities.

The patches are fully independent and can be cherry-picked separately.
If there are no objections to the patches, please merge them via respective trees.


Elena Reshetova (15):
  sched: convert sighand_struct.count to refcount_t
  sched: convert signal_struct.sigcnt to refcount_t
  sched: convert user_struct.__count to refcount_t
  sched: convert numa_group.refcount to refcount_t
  sched/task_struct: convert task_struct.usage to refcount_t
  sched/task_struct: convert task_struct.stack_refcount to refcount_t
  perf: convert perf_event_context.refcount to refcount_t
  perf/ring_buffer: convert ring_buffer.refcount to refcount_t
  perf/ring_buffer: convert ring_buffer.aux_refcount to refcount_t
  uprobes: convert uprobe.ref to refcount_t
  nsproxy: convert nsproxy.count to refcount_t
  groups: convert group_info.usage to refcount_t
  creds: convert cred.usage to refcount_t
  futex: convert futex_pi_state.refcount to refcount_t
  kcov: convert kcov.refcount to refcount_t

 fs/exec.c                        |  4 ++--
 fs/proc/task_nommu.c             |  2 +-
 include/linux/cred.h             | 13 ++++++------
 include/linux/init_task.h        |  7 +++---
 include/linux/nsproxy.h          |  6 +++---
 include/linux/perf_event.h       |  3 ++-
 include/linux/sched.h            |  5 +++--
 include/linux/sched/signal.h     |  5 +++--
 include/linux/sched/task.h       |  4 ++--
 include/linux/sched/task_stack.h |  2 +-
 include/linux/sched/user.h       |  5 +++--
 kernel/cred.c                    | 46 ++++++++++++++++++++--------------------
 kernel/events/core.c             | 18 ++++++++--------
 kernel/events/internal.h         |  5 +++--
 kernel/events/ring_buffer.c      |  8 +++----
 kernel/events/uprobes.c          |  8 +++----
 kernel/fork.c                    | 24 ++++++++++-----------
 kernel/futex.c                   | 13 ++++++------
 kernel/groups.c                  |  2 +-
 kernel/kcov.c                    |  9 ++++----
 kernel/nsproxy.c                 |  6 +++---
 kernel/sched/fair.c              | 12 +++++------
 kernel/user.c                    |  8 +++----
 23 files changed, 112 insertions(+), 103 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 01/15] sched: convert sighand_struct.count to refcount_t
  2017-08-30 12:22 [PATCH 00/15] v5 kernel core pieces refcount conversions Elena Reshetova
@ 2017-08-30 12:22 ` Elena Reshetova
  2017-08-30 12:22 ` [PATCH 02/15] sched: convert signal_struct.sigcnt " Elena Reshetova
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 51+ messages in thread
From: Elena Reshetova @ 2017-08-30 12:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-fsdevel, peterz, gregkh, viro, tj, mingo, hannes, lizefan,
	acme, alexander.shishkin, eparis, akpm, arnd, luto, keescook,
	tglx, dvhart, ebiederm, Elena Reshetova

atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable sighand_struct.count is used as pure reference counter.
Convert it to refcount_t and fix up the operations.

Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
---
 fs/exec.c                    | 4 ++--
 fs/proc/task_nommu.c         | 2 +-
 include/linux/init_task.h    | 2 +-
 include/linux/sched/signal.h | 3 ++-
 kernel/fork.c                | 8 ++++----
 5 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index daa19d8..5c8960d 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1200,7 +1200,7 @@ static int de_thread(struct task_struct *tsk)
 	flush_itimer_signals();
 #endif
 
-	if (atomic_read(&oldsighand->count) != 1) {
+	if (refcount_read(&oldsighand->count) != 1) {
 		struct sighand_struct *newsighand;
 		/*
 		 * This ->sighand is shared with the CLONE_SIGHAND
@@ -1210,7 +1210,7 @@ static int de_thread(struct task_struct *tsk)
 		if (!newsighand)
 			return -ENOMEM;
 
-		atomic_set(&newsighand->count, 1);
+		refcount_set(&newsighand->count, 1);
 		memcpy(newsighand->action, oldsighand->action,
 		       sizeof(newsighand->action));
 
diff --git a/fs/proc/task_nommu.c b/fs/proc/task_nommu.c
index 8652209..5038b6b 100644
--- a/fs/proc/task_nommu.c
+++ b/fs/proc/task_nommu.c
@@ -63,7 +63,7 @@ void task_mem(struct seq_file *m, struct mm_struct *mm)
 	else
 		bytes += kobjsize(current->files);
 
-	if (current->sighand && atomic_read(&current->sighand->count) > 1)
+	if (current->sighand && refcount_read(&current->sighand->count) > 1)
 		sbytes += kobjsize(current->sighand);
 	else
 		bytes += kobjsize(current->sighand);
diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index 3c07ace..6c6f520 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -85,7 +85,7 @@ extern struct fs_struct init_fs;
 extern struct nsproxy init_nsproxy;
 
 #define INIT_SIGHAND(sighand) {						\
-	.count		= ATOMIC_INIT(1), 				\
+	.count		= REFCOUNT_INIT(1), 				\
 	.action		= { { { .sa_handler = SIG_DFL, } }, },		\
 	.siglock	= __SPIN_LOCK_UNLOCKED(sighand.siglock),	\
 	.signalfd_wqh	= __WAIT_QUEUE_HEAD_INITIALIZER(sighand.signalfd_wqh),	\
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 2a0dd40..4d5cdf1 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -7,13 +7,14 @@
 #include <linux/sched/jobctl.h>
 #include <linux/sched/task.h>
 #include <linux/cred.h>
+#include <linux/refcount.h>
 
 /*
  * Types defining task->signal and task->sighand and APIs using them:
  */
 
 struct sighand_struct {
-	atomic_t		count;
+	refcount_t		count;
 	struct k_sigaction	action[_NSIG];
 	spinlock_t		siglock;
 	wait_queue_head_t	signalfd_wqh;
diff --git a/kernel/fork.c b/kernel/fork.c
index 8004fc7..43bcd20 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1320,7 +1320,7 @@ static int copy_sighand(unsigned long clone_flags, struct task_struct *tsk)
 	struct sighand_struct *sig;
 
 	if (clone_flags & CLONE_SIGHAND) {
-		atomic_inc(&current->sighand->count);
+		refcount_inc(&current->sighand->count);
 		return 0;
 	}
 	sig = kmem_cache_alloc(sighand_cachep, GFP_KERNEL);
@@ -1328,14 +1328,14 @@ static int copy_sighand(unsigned long clone_flags, struct task_struct *tsk)
 	if (!sig)
 		return -ENOMEM;
 
-	atomic_set(&sig->count, 1);
+	refcount_set(&sig->count, 1);
 	memcpy(sig->action, current->sighand->action, sizeof(sig->action));
 	return 0;
 }
 
 void __cleanup_sighand(struct sighand_struct *sighand)
 {
-	if (atomic_dec_and_test(&sighand->count)) {
+	if (refcount_dec_and_test(&sighand->count)) {
 		signalfd_cleanup(sighand);
 		/*
 		 * sighand_cachep is SLAB_TYPESAFE_BY_RCU so we can free it
@@ -2236,7 +2236,7 @@ static int check_unshare_flags(unsigned long unshare_flags)
 			return -EINVAL;
 	}
 	if (unshare_flags & (CLONE_SIGHAND | CLONE_VM)) {
-		if (atomic_read(&current->sighand->count) > 1)
+		if (refcount_read(&current->sighand->count) > 1)
 			return -EINVAL;
 	}
 	if (unshare_flags & CLONE_VM) {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 02/15] sched: convert signal_struct.sigcnt to refcount_t
  2017-08-30 12:22 [PATCH 00/15] v5 kernel core pieces refcount conversions Elena Reshetova
  2017-08-30 12:22 ` [PATCH 01/15] sched: convert sighand_struct.count to refcount_t Elena Reshetova
@ 2017-08-30 12:22 ` Elena Reshetova
  2017-08-30 12:22 ` [PATCH 03/15] sched: convert user_struct.__count " Elena Reshetova
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 51+ messages in thread
From: Elena Reshetova @ 2017-08-30 12:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-fsdevel, peterz, gregkh, viro, tj, mingo, hannes, lizefan,
	acme, alexander.shishkin, eparis, akpm, arnd, luto, keescook,
	tglx, dvhart, ebiederm, Elena Reshetova

atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable signal_struct.sigcnt is used as pure reference counter.
Convert it to refcount_t and fix up the operations.

Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
---
 include/linux/sched/signal.h | 2 +-
 kernel/fork.c                | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 4d5cdf1..c5f1a67 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -77,7 +77,7 @@ struct thread_group_cputimer {
  * the locking of signal_struct.
  */
 struct signal_struct {
-	atomic_t		sigcnt;
+	refcount_t		sigcnt;
 	atomic_t		live;
 	int			nr_threads;
 	struct list_head	thread_head;
diff --git a/kernel/fork.c b/kernel/fork.c
index 43bcd20..d2e2eb0 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -400,7 +400,7 @@ static inline void free_signal_struct(struct signal_struct *sig)
 
 static inline void put_signal_struct(struct signal_struct *sig)
 {
-	if (atomic_dec_and_test(&sig->sigcnt))
+	if (refcount_dec_and_test(&sig->sigcnt))
 		free_signal_struct(sig);
 }
 
@@ -1382,7 +1382,7 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)
 
 	sig->nr_threads = 1;
 	atomic_set(&sig->live, 1);
-	atomic_set(&sig->sigcnt, 1);
+	refcount_set(&sig->sigcnt, 1);
 
 	/* list_add(thread_node, thread_head) without INIT_LIST_HEAD() */
 	sig->thread_head = (struct list_head)LIST_HEAD_INIT(tsk->thread_node);
@@ -1887,7 +1887,7 @@ static __latent_entropy struct task_struct *copy_process(
 		} else {
 			current->signal->nr_threads++;
 			atomic_inc(&current->signal->live);
-			atomic_inc(&current->signal->sigcnt);
+			refcount_inc(&current->signal->sigcnt);
 			list_add_tail_rcu(&p->thread_group,
 					  &p->group_leader->thread_group);
 			list_add_tail_rcu(&p->thread_node,
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 03/15] sched: convert user_struct.__count to refcount_t
  2017-08-30 12:22 [PATCH 00/15] v5 kernel core pieces refcount conversions Elena Reshetova
  2017-08-30 12:22 ` [PATCH 01/15] sched: convert sighand_struct.count to refcount_t Elena Reshetova
  2017-08-30 12:22 ` [PATCH 02/15] sched: convert signal_struct.sigcnt " Elena Reshetova
@ 2017-08-30 12:22 ` Elena Reshetova
  2017-08-30 12:22 ` [PATCH 04/15] sched: convert numa_group.refcount " Elena Reshetova
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 51+ messages in thread
From: Elena Reshetova @ 2017-08-30 12:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-fsdevel, peterz, gregkh, viro, tj, mingo, hannes, lizefan,
	acme, alexander.shishkin, eparis, akpm, arnd, luto, keescook,
	tglx, dvhart, ebiederm, Elena Reshetova

atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable user_struct.__count is used as pure reference counter.
Convert it to refcount_t and fix up the operations.

Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
---
 include/linux/sched/user.h | 5 +++--
 kernel/user.c              | 8 ++++----
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/include/linux/sched/user.h b/include/linux/sched/user.h
index 3c07e41..afcbf19 100644
--- a/include/linux/sched/user.h
+++ b/include/linux/sched/user.h
@@ -3,6 +3,7 @@
 
 #include <linux/uidgid.h>
 #include <linux/atomic.h>
+#include <linux/refcount.h>
 
 struct key;
 
@@ -10,7 +11,7 @@ struct key;
  * Some day this will be a full-fledged user tracking system..
  */
 struct user_struct {
-	atomic_t __count;	/* reference count */
+	refcount_t __count;	/* reference count */
 	atomic_t processes;	/* How many processes does this user have? */
 	atomic_t sigpending;	/* How many pending signals does this user have? */
 #ifdef CONFIG_FANOTIFY
@@ -54,7 +55,7 @@ extern struct user_struct root_user;
 extern struct user_struct * alloc_uid(kuid_t);
 static inline struct user_struct *get_uid(struct user_struct *u)
 {
-	atomic_inc(&u->__count);
+	refcount_inc(&u->__count);
 	return u;
 }
 extern void free_uid(struct user_struct *);
diff --git a/kernel/user.c b/kernel/user.c
index 00281ad..c072348 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -90,7 +90,7 @@ static DEFINE_SPINLOCK(uidhash_lock);
 
 /* root_user.__count is 1, for init task cred */
 struct user_struct root_user = {
-	.__count	= ATOMIC_INIT(1),
+	.__count	= REFCOUNT_INIT(1),
 	.processes	= ATOMIC_INIT(1),
 	.sigpending	= ATOMIC_INIT(0),
 	.locked_shm     = 0,
@@ -116,7 +116,7 @@ static struct user_struct *uid_hash_find(kuid_t uid, struct hlist_head *hashent)
 
 	hlist_for_each_entry(user, hashent, uidhash_node) {
 		if (uid_eq(user->uid, uid)) {
-			atomic_inc(&user->__count);
+			refcount_inc(&user->__count);
 			return user;
 		}
 	}
@@ -163,7 +163,7 @@ void free_uid(struct user_struct *up)
 		return;
 
 	local_irq_save(flags);
-	if (atomic_dec_and_lock(&up->__count, &uidhash_lock))
+	if (refcount_dec_and_lock(&up->__count, &uidhash_lock))
 		free_user(up, flags);
 	else
 		local_irq_restore(flags);
@@ -184,7 +184,7 @@ struct user_struct *alloc_uid(kuid_t uid)
 			goto out_unlock;
 
 		new->uid = uid;
-		atomic_set(&new->__count, 1);
+		refcount_set(&new->__count, 1);
 
 		/*
 		 * Before adding this, check whether we raced
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 04/15] sched: convert numa_group.refcount to refcount_t
  2017-08-30 12:22 [PATCH 00/15] v5 kernel core pieces refcount conversions Elena Reshetova
                   ` (2 preceding siblings ...)
  2017-08-30 12:22 ` [PATCH 03/15] sched: convert user_struct.__count " Elena Reshetova
@ 2017-08-30 12:22 ` Elena Reshetova
  2017-08-30 12:22 ` [PATCH 05/15] sched/task_struct: convert task_struct.usage " Elena Reshetova
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 51+ messages in thread
From: Elena Reshetova @ 2017-08-30 12:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-fsdevel, peterz, gregkh, viro, tj, mingo, hannes, lizefan,
	acme, alexander.shishkin, eparis, akpm, arnd, luto, keescook,
	tglx, dvhart, ebiederm, Elena Reshetova

atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable numa_group.refcount is used as pure reference counter.
Convert it to refcount_t and fix up the operations.

Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
---
 kernel/sched/fair.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a5d83ed..fe4e38c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1059,7 +1059,7 @@ unsigned int sysctl_numa_balancing_scan_size = 256;
 unsigned int sysctl_numa_balancing_scan_delay = 1000;
 
 struct numa_group {
-	atomic_t refcount;
+	refcount_t refcount;
 
 	spinlock_t lock; /* nr_tasks, tasks */
 	int nr_tasks;
@@ -1128,7 +1128,7 @@ static unsigned int task_scan_start(struct task_struct *p)
 		unsigned long shared = group_faults_shared(ng);
 		unsigned long private = group_faults_priv(ng);
 
-		period *= atomic_read(&ng->refcount);
+		period *= refcount_read(&ng->refcount);
 		period *= shared + 1;
 		period /= private + shared + 1;
 	}
@@ -1151,7 +1151,7 @@ static unsigned int task_scan_max(struct task_struct *p)
 		unsigned long private = group_faults_priv(ng);
 		unsigned long period = smax;
 
-		period *= atomic_read(&ng->refcount);
+		period *= refcount_read(&ng->refcount);
 		period *= shared + 1;
 		period /= private + shared + 1;
 
@@ -2236,12 +2236,12 @@ static void task_numa_placement(struct task_struct *p)
 
 static inline int get_numa_group(struct numa_group *grp)
 {
-	return atomic_inc_not_zero(&grp->refcount);
+	return refcount_inc_not_zero(&grp->refcount);
 }
 
 static inline void put_numa_group(struct numa_group *grp)
 {
-	if (atomic_dec_and_test(&grp->refcount))
+	if (refcount_dec_and_test(&grp->refcount))
 		kfree_rcu(grp, rcu);
 }
 
@@ -2262,7 +2262,7 @@ static void task_numa_group(struct task_struct *p, int cpupid, int flags,
 		if (!grp)
 			return;
 
-		atomic_set(&grp->refcount, 1);
+		refcount_set(&grp->refcount, 1);
 		grp->active_nodes = 1;
 		grp->max_faults_cpu = 0;
 		spin_lock_init(&grp->lock);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 05/15] sched/task_struct: convert task_struct.usage to refcount_t
  2017-08-30 12:22 [PATCH 00/15] v5 kernel core pieces refcount conversions Elena Reshetova
                   ` (3 preceding siblings ...)
  2017-08-30 12:22 ` [PATCH 04/15] sched: convert numa_group.refcount " Elena Reshetova
@ 2017-08-30 12:22 ` Elena Reshetova
  2017-08-30 12:22 ` [PATCH 06/15] sched/task_struct: convert task_struct.stack_refcount " Elena Reshetova
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 51+ messages in thread
From: Elena Reshetova @ 2017-08-30 12:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-fsdevel, peterz, gregkh, viro, tj, mingo, hannes, lizefan,
	acme, alexander.shishkin, eparis, akpm, arnd, luto, keescook,
	tglx, dvhart, ebiederm, Elena Reshetova

atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable task_struct.usage is used as pure reference counter.
Convert it to refcount_t and fix up the operations.

Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
---
 include/linux/init_task.h  | 2 +-
 include/linux/sched.h      | 3 ++-
 include/linux/sched/task.h | 4 ++--
 kernel/fork.c              | 4 ++--
 4 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index 6c6f520..d312376 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -227,7 +227,7 @@ extern struct cred init_cred;
 	INIT_TASK_TI(tsk)						\
 	.state		= 0,						\
 	.stack		= init_stack,					\
-	.usage		= ATOMIC_INIT(2),				\
+	.usage		= REFCOUNT_INIT(2),				\
 	.flags		= PF_KTHREAD,					\
 	.prio		= MAX_PRIO-20,					\
 	.static_prio	= MAX_PRIO-20,					\
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 6110471..1eeb300 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -20,6 +20,7 @@
 #include <linux/seccomp.h>
 #include <linux/nodemask.h>
 #include <linux/rcupdate.h>
+#include <linux/refcount.h>
 #include <linux/resource.h>
 #include <linux/latencytop.h>
 #include <linux/sched/prio.h>
@@ -534,7 +535,7 @@ struct task_struct {
 	randomized_struct_fields_start
 
 	void				*stack;
-	atomic_t			usage;
+	refcount_t			usage;
 	/* Per task flags (PF_*), defined further below: */
 	unsigned int			flags;
 	unsigned int			ptrace;
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index 79a2a74..2ddc9b0 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -85,13 +85,13 @@ extern void sched_exec(void);
 #define sched_exec()   {}
 #endif
 
-#define get_task_struct(tsk) do { atomic_inc(&(tsk)->usage); } while(0)
+#define get_task_struct(tsk) do { refcount_inc(&(tsk)->usage); } while(0)
 
 extern void __put_task_struct(struct task_struct *t);
 
 static inline void put_task_struct(struct task_struct *t)
 {
-	if (atomic_dec_and_test(&t->usage))
+	if (refcount_dec_and_test(&t->usage))
 		__put_task_struct(t);
 }
 
diff --git a/kernel/fork.c b/kernel/fork.c
index d2e2eb0..1e54683 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -407,7 +407,7 @@ static inline void put_signal_struct(struct signal_struct *sig)
 void __put_task_struct(struct task_struct *tsk)
 {
 	WARN_ON(!tsk->exit_state);
-	WARN_ON(atomic_read(&tsk->usage));
+	WARN_ON(refcount_read(&tsk->usage));
 	WARN_ON(tsk == current);
 
 	cgroup_free(tsk);
@@ -564,7 +564,7 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
 	 * One for us, one for whoever does the "release_task()" (usually
 	 * parent)
 	 */
-	atomic_set(&tsk->usage, 2);
+	refcount_set(&tsk->usage, 2);
 #ifdef CONFIG_BLK_DEV_IO_TRACE
 	tsk->btrace_seq = 0;
 #endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 06/15] sched/task_struct: convert task_struct.stack_refcount to refcount_t
  2017-08-30 12:22 [PATCH 00/15] v5 kernel core pieces refcount conversions Elena Reshetova
                   ` (4 preceding siblings ...)
  2017-08-30 12:22 ` [PATCH 05/15] sched/task_struct: convert task_struct.usage " Elena Reshetova
@ 2017-08-30 12:22 ` Elena Reshetova
  2017-08-30 12:22 ` [PATCH 07/15] perf: convert perf_event_context.refcount " Elena Reshetova
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 51+ messages in thread
From: Elena Reshetova @ 2017-08-30 12:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-fsdevel, peterz, gregkh, viro, tj, mingo, hannes, lizefan,
	acme, alexander.shishkin, eparis, akpm, arnd, luto, keescook,
	tglx, dvhart, ebiederm, Elena Reshetova

atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable task_struct.stack_refcount is used as pure reference counter.
Convert it to refcount_t and fix up the operations.

Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
---
 include/linux/init_task.h        | 3 ++-
 include/linux/sched.h            | 2 +-
 include/linux/sched/task_stack.h | 2 +-
 kernel/fork.c                    | 6 +++---
 4 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index d312376..af2ebe6 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -12,6 +12,7 @@
 #include <linux/securebits.h>
 #include <linux/seqlock.h>
 #include <linux/rbtree.h>
+#include <linux/refcount.h>
 #include <linux/sched/autogroup.h>
 #include <net/net_namespace.h>
 #include <linux/sched/rt.h>
@@ -207,7 +208,7 @@ extern struct cred init_cred;
 #ifdef CONFIG_THREAD_INFO_IN_TASK
 # define INIT_TASK_TI(tsk)			\
 	.thread_info = INIT_THREAD_INFO(tsk),	\
-	.stack_refcount = ATOMIC_INIT(1),
+	.stack_refcount = REFCOUNT_INIT(1),
 #else
 # define INIT_TASK_TI(tsk)
 #endif
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 1eeb300..3d21012 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1089,7 +1089,7 @@ struct task_struct {
 #endif
 #ifdef CONFIG_THREAD_INFO_IN_TASK
 	/* A live task holds one reference: */
-	atomic_t			stack_refcount;
+	refcount_t			stack_refcount;
 #endif
 #ifdef CONFIG_LIVEPATCH
 	int patch_state;
diff --git a/include/linux/sched/task_stack.h b/include/linux/sched/task_stack.h
index df6ea66..aab3809 100644
--- a/include/linux/sched/task_stack.h
+++ b/include/linux/sched/task_stack.h
@@ -60,7 +60,7 @@ static inline unsigned long *end_of_stack(struct task_struct *p)
 #ifdef CONFIG_THREAD_INFO_IN_TASK
 static inline void *try_get_task_stack(struct task_struct *tsk)
 {
-	return atomic_inc_not_zero(&tsk->stack_refcount) ?
+	return refcount_inc_not_zero(&tsk->stack_refcount) ?
 		task_stack_page(tsk) : NULL;
 }
 
diff --git a/kernel/fork.c b/kernel/fork.c
index 1e54683..4208c79 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -355,7 +355,7 @@ static void release_task_stack(struct task_struct *tsk)
 #ifdef CONFIG_THREAD_INFO_IN_TASK
 void put_task_stack(struct task_struct *tsk)
 {
-	if (atomic_dec_and_test(&tsk->stack_refcount))
+	if (refcount_dec_and_test(&tsk->stack_refcount))
 		release_task_stack(tsk);
 }
 #endif
@@ -373,7 +373,7 @@ void free_task(struct task_struct *tsk)
 	 * If the task had a separate stack allocation, it should be gone
 	 * by now.
 	 */
-	WARN_ON_ONCE(atomic_read(&tsk->stack_refcount) != 0);
+	WARN_ON_ONCE(refcount_read(&tsk->stack_refcount) != 0);
 #endif
 	rt_mutex_debug_task_free(tsk);
 	ftrace_graph_exit_task(tsk);
@@ -535,7 +535,7 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
 	tsk->stack_vm_area = stack_vm_area;
 #endif
 #ifdef CONFIG_THREAD_INFO_IN_TASK
-	atomic_set(&tsk->stack_refcount, 1);
+	refcount_set(&tsk->stack_refcount, 1);
 #endif
 
 	if (err)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 07/15] perf: convert perf_event_context.refcount to refcount_t
  2017-08-30 12:22 [PATCH 00/15] v5 kernel core pieces refcount conversions Elena Reshetova
                   ` (5 preceding siblings ...)
  2017-08-30 12:22 ` [PATCH 06/15] sched/task_struct: convert task_struct.stack_refcount " Elena Reshetova
@ 2017-08-30 12:22 ` Elena Reshetova
  2017-08-30 12:22 ` [PATCH 08/15] perf/ring_buffer: convert ring_buffer.refcount " Elena Reshetova
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 51+ messages in thread
From: Elena Reshetova @ 2017-08-30 12:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-fsdevel, peterz, gregkh, viro, tj, mingo, hannes, lizefan,
	acme, alexander.shishkin, eparis, akpm, arnd, luto, keescook,
	tglx, dvhart, ebiederm, Elena Reshetova

atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable perf_event_context.refcount is used as pure reference counter.
Convert it to refcount_t and fix up the operations.

Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
---
 include/linux/perf_event.h |  3 ++-
 kernel/events/core.c       | 12 ++++++------
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 9bac4bf..75c11181 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -54,6 +54,7 @@ struct perf_guest_info_callbacks {
 #include <linux/perf_regs.h>
 #include <linux/workqueue.h>
 #include <linux/cgroup.h>
+#include <linux/refcount.h>
 #include <asm/local.h>
 
 struct perf_callchain_entry {
@@ -737,7 +738,7 @@ struct perf_event_context {
 	int				nr_stat;
 	int				nr_freq;
 	int				rotate_disable;
-	atomic_t			refcount;
+	refcount_t			refcount;
 	struct task_struct		*task;
 
 	/*
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 852768d..57ef7ce 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1109,7 +1109,7 @@ static void perf_event_ctx_deactivate(struct perf_event_context *ctx)
 
 static void get_ctx(struct perf_event_context *ctx)
 {
-	WARN_ON(!atomic_inc_not_zero(&ctx->refcount));
+	WARN_ON(!refcount_inc_not_zero(&ctx->refcount));
 }
 
 static void free_ctx(struct rcu_head *head)
@@ -1123,7 +1123,7 @@ static void free_ctx(struct rcu_head *head)
 
 static void put_ctx(struct perf_event_context *ctx)
 {
-	if (atomic_dec_and_test(&ctx->refcount)) {
+	if (refcount_dec_and_test(&ctx->refcount)) {
 		if (ctx->parent_ctx)
 			put_ctx(ctx->parent_ctx);
 		if (ctx->task && ctx->task != TASK_TOMBSTONE)
@@ -1201,7 +1201,7 @@ perf_event_ctx_lock_nested(struct perf_event *event, int nesting)
 again:
 	rcu_read_lock();
 	ctx = ACCESS_ONCE(event->ctx);
-	if (!atomic_inc_not_zero(&ctx->refcount)) {
+	if (!refcount_inc_not_zero(&ctx->refcount)) {
 		rcu_read_unlock();
 		goto again;
 	}
@@ -1329,7 +1329,7 @@ perf_lock_task_context(struct task_struct *task, int ctxn, unsigned long *flags)
 		}
 
 		if (ctx->task == TASK_TOMBSTONE ||
-		    !atomic_inc_not_zero(&ctx->refcount)) {
+		    !refcount_inc_not_zero(&ctx->refcount)) {
 			raw_spin_unlock(&ctx->lock);
 			ctx = NULL;
 		} else {
@@ -3795,7 +3795,7 @@ static void __perf_event_init_context(struct perf_event_context *ctx)
 	INIT_LIST_HEAD(&ctx->pinned_groups);
 	INIT_LIST_HEAD(&ctx->flexible_groups);
 	INIT_LIST_HEAD(&ctx->event_list);
-	atomic_set(&ctx->refcount, 1);
+	refcount_set(&ctx->refcount, 1);
 }
 
 static struct perf_event_context *
@@ -9812,7 +9812,7 @@ __perf_event_ctx_lock_double(struct perf_event *group_leader,
 again:
 	rcu_read_lock();
 	gctx = READ_ONCE(group_leader->ctx);
-	if (!atomic_inc_not_zero(&gctx->refcount)) {
+	if (!refcount_inc_not_zero(&gctx->refcount)) {
 		rcu_read_unlock();
 		goto again;
 	}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 08/15] perf/ring_buffer: convert ring_buffer.refcount to refcount_t
  2017-08-30 12:22 [PATCH 00/15] v5 kernel core pieces refcount conversions Elena Reshetova
                   ` (6 preceding siblings ...)
  2017-08-30 12:22 ` [PATCH 07/15] perf: convert perf_event_context.refcount " Elena Reshetova
@ 2017-08-30 12:22 ` Elena Reshetova
  2017-08-30 12:22 ` [PATCH 09/15] perf/ring_buffer: convert ring_buffer.aux_refcount " Elena Reshetova
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 51+ messages in thread
From: Elena Reshetova @ 2017-08-30 12:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-fsdevel, peterz, gregkh, viro, tj, mingo, hannes, lizefan,
	acme, alexander.shishkin, eparis, akpm, arnd, luto, keescook,
	tglx, dvhart, ebiederm, Elena Reshetova

atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable ring_buffer.refcount is used as pure reference counter.
Convert it to refcount_t and fix up the operations.

Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
---
 kernel/events/core.c        | 4 ++--
 kernel/events/internal.h    | 3 ++-
 kernel/events/ring_buffer.c | 2 +-
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 57ef7ce..43da16ef 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5089,7 +5089,7 @@ struct ring_buffer *ring_buffer_get(struct perf_event *event)
 	rcu_read_lock();
 	rb = rcu_dereference(event->rb);
 	if (rb) {
-		if (!atomic_inc_not_zero(&rb->refcount))
+		if (!refcount_inc_not_zero(&rb->refcount))
 			rb = NULL;
 	}
 	rcu_read_unlock();
@@ -5099,7 +5099,7 @@ struct ring_buffer *ring_buffer_get(struct perf_event *event)
 
 void ring_buffer_put(struct ring_buffer *rb)
 {
-	if (!atomic_dec_and_test(&rb->refcount))
+	if (!refcount_dec_and_test(&rb->refcount))
 		return;
 
 	WARN_ON_ONCE(!list_empty(&rb->event_list));
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index 486fd78..b8e6fdf 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -3,13 +3,14 @@
 
 #include <linux/hardirq.h>
 #include <linux/uaccess.h>
+#include <linux/refcount.h>
 
 /* Buffer handling */
 
 #define RING_BUFFER_WRITABLE		0x01
 
 struct ring_buffer {
-	atomic_t			refcount;
+	refcount_t			refcount;
 	struct rcu_head			rcu_head;
 #ifdef CONFIG_PERF_USE_VMALLOC
 	struct work_struct		work;
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index ee97196..3353572 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -284,7 +284,7 @@ ring_buffer_init(struct ring_buffer *rb, long watermark, int flags)
 	else
 		rb->overwrite = 1;
 
-	atomic_set(&rb->refcount, 1);
+	refcount_set(&rb->refcount, 1);
 
 	INIT_LIST_HEAD(&rb->event_list);
 	spin_lock_init(&rb->event_lock);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 09/15] perf/ring_buffer: convert ring_buffer.aux_refcount to refcount_t
  2017-08-30 12:22 [PATCH 00/15] v5 kernel core pieces refcount conversions Elena Reshetova
                   ` (7 preceding siblings ...)
  2017-08-30 12:22 ` [PATCH 08/15] perf/ring_buffer: convert ring_buffer.refcount " Elena Reshetova
@ 2017-08-30 12:22 ` Elena Reshetova
  2017-08-30 12:22 ` [PATCH 10/15] uprobes: convert uprobe.ref " Elena Reshetova
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 51+ messages in thread
From: Elena Reshetova @ 2017-08-30 12:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-fsdevel, peterz, gregkh, viro, tj, mingo, hannes, lizefan,
	acme, alexander.shishkin, eparis, akpm, arnd, luto, keescook,
	tglx, dvhart, ebiederm, Elena Reshetova

atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable ring_buffer.aux_refcount is used as pure reference counter.
Convert it to refcount_t and fix up the operations.

Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
---
 kernel/events/core.c        | 2 +-
 kernel/events/internal.h    | 2 +-
 kernel/events/ring_buffer.c | 6 +++---
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 43da16ef..09c6704 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5164,7 +5164,7 @@ static void perf_mmap_close(struct vm_area_struct *vma)
 
 		/* this has to be the last one */
 		rb_free_aux(rb);
-		WARN_ON_ONCE(atomic_read(&rb->aux_refcount));
+		WARN_ON_ONCE(refcount_read(&rb->aux_refcount));
 
 		mutex_unlock(&event->mmap_mutex);
 	}
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index b8e6fdf..fb55716 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -48,7 +48,7 @@ struct ring_buffer {
 	atomic_t			aux_mmap_count;
 	unsigned long			aux_mmap_locked;
 	void				(*free_aux)(void *);
-	atomic_t			aux_refcount;
+	refcount_t			aux_refcount;
 	void				**aux_pages;
 	void				*aux_priv;
 
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 3353572..ba12fd2 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -357,7 +357,7 @@ void *perf_aux_output_begin(struct perf_output_handle *handle,
 	if (!atomic_read(&rb->aux_mmap_count))
 		goto err;
 
-	if (!atomic_inc_not_zero(&rb->aux_refcount))
+	if (!refcount_inc_not_zero(&rb->aux_refcount))
 		goto err;
 
 	/*
@@ -648,7 +648,7 @@ int rb_alloc_aux(struct ring_buffer *rb, struct perf_event *event,
 	 * we keep a refcount here to make sure either of the two can
 	 * reference them safely.
 	 */
-	atomic_set(&rb->aux_refcount, 1);
+	refcount_set(&rb->aux_refcount, 1);
 
 	rb->aux_overwrite = overwrite;
 	rb->aux_watermark = watermark;
@@ -667,7 +667,7 @@ int rb_alloc_aux(struct ring_buffer *rb, struct perf_event *event,
 
 void rb_free_aux(struct ring_buffer *rb)
 {
-	if (atomic_dec_and_test(&rb->aux_refcount))
+	if (refcount_dec_and_test(&rb->aux_refcount))
 		__rb_free_aux(rb);
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 10/15] uprobes: convert uprobe.ref to refcount_t
  2017-08-30 12:22 [PATCH 00/15] v5 kernel core pieces refcount conversions Elena Reshetova
                   ` (8 preceding siblings ...)
  2017-08-30 12:22 ` [PATCH 09/15] perf/ring_buffer: convert ring_buffer.aux_refcount " Elena Reshetova
@ 2017-08-30 12:22 ` Elena Reshetova
  2017-08-30 12:22 ` [PATCH 11/15] nsproxy: convert nsproxy.count " Elena Reshetova
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 51+ messages in thread
From: Elena Reshetova @ 2017-08-30 12:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-fsdevel, peterz, gregkh, viro, tj, mingo, hannes, lizefan,
	acme, alexander.shishkin, eparis, akpm, arnd, luto, keescook,
	tglx, dvhart, ebiederm, Elena Reshetova

atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable uprobe.ref is used as pure reference counter.
Convert it to refcount_t and fix up the operations.

Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
---
 kernel/events/uprobes.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 456596d..ca093957 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -66,7 +66,7 @@ static struct percpu_rw_semaphore dup_mmap_sem;
 
 struct uprobe {
 	struct rb_node		rb_node;	/* node in the rb tree */
-	atomic_t		ref;
+	refcount_t		ref;
 	struct rw_semaphore	register_rwsem;
 	struct rw_semaphore	consumer_rwsem;
 	struct list_head	pending_list;
@@ -371,13 +371,13 @@ set_orig_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long v
 
 static struct uprobe *get_uprobe(struct uprobe *uprobe)
 {
-	atomic_inc(&uprobe->ref);
+	refcount_inc(&uprobe->ref);
 	return uprobe;
 }
 
 static void put_uprobe(struct uprobe *uprobe)
 {
-	if (atomic_dec_and_test(&uprobe->ref))
+	if (refcount_dec_and_test(&uprobe->ref))
 		kfree(uprobe);
 }
 
@@ -459,7 +459,7 @@ static struct uprobe *__insert_uprobe(struct uprobe *uprobe)
 	rb_link_node(&uprobe->rb_node, parent, p);
 	rb_insert_color(&uprobe->rb_node, &uprobes_tree);
 	/* get access + creation ref */
-	atomic_set(&uprobe->ref, 2);
+	refcount_set(&uprobe->ref, 2);
 
 	return u;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 11/15] nsproxy: convert nsproxy.count to refcount_t
  2017-08-30 12:22 [PATCH 00/15] v5 kernel core pieces refcount conversions Elena Reshetova
                   ` (9 preceding siblings ...)
  2017-08-30 12:22 ` [PATCH 10/15] uprobes: convert uprobe.ref " Elena Reshetova
@ 2017-08-30 12:22 ` Elena Reshetova
  2017-08-30 12:22 ` [PATCH 12/15] groups: convert group_info.usage " Elena Reshetova
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 51+ messages in thread
From: Elena Reshetova @ 2017-08-30 12:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-fsdevel, peterz, gregkh, viro, tj, mingo, hannes, lizefan,
	acme, alexander.shishkin, eparis, akpm, arnd, luto, keescook,
	tglx, dvhart, ebiederm, Elena Reshetova

atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable nsproxy.count is used as pure reference counter.
Convert it to refcount_t and fix up the operations.

Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
---
 include/linux/nsproxy.h | 6 +++---
 kernel/nsproxy.c        | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/nsproxy.h b/include/linux/nsproxy.h
index ac0d65b..f862ba8 100644
--- a/include/linux/nsproxy.h
+++ b/include/linux/nsproxy.h
@@ -28,7 +28,7 @@ struct fs_struct;
  * nsproxy is copied.
  */
 struct nsproxy {
-	atomic_t count;
+	refcount_t count;
 	struct uts_namespace *uts_ns;
 	struct ipc_namespace *ipc_ns;
 	struct mnt_namespace *mnt_ns;
@@ -74,14 +74,14 @@ int __init nsproxy_cache_init(void);
 
 static inline void put_nsproxy(struct nsproxy *ns)
 {
-	if (atomic_dec_and_test(&ns->count)) {
+	if (refcount_dec_and_test(&ns->count)) {
 		free_nsproxy(ns);
 	}
 }
 
 static inline void get_nsproxy(struct nsproxy *ns)
 {
-	atomic_inc(&ns->count);
+	refcount_inc(&ns->count);
 }
 
 #endif
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index f6c5d33..5bfe691 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -31,7 +31,7 @@
 static struct kmem_cache *nsproxy_cachep;
 
 struct nsproxy init_nsproxy = {
-	.count			= ATOMIC_INIT(1),
+	.count			= REFCOUNT_INIT(1),
 	.uts_ns			= &init_uts_ns,
 #if defined(CONFIG_POSIX_MQUEUE) || defined(CONFIG_SYSVIPC)
 	.ipc_ns			= &init_ipc_ns,
@@ -52,7 +52,7 @@ static inline struct nsproxy *create_nsproxy(void)
 
 	nsproxy = kmem_cache_alloc(nsproxy_cachep, GFP_KERNEL);
 	if (nsproxy)
-		atomic_set(&nsproxy->count, 1);
+		refcount_set(&nsproxy->count, 1);
 	return nsproxy;
 }
 
@@ -225,7 +225,7 @@ void switch_task_namespaces(struct task_struct *p, struct nsproxy *new)
 	p->nsproxy = new;
 	task_unlock(p);
 
-	if (ns && atomic_dec_and_test(&ns->count))
+	if (ns && refcount_dec_and_test(&ns->count))
 		free_nsproxy(ns);
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 12/15] groups: convert group_info.usage to refcount_t
  2017-08-30 12:22 [PATCH 00/15] v5 kernel core pieces refcount conversions Elena Reshetova
                   ` (10 preceding siblings ...)
  2017-08-30 12:22 ` [PATCH 11/15] nsproxy: convert nsproxy.count " Elena Reshetova
@ 2017-08-30 12:22 ` Elena Reshetova
  2017-08-30 12:22 ` [PATCH 13/15] creds: convert cred.usage " Elena Reshetova
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 51+ messages in thread
From: Elena Reshetova @ 2017-08-30 12:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-fsdevel, peterz, gregkh, viro, tj, mingo, hannes, lizefan,
	acme, alexander.shishkin, eparis, akpm, arnd, luto, keescook,
	tglx, dvhart, ebiederm, Elena Reshetova

atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable group_info.usage is used as pure reference counter.
Convert it to refcount_t and fix up the operations.

Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
---
 include/linux/cred.h | 7 ++++---
 kernel/cred.c        | 2 +-
 kernel/groups.c      | 2 +-
 3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/include/linux/cred.h b/include/linux/cred.h
index 099058e..00948dd 100644
--- a/include/linux/cred.h
+++ b/include/linux/cred.h
@@ -17,6 +17,7 @@
 #include <linux/key.h>
 #include <linux/selinux.h>
 #include <linux/atomic.h>
+#include <linux/refcount.h>
 #include <linux/uidgid.h>
 #include <linux/sched.h>
 #include <linux/sched/user.h>
@@ -28,7 +29,7 @@ struct inode;
  * COW Supplementary groups list
  */
 struct group_info {
-	atomic_t	usage;
+	refcount_t	usage;
 	int		ngroups;
 	kgid_t		gid[0];
 } __randomize_layout;
@@ -44,7 +45,7 @@ struct group_info {
  */
 static inline struct group_info *get_group_info(struct group_info *gi)
 {
-	atomic_inc(&gi->usage);
+	refcount_inc(&gi->usage);
 	return gi;
 }
 
@@ -54,7 +55,7 @@ static inline struct group_info *get_group_info(struct group_info *gi)
  */
 #define put_group_info(group_info)			\
 do {							\
-	if (atomic_dec_and_test(&(group_info)->usage))	\
+	if (refcount_dec_and_test(&(group_info)->usage))	\
 		groups_free(group_info);		\
 } while (0)
 
diff --git a/kernel/cred.c b/kernel/cred.c
index ecf0365..8122d7c 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -36,7 +36,7 @@ do {									\
 static struct kmem_cache *cred_jar;
 
 /* init to 2 - one for init_task, one to ensure it is never freed */
-struct group_info init_groups = { .usage = ATOMIC_INIT(2) };
+struct group_info init_groups = { .usage = REFCOUNT_INIT(2) };
 
 /*
  * The initial credentials for the initial task
diff --git a/kernel/groups.c b/kernel/groups.c
index 434f666..5fc6e21 100644
--- a/kernel/groups.c
+++ b/kernel/groups.c
@@ -23,7 +23,7 @@ struct group_info *groups_alloc(int gidsetsize)
 	if (!gi)
 		return NULL;
 
-	atomic_set(&gi->usage, 1);
+	refcount_set(&gi->usage, 1);
 	gi->ngroups = gidsetsize;
 	return gi;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 13/15] creds: convert cred.usage to refcount_t
  2017-08-30 12:22 [PATCH 00/15] v5 kernel core pieces refcount conversions Elena Reshetova
                   ` (11 preceding siblings ...)
  2017-08-30 12:22 ` [PATCH 12/15] groups: convert group_info.usage " Elena Reshetova
@ 2017-08-30 12:22 ` Elena Reshetova
  2017-08-30 12:22 ` [PATCH 14/15] futex: convert futex_pi_state.refcount " Elena Reshetova
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 51+ messages in thread
From: Elena Reshetova @ 2017-08-30 12:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-fsdevel, peterz, gregkh, viro, tj, mingo, hannes, lizefan,
	acme, alexander.shishkin, eparis, akpm, arnd, luto, keescook,
	tglx, dvhart, ebiederm, Elena Reshetova

atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable cred.usage is used as pure reference counter.
Convert it to refcount_t and fix up the operations.

Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
---
 include/linux/cred.h |  6 +++---
 kernel/cred.c        | 44 ++++++++++++++++++++++----------------------
 2 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/include/linux/cred.h b/include/linux/cred.h
index 00948dd..a9f217b 100644
--- a/include/linux/cred.h
+++ b/include/linux/cred.h
@@ -109,7 +109,7 @@ extern bool may_setgroups(void);
  * same context as task->real_cred.
  */
 struct cred {
-	atomic_t	usage;
+	refcount_t	usage;
 #ifdef CONFIG_DEBUG_CREDENTIALS
 	atomic_t	subscribers;	/* number of processes subscribed */
 	void		*put_addr;
@@ -222,7 +222,7 @@ static inline bool cap_ambient_invariant_ok(const struct cred *cred)
  */
 static inline struct cred *get_new_cred(struct cred *cred)
 {
-	atomic_inc(&cred->usage);
+	refcount_inc(&cred->usage);
 	return cred;
 }
 
@@ -262,7 +262,7 @@ static inline void put_cred(const struct cred *_cred)
 	struct cred *cred = (struct cred *) _cred;
 
 	validate_creds(cred);
-	if (atomic_dec_and_test(&(cred)->usage))
+	if (refcount_dec_and_test(&(cred)->usage))
 		__put_cred(cred);
 }
 
diff --git a/kernel/cred.c b/kernel/cred.c
index 8122d7c..49664e5 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -42,7 +42,7 @@ struct group_info init_groups = { .usage = REFCOUNT_INIT(2) };
  * The initial credentials for the initial task
  */
 struct cred init_cred = {
-	.usage			= ATOMIC_INIT(4),
+	.usage			= REFCOUNT_INIT(4),
 #ifdef CONFIG_DEBUG_CREDENTIALS
 	.subscribers		= ATOMIC_INIT(2),
 	.magic			= CRED_MAGIC,
@@ -101,17 +101,17 @@ static void put_cred_rcu(struct rcu_head *rcu)
 
 #ifdef CONFIG_DEBUG_CREDENTIALS
 	if (cred->magic != CRED_MAGIC_DEAD ||
-	    atomic_read(&cred->usage) != 0 ||
+	    refcount_read(&cred->usage) != 0 ||
 	    read_cred_subscribers(cred) != 0)
 		panic("CRED: put_cred_rcu() sees %p with"
 		      " mag %x, put %p, usage %d, subscr %d\n",
 		      cred, cred->magic, cred->put_addr,
-		      atomic_read(&cred->usage),
+		      refcount_read(&cred->usage),
 		      read_cred_subscribers(cred));
 #else
-	if (atomic_read(&cred->usage) != 0)
+	if (refcount_read(&cred->usage) != 0)
 		panic("CRED: put_cred_rcu() sees %p with usage %d\n",
-		      cred, atomic_read(&cred->usage));
+		      cred, refcount_read(&cred->usage));
 #endif
 
 	security_cred_free(cred);
@@ -135,10 +135,10 @@ static void put_cred_rcu(struct rcu_head *rcu)
 void __put_cred(struct cred *cred)
 {
 	kdebug("__put_cred(%p{%d,%d})", cred,
-	       atomic_read(&cred->usage),
+	       refcount_read(&cred->usage),
 	       read_cred_subscribers(cred));
 
-	BUG_ON(atomic_read(&cred->usage) != 0);
+	BUG_ON(refcount_read(&cred->usage) != 0);
 #ifdef CONFIG_DEBUG_CREDENTIALS
 	BUG_ON(read_cred_subscribers(cred) != 0);
 	cred->magic = CRED_MAGIC_DEAD;
@@ -159,7 +159,7 @@ void exit_creds(struct task_struct *tsk)
 	struct cred *cred;
 
 	kdebug("exit_creds(%u,%p,%p,{%d,%d})", tsk->pid, tsk->real_cred, tsk->cred,
-	       atomic_read(&tsk->cred->usage),
+	       refcount_read(&tsk->cred->usage),
 	       read_cred_subscribers(tsk->cred));
 
 	cred = (struct cred *) tsk->real_cred;
@@ -194,7 +194,7 @@ const struct cred *get_task_cred(struct task_struct *task)
 	do {
 		cred = __task_cred((task));
 		BUG_ON(!cred);
-	} while (!atomic_inc_not_zero(&((struct cred *)cred)->usage));
+	} while (!refcount_inc_not_zero(&((struct cred *)cred)->usage));
 
 	rcu_read_unlock();
 	return cred;
@@ -212,7 +212,7 @@ struct cred *cred_alloc_blank(void)
 	if (!new)
 		return NULL;
 
-	atomic_set(&new->usage, 1);
+	refcount_set(&new->usage, 1);
 #ifdef CONFIG_DEBUG_CREDENTIALS
 	new->magic = CRED_MAGIC;
 #endif
@@ -258,7 +258,7 @@ struct cred *prepare_creds(void)
 	old = task->cred;
 	memcpy(new, old, sizeof(struct cred));
 
-	atomic_set(&new->usage, 1);
+	refcount_set(&new->usage, 1);
 	set_cred_subscribers(new, 0);
 	get_group_info(new->group_info);
 	get_uid(new->user);
@@ -335,7 +335,7 @@ int copy_creds(struct task_struct *p, unsigned long clone_flags)
 		get_cred(p->cred);
 		alter_cred_subscribers(p->cred, 2);
 		kdebug("share_creds(%p{%d,%d})",
-		       p->cred, atomic_read(&p->cred->usage),
+		       p->cred, refcount_read(&p->cred->usage),
 		       read_cred_subscribers(p->cred));
 		atomic_inc(&p->cred->user->processes);
 		return 0;
@@ -426,7 +426,7 @@ int commit_creds(struct cred *new)
 	const struct cred *old = task->real_cred;
 
 	kdebug("commit_creds(%p{%d,%d})", new,
-	       atomic_read(&new->usage),
+	       refcount_read(&new->usage),
 	       read_cred_subscribers(new));
 
 	BUG_ON(task->cred != old);
@@ -435,7 +435,7 @@ int commit_creds(struct cred *new)
 	validate_creds(old);
 	validate_creds(new);
 #endif
-	BUG_ON(atomic_read(&new->usage) < 1);
+	BUG_ON(refcount_read(&new->usage) < 1);
 
 	get_cred(new); /* we will require a ref for the subj creds too */
 
@@ -500,13 +500,13 @@ EXPORT_SYMBOL(commit_creds);
 void abort_creds(struct cred *new)
 {
 	kdebug("abort_creds(%p{%d,%d})", new,
-	       atomic_read(&new->usage),
+	       refcount_read(&new->usage),
 	       read_cred_subscribers(new));
 
 #ifdef CONFIG_DEBUG_CREDENTIALS
 	BUG_ON(read_cred_subscribers(new) != 0);
 #endif
-	BUG_ON(atomic_read(&new->usage) < 1);
+	BUG_ON(refcount_read(&new->usage) < 1);
 	put_cred(new);
 }
 EXPORT_SYMBOL(abort_creds);
@@ -523,7 +523,7 @@ const struct cred *override_creds(const struct cred *new)
 	const struct cred *old = current->cred;
 
 	kdebug("override_creds(%p{%d,%d})", new,
-	       atomic_read(&new->usage),
+	       refcount_read(&new->usage),
 	       read_cred_subscribers(new));
 
 	validate_creds(old);
@@ -534,7 +534,7 @@ const struct cred *override_creds(const struct cred *new)
 	alter_cred_subscribers(old, -1);
 
 	kdebug("override_creds() = %p{%d,%d}", old,
-	       atomic_read(&old->usage),
+	       refcount_read(&old->usage),
 	       read_cred_subscribers(old));
 	return old;
 }
@@ -552,7 +552,7 @@ void revert_creds(const struct cred *old)
 	const struct cred *override = current->cred;
 
 	kdebug("revert_creds(%p{%d,%d})", old,
-	       atomic_read(&old->usage),
+	       refcount_read(&old->usage),
 	       read_cred_subscribers(old));
 
 	validate_creds(old);
@@ -611,7 +611,7 @@ struct cred *prepare_kernel_cred(struct task_struct *daemon)
 	validate_creds(old);
 
 	*new = *old;
-	atomic_set(&new->usage, 1);
+	refcount_set(&new->usage, 1);
 	set_cred_subscribers(new, 0);
 	get_uid(new->user);
 	get_user_ns(new->user_ns);
@@ -735,7 +735,7 @@ static void dump_invalid_creds(const struct cred *cred, const char *label,
 	printk(KERN_ERR "CRED: ->magic=%x, put_addr=%p\n",
 	       cred->magic, cred->put_addr);
 	printk(KERN_ERR "CRED: ->usage=%d, subscr=%d\n",
-	       atomic_read(&cred->usage),
+	       refcount_read(&cred->usage),
 	       read_cred_subscribers(cred));
 	printk(KERN_ERR "CRED: ->*uid = { %d,%d,%d,%d }\n",
 		from_kuid_munged(&init_user_ns, cred->uid),
@@ -809,7 +809,7 @@ void validate_creds_for_do_exit(struct task_struct *tsk)
 {
 	kdebug("validate_creds_for_do_exit(%p,%p{%d,%d})",
 	       tsk->real_cred, tsk->cred,
-	       atomic_read(&tsk->cred->usage),
+	       refcount_read(&tsk->cred->usage),
 	       read_cred_subscribers(tsk->cred));
 
 	__validate_process_creds(tsk, __FILE__, __LINE__);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 14/15] futex: convert futex_pi_state.refcount to refcount_t
  2017-08-30 12:22 [PATCH 00/15] v5 kernel core pieces refcount conversions Elena Reshetova
                   ` (12 preceding siblings ...)
  2017-08-30 12:22 ` [PATCH 13/15] creds: convert cred.usage " Elena Reshetova
@ 2017-08-30 12:22 ` Elena Reshetova
  2017-09-01  7:39   ` Thomas Gleixner
  2017-08-30 12:22 ` [PATCH 15/15] kcov: convert kcov.refcount " Elena Reshetova
  2017-08-31 23:48 ` [PATCH 00/15] v5 kernel core pieces refcount conversions Kees Cook
  15 siblings, 1 reply; 51+ messages in thread
From: Elena Reshetova @ 2017-08-30 12:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-fsdevel, peterz, gregkh, viro, tj, mingo, hannes, lizefan,
	acme, alexander.shishkin, eparis, akpm, arnd, luto, keescook,
	tglx, dvhart, ebiederm, Elena Reshetova

atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable futex_pi_state.refcount is used as pure
reference counter. Convert it to refcount_t and fix up
the operations.

Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
---
 kernel/futex.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 0939255..65460b4 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -67,6 +67,7 @@
 #include <linux/freezer.h>
 #include <linux/bootmem.h>
 #include <linux/fault-inject.h>
+#include <linux/refcount.h>
 
 #include <asm/futex.h>
 
@@ -209,7 +210,7 @@ struct futex_pi_state {
 	struct rt_mutex pi_mutex;
 
 	struct task_struct *owner;
-	atomic_t refcount;
+	refcount_t refcount;
 
 	union futex_key key;
 } __randomize_layout;
@@ -795,7 +796,7 @@ static int refill_pi_state_cache(void)
 	INIT_LIST_HEAD(&pi_state->list);
 	/* pi_mutex gets initialized later */
 	pi_state->owner = NULL;
-	atomic_set(&pi_state->refcount, 1);
+	refcount_set(&pi_state->refcount, 1);
 	pi_state->key = FUTEX_KEY_INIT;
 
 	current->pi_state_cache = pi_state;
@@ -815,7 +816,7 @@ static struct futex_pi_state *alloc_pi_state(void)
 
 static void get_pi_state(struct futex_pi_state *pi_state)
 {
-	WARN_ON_ONCE(!atomic_inc_not_zero(&pi_state->refcount));
+	WARN_ON_ONCE(!refcount_inc_not_zero(&pi_state->refcount));
 }
 
 /*
@@ -829,7 +830,7 @@ static void put_pi_state(struct futex_pi_state *pi_state)
 	if (!pi_state)
 		return;
 
-	if (!atomic_dec_and_test(&pi_state->refcount))
+	if (!refcount_dec_and_test(&pi_state->refcount))
 		return;
 
 	/*
@@ -853,7 +854,7 @@ static void put_pi_state(struct futex_pi_state *pi_state)
 		 * refcount is at 0 - put it back to 1.
 		 */
 		pi_state->owner = NULL;
-		atomic_set(&pi_state->refcount, 1);
+		refcount_set(&pi_state->refcount, 1);
 		current->pi_state_cache = pi_state;
 	}
 }
@@ -1051,7 +1052,7 @@ static int attach_to_pi_state(u32 __user *uaddr, u32 uval,
 	 * and futex_wait_requeue_pi() as it cannot go to 0 and consequently
 	 * free pi_state before we can take a reference ourselves.
 	 */
-	WARN_ON(!atomic_read(&pi_state->refcount));
+	WARN_ON(!refcount_read(&pi_state->refcount));
 
 	/*
 	 * Now that we have a pi_state, we can acquire wait_lock
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 15/15] kcov: convert kcov.refcount to refcount_t
  2017-08-30 12:22 [PATCH 00/15] v5 kernel core pieces refcount conversions Elena Reshetova
                   ` (13 preceding siblings ...)
  2017-08-30 12:22 ` [PATCH 14/15] futex: convert futex_pi_state.refcount " Elena Reshetova
@ 2017-08-30 12:22 ` Elena Reshetova
  2017-08-31 23:48 ` [PATCH 00/15] v5 kernel core pieces refcount conversions Kees Cook
  15 siblings, 0 replies; 51+ messages in thread
From: Elena Reshetova @ 2017-08-30 12:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-fsdevel, peterz, gregkh, viro, tj, mingo, hannes, lizefan,
	acme, alexander.shishkin, eparis, akpm, arnd, luto, keescook,
	tglx, dvhart, ebiederm, Elena Reshetova

atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable kcov.refcount is used as pure reference counter.
Convert it to refcount_t and fix up the operations.

Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
---
 kernel/kcov.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/kernel/kcov.c b/kernel/kcov.c
index cd77199..a6c7df3 100644
--- a/kernel/kcov.c
+++ b/kernel/kcov.c
@@ -19,6 +19,7 @@
 #include <linux/debugfs.h>
 #include <linux/uaccess.h>
 #include <linux/kcov.h>
+#include <linux/refcount.h>
 #include <asm/setup.h>
 
 /*
@@ -35,7 +36,7 @@ struct kcov {
 	 *  - opened file descriptor
 	 *  - task with enabled coverage (we can't unwire it from another task)
 	 */
-	atomic_t		refcount;
+	refcount_t		refcount;
 	/* The lock protects mode, size, area and t. */
 	spinlock_t		lock;
 	enum kcov_mode		mode;
@@ -94,12 +95,12 @@ EXPORT_SYMBOL(__sanitizer_cov_trace_pc);
 
 static void kcov_get(struct kcov *kcov)
 {
-	atomic_inc(&kcov->refcount);
+	refcount_inc(&kcov->refcount);
 }
 
 static void kcov_put(struct kcov *kcov)
 {
-	if (atomic_dec_and_test(&kcov->refcount)) {
+	if (refcount_dec_and_test(&kcov->refcount)) {
 		vfree(kcov->area);
 		kfree(kcov);
 	}
@@ -175,7 +176,7 @@ static int kcov_open(struct inode *inode, struct file *filep)
 	kcov = kzalloc(sizeof(*kcov), GFP_KERNEL);
 	if (!kcov)
 		return -ENOMEM;
-	atomic_set(&kcov->refcount, 1);
+	refcount_set(&kcov->refcount, 1);
 	spin_lock_init(&kcov->lock);
 	filep->private_data = kcov;
 	return nonseekable_open(inode, filep);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH 00/15] v5 kernel core pieces refcount conversions
  2017-08-30 12:22 [PATCH 00/15] v5 kernel core pieces refcount conversions Elena Reshetova
                   ` (14 preceding siblings ...)
  2017-08-30 12:22 ` [PATCH 15/15] kcov: convert kcov.refcount " Elena Reshetova
@ 2017-08-31 23:48 ` Kees Cook
  2017-09-01  9:48   ` Peter Zijlstra
  15 siblings, 1 reply; 51+ messages in thread
From: Kees Cook @ 2017-08-31 23:48 UTC (permalink / raw)
  To: Elena Reshetova, Andrew Morton
  Cc: LKML, linux-fsdevel, Peter Zijlstra, Greg KH, Al Viro, Tejun Heo,
	Ingo Molnar, Johannes Weiner, Li Zefan, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Eric Paris, Arnd Bergmann, Andy Lutomirski,
	Thomas Gleixner, dvhart, Eric W. Biederman

On Wed, Aug 30, 2017 at 5:22 AM, Elena Reshetova
<elena.reshetova@intel.com> wrote:
> Now we have at least x86 support for ARCH_HAS_REFCOUNT merged and
> arm and others on their way.
>
> Changes in v5:
>  * Kees catched that the following changes in
>    perf_event_context.refcount and futex_pi_state.refcount
>    are not correct now when ARCH_HAS_REFCOUNT is enabled:
>     -   WARN_ON(!atomic_inc_not_zero(refcount));
>     +   refcount_inc(refcount);
>    So they are now changed back to using refcount_inc_not_zero.

Thanks!

Reviewed-by: Kees Cook <keescook@chromium.org>

Andrew, are you able to carry these patches in -mm, since they span a
bunch of core kernel areas?

-Kees

>
> Changes in v4:
>  * just rebase and corrections on linux-next/master
>
> Changes in v3:
>  * SoB chain corrected
>  * minor corrections based on v2 feedback
>  * rebase on linux-next/master as of today
>
> Changes in v2:
>  * dropped already merged patches
>  * rebase on top of linux-next/master
>  * Now by default refcount_t = atomic_t (*) and uses all atomic
>    standard operations unless CONFIG_REFCOUNT_FULL is enabled.
>    This is a compromise for the systems that are critical on
>    performance (such as net) and cannot accept even slight delay
>    on the refcounter operations.
>
> This series, for core kernel components, replaces atomic_t reference
> counters with the new refcount_t type and API (see include/linux/refcount.h).
> By doing this we prevent intentional or accidental
> underflows or overflows that can led to use-after-free vulnerabilities.
>
> The patches are fully independent and can be cherry-picked separately.
> If there are no objections to the patches, please merge them via respective trees.
>
>
> Elena Reshetova (15):
>   sched: convert sighand_struct.count to refcount_t
>   sched: convert signal_struct.sigcnt to refcount_t
>   sched: convert user_struct.__count to refcount_t
>   sched: convert numa_group.refcount to refcount_t
>   sched/task_struct: convert task_struct.usage to refcount_t
>   sched/task_struct: convert task_struct.stack_refcount to refcount_t
>   perf: convert perf_event_context.refcount to refcount_t
>   perf/ring_buffer: convert ring_buffer.refcount to refcount_t
>   perf/ring_buffer: convert ring_buffer.aux_refcount to refcount_t
>   uprobes: convert uprobe.ref to refcount_t
>   nsproxy: convert nsproxy.count to refcount_t
>   groups: convert group_info.usage to refcount_t
>   creds: convert cred.usage to refcount_t
>   futex: convert futex_pi_state.refcount to refcount_t
>   kcov: convert kcov.refcount to refcount_t
>
>  fs/exec.c                        |  4 ++--
>  fs/proc/task_nommu.c             |  2 +-
>  include/linux/cred.h             | 13 ++++++------
>  include/linux/init_task.h        |  7 +++---
>  include/linux/nsproxy.h          |  6 +++---
>  include/linux/perf_event.h       |  3 ++-
>  include/linux/sched.h            |  5 +++--
>  include/linux/sched/signal.h     |  5 +++--
>  include/linux/sched/task.h       |  4 ++--
>  include/linux/sched/task_stack.h |  2 +-
>  include/linux/sched/user.h       |  5 +++--
>  kernel/cred.c                    | 46 ++++++++++++++++++++--------------------
>  kernel/events/core.c             | 18 ++++++++--------
>  kernel/events/internal.h         |  5 +++--
>  kernel/events/ring_buffer.c      |  8 +++----
>  kernel/events/uprobes.c          |  8 +++----
>  kernel/fork.c                    | 24 ++++++++++-----------
>  kernel/futex.c                   | 13 ++++++------
>  kernel/groups.c                  |  2 +-
>  kernel/kcov.c                    |  9 ++++----
>  kernel/nsproxy.c                 |  6 +++---
>  kernel/sched/fair.c              | 12 +++++------
>  kernel/user.c                    |  8 +++----
>  23 files changed, 112 insertions(+), 103 deletions(-)
>
> --
> 2.7.4
>



-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 14/15] futex: convert futex_pi_state.refcount to refcount_t
  2017-08-30 12:22 ` [PATCH 14/15] futex: convert futex_pi_state.refcount " Elena Reshetova
@ 2017-09-01  7:39   ` Thomas Gleixner
  2017-09-01  9:38     ` Peter Zijlstra
  0 siblings, 1 reply; 51+ messages in thread
From: Thomas Gleixner @ 2017-09-01  7:39 UTC (permalink / raw)
  To: Elena Reshetova
  Cc: linux-kernel, linux-fsdevel, peterz, gregkh, viro, tj, mingo,
	hannes, lizefan, acme, alexander.shishkin, eparis, akpm, arnd,
	luto, keescook, dvhart, ebiederm

On Wed, 30 Aug 2017, Elena Reshetova wrote:
> atomic_t variables are currently used to implement reference
> counters with the following properties:
>  - counter is initialized to 1 using atomic_set()
>  - a resource is freed upon counter reaching zero
>  - once counter reaches zero, its further
>    increments aren't allowed
>  - counter schema uses basic atomic operations
>    (set, inc, inc_not_zero, dec_and_test, etc.)
> 
> Such atomic variables should be converted to a newly provided
> refcount_t type and API that prevents accidental counter overflows
> and underflows. This is important since overflows and underflows
> can lead to use-after-free situation and be exploitable.
> 
> The variable futex_pi_state.refcount is used as pure
> reference counter. Convert it to refcount_t and fix up
> the operations.
> 
> Suggested-by: Kees Cook <keescook@chromium.org>
> Reviewed-by: David Windsor <dwindsor@gmail.com>
> Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 14/15] futex: convert futex_pi_state.refcount to refcount_t
  2017-09-01  7:39   ` Thomas Gleixner
@ 2017-09-01  9:38     ` Peter Zijlstra
  2017-09-01  9:43       ` Thomas Gleixner
  0 siblings, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2017-09-01  9:38 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Elena Reshetova, linux-kernel, linux-fsdevel, gregkh, viro, tj,
	mingo, hannes, lizefan, acme, alexander.shishkin, eparis, akpm,
	arnd, luto, keescook, dvhart, ebiederm

On Fri, Sep 01, 2017 at 09:39:50AM +0200, Thomas Gleixner wrote:
> On Wed, 30 Aug 2017, Elena Reshetova wrote:
> > atomic_t variables are currently used to implement reference
> > counters with the following properties:
> >  - counter is initialized to 1 using atomic_set()
> >  - a resource is freed upon counter reaching zero
> >  - once counter reaches zero, its further
> >    increments aren't allowed
> >  - counter schema uses basic atomic operations
> >    (set, inc, inc_not_zero, dec_and_test, etc.)
> > 
> > Such atomic variables should be converted to a newly provided
> > refcount_t type and API that prevents accidental counter overflows
> > and underflows. This is important since overflows and underflows
> > can lead to use-after-free situation and be exploitable.
> > 
> > The variable futex_pi_state.refcount is used as pure
> > reference counter. Convert it to refcount_t and fix up
> > the operations.
> > 
> > Suggested-by: Kees Cook <keescook@chromium.org>
> > Reviewed-by: David Windsor <dwindsor@gmail.com>
> > Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
> > Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
> 
> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

So the thing to be careful with for things like futex and some of the
other core kernel code is the memory ordering.

atomic_dec_and_test() provides a full smp_mb() before and after,
refcount_dec_and_test() only provides release semantics.

This is typically sufficient, and I would argue that if we rely on more
than that, there _should_ be a comment, however reality isn't always as
nice.

That said, I think this conversion is OK, pi_state->refcount isn't
relied upon to provide additional memory ordering above and beyond what
refcounting requires.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 14/15] futex: convert futex_pi_state.refcount to refcount_t
  2017-09-01  9:38     ` Peter Zijlstra
@ 2017-09-01  9:43       ` Thomas Gleixner
  2017-09-01 10:52           ` Reshetova, Elena
  2017-09-01 11:05           ` Reshetova, Elena
  0 siblings, 2 replies; 51+ messages in thread
From: Thomas Gleixner @ 2017-09-01  9:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Elena Reshetova, linux-kernel, linux-fsdevel, gregkh, viro, tj,
	mingo, hannes, lizefan, acme, alexander.shishkin, eparis, akpm,
	arnd, luto, keescook, dvhart, ebiederm

On Fri, 1 Sep 2017, Peter Zijlstra wrote:
> On Fri, Sep 01, 2017 at 09:39:50AM +0200, Thomas Gleixner wrote:
> > On Wed, 30 Aug 2017, Elena Reshetova wrote:
> > > atomic_t variables are currently used to implement reference
> > > counters with the following properties:
> > >  - counter is initialized to 1 using atomic_set()
> > >  - a resource is freed upon counter reaching zero
> > >  - once counter reaches zero, its further
> > >    increments aren't allowed
> > >  - counter schema uses basic atomic operations
> > >    (set, inc, inc_not_zero, dec_and_test, etc.)
> > > 
> > > Such atomic variables should be converted to a newly provided
> > > refcount_t type and API that prevents accidental counter overflows
> > > and underflows. This is important since overflows and underflows
> > > can lead to use-after-free situation and be exploitable.
> > > 
> > > The variable futex_pi_state.refcount is used as pure
> > > reference counter. Convert it to refcount_t and fix up
> > > the operations.
> > > 
> > > Suggested-by: Kees Cook <keescook@chromium.org>
> > > Reviewed-by: David Windsor <dwindsor@gmail.com>
> > > Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
> > > Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
> > 
> > Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
> 
> So the thing to be careful with for things like futex and some of the
> other core kernel code is the memory ordering.
> 
> atomic_dec_and_test() provides a full smp_mb() before and after,
> refcount_dec_and_test() only provides release semantics.
> 
> This is typically sufficient, and I would argue that if we rely on more
> than that, there _should_ be a comment, however reality isn't always as
> nice.
> 
> That said, I think this conversion is OK, pi_state->refcount isn't
> relied upon to provide additional memory ordering above and beyond what
> refcounting requires.

So the changelogs should reflect that. The current one suggests that this
is a one to one replacement for atomic_t just with the extra sanity checks
added.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 00/15] v5 kernel core pieces refcount conversions
  2017-08-31 23:48 ` [PATCH 00/15] v5 kernel core pieces refcount conversions Kees Cook
@ 2017-09-01  9:48   ` Peter Zijlstra
  2017-09-01 16:55     ` Kees Cook
  0 siblings, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2017-09-01  9:48 UTC (permalink / raw)
  To: Kees Cook
  Cc: Elena Reshetova, Andrew Morton, LKML, linux-fsdevel, Greg KH,
	Al Viro, Tejun Heo, Ingo Molnar, Johannes Weiner, Li Zefan,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Eric Paris,
	Arnd Bergmann, Andy Lutomirski, Thomas Gleixner, dvhart,
	Eric W. Biederman

On Thu, Aug 31, 2017 at 04:48:00PM -0700, Kees Cook wrote:
> On Wed, Aug 30, 2017 at 5:22 AM, Elena Reshetova
> <elena.reshetova@intel.com> wrote:
> > Now we have at least x86 support for ARCH_HAS_REFCOUNT merged and
> > arm and others on their way.
> >
> > Changes in v5:
> >  * Kees catched that the following changes in
> >    perf_event_context.refcount and futex_pi_state.refcount
> >    are not correct now when ARCH_HAS_REFCOUNT is enabled:
> >     -   WARN_ON(!atomic_inc_not_zero(refcount));
> >     +   refcount_inc(refcount);
> >    So they are now changed back to using refcount_inc_not_zero.
> 
> Thanks!
> 
> Reviewed-by: Kees Cook <keescook@chromium.org>
> 
> Andrew, are you able to carry these patches in -mm, since they span a
> bunch of core kernel areas?

No.. these patches should go through the regular trees that maintain
these various parts.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [PATCH 14/15] futex: convert futex_pi_state.refcount to refcount_t
  2017-09-01  9:43       ` Thomas Gleixner
@ 2017-09-01 10:52           ` Reshetova, Elena
  2017-09-01 11:05           ` Reshetova, Elena
  1 sibling, 0 replies; 51+ messages in thread
From: Reshetova, Elena @ 2017-09-01 10:52 UTC (permalink / raw)
  To: Thomas Gleixner, Peter Zijlstra
  Cc: linux-kernel, linux-fsdevel, gregkh, viro, tj, mingo, hannes,
	lizefan, acme, alexander.shishkin, eparis, akpm, arnd, luto,
	keescook, dvhart, ebiederm

> On Fri, 1 Sep 2017, Peter Zijlstra wrote:
> > On Fri, Sep 01, 2017 at 09:39:50AM +0200, Thomas Gleixner wrote:
> > > On Wed, 30 Aug 2017, Elena Reshetova wrote:
> > > > atomic_t variables are currently used to implement reference
> > > > counters with the following properties:
> > > >  - counter is initialized to 1 using atomic_set()
> > > >  - a resource is freed upon counter reaching zero
> > > >  - once counter reaches zero, its further
> > > >    increments aren't allowed
> > > >  - counter schema uses basic atomic operations
> > > >    (set, inc, inc_not_zero, dec_and_test, etc.)
> > > >
> > > > Such atomic variables should be converted to a newly provided
> > > > refcount_t type and API that prevents accidental counter overflows
> > > > and underflows. This is important since overflows and underflows
> > > > can lead to use-after-free situation and be exploitable.
> > > >
> > > > The variable futex_pi_state.refcount is used as pure
> > > > reference counter. Convert it to refcount_t and fix up
> > > > the operations.
> > > >
> > > > Suggested-by: Kees Cook <keescook@chromium.org>
> > > > Reviewed-by: David Windsor <dwindsor@gmail.com>
> > > > Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
> > > > Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
> > >
> > > Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
> >
> > So the thing to be careful with for things like futex and some of the
> > other core kernel code is the memory ordering.
> >
> > atomic_dec_and_test() provides a full smp_mb() before and after,
> > refcount_dec_and_test() only provides release semantics.
> >
> > This is typically sufficient, and I would argue that if we rely on more
> > than that, there _should_ be a comment, however reality isn't always as
> > nice.
> >
> > That said, I think this conversion is OK, pi_state->refcount isn't
> > relied upon to provide additional memory ordering above and beyond what
> > refcounting requires.
> 
> So the changelogs should reflect that. The current one suggests that this
> is a one to one replacement for atomic_t just with the extra sanity checks
> added.

I will update the commit texts accordingly and resend the whole series since
this should be then mentioned in every commit to make sure it is not missed.

Best Regards,
Elena.

> 
> Thanks,
> 
> 	tglx

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [PATCH 14/15] futex: convert futex_pi_state.refcount to refcount_t
@ 2017-09-01 10:52           ` Reshetova, Elena
  0 siblings, 0 replies; 51+ messages in thread
From: Reshetova, Elena @ 2017-09-01 10:52 UTC (permalink / raw)
  To: Thomas Gleixner, Peter Zijlstra
  Cc: linux-kernel, linux-fsdevel, gregkh, viro, tj, mingo, hannes,
	lizefan, acme, alexander.shishkin, eparis, akpm, arnd, luto,
	keescook, dvhart, ebiederm

> On Fri, 1 Sep 2017, Peter Zijlstra wrote:
> > On Fri, Sep 01, 2017 at 09:39:50AM +0200, Thomas Gleixner wrote:
> > > On Wed, 30 Aug 2017, Elena Reshetova wrote:
> > > > atomic_t variables are currently used to implement reference
> > > > counters with the following properties:
> > > >  - counter is initialized to 1 using atomic_set()
> > > >  - a resource is freed upon counter reaching zero
> > > >  - once counter reaches zero, its further
> > > >    increments aren't allowed
> > > >  - counter schema uses basic atomic operations
> > > >    (set, inc, inc_not_zero, dec_and_test, etc.)
> > > >
> > > > Such atomic variables should be converted to a newly provided
> > > > refcount_t type and API that prevents accidental counter overflows
> > > > and underflows. This is important since overflows and underflows
> > > > can lead to use-after-free situation and be exploitable.
> > > >
> > > > The variable futex_pi_state.refcount is used as pure
> > > > reference counter. Convert it to refcount_t and fix up
> > > > the operations.
> > > >
> > > > Suggested-by: Kees Cook <keescook@chromium.org>
> > > > Reviewed-by: David Windsor <dwindsor@gmail.com>
> > > > Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
> > > > Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
> > >
> > > Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
> >
> > So the thing to be careful with for things like futex and some of the
> > other core kernel code is the memory ordering.
> >
> > atomic_dec_and_test() provides a full smp_mb() before and after,
> > refcount_dec_and_test() only provides release semantics.
> >
> > This is typically sufficient, and I would argue that if we rely on more
> > than that, there _should_ be a comment, however reality isn't always as
> > nice.
> >
> > That said, I think this conversion is OK, pi_state->refcount isn't
> > relied upon to provide additional memory ordering above and beyond what
> > refcounting requires.
> 
> So the changelogs should reflect that. The current one suggests that this
> is a one to one replacement for atomic_t just with the extra sanity checks
> added.

I will update the commit texts accordingly and resend the whole series since
this should be then mentioned in every commit to make sure it is not missed.

Best Regards,
Elena.

> 
> Thanks,
> 
> 	tglx

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [PATCH 14/15] futex: convert futex_pi_state.refcount to refcount_t
  2017-09-01  9:43       ` Thomas Gleixner
@ 2017-09-01 11:05           ` Reshetova, Elena
  2017-09-01 11:05           ` Reshetova, Elena
  1 sibling, 0 replies; 51+ messages in thread
From: Reshetova, Elena @ 2017-09-01 11:05 UTC (permalink / raw)
  To: Thomas Gleixner, Peter Zijlstra
  Cc: linux-kernel, linux-fsdevel, gregkh, viro, tj, mingo, hannes,
	lizefan, acme, alexander.shishkin, eparis, akpm, arnd, luto,
	keescook, dvhart, ebiederm

 > On Fri, 1 Sep 2017, Peter Zijlstra wrote:
> > > On Fri, Sep 01, 2017 at 09:39:50AM +0200, Thomas Gleixner wrote:
> > > > On Wed, 30 Aug 2017, Elena Reshetova wrote:
> > > > > atomic_t variables are currently used to implement reference
> > > > > counters with the following properties:
> > > > >  - counter is initialized to 1 using atomic_set()
> > > > >  - a resource is freed upon counter reaching zero
> > > > >  - once counter reaches zero, its further
> > > > >    increments aren't allowed
> > > > >  - counter schema uses basic atomic operations
> > > > >    (set, inc, inc_not_zero, dec_and_test, etc.)
> > > > >
> > > > > Such atomic variables should be converted to a newly provided
> > > > > refcount_t type and API that prevents accidental counter overflows
> > > > > and underflows. This is important since overflows and underflows
> > > > > can lead to use-after-free situation and be exploitable.
> > > > >
> > > > > The variable futex_pi_state.refcount is used as pure
> > > > > reference counter. Convert it to refcount_t and fix up
> > > > > the operations.
> > > > >
> > > > > Suggested-by: Kees Cook <keescook@chromium.org>
> > > > > Reviewed-by: David Windsor <dwindsor@gmail.com>
> > > > > Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
> > > > > Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
> > > >
> > > > Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
> > >
> > > So the thing to be careful with for things like futex and some of the
> > > other core kernel code is the memory ordering.
> > >
> > > atomic_dec_and_test() provides a full smp_mb() before and after,
> > > refcount_dec_and_test() only provides release semantics.
> > >
> > > This is typically sufficient, and I would argue that if we rely on more
> > > than that, there _should_ be a comment, however reality isn't always as
> > > nice.
> > >
> > > That said, I think this conversion is OK, pi_state->refcount isn't
> > > relied upon to provide additional memory ordering above and beyond what
> > > refcounting requires.
> >
> > So the changelogs should reflect that. The current one suggests that this
> > is a one to one replacement for atomic_t just with the extra sanity checks
> > added.
> 
> I will update the commit texts accordingly and resend the whole series since
> this should be then mentioned in every commit to make sure it is not missed.

Actually on the second thought: does the above memory ordering differences
really apply when  we have ARCH_HAS_REFCOUNT? To me it looks like the way
how it is currently implemented for x86 is the same way as it is for atomic cases.


> 
> Best Regards,
> Elena.
> 
> >
> > Thanks,
> >
> > 	tglx

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [PATCH 14/15] futex: convert futex_pi_state.refcount to refcount_t
@ 2017-09-01 11:05           ` Reshetova, Elena
  0 siblings, 0 replies; 51+ messages in thread
From: Reshetova, Elena @ 2017-09-01 11:05 UTC (permalink / raw)
  To: Thomas Gleixner, Peter Zijlstra
  Cc: linux-kernel, linux-fsdevel, gregkh, viro, tj, mingo, hannes,
	lizefan, acme, alexander.shishkin, eparis, akpm, arnd, luto,
	keescook, dvhart, ebiederm

 > On Fri, 1 Sep 2017, Peter Zijlstra wrote:
> > > On Fri, Sep 01, 2017 at 09:39:50AM +0200, Thomas Gleixner wrote:
> > > > On Wed, 30 Aug 2017, Elena Reshetova wrote:
> > > > > atomic_t variables are currently used to implement reference
> > > > > counters with the following properties:
> > > > >  - counter is initialized to 1 using atomic_set()
> > > > >  - a resource is freed upon counter reaching zero
> > > > >  - once counter reaches zero, its further
> > > > >    increments aren't allowed
> > > > >  - counter schema uses basic atomic operations
> > > > >    (set, inc, inc_not_zero, dec_and_test, etc.)
> > > > >
> > > > > Such atomic variables should be converted to a newly provided
> > > > > refcount_t type and API that prevents accidental counter overflows
> > > > > and underflows. This is important since overflows and underflows
> > > > > can lead to use-after-free situation and be exploitable.
> > > > >
> > > > > The variable futex_pi_state.refcount is used as pure
> > > > > reference counter. Convert it to refcount_t and fix up
> > > > > the operations.
> > > > >
> > > > > Suggested-by: Kees Cook <keescook@chromium.org>
> > > > > Reviewed-by: David Windsor <dwindsor@gmail.com>
> > > > > Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
> > > > > Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
> > > >
> > > > Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
> > >
> > > So the thing to be careful with for things like futex and some of the
> > > other core kernel code is the memory ordering.
> > >
> > > atomic_dec_and_test() provides a full smp_mb() before and after,
> > > refcount_dec_and_test() only provides release semantics.
> > >
> > > This is typically sufficient, and I would argue that if we rely on more
> > > than that, there _should_ be a comment, however reality isn't always as
> > > nice.
> > >
> > > That said, I think this conversion is OK, pi_state->refcount isn't
> > > relied upon to provide additional memory ordering above and beyond what
> > > refcounting requires.
> >
> > So the changelogs should reflect that. The current one suggests that this
> > is a one to one replacement for atomic_t just with the extra sanity checks
> > added.
> 
> I will update the commit texts accordingly and resend the whole series since
> this should be then mentioned in every commit to make sure it is not missed.

Actually on the second thought: does the above memory ordering differences
really apply when  we have ARCH_HAS_REFCOUNT? To me it looks like the way
how it is currently implemented for x86 is the same way as it is for atomic cases.


> 
> Best Regards,
> Elena.
> 
> >
> > Thanks,
> >
> > 	tglx

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 14/15] futex: convert futex_pi_state.refcount to refcount_t
  2017-09-01 11:05           ` Reshetova, Elena
@ 2017-09-01 12:34             ` Peter Zijlstra
  -1 siblings, 0 replies; 51+ messages in thread
From: Peter Zijlstra @ 2017-09-01 12:34 UTC (permalink / raw)
  To: Reshetova, Elena
  Cc: Thomas Gleixner, linux-kernel, linux-fsdevel, gregkh, viro, tj,
	mingo, hannes, lizefan, acme, alexander.shishkin, eparis, akpm,
	arnd, luto, keescook, dvhart, ebiederm

On Fri, Sep 01, 2017 at 11:05:33AM +0000, Reshetova, Elena wrote:
> Actually on the second thought: does the above memory ordering differences
> really apply when  we have ARCH_HAS_REFCOUNT? To me it looks like the way
> how it is currently implemented for x86 is the same way as it is for atomic cases.

Never look to x86 for memory ordering, its boring.

And yes, for the ARM implementation it can certainly make a difference.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 14/15] futex: convert futex_pi_state.refcount to refcount_t
@ 2017-09-01 12:34             ` Peter Zijlstra
  0 siblings, 0 replies; 51+ messages in thread
From: Peter Zijlstra @ 2017-09-01 12:34 UTC (permalink / raw)
  To: Reshetova, Elena
  Cc: Thomas Gleixner, linux-kernel, linux-fsdevel, gregkh, viro, tj,
	mingo, hannes, lizefan, acme, alexander.shishkin, eparis, akpm,
	arnd, luto, keescook, dvhart, ebiederm

On Fri, Sep 01, 2017 at 11:05:33AM +0000, Reshetova, Elena wrote:
> Actually on the second thought: does the above memory ordering differences
> really apply when  we have ARCH_HAS_REFCOUNT? To me it looks like the way
> how it is currently implemented for x86 is the same way as it is for atomic cases.

Never look to x86 for memory ordering, its boring.

And yes, for the ARM implementation it can certainly make a difference.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [PATCH 14/15] futex: convert futex_pi_state.refcount to refcount_t
  2017-09-01 12:34             ` Peter Zijlstra
@ 2017-09-01 13:24               ` Reshetova, Elena
  -1 siblings, 0 replies; 51+ messages in thread
From: Reshetova, Elena @ 2017-09-01 13:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, linux-kernel, linux-fsdevel, gregkh, viro, tj,
	mingo, hannes, lizefan, acme, alexander.shishkin, eparis, akpm,
	arnd, luto, keescook, dvhart, ebiederm


> On Fri, Sep 01, 2017 at 11:05:33AM +0000, Reshetova, Elena wrote:
> > Actually on the second thought: does the above memory ordering differences
> > really apply when  we have ARCH_HAS_REFCOUNT? To me it looks like the way
> > how it is currently implemented for x86 is the same way as it is for atomic cases.
> 
> Never look to x86 for memory ordering, its boring.
> 
> And yes, for the ARM implementation it can certainly make a difference.

So, yes, what I am trying to say is that it can really depend if you have ARCH_HAS_REFCOUNT
enabled or not and then also based on architecture. Thus I believe is also true for atomic: there
might be differences when you use arch. dependent version of function or not. 

So, I guess if I rewrite the commits, I should only include the statement on relaxed memory order
for REFCOUNT_FULL and tell that arch. specific implementations may vary on their properties
(as they do now). 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [PATCH 14/15] futex: convert futex_pi_state.refcount to refcount_t
@ 2017-09-01 13:24               ` Reshetova, Elena
  0 siblings, 0 replies; 51+ messages in thread
From: Reshetova, Elena @ 2017-09-01 13:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, linux-kernel, linux-fsdevel, gregkh, viro, tj,
	mingo, hannes, lizefan, acme, alexander.shishkin, eparis, akpm,
	arnd, luto, keescook, dvhart, ebiederm


> On Fri, Sep 01, 2017 at 11:05:33AM +0000, Reshetova, Elena wrote:
> > Actually on the second thought: does the above memory ordering differences
> > really apply when  we have ARCH_HAS_REFCOUNT? To me it looks like the way
> > how it is currently implemented for x86 is the same way as it is for atomic cases.
> 
> Never look to x86 for memory ordering, its boring.
> 
> And yes, for the ARM implementation it can certainly make a difference.

So, yes, what I am trying to say is that it can really depend if you have ARCH_HAS_REFCOUNT
enabled or not and then also based on architecture. Thus I believe is also true for atomic: there
might be differences when you use arch. dependent version of function or not. 

So, I guess if I rewrite the commits, I should only include the statement on relaxed memory order
for REFCOUNT_FULL and tell that arch. specific implementations may vary on their properties
(as they do now). 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 14/15] futex: convert futex_pi_state.refcount to refcount_t
  2017-09-01 13:24               ` Reshetova, Elena
@ 2017-09-01 13:36                 ` Peter Zijlstra
  -1 siblings, 0 replies; 51+ messages in thread
From: Peter Zijlstra @ 2017-09-01 13:36 UTC (permalink / raw)
  To: Reshetova, Elena
  Cc: Thomas Gleixner, linux-kernel, linux-fsdevel, gregkh, viro, tj,
	mingo, hannes, lizefan, acme, alexander.shishkin, eparis, akpm,
	arnd, luto, keescook, dvhart, ebiederm

On Fri, Sep 01, 2017 at 01:24:16PM +0000, Reshetova, Elena wrote:
> 
> > On Fri, Sep 01, 2017 at 11:05:33AM +0000, Reshetova, Elena wrote:
> > > Actually on the second thought: does the above memory ordering differences
> > > really apply when  we have ARCH_HAS_REFCOUNT? To me it looks like the way
> > > how it is currently implemented for x86 is the same way as it is for atomic cases.
> > 
> > Never look to x86 for memory ordering, its boring.
> > 
> > And yes, for the ARM implementation it can certainly make a difference.
> 
> So, yes, what I am trying to say is that it can really depend if you have ARCH_HAS_REFCOUNT
> enabled or not and then also based on architecture. Thus I believe is also true for atomic: there
> might be differences when you use arch. dependent version of function or not. 

So the generic one in lib/refcount.c is already weaker on ARM, they
don't need to do a ARCH specific 'fast' implementation for the
difference to show up.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 14/15] futex: convert futex_pi_state.refcount to refcount_t
@ 2017-09-01 13:36                 ` Peter Zijlstra
  0 siblings, 0 replies; 51+ messages in thread
From: Peter Zijlstra @ 2017-09-01 13:36 UTC (permalink / raw)
  To: Reshetova, Elena
  Cc: Thomas Gleixner, linux-kernel, linux-fsdevel, gregkh, viro, tj,
	mingo, hannes, lizefan, acme, alexander.shishkin, eparis, akpm,
	arnd, luto, keescook, dvhart, ebiederm

On Fri, Sep 01, 2017 at 01:24:16PM +0000, Reshetova, Elena wrote:
> 
> > On Fri, Sep 01, 2017 at 11:05:33AM +0000, Reshetova, Elena wrote:
> > > Actually on the second thought: does the above memory ordering differences
> > > really apply when  we have ARCH_HAS_REFCOUNT? To me it looks like the way
> > > how it is currently implemented for x86 is the same way as it is for atomic cases.
> > 
> > Never look to x86 for memory ordering, its boring.
> > 
> > And yes, for the ARM implementation it can certainly make a difference.
> 
> So, yes, what I am trying to say is that it can really depend if you have ARCH_HAS_REFCOUNT
> enabled or not and then also based on architecture. Thus I believe is also true for atomic: there
> might be differences when you use arch. dependent version of function or not. 

So the generic one in lib/refcount.c is already weaker on ARM, they
don't need to do a ARCH specific 'fast' implementation for the
difference to show up.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 00/15] v5 kernel core pieces refcount conversions
  2017-09-01  9:48   ` Peter Zijlstra
@ 2017-09-01 16:55     ` Kees Cook
  2017-09-01 17:08       ` Reshetova, Elena
  0 siblings, 1 reply; 51+ messages in thread
From: Kees Cook @ 2017-09-01 16:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Elena Reshetova, Andrew Morton, LKML, linux-fsdevel, Greg KH,
	Al Viro, Tejun Heo, Ingo Molnar, Johannes Weiner, Li Zefan,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Eric Paris,
	Arnd Bergmann, Andy Lutomirski, Thomas Gleixner, dvhart,
	Eric W. Biederman

On Fri, Sep 1, 2017 at 2:48 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, Aug 31, 2017 at 04:48:00PM -0700, Kees Cook wrote:
>> On Wed, Aug 30, 2017 at 5:22 AM, Elena Reshetova
>> <elena.reshetova@intel.com> wrote:
>> > Now we have at least x86 support for ARCH_HAS_REFCOUNT merged and
>> > arm and others on their way.
>> >
>> > Changes in v5:
>> >  * Kees catched that the following changes in
>> >    perf_event_context.refcount and futex_pi_state.refcount
>> >    are not correct now when ARCH_HAS_REFCOUNT is enabled:
>> >     -   WARN_ON(!atomic_inc_not_zero(refcount));
>> >     +   refcount_inc(refcount);
>> >    So they are now changed back to using refcount_inc_not_zero.
>>
>> Thanks!
>>
>> Reviewed-by: Kees Cook <keescook@chromium.org>
>>
>> Andrew, are you able to carry these patches in -mm, since they span a
>> bunch of core kernel areas?
>
> No.. these patches should go through the regular trees that maintain
> these various parts.

Okay, sounds fine. Elena, can you split these up? (You'll probably
have to examine MAINTAINERS and/or git history for each patch...)

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [PATCH 14/15] futex: convert futex_pi_state.refcount to refcount_t
  2017-09-01 13:36                 ` Peter Zijlstra
@ 2017-09-01 17:03                   ` Reshetova, Elena
  -1 siblings, 0 replies; 51+ messages in thread
From: Reshetova, Elena @ 2017-09-01 17:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, linux-kernel, linux-fsdevel, gregkh, viro, tj,
	mingo, hannes, lizefan, acme, alexander.shishkin, eparis, akpm,
	arnd, luto, keescook, dvhart, ebiederm

> On Fri, Sep 01, 2017 at 01:24:16PM +0000, Reshetova, Elena wrote:
> >
> > > On Fri, Sep 01, 2017 at 11:05:33AM +0000, Reshetova, Elena wrote:
> > > > Actually on the second thought: does the above memory ordering differences
> > > > really apply when  we have ARCH_HAS_REFCOUNT? To me it looks like the way
> > > > how it is currently implemented for x86 is the same way as it is for atomic
> cases.
> > >
> > > Never look to x86 for memory ordering, its boring.
> > >
> > > And yes, for the ARM implementation it can certainly make a difference.
> >
> > So, yes, what I am trying to say is that it can really depend if you have
> ARCH_HAS_REFCOUNT
> > enabled or not and then also based on architecture. Thus I believe is also true for
> atomic: there
> > might be differences when you use arch. dependent version of function or not.
> 
> So the generic one in lib/refcount.c is already weaker on ARM, they
> don't need to do a ARCH specific 'fast' implementation for the
> difference to show up.

But can they make "fast" implementation on ARM that would give stronger memory guarantees?

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [PATCH 14/15] futex: convert futex_pi_state.refcount to refcount_t
@ 2017-09-01 17:03                   ` Reshetova, Elena
  0 siblings, 0 replies; 51+ messages in thread
From: Reshetova, Elena @ 2017-09-01 17:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, linux-kernel, linux-fsdevel, gregkh, viro, tj,
	mingo, hannes, lizefan, acme, alexander.shishkin, eparis, akpm,
	arnd, luto, keescook, dvhart, ebiederm

> On Fri, Sep 01, 2017 at 01:24:16PM +0000, Reshetova, Elena wrote:
> >
> > > On Fri, Sep 01, 2017 at 11:05:33AM +0000, Reshetova, Elena wrote:
> > > > Actually on the second thought: does the above memory ordering differences
> > > > really apply when  we have ARCH_HAS_REFCOUNT? To me it looks like the way
> > > > how it is currently implemented for x86 is the same way as it is for atomic
> cases.
> > >
> > > Never look to x86 for memory ordering, its boring.
> > >
> > > And yes, for the ARM implementation it can certainly make a difference.
> >
> > So, yes, what I am trying to say is that it can really depend if you have
> ARCH_HAS_REFCOUNT
> > enabled or not and then also based on architecture. Thus I believe is also true for
> atomic: there
> > might be differences when you use arch. dependent version of function or not.
> 
> So the generic one in lib/refcount.c is already weaker on ARM, they
> don't need to do a ARCH specific 'fast' implementation for the
> difference to show up.

But can they make "fast" implementation on ARM that would give stronger memory guarantees?

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [PATCH 00/15] v5 kernel core pieces refcount conversions
  2017-09-01 16:55     ` Kees Cook
@ 2017-09-01 17:08       ` Reshetova, Elena
  0 siblings, 0 replies; 51+ messages in thread
From: Reshetova, Elena @ 2017-09-01 17:08 UTC (permalink / raw)
  To: Kees Cook, Peter Zijlstra
  Cc: Andrew Morton, LKML, linux-fsdevel, Greg KH, Al Viro, Tejun Heo,
	Ingo Molnar, Johannes Weiner, Li Zefan, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Eric Paris, Arnd Bergmann, Andy Lutomirski,
	Thomas Gleixner, dvhart, Eric W. Biederman

> On Fri, Sep 1, 2017 at 2:48 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Thu, Aug 31, 2017 at 04:48:00PM -0700, Kees Cook wrote:
> >> On Wed, Aug 30, 2017 at 5:22 AM, Elena Reshetova
> >> <elena.reshetova@intel.com> wrote:
> >> > Now we have at least x86 support for ARCH_HAS_REFCOUNT merged and
> >> > arm and others on their way.
> >> >
> >> > Changes in v5:
> >> >  * Kees catched that the following changes in
> >> >    perf_event_context.refcount and futex_pi_state.refcount
> >> >    are not correct now when ARCH_HAS_REFCOUNT is enabled:
> >> >     -   WARN_ON(!atomic_inc_not_zero(refcount));
> >> >     +   refcount_inc(refcount);
> >> >    So they are now changed back to using refcount_inc_not_zero.
> >>
> >> Thanks!
> >>
> >> Reviewed-by: Kees Cook <keescook@chromium.org>
> >>
> >> Andrew, are you able to carry these patches in -mm, since they span a
> >> bunch of core kernel areas?
> >
> > No.. these patches should go through the regular trees that maintain
> > these various parts.
> 
> Okay, sounds fine. Elena, can you split these up? (You'll probably
> have to examine MAINTAINERS and/or git history for each patch...)

Well, I can do this, but patches are already fully independent for cherry-pick and all maintainers
should be in the CC list, so I was hoping people can pull into their trees from this series. 
But if people want to split, I can do a split...

Best Regards,
Elena.

> 
> -Kees
> 
> --
> Kees Cook
> Pixel Security

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 14/15] futex: convert futex_pi_state.refcount to refcount_t
  2017-09-01 17:03                   ` Reshetova, Elena
@ 2017-09-01 19:12                     ` Peter Zijlstra
  -1 siblings, 0 replies; 51+ messages in thread
From: Peter Zijlstra @ 2017-09-01 19:12 UTC (permalink / raw)
  To: Reshetova, Elena
  Cc: Thomas Gleixner, linux-kernel, linux-fsdevel, gregkh, viro, tj,
	mingo, hannes, lizefan, acme, alexander.shishkin, eparis, akpm,
	arnd, luto, keescook, dvhart, ebiederm

On Fri, Sep 01, 2017 at 05:03:55PM +0000, Reshetova, Elena wrote:
> > On Fri, Sep 01, 2017 at 01:24:16PM +0000, Reshetova, Elena wrote:
> > >
> > > > On Fri, Sep 01, 2017 at 11:05:33AM +0000, Reshetova, Elena wrote:
> > > > > Actually on the second thought: does the above memory ordering differences
> > > > > really apply when  we have ARCH_HAS_REFCOUNT? To me it looks like the way
> > > > > how it is currently implemented for x86 is the same way as it is for atomic
> > cases.
> > > >
> > > > Never look to x86 for memory ordering, its boring.
> > > >
> > > > And yes, for the ARM implementation it can certainly make a difference.
> > >
> > > So, yes, what I am trying to say is that it can really depend if you have
> > ARCH_HAS_REFCOUNT
> > > enabled or not and then also based on architecture. Thus I believe is also true for
> > atomic: there
> > > might be differences when you use arch. dependent version of function or not.
> > 
> > So the generic one in lib/refcount.c is already weaker on ARM, they
> > don't need to do a ARCH specific 'fast' implementation for the
> > difference to show up.
> 
> But can they make "fast" implementation on ARM that would give stronger memory guarantees?

Whatever for?

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 14/15] futex: convert futex_pi_state.refcount to refcount_t
@ 2017-09-01 19:12                     ` Peter Zijlstra
  0 siblings, 0 replies; 51+ messages in thread
From: Peter Zijlstra @ 2017-09-01 19:12 UTC (permalink / raw)
  To: Reshetova, Elena
  Cc: Thomas Gleixner, linux-kernel, linux-fsdevel, gregkh, viro, tj,
	mingo, hannes, lizefan, acme, alexander.shishkin, eparis, akpm,
	arnd, luto, keescook, dvhart, ebiederm

On Fri, Sep 01, 2017 at 05:03:55PM +0000, Reshetova, Elena wrote:
> > On Fri, Sep 01, 2017 at 01:24:16PM +0000, Reshetova, Elena wrote:
> > >
> > > > On Fri, Sep 01, 2017 at 11:05:33AM +0000, Reshetova, Elena wrote:
> > > > > Actually on the second thought: does the above memory ordering differences
> > > > > really apply when  we have ARCH_HAS_REFCOUNT? To me it looks like the way
> > > > > how it is currently implemented for x86 is the same way as it is for atomic
> > cases.
> > > >
> > > > Never look to x86 for memory ordering, its boring.
> > > >
> > > > And yes, for the ARM implementation it can certainly make a difference.
> > >
> > > So, yes, what I am trying to say is that it can really depend if you have
> > ARCH_HAS_REFCOUNT
> > > enabled or not and then also based on architecture. Thus I believe is also true for
> > atomic: there
> > > might be differences when you use arch. dependent version of function or not.
> > 
> > So the generic one in lib/refcount.c is already weaker on ARM, they
> > don't need to do a ARCH specific 'fast' implementation for the
> > difference to show up.
> 
> But can they make "fast" implementation on ARM that would give stronger memory guarantees?

Whatever for?

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [PATCH 14/15] futex: convert futex_pi_state.refcount to refcount_t
  2017-09-01 19:12                     ` Peter Zijlstra
@ 2017-09-04 10:31                       ` Reshetova, Elena
  -1 siblings, 0 replies; 51+ messages in thread
From: Reshetova, Elena @ 2017-09-04 10:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, linux-kernel, linux-fsdevel, gregkh, viro, tj,
	mingo, hannes, lizefan, acme, alexander.shishkin, eparis, akpm,
	arnd, luto, keescook, dvhart, ebiederm



> -----Original Message-----
> From: Peter Zijlstra [mailto:peterz@infradead.org]
> Sent: Friday, September 1, 2017 10:13 PM
> To: Reshetova, Elena <elena.reshetova@intel.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>; linux-kernel@vger.kernel.org; linux-
> fsdevel@vger.kernel.org; gregkh@linuxfoundation.org; viro@zeniv.linux.org.uk;
> tj@kernel.org; mingo@redhat.com; hannes@cmpxchg.org; lizefan@huawei.com;
> acme@kernel.org; alexander.shishkin@linux.intel.com; eparis@redhat.com;
> akpm@linux-foundation.org; arnd@arndb.de; luto@kernel.org;
> keescook@chromium.org; dvhart@infradead.org; ebiederm@xmission.com
> Subject: Re: [PATCH 14/15] futex: convert futex_pi_state.refcount to refcount_t
> 
> On Fri, Sep 01, 2017 at 05:03:55PM +0000, Reshetova, Elena wrote:
> > > On Fri, Sep 01, 2017 at 01:24:16PM +0000, Reshetova, Elena wrote:
> > > >
> > > > > On Fri, Sep 01, 2017 at 11:05:33AM +0000, Reshetova, Elena wrote:
> > > > > > Actually on the second thought: does the above memory ordering
> differences
> > > > > > really apply when  we have ARCH_HAS_REFCOUNT? To me it looks like the
> way
> > > > > > how it is currently implemented for x86 is the same way as it is for atomic
> > > cases.
> > > > >
> > > > > Never look to x86 for memory ordering, its boring.
> > > > >
> > > > > And yes, for the ARM implementation it can certainly make a difference.
> > > >
> > > > So, yes, what I am trying to say is that it can really depend if you have
> > > ARCH_HAS_REFCOUNT
> > > > enabled or not and then also based on architecture. Thus I believe is also true
> for
> > > atomic: there
> > > > might be differences when you use arch. dependent version of function or not.
> > >
> > > So the generic one in lib/refcount.c is already weaker on ARM, they
> > > don't need to do a ARCH specific 'fast' implementation for the
> > > difference to show up.
> >
> > But can they make "fast" implementation on ARM that would give stronger
> memory guarantees?
> 
> Whatever for?

Well, maybe just by default when arch.-specific implementation is done. But I was just trying to speculate
to understand. I will resend this one with new comment added. 

Still not sure if I need to resend the whole series with updated commits
or break this up by individual patches further for the separate merges. 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [PATCH 14/15] futex: convert futex_pi_state.refcount to refcount_t
@ 2017-09-04 10:31                       ` Reshetova, Elena
  0 siblings, 0 replies; 51+ messages in thread
From: Reshetova, Elena @ 2017-09-04 10:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, linux-kernel, linux-fsdevel, gregkh, viro, tj,
	mingo, hannes, lizefan, acme, alexander.shishkin, eparis, akpm,
	arnd, luto, keescook, dvhart, ebiederm



> -----Original Message-----
> From: Peter Zijlstra [mailto:peterz@infradead.org]
> Sent: Friday, September 1, 2017 10:13 PM
> To: Reshetova, Elena <elena.reshetova@intel.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>; linux-kernel@vger.kernel.org; linux-
> fsdevel@vger.kernel.org; gregkh@linuxfoundation.org; viro@zeniv.linux.org.uk;
> tj@kernel.org; mingo@redhat.com; hannes@cmpxchg.org; lizefan@huawei.com;
> acme@kernel.org; alexander.shishkin@linux.intel.com; eparis@redhat.com;
> akpm@linux-foundation.org; arnd@arndb.de; luto@kernel.org;
> keescook@chromium.org; dvhart@infradead.org; ebiederm@xmission.com
> Subject: Re: [PATCH 14/15] futex: convert futex_pi_state.refcount to refcount_t
> 
> On Fri, Sep 01, 2017 at 05:03:55PM +0000, Reshetova, Elena wrote:
> > > On Fri, Sep 01, 2017 at 01:24:16PM +0000, Reshetova, Elena wrote:
> > > >
> > > > > On Fri, Sep 01, 2017 at 11:05:33AM +0000, Reshetova, Elena wrote:
> > > > > > Actually on the second thought: does the above memory ordering
> differences
> > > > > > really apply when  we have ARCH_HAS_REFCOUNT? To me it looks like the
> way
> > > > > > how it is currently implemented for x86 is the same way as it is for atomic
> > > cases.
> > > > >
> > > > > Never look to x86 for memory ordering, its boring.
> > > > >
> > > > > And yes, for the ARM implementation it can certainly make a difference.
> > > >
> > > > So, yes, what I am trying to say is that it can really depend if you have
> > > ARCH_HAS_REFCOUNT
> > > > enabled or not and then also based on architecture. Thus I believe is also true
> for
> > > atomic: there
> > > > might be differences when you use arch. dependent version of function or not.
> > >
> > > So the generic one in lib/refcount.c is already weaker on ARM, they
> > > don't need to do a ARCH specific 'fast' implementation for the
> > > difference to show up.
> >
> > But can they make "fast" implementation on ARM that would give stronger
> memory guarantees?
> 
> Whatever for?

Well, maybe just by default when arch.-specific implementation is done. But I was just trying to speculate
to understand. I will resend this one with new comment added. 

Still not sure if I need to resend the whole series with updated commits
or break this up by individual patches further for the separate merges. 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 14/15] futex: convert futex_pi_state.refcount to refcount_t
  2017-09-04 10:31                       ` Reshetova, Elena
@ 2017-09-04 12:00                         ` Peter Zijlstra
  -1 siblings, 0 replies; 51+ messages in thread
From: Peter Zijlstra @ 2017-09-04 12:00 UTC (permalink / raw)
  To: Reshetova, Elena
  Cc: Thomas Gleixner, linux-kernel, linux-fsdevel, gregkh, viro, tj,
	mingo, hannes, lizefan, acme, alexander.shishkin, eparis, akpm,
	arnd, luto, keescook, dvhart, ebiederm

On Mon, Sep 04, 2017 at 10:31:54AM +0000, Reshetova, Elena wrote:
> > > But can they make "fast" implementation on ARM that would give stronger
> > > memory guarantees?
> > 
> > Whatever for?
> 
> Well, maybe just by default when arch.-specific implementation is
> done. But I was just trying to speculate to understand. I will resend
> this one with new comment added. 

So the generic lib/refcount.c already has weak ordering. It doesn't make
sense for an arch specific implementation (on a weakly ordered machine)
to provide stronger guarantees (it would make things slower).

The weaker ordering of the refcount_t primitives is sufficient if we're
talking pure refcounts. If for some reason code relies on stronger
ordering there _SHOULD_ be a comment with describing the additional
ordering requirements.

But that's a fairly big 'should'. I can well imagine the comment not
being there. In fact, see below.

> Still not sure if I need to resend the whole series with updated
> commits or break this up by individual patches further for the
> separate merges. 

I've yet to look at the ones targeted at subsystems I do, I'm forever
and terminally behind on review :/

I called out the issue on futex in particular because it is fairly
tricky code that.

Now Thomas would like you to mention the fact that refcount_t doesn't
provide the exact same ordering as the atomic_t usages it replaces and
I think it would be good if you could hand-wave an argument on why the
futex code doesn't care.


Now, suppose we were to convert i_count to refcount_t (yes, I know, my
initial conversion wasn't well received), then we need to add
futex_get_inode() similar to futex_get_mm().

That is, smp_mb__{before,after}_atomic() works as expected and can be
used to fortify the implied barriers by refcount_t.

---
Subject: fs,inode: Add comment explaining additional ordering

Add a note to ihold() to document the ordering futex relies upon.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 fs/inode.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/inode.c b/fs/inode.c
index 50370599e371..17192ba92fef 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -395,6 +395,10 @@ void __iget(struct inode *inode)
  */
 void ihold(struct inode *inode)
 {
+	/*
+	 * Note: futex.c:get_futex_key_refs() relies on this function
+	 * implying an smp_mb().
+	 */
 	WARN_ON(atomic_inc_return(&inode->i_count) < 2);
 }
 EXPORT_SYMBOL(ihold);

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH 14/15] futex: convert futex_pi_state.refcount to refcount_t
@ 2017-09-04 12:00                         ` Peter Zijlstra
  0 siblings, 0 replies; 51+ messages in thread
From: Peter Zijlstra @ 2017-09-04 12:00 UTC (permalink / raw)
  To: Reshetova, Elena
  Cc: Thomas Gleixner, linux-kernel, linux-fsdevel, gregkh, viro, tj,
	mingo, hannes, lizefan, acme, alexander.shishkin, eparis, akpm,
	arnd, luto, keescook, dvhart, ebiederm

On Mon, Sep 04, 2017 at 10:31:54AM +0000, Reshetova, Elena wrote:
> > > But can they make "fast" implementation on ARM that would give stronger
> > > memory guarantees?
> > 
> > Whatever for?
> 
> Well, maybe just by default when arch.-specific implementation is
> done. But I was just trying to speculate to understand. I will resend
> this one with new comment added. 

So the generic lib/refcount.c already has weak ordering. It doesn't make
sense for an arch specific implementation (on a weakly ordered machine)
to provide stronger guarantees (it would make things slower).

The weaker ordering of the refcount_t primitives is sufficient if we're
talking pure refcounts. If for some reason code relies on stronger
ordering there _SHOULD_ be a comment with describing the additional
ordering requirements.

But that's a fairly big 'should'. I can well imagine the comment not
being there. In fact, see below.

> Still not sure if I need to resend the whole series with updated
> commits or break this up by individual patches further for the
> separate merges. 

I've yet to look at the ones targeted at subsystems I do, I'm forever
and terminally behind on review :/

I called out the issue on futex in particular because it is fairly
tricky code that.

Now Thomas would like you to mention the fact that refcount_t doesn't
provide the exact same ordering as the atomic_t usages it replaces and
I think it would be good if you could hand-wave an argument on why the
futex code doesn't care.


Now, suppose we were to convert i_count to refcount_t (yes, I know, my
initial conversion wasn't well received), then we need to add
futex_get_inode() similar to futex_get_mm().

That is, smp_mb__{before,after}_atomic() works as expected and can be
used to fortify the implied barriers by refcount_t.

---
Subject: fs,inode: Add comment explaining additional ordering

Add a note to ihold() to document the ordering futex relies upon.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 fs/inode.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/inode.c b/fs/inode.c
index 50370599e371..17192ba92fef 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -395,6 +395,10 @@ void __iget(struct inode *inode)
  */
 void ihold(struct inode *inode)
 {
+	/*
+	 * Note: futex.c:get_futex_key_refs() relies on this function
+	 * implying an smp_mb().
+	 */
 	WARN_ON(atomic_inc_return(&inode->i_count) < 2);
 }
 EXPORT_SYMBOL(ihold);

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* RE: [PATCH 14/15] futex: convert futex_pi_state.refcount to refcount_t
  2017-09-04 12:00                         ` Peter Zijlstra
@ 2017-09-14 16:02                           ` Reshetova, Elena
  -1 siblings, 0 replies; 51+ messages in thread
From: Reshetova, Elena @ 2017-09-14 16:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, linux-kernel, linux-fsdevel, gregkh, viro, tj,
	mingo, hannes, lizefan, acme, alexander.shishkin, eparis, akpm,
	arnd, luto, keescook, dvhart, ebiederm

Sorry for delayed reply. 

> On Mon, Sep 04, 2017 at 10:31:54AM +0000, Reshetova, Elena wrote:
> > > > But can they make "fast" implementation on ARM that would give stronger
> > > > memory guarantees?
> > >
> > > Whatever for?
> >
> > Well, maybe just by default when arch.-specific implementation is
> > done. But I was just trying to speculate to understand. I will resend
> > this one with new comment added.
> 
> So the generic lib/refcount.c already has weak ordering. It doesn't make
> sense for an arch specific implementation (on a weakly ordered machine)
> to provide stronger guarantees (it would make things slower).

Thank you for explaining this! Helps to understand a lot. 
> 
> The weaker ordering of the refcount_t primitives is sufficient if we're
> talking pure refcounts. If for some reason code relies on stronger
> ordering there _SHOULD_ be a comment with describing the additional
> ordering requirements.
> 
> But that's a fairly big 'should'. I can well imagine the comment not
> being there. In fact, see below.
> 
> > Still not sure if I need to resend the whole series with updated
> > commits or break this up by individual patches further for the
> > separate merges.
> 
> I've yet to look at the ones targeted at subsystems I do, I'm forever
> and terminally behind on review :/
> 
> I called out the issue on futex in particular because it is fairly
> tricky code that.
> 
> Now Thomas would like you to mention the fact that refcount_t doesn't
> provide the exact same ordering as the atomic_t usages it replaces and
> I think it would be good if you could hand-wave an argument on why the
> futex code doesn't care.

I think I can mention the ordering differences on all yet-to-be-merged
patches to make sure maintainers are aware. The problem with concrete
cases is that I don't usually have enough knowledge of code to understand 
for sure where it would matter or not. Previously I was even under impression
that it should not matter at all for the variables that we are converting since
they are classical refcounters, but your examples clearly show that it is not
*always* the case (but I think it is the case for most of patches). 
So, I am a bit confused on how to approach this. 
Either just put a statement to all patches and rely that maintainers certainly
know their code and can catch these things or do an analysis myself, but 
then I would need a bit of guidance on what is the reasonable heuristics on 
how check each refcounter. This goes really beyond my current
kernel knowledge, but I am happy to learn if somebody points me to smth
I can read/fill missing points.   

Best Regards,
Elena.


> 
> 
> Now, suppose we were to convert i_count to refcount_t (yes, I know, my
> initial conversion wasn't well received), then we need to add
> futex_get_inode() similar to futex_get_mm().
> 
> That is, smp_mb__{before,after}_atomic() works as expected and can be
> used to fortify the implied barriers by refcount_t.
> 
> ---
> Subject: fs,inode: Add comment explaining additional ordering
> 
> Add a note to ihold() to document the ordering futex relies upon.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  fs/inode.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index 50370599e371..17192ba92fef 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -395,6 +395,10 @@ void __iget(struct inode *inode)
>   */
>  void ihold(struct inode *inode)
>  {
> +	/*
> +	 * Note: futex.c:get_futex_key_refs() relies on this function
> +	 * implying an smp_mb().
> +	 */
>  	WARN_ON(atomic_inc_return(&inode->i_count) < 2);
>  }
>  EXPORT_SYMBOL(ihold);

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [PATCH 14/15] futex: convert futex_pi_state.refcount to refcount_t
@ 2017-09-14 16:02                           ` Reshetova, Elena
  0 siblings, 0 replies; 51+ messages in thread
From: Reshetova, Elena @ 2017-09-14 16:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, linux-kernel, linux-fsdevel, gregkh, viro, tj,
	mingo, hannes, lizefan, acme, alexander.shishkin, eparis, akpm,
	arnd, luto, keescook, dvhart, ebiederm

Sorry for delayed reply. 

> On Mon, Sep 04, 2017 at 10:31:54AM +0000, Reshetova, Elena wrote:
> > > > But can they make "fast" implementation on ARM that would give stronger
> > > > memory guarantees?
> > >
> > > Whatever for?
> >
> > Well, maybe just by default when arch.-specific implementation is
> > done. But I was just trying to speculate to understand. I will resend
> > this one with new comment added.
> 
> So the generic lib/refcount.c already has weak ordering. It doesn't make
> sense for an arch specific implementation (on a weakly ordered machine)
> to provide stronger guarantees (it would make things slower).

Thank you for explaining this! Helps to understand a lot. 
> 
> The weaker ordering of the refcount_t primitives is sufficient if we're
> talking pure refcounts. If for some reason code relies on stronger
> ordering there _SHOULD_ be a comment with describing the additional
> ordering requirements.
> 
> But that's a fairly big 'should'. I can well imagine the comment not
> being there. In fact, see below.
> 
> > Still not sure if I need to resend the whole series with updated
> > commits or break this up by individual patches further for the
> > separate merges.
> 
> I've yet to look at the ones targeted at subsystems I do, I'm forever
> and terminally behind on review :/
> 
> I called out the issue on futex in particular because it is fairly
> tricky code that.
> 
> Now Thomas would like you to mention the fact that refcount_t doesn't
> provide the exact same ordering as the atomic_t usages it replaces and
> I think it would be good if you could hand-wave an argument on why the
> futex code doesn't care.

I think I can mention the ordering differences on all yet-to-be-merged
patches to make sure maintainers are aware. The problem with concrete
cases is that I don't usually have enough knowledge of code to understand 
for sure where it would matter or not. Previously I was even under impression
that it should not matter at all for the variables that we are converting since
they are classical refcounters, but your examples clearly show that it is not
*always* the case (but I think it is the case for most of patches). 
So, I am a bit confused on how to approach this. 
Either just put a statement to all patches and rely that maintainers certainly
know their code and can catch these things or do an analysis myself, but 
then I would need a bit of guidance on what is the reasonable heuristics on 
how check each refcounter. This goes really beyond my current
kernel knowledge, but I am happy to learn if somebody points me to smth
I can read/fill missing points.   

Best Regards,
Elena.


> 
> 
> Now, suppose we were to convert i_count to refcount_t (yes, I know, my
> initial conversion wasn't well received), then we need to add
> futex_get_inode() similar to futex_get_mm().
> 
> That is, smp_mb__{before,after}_atomic() works as expected and can be
> used to fortify the implied barriers by refcount_t.
> 
> ---
> Subject: fs,inode: Add comment explaining additional ordering
> 
> Add a note to ihold() to document the ordering futex relies upon.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  fs/inode.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index 50370599e371..17192ba92fef 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -395,6 +395,10 @@ void __iget(struct inode *inode)
>   */
>  void ihold(struct inode *inode)
>  {
> +	/*
> +	 * Note: futex.c:get_futex_key_refs() relies on this function
> +	 * implying an smp_mb().
> +	 */
>  	WARN_ON(atomic_inc_return(&inode->i_count) < 2);
>  }
>  EXPORT_SYMBOL(ihold);

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [PATCH 14/15] futex: convert futex_pi_state.refcount to refcount_t
  2017-09-14 16:02                           ` Reshetova, Elena
@ 2017-10-20 12:03                             ` Reshetova, Elena
  -1 siblings, 0 replies; 51+ messages in thread
From: Reshetova, Elena @ 2017-10-20 12:03 UTC (permalink / raw)
  To: 'Peter Zijlstra'
  Cc: 'Thomas Gleixner', 'linux-kernel@vger.kernel.org',
	'linux-fsdevel@vger.kernel.org',
	'gregkh@linuxfoundation.org',
	'viro@zeniv.linux.org.uk', 'tj@kernel.org',
	'mingo@redhat.com', 'hannes@cmpxchg.org',
	'lizefan@huawei.com', 'acme@kernel.org',
	'alexander.shishkin@linux.intel.com',
	'eparis@redhat.com', 'akpm@linux-foundation.org',
	'arnd@arndb.de', 'luto@kernel.org',
	'keescook@chromium.org', 'dvhart@infradead.org',
	'ebiederm@xmission.com'

Since I am not really sure what to do with this futex patch, I will drop it
from the new series that I am about to send now. 

Please let me know what exactly should I do with this patch, as I wrote 
previously I really don't understand. 

Best Regards,
Elena.

> Sorry for delayed reply.
> 
> > On Mon, Sep 04, 2017 at 10:31:54AM +0000, Reshetova, Elena wrote:
> > > > > But can they make "fast" implementation on ARM that would give stronger
> > > > > memory guarantees?
> > > >
> > > > Whatever for?
> > >
> > > Well, maybe just by default when arch.-specific implementation is
> > > done. But I was just trying to speculate to understand. I will resend
> > > this one with new comment added.
> >
> > So the generic lib/refcount.c already has weak ordering. It doesn't make
> > sense for an arch specific implementation (on a weakly ordered machine)
> > to provide stronger guarantees (it would make things slower).
> 
> Thank you for explaining this! Helps to understand a lot.
> >
> > The weaker ordering of the refcount_t primitives is sufficient if we're
> > talking pure refcounts. If for some reason code relies on stronger
> > ordering there _SHOULD_ be a comment with describing the additional
> > ordering requirements.
> >
> > But that's a fairly big 'should'. I can well imagine the comment not
> > being there. In fact, see below.
> >
> > > Still not sure if I need to resend the whole series with updated
> > > commits or break this up by individual patches further for the
> > > separate merges.
> >
> > I've yet to look at the ones targeted at subsystems I do, I'm forever
> > and terminally behind on review :/
> >
> > I called out the issue on futex in particular because it is fairly
> > tricky code that.
> >
> > Now Thomas would like you to mention the fact that refcount_t doesn't
> > provide the exact same ordering as the atomic_t usages it replaces and
> > I think it would be good if you could hand-wave an argument on why the
> > futex code doesn't care.
> 
> I think I can mention the ordering differences on all yet-to-be-merged
> patches to make sure maintainers are aware. The problem with concrete
> cases is that I don't usually have enough knowledge of code to understand
> for sure where it would matter or not. Previously I was even under impression
> that it should not matter at all for the variables that we are converting since
> they are classical refcounters, but your examples clearly show that it is not
> *always* the case (but I think it is the case for most of patches).
> So, I am a bit confused on how to approach this.
> Either just put a statement to all patches and rely that maintainers certainly
> know their code and can catch these things or do an analysis myself, but
> then I would need a bit of guidance on what is the reasonable heuristics on
> how check each refcounter. This goes really beyond my current
> kernel knowledge, but I am happy to learn if somebody points me to smth
> I can read/fill missing points.
> 
> Best Regards,
> Elena.
> 
> 
> >
> >
> > Now, suppose we were to convert i_count to refcount_t (yes, I know, my
> > initial conversion wasn't well received), then we need to add
> > futex_get_inode() similar to futex_get_mm().
> >
> > That is, smp_mb__{before,after}_atomic() works as expected and can be
> > used to fortify the implied barriers by refcount_t.
> >
> > ---
> > Subject: fs,inode: Add comment explaining additional ordering
> >
> > Add a note to ihold() to document the ordering futex relies upon.
> >
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> >  fs/inode.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/fs/inode.c b/fs/inode.c
> > index 50370599e371..17192ba92fef 100644
> > --- a/fs/inode.c
> > +++ b/fs/inode.c
> > @@ -395,6 +395,10 @@ void __iget(struct inode *inode)
> >   */
> >  void ihold(struct inode *inode)
> >  {
> > +	/*
> > +	 * Note: futex.c:get_futex_key_refs() relies on this function
> > +	 * implying an smp_mb().
> > +	 */
> >  	WARN_ON(atomic_inc_return(&inode->i_count) < 2);
> >  }
> >  EXPORT_SYMBOL(ihold);

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [PATCH 14/15] futex: convert futex_pi_state.refcount to refcount_t
@ 2017-10-20 12:03                             ` Reshetova, Elena
  0 siblings, 0 replies; 51+ messages in thread
From: Reshetova, Elena @ 2017-10-20 12:03 UTC (permalink / raw)
  To: 'Peter Zijlstra'
  Cc: 'Thomas Gleixner', 'linux-kernel@vger.kernel.org',
	'linux-fsdevel@vger.kernel.org',
	'gregkh@linuxfoundation.org',
	'viro@zeniv.linux.org.uk', 'tj@kernel.org',
	'mingo@redhat.com', 'hannes@cmpxchg.org',
	'lizefan@huawei.com', 'acme@kernel.org',
	'alexander.shishkin@linux.intel.com',
	'eparis@redhat.com', 'akpm@linux-foundation.org',
	'arnd@arndb.de', 'luto@kernel.org',
	'keescook@chromium.org', 'dvhart@infradead.org',
	'ebiederm@xmission.com'

Since I am not really sure what to do with this futex patch, I will drop it
from the new series that I am about to send now. 

Please let me know what exactly should I do with this patch, as I wrote 
previously I really don't understand. 

Best Regards,
Elena.

> Sorry for delayed reply.
> 
> > On Mon, Sep 04, 2017 at 10:31:54AM +0000, Reshetova, Elena wrote:
> > > > > But can they make "fast" implementation on ARM that would give stronger
> > > > > memory guarantees?
> > > >
> > > > Whatever for?
> > >
> > > Well, maybe just by default when arch.-specific implementation is
> > > done. But I was just trying to speculate to understand. I will resend
> > > this one with new comment added.
> >
> > So the generic lib/refcount.c already has weak ordering. It doesn't make
> > sense for an arch specific implementation (on a weakly ordered machine)
> > to provide stronger guarantees (it would make things slower).
> 
> Thank you for explaining this! Helps to understand a lot.
> >
> > The weaker ordering of the refcount_t primitives is sufficient if we're
> > talking pure refcounts. If for some reason code relies on stronger
> > ordering there _SHOULD_ be a comment with describing the additional
> > ordering requirements.
> >
> > But that's a fairly big 'should'. I can well imagine the comment not
> > being there. In fact, see below.
> >
> > > Still not sure if I need to resend the whole series with updated
> > > commits or break this up by individual patches further for the
> > > separate merges.
> >
> > I've yet to look at the ones targeted at subsystems I do, I'm forever
> > and terminally behind on review :/
> >
> > I called out the issue on futex in particular because it is fairly
> > tricky code that.
> >
> > Now Thomas would like you to mention the fact that refcount_t doesn't
> > provide the exact same ordering as the atomic_t usages it replaces and
> > I think it would be good if you could hand-wave an argument on why the
> > futex code doesn't care.
> 
> I think I can mention the ordering differences on all yet-to-be-merged
> patches to make sure maintainers are aware. The problem with concrete
> cases is that I don't usually have enough knowledge of code to understand
> for sure where it would matter or not. Previously I was even under impression
> that it should not matter at all for the variables that we are converting since
> they are classical refcounters, but your examples clearly show that it is not
> *always* the case (but I think it is the case for most of patches).
> So, I am a bit confused on how to approach this.
> Either just put a statement to all patches and rely that maintainers certainly
> know their code and can catch these things or do an analysis myself, but
> then I would need a bit of guidance on what is the reasonable heuristics on
> how check each refcounter. This goes really beyond my current
> kernel knowledge, but I am happy to learn if somebody points me to smth
> I can read/fill missing points.
> 
> Best Regards,
> Elena.
> 
> 
> >
> >
> > Now, suppose we were to convert i_count to refcount_t (yes, I know, my
> > initial conversion wasn't well received), then we need to add
> > futex_get_inode() similar to futex_get_mm().
> >
> > That is, smp_mb__{before,after}_atomic() works as expected and can be
> > used to fortify the implied barriers by refcount_t.
> >
> > ---
> > Subject: fs,inode: Add comment explaining additional ordering
> >
> > Add a note to ihold() to document the ordering futex relies upon.
> >
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> >  fs/inode.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/fs/inode.c b/fs/inode.c
> > index 50370599e371..17192ba92fef 100644
> > --- a/fs/inode.c
> > +++ b/fs/inode.c
> > @@ -395,6 +395,10 @@ void __iget(struct inode *inode)
> >   */
> >  void ihold(struct inode *inode)
> >  {
> > +	/*
> > +	 * Note: futex.c:get_futex_key_refs() relies on this function
> > +	 * implying an smp_mb().
> > +	 */
> >  	WARN_ON(atomic_inc_return(&inode->i_count) < 2);
> >  }
> >  EXPORT_SYMBOL(ihold);

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [PATCH 14/15] futex: convert futex_pi_state.refcount to refcount_t
  2017-10-20 12:03                             ` Reshetova, Elena
@ 2017-10-20 12:30                               ` Thomas Gleixner
  -1 siblings, 0 replies; 51+ messages in thread
From: Thomas Gleixner @ 2017-10-20 12:30 UTC (permalink / raw)
  To: Reshetova, Elena
  Cc: 'Peter Zijlstra', 'linux-kernel@vger.kernel.org',
	'linux-fsdevel@vger.kernel.org',
	'gregkh@linuxfoundation.org',
	'viro@zeniv.linux.org.uk', 'tj@kernel.org',
	'mingo@redhat.com', 'hannes@cmpxchg.org',
	'lizefan@huawei.com', 'acme@kernel.org',
	'alexander.shishkin@linux.intel.com',
	'eparis@redhat.com', 'akpm@linux-foundation.org',
	'arnd@arndb.de', 'luto@kernel.org',
	'keescook@chromium.org', 'dvhart@infradead.org',
	'ebiederm@xmission.com'

On Fri, 20 Oct 2017, Reshetova, Elena wrote:

> Since I am not really sure what to do with this futex patch, I will drop it
> from the new series that I am about to send now. 
> 
> Please let me know what exactly should I do with this patch, as I wrote 
> previously I really don't understand. 

As Peter said:

> > > Now Thomas would like you to mention the fact that refcount_t doesn't
> > > provide the exact same ordering as the atomic_t usages it replaces and
> > > I think it would be good if you could hand-wave an argument on why the
> > > futex code doesn't care.

So if you cannot come with a halfways reasonable argument then at least
make it entirely clear that refcount_t is not the same as atomic_t in terms
of ordering guarantees and you think that it does not matter but
explicitely ask for help from the developers and maintainers to look at it.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [PATCH 14/15] futex: convert futex_pi_state.refcount to refcount_t
@ 2017-10-20 12:30                               ` Thomas Gleixner
  0 siblings, 0 replies; 51+ messages in thread
From: Thomas Gleixner @ 2017-10-20 12:30 UTC (permalink / raw)
  To: Reshetova, Elena
  Cc: 'Peter Zijlstra', 'linux-kernel@vger.kernel.org',
	'linux-fsdevel@vger.kernel.org',
	'gregkh@linuxfoundation.org',
	'viro@zeniv.linux.org.uk', 'tj@kernel.org',
	'mingo@redhat.com', 'hannes@cmpxchg.org',
	'lizefan@huawei.com', 'acme@kernel.org',
	'alexander.shishkin@linux.intel.com',
	'eparis@redhat.com', 'akpm@linux-foundation.org',
	'arnd@arndb.de', 'luto@kernel.org',
	'keescook@chromium.org', 'dvhart@infradead.org',
	'ebiederm@xmission.com'

On Fri, 20 Oct 2017, Reshetova, Elena wrote:

> Since I am not really sure what to do with this futex patch, I will drop it
> from the new series that I am about to send now. 
> 
> Please let me know what exactly should I do with this patch, as I wrote 
> previously I really don't understand. 

As Peter said:

> > > Now Thomas would like you to mention the fact that refcount_t doesn't
> > > provide the exact same ordering as the atomic_t usages it replaces and
> > > I think it would be good if you could hand-wave an argument on why the
> > > futex code doesn't care.

So if you cannot come with a halfways reasonable argument then at least
make it entirely clear that refcount_t is not the same as atomic_t in terms
of ordering guarantees and you think that it does not matter but
explicitely ask for help from the developers and maintainers to look at it.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [PATCH 14/15] futex: convert futex_pi_state.refcount to refcount_t
  2017-10-20 12:30                               ` Thomas Gleixner
@ 2017-10-23  7:36                                 ` Reshetova, Elena
  -1 siblings, 0 replies; 51+ messages in thread
From: Reshetova, Elena @ 2017-10-23  7:36 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: 'Peter Zijlstra', 'linux-kernel@vger.kernel.org',
	'linux-fsdevel@vger.kernel.org',
	'gregkh@linuxfoundation.org',
	'viro@zeniv.linux.org.uk', 'tj@kernel.org',
	'mingo@redhat.com', 'hannes@cmpxchg.org',
	'lizefan@huawei.com', 'acme@kernel.org',
	'alexander.shishkin@linux.intel.com',
	'eparis@redhat.com', 'akpm@linux-foundation.org',
	'arnd@arndb.de', 'luto@kernel.org',
	'keescook@chromium.org', 'dvhart@infradead.org',
	'ebiederm@xmission.com'


> On Fri, 20 Oct 2017, Reshetova, Elena wrote:
> 
> > Since I am not really sure what to do with this futex patch, I will drop it
> > from the new series that I am about to send now.
> >
> > Please let me know what exactly should I do with this patch, as I wrote
> > previously I really don't understand.
> 
> As Peter said:
> 
> > > > Now Thomas would like you to mention the fact that refcount_t doesn't
> > > > provide the exact same ordering as the atomic_t usages it replaces and
> > > > I think it would be good if you could hand-wave an argument on why the
> > > > futex code doesn't care.
> 
> So if you cannot come with a halfways reasonable argument then at least
> make it entirely clear that refcount_t is not the same as atomic_t in terms
> of ordering guarantees and you think that it does not matter but
> explicitely ask for help from the developers and maintainers to look at it.


There is another (I think better) proposal that came from Kees now: 
What if we change the default refcount_t to provide the strict memory
ordering like atomic_t?
I guess the reason why Peter intially went with *_relaxed() versions of compare and
exchange loop was performance. But now when we have x86 spec. fast
implementation, maybe it is ok to make the default FULL refcount slower?

What do people think of this?

Best Regards,
Elena.


> 
> Thanks,
> 
> 	tglx
> 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [PATCH 14/15] futex: convert futex_pi_state.refcount to refcount_t
@ 2017-10-23  7:36                                 ` Reshetova, Elena
  0 siblings, 0 replies; 51+ messages in thread
From: Reshetova, Elena @ 2017-10-23  7:36 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: 'Peter Zijlstra', 'linux-kernel@vger.kernel.org',
	'linux-fsdevel@vger.kernel.org',
	'gregkh@linuxfoundation.org',
	'viro@zeniv.linux.org.uk', 'tj@kernel.org',
	'mingo@redhat.com', 'hannes@cmpxchg.org',
	'lizefan@huawei.com', 'acme@kernel.org',
	'alexander.shishkin@linux.intel.com',
	'eparis@redhat.com', 'akpm@linux-foundation.org',
	'arnd@arndb.de', 'luto@kernel.org',
	'keescook@chromium.org', 'dvhart@infradead.org',
	'ebiederm@xmission.com'


> On Fri, 20 Oct 2017, Reshetova, Elena wrote:
> 
> > Since I am not really sure what to do with this futex patch, I will drop it
> > from the new series that I am about to send now.
> >
> > Please let me know what exactly should I do with this patch, as I wrote
> > previously I really don't understand.
> 
> As Peter said:
> 
> > > > Now Thomas would like you to mention the fact that refcount_t doesn't
> > > > provide the exact same ordering as the atomic_t usages it replaces and
> > > > I think it would be good if you could hand-wave an argument on why the
> > > > futex code doesn't care.
> 
> So if you cannot come with a halfways reasonable argument then at least
> make it entirely clear that refcount_t is not the same as atomic_t in terms
> of ordering guarantees and you think that it does not matter but
> explicitely ask for help from the developers and maintainers to look at it.


There is another (I think better) proposal that came from Kees now: 
What if we change the default refcount_t to provide the strict memory
ordering like atomic_t?
I guess the reason why Peter intially went with *_relaxed() versions of compare and
exchange loop was performance. But now when we have x86 spec. fast
implementation, maybe it is ok to make the default FULL refcount slower?

What do people think of this?

Best Regards,
Elena.


> 
> Thanks,
> 
> 	tglx
> 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 14/15] futex: convert futex_pi_state.refcount to refcount_t
  2017-08-25  9:41 [PATCH 00/15] v4 " Elena Reshetova
@ 2017-08-25  9:41 ` Elena Reshetova
  0 siblings, 0 replies; 51+ messages in thread
From: Elena Reshetova @ 2017-08-25  9:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-fsdevel, peterz, gregkh, viro, tj, mingo, hannes, lizefan,
	acme, alexander.shishkin, eparis, akpm, arnd, luto, keescook,
	tglx, dvhart, ebiederm, Elena Reshetova

atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable futex_pi_state.refcount is used as pure
reference counter. Convert it to refcount_t and fix up
the operations.

Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
---
 kernel/futex.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 0939255..07f735c 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -67,6 +67,7 @@
 #include <linux/freezer.h>
 #include <linux/bootmem.h>
 #include <linux/fault-inject.h>
+#include <linux/refcount.h>
 
 #include <asm/futex.h>
 
@@ -209,7 +210,7 @@ struct futex_pi_state {
 	struct rt_mutex pi_mutex;
 
 	struct task_struct *owner;
-	atomic_t refcount;
+	refcount_t refcount;
 
 	union futex_key key;
 } __randomize_layout;
@@ -795,7 +796,7 @@ static int refill_pi_state_cache(void)
 	INIT_LIST_HEAD(&pi_state->list);
 	/* pi_mutex gets initialized later */
 	pi_state->owner = NULL;
-	atomic_set(&pi_state->refcount, 1);
+	refcount_set(&pi_state->refcount, 1);
 	pi_state->key = FUTEX_KEY_INIT;
 
 	current->pi_state_cache = pi_state;
@@ -815,7 +816,7 @@ static struct futex_pi_state *alloc_pi_state(void)
 
 static void get_pi_state(struct futex_pi_state *pi_state)
 {
-	WARN_ON_ONCE(!atomic_inc_not_zero(&pi_state->refcount));
+	refcount_inc(&pi_state->refcount);
 }
 
 /*
@@ -829,7 +830,7 @@ static void put_pi_state(struct futex_pi_state *pi_state)
 	if (!pi_state)
 		return;
 
-	if (!atomic_dec_and_test(&pi_state->refcount))
+	if (!refcount_dec_and_test(&pi_state->refcount))
 		return;
 
 	/*
@@ -853,7 +854,7 @@ static void put_pi_state(struct futex_pi_state *pi_state)
 		 * refcount is at 0 - put it back to 1.
 		 */
 		pi_state->owner = NULL;
-		atomic_set(&pi_state->refcount, 1);
+		refcount_set(&pi_state->refcount, 1);
 		current->pi_state_cache = pi_state;
 	}
 }
@@ -1051,7 +1052,7 @@ static int attach_to_pi_state(u32 __user *uaddr, u32 uval,
 	 * and futex_wait_requeue_pi() as it cannot go to 0 and consequently
 	 * free pi_state before we can take a reference ourselves.
 	 */
-	WARN_ON(!atomic_read(&pi_state->refcount));
+	WARN_ON(!refcount_read(&pi_state->refcount));
 
 	/*
 	 * Now that we have a pi_state, we can acquire wait_lock
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 14/15] futex: convert futex_pi_state.refcount to refcount_t
  2017-07-18 10:29 [PATCH 00/15] v4 kernel core pieces refcount conversions Elena Reshetova
@ 2017-07-18 10:30 ` Elena Reshetova
  0 siblings, 0 replies; 51+ messages in thread
From: Elena Reshetova @ 2017-07-18 10:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-fsdevel, peterz, gregkh, viro, tj, mingo, hannes, lizefan,
	acme, alexander.shishkin, eparis, akpm, arnd, luto, keescook,
	tglx, dvhart, ebiederm, Elena Reshetova

atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable futex_pi_state.refcount is used as pure
reference counter. Convert it to refcount_t and fix up
the operations.

Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
---
 kernel/futex.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 16dbe4c..1cc7641 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -67,6 +67,7 @@
 #include <linux/freezer.h>
 #include <linux/bootmem.h>
 #include <linux/fault-inject.h>
+#include <linux/refcount.h>
 
 #include <asm/futex.h>
 
@@ -209,7 +210,7 @@ struct futex_pi_state {
 	struct rt_mutex pi_mutex;
 
 	struct task_struct *owner;
-	atomic_t refcount;
+	refcount_t refcount;
 
 	union futex_key key;
 } __randomize_layout;
@@ -794,7 +795,7 @@ static int refill_pi_state_cache(void)
 	INIT_LIST_HEAD(&pi_state->list);
 	/* pi_mutex gets initialized later */
 	pi_state->owner = NULL;
-	atomic_set(&pi_state->refcount, 1);
+	refcount_set(&pi_state->refcount, 1);
 	pi_state->key = FUTEX_KEY_INIT;
 
 	current->pi_state_cache = pi_state;
@@ -814,7 +815,7 @@ static struct futex_pi_state *alloc_pi_state(void)
 
 static void get_pi_state(struct futex_pi_state *pi_state)
 {
-	WARN_ON_ONCE(!atomic_inc_not_zero(&pi_state->refcount));
+	refcount_inc(&pi_state->refcount);
 }
 
 /*
@@ -828,7 +829,7 @@ static void put_pi_state(struct futex_pi_state *pi_state)
 	if (!pi_state)
 		return;
 
-	if (!atomic_dec_and_test(&pi_state->refcount))
+	if (!refcount_dec_and_test(&pi_state->refcount))
 		return;
 
 	/*
@@ -852,7 +853,7 @@ static void put_pi_state(struct futex_pi_state *pi_state)
 		 * refcount is at 0 - put it back to 1.
 		 */
 		pi_state->owner = NULL;
-		atomic_set(&pi_state->refcount, 1);
+		refcount_set(&pi_state->refcount, 1);
 		current->pi_state_cache = pi_state;
 	}
 }
@@ -1046,7 +1047,7 @@ static int attach_to_pi_state(u32 __user *uaddr, u32 uval,
 	 * and futex_wait_requeue_pi() as it cannot go to 0 and consequently
 	 * free pi_state before we can take a reference ourselves.
 	 */
-	WARN_ON(!atomic_read(&pi_state->refcount));
+	WARN_ON(!refcount_read(&pi_state->refcount));
 
 	/*
 	 * Now that we have a pi_state, we can acquire wait_lock
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 51+ messages in thread

end of thread, other threads:[~2017-10-23  7:37 UTC | newest]

Thread overview: 51+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-30 12:22 [PATCH 00/15] v5 kernel core pieces refcount conversions Elena Reshetova
2017-08-30 12:22 ` [PATCH 01/15] sched: convert sighand_struct.count to refcount_t Elena Reshetova
2017-08-30 12:22 ` [PATCH 02/15] sched: convert signal_struct.sigcnt " Elena Reshetova
2017-08-30 12:22 ` [PATCH 03/15] sched: convert user_struct.__count " Elena Reshetova
2017-08-30 12:22 ` [PATCH 04/15] sched: convert numa_group.refcount " Elena Reshetova
2017-08-30 12:22 ` [PATCH 05/15] sched/task_struct: convert task_struct.usage " Elena Reshetova
2017-08-30 12:22 ` [PATCH 06/15] sched/task_struct: convert task_struct.stack_refcount " Elena Reshetova
2017-08-30 12:22 ` [PATCH 07/15] perf: convert perf_event_context.refcount " Elena Reshetova
2017-08-30 12:22 ` [PATCH 08/15] perf/ring_buffer: convert ring_buffer.refcount " Elena Reshetova
2017-08-30 12:22 ` [PATCH 09/15] perf/ring_buffer: convert ring_buffer.aux_refcount " Elena Reshetova
2017-08-30 12:22 ` [PATCH 10/15] uprobes: convert uprobe.ref " Elena Reshetova
2017-08-30 12:22 ` [PATCH 11/15] nsproxy: convert nsproxy.count " Elena Reshetova
2017-08-30 12:22 ` [PATCH 12/15] groups: convert group_info.usage " Elena Reshetova
2017-08-30 12:22 ` [PATCH 13/15] creds: convert cred.usage " Elena Reshetova
2017-08-30 12:22 ` [PATCH 14/15] futex: convert futex_pi_state.refcount " Elena Reshetova
2017-09-01  7:39   ` Thomas Gleixner
2017-09-01  9:38     ` Peter Zijlstra
2017-09-01  9:43       ` Thomas Gleixner
2017-09-01 10:52         ` Reshetova, Elena
2017-09-01 10:52           ` Reshetova, Elena
2017-09-01 11:05         ` Reshetova, Elena
2017-09-01 11:05           ` Reshetova, Elena
2017-09-01 12:34           ` Peter Zijlstra
2017-09-01 12:34             ` Peter Zijlstra
2017-09-01 13:24             ` Reshetova, Elena
2017-09-01 13:24               ` Reshetova, Elena
2017-09-01 13:36               ` Peter Zijlstra
2017-09-01 13:36                 ` Peter Zijlstra
2017-09-01 17:03                 ` Reshetova, Elena
2017-09-01 17:03                   ` Reshetova, Elena
2017-09-01 19:12                   ` Peter Zijlstra
2017-09-01 19:12                     ` Peter Zijlstra
2017-09-04 10:31                     ` Reshetova, Elena
2017-09-04 10:31                       ` Reshetova, Elena
2017-09-04 12:00                       ` Peter Zijlstra
2017-09-04 12:00                         ` Peter Zijlstra
2017-09-14 16:02                         ` Reshetova, Elena
2017-09-14 16:02                           ` Reshetova, Elena
2017-10-20 12:03                           ` Reshetova, Elena
2017-10-20 12:03                             ` Reshetova, Elena
2017-10-20 12:30                             ` Thomas Gleixner
2017-10-20 12:30                               ` Thomas Gleixner
2017-10-23  7:36                               ` Reshetova, Elena
2017-10-23  7:36                                 ` Reshetova, Elena
2017-08-30 12:22 ` [PATCH 15/15] kcov: convert kcov.refcount " Elena Reshetova
2017-08-31 23:48 ` [PATCH 00/15] v5 kernel core pieces refcount conversions Kees Cook
2017-09-01  9:48   ` Peter Zijlstra
2017-09-01 16:55     ` Kees Cook
2017-09-01 17:08       ` Reshetova, Elena
  -- strict thread matches above, loose matches on Subject: below --
2017-08-25  9:41 [PATCH 00/15] v4 " Elena Reshetova
2017-08-25  9:41 ` [PATCH 14/15] futex: convert futex_pi_state.refcount to refcount_t Elena Reshetova
2017-07-18 10:29 [PATCH 00/15] v4 kernel core pieces refcount conversions Elena Reshetova
2017-07-18 10:30 ` [PATCH 14/15] futex: convert futex_pi_state.refcount to refcount_t Elena Reshetova

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.