All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2] core: add lockless update_rlimit_cpu
@ 2009-09-02  9:45 Jiri Slaby
  2009-09-02  9:45 ` [PATCH 2/2] core: allow setrlimit to non-current tasks Jiri Slaby
  0 siblings, 1 reply; 22+ messages in thread
From: Jiri Slaby @ 2009-09-02  9:45 UTC (permalink / raw)
  To: oleg; +Cc: akpm, mingo, linux-kernel, Jiri Slaby

Oleg Nesterov wrote:
> But I dislike the fact the patch uses tasklist_lock. Can't
> lock_task_sighand() work for you? (of course, in this case
> update_rlimit_cpu() should be updated too).

Yeah, I think so. I didn't know about that, sorry.

Something like these two?
--

Rename update_rlimit_cpu to __update_rlimit_cpu and create
update_rlimit_cpu wrapper with sighand lock.

The former is to be used by users who already hold the lock.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
---
 include/linux/posix-timers.h |    1 +
 kernel/posix-cpu-timers.c    |   14 +++++++++-----
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/include/linux/posix-timers.h b/include/linux/posix-timers.h
index 3e23844..d0bac29 100644
--- a/include/linux/posix-timers.h
+++ b/include/linux/posix-timers.h
@@ -117,6 +117,7 @@ void set_process_cpu_timer(struct task_struct *task, unsigned int clock_idx,
 
 long clock_nanosleep_restart(struct restart_block *restart_block);
 
+void __update_rlimit_cpu(struct task_struct *task, unsigned long rlim_new);
 void update_rlimit_cpu(struct task_struct *task, unsigned long rlim_new);
 
 #endif
diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
index a61c625..56eb1d5 100644
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -12,17 +12,21 @@
 /*
  * Called after updating RLIMIT_CPU to set timer expiration if necessary.
  */
-void update_rlimit_cpu(struct task_struct *task, unsigned long rlim_new)
+void __update_rlimit_cpu(struct task_struct *task, unsigned long rlim_new)
 {
 	cputime_t cputime = secs_to_cputime(rlim_new);
 	struct signal_struct *const sig = task->signal;
 
 	if (cputime_eq(sig->it[CPUCLOCK_PROF].expires, cputime_zero) ||
-	    cputime_gt(sig->it[CPUCLOCK_PROF].expires, cputime)) {
-		spin_lock_irq(&task->sighand->siglock);
+	    cputime_gt(sig->it[CPUCLOCK_PROF].expires, cputime))
 		set_process_cpu_timer(task, CPUCLOCK_PROF, &cputime, NULL);
-		spin_unlock_irq(&task->sighand->siglock);
-	}
+}
+
+void update_rlimit_cpu(struct task_struct *task, unsigned long rlim_new)
+{
+	spin_lock_irq(&task->sighand->siglock);
+	__update_rlimit_cpu(task, rlim_new);
+	spin_unlock_irq(&task->sighand->siglock);
 }
 
 static int check_clock(const clockid_t which_clock)
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 2/2] core: allow setrlimit to non-current tasks
  2009-09-02  9:45 [PATCH 1/2] core: add lockless update_rlimit_cpu Jiri Slaby
@ 2009-09-02  9:45 ` Jiri Slaby
  2009-09-02  9:47   ` Jiri Slaby
  2009-09-02 13:50   ` Oleg Nesterov
  0 siblings, 2 replies; 22+ messages in thread
From: Jiri Slaby @ 2009-09-02  9:45 UTC (permalink / raw)
  To: oleg; +Cc: akpm, mingo, linux-kernel, Jiri Slaby

Add locking to allow setrlimit accept task parameter other than
current.

Namely, lock tasklist_lock for read and check whether the task
structure has signal and sighand non-null. Do all the signal
processing under that lock still held.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
---
 kernel/sys.c |   25 ++++++++++++++++++-------
 1 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/kernel/sys.c b/kernel/sys.c
index 919cebe..4a55bef 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1240,20 +1240,28 @@ int setrlimit(struct task_struct *tsk, unsigned int resource,
 		struct rlimit *new_rlim)
 {
 	struct rlimit *old_rlim;
+	unsigned long flags;
 	int retval;
 
 	if (new_rlim->rlim_cur > new_rlim->rlim_max)
 		return -EINVAL;
+
+	if (lock_task_sighand(tsk, &flags) == NULL)
+		return -ESRCH;
 	old_rlim = tsk->signal->rlim + resource;
 	if ((new_rlim->rlim_max > old_rlim->rlim_max) &&
-	    !capable(CAP_SYS_RESOURCE))
-		return -EPERM;
-	if (resource == RLIMIT_NOFILE && new_rlim->rlim_max > sysctl_nr_open)
-		return -EPERM;
+	    !capable(CAP_SYS_RESOURCE)) {
+		retval = -EPERM;
+		goto unlock;
+	}
+	if (resource == RLIMIT_NOFILE && new_rlim->rlim_max > sysctl_nr_open) {
+		retval = -EPERM;
+		goto unlock;
+	}
 
 	retval = security_task_setrlimit(tsk, resource, new_rlim);
 	if (retval)
-		return retval;
+		goto unlock;
 
 	if (resource == RLIMIT_CPU && new_rlim->rlim_cur == 0) {
 		/*
@@ -1281,9 +1289,12 @@ int setrlimit(struct task_struct *tsk, unsigned int resource,
 	if (new_rlim->rlim_cur == RLIM_INFINITY)
 		goto out;
 
-	update_rlimit_cpu(tsk, new_rlim->rlim_cur);
+	__update_rlimit_cpu(tsk, new_rlim->rlim_cur);
 out:
-	return 0;
+	retval = 0;
+unlock:
+	unlock_task_sighand(tsk, &flags);
+	return retval;
 }
 
 SYSCALL_DEFINE2(setrlimit, unsigned int, resource, struct rlimit __user *, rlim)
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/2] core: allow setrlimit to non-current tasks
  2009-09-02  9:45 ` [PATCH 2/2] core: allow setrlimit to non-current tasks Jiri Slaby
@ 2009-09-02  9:47   ` Jiri Slaby
  2009-09-02 13:50   ` Oleg Nesterov
  1 sibling, 0 replies; 22+ messages in thread
From: Jiri Slaby @ 2009-09-02  9:47 UTC (permalink / raw)
  To: Jiri Slaby; +Cc: oleg, akpm, mingo, linux-kernel

On 09/02/2009 11:45 AM, Jiri Slaby wrote:
> Namely, lock tasklist_lock for read and check whether the task
> structure has signal and sighand non-null. Do all the signal
> processing under that lock still held.

The description is incorrect now, indeed...

And I broke mail threads, sorry.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/2] core: allow setrlimit to non-current tasks
  2009-09-02  9:45 ` [PATCH 2/2] core: allow setrlimit to non-current tasks Jiri Slaby
  2009-09-02  9:47   ` Jiri Slaby
@ 2009-09-02 13:50   ` Oleg Nesterov
  2009-09-02 18:44     ` Jiri Slaby
  1 sibling, 1 reply; 22+ messages in thread
From: Oleg Nesterov @ 2009-09-02 13:50 UTC (permalink / raw)
  To: Jiri Slaby; +Cc: akpm, mingo, linux-kernel

On 09/02, Jiri Slaby wrote:
>
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -1240,20 +1240,28 @@ int setrlimit(struct task_struct *tsk, unsigned int resource,
>  		struct rlimit *new_rlim)
>  {
>  	struct rlimit *old_rlim;
> +	unsigned long flags;
>  	int retval;
>
>  	if (new_rlim->rlim_cur > new_rlim->rlim_max)
>  		return -EINVAL;
> +
> +	if (lock_task_sighand(tsk, &flags) == NULL)
> +		return -ESRCH;

No, sorry, this can't work.

Because we need task_lock() to update rlimits, and ->alloc_lock does not
nest under ->siglock.

Looks like we have to use tasklist_lock, but please don't use _irq, and
please do not check ->signal != NULL. Perhaps it makes sense to take
tasklist only if !same_thread_group(tsk, current) though.

Oh. We really need to make ->signal refcountable.


But there is another minor problem. If we use read_lock(ttasklist), then
the write to /proc/application_pid/limits can race with application doing
sys_setrlimits().

Nothing bad can happen, but this means that "echo ... > /proc/limits" can
be lost. Not good, if admin wants to lower ->rlim_max we should try to ensure
this always works.

Oleg.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/2] core: allow setrlimit to non-current tasks
  2009-09-02 13:50   ` Oleg Nesterov
@ 2009-09-02 18:44     ` Jiri Slaby
  2009-09-02 21:51       ` Oleg Nesterov
  0 siblings, 1 reply; 22+ messages in thread
From: Jiri Slaby @ 2009-09-02 18:44 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: akpm, mingo, linux-kernel

On 09/02/2009 03:50 PM, Oleg Nesterov wrote:
> But there is another minor problem. If we use read_lock(ttasklist), then
> the write to /proc/application_pid/limits can race with application doing
> sys_setrlimits().
> 
> Nothing bad can happen, but this means that "echo ... > /proc/limits" can
> be lost. Not good, if admin wants to lower ->rlim_max we should try to ensure
> this always works.

Actually, process cpu timer may be set to a wrong value. When
* somebody unrelated holds sighand->siglock
* process one stores rlim_new to rlim and gets stuck on spin_lock(siglock)
* process two does the same
* somebody releases sighand->siglock
* process one continues...

I can't think of anything else than doing all the checks and updates
under alloc_lock, introducing coarse grained static mutex in setrlimit
to protect it, or some other(I-don't-know-of)/new lock. Any ideas?

Thanks.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/2] core: allow setrlimit to non-current tasks
  2009-09-02 18:44     ` Jiri Slaby
@ 2009-09-02 21:51       ` Oleg Nesterov
  2009-09-03 13:47         ` Jiri Slaby
  0 siblings, 1 reply; 22+ messages in thread
From: Oleg Nesterov @ 2009-09-02 21:51 UTC (permalink / raw)
  To: Jiri Slaby; +Cc: akpm, mingo, linux-kernel

On 09/02, Jiri Slaby wrote:
>
> On 09/02/2009 03:50 PM, Oleg Nesterov wrote:
> > But there is another minor problem. If we use read_lock(ttasklist), then
> > the write to /proc/application_pid/limits can race with application doing
> > sys_setrlimits().
> >
> > Nothing bad can happen, but this means that "echo ... > /proc/limits" can
> > be lost. Not good, if admin wants to lower ->rlim_max we should try to ensure
> > this always works.
>
> Actually, process cpu timer may be set to a wrong value. When

Yes, I thought about this too. In fact I was going to complain, but then
decided this is OK.

> * somebody unrelated holds sighand->siglock
> * process one stores rlim_new to rlim and gets stuck on spin_lock(siglock)
> * process two does the same

s/process/thread/. (I am talking about the current code). IOW, if the
application is stupid and does setrlimit() from multimple threads at
the same time - we can't help, the result is not predictable.

But, unless I missed something I think this case is fine, please see below.

> * somebody releases sighand->siglock
> * process one continues...

Now, it is possible that cputime_expires.xxx_exp does not match
->rlim[RLIMIT_CPU].rlim_cur.

But we don't care. update_rlimit_cpu() must ensure that cputime_expires.xxx_exp
is not greater than necessary, nothing else.

> I can't think of anything else than doing all the checks and updates
> under alloc_lock, introducing coarse grained static mutex in setrlimit
> to protect it,

Oh, please don't ;)

Or I missed your point?


But if you mean this series, then yes, I agree. We should try to do something
to ensure that at least rlim_max can be always lowered when admin writes to
/proc/pid/limits.

Oleg.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/2] core: allow setrlimit to non-current tasks
  2009-09-02 21:51       ` Oleg Nesterov
@ 2009-09-03 13:47         ` Jiri Slaby
  2009-09-03 13:52           ` [PATCH] " Jiri Slaby
                             ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Jiri Slaby @ 2009-09-03 13:47 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: akpm, mingo, linux-kernel

On 09/02/2009 11:51 PM, Oleg Nesterov wrote:
> On 09/02, Jiri Slaby wrote:
>> I can't think of anything else than doing all the checks and updates
>> under alloc_lock, introducing coarse grained static mutex in setrlimit
>> to protect it,
> 
> Oh, please don't ;)
> 
> Or I missed your point?
> 
> 
> But if you mean this series, then yes, I agree. 

Yes, I meant those. But I don't know what do you agree with :).

> We should try to do something
> to ensure that at least rlim_max can be always lowered when admin writes to
> /proc/pid/limits.

Yes, that's what I asked about when I wrote the three options which I
was able to think of above. So any other ideas about how to elegantly
protect against sys_setrlimit vs. admin+/proc/*/limits race?

Thanks a heap.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH] core: allow setrlimit to non-current tasks
  2009-09-03 13:47         ` Jiri Slaby
@ 2009-09-03 13:52           ` Jiri Slaby
  2009-09-03 17:41             ` Oleg Nesterov
  2009-09-03 17:20           ` [PATCH 0/1] sys_setrlimit: make sure ->rlim_max never grows Oleg Nesterov
  2009-09-03 17:21           ` [PATCH 1/1] " Oleg Nesterov
  2 siblings, 1 reply; 22+ messages in thread
From: Jiri Slaby @ 2009-09-03 13:52 UTC (permalink / raw)
  To: oleg; +Cc: akpm, mingo, linux-kernel, Jiri Slaby

Just for clarity. The current patch looks like this:
--
Add locking to allow setrlimit accept task parameter other than
current.

Namely, lock tasklist_lock for read and check whether the task
structure has sighand non-null. Do all the signal processing under
that lock still held. We do this only for non-current tasks.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
---
 kernel/sys.c |   24 ++++++++++++++++++++----
 1 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/kernel/sys.c b/kernel/sys.c
index ed500c8..85d5d4d 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1240,6 +1240,7 @@ int setrlimit(struct task_struct *tsk, unsigned int resource,
 		struct rlimit *new_rlim)
 {
 	struct rlimit *old_rlim;
+	unsigned int needs_locking = !same_thread_group(tsk, current);
 	int retval;
 
 	if (new_rlim->rlim_cur > new_rlim->rlim_max)
@@ -1247,13 +1248,24 @@ int setrlimit(struct task_struct *tsk, unsigned int resource,
 	if (resource == RLIMIT_NOFILE && new_rlim->rlim_max > sysctl_nr_open)
 		return -EPERM;
 
+	/* optimization: 'current' doesn't need locking, e.g. setrlimit */
+	if (needs_locking) {
+		/* protect tsk->signal and tsk->sighand from disappearing */
+		read_lock(&tasklist_lock);
+		if (!tsk->sighand) {
+			retval = -ESRCH;
+			goto unlock;
+		}
+	}
 	old_rlim = tsk->signal->rlim + resource;
 	if ((new_rlim->rlim_max > old_rlim->rlim_max) &&
-	    !capable(CAP_SYS_RESOURCE))
-		return -EPERM;
+	    !capable(CAP_SYS_RESOURCE)) {
+		retval = -EPERM;
+		goto unlock;
+	}
 	retval = security_task_setrlimit(tsk, resource, new_rlim);
 	if (retval)
-		return retval;
+		goto unlock;
 
 	if (resource == RLIMIT_CPU && new_rlim->rlim_cur == 0) {
 		/*
@@ -1283,7 +1295,11 @@ int setrlimit(struct task_struct *tsk, unsigned int resource,
 
 	update_rlimit_cpu(tsk, new_rlim->rlim_cur);
 out:
-	return 0;
+	retval = 0;
+unlock:
+	if (needs_locking)
+		read_unlock(&tasklist_lock);
+	return retval;
 }
 
 SYSCALL_DEFINE2(setrlimit, unsigned int, resource, struct rlimit __user *, rlim)
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 0/1] sys_setrlimit: make sure ->rlim_max never grows
  2009-09-03 13:47         ` Jiri Slaby
  2009-09-03 13:52           ` [PATCH] " Jiri Slaby
@ 2009-09-03 17:20           ` Oleg Nesterov
  2009-09-03 17:21           ` [PATCH 1/1] " Oleg Nesterov
  2 siblings, 0 replies; 22+ messages in thread
From: Oleg Nesterov @ 2009-09-03 17:20 UTC (permalink / raw)
  To: Jiri Slaby; +Cc: akpm, mingo, linux-kernel

On 09/03, Jiri Slaby wrote:
>
> On 09/02/2009 11:51 PM, Oleg Nesterov wrote:
> > On 09/02, Jiri Slaby wrote:
> >> I can't think of anything else than doing all the checks and updates
> >> under alloc_lock, introducing coarse grained static mutex in setrlimit
> >> to protect it,
> >
> > Oh, please don't ;)
> >
> > Or I missed your point?
> >
> > But if you mean this series, then yes, I agree.
>
> Yes, I meant those. But I don't know what do you agree with :).

Not sure what I agree with, but I am glad we seem to agree with each other ;)

> > We should try to do something
> > to ensure that at least rlim_max can be always lowered when admin writes to
> > /proc/pid/limits.
>
> Yes, that's what I asked about when I wrote the three options which I
> was able to think of above. So any other ideas about how to elegantly
> protect against sys_setrlimit vs. admin+/proc/*/limits race?

Perhaps we should start these change with this patch (see the next email) ?

Perhaps, before your changes, we should "fix" sys_setrlimit() first ?
Well, the patch (the next email) is not tested... What do you think?

Oleg.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/1] sys_setrlimit: make sure ->rlim_max never grows
  2009-09-03 13:47         ` Jiri Slaby
  2009-09-03 13:52           ` [PATCH] " Jiri Slaby
  2009-09-03 17:20           ` [PATCH 0/1] sys_setrlimit: make sure ->rlim_max never grows Oleg Nesterov
@ 2009-09-03 17:21           ` Oleg Nesterov
  2 siblings, 0 replies; 22+ messages in thread
From: Oleg Nesterov @ 2009-09-03 17:21 UTC (permalink / raw)
  To: Jiri Slaby; +Cc: akpm, mingo, linux-kernel

Mostly preparation for Jiri's changes, but probably makes sense anyway.

sys_setrlimit() checks new_rlim.rlim_max <= old_rlim->rlim_max, but when
it takes task_lock() old_rlim->rlim_max can be already lowered. Move this
check under task_lock().

Currently this is not important, we can only race with our sub-thread,
this means the application is stupid. But when we change the code to allow
the update of !current task's limits, it becomes important to make sure
->rlim_max can be lowered "reliably" even if we race with the application
doing sys_setrlimit().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---

 kernel/sys.c |   16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

--- WAIT/kernel/sys.c~SETRLIMIT	2009-06-17 14:11:26.000000000 +0200
+++ WAIT/kernel/sys.c	2009-09-03 19:01:48.000000000 +0200
@@ -1247,10 +1247,6 @@ SYSCALL_DEFINE2(setrlimit, unsigned int,
 		return -EFAULT;
 	if (new_rlim.rlim_cur > new_rlim.rlim_max)
 		return -EINVAL;
-	old_rlim = current->signal->rlim + resource;
-	if ((new_rlim.rlim_max > old_rlim->rlim_max) &&
-	    !capable(CAP_SYS_RESOURCE))
-		return -EPERM;
 	if (resource == RLIMIT_NOFILE && new_rlim.rlim_max > sysctl_nr_open)
 		return -EPERM;
 
@@ -1268,13 +1264,17 @@ SYSCALL_DEFINE2(setrlimit, unsigned int,
 		new_rlim.rlim_cur = 1;
 	}
 
+	old_rlim = current->signal->rlim + resource;
 	task_lock(current->group_leader);
-	*old_rlim = new_rlim;
+	if ((new_rlim.rlim_max <= old_rlim->rlim_max) ||
+				capable(CAP_SYS_RESOURCE))
+		*old_rlim = new_rlim;
+	else
+		retval = -EPERM;
 	task_unlock(current->group_leader);
 
-	if (resource != RLIMIT_CPU)
+	if (retval || resource != RLIMIT_CPU)
 		goto out;
-
 	/*
 	 * RLIMIT_CPU handling.   Note that the kernel fails to return an error
 	 * code if it rejected the user's attempt to set RLIMIT_CPU.  This is a
@@ -1286,7 +1286,7 @@ SYSCALL_DEFINE2(setrlimit, unsigned int,
 
 	update_rlimit_cpu(new_rlim.rlim_cur);
 out:
-	return 0;
+	return retval;
 }
 
 /*


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] core: allow setrlimit to non-current tasks
  2009-09-03 13:52           ` [PATCH] " Jiri Slaby
@ 2009-09-03 17:41             ` Oleg Nesterov
  2009-09-03 20:08               ` [PATCH v2 1/8] SECURITY: selinux, fix update_rlimit_cpu parameter Jiri Slaby
                                 ` (7 more replies)
  0 siblings, 8 replies; 22+ messages in thread
From: Oleg Nesterov @ 2009-09-03 17:41 UTC (permalink / raw)
  To: Jiri Slaby; +Cc: akpm, mingo, linux-kernel

On 09/03, Jiri Slaby wrote:
>
> @@ -1240,6 +1240,7 @@ int setrlimit(struct task_struct *tsk, unsigned int resource,
>  		struct rlimit *new_rlim)
>  {
>  	struct rlimit *old_rlim;
> +	unsigned int needs_locking = !same_thread_group(tsk, current);
>  	int retval;

Yes, thanks for doing this, imho this optimization is worthwhile.

But I'd suggest you to add this optimization in a separate patch
because,

> +	/* optimization: 'current' doesn't need locking, e.g. setrlimit */
> +	if (needs_locking) {
> +		/* protect tsk->signal and tsk->sighand from disappearing */
> +		read_lock(&tasklist_lock);
> +		if (!tsk->sighand) {
> +			retval = -ESRCH;
> +			goto unlock;

I should have mentioned this before, but it is not that simple.

Even if same_thread_group(tsk, current), we must not trust tsk->sighand,
it can be NULL if our subthread is dead. (well, we need ->signal, not
->sighand but this doesn't matter because they disappear simultaneously).

Actually, perhaps same_thread_group() is not needed, perhaps it is enough
to avoid tasklist in sys_setrlimit case. So, I think optimization should
do:

	retval = -ESRCH;
	if (tsk != current) {
		read_lock(&tasklist_lock);
		if (!tsk->sighand)
			goto unlock;
	}

unlock:
	if (tsk != current)
		read_unlock(&tasklist_lock);


Or, if we use same_thread_group(),

	needs_locking = !same_thread_group(tsk, current);

	if (!needs_locking)
		tsk = current;
	else {
		take tasklist, check ->sighand.
	}

Oleg.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v2 1/8] SECURITY: selinux, fix update_rlimit_cpu parameter
  2009-09-03 17:41             ` Oleg Nesterov
@ 2009-09-03 20:08               ` Jiri Slaby
  2009-09-03 20:08               ` [PATCH v2 2/8] SECURITY: add task_struct to setrlimit Jiri Slaby
                                 ` (6 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: Jiri Slaby @ 2009-09-03 20:08 UTC (permalink / raw)
  To: oleg
  Cc: akpm, mingo, linux-kernel, Jiri Slaby, Stephen Smalley,
	Eric Paris, David Howells

OK, thanks, I ended up having this so far.
--
Don't pass rlim_cur member of RLIM_NLIMITS-1=RLIMIT_RTTIME limit
to update_rlimit_cpu() in selinux_bprm_committing_creds.

Use proper rlim[RLIMIT_CPU].rlim_cur instead.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Acked-by: James Morris <jmorris@namei.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Eric Paris <eparis@parisplace.org>
Cc: David Howells <dhowells@redhat.com>
---
 security/selinux/hooks.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index cf41988..496e626 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2360,7 +2360,7 @@ static void selinux_bprm_committing_creds(struct linux_binprm *bprm)
 			initrlim = init_task.signal->rlim + i;
 			rlim->rlim_cur = min(rlim->rlim_max, initrlim->rlim_cur);
 		}
-		update_rlimit_cpu(rlim->rlim_cur);
+		update_rlimit_cpu(current->signal->rlim[RLIMIT_CPU].rlim_cur);
 	}
 }
 
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 2/8] SECURITY: add task_struct to setrlimit
  2009-09-03 17:41             ` Oleg Nesterov
  2009-09-03 20:08               ` [PATCH v2 1/8] SECURITY: selinux, fix update_rlimit_cpu parameter Jiri Slaby
@ 2009-09-03 20:08               ` Jiri Slaby
  2009-09-03 20:08               ` [PATCH v2 3/8] core: add task_struct to update_rlimit_cpu Jiri Slaby
                                 ` (5 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: Jiri Slaby @ 2009-09-03 20:08 UTC (permalink / raw)
  To: oleg; +Cc: akpm, mingo, linux-kernel, Jiri Slaby

Add task_struct to task_setrlimit of security_operations to be able to set
rlimit of different task than current.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Acked-by: Eric Paris <eparis@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
---
 include/linux/security.h |    9 ++++++---
 kernel/sys.c             |    2 +-
 security/capability.c    |    3 ++-
 security/security.c      |    5 +++--
 security/selinux/hooks.c |    7 ++++---
 5 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/include/linux/security.h b/include/linux/security.h
index 0f81525..880a21c 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -1520,7 +1520,8 @@ struct security_operations {
 	int (*task_setnice) (struct task_struct *p, int nice);
 	int (*task_setioprio) (struct task_struct *p, int ioprio);
 	int (*task_getioprio) (struct task_struct *p);
-	int (*task_setrlimit) (unsigned int resource, struct rlimit *new_rlim);
+	int (*task_setrlimit) (struct task_struct *p, unsigned int resource,
+			struct rlimit *new_rlim);
 	int (*task_setscheduler) (struct task_struct *p, int policy,
 				  struct sched_param *lp);
 	int (*task_getscheduler) (struct task_struct *p);
@@ -1775,7 +1776,8 @@ int security_task_setgroups(struct group_info *group_info);
 int security_task_setnice(struct task_struct *p, int nice);
 int security_task_setioprio(struct task_struct *p, int ioprio);
 int security_task_getioprio(struct task_struct *p);
-int security_task_setrlimit(unsigned int resource, struct rlimit *new_rlim);
+int security_task_setrlimit(struct task_struct *p, unsigned int resource,
+		struct rlimit *new_rlim);
 int security_task_setscheduler(struct task_struct *p,
 				int policy, struct sched_param *lp);
 int security_task_getscheduler(struct task_struct *p);
@@ -2385,7 +2387,8 @@ static inline int security_task_getioprio(struct task_struct *p)
 	return 0;
 }
 
-static inline int security_task_setrlimit(unsigned int resource,
+static inline int security_task_setrlimit(struct task_struct *p,
+					  unsigned int resource,
 					  struct rlimit *new_rlim)
 {
 	return 0;
diff --git a/kernel/sys.c b/kernel/sys.c
index a7e36b6..ae7be2e 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1254,7 +1254,7 @@ SYSCALL_DEFINE2(setrlimit, unsigned int, resource, struct rlimit __user *, rlim)
 	if (resource == RLIMIT_NOFILE && new_rlim.rlim_max > sysctl_nr_open)
 		return -EPERM;
 
-	retval = security_task_setrlimit(resource, &new_rlim);
+	retval = security_task_setrlimit(current, resource, &new_rlim);
 	if (retval)
 		return retval;
 
diff --git a/security/capability.c b/security/capability.c
index 6c38f4d..90d6daf 100644
--- a/security/capability.c
+++ b/security/capability.c
@@ -447,7 +447,8 @@ static int cap_task_getioprio(struct task_struct *p)
 	return 0;
 }
 
-static int cap_task_setrlimit(unsigned int resource, struct rlimit *new_rlim)
+static int cap_task_setrlimit(struct task_struct *p, unsigned int resource,
+		struct rlimit *new_rlim)
 {
 	return 0;
 }
diff --git a/security/security.c b/security/security.c
index 0b7f9eb..b2c21dc 100644
--- a/security/security.c
+++ b/security/security.c
@@ -779,9 +779,10 @@ int security_task_getioprio(struct task_struct *p)
 	return security_ops->task_getioprio(p);
 }
 
-int security_task_setrlimit(unsigned int resource, struct rlimit *new_rlim)
+int security_task_setrlimit(struct task_struct *p, unsigned int resource,
+		struct rlimit *new_rlim)
 {
-	return security_ops->task_setrlimit(resource, new_rlim);
+	return security_ops->task_setrlimit(p, resource, new_rlim);
 }
 
 int security_task_setscheduler(struct task_struct *p,
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 496e626..839622a 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -3355,16 +3355,17 @@ static int selinux_task_getioprio(struct task_struct *p)
 	return current_has_perm(p, PROCESS__GETSCHED);
 }
 
-static int selinux_task_setrlimit(unsigned int resource, struct rlimit *new_rlim)
+static int selinux_task_setrlimit(struct task_struct *p, unsigned int resource,
+		struct rlimit *new_rlim)
 {
-	struct rlimit *old_rlim = current->signal->rlim + resource;
+	struct rlimit *old_rlim = p->signal->rlim + resource;
 
 	/* Control the ability to change the hard limit (whether
 	   lowering or raising it), so that the hard limit can
 	   later be used as a safe reset point for the soft limit
 	   upon context transitions.  See selinux_bprm_committing_creds. */
 	if (old_rlim->rlim_max != new_rlim->rlim_max)
-		return current_has_perm(current, PROCESS__SETRLIMIT);
+		return current_has_perm(p, PROCESS__SETRLIMIT);
 
 	return 0;
 }
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 3/8] core: add task_struct to update_rlimit_cpu
  2009-09-03 17:41             ` Oleg Nesterov
  2009-09-03 20:08               ` [PATCH v2 1/8] SECURITY: selinux, fix update_rlimit_cpu parameter Jiri Slaby
  2009-09-03 20:08               ` [PATCH v2 2/8] SECURITY: add task_struct to setrlimit Jiri Slaby
@ 2009-09-03 20:08               ` Jiri Slaby
  2009-09-03 20:08               ` [PATCH v2 4/8] sys_setrlimit: make sure ->rlim_max never grows Jiri Slaby
                                 ` (4 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: Jiri Slaby @ 2009-09-03 20:08 UTC (permalink / raw)
  To: oleg; +Cc: akpm, mingo, linux-kernel, Jiri Slaby

Add task_struct as a parameter to update_rlimit_cpu to be able to set
rlimit_cpu of different task than current.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Acked-by: James Morris <jmorris@namei.org>
---
 include/linux/posix-timers.h |    2 +-
 kernel/posix-cpu-timers.c    |   10 +++++-----
 kernel/sys.c                 |    2 +-
 security/selinux/hooks.c     |    3 ++-
 4 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/include/linux/posix-timers.h b/include/linux/posix-timers.h
index 4f71bf4..3e23844 100644
--- a/include/linux/posix-timers.h
+++ b/include/linux/posix-timers.h
@@ -117,6 +117,6 @@ void set_process_cpu_timer(struct task_struct *task, unsigned int clock_idx,
 
 long clock_nanosleep_restart(struct restart_block *restart_block);
 
-void update_rlimit_cpu(unsigned long rlim_new);
+void update_rlimit_cpu(struct task_struct *task, unsigned long rlim_new);
 
 #endif
diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
index 12161f7..a61c625 100644
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -12,16 +12,16 @@
 /*
  * Called after updating RLIMIT_CPU to set timer expiration if necessary.
  */
-void update_rlimit_cpu(unsigned long rlim_new)
+void update_rlimit_cpu(struct task_struct *task, unsigned long rlim_new)
 {
 	cputime_t cputime = secs_to_cputime(rlim_new);
-	struct signal_struct *const sig = current->signal;
+	struct signal_struct *const sig = task->signal;
 
 	if (cputime_eq(sig->it[CPUCLOCK_PROF].expires, cputime_zero) ||
 	    cputime_gt(sig->it[CPUCLOCK_PROF].expires, cputime)) {
-		spin_lock_irq(&current->sighand->siglock);
-		set_process_cpu_timer(current, CPUCLOCK_PROF, &cputime, NULL);
-		spin_unlock_irq(&current->sighand->siglock);
+		spin_lock_irq(&task->sighand->siglock);
+		set_process_cpu_timer(task, CPUCLOCK_PROF, &cputime, NULL);
+		spin_unlock_irq(&task->sighand->siglock);
 	}
 }
 
diff --git a/kernel/sys.c b/kernel/sys.c
index ae7be2e..d501f60 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1284,7 +1284,7 @@ SYSCALL_DEFINE2(setrlimit, unsigned int, resource, struct rlimit __user *, rlim)
 	if (new_rlim.rlim_cur == RLIM_INFINITY)
 		goto out;
 
-	update_rlimit_cpu(new_rlim.rlim_cur);
+	update_rlimit_cpu(current, new_rlim.rlim_cur);
 out:
 	return 0;
 }
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 839622a..6903784 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2360,7 +2360,8 @@ static void selinux_bprm_committing_creds(struct linux_binprm *bprm)
 			initrlim = init_task.signal->rlim + i;
 			rlim->rlim_cur = min(rlim->rlim_max, initrlim->rlim_cur);
 		}
-		update_rlimit_cpu(current->signal->rlim[RLIMIT_CPU].rlim_cur);
+		update_rlimit_cpu(current,
+				current->signal->rlim[RLIMIT_CPU].rlim_cur);
 	}
 }
 
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 4/8] sys_setrlimit: make sure ->rlim_max never grows
  2009-09-03 17:41             ` Oleg Nesterov
                                 ` (2 preceding siblings ...)
  2009-09-03 20:08               ` [PATCH v2 3/8] core: add task_struct to update_rlimit_cpu Jiri Slaby
@ 2009-09-03 20:08               ` Jiri Slaby
  2009-09-03 20:08               ` [PATCH v2 5/8] core: split sys_setrlimit Jiri Slaby
                                 ` (3 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: Jiri Slaby @ 2009-09-03 20:08 UTC (permalink / raw)
  To: oleg; +Cc: akpm, mingo, linux-kernel, Jiri Slaby

From: Oleg Nesterov <oleg@redhat.com>

Mostly preparation for Jiri's changes, but probably makes sense anyway.

sys_setrlimit() checks new_rlim.rlim_max <= old_rlim->rlim_max, but when
it takes task_lock() old_rlim->rlim_max can be already lowered. Move this
check under task_lock().

Currently this is not important, we can only race with our sub-thread,
this means the application is stupid. But when we change the code to allow
the update of !current task's limits, it becomes important to make sure
->rlim_max can be lowered "reliably" even if we race with the application
doing sys_setrlimit().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
---
 kernel/sys.c |   15 ++++++++-------
 1 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/kernel/sys.c b/kernel/sys.c
index d501f60..600d3e2 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1247,10 +1247,6 @@ SYSCALL_DEFINE2(setrlimit, unsigned int, resource, struct rlimit __user *, rlim)
 		return -EFAULT;
 	if (new_rlim.rlim_cur > new_rlim.rlim_max)
 		return -EINVAL;
-	old_rlim = current->signal->rlim + resource;
-	if ((new_rlim.rlim_max > old_rlim->rlim_max) &&
-	    !capable(CAP_SYS_RESOURCE))
-		return -EPERM;
 	if (resource == RLIMIT_NOFILE && new_rlim.rlim_max > sysctl_nr_open)
 		return -EPERM;
 
@@ -1268,11 +1264,16 @@ SYSCALL_DEFINE2(setrlimit, unsigned int, resource, struct rlimit __user *, rlim)
 		new_rlim.rlim_cur = 1;
 	}
 
+	old_rlim = current->signal->rlim + resource;
 	task_lock(current->group_leader);
-	*old_rlim = new_rlim;
+	if ((new_rlim.rlim_max <= old_rlim->rlim_max) ||
+				capable(CAP_SYS_RESOURCE))
+		*old_rlim = new_rlim;
+	else
+		retval = -EPERM;
 	task_unlock(current->group_leader);
 
-	if (resource != RLIMIT_CPU)
+	if (retval || resource != RLIMIT_CPU)
 		goto out;
 
 	/*
@@ -1286,7 +1287,7 @@ SYSCALL_DEFINE2(setrlimit, unsigned int, resource, struct rlimit __user *, rlim)
 
 	update_rlimit_cpu(current, new_rlim.rlim_cur);
 out:
-	return 0;
+	return retval;
 }
 
 /*
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 5/8] core: split sys_setrlimit
  2009-09-03 17:41             ` Oleg Nesterov
                                 ` (3 preceding siblings ...)
  2009-09-03 20:08               ` [PATCH v2 4/8] sys_setrlimit: make sure ->rlim_max never grows Jiri Slaby
@ 2009-09-03 20:08               ` Jiri Slaby
  2009-09-03 20:08               ` [PATCH v2 6/8] core: allow setrlimit to non-current tasks Jiri Slaby
                                 ` (2 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: Jiri Slaby @ 2009-09-03 20:08 UTC (permalink / raw)
  To: oleg; +Cc: akpm, mingo, linux-kernel, Jiri Slaby

Create setrlimit from sys_setrlimit and declare setrlimit in
the resource header. This is to allow rlimits to be changed not
only by syscall, but later from proc code too.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
---
 include/linux/resource.h |    2 ++
 kernel/sys.c             |   44 ++++++++++++++++++++++++++------------------
 2 files changed, 28 insertions(+), 18 deletions(-)

diff --git a/include/linux/resource.h b/include/linux/resource.h
index 40fc7e6..4301d67 100644
--- a/include/linux/resource.h
+++ b/include/linux/resource.h
@@ -71,5 +71,7 @@ struct rlimit {
 #include <asm/resource.h>
 
 int getrusage(struct task_struct *p, int who, struct rusage __user *ru);
+int setrlimit(struct task_struct *tsk, unsigned int resource,
+		struct rlimit *new_rlim);
 
 #endif
diff --git a/kernel/sys.c b/kernel/sys.c
index 600d3e2..655dd7b 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1236,42 +1236,39 @@ SYSCALL_DEFINE2(old_getrlimit, unsigned int, resource,
 
 #endif
 
-SYSCALL_DEFINE2(setrlimit, unsigned int, resource, struct rlimit __user *, rlim)
+int setrlimit(struct task_struct *tsk, unsigned int resource,
+		struct rlimit *new_rlim)
 {
-	struct rlimit new_rlim, *old_rlim;
+	struct rlimit *old_rlim;
 	int retval;
 
-	if (resource >= RLIM_NLIMITS)
-		return -EINVAL;
-	if (copy_from_user(&new_rlim, rlim, sizeof(*rlim)))
-		return -EFAULT;
-	if (new_rlim.rlim_cur > new_rlim.rlim_max)
+	if (new_rlim->rlim_cur > new_rlim->rlim_max)
 		return -EINVAL;
-	if (resource == RLIMIT_NOFILE && new_rlim.rlim_max > sysctl_nr_open)
+	if (resource == RLIMIT_NOFILE && new_rlim->rlim_max > sysctl_nr_open)
 		return -EPERM;
 
-	retval = security_task_setrlimit(current, resource, &new_rlim);
+	retval = security_task_setrlimit(tsk, resource, new_rlim);
 	if (retval)
 		return retval;
 
-	if (resource == RLIMIT_CPU && new_rlim.rlim_cur == 0) {
+	if (resource == RLIMIT_CPU && new_rlim->rlim_cur == 0) {
 		/*
 		 * The caller is asking for an immediate RLIMIT_CPU
 		 * expiry.  But we use the zero value to mean "it was
 		 * never set".  So let's cheat and make it one second
 		 * instead
 		 */
-		new_rlim.rlim_cur = 1;
+		new_rlim->rlim_cur = 1;
 	}
 
-	old_rlim = current->signal->rlim + resource;
-	task_lock(current->group_leader);
-	if ((new_rlim.rlim_max <= old_rlim->rlim_max) ||
+	old_rlim = tsk->signal->rlim + resource;
+	task_lock(tsk->group_leader);
+	if ((new_rlim->rlim_max <= old_rlim->rlim_max) ||
 				capable(CAP_SYS_RESOURCE))
-		*old_rlim = new_rlim;
+		*old_rlim = *new_rlim;
 	else
 		retval = -EPERM;
-	task_unlock(current->group_leader);
+	task_unlock(tsk->group_leader);
 
 	if (retval || resource != RLIMIT_CPU)
 		goto out;
@@ -1282,14 +1279,25 @@ SYSCALL_DEFINE2(setrlimit, unsigned int, resource, struct rlimit __user *, rlim)
 	 * very long-standing error, and fixing it now risks breakage of
 	 * applications, so we live with it
 	 */
-	if (new_rlim.rlim_cur == RLIM_INFINITY)
+	if (new_rlim->rlim_cur == RLIM_INFINITY)
 		goto out;
 
-	update_rlimit_cpu(current, new_rlim.rlim_cur);
+	update_rlimit_cpu(tsk, new_rlim->rlim_cur);
 out:
 	return retval;
 }
 
+SYSCALL_DEFINE2(setrlimit, unsigned int, resource, struct rlimit __user *, rlim)
+{
+	struct rlimit new_rlim;
+
+	if (resource >= RLIM_NLIMITS)
+		return -EINVAL;
+	if (copy_from_user(&new_rlim, rlim, sizeof(*rlim)))
+		return -EFAULT;
+	return setrlimit(current, resource, &new_rlim);
+}
+
 /*
  * It would make sense to put struct rusage in the task_struct,
  * except that would make the task_struct be *really big*.  After
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 6/8] core: allow setrlimit to non-current tasks
  2009-09-03 17:41             ` Oleg Nesterov
                                 ` (4 preceding siblings ...)
  2009-09-03 20:08               ` [PATCH v2 5/8] core: split sys_setrlimit Jiri Slaby
@ 2009-09-03 20:08               ` Jiri Slaby
  2009-09-03 20:08               ` [PATCH v2 7/8] core: optimize setrlimit for current task Jiri Slaby
  2009-09-03 20:08               ` [PATCH v2 8/8] FS: proc, make limits writable Jiri Slaby
  7 siblings, 0 replies; 22+ messages in thread
From: Jiri Slaby @ 2009-09-03 20:08 UTC (permalink / raw)
  To: oleg; +Cc: akpm, mingo, linux-kernel, Jiri Slaby

Add locking to allow setrlimit accept task parameter other than
current.

Namely, lock tasklist_lock for read and check whether the task
structure has sighand non-null. Do all the signal processing under
that lock still held. We do this only for non-current tasks.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
---
 kernel/sys.c |   10 +++++++++-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/kernel/sys.c b/kernel/sys.c
index 655dd7b..791b5c5 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1247,9 +1247,16 @@ int setrlimit(struct task_struct *tsk, unsigned int resource,
 	if (resource == RLIMIT_NOFILE && new_rlim->rlim_max > sysctl_nr_open)
 		return -EPERM;
 
+	/* protect tsk->signal and tsk->sighand from disappearing */
+	read_lock(&tasklist_lock);
+	if (!tsk->sighand) {
+		retval = -ESRCH;
+		goto out;
+	}
+
 	retval = security_task_setrlimit(tsk, resource, new_rlim);
 	if (retval)
-		return retval;
+		goto out;
 
 	if (resource == RLIMIT_CPU && new_rlim->rlim_cur == 0) {
 		/*
@@ -1284,6 +1291,7 @@ int setrlimit(struct task_struct *tsk, unsigned int resource,
 
 	update_rlimit_cpu(tsk, new_rlim->rlim_cur);
 out:
+	read_unlock(&tasklist_lock);
 	return retval;
 }
 
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 7/8] core: optimize setrlimit for current task
  2009-09-03 17:41             ` Oleg Nesterov
                                 ` (5 preceding siblings ...)
  2009-09-03 20:08               ` [PATCH v2 6/8] core: allow setrlimit to non-current tasks Jiri Slaby
@ 2009-09-03 20:08               ` Jiri Slaby
  2009-09-03 20:08               ` [PATCH v2 8/8] FS: proc, make limits writable Jiri Slaby
  7 siblings, 0 replies; 22+ messages in thread
From: Jiri Slaby @ 2009-09-03 20:08 UTC (permalink / raw)
  To: oleg; +Cc: akpm, mingo, linux-kernel, Jiri Slaby

Don't take tasklist lock for 'current'. It's not needed, since
current->sighand/signal can't disappear.

This improves serlimit called especially via sys_setrlimit.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
---
 kernel/sys.c |   16 ++++++++++------
 1 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/kernel/sys.c b/kernel/sys.c
index 791b5c5..feca9d6 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1247,11 +1247,14 @@ int setrlimit(struct task_struct *tsk, unsigned int resource,
 	if (resource == RLIMIT_NOFILE && new_rlim->rlim_max > sysctl_nr_open)
 		return -EPERM;
 
-	/* protect tsk->signal and tsk->sighand from disappearing */
-	read_lock(&tasklist_lock);
-	if (!tsk->sighand) {
-		retval = -ESRCH;
-		goto out;
+	/* optimization: 'current' doesn't need locking, e.g. setrlimit */
+	if (tsk != current) {
+		/* protect tsk->signal and tsk->sighand from disappearing */
+		read_lock(&tasklist_lock);
+		if (!tsk->sighand) {
+			retval = -ESRCH;
+			goto out;
+		}
 	}
 
 	retval = security_task_setrlimit(tsk, resource, new_rlim);
@@ -1291,7 +1294,8 @@ int setrlimit(struct task_struct *tsk, unsigned int resource,
 
 	update_rlimit_cpu(tsk, new_rlim->rlim_cur);
 out:
-	read_unlock(&tasklist_lock);
+	if (tsk != current)
+		read_unlock(&tasklist_lock);
 	return retval;
 }
 
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 8/8] FS: proc, make limits writable
  2009-09-03 17:41             ` Oleg Nesterov
                                 ` (6 preceding siblings ...)
  2009-09-03 20:08               ` [PATCH v2 7/8] core: optimize setrlimit for current task Jiri Slaby
@ 2009-09-03 20:08               ` Jiri Slaby
  2009-09-04 14:26                 ` Oleg Nesterov
  7 siblings, 1 reply; 22+ messages in thread
From: Jiri Slaby @ 2009-09-03 20:08 UTC (permalink / raw)
  To: oleg; +Cc: akpm, mingo, linux-kernel, Jiri Slaby

Allow writing strings such as
Max core file size=0:unlimited
to /proc/<pid>/limits to change limits.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
---
 fs/proc/base.c |   78 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 76 insertions(+), 2 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index ea71cae..0d9e3e9 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -129,6 +129,8 @@ struct pid_entry {
 		NULL, &proc_single_file_operations,	\
 		{ .proc_show = show } )
 
+static ssize_t proc_info_read(struct file * file, char __user * buf,
+			  size_t count, loff_t *ppos);
 /*
  * Count the number of hardlinks for the pid_entry table, excluding the .
  * and .. links.
@@ -522,6 +524,74 @@ static int proc_pid_limits(struct task_struct *task, char *buffer)
 	return count;
 }
 
+static ssize_t limits_write(struct file *file, const char __user *buf,
+		size_t count, loff_t *ppos)
+{
+	struct task_struct *task = get_proc_task(file->f_path.dentry->d_inode);
+	char str[32 + 1 + 16 + 1 + 16 + 1], *delim, *next;
+	struct rlimit new_rlimit;
+	unsigned int i;
+	int ret;
+
+	if (!task) {
+		count = -ESRCH;
+		goto out;
+	}
+	if (copy_from_user(str, buf, min(count, sizeof(str) - 1))) {
+		count = -EFAULT;
+		goto put_task;
+	}
+
+	str[min(count, sizeof(str) - 1)] = 0;
+
+	delim = strchr(str, '=');
+	if (!delim) {
+		count = -EINVAL;
+		goto put_task;
+	}
+	*delim++ = 0; /* for easy 'str' usage */
+	new_rlimit.rlim_cur = simple_strtoul(delim, &next, 0);
+	if (*next != ':') {
+		if (strncmp(delim, "unlimited:", 10)) {
+			count = -EINVAL;
+			goto put_task;
+		}
+		new_rlimit.rlim_cur = RLIM_INFINITY;
+		next = delim + 9; /* move to ':' */
+	}
+	delim = next + 1;
+	new_rlimit.rlim_max = simple_strtoul(delim, &next, 0);
+	if (*next != 0) {
+		if (strcmp(delim, "unlimited")) {
+			count = -EINVAL;
+			goto put_task;
+		}
+		new_rlimit.rlim_max = RLIM_INFINITY;
+	}
+
+	for (i = 0; i < RLIM_NLIMITS; i++)
+		if (!strcmp(str, lnames[i].name))
+			break;
+	if (i >= RLIM_NLIMITS) {
+		count = -EINVAL;
+		goto put_task;
+	}
+
+	ret = setrlimit(task, i, &new_rlimit);
+	if (ret)
+		count = ret;
+
+put_task:
+	put_task_struct(task);
+out:
+	return count;
+}
+
+static const struct file_operations proc_pid_limits_operations = {
+	.read		= proc_info_read,
+	.write		= limits_write,
+};
+
 #ifdef CONFIG_HAVE_ARCH_TRACEHOOK
 static int proc_pid_syscall(struct task_struct *task, char *buffer)
 {
@@ -2501,7 +2571,9 @@ static const struct pid_entry tgid_base_stuff[] = {
 	INF("auxv",       S_IRUSR, proc_pid_auxv),
 	ONE("status",     S_IRUGO, proc_pid_status),
 	ONE("personality", S_IRUSR, proc_pid_personality),
-	INF("limits",	  S_IRUSR, proc_pid_limits),
+	NOD("limits",	  S_IFREG|S_IRUSR|S_IWUSR, NULL,
+			&proc_pid_limits_operations,
+			{ .proc_read = proc_pid_limits }),
 #ifdef CONFIG_SCHED_DEBUG
 	REG("sched",      S_IRUGO|S_IWUSR, proc_pid_sched_operations),
 #endif
@@ -2836,7 +2908,9 @@ static const struct pid_entry tid_base_stuff[] = {
 	INF("auxv",      S_IRUSR, proc_pid_auxv),
 	ONE("status",    S_IRUGO, proc_pid_status),
 	ONE("personality", S_IRUSR, proc_pid_personality),
-	INF("limits",	 S_IRUSR, proc_pid_limits),
+	NOD("limits",	  S_IFREG|S_IRUSR|S_IWUSR, NULL,
+			&proc_pid_limits_operations,
+			{ .proc_read = proc_pid_limits }),
 #ifdef CONFIG_SCHED_DEBUG
 	REG("sched",     S_IRUGO|S_IWUSR, proc_pid_sched_operations),
 #endif
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 8/8] FS: proc, make limits writable
  2009-09-03 20:08               ` [PATCH v2 8/8] FS: proc, make limits writable Jiri Slaby
@ 2009-09-04 14:26                 ` Oleg Nesterov
  2009-10-08 20:55                   ` Jiri Slaby
  0 siblings, 1 reply; 22+ messages in thread
From: Oleg Nesterov @ 2009-09-04 14:26 UTC (permalink / raw)
  To: Jiri Slaby; +Cc: akpm, mingo, linux-kernel

On 09/03, Jiri Slaby wrote:
>
> Allow writing strings such as
> Max core file size=0:unlimited
> to /proc/<pid>/limits to change limits.

Can't review the parsing in limits_write() because I don't have enough
"C" skills, but otherwise the whole series looks correct to me.






One small nit, just to suggest the further 9/8 cleanup,

> +static const struct file_operations proc_pid_limits_operations = {
> +	.read		= proc_info_read,
> +	.write		= limits_write,
> +};

I think it makes sense to tweak proc_pid_limits() a little bit (and
rename it), so that we can do

	.read = limits_read,
	.write = limits_write

Then,

> @@ -2501,7 +2571,9 @@ static const struct pid_entry tgid_base_stuff[] = {
> +	NOD("limits",	  S_IFREG|S_IRUSR|S_IWUSR, NULL,
> +			&proc_pid_limits_operations,
> +			{ .proc_read = proc_pid_limits }),

We could use

	REG("limits", S_IRUSR|S_IWUSR, &proc_pid_limits_operations),

instead, this looks a bit cleaner to me.

But as I said, we can do this later.


And another minor nit (just in case you will re-submit this series for
some reason). Perhaps the changelog in 6/8 should mention that we do
not do any security checks when tsk != current (without selinux). We
assume that either the caller is sys_setrlimit(), or the caller should
verify it has rights to change the limits: in case of limits_write()
we rely on ->mode = S_IRUSR|S_IWUSR.

Oleg.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 8/8] FS: proc, make limits writable
  2009-09-04 14:26                 ` Oleg Nesterov
@ 2009-10-08 20:55                   ` Jiri Slaby
  2009-10-12 15:13                     ` Oleg Nesterov
  0 siblings, 1 reply; 22+ messages in thread
From: Jiri Slaby @ 2009-10-08 20:55 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: akpm, mingo, linux-kernel

On 09/04/2009 04:26 PM, Oleg Nesterov wrote:
> One small nit, just to suggest the further 9/8 cleanup,
> 
>> +static const struct file_operations proc_pid_limits_operations = {
>> +	.read		= proc_info_read,
>> +	.write		= limits_write,
>> +};
> 
> I think it makes sense to tweak proc_pid_limits() a little bit (and
> rename it), so that we can do
> 
> 	.read = limits_read,
> 	.write = limits_write
> 
> Then,
> 
>> @@ -2501,7 +2571,9 @@ static const struct pid_entry tgid_base_stuff[] = {
>> +	NOD("limits",	  S_IFREG|S_IRUSR|S_IWUSR, NULL,
>> +			&proc_pid_limits_operations,
>> +			{ .proc_read = proc_pid_limits }),
> 
> We could use
> 
> 	REG("limits", S_IRUSR|S_IWUSR, &proc_pid_limits_operations),
> 
> instead, this looks a bit cleaner to me.

Hi again, nobody picked them up yet, I waited till the end of the merge
window and now I'll repost.

Did you mean here to do the proc_info_read work (get/put task, alloc
buf, simple_read) directly in limits_read?

> And another minor nit (just in case you will re-submit this series for
> some reason). Perhaps the changelog in 6/8 should mention that we do
> not do any security checks when tsk != current (without selinux). We
> assume that either the caller is sys_setrlimit(), or the caller should
> verify it has rights to change the limits: in case of limits_write()
> we rely on ->mode = S_IRUSR|S_IWUSR.

I did it as a comment by the setrlimit. I think nobody would care about
a changelog note ;).

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 8/8] FS: proc, make limits writable
  2009-10-08 20:55                   ` Jiri Slaby
@ 2009-10-12 15:13                     ` Oleg Nesterov
  0 siblings, 0 replies; 22+ messages in thread
From: Oleg Nesterov @ 2009-10-12 15:13 UTC (permalink / raw)
  To: Jiri Slaby; +Cc: akpm, mingo, linux-kernel

On 10/08, Jiri Slaby wrote:
>
> On 09/04/2009 04:26 PM, Oleg Nesterov wrote:
> > One small nit, just to suggest the further 9/8 cleanup,
> >
> >> +static const struct file_operations proc_pid_limits_operations = {
> >> +	.read		= proc_info_read,
> >> +	.write		= limits_write,
> >> +};
> >
> > I think it makes sense to tweak proc_pid_limits() a little bit (and
> > rename it), so that we can do
> >
> > 	.read = limits_read,
> > 	.write = limits_write
> >
> > Then,
> >
> >> @@ -2501,7 +2571,9 @@ static const struct pid_entry tgid_base_stuff[] = {
> >> +	NOD("limits",	  S_IFREG|S_IRUSR|S_IWUSR, NULL,
> >> +			&proc_pid_limits_operations,
> >> +			{ .proc_read = proc_pid_limits }),
> >
> > We could use
> >
> > 	REG("limits", S_IRUSR|S_IWUSR, &proc_pid_limits_operations),
> >
> > instead, this looks a bit cleaner to me.
>
> Hi again, nobody picked them up yet, I waited till the end of the merge
> window and now I'll repost.
>
> Did you mean here to do the proc_info_read work (get/put task, alloc
> buf, simple_read) directly in limits_read?

limits_read() has to do get_proc_task(d_inode), yes. Otherwise I don't
see any additional complications, you can use simple_read_from_buffer(),
no need to allocate the buffer but of course you can you shouldn't
write more than "size_t count" bytes.

But perhaps I missed something, and in any case this is up to you. If
you don't like this - please forget.

> > And another minor nit (just in case you will re-submit this series for
> > some reason). Perhaps the changelog in 6/8 should mention that we do
> > not do any security checks when tsk != current (without selinux). We
> > assume that either the caller is sys_setrlimit(), or the caller should
> > verify it has rights to change the limits: in case of limits_write()
> > we rely on ->mode = S_IRUSR|S_IWUSR.
>
> I did it as a comment by the setrlimit. I think nobody would care about
> a changelog note ;).

Good ;)

Oleg.


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2009-10-12 15:19 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-09-02  9:45 [PATCH 1/2] core: add lockless update_rlimit_cpu Jiri Slaby
2009-09-02  9:45 ` [PATCH 2/2] core: allow setrlimit to non-current tasks Jiri Slaby
2009-09-02  9:47   ` Jiri Slaby
2009-09-02 13:50   ` Oleg Nesterov
2009-09-02 18:44     ` Jiri Slaby
2009-09-02 21:51       ` Oleg Nesterov
2009-09-03 13:47         ` Jiri Slaby
2009-09-03 13:52           ` [PATCH] " Jiri Slaby
2009-09-03 17:41             ` Oleg Nesterov
2009-09-03 20:08               ` [PATCH v2 1/8] SECURITY: selinux, fix update_rlimit_cpu parameter Jiri Slaby
2009-09-03 20:08               ` [PATCH v2 2/8] SECURITY: add task_struct to setrlimit Jiri Slaby
2009-09-03 20:08               ` [PATCH v2 3/8] core: add task_struct to update_rlimit_cpu Jiri Slaby
2009-09-03 20:08               ` [PATCH v2 4/8] sys_setrlimit: make sure ->rlim_max never grows Jiri Slaby
2009-09-03 20:08               ` [PATCH v2 5/8] core: split sys_setrlimit Jiri Slaby
2009-09-03 20:08               ` [PATCH v2 6/8] core: allow setrlimit to non-current tasks Jiri Slaby
2009-09-03 20:08               ` [PATCH v2 7/8] core: optimize setrlimit for current task Jiri Slaby
2009-09-03 20:08               ` [PATCH v2 8/8] FS: proc, make limits writable Jiri Slaby
2009-09-04 14:26                 ` Oleg Nesterov
2009-10-08 20:55                   ` Jiri Slaby
2009-10-12 15:13                     ` Oleg Nesterov
2009-09-03 17:20           ` [PATCH 0/1] sys_setrlimit: make sure ->rlim_max never grows Oleg Nesterov
2009-09-03 17:21           ` [PATCH 1/1] " Oleg Nesterov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.