* Re: [PATCH] oom: create a resource limit for oom_adj
[not found] ` <fNMps-1S1-21@gated-at.bofh.it>
@ 2010-11-11 23:15 ` Bodo Eggert
2010-11-11 23:21 ` David Rientjes
2010-11-14 5:07 ` KOSAKI Motohiro
[not found] ` <fNNOx-4qf-1@gated-at.bofh.it>
1 sibling, 2 replies; 15+ messages in thread
From: Bodo Eggert @ 2010-11-11 23:15 UTC (permalink / raw)
To: David Rientjes, Andrew Morton, David Rientjes, KAMEZAWA Hiroyuki,
KOSAKI Motohiro, Rik van Riel, Ying Han, linux-kernel, gspencer,
piman, wad, olofj
David Rientjes <rientjes@google.com> wrote:
> On Thu, 11 Nov 2010, Mandeep Singh Baines wrote:
>> > What is the anticipated use case for this? We know that you want to lower
>> > oom_adj without CAP_SYS_RESOURCE, but what's the expected behavior when an
>> > app moves from foreground to background? I assume it's something like
>>
>> The focus here is the web browser's tabs. In our case, each is a process. If
>> OOM is going to kill a process, you'd rather it kill the tab you looked at
>> hours ago instead of the one you're looking at now. So you'd like to have a
>> policy where the LRU tab gets killed first. We'd like to use oom_score_adj
>> as the mechanism to implement an LRU policy like this.
>>
>
> Hmm, at first glance that seems potentially dangerous if the current tab
> generates a burt of memory allocations and it ends up killing all other
> tabs before finally targeting the culprit whereas currently the heuristic
> should do a good job of finding this problematic tab and killing it
> instantly.
The original oom_adj design would e.g. adjust all background tabs to seem
twice as bad as the current tab, so a ever-growing current tab would
only be able to get one or two tabs killed in the worst case:
System is near OOM while the current tab starts using a normal amount of
mem and grows beyond limits. After it easily killed the first bg tab by
allocating one byte, it needs to grow to twice the normal tab memsize to
start the OOM killer again. By then, it's score will be equal to normal
background tabs (half size, double score), and killing the second tab will
be luck. Killing the third tab should be impossible.
(If you adjust the bg tabs to be four times as killable, you'll get what
you asked for.)
I don't know the current oom_score_adj, it should be able to do something
similar? Or should there be a oom_score_mul?
>> > What do you anticipate will be writing to oom_score_adj with this patch,
>> > the app itself?
>>
>> A process in the browser session will do the adusting. We'd rather not give
>> it CAP_SYS_RESOURCE. It should only be allowed to change oom_score_adj up
>> and down within the bounds set by the administrator. Analagous to renice()
>> which we also do using a similar policy.
>>
>
> So as more and more tabs get used, the least recently used tab gets its
> oom_score_adj raised higher and higher until it is reused itself and then
> it gets reset back to 0 for the current tab?
As far as I understand, a background tab should get a higher score if it
rests untouched for some time, and if it gets used again, it will jump to
neutral oom_adj.
> Is there a reason you don't want to give the underlying browser session
> process CAP_SYS_RESOURCE?
I should not have control over a CAP_SYS_RESOURCE process while being a user,
but I should be able to tell the system which process to kill. When I designed
the oom_adj, I did not think about one-process-per-tab, so I did not suggest
a soft/hard limit.
I think the concept of using an rlimit is the right thing to do, I did not yet
look/think about the mapping. Glancing at oom_score_adj does suggest a different
range ...
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] oom: create a resource limit for oom_adj
2010-11-11 23:15 ` [PATCH] oom: create a resource limit for oom_adj Bodo Eggert
@ 2010-11-11 23:21 ` David Rientjes
2010-11-14 5:07 ` KOSAKI Motohiro
1 sibling, 0 replies; 15+ messages in thread
From: David Rientjes @ 2010-11-11 23:21 UTC (permalink / raw)
To: Bodo Eggert
Cc: Andrew Morton, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Rik van Riel,
Ying Han, linux-kernel, gspencer, piman, wad, olofj
On Fri, 12 Nov 2010, Bodo Eggert wrote:
> The original oom_adj design would e.g. adjust all background tabs to seem
> twice as bad as the current tab, so a ever-growing current tab would
> only be able to get one or two tabs killed in the worst case:
>
> System is near OOM while the current tab starts using a normal amount of
> mem and grows beyond limits. After it easily killed the first bg tab by
> allocating one byte, it needs to grow to twice the normal tab memsize to
> start the OOM killer again. By then, it's score will be equal to normal
> background tabs (half size, double score), and killing the second tab will
> be luck. Killing the third tab should be impossible.
>
> (If you adjust the bg tabs to be four times as killable, you'll get what
> you asked for.)
>
>
> I don't know the current oom_score_adj, it should be able to do something
> similar? Or should there be a oom_score_mul?
>
oom_score_adj is done on a linear scale instead of exponential so instead
of increasing oom_adj by 1 for each hop to the current tab, you would need
to double it.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] oom: create a resource limit for oom_adj
2010-11-11 23:15 ` [PATCH] oom: create a resource limit for oom_adj Bodo Eggert
2010-11-11 23:21 ` David Rientjes
@ 2010-11-14 5:07 ` KOSAKI Motohiro
2010-11-14 21:42 ` David Rientjes
1 sibling, 1 reply; 15+ messages in thread
From: KOSAKI Motohiro @ 2010-11-14 5:07 UTC (permalink / raw)
To: 7eggert
Cc: kosaki.motohiro, David Rientjes, Andrew Morton,
KAMEZAWA Hiroyuki, Rik van Riel, Ying Han, linux-kernel,
gspencer, piman, wad, olofj, Linus Torvalds
> > Hmm, at first glance that seems potentially dangerous if the current tab
> > generates a burt of memory allocations and it ends up killing all other
> > tabs before finally targeting the culprit whereas currently the heuristic
> > should do a good job of finding this problematic tab and killing it
> > instantly.
>
> The original oom_adj design would e.g. adjust all background tabs to seem
> twice as bad as the current tab, so a ever-growing current tab would
> only be able to get one or two tabs killed in the worst case:
>
> System is near OOM while the current tab starts using a normal amount of
> mem and grows beyond limits. After it easily killed the first bg tab by
> allocating one byte, it needs to grow to twice the normal tab memsize to
> start the OOM killer again. By then, it's score will be equal to normal
> background tabs (half size, double score), and killing the second tab will
> be luck. Killing the third tab should be impossible.
>
> (If you adjust the bg tabs to be four times as killable, you'll get what
> you asked for.)
Sorry for the delay.
I've sent revert patch to linus. We DON'T want to break any userland seriously.
So, we reconsider about this area. personally I hope to reintroduce oom_score_adj
later, but It must don't break current userland.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] oom: create a resource limit for oom_adj
2010-11-14 5:07 ` KOSAKI Motohiro
@ 2010-11-14 21:42 ` David Rientjes
2010-11-23 7:16 ` KOSAKI Motohiro
0 siblings, 1 reply; 15+ messages in thread
From: David Rientjes @ 2010-11-14 21:42 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: 7eggert, Andrew Morton, KAMEZAWA Hiroyuki, Rik van Riel,
Ying Han, linux-kernel, gspencer, piman, wad, olofj,
Linus Torvalds
On Sun, 14 Nov 2010, KOSAKI Motohiro wrote:
> I've sent revert patch to linus. We DON'T want to break any userland seriously.
> So, we reconsider about this area. personally I hope to reintroduce oom_score_adj
> later, but It must don't break current userland.
>
Nobody is going to stall their development waiting for your patches to be
merged (especially when you've proposed them four times before and they
haven't been), so patches in this area should be against Linus' latest
tree.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] oom: create a resource limit for oom_adj
2010-11-14 21:42 ` David Rientjes
@ 2010-11-23 7:16 ` KOSAKI Motohiro
0 siblings, 0 replies; 15+ messages in thread
From: KOSAKI Motohiro @ 2010-11-23 7:16 UTC (permalink / raw)
To: David Rientjes
Cc: kosaki.motohiro, 7eggert, Andrew Morton, KAMEZAWA Hiroyuki,
Rik van Riel, Ying Han, linux-kernel, gspencer, piman, wad,
olofj, Linus Torvalds
> On Sun, 14 Nov 2010, KOSAKI Motohiro wrote:
>
> > I've sent revert patch to linus. We DON'T want to break any userland seriously.
> > So, we reconsider about this area. personally I hope to reintroduce oom_score_adj
> > later, but It must don't break current userland.
> >
>
> Nobody is going to stall their development waiting for your patches to be
> merged (especially when you've proposed them four times before and they
> haven't been), so patches in this area should be against Linus' latest
> tree.
Nobody want to accept you break userland.
^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <fNNOx-4qf-1@gated-at.bofh.it>]
* Re: [PATCH] oom: create a resource limit for oom_adj
@ 2010-11-11 5:19 Figo.zhang
0 siblings, 0 replies; 15+ messages in thread
From: Figo.zhang @ 2010-11-11 5:19 UTC (permalink / raw)
To: linux-kernel
> if (oom_adjust< task->signal->oom_adj&& !capable(CAP_SYS_RESOURCE)) {
> - err = -EACCES;
> - goto err_sighand;
> + /* convert oom_adj [15,-17] to rlimit style value [1,33] */
> + long oom_rlim = OOM_ADJUST_MAX + 1 - oom_adjust;
> +
> + if (oom_rlim> task->signal->rlim[RLIMIT_OOMADJ].rlim_cur) {
> + unlock_task_sighand(task,&flags);
> + put_task_struct(task);
> + err = -EACCES;
> + goto err_sighand;
> + }
> }
=> Label "err_sighand" have do that, why are you do that on here?
+ err = -EACCES;
+ goto err_sighand;
+ }
}
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH] oom: create a resource limit for oom_adj
@ 2010-11-11 4:35 Mandeep Singh Baines
2010-11-11 7:35 ` David Rientjes
2010-11-14 5:07 ` KOSAKI Motohiro
0 siblings, 2 replies; 15+ messages in thread
From: Mandeep Singh Baines @ 2010-11-11 4:35 UTC (permalink / raw)
To: Andrew Morton, David Rientjes, KAMEZAWA Hiroyuki,
KOSAKI Motohiro, Rik van Riel, Ying Han
Cc: linux-kernel, gspencer, piman, wad, olofj
For ChromiumOS, we'd like to be able to oom_adj a process up/down
as its leaves/enters the foreground. Currently, it is not possible
to oom_adj down without CAP_SYS_RESOURCE. This patch creates a new
resource limit, RLIMIT_OOMADJ, which is works in a similar fashion
to RLIMIT_NICE. This allows a process's oom_adj to be lowered
without CAP_SYS_RESOURCE as long as the new value is greater
than the resource limit.
Alternative considered:
* a setuid binary
* a daemon with CAP_SYS_RESOURCE
Since you don't wan't all processes to be able to reduce their
oom_adj, a setuid or daemon implementation would be complex. The
alternatives also have much higher overhead.
Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
---
fs/proc/base.c | 12 ++++++++++--
include/asm-generic/resource.h | 5 ++++-
2 files changed, 14 insertions(+), 3 deletions(-)
diff --git a/fs/proc/base.c b/fs/proc/base.c
index f3d02ca..4384013 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -462,6 +462,7 @@ static const struct limit_names lnames[RLIM_NLIMITS] = {
[RLIMIT_NICE] = {"Max nice priority", NULL},
[RLIMIT_RTPRIO] = {"Max realtime priority", NULL},
[RLIMIT_RTTIME] = {"Max realtime timeout", "us"},
+ [RLIMIT_OOMADJ] = {"Max OOM adjust", NULL},
};
/* Display limits for a process */
@@ -1057,8 +1058,15 @@ static ssize_t oom_adjust_write(struct file *file, const char __user *buf,
}
if (oom_adjust < task->signal->oom_adj && !capable(CAP_SYS_RESOURCE)) {
- err = -EACCES;
- goto err_sighand;
+ /* convert oom_adj [15,-17] to rlimit style value [1,33] */
+ long oom_rlim = OOM_ADJUST_MAX + 1 - oom_adjust;
+
+ if (oom_rlim > task->signal->rlim[RLIMIT_OOMADJ].rlim_cur) {
+ unlock_task_sighand(task, &flags);
+ put_task_struct(task);
+ err = -EACCES;
+ goto err_sighand;
+ }
}
if (oom_adjust != task->signal->oom_adj) {
diff --git a/include/asm-generic/resource.h b/include/asm-generic/resource.h
index 587566f..a8640a4 100644
--- a/include/asm-generic/resource.h
+++ b/include/asm-generic/resource.h
@@ -45,7 +45,9 @@
0-39 for nice level 19 .. -20 */
#define RLIMIT_RTPRIO 14 /* maximum realtime priority */
#define RLIMIT_RTTIME 15 /* timeout for RT tasks in us */
-#define RLIM_NLIMITS 16
+#define RLIMIT_OOMADJ 16 /* max oom_adj allowed to lower to
+ 0-32 for oom level 15 .. -17 */
+#define RLIM_NLIMITS 17
/*
* SuS says limits have to be unsigned.
@@ -86,6 +88,7 @@
[RLIMIT_MSGQUEUE] = { MQ_BYTES_MAX, MQ_BYTES_MAX }, \
[RLIMIT_NICE] = { 0, 0 }, \
[RLIMIT_RTPRIO] = { 0, 0 }, \
+ [RLIMIT_OOMADJ] = { 0, 0 }, \
[RLIMIT_RTTIME] = { RLIM_INFINITY, RLIM_INFINITY }, \
}
--
1.7.3.1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH] oom: create a resource limit for oom_adj
2010-11-11 4:35 Mandeep Singh Baines
@ 2010-11-11 7:35 ` David Rientjes
2010-11-11 18:30 ` Mandeep Singh Baines
2010-11-14 5:07 ` KOSAKI Motohiro
1 sibling, 1 reply; 15+ messages in thread
From: David Rientjes @ 2010-11-11 7:35 UTC (permalink / raw)
To: Mandeep Singh Baines
Cc: Andrew Morton, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Rik van Riel,
Ying Han, linux-kernel, gspencer, piman, wad, olofj
On Wed, 10 Nov 2010, Mandeep Singh Baines wrote:
> For ChromiumOS, we'd like to be able to oom_adj a process up/down
> as its leaves/enters the foreground. Currently, it is not possible
> to oom_adj down without CAP_SYS_RESOURCE. This patch creates a new
> resource limit, RLIMIT_OOMADJ, which is works in a similar fashion
> to RLIMIT_NICE. This allows a process's oom_adj to be lowered
> without CAP_SYS_RESOURCE as long as the new value is greater
> than the resource limit.
>
First of all, oom_adj is deprecated and scheduled for removal in a couple
of years (see Documentation/feature-removal-schedule.txt) so any work in
this area should be targeting oom_score_adj instead.
What is the anticipated use case for this? We know that you want to lower
oom_adj without CAP_SYS_RESOURCE, but what's the expected behavior when an
app moves from foreground to background? I assume it's something like
having an oom_adj of 0 in the background and +15 in the foreground. If
so, does /proc/sys/vm/oom_kill_allocating_task get you most of what you're
looking for?
I'm wondering if we can avoid yet another resource limit for something
like this.
> Alternative considered:
>
> * a setuid binary
> * a daemon with CAP_SYS_RESOURCE
>
> Since you don't wan't all processes to be able to reduce their
> oom_adj, a setuid or daemon implementation would be complex. The
> alternatives also have much higher overhead.
>
What do you anticipate will be writing to oom_score_adj with this patch,
the app itself?
> Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
> ---
> fs/proc/base.c | 12 ++++++++++--
> include/asm-generic/resource.h | 5 ++++-
> 2 files changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index f3d02ca..4384013 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -462,6 +462,7 @@ static const struct limit_names lnames[RLIM_NLIMITS] = {
> [RLIMIT_NICE] = {"Max nice priority", NULL},
> [RLIMIT_RTPRIO] = {"Max realtime priority", NULL},
> [RLIMIT_RTTIME] = {"Max realtime timeout", "us"},
> + [RLIMIT_OOMADJ] = {"Max OOM adjust", NULL},
s/Max/Min, right?
> };
>
> /* Display limits for a process */
> @@ -1057,8 +1058,15 @@ static ssize_t oom_adjust_write(struct file *file, const char __user *buf,
> }
>
> if (oom_adjust < task->signal->oom_adj && !capable(CAP_SYS_RESOURCE)) {
> - err = -EACCES;
> - goto err_sighand;
> + /* convert oom_adj [15,-17] to rlimit style value [1,33] */
> + long oom_rlim = OOM_ADJUST_MAX + 1 - oom_adjust;
> +
Ouch, that's a rather unfortunate mapping.
> + if (oom_rlim > task->signal->rlim[RLIMIT_OOMADJ].rlim_cur) {
> + unlock_task_sighand(task, &flags);
> + put_task_struct(task);
> + err = -EACCES;
> + goto err_sighand;
err_sighand has duplicate unlock_task_sighand() and put_task_struct();
since you're missing the task_unlock(task) here, just using goto
err_sighand would suffice.
> + }
> }
>
> if (oom_adjust != task->signal->oom_adj) {
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] oom: create a resource limit for oom_adj
2010-11-11 7:35 ` David Rientjes
@ 2010-11-11 18:30 ` Mandeep Singh Baines
2010-11-11 20:57 ` David Rientjes
0 siblings, 1 reply; 15+ messages in thread
From: Mandeep Singh Baines @ 2010-11-11 18:30 UTC (permalink / raw)
To: David Rientjes
Cc: Mandeep Singh Baines, Andrew Morton, KAMEZAWA Hiroyuki,
KOSAKI Motohiro, Rik van Riel, Ying Han, linux-kernel, gspencer,
piman, wad, olofj
David Rientjes (rientjes@google.com) wrote:
> On Wed, 10 Nov 2010, Mandeep Singh Baines wrote:
>
> > For ChromiumOS, we'd like to be able to oom_adj a process up/down
> > as its leaves/enters the foreground. Currently, it is not possible
> > to oom_adj down without CAP_SYS_RESOURCE. This patch creates a new
> > resource limit, RLIMIT_OOMADJ, which is works in a similar fashion
> > to RLIMIT_NICE. This allows a process's oom_adj to be lowered
> > without CAP_SYS_RESOURCE as long as the new value is greater
> > than the resource limit.
> >
>
> First of all, oom_adj is deprecated and scheduled for removal in a couple
> of years (see Documentation/feature-removal-schedule.txt) so any work in
> this area should be targeting oom_score_adj instead.
>
Ah. Thanks for the pointer.
> What is the anticipated use case for this? We know that you want to lower
> oom_adj without CAP_SYS_RESOURCE, but what's the expected behavior when an
> app moves from foreground to background? I assume it's something like
The focus here is the web browser's tabs. In our case, each is a process. If
OOM is going to kill a process, you'd rather it kill the tab you looked at
hours ago instead of the one you're looking at now. So you'd like to have a
policy where the LRU tab gets killed first. We'd like to use oom_score_adj
as the mechanism to implement an LRU policy like this.
> having an oom_adj of 0 in the background and +15 in the foreground. If
> so, does /proc/sys/vm/oom_kill_allocating_task get you most of what you're
> looking for?
>
As explained above, oom_kill_allocating_task won't give us what we want.
> I'm wondering if we can avoid yet another resource limit for something
> like this.
>
> > Alternative considered:
> >
> > * a setuid binary
> > * a daemon with CAP_SYS_RESOURCE
> >
> > Since you don't wan't all processes to be able to reduce their
> > oom_adj, a setuid or daemon implementation would be complex. The
> > alternatives also have much higher overhead.
> >
>
> What do you anticipate will be writing to oom_score_adj with this patch,
> the app itself?
>
A process in the browser session will do the adusting. We'd rather not give
it CAP_SYS_RESOURCE. It should only be allowed to change oom_score_adj up
and down within the bounds set by the administrator. Analagous to renice()
which we also do using a similar policy.
> > Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
> > ---
> > fs/proc/base.c | 12 ++++++++++--
> > include/asm-generic/resource.h | 5 ++++-
> > 2 files changed, 14 insertions(+), 3 deletions(-)
> >
> > diff --git a/fs/proc/base.c b/fs/proc/base.c
> > index f3d02ca..4384013 100644
> > --- a/fs/proc/base.c
> > +++ b/fs/proc/base.c
> > @@ -462,6 +462,7 @@ static const struct limit_names lnames[RLIM_NLIMITS] = {
> > [RLIMIT_NICE] = {"Max nice priority", NULL},
> > [RLIMIT_RTPRIO] = {"Max realtime priority", NULL},
> > [RLIMIT_RTTIME] = {"Max realtime timeout", "us"},
> > + [RLIMIT_OOMADJ] = {"Max OOM adjust", NULL},
>
> s/Max/Min, right?
>
This is a MAX value because of how resource limits work. On the other hand,
it is really controlling the minimum oom_adj. So its a toss up for me.
More than happy to change if you prefer Min.
> > };
> >
> > /* Display limits for a process */
> > @@ -1057,8 +1058,15 @@ static ssize_t oom_adjust_write(struct file *file, const char __user *buf,
> > }
> >
> > if (oom_adjust < task->signal->oom_adj && !capable(CAP_SYS_RESOURCE)) {
> > - err = -EACCES;
> > - goto err_sighand;
> > + /* convert oom_adj [15,-17] to rlimit style value [1,33] */
> > + long oom_rlim = OOM_ADJUST_MAX + 1 - oom_adjust;
> > +
>
> Ouch, that's a rather unfortunate mapping.
>
Unfortunate but unavoidable. The resource limit code checks to see if the
new limit is greater than the limit. This code was based on the can_nice()
code in sched.c.
> > + if (oom_rlim > task->signal->rlim[RLIMIT_OOMADJ].rlim_cur) {
> > + unlock_task_sighand(task, &flags);
> > + put_task_struct(task);
> > + err = -EACCES;
> > + goto err_sighand;
>
> err_sighand has duplicate unlock_task_sighand() and put_task_struct();
> since you're missing the task_unlock(task) here, just using goto
> err_sighand would suffice.
>
D'oh. Forward port error. I should be more careful. Thanks for catching:)
> > + }
> > }
> >
> > if (oom_adjust != task->signal->oom_adj) {
Thank you for reviewing this patch.
Should I send an updated oom_score_adj patch?
Regards,
Mandeep
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] oom: create a resource limit for oom_adj
2010-11-11 18:30 ` Mandeep Singh Baines
@ 2010-11-11 20:57 ` David Rientjes
2010-11-11 22:25 ` Mandeep Singh Baines
0 siblings, 1 reply; 15+ messages in thread
From: David Rientjes @ 2010-11-11 20:57 UTC (permalink / raw)
To: Mandeep Singh Baines
Cc: Andrew Morton, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Rik van Riel,
Ying Han, linux-kernel, gspencer, piman, wad, olofj
On Thu, 11 Nov 2010, Mandeep Singh Baines wrote:
> > What is the anticipated use case for this? We know that you want to lower
> > oom_adj without CAP_SYS_RESOURCE, but what's the expected behavior when an
> > app moves from foreground to background? I assume it's something like
>
> The focus here is the web browser's tabs. In our case, each is a process. If
> OOM is going to kill a process, you'd rather it kill the tab you looked at
> hours ago instead of the one you're looking at now. So you'd like to have a
> policy where the LRU tab gets killed first. We'd like to use oom_score_adj
> as the mechanism to implement an LRU policy like this.
>
Hmm, at first glance that seems potentially dangerous if the current tab
generates a burt of memory allocations and it ends up killing all other
tabs before finally targeting the culprit whereas currently the heuristic
should do a good job of finding this problematic tab and killing it
instantly.
Perhaps that can't happen and it probably doesn't even matter:
oom_score_adj allows users to determine which process to kill regardless
of the underlying reason.
> > What do you anticipate will be writing to oom_score_adj with this patch,
> > the app itself?
>
> A process in the browser session will do the adusting. We'd rather not give
> it CAP_SYS_RESOURCE. It should only be allowed to change oom_score_adj up
> and down within the bounds set by the administrator. Analagous to renice()
> which we also do using a similar policy.
>
So as more and more tabs get used, the least recently used tab gets its
oom_score_adj raised higher and higher until it is reused itself and then
it gets reset back to 0 for the current tab?
Is there a reason you don't want to give the underlying browser session
process CAP_SYS_RESOURCE? Will it not be enforcing resource limits to
ensure tabs don't deplete all memory when certain sites are opened? Are
you concerned that it may deplete all memory itself (for which case you
could raise its own oom_score_adj, which is a proportion of available
memory so you can define where that point of depletiong is)?
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] oom: create a resource limit for oom_adj
2010-11-11 20:57 ` David Rientjes
@ 2010-11-11 22:25 ` Mandeep Singh Baines
2010-11-11 23:19 ` David Rientjes
0 siblings, 1 reply; 15+ messages in thread
From: Mandeep Singh Baines @ 2010-11-11 22:25 UTC (permalink / raw)
To: David Rientjes
Cc: Mandeep Singh Baines, Andrew Morton, KAMEZAWA Hiroyuki,
KOSAKI Motohiro, Rik van Riel, Ying Han, linux-kernel, gspencer,
piman, wad, olofj
David Rientjes (rientjes@google.com) wrote:
> On Thu, 11 Nov 2010, Mandeep Singh Baines wrote:
>
> > > What is the anticipated use case for this? We know that you want to lower
> > > oom_adj without CAP_SYS_RESOURCE, but what's the expected behavior when an
> > > app moves from foreground to background? I assume it's something like
> >
> > The focus here is the web browser's tabs. In our case, each is a process. If
> > OOM is going to kill a process, you'd rather it kill the tab you looked at
> > hours ago instead of the one you're looking at now. So you'd like to have a
> > policy where the LRU tab gets killed first. We'd like to use oom_score_adj
> > as the mechanism to implement an LRU policy like this.
> >
>
> Hmm, at first glance that seems potentially dangerous if the current tab
> generates a burt of memory allocations and it ends up killing all other
> tabs before finally targeting the culprit whereas currently the heuristic
> should do a good job of finding this problematic tab and killing it
> instantly.
>
If you're watching a movie, video chatting, playing a game, etc. What
would you rather have killed: the current tab you are interacting with or
some tab you opened a while back and are no longer interacting with.
> Perhaps that can't happen and it probably doesn't even matter:
> oom_score_adj allows users to determine which process to kill regardless
> of the underlying reason.
>
> > > What do you anticipate will be writing to oom_score_adj with this patch,
> > > the app itself?
> >
> > A process in the browser session will do the adusting. We'd rather not give
> > it CAP_SYS_RESOURCE. It should only be allowed to change oom_score_adj up
> > and down within the bounds set by the administrator. Analagous to renice()
> > which we also do using a similar policy.
> >
>
> So as more and more tabs get used, the least recently used tab gets its
> oom_score_adj raised higher and higher until it is reused itself and then
> it gets reset back to 0 for the current tab?
>
Exactly.
> Is there a reason you don't want to give the underlying browser session
> process CAP_SYS_RESOURCE? Will it not be enforcing resource limits to
Security. We want to use the least-privilege possible. We really want to
avoid giving special privileges to the browser. You shouldn't need
administrative privileges to run the browser. We'd like for oom_score_adj
to work the same as nice. An unprivileged user can nice up and down as
long as the new setting is within the administratively configured
resource limit: ulimit -e.
> ensure tabs don't deplete all memory when certain sites are opened? Are
> you concerned that it may deplete all memory itself (for which case you
> could raise its own oom_score_adj, which is a proportion of available
> memory so you can define where that point of depletiong is)?
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] oom: create a resource limit for oom_adj
2010-11-11 22:25 ` Mandeep Singh Baines
@ 2010-11-11 23:19 ` David Rientjes
2010-11-11 23:56 ` Mandeep Singh Baines
0 siblings, 1 reply; 15+ messages in thread
From: David Rientjes @ 2010-11-11 23:19 UTC (permalink / raw)
To: Mandeep Singh Baines
Cc: Andrew Morton, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Rik van Riel,
Ying Han, linux-kernel, gspencer, piman, wad, olofj
On Thu, 11 Nov 2010, Mandeep Singh Baines wrote:
> > Hmm, at first glance that seems potentially dangerous if the current tab
> > generates a burt of memory allocations and it ends up killing all other
> > tabs before finally targeting the culprit whereas currently the heuristic
> > should do a good job of finding this problematic tab and killing it
> > instantly.
> >
>
> If you're watching a movie, video chatting, playing a game, etc. What
> would you rather have killed: the current tab you are interacting with or
> some tab you opened a while back and are no longer interacting with.
>
Well, it's a tangential point, but I'd personally prefer that my existing
tabs that I've decided to leave open are guaranteed to remain open
regardless of where I'm browsing next (they could hold valuable data that
I can't easily get back) and avoid having all of them sacrificed out from
under me for the newly opened tab. I can always go back and close those
tabs for more memory if I know I don't need them anymore and then retry
the failed allocation.
> > So as more and more tabs get used, the least recently used tab gets its
> > oom_score_adj raised higher and higher until it is reused itself and then
> > it gets reset back to 0 for the current tab?
> >
>
> Exactly.
>
We don't necessarily want arbitrary tasks to be able to decrease their
oom_score_adj back to 0 if a CAP_SYS_RESOURCE thread has elevated it,
that's part of the reason for the restriction (in addition to decreasing
your own oom_score_adj all the way to OOM_SCORE_ADJ_MIN).
Would it suffice to allow a task to decrease its oom_score_adj back to the
highest value that a CAP_SYS_RESOURCE thread set it or its inherited value
at fork? Assuming the thread that has forked it has oom_score_adj of 0,
each tab could decrease it back from 0 upon activation unless a
CAP_SYS_RESOURCE thread elevated it to something higher.
To do this, we'd need to save the highest oom_score_adj set by a
CAP_SYS_RESOURCE in struct signal_struct.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] oom: create a resource limit for oom_adj
2010-11-11 23:19 ` David Rientjes
@ 2010-11-11 23:56 ` Mandeep Singh Baines
0 siblings, 0 replies; 15+ messages in thread
From: Mandeep Singh Baines @ 2010-11-11 23:56 UTC (permalink / raw)
To: David Rientjes
Cc: Mandeep Singh Baines, Andrew Morton, KAMEZAWA Hiroyuki,
KOSAKI Motohiro, Rik van Riel, Ying Han, linux-kernel, gspencer,
piman, wad, olofj
David Rientjes (rientjes@google.com) wrote:
> On Thu, 11 Nov 2010, Mandeep Singh Baines wrote:
>
> > > Hmm, at first glance that seems potentially dangerous if the current tab
> > > generates a burt of memory allocations and it ends up killing all other
> > > tabs before finally targeting the culprit whereas currently the heuristic
> > > should do a good job of finding this problematic tab and killing it
> > > instantly.
> > >
> >
> > If you're watching a movie, video chatting, playing a game, etc. What
> > would you rather have killed: the current tab you are interacting with or
> > some tab you opened a while back and are no longer interacting with.
> >
>
> Well, it's a tangential point, but I'd personally prefer that my existing
> tabs that I've decided to leave open are guaranteed to remain open
> regardless of where I'm browsing next (they could hold valuable data that
> I can't easily get back) and avoid having all of them sacrificed out from
> under me for the newly opened tab. I can always go back and close those
> tabs for more memory if I know I don't need them anymore and then retry
> the failed allocation.
>
> > > So as more and more tabs get used, the least recently used tab gets its
> > > oom_score_adj raised higher and higher until it is reused itself and then
> > > it gets reset back to 0 for the current tab?
> > >
> >
> > Exactly.
> >
>
> We don't necessarily want arbitrary tasks to be able to decrease their
> oom_score_adj back to 0 if a CAP_SYS_RESOURCE thread has elevated it,
> that's part of the reason for the restriction (in addition to decreasing
> your own oom_score_adj all the way to OOM_SCORE_ADJ_MIN).
>
> Would it suffice to allow a task to decrease its oom_score_adj back to the
> highest value that a CAP_SYS_RESOURCE thread set it or its inherited value
> at fork? Assuming the thread that has forked it has oom_score_adj of 0,
> each tab could decrease it back from 0 upon activation unless a
> CAP_SYS_RESOURCE thread elevated it to something higher.
>
> To do this, we'd need to save the highest oom_score_adj set by a
> CAP_SYS_RESOURCE in struct signal_struct.
Sounds good to me. I'll start working on this patch.
Thanks,
Mandeep
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] oom: create a resource limit for oom_adj
2010-11-11 4:35 Mandeep Singh Baines
2010-11-11 7:35 ` David Rientjes
@ 2010-11-14 5:07 ` KOSAKI Motohiro
1 sibling, 0 replies; 15+ messages in thread
From: KOSAKI Motohiro @ 2010-11-14 5:07 UTC (permalink / raw)
To: Mandeep Singh Baines
Cc: kosaki.motohiro, Andrew Morton, David Rientjes,
KAMEZAWA Hiroyuki, Rik van Riel, Ying Han, linux-kernel,
gspencer, piman, wad, olofj
Hi Mandeep,
> For ChromiumOS, we'd like to be able to oom_adj a process up/down
> as its leaves/enters the foreground. Currently, it is not possible
> to oom_adj down without CAP_SYS_RESOURCE. This patch creates a new
> resource limit, RLIMIT_OOMADJ, which is works in a similar fashion
> to RLIMIT_NICE. This allows a process's oom_adj to be lowered
> without CAP_SYS_RESOURCE as long as the new value is greater
> than the resource limit.
>
> Alternative considered:
>
> * a setuid binary
> * a daemon with CAP_SYS_RESOURCE
>
> Since you don't wan't all processes to be able to reduce their
> oom_adj, a setuid or daemon implementation would be complex. The
> alternatives also have much higher overhead.
>
> Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
> ---
> fs/proc/base.c | 12 ++++++++++--
> include/asm-generic/resource.h | 5 ++++-
> 2 files changed, 14 insertions(+), 3 deletions(-)
This concept sound useful for embedeed. but I dislike this interface
a bit. Why don't you create /proc/{pid}/oom_adj_lower_bound or similar?
It is more straight forward because oom_adj are already using /proc.
I also think 15..-17 to 0-32 convertion is a bit user unfriendly.
>
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index f3d02ca..4384013 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -462,6 +462,7 @@ static const struct limit_names lnames[RLIM_NLIMITS] = {
> [RLIMIT_NICE] = {"Max nice priority", NULL},
> [RLIMIT_RTPRIO] = {"Max realtime priority", NULL},
> [RLIMIT_RTTIME] = {"Max realtime timeout", "us"},
> + [RLIMIT_OOMADJ] = {"Max OOM adjust", NULL},
> };
>
> /* Display limits for a process */
> @@ -1057,8 +1058,15 @@ static ssize_t oom_adjust_write(struct file *file, const char __user *buf,
> }
>
> if (oom_adjust < task->signal->oom_adj && !capable(CAP_SYS_RESOURCE)) {
> - err = -EACCES;
> - goto err_sighand;
> + /* convert oom_adj [15,-17] to rlimit style value [1,33] */
> + long oom_rlim = OOM_ADJUST_MAX + 1 - oom_adjust;
> +
> + if (oom_rlim > task->signal->rlim[RLIMIT_OOMADJ].rlim_cur) {
two points.
1) task->signal->rlim[RLIMIT_OOMADJ].rlim_cur is incorrect.
please use task_rlimit().
2) If process has CAP_SYS_RESOURCE, we should ignore RLIMIT_OOMADJ for
backword compatibility. CAP_NICE do so. (see below)
------------------------------------------------------------------
int can_nice(const struct task_struct *p, const int nice)
{
/* convert nice value [19,-20] to rlimit style value [1,40] */
int nice_rlim = 20 - nice;
return (nice_rlim <= task_rlimit(p, RLIMIT_NICE) ||
capable(CAP_SYS_NICE));
}
------------------------------------------------------------------
> + unlock_task_sighand(task, &flags);
> + put_task_struct(task);
> + err = -EACCES;
> + goto err_sighand;
> + }
> }
>
> if (oom_adjust != task->signal->oom_adj) {
> diff --git a/include/asm-generic/resource.h b/include/asm-generic/resource.h
> index 587566f..a8640a4 100644
> --- a/include/asm-generic/resource.h
> +++ b/include/asm-generic/resource.h
> @@ -45,7 +45,9 @@
> 0-39 for nice level 19 .. -20 */
> #define RLIMIT_RTPRIO 14 /* maximum realtime priority */
> #define RLIMIT_RTTIME 15 /* timeout for RT tasks in us */
> -#define RLIM_NLIMITS 16
> +#define RLIMIT_OOMADJ 16 /* max oom_adj allowed to lower to
> + 0-32 for oom level 15 .. -17 */
> +#define RLIM_NLIMITS 17
>
> /*
> * SuS says limits have to be unsigned.
> @@ -86,6 +88,7 @@
> [RLIMIT_MSGQUEUE] = { MQ_BYTES_MAX, MQ_BYTES_MAX }, \
> [RLIMIT_NICE] = { 0, 0 }, \
> [RLIMIT_RTPRIO] = { 0, 0 }, \
> + [RLIMIT_OOMADJ] = { 0, 0 }, \
I don't think 0 is good initial value because 0 mean oom_adj==15.
> [RLIMIT_RTTIME] = { RLIM_INFINITY, RLIM_INFINITY }, \
> }
>
> --
> 1.7.3.1
>
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2010-11-23 7:17 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <fNx73-1cI-1@gated-at.bofh.it>
[not found] ` <fNzVf-5UY-3@gated-at.bofh.it>
[not found] ` <fNKdY-6FU-11@gated-at.bofh.it>
[not found] ` <fNMps-1S1-21@gated-at.bofh.it>
2010-11-11 23:15 ` [PATCH] oom: create a resource limit for oom_adj Bodo Eggert
2010-11-11 23:21 ` David Rientjes
2010-11-14 5:07 ` KOSAKI Motohiro
2010-11-14 21:42 ` David Rientjes
2010-11-23 7:16 ` KOSAKI Motohiro
[not found] ` <fNNOx-4qf-1@gated-at.bofh.it>
[not found] ` <fNOAW-5Oi-35@gated-at.bofh.it>
[not found] ` <fNPdF-6Hu-33@gated-at.bofh.it>
[not found] ` <fOctz-4o0-9@gated-at.bofh.it>
2010-11-14 0:05 ` [PATCH] oom: allow a non-CAP_SYS_RESOURCE proces to oom_score_adj down Bodo Eggert
2010-11-11 5:19 [PATCH] oom: create a resource limit for oom_adj Figo.zhang
-- strict thread matches above, loose matches on Subject: below --
2010-11-11 4:35 Mandeep Singh Baines
2010-11-11 7:35 ` David Rientjes
2010-11-11 18:30 ` Mandeep Singh Baines
2010-11-11 20:57 ` David Rientjes
2010-11-11 22:25 ` Mandeep Singh Baines
2010-11-11 23:19 ` David Rientjes
2010-11-11 23:56 ` Mandeep Singh Baines
2010-11-14 5:07 ` KOSAKI Motohiro
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).