* [RFC PATCH] Enforce RSS+Swap rlimit
@ 2011-11-04 14:45 Jerome Marchand
2011-11-15 13:10 ` [RFC PATCH V2] " Jerome Marchand
From: Jerome Marchand @ 2011-11-04 14:45 UTC
To: linux-mm; +Cc: Linux Kernel Mailing List, Balbir Singh
Currently RSS rlimit is not enforced. We cannot forbid a process to exceed
its RSS limit and let it swap out instead: that would hurt the performance
of the whole system, even when memory resources are plentiful.
Therefore, instead of enforcing a limit on rss usage alone, this patch enforces
a limit on the rss+swap value. This is similar to the memsw limit of the
memory cgroup.
If a process's rss+swap usage exceeds the RLIMIT_RSS max limit, it receives
a SIGBUS signal.
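The check the patch adds at each fault site compares a page count with a byte limit: get_mm_memsw() sums the MM_FILEPAGES, MM_ANONPAGES and MM_SWAPENTS counters (in pages), while rlimit_max(RLIMIT_RSS) is the hard limit in bytes, hence the shift by PAGE_SHIFT. The userspace sketch below only mirrors that arithmetic; `struct memsw_counters`, `memsw_pages()` and `memsw_limit_reached()` are hypothetical stand-ins for the kernel's per-mm counters, and 4 KiB pages are assumed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed 4 KiB pages; the kernel uses its own PAGE_SHIFT. */
#define SKETCH_PAGE_SHIFT 12

/* Hypothetical stand-in for the per-mm counters read by get_mm_counter(). */
struct memsw_counters {
	unsigned long file_pages; /* MM_FILEPAGES */
	unsigned long anon_pages; /* MM_ANONPAGES */
	unsigned long swap_ents;  /* MM_SWAPENTS */
};

/* Same sum as get_mm_memsw(): resident pages plus swap entries. */
unsigned long memsw_pages(const struct memsw_counters *c)
{
	return c->file_pages + c->anon_pages + c->swap_ents;
}

/*
 * Same test as the patched fault paths:
 *   get_mm_memsw(mm) >= rlimit_max(RLIMIT_RSS) >> PAGE_SHIFT
 * It runs before the counter is incremented, so it is true once usage has
 * reached the limit and one more page would exceed it.
 */
bool memsw_limit_reached(const struct memsw_counters *c,
			 uint64_t rlim_max_bytes)
{
	return memsw_pages(c) >= (rlim_max_bytes >> SKETCH_PAGE_SHIFT);
}
```

With a 1 MiB hard limit (256 pages), a process holding 255 pages may still fault in one more; at 256 pages, the next fault that reaches one of the patched checks gets VM_FAULT_SIGBUS. Swapped-out pages stay in the sum, which is what makes this an rss+swap limit rather than a plain RSS limit.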
My tests show that the code in do_anonymous_page() and __do_fault() does
prevent processes from getting more memory than the limit, and I haven't
seen any adverse effects, but so far I have no test coverage of the code
in do_wp_page(). I'm not sure how to test it.
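A hypothetical userspace test for the do_wp_page() path, not part of the patch. It assumes that, on this kernel, a read fault on a private anonymous mapping installs the shared zero page without incrementing MM_ANONPAGES, so a later write to the same page is handled by do_wp_page() with no old page: the branch where the patch adds its check. `touch_pages()` and `set_rss_hard_limit()` are illustrative names; SIGBUS is caught with sigsetjmp()/siglongjmp() so the test can report how many writes succeeded instead of being killed.

```c
#define _GNU_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

static sigjmp_buf fault_env;

/* SIGBUS handler: jump back to touch_pages() instead of retrying the store. */
static void on_sigbus(int sig)
{
	(void)sig;
	siglongjmp(fault_env, 1);
}

/*
 * Hypothetical helper: lower both RLIMIT_RSS values to max_bytes. The patch
 * compares against rlimit_max(), i.e. the hard limit, which an unprivileged
 * process cannot raise again afterwards.
 */
int set_rss_hard_limit(rlim_t max_bytes)
{
	struct rlimit rl = { .rlim_cur = max_bytes, .rlim_max = max_bytes };

	return setrlimit(RLIMIT_RSS, &rl);
}

/*
 * Map npages private anonymous pages and write one byte to each. With
 * read_first set, every page is read before it is written, so (assuming the
 * read maps the zero page) each write is a copy-on-write handled by
 * do_wp_page(); otherwise each write goes through do_anonymous_page().
 * Returns the number of pages written before a SIGBUS, or -1 on error.
 */
long touch_pages(size_t npages, int read_first)
{
	long pagesz = sysconf(_SC_PAGESIZE);
	struct sigaction sa = { .sa_flags = 0 }, old;
	volatile long written = 0;       /* must survive siglongjmp() */
	volatile unsigned char sink = 0; /* keeps the reads from being elided */
	unsigned char *buf;
	size_t len, i;

	if (pagesz <= 0)
		return -1;
	len = npages * (size_t)pagesz;
	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	sa.sa_handler = on_sigbus;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGBUS, &sa, &old) != 0) {
		munmap(buf, len);
		return -1;
	}

	if (sigsetjmp(fault_env, 1) == 0) {
		if (read_first)
			for (i = 0; i < npages; i++)
				sink += buf[i * (size_t)pagesz]; /* read fault */
		for (i = 0; i < npages; i++) {
			buf[i * (size_t)pagesz] = 1;         /* write fault */
			written++;
		}
	}

	sigaction(SIGBUS, &old, NULL);
	munmap(buf, len);
	return written;
}
```

Under the patched kernel, a test might call set_rss_hard_limit() with a value somewhat above the process's current rss+swap usage (which includes its file-backed pages, such as the C library), then check that touch_pages(n, 1) stops at about the same count as touch_pages(n, 0). On an unpatched kernel, RLIMIT_RSS is not enforced and both calls simply return n. Keeping the mapping small makes it unlikely that transparent huge pages back it, since those faults would not go through the patched paths.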
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
---
include/linux/mm.h | 7 +++++++
mm/memory.c | 21 +++++++++++++++++++--
2 files changed, 26 insertions(+), 2 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 3dc3a8c..3b54ff1 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1092,6 +1092,13 @@ static inline unsigned long get_mm_rss(struct mm_struct *mm)
get_mm_counter(mm, MM_ANONPAGES);
}
+static inline unsigned long get_mm_memsw(struct mm_struct *mm)
+{
+ return get_mm_counter(mm, MM_FILEPAGES) +
+ get_mm_counter(mm, MM_ANONPAGES) +
+ get_mm_counter(mm, MM_SWAPENTS);
+}
+
static inline unsigned long get_mm_hiwater_rss(struct mm_struct *mm)
{
return max(mm->hiwater_rss, get_mm_rss(mm));
diff --git a/mm/memory.c b/mm/memory.c
index b2b8731..c7226f5 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2661,8 +2661,14 @@ gotten:
dec_mm_counter_fast(mm, MM_FILEPAGES);
inc_mm_counter_fast(mm, MM_ANONPAGES);
}
- } else
+ } else {
+ if (get_mm_memsw(mm) >=
+ rlimit_max(RLIMIT_RSS) >> PAGE_SHIFT) {
+ ret |= VM_FAULT_SIGBUS;
+ goto release;
+ }
inc_mm_counter_fast(mm, MM_ANONPAGES);
+ }
flush_cache_page(vma, address, pte_pfn(orig_pte));
entry = mk_pte(new_page, vma->vm_page_prot);
entry = maybe_mkwrite(pte_mkdirty(entry), vma);
@@ -2713,6 +2719,7 @@ gotten:
} else
mem_cgroup_uncharge_page(new_page);
+release:
if (new_page)
page_cache_release(new_page);
unlock:
@@ -3073,6 +3080,7 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
struct page *page;
spinlock_t *ptl;
pte_t entry;
+ int ret = 0;
pte_unmap(page_table);
@@ -3109,6 +3117,10 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
if (!pte_none(*page_table))
goto release;
+ if (get_mm_memsw(mm) >= rlimit_max(RLIMIT_RSS) >> PAGE_SHIFT) {
+ ret = VM_FAULT_SIGBUS;
+ goto release;
+ }
inc_mm_counter_fast(mm, MM_ANONPAGES);
page_add_new_anon_rmap(page, vma, address);
setpte:
@@ -3118,7 +3130,7 @@ setpte:
update_mmu_cache(vma, address, page_table);
unlock:
pte_unmap_unlock(page_table, ptl);
- return 0;
+ return ret;
release:
mem_cgroup_uncharge_page(page);
page_cache_release(page);
@@ -3263,6 +3275,10 @@ static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
entry = mk_pte(page, vma->vm_page_prot);
if (flags & FAULT_FLAG_WRITE)
entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+ if (get_mm_memsw(mm) >= rlimit_max(RLIMIT_RSS) >> PAGE_SHIFT) {
+ ret = VM_FAULT_SIGBUS;
+ goto unlock;
+ }
if (anon) {
inc_mm_counter_fast(mm, MM_ANONPAGES);
page_add_new_anon_rmap(page, vma, address);
@@ -3287,6 +3303,7 @@ static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
anon = 1; /* no anon but release faulted_page */
}
+unlock:
pte_unmap_unlock(page_table, ptl);
if (dirty_page) {
* [RFC PATCH V2] Enforce RSS+Swap rlimit
2011-11-04 14:45 [RFC PATCH] Enforce RSS+Swap rlimit Jerome Marchand
@ 2011-11-15 13:10 ` Jerome Marchand
2011-11-16 0:02 ` KOSAKI Motohiro
2011-11-16 10:09 ` Balbir Singh
From: Jerome Marchand @ 2011-11-15 13:10 UTC
To: Balbir Singh; +Cc: Linux Kernel Mailing List, linux-mm
Changes since V1: rebased on 3.2-rc1
Currently RSS rlimit is not enforced. We cannot forbid a process to exceed
its RSS limit and let it swap out instead: that would hurt the performance
of the whole system, even when memory resources are plentiful.
Therefore, instead of enforcing a limit on rss usage alone, this patch enforces
a limit on the rss+swap value. This is similar to the memsw limit of the
memory cgroup.
If a process's rss+swap usage exceeds the RLIMIT_RSS max limit, it receives
a SIGBUS signal.
My tests show that the code in do_anonymous_page() and __do_fault() does
prevent processes from getting more memory than the limit, and I haven't
seen any adverse effects, but so far I have no test coverage of the code
in do_wp_page(). I'm not sure how to test it.
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
---
include/linux/mm.h | 7 +++++++
mm/memory.c | 21 +++++++++++++++++++--
2 files changed, 26 insertions(+), 2 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 3dc3a8c..3b54ff1 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1092,6 +1092,13 @@ static inline unsigned long get_mm_rss(struct mm_struct *mm)
get_mm_counter(mm, MM_ANONPAGES);
}
+static inline unsigned long get_mm_memsw(struct mm_struct *mm)
+{
+ return get_mm_counter(mm, MM_FILEPAGES) +
+ get_mm_counter(mm, MM_ANONPAGES) +
+ get_mm_counter(mm, MM_SWAPENTS);
+}
+
static inline unsigned long get_mm_hiwater_rss(struct mm_struct *mm)
{
return max(mm->hiwater_rss, get_mm_rss(mm));
diff --git a/mm/memory.c b/mm/memory.c
index 829d437..b0463c2 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2661,8 +2661,14 @@ gotten:
dec_mm_counter_fast(mm, MM_FILEPAGES);
inc_mm_counter_fast(mm, MM_ANONPAGES);
}
- } else
+ } else {
+ if (get_mm_memsw(mm) >=
+ rlimit_max(RLIMIT_RSS) >> PAGE_SHIFT) {
+ ret |= VM_FAULT_SIGBUS;
+ goto release;
+ }
inc_mm_counter_fast(mm, MM_ANONPAGES);
+ }
flush_cache_page(vma, address, pte_pfn(orig_pte));
entry = mk_pte(new_page, vma->vm_page_prot);
entry = maybe_mkwrite(pte_mkdirty(entry), vma);
@@ -2713,6 +2719,7 @@ gotten:
} else
mem_cgroup_uncharge_page(new_page);
+release:
if (new_page)
page_cache_release(new_page);
unlock:
@@ -3073,6 +3080,7 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
struct page *page;
spinlock_t *ptl;
pte_t entry;
+ int ret = 0;
pte_unmap(page_table);
@@ -3109,6 +3117,10 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
if (!pte_none(*page_table))
goto release;
+ if (get_mm_memsw(mm) >= rlimit_max(RLIMIT_RSS) >> PAGE_SHIFT) {
+ ret = VM_FAULT_SIGBUS;
+ goto release;
+ }
inc_mm_counter_fast(mm, MM_ANONPAGES);
page_add_new_anon_rmap(page, vma, address);
setpte:
@@ -3118,7 +3130,7 @@ setpte:
update_mmu_cache(vma, address, page_table);
unlock:
pte_unmap_unlock(page_table, ptl);
- return 0;
+ return ret;
release:
mem_cgroup_uncharge_page(page);
page_cache_release(page);
@@ -3263,6 +3275,10 @@ static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
entry = mk_pte(page, vma->vm_page_prot);
if (flags & FAULT_FLAG_WRITE)
entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+ if (get_mm_memsw(mm) >= rlimit_max(RLIMIT_RSS) >> PAGE_SHIFT) {
+ ret = VM_FAULT_SIGBUS;
+ goto unlock;
+ }
if (anon) {
inc_mm_counter_fast(mm, MM_ANONPAGES);
page_add_new_anon_rmap(page, vma, address);
@@ -3287,6 +3303,7 @@ static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
anon = 1; /* no anon but release faulted_page */
}
+unlock:
pte_unmap_unlock(page_table, ptl);
if (dirty_page) {
* Re: [RFC PATCH V2] Enforce RSS+Swap rlimit
2011-11-15 13:10 ` [RFC PATCH V2] " Jerome Marchand
@ 2011-11-16 0:02 ` KOSAKI Motohiro
2011-11-16 9:40 ` Jerome Marchand
2011-11-16 10:09 ` Balbir Singh
From: KOSAKI Motohiro @ 2011-11-16 0:02 UTC
To: jmarchan; +Cc: bsingharora, linux-kernel, linux-mm
On 11/15/2011 8:10 AM, Jerome Marchand wrote:
>
> Changes since V1: rebased on 3.2-rc1
>
> Currently RSS rlimit is not enforced. We cannot forbid a process to exceed
> its RSS limit and let it swap out instead: that would hurt the performance
> of the whole system, even when memory resources are plentiful.
>
> Therefore, instead of enforcing a limit on rss usage alone, this patch enforces
> a limit on the rss+swap value. This is similar to the memsw limit of the
> memory cgroup.
> If a process's rss+swap usage exceeds the RLIMIT_RSS max limit, it receives
> a SIGBUS signal.
Not a good idea.
- RLIMIT_RSS has a clear definition and this patch breaks it. At the very
least, you should add another rlimit.
- SIGBUS can be ignored. An rlimit shouldn't be ignorable.
* Re: [RFC PATCH V2] Enforce RSS+Swap rlimit
2011-11-16 0:02 ` KOSAKI Motohiro
@ 2011-11-16 9:40 ` Jerome Marchand
From: Jerome Marchand @ 2011-11-16 9:40 UTC
To: KOSAKI Motohiro; +Cc: bsingharora, linux-kernel, linux-mm
On 11/16/2011 01:02 AM, KOSAKI Motohiro wrote:
> On 11/15/2011 8:10 AM, Jerome Marchand wrote:
>>
>> Changes since V1: rebased on 3.2-rc1
>>
>> Currently RSS rlimit is not enforced. We cannot forbid a process to exceed
>> its RSS limit and let it swap out instead: that would hurt the performance
>> of the whole system, even when memory resources are plentiful.
>>
>> Therefore, instead of enforcing a limit on rss usage alone, this patch enforces
>> a limit on the rss+swap value. This is similar to the memsw limit of the
>> memory cgroup.
>> If a process's rss+swap usage exceeds the RLIMIT_RSS max limit, it receives
>> a SIGBUS signal.
>
> Not a good idea.
> - RLIMIT_RSS has a clear definition and this patch breaks it. At the very
> least, you should add another rlimit.
I couldn't decide whether we needed a new rlimit or not. I must admit that I
chose the lazy option. If that's a problem, I can add a new rlimit,
RLIMIT_MEMSW for instance.
> - SIGBUS can be ignored. An rlimit shouldn't be ignorable.
The SIGBUS can be ignored, but not the rlimit: if RLIMIT_RSS is exceeded, the
process does not get the memory it requested. The SIGBUS is there to notify
the process that something went wrong.
Thanks,
Jerome
* Re: [RFC PATCH V2] Enforce RSS+Swap rlimit
2011-11-15 13:10 ` [RFC PATCH V2] " Jerome Marchand
2011-11-16 0:02 ` KOSAKI Motohiro
@ 2011-11-16 10:09 ` Balbir Singh
From: Balbir Singh @ 2011-11-16 10:09 UTC
To: Jerome Marchand; +Cc: Linux Kernel Mailing List, linux-mm
On Tue, Nov 15, 2011 at 6:40 PM, Jerome Marchand <jmarchan@redhat.com> wrote:
>
> Changes since V1: rebased on 3.2-rc1
>
> Currently RSS rlimit is not enforced. We cannot forbid a process to exceed
> its RSS limit and let it swap out instead: that would hurt the performance
> of the whole system, even when memory resources are plentiful.
>
> Therefore, instead of enforcing a limit on rss usage alone, this patch enforces
> a limit on the rss+swap value. This is similar to the memsw limit of the
> memory cgroup.
> If a process's rss+swap usage exceeds the RLIMIT_RSS max limit, it receives
> a SIGBUS signal.
>
> My tests show that the code in do_anonymous_page() and __do_fault() does
> prevent processes from getting more memory than the limit, and I haven't
> seen any adverse effects, but so far I have no test coverage of the code
> in do_wp_page(). I'm not sure how to test it.
>
> Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
I think the get_mm_rss* definitions need to be revisited and
agreed upon. I am afraid it cannot be a simple addition, since
1. it does not account for shared pages
2. if we enforce a limit without accounting for sharing, we might
enforce the wrong limits
Balbir
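Balbir's first point can be made concrete. With plain addition, a page mapped by several processes is charged in full to each of them. The sketch below contrasts that with one alternative, proportional sharing as reported by the Pss field of /proc/<pid>/smaps; it illustrates the trade-off and is not a proposal from this thread. The per-page `mapcount` array and both function names are hypothetical.

```c
#include <stddef.h>

/*
 * Plain addition, as get_mm_rss() and get_mm_memsw() do: each resident page
 * counts as one full page for every process that maps it.
 * mapcount[i] is the (hypothetical) number of processes mapping page i.
 */
unsigned long plain_pages(const unsigned int *mapcount, size_t n)
{
	unsigned long total = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (mapcount[i] > 0)
			total++;
	return total;
}

/*
 * Proportional share: a page mapped by k processes contributes 1/k of a
 * page to each of them, the idea behind the Pss field of /proc/<pid>/smaps.
 */
double proportional_pages(const unsigned int *mapcount, size_t n)
{
	double total = 0.0;
	size_t i;

	for (i = 0; i < n; i++)
		if (mapcount[i] > 0)
			total += 1.0 / mapcount[i];
	return total;
}
```

For two processes that each map the same 100 file pages (mapcount 2), plain addition charges 100 pages to each, 200 in total for 100 physical pages, while the proportional count charges 50 to each. Proportional accounting has its own drawback for a limit: a process's charge changes when other processes map or unmap the same pages.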