All of lore.kernel.org
* [PATCH] epoll: account epitem and eppoll_entry to kmemcg
@ 2017-10-03  2:15 ` Shakeel Butt
  0 siblings, 0 replies; 10+ messages in thread
From: Shakeel Butt @ 2017-10-03  2:15 UTC (permalink / raw)
  To: Alexander Viro, Vladimir Davydov, Johannes Weiner, Michal Hocko,
	Greg Thelen
  Cc: Andrew Morton, linux-mm, linux-fsdevel, linux-kernel, Shakeel Butt

A user space application can directly trigger allocations from
the eventpoll_epi and eventpoll_pwq slabs. A buggy or malicious
application can therefore consume a significant amount of system
memory by triggering such allocations. Indeed, we have seen in
production a buggy application that was leaking epoll references
and causing a burst of eventpoll_epi and eventpoll_pwq slab
allocations. This patch opts the eventpoll_epi and eventpoll_pwq
slabs into kmemcg charging.

Signed-off-by: Shakeel Butt <shakeelb@google.com>
---
 fs/eventpoll.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 2fabd19cdeea..a45360444895 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2329,11 +2329,11 @@ static int __init eventpoll_init(void)
 
 	/* Allocates slab cache used to allocate "struct epitem" items */
 	epi_cache = kmem_cache_create("eventpoll_epi", sizeof(struct epitem),
-			0, SLAB_HWCACHE_ALIGN | SLAB_PANIC, NULL);
+			0, SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT, NULL);
 
 	/* Allocates slab cache used to allocate "struct eppoll_entry" */
 	pwq_cache = kmem_cache_create("eventpoll_pwq",
-			sizeof(struct eppoll_entry), 0, SLAB_PANIC, NULL);
+		sizeof(struct eppoll_entry), 0, SLAB_PANIC|SLAB_ACCOUNT, NULL);
 
 	return 0;
 }
-- 
2.14.2.822.g60be5d43e6-goog

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] epoll: account epitem and eppoll_entry to kmemcg
  2017-10-03  2:15 ` Shakeel Butt
@ 2017-10-04 13:17   ` Michal Hocko
  -1 siblings, 0 replies; 10+ messages in thread
From: Michal Hocko @ 2017-10-04 13:17 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Alexander Viro, Vladimir Davydov, Johannes Weiner, Greg Thelen,
	Andrew Morton, linux-mm, linux-fsdevel, linux-kernel

On Mon 02-10-17 19:15:19, Shakeel Butt wrote:
> A user space application can directly trigger allocations from
> the eventpoll_epi and eventpoll_pwq slabs. A buggy or malicious
> application can therefore consume a significant amount of system
> memory by triggering such allocations. Indeed, we have seen in
> production a buggy application that was leaking epoll references
> and causing a burst of eventpoll_epi and eventpoll_pwq slab
> allocations. This patch opts the eventpoll_epi and eventpoll_pwq
> slabs into kmemcg charging.

I am not objecting to the patch; I would just like to understand the
runaway case. ep_insert seems to limit the maximum number of watches to
max_user_watches, which should be ~4% of lowmem if I am following the
code properly. pwq_cache should be bound by the number of watches as
well, or am I misunderstanding the code?

> Signed-off-by: Shakeel Butt <shakeelb@google.com>
> ---
>  fs/eventpoll.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> index 2fabd19cdeea..a45360444895 100644
> --- a/fs/eventpoll.c
> +++ b/fs/eventpoll.c
> @@ -2329,11 +2329,11 @@ static int __init eventpoll_init(void)
>  
>  	/* Allocates slab cache used to allocate "struct epitem" items */
>  	epi_cache = kmem_cache_create("eventpoll_epi", sizeof(struct epitem),
> -			0, SLAB_HWCACHE_ALIGN | SLAB_PANIC, NULL);
> +			0, SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT, NULL);
>  
>  	/* Allocates slab cache used to allocate "struct eppoll_entry" */
>  	pwq_cache = kmem_cache_create("eventpoll_pwq",
> -			sizeof(struct eppoll_entry), 0, SLAB_PANIC, NULL);
> +		sizeof(struct eppoll_entry), 0, SLAB_PANIC|SLAB_ACCOUNT, NULL);
>  
>  	return 0;
>  }
> -- 
> 2.14.2.822.g60be5d43e6-goog

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] epoll: account epitem and eppoll_entry to kmemcg
  2017-10-04 13:17   ` Michal Hocko
@ 2017-10-04 19:33     ` Shakeel Butt
  -1 siblings, 0 replies; 10+ messages in thread
From: Shakeel Butt @ 2017-10-04 19:33 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Alexander Viro, Vladimir Davydov, Johannes Weiner, Greg Thelen,
	Andrew Morton, Linux MM, linux-fsdevel, LKML

>
> I am not objecting to the patch; I would just like to understand the
> runaway case. ep_insert seems to limit the maximum number of watches to
> max_user_watches, which should be ~4% of lowmem if I am following the
> code properly. pwq_cache should be bound by the number of watches as
> well, or am I misunderstanding the code?
>

You are absolutely right that there is a per-user limit (~4% of total
memory if there is no highmem) on these caches. I think it is too
generous, particularly in the scenario where jobs of multiple users
are running on the system and the administrator is reducing cost by
overcommitting the memory. This is unaccounted kernel memory and will
not be considered by the oom-killer. I think that by accounting it to
kmemcg, for systems with kmem accounting enabled, we can provide
better isolation between jobs of different users.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] epoll: account epitem and eppoll_entry to kmemcg
  2017-10-04 19:33     ` Shakeel Butt
@ 2017-10-05  8:21       ` Michal Hocko
  -1 siblings, 0 replies; 10+ messages in thread
From: Michal Hocko @ 2017-10-05  8:21 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Alexander Viro, Vladimir Davydov, Johannes Weiner, Greg Thelen,
	Andrew Morton, Linux MM, linux-fsdevel, LKML

On Wed 04-10-17 12:33:14, Shakeel Butt wrote:
> >
> > I am not objecting to the patch; I would just like to understand the
> > runaway case. ep_insert seems to limit the maximum number of watches to
> > max_user_watches which should be ~4% of lowmem if I am following the
> > code properly. pwq_cache should be bound by the number of watches as
> > well, or am I misunderstanding the code?
> >
> 
> You are absolutely right that there is a per-user limit (~4% of total
> memory if no highmem) on these caches. I think it is too generous
> particularly in the scenario where jobs of multiple users are running
> on the system and the administrator is reducing cost by overcommitting
> the memory. This is unaccounted kernel memory and will not be
> considered by the oom-killer. I think by accounting it to kmemcg, for
> systems with kmem accounting enabled, we can provide better isolation
> between jobs of different users.

Thanks for the clarification. For some reason I didn't figure that the
limit is per user, even though the name suggests so.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] epoll: account epitem and eppoll_entry to kmemcg
  2017-10-05  8:21       ` Michal Hocko
@ 2017-10-06  7:48         ` Michal Hocko
  -1 siblings, 0 replies; 10+ messages in thread
From: Michal Hocko @ 2017-10-06  7:48 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Alexander Viro, Vladimir Davydov, Johannes Weiner, Greg Thelen,
	Andrew Morton, Linux MM, linux-fsdevel, LKML

On Thu 05-10-17 10:21:18, Michal Hocko wrote:
> On Wed 04-10-17 12:33:14, Shakeel Butt wrote:
> > >
> > > I am not objecting to the patch; I would just like to understand the
> > > runaway case. ep_insert seems to limit the maximum number of watches to
> > > max_user_watches which should be ~4% of lowmem if I am following the
> > > code properly. pwq_cache should be bound by the number of watches as
> > > well, or am I misunderstanding the code?
> > >
> > 
> > You are absolutely right that there is a per-user limit (~4% of total
> > memory if no highmem) on these caches. I think it is too generous
> > particularly in the scenario where jobs of multiple users are running
> > on the system and the administrator is reducing cost by overcommitting
> > the memory. This is unaccounted kernel memory and will not be
> > considered by the oom-killer. I think by accounting it to kmemcg, for
> > systems with kmem accounting enabled, we can provide better isolation
> > between jobs of different users.
> 
> Thanks for the clarification. For some reason I didn't figure that the
> limit is per user, even though the name suggests so.

Completely forgot to add
Acked-by: Michal Hocko <mhocko@suse.com>
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2017-10-06  7:48 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-03  2:15 [PATCH] epoll: account epitem and eppoll_entry to kmemcg Shakeel Butt
2017-10-04 13:17 ` Michal Hocko
2017-10-04 19:33   ` Shakeel Butt
2017-10-05  8:21     ` Michal Hocko
2017-10-06  7:48       ` Michal Hocko
