All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/3] memcg: limit the number of thresholds per-memcg
@ 2013-08-07 11:28 ` Michal Hocko
  0 siblings, 0 replies; 44+ messages in thread
From: Michal Hocko @ 2013-08-07 11:28 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, cgroups, Johannes Weiner, KAMEZAWA Hiroyuki,
	Andrew Morton, Tejun Heo, Kirill A. Shutemov, Anton Vorontsov

There is no limit for the maximum number of threshold events registered
per memcg. This might lead to an user triggered memory depletion if a
regular user is allowed to register on memory.[memsw.]usage_in_bytes
eventfd interface.

Let's be more strict and cap the number of events that might be
registered. MAX_THRESHOLD_EVENTS value is more or less random. The
expectation is that it should be high enough to cover reasonable
usecases while not too high to allow excessive resources consumption.
1024 events consume something like 16KB which shouldn't be a big deal
and it should be good enough.

Reported-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 mm/memcontrol.c |    8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e4330cd..8247db3 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5401,6 +5401,9 @@ static void mem_cgroup_oom_notify(struct mem_cgroup *memcg)
 		mem_cgroup_oom_notify_cb(iter);
 }
 
+/* Maximum number of treshold events registered per memcg. */
+#define MAX_THRESHOLD_EVENTS	1024
+
 static int mem_cgroup_usage_register_event(struct cgroup *cgrp,
 	struct cftype *cft, struct eventfd_ctx *eventfd, const char *args)
 {
@@ -5424,6 +5427,11 @@ static int mem_cgroup_usage_register_event(struct cgroup *cgrp,
 	else
 		BUG();
 
+	if (thresholds->primary->size == MAX_THRESHOLD_EVENTS) {
+		ret = -ENOSPC;
+		goto unlock;
+	}
+
 	usage = mem_cgroup_usage(memcg, type == _MEMSWAP);
 
 	/* Check if a threshold crossed before adding a new one */
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 1/3] memcg: limit the number of thresholds per-memcg
@ 2013-08-07 11:28 ` Michal Hocko
  0 siblings, 0 replies; 44+ messages in thread
From: Michal Hocko @ 2013-08-07 11:28 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, cgroups, Johannes Weiner, KAMEZAWA Hiroyuki,
	Andrew Morton, Tejun Heo, Kirill A. Shutemov, Anton Vorontsov

There is no limit for the maximum number of threshold events registered
per memcg. This might lead to an user triggered memory depletion if a
regular user is allowed to register on memory.[memsw.]usage_in_bytes
eventfd interface.

Let's be more strict and cap the number of events that might be
registered. MAX_THRESHOLD_EVENTS value is more or less random. The
expectation is that it should be high enough to cover reasonable
usecases while not too high to allow excessive resources consumption.
1024 events consume something like 16KB which shouldn't be a big deal
and it should be good enough.

Reported-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 mm/memcontrol.c |    8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e4330cd..8247db3 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5401,6 +5401,9 @@ static void mem_cgroup_oom_notify(struct mem_cgroup *memcg)
 		mem_cgroup_oom_notify_cb(iter);
 }
 
+/* Maximum number of treshold events registered per memcg. */
+#define MAX_THRESHOLD_EVENTS	1024
+
 static int mem_cgroup_usage_register_event(struct cgroup *cgrp,
 	struct cftype *cft, struct eventfd_ctx *eventfd, const char *args)
 {
@@ -5424,6 +5427,11 @@ static int mem_cgroup_usage_register_event(struct cgroup *cgrp,
 	else
 		BUG();
 
+	if (thresholds->primary->size == MAX_THRESHOLD_EVENTS) {
+		ret = -ENOSPC;
+		goto unlock;
+	}
+
 	usage = mem_cgroup_usage(memcg, type == _MEMSWAP);
 
 	/* Check if a threshold crossed before adding a new one */
-- 
1.7.10.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 2/3] memcg: Limit the number of events registered on oom_control
  2013-08-07 11:28 ` Michal Hocko
@ 2013-08-07 11:28   ` Michal Hocko
  -1 siblings, 0 replies; 44+ messages in thread
From: Michal Hocko @ 2013-08-07 11:28 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, cgroups, Johannes Weiner, KAMEZAWA Hiroyuki,
	Andrew Morton, Tejun Heo, Kirill A. Shutemov, Anton Vorontsov

There is no limit for the maximum number of oom_control events
registered per memcg. This might lead to an user triggered memory
depletion if a regular user is allowed to register events.

Let's be more strict and cap the number of events that might be
registered. MAX_OOM_NOTIFY_EVENTS value is more or less random. The
expectation is that it should be high enough to cover reasonable
usecases while not too high to allow excessive resources consumption.
1024 events consume something like 24KB which shouldn't be a big deal
and it should be good enough (even 1024 oom notification events sounds
crazy).

Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 mm/memcontrol.c |   21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 8247db3..233317a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -273,6 +273,7 @@ struct mem_cgroup {
 	struct mem_cgroup_thresholds memsw_thresholds;
 
 	/* For oom notifier event fd */
+	unsigned int oom_notify_count;
 	struct list_head oom_notify;
 
 	/*
@@ -5571,6 +5572,8 @@ unlock:
 	mutex_unlock(&memcg->thresholds_lock);
 }
 
+/* Maximum number of oom notify events per memcg */
+#define MAX_OOM_NOTIFY_EVENTS 1024
 static int mem_cgroup_oom_register_event(struct cgroup *cgrp,
 	struct cftype *cft, struct eventfd_ctx *eventfd, const char *args)
 {
@@ -5578,10 +5581,25 @@ static int mem_cgroup_oom_register_event(struct cgroup *cgrp,
 	struct mem_cgroup_eventfd_list *event;
 	enum res_type type = MEMFILE_TYPE(cft->private);
 
+	spin_lock(&memcg_oom_lock);
+	if (memcg->oom_notify_count == MAX_OOM_NOTIFY_EVENTS) {
+		spin_unlock(&memcg_oom_lock);
+		return -ENOSPC;
+	}
+	/*
+	 * Be optimistic that the allocation succeds and increase the count
+	 * now. This all is done because we have to drop the memcg_oom_lock
+	 * while allocating.
+	 */
+	memcg->oom_notify_count++;
+	spin_unlock(&memcg_oom_lock);
+
 	BUG_ON(type != _OOM_TYPE);
 	event = kmalloc(sizeof(*event),	GFP_KERNEL);
-	if (!event)
+	if (!event) {
+		memcg->oom_notify_count--;
 		return -ENOMEM;
+	}
 
 	spin_lock(&memcg_oom_lock);
 
@@ -5611,6 +5629,7 @@ static void mem_cgroup_oom_unregister_event(struct cgroup *cgrp,
 		if (ev->eventfd == eventfd) {
 			list_del(&ev->list);
 			kfree(ev);
+			memcg->oom_notify_count--;
 		}
 	}
 
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 2/3] memcg: Limit the number of events registered on oom_control
@ 2013-08-07 11:28   ` Michal Hocko
  0 siblings, 0 replies; 44+ messages in thread
From: Michal Hocko @ 2013-08-07 11:28 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, cgroups, Johannes Weiner, KAMEZAWA Hiroyuki,
	Andrew Morton, Tejun Heo, Kirill A. Shutemov, Anton Vorontsov

There is no limit for the maximum number of oom_control events
registered per memcg. This might lead to an user triggered memory
depletion if a regular user is allowed to register events.

Let's be more strict and cap the number of events that might be
registered. MAX_OOM_NOTIFY_EVENTS value is more or less random. The
expectation is that it should be high enough to cover reasonable
usecases while not too high to allow excessive resources consumption.
1024 events consume something like 24KB which shouldn't be a big deal
and it should be good enough (even 1024 oom notification events sounds
crazy).

Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 mm/memcontrol.c |   21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 8247db3..233317a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -273,6 +273,7 @@ struct mem_cgroup {
 	struct mem_cgroup_thresholds memsw_thresholds;
 
 	/* For oom notifier event fd */
+	unsigned int oom_notify_count;
 	struct list_head oom_notify;
 
 	/*
@@ -5571,6 +5572,8 @@ unlock:
 	mutex_unlock(&memcg->thresholds_lock);
 }
 
+/* Maximum number of oom notify events per memcg */
+#define MAX_OOM_NOTIFY_EVENTS 1024
 static int mem_cgroup_oom_register_event(struct cgroup *cgrp,
 	struct cftype *cft, struct eventfd_ctx *eventfd, const char *args)
 {
@@ -5578,10 +5581,25 @@ static int mem_cgroup_oom_register_event(struct cgroup *cgrp,
 	struct mem_cgroup_eventfd_list *event;
 	enum res_type type = MEMFILE_TYPE(cft->private);
 
+	spin_lock(&memcg_oom_lock);
+	if (memcg->oom_notify_count == MAX_OOM_NOTIFY_EVENTS) {
+		spin_unlock(&memcg_oom_lock);
+		return -ENOSPC;
+	}
+	/*
+	 * Be optimistic that the allocation succeds and increase the count
+	 * now. This all is done because we have to drop the memcg_oom_lock
+	 * while allocating.
+	 */
+	memcg->oom_notify_count++;
+	spin_unlock(&memcg_oom_lock);
+
 	BUG_ON(type != _OOM_TYPE);
 	event = kmalloc(sizeof(*event),	GFP_KERNEL);
-	if (!event)
+	if (!event) {
+		memcg->oom_notify_count--;
 		return -ENOMEM;
+	}
 
 	spin_lock(&memcg_oom_lock);
 
@@ -5611,6 +5629,7 @@ static void mem_cgroup_oom_unregister_event(struct cgroup *cgrp,
 		if (ev->eventfd == eventfd) {
 			list_del(&ev->list);
 			kfree(ev);
+			memcg->oom_notify_count--;
 		}
 	}
 
-- 
1.7.10.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 3/3] vmpressure: limit the number of registered events
  2013-08-07 11:28 ` Michal Hocko
@ 2013-08-07 11:28   ` Michal Hocko
  -1 siblings, 0 replies; 44+ messages in thread
From: Michal Hocko @ 2013-08-07 11:28 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, cgroups, Johannes Weiner, KAMEZAWA Hiroyuki,
	Andrew Morton, Tejun Heo, Kirill A. Shutemov, Anton Vorontsov

There is no limit for the maximum number of vmpressure events registered
per memcg. This might lead to an user triggered memory depletion if a
regular user is allowed to register events.

Let's be more strict and cap the number of events that might be
registered. MAX_VMPRESSURE_EVENTS value is more or less random. The
expectation is that it should be high enough to cover reasonable
usecases while not too high to allow excessive resources consumption.
1024 events consume something like 32KB which shouldn't be a big deal
and it should be good enough.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 include/linux/vmpressure.h |    2 ++
 mm/vmpressure.c            |   21 +++++++++++++++++----
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/include/linux/vmpressure.h b/include/linux/vmpressure.h
index 7dc17e2..474230c 100644
--- a/include/linux/vmpressure.h
+++ b/include/linux/vmpressure.h
@@ -14,6 +14,8 @@ struct vmpressure {
 	/* The lock is used to keep the scanned/reclaimed above in sync. */
 	struct spinlock sr_lock;
 
+	/* Number of registered events. */
+	unsigned int events_count;
 	/* The list of vmpressure_event structs. */
 	struct list_head events;
 	/* Have to grab the lock on events traversal or modifications. */
diff --git a/mm/vmpressure.c b/mm/vmpressure.c
index 0c1e37d..bc9d546 100644
--- a/mm/vmpressure.c
+++ b/mm/vmpressure.c
@@ -281,6 +281,9 @@ void vmpressure_prio(gfp_t gfp, struct mem_cgroup *memcg, int prio)
 	vmpressure(gfp, memcg, vmpressure_win, 0);
 }
 
+/* Maximum number of events registered per group */
+#define MAX_VMPRESSURE_EVENTS 1024
+
 /**
  * vmpressure_register_event() - Bind vmpressure notifications to an eventfd
  * @cg:		cgroup that is interested in vmpressure notifications
@@ -304,6 +307,7 @@ int vmpressure_register_event(struct cgroup *cg, struct cftype *cft,
 	struct vmpressure *vmpr = cg_to_vmpressure(cg);
 	struct vmpressure_event *ev;
 	int level;
+	int ret = 0;
 
 	for (level = 0; level < VMPRESSURE_NUM_LEVELS; level++) {
 		if (!strcmp(vmpressure_str_levels[level], args))
@@ -313,18 +317,26 @@ int vmpressure_register_event(struct cgroup *cg, struct cftype *cft,
 	if (level >= VMPRESSURE_NUM_LEVELS)
 		return -EINVAL;
 
+	mutex_lock(&vmpr->events_lock);
+	if (vmpr->events_count == MAX_VMPRESSURE_EVENTS) {
+		ret = -ENOSPC;
+		goto unlock;
+	}
 	ev = kzalloc(sizeof(*ev), GFP_KERNEL);
-	if (!ev)
-		return -ENOMEM;
+	if (!ev) {
+		ret = -ENOMEM;
+		goto unlock;
+	}
 
 	ev->efd = eventfd;
 	ev->level = level;
 
-	mutex_lock(&vmpr->events_lock);
+	vmpr->events_count++;
 	list_add(&ev->node, &vmpr->events);
+unlock:
 	mutex_unlock(&vmpr->events_lock);
 
-	return 0;
+	return ret;
 }
 
 /**
@@ -351,6 +363,7 @@ void vmpressure_unregister_event(struct cgroup *cg, struct cftype *cft,
 	list_for_each_entry(ev, &vmpr->events, node) {
 		if (ev->efd != eventfd)
 			continue;
+		vmpr->events_count--;
 		list_del(&ev->node);
 		kfree(ev);
 		break;
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 3/3] vmpressure: limit the number of registered events
@ 2013-08-07 11:28   ` Michal Hocko
  0 siblings, 0 replies; 44+ messages in thread
From: Michal Hocko @ 2013-08-07 11:28 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, cgroups, Johannes Weiner, KAMEZAWA Hiroyuki,
	Andrew Morton, Tejun Heo, Kirill A. Shutemov, Anton Vorontsov

There is no limit for the maximum number of vmpressure events registered
per memcg. This might lead to an user triggered memory depletion if a
regular user is allowed to register events.

Let's be more strict and cap the number of events that might be
registered. MAX_VMPRESSURE_EVENTS value is more or less random. The
expectation is that it should be high enough to cover reasonable
usecases while not too high to allow excessive resources consumption.
1024 events consume something like 32KB which shouldn't be a big deal
and it should be good enough.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 include/linux/vmpressure.h |    2 ++
 mm/vmpressure.c            |   21 +++++++++++++++++----
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/include/linux/vmpressure.h b/include/linux/vmpressure.h
index 7dc17e2..474230c 100644
--- a/include/linux/vmpressure.h
+++ b/include/linux/vmpressure.h
@@ -14,6 +14,8 @@ struct vmpressure {
 	/* The lock is used to keep the scanned/reclaimed above in sync. */
 	struct spinlock sr_lock;
 
+	/* Number of registered events. */
+	unsigned int events_count;
 	/* The list of vmpressure_event structs. */
 	struct list_head events;
 	/* Have to grab the lock on events traversal or modifications. */
diff --git a/mm/vmpressure.c b/mm/vmpressure.c
index 0c1e37d..bc9d546 100644
--- a/mm/vmpressure.c
+++ b/mm/vmpressure.c
@@ -281,6 +281,9 @@ void vmpressure_prio(gfp_t gfp, struct mem_cgroup *memcg, int prio)
 	vmpressure(gfp, memcg, vmpressure_win, 0);
 }
 
+/* Maximum number of events registered per group */
+#define MAX_VMPRESSURE_EVENTS 1024
+
 /**
  * vmpressure_register_event() - Bind vmpressure notifications to an eventfd
  * @cg:		cgroup that is interested in vmpressure notifications
@@ -304,6 +307,7 @@ int vmpressure_register_event(struct cgroup *cg, struct cftype *cft,
 	struct vmpressure *vmpr = cg_to_vmpressure(cg);
 	struct vmpressure_event *ev;
 	int level;
+	int ret = 0;
 
 	for (level = 0; level < VMPRESSURE_NUM_LEVELS; level++) {
 		if (!strcmp(vmpressure_str_levels[level], args))
@@ -313,18 +317,26 @@ int vmpressure_register_event(struct cgroup *cg, struct cftype *cft,
 	if (level >= VMPRESSURE_NUM_LEVELS)
 		return -EINVAL;
 
+	mutex_lock(&vmpr->events_lock);
+	if (vmpr->events_count == MAX_VMPRESSURE_EVENTS) {
+		ret = -ENOSPC;
+		goto unlock;
+	}
 	ev = kzalloc(sizeof(*ev), GFP_KERNEL);
-	if (!ev)
-		return -ENOMEM;
+	if (!ev) {
+		ret = -ENOMEM;
+		goto unlock;
+	}
 
 	ev->efd = eventfd;
 	ev->level = level;
 
-	mutex_lock(&vmpr->events_lock);
+	vmpr->events_count++;
 	list_add(&ev->node, &vmpr->events);
+unlock:
 	mutex_unlock(&vmpr->events_lock);
 
-	return 0;
+	return ret;
 }
 
 /**
@@ -351,6 +363,7 @@ void vmpressure_unregister_event(struct cgroup *cg, struct cftype *cft,
 	list_for_each_entry(ev, &vmpr->events, node) {
 		if (ev->efd != eventfd)
 			continue;
+		vmpr->events_count--;
 		list_del(&ev->node);
 		kfree(ev);
 		break;
-- 
1.7.10.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH 2/3] memcg: Limit the number of events registered on oom_control
  2013-08-07 11:28   ` Michal Hocko
  (?)
@ 2013-08-07 13:08     ` Tejun Heo
  -1 siblings, 0 replies; 44+ messages in thread
From: Tejun Heo @ 2013-08-07 13:08 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, linux-kernel, cgroups, Johannes Weiner,
	KAMEZAWA Hiroyuki, Andrew Morton, Kirill A. Shutemov,
	Anton Vorontsov

Hello, Michal.

On Wed, Aug 07, 2013 at 01:28:26PM +0200, Michal Hocko wrote:
> There is no limit for the maximum number of oom_control events
> registered per memcg. This might lead to an user triggered memory
> depletion if a regular user is allowed to register events.
> 
> Let's be more strict and cap the number of events that might be
> registered. MAX_OOM_NOTIFY_EVENTS value is more or less random. The
> expectation is that it should be high enough to cover reasonable
> usecases while not too high to allow excessive resources consumption.
> 1024 events consume something like 24KB which shouldn't be a big deal
> and it should be good enough (even 1024 oom notification events sounds
> crazy).

I think putting restriction on usage_event makes sense as that builds
a shared contiguous table from all events which can't be attributed
correctly and makes it easy to trigger allocation failures due to
large order allocation but is this necessary for oom and vmpressure,
both of which allocate only for the listening task?  It isn't
different from listening from epoll, for example.  If there needs to
be kernel memory limit, shouldn't that be handled by kmemcg?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 2/3] memcg: Limit the number of events registered on oom_control
@ 2013-08-07 13:08     ` Tejun Heo
  0 siblings, 0 replies; 44+ messages in thread
From: Tejun Heo @ 2013-08-07 13:08 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, linux-kernel, cgroups, Johannes Weiner,
	KAMEZAWA Hiroyuki, Andrew Morton, Kirill A. Shutemov,
	Anton Vorontsov

Hello, Michal.

On Wed, Aug 07, 2013 at 01:28:26PM +0200, Michal Hocko wrote:
> There is no limit for the maximum number of oom_control events
> registered per memcg. This might lead to an user triggered memory
> depletion if a regular user is allowed to register events.
> 
> Let's be more strict and cap the number of events that might be
> registered. MAX_OOM_NOTIFY_EVENTS value is more or less random. The
> expectation is that it should be high enough to cover reasonable
> usecases while not too high to allow excessive resources consumption.
> 1024 events consume something like 24KB which shouldn't be a big deal
> and it should be good enough (even 1024 oom notification events sounds
> crazy).

I think putting restriction on usage_event makes sense as that builds
a shared contiguous table from all events which can't be attributed
correctly and makes it easy to trigger allocation failures due to
large order allocation but is this necessary for oom and vmpressure,
both of which allocate only for the listening task?  It isn't
different from listening from epoll, for example.  If there needs to
be kernel memory limit, shouldn't that be handled by kmemcg?

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 2/3] memcg: Limit the number of events registered on oom_control
@ 2013-08-07 13:08     ` Tejun Heo
  0 siblings, 0 replies; 44+ messages in thread
From: Tejun Heo @ 2013-08-07 13:08 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Johannes Weiner,
	KAMEZAWA Hiroyuki, Andrew Morton, Kirill A. Shutemov,
	Anton Vorontsov

Hello, Michal.

On Wed, Aug 07, 2013 at 01:28:26PM +0200, Michal Hocko wrote:
> There is no limit for the maximum number of oom_control events
> registered per memcg. This might lead to an user triggered memory
> depletion if a regular user is allowed to register events.
> 
> Let's be more strict and cap the number of events that might be
> registered. MAX_OOM_NOTIFY_EVENTS value is more or less random. The
> expectation is that it should be high enough to cover reasonable
> usecases while not too high to allow excessive resources consumption.
> 1024 events consume something like 24KB which shouldn't be a big deal
> and it should be good enough (even 1024 oom notification events sounds
> crazy).

I think putting restriction on usage_event makes sense as that builds
a shared contiguous table from all events which can't be attributed
correctly and makes it easy to trigger allocation failures due to
large order allocation but is this necessary for oom and vmpressure,
both of which allocate only for the listening task?  It isn't
different from listening from epoll, for example.  If there needs to
be kernel memory limit, shouldn't that be handled by kmemcg?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 2/3] memcg: Limit the number of events registered on oom_control
  2013-08-07 13:08     ` Tejun Heo
@ 2013-08-07 13:11       ` Tejun Heo
  -1 siblings, 0 replies; 44+ messages in thread
From: Tejun Heo @ 2013-08-07 13:11 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, linux-kernel, cgroups, Johannes Weiner,
	KAMEZAWA Hiroyuki, Andrew Morton, Kirill A. Shutemov,
	Anton Vorontsov

On Wed, Aug 07, 2013 at 09:08:36AM -0400, Tejun Heo wrote:
> Hello, Michal.
> 
> On Wed, Aug 07, 2013 at 01:28:26PM +0200, Michal Hocko wrote:
> > There is no limit for the maximum number of oom_control events
> > registered per memcg. This might lead to an user triggered memory
> > depletion if a regular user is allowed to register events.
> > 
> > Let's be more strict and cap the number of events that might be
> > registered. MAX_OOM_NOTIFY_EVENTS value is more or less random. The
> > expectation is that it should be high enough to cover reasonable
> > usecases while not too high to allow excessive resources consumption.
> > 1024 events consume something like 24KB which shouldn't be a big deal
> > and it should be good enough (even 1024 oom notification events sounds
> > crazy).
> 
> I think putting restriction on usage_event makes sense as that builds
> a shared contiguous table from all events which can't be attributed
> correctly and makes it easy to trigger allocation failures due to
> large order allocation but is this necessary for oom and vmpressure,
> both of which allocate only for the listening task?  It isn't
> different from listening from epoll, for example.  If there needs to
> be kernel memory limit, shouldn't that be handled by kmemcg?

To add a bit, adding this global limit actually makes these events
prone to DoS attack regardless of kernel memory usage limit.  Given
that the whole usage model of delegating access to the file is busted,
I don't think it matters all that much but I'm not sure what this
patch is protecting against.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 2/3] memcg: Limit the number of events registered on oom_control
@ 2013-08-07 13:11       ` Tejun Heo
  0 siblings, 0 replies; 44+ messages in thread
From: Tejun Heo @ 2013-08-07 13:11 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, linux-kernel, cgroups, Johannes Weiner,
	KAMEZAWA Hiroyuki, Andrew Morton, Kirill A. Shutemov,
	Anton Vorontsov

On Wed, Aug 07, 2013 at 09:08:36AM -0400, Tejun Heo wrote:
> Hello, Michal.
> 
> On Wed, Aug 07, 2013 at 01:28:26PM +0200, Michal Hocko wrote:
> > There is no limit for the maximum number of oom_control events
> > registered per memcg. This might lead to an user triggered memory
> > depletion if a regular user is allowed to register events.
> > 
> > Let's be more strict and cap the number of events that might be
> > registered. MAX_OOM_NOTIFY_EVENTS value is more or less random. The
> > expectation is that it should be high enough to cover reasonable
> > usecases while not too high to allow excessive resources consumption.
> > 1024 events consume something like 24KB which shouldn't be a big deal
> > and it should be good enough (even 1024 oom notification events sounds
> > crazy).
> 
> I think putting restriction on usage_event makes sense as that builds
> a shared contiguous table from all events which can't be attributed
> correctly and makes it easy to trigger allocation failures due to
> large order allocation but is this necessary for oom and vmpressure,
> both of which allocate only for the listening task?  It isn't
> different from listening from epoll, for example.  If there needs to
> be kernel memory limit, shouldn't that be handled by kmemcg?

To add a bit, adding this global limit actually makes these events
prone to DoS attack regardless of kernel memory usage limit.  Given
that the whole usage model of delegating access to the file is busted,
I don't think it matters all that much but I'm not sure what this
patch is protecting against.

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/3] memcg: limit the number of thresholds per-memcg
  2013-08-07 11:28 ` Michal Hocko
  (?)
@ 2013-08-07 13:22   ` Tejun Heo
  -1 siblings, 0 replies; 44+ messages in thread
From: Tejun Heo @ 2013-08-07 13:22 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, linux-kernel, cgroups, Johannes Weiner,
	KAMEZAWA Hiroyuki, Andrew Morton, Kirill A. Shutemov,
	Anton Vorontsov

Hello,

On Wed, Aug 07, 2013 at 01:28:25PM +0200, Michal Hocko wrote:
> There is no limit for the maximum number of threshold events registered
> per memcg. This might lead to an user triggered memory depletion if a
> regular user is allowed to register on memory.[memsw.]usage_in_bytes
> eventfd interface.
> 
> Let's be more strict and cap the number of events that might be
> registered. MAX_THRESHOLD_EVENTS value is more or less random. The
> expectation is that it should be high enough to cover reasonable
> usecases while not too high to allow excessive resources consumption.
> 1024 events consume something like 16KB which shouldn't be a big deal
> and it should be good enough.

I don't think the memory consumption per-se is the issue to be handled
here (as kernel memory consumption is a different generic problem) but
rather that all listeners, regardless of their priv level, cgroup
membership and so on, end up contributing to this single shared
contiguous table, which makes it quite easy to do DoS attack on it if
the event control is actually delegated to untrusted security domain,
which BTW kinda makes all these complexities kinda pointless as it
nullifies the only use case (many un-coordinated listeners watching
different thresholds) which the event mechanism can actually do
better.

A proper fix would be making it build sorted data structure, be it
list or tree, and letting each listener insert its own probe at the
appropriate position and updating the event generation maintain cursor
in the tree and fire events as appropriate, but given that the whole
usage model is being obsoleted, it probably isn't worth doing that and
this fixed limit is better than just letting things go and allow
allocation to fail at some point, I suppose.

Can you please update the patch description to reflect the actual
problem?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/3] memcg: limit the number of thresholds per-memcg
@ 2013-08-07 13:22   ` Tejun Heo
  0 siblings, 0 replies; 44+ messages in thread
From: Tejun Heo @ 2013-08-07 13:22 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, linux-kernel, cgroups, Johannes Weiner,
	KAMEZAWA Hiroyuki, Andrew Morton, Kirill A. Shutemov,
	Anton Vorontsov

Hello,

On Wed, Aug 07, 2013 at 01:28:25PM +0200, Michal Hocko wrote:
> There is no limit for the maximum number of threshold events registered
> per memcg. This might lead to an user triggered memory depletion if a
> regular user is allowed to register on memory.[memsw.]usage_in_bytes
> eventfd interface.
> 
> Let's be more strict and cap the number of events that might be
> registered. MAX_THRESHOLD_EVENTS value is more or less random. The
> expectation is that it should be high enough to cover reasonable
> usecases while not too high to allow excessive resources consumption.
> 1024 events consume something like 16KB which shouldn't be a big deal
> and it should be good enough.

I don't think the memory consumption per-se is the issue to be handled
here (as kernel memory consumption is a different generic problem) but
rather that all listeners, regardless of their priv level, cgroup
membership and so on, end up contributing to this single shared
contiguous table, which makes it quite easy to do DoS attack on it if
the event control is actually delegated to untrusted security domain,
which BTW kinda makes all these complexities kinda pointless as it
nullifies the only use case (many un-coordinated listeners watching
different thresholds) which the event mechanism can actually do
better.

A proper fix would be making it build sorted data structure, be it
list or tree, and letting each listener insert its own probe at the
appropriate position and updating the event generation maintain cursor
in the tree and fire events as appropriate, but given that the whole
usage model is being obsoleted, it probably isn't worth doing that and
this fixed limit is better than just letting things go and allow
allocation to fail at some point, I suppose.

Can you please update the patch description to reflect the actual
problem?

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/3] memcg: limit the number of thresholds per-memcg
@ 2013-08-07 13:22   ` Tejun Heo
  0 siblings, 0 replies; 44+ messages in thread
From: Tejun Heo @ 2013-08-07 13:22 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Johannes Weiner,
	KAMEZAWA Hiroyuki, Andrew Morton, Kirill A. Shutemov,
	Anton Vorontsov

Hello,

On Wed, Aug 07, 2013 at 01:28:25PM +0200, Michal Hocko wrote:
> There is no limit for the maximum number of threshold events registered
> per memcg. This might lead to an user triggered memory depletion if a
> regular user is allowed to register on memory.[memsw.]usage_in_bytes
> eventfd interface.
> 
> Let's be more strict and cap the number of events that might be
> registered. MAX_THRESHOLD_EVENTS value is more or less random. The
> expectation is that it should be high enough to cover reasonable
> usecases while not too high to allow excessive resources consumption.
> 1024 events consume something like 16KB which shouldn't be a big deal
> and it should be good enough.

I don't think the memory consumption per-se is the issue to be handled
here (as kernel memory consumption is a different generic problem) but
rather that all listeners, regardless of their priv level, cgroup
membership and so on, end up contributing to this single shared
contiguous table, which makes it quite easy to do DoS attack on it if
the event control is actually delegated to untrusted security domain,
which BTW kinda makes all these complexities kinda pointless as it
nullifies the only use case (many un-coordinated listeners watching
different thresholds) which the event mechanism can actually do
better.

A proper fix would be making it build sorted data structure, be it
list or tree, and letting each listener insert its own probe at the
appropriate position and updating the event generation maintain cursor
in the tree and fire events as appropriate, but given that the whole
usage model is being obsoleted, it probably isn't worth doing that and
this fixed limit is better than just letting things go and allow
allocation to fail at some point, I suppose.

Can you please update the patch description to reflect the actual
problem?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 2/3] memcg: Limit the number of events registered on oom_control
  2013-08-07 13:08     ` Tejun Heo
@ 2013-08-07 13:37       ` Michal Hocko
  -1 siblings, 0 replies; 44+ messages in thread
From: Michal Hocko @ 2013-08-07 13:37 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-mm, linux-kernel, cgroups, Johannes Weiner,
	KAMEZAWA Hiroyuki, Andrew Morton, Kirill A. Shutemov,
	Anton Vorontsov

On Wed 07-08-13 09:08:36, Tejun Heo wrote:
> Hello, Michal.
> 
> On Wed, Aug 07, 2013 at 01:28:26PM +0200, Michal Hocko wrote:
> > There is no limit for the maximum number of oom_control events
> > registered per memcg. This might lead to an user triggered memory
> > depletion if a regular user is allowed to register events.
> > 
> > Let's be more strict and cap the number of events that might be
> > registered. MAX_OOM_NOTIFY_EVENTS value is more or less random. The
> > expectation is that it should be high enough to cover reasonable
> > usecases while not too high to allow excessive resources consumption.
> > 1024 events consume something like 24KB which shouldn't be a big deal
> > and it should be good enough (even 1024 oom notification events sounds
> > crazy).
> 
> I think putting restriction on usage_event makes sense as that builds
> a shared contiguous table from all events which can't be attributed
> correctly and makes it easy to trigger allocation failures due to
> large order allocation but is this necessary for oom and vmpressure,
> both of which allocate only for the listening task?

Once I was there I made them consistent in that regards.

> It isn't different from listening from epoll, for example.

epoll limits the number of watchers, no?

> If there needs to be kernel memory limit, shouldn't that be handled by
> kmemcg?

kmemcg would surely help but turning it on just because of potential
abuse of the event registration API sounds like an overkill.

I think having a cap for user trigable kernel resources is a good thing
in general.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 2/3] memcg: Limit the number of events registered on oom_control
@ 2013-08-07 13:37       ` Michal Hocko
  0 siblings, 0 replies; 44+ messages in thread
From: Michal Hocko @ 2013-08-07 13:37 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-mm, linux-kernel, cgroups, Johannes Weiner,
	KAMEZAWA Hiroyuki, Andrew Morton, Kirill A. Shutemov,
	Anton Vorontsov

On Wed 07-08-13 09:08:36, Tejun Heo wrote:
> Hello, Michal.
> 
> On Wed, Aug 07, 2013 at 01:28:26PM +0200, Michal Hocko wrote:
> > There is no limit for the maximum number of oom_control events
> > registered per memcg. This might lead to an user triggered memory
> > depletion if a regular user is allowed to register events.
> > 
> > Let's be more strict and cap the number of events that might be
> > registered. MAX_OOM_NOTIFY_EVENTS value is more or less random. The
> > expectation is that it should be high enough to cover reasonable
> > usecases while not too high to allow excessive resources consumption.
> > 1024 events consume something like 24KB which shouldn't be a big deal
> > and it should be good enough (even 1024 oom notification events sounds
> > crazy).
> 
> I think putting restriction on usage_event makes sense as that builds
> a shared contiguous table from all events which can't be attributed
> correctly and makes it easy to trigger allocation failures due to
> large order allocation but is this necessary for oom and vmpressure,
> both of which allocate only for the listening task?

Once I was there I made them consistent in that regards.

> It isn't different from listening from epoll, for example.

epoll limits the number of watchers, no?

> If there needs to be kernel memory limit, shouldn't that be handled by
> kmemcg?

kmemcg would surely help but turning it on just because of potential
abuse of the event registration API sounds like an overkill.

I think having a cap for user trigable kernel resources is a good thing
in general.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/3] memcg: limit the number of thresholds per-memcg
  2013-08-07 13:22   ` Tejun Heo
@ 2013-08-07 13:46     ` Michal Hocko
  -1 siblings, 0 replies; 44+ messages in thread
From: Michal Hocko @ 2013-08-07 13:46 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-mm, linux-kernel, cgroups, Johannes Weiner,
	KAMEZAWA Hiroyuki, Andrew Morton, Kirill A. Shutemov,
	Anton Vorontsov

On Wed 07-08-13 09:22:10, Tejun Heo wrote:
> Hello,
> 
> On Wed, Aug 07, 2013 at 01:28:25PM +0200, Michal Hocko wrote:
> > There is no limit for the maximum number of threshold events registered
> > per memcg. This might lead to an user triggered memory depletion if a
> > regular user is allowed to register on memory.[memsw.]usage_in_bytes
> > eventfd interface.
> > 
> > Let's be more strict and cap the number of events that might be
> > registered. MAX_THRESHOLD_EVENTS value is more or less random. The
> > expectation is that it should be high enough to cover reasonable
> > usecases while not too high to allow excessive resources consumption.
> > 1024 events consume something like 16KB which shouldn't be a big deal
> > and it should be good enough.
> 
> I don't think the memory consumption per-se is the issue to be handled
> here (as kernel memory consumption is a different generic problem) but
> rather that all listeners, regardless of their priv level, cgroup
> membership and so on, end up contributing to this single shared
> contiguous table,

The table is per-memcg but you are right that everybody who has file
write access to the particular group's usage file can register to it.

> which makes it quite easy to do DoS attack on it if
> the event control is actually delegated to untrusted security domain,

OK, I have obviously misunderstood your concern mentioned in the other
email. Could you be more specific what is the DoS scenario which was
your concern, then?

[...]
> Can you please update the patch description to reflect the actual
> problem?

As soon as I understand what is your concern ;)
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/3] memcg: limit the number of thresholds per-memcg
@ 2013-08-07 13:46     ` Michal Hocko
  0 siblings, 0 replies; 44+ messages in thread
From: Michal Hocko @ 2013-08-07 13:46 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-mm, linux-kernel, cgroups, Johannes Weiner,
	KAMEZAWA Hiroyuki, Andrew Morton, Kirill A. Shutemov,
	Anton Vorontsov

On Wed 07-08-13 09:22:10, Tejun Heo wrote:
> Hello,
> 
> On Wed, Aug 07, 2013 at 01:28:25PM +0200, Michal Hocko wrote:
> > There is no limit for the maximum number of threshold events registered
> > per memcg. This might lead to an user triggered memory depletion if a
> > regular user is allowed to register on memory.[memsw.]usage_in_bytes
> > eventfd interface.
> > 
> > Let's be more strict and cap the number of events that might be
> > registered. MAX_THRESHOLD_EVENTS value is more or less random. The
> > expectation is that it should be high enough to cover reasonable
> > usecases while not too high to allow excessive resources consumption.
> > 1024 events consume something like 16KB which shouldn't be a big deal
> > and it should be good enough.
> 
> I don't think the memory consumption per-se is the issue to be handled
> here (as kernel memory consumption is a different generic problem) but
> rather that all listeners, regardless of their priv level, cgroup
> membership and so on, end up contributing to this single shared
> contiguous table,

The table is per-memcg but you are right that everybody who has file
write access to the particular group's usage file can register to it.

> which makes it quite easy to do DoS attack on it if
> the event control is actually delegated to untrusted security domain,

OK, I have obviously misunderstood your concern mentioned in the other
email. Could you be more specific what is the DoS scenario which was
your concern, then?

[...]
> Can you please update the patch description to reflect the actual
> problem?

As soon as I understand what is your concern ;)
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 2/3] memcg: Limit the number of events registered on oom_control
  2013-08-07 13:37       ` Michal Hocko
@ 2013-08-07 13:47         ` Tejun Heo
  -1 siblings, 0 replies; 44+ messages in thread
From: Tejun Heo @ 2013-08-07 13:47 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, linux-kernel, cgroups, Johannes Weiner,
	KAMEZAWA Hiroyuki, Andrew Morton, Kirill A. Shutemov,
	Anton Vorontsov

Hello,

On Wed, Aug 07, 2013 at 03:37:46PM +0200, Michal Hocko wrote:
> > It isn't different from listening from epoll, for example.
> 
> epoll limits the number of watchers, no?

Not that I know of.  It'll be limited by max open fds but I don't
think there are other limits.  Why would there be?

> > If there needs to be kernel memory limit, shouldn't that be handled by
> > kmemcg?
> 
> kmemcg would surely help but turning it on just because of potential
> abuse of the event registration API sounds like an overkill.
> 
> I think having a cap for user trigable kernel resources is a good thing
> in general.

I don't know.  It's just very arbitrary because listening to events
itself isn't (and shouldn't) be something which consumes resource
which isn't attributed to the listener and this artificially creates a
global resource.  The problem with memory usage event is breaching
that rule with shared kmalloc() so putting well-defined limit on it is
fine but the latter two create additional artificial restrictions
which are both unnecessary and unconventional.  No?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 2/3] memcg: Limit the number of events registered on oom_control
@ 2013-08-07 13:47         ` Tejun Heo
  0 siblings, 0 replies; 44+ messages in thread
From: Tejun Heo @ 2013-08-07 13:47 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, linux-kernel, cgroups, Johannes Weiner,
	KAMEZAWA Hiroyuki, Andrew Morton, Kirill A. Shutemov,
	Anton Vorontsov

Hello,

On Wed, Aug 07, 2013 at 03:37:46PM +0200, Michal Hocko wrote:
> > It isn't different from listening from epoll, for example.
> 
> epoll limits the number of watchers, no?

Not that I know of.  It'll be limited by max open fds but I don't
think there are other limits.  Why would there be?

> > If there needs to be kernel memory limit, shouldn't that be handled by
> > kmemcg?
> 
> kmemcg would surely help but turning it on just because of potential
> abuse of the event registration API sounds like an overkill.
> 
> I think having a cap for user trigable kernel resources is a good thing
> in general.

I don't know.  It's just very arbitrary because listening to events
itself isn't (and shouldn't) be something which consumes resource
which isn't attributed to the listener and this artificially creates a
global resource.  The problem with memory usage event is breaching
that rule with shared kmalloc() so putting well-defined limit on it is
fine but the latter two create additional artificial restrictions
which are both unnecessary and unconventional.  No?

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 2/3] memcg: Limit the number of events registered on oom_control
  2013-08-07 13:47         ` Tejun Heo
  (?)
@ 2013-08-07 13:57           ` Michal Hocko
  -1 siblings, 0 replies; 44+ messages in thread
From: Michal Hocko @ 2013-08-07 13:57 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-mm, linux-kernel, cgroups, Johannes Weiner,
	KAMEZAWA Hiroyuki, Andrew Morton, Kirill A. Shutemov,
	Anton Vorontsov

On Wed 07-08-13 09:47:41, Tejun Heo wrote:
> Hello,
> 
> On Wed, Aug 07, 2013 at 03:37:46PM +0200, Michal Hocko wrote:
> > > It isn't different from listening from epoll, for example.
> > 
> > epoll limits the number of watchers, no?
> 
> Not that I know of.  It'll be limited by max open fds but I don't
> think there are other limits. 

max_user_watches seems to be a limit (4% of lowmem in maximum).

> Why would there be?

Because userspace should hog kernel resources without any limit.

> > > If there needs to be kernel memory limit, shouldn't that be handled by
> > > kmemcg?
> > 
> > kmemcg would surely help but turning it on just because of potential
> > abuse of the event registration API sounds like an overkill.
> > 
> > I think having a cap for user trigable kernel resources is a good thing
> > in general.
> 
> I don't know.  It's just very arbitrary because listening to events
> itself isn't (and shouldn't) be something which consumes resource
> which isn't attributed to the listener and this artificially creates a
> global resource.  The problem with memory usage event is breaching
> that rule with shared kmalloc() so putting well-defined limit on it is
> fine but the latter two create additional artificial restrictions
> which are both unnecessary and unconventional.  No?

Hmm, OK so you think that the fd limit is sufficient already?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 2/3] memcg: Limit the number of events registered on oom_control
@ 2013-08-07 13:57           ` Michal Hocko
  0 siblings, 0 replies; 44+ messages in thread
From: Michal Hocko @ 2013-08-07 13:57 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-mm, linux-kernel, cgroups, Johannes Weiner,
	KAMEZAWA Hiroyuki, Andrew Morton, Kirill A. Shutemov,
	Anton Vorontsov

On Wed 07-08-13 09:47:41, Tejun Heo wrote:
> Hello,
> 
> On Wed, Aug 07, 2013 at 03:37:46PM +0200, Michal Hocko wrote:
> > > It isn't different from listening from epoll, for example.
> > 
> > epoll limits the number of watchers, no?
> 
> Not that I know of.  It'll be limited by max open fds but I don't
> think there are other limits. 

max_user_watches seems to be a limit (4% of lowmem in maximum).

> Why would there be?

Because userspace should hog kernel resources without any limit.

> > > If there needs to be kernel memory limit, shouldn't that be handled by
> > > kmemcg?
> > 
> > kmemcg would surely help but turning it on just because of potential
> > abuse of the event registration API sounds like an overkill.
> > 
> > I think having a cap for user trigable kernel resources is a good thing
> > in general.
> 
> I don't know.  It's just very arbitrary because listening to events
> itself isn't (and shouldn't) be something which consumes resource
> which isn't attributed to the listener and this artificially creates a
> global resource.  The problem with memory usage event is breaching
> that rule with shared kmalloc() so putting well-defined limit on it is
> fine but the latter two create additional artificial restrictions
> which are both unnecessary and unconventional.  No?

Hmm, OK so you think that the fd limit is sufficient already?
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 2/3] memcg: Limit the number of events registered on oom_control
@ 2013-08-07 13:57           ` Michal Hocko
  0 siblings, 0 replies; 44+ messages in thread
From: Michal Hocko @ 2013-08-07 13:57 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Johannes Weiner,
	KAMEZAWA Hiroyuki, Andrew Morton, Kirill A. Shutemov,
	Anton Vorontsov

On Wed 07-08-13 09:47:41, Tejun Heo wrote:
> Hello,
> 
> On Wed, Aug 07, 2013 at 03:37:46PM +0200, Michal Hocko wrote:
> > > It isn't different from listening from epoll, for example.
> > 
> > epoll limits the number of watchers, no?
> 
> Not that I know of.  It'll be limited by max open fds but I don't
> think there are other limits. 

max_user_watches seems to be a limit (4% of lowmem in maximum).

> Why would there be?

Because userspace should hog kernel resources without any limit.

> > > If there needs to be kernel memory limit, shouldn't that be handled by
> > > kmemcg?
> > 
> > kmemcg would surely help but turning it on just because of potential
> > abuse of the event registration API sounds like an overkill.
> > 
> > I think having a cap for user trigable kernel resources is a good thing
> > in general.
> 
> I don't know.  It's just very arbitrary because listening to events
> itself isn't (and shouldn't) be something which consumes resource
> which isn't attributed to the listener and this artificially creates a
> global resource.  The problem with memory usage event is breaching
> that rule with shared kmalloc() so putting well-defined limit on it is
> fine but the latter two create additional artificial restrictions
> which are both unnecessary and unconventional.  No?

Hmm, OK so you think that the fd limit is sufficient already?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/3] memcg: limit the number of thresholds per-memcg
  2013-08-07 13:46     ` Michal Hocko
@ 2013-08-07 13:58       ` Tejun Heo
  -1 siblings, 0 replies; 44+ messages in thread
From: Tejun Heo @ 2013-08-07 13:58 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, linux-kernel, cgroups, Johannes Weiner,
	KAMEZAWA Hiroyuki, Andrew Morton, Kirill A. Shutemov,
	Anton Vorontsov

Hello,

On Wed, Aug 07, 2013 at 03:46:54PM +0200, Michal Hocko wrote:
> OK, I have obviously misunderstood your concern mentioned in the other
> email. Could you be more specific what is the DoS scenario which was
> your concern, then?

So, let's say the file is write-accessible to !priv user which is
under reasonable resource limits.  Normally this shouldn't affect priv
system tools which are monitoring the same event as it shouldn't be
able to deplete resources as long as the resource control mechanisms
are configured and functioning properly; however, the memory usage
event puts all event listeners into a single contiguous table which a
!priv user can easily expand to a size where the table can no longer
be enlarged and if a priv system tool or another user tries to
register event afterwards, it'll fail.  IOW, it creates a shared
resource which isn't properly provisioned and can be trivially filled
up making it an easy DoS target.

Putting an extra limit on it isn't an actual solution but could be
better, I think.  It at least makes it clear that this is a limited
global resource.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/3] memcg: limit the number of thresholds per-memcg
@ 2013-08-07 13:58       ` Tejun Heo
  0 siblings, 0 replies; 44+ messages in thread
From: Tejun Heo @ 2013-08-07 13:58 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, linux-kernel, cgroups, Johannes Weiner,
	KAMEZAWA Hiroyuki, Andrew Morton, Kirill A. Shutemov,
	Anton Vorontsov

Hello,

On Wed, Aug 07, 2013 at 03:46:54PM +0200, Michal Hocko wrote:
> OK, I have obviously misunderstood your concern mentioned in the other
> email. Could you be more specific what is the DoS scenario which was
> your concern, then?

So, let's say the file is write-accessible to !priv user which is
under reasonable resource limits.  Normally this shouldn't affect priv
system tools which are monitoring the same event as it shouldn't be
able to deplete resources as long as the resource control mechanisms
are configured and functioning properly; however, the memory usage
event puts all event listeners into a single contiguous table which a
!priv user can easily expand to a size where the table can no longer
be enlarged and if a priv system tool or another user tries to
register event afterwards, it'll fail.  IOW, it creates a shared
resource which isn't properly provisioned and can be trivially filled
up making it an easy DoS target.

Putting an extra limit on it isn't an actual solution but could be
better, I think.  It at least makes it clear that this is a limited
global resource.

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 2/3] memcg: Limit the number of events registered on oom_control
  2013-08-07 13:57           ` Michal Hocko
@ 2013-08-07 14:01             ` Tejun Heo
  -1 siblings, 0 replies; 44+ messages in thread
From: Tejun Heo @ 2013-08-07 14:01 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, linux-kernel, cgroups, Johannes Weiner,
	KAMEZAWA Hiroyuki, Andrew Morton, Kirill A. Shutemov,
	Anton Vorontsov

On Wed, Aug 07, 2013 at 03:57:34PM +0200, Michal Hocko wrote:
> On Wed 07-08-13 09:47:41, Tejun Heo wrote:
> > Hello,
> > 
> > On Wed, Aug 07, 2013 at 03:37:46PM +0200, Michal Hocko wrote:
> > > > It isn't different from listening from epoll, for example.
> > > 
> > > epoll limits the number of watchers, no?
> > 
> > Not that I know of.  It'll be limited by max open fds but I don't
> > think there are other limits. 
> 
> max_user_watches seems to be a limit (4% of lowmem in maximum).

That's per *user* not per event source.  The problem here is creating
a global (across securit domains) resource shared by all users.

-- 
tejun

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 2/3] memcg: Limit the number of events registered on oom_control
@ 2013-08-07 14:01             ` Tejun Heo
  0 siblings, 0 replies; 44+ messages in thread
From: Tejun Heo @ 2013-08-07 14:01 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, linux-kernel, cgroups, Johannes Weiner,
	KAMEZAWA Hiroyuki, Andrew Morton, Kirill A. Shutemov,
	Anton Vorontsov

On Wed, Aug 07, 2013 at 03:57:34PM +0200, Michal Hocko wrote:
> On Wed 07-08-13 09:47:41, Tejun Heo wrote:
> > Hello,
> > 
> > On Wed, Aug 07, 2013 at 03:37:46PM +0200, Michal Hocko wrote:
> > > > It isn't different from listening from epoll, for example.
> > > 
> > > epoll limits the number of watchers, no?
> > 
> > Not that I know of.  It'll be limited by max open fds but I don't
> > think there are other limits. 
> 
> max_user_watches seems to be a limit (4% of lowmem in maximum).

That's per *user* not per event source.  The problem here is creating
a global (across securit domains) resource shared by all users.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/3] memcg: limit the number of thresholds per-memcg
  2013-08-07 13:58       ` Tejun Heo
  (?)
@ 2013-08-07 14:37         ` Michal Hocko
  -1 siblings, 0 replies; 44+ messages in thread
From: Michal Hocko @ 2013-08-07 14:37 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-mm, linux-kernel, cgroups, Johannes Weiner,
	KAMEZAWA Hiroyuki, Andrew Morton, Kirill A. Shutemov,
	Anton Vorontsov

On Wed 07-08-13 09:58:18, Tejun Heo wrote:
> Hello,
> 
> On Wed, Aug 07, 2013 at 03:46:54PM +0200, Michal Hocko wrote:
> > OK, I have obviously misunderstood your concern mentioned in the other
> > email. Could you be more specific what is the DoS scenario which was
> > your concern, then?
> 
> So, let's say the file is write-accessible to !priv user which is
> under reasonable resource limits.  Normally this shouldn't affect priv
> system tools which are monitoring the same event as it shouldn't be
> able to deplete resources as long as the resource control mechanisms
> are configured and functioning properly; however, the memory usage
> event puts all event listeners into a single contiguous table which a
> !priv user can easily expand to a size where the table can no longer
> be enlarged and if a priv system tool or another user tries to
> register event afterwards, it'll fail.  IOW, it creates a shared
> resource which isn't properly provisioned and can be trivially filled
> up making it an easy DoS target.

OK, got your point. You are right and I haven't considered the size of
the table and the size restrictions of kmalloc. Thanks for pointing this
out!
---
>From cde8a3333296eddd288780e78803610127401b6a Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.cz>
Date: Wed, 7 Aug 2013 11:11:22 +0200
Subject: [PATCH] memcg: limit the number of thresholds per-memcg

There is no limit for the maximum number of threshold events registered
per memcg. It is even worse that all the events are stored in a
per-memcg table which is enlarged when a new event is registered. This
can lead to the following issue mentioned by Tejun:
"
So, let's say the file is write-accessible to !priv user which is
under reasonable resource limits.  Normally this shouldn't affect priv
system tools which are monitoring the same event as it shouldn't be
able to deplete resources as long as the resource control mechanisms
are configured and functioning properly; however, the memory usage
event puts all event listeners into a single contiguous table which a
!priv user can easily expand to a size where the table can no longer
be enlarged and if a priv system tool or another user tries to
register event afterwards, it'll fail.  IOW, it creates a shared
resource which isn't properly provisioned and can be trivially filled
up making it an easy DoS target.
"

Let's be more strict and cap the number of events that might be
registered. MAX_THRESHOLD_EVENTS value is more or less random. The
expectation is that it should be high enough to cover reasonable
usecases while not too high to allow excessive resources consumption.
1024 events consume something like 16KB which shouldn't be a big deal
and it should be good enough.

Reported-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 mm/memcontrol.c |    8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e4330cd..8247db3 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5401,6 +5401,9 @@ static void mem_cgroup_oom_notify(struct mem_cgroup *memcg)
 		mem_cgroup_oom_notify_cb(iter);
 }
 
+/* Maximum number of treshold events registered per memcg. */
+#define MAX_THRESHOLD_EVENTS	1024
+
 static int mem_cgroup_usage_register_event(struct cgroup *cgrp,
 	struct cftype *cft, struct eventfd_ctx *eventfd, const char *args)
 {
@@ -5424,6 +5427,11 @@ static int mem_cgroup_usage_register_event(struct cgroup *cgrp,
 	else
 		BUG();
 
+	if (thresholds->primary->size == MAX_THRESHOLD_EVENTS) {
+		ret = -ENOSPC;
+		goto unlock;
+	}
+
 	usage = mem_cgroup_usage(memcg, type == _MEMSWAP);
 
 	/* Check if a threshold crossed before adding a new one */
-- 
1.7.10.4

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/3] memcg: limit the number of thresholds per-memcg
@ 2013-08-07 14:37         ` Michal Hocko
  0 siblings, 0 replies; 44+ messages in thread
From: Michal Hocko @ 2013-08-07 14:37 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-mm, linux-kernel, cgroups, Johannes Weiner,
	KAMEZAWA Hiroyuki, Andrew Morton, Kirill A. Shutemov,
	Anton Vorontsov

On Wed 07-08-13 09:58:18, Tejun Heo wrote:
> Hello,
> 
> On Wed, Aug 07, 2013 at 03:46:54PM +0200, Michal Hocko wrote:
> > OK, I have obviously misunderstood your concern mentioned in the other
> > email. Could you be more specific what is the DoS scenario which was
> > your concern, then?
> 
> So, let's say the file is write-accessible to !priv user which is
> under reasonable resource limits.  Normally this shouldn't affect priv
> system tools which are monitoring the same event as it shouldn't be
> able to deplete resources as long as the resource control mechanisms
> are configured and functioning properly; however, the memory usage
> event puts all event listeners into a single contiguous table which a
> !priv user can easily expand to a size where the table can no longer
> be enlarged and if a priv system tool or another user tries to
> register event afterwards, it'll fail.  IOW, it creates a shared
> resource which isn't properly provisioned and can be trivially filled
> up making it an easy DoS target.

OK, got your point. You are right and I haven't considered the size of
the table and the size restrictions of kmalloc. Thanks for pointing this
out!
---

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/3] memcg: limit the number of thresholds per-memcg
@ 2013-08-07 14:37         ` Michal Hocko
  0 siblings, 0 replies; 44+ messages in thread
From: Michal Hocko @ 2013-08-07 14:37 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Johannes Weiner,
	KAMEZAWA Hiroyuki, Andrew Morton, Kirill A. Shutemov,
	Anton Vorontsov

On Wed 07-08-13 09:58:18, Tejun Heo wrote:
> Hello,
> 
> On Wed, Aug 07, 2013 at 03:46:54PM +0200, Michal Hocko wrote:
> > OK, I have obviously misunderstood your concern mentioned in the other
> > email. Could you be more specific what is the DoS scenario which was
> > your concern, then?
> 
> So, let's say the file is write-accessible to !priv user which is
> under reasonable resource limits.  Normally this shouldn't affect priv
> system tools which are monitoring the same event as it shouldn't be
> able to deplete resources as long as the resource control mechanisms
> are configured and functioning properly; however, the memory usage
> event puts all event listeners into a single contiguous table which a
> !priv user can easily expand to a size where the table can no longer
> be enlarged and if a priv system tool or another user tries to
> register event afterwards, it'll fail.  IOW, it creates a shared
> resource which isn't properly provisioned and can be trivially filled
> up making it an easy DoS target.

OK, got your point. You are right and I haven't considered the size of
the table and the size restrictions of kmalloc. Thanks for pointing this
out!
---
From cde8a3333296eddd288780e78803610127401b6a Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
Date: Wed, 7 Aug 2013 11:11:22 +0200
Subject: [PATCH] memcg: limit the number of thresholds per-memcg

There is no limit for the maximum number of threshold events registered
per memcg. It is even worse that all the events are stored in a
per-memcg table which is enlarged when a new event is registered. This
can lead to the following issue mentioned by Tejun:
"
So, let's say the file is write-accessible to !priv user which is
under reasonable resource limits.  Normally this shouldn't affect priv
system tools which are monitoring the same event as it shouldn't be
able to deplete resources as long as the resource control mechanisms
are configured and functioning properly; however, the memory usage
event puts all event listeners into a single contiguous table which a
!priv user can easily expand to a size where the table can no longer
be enlarged and if a priv system tool or another user tries to
register event afterwards, it'll fail.  IOW, it creates a shared
resource which isn't properly provisioned and can be trivially filled
up making it an easy DoS target.
"

Let's be more strict and cap the number of events that might be
registered. MAX_THRESHOLD_EVENTS value is more or less random. The
expectation is that it should be high enough to cover reasonable
usecases while not too high to allow excessive resources consumption.
1024 events consume something like 16KB which shouldn't be a big deal
and it should be good enough.

Reported-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Signed-off-by: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
---
 mm/memcontrol.c |    8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e4330cd..8247db3 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5401,6 +5401,9 @@ static void mem_cgroup_oom_notify(struct mem_cgroup *memcg)
 		mem_cgroup_oom_notify_cb(iter);
 }
 
+/* Maximum number of treshold events registered per memcg. */
+#define MAX_THRESHOLD_EVENTS	1024
+
 static int mem_cgroup_usage_register_event(struct cgroup *cgrp,
 	struct cftype *cft, struct eventfd_ctx *eventfd, const char *args)
 {
@@ -5424,6 +5427,11 @@ static int mem_cgroup_usage_register_event(struct cgroup *cgrp,
 	else
 		BUG();
 
+	if (thresholds->primary->size == MAX_THRESHOLD_EVENTS) {
+		ret = -ENOSPC;
+		goto unlock;
+	}
+
 	usage = mem_cgroup_usage(memcg, type == _MEMSWAP);
 
 	/* Check if a threshold crossed before adding a new one */
-- 
1.7.10.4

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH 2/3] memcg: Limit the number of events registered on oom_control
  2013-08-07 13:57           ` Michal Hocko
@ 2013-08-07 14:47             ` Michal Hocko
  -1 siblings, 0 replies; 44+ messages in thread
From: Michal Hocko @ 2013-08-07 14:47 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-mm, linux-kernel, cgroups, Johannes Weiner,
	KAMEZAWA Hiroyuki, Andrew Morton, Kirill A. Shutemov,
	Anton Vorontsov

On Wed 07-08-13 15:57:34, Michal Hocko wrote:
[...]
> Hmm, OK so you think that the fd limit is sufficient already?

Hmm, that would need to touch the code as well (the register callback
would need to make sure only one event is registered per cfile). But yes
this way would be better. I will send a new patch once I have an idle
moment.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 2/3] memcg: Limit the number of events registered on oom_control
@ 2013-08-07 14:47             ` Michal Hocko
  0 siblings, 0 replies; 44+ messages in thread
From: Michal Hocko @ 2013-08-07 14:47 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-mm, linux-kernel, cgroups, Johannes Weiner,
	KAMEZAWA Hiroyuki, Andrew Morton, Kirill A. Shutemov,
	Anton Vorontsov

On Wed 07-08-13 15:57:34, Michal Hocko wrote:
[...]
> Hmm, OK so you think that the fd limit is sufficient already?

Hmm, that would need to touch the code as well (the register callback
would need to make sure only one event is registered per cfile). But yes
this way would be better. I will send a new patch once I have an idle
moment.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 2/3] memcg: Limit the number of events registered on oom_control
  2013-08-07 14:47             ` Michal Hocko
  (?)
@ 2013-08-07 17:30               ` Michal Hocko
  -1 siblings, 0 replies; 44+ messages in thread
From: Michal Hocko @ 2013-08-07 17:30 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-mm, linux-kernel, cgroups, Johannes Weiner,
	KAMEZAWA Hiroyuki, Andrew Morton, Kirill A. Shutemov,
	Anton Vorontsov

On Wed 07-08-13 16:47:30, Michal Hocko wrote:
> On Wed 07-08-13 15:57:34, Michal Hocko wrote:
> [...]
> > Hmm, OK so you think that the fd limit is sufficient already?
> 
> Hmm, that would need to touch the code as well (the register callback
> would need to make sure only one event is registered per cfile). But yes
> this way would be better. I will send a new patch once I have an idle
> moment.

What do you think about the following? I am not sure about EINVAL maybe
there is a better way to tell userspace it is doing something wrong. I
would appreciate any suggestions. If this looks good I will post a
similar patch for vmpressure.
---
>From 180614714348300836701bec618608059438efc2 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.cz>
Date: Wed, 7 Aug 2013 18:51:40 +0200
Subject: [PATCH] memcg: Limit the number of events registered on oom_control

There is no limit for the maximum number of oom_control events
registered per memcg. This might lead to an user triggered memory
depletion if a regular user is allowed to register events because one
file descriptor might be used for arbitrary number of registrations.

Let's be more strict and cap the number of events to the number of
open file descriptors. Teach mem_cgroup_oom_register_event to check
the registered events and fail if the given cftype (target file file
descriptor) has been already registered. This will cap the number of
events to the maximum number of allowed open files so no user who is
allowed to register event can runaway the an existing limit.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 mm/memcontrol.c |   15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 8247db3..f893a39 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -212,6 +212,7 @@ struct mem_cgroup_thresholds {
 struct mem_cgroup_eventfd_list {
 	struct list_head list;
 	struct eventfd_ctx *eventfd;
+	struct cftype *cft;
 };
 
 static void mem_cgroup_threshold(struct mem_cgroup *memcg);
@@ -5577,6 +5578,7 @@ static int mem_cgroup_oom_register_event(struct cgroup *cgrp,
 	struct mem_cgroup *memcg = mem_cgroup_from_cont(cgrp);
 	struct mem_cgroup_eventfd_list *event;
 	enum res_type type = MEMFILE_TYPE(cft->private);
+	struct mem_cgroup_eventfd_list *ev, *tmp;
 
 	BUG_ON(type != _OOM_TYPE);
 	event = kmalloc(sizeof(*event),	GFP_KERNEL);
@@ -5585,7 +5587,20 @@ static int mem_cgroup_oom_register_event(struct cgroup *cgrp,
 
 	spin_lock(&memcg_oom_lock);
 
+	/*
+	 * Make sure that only one event is registered for the given
+	 * cftype - aka file descriptor. This caps the number of events
+	 * to number of file descriptors per process so the userspace doesn't
+	 * escape and depleate the memory.
+	 */
+	list_for_each_entry_safe(ev, tmp, &memcg->oom_notify, list) {
+		if (ev->cft == cft) {
+			kfree(event);
+			return -EINVAL;
+		}
+	}
 	event->eventfd = eventfd;
+	event->cft = cft;
 	list_add(&event->list, &memcg->oom_notify);
 
 	/* already in OOM ? */
-- 
1.7.10.4

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH 2/3] memcg: Limit the number of events registered on oom_control
@ 2013-08-07 17:30               ` Michal Hocko
  0 siblings, 0 replies; 44+ messages in thread
From: Michal Hocko @ 2013-08-07 17:30 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-mm, linux-kernel, cgroups, Johannes Weiner,
	KAMEZAWA Hiroyuki, Andrew Morton, Kirill A. Shutemov,
	Anton Vorontsov

On Wed 07-08-13 16:47:30, Michal Hocko wrote:
> On Wed 07-08-13 15:57:34, Michal Hocko wrote:
> [...]
> > Hmm, OK so you think that the fd limit is sufficient already?
> 
> Hmm, that would need to touch the code as well (the register callback
> would need to make sure only one event is registered per cfile). But yes
> this way would be better. I will send a new patch once I have an idle
> moment.

What do you think about the following? I am not sure about EINVAL maybe
there is a better way to tell userspace it is doing something wrong. I
would appreciate any suggestions. If this looks good I will post a
similar patch for vmpressure.
---

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 2/3] memcg: Limit the number of events registered on oom_control
@ 2013-08-07 17:30               ` Michal Hocko
  0 siblings, 0 replies; 44+ messages in thread
From: Michal Hocko @ 2013-08-07 17:30 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Johannes Weiner,
	KAMEZAWA Hiroyuki, Andrew Morton, Kirill A. Shutemov,
	Anton Vorontsov

On Wed 07-08-13 16:47:30, Michal Hocko wrote:
> On Wed 07-08-13 15:57:34, Michal Hocko wrote:
> [...]
> > Hmm, OK so you think that the fd limit is sufficient already?
> 
> Hmm, that would need to touch the code as well (the register callback
> would need to make sure only one event is registered per cfile). But yes
> this way would be better. I will send a new patch once I have an idle
> moment.

What do you think about the following? I am not sure about EINVAL maybe
there is a better way to tell userspace it is doing something wrong. I
would appreciate any suggestions. If this looks good I will post a
similar patch for vmpressure.
---
From 180614714348300836701bec618608059438efc2 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
Date: Wed, 7 Aug 2013 18:51:40 +0200
Subject: [PATCH] memcg: Limit the number of events registered on oom_control

There is no limit for the maximum number of oom_control events
registered per memcg. This might lead to an user triggered memory
depletion if a regular user is allowed to register events because one
file descriptor might be used for arbitrary number of registrations.

Let's be more strict and cap the number of events to the number of
open file descriptors. Teach mem_cgroup_oom_register_event to check
the registered events and fail if the given cftype (target file file
descriptor) has been already registered. This will cap the number of
events to the maximum number of allowed open files so no user who is
allowed to register event can runaway the an existing limit.

Signed-off-by: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
---
 mm/memcontrol.c |   15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 8247db3..f893a39 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -212,6 +212,7 @@ struct mem_cgroup_thresholds {
 struct mem_cgroup_eventfd_list {
 	struct list_head list;
 	struct eventfd_ctx *eventfd;
+	struct cftype *cft;
 };
 
 static void mem_cgroup_threshold(struct mem_cgroup *memcg);
@@ -5577,6 +5578,7 @@ static int mem_cgroup_oom_register_event(struct cgroup *cgrp,
 	struct mem_cgroup *memcg = mem_cgroup_from_cont(cgrp);
 	struct mem_cgroup_eventfd_list *event;
 	enum res_type type = MEMFILE_TYPE(cft->private);
+	struct mem_cgroup_eventfd_list *ev, *tmp;
 
 	BUG_ON(type != _OOM_TYPE);
 	event = kmalloc(sizeof(*event),	GFP_KERNEL);
@@ -5585,7 +5587,20 @@ static int mem_cgroup_oom_register_event(struct cgroup *cgrp,
 
 	spin_lock(&memcg_oom_lock);
 
+	/*
+	 * Make sure that only one event is registered for the given
+	 * cftype - aka file descriptor. This caps the number of events
+	 * to number of file descriptors per process so the userspace doesn't
+	 * escape and depleate the memory.
+	 */
+	list_for_each_entry_safe(ev, tmp, &memcg->oom_notify, list) {
+		if (ev->cft == cft) {
+			kfree(event);
+			return -EINVAL;
+		}
+	}
 	event->eventfd = eventfd;
+	event->cft = cft;
 	list_add(&event->list, &memcg->oom_notify);
 
 	/* already in OOM ? */
-- 
1.7.10.4

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/3] memcg: limit the number of thresholds per-memcg
  2013-08-07 14:37         ` Michal Hocko
@ 2013-08-07 22:05           ` Kirill A. Shutemov
  -1 siblings, 0 replies; 44+ messages in thread
From: Kirill A. Shutemov @ 2013-08-07 22:05 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tejun Heo, linux-mm, linux-kernel, cgroups, Johannes Weiner,
	KAMEZAWA Hiroyuki, Andrew Morton, Anton Vorontsov

On Wed, Aug 07, 2013 at 04:37:27PM +0200, Michal Hocko wrote:
> On Wed 07-08-13 09:58:18, Tejun Heo wrote:
> > Hello,
> > 
> > On Wed, Aug 07, 2013 at 03:46:54PM +0200, Michal Hocko wrote:
> > > OK, I have obviously misunderstood your concern mentioned in the other
> > > email. Could you be more specific what is the DoS scenario which was
> > > your concern, then?
> > 
> > So, let's say the file is write-accessible to !priv user which is
> > under reasonable resource limits.  Normally this shouldn't affect priv
> > system tools which are monitoring the same event as it shouldn't be
> > able to deplete resources as long as the resource control mechanisms
> > are configured and functioning properly; however, the memory usage
> > event puts all event listeners into a single contiguous table which a
> > !priv user can easily expand to a size where the table can no longer
> > be enlarged and if a priv system tool or another user tries to
> > register event afterwards, it'll fail.  IOW, it creates a shared
> > resource which isn't properly provisioned and can be trivially filled
> > up making it an easy DoS target.
> 
> OK, got your point. You are right and I haven't considered the size of
> the table and the size restrictions of kmalloc. Thanks for pointing this
> out!
> ---
> From cde8a3333296eddd288780e78803610127401b6a Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.cz>
> Date: Wed, 7 Aug 2013 11:11:22 +0200
> Subject: [PATCH] memcg: limit the number of thresholds per-memcg
> 
> There is no limit for the maximum number of threshold events registered
> per memcg. It is even worse that all the events are stored in a
> per-memcg table which is enlarged when a new event is registered. This
> can lead to the following issue mentioned by Tejun:
> "
> So, let's say the file is write-accessible to !priv user which is
> under reasonable resource limits.  Normally this shouldn't affect priv
> system tools which are monitoring the same event as it shouldn't be
> able to deplete resources as long as the resource control mechanisms
> are configured and functioning properly; however, the memory usage
> event puts all event listeners into a single contiguous table which a
> !priv user can easily expand to a size where the table can no longer
> be enlarged and if a priv system tool or another user tries to
> register event afterwards, it'll fail.  IOW, it creates a shared
> resource which isn't properly provisioned and can be trivially filled
> up making it an easy DoS target.
> "
> 
> Let's be more strict and cap the number of events that might be
> registered. MAX_THRESHOLD_EVENTS value is more or less random. The
> expectation is that it should be high enough to cover reasonable
> usecases while not too high to allow excessive resources consumption.
> 1024 events consume something like 16KB which shouldn't be a big deal
> and it should be good enough.

Is it correct that you fix one local DoS by introducing a new one?
With the page the !priv user can block root from registering a threshold.
Is it really the way we want to fix it?

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/3] memcg: limit the number of thresholds per-memcg
@ 2013-08-07 22:05           ` Kirill A. Shutemov
  0 siblings, 0 replies; 44+ messages in thread
From: Kirill A. Shutemov @ 2013-08-07 22:05 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tejun Heo, linux-mm, linux-kernel, cgroups, Johannes Weiner,
	KAMEZAWA Hiroyuki, Andrew Morton, Anton Vorontsov

On Wed, Aug 07, 2013 at 04:37:27PM +0200, Michal Hocko wrote:
> On Wed 07-08-13 09:58:18, Tejun Heo wrote:
> > Hello,
> > 
> > On Wed, Aug 07, 2013 at 03:46:54PM +0200, Michal Hocko wrote:
> > > OK, I have obviously misunderstood your concern mentioned in the other
> > > email. Could you be more specific what is the DoS scenario which was
> > > your concern, then?
> > 
> > So, let's say the file is write-accessible to !priv user which is
> > under reasonable resource limits.  Normally this shouldn't affect priv
> > system tools which are monitoring the same event as it shouldn't be
> > able to deplete resources as long as the resource control mechanisms
> > are configured and functioning properly; however, the memory usage
> > event puts all event listeners into a single contiguous table which a
> > !priv user can easily expand to a size where the table can no longer
> > be enlarged and if a priv system tool or another user tries to
> > register event afterwards, it'll fail.  IOW, it creates a shared
> > resource which isn't properly provisioned and can be trivially filled
> > up making it an easy DoS target.
> 
> OK, got your point. You are right and I haven't considered the size of
> the table and the size restrictions of kmalloc. Thanks for pointing this
> out!
> ---
> From cde8a3333296eddd288780e78803610127401b6a Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.cz>
> Date: Wed, 7 Aug 2013 11:11:22 +0200
> Subject: [PATCH] memcg: limit the number of thresholds per-memcg
> 
> There is no limit for the maximum number of threshold events registered
> per memcg. It is even worse that all the events are stored in a
> per-memcg table which is enlarged when a new event is registered. This
> can lead to the following issue mentioned by Tejun:
> "
> So, let's say the file is write-accessible to !priv user which is
> under reasonable resource limits.  Normally this shouldn't affect priv
> system tools which are monitoring the same event as it shouldn't be
> able to deplete resources as long as the resource control mechanisms
> are configured and functioning properly; however, the memory usage
> event puts all event listeners into a single contiguous table which a
> !priv user can easily expand to a size where the table can no longer
> be enlarged and if a priv system tool or another user tries to
> register event afterwards, it'll fail.  IOW, it creates a shared
> resource which isn't properly provisioned and can be trivially filled
> up making it an easy DoS target.
> "
> 
> Let's be more strict and cap the number of events that might be
> registered. MAX_THRESHOLD_EVENTS value is more or less random. The
> expectation is that it should be high enough to cover reasonable
> usecases while not too high to allow excessive resources consumption.
> 1024 events consume something like 16KB which shouldn't be a big deal
> and it should be good enough.

Is it correct that you fix one local DoS by introducing a new one?
With the page the !priv user can block root from registering a threshold.
Is it really the way we want to fix it?

-- 
 Kirill A. Shutemov

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/3] memcg: limit the number of thresholds per-memcg
  2013-08-07 22:05           ` Kirill A. Shutemov
  (?)
@ 2013-08-08 14:43             ` Michal Hocko
  -1 siblings, 0 replies; 44+ messages in thread
From: Michal Hocko @ 2013-08-08 14:43 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Tejun Heo, linux-mm, linux-kernel, cgroups, Johannes Weiner,
	KAMEZAWA Hiroyuki, Andrew Morton, Anton Vorontsov

On Thu 08-08-13 01:05:13, Kirill A. Shutemov wrote:
> On Wed, Aug 07, 2013 at 04:37:27PM +0200, Michal Hocko wrote:
> > On Wed 07-08-13 09:58:18, Tejun Heo wrote:
> > > Hello,
> > > 
> > > On Wed, Aug 07, 2013 at 03:46:54PM +0200, Michal Hocko wrote:
> > > > OK, I have obviously misunderstood your concern mentioned in the other
> > > > email. Could you be more specific what is the DoS scenario which was
> > > > your concern, then?
> > > 
> > > So, let's say the file is write-accessible to !priv user which is
> > > under reasonable resource limits.  Normally this shouldn't affect priv
> > > system tools which are monitoring the same event as it shouldn't be
> > > able to deplete resources as long as the resource control mechanisms
> > > are configured and functioning properly; however, the memory usage
> > > event puts all event listeners into a single contiguous table which a
> > > !priv user can easily expand to a size where the table can no longer
> > > be enlarged and if a priv system tool or another user tries to
> > > register event afterwards, it'll fail.  IOW, it creates a shared
> > > resource which isn't properly provisioned and can be trivially filled
> > > up making it an easy DoS target.
> > 
> > OK, got your point. You are right and I haven't considered the size of
> > the table and the size restrictions of kmalloc. Thanks for pointing this
> > out!
> > ---
> > From cde8a3333296eddd288780e78803610127401b6a Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <mhocko@suse.cz>
> > Date: Wed, 7 Aug 2013 11:11:22 +0200
> > Subject: [PATCH] memcg: limit the number of thresholds per-memcg
> > 
> > There is no limit for the maximum number of threshold events registered
> > per memcg. It is even worse that all the events are stored in a
> > per-memcg table which is enlarged when a new event is registered. This
> > can lead to the following issue mentioned by Tejun:
> > "
> > So, let's say the file is write-accessible to !priv user which is
> > under reasonable resource limits.  Normally this shouldn't affect priv
> > system tools which are monitoring the same event as it shouldn't be
> > able to deplete resources as long as the resource control mechanisms
> > are configured and functioning properly; however, the memory usage
> > event puts all event listeners into a single contiguous table which a
> > !priv user can easily expand to a size where the table can no longer
> > be enlarged and if a priv system tool or another user tries to
> > register event afterwards, it'll fail.  IOW, it creates a shared
> > resource which isn't properly provisioned and can be trivially filled
> > up making it an easy DoS target.
> > "
> > 
> > Let's be more strict and cap the number of events that might be
> > registered. MAX_THRESHOLD_EVENTS value is more or less random. The
> > expectation is that it should be high enough to cover reasonable
> > usecases while not too high to allow excessive resources consumption.
> > 1024 events consume something like 16KB which shouldn't be a big deal
> > and it should be good enough.
> 
> Is it correct that you fix one local DoS by introducing a new one?
> With the page the !priv user can block root from registering a threshold.
> Is it really the way we want to fix it?

OK, I will think about it some more.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/3] memcg: limit the number of thresholds per-memcg
@ 2013-08-08 14:43             ` Michal Hocko
  0 siblings, 0 replies; 44+ messages in thread
From: Michal Hocko @ 2013-08-08 14:43 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Tejun Heo, linux-mm, linux-kernel, cgroups, Johannes Weiner,
	KAMEZAWA Hiroyuki, Andrew Morton, Anton Vorontsov

On Thu 08-08-13 01:05:13, Kirill A. Shutemov wrote:
> On Wed, Aug 07, 2013 at 04:37:27PM +0200, Michal Hocko wrote:
> > On Wed 07-08-13 09:58:18, Tejun Heo wrote:
> > > Hello,
> > > 
> > > On Wed, Aug 07, 2013 at 03:46:54PM +0200, Michal Hocko wrote:
> > > > OK, I have obviously misunderstood your concern mentioned in the other
> > > > email. Could you be more specific what is the DoS scenario which was
> > > > your concern, then?
> > > 
> > > So, let's say the file is write-accessible to !priv user which is
> > > under reasonable resource limits.  Normally this shouldn't affect priv
> > > system tools which are monitoring the same event as it shouldn't be
> > > able to deplete resources as long as the resource control mechanisms
> > > are configured and functioning properly; however, the memory usage
> > > event puts all event listeners into a single contiguous table which a
> > > !priv user can easily expand to a size where the table can no longer
> > > be enlarged and if a priv system tool or another user tries to
> > > register event afterwards, it'll fail.  IOW, it creates a shared
> > > resource which isn't properly provisioned and can be trivially filled
> > > up making it an easy DoS target.
> > 
> > OK, got your point. You are right and I haven't considered the size of
> > the table and the size restrictions of kmalloc. Thanks for pointing this
> > out!
> > ---
> > From cde8a3333296eddd288780e78803610127401b6a Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <mhocko@suse.cz>
> > Date: Wed, 7 Aug 2013 11:11:22 +0200
> > Subject: [PATCH] memcg: limit the number of thresholds per-memcg
> > 
> > There is no limit for the maximum number of threshold events registered
> > per memcg. It is even worse that all the events are stored in a
> > per-memcg table which is enlarged when a new event is registered. This
> > can lead to the following issue mentioned by Tejun:
> > "
> > So, let's say the file is write-accessible to !priv user which is
> > under reasonable resource limits.  Normally this shouldn't affect priv
> > system tools which are monitoring the same event as it shouldn't be
> > able to deplete resources as long as the resource control mechanisms
> > are configured and functioning properly; however, the memory usage
> > event puts all event listeners into a single contiguous table which a
> > !priv user can easily expand to a size where the table can no longer
> > be enlarged and if a priv system tool or another user tries to
> > register event afterwards, it'll fail.  IOW, it creates a shared
> > resource which isn't properly provisioned and can be trivially filled
> > up making it an easy DoS target.
> > "
> > 
> > Let's be more strict and cap the number of events that might be
> > registered. MAX_THRESHOLD_EVENTS value is more or less random. The
> > expectation is that it should be high enough to cover reasonable
> > usecases while not too high to allow excessive resources consumption.
> > 1024 events consume something like 16KB which shouldn't be a big deal
> > and it should be good enough.
> 
> Is it correct that you fix one local DoS by introducing a new one?
> With the page the !priv user can block root from registering a threshold.
> Is it really the way we want to fix it?

OK, I will think about it some more.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/3] memcg: limit the number of thresholds per-memcg
@ 2013-08-08 14:43             ` Michal Hocko
  0 siblings, 0 replies; 44+ messages in thread
From: Michal Hocko @ 2013-08-08 14:43 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Tejun Heo, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Johannes Weiner,
	KAMEZAWA Hiroyuki, Andrew Morton, Anton Vorontsov

On Thu 08-08-13 01:05:13, Kirill A. Shutemov wrote:
> On Wed, Aug 07, 2013 at 04:37:27PM +0200, Michal Hocko wrote:
> > On Wed 07-08-13 09:58:18, Tejun Heo wrote:
> > > Hello,
> > > 
> > > On Wed, Aug 07, 2013 at 03:46:54PM +0200, Michal Hocko wrote:
> > > > OK, I have obviously misunderstood your concern mentioned in the other
> > > > email. Could you be more specific what is the DoS scenario which was
> > > > your concern, then?
> > > 
> > > So, let's say the file is write-accessible to !priv user which is
> > > under reasonable resource limits.  Normally this shouldn't affect priv
> > > system tools which are monitoring the same event as it shouldn't be
> > > able to deplete resources as long as the resource control mechanisms
> > > are configured and functioning properly; however, the memory usage
> > > event puts all event listeners into a single contiguous table which a
> > > !priv user can easily expand to a size where the table can no longer
> > > be enlarged and if a priv system tool or another user tries to
> > > register event afterwards, it'll fail.  IOW, it creates a shared
> > > resource which isn't properly provisioned and can be trivially filled
> > > up making it an easy DoS target.
> > 
> > OK, got your point. You are right and I haven't considered the size of
> > the table and the size restrictions of kmalloc. Thanks for pointing this
> > out!
> > ---
> > From cde8a3333296eddd288780e78803610127401b6a Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
> > Date: Wed, 7 Aug 2013 11:11:22 +0200
> > Subject: [PATCH] memcg: limit the number of thresholds per-memcg
> > 
> > There is no limit for the maximum number of threshold events registered
> > per memcg. It is even worse that all the events are stored in a
> > per-memcg table which is enlarged when a new event is registered. This
> > can lead to the following issue mentioned by Tejun:
> > "
> > So, let's say the file is write-accessible to !priv user which is
> > under reasonable resource limits.  Normally this shouldn't affect priv
> > system tools which are monitoring the same event as it shouldn't be
> > able to deplete resources as long as the resource control mechanisms
> > are configured and functioning properly; however, the memory usage
> > event puts all event listeners into a single contiguous table which a
> > !priv user can easily expand to a size where the table can no longer
> > be enlarged and if a priv system tool or another user tries to
> > register event afterwards, it'll fail.  IOW, it creates a shared
> > resource which isn't properly provisioned and can be trivially filled
> > up making it an easy DoS target.
> > "
> > 
> > Let's be more strict and cap the number of events that might be
> > registered. MAX_THRESHOLD_EVENTS value is more or less random. The
> > expectation is that it should be high enough to cover reasonable
> > usecases while not too high to allow excessive resources consumption.
> > 1024 events consume something like 16KB which shouldn't be a big deal
> > and it should be good enough.
> 
> Is it correct that you fix one local DoS by introducing a new one?
> With the page the !priv user can block root from registering a threshold.
> Is it really the way we want to fix it?

OK, I will think about it some more.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 2/3] memcg: Limit the number of events registered on oom_control
  2013-08-07 17:30               ` Michal Hocko
@ 2013-08-09  0:46                 ` Tejun Heo
  -1 siblings, 0 replies; 44+ messages in thread
From: Tejun Heo @ 2013-08-09  0:46 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, linux-kernel, cgroups, Johannes Weiner,
	KAMEZAWA Hiroyuki, Andrew Morton, Kirill A. Shutemov,
	Anton Vorontsov

Hello, Michal.

On Wed, Aug 07, 2013 at 07:30:51PM +0200, Michal Hocko wrote:
> On Wed 07-08-13 16:47:30, Michal Hocko wrote:
> > On Wed 07-08-13 15:57:34, Michal Hocko wrote:
> > [...]
> > > Hmm, OK so you think that the fd limit is sufficient already?
> > 
> > Hmm, that would need to touch the code as well (the register callback
> > would need to make sure only one event is registered per cfile). But yes
> > this way would be better. I will send a new patch once I have an idle
> > moment.
> 
> What do you think about the following? I am not sure about EINVAL maybe
> there is a better way to tell userspace it is doing something wrong. I
> would appreciate any suggestions. If this looks good I will post a
> similar patch for vmpressure.

I don't think it's a good idea.  Not sure it matters given that this
isn't a very popular interface but adding this sort of rather
arbitrary restrictions can be confusing and lead to issues in userland
which are extremely annoying to track down.

Also, in terms of layering, this is horribly misplaced.  This is low
level event source implementation, which is not the right place to
implement logic to protect from userland abuses / mistakes.

That's the whole thing with this interface.  It's essentially
implementing a new userland-visible notification framework.  It is a
complex userland visible interface which takes a lot of design and
effort to get right and cgroup core or memcg definitely is not the
place to do anything like this.  Collectively, we are not capable
enough to do pull things like this properly by ourselves and even if
we were it is not the right place to do it.

Given how generally broken delegating to !priv users is, I don't think
there's anything we can or should do at this point rather than noting
that it is broken and was a mistake.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 2/3] memcg: Limit the number of events registered on oom_control
@ 2013-08-09  0:46                 ` Tejun Heo
  0 siblings, 0 replies; 44+ messages in thread
From: Tejun Heo @ 2013-08-09  0:46 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, linux-kernel, cgroups, Johannes Weiner,
	KAMEZAWA Hiroyuki, Andrew Morton, Kirill A. Shutemov,
	Anton Vorontsov

Hello, Michal.

On Wed, Aug 07, 2013 at 07:30:51PM +0200, Michal Hocko wrote:
> On Wed 07-08-13 16:47:30, Michal Hocko wrote:
> > On Wed 07-08-13 15:57:34, Michal Hocko wrote:
> > [...]
> > > Hmm, OK so you think that the fd limit is sufficient already?
> > 
> > Hmm, that would need to touch the code as well (the register callback
> > would need to make sure only one event is registered per cfile). But yes
> > this way would be better. I will send a new patch once I have an idle
> > moment.
> 
> What do you think about the following? I am not sure about EINVAL maybe
> there is a better way to tell userspace it is doing something wrong. I
> would appreciate any suggestions. If this looks good I will post a
> similar patch for vmpressure.

I don't think it's a good idea.  Not sure it matters given that this
isn't a very popular interface but adding this sort of rather
arbitrary restrictions can be confusing and lead to issues in userland
which are extremely annoying to track down.

Also, in terms of layering, this is horribly misplaced.  This is low
level event source implementation, which is not the right place to
implement logic to protect from userland abuses / mistakes.

That's the whole thing with this interface.  It's essentially
implementing a new userland-visible notification framework.  It is a
complex userland visible interface which takes a lot of design and
effort to get right and cgroup core or memcg definitely is not the
place to do anything like this.  Collectively, we are not capable
enough to do pull things like this properly by ourselves and even if
we were it is not the right place to do it.

Given how generally broken delegating to !priv users is, I don't think
there's anything we can or should do at this point rather than noting
that it is broken and was a mistake.

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/3] memcg: limit the number of thresholds per-memcg
  2013-08-08 14:43             ` Michal Hocko
@ 2013-08-09  0:50               ` Tejun Heo
  -1 siblings, 0 replies; 44+ messages in thread
From: Tejun Heo @ 2013-08-09  0:50 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Kirill A. Shutemov, linux-mm, linux-kernel, cgroups,
	Johannes Weiner, KAMEZAWA Hiroyuki, Andrew Morton,
	Anton Vorontsov

Hello,

On Thu, Aug 08, 2013 at 04:43:51PM +0200, Michal Hocko wrote:
> > Is it correct that you fix one local DoS by introducing a new one?
> > With the page the !priv user can block root from registering a threshold.
> > Is it really the way we want to fix it?
> 
> OK, I will think about it some more.

The only thing the patch does is replacing implicit global resource
limit with an explicit one.  Whether that's useful or not, I don't
know, but it doesn't really change the nature of the problem or
actually fix anything.  The only way to fix it is rewriting the whole
thing so that allocations are broken up per source, which I don't
think is a good idea at this point.  I'd just add a comment noting why
it's broken.  Given that delegating to !priv users is horribly broken
anyway, I don't think this worsens the situation by too much anyway.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/3] memcg: limit the number of thresholds per-memcg
@ 2013-08-09  0:50               ` Tejun Heo
  0 siblings, 0 replies; 44+ messages in thread
From: Tejun Heo @ 2013-08-09  0:50 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Kirill A. Shutemov, linux-mm, linux-kernel, cgroups,
	Johannes Weiner, KAMEZAWA Hiroyuki, Andrew Morton,
	Anton Vorontsov

Hello,

On Thu, Aug 08, 2013 at 04:43:51PM +0200, Michal Hocko wrote:
> > Is it correct that you fix one local DoS by introducing a new one?
> > With the page the !priv user can block root from registering a threshold.
> > Is it really the way we want to fix it?
> 
> OK, I will think about it some more.

The only thing the patch does is replacing implicit global resource
limit with an explicit one.  Whether that's useful or not, I don't
know, but it doesn't really change the nature of the problem or
actually fix anything.  The only way to fix it is rewriting the whole
thing so that allocations are broken up per source, which I don't
think is a good idea at this point.  I'd just add a comment noting why
it's broken.  Given that delegating to !priv users is horribly broken
anyway, I don't think this worsens the situation by too much anyway.

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2013-08-09  0:50 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-08-07 11:28 [PATCH 1/3] memcg: limit the number of thresholds per-memcg Michal Hocko
2013-08-07 11:28 ` Michal Hocko
2013-08-07 11:28 ` [PATCH 2/3] memcg: Limit the number of events registered on oom_control Michal Hocko
2013-08-07 11:28   ` Michal Hocko
2013-08-07 13:08   ` Tejun Heo
2013-08-07 13:08     ` Tejun Heo
2013-08-07 13:08     ` Tejun Heo
2013-08-07 13:11     ` Tejun Heo
2013-08-07 13:11       ` Tejun Heo
2013-08-07 13:37     ` Michal Hocko
2013-08-07 13:37       ` Michal Hocko
2013-08-07 13:47       ` Tejun Heo
2013-08-07 13:47         ` Tejun Heo
2013-08-07 13:57         ` Michal Hocko
2013-08-07 13:57           ` Michal Hocko
2013-08-07 13:57           ` Michal Hocko
2013-08-07 14:01           ` Tejun Heo
2013-08-07 14:01             ` Tejun Heo
2013-08-07 14:47           ` Michal Hocko
2013-08-07 14:47             ` Michal Hocko
2013-08-07 17:30             ` Michal Hocko
2013-08-07 17:30               ` Michal Hocko
2013-08-07 17:30               ` Michal Hocko
2013-08-09  0:46               ` Tejun Heo
2013-08-09  0:46                 ` Tejun Heo
2013-08-07 11:28 ` [PATCH 3/3] vmpressure: limit the number of registered events Michal Hocko
2013-08-07 11:28   ` Michal Hocko
2013-08-07 13:22 ` [PATCH 1/3] memcg: limit the number of thresholds per-memcg Tejun Heo
2013-08-07 13:22   ` Tejun Heo
2013-08-07 13:22   ` Tejun Heo
2013-08-07 13:46   ` Michal Hocko
2013-08-07 13:46     ` Michal Hocko
2013-08-07 13:58     ` Tejun Heo
2013-08-07 13:58       ` Tejun Heo
2013-08-07 14:37       ` Michal Hocko
2013-08-07 14:37         ` Michal Hocko
2013-08-07 14:37         ` Michal Hocko
2013-08-07 22:05         ` Kirill A. Shutemov
2013-08-07 22:05           ` Kirill A. Shutemov
2013-08-08 14:43           ` Michal Hocko
2013-08-08 14:43             ` Michal Hocko
2013-08-08 14:43             ` Michal Hocko
2013-08-09  0:50             ` Tejun Heo
2013-08-09  0:50               ` Tejun Heo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.