* [PATCH 0/2] Reduce overhead of memcg when unused
@ 2015-05-20 12:50 ` Mel Gorman
  0 siblings, 0 replies; 29+ messages in thread
From: Mel Gorman @ 2015-05-20 12:50 UTC (permalink / raw)
  To: Michal Hocko, Johannes Weiner, Andrew Morton
  Cc: Tejun Heo, Linux-CGroups, Linux-MM, LKML, Mel Gorman

These are two patches to reduce the overhead of memcg, particularly when
it's not used. The first is a simple reordering of when a write barrier is
applied, a barrier that memcg currently gets burned by. I doubt it is
controversial at all.

The second optionally disables memcg by default. This should have been the
default from the start and it matches what Debian already does today. The
difficulty is that existing installations may break if the new kernel
parameter is not applied, so distributions need to be careful with
upgrades. The difference it makes is marginal and only visible in profiles,
not in headline performance. It would be understandable if the memcg
maintainers rejected it, but I'll leave that up to them.

 Documentation/kernel-parameters.txt |  4 ++++
 init/Kconfig                        | 15 +++++++++++++++
 kernel/cgroup.c                     | 20 ++++++++++++++++----
 mm/memcontrol.c                     |  3 +++
 mm/memory.c                         | 10 ++++++----
 5 files changed, 44 insertions(+), 8 deletions(-)

-- 
2.3.5




* [PATCH 1/2] mm, memcg: Try charging a page before setting page up to date
  2015-05-20 12:50 ` Mel Gorman
@ 2015-05-20 12:50   ` Mel Gorman
  -1 siblings, 0 replies; 29+ messages in thread
From: Mel Gorman @ 2015-05-20 12:50 UTC (permalink / raw)
  To: Michal Hocko, Johannes Weiner, Andrew Morton
  Cc: Tejun Heo, Linux-CGroups, Linux-MM, LKML, Mel Gorman

Historically, memcg overhead was high even when memcg was unused. This has
improved a lot, but it still showed up in a profile summary as a problem.

/usr/src/linux-4.0-vanilla/mm/memcontrol.c                           6.6441   395842
  mem_cgroup_try_charge                                                        2.950%   175781
  __mem_cgroup_count_vm_event                                                  1.431%    85239
  mem_cgroup_page_lruvec                                                       0.456%    27156
  mem_cgroup_commit_charge                                                     0.392%    23342
  uncharge_list                                                                0.323%    19256
  mem_cgroup_update_lru_size                                                   0.278%    16538
  memcg_check_events                                                           0.216%    12858
  mem_cgroup_charge_statistics.isra.22                                         0.188%    11172
  try_charge                                                                   0.150%     8928
  commit_charge                                                                0.141%     8388
  get_mem_cgroup_from_mm                                                       0.121%     7184

That shows 6.64% of system CPU cycles were spent in memcontrol.c, dominated
by mem_cgroup_try_charge. The annotation shows that the bulk of the cost
was checking PageSwapCache, which is expected to be cache hot but is very
expensive. The problem appears to be that __SetPageUptodate, which contains
a write barrier, is called just before the check. The barrier is required
to make sure the struct page and page data are written before the PTE is
updated and the data becomes visible to userspace. memcg charging does not
need the barrier but gets unfairly hit with its cost, so this patch
attempts the charging before the barrier. Aside from removing the
accidental cost to memcg, there is the added benefit that the barrier is
avoided entirely if the page cannot be charged. With the patch applied, the
relevant profile summary is as follows.

/usr/src/linux-4.0-chargefirst-v2r1/mm/memcontrol.c                  3.7907   223277
  __mem_cgroup_count_vm_event                                                  1.143%    67312
  mem_cgroup_page_lruvec                                                       0.465%    27403
  mem_cgroup_commit_charge                                                     0.381%    22452
  uncharge_list                                                                0.332%    19543
  mem_cgroup_update_lru_size                                                   0.284%    16704
  get_mem_cgroup_from_mm                                                       0.271%    15952
  mem_cgroup_try_charge                                                        0.237%    13982
  memcg_check_events                                                           0.222%    13058
  mem_cgroup_charge_statistics.isra.22                                         0.185%    10920
  commit_charge                                                                0.140%     8235
  try_charge                                                                   0.131%     7716

That brings the overhead down to 3.79%; what remains is largely the memcg
fault accounting against the root cgroup, but it's an improvement. The
difference in headline performance of the page fault microbenchmark is
marginal, as memcg is such a small component of it.

pft faults
                                       4.0.0                  4.0.0
                                     vanilla            chargefirst
Hmean    faults/cpu-1 1443258.1051 (  0.00%) 1509075.7561 (  4.56%)
Hmean    faults/cpu-3 1340385.9270 (  0.00%) 1339160.7113 ( -0.09%)
Hmean    faults/cpu-5  875599.0222 (  0.00%)  874174.1255 ( -0.16%)
Hmean    faults/cpu-7  601146.6726 (  0.00%)  601370.9977 (  0.04%)
Hmean    faults/cpu-8  510728.2754 (  0.00%)  510598.8214 ( -0.03%)
Hmean    faults/sec-1 1432084.7845 (  0.00%) 1497935.5274 (  4.60%)
Hmean    faults/sec-3 3943818.1437 (  0.00%) 3941920.1520 ( -0.05%)
Hmean    faults/sec-5 3877573.5867 (  0.00%) 3869385.7553 ( -0.21%)
Hmean    faults/sec-7 3991832.0418 (  0.00%) 3992181.4189 (  0.01%)
Hmean    faults/sec-8 3987189.8167 (  0.00%) 3986452.2204 ( -0.02%)

The improvement is only visible in the single-threaded case. The overhead
is still present at higher thread counts, but other factors dominate.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/memory.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 97839f5c8c30..80a03628bd77 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2158,11 +2158,12 @@ gotten:
 			goto oom;
 		cow_user_page(new_page, old_page, address, vma);
 	}
-	__SetPageUptodate(new_page);
 
 	if (mem_cgroup_try_charge(new_page, mm, GFP_KERNEL, &memcg))
 		goto oom_free_new;
 
+	__SetPageUptodate(new_page);
+
 	mmun_start  = address & PAGE_MASK;
 	mmun_end    = mmun_start + PAGE_SIZE;
 	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
@@ -2594,6 +2595,10 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	page = alloc_zeroed_user_highpage_movable(vma, address);
 	if (!page)
 		goto oom;
+
+	if (mem_cgroup_try_charge(page, mm, GFP_KERNEL, &memcg))
+		goto oom_free_page;
+
 	/*
 	 * The memory barrier inside __SetPageUptodate makes sure that
 	 * preceeding stores to the page contents become visible before
@@ -2601,9 +2606,6 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	 */
 	__SetPageUptodate(page);
 
-	if (mem_cgroup_try_charge(page, mm, GFP_KERNEL, &memcg))
-		goto oom_free_page;
-
 	entry = mk_pte(page, vma->vm_page_prot);
 	if (vma->vm_flags & VM_WRITE)
 		entry = pte_mkwrite(pte_mkdirty(entry));
-- 
2.3.5




* [PATCH 2/2] mm, memcg: Optionally disable memcg by default using Kconfig
  2015-05-20 12:50 ` Mel Gorman
@ 2015-05-20 12:50   ` Mel Gorman
  -1 siblings, 0 replies; 29+ messages in thread
From: Mel Gorman @ 2015-05-20 12:50 UTC (permalink / raw)
  To: Michal Hocko, Johannes Weiner, Andrew Morton
  Cc: Tejun Heo, Linux-CGroups, Linux-MM, LKML, Mel Gorman

memcg was reported years ago to have significant overhead even when unused.
It has improved, but it's still the case that users with no knowledge of
memcg pay a small performance penalty.

This patch adds a Kconfig option that controls whether memcg is enabled by
default, and a kernel parameter, cgroup_enable=, to enable it if desired.
Anyone using oldconfig will get the historical behaviour. Simply disabling
MEMCG is not an option for most distributions as some users require it, but
those users should also be knowledgeable enough to use cgroup_enable=.
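For illustration, on a kernel built with the new option set to N, the
controller would be re-enabled from the boot loader like so (an example
GRUB entry; the kernel image and root device are placeholders):

```
linux /boot/vmlinuz-4.1 root=/dev/sda1 ro cgroup_enable=memory
```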

This was evaluated using aim9, a page fault microbenchmark and ebizzy, but
I'll focus on the page fault microbenchmark. It can be reproduced using pft
from mmtests (https://github.com/gormanm/mmtests): edit
configs/config-global-dhp__pagealloc-performance and update MMTESTS to
contain only pft. This is the relevant part of the profile summary.

/usr/src/linux-4.0-chargefirst-v2r1/mm/memcontrol.c                  3.7907   223277
  __mem_cgroup_count_vm_event                                                  1.143%    67312
  mem_cgroup_page_lruvec                                                       0.465%    27403
  mem_cgroup_commit_charge                                                     0.381%    22452
  uncharge_list                                                                0.332%    19543
  mem_cgroup_update_lru_size                                                   0.284%    16704
  get_mem_cgroup_from_mm                                                       0.271%    15952
  mem_cgroup_try_charge                                                        0.237%    13982
  memcg_check_events                                                           0.222%    13058
  mem_cgroup_charge_statistics.isra.22                                         0.185%    10920
  commit_charge                                                                0.140%     8235
  try_charge                                                                   0.131%     7716

It shows 3.79% overhead in memcontrol.c when no memcgs are in use. Applying
the patch and disabling memcg reduces this to 0.51%.

/usr/src/linux-4.0-disable-v2r1/mm/memcontrol.c                      0.5100    29304
  mem_cgroup_page_lruvec                                                       0.161%     9267
  mem_cgroup_update_lru_size                                                   0.154%     8872
  mem_cgroup_try_charge                                                        0.153%     8768
  mem_cgroup_commit_charge                                                     0.042%     2397

pft faults
                                       4.0.0                  4.0.0
                                 chargefirst                disable
Hmean    faults/cpu-1 1509075.7561 (  0.00%) 1508934.4568 ( -0.01%)
Hmean    faults/cpu-3 1339160.7113 (  0.00%) 1379512.0698 (  3.01%)
Hmean    faults/cpu-5  874174.1255 (  0.00%)  875741.7674 (  0.18%)
Hmean    faults/cpu-7  601370.9977 (  0.00%)  599938.2026 ( -0.24%)
Hmean    faults/cpu-8  510598.8214 (  0.00%)  510663.5402 (  0.01%)
Hmean    faults/sec-1 1497935.5274 (  0.00%) 1496585.7400 ( -0.09%)
Hmean    faults/sec-3 3941920.1520 (  0.00%) 4050811.9259 (  2.76%)
Hmean    faults/sec-5 3869385.7553 (  0.00%) 3922299.6112 (  1.37%)
Hmean    faults/sec-7 3992181.4189 (  0.00%) 3988511.0065 ( -0.09%)
Hmean    faults/sec-8 3986452.2204 (  0.00%) 3977706.7883 ( -0.22%)

Low thread counts get a small boost, but it's within the noise as memcg
overhead does not dominate there. The effect is not visible at all at
higher thread counts, where other factors dominate. The overall breakdown
of CPU usage looks like:

                       4.0.0          4.0.0
            chargefirst-v2r1   disable-v2r1
User                   41.81          41.45
System                407.64         405.50
Elapsed               128.17         127.06

Despite the relatively small impact, there is at least some justification
for disabling memcg by default.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 Documentation/kernel-parameters.txt |  4 ++++
 init/Kconfig                        | 15 +++++++++++++++
 kernel/cgroup.c                     | 20 ++++++++++++++++----
 mm/memcontrol.c                     |  3 +++
 4 files changed, 38 insertions(+), 4 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index bfcb1a62a7b4..4f264f906816 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -591,6 +591,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			cut the overhead, others just disable the usage. So
 			only cgroup_disable=memory is actually worthy}
 
+	cgroup_enable= [KNL] Enable a particular controller
+			Similar to cgroup_disable except that it enables
+			controllers that are disabled by default.
+
 	checkreqprot	[SELINUX] Set initial checkreqprot flag value.
 			Format: { "0" | "1" }
 			See security/selinux/Kconfig help text.
diff --git a/init/Kconfig b/init/Kconfig
index f5dbc6d4261b..819b6cc05cba 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -990,6 +990,21 @@ config MEMCG
 	  Provides a memory resource controller that manages both anonymous
 	  memory and page cache. (See Documentation/cgroups/memory.txt)
 
+config MEMCG_DEFAULT_ENABLED
+	bool "Automatically enable memory resource controller"
+	default y
+	depends on MEMCG
+	help
+	  The memory controller has some overhead even if idle as resource
+	  usage must be tracked in case a group is created and a process
+	  migrated. As users may not be aware of this and the cgroup_disable=
+	  option, this config option controls whether it is enabled by
+	  default. It is assumed that someone that requires the controller
+	  can find the cgroup_enable= switch.
+
+	  Say N if unsure. This is default Y to preserve oldconfig and
+	  historical behaviour.
+
 config MEMCG_SWAP
 	bool "Memory Resource Controller Swap Extension"
 	depends on MEMCG && SWAP
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 29a7b2cc593e..0e79db55bf1a 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -5370,7 +5370,7 @@ out_free:
 	kfree(pathbuf);
 }
 
-static int __init cgroup_disable(char *str)
+static int __init __cgroup_set_state(char *str, bool disabled)
 {
 	struct cgroup_subsys *ss;
 	char *token;
@@ -5382,16 +5382,28 @@ static int __init cgroup_disable(char *str)
 
 		for_each_subsys(ss, i) {
 			if (!strcmp(token, ss->name)) {
-				ss->disabled = 1;
-				printk(KERN_INFO "Disabling %s control group"
-					" subsystem\n", ss->name);
+				ss->disabled = disabled;
+				printk(KERN_INFO "Setting %s control group"
+					" subsystem %s\n", ss->name,
+					disabled ? "disabled" : "enabled");
 				break;
 			}
 		}
 	}
 	return 1;
 }
+
+static int __init cgroup_disable(char *str)
+{
+	return __cgroup_set_state(str, true);
+}
+
+static int __init cgroup_enable(char *str)
+{
+	return __cgroup_set_state(str, false);
+}
 __setup("cgroup_disable=", cgroup_disable);
+__setup("cgroup_enable=", cgroup_enable);
 
 static int __init cgroup_set_legacy_files_on_dfl(char *str)
 {
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b34ef4a32a3b..ce171ba16949 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5391,6 +5391,9 @@ struct cgroup_subsys memory_cgrp_subsys = {
 	.dfl_cftypes = memory_files,
 	.legacy_cftypes = mem_cgroup_legacy_files,
 	.early_init = 0,
+#ifndef CONFIG_MEMCG_DEFAULT_ENABLED
+	.disabled = 1,
+#endif
 };
 
 /**
-- 
2.3.5



* [PATCH 2/2] mm, memcg: Optionally disable memcg by default using Kconfig
@ 2015-05-20 12:50   ` Mel Gorman
  0 siblings, 0 replies; 29+ messages in thread
From: Mel Gorman @ 2015-05-20 12:50 UTC (permalink / raw)
  To: Michal Hocko, Johannes Weiner, Andrew Morton
  Cc: Tejun Heo, Linux-CGroups, Linux-MM, LKML, Mel Gorman

memcg was reported years ago to have significant overhead when unused. It
has improved but it's still the case that users that have no knowledge of
memcg pay a small performance penalty.

This patch adds a Kconfig that controls whether memcg is enabled by default
and a kernel parameter cgroup_enable= to enable it if desired. Anyone using
oldconfig will get the historical behaviour. It is not an option for most
distributions to simply disable MEMCG as there are users that require it
but they should also be knowledgable enough to use cgroup_enable=.

This was evaluated using aim9, a page fault microbenchmark and ebizzy
but I'll focus on the page fault microbenchmark. It can be reproduced
using pft from mmtests (https://github.com/gormanm/mmtests).  Edit
configs/config-global-dhp__pagealloc-performance and update MMTESTS to
only contain pft. This is the relevant part of the profile summary

/usr/src/linux-4.0-chargefirst-v2r1/mm/memcontrol.c                  3.7907   223277
  __mem_cgroup_count_vm_event                                                  1.143%    67312
  mem_cgroup_page_lruvec                                                       0.465%    27403
  mem_cgroup_commit_charge                                                     0.381%    22452
  uncharge_list                                                                0.332%    19543
  mem_cgroup_update_lru_size                                                   0.284%    16704
  get_mem_cgroup_from_mm                                                       0.271%    15952
  mem_cgroup_try_charge                                                        0.237%    13982
  memcg_check_events                                                           0.222%    13058
  mem_cgroup_charge_statistics.isra.22                                         0.185%    10920
  commit_charge                                                                0.140%     8235
  try_charge                                                                   0.131%     7716

It's showing 3.79% overhead in memcontrol.c when no memcgs are in
use. Applying the patch and disabling memcg reduces this to 0.51%

/usr/src/linux-4.0-disable-v2r1/mm/memcontrol.c                      0.5100    29304
  mem_cgroup_page_lruvec                                                       0.161%     9267
  mem_cgroup_update_lru_size                                                   0.154%     8872
  mem_cgroup_try_charge                                                        0.153%     8768
  mem_cgroup_commit_charge                                                     0.042%     2397

pft faults
                                       4.0.0                  4.0.0
                                 chargefirst                disable
Hmean    faults/cpu-1 1509075.7561 (  0.00%) 1508934.4568 ( -0.01%)
Hmean    faults/cpu-3 1339160.7113 (  0.00%) 1379512.0698 (  3.01%)
Hmean    faults/cpu-5  874174.1255 (  0.00%)  875741.7674 (  0.18%)
Hmean    faults/cpu-7  601370.9977 (  0.00%)  599938.2026 ( -0.24%)
Hmean    faults/cpu-8  510598.8214 (  0.00%)  510663.5402 (  0.01%)
Hmean    faults/sec-1 1497935.5274 (  0.00%) 1496585.7400 ( -0.09%)
Hmean    faults/sec-3 3941920.1520 (  0.00%) 4050811.9259 (  2.76%)
Hmean    faults/sec-5 3869385.7553 (  0.00%) 3922299.6112 (  1.37%)
Hmean    faults/sec-7 3992181.4189 (  0.00%) 3988511.0065 ( -0.09%)
Hmean    faults/sec-8 3986452.2204 (  0.00%) 3977706.7883 ( -0.22%)

Low thread counts get a small boost but it's within noise as memcg overhead
does not dominate. It's not obvious at all at higher thread counts as other
factors cause more problems. The overall breakdown of CPU usage looks like

               4.0.0       4.0.0
        chargefirst-v2r1disable-v2r1
User           41.81       41.45
System        407.64      405.50
Elapsed       128.17      127.06

Despite the relative unimportance, there is at least some justification
for disabling memcg by default.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 Documentation/kernel-parameters.txt |  4 ++++
 init/Kconfig                        | 15 +++++++++++++++
 kernel/cgroup.c                     | 20 ++++++++++++++++----
 mm/memcontrol.c                     |  3 +++
 4 files changed, 38 insertions(+), 4 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index bfcb1a62a7b4..4f264f906816 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -591,6 +591,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			cut the overhead, others just disable the usage. So
 			only cgroup_disable=memory is actually worthy}
 
+	cgroup_enable= [KNL] Enable a particular controller
+			Similar to cgroup_disable except that it enables
+			controllers that are disabled by default.
+
 	checkreqprot	[SELINUX] Set initial checkreqprot flag value.
 			Format: { "0" | "1" }
 			See security/selinux/Kconfig help text.
diff --git a/init/Kconfig b/init/Kconfig
index f5dbc6d4261b..819b6cc05cba 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -990,6 +990,21 @@ config MEMCG
 	  Provides a memory resource controller that manages both anonymous
 	  memory and page cache. (See Documentation/cgroups/memory.txt)
 
+config MEMCG_DEFAULT_ENABLED
+	bool "Automatically enable memory resource controller"
+	default y
+	depends on MEMCG
+	help
+	  The memory controller has some overhead even if idle as resource
+	  usage must be tracked in case a group is created and a process
+	  migrated. As users may not be aware of this and the cgroup_disable=
+	  option, this config option controls whether it is enabled by
+	  default. It is assumed that someone that requires the controller
+	  can find the cgroup_enable= switch.
+
+	  Say N if unsure. This is default Y to preserve oldconfig and
+	  historical behaviour.
+
 config MEMCG_SWAP
 	bool "Memory Resource Controller Swap Extension"
 	depends on MEMCG && SWAP
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 29a7b2cc593e..0e79db55bf1a 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -5370,7 +5370,7 @@ out_free:
 	kfree(pathbuf);
 }
 
-static int __init cgroup_disable(char *str)
+static int __init __cgroup_set_state(char *str, bool disabled)
 {
 	struct cgroup_subsys *ss;
 	char *token;
@@ -5382,16 +5382,28 @@ static int __init cgroup_disable(char *str)
 
 		for_each_subsys(ss, i) {
 			if (!strcmp(token, ss->name)) {
-				ss->disabled = 1;
-				printk(KERN_INFO "Disabling %s control group"
-					" subsystem\n", ss->name);
+				ss->disabled = disabled;
+				printk(KERN_INFO "Setting %s control group"
+					" subsystem %s\n", ss->name,
+					disabled ? "disabled" : "enabled");
 				break;
 			}
 		}
 	}
 	return 1;
 }
+
+static int __init cgroup_disable(char *str)
+{
+	return __cgroup_set_state(str, true);
+}
+
+static int __init cgroup_enable(char *str)
+{
+	return __cgroup_set_state(str, false);
+}
 __setup("cgroup_disable=", cgroup_disable);
+__setup("cgroup_enable=", cgroup_enable);
 
 static int __init cgroup_set_legacy_files_on_dfl(char *str)
 {
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b34ef4a32a3b..ce171ba16949 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5391,6 +5391,9 @@ struct cgroup_subsys memory_cgrp_subsys = {
 	.dfl_cftypes = memory_files,
 	.legacy_cftypes = mem_cgroup_legacy_files,
 	.early_init = 0,
+#ifndef CONFIG_MEMCG_DEFAULT_ENABLED
+	.disabled = 1,
+#endif
 };
 
 /**
-- 
2.3.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/2] mm, memcg: Optionally disable memcg by default using Kconfig
  2015-05-20 12:50   ` Mel Gorman
@ 2015-05-20 13:47     ` Davidlohr Bueso
  -1 siblings, 0 replies; 29+ messages in thread
From: Davidlohr Bueso @ 2015-05-20 13:47 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Michal Hocko, Johannes Weiner, Andrew Morton, Tejun Heo,
	Linux-CGroups, Linux-MM, LKML

On Wed, 2015-05-20 at 13:50 +0100, Mel Gorman wrote:
> +config MEMCG_DEFAULT_ENABLED
> +	bool "Automatically enable memory resource controller"
> +	default y
> +	depends on MEMCG
> +	help
> +	  The memory controller has some overhead even if idle as resource
> +	  usage must be tracked in case a group is created and a process
> +	  migrated. As users may not be aware of this and the cgroup_disable=
> +	  option, this config option controls whether it is enabled by
> +	  default. It is assumed that someone that requires the controller
> +	  can find the cgroup_enable= switch.
> +
> +	  Say N if unsure. This is default Y to preserve oldconfig and
> +	  historical behaviour.

Out of curiosity, how do you expect distros to handle this? I mean, this
is a pretty general functionality and customers won't want to be
changing kernels (they may or may not use memcg). iow, will this ever be
disabled?

Thanks,
Davidlohr


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/2] mm, memcg: Try charging a page before setting page up to date
  2015-05-20 12:50   ` Mel Gorman
@ 2015-05-20 14:03     ` Michal Hocko
  -1 siblings, 0 replies; 29+ messages in thread
From: Michal Hocko @ 2015-05-20 14:03 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Johannes Weiner, Andrew Morton, Tejun Heo, Linux-CGroups, Linux-MM, LKML

On Wed 20-05-15 13:50:44, Mel Gorman wrote:
> Historically memcg overhead was high even if memcg was unused. This has
> improved a lot but it still showed up in a profile summary as being a
> problem.
> 
> /usr/src/linux-4.0-vanilla/mm/memcontrol.c                           6.6441   395842
>   mem_cgroup_try_charge                                                        2.950%   175781
>   __mem_cgroup_count_vm_event                                                  1.431%    85239
>   mem_cgroup_page_lruvec                                                       0.456%    27156
>   mem_cgroup_commit_charge                                                     0.392%    23342
>   uncharge_list                                                                0.323%    19256
>   mem_cgroup_update_lru_size                                                   0.278%    16538
>   memcg_check_events                                                           0.216%    12858
>   mem_cgroup_charge_statistics.isra.22                                         0.188%    11172
>   try_charge                                                                   0.150%     8928
>   commit_charge                                                                0.141%     8388
>   get_mem_cgroup_from_mm                                                       0.121%     7184
> 
> That is showing that 6.64% of system CPU cycles were in memcontrol.c and
> dominated by mem_cgroup_try_charge. The annotation shows that the bulk of
> the cost was checking PageSwapCache which is expected to be cache hot but is
> very expensive. The problem appears to be that __SetPageUptodate is called
> just before the check, and it contains a write barrier. The barrier is
> required to make sure the struct page and page data are written before the
> PTE is updated and the data becomes visible to userspace. memcg charging
> does not need the barrier but gets unfairly hit with the cost, so this
> patch attempts the charging before the barrier.  Aside from the accidental
> cost to memcg, there is the added benefit that the barrier is avoided
> entirely if the page cannot be charged.
> When applied the relevant profile summary is as follows.
> 
> /usr/src/linux-4.0-chargefirst-v2r1/mm/memcontrol.c                  3.7907   223277
>   __mem_cgroup_count_vm_event                                                  1.143%    67312
>   mem_cgroup_page_lruvec                                                       0.465%    27403
>   mem_cgroup_commit_charge                                                     0.381%    22452
>   uncharge_list                                                                0.332%    19543
>   mem_cgroup_update_lru_size                                                   0.284%    16704
>   get_mem_cgroup_from_mm                                                       0.271%    15952
>   mem_cgroup_try_charge                                                        0.237%    13982
>   memcg_check_events                                                           0.222%    13058
>   mem_cgroup_charge_statistics.isra.22                                         0.185%    10920
>   commit_charge                                                                0.140%     8235
>   try_charge                                                                   0.131%     7716
> 
> That brings the overhead down to 3.79% and leaves the memcg fault accounting
> to the root cgroup but it's an improvement. The difference in headline
> performance of the page fault microbench is marginal as memcg is such a
> small component of it.
> 
> pft faults
>                                        4.0.0                  4.0.0
>                                      vanilla            chargefirst
> Hmean    faults/cpu-1 1443258.1051 (  0.00%) 1509075.7561 (  4.56%)
> Hmean    faults/cpu-3 1340385.9270 (  0.00%) 1339160.7113 ( -0.09%)
> Hmean    faults/cpu-5  875599.0222 (  0.00%)  874174.1255 ( -0.16%)
> Hmean    faults/cpu-7  601146.6726 (  0.00%)  601370.9977 (  0.04%)
> Hmean    faults/cpu-8  510728.2754 (  0.00%)  510598.8214 ( -0.03%)
> Hmean    faults/sec-1 1432084.7845 (  0.00%) 1497935.5274 (  4.60%)
> Hmean    faults/sec-3 3943818.1437 (  0.00%) 3941920.1520 ( -0.05%)
> Hmean    faults/sec-5 3877573.5867 (  0.00%) 3869385.7553 ( -0.21%)
> Hmean    faults/sec-7 3991832.0418 (  0.00%) 3992181.4189 (  0.01%)
> Hmean    faults/sec-8 3987189.8167 (  0.00%) 3986452.2204 ( -0.02%)
> 
> It's only visible in the single-threaded case. The overhead is there at
> higher thread counts but other factors dominate.
> 
> Signed-off-by: Mel Gorman <mgorman@suse.de>

Very well spotted and I wouldn't have figured that out from the profiles
posted previously!

The patch makes sense. I am wondering why do we still have both
__SetPageUptodate and SetPageUptodate when they are same. Historically
they were slightly different but this is no longer the case.

Acked-by: Michal Hocko <mhocko@suse.cz>

Thanks!

> ---
>  mm/memory.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 97839f5c8c30..80a03628bd77 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2158,11 +2158,12 @@ gotten:
>  			goto oom;
>  		cow_user_page(new_page, old_page, address, vma);
>  	}
> -	__SetPageUptodate(new_page);
>  
>  	if (mem_cgroup_try_charge(new_page, mm, GFP_KERNEL, &memcg))
>  		goto oom_free_new;
>  
> +	__SetPageUptodate(new_page);
> +
>  	mmun_start  = address & PAGE_MASK;
>  	mmun_end    = mmun_start + PAGE_SIZE;
>  	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
> @@ -2594,6 +2595,10 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
>  	page = alloc_zeroed_user_highpage_movable(vma, address);
>  	if (!page)
>  		goto oom;
> +
> +	if (mem_cgroup_try_charge(page, mm, GFP_KERNEL, &memcg))
> +		goto oom_free_page;
> +
>  	/*
>  	 * The memory barrier inside __SetPageUptodate makes sure that
>  	 * preceeding stores to the page contents become visible before
> @@ -2601,9 +2606,6 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
>  	 */
>  	__SetPageUptodate(page);
>  
> -	if (mem_cgroup_try_charge(page, mm, GFP_KERNEL, &memcg))
> -		goto oom_free_page;
> -
>  	entry = mk_pte(page, vma->vm_page_prot);
>  	if (vma->vm_flags & VM_WRITE)
>  		entry = pte_mkwrite(pte_mkdirty(entry));
> -- 
> 2.3.5
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/2] mm, memcg: Optionally disable memcg by default using Kconfig
  2015-05-20 13:47     ` Davidlohr Bueso
@ 2015-05-20 14:12       ` Michal Hocko
  -1 siblings, 0 replies; 29+ messages in thread
From: Michal Hocko @ 2015-05-20 14:12 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Mel Gorman, Johannes Weiner, Andrew Morton, Tejun Heo,
	Linux-CGroups, Linux-MM, LKML, Ben Hutchings

[It seems Ben hasn't made it into the CC list - the thread starts here:
http://article.gmane.org/gmane.linux.kernel.cgroups/13345]

On Wed 20-05-15 06:47:46, Davidlohr Bueso wrote:
> On Wed, 2015-05-20 at 13:50 +0100, Mel Gorman wrote:
> > +config MEMCG_DEFAULT_ENABLED
> > +	bool "Automatically enable memory resource controller"
> > +	default y
> > +	depends on MEMCG
> > +	help
> > +	  The memory controller has some overhead even if idle as resource
> > +	  usage must be tracked in case a group is created and a process
> > +	  migrated. As users may not be aware of this and the cgroup_disable=
> > +	  option, this config option controls whether it is enabled by
> > +	  default. It is assumed that someone that requires the controller
> > +	  can find the cgroup_enable= switch.
> > +
> > +	  Say N if unsure. This is default Y to preserve oldconfig and
> > +	  historical behaviour.
> 
> Out of curiosity, how do you expect distros to handle this? I mean, this
> is a pretty general functionality and customers won't want to be
> changing kernels (they may or may not use memcg). iow, will this ever be
> disabled?

This was exactly my question during the previous iteration. Only those
distributions which either haven't enabled CONFIG_MEMCG at all and want
to start, or those which have enabled it but keep it runtime disabled
(e.g. Debian), would benefit from such a change. Ben has shown interest
in such a patch because he could drop a Debian-specific patch. But I am
not sure it still makes sense when the overall runtime overhead is quite
low even for microbenchmarks.

I would personally prefer not to take the patch because we have quite
a few config options already, but if Debian and potentially others insist
on their current (runtime disabled) policy then there is some merit
in merging it. The interface could be better, I guess, because cgroups
don't allow enabling/disabling any other controllers, so something like
swapaccount= (e.g. memcgaccount=) would be more appropriate.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/2] mm, memcg: Optionally disable memcg by default using Kconfig
  2015-05-20 13:47     ` Davidlohr Bueso
@ 2015-05-20 14:13       ` Mel Gorman
  -1 siblings, 0 replies; 29+ messages in thread
From: Mel Gorman @ 2015-05-20 14:13 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Michal Hocko, Johannes Weiner, Andrew Morton, Tejun Heo,
	Linux-CGroups, Linux-MM, LKML

On Wed, May 20, 2015 at 06:47:46AM -0700, Davidlohr Bueso wrote:
> On Wed, 2015-05-20 at 13:50 +0100, Mel Gorman wrote:
> > +config MEMCG_DEFAULT_ENABLED
> > +	bool "Automatically enable memory resource controller"
> > +	default y
> > +	depends on MEMCG
> > +	help
> > +	  The memory controller has some overhead even if idle as resource
> > +	  usage must be tracked in case a group is created and a process
> > +	  migrated. As users may not be aware of this and the cgroup_disable=
> > +	  option, this config option controls whether it is enabled by
> > +	  default. It is assumed that someone that requires the controller
> > +	  can find the cgroup_enable= switch.
> > +
> > +	  Say N if unsure. This is default Y to preserve oldconfig and
> > +	  historical behaviour.
> 
> Out of curiosity, how do you expect distros to handle this?

Ideally, distros would have been able to leave this disabled by default and
have the user explicitly enable it if it was required. This would have made
a lot of sense when memcg had unconditional memory overhead to go with it.

For distros that wanted to make the change, it would be fine to leave it
disabled on fresh installs. However, if upgrading then the installer would
have to also add the kernel parameter to prevent any possible regressions
for the user.

> I mean, this
> is a pretty general functionality and customers won't want to be
> changing kernels (they may or may not use memcg). iow, will this ever be
> disabled?
> 

It's not that general. It takes explicit user or sysadmin action before
it's used AFAIK.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/2] mm, memcg: Try charging a page before setting page up to date
  2015-05-20 14:03     ` Michal Hocko
  (?)
@ 2015-05-20 14:18       ` Michal Hocko
  -1 siblings, 0 replies; 29+ messages in thread
From: Michal Hocko @ 2015-05-20 14:18 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Johannes Weiner, Andrew Morton, Tejun Heo, Linux-CGroups, Linux-MM, LKML

On Wed 20-05-15 16:03:53, Michal Hocko wrote:
> I am wondering why do we still have both
> __SetPageUptodate and SetPageUptodate when they are same. Historically
> they were slightly different but this is no longer the case.

Bahh, I am blind and failed to spot the difference: it is __set_bit vs
set_bit. Sorry about the noise.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/2] mm, memcg: Try charging a page before setting page up to date
  2015-05-20 12:50   ` Mel Gorman
@ 2015-05-20 15:29     ` Johannes Weiner
  -1 siblings, 0 replies; 29+ messages in thread
From: Johannes Weiner @ 2015-05-20 15:29 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Michal Hocko, Andrew Morton, Tejun Heo, Linux-CGroups, Linux-MM, LKML

On Wed, May 20, 2015 at 01:50:44PM +0100, Mel Gorman wrote:
> Historically memcg overhead was high even if memcg was unused. This has
> improved a lot but it still showed up in a profile summary as being a
> problem.
> 
> /usr/src/linux-4.0-vanilla/mm/memcontrol.c                           6.6441   395842
>   mem_cgroup_try_charge                                                        2.950%   175781
>   __mem_cgroup_count_vm_event                                                  1.431%    85239
>   mem_cgroup_page_lruvec                                                       0.456%    27156
>   mem_cgroup_commit_charge                                                     0.392%    23342
>   uncharge_list                                                                0.323%    19256
>   mem_cgroup_update_lru_size                                                   0.278%    16538
>   memcg_check_events                                                           0.216%    12858
>   mem_cgroup_charge_statistics.isra.22                                         0.188%    11172
>   try_charge                                                                   0.150%     8928
>   commit_charge                                                                0.141%     8388
>   get_mem_cgroup_from_mm                                                       0.121%     7184
> 
> That is showing that 6.64% of system CPU cycles were in memcontrol.c and
> dominated by mem_cgroup_try_charge. The annotation shows that the bulk of
> the cost was checking PageSwapCache, which is expected to be cache hot but is
> very expensive here. The problem appears to be that __SetPageUptodate, which
> contains a write barrier, is called just before the check. The barrier is
> required to make sure the struct page and page data are written before the
> PTE is updated and the data becomes visible to userspace. memcg charging
> neither requires nor issues the barrier but gets unfairly hit with its cost,
> so this patch attempts the charging before the barrier. Aside from removing
> the accidental cost to memcg, there is the added benefit that the barrier
> is avoided entirely if the page cannot be charged.
> When applied the relevant profile summary is as follows.
> 
> /usr/src/linux-4.0-chargefirst-v2r1/mm/memcontrol.c                  3.7907   223277
>   __mem_cgroup_count_vm_event                                                  1.143%    67312

Out of curiosity, I'm still consistently reading this function at
around 0.7%.  Are you profiling this single-threadedly or for the
entire run?  For profiling 80 single-threaded iterations, I get:

+    1.31%     0.59%              pft  [kernel.kallsyms]            [k] mem_cgroup_try_charge
+    0.72%     0.44%              pft  [kernel.kallsyms]            [k] mem_cgroup_commit_charge
+    0.67%     0.67%              pft  [kernel.kallsyms]            [k] __mem_cgroup_count_vm_event
+    0.57%     0.57%              pft  [kernel.kallsyms]            [k] get_mem_cgroup_from_mm
+    0.32%     0.01%              pft  [kernel.kallsyms]            [k] mem_cgroup_uncharge_list
+    0.42%     0.42%              pft  [kernel.kallsyms]            [k] mem_cgroup_page_lruvec
+    0.31%     0.30%              pft  [kernel.kallsyms]            [k] uncharge_list
+    0.28%     0.28%              pft  [kernel.kallsyms]            [k] try_charge
+    0.21%     0.21%              pft  [kernel.kallsyms]            [k] mem_cgroup_charge_statistics.isra.26
+    0.20%     0.20%              pft  [kernel.kallsyms]            [k] mem_cgroup_update_lru_size
+    0.13%     0.13%              pft  [kernel.kallsyms]            [k] commit_charge
+    0.10%     0.09%              pft  [kernel.kallsyms]            [k] memcg_check_events

Adding up the recursive profile (first column) for the entry functions
(try_charge, commit, pgfault accounting, uncharge), this yields 3.02%.

>   mem_cgroup_page_lruvec                                                       0.465%    27403
>   mem_cgroup_commit_charge                                                     0.381%    22452
>   uncharge_list                                                                0.332%    19543
>   mem_cgroup_update_lru_size                                                   0.284%    16704
>   get_mem_cgroup_from_mm                                                       0.271%    15952
>   mem_cgroup_try_charge                                                        0.237%    13982
>   memcg_check_events                                                           0.222%    13058
>   mem_cgroup_charge_statistics.isra.22                                         0.185%    10920
>   commit_charge                                                                0.140%     8235
>   try_charge                                                                   0.131%     7716
> 
> That brings the overhead down to 3.79%; the remainder is mostly the memcg
> fault accounting against the root cgroup, but it's an improvement. The
> difference in headline performance of the page fault microbenchmark is
> marginal as memcg is such a small component of it.
> 
> pft faults
>                                        4.0.0                  4.0.0
>                                      vanilla            chargefirst
> Hmean    faults/cpu-1 1443258.1051 (  0.00%) 1509075.7561 (  4.56%)
> Hmean    faults/cpu-3 1340385.9270 (  0.00%) 1339160.7113 ( -0.09%)
> Hmean    faults/cpu-5  875599.0222 (  0.00%)  874174.1255 ( -0.16%)
> Hmean    faults/cpu-7  601146.6726 (  0.00%)  601370.9977 (  0.04%)
> Hmean    faults/cpu-8  510728.2754 (  0.00%)  510598.8214 ( -0.03%)
> Hmean    faults/sec-1 1432084.7845 (  0.00%) 1497935.5274 (  4.60%)
> Hmean    faults/sec-3 3943818.1437 (  0.00%) 3941920.1520 ( -0.05%)
> Hmean    faults/sec-5 3877573.5867 (  0.00%) 3869385.7553 ( -0.21%)
> Hmean    faults/sec-7 3991832.0418 (  0.00%) 3992181.4189 (  0.01%)
> Hmean    faults/sec-8 3987189.8167 (  0.00%) 3986452.2204 ( -0.02%)
> 
> The gain is only visible single-threaded. The overhead is still there at
> higher thread counts but other factors dominate.
> 
> Signed-off-by: Mel Gorman <mgorman@suse.de>

Awesome analysis, thank you Mel.

Acked-by: Johannes Weiner <hannes@cmpxchg.org>


* Re: [PATCH 1/2] mm, memcg: Try charging a page before setting page up to date
  2015-05-20 15:29     ` Johannes Weiner
  (?)
@ 2015-05-20 16:15       ` Mel Gorman
  -1 siblings, 0 replies; 29+ messages in thread
From: Mel Gorman @ 2015-05-20 16:15 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Michal Hocko, Andrew Morton, Tejun Heo, Linux-CGroups, Linux-MM, LKML

On Wed, May 20, 2015 at 11:29:23AM -0400, Johannes Weiner wrote:
> On Wed, May 20, 2015 at 01:50:44PM +0100, Mel Gorman wrote:
> > Historically memcg overhead was high even if memcg was unused. This has
> > improved a lot but it still showed up in a profile summary as being a
> > problem.
> > 
> > /usr/src/linux-4.0-vanilla/mm/memcontrol.c                           6.6441   395842
> >   mem_cgroup_try_charge                                                        2.950%   175781
> >   __mem_cgroup_count_vm_event                                                  1.431%    85239
> >   mem_cgroup_page_lruvec                                                       0.456%    27156
> >   mem_cgroup_commit_charge                                                     0.392%    23342
> >   uncharge_list                                                                0.323%    19256
> >   mem_cgroup_update_lru_size                                                   0.278%    16538
> >   memcg_check_events                                                           0.216%    12858
> >   mem_cgroup_charge_statistics.isra.22                                         0.188%    11172
> >   try_charge                                                                   0.150%     8928
> >   commit_charge                                                                0.141%     8388
> >   get_mem_cgroup_from_mm                                                       0.121%     7184
> > 
> > That is showing that 6.64% of system CPU cycles were in memcontrol.c and
> > dominated by mem_cgroup_try_charge. The annotation shows that the bulk of
> > the cost was checking PageSwapCache which is expected to be cache hot but is
> > very expensive. The problem appears to be that __SetPageUptodate is called
> > just before the check which is a write barrier. It is required to make sure
> > struct page and page data is written before the PTE is updated and the data
> > visible to userspace. memcg charging does not require or need the barrier
> > but gets unfairly hit with the cost so this patch attempts the charging
> > before the barrier.  Aside from the accidental cost to memcg there is the
> > added benefit that the barrier is avoided if the page cannot be charged.
> > When applied the relevant profile summary is as follows.
> > 
> > /usr/src/linux-4.0-chargefirst-v2r1/mm/memcontrol.c                  3.7907   223277
> >   __mem_cgroup_count_vm_event                                                  1.143%    67312
> 
> Out of curiosity, I'm still consistently reading this function at
> around 0.7%.  Are you profiling this single-threadedly or for the
> entire run?  For profiling 80 single-threaded iterations, I get:
> 

Single-threaded. The mmtests benchmark in question supports gathering one
profile per thread count, so it's just the 1-thread profile I included in
the changelog. The CPU in question is an i7-3770.

-- 
Mel Gorman
SUSE Labs
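The per-thread-count profiling setup Mel describes can be sketched as a short session. The script name, flags, and run name below are assumptions based on the mmtests repository layout; check its README before relying on them.

```shell
# Sketch only: script and flag names assumed from mmtests conventions.
git clone https://github.com/gormanm/mmtests.git
cd mmtests
# Edit configs/config-global-dhp__pagealloc-performance so that
# MMTESTS contains only "pft", then launch a named run:
./run-mmtests.sh -c configs/config-global-dhp__pagealloc-performance pft-run
```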

* Re: [PATCH 2/2] mm, memcg: Optionally disable memcg by default using Kconfig
  2015-05-20 12:50   ` Mel Gorman
  (?)
@ 2015-05-20 16:24     ` Johannes Weiner
  -1 siblings, 0 replies; 29+ messages in thread
From: Johannes Weiner @ 2015-05-20 16:24 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Michal Hocko, Andrew Morton, Tejun Heo, Linux-CGroups, Linux-MM, LKML

On Wed, May 20, 2015 at 01:50:45PM +0100, Mel Gorman wrote:
> memcg was reported years ago to have significant overhead when unused. It
> has improved but it's still the case that users that have no knowledge of
> memcg pay a small performance penalty.
>
> This patch adds a Kconfig option that controls whether memcg is enabled by
> default and a kernel parameter cgroup_enable= to enable it if desired. Anyone
> using oldconfig will get the historical behaviour. Simply disabling MEMCG is
> not an option for most distributions as there are users that require it, but
> those users should also be knowledgeable enough to use cgroup_enable=.
>
> This was evaluated using aim9, a page fault microbenchmark and ebizzy
> but I'll focus on the page fault microbenchmark. It can be reproduced
> using pft from mmtests (https://github.com/gormanm/mmtests).  Edit
> configs/config-global-dhp__pagealloc-performance and update MMTESTS to
> only contain pft. This is the relevant part of the profile summary
> 
> /usr/src/linux-4.0-chargefirst-v2r1/mm/memcontrol.c                  3.7907   223277
>   __mem_cgroup_count_vm_event                                                  1.143%    67312
>   mem_cgroup_page_lruvec                                                       0.465%    27403
>   mem_cgroup_commit_charge                                                     0.381%    22452
>   uncharge_list                                                                0.332%    19543
>   mem_cgroup_update_lru_size                                                   0.284%    16704
>   get_mem_cgroup_from_mm                                                       0.271%    15952
>   mem_cgroup_try_charge                                                        0.237%    13982
>   memcg_check_events                                                           0.222%    13058
>   mem_cgroup_charge_statistics.isra.22                                         0.185%    10920
>   commit_charge                                                                0.140%     8235
>   try_charge                                                                   0.131%     7716
> 
> It's showing 3.79% overhead in memcontrol.c when no memcgs are in
> use. Applying the patch and disabling memcg reduces this to 0.51%
> 
> /usr/src/linux-4.0-disable-v2r1/mm/memcontrol.c                      0.5100    29304
>   mem_cgroup_page_lruvec                                                       0.161%     9267
>   mem_cgroup_update_lru_size                                                   0.154%     8872
>   mem_cgroup_try_charge                                                        0.153%     8768
>   mem_cgroup_commit_charge                                                     0.042%     2397
> 
> pft faults
>                                        4.0.0                  4.0.0
>                                  chargefirst                disable
> Hmean    faults/cpu-1 1509075.7561 (  0.00%) 1508934.4568 ( -0.01%)
> Hmean    faults/cpu-3 1339160.7113 (  0.00%) 1379512.0698 (  3.01%)
> Hmean    faults/cpu-5  874174.1255 (  0.00%)  875741.7674 (  0.18%)
> Hmean    faults/cpu-7  601370.9977 (  0.00%)  599938.2026 ( -0.24%)
> Hmean    faults/cpu-8  510598.8214 (  0.00%)  510663.5402 (  0.01%)
> Hmean    faults/sec-1 1497935.5274 (  0.00%) 1496585.7400 ( -0.09%)
> Hmean    faults/sec-3 3941920.1520 (  0.00%) 4050811.9259 (  2.76%)
> Hmean    faults/sec-5 3869385.7553 (  0.00%) 3922299.6112 (  1.37%)
> Hmean    faults/sec-7 3992181.4189 (  0.00%) 3988511.0065 ( -0.09%)
> Hmean    faults/sec-8 3986452.2204 (  0.00%) 3977706.7883 ( -0.22%)
> 
> Low thread counts get a small boost but it's within noise as memcg overhead
> does not dominate. It's not obvious at all at higher thread counts as other
> factors cause more problems. The overall breakdown of CPU usage looks like
> 
>                            4.0.0        4.0.0
>                 chargefirst-v2r1 disable-v2r1
> User           41.81       41.45
> System        407.64      405.50
> Elapsed       128.17      127.06

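The Kconfig knob the changelog describes might look roughly like the sketch below. The option name and help text are assumptions for illustration, not copied from the patch.

```kconfig
config MEMCG_DISABLED
	bool "Disable memory cgroup controller by default"
	depends on MEMCG
	default n
	help
	  If enabled, the memory controller is still compiled in but left
	  disabled at boot unless cgroup_enable=memory is passed on the
	  kernel command line.  Leaving this off preserves the historical
	  always-enabled behaviour for oldconfig users.
```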
This is a worst case microbenchmark doing nothing but anonymous page
faults (with THP disabled), and yet the performance difference is in
the noise.  I don't see why we should burden the user with making a
decision that doesn't matter in theory, let alone in practice.

We have CONFIG_MEMCG and cgroup_disable=memory, that should be plenty
for users that obsess about fluctuation in the noise.  There is no
reason to complicate the world further for everybody else.
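The existing runtime opt-out Johannes refers to is the cgroup_disable= boot parameter, set from the bootloader configuration. A minimal sketch, with file paths that are distribution-specific assumptions:

```shell
# /etc/default/grub (path varies by distribution; an assumption here)
GRUB_CMDLINE_LINUX="cgroup_disable=memory"
# Regenerate the bootloader config afterwards, e.g. on many distributions:
#   grub2-mkconfig -o /boot/grub2/grub.cfg
```

After a reboot, the memory controller stays compiled in but inactive, which is exactly the low-overhead configuration the benchmark above approximates.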

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/2] mm, memcg: Optionally disable memcg by default using Kconfig
@ 2015-05-20 16:24     ` Johannes Weiner
  0 siblings, 0 replies; 29+ messages in thread
From: Johannes Weiner @ 2015-05-20 16:24 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Michal Hocko, Andrew Morton, Tejun Heo, Linux-CGroups, Linux-MM, LKML

On Wed, May 20, 2015 at 01:50:45PM +0100, Mel Gorman wrote:
> memcg was reported years ago to have significant overhead when unused. It
> has improved but it's still the case that users that have no knowledge of
> memcg pay a small performance penalty.
>
> This patch adds a Kconfig that controls whether memcg is enabled by default
> and a kernel parameter cgroup_enable= to enable it if desired. Anyone using
> oldconfig will get the historical behaviour. It is not an option for most
> distributions to simply disable MEMCG as there are users that require it
> but they should also be knowledgable enough to use cgroup_enable=.
>
> This was evaluated using aim9, a page fault microbenchmark and ebizzy
> but I'll focus on the page fault microbenchmark. It can be reproduced
> using pft from mmtests (https://github.com/gormanm/mmtests).  Edit
> configs/config-global-dhp__pagealloc-performance and update MMTESTS to
> only contain pft. This is the relevant part of the profile summary
> 
> /usr/src/linux-4.0-chargefirst-v2r1/mm/memcontrol.c                  3.7907   223277
>   __mem_cgroup_count_vm_event                                                  1.143%    67312
>   mem_cgroup_page_lruvec                                                       0.465%    27403
>   mem_cgroup_commit_charge                                                     0.381%    22452
>   uncharge_list                                                                0.332%    19543
>   mem_cgroup_update_lru_size                                                   0.284%    16704
>   get_mem_cgroup_from_mm                                                       0.271%    15952
>   mem_cgroup_try_charge                                                        0.237%    13982
>   memcg_check_events                                                           0.222%    13058
>   mem_cgroup_charge_statistics.isra.22                                         0.185%    10920
>   commit_charge                                                                0.140%     8235
>   try_charge                                                                   0.131%     7716
> 
> It's showing 3.79% overhead in memcontrol.c when no memcgs are in
> use. Applying the patch and disabling memcg reduces this to 0.51%
> 
> /usr/src/linux-4.0-disable-v2r1/mm/memcontrol.c                      0.5100    29304
>   mem_cgroup_page_lruvec                                                       0.161%     9267
>   mem_cgroup_update_lru_size                                                   0.154%     8872
>   mem_cgroup_try_charge                                                        0.153%     8768
>   mem_cgroup_commit_charge                                                     0.042%     2397
> 
> pft faults
>                                        4.0.0                  4.0.0
>                                  chargefirst                disable
> Hmean    faults/cpu-1 1509075.7561 (  0.00%) 1508934.4568 ( -0.01%)
> Hmean    faults/cpu-3 1339160.7113 (  0.00%) 1379512.0698 (  3.01%)
> Hmean    faults/cpu-5  874174.1255 (  0.00%)  875741.7674 (  0.18%)
> Hmean    faults/cpu-7  601370.9977 (  0.00%)  599938.2026 ( -0.24%)
> Hmean    faults/cpu-8  510598.8214 (  0.00%)  510663.5402 (  0.01%)
> Hmean    faults/sec-1 1497935.5274 (  0.00%) 1496585.7400 ( -0.09%)
> Hmean    faults/sec-3 3941920.1520 (  0.00%) 4050811.9259 (  2.76%)
> Hmean    faults/sec-5 3869385.7553 (  0.00%) 3922299.6112 (  1.37%)
> Hmean    faults/sec-7 3992181.4189 (  0.00%) 3988511.0065 ( -0.09%)
> Hmean    faults/sec-8 3986452.2204 (  0.00%) 3977706.7883 ( -0.22%)
> 
> Low thread counts get a small boost but it's within noise as memcg overhead
> does not dominate. It's not obvious at all at higher thread counts as other
> factors cause more problems. The overall breakdown of CPU usage looks like
> 
>                4.0.0       4.0.0
>         chargefirst-v2r1disable-v2r1
> User           41.81       41.45
> System        407.64      405.50
> Elapsed       128.17      127.06

This is a worst case microbenchmark doing nothing but anonymous page
faults (with THP disabled), and yet the performance difference is in
the noise.  I don't see why we should burden the user with making a
decision that doesn't matter in theory, let alone in practice.

We have CONFIG_MEMCG and cgroup_disable=memory, that should be plenty
for users that obsess about fluctuation in the noise.  There is no
reason to complicate the world further for everybody else.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/2] mm, memcg: Optionally disable memcg by default using Kconfig
  2015-05-20 16:24     ` Johannes Weiner
  (?)
@ 2015-05-20 16:44       ` Mel Gorman
  -1 siblings, 0 replies; 29+ messages in thread
From: Mel Gorman @ 2015-05-20 16:44 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Michal Hocko, Andrew Morton, Tejun Heo, Linux-CGroups, Linux-MM, LKML

On Wed, May 20, 2015 at 12:24:21PM -0400, Johannes Weiner wrote:
> > 
> > Low thread counts get a small boost but it's within noise as memcg overhead
> > does not dominate. It's not obvious at all at higher thread counts as other
> > factors cause more problems. The overall breakdown of CPU usage looks like
> > 
> >                4.0.0       4.0.0
> >         chargefirst-v2r1disable-v2r1
> > User           41.81       41.45
> > System        407.64      405.50
> > Elapsed       128.17      127.06
> 
> This is a worst case microbenchmark doing nothing but anonymous page
> faults (with THP disabled), and yet the performance difference is in
> the noise.  I don't see why we should burden the user with making a
> decision that doesn't matter in theory, let alone in practice.
> 
> We have CONFIG_MEMCG and cgroup_disable=memory, that should be plenty
> for users that obsess about fluctuation in the noise.  There is no
> reason to complicate the world further for everybody else.

FWIW, I agree and only included this patch because I said I would
yesterday. After patch 1, there is almost no motivation to disable memcg
at all, let alone by default.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2015-05-20 16:44 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-20 12:50 [PATCH 0/2] Reduce overhead of memcg when unused Mel Gorman
2015-05-20 12:50 ` Mel Gorman
2015-05-20 12:50 ` Mel Gorman
2015-05-20 12:50 ` [PATCH 1/2] mm, memcg: Try charging a page before setting page up to date Mel Gorman
2015-05-20 12:50   ` Mel Gorman
2015-05-20 14:03   ` Michal Hocko
2015-05-20 14:03     ` Michal Hocko
2015-05-20 14:18     ` Michal Hocko
2015-05-20 14:18       ` Michal Hocko
2015-05-20 14:18       ` Michal Hocko
2015-05-20 15:29   ` Johannes Weiner
2015-05-20 15:29     ` Johannes Weiner
2015-05-20 16:15     ` Mel Gorman
2015-05-20 16:15       ` Mel Gorman
2015-05-20 16:15       ` Mel Gorman
2015-05-20 12:50 ` [PATCH 2/2] mm, memcg: Optionally disable memcg by default using Kconfig Mel Gorman
2015-05-20 12:50   ` Mel Gorman
2015-05-20 13:47   ` Davidlohr Bueso
2015-05-20 13:47     ` Davidlohr Bueso
2015-05-20 14:12     ` Michal Hocko
2015-05-20 14:12       ` Michal Hocko
2015-05-20 14:13     ` Mel Gorman
2015-05-20 14:13       ` Mel Gorman
2015-05-20 16:24   ` Johannes Weiner
2015-05-20 16:24     ` Johannes Weiner
2015-05-20 16:24     ` Johannes Weiner
2015-05-20 16:44     ` Mel Gorman
2015-05-20 16:44       ` Mel Gorman
2015-05-20 16:44       ` Mel Gorman
