* [PATCH 0/7] Per-cgroup page stat accounting
From: Sha Zhengju @ 2012-06-28 10:54 UTC (permalink / raw)
To: linux-mm, cgroups
Cc: kamezawa.hiroyu, gthelen, yinghan, akpm, mhocko, linux-kernel, Sha Zhengju

This patch series provides the ability for each memory cgroup to have independent dirty/writeback page stats, which can supply useful information for per-cgroup direct reclaim. Meanwhile, we add more detailed dump messages for memcg OOMs.

Three features are included in this patch series:
(0) preparation patches for page accounting
1. memcg dirty page accounting
2. memcg writeback page accounting
3. memcg OOM dump info

In the (0) preparation patches, we have reworked the vfs set page dirty routines so that "modify page info" and "dirty page accounting" stay in one function as much as possible, for the sake of the bigger memcg lock.

These patches are based on Andrew's akpm tree.

Sha Zhengju (7):
  memcg-update-cgroup-memory-document.patch
  memcg-remove-MEMCG_NR_FILE_MAPPED.patch
  Make-TestSetPageDirty-and-dirty-page-accounting-in-o.patch
  Use-vfs-__set_page_dirty-interface-instead-of-doing-.patch
  memcg-add-per-cgroup-dirty-pages-accounting.patch
  memcg-add-per-cgroup-writeback-pages-accounting.patch
  memcg-print-more-detailed-info-while-memcg-oom-happe.patch

 Documentation/cgroups/memory.txt |    2 +
 fs/buffer.c                      |   36 +++++++++-----
 fs/ceph/addr.c                   |   20 +-------
 include/linux/buffer_head.h      |    2 +
 include/linux/memcontrol.h       |   27 +++++++---
 mm/filemap.c                     |    5 ++
 mm/memcontrol.c                  |   99 +++++++++++++++++++++++--------------
 mm/page-writeback.c              |   42 ++++++++++++++--
 mm/rmap.c                        |    4 +-
 mm/truncate.c                    |    6 ++
 10 files changed, 159 insertions(+), 84 deletions(-)
* [PATCH 1/7] memcg: update cgroup memory document
From: Sha Zhengju @ 2012-06-28 10:57 UTC (permalink / raw)
To: linux-mm, cgroups
Cc: kamezawa.hiroyu, gthelen, yinghan, akpm, mhocko, linux-kernel, Sha Zhengju

From: Sha Zhengju <handai.szj@taobao.com>

Document cgroup dirty/writeback memory statistics.

The implementation for these new interface routines comes in the following patches of this series.

Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
---
 Documentation/cgroups/memory.txt |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index dd88540..24d7e3c 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -420,6 +420,8 @@ pgpgin - # of charging events to the memory cgroup. The charging
 pgpgout - # of uncharging events to the memory cgroup. The uncharging
 	event happens each time a page is unaccounted from the cgroup.
 swap - # of bytes of swap usage
+dirty - # of bytes that are waiting to get written back to the disk.
+writeback - # of bytes that are actively being written back to the disk.
 inactive_anon - # of bytes of anonymous memory and swap cache memory on
 	LRU list.
 active_anon - # of bytes of anonymous and swap cache memory on active
--
1.7.1
* Re: [PATCH 1/7] memcg: update cgroup memory document
From: Kamezawa Hiroyuki @ 2012-07-02 7:00 UTC (permalink / raw)
To: Sha Zhengju
Cc: linux-mm, cgroups, gthelen, yinghan, akpm, mhocko, linux-kernel, Sha Zhengju

(2012/06/28 19:57), Sha Zhengju wrote:
> From: Sha Zhengju <handai.szj@taobao.com>
>
> Document cgroup dirty/writeback memory statistics.
>
> The implementation for these new interface routines comes in the
> following patches of this series.
>
> Signed-off-by: Sha Zhengju <handai.szj@taobao.com>

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
* Re: [PATCH 1/7] memcg: update cgroup memory document
From: Michal Hocko @ 2012-07-04 12:47 UTC (permalink / raw)
To: Sha Zhengju
Cc: linux-mm, cgroups, kamezawa.hiroyu, gthelen, yinghan, akpm, linux-kernel, Sha Zhengju

On Thu 28-06-12 18:57:35, Sha Zhengju wrote:
> From: Sha Zhengju <handai.szj@taobao.com>
>
> Document cgroup dirty/writeback memory statistics.
>
> The implementation for these new interface routines comes in the
> following patches of this series.

I would expect this one to be the last...

>
> Signed-off-by: Sha Zhengju <handai.szj@taobao.com>

Acked-by: Michal Hocko <mhocko@suse.cz>

> ---
>  Documentation/cgroups/memory.txt |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
> index dd88540..24d7e3c 100644
> --- a/Documentation/cgroups/memory.txt
> +++ b/Documentation/cgroups/memory.txt
> @@ -420,6 +420,8 @@ pgpgin - # of charging events to the memory cgroup. The charging
>  pgpgout - # of uncharging events to the memory cgroup. The uncharging
>  	event happens each time a page is unaccounted from the cgroup.
>  swap - # of bytes of swap usage
> +dirty - # of bytes that are waiting to get written back to the disk.
> +writeback - # of bytes that are actively being written back to the disk.
>  inactive_anon - # of bytes of anonymous memory and swap cache memory on
>  	LRU list.
>  active_anon - # of bytes of anonymous and swap cache memory on active
> --
> 1.7.1

--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
* Re: [PATCH 1/7] memcg: update cgroup memory document
From: Fengguang Wu @ 2012-07-07 13:45 UTC (permalink / raw)
To: Sha Zhengju
Cc: linux-mm, cgroups, kamezawa.hiroyu, gthelen, yinghan, akpm, mhocko, linux-kernel, Sha Zhengju

> +dirty - # of bytes that are waiting to get written back to the disk.
> +writeback - # of bytes that are actively being written back to the disk.

This should be a bit more clear to the user:

dirty - # of bytes of file cache that are not in sync with the disk copy
writeback - # of bytes of file cache that are queued for syncing to disk

Thanks,
Fengguang

> inactive_anon - # of bytes of anonymous memory and swap cache memory on
> 	LRU list.
> active_anon - # of bytes of anonymous and swap cache memory on active
* [PATCH 2/7] memcg: remove MEMCG_NR_FILE_MAPPED
From: Sha Zhengju @ 2012-06-28 10:58 UTC (permalink / raw)
To: linux-mm, cgroups
Cc: kamezawa.hiroyu, gthelen, yinghan, akpm, mhocko, linux-kernel, Sha Zhengju

From: Sha Zhengju <handai.szj@taobao.com>

While accounting memcg page stats, it's not worth using MEMCG_NR_FILE_MAPPED as an extra layer of indirection because of the complexity and presumed performance overhead. We can use MEM_CGROUP_STAT_FILE_MAPPED directly.

Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
---
 include/linux/memcontrol.h |   25 +++++++++++++++++--------
 mm/memcontrol.c            |   24 +-----------------------
 mm/rmap.c                  |    4 ++--
 3 files changed, 20 insertions(+), 33 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 83e7ba9..20b0f2d 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -27,9 +27,18 @@ struct page_cgroup;
 struct page;
 struct mm_struct;
 
-/* Stats that can be updated by kernel. */
-enum mem_cgroup_page_stat_item {
-	MEMCG_NR_FILE_MAPPED, /* # of pages charged as file rss */
+/*
+ * Statistics for memory cgroup.
+ */
+enum mem_cgroup_stat_index {
+	/*
+	 * For MEM_CONTAINER_TYPE_ALL, usage = pagecache + rss.
+	 */
+	MEM_CGROUP_STAT_CACHE,		/* # of pages charged as cache */
+	MEM_CGROUP_STAT_RSS,		/* # of pages charged as anon rss */
+	MEM_CGROUP_STAT_FILE_MAPPED,	/* # of pages charged as file rss */
+	MEM_CGROUP_STAT_SWAP,		/* # of pages, swapped out */
+	MEM_CGROUP_STAT_NSTATS,
 };
 
 struct mem_cgroup_reclaim_cookie {
@@ -164,17 +173,17 @@ static inline void mem_cgroup_end_update_page_stat(struct page *page,
 }
 
 void mem_cgroup_update_page_stat(struct page *page,
-				 enum mem_cgroup_page_stat_item idx,
+				 enum mem_cgroup_stat_index idx,
 				 int val);
 
 static inline void mem_cgroup_inc_page_stat(struct page *page,
-					    enum mem_cgroup_page_stat_item idx)
+					    enum mem_cgroup_stat_index idx)
 {
 	mem_cgroup_update_page_stat(page, idx, 1);
 }
 
 static inline void mem_cgroup_dec_page_stat(struct page *page,
-					    enum mem_cgroup_page_stat_item idx)
+					    enum mem_cgroup_stat_index idx)
 {
 	mem_cgroup_update_page_stat(page, idx, -1);
 }
@@ -349,12 +358,12 @@ static inline void mem_cgroup_end_update_page_stat(struct page *page,
 }
 
 static inline void mem_cgroup_inc_page_stat(struct page *page,
-					    enum mem_cgroup_page_stat_item idx)
+					    enum mem_cgroup_stat_index idx)
 {
 }
 
 static inline void mem_cgroup_dec_page_stat(struct page *page,
-					    enum mem_cgroup_page_stat_item idx)
+					    enum mem_cgroup_stat_index idx)
 {
 }
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a2677e0..ebed1ca 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -77,20 +77,6 @@ static int really_do_swap_account __initdata = 0;
 #endif
 
-/*
- * Statistics for memory cgroup.
- */
-enum mem_cgroup_stat_index {
-	/*
-	 * For MEM_CONTAINER_TYPE_ALL, usage = pagecache + rss.
-	 */
-	MEM_CGROUP_STAT_CACHE,		/* # of pages charged as cache */
-	MEM_CGROUP_STAT_RSS,		/* # of pages charged as anon rss */
-	MEM_CGROUP_STAT_FILE_MAPPED,	/* # of pages charged as file rss */
-	MEM_CGROUP_STAT_SWAP,		/* # of pages, swapped out */
-	MEM_CGROUP_STAT_NSTATS,
-};
-
 static const char * const mem_cgroup_stat_names[] = {
 	"cache",
 	"rss",
@@ -1926,7 +1912,7 @@ void __mem_cgroup_end_update_page_stat(struct page *page, unsigned long *flags)
 }
 
 void mem_cgroup_update_page_stat(struct page *page,
-				 enum mem_cgroup_page_stat_item idx, int val)
+				 enum mem_cgroup_stat_index idx, int val)
 {
 	struct mem_cgroup *memcg;
 	struct page_cgroup *pc = lookup_page_cgroup(page);
@@ -1939,14 +1925,6 @@ void mem_cgroup_update_page_stat(struct page *page,
 	if (unlikely(!memcg || !PageCgroupUsed(pc)))
 		return;
 
-	switch (idx) {
-	case MEMCG_NR_FILE_MAPPED:
-		idx = MEM_CGROUP_STAT_FILE_MAPPED;
-		break;
-	default:
-		BUG();
-	}
-
 	this_cpu_add(memcg->stat->count[idx], val);
 }
 
diff --git a/mm/rmap.c b/mm/rmap.c
index 2144160..d6b93df 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1148,7 +1148,7 @@ void page_add_file_rmap(struct page *page)
 	mem_cgroup_begin_update_page_stat(page, &locked, &flags);
 	if (atomic_inc_and_test(&page->_mapcount)) {
 		__inc_zone_page_state(page, NR_FILE_MAPPED);
-		mem_cgroup_inc_page_stat(page, MEMCG_NR_FILE_MAPPED);
+		mem_cgroup_inc_page_stat(page, MEM_CGROUP_STAT_FILE_MAPPED);
 	}
 	mem_cgroup_end_update_page_stat(page, &locked, &flags);
 }
@@ -1202,7 +1202,7 @@ void page_remove_rmap(struct page *page)
 				     NR_ANON_TRANSPARENT_HUGEPAGES);
 	} else {
 		__dec_zone_page_state(page, NR_FILE_MAPPED);
-		mem_cgroup_dec_page_stat(page, MEMCG_NR_FILE_MAPPED);
+		mem_cgroup_dec_page_stat(page, MEM_CGROUP_STAT_FILE_MAPPED);
 	}
 	/*
 	 * It would be tidy to reset the PageAnon mapping here,
--
1.7.1
* Re: [PATCH 2/7] memcg: remove MEMCG_NR_FILE_MAPPED
From: Kamezawa Hiroyuki @ 2012-07-02 10:44 UTC (permalink / raw)
To: Sha Zhengju
Cc: linux-mm, cgroups, gthelen, yinghan, akpm, mhocko, linux-kernel, Sha Zhengju

(2012/06/28 19:58), Sha Zhengju wrote:
> From: Sha Zhengju <handai.szj@taobao.com>
>
> While accounting memcg page stats, it's not worth using
> MEMCG_NR_FILE_MAPPED as an extra layer of indirection because of the
> complexity and presumed performance overhead. We can use
> MEM_CGROUP_STAT_FILE_MAPPED directly.
>
> Signed-off-by: Sha Zhengju <handai.szj@taobao.com>

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
* Re: [PATCH 2/7] memcg: remove MEMCG_NR_FILE_MAPPED 2012-06-28 10:58 ` Sha Zhengju @ 2012-07-04 12:56 ` Michal Hocko -1 siblings, 0 replies; 132+ messages in thread From: Michal Hocko @ 2012-07-04 12:56 UTC (permalink / raw) To: Sha Zhengju Cc: linux-mm, cgroups, kamezawa.hiroyu, gthelen, yinghan, akpm, linux-kernel, Sha Zhengju On Thu 28-06-12 18:58:31, Sha Zhengju wrote: > From: Sha Zhengju <handai.szj@taobao.com> > > While accounting memcg page stat, it's not worth to use MEMCG_NR_FILE_MAPPED > as an extra layer of indirection because of the complexity and presumed > performance overhead. We can use MEM_CGROUP_STAT_FILE_MAPPED directly. > > Signed-off-by: Sha Zhengju <handai.szj@taobao.com> Acked-by: Michal Hocko <mhocko@suse.cz> > --- > include/linux/memcontrol.h | 25 +++++++++++++++++-------- > mm/memcontrol.c | 24 +----------------------- > mm/rmap.c | 4 ++-- > 3 files changed, 20 insertions(+), 33 deletions(-) > > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h > index 83e7ba9..20b0f2d 100644 > --- a/include/linux/memcontrol.h > +++ b/include/linux/memcontrol.h > @@ -27,9 +27,18 @@ struct page_cgroup; > struct page; > struct mm_struct; > > -/* Stats that can be updated by kernel. */ > -enum mem_cgroup_page_stat_item { > - MEMCG_NR_FILE_MAPPED, /* # of pages charged as file rss */ > +/* > + * Statistics for memory cgroup. > + */ > +enum mem_cgroup_stat_index { > + /* > + * For MEM_CONTAINER_TYPE_ALL, usage = pagecache + rss. 
> + */ > + MEM_CGROUP_STAT_CACHE, /* # of pages charged as cache */ > + MEM_CGROUP_STAT_RSS, /* # of pages charged as anon rss */ > + MEM_CGROUP_STAT_FILE_MAPPED, /* # of pages charged as file rss */ > + MEM_CGROUP_STAT_SWAP, /* # of pages, swapped out */ > + MEM_CGROUP_STAT_NSTATS, > }; > > struct mem_cgroup_reclaim_cookie { > @@ -164,17 +173,17 @@ static inline void mem_cgroup_end_update_page_stat(struct page *page, > } > > void mem_cgroup_update_page_stat(struct page *page, > - enum mem_cgroup_page_stat_item idx, > + enum mem_cgroup_stat_index idx, > int val); > > static inline void mem_cgroup_inc_page_stat(struct page *page, > - enum mem_cgroup_page_stat_item idx) > + enum mem_cgroup_stat_index idx) > { > mem_cgroup_update_page_stat(page, idx, 1); > } > > static inline void mem_cgroup_dec_page_stat(struct page *page, > - enum mem_cgroup_page_stat_item idx) > + enum mem_cgroup_stat_index idx) > { > mem_cgroup_update_page_stat(page, idx, -1); > } > @@ -349,12 +358,12 @@ static inline void mem_cgroup_end_update_page_stat(struct page *page, > } > > static inline void mem_cgroup_inc_page_stat(struct page *page, > - enum mem_cgroup_page_stat_item idx) > + enum mem_cgroup_stat_index idx) > { > } > > static inline void mem_cgroup_dec_page_stat(struct page *page, > - enum mem_cgroup_page_stat_item idx) > + enum mem_cgroup_stat_index idx) > { > } > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index a2677e0..ebed1ca 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -77,20 +77,6 @@ static int really_do_swap_account __initdata = 0; > #endif > > > -/* > - * Statistics for memory cgroup. > - */ > -enum mem_cgroup_stat_index { > - /* > - * For MEM_CONTAINER_TYPE_ALL, usage = pagecache + rss. 
> - */ > - MEM_CGROUP_STAT_CACHE, /* # of pages charged as cache */ > - MEM_CGROUP_STAT_RSS, /* # of pages charged as anon rss */ > - MEM_CGROUP_STAT_FILE_MAPPED, /* # of pages charged as file rss */ > - MEM_CGROUP_STAT_SWAP, /* # of pages, swapped out */ > - MEM_CGROUP_STAT_NSTATS, > -}; > - > static const char * const mem_cgroup_stat_names[] = { > "cache", > "rss", > @@ -1926,7 +1912,7 @@ void __mem_cgroup_end_update_page_stat(struct page *page, unsigned long *flags) > } > > void mem_cgroup_update_page_stat(struct page *page, > - enum mem_cgroup_page_stat_item idx, int val) > + enum mem_cgroup_stat_index idx, int val) > { > struct mem_cgroup *memcg; > struct page_cgroup *pc = lookup_page_cgroup(page); > @@ -1939,14 +1925,6 @@ void mem_cgroup_update_page_stat(struct page *page, > if (unlikely(!memcg || !PageCgroupUsed(pc))) > return; > > - switch (idx) { > - case MEMCG_NR_FILE_MAPPED: > - idx = MEM_CGROUP_STAT_FILE_MAPPED; > - break; > - default: > - BUG(); > - } > - > this_cpu_add(memcg->stat->count[idx], val); > } > > diff --git a/mm/rmap.c b/mm/rmap.c > index 2144160..d6b93df 100644 > --- a/mm/rmap.c > +++ b/mm/rmap.c > @@ -1148,7 +1148,7 @@ void page_add_file_rmap(struct page *page) > mem_cgroup_begin_update_page_stat(page, &locked, &flags); > if (atomic_inc_and_test(&page->_mapcount)) { > __inc_zone_page_state(page, NR_FILE_MAPPED); > - mem_cgroup_inc_page_stat(page, MEMCG_NR_FILE_MAPPED); > + mem_cgroup_inc_page_stat(page, MEM_CGROUP_STAT_FILE_MAPPED); > } > mem_cgroup_end_update_page_stat(page, &locked, &flags); > } > @@ -1202,7 +1202,7 @@ void page_remove_rmap(struct page *page) > NR_ANON_TRANSPARENT_HUGEPAGES); > } else { > __dec_zone_page_state(page, NR_FILE_MAPPED); > - mem_cgroup_dec_page_stat(page, MEMCG_NR_FILE_MAPPED); > + mem_cgroup_dec_page_stat(page, MEM_CGROUP_STAT_FILE_MAPPED); > } > /* > * It would be tidy to reset the PageAnon mapping here, > -- > 1.7.1 > -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. 
Lihovarska 1060/12 190 00 Praha 9 Czech Republic ^ permalink raw reply [flat|nested] 132+ messages in thread
* Re: [PATCH 2/7] memcg: remove MEMCG_NR_FILE_MAPPED 2012-07-04 12:56 ` Michal Hocko (?) @ 2012-07-04 12:58 ` Michal Hocko -1 siblings, 0 replies; 132+ messages in thread From: Michal Hocko @ 2012-07-04 12:58 UTC (permalink / raw) To: Sha Zhengju Cc: linux-mm, cgroups, kamezawa.hiroyu, gthelen, yinghan, akpm, linux-kernel, Sha Zhengju On Wed 04-07-12 14:56:08, Michal Hocko wrote: > On Thu 28-06-12 18:58:31, Sha Zhengju wrote: > > From: Sha Zhengju <handai.szj@taobao.com> > > > > While accounting memcg page stat, it's not worth to use MEMCG_NR_FILE_MAPPED > > as an extra layer of indirection because of the complexity and presumed > > performance overhead. We can use MEM_CGROUP_STAT_FILE_MAPPED directly. > > > > Signed-off-by: Sha Zhengju <handai.szj@taobao.com> > > Acked-by: Michal Hocko <mhocko@suse.cz> And forgot to mention that this one can be merged right away. > > > --- > > include/linux/memcontrol.h | 25 +++++++++++++++++-------- > > mm/memcontrol.c | 24 +----------------------- > > mm/rmap.c | 4 ++-- > > 3 files changed, 20 insertions(+), 33 deletions(-) > > > > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h > > index 83e7ba9..20b0f2d 100644 > > --- a/include/linux/memcontrol.h > > +++ b/include/linux/memcontrol.h > > @@ -27,9 +27,18 @@ struct page_cgroup; > > struct page; > > struct mm_struct; > > > > -/* Stats that can be updated by kernel. */ > > -enum mem_cgroup_page_stat_item { > > - MEMCG_NR_FILE_MAPPED, /* # of pages charged as file rss */ > > +/* > > + * Statistics for memory cgroup. > > + */ > > +enum mem_cgroup_stat_index { > > + /* > > + * For MEM_CONTAINER_TYPE_ALL, usage = pagecache + rss. 
> > + */ > > + MEM_CGROUP_STAT_CACHE, /* # of pages charged as cache */ > > + MEM_CGROUP_STAT_RSS, /* # of pages charged as anon rss */ > > + MEM_CGROUP_STAT_FILE_MAPPED, /* # of pages charged as file rss */ > > + MEM_CGROUP_STAT_SWAP, /* # of pages, swapped out */ > > + MEM_CGROUP_STAT_NSTATS, > > }; > > > > struct mem_cgroup_reclaim_cookie { > > @@ -164,17 +173,17 @@ static inline void mem_cgroup_end_update_page_stat(struct page *page, > > } > > > > void mem_cgroup_update_page_stat(struct page *page, > > - enum mem_cgroup_page_stat_item idx, > > + enum mem_cgroup_stat_index idx, > > int val); > > > > static inline void mem_cgroup_inc_page_stat(struct page *page, > > - enum mem_cgroup_page_stat_item idx) > > + enum mem_cgroup_stat_index idx) > > { > > mem_cgroup_update_page_stat(page, idx, 1); > > } > > > > static inline void mem_cgroup_dec_page_stat(struct page *page, > > - enum mem_cgroup_page_stat_item idx) > > + enum mem_cgroup_stat_index idx) > > { > > mem_cgroup_update_page_stat(page, idx, -1); > > } > > @@ -349,12 +358,12 @@ static inline void mem_cgroup_end_update_page_stat(struct page *page, > > } > > > > static inline void mem_cgroup_inc_page_stat(struct page *page, > > - enum mem_cgroup_page_stat_item idx) > > + enum mem_cgroup_stat_index idx) > > { > > } > > > > static inline void mem_cgroup_dec_page_stat(struct page *page, > > - enum mem_cgroup_page_stat_item idx) > > + enum mem_cgroup_stat_index idx) > > { > > } > > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > > index a2677e0..ebed1ca 100644 > > --- a/mm/memcontrol.c > > +++ b/mm/memcontrol.c > > @@ -77,20 +77,6 @@ static int really_do_swap_account __initdata = 0; > > #endif > > > > > > -/* > > - * Statistics for memory cgroup. > > - */ > > -enum mem_cgroup_stat_index { > > - /* > > - * For MEM_CONTAINER_TYPE_ALL, usage = pagecache + rss. 
> > - */ > > - MEM_CGROUP_STAT_CACHE, /* # of pages charged as cache */ > > - MEM_CGROUP_STAT_RSS, /* # of pages charged as anon rss */ > > - MEM_CGROUP_STAT_FILE_MAPPED, /* # of pages charged as file rss */ > > - MEM_CGROUP_STAT_SWAP, /* # of pages, swapped out */ > > - MEM_CGROUP_STAT_NSTATS, > > -}; > > - > > static const char * const mem_cgroup_stat_names[] = { > > "cache", > > "rss", > > @@ -1926,7 +1912,7 @@ void __mem_cgroup_end_update_page_stat(struct page *page, unsigned long *flags) > > } > > > > void mem_cgroup_update_page_stat(struct page *page, > > - enum mem_cgroup_page_stat_item idx, int val) > > + enum mem_cgroup_stat_index idx, int val) > > { > > struct mem_cgroup *memcg; > > struct page_cgroup *pc = lookup_page_cgroup(page); > > @@ -1939,14 +1925,6 @@ void mem_cgroup_update_page_stat(struct page *page, > > if (unlikely(!memcg || !PageCgroupUsed(pc))) > > return; > > > > - switch (idx) { > > - case MEMCG_NR_FILE_MAPPED: > > - idx = MEM_CGROUP_STAT_FILE_MAPPED; > > - break; > > - default: > > - BUG(); > > - } > > - > > this_cpu_add(memcg->stat->count[idx], val); > > } > > > > diff --git a/mm/rmap.c b/mm/rmap.c > > index 2144160..d6b93df 100644 > > --- a/mm/rmap.c > > +++ b/mm/rmap.c > > @@ -1148,7 +1148,7 @@ void page_add_file_rmap(struct page *page) > > mem_cgroup_begin_update_page_stat(page, &locked, &flags); > > if (atomic_inc_and_test(&page->_mapcount)) { > > __inc_zone_page_state(page, NR_FILE_MAPPED); > > - mem_cgroup_inc_page_stat(page, MEMCG_NR_FILE_MAPPED); > > + mem_cgroup_inc_page_stat(page, MEM_CGROUP_STAT_FILE_MAPPED); > > } > > mem_cgroup_end_update_page_stat(page, &locked, &flags); > > } > > @@ -1202,7 +1202,7 @@ void page_remove_rmap(struct page *page) > > NR_ANON_TRANSPARENT_HUGEPAGES); > > } else { > > __dec_zone_page_state(page, NR_FILE_MAPPED); > > - mem_cgroup_dec_page_stat(page, MEMCG_NR_FILE_MAPPED); > > + mem_cgroup_dec_page_stat(page, MEM_CGROUP_STAT_FILE_MAPPED); > > } > > /* > > * It would be tidy to reset the PageAnon 
mapping here, > > -- > > 1.7.1 > > -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic ^ permalink raw reply [flat|nested] 132+ messages in thread
* Re: [PATCH 2/7] memcg: remove MEMCG_NR_FILE_MAPPED 2012-06-28 10:58 ` Sha Zhengju @ 2012-07-07 13:48 ` Fengguang Wu -1 siblings, 0 replies; 132+ messages in thread From: Fengguang Wu @ 2012-07-07 13:48 UTC (permalink / raw) To: Sha Zhengju Cc: linux-mm, cgroups, kamezawa.hiroyu, gthelen, yinghan, akpm, mhocko, linux-kernel, Sha Zhengju On Thu, Jun 28, 2012 at 06:58:31PM +0800, Sha Zhengju wrote: > From: Sha Zhengju <handai.szj@taobao.com> > > While accounting memcg page stat, it's not worth to use MEMCG_NR_FILE_MAPPED > as an extra layer of indirection because of the complexity and presumed > performance overhead. We can use MEM_CGROUP_STAT_FILE_MAPPED directly. > > Signed-off-by: Sha Zhengju <handai.szj@taobao.com> > --- > include/linux/memcontrol.h | 25 +++++++++++++++++-------- > mm/memcontrol.c | 24 +----------------------- > mm/rmap.c | 4 ++-- > 3 files changed, 20 insertions(+), 33 deletions(-) Nice cleanup! Acked-by: Fengguang Wu <fengguang.wu@intel.com> ^ permalink raw reply [flat|nested] 132+ messages in thread
* Re: [PATCH 2/7] memcg: remove MEMCG_NR_FILE_MAPPED 2012-06-28 10:58 ` Sha Zhengju (?) @ 2012-07-09 21:01 ` Greg Thelen -1 siblings, 0 replies; 132+ messages in thread From: Greg Thelen @ 2012-07-09 21:01 UTC (permalink / raw) To: Sha Zhengju Cc: linux-mm, cgroups, kamezawa.hiroyu, yinghan, akpm, mhocko, linux-kernel, Sha Zhengju On Thu, Jun 28 2012, Sha Zhengju wrote: > From: Sha Zhengju <handai.szj@taobao.com> > > While accounting memcg page stat, it's not worth to use MEMCG_NR_FILE_MAPPED > as an extra layer of indirection because of the complexity and presumed > performance overhead. We can use MEM_CGROUP_STAT_FILE_MAPPED directly. > > Signed-off-by: Sha Zhengju <handai.szj@taobao.com> > --- > include/linux/memcontrol.h | 25 +++++++++++++++++-------- > mm/memcontrol.c | 24 +----------------------- > mm/rmap.c | 4 ++-- > 3 files changed, 20 insertions(+), 33 deletions(-) > > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h > index 83e7ba9..20b0f2d 100644 > --- a/include/linux/memcontrol.h > +++ b/include/linux/memcontrol.h > @@ -27,9 +27,18 @@ struct page_cgroup; > struct page; > struct mm_struct; > > -/* Stats that can be updated by kernel. */ > -enum mem_cgroup_page_stat_item { > - MEMCG_NR_FILE_MAPPED, /* # of pages charged as file rss */ > +/* > + * Statistics for memory cgroup. > + */ > +enum mem_cgroup_stat_index { > + /* > + * For MEM_CONTAINER_TYPE_ALL, usage = pagecache + rss. > + */ > + MEM_CGROUP_STAT_CACHE, /* # of pages charged as cache */ > + MEM_CGROUP_STAT_RSS, /* # of pages charged as anon rss */ > + MEM_CGROUP_STAT_FILE_MAPPED, /* # of pages charged as file rss */ > + MEM_CGROUP_STAT_SWAP, /* # of pages, swapped out */ > + MEM_CGROUP_STAT_NSTATS, > }; Nit. Moving mem_cgroup_stat_index from memcontrol.c to memcontrol.h is fine with me. But this does increase the distance between related defintions of definition mem_cgroup_stat_index and mem_cgroup_stat_names. These two lists have to be kept in sync. 
So it might help to add a comment to both indicating their relationship so we don't accidentally modify the enum without updating the dependent string table. Otherwise, looks good. Reviewed-by: Greg Thelen <gthelen@google.com> ^ permalink raw reply [flat|nested] 132+ messages in thread
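The pairing Greg asks to document, an enum that indexes a string table, can be sketched in user space as follows. This is an illustrative analogue rather than kernel code: the enum values mirror the patch, the name strings are assumed to match mem_cgroup_stat_names in mm/memcontrol.c, and the `_Static_assert` stands in for the `BUILD_BUG_ON()` a kernel version would use.

```c
#include <assert.h>
#include <string.h>

/* User-space sketch of the enum/string-table pairing.  The enum values
 * mirror the patch; the strings are assumed to match the kernel's
 * mem_cgroup_stat_names in mm/memcontrol.c. */
enum mem_cgroup_stat_index {
	MEM_CGROUP_STAT_CACHE,       /* # of pages charged as cache */
	MEM_CGROUP_STAT_RSS,         /* # of pages charged as anon rss */
	MEM_CGROUP_STAT_FILE_MAPPED, /* # of pages charged as file rss */
	MEM_CGROUP_STAT_SWAP,        /* # of pages, swapped out */
	MEM_CGROUP_STAT_NSTATS,
};

/* Keep in sync with enum mem_cgroup_stat_index above. */
static const char *const mem_cgroup_stat_names[] = {
	"cache",
	"rss",
	"mapped_file",
	"swap",
};

/* Compile-time guard: appending an enum entry without extending the
 * string table (or vice versa) breaks the build instead of printing
 * garbage.  In the kernel this would be a BUILD_BUG_ON(). */
_Static_assert(sizeof(mem_cgroup_stat_names) / sizeof(mem_cgroup_stat_names[0])
	       == MEM_CGROUP_STAT_NSTATS,
	       "mem_cgroup_stat_names out of sync with mem_cgroup_stat_index");

const char *stat_name(enum mem_cgroup_stat_index idx)
{
	return mem_cgroup_stat_names[idx];
}
```

Note the size check only catches a count mismatch, not a reordering, which is why the comment on each list still matters.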
* Re: [PATCH 2/7] memcg: remove MEMCG_NR_FILE_MAPPED 2012-07-09 21:01 ` Greg Thelen @ 2012-07-11 8:00 ` Sha Zhengju -1 siblings, 0 replies; 132+ messages in thread From: Sha Zhengju @ 2012-07-11 8:00 UTC (permalink / raw) To: Greg Thelen Cc: linux-mm, cgroups, kamezawa.hiroyu, yinghan, akpm, mhocko, linux-kernel, Sha Zhengju On 07/10/2012 05:01 AM, Greg Thelen wrote: > On Thu, Jun 28 2012, Sha Zhengju wrote: > >> From: Sha Zhengju<handai.szj@taobao.com> >> >> While accounting memcg page stat, it's not worth to use MEMCG_NR_FILE_MAPPED >> as an extra layer of indirection because of the complexity and presumed >> performance overhead. We can use MEM_CGROUP_STAT_FILE_MAPPED directly. >> >> Signed-off-by: Sha Zhengju<handai.szj@taobao.com> >> --- >> include/linux/memcontrol.h | 25 +++++++++++++++++-------- >> mm/memcontrol.c | 24 +----------------------- >> mm/rmap.c | 4 ++-- >> 3 files changed, 20 insertions(+), 33 deletions(-) >> >> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h >> index 83e7ba9..20b0f2d 100644 >> --- a/include/linux/memcontrol.h >> +++ b/include/linux/memcontrol.h >> @@ -27,9 +27,18 @@ struct page_cgroup; >> struct page; >> struct mm_struct; >> >> -/* Stats that can be updated by kernel. */ >> -enum mem_cgroup_page_stat_item { >> - MEMCG_NR_FILE_MAPPED, /* # of pages charged as file rss */ >> +/* >> + * Statistics for memory cgroup. >> + */ >> +enum mem_cgroup_stat_index { >> + /* >> + * For MEM_CONTAINER_TYPE_ALL, usage = pagecache + rss. >> + */ >> + MEM_CGROUP_STAT_CACHE, /* # of pages charged as cache */ >> + MEM_CGROUP_STAT_RSS, /* # of pages charged as anon rss */ >> + MEM_CGROUP_STAT_FILE_MAPPED, /* # of pages charged as file rss */ >> + MEM_CGROUP_STAT_SWAP, /* # of pages, swapped out */ >> + MEM_CGROUP_STAT_NSTATS, >> }; > Nit. Moving mem_cgroup_stat_index from memcontrol.c to memcontrol.h is > fine with me. 
But this does increase the distance between related > defintions of definition mem_cgroup_stat_index and > mem_cgroup_stat_names. These two lists have to be kept in sync. So it > might help to add a comment to both indicating their relationship so we > don't accidentally modify the enum without updating the dependent string > table. > > Otherwise, looks good. > > Reviewed-by: Greg Thelen<gthelen@google.com> Sorry for the delay. OK, I'll add some comment here. Thanks for reminding! Thanks, Sha ^ permalink raw reply [flat|nested] 132+ messages in thread
* [PATCH 3/7] Make TestSetPageDirty and dirty page accounting in one func 2012-06-28 10:54 ` Sha Zhengju @ 2012-06-28 11:01 ` Sha Zhengju -1 siblings, 0 replies; 132+ messages in thread From: Sha Zhengju @ 2012-06-28 11:01 UTC (permalink / raw) To: linux-mm, cgroups Cc: kamezawa.hiroyu, gthelen, yinghan, akpm, mhocko, linux-kernel, torvalds, viro, linux-fsdevel, Sha Zhengju From: Sha Zhengju <handai.szj@taobao.com> Commit a8e7d49a (Fix race in create_empty_buffers() vs __set_page_dirty_buffers()) extracts TestSetPageDirty from __set_page_dirty and is far away from account_page_dirtied. But it's better to put the two operations in one single function for modularity. So in order to avoid the potential race mentioned in commit a8e7d49a, we can hold private_lock until __set_page_dirty completes. I guess there's no deadlock between ->private_lock and ->tree_lock from a quick look. It's a preparatory patch for the following memcg dirty page accounting patches. Signed-off-by: Sha Zhengju <handai.szj@taobao.com> --- fs/buffer.c | 25 +++++++++++++------------ 1 files changed, 13 insertions(+), 12 deletions(-) diff --git a/fs/buffer.c b/fs/buffer.c index 838a9cf..e8d96b8 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -610,9 +610,15 @@ EXPORT_SYMBOL(mark_buffer_dirty_inode); * If warn is true, then emit a warning if the page is not uptodate and has * not been truncated. */ -static void __set_page_dirty(struct page *page, +static int __set_page_dirty(struct page *page, struct address_space *mapping, int warn) { + if (unlikely(!mapping)) + return !TestSetPageDirty(page); + + if (TestSetPageDirty(page)) + return 0; + spin_lock_irq(&mapping->tree_lock); if (page->mapping) { /* Race with truncate? 
*/ WARN_ON_ONCE(warn && !PageUptodate(page)); @@ -622,6 +628,8 @@ static void __set_page_dirty(struct page *page, } spin_unlock_irq(&mapping->tree_lock); __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); + + return 1; } /* @@ -667,11 +675,9 @@ int __set_page_dirty_buffers(struct page *page) bh = bh->b_this_page; } while (bh != head); } - newly_dirty = !TestSetPageDirty(page); + newly_dirty = __set_page_dirty(page, mapping, 1); spin_unlock(&mapping->private_lock); - if (newly_dirty) - __set_page_dirty(page, mapping, 1); return newly_dirty; } EXPORT_SYMBOL(__set_page_dirty_buffers); @@ -1115,14 +1121,9 @@ void mark_buffer_dirty(struct buffer_head *bh) return; } - if (!test_set_buffer_dirty(bh)) { - struct page *page = bh->b_page; - if (!TestSetPageDirty(page)) { - struct address_space *mapping = page_mapping(page); - if (mapping) - __set_page_dirty(page, mapping, 0); - } - } + if (!test_set_buffer_dirty(bh)) + __set_page_dirty(bh->b_page, page_mapping(bh->b_page), 0); + } EXPORT_SYMBOL(mark_buffer_dirty); -- 1.7.1 ^ permalink raw reply related [flat|nested] 132+ messages in thread
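The heart of the refactoring, folding the TestSetPageDirty test-and-set and the accounting into one function so that only the clean-to-dirty winner accounts, can be sketched with a small user-space analogue. All names here (fake_page, nr_dirtied, demo) are invented for illustration; only the int return convention mirrors the patched __set_page_dirty().

```c
#include <stdatomic.h>

struct fake_page { atomic_uint flags; };  /* stand-in for struct page */
#define PG_DIRTY (1u << 0)

long nr_dirtied;  /* stand-in for the account_page_dirtied() counters */

/* Returns 1 only for the caller that actually flips clean -> dirty;
 * the atomic fetch-or plays the role of TestSetPageDirty(), so the
 * accounting right after it can never run twice for one page. */
int set_page_dirty_sketch(struct fake_page *page)
{
	unsigned int old = atomic_fetch_or(&page->flags, PG_DIRTY);
	if (old & PG_DIRTY)
		return 0;   /* already dirty: nothing to account */
	nr_dirtied++;       /* accounting stays adjacent to the flag set */
	return 1;
}

/* Dirties the same page twice and packs both results into one value:
 * the first call should report the transition, the second should not. */
int demo(void)
{
	struct fake_page page = { 0 };
	int first = set_page_dirty_sketch(&page);
	int second = set_page_dirty_sketch(&page);
	return first * 10 + second;
}
```

Keeping the flag flip and the counter update in one function is what lets a caller such as __set_page_dirty_buffers() cover both with a single outer lock, which is exactly what the patch needs for the later memcg accounting.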
* Re: [PATCH 3/7] Make TestSetPageDirty and dirty page accounting in one func @ 2012-07-02 11:14 ` Kamezawa Hiroyuki 0 siblings, 0 replies; 132+ messages in thread From: Kamezawa Hiroyuki @ 2012-07-02 11:14 UTC (permalink / raw) To: Sha Zhengju Cc: linux-mm, cgroups, gthelen, yinghan, akpm, mhocko, linux-kernel, torvalds, viro, linux-fsdevel, Sha Zhengju (2012/06/28 20:01), Sha Zhengju wrote: > From: Sha Zhengju <handai.szj@taobao.com> > > Commit a8e7d49a(Fix race in create_empty_buffers() vs __set_page_dirty_buffers()) > extracts TestSetPageDirty from __set_page_dirty and is far away from > account_page_dirtied.But it's better to make the two operations in one single > function to keep modular.So in order to avoid the potential race mentioned in > commit a8e7d49a, we can hold private_lock until __set_page_dirty completes. > I guess there's no deadlock between ->private_lock and ->tree_lock by quick look. > > It's a prepare patch for following memcg dirty page accounting patches. > > Signed-off-by: Sha Zhengju <handai.szj@taobao.com> I think there is no problem with the lock order. My small concern is the impact on the performance. IIUC, lock contention here can be seen if multiple threads write to the same file in parallel. Do you have any numbers before/after the patch ? Thanks, -Kmae ^ permalink raw reply [flat|nested] 132+ messages in thread
* Re: [PATCH 3/7] Make TestSetPageDirty and dirty page accounting in one func 2012-07-02 11:14 ` Kamezawa Hiroyuki @ 2012-07-07 14:42 ` Fengguang Wu -1 siblings, 0 replies; 132+ messages in thread From: Fengguang Wu @ 2012-07-07 14:42 UTC (permalink / raw) To: Kamezawa Hiroyuki Cc: Sha Zhengju, linux-mm, cgroups, gthelen, yinghan, akpm, mhocko, linux-kernel, torvalds, viro, linux-fsdevel, Sha Zhengju On Mon, Jul 02, 2012 at 08:14:02PM +0900, KAMEZAWA Hiroyuki wrote: > (2012/06/28 20:01), Sha Zhengju wrote: > > From: Sha Zhengju <handai.szj@taobao.com> > > > > Commit a8e7d49a(Fix race in create_empty_buffers() vs __set_page_dirty_buffers()) > > extracts TestSetPageDirty from __set_page_dirty and is far away from > > account_page_dirtied.But it's better to make the two operations in one single > > function to keep modular.So in order to avoid the potential race mentioned in > > commit a8e7d49a, we can hold private_lock until __set_page_dirty completes. > > I guess there's no deadlock between ->private_lock and ->tree_lock by quick look. > > > > It's a prepare patch for following memcg dirty page accounting patches. > > > > Signed-off-by: Sha Zhengju <handai.szj@taobao.com> > > I think there is no problem with the lock order. I think so, too. > My small concern is the impact on the performance. IIUC, lock contention here can be > seen if multiple threads write to the same file in parallel. > Do you have any numbers before/after the patch ? That would be a worthwhile test. The patch moves ->tree_lock and ->i_lock into ->private_lock, and these are often contended locks. For example, in the below case of 12 hard disks, each running 1 dd write, the ->tree_lock and ->private_lock have the top #1 and #2 contentions. 
lkp-nex04/JBOD-12HDD-thresh=1000M/ext4-1dd-1-3.3.0/lock_stat

-----------------------------------------------------------------------------------------------------------------------------------------------------
class name                        con-bounces  contentions  waittime-min  waittime-max  waittime-total  acq-bounces  acquisitions  holdtime-min  holdtime-max  holdtime-total
-----------------------------------------------------------------------------------------------------------------------------------------------------
&(&mapping->tree_lock)->rlock:       18629034     19138284          0.09       1029.32     24353812.07     49650988     482883410          0.11        186.88    260706119.09
-----------------------------
&(&mapping->tree_lock)->rlock             783  [<ffffffff81109267>] tag_pages_for_writeback+0x2b/0x9d
&(&mapping->tree_lock)->rlock         3195817  [<ffffffff81100d6c>] add_to_page_cache_locked+0xa3/0x119
&(&mapping->tree_lock)->rlock         3863710  [<ffffffff81108df7>] test_set_page_writeback+0x63/0x140
&(&mapping->tree_lock)->rlock         3311518  [<ffffffff81172ade>] __set_page_dirty+0x25/0xa5
-----------------------------
&(&mapping->tree_lock)->rlock         3450725  [<ffffffff81100d6c>] add_to_page_cache_locked+0xa3/0x119
&(&mapping->tree_lock)->rlock         3225542  [<ffffffff81172ade>] __set_page_dirty+0x25/0xa5
&(&mapping->tree_lock)->rlock         2241958  [<ffffffff81108df7>] test_set_page_writeback+0x63/0x140
&(&mapping->tree_lock)->rlock         7339603  [<ffffffff8110ac33>] test_clear_page_writeback+0x64/0x155
.....................................................................................................................................................

&(&mapping->private_lock)->rlock:     1165199      1191201          0.11       2843.25      1621608.38     13341420     152761848          0.10       3727.92     33559035.07
--------------------------------
&(&mapping->private_lock)->rlock            1  [<ffffffff81172913>] __find_get_block_slow+0x5a/0x135
&(&mapping->private_lock)->rlock       385576  [<ffffffff811735d6>] create_empty_buffers+0x48/0xbf
&(&mapping->private_lock)->rlock       805624  [<ffffffff8117346d>] try_to_free_buffers+0x57/0xaa
--------------------------------
&(&mapping->private_lock)->rlock            1  [<ffffffff811746dd>] __getblk+0x1b8/0x257
&(&mapping->private_lock)->rlock       952718  [<ffffffff8117346d>] try_to_free_buffers+0x57/0xaa
&(&mapping->private_lock)->rlock       238482  [<ffffffff811735d6>] create_empty_buffers+0x48/0xbf

Thanks,
Fengguang

^ permalink raw reply [flat|nested] 132+ messages in thread
* Re: [PATCH 3/7] Make TestSetPageDirty and dirty page accounting in one func 2012-06-28 11:01 ` Sha Zhengju @ 2012-07-04 14:23 ` Michal Hocko -1 siblings, 0 replies; 132+ messages in thread From: Michal Hocko @ 2012-07-04 14:23 UTC (permalink / raw) To: Sha Zhengju Cc: linux-mm, cgroups, kamezawa.hiroyu, gthelen, yinghan, akpm, linux-kernel, torvalds, viro, linux-fsdevel, Sha Zhengju, Nick Piggin [Add Nick to the CC] On Thu 28-06-12 19:01:15, Sha Zhengju wrote: > From: Sha Zhengju <handai.szj@taobao.com> > > Commit a8e7d49a(Fix race in create_empty_buffers() vs __set_page_dirty_buffers()) > extracts TestSetPageDirty from __set_page_dirty and is far away from > account_page_dirtied.But it's better to make the two operations in one single > function to keep modular. > So in order to avoid the potential race mentioned in > commit a8e7d49a, we can hold private_lock until __set_page_dirty completes. So this is exactly what Nick suggested in the beginning (https://lkml.org/lkml/2009/3/19/169) > I guess there's no deadlock between ->private_lock and ->tree_lock by quick look. Yes and mm/filemap.c says that explicitly: * Lock ordering: * * ->i_mmap_mutex (truncate_pagecache) * ->private_lock (__free_pte->__set_page_dirty_buffers) * ->swap_lock (exclusive_swap_page, others) * ->mapping->tree_lock > > It's a prepare patch for following memcg dirty page accounting patches. > > Signed-off-by: Sha Zhengju <handai.szj@taobao.com> The patch seems to be correct. Reviewed-by: Michal Hocko <mhocko@suse.cz> > --- > fs/buffer.c | 25 +++++++++++++------------ > 1 files changed, 13 insertions(+), 12 deletions(-) > > diff --git a/fs/buffer.c b/fs/buffer.c > index 838a9cf..e8d96b8 100644 > --- a/fs/buffer.c > +++ b/fs/buffer.c > @@ -610,9 +610,15 @@ EXPORT_SYMBOL(mark_buffer_dirty_inode); > * If warn is true, then emit a warning if the page is not uptodate and has > * not been truncated. 
> */ > -static void __set_page_dirty(struct page *page, > +static int __set_page_dirty(struct page *page, > struct address_space *mapping, int warn) > { > + if (unlikely(!mapping)) > + return !TestSetPageDirty(page); > + > + if (TestSetPageDirty(page)) > + return 0; > + > spin_lock_irq(&mapping->tree_lock); > if (page->mapping) { /* Race with truncate? */ > WARN_ON_ONCE(warn && !PageUptodate(page)); > @@ -622,6 +628,8 @@ static void __set_page_dirty(struct page *page, > } > spin_unlock_irq(&mapping->tree_lock); > __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); > + > + return 1; > } > > /* > @@ -667,11 +675,9 @@ int __set_page_dirty_buffers(struct page *page) > bh = bh->b_this_page; > } while (bh != head); > } > - newly_dirty = !TestSetPageDirty(page); > + newly_dirty = __set_page_dirty(page, mapping, 1); > spin_unlock(&mapping->private_lock); > > - if (newly_dirty) > - __set_page_dirty(page, mapping, 1); > return newly_dirty; > } > EXPORT_SYMBOL(__set_page_dirty_buffers); > @@ -1115,14 +1121,9 @@ void mark_buffer_dirty(struct buffer_head *bh) > return; > } > > - if (!test_set_buffer_dirty(bh)) { > - struct page *page = bh->b_page; > - if (!TestSetPageDirty(page)) { > - struct address_space *mapping = page_mapping(page); > - if (mapping) > - __set_page_dirty(page, mapping, 0); > - } > - } > + if (!test_set_buffer_dirty(bh)) > + __set_page_dirty(bh->b_page, page_mapping(bh->b_page), 0); > + > } > EXPORT_SYMBOL(mark_buffer_dirty); > > -- > 1.7.1 > -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic ^ permalink raw reply [flat|nested] 132+ messages in thread
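The lock ordering Michal quotes can be made concrete with a toy lock-rank checker. Everything below is invented user-space scaffolding, not kernel code; it only demonstrates that when tree_lock is given a strictly higher rank than private_lock and every path acquires in rank order, the nesting introduced by the patch cannot produce an ABBA deadlock.

```c
#include <assert.h>

/* Invented ranks mirroring the documented order in mm/filemap.c:
 * ->private_lock is always taken before ->mapping->tree_lock. */
enum { RANK_NONE = 0, RANK_PRIVATE_LOCK = 1, RANK_TREE_LOCK = 2 };

static int held_rank = RANK_NONE;  /* innermost lock currently "held" */
int dirty_pages;

/* Acquiring a lock of equal or lower rank than one already held is
 * the ABBA pattern that can deadlock, so the checker rejects it. */
static int acquire(int rank)
{
	int prev = held_rank;
	assert(rank > held_rank);
	held_rank = rank;
	return prev;
}

static void release(int prev)
{
	held_rank = prev;
}

/* Analogue of the patched __set_page_dirty(): takes tree_lock itself. */
static void set_page_dirty_nested(void)
{
	int prev = acquire(RANK_TREE_LOCK);
	dirty_pages++;
	release(prev);
}

/* Analogue of __set_page_dirty_buffers(): private_lock is now held
 * across the whole call, and the checker confirms the nesting is safe. */
int set_page_dirty_buffers_sketch(void)
{
	int prev = acquire(RANK_PRIVATE_LOCK);
	set_page_dirty_nested();
	release(prev);
	return dirty_pages;
}
```

This is essentially what lockdep verifies at runtime in the kernel: as long as the only nesting ever observed is private_lock outside tree_lock, holding private_lock until __set_page_dirty completes cannot invert the order.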
* [PATCH 4/7] Use vfs __set_page_dirty interface instead of doing it inside filesystem
@ 2012-06-28 11:03 ` Sha Zhengju
  0 siblings, 0 replies; 132+ messages in thread
From: Sha Zhengju @ 2012-06-28 11:03 UTC (permalink / raw)
To: linux-mm, cgroups
Cc: kamezawa.hiroyu, gthelen, yinghan, akpm, mhocko, linux-kernel,
    torvalds, viro, linux-fsdevel, sage, ceph-devel, Sha Zhengju

From: Sha Zhengju <handai.szj@taobao.com>

In the following patches we will treat SetPageDirty and dirty page
accounting as one integrated operation. Filesystems had better use the
vfs interface directly to avoid those details.

Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
---
 fs/buffer.c                 |    2 +-
 fs/ceph/addr.c              |   20 ++------------------
 include/linux/buffer_head.h |    2 ++
 3 files changed, 5 insertions(+), 19 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index e8d96b8..55522dd 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -610,7 +610,7 @@ EXPORT_SYMBOL(mark_buffer_dirty_inode);
  * If warn is true, then emit a warning if the page is not uptodate and has
  * not been truncated.
  */
-static int __set_page_dirty(struct page *page,
+int __set_page_dirty(struct page *page,
 			struct address_space *mapping, int warn)
 {
 	if (unlikely(!mapping))
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 8b67304..d028fbe 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -5,6 +5,7 @@
 #include <linux/mm.h>
 #include <linux/pagemap.h>
 #include <linux/writeback.h>	/* generic_writepages */
+#include <linux/buffer_head.h>
 #include <linux/slab.h>
 #include <linux/pagevec.h>
 #include <linux/task_io_accounting_ops.h>
@@ -73,14 +74,8 @@ static int ceph_set_page_dirty(struct page *page)
 	int undo = 0;
 	struct ceph_snap_context *snapc;

-	if (unlikely(!mapping))
-		return !TestSetPageDirty(page);
-
-	if (TestSetPageDirty(page)) {
-		dout("%p set_page_dirty %p idx %lu -- already dirty\n",
-		     mapping->host, page, page->index);
+	if (!__set_page_dirty(page, mapping, 1))
 		return 0;
-	}

 	inode = mapping->host;
 	ci = ceph_inode(inode);
@@ -107,14 +102,7 @@ static int ceph_set_page_dirty(struct page *page)
 	     snapc, snapc->seq, snapc->num_snaps);
 	spin_unlock(&ci->i_ceph_lock);

-	/* now adjust page */
-	spin_lock_irq(&mapping->tree_lock);
 	if (page->mapping) {	/* Race with truncate? */
-		WARN_ON_ONCE(!PageUptodate(page));
-		account_page_dirtied(page, page->mapping);
-		radix_tree_tag_set(&mapping->page_tree,
-				page_index(page), PAGECACHE_TAG_DIRTY);
-
 		/*
 		 * Reference snap context in page->private.  Also set
 		 * PagePrivate so that we get invalidatepage callback.
@@ -126,14 +114,10 @@ static int ceph_set_page_dirty(struct page *page)
 		undo = 1;
 	}

-	spin_unlock_irq(&mapping->tree_lock);
-
 	if (undo)
 		/* whoops, we failed to dirty the page */
 		ceph_put_wrbuffer_cap_refs(ci, 1, snapc);

-	__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
-
 	BUG_ON(!PageDirty(page));
 	return 1;
 }
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index 458f497..0a331a8 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -336,6 +336,8 @@ static inline void lock_buffer(struct buffer_head *bh)
 }

 extern int __set_page_dirty_buffers(struct page *page);
+extern int __set_page_dirty(struct page *page,
+			struct address_space *mapping, int warn);

 #else /* CONFIG_BLOCK */

--
1.7.1
* Re: [PATCH 4/7] Use vfs __set_page_dirty interface instead of doing it inside filesystem
@ 2012-06-29  5:21 ` Sage Weil
  0 siblings, 0 replies; 132+ messages in thread
From: Sage Weil @ 2012-06-29 5:21 UTC (permalink / raw)
To: Sha Zhengju
Cc: linux-mm, cgroups, kamezawa.hiroyu, gthelen, yinghan, akpm, mhocko,
    linux-kernel, torvalds, viro, linux-fsdevel, sage, ceph-devel, Sha Zhengju

On Thu, 28 Jun 2012, Sha Zhengju wrote:
> From: Sha Zhengju <handai.szj@taobao.com>
>
> In the following patches we will treat SetPageDirty and dirty page
> accounting as one integrated operation. Filesystems had better use the
> vfs interface directly to avoid those details.
>
> Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
> ---
>  fs/buffer.c                 |    2 +-
>  fs/ceph/addr.c              |   20 ++------------------
>  include/linux/buffer_head.h |    2 ++
>  3 files changed, 5 insertions(+), 19 deletions(-)
>
> diff --git a/fs/buffer.c b/fs/buffer.c
> index e8d96b8..55522dd 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -610,7 +610,7 @@ EXPORT_SYMBOL(mark_buffer_dirty_inode);
>   * If warn is true, then emit a warning if the page is not uptodate and has
>   * not been truncated.
>   */
> -static int __set_page_dirty(struct page *page,
> +int __set_page_dirty(struct page *page,
>  			struct address_space *mapping, int warn)
>  {
>  	if (unlikely(!mapping))

This also needs an EXPORT_SYMBOL(__set_page_dirty) to allow ceph to
continue to build as a module.

With that fixed, the ceph bits are a welcome cleanup!

Acked-by: Sage Weil <sage@inktank.com>

> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> index 8b67304..d028fbe 100644
> --- a/fs/ceph/addr.c
> +++ b/fs/ceph/addr.c
> @@ -5,6 +5,7 @@
>  #include <linux/mm.h>
>  #include <linux/pagemap.h>
>  #include <linux/writeback.h>	/* generic_writepages */
> +#include <linux/buffer_head.h>
>  #include <linux/slab.h>
>  #include <linux/pagevec.h>
>  #include <linux/task_io_accounting_ops.h>
> @@ -73,14 +74,8 @@ static int ceph_set_page_dirty(struct page *page)
>  	int undo = 0;
>  	struct ceph_snap_context *snapc;
>
> -	if (unlikely(!mapping))
> -		return !TestSetPageDirty(page);
> -
> -	if (TestSetPageDirty(page)) {
> -		dout("%p set_page_dirty %p idx %lu -- already dirty\n",
> -		     mapping->host, page, page->index);
> +	if (!__set_page_dirty(page, mapping, 1))
>  		return 0;
> -	}
>
>  	inode = mapping->host;
>  	ci = ceph_inode(inode);
> @@ -107,14 +102,7 @@ static int ceph_set_page_dirty(struct page *page)
>  	     snapc, snapc->seq, snapc->num_snaps);
>  	spin_unlock(&ci->i_ceph_lock);
>
> -	/* now adjust page */
> -	spin_lock_irq(&mapping->tree_lock);
>  	if (page->mapping) {	/* Race with truncate? */
> -		WARN_ON_ONCE(!PageUptodate(page));
> -		account_page_dirtied(page, page->mapping);
> -		radix_tree_tag_set(&mapping->page_tree,
> -				page_index(page), PAGECACHE_TAG_DIRTY);
> -
>  		/*
>  		 * Reference snap context in page->private.  Also set
>  		 * PagePrivate so that we get invalidatepage callback.
> @@ -126,14 +114,10 @@ static int ceph_set_page_dirty(struct page *page)
>  		undo = 1;
>  	}
>
> -	spin_unlock_irq(&mapping->tree_lock);
> -
>  	if (undo)
>  		/* whoops, we failed to dirty the page */
>  		ceph_put_wrbuffer_cap_refs(ci, 1, snapc);
>
> -	__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
> -
>  	BUG_ON(!PageDirty(page));
>  	return 1;
>  }
> diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
> index 458f497..0a331a8 100644
> --- a/include/linux/buffer_head.h
> +++ b/include/linux/buffer_head.h
> @@ -336,6 +336,8 @@ static inline void lock_buffer(struct buffer_head *bh)
>  }
>
>  extern int __set_page_dirty_buffers(struct page *page);
> +extern int __set_page_dirty(struct page *page,
> +			struct address_space *mapping, int warn);
>
>  #else /* CONFIG_BLOCK */
>
> --
> 1.7.1
* Re: [PATCH 4/7] Use vfs __set_page_dirty interface instead of doing it inside filesystem
  2012-06-29  5:21 ` Sage Weil
@ 2012-07-02  8:10   ` Sha Zhengju
  1 sibling, 0 replies; 132+ messages in thread
From: Sha Zhengju @ 2012-07-02 8:10 UTC (permalink / raw)
To: Sage Weil
Cc: linux-mm, cgroups, kamezawa.hiroyu, gthelen, yinghan, akpm, mhocko,
    linux-kernel, torvalds, viro, linux-fsdevel, sage, ceph-devel, Sha Zhengju

On 06/29/2012 01:21 PM, Sage Weil wrote:
> On Thu, 28 Jun 2012, Sha Zhengju wrote:
>
>> From: Sha Zhengju <handai.szj@taobao.com>
>>
>> In the following patches we will treat SetPageDirty and dirty page
>> accounting as one integrated operation. Filesystems had better use the
>> vfs interface directly to avoid those details.
>>
>> Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
>> ---
>>  fs/buffer.c                 |    2 +-
>>  fs/ceph/addr.c              |   20 ++------------------
>>  include/linux/buffer_head.h |    2 ++
>>  3 files changed, 5 insertions(+), 19 deletions(-)
>>
>> diff --git a/fs/buffer.c b/fs/buffer.c
>> index e8d96b8..55522dd 100644
>> --- a/fs/buffer.c
>> +++ b/fs/buffer.c
>> @@ -610,7 +610,7 @@ EXPORT_SYMBOL(mark_buffer_dirty_inode);
>>   * If warn is true, then emit a warning if the page is not uptodate and has
>>   * not been truncated.
>>   */
>> -static int __set_page_dirty(struct page *page,
>> +int __set_page_dirty(struct page *page,
>>  			struct address_space *mapping, int warn)
>>  {
>>  	if (unlikely(!mapping))
>
> This also needs an EXPORT_SYMBOL(__set_page_dirty) to allow ceph to
> continue to build as a module.
>
> With that fixed, the ceph bits are a welcome cleanup!
>
> Acked-by: Sage Weil <sage@inktank.com>

Further, I checked the path again: could it be reworked as follows to
avoid the undo?

    __set_page_dirty();                 __set_page_dirty();
    ceph operations;             ==>    if (page->mapping)
    if (page->mapping)                          ceph operations;
            ;
    else
            undo = 1;
    if (undo)
            xxx;

Thanks,
Sha

>> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
>> index 8b67304..d028fbe 100644
>> --- a/fs/ceph/addr.c
>> +++ b/fs/ceph/addr.c
>> @@ -5,6 +5,7 @@
>>  #include <linux/mm.h>
>>  #include <linux/pagemap.h>
>>  #include <linux/writeback.h>	/* generic_writepages */
>> +#include <linux/buffer_head.h>
>>  #include <linux/slab.h>
>>  #include <linux/pagevec.h>
>>  #include <linux/task_io_accounting_ops.h>
>> @@ -73,14 +74,8 @@ static int ceph_set_page_dirty(struct page *page)
>>  	int undo = 0;
>>  	struct ceph_snap_context *snapc;
>>
>> -	if (unlikely(!mapping))
>> -		return !TestSetPageDirty(page);
>> -
>> -	if (TestSetPageDirty(page)) {
>> -		dout("%p set_page_dirty %p idx %lu -- already dirty\n",
>> -		     mapping->host, page, page->index);
>> +	if (!__set_page_dirty(page, mapping, 1))
>>  		return 0;
>> -	}
>>
>>  	inode = mapping->host;
>>  	ci = ceph_inode(inode);
>> @@ -107,14 +102,7 @@ static int ceph_set_page_dirty(struct page *page)
>>  	     snapc, snapc->seq, snapc->num_snaps);
>>  	spin_unlock(&ci->i_ceph_lock);
>>
>> -	/* now adjust page */
>> -	spin_lock_irq(&mapping->tree_lock);
>>  	if (page->mapping) {	/* Race with truncate? */
>> -		WARN_ON_ONCE(!PageUptodate(page));
>> -		account_page_dirtied(page, page->mapping);
>> -		radix_tree_tag_set(&mapping->page_tree,
>> -				page_index(page), PAGECACHE_TAG_DIRTY);
>> -
>>  		/*
>>  		 * Reference snap context in page->private.  Also set
>>  		 * PagePrivate so that we get invalidatepage callback.
>> @@ -126,14 +114,10 @@ static int ceph_set_page_dirty(struct page *page)
>>  		undo = 1;
>>  	}
>>
>> -	spin_unlock_irq(&mapping->tree_lock);
>> -
>>  	if (undo)
>>  		/* whoops, we failed to dirty the page */
>>  		ceph_put_wrbuffer_cap_refs(ci, 1, snapc);
>>
>> -	__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
>> -
>>  	BUG_ON(!PageDirty(page));
>>  	return 1;
>>  }
>> diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
>> index 458f497..0a331a8 100644
>> --- a/include/linux/buffer_head.h
>> +++ b/include/linux/buffer_head.h
>> @@ -336,6 +336,8 @@ static inline void lock_buffer(struct buffer_head *bh)
>>  }
>>
>>  extern int __set_page_dirty_buffers(struct page *page);
>> +extern int __set_page_dirty(struct page *page,
>> +			struct address_space *mapping, int warn);
>>
>>  #else /* CONFIG_BLOCK */
>>
>> --
>> 1.7.1
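The rework Sha proposes above (take the snap-context reference only after the `page->mapping` check, so the `undo` branch disappears) can be modeled in plain C. Everything here is a hypothetical userspace stand-in for the kernel/ceph code, not the ceph API itself:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures involved. */
struct inode_model { int wrbuffer_cap_refs; };

struct page_model {
	int dirty;			/* models the PageDirty flag */
	struct inode_model *mapping;	/* NULL if truncate won the race */
};

/* Proposed flow: dirty the page first, then do the ceph bookkeeping
 * only when page->mapping is still set; since the reference is never
 * taken on the truncate race, there is nothing to undo. */
static int ceph_set_page_dirty_reordered(struct page_model *page)
{
	if (page->dirty)
		return 0;		/* models TestSetPageDirty */
	page->dirty = 1;

	if (page->mapping)		/* race with truncate? */
		page->mapping->wrbuffer_cap_refs++;

	return 1;
}
```

The design point is simply ordering: the original code took the reference first and compensated when the mapping turned out to be gone, while the reordered flow makes the check a precondition, trading the undo branch for one fewer state to clean up.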
* Re: [PATCH 4/7] Use vfs __set_page_dirty interface instead of doing it inside filesystem @ 2012-07-02 8:10 ` Sha Zhengju 0 siblings, 0 replies; 132+ messages in thread From: Sha Zhengju @ 2012-07-02 8:10 UTC (permalink / raw) To: Sage Weil Cc: linux-mm, cgroups, kamezawa.hiroyu, gthelen, yinghan, akpm, mhocko, linux-kernel, torvalds, viro, linux-fsdevel, sage, ceph-devel, Sha Zhengju On 06/29/2012 01:21 PM, Sage Weil wrote: > On Thu, 28 Jun 2012, Sha Zhengju wrote: > >> From: Sha Zhengju<handai.szj@taobao.com> >> >> Following we will treat SetPageDirty and dirty page accounting as an integrated >> operation. Filesystems had better use vfs interface directly to avoid those details. >> >> Signed-off-by: Sha Zhengju<handai.szj@taobao.com> >> --- >> fs/buffer.c | 2 +- >> fs/ceph/addr.c | 20 ++------------------ >> include/linux/buffer_head.h | 2 ++ >> 3 files changed, 5 insertions(+), 19 deletions(-) >> >> diff --git a/fs/buffer.c b/fs/buffer.c >> index e8d96b8..55522dd 100644 >> --- a/fs/buffer.c >> +++ b/fs/buffer.c >> @@ -610,7 +610,7 @@ EXPORT_SYMBOL(mark_buffer_dirty_inode); >> * If warn is true, then emit a warning if the page is not uptodate and has >> * not been truncated. >> */ >> -static int __set_page_dirty(struct page *page, >> +int __set_page_dirty(struct page *page, >> struct address_space *mapping, int warn) >> { >> if (unlikely(!mapping)) > This also needs an EXPORT_SYMBOL(__set_page_dirty) to allow ceph to > continue to build as a module. > > With that fixed, the ceph bits are a welcome cleanup! > > Acked-by: Sage Weil<sage@inktank.com> Further, I check the path again and may it be reworked as follows to avoid undo? 
__set_page_dirty();                     __set_page_dirty();
ceph operations;                 ==>    if (page->mapping)
if (page->mapping)                              ceph operations;
        ;
else
        undo = 1;
if (undo)
        xxx;

Thanks, Sha >> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c >> index 8b67304..d028fbe 100644 >> --- a/fs/ceph/addr.c >> +++ b/fs/ceph/addr.c >> @@ -5,6 +5,7 @@ >> #include<linux/mm.h> >> #include<linux/pagemap.h> >> #include<linux/writeback.h> /* generic_writepages */ >> +#include<linux/buffer_head.h> >> #include<linux/slab.h> >> #include<linux/pagevec.h> >> #include<linux/task_io_accounting_ops.h> >> @@ -73,14 +74,8 @@ static int ceph_set_page_dirty(struct page *page) >> int undo = 0; >> struct ceph_snap_context *snapc; >> >> - if (unlikely(!mapping)) >> - return !TestSetPageDirty(page); >> - >> - if (TestSetPageDirty(page)) { >> - dout("%p set_page_dirty %p idx %lu -- already dirty\n", >> - mapping->host, page, page->index); >> + if (!__set_page_dirty(page, mapping, 1)) >> return 0; >> - } >> >> inode = mapping->host; >> ci = ceph_inode(inode); >> @@ -107,14 +102,7 @@ static int ceph_set_page_dirty(struct page *page) >> snapc, snapc->seq, snapc->num_snaps); >> spin_unlock(&ci->i_ceph_lock); >> >> - /* now adjust page */ >> - spin_lock_irq(&mapping->tree_lock); >> if (page->mapping) { /* Race with truncate? */ >> - WARN_ON_ONCE(!PageUptodate(page)); >> - account_page_dirtied(page, page->mapping); >> - radix_tree_tag_set(&mapping->page_tree, >> - page_index(page), PAGECACHE_TAG_DIRTY); >> - >> /* >> * Reference snap context in page->private. Also set >> * PagePrivate so that we get invalidatepage callback. 
>> @@ -126,14 +114,10 @@ static int ceph_set_page_dirty(struct page *page) >> undo = 1; >> } >> >> - spin_unlock_irq(&mapping->tree_lock); >> - >> if (undo) >> /* whoops, we failed to dirty the page */ >> ceph_put_wrbuffer_cap_refs(ci, 1, snapc); >> >> - __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); >> - >> BUG_ON(!PageDirty(page)); >> return 1; >> } >> diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h >> index 458f497..0a331a8 100644 >> --- a/include/linux/buffer_head.h >> +++ b/include/linux/buffer_head.h >> @@ -336,6 +336,8 @@ static inline void lock_buffer(struct buffer_head *bh) >> } >> >> extern int __set_page_dirty_buffers(struct page *page); >> +extern int __set_page_dirty(struct page *page, >> + struct address_space *mapping, int warn); >> >> #else /* CONFIG_BLOCK */ >> >> -- >> 1.7.1 >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 132+ messages in thread
* Re: [PATCH 4/7] Use vfs __set_page_dirty interface instead of doing it inside filesystem 2012-07-02 8:10 ` Sha Zhengju @ 2012-07-02 14:49 ` Sage Weil -1 siblings, 0 replies; 132+ messages in thread From: Sage Weil @ 2012-07-02 14:49 UTC (permalink / raw) To: Sha Zhengju Cc: linux-mm, cgroups, kamezawa.hiroyu, gthelen, yinghan, akpm, mhocko, linux-kernel, torvalds, viro, linux-fsdevel, sage, ceph-devel, Sha Zhengju On Mon, 2 Jul 2012, Sha Zhengju wrote: > On 06/29/2012 01:21 PM, Sage Weil wrote: > > On Thu, 28 Jun 2012, Sha Zhengju wrote: > > > > > From: Sha Zhengju<handai.szj@taobao.com> > > > > > > Following we will treat SetPageDirty and dirty page accounting as an > > > integrated > > > operation. Filesystems had better use vfs interface directly to avoid > > > those details. > > > > > > Signed-off-by: Sha Zhengju<handai.szj@taobao.com> > > > --- > > > fs/buffer.c | 2 +- > > > fs/ceph/addr.c | 20 ++------------------ > > > include/linux/buffer_head.h | 2 ++ > > > 3 files changed, 5 insertions(+), 19 deletions(-) > > > > > > diff --git a/fs/buffer.c b/fs/buffer.c > > > index e8d96b8..55522dd 100644 > > > --- a/fs/buffer.c > > > +++ b/fs/buffer.c > > > @@ -610,7 +610,7 @@ EXPORT_SYMBOL(mark_buffer_dirty_inode); > > > * If warn is true, then emit a warning if the page is not uptodate and > > > has > > > * not been truncated. > > > */ > > > -static int __set_page_dirty(struct page *page, > > > +int __set_page_dirty(struct page *page, > > > struct address_space *mapping, int warn) > > > { > > > if (unlikely(!mapping)) > > This also needs an EXPORT_SYMBOL(__set_page_dirty) to allow ceph to > > continue to build as a module. > > > > With that fixed, the ceph bits are a welcome cleanup! > > > > Acked-by: Sage Weil<sage@inktank.com> > > Further, I check the path again and may it be reworked as follows to avoid > undo? 
> > __set_page_dirty(); > __set_page_dirty(); > ceph operations; ==> if (page->mapping) > if (page->mapping) ceph operations; > ; > else > undo = 1; > if (undo) > xxx; Yep. Taking another look at the original code, though, I'm worried that one reason the __set_page_dirty() actions were spread out the way they are is because we wanted to ensure that the ceph operations were always performed when PagePrivate was set. It looks like invalidatepage won't get called if private isn't set, and presumably it handles the truncate race with __set_page_dirty() properly (right?). What about writeback? Do we need to worry about writepage[s] getting called with a NULL page->private? Thanks! sage > > > > Thanks, > Sha > > > > diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c > > > index 8b67304..d028fbe 100644 > > > --- a/fs/ceph/addr.c > > > +++ b/fs/ceph/addr.c > > > @@ -5,6 +5,7 @@ > > > #include<linux/mm.h> > > > #include<linux/pagemap.h> > > > #include<linux/writeback.h> /* generic_writepages */ > > > +#include<linux/buffer_head.h> > > > #include<linux/slab.h> > > > #include<linux/pagevec.h> > > > #include<linux/task_io_accounting_ops.h> > > > @@ -73,14 +74,8 @@ static int ceph_set_page_dirty(struct page *page) > > > int undo = 0; > > > struct ceph_snap_context *snapc; > > > > > > - if (unlikely(!mapping)) > > > - return !TestSetPageDirty(page); > > > - > > > - if (TestSetPageDirty(page)) { > > > - dout("%p set_page_dirty %p idx %lu -- already dirty\n", > > > - mapping->host, page, page->index); > > > + if (!__set_page_dirty(page, mapping, 1)) > > > return 0; > > > - } > > > > > > inode = mapping->host; > > > ci = ceph_inode(inode); > > > @@ -107,14 +102,7 @@ static int ceph_set_page_dirty(struct page *page) > > > snapc, snapc->seq, snapc->num_snaps); > > > spin_unlock(&ci->i_ceph_lock); > > > > > > - /* now adjust page */ > > > - spin_lock_irq(&mapping->tree_lock); > > > if (page->mapping) { /* Race with truncate? 
*/ > > > - WARN_ON_ONCE(!PageUptodate(page)); > > > - account_page_dirtied(page, page->mapping); > > > - radix_tree_tag_set(&mapping->page_tree, > > > - page_index(page), PAGECACHE_TAG_DIRTY); > > > - > > > /* > > > * Reference snap context in page->private. Also set > > > * PagePrivate so that we get invalidatepage callback. > > > @@ -126,14 +114,10 @@ static int ceph_set_page_dirty(struct page *page) > > > undo = 1; > > > } > > > > > > - spin_unlock_irq(&mapping->tree_lock); > > > - > > > if (undo) > > > /* whoops, we failed to dirty the page */ > > > ceph_put_wrbuffer_cap_refs(ci, 1, snapc); > > > > > > - __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); > > > - > > > BUG_ON(!PageDirty(page)); > > > return 1; > > > } > > > diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h > > > index 458f497..0a331a8 100644 > > > --- a/include/linux/buffer_head.h > > > +++ b/include/linux/buffer_head.h > > > @@ -336,6 +336,8 @@ static inline void lock_buffer(struct buffer_head *bh) > > > } > > > > > > extern int __set_page_dirty_buffers(struct page *page); > > > +extern int __set_page_dirty(struct page *page, > > > + struct address_space *mapping, int warn); > > > > > > #else /* CONFIG_BLOCK */ > > > > > > -- > > > 1.7.1 > > > > > > -- > > > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" > > > in > > > the body of a message to majordomo@vger.kernel.org > > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > > > > > > ^ permalink raw reply [flat|nested] 132+ messages in thread
* Re: [PATCH 4/7] Use vfs __set_page_dirty interface instead of doing it inside filesystem 2012-07-02 14:49 ` Sage Weil @ 2012-07-04 8:11 ` Sha Zhengju -1 siblings, 0 replies; 132+ messages in thread From: Sha Zhengju @ 2012-07-04 8:11 UTC (permalink / raw) To: Sage Weil Cc: linux-mm, cgroups, kamezawa.hiroyu, gthelen, yinghan, akpm, mhocko, linux-kernel, torvalds, viro, linux-fsdevel, sage, ceph-devel, Sha Zhengju On 07/02/2012 10:49 PM, Sage Weil wrote: > On Mon, 2 Jul 2012, Sha Zhengju wrote: >> On 06/29/2012 01:21 PM, Sage Weil wrote: >>> On Thu, 28 Jun 2012, Sha Zhengju wrote: >>> >>>> From: Sha Zhengju<handai.szj@taobao.com> >>>> >>>> Following we will treat SetPageDirty and dirty page accounting as an >>>> integrated >>>> operation. Filesystems had better use vfs interface directly to avoid >>>> those details. >>>> >>>> Signed-off-by: Sha Zhengju<handai.szj@taobao.com> >>>> --- >>>> fs/buffer.c | 2 +- >>>> fs/ceph/addr.c | 20 ++------------------ >>>> include/linux/buffer_head.h | 2 ++ >>>> 3 files changed, 5 insertions(+), 19 deletions(-) >>>> >>>> diff --git a/fs/buffer.c b/fs/buffer.c >>>> index e8d96b8..55522dd 100644 >>>> --- a/fs/buffer.c >>>> +++ b/fs/buffer.c >>>> @@ -610,7 +610,7 @@ EXPORT_SYMBOL(mark_buffer_dirty_inode); >>>> * If warn is true, then emit a warning if the page is not uptodate and >>>> has >>>> * not been truncated. >>>> */ >>>> -static int __set_page_dirty(struct page *page, >>>> +int __set_page_dirty(struct page *page, >>>> struct address_space *mapping, int warn) >>>> { >>>> if (unlikely(!mapping)) >>> This also needs an EXPORT_SYMBOL(__set_page_dirty) to allow ceph to >>> continue to build as a module. >>> >>> With that fixed, the ceph bits are a welcome cleanup! >>> >>> Acked-by: Sage Weil<sage@inktank.com> >> Further, I check the path again and may it be reworked as follows to avoid >> undo? 
>>
>> __set_page_dirty();                     __set_page_dirty();
>> ceph operations;                 ==>    if (page->mapping)
>> if (page->mapping)                              ceph operations;
>>         ;
>> else
>>         undo = 1;
>> if (undo)
>>         xxx;
> Yep. Taking another look at the original code, though, I'm worried that > one reason the __set_page_dirty() actions were spread out the way they are > is because we wanted to ensure that the ceph operations were always > performed when PagePrivate was set. > Sorry, I've lost something:

__set_page_dirty();                     __set_page_dirty();
ceph operations;
if(page->mapping)                ==>    if(page->mapping) {
        SetPagePrivate;                         SetPagePrivate;
else                                            ceph operations;
        undo = 1;                       }
if (undo)
        XXX;

I think this can ensure that ceph operations are performed together with SetPagePrivate. > It looks like invalidatepage won't get called if private isn't set, and > presumably it handles the truncate race with __set_page_dirty() properly > (right?). What about writeback? Do we need to worry about writepage[s] > getting called with a NULL page->private? __set_page_dirty does handle racing conditions with truncate and writeback. writepage[s] also take page->private into consideration, which is done inside the specific filesystems; I notice that ceph has handled this in ceph_writepage(). Sorry, I'm not a vfs expert and maybe I've not caught your point... Thanks, Sha > Thanks! 
> sage > > > >> >> >> Thanks, >> Sha >> >>>> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c >>>> index 8b67304..d028fbe 100644 >>>> --- a/fs/ceph/addr.c >>>> +++ b/fs/ceph/addr.c >>>> @@ -5,6 +5,7 @@ >>>> #include<linux/mm.h> >>>> #include<linux/pagemap.h> >>>> #include<linux/writeback.h> /* generic_writepages */ >>>> +#include<linux/buffer_head.h> >>>> #include<linux/slab.h> >>>> #include<linux/pagevec.h> >>>> #include<linux/task_io_accounting_ops.h> >>>> @@ -73,14 +74,8 @@ static int ceph_set_page_dirty(struct page *page) >>>> int undo = 0; >>>> struct ceph_snap_context *snapc; >>>> >>>> - if (unlikely(!mapping)) >>>> - return !TestSetPageDirty(page); >>>> - >>>> - if (TestSetPageDirty(page)) { >>>> - dout("%p set_page_dirty %p idx %lu -- already dirty\n", >>>> - mapping->host, page, page->index); >>>> + if (!__set_page_dirty(page, mapping, 1)) >>>> return 0; >>>> - } >>>> >>>> inode = mapping->host; >>>> ci = ceph_inode(inode); >>>> @@ -107,14 +102,7 @@ static int ceph_set_page_dirty(struct page *page) >>>> snapc, snapc->seq, snapc->num_snaps); >>>> spin_unlock(&ci->i_ceph_lock); >>>> >>>> - /* now adjust page */ >>>> - spin_lock_irq(&mapping->tree_lock); >>>> if (page->mapping) { /* Race with truncate? */ >>>> - WARN_ON_ONCE(!PageUptodate(page)); >>>> - account_page_dirtied(page, page->mapping); >>>> - radix_tree_tag_set(&mapping->page_tree, >>>> - page_index(page), PAGECACHE_TAG_DIRTY); >>>> - >>>> /* >>>> * Reference snap context in page->private. Also set >>>> * PagePrivate so that we get invalidatepage callback. 
>>>> @@ -126,14 +114,10 @@ static int ceph_set_page_dirty(struct page *page) >>>> undo = 1; >>>> } >>>> >>>> - spin_unlock_irq(&mapping->tree_lock); >>>> - >>>> if (undo) >>>> /* whoops, we failed to dirty the page */ >>>> ceph_put_wrbuffer_cap_refs(ci, 1, snapc); >>>> >>>> - __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); >>>> - >>>> BUG_ON(!PageDirty(page)); >>>> return 1; >>>> } >>>> diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h >>>> index 458f497..0a331a8 100644 >>>> --- a/include/linux/buffer_head.h >>>> +++ b/include/linux/buffer_head.h >>>> @@ -336,6 +336,8 @@ static inline void lock_buffer(struct buffer_head *bh) >>>> } >>>> >>>> extern int __set_page_dirty_buffers(struct page *page); >>>> +extern int __set_page_dirty(struct page *page, >>>> + struct address_space *mapping, int warn); >>>> >>>> #else /* CONFIG_BLOCK */ >>>> >>>> -- >>>> 1.7.1 >>>> >>>> -- >>>> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" >>>> in >>>> the body of a message to majordomo@vger.kernel.org >>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>> >>>> >> ^ permalink raw reply [flat|nested] 132+ messages in thread
* Re: [PATCH 4/7] Use vfs __set_page_dirty interface instead of doing it inside filesystem 2012-07-04 8:11 ` Sha Zhengju @ 2012-07-05 15:20 ` Sage Weil -1 siblings, 0 replies; 132+ messages in thread From: Sage Weil @ 2012-07-05 15:20 UTC (permalink / raw) To: Sha Zhengju Cc: linux-mm, cgroups, kamezawa.hiroyu, gthelen, yinghan, akpm, mhocko, linux-kernel, torvalds, viro, linux-fsdevel, sage, ceph-devel, Sha Zhengju On Wed, 4 Jul 2012, Sha Zhengju wrote: > On 07/02/2012 10:49 PM, Sage Weil wrote: > > On Mon, 2 Jul 2012, Sha Zhengju wrote: > > > On 06/29/2012 01:21 PM, Sage Weil wrote: > > > > On Thu, 28 Jun 2012, Sha Zhengju wrote: > > > > > > > > > From: Sha Zhengju<handai.szj@taobao.com> > > > > > > > > > > Following we will treat SetPageDirty and dirty page accounting as an > > > > > integrated > > > > > operation. Filesystems had better use vfs interface directly to avoid > > > > > those details. > > > > > > > > > > Signed-off-by: Sha Zhengju<handai.szj@taobao.com> > > > > > --- > > > > > fs/buffer.c | 2 +- > > > > > fs/ceph/addr.c | 20 ++------------------ > > > > > include/linux/buffer_head.h | 2 ++ > > > > > 3 files changed, 5 insertions(+), 19 deletions(-) > > > > > > > > > > diff --git a/fs/buffer.c b/fs/buffer.c > > > > > index e8d96b8..55522dd 100644 > > > > > --- a/fs/buffer.c > > > > > +++ b/fs/buffer.c > > > > > @@ -610,7 +610,7 @@ EXPORT_SYMBOL(mark_buffer_dirty_inode); > > > > > * If warn is true, then emit a warning if the page is not uptodate > > > > > and > > > > > has > > > > > * not been truncated. > > > > > */ > > > > > -static int __set_page_dirty(struct page *page, > > > > > +int __set_page_dirty(struct page *page, > > > > > struct address_space *mapping, int warn) > > > > > { > > > > > if (unlikely(!mapping)) > > > > This also needs an EXPORT_SYMBOL(__set_page_dirty) to allow ceph to > > > > continue to build as a module. > > > > > > > > With that fixed, the ceph bits are a welcome cleanup! 
> > > > > > > > Acked-by: Sage Weil<sage@inktank.com> > > > Further, I check the path again and may it be reworked as follows to avoid > > > undo? > > > > > > __set_page_dirty(); > > > __set_page_dirty(); > > > ceph operations; ==> if (page->mapping) > > > if (page->mapping) ceph > > > operations; > > > ; > > > else > > > undo = 1; > > > if (undo) > > > xxx; > > Yep. Taking another look at the original code, though, I'm worried that > > one reason the __set_page_dirty() actions were spread out the way they are > > is because we wanted to ensure that the ceph operations were always > > performed when PagePrivate was set. > > > > Sorry, I've lost something: > > __set_page_dirty(); __set_page_dirty(); > ceph operations; > if(page->mapping) ==> if(page->mapping) { > SetPagePrivate; SetPagePrivate; > else ceph operations; > undo = 1; } > > if (undo) > XXX; > > I think this can ensure that ceph operations are performed together with > SetPagePrivate. Yeah, that looks right, as long as the ceph accounting operations happen before SetPagePrivate. I think it's no more or less racy than before, at least. The patch doesn't apply without the previous ones in the series, it looks like. Do you want to prepare a new version or should I? Thanks! sage > > It looks like invalidatepage won't get called if private isn't set, and > > presumably it handles the truncate race with __set_page_dirty() properly > > (right?). What about writeback? Do we need to worry about writepage[s] > > getting called with a NULL page->private? > > __set_page_dirty does handle racing conditions with truncate and > writeback writepage[s] also take page->private into consideration > which is done inside specific filesystems. I notice that ceph has handled > this in ceph_writepage(). > Sorry, not vfs expert and maybe I've not caught your point... > > > > Thanks, > Sha > > > Thanks! 
> > sage > > > > > > > > > > > > > > > Thanks, > > > Sha > > > > > > > > diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c > > > > > index 8b67304..d028fbe 100644 > > > > > --- a/fs/ceph/addr.c > > > > > +++ b/fs/ceph/addr.c > > > > > @@ -5,6 +5,7 @@ > > > > > #include<linux/mm.h> > > > > > #include<linux/pagemap.h> > > > > > #include<linux/writeback.h> /* generic_writepages */ > > > > > +#include<linux/buffer_head.h> > > > > > #include<linux/slab.h> > > > > > #include<linux/pagevec.h> > > > > > #include<linux/task_io_accounting_ops.h> > > > > > @@ -73,14 +74,8 @@ static int ceph_set_page_dirty(struct page *page) > > > > > int undo = 0; > > > > > struct ceph_snap_context *snapc; > > > > > > > > > > - if (unlikely(!mapping)) > > > > > - return !TestSetPageDirty(page); > > > > > - > > > > > - if (TestSetPageDirty(page)) { > > > > > - dout("%p set_page_dirty %p idx %lu -- already > > > > > dirty\n", > > > > > - mapping->host, page, page->index); > > > > > + if (!__set_page_dirty(page, mapping, 1)) > > > > > return 0; > > > > > - } > > > > > > > > > > inode = mapping->host; > > > > > ci = ceph_inode(inode); > > > > > @@ -107,14 +102,7 @@ static int ceph_set_page_dirty(struct page *page) > > > > > snapc, snapc->seq, snapc->num_snaps); > > > > > spin_unlock(&ci->i_ceph_lock); > > > > > > > > > > - /* now adjust page */ > > > > > - spin_lock_irq(&mapping->tree_lock); > > > > > if (page->mapping) { /* Race with truncate? */ > > > > > - WARN_ON_ONCE(!PageUptodate(page)); > > > > > - account_page_dirtied(page, page->mapping); > > > > > - radix_tree_tag_set(&mapping->page_tree, > > > > > - page_index(page), > > > > > PAGECACHE_TAG_DIRTY); > > > > > - > > > > > /* > > > > > * Reference snap context in page->private. Also set > > > > > * PagePrivate so that we get invalidatepage callback. 
> > > > > @@ -126,14 +114,10 @@ static int ceph_set_page_dirty(struct page > > > > > *page) > > > > > undo = 1; > > > > > } > > > > > > > > > > - spin_unlock_irq(&mapping->tree_lock); > > > > > - > > > > > if (undo) > > > > > /* whoops, we failed to dirty the page */ > > > > > ceph_put_wrbuffer_cap_refs(ci, 1, snapc); > > > > > > > > > > - __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); > > > > > - > > > > > BUG_ON(!PageDirty(page)); > > > > > return 1; > > > > > } > > > > > diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h > > > > > index 458f497..0a331a8 100644 > > > > > --- a/include/linux/buffer_head.h > > > > > +++ b/include/linux/buffer_head.h > > > > > @@ -336,6 +336,8 @@ static inline void lock_buffer(struct buffer_head > > > > > *bh) > > > > > } > > > > > > > > > > extern int __set_page_dirty_buffers(struct page *page); > > > > > +extern int __set_page_dirty(struct page *page, > > > > > + struct address_space *mapping, int warn); > > > > > > > > > > #else /* CONFIG_BLOCK */ > > > > > > > > > > -- > > > > > 1.7.1 > > > > > > > > > > -- > > > > > To unsubscribe from this list: send the line "unsubscribe > > > > > linux-fsdevel" > > > > > in > > > > > the body of a message to majordomo@vger.kernel.org > > > > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > > > > > > > > > > > > > -- > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > > ^ permalink raw reply [flat|nested] 132+ messages in thread
* Re: [PATCH 4/7] Use vfs __set_page_dirty interface instead of doing it inside filesystem @ 2012-07-05 15:20 ` Sage Weil 0 siblings, 0 replies; 132+ messages in thread From: Sage Weil @ 2012-07-05 15:20 UTC (permalink / raw) To: Sha Zhengju Cc: linux-mm, cgroups, kamezawa.hiroyu, gthelen, yinghan, akpm, mhocko, linux-kernel, torvalds, viro, linux-fsdevel, sage, ceph-devel, Sha Zhengju On Wed, 4 Jul 2012, Sha Zhengju wrote: > On 07/02/2012 10:49 PM, Sage Weil wrote: > > On Mon, 2 Jul 2012, Sha Zhengju wrote: > > > On 06/29/2012 01:21 PM, Sage Weil wrote: > > > > On Thu, 28 Jun 2012, Sha Zhengju wrote: > > > > > > > > > From: Sha Zhengju<handai.szj@taobao.com> > > > > > > > > > > Following we will treat SetPageDirty and dirty page accounting as an > > > > > integrated > > > > > operation. Filesystems had better use vfs interface directly to avoid > > > > > those details. > > > > > > > > > > Signed-off-by: Sha Zhengju<handai.szj@taobao.com> > > > > > --- > > > > > fs/buffer.c | 2 +- > > > > > fs/ceph/addr.c | 20 ++------------------ > > > > > include/linux/buffer_head.h | 2 ++ > > > > > 3 files changed, 5 insertions(+), 19 deletions(-) > > > > > > > > > > diff --git a/fs/buffer.c b/fs/buffer.c > > > > > index e8d96b8..55522dd 100644 > > > > > --- a/fs/buffer.c > > > > > +++ b/fs/buffer.c > > > > > @@ -610,7 +610,7 @@ EXPORT_SYMBOL(mark_buffer_dirty_inode); > > > > > * If warn is true, then emit a warning if the page is not uptodate > > > > > and > > > > > has > > > > > * not been truncated. > > > > > */ > > > > > -static int __set_page_dirty(struct page *page, > > > > > +int __set_page_dirty(struct page *page, > > > > > struct address_space *mapping, int warn) > > > > > { > > > > > if (unlikely(!mapping)) > > > > This also needs an EXPORT_SYMBOL(__set_page_dirty) to allow ceph to > > > > continue to build as a module. > > > > > > > > With that fixed, the ceph bits are a welcome cleanup! 
> > > > > > > > > Acked-by: Sage Weil<sage@inktank.com> > > > Further, I checked the path again; could it be reworked as follows to avoid > > > the undo? > > > > > > __set_page_dirty(); > > > __set_page_dirty(); > > > ceph operations; ==> if (page->mapping) > > > if (page->mapping) ceph > > > operations; > > > ; > > > else > > > undo = 1; > > > if (undo) > > > xxx; > > Yep. Taking another look at the original code, though, I'm worried that > > one reason the __set_page_dirty() actions were spread out the way they are > > is because we wanted to ensure that the ceph operations were always > > performed when PagePrivate was set. > > > > Sorry, I left something out: > > __set_page_dirty(); __set_page_dirty(); > ceph operations; > if(page->mapping) ==> if(page->mapping) { > SetPagePrivate; SetPagePrivate; > else ceph operations; > undo = 1; } > > if (undo) > XXX; > > I think this can ensure that the ceph operations are performed together with > SetPagePrivate. Yeah, that looks right, as long as the ceph accounting operations happen before SetPagePrivate. I think it's no more or less racy than before, at least. The patch doesn't apply without the previous ones in the series, it looks like. Do you want to prepare a new version or should I? Thanks! sage > > It looks like invalidatepage won't get called if private isn't set, and > > presumably it handles the truncate race with __set_page_dirty() properly > > (right?). What about writeback? Do we need to worry about writepage[s] > > getting called with a NULL page->private? > > __set_page_dirty() does handle the racing conditions with truncate and > writeback; writepage[s] also take page->private into consideration, > which is done inside the specific filesystems. I notice that ceph has handled > this in ceph_writepage(). > Sorry, I'm not a vfs expert, so maybe I haven't caught your point... > > > > Thanks, > Sha > > > Thanks! 
> > sage > > > > > > > > > > > > > > > Thanks, > > > Sha > > > > > > > > diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c > > > > > index 8b67304..d028fbe 100644 > > > > > --- a/fs/ceph/addr.c > > > > > +++ b/fs/ceph/addr.c > > > > > @@ -5,6 +5,7 @@ > > > > > #include<linux/mm.h> > > > > > #include<linux/pagemap.h> > > > > > #include<linux/writeback.h> /* generic_writepages */ > > > > > +#include<linux/buffer_head.h> > > > > > #include<linux/slab.h> > > > > > #include<linux/pagevec.h> > > > > > #include<linux/task_io_accounting_ops.h> > > > > > @@ -73,14 +74,8 @@ static int ceph_set_page_dirty(struct page *page) > > > > > int undo = 0; > > > > > struct ceph_snap_context *snapc; > > > > > > > > > > - if (unlikely(!mapping)) > > > > > - return !TestSetPageDirty(page); > > > > > - > > > > > - if (TestSetPageDirty(page)) { > > > > > - dout("%p set_page_dirty %p idx %lu -- already > > > > > dirty\n", > > > > > - mapping->host, page, page->index); > > > > > + if (!__set_page_dirty(page, mapping, 1)) > > > > > return 0; > > > > > - } > > > > > > > > > > inode = mapping->host; > > > > > ci = ceph_inode(inode); > > > > > @@ -107,14 +102,7 @@ static int ceph_set_page_dirty(struct page *page) > > > > > snapc, snapc->seq, snapc->num_snaps); > > > > > spin_unlock(&ci->i_ceph_lock); > > > > > > > > > > - /* now adjust page */ > > > > > - spin_lock_irq(&mapping->tree_lock); > > > > > if (page->mapping) { /* Race with truncate? */ > > > > > - WARN_ON_ONCE(!PageUptodate(page)); > > > > > - account_page_dirtied(page, page->mapping); > > > > > - radix_tree_tag_set(&mapping->page_tree, > > > > > - page_index(page), > > > > > PAGECACHE_TAG_DIRTY); > > > > > - > > > > > /* > > > > > * Reference snap context in page->private. Also set > > > > > * PagePrivate so that we get invalidatepage callback. 
> > > > > @@ -126,14 +114,10 @@ static int ceph_set_page_dirty(struct page > > > > > *page) > > > > > undo = 1; > > > > > } > > > > > > > > > > - spin_unlock_irq(&mapping->tree_lock); > > > > > - > > > > > if (undo) > > > > > /* whoops, we failed to dirty the page */ > > > > > ceph_put_wrbuffer_cap_refs(ci, 1, snapc); > > > > > > > > > > - __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); > > > > > - > > > > > BUG_ON(!PageDirty(page)); > > > > > return 1; > > > > > } > > > > > diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h > > > > > index 458f497..0a331a8 100644 > > > > > --- a/include/linux/buffer_head.h > > > > > +++ b/include/linux/buffer_head.h > > > > > @@ -336,6 +336,8 @@ static inline void lock_buffer(struct buffer_head > > > > > *bh) > > > > > } > > > > > > > > > > extern int __set_page_dirty_buffers(struct page *page); > > > > > +extern int __set_page_dirty(struct page *page, > > > > > + struct address_space *mapping, int warn); > > > > > > > > > > #else /* CONFIG_BLOCK */ > > > > > > > > > > -- > > > > > 1.7.1 > > > > > > > > > > -- > > > > > To unsubscribe from this list: send the line "unsubscribe > > > > > linux-fsdevel" > > > > > in > > > > > the body of a message to majordomo@vger.kernel.org > > > > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > > > > > > > > > > > > > -- > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 132+ messages in thread
* Re: [PATCH 4/7] Use vfs __set_page_dirty interface instead of doing it inside filesystem 2012-07-05 15:20 ` Sage Weil @ 2012-07-05 15:40 ` Sha Zhengju -1 siblings, 0 replies; 132+ messages in thread From: Sha Zhengju @ 2012-07-05 15:40 UTC (permalink / raw) To: Sage Weil Cc: linux-mm, cgroups, kamezawa.hiroyu, gthelen, yinghan, akpm, mhocko, linux-kernel, torvalds, viro, linux-fsdevel, sage, ceph-devel, Sha Zhengju On Thu, Jul 5, 2012 at 11:20 PM, Sage Weil <sage@inktank.com> wrote: > On Wed, 4 Jul 2012, Sha Zhengju wrote: >> On 07/02/2012 10:49 PM, Sage Weil wrote: >> > On Mon, 2 Jul 2012, Sha Zhengju wrote: >> > > On 06/29/2012 01:21 PM, Sage Weil wrote: >> > > > On Thu, 28 Jun 2012, Sha Zhengju wrote: >> > > > >> > > > > From: Sha Zhengju<handai.szj@taobao.com> >> > > > > >> > > > > Following we will treat SetPageDirty and dirty page accounting as an >> > > > > integrated >> > > > > operation. Filesystems had better use vfs interface directly to avoid >> > > > > those details. >> > > > > >> > > > > Signed-off-by: Sha Zhengju<handai.szj@taobao.com> >> > > > > --- >> > > > > fs/buffer.c | 2 +- >> > > > > fs/ceph/addr.c | 20 ++------------------ >> > > > > include/linux/buffer_head.h | 2 ++ >> > > > > 3 files changed, 5 insertions(+), 19 deletions(-) >> > > > > >> > > > > diff --git a/fs/buffer.c b/fs/buffer.c >> > > > > index e8d96b8..55522dd 100644 >> > > > > --- a/fs/buffer.c >> > > > > +++ b/fs/buffer.c >> > > > > @@ -610,7 +610,7 @@ EXPORT_SYMBOL(mark_buffer_dirty_inode); >> > > > > * If warn is true, then emit a warning if the page is not uptodate >> > > > > and >> > > > > has >> > > > > * not been truncated. >> > > > > */ >> > > > > -static int __set_page_dirty(struct page *page, >> > > > > +int __set_page_dirty(struct page *page, >> > > > > struct address_space *mapping, int warn) >> > > > > { >> > > > > if (unlikely(!mapping)) >> > > > This also needs an EXPORT_SYMBOL(__set_page_dirty) to allow ceph to >> > > > continue to build as a module. 
>> > > > >> > > > With that fixed, the ceph bits are a welcome cleanup! >> > > > >> > > > Acked-by: Sage Weil<sage@inktank.com> >> > > Further, I check the path again and may it be reworked as follows to avoid >> > > undo? >> > > >> > > __set_page_dirty(); >> > > __set_page_dirty(); >> > > ceph operations; ==> if (page->mapping) >> > > if (page->mapping) ceph >> > > operations; >> > > ; >> > > else >> > > undo = 1; >> > > if (undo) >> > > xxx; >> > Yep. Taking another look at the original code, though, I'm worried that >> > one reason the __set_page_dirty() actions were spread out the way they are >> > is because we wanted to ensure that the ceph operations were always >> > performed when PagePrivate was set. >> > >> >> Sorry, I've lost something: >> >> __set_page_dirty(); __set_page_dirty(); >> ceph operations; >> if(page->mapping) ==> if(page->mapping) { >> SetPagePrivate; SetPagePrivate; >> else ceph operations; >> undo = 1; } >> >> if (undo) >> XXX; >> >> I think this can ensure that ceph operations are performed together with >> SetPagePrivate. > > Yeah, that looks right, as long as the ceph accounting operations happen > before SetPagePrivate. I think it's no more or less racy than before, at > least. > > The patch doesn't apply without the previous ones in the series, it looks > like. Do you want to prepare a new version or should I? > Good. I'm running some tests, then I'll send out a new version of the patchset; please wait a bit. : ) Thanks, Sha ^ permalink raw reply [flat|nested] 132+ messages in thread
* Re: [PATCH 4/7] Use vfs __set_page_dirty interface instead of doing it inside filesystem 2012-06-28 11:03 ` Sha Zhengju @ 2012-07-04 14:27 ` Michal Hocko -1 siblings, 0 replies; 132+ messages in thread From: Michal Hocko @ 2012-07-04 14:27 UTC (permalink / raw) To: Sha Zhengju Cc: linux-mm, cgroups, kamezawa.hiroyu, gthelen, yinghan, akpm, linux-kernel, torvalds, viro, linux-fsdevel, sage, ceph-devel, Sha Zhengju On Thu 28-06-12 19:03:43, Sha Zhengju wrote: > From: Sha Zhengju <handai.szj@taobao.com> > > Following we will treat SetPageDirty and dirty page accounting as an integrated > operation. Filesystems had better use vfs interface directly to avoid those details. > > Signed-off-by: Sha Zhengju <handai.szj@taobao.com> > --- > fs/buffer.c | 2 +- > fs/ceph/addr.c | 20 ++------------------ > include/linux/buffer_head.h | 2 ++ > 3 files changed, 5 insertions(+), 19 deletions(-) > > diff --git a/fs/buffer.c b/fs/buffer.c > index e8d96b8..55522dd 100644 > --- a/fs/buffer.c > +++ b/fs/buffer.c > @@ -610,7 +610,7 @@ EXPORT_SYMBOL(mark_buffer_dirty_inode); > * If warn is true, then emit a warning if the page is not uptodate and has > * not been truncated. 
> */ > -static int __set_page_dirty(struct page *page, > +int __set_page_dirty(struct page *page, > struct address_space *mapping, int warn) > { > if (unlikely(!mapping)) > diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c > index 8b67304..d028fbe 100644 > --- a/fs/ceph/addr.c > +++ b/fs/ceph/addr.c > @@ -5,6 +5,7 @@ > #include <linux/mm.h> > #include <linux/pagemap.h> > #include <linux/writeback.h> /* generic_writepages */ > +#include <linux/buffer_head.h> > #include <linux/slab.h> > #include <linux/pagevec.h> > #include <linux/task_io_accounting_ops.h> > @@ -73,14 +74,8 @@ static int ceph_set_page_dirty(struct page *page) > int undo = 0; > struct ceph_snap_context *snapc; > > - if (unlikely(!mapping)) > - return !TestSetPageDirty(page); > - > - if (TestSetPageDirty(page)) { > - dout("%p set_page_dirty %p idx %lu -- already dirty\n", > - mapping->host, page, page->index); I am not familiar with the code, but it looks like we lose information here about something bad(?) going on. > + if (!__set_page_dirty(page, mapping, 1)) > return 0; > - } > > inode = mapping->host; > ci = ceph_inode(inode); -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic ^ permalink raw reply [flat|nested] 132+ messages in thread
* [PATCH 5/7] memcg: add per cgroup dirty pages accounting 2012-06-28 10:54 ` Sha Zhengju (?) @ 2012-06-28 11:04 ` Sha Zhengju -1 siblings, 0 replies; 132+ messages in thread From: Sha Zhengju @ 2012-06-28 11:04 UTC (permalink / raw) To: linux-mm, cgroups Cc: kamezawa.hiroyu, gthelen, yinghan, akpm, mhocko, linux-kernel, Sha Zhengju From: Sha Zhengju <handai.szj@taobao.com> This patch adds memcg routines to count dirty pages, which allows the memory controller to maintain an accurate view of the amount of its dirty memory and can provide some information for users while the group's direct reclaim is working. After Kame's commit 89c06bd5 (memcg: use new logic for page stat accounting), we can use the 'struct page' flag to test page state instead of a per-page_cgroup flag. But memcg has a feature to move a page from one cgroup to another, which may race with "page stat accounting". So in order to avoid the race we have designed a bigger lock: mem_cgroup_begin_update_page_stat() modify page information -->(a) mem_cgroup_update_page_stat() -->(b) mem_cgroup_end_update_page_stat() It requires that (a) and (b) (the dirty page accounting) stay close enough together. 
In the previous two prepare patches, we have reworked the vfs set page dirty routines and now the interfaces are more explicit: incrementing (2): __set_page_dirty __set_page_dirty_nobuffers decrementing (2): clear_page_dirty_for_io cancel_dirty_page Signed-off-by: Sha Zhengju <handai.szj@taobao.com> --- fs/buffer.c | 17 ++++++++++++++--- include/linux/memcontrol.h | 1 + mm/filemap.c | 5 +++++ mm/memcontrol.c | 28 +++++++++++++++++++++------- mm/page-writeback.c | 30 ++++++++++++++++++++++++------ mm/truncate.c | 6 ++++++ 6 files changed, 71 insertions(+), 16 deletions(-) diff --git a/fs/buffer.c b/fs/buffer.c index 55522dd..d3714cc 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -613,11 +613,19 @@ EXPORT_SYMBOL(mark_buffer_dirty_inode); int __set_page_dirty(struct page *page, struct address_space *mapping, int warn) { + bool locked; + unsigned long flags; + int ret = 0; + if (unlikely(!mapping)) return !TestSetPageDirty(page); - if (TestSetPageDirty(page)) - return 0; + mem_cgroup_begin_update_page_stat(page, &locked, &flags); + + if (TestSetPageDirty(page)) { + ret = 0; + goto out; + } spin_lock_irq(&mapping->tree_lock); if (page->mapping) { /* Race with truncate? 
*/ @@ -629,7 +637,10 @@ int __set_page_dirty(struct page *page, spin_unlock_irq(&mapping->tree_lock); __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); - return 1; + ret = 1; +out: + mem_cgroup_end_update_page_stat(page, &locked, &flags); + return ret; } /* diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 20b0f2d..ad37b59 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -38,6 +38,7 @@ enum mem_cgroup_stat_index { MEM_CGROUP_STAT_RSS, /* # of pages charged as anon rss */ MEM_CGROUP_STAT_FILE_MAPPED, /* # of pages charged as file rss */ MEM_CGROUP_STAT_SWAP, /* # of pages, swapped out */ + MEM_CGROUP_STAT_FILE_DIRTY, /* # of dirty pages in page cache */ MEM_CGROUP_STAT_NSTATS, }; diff --git a/mm/filemap.c b/mm/filemap.c index 1f19ec3..5159a49 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -140,6 +140,11 @@ void __delete_from_page_cache(struct page *page) * having removed the page entirely. */ if (PageDirty(page) && mapping_cap_account_dirty(mapping)) { + /* + * Do not change page state, so no need to use mem_cgroup_ + * {begin, end}_update_page_stat to get lock. 
+ */ + mem_cgroup_dec_page_stat(page, MEM_CGROUP_STAT_FILE_DIRTY); dec_zone_page_state(page, NR_FILE_DIRTY); dec_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE); } diff --git a/mm/memcontrol.c b/mm/memcontrol.c index ebed1ca..90e2946 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -82,6 +82,7 @@ static const char * const mem_cgroup_stat_names[] = { "rss", "mapped_file", "swap", + "dirty", }; enum mem_cgroup_events_index { @@ -2538,6 +2539,18 @@ void mem_cgroup_split_huge_fixup(struct page *head) } #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ +static inline +void mem_cgroup_move_account_page_stat(struct mem_cgroup *from, + struct mem_cgroup *to, + enum mem_cgroup_stat_index idx) +{ + /* Update stat data for mem_cgroup */ + preempt_disable(); + __this_cpu_dec(from->stat->count[idx]); + __this_cpu_inc(to->stat->count[idx]); + preempt_enable(); +} + /** * mem_cgroup_move_account - move account of the page * @page: the page @@ -2583,13 +2596,14 @@ static int mem_cgroup_move_account(struct page *page, move_lock_mem_cgroup(from, &flags); - if (!anon && page_mapped(page)) { - /* Update mapped_file data for mem_cgroup */ - preempt_disable(); - __this_cpu_dec(from->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]); - __this_cpu_inc(to->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]); - preempt_enable(); - } + if (!anon && page_mapped(page)) + mem_cgroup_move_account_page_stat(from, to, + MEM_CGROUP_STAT_FILE_MAPPED); + + if (PageDirty(page)) + mem_cgroup_move_account_page_stat(from, to, + MEM_CGROUP_STAT_FILE_DIRTY); + mem_cgroup_charge_statistics(from, anon, -nr_pages); /* caller should have done css_get */ diff --git a/mm/page-writeback.c b/mm/page-writeback.c index e5363f3..e79a2f7 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -1962,6 +1962,7 @@ int __set_page_dirty_no_writeback(struct page *page) void account_page_dirtied(struct page *page, struct address_space *mapping) { if (mapping_cap_account_dirty(mapping)) { + mem_cgroup_inc_page_stat(page, 
MEM_CGROUP_STAT_FILE_DIRTY); __inc_zone_page_state(page, NR_FILE_DIRTY); __inc_zone_page_state(page, NR_DIRTIED); __inc_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE); @@ -2001,12 +2002,20 @@ EXPORT_SYMBOL(account_page_writeback); */ int __set_page_dirty_nobuffers(struct page *page) { + bool locked; + unsigned long flags; + int ret = 0; + + mem_cgroup_begin_update_page_stat(page, &locked, &flags); + if (!TestSetPageDirty(page)) { struct address_space *mapping = page_mapping(page); struct address_space *mapping2; - if (!mapping) - return 1; + if (!mapping) { + ret = 1; + goto out; + } spin_lock_irq(&mapping->tree_lock); mapping2 = page_mapping(page); @@ -2022,9 +2031,12 @@ int __set_page_dirty_nobuffers(struct page *page) /* !PageAnon && !swapper_space */ __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); } - return 1; + ret = 1; } - return 0; + +out: + mem_cgroup_end_update_page_stat(page, &locked, &flags); + return ret; } EXPORT_SYMBOL(__set_page_dirty_nobuffers); @@ -2139,6 +2151,9 @@ EXPORT_SYMBOL(set_page_dirty_lock); int clear_page_dirty_for_io(struct page *page) { struct address_space *mapping = page_mapping(page); + bool locked; + unsigned long flags; + int ret = 0; BUG_ON(!PageLocked(page)); @@ -2180,13 +2195,16 @@ int clear_page_dirty_for_io(struct page *page) * the desired exclusion. See mm/memory.c:do_wp_page() * for more comments. 
*/ + mem_cgroup_begin_update_page_stat(page, &locked, &flags); if (TestClearPageDirty(page)) { + mem_cgroup_dec_page_stat(page, MEM_CGROUP_STAT_FILE_DIRTY); dec_zone_page_state(page, NR_FILE_DIRTY); dec_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE); - return 1; + ret = 1; } - return 0; + mem_cgroup_end_update_page_stat(page, &locked, &flags); + return ret; } return TestClearPageDirty(page); } diff --git a/mm/truncate.c b/mm/truncate.c index 75801ac..052016a 100644 --- a/mm/truncate.c +++ b/mm/truncate.c @@ -73,9 +73,14 @@ static inline void truncate_partial_page(struct page *page, unsigned partial) */ void cancel_dirty_page(struct page *page, unsigned int account_size) { + bool locked; + unsigned long flags; + + mem_cgroup_begin_update_page_stat(page, &locked, &flags); if (TestClearPageDirty(page)) { struct address_space *mapping = page->mapping; if (mapping && mapping_cap_account_dirty(mapping)) { + mem_cgroup_dec_page_stat(page, MEM_CGROUP_STAT_FILE_DIRTY); dec_zone_page_state(page, NR_FILE_DIRTY); dec_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE); @@ -83,6 +88,7 @@ void cancel_dirty_page(struct page *page, unsigned int account_size) task_io_account_cancelled_write(account_size); } } + mem_cgroup_end_update_page_stat(page, &locked, &flags); } EXPORT_SYMBOL(cancel_dirty_page); -- 1.7.1 ^ permalink raw reply related [flat|nested] 132+ messages in thread
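Every dirty-page path reworked by the patch above follows one shape: mem_cgroup_begin_update_page_stat(), the TestSetPageDirty/TestClearPageDirty flag change, the stat update, then mem_cgroup_end_update_page_stat(). A minimal userspace C sketch of that shape — plain fields stand in for the kernel's move_lock, irq flags, and per-CPU counters, and all names here are illustrative, not the kernel API:

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-ins for the memcg page-stat lock. The kernel takes
 * the per-memcg move_lock (with irqs saved in 'flags') only while some
 * task is moving pages between groups; 'moving' models that
 * rarely-true condition. */
struct memcg {
    bool moving;          /* an account-moving task is active */
    int  lock_depth;      /* models move_lock being held */
    long nr_file_dirty;   /* MEM_CGROUP_STAT_FILE_DIRTY analogue */
};

struct page {
    struct memcg *cgroup;
    bool dirty;           /* PageDirty analogue */
};

static void begin_update_page_stat(struct page *pg, bool *locked)
{
    *locked = false;
    if (pg->cgroup->moving) {        /* slow path only while a move runs */
        pg->cgroup->lock_depth++;
        *locked = true;
    }
}

static void end_update_page_stat(struct page *pg, const bool *locked)
{
    if (*locked)
        pg->cgroup->lock_depth--;
}

/* Mirrors the reworked __set_page_dirty(): the flag change (a) and the
 * stat update (b) sit inside one begin/end section, so a concurrent
 * move cannot slip in between them and mis-account the page. Returns 1
 * when the page was newly dirtied, 0 when it was already dirty. */
static int set_page_dirty_sketch(struct page *pg)
{
    bool locked;
    int ret = 0;

    begin_update_page_stat(pg, &locked);
    if (!pg->dirty) {                  /* TestSetPageDirty analogue */
        pg->dirty = true;
        pg->cgroup->nr_file_dirty++;   /* mem_cgroup_inc_page_stat */
        ret = 1;
    }
    end_update_page_stat(pg, &locked);
    return ret;
}
```

The point of the "bigger lock" is exactly this pairing: the PageDirty transition and the counter update cannot be separated by a concurrent mem_cgroup_move_account(), which would otherwise move the page while holding a stale view of its dirty state.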
* Re: [PATCH 5/7] memcg: add per cgroup dirty pages accounting 2012-06-28 11:04 ` Sha Zhengju @ 2012-07-03 5:57 ` Kamezawa Hiroyuki -1 siblings, 0 replies; 132+ messages in thread From: Kamezawa Hiroyuki @ 2012-07-03 5:57 UTC (permalink / raw) To: Sha Zhengju Cc: linux-mm, cgroups, gthelen, yinghan, akpm, mhocko, linux-kernel, Sha Zhengju (2012/06/28 20:04), Sha Zhengju wrote: > From: Sha Zhengju <handai.szj@taobao.com> > > This patch adds memcg routines to count dirty pages, which allows memory controller > to maintain an accurate view of the amount of its dirty memory and can provide some > info for users while group's direct reclaim is working. > > After Kame's commit 89c06bd5(memcg: use new logic for page stat accounting), we can > use 'struct page' flag to test page state instead of per page_cgroup flag. But memcg > has a feature to move a page from a cgroup to another one and may have race between > "move" and "page stat accounting". So in order to avoid the race we have designed a > bigger lock: > > mem_cgroup_begin_update_page_stat() > modify page information -->(a) > mem_cgroup_update_page_stat() -->(b) > mem_cgroup_end_update_page_stat() > > It requires (a) and (b)(dirty pages accounting) can stay close enough. > > In the previous two prepare patches, we have reworked the vfs set page dirty routines > and now the interfaces are more explicit: > incrementing (2): > __set_page_dirty > __set_page_dirty_nobuffers > decrementing (2): > clear_page_dirty_for_io > cancel_dirty_page > > > Signed-off-by: Sha Zhengju <handai.szj@taobao.com> Thank you. This seems much cleaner than expected! Very good.
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujtisu.com> > --- > fs/buffer.c | 17 ++++++++++++++--- > include/linux/memcontrol.h | 1 + > mm/filemap.c | 5 +++++ > mm/memcontrol.c | 28 +++++++++++++++++++++------- > mm/page-writeback.c | 30 ++++++++++++++++++++++++------ > mm/truncate.c | 6 ++++++ > 6 files changed, 71 insertions(+), 16 deletions(-) > > diff --git a/fs/buffer.c b/fs/buffer.c > index 55522dd..d3714cc 100644 > --- a/fs/buffer.c > +++ b/fs/buffer.c > @@ -613,11 +613,19 @@ EXPORT_SYMBOL(mark_buffer_dirty_inode); > int __set_page_dirty(struct page *page, > struct address_space *mapping, int warn) > { > + bool locked; > + unsigned long flags; > + int ret = 0; > + > if (unlikely(!mapping)) > return !TestSetPageDirty(page); > > - if (TestSetPageDirty(page)) > - return 0; > + mem_cgroup_begin_update_page_stat(page, &locked, &flags); > + > + if (TestSetPageDirty(page)) { > + ret = 0; > + goto out; > + } > > spin_lock_irq(&mapping->tree_lock); > if (page->mapping) { /* Race with truncate? 
*/ > @@ -629,7 +637,10 @@ int __set_page_dirty(struct page *page, > spin_unlock_irq(&mapping->tree_lock); > __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); > > - return 1; > + ret = 1; > +out: > + mem_cgroup_end_update_page_stat(page, &locked, &flags); > + return ret; > } > > /* > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h > index 20b0f2d..ad37b59 100644 > --- a/include/linux/memcontrol.h > +++ b/include/linux/memcontrol.h > @@ -38,6 +38,7 @@ enum mem_cgroup_stat_index { > MEM_CGROUP_STAT_RSS, /* # of pages charged as anon rss */ > MEM_CGROUP_STAT_FILE_MAPPED, /* # of pages charged as file rss */ > MEM_CGROUP_STAT_SWAP, /* # of pages, swapped out */ > + MEM_CGROUP_STAT_FILE_DIRTY, /* # of dirty pages in page cache */ > MEM_CGROUP_STAT_NSTATS, > }; > > diff --git a/mm/filemap.c b/mm/filemap.c > index 1f19ec3..5159a49 100644 > --- a/mm/filemap.c > +++ b/mm/filemap.c > @@ -140,6 +140,11 @@ void __delete_from_page_cache(struct page *page) > * having removed the page entirely. > */ > if (PageDirty(page) && mapping_cap_account_dirty(mapping)) { > + /* > + * Do not change page state, so no need to use mem_cgroup_ > + * {begin, end}_update_page_stat to get lock. 
> + */ > + mem_cgroup_dec_page_stat(page, MEM_CGROUP_STAT_FILE_DIRTY); > dec_zone_page_state(page, NR_FILE_DIRTY); > dec_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE); > } > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index ebed1ca..90e2946 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -82,6 +82,7 @@ static const char * const mem_cgroup_stat_names[] = { > "rss", > "mapped_file", > "swap", > + "dirty", > }; > > enum mem_cgroup_events_index { > @@ -2538,6 +2539,18 @@ void mem_cgroup_split_huge_fixup(struct page *head) > } > #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ > > +static inline > +void mem_cgroup_move_account_page_stat(struct mem_cgroup *from, > + struct mem_cgroup *to, > + enum mem_cgroup_stat_index idx) > +{ > + /* Update stat data for mem_cgroup */ > + preempt_disable(); > + __this_cpu_dec(from->stat->count[idx]); > + __this_cpu_inc(to->stat->count[idx]); > + preempt_enable(); > +} > + > /** > * mem_cgroup_move_account - move account of the page > * @page: the page > @@ -2583,13 +2596,14 @@ static int mem_cgroup_move_account(struct page *page, > > move_lock_mem_cgroup(from, &flags); > > - if (!anon && page_mapped(page)) { > - /* Update mapped_file data for mem_cgroup */ > - preempt_disable(); > - __this_cpu_dec(from->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]); > - __this_cpu_inc(to->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]); > - preempt_enable(); > - } > + if (!anon && page_mapped(page)) > + mem_cgroup_move_account_page_stat(from, to, > + MEM_CGROUP_STAT_FILE_MAPPED); > + > + if (PageDirty(page)) > + mem_cgroup_move_account_page_stat(from, to, > + MEM_CGROUP_STAT_FILE_DIRTY); > + > mem_cgroup_charge_statistics(from, anon, -nr_pages); > > /* caller should have done css_get */ > diff --git a/mm/page-writeback.c b/mm/page-writeback.c > index e5363f3..e79a2f7 100644 > --- a/mm/page-writeback.c > +++ b/mm/page-writeback.c > @@ -1962,6 +1962,7 @@ int __set_page_dirty_no_writeback(struct page *page) > void account_page_dirtied(struct page 
*page, struct address_space *mapping) > { > if (mapping_cap_account_dirty(mapping)) { > + mem_cgroup_inc_page_stat(page, MEM_CGROUP_STAT_FILE_DIRTY); > __inc_zone_page_state(page, NR_FILE_DIRTY); > __inc_zone_page_state(page, NR_DIRTIED); > __inc_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE); > @@ -2001,12 +2002,20 @@ EXPORT_SYMBOL(account_page_writeback); > */ > int __set_page_dirty_nobuffers(struct page *page) > { > + bool locked; > + unsigned long flags; > + int ret = 0; > + > + mem_cgroup_begin_update_page_stat(page, &locked, &flags); > + > if (!TestSetPageDirty(page)) { > struct address_space *mapping = page_mapping(page); > struct address_space *mapping2; > > - if (!mapping) > - return 1; > + if (!mapping) { > + ret = 1; > + goto out; > + } > > spin_lock_irq(&mapping->tree_lock); > mapping2 = page_mapping(page); > @@ -2022,9 +2031,12 @@ int __set_page_dirty_nobuffers(struct page *page) > /* !PageAnon && !swapper_space */ > __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); > } > - return 1; > + ret = 1; > } > - return 0; > + > +out: > + mem_cgroup_end_update_page_stat(page, &locked, &flags); > + return ret; > } > EXPORT_SYMBOL(__set_page_dirty_nobuffers); > > @@ -2139,6 +2151,9 @@ EXPORT_SYMBOL(set_page_dirty_lock); > int clear_page_dirty_for_io(struct page *page) > { > struct address_space *mapping = page_mapping(page); > + bool locked; > + unsigned long flags; > + int ret = 0; > > BUG_ON(!PageLocked(page)); > > @@ -2180,13 +2195,16 @@ int clear_page_dirty_for_io(struct page *page) > * the desired exclusion. See mm/memory.c:do_wp_page() > * for more comments. 
> */ > + mem_cgroup_begin_update_page_stat(page, &locked, &flags); > if (TestClearPageDirty(page)) { > + mem_cgroup_dec_page_stat(page, MEM_CGROUP_STAT_FILE_DIRTY); > dec_zone_page_state(page, NR_FILE_DIRTY); > dec_bdi_stat(mapping->backing_dev_info, > BDI_RECLAIMABLE); > - return 1; > + ret = 1; > } > - return 0; > + mem_cgroup_end_update_page_stat(page, &locked, &flags); > + return ret; > } > return TestClearPageDirty(page); > } > diff --git a/mm/truncate.c b/mm/truncate.c > index 75801ac..052016a 100644 > --- a/mm/truncate.c > +++ b/mm/truncate.c > @@ -73,9 +73,14 @@ static inline void truncate_partial_page(struct page *page, unsigned partial) > */ > void cancel_dirty_page(struct page *page, unsigned int account_size) > { > + bool locked; > + unsigned long flags; > + > + mem_cgroup_begin_update_page_stat(page, &locked, &flags); > if (TestClearPageDirty(page)) { > struct address_space *mapping = page->mapping; > if (mapping && mapping_cap_account_dirty(mapping)) { > + mem_cgroup_dec_page_stat(page, MEM_CGROUP_STAT_FILE_DIRTY); > dec_zone_page_state(page, NR_FILE_DIRTY); > dec_bdi_stat(mapping->backing_dev_info, > BDI_RECLAIMABLE); > @@ -83,6 +88,7 @@ void cancel_dirty_page(struct page *page, unsigned int account_size) > task_io_account_cancelled_write(account_size); > } > } > + mem_cgroup_end_update_page_stat(page, &locked, &flags); > } > EXPORT_SYMBOL(cancel_dirty_page); > > ^ permalink raw reply [flat|nested] 132+ messages in thread
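The mem_cgroup_move_account_page_stat() helper factored out in the patch quoted above only transfers one counter from the source group to the destination. A simplified userspace analogue, with plain longs in place of the kernel's per-CPU counters (hence no preempt_disable() pairing) and illustrative names throughout:

```c
#include <assert.h>

/* Simplified stat indices mirroring the enum the patch extends with
 * MEM_CGROUP_STAT_FILE_DIRTY; this is an illustrative userspace model,
 * not kernel code. */
enum stat_index { STAT_FILE_MAPPED, STAT_FILE_DIRTY, NR_STATS };

struct group_stats {
    long count[NR_STATS];
};

/* Analogue of mem_cgroup_move_account_page_stat(): the page leaves
 * 'from' and joins 'to', so one unit of the given stat moves with it.
 * The kernel wraps the dec/inc pair in preempt_disable()/
 * preempt_enable() because its counters are per-CPU; plain longs need
 * no such care. */
static void move_account_page_stat(struct group_stats *from,
                                   struct group_stats *to,
                                   enum stat_index idx)
{
    from->count[idx]--;
    to->count[idx]++;
}

/* A mapped, dirty file page moving between groups updates both stats,
 * matching the two calls the reworked mem_cgroup_move_account() makes. */
static void move_dirty_mapped_page(struct group_stats *from,
                                   struct group_stats *to)
{
    move_account_page_stat(from, to, STAT_FILE_MAPPED);
    move_account_page_stat(from, to, STAT_FILE_DIRTY);
}
```

Factoring the helper out keeps the dec/inc pair in one place, so adding the dirty (and later writeback) stat to the move path is a two-line change rather than another copy of the per-CPU dance.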
* Re: [PATCH 5/7] memcg: add per cgroup dirty pages accounting 2012-07-03 5:57 ` Kamezawa Hiroyuki @ 2012-07-08 14:45 ` Fengguang Wu -1 siblings, 0 replies; 132+ messages in thread From: Fengguang Wu @ 2012-07-08 14:45 UTC (permalink / raw) To: Kamezawa Hiroyuki Cc: Sha Zhengju, linux-mm, cgroups, gthelen, yinghan, akpm, mhocko, linux-kernel, Sha Zhengju On Tue, Jul 03, 2012 at 02:57:08PM +0900, KAMEZAWA Hiroyuki wrote: > (2012/06/28 20:04), Sha Zhengju wrote: > > From: Sha Zhengju <handai.szj@taobao.com> > > > > This patch adds memcg routines to count dirty pages, which allows memory controller > > to maintain an accurate view of the amount of its dirty memory and can provide some > > info for users while group's direct reclaim is working. > > > > After Kame's commit 89c06bd5(memcg: use new logic for page stat accounting), we can > > use 'struct page' flag to test page state instead of per page_cgroup flag. But memcg > > has a feature to move a page from a cgroup to another one and may have race between > > "move" and "page stat accounting". So in order to avoid the race we have designed a > > bigger lock: > > > > mem_cgroup_begin_update_page_stat() > > modify page information -->(a) > > mem_cgroup_update_page_stat() -->(b) > > mem_cgroup_end_update_page_stat() > > > > It requires (a) and (b)(dirty pages accounting) can stay close enough. > > > > In the previous two prepare patches, we have reworked the vfs set page dirty routines > > and now the interfaces are more explicit: > > incrementing (2): > > __set_page_dirty > > __set_page_dirty_nobuffers > > decrementing (2): > > clear_page_dirty_for_io > > cancel_dirty_page > > > > > > Signed-off-by: Sha Zhengju <handai.szj@taobao.com> > > Thank you. This seems much cleaner than expected ! very good. > > Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujtisu.com> I have the same good feelings :) Acked-by: Fengguang Wu <fengguang.wu@intel.com> ^ permalink raw reply [flat|nested] 132+ messages in thread
* Re: [PATCH 5/7] memcg: add per cgroup dirty pages accounting
@ 2012-07-04 16:11   ` Michal Hocko
  0 siblings, 0 replies; 132+ messages in thread
From: Michal Hocko @ 2012-07-04 16:11 UTC (permalink / raw)
  To: Sha Zhengju
  Cc: linux-mm, cgroups, kamezawa.hiroyu, gthelen, yinghan, akpm,
	linux-kernel, Sha Zhengju, Alexander Viro, linux-fsdevel

I guess you should CC vfs people

On Thu 28-06-12 19:04:46, Sha Zhengju wrote:
> From: Sha Zhengju <handai.szj@taobao.com>
>
> This patch adds memcg routines to count dirty pages, which allows memory controller
> to maintain an accurate view of the amount of its dirty memory and can provide some
> info for users while group's direct reclaim is working.
>
> After Kame's commit 89c06bd5 (memcg: use new logic for page stat accounting), we can
> use 'struct page' flag to test page state instead of per page_cgroup flag. But memcg
> has a feature to move a page from a cgroup to another one and may have race between
> "move" and "page stat accounting". So in order to avoid the race we have designed a
> bigger lock:
>
>         mem_cgroup_begin_update_page_stat()
>         modify page information        -->(a)
>         mem_cgroup_update_page_stat()  -->(b)
>         mem_cgroup_end_update_page_stat()
>
> It requires (a) and (b) (dirty pages accounting) can stay close enough.
>
> In the previous two prepare patches, we have reworked the vfs set page dirty routines
> and now the interfaces are more explicit:
>     incrementing (2):
>         __set_page_dirty
>         __set_page_dirty_nobuffers
>     decrementing (2):
>         clear_page_dirty_for_io
>         cancel_dirty_page

The patch seems correct at first glance (I have to look closer but I am
in a rush at the moment and will be back next week).

I was just thinking that memcg is enabled by most distributions these
days but not that many people use it. So it would probably be good to
think about how to reduce the overhead if !mem_cgroup_disabled && no
cgroups. Something similar to what Glauber did for the kmem accounting.

> Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
> ---
>  fs/buffer.c                |   17 ++++++++++++++---
>  include/linux/memcontrol.h |    1 +
>  mm/filemap.c               |    5 +++++
>  mm/memcontrol.c            |   28 +++++++++++++++++++++-------
>  mm/page-writeback.c        |   30 ++++++++++++++++++++++++------
>  mm/truncate.c              |    6 ++++++
>  6 files changed, 71 insertions(+), 16 deletions(-)
>
> diff --git a/fs/buffer.c b/fs/buffer.c
> index 55522dd..d3714cc 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -613,11 +613,19 @@ EXPORT_SYMBOL(mark_buffer_dirty_inode);
>  int __set_page_dirty(struct page *page,
>                 struct address_space *mapping, int warn)
>  {
> +       bool locked;
> +       unsigned long flags;
> +       int ret = 0;
> +
>         if (unlikely(!mapping))
>                 return !TestSetPageDirty(page);
>
> -       if (TestSetPageDirty(page))
> -               return 0;
> +       mem_cgroup_begin_update_page_stat(page, &locked, &flags);
> +
> +       if (TestSetPageDirty(page)) {
> +               ret = 0;
> +               goto out;
> +       }
>
>         spin_lock_irq(&mapping->tree_lock);
>         if (page->mapping) {    /* Race with truncate? */
> @@ -629,7 +637,10 @@ int __set_page_dirty(struct page *page,
>         spin_unlock_irq(&mapping->tree_lock);
>         __mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
>
> -       return 1;
> +       ret = 1;
> +out:
> +       mem_cgroup_end_update_page_stat(page, &locked, &flags);
> +       return ret;
>  }
>
>  /*
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 20b0f2d..ad37b59 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -38,6 +38,7 @@ enum mem_cgroup_stat_index {
>         MEM_CGROUP_STAT_RSS,          /* # of pages charged as anon rss */
>         MEM_CGROUP_STAT_FILE_MAPPED,  /* # of pages charged as file rss */
>         MEM_CGROUP_STAT_SWAP,         /* # of pages, swapped out */
> +       MEM_CGROUP_STAT_FILE_DIRTY,   /* # of dirty pages in page cache */
>         MEM_CGROUP_STAT_NSTATS,
>  };
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 1f19ec3..5159a49 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -140,6 +140,11 @@ void __delete_from_page_cache(struct page *page)
>          * having removed the page entirely.
>          */
>         if (PageDirty(page) && mapping_cap_account_dirty(mapping)) {
> +               /*
> +                * Do not change page state, so no need to use mem_cgroup_
> +                * {begin, end}_update_page_stat to get lock.
> +                */
> +               mem_cgroup_dec_page_stat(page, MEM_CGROUP_STAT_FILE_DIRTY);
>                 dec_zone_page_state(page, NR_FILE_DIRTY);
>                 dec_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE);
>         }
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index ebed1ca..90e2946 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -82,6 +82,7 @@ static const char * const mem_cgroup_stat_names[] = {
>         "rss",
>         "mapped_file",
>         "swap",
> +       "dirty",
>  };
>
>  enum mem_cgroup_events_index {
> @@ -2538,6 +2539,18 @@ void mem_cgroup_split_huge_fixup(struct page *head)
>  }
>  #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>
> +static inline
> +void mem_cgroup_move_account_page_stat(struct mem_cgroup *from,
> +                                      struct mem_cgroup *to,
> +                                      enum mem_cgroup_stat_index idx)
> +{
> +       /* Update stat data for mem_cgroup */
> +       preempt_disable();
> +       __this_cpu_dec(from->stat->count[idx]);
> +       __this_cpu_inc(to->stat->count[idx]);
> +       preempt_enable();
> +}
> +
>  /**
>   * mem_cgroup_move_account - move account of the page
>   * @page: the page
> @@ -2583,13 +2596,14 @@ static int mem_cgroup_move_account(struct page *page,
>
>         move_lock_mem_cgroup(from, &flags);
>
> -       if (!anon && page_mapped(page)) {
> -               /* Update mapped_file data for mem_cgroup */
> -               preempt_disable();
> -               __this_cpu_dec(from->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
> -               __this_cpu_inc(to->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
> -               preempt_enable();
> -       }
> +       if (!anon && page_mapped(page))
> +               mem_cgroup_move_account_page_stat(from, to,
> +                               MEM_CGROUP_STAT_FILE_MAPPED);
> +
> +       if (PageDirty(page))
> +               mem_cgroup_move_account_page_stat(from, to,
> +                               MEM_CGROUP_STAT_FILE_DIRTY);
> +
>         mem_cgroup_charge_statistics(from, anon, -nr_pages);
>
>         /* caller should have done css_get */
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index e5363f3..e79a2f7 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -1962,6 +1962,7 @@ int __set_page_dirty_no_writeback(struct page *page)
>  void account_page_dirtied(struct page *page, struct address_space *mapping)
>  {
>         if (mapping_cap_account_dirty(mapping)) {
> +               mem_cgroup_inc_page_stat(page, MEM_CGROUP_STAT_FILE_DIRTY);
>                 __inc_zone_page_state(page, NR_FILE_DIRTY);
>                 __inc_zone_page_state(page, NR_DIRTIED);
>                 __inc_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE);
> @@ -2001,12 +2002,20 @@ EXPORT_SYMBOL(account_page_writeback);
>   */
>  int __set_page_dirty_nobuffers(struct page *page)
>  {
> +       bool locked;
> +       unsigned long flags;
> +       int ret = 0;
> +
> +       mem_cgroup_begin_update_page_stat(page, &locked, &flags);
> +
>         if (!TestSetPageDirty(page)) {
>                 struct address_space *mapping = page_mapping(page);
>                 struct address_space *mapping2;
>
> -               if (!mapping)
> -                       return 1;
> +               if (!mapping) {
> +                       ret = 1;
> +                       goto out;
> +               }
>
>                 spin_lock_irq(&mapping->tree_lock);
>                 mapping2 = page_mapping(page);
> @@ -2022,9 +2031,12 @@ int __set_page_dirty_nobuffers(struct page *page)
>                         /* !PageAnon && !swapper_space */
>                         __mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
>                 }
> -               return 1;
> +               ret = 1;
>         }
> -       return 0;
> +
> +out:
> +       mem_cgroup_end_update_page_stat(page, &locked, &flags);
> +       return ret;
>  }
>  EXPORT_SYMBOL(__set_page_dirty_nobuffers);
>
> @@ -2139,6 +2151,9 @@ EXPORT_SYMBOL(set_page_dirty_lock);
>  int clear_page_dirty_for_io(struct page *page)
>  {
>         struct address_space *mapping = page_mapping(page);
> +       bool locked;
> +       unsigned long flags;
> +       int ret = 0;
>
>         BUG_ON(!PageLocked(page));
>
> @@ -2180,13 +2195,16 @@ int clear_page_dirty_for_io(struct page *page)
>                  * the desired exclusion. See mm/memory.c:do_wp_page()
>                  * for more comments.
>                  */
> +               mem_cgroup_begin_update_page_stat(page, &locked, &flags);
>                 if (TestClearPageDirty(page)) {
> +                       mem_cgroup_dec_page_stat(page, MEM_CGROUP_STAT_FILE_DIRTY);
>                         dec_zone_page_state(page, NR_FILE_DIRTY);
>                         dec_bdi_stat(mapping->backing_dev_info,
>                                         BDI_RECLAIMABLE);
> -                       return 1;
> +                       ret = 1;
>                 }
> -               return 0;
> +               mem_cgroup_end_update_page_stat(page, &locked, &flags);
> +               return ret;
>         }
>         return TestClearPageDirty(page);
>  }
> diff --git a/mm/truncate.c b/mm/truncate.c
> index 75801ac..052016a 100644
> --- a/mm/truncate.c
> +++ b/mm/truncate.c
> @@ -73,9 +73,14 @@ static inline void truncate_partial_page(struct page *page, unsigned partial)
>   */
>  void cancel_dirty_page(struct page *page, unsigned int account_size)
>  {
> +       bool locked;
> +       unsigned long flags;
> +
> +       mem_cgroup_begin_update_page_stat(page, &locked, &flags);
>         if (TestClearPageDirty(page)) {
>                 struct address_space *mapping = page->mapping;
>                 if (mapping && mapping_cap_account_dirty(mapping)) {
> +                       mem_cgroup_dec_page_stat(page, MEM_CGROUP_STAT_FILE_DIRTY);
>                         dec_zone_page_state(page, NR_FILE_DIRTY);
>                         dec_bdi_stat(mapping->backing_dev_info,
>                                         BDI_RECLAIMABLE);
> @@ -83,6 +88,7 @@ void cancel_dirty_page(struct page *page, unsigned int account_size)
>                         task_io_account_cancelled_write(account_size);
>                 }
>         }
> +       mem_cgroup_end_update_page_stat(page, &locked, &flags);
>  }
>  EXPORT_SYMBOL(cancel_dirty_page);
>
> --
> 1.7.1
>

--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic

^ permalink raw reply	[flat|nested] 132+ messages in thread
* Re: [PATCH 5/7] memcg: add per cgroup dirty pages accounting 2012-06-28 11:04 ` Sha Zhengju @ 2012-07-09 21:02 ` Greg Thelen -1 siblings, 0 replies; 132+ messages in thread From: Greg Thelen @ 2012-07-09 21:02 UTC (permalink / raw) To: Sha Zhengju Cc: linux-mm, cgroups, kamezawa.hiroyu, yinghan, akpm, mhocko, linux-kernel, Sha Zhengju On Thu, Jun 28 2012, Sha Zhengju wrote: > From: Sha Zhengju <handai.szj@taobao.com> > > This patch adds memcg routines to count dirty pages, which allows memory controller > to maintain an accurate view of the amount of its dirty memory and can provide some > info for users while group's direct reclaim is working. > > After Kame's commit 89c06bd5(memcg: use new logic for page stat accounting), we can > use 'struct page' flag to test page state instead of per page_cgroup flag. But memcg > has a feature to move a page from a cgroup to another one and may have race between > "move" and "page stat accounting". So in order to avoid the race we have designed a > bigger lock: > > mem_cgroup_begin_update_page_stat() > modify page information -->(a) > mem_cgroup_update_page_stat() -->(b) > mem_cgroup_end_update_page_stat() > > It requires (a) and (b)(dirty pages accounting) can stay close enough. 
> > In the previous two prepare patches, we have reworked the vfs set page dirty routines > and now the interfaces are more explicit: > incrementing (2): > __set_page_dirty > __set_page_dirty_nobuffers > decrementing (2): > clear_page_dirty_for_io > cancel_dirty_page > > > Signed-off-by: Sha Zhengju <handai.szj@taobao.com> > --- > fs/buffer.c | 17 ++++++++++++++--- > include/linux/memcontrol.h | 1 + > mm/filemap.c | 5 +++++ > mm/memcontrol.c | 28 +++++++++++++++++++++------- > mm/page-writeback.c | 30 ++++++++++++++++++++++++------ > mm/truncate.c | 6 ++++++ > 6 files changed, 71 insertions(+), 16 deletions(-) > > diff --git a/fs/buffer.c b/fs/buffer.c > index 55522dd..d3714cc 100644 > --- a/fs/buffer.c > +++ b/fs/buffer.c > @@ -613,11 +613,19 @@ EXPORT_SYMBOL(mark_buffer_dirty_inode); > int __set_page_dirty(struct page *page, > struct address_space *mapping, int warn) > { > + bool locked; > + unsigned long flags; > + int ret = 0; '= 0' and 'ret = 0' change (below) are redundant. My vote is to remove '= 0' here. > + > if (unlikely(!mapping)) > return !TestSetPageDirty(page); > > - if (TestSetPageDirty(page)) > - return 0; > + mem_cgroup_begin_update_page_stat(page, &locked, &flags); > + > + if (TestSetPageDirty(page)) { > + ret = 0; > + goto out; > + } > > spin_lock_irq(&mapping->tree_lock); > if (page->mapping) { /* Race with truncate? 
*/ > @@ -629,7 +637,10 @@ int __set_page_dirty(struct page *page, > spin_unlock_irq(&mapping->tree_lock); > __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); > > - return 1; > + ret = 1; > +out: > + mem_cgroup_end_update_page_stat(page, &locked, &flags); > + return ret; > } > > /* > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h > index 20b0f2d..ad37b59 100644 > --- a/include/linux/memcontrol.h > +++ b/include/linux/memcontrol.h > @@ -38,6 +38,7 @@ enum mem_cgroup_stat_index { > MEM_CGROUP_STAT_RSS, /* # of pages charged as anon rss */ > MEM_CGROUP_STAT_FILE_MAPPED, /* # of pages charged as file rss */ > MEM_CGROUP_STAT_SWAP, /* # of pages, swapped out */ > + MEM_CGROUP_STAT_FILE_DIRTY, /* # of dirty pages in page cache */ > MEM_CGROUP_STAT_NSTATS, > }; > > diff --git a/mm/filemap.c b/mm/filemap.c > index 1f19ec3..5159a49 100644 > --- a/mm/filemap.c > +++ b/mm/filemap.c > @@ -140,6 +140,11 @@ void __delete_from_page_cache(struct page *page) > * having removed the page entirely. > */ > if (PageDirty(page) && mapping_cap_account_dirty(mapping)) { > + /* > + * Do not change page state, so no need to use mem_cgroup_ > + * {begin, end}_update_page_stat to get lock. > + */ > + mem_cgroup_dec_page_stat(page, MEM_CGROUP_STAT_FILE_DIRTY); I do not understand this comment. What serializes this function and mem_cgroup_move_account()? 
> dec_zone_page_state(page, NR_FILE_DIRTY); > dec_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE); > } > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index ebed1ca..90e2946 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -82,6 +82,7 @@ static const char * const mem_cgroup_stat_names[] = { > "rss", > "mapped_file", > "swap", > + "dirty", > }; > > enum mem_cgroup_events_index { > @@ -2538,6 +2539,18 @@ void mem_cgroup_split_huge_fixup(struct page *head) > } > #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ > > +static inline > +void mem_cgroup_move_account_page_stat(struct mem_cgroup *from, > + struct mem_cgroup *to, > + enum mem_cgroup_stat_index idx) > +{ > + /* Update stat data for mem_cgroup */ > + preempt_disable(); > + __this_cpu_dec(from->stat->count[idx]); > + __this_cpu_inc(to->stat->count[idx]); > + preempt_enable(); > +} > + > /** > * mem_cgroup_move_account - move account of the page > * @page: the page > @@ -2583,13 +2596,14 @@ static int mem_cgroup_move_account(struct page *page, > > move_lock_mem_cgroup(from, &flags); > > - if (!anon && page_mapped(page)) { > - /* Update mapped_file data for mem_cgroup */ > - preempt_disable(); > - __this_cpu_dec(from->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]); > - __this_cpu_inc(to->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]); > - preempt_enable(); > - } > + if (!anon && page_mapped(page)) > + mem_cgroup_move_account_page_stat(from, to, > + MEM_CGROUP_STAT_FILE_MAPPED); > + > + if (PageDirty(page)) > + mem_cgroup_move_account_page_stat(from, to, > + MEM_CGROUP_STAT_FILE_DIRTY); > + > mem_cgroup_charge_statistics(from, anon, -nr_pages); > > /* caller should have done css_get */ > diff --git a/mm/page-writeback.c b/mm/page-writeback.c > index e5363f3..e79a2f7 100644 > --- a/mm/page-writeback.c > +++ b/mm/page-writeback.c > @@ -1962,6 +1962,7 @@ int __set_page_dirty_no_writeback(struct page *page) > void account_page_dirtied(struct page *page, struct address_space *mapping) > { > if 
(mapping_cap_account_dirty(mapping)) { > + mem_cgroup_inc_page_stat(page, MEM_CGROUP_STAT_FILE_DIRTY); It might be helpful to add a comment to account_page_dirtied() indicating that the caller must hold the mem_cgroup_begin_update_page_stat() lock. Extra credit for a new assertion added to mem_cgroup_update_page_stat() confirming the needed lock is held. > __inc_zone_page_state(page, NR_FILE_DIRTY); > __inc_zone_page_state(page, NR_DIRTIED); > __inc_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE); > @@ -2001,12 +2002,20 @@ EXPORT_SYMBOL(account_page_writeback); > */ > int __set_page_dirty_nobuffers(struct page *page) > { > + bool locked; > + unsigned long flags; > + int ret = 0; > + > + mem_cgroup_begin_update_page_stat(page, &locked, &flags); > + Is there a strict lock ordering which says that mem_cgroup_begin_update_page_stat() must not be called while holding tree_lock? If yes, then maybe we should update the 'Lock ordering' comment in mm/filemap.c to describe the mem_cgroup_begin_update_page_stat() lock. > if (!TestSetPageDirty(page)) { > struct address_space *mapping = page_mapping(page); > struct address_space *mapping2; > > - if (!mapping) > - return 1; > + if (!mapping) { > + ret = 1; > + goto out; > + } The following seems even easier because it does not need your 'ret = 1' change below.
+ ret = 1; if (!mapping) - return 1; + goto out; > > spin_lock_irq(&mapping->tree_lock); > mapping2 = page_mapping(page); > @@ -2022,9 +2031,12 @@ int __set_page_dirty_nobuffers(struct page *page) > /* !PageAnon && !swapper_space */ > __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); > } > - return 1; > + ret = 1; With the ret=1 change above, this can be changed to: - return 1; > } > - return 0; > + > +out: > + mem_cgroup_end_update_page_stat(page, &locked, &flags); > + return ret; > } > EXPORT_SYMBOL(__set_page_dirty_nobuffers); > > @@ -2139,6 +2151,9 @@ EXPORT_SYMBOL(set_page_dirty_lock); > int clear_page_dirty_for_io(struct page *page) > { > struct address_space *mapping = page_mapping(page); > + bool locked; > + unsigned long flags; > + int ret = 0; > > BUG_ON(!PageLocked(page)); > > @@ -2180,13 +2195,16 @@ int clear_page_dirty_for_io(struct page *page) > * the desired exclusion. See mm/memory.c:do_wp_page() > * for more comments. > */ > + mem_cgroup_begin_update_page_stat(page, &locked, &flags); > if (TestClearPageDirty(page)) { > + mem_cgroup_dec_page_stat(page, MEM_CGROUP_STAT_FILE_DIRTY); > dec_zone_page_state(page, NR_FILE_DIRTY); > dec_bdi_stat(mapping->backing_dev_info, > BDI_RECLAIMABLE); > - return 1; > + ret = 1; > } > - return 0; > + mem_cgroup_end_update_page_stat(page, &locked, &flags); > + return ret; > } > return TestClearPageDirty(page); > } > diff --git a/mm/truncate.c b/mm/truncate.c > index 75801ac..052016a 100644 > --- a/mm/truncate.c > +++ b/mm/truncate.c > @@ -73,9 +73,14 @@ static inline void truncate_partial_page(struct page *page, unsigned partial) > */ > void cancel_dirty_page(struct page *page, unsigned int account_size) > { > + bool locked; > + unsigned long flags; > + > + mem_cgroup_begin_update_page_stat(page, &locked, &flags); > if (TestClearPageDirty(page)) { > struct address_space *mapping = page->mapping; > if (mapping && mapping_cap_account_dirty(mapping)) { > + mem_cgroup_dec_page_stat(page, MEM_CGROUP_STAT_FILE_DIRTY); > 
dec_zone_page_state(page, NR_FILE_DIRTY); > dec_bdi_stat(mapping->backing_dev_info, > BDI_RECLAIMABLE); > @@ -83,6 +88,7 @@ void cancel_dirty_page(struct page *page, unsigned int account_size) > task_io_account_cancelled_write(account_size); > } > } > + mem_cgroup_end_update_page_stat(page, &locked, &flags); > } > EXPORT_SYMBOL(cancel_dirty_page); ^ permalink raw reply [flat|nested] 132+ messages in thread
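[Editorial note] The single-exit refactor the review converges on for __set_page_dirty() can be sketched in isolation. Below is a minimal user-space model, not kernel code: struct page, TestSetPageDirty() and the begin/end stat-lock helpers are all hypothetical stand-ins, and a per-mapping counter models the memcg/zone dirty stats. The point it illustrates is that every locked exit path funnels through one `out:` label, so the mem_cgroup_begin/end_update_page_stat() pair stays balanced while the accounting remains inside the locked section.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins: a per-mapping dirty counter models the memcg/zone stats. */
struct address_space { long nr_dirty; };
struct page { bool dirty; struct address_space *mapping; };

/* Plain-flag model of the kernel's atomic TestSetPageDirty(). */
static bool test_set_page_dirty(struct page *page)
{
	bool was_dirty = page->dirty;
	page->dirty = true;
	return was_dirty;
}

/* Stubs standing in for mem_cgroup_{begin,end}_update_page_stat(). */
static void begin_update_page_stat(bool *locked) { *locked = true; }
static void end_update_page_stat(bool *locked) { *locked = false; }

/* Single-exit flow: ret defaults to 0, every locked path leaves via out. */
static int set_page_dirty_sketch(struct page *page)
{
	bool locked;
	int ret = 0;

	if (page->mapping == NULL)
		return !test_set_page_dirty(page);

	begin_update_page_stat(&locked);

	if (test_set_page_dirty(page))
		goto out;	/* already dirty: ret stays 0, lock still dropped */

	page->mapping->nr_dirty++;	/* accounting stays inside the lock */
	ret = 1;
out:
	end_update_page_stat(&locked);
	return ret;
}
```

With `ret` initialized to 0, the explicit `ret = 0;` inside the already-dirty branch (which Greg flags as redundant) disappears, yet the lock is still released on that path.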
* Re: [PATCH 5/7] memcg: add per cgroup dirty pages accounting 2012-07-09 21:02 ` Greg Thelen @ 2012-07-11 9:32 ` Sha Zhengju -1 siblings, 0 replies; 132+ messages in thread From: Sha Zhengju @ 2012-07-11 9:32 UTC (permalink / raw) To: Greg Thelen Cc: linux-mm, cgroups, kamezawa.hiroyu, yinghan, akpm, mhocko, linux-kernel, Sha Zhengju On 07/10/2012 05:02 AM, Greg Thelen wrote: > On Thu, Jun 28 2012, Sha Zhengju wrote: > >> From: Sha Zhengju<handai.szj@taobao.com> >> >> This patch adds memcg routines to count dirty pages, which allows memory controller >> to maintain an accurate view of the amount of its dirty memory and can provide some >> info for users while group's direct reclaim is working. >> >> After Kame's commit 89c06bd5(memcg: use new logic for page stat accounting), we can >> use 'struct page' flag to test page state instead of per page_cgroup flag. But memcg >> has a feature to move a page from a cgroup to another one and may have race between >> "move" and "page stat accounting". So in order to avoid the race we have designed a >> bigger lock: >> >> mem_cgroup_begin_update_page_stat() >> modify page information -->(a) >> mem_cgroup_update_page_stat() -->(b) >> mem_cgroup_end_update_page_stat() >> >> It requires (a) and (b)(dirty pages accounting) can stay close enough. 
>> >> In the previous two prepare patches, we have reworked the vfs set page dirty routines >> and now the interfaces are more explicit: >> incrementing (2): >> __set_page_dirty >> __set_page_dirty_nobuffers >> decrementing (2): >> clear_page_dirty_for_io >> cancel_dirty_page >> >> >> Signed-off-by: Sha Zhengju<handai.szj@taobao.com> >> --- >> fs/buffer.c | 17 ++++++++++++++--- >> include/linux/memcontrol.h | 1 + >> mm/filemap.c | 5 +++++ >> mm/memcontrol.c | 28 +++++++++++++++++++++------- >> mm/page-writeback.c | 30 ++++++++++++++++++++++++------ >> mm/truncate.c | 6 ++++++ >> 6 files changed, 71 insertions(+), 16 deletions(-) >> >> diff --git a/fs/buffer.c b/fs/buffer.c >> index 55522dd..d3714cc 100644 >> --- a/fs/buffer.c >> +++ b/fs/buffer.c >> @@ -613,11 +613,19 @@ EXPORT_SYMBOL(mark_buffer_dirty_inode); >> int __set_page_dirty(struct page *page, >> struct address_space *mapping, int warn) >> { >> + bool locked; >> + unsigned long flags; >> + int ret = 0; > '= 0' and 'ret = 0' change (below) are redundant. My vote is to remove > '= 0' here. > Nice catch. :-) >> + >> if (unlikely(!mapping)) >> return !TestSetPageDirty(page); >> >> - if (TestSetPageDirty(page)) >> - return 0; >> + mem_cgroup_begin_update_page_stat(page,&locked,&flags); >> + >> + if (TestSetPageDirty(page)) { >> + ret = 0; >> + goto out; >> + } >> >> spin_lock_irq(&mapping->tree_lock); >> if (page->mapping) { /* Race with truncate? 
*/ >> @@ -629,7 +637,10 @@ int __set_page_dirty(struct page *page, >> spin_unlock_irq(&mapping->tree_lock); >> __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); >> >> - return 1; >> + ret = 1; >> +out: >> + mem_cgroup_end_update_page_stat(page,&locked,&flags); >> + return ret; >> } >> >> /* >> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h >> index 20b0f2d..ad37b59 100644 >> --- a/include/linux/memcontrol.h >> +++ b/include/linux/memcontrol.h >> @@ -38,6 +38,7 @@ enum mem_cgroup_stat_index { >> MEM_CGROUP_STAT_RSS, /* # of pages charged as anon rss */ >> MEM_CGROUP_STAT_FILE_MAPPED, /* # of pages charged as file rss */ >> MEM_CGROUP_STAT_SWAP, /* # of pages, swapped out */ >> + MEM_CGROUP_STAT_FILE_DIRTY, /* # of dirty pages in page cache */ >> MEM_CGROUP_STAT_NSTATS, >> }; >> >> diff --git a/mm/filemap.c b/mm/filemap.c >> index 1f19ec3..5159a49 100644 >> --- a/mm/filemap.c >> +++ b/mm/filemap.c >> @@ -140,6 +140,11 @@ void __delete_from_page_cache(struct page *page) >> * having removed the page entirely. >> */ >> if (PageDirty(page)&& mapping_cap_account_dirty(mapping)) { >> + /* >> + * Do not change page state, so no need to use mem_cgroup_ >> + * {begin, end}_update_page_stat to get lock. >> + */ >> + mem_cgroup_dec_page_stat(page, MEM_CGROUP_STAT_FILE_DIRTY); > I do not understand this comment. What serializes this function and > mem_cgroup_move_account()? > The race exists just because the two competitors share one public variable: one reads it while the other writes it. I thought that if both sides (accounting and cgroup_move) do not change the page flag, then risks like double-counting (see below) will not happen.
CPU-A                               CPU-B
Set PG_dirty
(delay)                             move_lock_mem_cgroup()
                                    if (PageDirty(page))
                                        new_memcg->nr_dirty++
                                    pc->mem_cgroup = new_memcg;
                                    move_unlock_mem_cgroup()
move_lock_mem_cgroup()
memcg = pc->mem_cgroup
new_memcg->nr_dirty++

But on second thought, it does have a problem without the lock:

CPU-A                               CPU-B
if (PageDirty(page)) {
                                    move_lock_mem_cgroup()
                                    TestClearPageDirty(page)
                                    memcg = pc->mem_cgroup
                                    new_memcg->nr_dirty--
                                    move_unlock_mem_cgroup()
    memcg = pc->mem_cgroup
    new_memcg->nr_dirty--
}

A race with the clear_page_dirty() operation may occur, so this time I think we need the lock again... Kame, what about your opinion... >> dec_zone_page_state(page, NR_FILE_DIRTY); >> dec_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE); >> } >> diff --git a/mm/memcontrol.c b/mm/memcontrol.c >> index ebed1ca..90e2946 100644 >> --- a/mm/memcontrol.c >> +++ b/mm/memcontrol.c >> @@ -82,6 +82,7 @@ static const char * const mem_cgroup_stat_names[] = { >> "rss", >> "mapped_file", >> "swap", >> + "dirty", >> }; >> >> enum mem_cgroup_events_index { >> @@ -2538,6 +2539,18 @@ void mem_cgroup_split_huge_fixup(struct page *head) >> } >> #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ >> >> +static inline >> +void mem_cgroup_move_account_page_stat(struct mem_cgroup *from, >> + struct mem_cgroup *to, >> + enum mem_cgroup_stat_index idx) >> +{ >> + /* Update stat data for mem_cgroup */ >> + preempt_disable(); >> + __this_cpu_dec(from->stat->count[idx]); >> + __this_cpu_inc(to->stat->count[idx]); >> + preempt_enable(); >> +} >> + >> /** >> * mem_cgroup_move_account - move account of the page >> * @page: the page >> @@ -2583,13 +2596,14 @@ static int mem_cgroup_move_account(struct page *page, >> >> move_lock_mem_cgroup(from,&flags); >> >> - if (!anon&& page_mapped(page)) { >> - /* Update mapped_file data for mem_cgroup */ >> - preempt_disable(); >> - __this_cpu_dec(from->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]); >> - __this_cpu_inc(to->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]); >> - preempt_enable(); >> -
} >> + if (!anon&& page_mapped(page)) >> + mem_cgroup_move_account_page_stat(from, to, >> + MEM_CGROUP_STAT_FILE_MAPPED); >> + >> + if (PageDirty(page)) >> + mem_cgroup_move_account_page_stat(from, to, >> + MEM_CGROUP_STAT_FILE_DIRTY); >> + >> mem_cgroup_charge_statistics(from, anon, -nr_pages); >> >> /* caller should have done css_get */ >> diff --git a/mm/page-writeback.c b/mm/page-writeback.c >> index e5363f3..e79a2f7 100644 >> --- a/mm/page-writeback.c >> +++ b/mm/page-writeback.c >> @@ -1962,6 +1962,7 @@ int __set_page_dirty_no_writeback(struct page *page) >> void account_page_dirtied(struct page *page, struct address_space *mapping) >> { >> if (mapping_cap_account_dirty(mapping)) { >> + mem_cgroup_inc_page_stat(page, MEM_CGROUP_STAT_FILE_DIRTY); > It might be helpful to add a comment to account_page_dirtied() > indicating that caller must hold mem_cgroup_begin_update_page_stat() > lock. Extra credit for an new assertion added to > mem_cgroup_update_page_stat() confirming the needed lock is held. > Got it! :-) >> __inc_zone_page_state(page, NR_FILE_DIRTY); >> __inc_zone_page_state(page, NR_DIRTIED); >> __inc_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE); >> @@ -2001,12 +2002,20 @@ EXPORT_SYMBOL(account_page_writeback); >> */ >> int __set_page_dirty_nobuffers(struct page *page) >> { >> + bool locked; >> + unsigned long flags; >> + int ret = 0; >> + >> + mem_cgroup_begin_update_page_stat(page,&locked,&flags); >> + > Is there a strict lock ordering which says that > mem_cgroup_begin_update_page_stat() must not be called while holding > tree_lock? If yes, then maybe we should update the 'Lock ordering' > comment in mm/filemap.c to describe the > mem_cgroup_begin_update_page_stat() lock. > I think yes, otherwise it may cause deadlock. I'll update it later. 
>> if (!TestSetPageDirty(page)) { >> struct address_space *mapping = page_mapping(page); >> struct address_space *mapping2; >> >> - if (!mapping) >> - return 1; >> + if (!mapping) { >> + ret = 1; >> + goto out; >> + } > > The following seems even easier because it does not need your 'ret = 1' > change below. > > + ret = 1; > if (!mapping) > - return 1; > + goto out; > > >> >> spin_lock_irq(&mapping->tree_lock); >> mapping2 = page_mapping(page); >> @@ -2022,9 +2031,12 @@ int __set_page_dirty_nobuffers(struct page *page) >> /* !PageAnon&& !swapper_space */ >> __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); >> } >> - return 1; >> + ret = 1; > With the ret=1 change above, this can be changed to: > - return 1; > Seems better. >> } >> - return 0; >> + >> +out: >> + mem_cgroup_end_update_page_stat(page,&locked,&flags); >> + return ret; >> } >> EXPORT_SYMBOL(__set_page_dirty_nobuffers); >> >> @@ -2139,6 +2151,9 @@ EXPORT_SYMBOL(set_page_dirty_lock); >> int clear_page_dirty_for_io(struct page *page) >> { >> struct address_space *mapping = page_mapping(page); >> + bool locked; >> + unsigned long flags; >> + int ret = 0; >> >> BUG_ON(!PageLocked(page)); >> >> @@ -2180,13 +2195,16 @@ int clear_page_dirty_for_io(struct page *page) >> * the desired exclusion. See mm/memory.c:do_wp_page() >> * for more comments. 
>> */ >> + mem_cgroup_begin_update_page_stat(page,&locked,&flags); >> if (TestClearPageDirty(page)) { >> + mem_cgroup_dec_page_stat(page, MEM_CGROUP_STAT_FILE_DIRTY); >> dec_zone_page_state(page, NR_FILE_DIRTY); >> dec_bdi_stat(mapping->backing_dev_info, >> BDI_RECLAIMABLE); >> - return 1; >> + ret = 1; >> } >> - return 0; >> + mem_cgroup_end_update_page_stat(page,&locked,&flags); >> + return ret; >> } >> return TestClearPageDirty(page); >> } >> diff --git a/mm/truncate.c b/mm/truncate.c >> index 75801ac..052016a 100644 >> --- a/mm/truncate.c >> +++ b/mm/truncate.c >> @@ -73,9 +73,14 @@ static inline void truncate_partial_page(struct page *page, unsigned partial) >> */ >> void cancel_dirty_page(struct page *page, unsigned int account_size) >> { >> + bool locked; >> + unsigned long flags; >> + >> + mem_cgroup_begin_update_page_stat(page,&locked,&flags); >> if (TestClearPageDirty(page)) { >> struct address_space *mapping = page->mapping; >> if (mapping&& mapping_cap_account_dirty(mapping)) { >> + mem_cgroup_dec_page_stat(page, MEM_CGROUP_STAT_FILE_DIRTY); >> dec_zone_page_state(page, NR_FILE_DIRTY); >> dec_bdi_stat(mapping->backing_dev_info, >> BDI_RECLAIMABLE); >> @@ -83,6 +88,7 @@ void cancel_dirty_page(struct page *page, unsigned int account_size) >> task_io_account_cancelled_write(account_size); >> } >> } >> + mem_cgroup_end_update_page_stat(page,&locked,&flags); >> } >> EXPORT_SYMBOL(cancel_dirty_page); ^ permalink raw reply [flat|nested] 132+ messages in thread
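[Editorial note] The mem_cgroup_move_account_page_stat() helper factored out in the patch above is small enough to model in user space. The sketch below is illustrative only: plain longs stand in for the kernel's per-cpu counters, the names are made up, and nothing here replaces move_lock_mem_cgroup() or preempt_disable(). It only shows the invariant the paired dec/inc is meant to preserve, namely that moving a page's stat from one group to another leaves the global sum unchanged.

```c
/* User-space model of mem_cgroup_move_account_page_stat(): one counter
 * per stat index per group; the dec/inc pair is what the kernel keeps
 * atomic-enough via move_lock_mem_cgroup() plus preempt_disable(). */
struct memcg_sketch {
	long file_mapped;	/* models MEM_CGROUP_STAT_FILE_MAPPED */
	long file_dirty;	/* models MEM_CGROUP_STAT_FILE_DIRTY */
};

enum stat_idx { STAT_FILE_MAPPED, STAT_FILE_DIRTY };

static void move_account_page_stat(struct memcg_sketch *from,
				   struct memcg_sketch *to,
				   enum stat_idx idx)
{
	/* Dec and inc always happen together, so the page is never
	 * counted in both groups (or in neither) once the move ends. */
	if (idx == STAT_FILE_MAPPED) {
		from->file_mapped--;
		to->file_mapped++;
	} else {
		from->file_dirty--;
		to->file_dirty++;
	}
}
```

This pairing is also why the thread worries about a concurrent clear_page_dirty(): an unsynchronized decrement on the accounting side could land on either group and break the sum the helper preserves.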
* Re: [PATCH 5/7] memcg: add per cgroup dirty pages accounting @ 2012-07-11 9:32 ` Sha Zhengju 0 siblings, 0 replies; 132+ messages in thread From: Sha Zhengju @ 2012-07-11 9:32 UTC (permalink / raw) To: Greg Thelen Cc: linux-mm, cgroups, kamezawa.hiroyu, yinghan, akpm, mhocko, linux-kernel, Sha Zhengju On 07/10/2012 05:02 AM, Greg Thelen wrote: > On Thu, Jun 28 2012, Sha Zhengju wrote: > >> From: Sha Zhengju<handai.szj@taobao.com> >> >> This patch adds memcg routines to count dirty pages, which allows memory controller >> to maintain an accurate view of the amount of its dirty memory and can provide some >> info for users while group's direct reclaim is working. >> >> After Kame's commit 89c06bd5(memcg: use new logic for page stat accounting), we can >> use 'struct page' flag to test page state instead of per page_cgroup flag. But memcg >> has a feature to move a page from a cgroup to another one and may have race between >> "move" and "page stat accounting". So in order to avoid the race we have designed a >> bigger lock: >> >> mem_cgroup_begin_update_page_stat() >> modify page information -->(a) >> mem_cgroup_update_page_stat() -->(b) >> mem_cgroup_end_update_page_stat() >> >> It requires (a) and (b)(dirty pages accounting) can stay close enough. 
>> >> In the previous two prepare patches, we have reworked the vfs set page dirty routines >> and now the interfaces are more explicit: >> incrementing (2): >> __set_page_dirty >> __set_page_dirty_nobuffers >> decrementing (2): >> clear_page_dirty_for_io >> cancel_dirty_page >> >> >> Signed-off-by: Sha Zhengju<handai.szj@taobao.com> >> --- >> fs/buffer.c | 17 ++++++++++++++--- >> include/linux/memcontrol.h | 1 + >> mm/filemap.c | 5 +++++ >> mm/memcontrol.c | 28 +++++++++++++++++++++------- >> mm/page-writeback.c | 30 ++++++++++++++++++++++++------ >> mm/truncate.c | 6 ++++++ >> 6 files changed, 71 insertions(+), 16 deletions(-) >> >> diff --git a/fs/buffer.c b/fs/buffer.c >> index 55522dd..d3714cc 100644 >> --- a/fs/buffer.c >> +++ b/fs/buffer.c >> @@ -613,11 +613,19 @@ EXPORT_SYMBOL(mark_buffer_dirty_inode); >> int __set_page_dirty(struct page *page, >> struct address_space *mapping, int warn) >> { >> + bool locked; >> + unsigned long flags; >> + int ret = 0; > '= 0' and 'ret = 0' change (below) are redundant. My vote is to remove > '= 0' here. > Nice catch. :-) >> + >> if (unlikely(!mapping)) >> return !TestSetPageDirty(page); >> >> - if (TestSetPageDirty(page)) >> - return 0; >> + mem_cgroup_begin_update_page_stat(page,&locked,&flags); >> + >> + if (TestSetPageDirty(page)) { >> + ret = 0; >> + goto out; >> + } >> >> spin_lock_irq(&mapping->tree_lock); >> if (page->mapping) { /* Race with truncate? 
*/ >> @@ -629,7 +637,10 @@ int __set_page_dirty(struct page *page, >> spin_unlock_irq(&mapping->tree_lock); >> __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); >> >> - return 1; >> + ret = 1; >> +out: >> + mem_cgroup_end_update_page_stat(page,&locked,&flags); >> + return ret; >> } >> >> /* >> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h >> index 20b0f2d..ad37b59 100644 >> --- a/include/linux/memcontrol.h >> +++ b/include/linux/memcontrol.h >> @@ -38,6 +38,7 @@ enum mem_cgroup_stat_index { >> MEM_CGROUP_STAT_RSS, /* # of pages charged as anon rss */ >> MEM_CGROUP_STAT_FILE_MAPPED, /* # of pages charged as file rss */ >> MEM_CGROUP_STAT_SWAP, /* # of pages, swapped out */ >> + MEM_CGROUP_STAT_FILE_DIRTY, /* # of dirty pages in page cache */ >> MEM_CGROUP_STAT_NSTATS, >> }; >> >> diff --git a/mm/filemap.c b/mm/filemap.c >> index 1f19ec3..5159a49 100644 >> --- a/mm/filemap.c >> +++ b/mm/filemap.c >> @@ -140,6 +140,11 @@ void __delete_from_page_cache(struct page *page) >> * having removed the page entirely. >> */ >> if (PageDirty(page)&& mapping_cap_account_dirty(mapping)) { >> + /* >> + * Do not change page state, so no need to use mem_cgroup_ >> + * {begin, end}_update_page_stat to get lock. >> + */ >> + mem_cgroup_dec_page_stat(page, MEM_CGROUP_STAT_FILE_DIRTY); > I do not understand this comment. What serializes this function and > mem_cgroup_move_account()? > The race is exist just because the two competitors share one public variable and one reads it and the other writes it. I thought if both sides(accounting and cgroup_move) do not change page flag, then risks like doule-counting(see below) will not happen. 
CPU-A                                   CPU-B
Set PG_dirty
(delay)                                 move_lock_mem_cgroup()
                                        if (PageDirty(page))
                                                new_memcg->nr_dirty++
                                        pc->mem_cgroup = new_memcg;
                                        move_unlock_mem_cgroup()
move_lock_mem_cgroup()
memcg = pc->mem_cgroup
new_memcg->nr_dirty++

But on second thought, it does have a problem without the lock:

CPU-A                                   CPU-B
if (PageDirty(page)) {
                                        move_lock_mem_cgroup()
                                        TestClearPageDirty(page)
                                        memcg = pc->mem_cgroup
                                        new_memcg->nr_dirty--
                                        move_unlock_mem_cgroup()

memcg = pc->mem_cgroup
new_memcg->nr_dirty--
}

A race with the clear_page_dirty() path can occur, so this time I think we
need the lock again...

Kame, what's your opinion?

>>  		dec_zone_page_state(page, NR_FILE_DIRTY);
>>  		dec_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE);
>>  	}
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index ebed1ca..90e2946 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -82,6 +82,7 @@ static const char * const mem_cgroup_stat_names[] = {
>>  	"rss",
>>  	"mapped_file",
>>  	"swap",
>> +	"dirty",
>>  };
>>
>>  enum mem_cgroup_events_index {
>> @@ -2538,6 +2539,18 @@ void mem_cgroup_split_huge_fixup(struct page *head)
>>  }
>>  #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>>
>> +static inline
>> +void mem_cgroup_move_account_page_stat(struct mem_cgroup *from,
>> +				       struct mem_cgroup *to,
>> +				       enum mem_cgroup_stat_index idx)
>> +{
>> +	/* Update stat data for mem_cgroup */
>> +	preempt_disable();
>> +	__this_cpu_dec(from->stat->count[idx]);
>> +	__this_cpu_inc(to->stat->count[idx]);
>> +	preempt_enable();
>> +}
>> +
>>  /**
>>   * mem_cgroup_move_account - move account of the page
>>   * @page: the page
>> @@ -2583,13 +2596,14 @@ static int mem_cgroup_move_account(struct page *page,
>>
>>  	move_lock_mem_cgroup(from, &flags);
>>
>> -	if (!anon && page_mapped(page)) {
>> -		/* Update mapped_file data for mem_cgroup */
>> -		preempt_disable();
>> -		__this_cpu_dec(from->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
>> -		__this_cpu_inc(to->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
>> -		preempt_enable();
>> -	}
>> +	if (!anon && page_mapped(page))
>> +		mem_cgroup_move_account_page_stat(from, to,
>> +				MEM_CGROUP_STAT_FILE_MAPPED);
>> +
>> +	if (PageDirty(page))
>> +		mem_cgroup_move_account_page_stat(from, to,
>> +				MEM_CGROUP_STAT_FILE_DIRTY);
>> +
>>  	mem_cgroup_charge_statistics(from, anon, -nr_pages);
>>
>>  	/* caller should have done css_get */
>> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
>> index e5363f3..e79a2f7 100644
>> --- a/mm/page-writeback.c
>> +++ b/mm/page-writeback.c
>> @@ -1962,6 +1962,7 @@ int __set_page_dirty_no_writeback(struct page *page)
>>  void account_page_dirtied(struct page *page, struct address_space *mapping)
>>  {
>>  	if (mapping_cap_account_dirty(mapping)) {
>> +		mem_cgroup_inc_page_stat(page, MEM_CGROUP_STAT_FILE_DIRTY);

> It might be helpful to add a comment to account_page_dirtied()
> indicating that the caller must hold the mem_cgroup_begin_update_page_stat()
> lock.  Extra credit for a new assertion added to
> mem_cgroup_update_page_stat() confirming the needed lock is held.

Got it! :-)

>>  		__inc_zone_page_state(page, NR_FILE_DIRTY);
>>  		__inc_zone_page_state(page, NR_DIRTIED);
>>  		__inc_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE);
>> @@ -2001,12 +2002,20 @@ EXPORT_SYMBOL(account_page_writeback);
>>   */
>>  int __set_page_dirty_nobuffers(struct page *page)
>>  {
>> +	bool locked;
>> +	unsigned long flags;
>> +	int ret = 0;
>> +
>> +	mem_cgroup_begin_update_page_stat(page, &locked, &flags);
>> +

> Is there a strict lock ordering which says that
> mem_cgroup_begin_update_page_stat() must not be called while holding
> tree_lock?  If yes, then maybe we should update the 'Lock ordering'
> comment in mm/filemap.c to describe the
> mem_cgroup_begin_update_page_stat() lock.

I think yes, otherwise it may cause deadlock.  I'll update it later.
>>  	if (!TestSetPageDirty(page)) {
>>  		struct address_space *mapping = page_mapping(page);
>>  		struct address_space *mapping2;
>>
>> -		if (!mapping)
>> -			return 1;
>> +		if (!mapping) {
>> +			ret = 1;
>> +			goto out;
>> +		}

> The following seems even easier because it does not need your 'ret = 1'
> change below.
>
> +	ret = 1;
>  	if (!mapping)
> -		return 1;
> +		goto out;

>>
>>  		spin_lock_irq(&mapping->tree_lock);
>>  		mapping2 = page_mapping(page);
>> @@ -2022,9 +2031,12 @@ int __set_page_dirty_nobuffers(struct page *page)
>>  			/* !PageAnon && !swapper_space */
>>  			__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
>>  		}
>> -		return 1;
>> +		ret = 1;

> With the ret=1 change above, this can be changed to:
> -		return 1;

Seems better.

>>  	}
>> -	return 0;
>> +
>> +out:
>> +	mem_cgroup_end_update_page_stat(page, &locked, &flags);
>> +	return ret;
>>  }
>>  EXPORT_SYMBOL(__set_page_dirty_nobuffers);
>>
>> @@ -2139,6 +2151,9 @@ EXPORT_SYMBOL(set_page_dirty_lock);
>>  int clear_page_dirty_for_io(struct page *page)
>>  {
>>  	struct address_space *mapping = page_mapping(page);
>> +	bool locked;
>> +	unsigned long flags;
>> +	int ret = 0;
>>
>>  	BUG_ON(!PageLocked(page));
>>
>> @@ -2180,13 +2195,16 @@ int clear_page_dirty_for_io(struct page *page)
>>  		 * the desired exclusion.  See mm/memory.c:do_wp_page()
>>  		 * for more comments.
>>  		 */
>> +		mem_cgroup_begin_update_page_stat(page, &locked, &flags);
>>  		if (TestClearPageDirty(page)) {
>> +			mem_cgroup_dec_page_stat(page, MEM_CGROUP_STAT_FILE_DIRTY);
>>  			dec_zone_page_state(page, NR_FILE_DIRTY);
>>  			dec_bdi_stat(mapping->backing_dev_info,
>>  					BDI_RECLAIMABLE);
>> -			return 1;
>> +			ret = 1;
>>  		}
>> -		return 0;
>> +		mem_cgroup_end_update_page_stat(page, &locked, &flags);
>> +		return ret;
>>  	}
>>  	return TestClearPageDirty(page);
>>  }
>> diff --git a/mm/truncate.c b/mm/truncate.c
>> index 75801ac..052016a 100644
>> --- a/mm/truncate.c
>> +++ b/mm/truncate.c
>> @@ -73,9 +73,14 @@ static inline void truncate_partial_page(struct page *page, unsigned partial)
>>   */
>>  void cancel_dirty_page(struct page *page, unsigned int account_size)
>>  {
>> +	bool locked;
>> +	unsigned long flags;
>> +
>> +	mem_cgroup_begin_update_page_stat(page, &locked, &flags);
>>  	if (TestClearPageDirty(page)) {
>>  		struct address_space *mapping = page->mapping;
>>  		if (mapping && mapping_cap_account_dirty(mapping)) {
>> +			mem_cgroup_dec_page_stat(page, MEM_CGROUP_STAT_FILE_DIRTY);
>>  			dec_zone_page_state(page, NR_FILE_DIRTY);
>>  			dec_bdi_stat(mapping->backing_dev_info,
>>  					BDI_RECLAIMABLE);
>> @@ -83,6 +88,7 @@ void cancel_dirty_page(struct page *page, unsigned int account_size)
>>  			task_io_account_cancelled_write(account_size);
>>  		}
>>  	}
>> +	mem_cgroup_end_update_page_stat(page, &locked, &flags);
>>  }
>>  EXPORT_SYMBOL(cancel_dirty_page);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org.
For more info on Linux MM, see: http://www.linux-mm.org/ .

^ permalink raw reply	[flat|nested] 132+ messages in thread
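The double-decrement interleaving traced in the diagrams above can be replayed deterministically in a userspace model. This is only a sketch of the argument: `struct model_memcg` and the flags below are plain stand-ins invented for illustration, not kernel structures, and each function executes the two CPUs' steps in the racy (or serialized) order.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the per-cgroup dirty counter (not the kernel's struct). */
struct model_memcg { long nr_dirty; };

/*
 * CPU-A tests the dirty flag without the memcg lock; CPU-B clears the flag
 * and decrements in between.  The single dirty page is decremented twice.
 */
static long racy_interleaving(void)
{
	struct model_memcg m = { .nr_dirty = 1 };
	bool page_dirty = true;

	bool cpu_a_saw_dirty = page_dirty;	/* CPU-A: if (PageDirty(page)) */

	if (page_dirty) {			/* CPU-B: TestClearPageDirty() */
		page_dirty = false;
		m.nr_dirty--;			/* first decrement */
	}

	if (cpu_a_saw_dirty)			/* CPU-A resumes on a stale test */
		m.nr_dirty--;			/* second decrement: the bug */

	return m.nr_dirty;			/* -1 instead of 0 */
}

/*
 * With the bigger lock, the flag-test-plus-update sections are serialized,
 * so CPU-A re-tests the flag after CPU-B has cleared it.
 */
static long locked_interleaving(void)
{
	struct model_memcg m = { .nr_dirty = 1 };
	bool page_dirty = true;

	if (page_dirty) {			/* CPU-B's whole section runs first */
		page_dirty = false;
		m.nr_dirty--;
	}

	if (page_dirty)				/* CPU-A: flag already clear, no-op */
		m.nr_dirty--;

	return m.nr_dirty;			/* 0, as expected */
}
```

Running both functions shows the accounting only stays consistent when the flag test and the counter update form one atomic section with respect to the clearing path.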
* Re: [PATCH 5/7] memcg: add per cgroup dirty pages accounting
  2012-07-11  9:32 ` Sha Zhengju
@ 2012-07-19  6:33 ` Kamezawa Hiroyuki
  -1 siblings, 0 replies; 132+ messages in thread
From: Kamezawa Hiroyuki @ 2012-07-19 6:33 UTC (permalink / raw)
  To: Sha Zhengju
  Cc: Greg Thelen, linux-mm, cgroups, yinghan, akpm, mhocko,
	linux-kernel, Sha Zhengju

(2012/07/11 18:32), Sha Zhengju wrote:
> On 07/10/2012 05:02 AM, Greg Thelen wrote:
>> On Thu, Jun 28 2012, Sha Zhengju wrote:
>>
>>> From: Sha Zhengju <handai.szj@taobao.com>
>>>
>>> This patch adds memcg routines to count dirty pages, which allows the
>>> memory controller to maintain an accurate view of the amount of its dirty
>>> memory and can provide some info for users while a group's direct reclaim
>>> is working.
>>>
>>> After Kame's commit 89c06bd5 (memcg: use new logic for page stat
>>> accounting), we can use the 'struct page' flag to test page state instead
>>> of a per-page_cgroup flag.  But memcg has a feature to move a page from
>>> one cgroup to another, which may race between "move" and "page stat
>>> accounting".  So in order to avoid the race we have designed a bigger lock:
>>>
>>>         mem_cgroup_begin_update_page_stat()
>>>         modify page information         -->(a)
>>>         mem_cgroup_update_page_stat()   -->(b)
>>>         mem_cgroup_end_update_page_stat()
>>>
>>> It requires that (a) and (b) (the dirty page accounting) stay close enough.
>>>
>>> In the previous two prepare patches, we have reworked the vfs set page
>>> dirty routines and now the interfaces are more explicit:
>>>     incrementing (2):
>>>         __set_page_dirty
>>>         __set_page_dirty_nobuffers
>>>     decrementing (2):
>>>         clear_page_dirty_for_io
>>>         cancel_dirty_page
>>>
>>> Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
>>> ---
>>>  fs/buffer.c                |   17 ++++++++++++++---
>>>  include/linux/memcontrol.h |    1 +
>>>  mm/filemap.c               |    5 +++++
>>>  mm/memcontrol.c            |   28 +++++++++++++++++++++-------
>>>  mm/page-writeback.c        |   30 ++++++++++++++++++++++++------
>>>  mm/truncate.c              |    6 ++++++
>>>  6 files changed, 71 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/fs/buffer.c b/fs/buffer.c
>>> index 55522dd..d3714cc 100644
>>> --- a/fs/buffer.c
>>> +++ b/fs/buffer.c
>>> @@ -613,11 +613,19 @@ EXPORT_SYMBOL(mark_buffer_dirty_inode);
>>>  int __set_page_dirty(struct page *page,
>>>  		struct address_space *mapping, int warn)
>>>  {
>>> +	bool locked;
>>> +	unsigned long flags;
>>> +	int ret = 0;

>> '= 0' and 'ret = 0' change (below) are redundant.  My vote is to remove
>> '= 0' here.

> Nice catch. :-)

>>> +
>>>  	if (unlikely(!mapping))
>>>  		return !TestSetPageDirty(page);
>>>
>>> -	if (TestSetPageDirty(page))
>>> -		return 0;
>>> +	mem_cgroup_begin_update_page_stat(page, &locked, &flags);
>>> +
>>> +	if (TestSetPageDirty(page)) {
>>> +		ret = 0;
>>> +		goto out;
>>> +	}
>>>
>>>  	spin_lock_irq(&mapping->tree_lock);
>>>  	if (page->mapping) {	/* Race with truncate? */
>>> @@ -629,7 +637,10 @@ int __set_page_dirty(struct page *page,
>>>  	spin_unlock_irq(&mapping->tree_lock);
>>>  	__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
>>>
>>> -	return 1;
>>> +	ret = 1;
>>> +out:
>>> +	mem_cgroup_end_update_page_stat(page, &locked, &flags);
>>> +	return ret;
>>>  }
>>>
>>>  /*
>>> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
>>> index 20b0f2d..ad37b59 100644
>>> --- a/include/linux/memcontrol.h
>>> +++ b/include/linux/memcontrol.h
>>> @@ -38,6 +38,7 @@ enum mem_cgroup_stat_index {
>>>  	MEM_CGROUP_STAT_RSS,         /* # of pages charged as anon rss */
>>>  	MEM_CGROUP_STAT_FILE_MAPPED, /* # of pages charged as file rss */
>>>  	MEM_CGROUP_STAT_SWAP,        /* # of pages, swapped out */
>>> +	MEM_CGROUP_STAT_FILE_DIRTY,  /* # of dirty pages in page cache */
>>>  	MEM_CGROUP_STAT_NSTATS,
>>>  };
>>>
>>> diff --git a/mm/filemap.c b/mm/filemap.c
>>> index 1f19ec3..5159a49 100644
>>> --- a/mm/filemap.c
>>> +++ b/mm/filemap.c
>>> @@ -140,6 +140,11 @@ void __delete_from_page_cache(struct page *page)
>>>  	 * having removed the page entirely.
>>>  	 */
>>>  	if (PageDirty(page) && mapping_cap_account_dirty(mapping)) {
>>> +		/*
>>> +		 * Do not change page state, so no need to use mem_cgroup_
>>> +		 * {begin, end}_update_page_stat to get lock.
>>> +		 */
>>> +		mem_cgroup_dec_page_stat(page, MEM_CGROUP_STAT_FILE_DIRTY);

>> I do not understand this comment.  What serializes this function and
>> mem_cgroup_move_account()?

> The race exists just because the two competitors share one public variable
> and one reads it while the other writes it.  I thought that if both sides
> (accounting and cgroup_move) do not change the page flag, then risks like
> double-counting (see below) would not happen.
>
> CPU-A                                   CPU-B
> Set PG_dirty
> (delay)                                 move_lock_mem_cgroup()
>                                         if (PageDirty(page))
>                                                 new_memcg->nr_dirty++
>                                         pc->mem_cgroup = new_memcg;
>                                         move_unlock_mem_cgroup()
> move_lock_mem_cgroup()
> memcg = pc->mem_cgroup
> new_memcg->nr_dirty++
>
> But on second thought, it does have a problem without the lock:
>
> CPU-A                                   CPU-B
> if (PageDirty(page)) {
>                                         move_lock_mem_cgroup()
>                                         TestClearPageDirty(page)
>                                         memcg = pc->mem_cgroup
>                                         new_memcg->nr_dirty--
>                                         move_unlock_mem_cgroup()
>
> memcg = pc->mem_cgroup
> new_memcg->nr_dirty--
> }
>
> A race with the clear_page_dirty() path can occur, so this time I think we
> need the lock again...
>
> Kame, what's your opinion?

I think the Dirty bit is cleared implicitly here... so having the lock will
be good.

Thanks,
-Kame

^ permalink raw reply	[flat|nested] 132+ messages in thread
* [PATCH 6/7] memcg: add per cgroup writeback pages accounting
  2012-06-28 10:54 ` Sha Zhengju
@ 2012-06-28 11:05 ` Sha Zhengju
  -1 siblings, 0 replies; 132+ messages in thread
From: Sha Zhengju @ 2012-06-28 11:05 UTC (permalink / raw)
  To: linux-mm, cgroups
  Cc: kamezawa.hiroyu, gthelen, yinghan, akpm, mhocko, linux-kernel,
	Sha Zhengju

From: Sha Zhengju <handai.szj@taobao.com>

Similar to dirty pages, we add per-cgroup writeback page accounting.  The
locking rule is still:

        mem_cgroup_begin_update_page_stat()
        modify page WRITEBACK stat
        mem_cgroup_update_page_stat()
        mem_cgroup_end_update_page_stat()

There are two writeback interfaces to modify: test_clear_page_writeback and
test_set_page_writeback.

Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
---
 include/linux/memcontrol.h |    1 +
 mm/memcontrol.c            |    5 +++++
 mm/page-writeback.c        |   12 ++++++++++++
 3 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index ad37b59..9193d93 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -39,6 +39,7 @@ enum mem_cgroup_stat_index {
 	MEM_CGROUP_STAT_FILE_MAPPED, /* # of pages charged as file rss */
 	MEM_CGROUP_STAT_SWAP, /* # of pages, swapped out */
 	MEM_CGROUP_STAT_FILE_DIRTY, /* # of dirty pages in page cache */
+	MEM_CGROUP_STAT_FILE_WRITEBACK, /* # of pages under writeback */
 	MEM_CGROUP_STAT_NSTATS,
 };

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 90e2946..8493119 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -83,6 +83,7 @@ static const char * const mem_cgroup_stat_names[] = {
 	"mapped_file",
 	"swap",
 	"dirty",
+	"writeback",
 };

 enum mem_cgroup_events_index {
@@ -2604,6 +2605,10 @@ static int mem_cgroup_move_account(struct page *page,
 		mem_cgroup_move_account_page_stat(from, to,
 				MEM_CGROUP_STAT_FILE_DIRTY);

+	if (PageWriteback(page))
+		mem_cgroup_move_account_page_stat(from, to,
+				MEM_CGROUP_STAT_FILE_WRITEBACK);
+
 	mem_cgroup_charge_statistics(from, anon, -nr_pages);

 	/* caller should have done css_get */
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index e79a2f7..7398836 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1981,6 +1981,7 @@ EXPORT_SYMBOL(account_page_dirtied);
  */
 void account_page_writeback(struct page *page)
 {
+	mem_cgroup_inc_page_stat(page, MEM_CGROUP_STAT_FILE_WRITEBACK);
 	inc_zone_page_state(page, NR_WRITEBACK);
 }
 EXPORT_SYMBOL(account_page_writeback);
@@ -2214,7 +2215,10 @@ int test_clear_page_writeback(struct page *page)
 {
 	struct address_space *mapping = page_mapping(page);
 	int ret;
+	bool locked;
+	unsigned long flags;

+	mem_cgroup_begin_update_page_stat(page, &locked, &flags);
 	if (mapping) {
 		struct backing_dev_info *bdi = mapping->backing_dev_info;
 		unsigned long flags;
@@ -2235,9 +2239,12 @@ int test_clear_page_writeback(struct page *page)
 		ret = TestClearPageWriteback(page);
 	}
 	if (ret) {
+		mem_cgroup_dec_page_stat(page, MEM_CGROUP_STAT_FILE_WRITEBACK);
 		dec_zone_page_state(page, NR_WRITEBACK);
 		inc_zone_page_state(page, NR_WRITTEN);
 	}
+
+	mem_cgroup_end_update_page_stat(page, &locked, &flags);
 	return ret;
 }

@@ -2245,7 +2252,10 @@ int test_set_page_writeback(struct page *page)
 {
 	struct address_space *mapping = page_mapping(page);
 	int ret;
+	bool locked;
+	unsigned long flags;

+	mem_cgroup_begin_update_page_stat(page, &locked, &flags);
 	if (mapping) {
 		struct backing_dev_info *bdi = mapping->backing_dev_info;
 		unsigned long flags;
@@ -2272,6 +2282,8 @@ int test_set_page_writeback(struct page *page)
 	}
 	if (!ret)
 		account_page_writeback(page);
+
+	mem_cgroup_end_update_page_stat(page, &locked, &flags);
 	return ret;
 }
--
1.7.1

^ permalink raw reply related	[flat|nested] 132+ messages in thread
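The per-state transfer that mem_cgroup_move_account() performs in the patch above can be sketched in plain C. Plain longs stand in for the kernel's per-CPU counters, the preempt_disable()/this_cpu operations are elided, and the short enum names and `model_` types are invented for the sketch:

```c
#include <assert.h>
#include <stdbool.h>

enum stat_index { STAT_FILE_MAPPED, STAT_FILE_DIRTY, STAT_FILE_WRITEBACK, STAT_NSTATS };

struct model_memcg { long stat[STAT_NSTATS]; };
struct model_page  { bool mapped, dirty, writeback; };

/* One counter moves from the old group to the new one. */
static void move_one(struct model_memcg *from, struct model_memcg *to,
		     enum stat_index idx)
{
	from->stat[idx]--;	/* like __this_cpu_dec(from->stat->count[idx]) */
	to->stat[idx]++;	/* like __this_cpu_inc(to->stat->count[idx]) */
}

/* For every state still set on the page, transfer the accounting. */
static void model_move_account(struct model_page *pg,
			       struct model_memcg *from, struct model_memcg *to)
{
	if (pg->mapped)
		move_one(from, to, STAT_FILE_MAPPED);
	if (pg->dirty)
		move_one(from, to, STAT_FILE_DIRTY);
	if (pg->writeback)
		move_one(from, to, STAT_FILE_WRITEBACK);
}
```

A page that is both mapped and dirty keeps both groups' totals consistent in one pass, while states the page no longer has (writeback here) are left untouched.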
* Re: [PATCH 6/7] memcg: add per cgroup writeback pages accounting
  2012-06-28 11:05 ` Sha Zhengju
@ 2012-07-03  6:31 ` Kamezawa Hiroyuki
  -1 siblings, 0 replies; 132+ messages in thread
From: Kamezawa Hiroyuki @ 2012-07-03 6:31 UTC (permalink / raw)
  To: Sha Zhengju
  Cc: linux-mm, cgroups, gthelen, yinghan, akpm, mhocko, linux-kernel,
	Sha Zhengju

(2012/06/28 20:05), Sha Zhengju wrote:
> From: Sha Zhengju <handai.szj@taobao.com>
>
> Similar to dirty pages, we add per-cgroup writeback page accounting.  The
> locking rule is still:
>         mem_cgroup_begin_update_page_stat()
>         modify page WRITEBACK stat
>         mem_cgroup_update_page_stat()
>         mem_cgroup_end_update_page_stat()
>
> There are two writeback interfaces to modify: test_clear/set_page_writeback.
>
> Signed-off-by: Sha Zhengju <handai.szj@taobao.com>

Seems good to me.  BTW, you named the macros MEM_CGROUP_STAT_FILE_XXX, but I
wonder whether these counters will also be used for accounting swap-out's
dirty pages...

STAT_DIRTY, STAT_WRITEBACK?  Do you have a better name?

Thanks,
-Kame

^ permalink raw reply	[flat|nested] 132+ messages in thread
* Re: [PATCH 6/7] memcg: add per cgroup writeback pages accounting
  2012-07-03  6:31 ` Kamezawa Hiroyuki
@ 2012-07-04  8:24 ` Sha Zhengju
  -1 siblings, 0 replies; 132+ messages in thread
From: Sha Zhengju @ 2012-07-04 8:24 UTC (permalink / raw)
  To: Kamezawa Hiroyuki
  Cc: linux-mm, cgroups, gthelen, yinghan, akpm, mhocko, linux-kernel,
	Sha Zhengju

On 07/03/2012 02:31 PM, Kamezawa Hiroyuki wrote:
> (2012/06/28 20:05), Sha Zhengju wrote:
>> From: Sha Zhengju <handai.szj@taobao.com>
>>
>> Similar to dirty pages, we add per-cgroup writeback page accounting.  The
>> locking rule is still:
>>         mem_cgroup_begin_update_page_stat()
>>         modify page WRITEBACK stat
>>         mem_cgroup_update_page_stat()
>>         mem_cgroup_end_update_page_stat()
>>
>> There are two writeback interfaces to modify: test_clear/set_page_writeback.
>>
>> Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
>
> Seems good to me.  BTW, you named the macros MEM_CGROUP_STAT_FILE_XXX, but I
> wonder whether these counters will also be used for accounting swap-out's
> dirty pages...
>
> STAT_DIRTY, STAT_WRITEBACK?  Do you have a better name?

Okay, STAT_DIRTY/WRITEBACK seem good; I'll change them in the next version.

Thanks,
Sha

^ permalink raw reply	[flat|nested] 132+ messages in thread
* Re: [PATCH 6/7] memcg: add per cgroup writeback pages accounting
  2012-07-03  6:31 ` Kamezawa Hiroyuki
@ 2012-07-08 14:44 ` Fengguang Wu
  -1 siblings, 0 replies; 132+ messages in thread
From: Fengguang Wu @ 2012-07-08 14:44 UTC (permalink / raw)
To: Kamezawa Hiroyuki
Cc: Sha Zhengju, linux-mm, cgroups, gthelen, yinghan, akpm, mhocko, linux-kernel, Sha Zhengju

On Tue, Jul 03, 2012 at 03:31:26PM +0900, KAMEZAWA Hiroyuki wrote:
> (2012/06/28 20:05), Sha Zhengju wrote:
> > From: Sha Zhengju <handai.szj@taobao.com>
> >
> > Similar to dirty page, we add per cgroup writeback pages accounting. The lock
> > rule still is:
> >         mem_cgroup_begin_update_page_stat()
> >         modify page WRITEBACK stat
> >         mem_cgroup_update_page_stat()
> >         mem_cgroup_end_update_page_stat()
> >
> > There're two writeback interface to modify: test_clear/set_page_writeback.
> >
> > Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
>
> Seems good to me. BTW, you named macros as MEM_CGROUP_STAT_FILE_XXX
> but I wonder these counters will be used for accounting swap-out's dirty pages..
>
> STAT_DIRTY, STAT_WRITEBACK ? do you have better name ?

Perhaps we can follow the established "enum zone_stat_item" names:

        NR_FILE_DIRTY,
        NR_WRITEBACK,

s/NR_/MEM_CGROUP_STAT_/

The names indicate that dirty pages for anonymous pages are not
accounted (by __set_page_dirty_no_writeback()). While the writeback
pages accounting include both the file/anon pages.

Ah then we'll need to update the document in patch 0 accordingly. This
may sound a bit tricky to the users..

Thanks,
Fengguang
* Re: [PATCH 6/7] memcg: add per cgroup writeback pages accounting
  2012-07-08 14:44 ` Fengguang Wu
@ 2012-07-08 23:01 ` Johannes Weiner
  -1 siblings, 0 replies; 132+ messages in thread
From: Johannes Weiner @ 2012-07-08 23:01 UTC (permalink / raw)
To: Fengguang Wu
Cc: Kamezawa Hiroyuki, Sha Zhengju, linux-mm, cgroups, gthelen, yinghan, akpm, mhocko, linux-kernel, Sha Zhengju

On Sun, Jul 08, 2012 at 10:44:59PM +0800, Fengguang Wu wrote:
> On Tue, Jul 03, 2012 at 03:31:26PM +0900, KAMEZAWA Hiroyuki wrote:
> > (2012/06/28 20:05), Sha Zhengju wrote:
> > > From: Sha Zhengju <handai.szj@taobao.com>
> > >
> > > Similar to dirty page, we add per cgroup writeback pages accounting. The lock
> > > rule still is:
> > >         mem_cgroup_begin_update_page_stat()
> > >         modify page WRITEBACK stat
> > >         mem_cgroup_update_page_stat()
> > >         mem_cgroup_end_update_page_stat()
> > >
> > > There're two writeback interface to modify: test_clear/set_page_writeback.
> > >
> > > Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
> >
> > Seems good to me. BTW, you named macros as MEM_CGROUP_STAT_FILE_XXX
> > but I wonder these counters will be used for accounting swap-out's dirty pages..
> >
> > STAT_DIRTY, STAT_WRITEBACK ? do you have better name ?
>
> Perhaps we can follow the established "enum zone_stat_item" names:
>
>         NR_FILE_DIRTY,
>         NR_WRITEBACK,
>
> s/NR_/MEM_CGROUP_STAT_/
>
> The names indicate that dirty pages for anonymous pages are not
> accounted (by __set_page_dirty_no_writeback()). While the writeback
> pages accounting include both the file/anon pages.
>
> Ah then we'll need to update the document in patch 0 accordingly. This
> may sound a bit tricky to the users..

We already report the global one as "nr_dirty", though. Please don't
give the memcg one a different name.

The enum naming is not too critical, but it would be nice to have it
match the public name.
* Re: [PATCH 6/7] memcg: add per cgroup writeback pages accounting
  2012-07-08 23:01 ` Johannes Weiner
@ 2012-07-09  1:37 ` Fengguang Wu
  -1 siblings, 0 replies; 132+ messages in thread
From: Fengguang Wu @ 2012-07-09 1:37 UTC (permalink / raw)
To: Johannes Weiner
Cc: Kamezawa Hiroyuki, Sha Zhengju, linux-mm, cgroups, gthelen, yinghan, akpm, mhocko, linux-kernel, Sha Zhengju

On Mon, Jul 09, 2012 at 01:01:00AM +0200, Johannes Weiner wrote:
> On Sun, Jul 08, 2012 at 10:44:59PM +0800, Fengguang Wu wrote:
> > On Tue, Jul 03, 2012 at 03:31:26PM +0900, KAMEZAWA Hiroyuki wrote:
> > > (2012/06/28 20:05), Sha Zhengju wrote:
> > > > From: Sha Zhengju <handai.szj@taobao.com>
> > > >
> > > > Similar to dirty page, we add per cgroup writeback pages accounting. The lock
> > > > rule still is:
> > > >         mem_cgroup_begin_update_page_stat()
> > > >         modify page WRITEBACK stat
> > > >         mem_cgroup_update_page_stat()
> > > >         mem_cgroup_end_update_page_stat()
> > > >
> > > > There're two writeback interface to modify: test_clear/set_page_writeback.
> > > >
> > > > Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
> > >
> > > Seems good to me. BTW, you named macros as MEM_CGROUP_STAT_FILE_XXX
> > > but I wonder these counters will be used for accounting swap-out's dirty pages..
> > >
> > > STAT_DIRTY, STAT_WRITEBACK ? do you have better name ?
> >
> > Perhaps we can follow the established "enum zone_stat_item" names:
> >
> >         NR_FILE_DIRTY,
> >         NR_WRITEBACK,
> >
> > s/NR_/MEM_CGROUP_STAT_/
> >
> > The names indicate that dirty pages for anonymous pages are not
> > accounted (by __set_page_dirty_no_writeback()). While the writeback
> > pages accounting include both the file/anon pages.
> >
> > Ah then we'll need to update the document in patch 0 accordingly. This
> > may sound a bit tricky to the users..
>
> We already report the global one as "nr_dirty", though. Please don't
> give the memcg one a different name.
>
> The enum naming is not too critical, but it would be nice to have it
> match the public name.

Fair enough. The public name obviously has more weight :)

Thanks,
Fengguang
* Re: [PATCH 6/7] memcg: add per cgroup writeback pages accounting
  2012-06-28 11:05 ` Sha Zhengju
@ 2012-07-04 16:15 ` Michal Hocko
  -1 siblings, 0 replies; 132+ messages in thread
From: Michal Hocko @ 2012-07-04 16:15 UTC (permalink / raw)
To: Sha Zhengju
Cc: linux-mm, cgroups, kamezawa.hiroyu, gthelen, yinghan, akpm, linux-kernel, Sha Zhengju, Jan Kara, Wu Fengguang

[Let's add writeback people]

On Thu 28-06-12 19:05:25, Sha Zhengju wrote:
> From: Sha Zhengju <handai.szj@taobao.com>
>
> Similar to dirty page, we add per cgroup writeback pages accounting. The lock
> rule still is:
>         mem_cgroup_begin_update_page_stat()
>         modify page WRITEBACK stat
>         mem_cgroup_update_page_stat()
>         mem_cgroup_end_update_page_stat()
>
> There're two writeback interface to modify: test_clear/set_page_writeback.
>
> Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
> ---
>  include/linux/memcontrol.h |    1 +
>  mm/memcontrol.c            |    5 +++++
>  mm/page-writeback.c        |   12 ++++++++++++
>  3 files changed, 18 insertions(+), 0 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index ad37b59..9193d93 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -39,6 +39,7 @@ enum mem_cgroup_stat_index {
>  	MEM_CGROUP_STAT_FILE_MAPPED,	/* # of pages charged as file rss */
>  	MEM_CGROUP_STAT_SWAP,		/* # of pages, swapped out */
>  	MEM_CGROUP_STAT_FILE_DIRTY,	/* # of dirty pages in page cache */
> +	MEM_CGROUP_STAT_FILE_WRITEBACK,	/* # of pages under writeback */
>  	MEM_CGROUP_STAT_NSTATS,
>  };
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 90e2946..8493119 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -83,6 +83,7 @@ static const char * const mem_cgroup_stat_names[] = {
>  	"mapped_file",
>  	"swap",
>  	"dirty",
> +	"writeback",
>  };
>
>  enum mem_cgroup_events_index {
> @@ -2604,6 +2605,10 @@ static int mem_cgroup_move_account(struct page *page,
>  		mem_cgroup_move_account_page_stat(from, to,
>  			MEM_CGROUP_STAT_FILE_DIRTY);
>
> +	if (PageWriteback(page))
> +		mem_cgroup_move_account_page_stat(from, to,
> +			MEM_CGROUP_STAT_FILE_WRITEBACK);
> +
>  	mem_cgroup_charge_statistics(from, anon, -nr_pages);
>
>  	/* caller should have done css_get */
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index e79a2f7..7398836 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -1981,6 +1981,7 @@ EXPORT_SYMBOL(account_page_dirtied);
>   */
>  void account_page_writeback(struct page *page)
>  {
> +	mem_cgroup_inc_page_stat(page, MEM_CGROUP_STAT_FILE_WRITEBACK);
>  	inc_zone_page_state(page, NR_WRITEBACK);
>  }
>  EXPORT_SYMBOL(account_page_writeback);
> @@ -2214,7 +2215,10 @@ int test_clear_page_writeback(struct page *page)
>  {
>  	struct address_space *mapping = page_mapping(page);
>  	int ret;
> +	bool locked;
> +	unsigned long flags;
>
> +	mem_cgroup_begin_update_page_stat(page, &locked, &flags);
>  	if (mapping) {
>  		struct backing_dev_info *bdi = mapping->backing_dev_info;
>  		unsigned long flags;
> @@ -2235,9 +2239,12 @@ int test_clear_page_writeback(struct page *page)
>  		ret = TestClearPageWriteback(page);
>  	}
>  	if (ret) {
> +		mem_cgroup_dec_page_stat(page, MEM_CGROUP_STAT_FILE_WRITEBACK);
>  		dec_zone_page_state(page, NR_WRITEBACK);
>  		inc_zone_page_state(page, NR_WRITTEN);
>  	}
> +
> +	mem_cgroup_end_update_page_stat(page, &locked, &flags);
>  	return ret;
>  }
>
> @@ -2245,7 +2252,10 @@ int test_set_page_writeback(struct page *page)
>  {
>  	struct address_space *mapping = page_mapping(page);
>  	int ret;
> +	bool locked;
> +	unsigned long flags;
>
> +	mem_cgroup_begin_update_page_stat(page, &locked, &flags);
>  	if (mapping) {
>  		struct backing_dev_info *bdi = mapping->backing_dev_info;
>  		unsigned long flags;
> @@ -2272,6 +2282,8 @@ int test_set_page_writeback(struct page *page)
>  	}
>  	if (!ret)
>  		account_page_writeback(page);
> +
> +	mem_cgroup_end_update_page_stat(page, &locked, &flags);
>  	return ret;
>
>  }
> --
> 1.7.1
--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
* [PATCH 6/7] memcg: add per cgroup writeback pages accounting
  2012-06-28 10:54 ` Sha Zhengju
@ 2012-06-28 11:06 ` Sha Zhengju
  -1 siblings, 0 replies; 132+ messages in thread
From: Sha Zhengju @ 2012-06-28 11:06 UTC (permalink / raw)
To: linux-mm, cgroups
Cc: kamezawa.hiroyu, gthelen, yinghan, akpm, mhocko, linux-kernel, Sha Zhengju

From: Sha Zhengju <handai.szj@taobao.com>

Similar to dirty page, we add per cgroup writeback pages accounting. The lock
rule still is:
        mem_cgroup_begin_update_page_stat()
        modify page WRITEBACK stat
        mem_cgroup_update_page_stat()
        mem_cgroup_end_update_page_stat()

There are two writeback interfaces to modify: test_clear/set_page_writeback.

Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
---
 include/linux/memcontrol.h |    1 +
 mm/memcontrol.c            |    5 +++++
 mm/page-writeback.c        |   12 ++++++++++++
 3 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index ad37b59..9193d93 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -39,6 +39,7 @@ enum mem_cgroup_stat_index {
 	MEM_CGROUP_STAT_FILE_MAPPED,	/* # of pages charged as file rss */
 	MEM_CGROUP_STAT_SWAP,		/* # of pages, swapped out */
 	MEM_CGROUP_STAT_FILE_DIRTY,	/* # of dirty pages in page cache */
+	MEM_CGROUP_STAT_FILE_WRITEBACK,	/* # of pages under writeback */
 	MEM_CGROUP_STAT_NSTATS,
 };

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 90e2946..8493119 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -83,6 +83,7 @@ static const char * const mem_cgroup_stat_names[] = {
 	"mapped_file",
 	"swap",
 	"dirty",
+	"writeback",
 };

 enum mem_cgroup_events_index {
@@ -2604,6 +2605,10 @@ static int mem_cgroup_move_account(struct page *page,
 		mem_cgroup_move_account_page_stat(from, to,
 			MEM_CGROUP_STAT_FILE_DIRTY);

+	if (PageWriteback(page))
+		mem_cgroup_move_account_page_stat(from, to,
+			MEM_CGROUP_STAT_FILE_WRITEBACK);
+
 	mem_cgroup_charge_statistics(from, anon, -nr_pages);

 	/* caller should have done css_get */
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index e79a2f7..7398836 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1981,6 +1981,7 @@ EXPORT_SYMBOL(account_page_dirtied);
  */
 void account_page_writeback(struct page *page)
 {
+	mem_cgroup_inc_page_stat(page, MEM_CGROUP_STAT_FILE_WRITEBACK);
 	inc_zone_page_state(page, NR_WRITEBACK);
 }
 EXPORT_SYMBOL(account_page_writeback);
@@ -2214,7 +2215,10 @@ int test_clear_page_writeback(struct page *page)
 {
 	struct address_space *mapping = page_mapping(page);
 	int ret;
+	bool locked;
+	unsigned long flags;

+	mem_cgroup_begin_update_page_stat(page, &locked, &flags);
 	if (mapping) {
 		struct backing_dev_info *bdi = mapping->backing_dev_info;
 		unsigned long flags;
@@ -2235,9 +2239,12 @@ int test_clear_page_writeback(struct page *page)
 		ret = TestClearPageWriteback(page);
 	}
 	if (ret) {
+		mem_cgroup_dec_page_stat(page, MEM_CGROUP_STAT_FILE_WRITEBACK);
 		dec_zone_page_state(page, NR_WRITEBACK);
 		inc_zone_page_state(page, NR_WRITTEN);
 	}
+
+	mem_cgroup_end_update_page_stat(page, &locked, &flags);
 	return ret;
 }

@@ -2245,7 +2252,10 @@ int test_set_page_writeback(struct page *page)
 {
 	struct address_space *mapping = page_mapping(page);
 	int ret;
+	bool locked;
+	unsigned long flags;

+	mem_cgroup_begin_update_page_stat(page, &locked, &flags);
 	if (mapping) {
 		struct backing_dev_info *bdi = mapping->backing_dev_info;
 		unsigned long flags;
@@ -2272,6 +2282,8 @@ int test_set_page_writeback(struct page *page)
 	}
 	if (!ret)
 		account_page_writeback(page);
+
+	mem_cgroup_end_update_page_stat(page, &locked, &flags);
 	return ret;

 }
--
1.7.1
* Re: [PATCH 6/7] memcg: add per cgroup writeback pages accounting 2012-06-28 11:06 ` Sha Zhengju (?) @ 2012-07-08 14:53 ` Fengguang Wu -1 siblings, 0 replies; 132+ messages in thread From: Fengguang Wu @ 2012-07-08 14:53 UTC (permalink / raw) To: Sha Zhengju Cc: linux-mm, cgroups, kamezawa.hiroyu, gthelen, yinghan, akpm, mhocko, linux-kernel, Sha Zhengju > @@ -2245,7 +2252,10 @@ int test_set_page_writeback(struct page *page) > { > struct address_space *mapping = page_mapping(page); > int ret; > + bool locked; > + unsigned long flags; > > + mem_cgroup_begin_update_page_stat(page, &locked, &flags); > if (mapping) { > struct backing_dev_info *bdi = mapping->backing_dev_info; > unsigned long flags; > @@ -2272,6 +2282,8 @@ int test_set_page_writeback(struct page *page) > } > if (!ret) > account_page_writeback(page); > + > + mem_cgroup_end_update_page_stat(page, &locked, &flags); > return ret; > > } Where is the MEM_CGROUP_STAT_FILE_WRITEBACK increased? Thanks, Fengguang ^ permalink raw reply [flat|nested] 132+ messages in thread
* Re: [PATCH 6/7] memcg: add per cgroup writeback pages accounting 2012-07-08 14:53 ` Fengguang Wu (?) @ 2012-07-09 3:36 ` Sha Zhengju -1 siblings, 0 replies; 132+ messages in thread From: Sha Zhengju @ 2012-07-09 3:36 UTC (permalink / raw) To: Fengguang Wu Cc: linux-mm, cgroups, kamezawa.hiroyu, gthelen, yinghan, akpm, mhocko, linux-kernel, Sha Zhengju On 07/08/2012 10:53 PM, Fengguang Wu wrote: >> @@ -2245,7 +2252,10 @@ int test_set_page_writeback(struct page *page) >> { >> struct address_space *mapping = page_mapping(page); >> int ret; >> + bool locked; >> + unsigned long flags; >> >> + mem_cgroup_begin_update_page_stat(page,&locked,&flags); >> if (mapping) { >> struct backing_dev_info *bdi = mapping->backing_dev_info; >> unsigned long flags; >> @@ -2272,6 +2282,8 @@ int test_set_page_writeback(struct page *page) >> } >> if (!ret) >> account_page_writeback(page); >> + >> + mem_cgroup_end_update_page_stat(page,&locked,&flags); >> return ret; >> >> } > Where is the MEM_CGROUP_STAT_FILE_WRITEBACK increased? > It's in account_page_writeback(). void account_page_writeback(struct page *page) { + mem_cgroup_inc_page_stat(page, MEM_CGROUP_STAT_FILE_WRITEBACK); inc_zone_page_state(page, NR_WRITEBACK); } There isn't a unified interface to dec/inc writeback accounting, so I just follow that. Maybe we can rework account_page_writeback() to also account dec in? Thanks, Sha ^ permalink raw reply [flat|nested] 132+ messages in thread
* Re: [PATCH 6/7] memcg: add per cgroup writeback pages accounting 2012-07-09 3:36 ` Sha Zhengju (?) @ 2012-07-09 4:14 ` Fengguang Wu -1 siblings, 0 replies; 132+ messages in thread From: Fengguang Wu @ 2012-07-09 4:14 UTC (permalink / raw) To: Sha Zhengju Cc: linux-mm, cgroups, kamezawa.hiroyu, gthelen, yinghan, akpm, mhocko, linux-kernel, Sha Zhengju On Mon, Jul 09, 2012 at 11:36:11AM +0800, Sha Zhengju wrote: > On 07/08/2012 10:53 PM, Fengguang Wu wrote: > >>@@ -2245,7 +2252,10 @@ int test_set_page_writeback(struct page *page) > >> { > >> struct address_space *mapping = page_mapping(page); > >> int ret; > >>+ bool locked; > >>+ unsigned long flags; > >> > >>+ mem_cgroup_begin_update_page_stat(page,&locked,&flags); > >> if (mapping) { > >> struct backing_dev_info *bdi = mapping->backing_dev_info; > >> unsigned long flags; > >>@@ -2272,6 +2282,8 @@ int test_set_page_writeback(struct page *page) > >> } > >> if (!ret) > >> account_page_writeback(page); > >>+ > >>+ mem_cgroup_end_update_page_stat(page,&locked,&flags); > >> return ret; > >> > >> } > >Where is the MEM_CGROUP_STAT_FILE_WRITEBACK increased? > > > > It's in account_page_writeback(). > > void account_page_writeback(struct page *page) > { > + mem_cgroup_inc_page_stat(page, MEM_CGROUP_STAT_FILE_WRITEBACK); > inc_zone_page_state(page, NR_WRITEBACK); > } I didn't find that chunk, perhaps it's lost due to rebase.. > There isn't a unified interface to dec/inc writeback accounting, so > I just follow that. > Maybe we can rework account_page_writeback() to also account > dec in? The current separate inc/dec paths are fine. It sounds like over-engineering if going any further. I'm a bit worried that some 3rd-party kernel module may call account_page_writeback() without mem_cgroup_begin/end_update_page_stat(). Will that lead to serious locking issues, or merely inaccurate accounting? Thanks, Fengguang ^ permalink raw reply [flat|nested] 132+ messages in thread
* Re: [PATCH 6/7] memcg: add per cgroup writeback pages accounting 2012-07-09 4:14 ` Fengguang Wu @ 2012-07-09 4:18 ` Kamezawa Hiroyuki -1 siblings, 0 replies; 132+ messages in thread From: Kamezawa Hiroyuki @ 2012-07-09 4:18 UTC (permalink / raw) To: Fengguang Wu Cc: Sha Zhengju, linux-mm, cgroups, gthelen, yinghan, akpm, mhocko, linux-kernel, Sha Zhengju (2012/07/09 13:14), Fengguang Wu wrote: > On Mon, Jul 09, 2012 at 11:36:11AM +0800, Sha Zhengju wrote: >> On 07/08/2012 10:53 PM, Fengguang Wu wrote: >>>> @@ -2245,7 +2252,10 @@ int test_set_page_writeback(struct page *page) >>>> { >>>> struct address_space *mapping = page_mapping(page); >>>> int ret; >>>> + bool locked; >>>> + unsigned long flags; >>>> >>>> + mem_cgroup_begin_update_page_stat(page,&locked,&flags); >>>> if (mapping) { >>>> struct backing_dev_info *bdi = mapping->backing_dev_info; >>>> unsigned long flags; >>>> @@ -2272,6 +2282,8 @@ int test_set_page_writeback(struct page *page) >>>> } >>>> if (!ret) >>>> account_page_writeback(page); >>>> + >>>> + mem_cgroup_end_update_page_stat(page,&locked,&flags); >>>> return ret; >>>> >>>> } >>> Where is the MEM_CGROUP_STAT_FILE_WRITEBACK increased? >>> >> >> It's in account_page_writeback(). >> >> void account_page_writeback(struct page *page) >> { >> + mem_cgroup_inc_page_stat(page, MEM_CGROUP_STAT_FILE_WRITEBACK); >> inc_zone_page_state(page, NR_WRITEBACK); >> } > > I didn't find that chunk, perhaps it's lost due to rebase.. > >> There isn't a unified interface to dec/inc writeback accounting, so >> I just follow that. >> Maybe we can rework account_page_writeback() to also account >> dec in? > > The current seperate inc/dec paths are fine. It sounds like > over-engineering if going any further. > > I'm a bit worried about some 3rd party kernel module to call > account_page_writeback() without mem_cgroup_begin/end_update_page_stat(). > Will that lead to serious locking issues, or merely inaccurate > accounting? > Ah, Hm. 
Maybe it's better to add some debug check in mem_cgroup_update_page_stat(), e.g. rcu_read_lock_held() or similar. Thanks, -Kame ^ permalink raw reply [flat|nested] 132+ messages in thread
* Re: [PATCH 6/7] memcg: add per cgroup writeback pages accounting 2012-07-09 4:18 ` Kamezawa Hiroyuki (?) @ 2012-07-09 5:22 ` Sha Zhengju -1 siblings, 0 replies; 132+ messages in thread From: Sha Zhengju @ 2012-07-09 5:22 UTC (permalink / raw) To: Kamezawa Hiroyuki Cc: Fengguang Wu, linux-mm, cgroups, gthelen, yinghan, akpm, mhocko, linux-kernel, Sha Zhengju On 07/09/2012 12:18 PM, Kamezawa Hiroyuki wrote: > (2012/07/09 13:14), Fengguang Wu wrote: >> On Mon, Jul 09, 2012 at 11:36:11AM +0800, Sha Zhengju wrote: >>> On 07/08/2012 10:53 PM, Fengguang Wu wrote: >>>>> @@ -2245,7 +2252,10 @@ int test_set_page_writeback(struct page *page) >>>>> { >>>>> struct address_space *mapping = page_mapping(page); >>>>> int ret; >>>>> + bool locked; >>>>> + unsigned long flags; >>>>> >>>>> + mem_cgroup_begin_update_page_stat(page,&locked,&flags); >>>>> if (mapping) { >>>>> struct backing_dev_info *bdi = mapping->backing_dev_info; >>>>> unsigned long flags; >>>>> @@ -2272,6 +2282,8 @@ int test_set_page_writeback(struct page *page) >>>>> } >>>>> if (!ret) >>>>> account_page_writeback(page); >>>>> + >>>>> + mem_cgroup_end_update_page_stat(page,&locked,&flags); >>>>> return ret; >>>>> >>>>> } >>>> Where is the MEM_CGROUP_STAT_FILE_WRITEBACK increased? >>>> >>> >>> It's in account_page_writeback(). >>> >>> void account_page_writeback(struct page *page) >>> { >>> + mem_cgroup_inc_page_stat(page, MEM_CGROUP_STAT_FILE_WRITEBACK); >>> inc_zone_page_state(page, NR_WRITEBACK); >>> } >> >> I didn't find that chunk, perhaps it's lost due to rebase.. >> >>> There isn't a unified interface to dec/inc writeback accounting, so >>> I just follow that. >>> Maybe we can rework account_page_writeback() to also account >>> dec in? >> >> The current seperate inc/dec paths are fine. It sounds like >> over-engineering if going any further. >> >> I'm a bit worried about some 3rd party kernel module to call >> account_page_writeback() without >> mem_cgroup_begin/end_update_page_stat(). 
>> Will that lead to serious locking issues, or merely inaccurate >> accounting? >> > > Ah, Hm. Maybe it's better to add some debug check in > mem_cgroup_update_page_stat(). rcu_read_lock_held() or some. > This also applies to account_page_dirtied()... But as a "range" lock, I think it's common in the current kernel: just as with set_page_dirty(), the caller should call it under the page lock (in most cases), and it's the caller's responsibility to guarantee correctness. I can add some comments or a debug check as a reminder, but I think that is all I can do... Thanks, Sha ^ permalink raw reply [flat|nested] 132+ messages in thread
* Re: [PATCH 6/7] memcg: add per cgroup writeback pages accounting 2012-07-09 5:22 ` Sha Zhengju (?) @ 2012-07-09 5:28 ` Fengguang Wu -1 siblings, 0 replies; 132+ messages in thread From: Fengguang Wu @ 2012-07-09 5:28 UTC (permalink / raw) To: Sha Zhengju Cc: Kamezawa Hiroyuki, linux-mm, cgroups, gthelen, yinghan, akpm, mhocko, linux-kernel, Sha Zhengju On Mon, Jul 09, 2012 at 01:22:54PM +0800, Sha Zhengju wrote: > On 07/09/2012 12:18 PM, Kamezawa Hiroyuki wrote: > >(2012/07/09 13:14), Fengguang Wu wrote: > >>On Mon, Jul 09, 2012 at 11:36:11AM +0800, Sha Zhengju wrote: > >>>On 07/08/2012 10:53 PM, Fengguang Wu wrote: > >>>>>@@ -2245,7 +2252,10 @@ int test_set_page_writeback(struct page *page) > >>>>> { > >>>>> struct address_space *mapping = page_mapping(page); > >>>>> int ret; > >>>>>+ bool locked; > >>>>>+ unsigned long flags; > >>>>> > >>>>>+ mem_cgroup_begin_update_page_stat(page,&locked,&flags); > >>>>> if (mapping) { > >>>>> struct backing_dev_info *bdi = mapping->backing_dev_info; > >>>>> unsigned long flags; > >>>>>@@ -2272,6 +2282,8 @@ int test_set_page_writeback(struct page *page) > >>>>> } > >>>>> if (!ret) > >>>>> account_page_writeback(page); > >>>>>+ > >>>>>+ mem_cgroup_end_update_page_stat(page,&locked,&flags); > >>>>> return ret; > >>>>> > >>>>> } > >>>>Where is the MEM_CGROUP_STAT_FILE_WRITEBACK increased? > >>>> > >>> > >>>It's in account_page_writeback(). > >>> > >>> void account_page_writeback(struct page *page) > >>> { > >>>+ mem_cgroup_inc_page_stat(page, MEM_CGROUP_STAT_FILE_WRITEBACK); > >>> inc_zone_page_state(page, NR_WRITEBACK); > >>> } > >> > >>I didn't find that chunk, perhaps it's lost due to rebase.. > >> > >>>There isn't a unified interface to dec/inc writeback accounting, so > >>>I just follow that. > >>>Maybe we can rework account_page_writeback() to also account > >>>dec in? > >> > >>The current seperate inc/dec paths are fine. It sounds like > >>over-engineering if going any further. 
> >> > >>I'm a bit worried about some 3rd party kernel module to call > >>account_page_writeback() without > >>mem_cgroup_begin/end_update_page_stat(). > >>Will that lead to serious locking issues, or merely inaccurate > >>accounting? > >> > > > >Ah, Hm. Maybe it's better to add some debug check in > > mem_cgroup_update_page_stat(). rcu_read_lock_held() or some. > > > > This also apply to account_page_dirtied()... But as an "range" lock, > I think it's common > in current kernel: just as set_page_dirty(), the caller should call > it under the page lock > (in most cases) and it's his responsibility to guarantee > correctness. I can add some > comments or debug check as reminding but I think i can only do so... Yeah, it helps to add some brief comment on the locking rule in account_page_*(). Thanks, Fengguang ^ permalink raw reply [flat|nested] 132+ messages in thread
* Re: [PATCH 6/7] memcg: add per cgroup writeback pages accounting 2012-07-09 4:14 ` Fengguang Wu @ 2012-07-09 5:19 ` Sha Zhengju -1 siblings, 0 replies; 132+ messages in thread From: Sha Zhengju @ 2012-07-09 5:19 UTC (permalink / raw) To: Fengguang Wu Cc: linux-mm, cgroups, kamezawa.hiroyu, gthelen, yinghan, akpm, mhocko, linux-kernel, Sha Zhengju On 07/09/2012 12:14 PM, Fengguang Wu wrote: > On Mon, Jul 09, 2012 at 11:36:11AM +0800, Sha Zhengju wrote: >> On 07/08/2012 10:53 PM, Fengguang Wu wrote: >>>> @@ -2245,7 +2252,10 @@ int test_set_page_writeback(struct page *page) >>>> { >>>> struct address_space *mapping = page_mapping(page); >>>> int ret; >>>> + bool locked; >>>> + unsigned long flags; >>>> >>>> + mem_cgroup_begin_update_page_stat(page,&locked,&flags); >>>> if (mapping) { >>>> struct backing_dev_info *bdi = mapping->backing_dev_info; >>>> unsigned long flags; >>>> @@ -2272,6 +2282,8 @@ int test_set_page_writeback(struct page *page) >>>> } >>>> if (!ret) >>>> account_page_writeback(page); >>>> + >>>> + mem_cgroup_end_update_page_stat(page,&locked,&flags); >>>> return ret; >>>> >>>> } >>> Where is the MEM_CGROUP_STAT_FILE_WRITEBACK increased? >>> >> It's in account_page_writeback(). >> >> void account_page_writeback(struct page *page) >> { >> + mem_cgroup_inc_page_stat(page, MEM_CGROUP_STAT_FILE_WRITEBACK); >> inc_zone_page_state(page, NR_WRITEBACK); >> } > I didn't find that chunk, perhaps it's lost due to rebase.. Ah? a bit weird... you can refer to the link http://thread.gmane.org/gmane.linux.kernel.cgroups/3134 which is the complete one. Thanks! >> There isn't a unified interface to dec/inc writeback accounting, so >> I just follow that. >> Maybe we can rework account_page_writeback() to also account >> dec in? > The current separate inc/dec paths are fine. It sounds like > over-engineering if going any further. > > I'm a bit worried about some 3rd party kernel module calling > account_page_writeback() without mem_cgroup_begin/end_update_page_stat().
> Will that lead to serious locking issues, or merely inaccurate > accounting? > > Thanks, > Fengguang > -- > To unsubscribe from this list: send the line "unsubscribe cgroups" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 132+ messages in thread
* Re: [PATCH 6/7] memcg: add per cgroup writeback pages accounting 2012-07-09 5:19 ` Sha Zhengju @ 2012-07-09 5:25 ` Fengguang Wu -1 siblings, 0 replies; 132+ messages in thread From: Fengguang Wu @ 2012-07-09 5:25 UTC (permalink / raw) To: Sha Zhengju Cc: linux-mm, cgroups, kamezawa.hiroyu, gthelen, yinghan, akpm, mhocko, linux-kernel, Sha Zhengju > >>>Where is the MEM_CGROUP_STAT_FILE_WRITEBACK increased? > >>> > >>It's in account_page_writeback(). > >> > >> void account_page_writeback(struct page *page) > >> { > >>+ mem_cgroup_inc_page_stat(page, MEM_CGROUP_STAT_FILE_WRITEBACK); > >> inc_zone_page_state(page, NR_WRITEBACK); > >> } > >I didn't find that chunk, perhaps it's lost due to rebase.. > > Ah? a bit weird... you can refer to the link > http://thread.gmane.org/gmane.linux.kernel.cgroups/3134 > which is the complete one. Thanks! Ah, I got it. Sorry, I overlooked it... and the new view does help make it obvious ;) Thanks, Fengguang ^ permalink raw reply [flat|nested] 132+ messages in thread
* Re: [PATCH 6/7] memcg: add per cgroup writeback pages accounting 2012-06-28 11:06 ` Sha Zhengju @ 2012-07-09 21:02 ` Greg Thelen -1 siblings, 0 replies; 132+ messages in thread From: Greg Thelen @ 2012-07-09 21:02 UTC (permalink / raw) To: Sha Zhengju Cc: linux-mm, cgroups, kamezawa.hiroyu, yinghan, akpm, mhocko, linux-kernel, Sha Zhengju On Thu, Jun 28 2012, Sha Zhengju wrote: > From: Sha Zhengju <handai.szj@taobao.com> > > Similar to dirty page, we add per cgroup writeback pages accounting. The lock > rule still is: > mem_cgroup_begin_update_page_stat() > modify page WRITEBACK stat > mem_cgroup_update_page_stat() > mem_cgroup_end_update_page_stat() > > There are two writeback interfaces to modify: test_clear/set_page_writeback. > > Signed-off-by: Sha Zhengju <handai.szj@taobao.com> > --- > include/linux/memcontrol.h | 1 + > mm/memcontrol.c | 5 +++++ > mm/page-writeback.c | 12 ++++++++++++ > 3 files changed, 18 insertions(+), 0 deletions(-) > > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h > index ad37b59..9193d93 100644 > --- a/include/linux/memcontrol.h > +++ b/include/linux/memcontrol.h > @@ -39,6 +39,7 @@ enum mem_cgroup_stat_index { > MEM_CGROUP_STAT_FILE_MAPPED, /* # of pages charged as file rss */ > MEM_CGROUP_STAT_SWAP, /* # of pages, swapped out */ > MEM_CGROUP_STAT_FILE_DIRTY, /* # of dirty pages in page cache */ > + MEM_CGROUP_STAT_FILE_WRITEBACK, /* # of pages under writeback */ > MEM_CGROUP_STAT_NSTATS, > }; > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 90e2946..8493119 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -83,6 +83,7 @@ static const char * const mem_cgroup_stat_names[] = { > "mapped_file", > "swap", > "dirty", > + "writeback", > }; > > enum mem_cgroup_events_index { > @@ -2604,6 +2605,10 @@ static int mem_cgroup_move_account(struct page *page, > mem_cgroup_move_account_page_stat(from, to, > MEM_CGROUP_STAT_FILE_DIRTY); > > + if (PageWriteback(page)) > + mem_cgroup_move_account_page_stat(from, 
to, > + MEM_CGROUP_STAT_FILE_WRITEBACK); > + > mem_cgroup_charge_statistics(from, anon, -nr_pages); > > /* caller should have done css_get */ > diff --git a/mm/page-writeback.c b/mm/page-writeback.c > index e79a2f7..7398836 100644 > --- a/mm/page-writeback.c > +++ b/mm/page-writeback.c > @@ -1981,6 +1981,7 @@ EXPORT_SYMBOL(account_page_dirtied); > */ > void account_page_writeback(struct page *page) > { > + mem_cgroup_inc_page_stat(page, MEM_CGROUP_STAT_FILE_WRITEBACK); As already mentioned, I'd also like to see a comment added to account_page_writeback() describing the new locking requirements (specifically mem_cgroup_begin_update_page_stat being held by caller). > inc_zone_page_state(page, NR_WRITEBACK); > } > EXPORT_SYMBOL(account_page_writeback); > @@ -2214,7 +2215,10 @@ int test_clear_page_writeback(struct page *page) > { > struct address_space *mapping = page_mapping(page); > int ret; > + bool locked; > + unsigned long flags; > > + mem_cgroup_begin_update_page_stat(page, &locked, &flags); > if (mapping) { > struct backing_dev_info *bdi = mapping->backing_dev_info; > unsigned long flags; > @@ -2235,9 +2239,12 @@ int test_clear_page_writeback(struct page *page) > ret = TestClearPageWriteback(page); > } > if (ret) { > + mem_cgroup_dec_page_stat(page, MEM_CGROUP_STAT_FILE_WRITEBACK); > dec_zone_page_state(page, NR_WRITEBACK); > inc_zone_page_state(page, NR_WRITTEN); > } > + > + mem_cgroup_end_update_page_stat(page, &locked, &flags); > return ret; > } > > @@ -2245,7 +2252,10 @@ int test_set_page_writeback(struct page *page) > { > struct address_space *mapping = page_mapping(page); > int ret; > + bool locked; > + unsigned long flags; > > + mem_cgroup_begin_update_page_stat(page, &locked, &flags); > if (mapping) { > struct backing_dev_info *bdi = mapping->backing_dev_info; > unsigned long flags; > @@ -2272,6 +2282,8 @@ int test_set_page_writeback(struct page *page) > } > if (!ret) > account_page_writeback(page); > + > + mem_cgroup_end_update_page_stat(page, &locked, 
&flags); > return ret; > > } ^ permalink raw reply [flat|nested] 132+ messages in thread
* [PATCH 7/7] memcg: print more detailed info while memcg oom happening 2012-06-28 10:54 ` Sha Zhengju (?) @ 2012-06-28 11:06 ` Sha Zhengju -1 siblings, 0 replies; 132+ messages in thread From: Sha Zhengju @ 2012-06-28 11:06 UTC (permalink / raw) To: linux-mm, cgroups Cc: kamezawa.hiroyu, gthelen, yinghan, akpm, mhocko, linux-kernel, Sha Zhengju From: Sha Zhengju <handai.szj@taobao.com> While a memcg OOM is happening, the dump info is limited, so add this to provide memcg page stats. Signed-off-by: Sha Zhengju <handai.szj@taobao.com> --- mm/memcontrol.c | 42 ++++++++++++++++++++++++++++++++++-------- 1 files changed, 34 insertions(+), 8 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 8493119..3ed41e9 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -101,6 +101,14 @@ static const char * const mem_cgroup_events_names[] = { "pgmajfault", }; +static const char * const mem_cgroup_lru_names[] = { + "inactive_anon", + "active_anon", + "inactive_file", + "active_file", + "unevictable", +}; + /* * Per memcg event counter is incremented at every pagein/pageout. With THP, * it will be incremated by the number of pages. 
This counter is used for @@ -1358,6 +1366,30 @@ static void move_unlock_mem_cgroup(struct mem_cgroup *memcg, spin_unlock_irqrestore(&memcg->move_lock, *flags); } +#define K(x) ((x) << (PAGE_SHIFT-10)) +static void mem_cgroup_print_oom_stat(struct mem_cgroup *memcg) +{ + int i; + + printk(KERN_INFO "Memory cgroup stat:\n"); + for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) { + if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account) + continue; + printk(KERN_CONT "%s:%ldKB ", mem_cgroup_stat_names[i], + K(mem_cgroup_read_stat(memcg, i))); + } + + for (i = 0; i < MEM_CGROUP_EVENTS_NSTATS; i++) + printk(KERN_CONT "%s:%lu ", mem_cgroup_events_names[i], + mem_cgroup_read_events(memcg, i)); + + for (i = 0; i < NR_LRU_LISTS; i++) + printk(KERN_CONT "%s:%luKB ", mem_cgroup_lru_names[i], + K(mem_cgroup_nr_lru_pages(memcg, BIT(i)))); + printk(KERN_CONT "\n"); + +} + /** * mem_cgroup_print_oom_info: Called from OOM with tasklist_lock held in read mode. * @memcg: The memory cgroup that went over limit @@ -1422,6 +1454,8 @@ done: res_counter_read_u64(&memcg->memsw, RES_USAGE) >> 10, res_counter_read_u64(&memcg->memsw, RES_LIMIT) >> 10, res_counter_read_u64(&memcg->memsw, RES_FAILCNT)); + + mem_cgroup_print_oom_stat(memcg); } /* @@ -4043,14 +4077,6 @@ static int mem_control_numa_stat_show(struct cgroup *cont, struct cftype *cft, } #endif /* CONFIG_NUMA */ -static const char * const mem_cgroup_lru_names[] = { - "inactive_anon", - "active_anon", - "inactive_file", - "active_file", - "unevictable", -}; - static inline void mem_cgroup_lru_names_not_uptodate(void) { BUILD_BUG_ON(ARRAY_SIZE(mem_cgroup_lru_names) != NR_LRU_LISTS); -- 1.7.1 ^ permalink raw reply related [flat|nested] 132+ messages in thread
* Re: [PATCH 7/7] memcg: print more detailed info while memcg oom happening 2012-06-28 11:06 ` Sha Zhengju (?) @ 2012-07-04 8:25 ` Sha Zhengju -1 siblings, 0 replies; 132+ messages in thread From: Sha Zhengju @ 2012-07-04 8:25 UTC (permalink / raw) To: linux-mm, cgroups Cc: kamezawa.hiroyu, gthelen, yinghan, akpm, mhocko, linux-kernel, Sha Zhengju Hi, Kame How about this bit? :-) On 06/28/2012 07:06 PM, Sha Zhengju wrote: > From: Sha Zhengju<handai.szj@taobao.com> > > While memcg oom happening, the dump info is limited, so add this > to provide memcg page stat. > > Signed-off-by: Sha Zhengju<handai.szj@taobao.com> > --- > mm/memcontrol.c | 42 ++++++++++++++++++++++++++++++++++-------- > 1 files changed, 34 insertions(+), 8 deletions(-) > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 8493119..3ed41e9 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -101,6 +101,14 @@ static const char * const mem_cgroup_events_names[] = { > "pgmajfault", > }; > > +static const char * const mem_cgroup_lru_names[] = { > + "inactive_anon", > + "active_anon", > + "inactive_file", > + "active_file", > + "unevictable", > +}; > + > /* > * Per memcg event counter is incremented at every pagein/pageout. With THP, > * it will be incremated by the number of pages. 
This counter is used for > @@ -1358,6 +1366,30 @@ static void move_unlock_mem_cgroup(struct mem_cgroup *memcg, > spin_unlock_irqrestore(&memcg->move_lock, *flags); > } > > +#define K(x) ((x)<< (PAGE_SHIFT-10)) > +static void mem_cgroup_print_oom_stat(struct mem_cgroup *memcg) > +{ > + int i; > + > + printk(KERN_INFO "Memory cgroup stat:\n"); > + for (i = 0; i< MEM_CGROUP_STAT_NSTATS; i++) { > + if (i == MEM_CGROUP_STAT_SWAP&& !do_swap_account) > + continue; > + printk(KERN_CONT "%s:%ldKB ", mem_cgroup_stat_names[i], > + K(mem_cgroup_read_stat(memcg, i))); > + } > + > + for (i = 0; i< MEM_CGROUP_EVENTS_NSTATS; i++) > + printk(KERN_CONT "%s:%lu ", mem_cgroup_events_names[i], > + mem_cgroup_read_events(memcg, i)); > + > + for (i = 0; i< NR_LRU_LISTS; i++) > + printk(KERN_CONT "%s:%luKB ", mem_cgroup_lru_names[i], > + K(mem_cgroup_nr_lru_pages(memcg, BIT(i)))); > + printk(KERN_CONT "\n"); > + > +} > + > /** > * mem_cgroup_print_oom_info: Called from OOM with tasklist_lock held in read mode. > * @memcg: The memory cgroup that went over limit > @@ -1422,6 +1454,8 @@ done: > res_counter_read_u64(&memcg->memsw, RES_USAGE)>> 10, > res_counter_read_u64(&memcg->memsw, RES_LIMIT)>> 10, > res_counter_read_u64(&memcg->memsw, RES_FAILCNT)); > + > + mem_cgroup_print_oom_stat(memcg); > } > > /* > @@ -4043,14 +4077,6 @@ static int mem_control_numa_stat_show(struct cgroup *cont, struct cftype *cft, > } > #endif /* CONFIG_NUMA */ > > -static const char * const mem_cgroup_lru_names[] = { > - "inactive_anon", > - "active_anon", > - "inactive_file", > - "active_file", > - "unevictable", > -}; > - > static inline void mem_cgroup_lru_names_not_uptodate(void) > { > BUILD_BUG_ON(ARRAY_SIZE(mem_cgroup_lru_names) != NR_LRU_LISTS); ^ permalink raw reply [flat|nested] 132+ messages in thread
* Re: [PATCH 7/7] memcg: print more detailed info while memcg oom happening
@ 2012-07-04  8:25 ` Sha Zhengju
  0 siblings, 0 replies; 132+ messages in thread
From: Sha Zhengju @ 2012-07-04  8:25 UTC (permalink / raw)
  To: linux-mm, cgroups
  Cc: kamezawa.hiroyu, gthelen, yinghan, akpm, mhocko, linux-kernel, Sha Zhengju

Hi, Kame

How about this bit? :-)

On 06/28/2012 07:06 PM, Sha Zhengju wrote:
> From: Sha Zhengju <handai.szj@taobao.com>
>
> While memcg oom happening, the dump info is limited, so add this
> to provide memcg page stat.
>
> Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
> ---
>  mm/memcontrol.c |   42 ++++++++++++++++++++++++++++++++++--------
>  1 files changed, 34 insertions(+), 8 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 8493119..3ed41e9 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -101,6 +101,14 @@ static const char * const mem_cgroup_events_names[] = {
> 	"pgmajfault",
> };
>
> +static const char * const mem_cgroup_lru_names[] = {
> +	"inactive_anon",
> +	"active_anon",
> +	"inactive_file",
> +	"active_file",
> +	"unevictable",
> +};
> +
> /*
>  * Per memcg event counter is incremented at every pagein/pageout. With THP,
>  * it will be incremated by the number of pages. This counter is used for
> @@ -1358,6 +1366,30 @@ static void move_unlock_mem_cgroup(struct mem_cgroup *memcg,
> 	spin_unlock_irqrestore(&memcg->move_lock, *flags);
> }
>
> +#define K(x) ((x) << (PAGE_SHIFT-10))
> +static void mem_cgroup_print_oom_stat(struct mem_cgroup *memcg)
> +{
> +	int i;
> +
> +	printk(KERN_INFO "Memory cgroup stat:\n");
> +	for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
> +		if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
> +			continue;
> +		printk(KERN_CONT "%s:%ldKB ", mem_cgroup_stat_names[i],
> +			K(mem_cgroup_read_stat(memcg, i)));
> +	}
> +
> +	for (i = 0; i < MEM_CGROUP_EVENTS_NSTATS; i++)
> +		printk(KERN_CONT "%s:%lu ", mem_cgroup_events_names[i],
> +			mem_cgroup_read_events(memcg, i));
> +
> +	for (i = 0; i < NR_LRU_LISTS; i++)
> +		printk(KERN_CONT "%s:%luKB ", mem_cgroup_lru_names[i],
> +			K(mem_cgroup_nr_lru_pages(memcg, BIT(i))));
> +	printk(KERN_CONT "\n");
> +}
> +
> /**
>  * mem_cgroup_print_oom_info: Called from OOM with tasklist_lock held in read mode.
>  * @memcg: The memory cgroup that went over limit
> @@ -1422,6 +1454,8 @@ done:
> 		res_counter_read_u64(&memcg->memsw, RES_USAGE) >> 10,
> 		res_counter_read_u64(&memcg->memsw, RES_LIMIT) >> 10,
> 		res_counter_read_u64(&memcg->memsw, RES_FAILCNT));
> +
> +	mem_cgroup_print_oom_stat(memcg);
> }
>
> /*
> @@ -4043,14 +4077,6 @@ static int mem_control_numa_stat_show(struct cgroup *cont, struct cftype *cft,
> }
> #endif /* CONFIG_NUMA */
>
> -static const char * const mem_cgroup_lru_names[] = {
> -	"inactive_anon",
> -	"active_anon",
> -	"inactive_file",
> -	"active_file",
> -	"unevictable",
> -};
> -
> static inline void mem_cgroup_lru_names_not_uptodate(void)
> {
> 	BUILD_BUG_ON(ARRAY_SIZE(mem_cgroup_lru_names) != NR_LRU_LISTS);
* Re: [PATCH 7/7] memcg: print more detailed info while memcg oom happening
  2012-06-28 11:06 ` Sha Zhengju
@ 2012-07-04  8:29 ` Kamezawa Hiroyuki
  -1 siblings, 0 replies; 132+ messages in thread
From: Kamezawa Hiroyuki @ 2012-07-04  8:29 UTC (permalink / raw)
  To: Sha Zhengju
  Cc: linux-mm, cgroups, gthelen, yinghan, akpm, mhocko, linux-kernel, Sha Zhengju

(2012/06/28 20:06), Sha Zhengju wrote:
> From: Sha Zhengju <handai.szj@taobao.com>
>
> While memcg oom happening, the dump info is limited, so add this
> to provide memcg page stat.
>
> Signed-off-by: Sha Zhengju <handai.szj@taobao.com>

Could you split this into a different series ?
seems good to me in general but...one concern is hierarchy handling.

IIUC, the passed 'memcg' is the root of hierarchy which gets OOM.
So, the LRU info, which is local to the root memcg, may not contain any good
information. I think you should visit all memcg under the tree.

Thanks,
-Kame

> ---
>  mm/memcontrol.c |   42 ++++++++++++++++++++++++++++++++++--------
>  1 files changed, 34 insertions(+), 8 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 8493119..3ed41e9 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -101,6 +101,14 @@ static const char * const mem_cgroup_events_names[] = {
> 	"pgmajfault",
> };
>
> +static const char * const mem_cgroup_lru_names[] = {
> +	"inactive_anon",
> +	"active_anon",
> +	"inactive_file",
> +	"active_file",
> +	"unevictable",
> +};
> +
> /*
>  * Per memcg event counter is incremented at every pagein/pageout. With THP,
>  * it will be incremated by the number of pages. This counter is used for
> @@ -1358,6 +1366,30 @@ static void move_unlock_mem_cgroup(struct mem_cgroup *memcg,
> 	spin_unlock_irqrestore(&memcg->move_lock, *flags);
> }
>
> +#define K(x) ((x) << (PAGE_SHIFT-10))
> +static void mem_cgroup_print_oom_stat(struct mem_cgroup *memcg)
> +{
> +	int i;
> +
> +	printk(KERN_INFO "Memory cgroup stat:\n");
> +	for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
> +		if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
> +			continue;
> +		printk(KERN_CONT "%s:%ldKB ", mem_cgroup_stat_names[i],
> +			K(mem_cgroup_read_stat(memcg, i)));
> +	}
> +
> +	for (i = 0; i < MEM_CGROUP_EVENTS_NSTATS; i++)
> +		printk(KERN_CONT "%s:%lu ", mem_cgroup_events_names[i],
> +			mem_cgroup_read_events(memcg, i));
> +
> +	for (i = 0; i < NR_LRU_LISTS; i++)
> +		printk(KERN_CONT "%s:%luKB ", mem_cgroup_lru_names[i],
> +			K(mem_cgroup_nr_lru_pages(memcg, BIT(i))));
> +	printk(KERN_CONT "\n");
> +}
> +
> /**
>  * mem_cgroup_print_oom_info: Called from OOM with tasklist_lock held in read mode.
>  * @memcg: The memory cgroup that went over limit
> @@ -1422,6 +1454,8 @@ done:
> 		res_counter_read_u64(&memcg->memsw, RES_USAGE) >> 10,
> 		res_counter_read_u64(&memcg->memsw, RES_LIMIT) >> 10,
> 		res_counter_read_u64(&memcg->memsw, RES_FAILCNT));
> +
> +	mem_cgroup_print_oom_stat(memcg);
> }
>
> /*
> @@ -4043,14 +4077,6 @@ static int mem_control_numa_stat_show(struct cgroup *cont, struct cftype *cft,
> }
> #endif /* CONFIG_NUMA */
>
> -static const char * const mem_cgroup_lru_names[] = {
> -	"inactive_anon",
> -	"active_anon",
> -	"inactive_file",
> -	"active_file",
> -	"unevictable",
> -};
> -
> static inline void mem_cgroup_lru_names_not_uptodate(void)
> {
> 	BUILD_BUG_ON(ARRAY_SIZE(mem_cgroup_lru_names) != NR_LRU_LISTS);
* Re: [PATCH 7/7] memcg: print more detailed info while memcg oom happening
  2012-07-04  8:29 ` Kamezawa Hiroyuki
@ 2012-07-04 11:20 ` Sha Zhengju
  -1 siblings, 0 replies; 132+ messages in thread
From: Sha Zhengju @ 2012-07-04 11:20 UTC (permalink / raw)
  To: Kamezawa Hiroyuki
  Cc: linux-mm, cgroups, gthelen, yinghan, akpm, mhocko, linux-kernel, Sha Zhengju

On 07/04/2012 04:29 PM, Kamezawa Hiroyuki wrote:
> (2012/06/28 20:06), Sha Zhengju wrote:
>> From: Sha Zhengju <handai.szj@taobao.com>
>>
>> While memcg oom happening, the dump info is limited, so add this
>> to provide memcg page stat.
>>
>> Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
> Could you split this into a different series ?
> seems good to me in general but...one concern is hierarchy handling.
>
> IIUC, the passed 'memcg' is the root of hierarchy which gets OOM.
> So, the LRU info, which is local to the root memcg, may not contain any good
> information. I think you should visit all memcg under the tree.
>
Yes, you're right! I did not handle hierarchy here, and I have just made a
test case that proves it. I'll split this out into another series later.

Thanks for reviewing!

Thanks,
Sha
* Re: [PATCH 0/7] Per-cgroup page stat accounting
  2012-06-28 10:54 ` Sha Zhengju
@ 2012-06-29  8:23 ` Kamezawa Hiroyuki
  -1 siblings, 0 replies; 132+ messages in thread
From: Kamezawa Hiroyuki @ 2012-06-29  8:23 UTC (permalink / raw)
  To: Sha Zhengju
  Cc: linux-mm, cgroups, gthelen, yinghan, akpm, mhocko, linux-kernel, Sha Zhengju

(2012/06/28 19:54), Sha Zhengju wrote:
> This patch series provides the ability for each memory cgroup to have independent
> dirty/writeback page stats. This can provide some information for per-cgroup direct
> reclaim. Meanwhile, we add more detailed dump messages for memcg OOMs.
>
> Three features are included in this patch series:
> (0). prepare patches for page accounting
> 1. memcg dirty page accounting
> 2. memcg writeback page accounting
> 3. memcg OOMs dump info
>
> In the (0) prepare patches, we have reworked the vfs set page dirty routines to
> make "modify page info" and "dirty page accounting" stay in one function as much
> as possible, for the sake of the bigger memcg lock.
>
> These patches are cooked based on Andrew's akpm tree.
>
Thank you! It seems good in general. I'll review in detail later.

Do you have any performance comparison between before/after the series?
I mean, set_page_dirty() is the hot path and we should be careful about
adding new accounting.

Thanks,
-Kame

> Sha Zhengju (7):
>   memcg-update-cgroup-memory-document.patch
>   memcg-remove-MEMCG_NR_FILE_MAPPED.patch
>   Make-TestSetPageDirty-and-dirty-page-accounting-in-o.patch
>   Use-vfs-__set_page_dirty-interface-instead-of-doing-.patch
>   memcg-add-per-cgroup-dirty-pages-accounting.patch
>   memcg-add-per-cgroup-writeback-pages-accounting.patch
>   memcg-print-more-detailed-info-while-memcg-oom-happe.patch
>
>  Documentation/cgroups/memory.txt |    2 +
>  fs/buffer.c                      |   36 +++++++++-----
>  fs/ceph/addr.c                   |   20 +-------
>  include/linux/buffer_head.h      |    2 +
>  include/linux/memcontrol.h       |   27 +++++++---
>  mm/filemap.c                     |    5 ++
>  mm/memcontrol.c                  |   99 +++++++++++++++++++++++--------------
>  mm/page-writeback.c              |   42 ++++++++++++++--
>  mm/rmap.c                        |    4 +-
>  mm/truncate.c                    |    6 ++
>  10 files changed, 159 insertions(+), 84 deletions(-)
* Re: [PATCH 0/7] Per-cgroup page stat accounting
  2012-06-29  8:23 ` Kamezawa Hiroyuki
@ 2012-07-02  7:51 ` Sha Zhengju
  -1 siblings, 0 replies; 132+ messages in thread
From: Sha Zhengju @ 2012-07-02  7:51 UTC (permalink / raw)
  To: Kamezawa Hiroyuki
  Cc: linux-mm, cgroups, gthelen, yinghan, akpm, mhocko, linux-kernel, Sha Zhengju

On 06/29/2012 04:23 PM, Kamezawa Hiroyuki wrote:
> (2012/06/28 19:54), Sha Zhengju wrote:
>> This patch series provides the ability for each memory cgroup to have independent
>> dirty/writeback page stats. This can provide some information for per-cgroup direct
>> reclaim. Meanwhile, we add more detailed dump messages for memcg OOMs.
>>
>> Three features are included in this patch series:
>> (0). prepare patches for page accounting
>> 1. memcg dirty page accounting
>> 2. memcg writeback page accounting
>> 3. memcg OOMs dump info
>>
>> In the (0) prepare patches, we have reworked the vfs set page dirty routines to
>> make "modify page info" and "dirty page accounting" stay in one function as much
>> as possible, for the sake of the bigger memcg lock.
>>
>> These patches are cooked based on Andrew's akpm tree.
>>
> Thank you! It seems good in general. I'll review in detail later.
>
> Do you have any performance comparison between before/after the series?
> I mean, set_page_dirty() is the hot path and we should be careful about
> adding new accounting.

Not yet. I sent it out as soon as I worked out this solution, to check
whether it's okay. I can test the series once most people agree with it.

Thanks,
Sha