From: Sage Weil <sage@inktank.com>
To: Sha Zhengju <handai.szj@gmail.com>
Cc: linux-mm@kvack.org, cgroups@vger.kernel.org,
	kamezawa.hiroyu@jp.fujitsu.com, gthelen@google.com,
	yinghan@google.com, akpm@linux-foundation.org, mhocko@suse.cz,
	linux-kernel@vger.kernel.org, torvalds@linux-foundation.org,
	viro@zeniv.linux.org.uk, linux-fsdevel@vger.kernel.org,
	sage@newdream.net, ceph-devel@vger.kernel.org,
	Sha Zhengju <handai.szj@taobao.com>
Subject: Re: [PATCH 4/7] Use vfs __set_page_dirty interface instead of doing it inside filesystem
Date: Mon, 2 Jul 2012 07:49:33 -0700 (PDT)
Message-ID: <Pine.LNX.4.64.1207020745180.23342@cobra.newdream.net>
In-Reply-To: <4FF15782.5090807@gmail.com>

On Mon, 2 Jul 2012, Sha Zhengju wrote:
> On 06/29/2012 01:21 PM, Sage Weil wrote:
> > On Thu, 28 Jun 2012, Sha Zhengju wrote:
> >
> > > From: Sha Zhengju<handai.szj@taobao.com>
> > >
> > > Following we will treat SetPageDirty and dirty page accounting as an
> > > integrated
> > > operation. Filesystems had better use vfs interface directly to avoid
> > > those details.
> > >
> > > Signed-off-by: Sha Zhengju<handai.szj@taobao.com>
> > > ---
> > >  fs/buffer.c                 |    2 +-
> > >  fs/ceph/addr.c              |   20 ++------------------
> > >  include/linux/buffer_head.h |    2 ++
> > >  3 files changed, 5 insertions(+), 19 deletions(-)
> > >
> > > diff --git a/fs/buffer.c b/fs/buffer.c
> > > index e8d96b8..55522dd 100644
> > > --- a/fs/buffer.c
> > > +++ b/fs/buffer.c
> > > @@ -610,7 +610,7 @@ EXPORT_SYMBOL(mark_buffer_dirty_inode);
> > >   * If warn is true, then emit a warning if the page is not uptodate and
> > > has
> > >   * not been truncated.
> > >   */
> > > -static int __set_page_dirty(struct page *page,
> > > +int __set_page_dirty(struct page *page,
> > >  		struct address_space *mapping, int warn)
> > >  {
> > >  	if (unlikely(!mapping))
> > This also needs an EXPORT_SYMBOL(__set_page_dirty) to allow ceph to
> > continue to build as a module.
> >
> > With that fixed, the ceph bits are a welcome cleanup!
> >
> > Acked-by: Sage Weil<sage@inktank.com>
>
> Further, I check the path again and may it be reworked as follows to avoid
> undo?
>
> __set_page_dirty();                      __set_page_dirty();
> ceph operations;              ==>        if (page->mapping)
> if (page->mapping)                           ceph operations;
>     ;
> else
>     undo = 1;
> if (undo)
>     xxx;

Yep.  Taking another look at the original code, though, I'm worried that
one reason the __set_page_dirty() actions were spread out the way they are
is because we wanted to ensure that the ceph operations were always
performed when PagePrivate was set.

It looks like invalidatepage won't get called if private isn't set, and
presumably it handles the truncate race with __set_page_dirty() properly
(right?).  What about writeback?  Do we need to worry about writepage[s]
getting called with a NULL page->private?

Thanks!
sage

>
>
>
> Thanks,
> Sha
> > > diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> > > index 8b67304..d028fbe 100644
> > > --- a/fs/ceph/addr.c
> > > +++ b/fs/ceph/addr.c
> > > @@ -5,6 +5,7 @@
> > >  #include<linux/mm.h>
> > >  #include<linux/pagemap.h>
> > >  #include<linux/writeback.h>	/* generic_writepages */
> > > +#include<linux/buffer_head.h>
> > >  #include<linux/slab.h>
> > >  #include<linux/pagevec.h>
> > >  #include<linux/task_io_accounting_ops.h>
> > > @@ -73,14 +74,8 @@ static int ceph_set_page_dirty(struct page *page)
> > >  	int undo = 0;
> > >  	struct ceph_snap_context *snapc;
> > >
> > > -	if (unlikely(!mapping))
> > > -		return !TestSetPageDirty(page);
> > > -
> > > -	if (TestSetPageDirty(page)) {
> > > -		dout("%p set_page_dirty %p idx %lu -- already dirty\n",
> > > -		     mapping->host, page, page->index);
> > > +	if (!__set_page_dirty(page, mapping, 1))
> > >  		return 0;
> > > -	}
> > >
> > >  	inode = mapping->host;
> > >  	ci = ceph_inode(inode);
> > > @@ -107,14 +102,7 @@ static int ceph_set_page_dirty(struct page *page)
> > >  	     snapc, snapc->seq, snapc->num_snaps);
> > >  	spin_unlock(&ci->i_ceph_lock);
> > >
> > > -	/* now adjust page */
> > > -	spin_lock_irq(&mapping->tree_lock);
> > >  	if (page->mapping) {	/* Race with truncate? */
> > > -		WARN_ON_ONCE(!PageUptodate(page));
> > > -		account_page_dirtied(page, page->mapping);
> > > -		radix_tree_tag_set(&mapping->page_tree,
> > > -				page_index(page), PAGECACHE_TAG_DIRTY);
> > > -
> > >  		/*
> > >  		 * Reference snap context in page->private.  Also set
> > >  		 * PagePrivate so that we get invalidatepage callback.
> > > @@ -126,14 +114,10 @@ static int ceph_set_page_dirty(struct page *page)
> > >  		undo = 1;
> > >  	}
> > >
> > > -	spin_unlock_irq(&mapping->tree_lock);
> > > -
> > >  	if (undo)
> > >  		/* whoops, we failed to dirty the page */
> > >  		ceph_put_wrbuffer_cap_refs(ci, 1, snapc);
> > >
> > > -	__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
> > > -
> > >  	BUG_ON(!PageDirty(page));
> > >  	return 1;
> > >  }
> > > diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
> > > index 458f497..0a331a8 100644
> > > --- a/include/linux/buffer_head.h
> > > +++ b/include/linux/buffer_head.h
> > > @@ -336,6 +336,8 @@ static inline void lock_buffer(struct buffer_head *bh)
> > >  }
> > >
> > >  extern int __set_page_dirty_buffers(struct page *page);
> > > +extern int __set_page_dirty(struct page *page,
> > > +		struct address_space *mapping, int warn);
> > >
> > >  #else /* CONFIG_BLOCK */
> > >
> > > --
> > > 1.7.1
> > >
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel"
> > > in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > >
> > >
> >
>
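
For readers following the control-flow question above, here is a minimal
userspace sketch of the reworked ordering (dirty the page first, then do
the filesystem-specific work only if the page still has a mapping, with no
undo flag). Everything in it is a hypothetical stand-in: struct mock_page,
struct mock_mapping, mock_set_page_dirty() and mock_fs_set_page_dirty() are
not kernel or ceph names, the snap-context pointer is just an opaque
non-NULL pointer, and no locking or concurrency is modeled, so it cannot
show the truncate race itself. It only illustrates the invariant asked
about: the "private" flag is set exactly when the private pointer is
non-NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an address_space; only counts dirty pages. */
struct mock_mapping {
	int dirty_pages;
};

/* Hypothetical stand-in for struct page. */
struct mock_page {
	struct mock_mapping *mapping;	/* NULL models a truncated page */
	bool dirty;			/* models PageDirty */
	bool has_private;		/* models PagePrivate */
	void *private_data;		/* models page->private */
};

/*
 * Models the combined "test-and-set dirty bit plus accounting" step.
 * Returns 1 if the page was newly dirtied, 0 if it was already dirty.
 */
static int mock_set_page_dirty(struct mock_page *page)
{
	if (page->dirty)
		return 0;
	page->dirty = true;
	if (page->mapping)
		page->mapping->dirty_pages++;
	return 1;
}

/*
 * Reworked ordering: generic dirtying first, then filesystem-specific
 * state only when the page still belongs to a mapping. snapc must be
 * non-NULL in this sketch.
 */
static int mock_fs_set_page_dirty(struct mock_page *page, void *snapc)
{
	if (!mock_set_page_dirty(page))
		return 0;

	if (page->mapping) {
		page->private_data = snapc;
		page->has_private = true;
	}

	/* The flag and the pointer must never disagree. */
	assert(page->has_private == (page->private_data != NULL));
	return 1;
}
```

A caller would build a mock_page with or without a mapping and call
mock_fs_set_page_dirty() on it. With a NULL mapping the page still ends up
dirty but carries no private state, which is the case the writeback
question is about.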