From: Sage Weil <sage-4GqslpFJ+cxBDgjK7y7TUQ@public.gmane.org> To: Sha Zhengju <handai.szj-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> Cc: "Yan, Zheng" <ukernel-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>, "linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org" <linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>, ceph-devel <ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>, linux-mm <linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org>, Cgroups <cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>, Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>, KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>, Glauber Costa <glommer-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>, Greg Thelen <gthelen-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>, Wu Fengguang <fengguang.wu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>, Andrew Morton <akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>, Sha Zhengju <handai.szj-3b8fjiQLQpfQT0dZR+AlfA@public.gmane.org> Subject: Re: [PATCH V5 2/8] fs/ceph: vfs __set_page_dirty_nobuffers interface instead of doing it inside filesystem Date: Fri, 2 Aug 2013 13:30:22 -0700 (PDT) [thread overview] Message-ID: <alpine.DEB.2.00.1308021326080.1128@cobra.newdream.net> (raw) In-Reply-To: <CAFj3OHVXvtr5BDMrGatHZi7M9y+dh1ZKRMQZGjZmNBcg3pNQtw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> On Fri, 2 Aug 2013, Sha Zhengju wrote: > On Fri, Aug 2, 2013 at 2:27 AM, Sage Weil <sage-4GqslpFJ+cxBDgjK7y7TUQ@public.gmane.org> wrote: > > On Thu, 1 Aug 2013, Yan, Zheng wrote: > >> On Thu, Aug 1, 2013 at 7:51 PM, Sha Zhengju <handai.szj-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote: > >> > From: Sha Zhengju <handai.szj-3b8fjiQLQpfQT0dZR+AlfA@public.gmane.org> > >> > > >> > Following we will begin to add memcg dirty page accounting around > >> __set_page_dirty_ > >> > {buffers,nobuffers} in vfs layer, so we'd better use vfs interface to > >> avoid exporting > >> > those details to filesystems. > >> > > >> > Signed-off-by: Sha Zhengju <handai.szj-3b8fjiQLQpfQT0dZR+AlfA@public.gmane.org> > >> > --- > >> > fs/ceph/addr.c | 13 +------------ > >> > 1 file changed, 1 insertion(+), 12 deletions(-) > >> > > >> > diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c > >> > index 3e68ac1..1445bf1 100644 > >> > --- a/fs/ceph/addr.c > >> > +++ b/fs/ceph/addr.c > >> > @@ -76,7 +76,7 @@ static int ceph_set_page_dirty(struct page *page) > >> > if (unlikely(!mapping)) > >> > return !TestSetPageDirty(page); > >> > > >> > - if (TestSetPageDirty(page)) { > >> > + if (!__set_page_dirty_nobuffers(page)) { > >> it's too early to set the radix tree tag here. We should set page's snapshot > >> context and increase the i_wrbuffer_ref first. This is because once the tag > >> is set, writeback thread can find and start flushing the page. > > > > Unfortunately I only remember being frustrated by this code. :) Looking > > at it now, though, it seems like the minimum fix is to set the > > page->private before marking the page dirty. I don't know the locking > > rules around that, though. If that is potentially racy, maybe the safest > > thing would be if __set_page_dirty_nobuffers() took a void* to set > > page->private to atomically while holding the tree_lock. > > > > Sorry, I don't catch the point of your last sentence... Could you > please explain it again? It didn't make much sense. :) I was worried about multiple callers to set_page_dirty, but as understand it, this all happens under page->lock, right? (There is a mention of other special cases in mm/page-writeback.c, but I'm hoping we don't need to worry about that.) In any case, I suspect what we actually want is something like the below (untested) patch. The snapc accounting can be ignored here because invalidatepage will clean it up... sage diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c index afb2fc2..7602e46 100644 --- a/fs/ceph/addr.c +++ b/fs/ceph/addr.c @@ -76,9 +76,10 @@ static int ceph_set_page_dirty(struct page *page) if (unlikely(!mapping)) return !TestSetPageDirty(page); - if (TestSetPageDirty(page)) { + if (PageDirty(page)) { dout("%p set_page_dirty %p idx %lu -- already dirty\n", mapping->host, page, page->index); + BUG_ON(!PagePrivate(page)); return 0; } @@ -107,35 +108,16 @@ static int ceph_set_page_dirty(struct page *page) snapc, snapc->seq, snapc->num_snaps); spin_unlock(&ci->i_ceph_lock); - /* now adjust page */ - spin_lock_irq(&mapping->tree_lock); - if (page->mapping) { /* Race with truncate? */ - WARN_ON_ONCE(!PageUptodate(page)); - account_page_dirtied(page, page->mapping); - radix_tree_tag_set(&mapping->page_tree, - page_index(page), PAGECACHE_TAG_DIRTY); - - /* - * Reference snap context in page->private. Also set - * PagePrivate so that we get invalidatepage callback. - */ - page->private = (unsigned long)snapc; - SetPagePrivate(page); - } else { - dout("ANON set_page_dirty %p (raced truncate?)\n", page); - undo = 1; - } - - spin_unlock_irq(&mapping->tree_lock); - - if (undo) - /* whoops, we failed to dirty the page */ - ceph_put_wrbuffer_cap_refs(ci, 1, snapc); - - __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); + /* + * Reference snap context in page->private. Also set + * PagePrivate so that we get invalidatepage callback. + */ + BUG_ON(PagePrivate(page)); + page->private = (unsigned long)snapc; + SetPagePrivate(page); - BUG_ON(!PageDirty(page)); - return 1; + return __set_page_dirty_nobuffers(page); } /*
WARNING: multiple messages have this Message-ID (diff)
From: Sage Weil <sage@inktank.com> To: Sha Zhengju <handai.szj@gmail.com> Cc: "Yan, Zheng" <ukernel@gmail.com>, "linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>, ceph-devel <ceph-devel@vger.kernel.org>, linux-mm <linux-mm@kvack.org>, Cgroups <cgroups@vger.kernel.org>, Michal Hocko <mhocko@suse.cz>, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>, Glauber Costa <glommer@gmail.com>, Greg Thelen <gthelen@google.com>, Wu Fengguang <fengguang.wu@intel.com>, Andrew Morton <akpm@linux-foundation.org>, Sha Zhengju <handai.szj@taobao.com> Subject: Re: [PATCH V5 2/8] fs/ceph: vfs __set_page_dirty_nobuffers interface instead of doing it inside filesystem Date: Fri, 2 Aug 2013 13:30:22 -0700 (PDT) [thread overview] Message-ID: <alpine.DEB.2.00.1308021326080.1128@cobra.newdream.net> (raw) In-Reply-To: <CAFj3OHVXvtr5BDMrGatHZi7M9y+dh1ZKRMQZGjZmNBcg3pNQtw@mail.gmail.com> On Fri, 2 Aug 2013, Sha Zhengju wrote: > On Fri, Aug 2, 2013 at 2:27 AM, Sage Weil <sage@inktank.com> wrote: > > On Thu, 1 Aug 2013, Yan, Zheng wrote: > >> On Thu, Aug 1, 2013 at 7:51 PM, Sha Zhengju <handai.szj@gmail.com> wrote: > >> > From: Sha Zhengju <handai.szj@taobao.com> > >> > > >> > Following we will begin to add memcg dirty page accounting around > >> __set_page_dirty_ > >> > {buffers,nobuffers} in vfs layer, so we'd better use vfs interface to > >> avoid exporting > >> > those details to filesystems. > >> > > >> > Signed-off-by: Sha Zhengju <handai.szj@taobao.com> > >> > --- > >> > fs/ceph/addr.c | 13 +------------ > >> > 1 file changed, 1 insertion(+), 12 deletions(-) > >> > > >> > diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c > >> > index 3e68ac1..1445bf1 100644 > >> > --- a/fs/ceph/addr.c > >> > +++ b/fs/ceph/addr.c > >> > @@ -76,7 +76,7 @@ static int ceph_set_page_dirty(struct page *page) > >> > if (unlikely(!mapping)) > >> > return !TestSetPageDirty(page); > >> > > >> > - if (TestSetPageDirty(page)) { > >> > + if (!__set_page_dirty_nobuffers(page)) { > >> it's too early to set the radix tree tag here. We should set page's snapshot > >> context and increase the i_wrbuffer_ref first. This is because once the tag > >> is set, writeback thread can find and start flushing the page. > > > > Unfortunately I only remember being frustrated by this code. :) Looking > > at it now, though, it seems like the minimum fix is to set the > > page->private before marking the page dirty. I don't know the locking > > rules around that, though. If that is potentially racy, maybe the safest > > thing would be if __set_page_dirty_nobuffers() took a void* to set > > page->private to atomically while holding the tree_lock. > > > > Sorry, I don't catch the point of your last sentence... Could you > please explain it again? It didn't make much sense. :) I was worried about multiple callers to set_page_dirty, but as understand it, this all happens under page->lock, right? (There is a mention of other special cases in mm/page-writeback.c, but I'm hoping we don't need to worry about that.) In any case, I suspect what we actually want is something like the below (untested) patch. The snapc accounting can be ignored here because invalidatepage will clean it up... sage diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c index afb2fc2..7602e46 100644 --- a/fs/ceph/addr.c +++ b/fs/ceph/addr.c @@ -76,9 +76,10 @@ static int ceph_set_page_dirty(struct page *page) if (unlikely(!mapping)) return !TestSetPageDirty(page); - if (TestSetPageDirty(page)) { + if (PageDirty(page)) { dout("%p set_page_dirty %p idx %lu -- already dirty\n", mapping->host, page, page->index); + BUG_ON(!PagePrivate(page)); return 0; } @@ -107,35 +108,16 @@ static int ceph_set_page_dirty(struct page *page) snapc, snapc->seq, snapc->num_snaps); spin_unlock(&ci->i_ceph_lock); - /* now adjust page */ - spin_lock_irq(&mapping->tree_lock); - if (page->mapping) { /* Race with truncate? */ - WARN_ON_ONCE(!PageUptodate(page)); - account_page_dirtied(page, page->mapping); - radix_tree_tag_set(&mapping->page_tree, - page_index(page), PAGECACHE_TAG_DIRTY); - - /* - * Reference snap context in page->private. Also set - * PagePrivate so that we get invalidatepage callback. - */ - page->private = (unsigned long)snapc; - SetPagePrivate(page); - } else { - dout("ANON set_page_dirty %p (raced truncate?)\n", page); - undo = 1; - } - - spin_unlock_irq(&mapping->tree_lock); - - if (undo) - /* whoops, we failed to dirty the page */ - ceph_put_wrbuffer_cap_refs(ci, 1, snapc); - - __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); + /* + * Reference snap context in page->private. Also set + * PagePrivate so that we get invalidatepage callback. + */ + BUG_ON(PagePrivate(page)); + page->private = (unsigned long)snapc; + SetPagePrivate(page); - BUG_ON(!PageDirty(page)); - return 1; + return __set_page_dirty_nobuffers(page); } /* -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2013-08-02 20:30 UTC|newest] Thread overview: 47+ messages / expand[flat|nested] mbox.gz Atom feed top 2013-08-01 11:43 [PATCH V5 0/8] Add memcg dirty/writeback page accounting Sha Zhengju 2013-08-01 11:43 ` Sha Zhengju 2013-08-01 11:44 ` [PATCH 1/8] memcg: remove MEMCG_NR_FILE_MAPPED Sha Zhengju 2013-08-01 11:51 ` [PATCH V5 2/8] fs/ceph: vfs __set_page_dirty_nobuffers interface instead of doing it inside filesystem Sha Zhengju 2013-08-01 15:19 ` Yan, Zheng 2013-08-01 18:27 ` Sage Weil 2013-08-02 10:04 ` Sha Zhengju [not found] ` <CAFj3OHVXvtr5BDMrGatHZi7M9y+dh1ZKRMQZGjZmNBcg3pNQtw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2013-08-02 20:30 ` Sage Weil [this message] 2013-08-02 20:30 ` Sage Weil 2013-08-03 8:58 ` Sha Zhengju 2013-08-02 9:04 ` Sha Zhengju 2013-08-02 9:04 ` Sha Zhengju 2013-08-02 13:11 ` Yan, Zheng 2013-08-01 11:52 ` [PATCH V5 3/8] memcg: check for proper lock held in mem_cgroup_update_page_stat Sha Zhengju 2013-08-01 14:34 ` Michal Hocko 2013-08-01 14:34 ` Michal Hocko 2013-08-04 18:48 ` Greg Thelen 2013-08-04 18:48 ` Greg Thelen 2013-08-01 11:53 ` [PATCH V5 4/8] memcg: add per cgroup dirty pages accounting Sha Zhengju 2013-08-01 11:54 ` [PATCH V5 5/8] memcg: add per cgroup writeback " Sha Zhengju 2013-08-01 14:53 ` Michal Hocko [not found] ` <20130801145302.GJ5198-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org> 2013-08-03 9:25 ` Sha Zhengju 2013-08-03 9:25 ` Sha Zhengju [not found] ` <CAFj3OHV-VCKJfe6bv4UMvv+uj4LELDXsieRZFJD06Yrdyy=XxA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2013-08-04 10:08 ` Michal Hocko 2013-08-04 10:08 ` Michal Hocko 2013-08-22 9:46 ` Fwd: " Sha Zhengju 2013-08-22 9:46 ` Sha Zhengju 2013-08-22 9:50 ` [PATCH 1/4] memcg: remove MEMCG_NR_FILE_MAPPED Sha Zhengju 2013-08-22 9:52 ` [PATCH 2/4] memcg: check for proper lock held in mem_cgroup_update_page_stat Sha Zhengju 2013-08-22 9:53 ` [PATCH 3/4] memcg: add per cgroup writeback pages accounting Sha Zhengju 2013-08-22 22:40 ` Andrew Morton 2013-08-23 16:11 ` Sha Zhengju 2013-08-23 16:11 ` Sha Zhengju 2013-08-22 9:53 ` [PATCH 4/4] memcg: Document cgroup dirty/writeback memory statistics Sha Zhengju 2013-08-04 18:51 ` [PATCH V5 5/8] memcg: add per cgroup writeback pages accounting Greg Thelen 2013-08-04 18:51 ` Greg Thelen 2013-08-01 11:55 ` [PATCH V5 6/8] memcg: make nocpu_base available for non-hotplug Sha Zhengju 2013-08-01 12:00 ` [PATCH V5 7/8] memcg: don't account root memcg page stats if only root exists Sha Zhengju 2013-08-01 16:20 ` Johannes Weiner 2013-08-02 4:32 ` Sha Zhengju 2013-08-02 4:32 ` Sha Zhengju 2013-08-05 21:58 ` Johannes Weiner 2013-08-05 21:58 ` Johannes Weiner 2013-08-01 12:00 ` [PATCH V5 8/8] memcg: Document cgroup dirty/writeback memory statistics Sha Zhengju 2013-08-01 12:00 ` Sha Zhengju 2013-08-01 14:43 ` [PATCH V5 0/8] Add memcg dirty/writeback page accounting Michal Hocko 2013-08-03 9:30 ` Sha Zhengju
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=alpine.DEB.2.00.1308021326080.1128@cobra.newdream.net \ --to=sage-4gqslpfj+cxbdgjk7y7tuq@public.gmane.org \ --cc=akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org \ --cc=ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \ --cc=cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \ --cc=fengguang.wu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org \ --cc=glommer-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \ --cc=gthelen-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org \ --cc=handai.szj-3b8fjiQLQpfQT0dZR+AlfA@public.gmane.org \ --cc=handai.szj-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \ --cc=kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org \ --cc=linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \ --cc=linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org \ --cc=mhocko-AlSwsSmVLrQ@public.gmane.org \ --cc=ukernel-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.