From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Yan, Zheng"
Subject: Re: [PATCH V5 2/8] fs/ceph: vfs __set_page_dirty_nobuffers interface instead of doing it inside filesystem
Date: Thu, 1 Aug 2013 23:19:48 +0800
Message-ID:
References: <1375357402-9811-1-git-send-email-handai.szj@taobao.com> <1375357892-10188-1-git-send-email-handai.szj@taobao.com>
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary=089e0160a6f277e6f204e2e45f4f
Cc: "linux-fsdevel@vger.kernel.org", ceph-devel, linux-mm, cgroups@vger.kernel.org, Sage Weil, mhocko@suse.cz, kamezawa.hiroyu@jp.fujitsu.com, glommer@gmail.com, Greg Thelen, Wu Fengguang, Andrew Morton, Sha Zhengju
To: Sha Zhengju
Return-path:
In-Reply-To: <1375357892-10188-1-git-send-email-handai.szj@taobao.com>
Sender: owner-linux-mm@kvack.org
List-Id: linux-fsdevel.vger.kernel.org

--089e0160a6f277e6f204e2e45f4f
Content-Type: text/plain; charset=ISO-8859-1

On Thu, Aug 1, 2013 at 7:51 PM, Sha Zhengju wrote:
> From: Sha Zhengju
>
> Following we will begin to add memcg dirty page accounting around __set_page_dirty_
> {buffers,nobuffers} in vfs layer, so we'd better use vfs interface to avoid exporting
> those details to filesystems.
>
> Signed-off-by: Sha Zhengju
> ---
>  fs/ceph/addr.c |   13 +------------
>  1 file changed, 1 insertion(+), 12 deletions(-)
>
> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> index 3e68ac1..1445bf1 100644
> --- a/fs/ceph/addr.c
> +++ b/fs/ceph/addr.c
> @@ -76,7 +76,7 @@ static int ceph_set_page_dirty(struct page *page)
>         if (unlikely(!mapping))
>                 return !TestSetPageDirty(page);
>
> -       if (TestSetPageDirty(page)) {
> +       if (!__set_page_dirty_nobuffers(page)) {

It's too early to set the radix tree tag here. We should set the page's
snapshot context and increase i_wrbuffer_ref first, because once the tag
is set, the writeback thread can find the page and start flushing it.

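To make the ordering concern concrete, here is a minimal userspace sketch, not kernel code. Every name in it (toy_page, toy_inode, toy_tag_dirty and so on) is hypothetical and only models the relationship between the dirty tag, the snap context, and i_wrbuffer_ref. It assumes the worst case, in which writeback notices the page the instant the tag becomes visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for kernel objects. Every name here is hypothetical. */
struct toy_snap_context {
    int seq;
};

struct toy_page {
    bool dirty_tag;                  /* models PAGECACHE_TAG_DIRTY */
    struct toy_snap_context *snapc;  /* models the snap context in page->private */
};

struct toy_inode {
    int wrbuffer_ref;                /* models ci->i_wrbuffer_ref */
    int unprepared_flushes;          /* times writeback saw a half-set-up page */
};

/*
 * Set the dirty tag and model the worst case: writeback finds the page
 * the instant the tag becomes visible, so it sees whatever state exists now.
 */
static void toy_tag_dirty(struct toy_inode *ci, struct toy_page *page)
{
    page->dirty_tag = true;
    if (page->snapc == NULL || ci->wrbuffer_ref == 0)
        ci->unprepared_flushes++;
}

/* Ordering in the patch as posted: tag first, Ceph state afterwards. */
static void toy_dirty_tag_first(struct toy_inode *ci, struct toy_page *page,
                                struct toy_snap_context *snapc)
{
    toy_tag_dirty(ci, page);
    ci->wrbuffer_ref++;
    page->snapc = snapc;
}

/* Suggested ordering: Ceph state first, tag last. */
static void toy_dirty_prepare_first(struct toy_inode *ci, struct toy_page *page,
                                    struct toy_snap_context *snapc)
{
    ci->wrbuffer_ref++;
    page->snapc = snapc;
    toy_tag_dirty(ci, page);
}
```

In this model, toy_dirty_tag_first() lets the simulated writeback observe a tagged page with no snap context and no wrbuffer reference, while toy_dirty_prepare_first() does not. The real code also has to deal with locking and the truncate race, which this sketch leaves out.
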
>                 dout("%p set_page_dirty %p idx %lu -- already dirty\n",
>                      mapping->host, page, page->index);
>                 return 0;
> @@ -107,14 +107,7 @@ static int ceph_set_page_dirty(struct page *page)
>              snapc, snapc->seq, snapc->num_snaps);
>         spin_unlock(&ci->i_ceph_lock);
>
> -       /* now adjust page */
> -       spin_lock_irq(&mapping->tree_lock);
>         if (page->mapping) {    /* Race with truncate? */
> -               WARN_ON_ONCE(!PageUptodate(page));
> -               account_page_dirtied(page, page->mapping);
> -               radix_tree_tag_set(&mapping->page_tree,
> -                               page_index(page), PAGECACHE_TAG_DIRTY);
> -

This code was copied from __set_page_dirty_nobuffers(). I think the reason
Sage did this is to handle the race described in the comment of
__set_page_dirty_nobuffers(). But I wonder whether "page->mapping == NULL"
can still happen here, because truncate_inode_page() first unmaps the page
from processes' address spaces and only then deletes it from the page cache.

Regards
Yan, Zheng

>                 /*
>                  * Reference snap context in page->private.  Also set
>                  * PagePrivate so that we get invalidatepage callback.
> @@ -126,14 +119,10 @@ static int ceph_set_page_dirty(struct page *page)
>                 undo = 1;
>         }
>
> -       spin_unlock_irq(&mapping->tree_lock);
> -
>         if (undo)
>                 /* whoops, we failed to dirty the page */
>                 ceph_put_wrbuffer_cap_refs(ci, 1, snapc);
>
> -       __mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
> -
>         BUG_ON(!PageDirty(page));
>         return 1;
>  }
> --
> 1.7.9.5
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--089e0160a6f277e6f204e2e45f4f
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
--089e0160a6f277e6f204e2e45f4f--