On Thu, 1 Aug 2013, Yan, Zheng wrote:
> On Thu, Aug 1, 2013 at 7:51 PM, Sha Zhengju wrote:
> > From: Sha Zhengju
> >
> > Following we will begin to add memcg dirty page accounting around
> > __set_page_dirty_{buffers,nobuffers} in vfs layer, so we'd better use vfs
> > interface to avoid exporting those details to filesystems.
> >
> > Signed-off-by: Sha Zhengju
> > ---
> >  fs/ceph/addr.c |   13 +------------
> >  1 file changed, 1 insertion(+), 12 deletions(-)
> >
> > diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> > index 3e68ac1..1445bf1 100644
> > --- a/fs/ceph/addr.c
> > +++ b/fs/ceph/addr.c
> > @@ -76,7 +76,7 @@ static int ceph_set_page_dirty(struct page *page)
> >         if (unlikely(!mapping))
> >                 return !TestSetPageDirty(page);
> >
> > -       if (TestSetPageDirty(page)) {
> > +       if (!__set_page_dirty_nobuffers(page)) {
>
> It's too early to set the radix tree tag here. We should set the page's
> snapshot context and increase the i_wrbuffer_ref first. This is because once
> the tag is set, the writeback thread can find and start flushing the page.

Unfortunately I only remember being frustrated by this code. :)  Looking
at it now, though, it seems like the minimum fix is to set the
page->private before marking the page dirty. I don't know the locking
rules around that, though. If that is potentially racy, maybe the safest
thing would be if __set_page_dirty_nobuffers() took a void* to set
page->private to atomically while holding the tree_lock.

sage

>
> >                 dout("%p set_page_dirty %p idx %lu -- already dirty\n",
> >                      mapping->host, page, page->index);
> >                 return 0;
> > @@ -107,14 +107,7 @@ static int ceph_set_page_dirty(struct page *page)
> >              snapc, snapc->seq, snapc->num_snaps);
> >         spin_unlock(&ci->i_ceph_lock);
> >
> > -       /* now adjust page */
> > -       spin_lock_irq(&mapping->tree_lock);
> >         if (page->mapping) {    /* Race with truncate? */
> > -               WARN_ON_ONCE(!PageUptodate(page));
> > -               account_page_dirtied(page, page->mapping);
> > -               radix_tree_tag_set(&mapping->page_tree,
> > -                               page_index(page), PAGECACHE_TAG_DIRTY);
> > -
>
> This code was copied from __set_page_dirty_nobuffers(). I think the reason
> Sage did this is to handle the race described in
> __set_page_dirty_nobuffers()'s comment. But I wonder if "page->mapping ==
> NULL" can still happen here, because truncate_inode_page() unmaps the page
> from processes' address spaces first, then deletes the page from the page
> cache.
>
> Regards
> Yan, Zheng
>
> >                 /*
> >                  * Reference snap context in page->private.  Also set
> >                  * PagePrivate so that we get invalidatepage callback.
> > @@ -126,14 +119,10 @@ static int ceph_set_page_dirty(struct page *page)
> >                 undo = 1;
> >         }
> >
> > -       spin_unlock_irq(&mapping->tree_lock);
> >
> >
> > -
> >         if (undo)
> >                 /* whoops, we failed to dirty the page */
> >                 ceph_put_wrbuffer_cap_refs(ci, 1, snapc);
> >
> > -       __mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
> > -
> >         BUG_ON(!PageDirty(page));
> >         return 1;
> >  }
> > --
> > 1.7.9.5
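
To make the ordering concern in this thread concrete, here is a small userspace model of the last idea in the reply: a dirtying helper that takes the private pointer as an argument and stores it before the page becomes findable through the dirty tag, all under one lock. This is only a sketch and not kernel code. Every name in it (model_page, model_mapping, model_set_page_dirty_private, model_writeback_sees_consistent) is hypothetical, the spinlock is a plain C11 atomic_flag standing in for tree_lock, and it assumes the scanner takes the same lock, which may not match how the real page-cache lookup paths behave.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: hypothetical stand-ins, not the kernel's types. */
struct model_mapping {
    atomic_flag tree_lock;      /* stands in for mapping->tree_lock */
};

struct model_page {
    struct model_mapping *mapping;
    void *private_data;         /* stands in for page->private */
    bool dirty;                 /* stands in for the page's dirty flag */
    bool tagged_dirty;          /* stands in for PAGECACHE_TAG_DIRTY */
};

static void model_lock(atomic_flag *l)
{
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
        ;
}

static void model_unlock(atomic_flag *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

/*
 * Hypothetical helper: dirty the page and set its private pointer in one
 * critical section. Returns true if this call dirtied the page, false if
 * it was already dirty (the private pointer is left untouched then).
 */
static bool model_set_page_dirty_private(struct model_page *pg, void *priv)
{
    bool newly_dirty = false;

    assert(pg != NULL && pg->mapping != NULL);

    model_lock(&pg->mapping->tree_lock);
    if (!pg->dirty) {
        pg->private_data = priv;    /* set private first ... */
        pg->dirty = true;
        pg->tagged_dirty = true;    /* ... then expose via the tag */
        newly_dirty = true;
    }
    model_unlock(&pg->mapping->tree_lock);
    return newly_dirty;
}

/*
 * Model of a writeback scan: under the same lock, a page is either not
 * tagged dirty, or tagged with its private pointer already in place.
 */
static bool model_writeback_sees_consistent(struct model_page *pg)
{
    bool ok;

    model_lock(&pg->mapping->tree_lock);
    ok = !pg->tagged_dirty || pg->private_data != NULL;
    model_unlock(&pg->mapping->tree_lock);
    return ok;
}
```

In this model a caller would pass its snap context pointer as priv. A second call on an already dirty page returns false and leaves the first pointer in place, which corresponds to the "already dirty" early return in the quoted patch.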