From: Shiyang Ruan <ruansy.fnst@cn.fujitsu.com>
Subject: [RFC PATCH 1/8] fs/dax: Introduce dax-rmap btree for reflink
Date: Mon, 27 Apr 2020 16:47:43 +0800
Message-ID: <20200427084750.136031-2-ruansy.fnst@cn.fujitsu.com>
In-Reply-To: <20200427084750.136031-1-ruansy.fnst@cn.fujitsu.com>
X-Mailer: git-send-email 2.26.2

Normally, when an mmapped file is accessed and a page fault is taken, the
file's (->mapping, ->index) pair is associated with the dax entry (which
represents one page or a couple of pages) to facilitate the reverse-mapping
search.  But in the case of reflink, a dax entry may be shared by multiple
files or offsets.  To establish a reverse-mapping relationship in this
case, I introduce an rb-tree to track the multiple files and offsets.
The root of the rb-tree is stored in page->private, since I haven't found
it used in fsdax.  We create the rb-tree and insert the (->mapping,
->index) tuple the second time a dax entry is associated, which means the
dax entry is shared.  The tuple is deleted from the rb-tree when
disassociating.

Signed-off-by: Shiyang Ruan <ruansy.fnst@cn.fujitsu.com>
---
 fs/dax.c            | 153 ++++++++++++++++++++++++++++++++++++++++----
 include/linux/dax.h |   6 ++
 2 files changed, 147 insertions(+), 12 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 11b16729b86f..2f996c566103 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -25,6 +25,7 @@
 #include
 #include
 #include
+#include
 #include
 
 #define CREATE_TRACE_POINTS
@@ -310,6 +311,120 @@ static unsigned long dax_entry_size(void *entry)
 		return PAGE_SIZE;
 }
 
+static struct kmem_cache *dax_rmap_node_cachep;
+static struct kmem_cache *dax_rmap_root_cachep;
+
+static int __init init_dax_rmap_cache(void)
+{
+	dax_rmap_root_cachep = KMEM_CACHE(rb_root_cached, SLAB_PANIC|SLAB_ACCOUNT);
+	dax_rmap_node_cachep = KMEM_CACHE(shared_file, SLAB_PANIC|SLAB_ACCOUNT);
+	return 0;
+}
+fs_initcall(init_dax_rmap_cache);
+
+struct rb_root_cached *dax_create_rbroot(void)
+{
+	struct rb_root_cached *root = kmem_cache_alloc(dax_rmap_root_cachep,
+						       GFP_KERNEL);
+
+	memset(root, 0, sizeof(struct rb_root_cached));
+	return root;
+}
+
+static bool dax_rmap_insert(struct page *page, struct address_space *mapping,
+			    pgoff_t index)
+{
+	struct rb_root_cached *root = (struct rb_root_cached *)page_private(page);
+	struct rb_node **new, *parent = NULL;
+	struct shared_file *p;
+	bool leftmost = true;
+
+	if (!root) {
+		root = dax_create_rbroot();
+		set_page_private(page, (unsigned long)root);
+		dax_rmap_insert(page, page->mapping, page->index);
+	}
+	new = &root->rb_root.rb_node;
+	/* Figure out where to insert new node */
+	while (*new) {
+		struct shared_file *this = container_of(*new, struct shared_file, node);
+		long result = (long)mapping - (long)this->mapping;
+
+		if (result == 0)
+			result = (long)index - (long)this->index;
+		parent = *new;
+		if (result < 0)
+			new = &((*new)->rb_left);
+		else if (result > 0) {
+			new = &((*new)->rb_right);
+			leftmost = false;
+		} else
+			return false;
+	}
+	p = kmem_cache_alloc(dax_rmap_node_cachep, GFP_KERNEL);
+	p->mapping = mapping;
+	p->index = index;
+
+	/* Add new node and rebalance tree. */
+	rb_link_node(&p->node, parent, new);
+	rb_insert_color_cached(&p->node, root, leftmost);
+
+	return true;
+}
+
+static struct shared_file *dax_rmap_search(struct page *page,
+					   struct address_space *mapping,
+					   pgoff_t index)
+{
+	struct rb_root_cached *root = (struct rb_root_cached *)page_private(page);
+	struct rb_node *node = root->rb_root.rb_node;
+
+	while (node) {
+		struct shared_file *this = container_of(node, struct shared_file, node);
+		long result = (long)mapping - (long)this->mapping;
+
+		if (result == 0)
+			result = (long)index - (long)this->index;
+		if (result < 0)
+			node = node->rb_left;
+		else if (result > 0)
+			node = node->rb_right;
+		else
+			return this;
+	}
+	return NULL;
+}
+
+static void dax_rmap_delete(struct page *page, struct address_space *mapping,
+			    pgoff_t index)
+{
+	struct rb_root_cached *root = (struct rb_root_cached *)page_private(page);
+	struct shared_file *this;
+
+	if (!root) {
+		page->mapping = NULL;
+		page->index = 0;
+		return;
+	}
+
+	this = dax_rmap_search(page, mapping, index);
+	rb_erase_cached(&this->node, root);
+	kmem_cache_free(dax_rmap_node_cachep, this);
+
+	if (!RB_EMPTY_ROOT(&root->rb_root)) {
+		if (page->mapping == mapping && page->index == index) {
+			this = container_of(rb_first_cached(root), struct shared_file, node);
+			page->mapping = this->mapping;
+			page->index = this->index;
+		}
+	} else {
+		kmem_cache_free(dax_rmap_root_cachep, root);
+		set_page_private(page, 0);
+		page->mapping = NULL;
+		page->index = 0;
+	}
+}
+
 static unsigned long dax_end_pfn(void *entry)
 {
 	return dax_to_pfn(entry) + dax_entry_size(entry) / PAGE_SIZE;
@@ -341,16 +456,20 @@ static void dax_associate_entry(void *entry, struct address_space *mapping,
 	for_each_mapped_pfn(entry, pfn) {
 		struct page *page = pfn_to_page(pfn);
 
-		WARN_ON_ONCE(page->mapping);
-		page->mapping = mapping;
-		page->index = index + i++;
+		if (!page->mapping) {
+			page->mapping = mapping;
+			page->index = index + i++;
+		} else {
+			dax_rmap_insert(page, mapping, index + i++);
+		}
 	}
 }
 
 static void dax_disassociate_entry(void *entry, struct address_space *mapping,
-		bool trunc)
+		pgoff_t index, bool trunc)
 {
 	unsigned long pfn;
+	int i = 0;
 
 	if (IS_ENABLED(CONFIG_FS_DAX_LIMITED))
 		return;
@@ -359,9 +478,19 @@ static void dax_disassociate_entry(void *entry, struct address_space *mapping,
 		struct page *page = pfn_to_page(pfn);
 
 		WARN_ON_ONCE(trunc && page_ref_count(page) > 1);
-		WARN_ON_ONCE(page->mapping && page->mapping != mapping);
-		page->mapping = NULL;
-		page->index = 0;
+		WARN_ON_ONCE(!page->mapping);
+		dax_rmap_delete(page, mapping, index + i++);
+	}
+}
+
+static void __dax_decrease_nrexceptional(void *entry,
+		struct address_space *mapping)
+{
+	if (dax_is_empty_entry(entry) || dax_is_zero_entry(entry) ||
+	    dax_is_pmd_entry(entry)) {
+		mapping->nrexceptional--;
+	} else {
+		mapping->nrexceptional -= PHYS_PFN(dax_entry_size(entry));
 	}
 }
 
@@ -522,10 +651,10 @@ static void *grab_mapping_entry(struct xa_state *xas,
 			xas_lock_irq(xas);
 		}
 
-		dax_disassociate_entry(entry, mapping, false);
+		dax_disassociate_entry(entry, mapping, index, false);
 		xas_store(xas, NULL);	/* undo the PMD join */
 		dax_wake_entry(xas, entry, true);
-		mapping->nrexceptional--;
+		__dax_decrease_nrexceptional(entry, mapping);
 		entry = NULL;
 		xas_set(xas, index);
 	}
@@ -642,9 +771,9 @@ static int __dax_invalidate_entry(struct address_space *mapping,
 	    (xas_get_mark(&xas, PAGECACHE_TAG_DIRTY) ||
 	     xas_get_mark(&xas, PAGECACHE_TAG_TOWRITE)))
 		goto out;
-	dax_disassociate_entry(entry, mapping, trunc);
+	dax_disassociate_entry(entry, mapping, index, trunc);
 	xas_store(&xas, NULL);
-	mapping->nrexceptional--;
+	__dax_decrease_nrexceptional(entry, mapping);
 	ret = 1;
 out:
 	put_unlocked_entry(&xas, entry);
@@ -737,7 +866,7 @@ static void *dax_insert_entry(struct xa_state *xas,
 	if (dax_is_zero_entry(entry) || dax_is_empty_entry(entry)) {
 		void *old;
 
-		dax_disassociate_entry(entry, mapping, false);
+		dax_disassociate_entry(entry, mapping, xas->xa_index, false);
 		dax_associate_entry(new_entry, mapping, vmf->vma, vmf->address);
 		/*
 		 * Only swap our new entry into the page cache if the current
diff --git a/include/linux/dax.h b/include/linux/dax.h
index d7af5d243f24..1e2e81c701b6 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -39,6 +39,12 @@ struct dax_operations {
 	int (*zero_page_range)(struct dax_device *, pgoff_t, size_t);
 };
 
+struct shared_file {
+	struct address_space *mapping;
+	pgoff_t index;
+	struct rb_node node;
+};
+
 extern struct attribute_group dax_attribute_group;
 
 #if IS_ENABLED(CONFIG_DAX)
-- 
2.26.2