From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 91840C433E2 for ; Thu, 3 Sep 2020 18:30:51 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 33B6B20709 for ; Thu, 3 Sep 2020 18:30:51 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="sAAtpltZ" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 33B6B20709 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id ACCD86B007D; Thu, 3 Sep 2020 14:30:50 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id A56E86B007E; Thu, 3 Sep 2020 14:30:50 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9466F900004; Thu, 3 Sep 2020 14:30:50 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0199.hostedemail.com [216.40.44.199]) by kanga.kvack.org (Postfix) with ESMTP id 7AAE16B007D for ; Thu, 3 Sep 2020 14:30:50 -0400 (EDT) Received: from smtpin25.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 3F417181AC553 for ; Thu, 3 Sep 2020 18:30:50 +0000 (UTC) X-FDA: 77222591460.25.stick25_3a05f8a270ab Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin25.hostedemail.com (Postfix) with ESMTP id 048CD1804E3A1 for ; Thu, 3 Sep 2020 18:30:49 +0000 (UTC) X-HE-Tag: stick25_3a05f8a270ab X-Filterd-Recvd-Size: 12763 Received: from casper.infradead.org (casper.infradead.org [90.155.50.34]) by imf43.hostedemail.com (Postfix) with ESMTP for ; Thu, 3 Sep 2020 18:30:49 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Content-Transfer-Encoding:MIME-Version: References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender:Reply-To: Content-Type:Content-ID:Content-Description; bh=hKrhCxTanHWfnS47+fIzjV5nGFGB+4cjBWPVZgEfQyU=; b=sAAtpltZ6LDjU/WkgjOMBSpYr8 lEfOuJgwqHe2Ir3tEG9jqeHIsGw1WjLGL4MOxO87faKfXwkn35CLsrLcVn0aN5etKJLfxmdtHQOQF 5RtZhqe23s9V+8v5uTY3NYKCHLGnVSAY2+DlHZQ+pT3GhBqvJzeDrfzZ1Ick5wFZ3Wx0lofEB+NXE ayCCOLR8rWCCiQrthhr3iYBlnkRAn7FJbaQL5RviU+TpNvFbzB5Kd7uZTfCU7XCX1E908Ox1svCYd WfZu1K0heFRoxQ0zKKCS8LlF0BelW4mbi3YDtrU9+PQAvCoMnfZUWypNhrJv8PxCqTpFP3LtyQC1I f11b1w7g==; Received: from willy by casper.infradead.org with local (Exim 4.92.3 #3 (Red Hat Linux)) id 1kDu0R-0003u4-5O; Thu, 03 Sep 2020 18:30:43 +0000 From: "Matthew Wilcox (Oracle)" To: Andrew Morton Cc: "Matthew Wilcox (Oracle)" , linux-mm@kvack.org, Song Liu , "Kirill A . Shutemov" , Qian Cai Subject: [PATCH 2/3] XArray: Add xas_split Date: Thu, 3 Sep 2020 19:30:28 +0100 Message-Id: <20200903183029.14930-3-willy@infradead.org> X-Mailer: git-send-email 2.21.3 In-Reply-To: <20200903183029.14930-1-willy@infradead.org> References: <20200903183029.14930-1-willy@infradead.org> MIME-Version: 1.0 X-Rspamd-Queue-Id: 048CD1804E3A1 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam02 Content-Transfer-Encoding: quoted-printable X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: In order to use multi-index entries for huge pages in the page cache, we need to be able to split a multi-index entry (eg if a file is truncated in the middle of a huge page entry). This version does not support splitting more than one level of the tree at a time. This is an acceptable limitation for the page cache as we do not expect to support order-12 pages in the near future. Signed-off-by: Matthew Wilcox (Oracle) --- Documentation/core-api/xarray.rst | 16 +-- include/linux/xarray.h | 13 +++ lib/test_xarray.c | 41 ++++++++ lib/xarray.c | 157 ++++++++++++++++++++++++++++-- 4 files changed, 211 insertions(+), 16 deletions(-) diff --git a/Documentation/core-api/xarray.rst b/Documentation/core-api/x= array.rst index 640934b6f7b4..a137a0e6d068 100644 --- a/Documentation/core-api/xarray.rst +++ b/Documentation/core-api/xarray.rst @@ -475,13 +475,15 @@ or iterations will move the index to the first inde= x in the range. Each entry will only be returned once, no matter how many indices it occupies. =20 -Using xas_next() or xas_prev() with a multi-index xa_state -is not supported. Using either of these functions on a multi-index entr= y -will reveal sibling entries; these should be skipped over by the caller. - -Storing ``NULL`` into any index of a multi-index entry will set the entr= y -at every index to ``NULL`` and dissolve the tie. Splitting a multi-inde= x -entry into entries occupying smaller ranges is not yet supported. +Using xas_next() or xas_prev() with a multi-index xa_state is not +supported. Using either of these functions on a multi-index entry will +reveal sibling entries; these should be skipped over by the caller. + +Storing ``NULL`` into any index of a multi-index entry will set the +entry at every index to ``NULL`` and dissolve the tie. A multi-index +entry can be split into entries occupying smaller ranges by calling +xas_split_alloc() without the xa_lock held, followed by taking the lock +and calling xas_split(). =20 Functions and structures =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D diff --git a/include/linux/xarray.h b/include/linux/xarray.h index 5b7f4ebcf4ff..5cdf441f6377 100644 --- a/include/linux/xarray.h +++ b/include/linux/xarray.h @@ -1507,11 +1507,24 @@ void xas_create_range(struct xa_state *); =20 #ifdef CONFIG_XARRAY_MULTI int xa_get_order(struct xarray *, unsigned long index); +void xas_split(struct xa_state *, void *entry, unsigned int order); +void xas_split_alloc(struct xa_state *, void *entry, unsigned int order,= gfp_t); #else static inline int xa_get_order(struct xarray *xa, unsigned long index) { return 0; } + +static inline void xas_split(struct xa_state *xas, void *entry, + unsigned int order) +{ + xas_store(xas, entry); +} + +static inline void xas_split_alloc(struct xa_state *xas, void *entry, + unsigned int order, gfp_t gfp) +{ +} #endif =20 /** diff --git a/lib/test_xarray.c b/lib/test_xarray.c index bdd4d7995f79..16fe5adcac62 100644 --- a/lib/test_xarray.c +++ b/lib/test_xarray.c @@ -1503,6 +1503,46 @@ static noinline void check_store_range(struct xarr= ay *xa) } } =20 +#ifdef CONFIG_XARRAY_MULTI +static void check_split_1(struct xarray *xa, unsigned long index, + unsigned int order) +{ + XA_STATE(xas, xa, index); + void *entry; + unsigned int i =3D 0; + + xa_store_order(xa, index, order, xa, GFP_KERNEL); + + xas_split_alloc(&xas, xa, order, GFP_KERNEL); + xas_lock(&xas); + xas_split(&xas, xa, order); + xas_unlock(&xas); + + xa_for_each(xa, index, entry) { + XA_BUG_ON(xa, entry !=3D xa); + i++; + } + XA_BUG_ON(xa, i !=3D 1 << order); + + xa_destroy(xa); +} + +static noinline void check_split(struct xarray *xa) +{ + unsigned int order; + + XA_BUG_ON(xa, !xa_empty(xa)); + + for (order =3D 1; order < 2 * XA_CHUNK_SHIFT; order++) { + check_split_1(xa, 0, order); + check_split_1(xa, 1UL << order, order); + check_split_1(xa, 3UL << order, order); + } +} +#else +static void check_split(struct xarray *xa) { } +#endif + static void check_align_1(struct xarray *xa, char *name) { int i; @@ -1729,6 +1769,7 @@ static int xarray_checks(void) check_store_range(&array); check_store_iter(&array); check_align(&xa0); + check_split(&array); =20 check_workingset(&array, 0); check_workingset(&array, 64); diff --git a/lib/xarray.c b/lib/xarray.c index b8cc769cefc0..ff3652657bd8 100644 --- a/lib/xarray.c +++ b/lib/xarray.c @@ -266,13 +266,15 @@ static void xa_node_free(struct xa_node *node) */ static void xas_destroy(struct xa_state *xas) { - struct xa_node *node =3D xas->xa_alloc; + struct xa_node *next, *node =3D xas->xa_alloc; =20 - if (!node) - return; - XA_NODE_BUG_ON(node, !list_empty(&node->private_list)); - kmem_cache_free(radix_tree_node_cachep, node); - xas->xa_alloc =3D NULL; + while (node) { + XA_NODE_BUG_ON(node, !list_empty(&node->private_list)); + next =3D rcu_dereference_raw(node->parent); + /* XXX: need to free children */ + kmem_cache_free(radix_tree_node_cachep, node); + xas->xa_alloc =3D node =3D next; + } } =20 /** @@ -304,6 +306,7 @@ bool xas_nomem(struct xa_state *xas, gfp_t gfp) xas->xa_alloc =3D kmem_cache_alloc(radix_tree_node_cachep, gfp); if (!xas->xa_alloc) return false; + xas->xa_alloc->parent =3D NULL; XA_NODE_BUG_ON(xas->xa_alloc, !list_empty(&xas->xa_alloc->private_list)= ); xas->xa_node =3D XAS_RESTART; return true; @@ -339,6 +342,7 @@ static bool __xas_nomem(struct xa_state *xas, gfp_t g= fp) } if (!xas->xa_alloc) return false; + xas->xa_alloc->parent =3D NULL; XA_NODE_BUG_ON(xas->xa_alloc, !list_empty(&xas->xa_alloc->private_list)= ); xas->xa_node =3D XAS_RESTART; return true; @@ -403,7 +407,7 @@ static unsigned long xas_size(const struct xa_state *= xas) /* * Use this to calculate the maximum index that will need to be created * in order to add the entry described by @xas. Because we cannot store= a - * multiple-index entry at index 0, the calculation is a little more com= plex + * multi-index entry at index 0, the calculation is a little more comple= x * than you might expect. */ static unsigned long xas_max(struct xa_state *xas) @@ -946,6 +950,141 @@ void xas_init_marks(const struct xa_state *xas) } EXPORT_SYMBOL_GPL(xas_init_marks); =20 +#ifdef CONFIG_XARRAY_MULTI +static unsigned int node_get_marks(struct xa_node *node, unsigned int of= fset) +{ + unsigned int marks =3D 0; + xa_mark_t mark =3D XA_MARK_0; + + for (;;) { + if (node_get_mark(node, offset, mark)) + marks |=3D 1 << (__force unsigned int)mark; + if (mark =3D=3D XA_MARK_MAX) + break; + mark_inc(mark); + } + + return marks; +} + +static void node_set_marks(struct xa_node *node, unsigned int offset, + struct xa_node *child, unsigned int marks) +{ + xa_mark_t mark =3D XA_MARK_0; + + for (;;) { + if (marks & (1 << (__force unsigned int)mark)) + node_set_mark(node, offset, mark); + if (child) + node_mark_all(child, mark); + if (mark =3D=3D XA_MARK_MAX) + break; + mark_inc(mark); + } +} + +/** + * Allocate memory for splitting an entry of @order size into the order + * stored in the @xas. + */ +void xas_split_alloc(struct xa_state *xas, void *entry, unsigned int ord= er, + gfp_t gfp) +{ + unsigned int sibs =3D (1 << (order % XA_CHUNK_SHIFT)) - 1; + unsigned int mask =3D xas->xa_sibs; + + /* XXX: no support for splitting really large entries yet */ + if (WARN_ON(xas->xa_shift + 2 * XA_CHUNK_SHIFT < order)) + goto nomem; + if (xas->xa_shift + XA_CHUNK_SHIFT > order) + return; + + do { + unsigned int i; + void *sibling; + struct xa_node *node; + + node =3D kmem_cache_alloc(radix_tree_node_cachep, gfp); + if (!node) + goto nomem; + node->array =3D xas->xa; + for (i =3D 0; i < XA_CHUNK_SIZE; i++) { + if ((i & mask) =3D=3D 0) { + RCU_INIT_POINTER(node->slots[i], entry); + sibling =3D xa_mk_sibling(0); + } else { + RCU_INIT_POINTER(node->slots[i], sibling); + } + } + RCU_INIT_POINTER(node->parent, xas->xa_alloc); + xas->xa_alloc =3D node; + } while (sibs-- > 0); + + return; +nomem: + xas_destroy(xas); + xas_set_err(xas, -ENOMEM); +} + +/** + * xas_split() - Split a multi-index entry into smaller entries. + * @xas: XArray operation state. + * @entry: New entry to store in the array. + * @order: New entry order. + * + * The value in the entry is copied to all the replacement entries. + * + * Context: Any context. The caller should hold the xa_lock. + */ +void xas_split(struct xa_state *xas, void *entry, unsigned int order) +{ + unsigned int sibs =3D (1 << (order % XA_CHUNK_SHIFT)) - 1; + unsigned int offset, marks; + struct xa_node *node; + void *curr =3D xas_load(xas); + int values =3D 0; + + node =3D xas->xa_node; + if (xas_top(node)) + return; + + marks =3D node_get_marks(node, xas->xa_offset); + + offset =3D xas->xa_offset + sibs; + do { + if (xas->xa_shift < node->shift) { + struct xa_node *child =3D xas->xa_alloc; + + xas->xa_alloc =3D rcu_dereference_raw(child->parent); + child->shift =3D node->shift - XA_CHUNK_SHIFT; + child->offset =3D offset; + child->count =3D XA_CHUNK_SIZE; + child->nr_values =3D xa_is_value(entry) ? + XA_CHUNK_SIZE : 0; + RCU_INIT_POINTER(child->parent, node); + node_set_marks(node, offset, child, marks); + rcu_assign_pointer(node->slots[offset], + xa_mk_node(child)); + if (xa_is_value(curr)) + values--; + } else { + unsigned int canon =3D offset - xas->xa_sibs; + + node_set_marks(node, canon, NULL, marks); + rcu_assign_pointer(node->slots[canon], entry); + while (offset > canon) + rcu_assign_pointer(node->slots[offset--], + xa_mk_sibling(canon)); + values +=3D (xa_is_value(entry) - xa_is_value(curr)) * + (xas->xa_sibs + 1); + } + } while (offset-- > xas->xa_offset); + + node->nr_values +=3D values; +} +EXPORT_SYMBOL_GPL(xas_split); +#endif + /** * xas_pause() - Pause a walk to drop a lock. * @xas: XArray operation state. @@ -1407,7 +1546,7 @@ EXPORT_SYMBOL(__xa_store); * @gfp: Memory allocation flags. * * After this function returns, loads from this index will return @entry= . - * Storing into an existing multislot entry updates the entry of every i= ndex. + * Storing into an existing multi-index entry updates the entry of every= index. * The marks associated with @index are unaffected unless @entry is %NUL= L. * * Context: Any context. Takes and releases the xa_lock. @@ -1549,7 +1688,7 @@ static void xas_set_range(struct xa_state *xas, uns= igned long first, * * After this function returns, loads from any index between @first and = @last, * inclusive will return @entry. - * Storing into an existing multislot entry updates the entry of every i= ndex. + * Storing into an existing multi-index entry updates the entry of every= index. * The marks associated with @index are unaffected unless @entry is %NUL= L. * * Context: Process context. Takes and releases the xa_lock. May sleep --=20 2.28.0