* Is it possible to force an address_space to always allocate pages in specific order?
@ 2022-09-14  6:17 Qu Wenruo
From: Qu Wenruo @ 2022-09-14  6:17 UTC
  To: Linux Memory Management List; +Cc: linux-btrfs, Matthew Wilcox

Hi,

With the recent folio MM changes, I'm wondering if it's possible to force
an address space to always allocate folios of a certain order.

E.g. for a certain inode, we would always allocate pages (folios) of
order 2 for its page cache.

I'm asking this seemingly weird question for the following reasons:

- Support multi-page blocksizes in various filesystems
   Currently most filesystems only support sub-page blocksizes, not
   multi-page ones.

   Thus if we could force a folio order for the whole address space, it
   would be much easier to implement multi-page blocksize support.
   (Although I strongly doubt most filesystems need multi-page blocksize
    support.)

- For btrfs metadata optimization
   Each btrfs metadata block is made up of multiple fs blocks (a power
   of 2 of them, of course).

   Currently we have to do a lot of cross-page handling. If we could
   ensure every metadata block is backed by a single folio, we could get
   rid of those cross-page checks (at the cost of a possibly higher
   chance of hitting ENOMEM).
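To make the cross-page cost concrete, here is a small userspace model (an illustration, not btrfs code; the name units_spanned is invented here). It counts how many cache units, either order-0 pages or folios of one fixed order, a metadata block occupying [start, start + len) in the address space touches. Any result above 1 is the case that needs cross-page handling. The model assumes nothing about btrfs block alignment; it simply shows that a larger folio only removes the crossing when the block start is aligned to the folio size.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace model, not btrfs code: how many cache units does the
 * byte range [start, start + len) touch?  unit_shift is log2 of the
 * unit size: 12 for 4 KiB pages, 14 for order-2 folios on 4 KiB pages.
 * A result above 1 means the block crosses a unit boundary.
 */
static unsigned int units_spanned(uint64_t start, uint32_t len,
				  unsigned int unit_shift)
{
	uint64_t first, last;

	assert(len > 0);
	first = start >> unit_shift;		/* unit holding the first byte */
	last = (start + len - 1) >> unit_shift;	/* unit holding the last byte */
	return (unsigned int)(last - first + 1);
}
```

For example, a 16 KiB block at offset 49152 touches 4 units with 4 KiB pages (units_spanned(49152, 16384, 12)) but only 1 with order-2 folios (units_spanned(49152, 16384, 14)), while a block starting at 8192 still touches 2 order-2 folios.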

It looks like our current __filemap_get_folio() still allocates new
folios at a fixed order of 0, so it's not really possible for now.

Would this be possible in the future, or would it need too much work?
(Other than adding some folio order member to address_space?)

Thanks,
Qu


* Re: Is it possible to force an address_space to always allocate pages in specific order?
@ 2022-09-14 22:23 ` Matthew Wilcox
From: Matthew Wilcox @ 2022-09-14 22:23 UTC
  To: Qu Wenruo; +Cc: Linux Memory Management List, linux-btrfs

On Wed, Sep 14, 2022 at 02:17:24PM +0800, Qu Wenruo wrote:
> With recent folio MM changes, I'm wondering if it's possible to force an
> address space to always allocate a folio in certain order?

You're the second person to ask me about this today.  Well, actually,
the first, because the other person asked me in person after you sent
this email.

We have most of the infrastructure in place to do this now.  There
are some places still missing, such as allocating-pages-on-buffered-write.
I don't think any of them will be _hard_, we just need to do the work.

> E.g. For certain inode, we always allocate pages (folios) in the order
> of 2 for its page cache.
> 
> I'm asking this seemingly weird question for the following reasons:
> 
> - Support multi-page blocksize of various filesystems
>   Currently most file systems only go support sub-page, not multi-page
>   blocksize.
> 
>   Thus if there is forced order for all the address space, it would be
>   much easier to implement multi-page blocksize support.
>   (Although I strongly doubt if we need such multi-page blocksize
>    support for most fses)

It makes the MM people nervous when we *have* to do high-order
allocations.  For XFS, Dave Chinner has/had a patch set that uses base
page size to cache smaller pieces of larger blocks.  That approach works
for fs blocksize > page size, but doesn't work for storage LBA size >
page size.

It's definitely going to be easier to use large folios to solve your
use case, and since the page cache is usually a large part of the
memory consumption of a system, maybe it won't be as bad as the MM
people believe.

I have the beginnings of support for this (allowing the fs to set both a
minimum and maximum folio allocation order).  It's untested and incomplete,
and as I mentioned above, it doesn't do the write-into-a-cache-miss
allocation.  There may be other places that need to be fixed too.
Would this API work for you?

(as you can see, I've been sitting on it for a while)

From 1aeee696f4d322af5f34544e39fc00006c399fb8 Mon Sep 17 00:00:00 2001
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Date: Tue, 15 Dec 2020 10:57:34 -0500
Subject: [PATCH] fs: Allow fine-grained control of folio sizes

Some filesystems want to be able to limit the maximum size of folios,
and some want to be able to ensure that folios are at least a certain
size.  Add mapping_set_folio_orders() to allow this level of control
(although it is not yet honoured).

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/pagemap.h | 41 +++++++++++++++++++++++++++++++++++++----
 1 file changed, 37 insertions(+), 4 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index cad81db32e61..9cbb8bdbaee7 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -198,9 +198,15 @@ enum mapping_flags {
 	AS_EXITING	= 4, 	/* final truncate in progress */
 	/* writeback related tags are not used */
 	AS_NO_WRITEBACK_TAGS = 5,
-	AS_LARGE_FOLIO_SUPPORT = 6,
+	AS_FOLIO_ORDER_MIN = 8,
+	AS_FOLIO_ORDER_MAX = 13,
+	/* 8-17 are used for FOLIO_ORDER */
 };
 
+#define AS_FOLIO_ORDER_MIN_MASK	0x00001f00
+#define AS_FOLIO_ORDER_MAX_MASK 0x0003e000
+#define AS_FOLIO_ORDER_MASK (AS_FOLIO_ORDER_MIN_MASK | AS_FOLIO_ORDER_MAX_MASK)
+
 /**
  * mapping_set_error - record a writeback error in the address_space
  * @mapping: the mapping in which an error should be set
@@ -290,6 +296,29 @@ static inline void mapping_set_gfp_mask(struct address_space *m, gfp_t mask)
 	m->gfp_mask = mask;
 }
 
+/**
+ * mapping_set_folio_orders() - Set the range of folio sizes supported.
+ * @mapping: The file.
+ * @min: Minimum folio order (between 0-31 inclusive).
+ * @max: Maximum folio order (between 0-31 inclusive).
+ *
+ * The filesystem should call this function in its inode constructor to
+ * indicate which sizes of folio the VFS can use to cache the contents
+ * of the file.  This should only be used if the filesystem needs special
+ * handling of folio sizes (ie there is something the core cannot know).
+ * Do not tune it based on, eg, i_size.
+ *
+ * Context: This should not be called while the inode is active as it
+ * is non-atomic.
+ */
+static inline void mapping_set_folio_orders(struct address_space *mapping,
+		unsigned int min, unsigned int max)
+{
+	mapping->flags = (mapping->flags & ~AS_FOLIO_ORDER_MASK) |
+			(min << AS_FOLIO_ORDER_MIN) |
+			(max << AS_FOLIO_ORDER_MAX);
+}
+
 /**
  * mapping_set_large_folios() - Indicate the file supports large folios.
  * @mapping: The file.
@@ -303,7 +332,12 @@ static inline void mapping_set_gfp_mask(struct address_space *m, gfp_t mask)
  */
 static inline void mapping_set_large_folios(struct address_space *mapping)
 {
-	__set_bit(AS_LARGE_FOLIO_SUPPORT, &mapping->flags);
+	mapping_set_folio_orders(mapping, 0, 31);
+}
+
+static inline unsigned mapping_max_folio_order(struct address_space *mapping)
+{
+	return (mapping->flags & AS_FOLIO_ORDER_MAX_MASK) >> AS_FOLIO_ORDER_MAX;
 }
 
 /*
@@ -312,8 +346,7 @@ static inline void mapping_set_large_folios(struct address_space *mapping)
  */
 static inline bool mapping_large_folio_support(struct address_space *mapping)
 {
-	return IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
-		test_bit(AS_LARGE_FOLIO_SUPPORT, &mapping->flags);
+	return mapping_max_folio_order(mapping) > 0;
 }
 
 static inline int filemap_nr_thps(struct address_space *mapping)
-- 
2.35.1
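The bit packing in the patch can be checked outside the kernel. Below is a userspace model of it (struct mapping_model and the model_* functions are stand-ins invented for this sketch, not kernel API). It assumes the layout described by the patch's enum comment: the minimum order in bits 8-12 and the maximum order in bits 13-17, five bits each, so the MAX mask has to be 0x1f << 13 == 0x0003e000 to cover all five bits. It also adds a minimum-order accessor, which the patch does not define, and a range check on the arguments.

```c
#include <assert.h>

/* Layout assumed from the patch: 5 bits each for min and max order. */
#define AS_FOLIO_ORDER_MIN	8
#define AS_FOLIO_ORDER_MAX	13
#define AS_FOLIO_ORDER_MIN_MASK	(0x1fUL << AS_FOLIO_ORDER_MIN)	/* 0x00001f00 */
#define AS_FOLIO_ORDER_MAX_MASK	(0x1fUL << AS_FOLIO_ORDER_MAX)	/* 0x0003e000 */
#define AS_FOLIO_ORDER_MASK	(AS_FOLIO_ORDER_MIN_MASK | AS_FOLIO_ORDER_MAX_MASK)

/* Stand-in for struct address_space: only the flags word matters here. */
struct mapping_model {
	unsigned long flags;
};

/* Replace both order fields, leaving all other flag bits untouched. */
static void model_set_folio_orders(struct mapping_model *m,
				   unsigned int min, unsigned int max)
{
	assert(min <= max && max <= 31);	/* each field is 5 bits wide */
	m->flags = (m->flags & ~AS_FOLIO_ORDER_MASK) |
		   ((unsigned long)min << AS_FOLIO_ORDER_MIN) |
		   ((unsigned long)max << AS_FOLIO_ORDER_MAX);
}

static unsigned int model_min_folio_order(const struct mapping_model *m)
{
	return (m->flags & AS_FOLIO_ORDER_MIN_MASK) >> AS_FOLIO_ORDER_MIN;
}

static unsigned int model_max_folio_order(const struct mapping_model *m)
{
	return (m->flags & AS_FOLIO_ORDER_MAX_MASK) >> AS_FOLIO_ORDER_MAX;
}
```

Starting from flags 0x3f, setting orders (0, 31) and then (2, 4) leaves flags at 0x823f: the low bits survive and the stale maximum of 31 is fully cleared. With a mask missing bit 16, a maximum of 31 would read back as 23.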



* Re: Is it possible to force an address_space to always allocate pages in specific order?
@ 2022-09-14 23:03   ` Qu Wenruo
From: Qu Wenruo @ 2022-09-14 23:03 UTC
  To: Matthew Wilcox, Qu Wenruo; +Cc: Linux Memory Management List, linux-btrfs

On 2022/9/15 06:23, Matthew Wilcox wrote:
> On Wed, Sep 14, 2022 at 02:17:24PM +0800, Qu Wenruo wrote:
>> With recent folio MM changes, I'm wondering if it's possible to force an
>> address space to always allocate a folio in certain order?
> 
> You're the second person to ask me about this today.  Well, actually,
> the first because the other person asked me in-person after you sent
> this email.
> 
> We have most of the infrastructure in place to do this now.  There
> are some places still missing, such as allocating-pages-on-buffered-write.
> I don't think any of them will be _hard_, we just need to do the work.
> 
>> E.g. For certain inode, we always allocate pages (folios) in the order
>> of 2 for its page cache.
>>
>> I'm asking this seemingly weird question for the following reasons:
>>
>> - Support multi-page blocksize of various filesystems
>>    Currently most file systems only go support sub-page, not multi-page
>>    blocksize.
>>
>>    Thus if there is forced order for all the address space, it would be
>>    much easier to implement multi-page blocksize support.
>>    (Although I strongly doubt if we need such multi-page blocksize
>>     support for most fses)
> 
> It makes the MM people nervous when we *have* to do high-order
> allocations.  For XFS, Dave Chinner has/had a patch set that uses base
> page size to cache smaller pieces of larger blocks.  That approach works
> for fs blocksize > page size, but doesn't work for storage LBA size >
> page size.
> 
> It's definitely going to be easier to use large folios to solve your
> use case, and since the page cache is usually a large part of the
> memory consumption of a system, maybe it won't be as bad as the MM
> people believe.
> 
> I have the beginnings of support for this (allowing the fs to set both a
> minimum and maximum folio allocation order).  It's not tested, incomplete,
> and as I mention above, it doesn't do the write-into-a-cache-miss
> allocation.  Maybe there would also be other places that need to be
> fixed too.  Would this API work for you?

That is already the perfect interface, at least for btrfs metadata.
(Although I still need to do more digging and testing before btrfs can
  migrate to the folio interface, rather than the compat one.)

The key point about the btrfs metadata address space is that all of its
page cache is pre-allocated before any read/write.

And that address space is only used internally, so we will never hit
the write-into-a-cache-miss case.
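Since the metadata page cache is fully pre-allocated, a natural setting would be min == max, with the order chosen so that one folio covers exactly one metadata block. A minimal sketch of that order computation, as a hypothetical userspace helper (metadata_folio_order is a name invented here, not btrfs code); it assumes nodesize is a power of two and treats nodesize <= page size as the subpage case, where no larger folio is needed for this purpose.

```c
#include <assert.h>

/*
 * Hypothetical helper: the folio order at which one folio holds exactly
 * one metadata block of nodesize bytes, with pages of 1 << page_shift
 * bytes.  Returns 0 when nodesize <= page size (subpage case) and -1
 * when nodesize is not a power of two.
 */
static int metadata_folio_order(unsigned int nodesize, unsigned int page_shift)
{
	unsigned int pages;
	int order = 0;

	assert(page_shift < 32);
	if (nodesize == 0 || (nodesize & (nodesize - 1)) != 0)
		return -1;			/* not a power of two */
	if (nodesize <= (1U << page_shift))
		return 0;			/* subpage: order 0 is enough */
	pages = nodesize >> page_shift;		/* power of two, at least 2 */
	while ((1U << order) < pages)		/* order = log2(pages) */
		order++;
	return order;
}
```

With 4 KiB pages, a 16 KiB nodesize gives order 2 and a 64 KiB nodesize gives order 4. The result would presumably be passed as both bounds, e.g. mapping_set_folio_orders(mapping, order, order), when the metadata inode is set up and before it is populated, as the patch's kernel-doc asks; that call is the proposed API, which the commit message notes is not yet honoured.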

Thanks,
Qu

