linux-mm.kvack.org archive mirror
* [PATCH 0/4] hugetlbfs: optionally reserve all fs pages at mount time
From: Mike Kravetz @ 2015-03-04  1:21 UTC
  To: linux-mm, linux-kernel
  Cc: Andrew Morton, Davidlohr Bueso, Aneesh Kumar, Joonsoo Kim, Mike Kravetz

hugetlbfs allocates huge pages from the global pool as needed.  Even if
the global pool contains a sufficient number of pages for the filesystem
size at mount time, those global pages could be grabbed for some other
use.  As a result, filesystem huge page allocations may fail due to lack
of pages.

Applications such as a database want to use huge pages for performance
reasons.  hugetlbfs filesystem semantics with ownership and modes work
well to manage access to a pool of huge pages.  However, the application
would like some reasonable assurance that allocations will not fail due
to a lack of huge pages.  At application startup time, the application
would like to configure itself to use a specific number of huge pages.
Before starting, the application can check to make sure that enough
huge pages exist in the system global pools.  What the application wants
is exclusive use of a subpool of huge pages. 

Add a new hugetlbfs mount option 'reserved' to specify that the number
of pages associated with the size of the filesystem will be reserved.  If
there are insufficient pages, the mount will fail.  The reservation is
maintained for the duration of the filesystem so that as pages are
allocated and free'ed a sufficient number of pages remains reserved.

Comments from RFC addressed/incorporated

Mike Kravetz (4):
  hugetlbfs: add reserved mount fields to subpool structure
  hugetlbfs: coordinate global and subpool reserve accounting
  hugetlbfs: accept subpool reserved option and setup accordingly
  hugetlbfs: document reserved mount option

 Documentation/vm/hugetlbpage.txt | 18 ++++++++------
 fs/hugetlbfs/inode.c             | 15 ++++++++++--
 include/linux/hugetlb.h          |  7 ++++++
 mm/hugetlb.c                     | 53 +++++++++++++++++++++++++++++++++-------
 4 files changed, 75 insertions(+), 18 deletions(-)

-- 
2.1.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* [PATCH 1/4] hugetlbfs: add reserved mount fields to subpool structure
From: Mike Kravetz @ 2015-03-04  1:21 UTC
  To: linux-mm, linux-kernel
  Cc: Andrew Morton, Davidlohr Bueso, Aneesh Kumar, Joonsoo Kim, Mike Kravetz

Add a boolean to the subpool structure to indicate that the pages for
the subpool have been reserved.  The hstate pointer in the subpool is
convenient to have when reserving and unreserving the pages.
hugepage_subpool_reserved() is a handy way to check whether a subpool is
reserved while taking a NULL subpool into account.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 include/linux/hugetlb.h | 6 ++++++
 mm/hugetlb.c            | 2 ++
 2 files changed, 8 insertions(+)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 431b7fc..12fbd5d 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -23,6 +23,8 @@ struct hugepage_subpool {
 	spinlock_t lock;
 	long count;
 	long max_hpages, used_hpages;
+	struct hstate *hstate;
+	bool reserved;
 };
 
 struct resv_map {
@@ -38,6 +40,10 @@ extern int hugetlb_max_hstate __read_mostly;
 #define for_each_hstate(h) \
 	for ((h) = hstates; (h) < &hstates[hugetlb_max_hstate]; (h)++)
 
+static inline bool hugepage_subpool_reserved(struct hugepage_subpool *spool)
+{
+	return spool && spool->reserved;
+}
 struct hugepage_subpool *hugepage_new_subpool(long nr_blocks);
 void hugepage_put_subpool(struct hugepage_subpool *spool);
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 85032de..c6adf65 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -85,6 +85,8 @@ struct hugepage_subpool *hugepage_new_subpool(long nr_blocks)
 	spool->count = 1;
 	spool->max_hpages = nr_blocks;
 	spool->used_hpages = 0;
+	spool->hstate = NULL;
+	spool->reserved = false;
 
 	return spool;
 }
-- 
2.1.0


* [PATCH 2/4] hugetlbfs: coordinate global and subpool reserve accounting
From: Mike Kravetz @ 2015-03-04  1:21 UTC
  To: linux-mm, linux-kernel
  Cc: Andrew Morton, Davidlohr Bueso, Aneesh Kumar, Joonsoo Kim, Mike Kravetz

If the pages for a subpool are reserved, then the reservations
have already been accounted for in the global pool (at mount time).
Therefore, when requesting a new reservation (such as for a
mapping) do not adjust the global reserve count.  Also, when
simply unreserving pages for the subpool do not adjust the global
count.  However, when actually allocating or freeing a hugepage
be sure to adjust the global reserve count so that it corresponds
with the global free count.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 mm/hugetlb.c | 36 ++++++++++++++++++++++++++++--------
 1 file changed, 28 insertions(+), 8 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c6adf65..394bd8f 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -879,7 +879,11 @@ void free_huge_page(struct page *page)
 	spin_lock(&hugetlb_lock);
 	hugetlb_cgroup_uncharge_page(hstate_index(h),
 				     pages_per_huge_page(h), page);
-	if (restore_reserve)
+	/*
+	 * When a hugepage in a reserved subpool is free'ed, the global
+	 * reserve count must be adjusted along with the global free count.
+	 */
+	if (restore_reserve || hugepage_subpool_reserved(spool))
 		h->resv_huge_pages++;
 
 	if (h->surplus_huge_pages_node[nid]) {
@@ -2466,7 +2470,12 @@ static void hugetlb_vm_op_close(struct vm_area_struct *vma)
 	kref_put(&resv->refs, resv_map_release);
 
 	if (reserve) {
-		hugetlb_acct_memory(h, -reserve);
+		/*
+		 * For reserved subpools, global reservation counts are
+		 * only adjusted at hugepage alloc and free time.
+		 */
+		if (!hugepage_subpool_reserved(spool))
+			hugetlb_acct_memory(h, -reserve);
 		hugepage_subpool_put_pages(spool, reserve);
 	}
 }
@@ -3442,12 +3451,18 @@ int hugetlb_reserve_pages(struct inode *inode,
 
 	/*
 	 * Check enough hugepages are available for the reservation.
-	 * Hand the pages back to the subpool if there are not
+	 * Hand the pages back to the subpool if there are not.  If
+	 * the entire subpool was reserved, we know there are enough
+	 * hugepages and the global count already reflects the reservation.
 	 */
-	ret = hugetlb_acct_memory(h, chg);
-	if (ret < 0) {
-		hugepage_subpool_put_pages(spool, chg);
-		goto out_err;
+	if (hugepage_subpool_reserved(spool))
+		ret = 0;
+	else {
+		ret = hugetlb_acct_memory(h, chg);
+		if (ret < 0) {
+			hugepage_subpool_put_pages(spool, chg);
+			goto out_err;
+		}
 	}
 
 	/*
@@ -3483,7 +3498,12 @@ void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed)
 	inode->i_blocks -= (blocks_per_huge_page(h) * freed);
 	spin_unlock(&inode->i_lock);
 
 	hugepage_subpool_put_pages(spool, (chg - freed));
-	hugetlb_acct_memory(h, -(chg - freed));
+	/*
+	 * For reserved subpools, global reservation counts are only
+	 * adjusted at hugepage alloc and free time.
+	 */
+	if (!hugepage_subpool_reserved(spool))
+		hugetlb_acct_memory(h, -(chg - freed));
 }
 
-- 
2.1.0
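
The accounting rule described in this patch can be illustrated with a small
user-space model. This is only a sketch under simplifying assumptions: the
model_* names and structures are hypothetical stand-ins, not the kernel's
code, surplus pages and locking are ignored, and the global accounting check
is reduced to "reservations may not exceed free pages".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel's hstate and hugepage_subpool. */
struct model_hstate {
	long free_huge_pages;
	long resv_huge_pages;
};

struct model_subpool {
	long max_hpages, used_hpages;
	struct model_hstate *hstate;
	bool reserved;
};

/* NULL-safe check, mirroring the helper added in patch 1/4. */
static bool model_subpool_reserved(const struct model_subpool *spool)
{
	return spool && spool->reserved;
}

/* Adjust the global reserve count; fail if it would exceed the free count. */
static int model_acct_memory(struct model_hstate *h, long delta)
{
	if (delta > 0 && h->resv_huge_pages + delta > h->free_huge_pages)
		return -1;
	h->resv_huge_pages += delta;
	assert(h->resv_huge_pages >= 0);
	return 0;
}

/* Mount time: reserve the whole subpool from the global pool. */
static bool model_reserve_subpool(struct model_subpool *spool)
{
	if (model_acct_memory(spool->hstate, spool->max_hpages))
		return false;
	spool->reserved = true;
	return true;
}

/*
 * A mapping asks for chg pages: charge the subpool, and charge the
 * global pool only when the subpool was not reserved up front.
 */
static int model_reserve_pages(struct model_subpool *spool,
			       struct model_hstate *h, long chg)
{
	if (spool) {
		if (spool->used_hpages + chg > spool->max_hpages)
			return -1;
		spool->used_hpages += chg;
	}
	if (model_subpool_reserved(spool))
		return 0;
	if (model_acct_memory(h, chg)) {
		if (spool)
			spool->used_hpages -= chg;
		return -1;
	}
	return 0;
}

/* Allocating a reserved page consumes one free page and one reservation. */
static void model_alloc_page(struct model_hstate *h)
{
	h->free_huge_pages--;
	h->resv_huge_pages--;
}

/*
 * Freeing a page that belongs to a reserved subpool restores both
 * counts (the kernel's separate restore_reserve case is omitted here).
 */
static void model_free_page(struct model_hstate *h,
			    const struct model_subpool *spool)
{
	h->free_huge_pages++;
	if (model_subpool_reserved(spool))
		h->resv_huge_pages++;
}
```

In this model, a reservation request against a reserved subpool leaves
resv_huge_pages untouched, while an alloc/free pair moves the free and
reserve counts together, which is the behaviour the commit message describes.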


* [PATCH 3/4] hugetlbfs: accept subpool reserved option and setup accordingly
From: Mike Kravetz @ 2015-03-04  1:21 UTC
  To: linux-mm, linux-kernel
  Cc: Andrew Morton, Davidlohr Bueso, Aneesh Kumar, Joonsoo Kim, Mike Kravetz

Make reserved an option when mounting a hugetlbfs filesystem.  The
reserved option is valid only if the size option is also specified;
otherwise the mount fails.  On mount, reserve size hugepages from the
global pool and note this in the subpool.  Unreserve the hugepages when
the filesystem is unmounted.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 fs/hugetlbfs/inode.c    | 15 +++++++++++++--
 include/linux/hugetlb.h |  1 +
 mm/hugetlb.c            | 15 ++++++++++++++-
 3 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 5eba47f..10443c3 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -50,6 +50,7 @@ struct hugetlbfs_config {
 	long	nr_blocks;
 	long	nr_inodes;
 	struct hstate *hstate;
+	bool	reserved;
 };
 
 struct hugetlbfs_inode_info {
@@ -73,7 +74,7 @@ int sysctl_hugetlb_shm_group;
 enum {
 	Opt_size, Opt_nr_inodes,
 	Opt_mode, Opt_uid, Opt_gid,
-	Opt_pagesize,
+	Opt_pagesize, Opt_reserved,
 	Opt_err,
 };
 
@@ -84,6 +85,7 @@ static const match_table_t tokens = {
 	{Opt_uid,	"uid=%u"},
 	{Opt_gid,	"gid=%u"},
 	{Opt_pagesize,	"pagesize=%s"},
+	{Opt_reserved,	"reserved"},
 	{Opt_err,	NULL},
 };
 
@@ -832,6 +834,10 @@ hugetlbfs_parse_options(char *options, struct hugetlbfs_config *pconfig)
 			break;
 		}
 
+		case Opt_reserved:
+			pconfig->reserved = true;
+			break;
+
 		default:
 			pr_err("Bad mount option: \"%s\"\n", p);
 			return -EINVAL;
@@ -872,6 +878,7 @@ hugetlbfs_fill_super(struct super_block *sb, void *data, int silent)
 	config.gid = current_fsgid();
 	config.mode = 0755;
 	config.hstate = &default_hstate;
+	config.reserved = false;
 	ret = hugetlbfs_parse_options(data, &config);
 	if (ret)
 		return ret;
@@ -889,7 +896,11 @@ hugetlbfs_fill_super(struct super_block *sb, void *data, int silent)
 		sbinfo->spool = hugepage_new_subpool(config.nr_blocks);
 		if (!sbinfo->spool)
 			goto out_free;
-	}
+		sbinfo->spool->hstate = config.hstate;
+		if (config.reserved && !hugepage_reserve_subpool(sbinfo->spool))
+			goto out_free;
+	} else if (config.reserved)
+		goto out_free;	/* error if reserved and no size specified */
 	sb->s_maxbytes = MAX_LFS_FILESIZE;
 	sb->s_blocksize = huge_page_size(config.hstate);
 	sb->s_blocksize_bits = huge_page_shift(config.hstate);
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 12fbd5d..74cffa4 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -45,6 +45,7 @@ static inline bool hugepage_subpool_reserved(struct hugepage_subpool *spool)
 	return spool && spool->reserved;
 }
 struct hugepage_subpool *hugepage_new_subpool(long nr_blocks);
+bool hugepage_reserve_subpool(struct hugepage_subpool *spool);
 void hugepage_put_subpool(struct hugepage_subpool *spool);
 
 int PageHuge(struct page *page);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 394bd8f..941c726 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -61,6 +61,8 @@ DEFINE_SPINLOCK(hugetlb_lock);
 static int num_fault_mutexes;
 static struct mutex *htlb_fault_mutex_table ____cacheline_aligned_in_smp;
 
+/* Forward declaration */
+static int hugetlb_acct_memory(struct hstate *h, long delta);
 static inline void unlock_or_release_subpool(struct hugepage_subpool *spool)
 {
 	bool free = (spool->count == 0) && (spool->used_hpages == 0);
@@ -69,8 +71,11 @@ static inline void unlock_or_release_subpool(struct hugepage_subpool *spool)
 
 	/* If no pages are used, and no other handles to the subpool
 	 * remain, free the subpool the subpool remain */
-	if (free)
+	if (free) {
+		if (spool->reserved)
+			hugetlb_acct_memory(spool->hstate, -spool->max_hpages);
 		kfree(spool);
+	}
 }
 
 struct hugepage_subpool *hugepage_new_subpool(long nr_blocks)
@@ -91,6 +96,14 @@ struct hugepage_subpool *hugepage_new_subpool(long nr_blocks)
 	return spool;
 }
 
+bool hugepage_reserve_subpool(struct hugepage_subpool *spool)
+{
+	if (hugetlb_acct_memory(spool->hstate, spool->max_hpages))
+		return false;
+	spool->reserved = true;
+	return true;
+}
+
 void hugepage_put_subpool(struct hugepage_subpool *spool)
 {
 	spin_lock(&spool->lock);
-- 
2.1.0
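
The option handling in this patch (a bare "reserved" flag that is rejected
unless "size" is also given) can be sketched in user space as follows. This
is an illustration only: model_config and model_parse_options are
hypothetical names, the kernel's match_token() machinery is replaced by plain
string scanning, and size= is assumed to be a plain count of huge pages
rather than a byte value with suffixes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct model_config {
	long nr_blocks;		/* -1 means no size limit was given */
	bool reserved;
};

/*
 * Parse a comma-separated option string; returns 0 on success or -1 on
 * error.  Only "size=<pages>" and "reserved" are recognised here.
 */
static int model_parse_options(const char *options, struct model_config *cfg)
{
	const char *p = options;

	assert(options && cfg);
	cfg->nr_blocks = -1;
	cfg->reserved = false;

	while (*p) {
		size_t len = strcspn(p, ",");

		if (len > 5 && strncmp(p, "size=", 5) == 0) {
			char *end;
			long n = strtol(p + 5, &end, 10);

			/* The number must fill the rest of the token. */
			if (end != p + len || n < 0)
				return -1;
			cfg->nr_blocks = n;
		} else if (len == 8 && memcmp(p, "reserved", 8) == 0) {
			cfg->reserved = true;
		} else if (len != 0) {
			return -1;	/* bad mount option */
		}

		p += len;
		if (*p == ',')
			p++;
	}

	/* As in the patch, "reserved" without "size" is an error. */
	if (cfg->reserved && cfg->nr_blocks == -1)
		return -1;
	return 0;
}
```

With this sketch, "size=8,reserved" is accepted, while "reserved" on its own
is rejected, matching the "error if reserved and no size specified" branch in
hugetlbfs_fill_super() above.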


* [PATCH 4/4] hugetlbfs: document reserved mount option
From: Mike Kravetz @ 2015-03-04  1:21 UTC
  To: linux-mm, linux-kernel
  Cc: Andrew Morton, Davidlohr Bueso, Aneesh Kumar, Joonsoo Kim, Mike Kravetz

Update documentation for the hugetlbfs reserved mount option.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 Documentation/vm/hugetlbpage.txt | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/Documentation/vm/hugetlbpage.txt b/Documentation/vm/hugetlbpage.txt
index f2d3a10..1d88bfb 100644
--- a/Documentation/vm/hugetlbpage.txt
+++ b/Documentation/vm/hugetlbpage.txt
@@ -267,8 +267,8 @@ call, then it is required that system administrator mount a file system of
 type hugetlbfs:
 
   mount -t hugetlbfs \
-	-o uid=<value>,gid=<value>,mode=<value>,size=<value>,nr_inodes=<value> \
-	none /mnt/huge
+	-o uid=<value>,gid=<value>,mode=<value>,size=<value>,reserved,\
+	nr_inodes=<value> none /mnt/huge
 
 This command mounts a (pseudo) filesystem of type hugetlbfs on the directory
 /mnt/huge.  Any files created on /mnt/huge uses huge pages.  The uid and gid
@@ -277,11 +277,15 @@ the uid and gid of the current process are taken.  The mode option sets the
 mode of root of file system to value & 01777.  This value is given in octal.
 By default the value 0755 is picked. The size option sets the maximum value of
 memory (huge pages) allowed for that filesystem (/mnt/huge). The size is
-rounded down to HPAGE_SIZE.  The option nr_inodes sets the maximum number of
-inodes that /mnt/huge can use.  If the size or nr_inodes option is not
-provided on command line then no limits are set.  For size and nr_inodes
-options, you can use [G|g]/[M|m]/[K|k] to represent giga/mega/kilo. For
-example, size=2K has the same meaning as size=2048.
+rounded down to HPAGE_SIZE.  If the size option is specified, the reserved
+option may also be specified to reserve the number of huge pages required for
+the maximum filesystem size.  This number of huge pages is reserved at mount
+time and will be available for exclusive use by the filesystem.  If not enough
+huge pages are available, the mount will fail.  The option nr_inodes sets
+the maximum number of inodes that /mnt/huge can use.  If the size or nr_inodes
+option is not provided on command line then no limits are set.  For size and
+nr_inodes options, you can use [G|g]/[M|m]/[K|k] to represent giga/mega/kilo.
+For example, size=2K has the same meaning as size=2048.
 
 While read system calls are supported on files that reside on hugetlb
 file systems, write system calls are not.
-- 
2.1.0


* Re: [PATCH 0/4] hugetlbfs: optionally reserve all fs pages at mount time
From: David Rientjes @ 2015-03-04  5:49 UTC
  To: Mike Kravetz
  Cc: linux-mm, linux-kernel, Andrew Morton, Davidlohr Bueso,
	Aneesh Kumar, Joonsoo Kim

On Tue, 3 Mar 2015, Mike Kravetz wrote:

> hugetlbfs allocates huge pages from the global pool as needed.  Even if
> the global pool contains a sufficient number of pages for the filesystem
> size at mount time, those global pages could be grabbed for some other
> use.  As a result, filesystem huge page allocations may fail due to lack
> of pages.
> 
> Applications such as a database want to use huge pages for performance
> reasons.  hugetlbfs filesystem semantics with ownership and modes work
> well to manage access to a pool of huge pages.  However, the application
> would like some reasonable assurance that allocations will not fail due
> to a lack of huge pages.  At application startup time, the application
> would like to configure itself to use a specific number of huge pages.
> Before starting, the application can check to make sure that enough
> huge pages exist in the system global pools.  What the application wants
> is exclusive use of a subpool of huge pages. 
> 
> Add a new hugetlbfs mount option 'reserved' to specify that the number
> of pages associated with the size of the filesystem will be reserved.  If
> there are insufficient pages, the mount will fail.  The reservation is
> maintained for the duration of the filesystem so that as pages are
> allocated and free'ed a sufficient number of pages remains reserved.
> 

This functionality is somewhat limited because it's not possible to
reserve a subset of the size for a single mount point; it's either all or
nothing.  It shouldn't be too difficult to just add a reserved=<value>
option where <value> is <= size.  If it's done that way, you should be
able to omit size= entirely for unlimited hugepages but always ensure that
a low watermark of hugepages is reserved for the database.

> Comments from RFC addressed/incorporated
> 
> Mike Kravetz (4):
>   hugetlbfs: add reserved mount fields to subpool structure
>   hugetlbfs: coordinate global and subpool reserve accounting
>   hugetlbfs: accept subpool reserved option and setup accordingly
>   hugetlbfs: document reserved mount option
> 
>  Documentation/vm/hugetlbpage.txt | 18 ++++++++------
>  fs/hugetlbfs/inode.c             | 15 ++++++++++--
>  include/linux/hugetlb.h          |  7 ++++++
>  mm/hugetlb.c                     | 53 +++++++++++++++++++++++++++++++++-------
>  4 files changed, 75 insertions(+), 18 deletions(-)


* Re: [PATCH 0/4] hugetlbfs: optionally reserve all fs pages at mount time
From: Mike Kravetz @ 2015-03-04 17:21 UTC
  To: David Rientjes
  Cc: linux-mm, linux-kernel, Andrew Morton, Davidlohr Bueso,
	Aneesh Kumar, Joonsoo Kim

On 03/03/2015 09:49 PM, David Rientjes wrote:
> On Tue, 3 Mar 2015, Mike Kravetz wrote:
>> Add a new hugetlbfs mount option 'reserved' to specify that the number
>> of pages associated with the size of the filesystem will be reserved.  If
>> there are insufficient pages, the mount will fail.  The reservation is
>> maintained for the duration of the filesystem so that as pages are
>> allocated and free'ed a sufficient number of pages remains reserved.
>>
>
> This functionality is somewhat limited because it's not possible to
> reserve a subset of the size for a single mount point; it's either all or
> nothing.  It shouldn't be too difficult to just add a reserved=<value>
> option where <value> is <= size.  If it's done that way, you should be
> able to omit size= entirely for unlimited hugepages but always ensure that
> a low watermark of hugepages is reserved for the database.

Thanks, I like that suggestion.  You are correct in that it should not
be too difficult to pass in a size for reserved.  I'll work on the
modification.

-- 
Mike Kravetz
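
The reserved=<value> form suggested above could be validated along these
lines. This is a hypothetical sketch, not code from the series: the model_*
names are invented, the K/M/G suffix handling follows the convention
documented for size and nr_inodes, and rounding the reservation down to a
whole number of huge pages is an assumption borrowed from how size is
treated.

```c
#include <assert.h>
#include <stdlib.h>

/* Parse "<number>[K|M|G]" (either case); returns -1 on malformed input. */
static long long model_parse_size(const char *s)
{
	char *end;
	long long v;

	assert(s != NULL);
	v = strtoll(s, &end, 10);
	if (end == s || v < 0)
		return -1;

	switch (*end) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		end++;
		break;
	default:
		break;
	}
	return *end == '\0' ? v : -1;
}

/*
 * Return the number of huge pages to reserve, or -1 if the request is
 * invalid.  size_bytes < 0 means that no size= option was given, in
 * which case any reservation is allowed.
 */
static long long model_reserved_pages(long long size_bytes,
				      long long reserved_bytes,
				      long long hpage_size)
{
	if (reserved_bytes < 0 || hpage_size <= 0)
		return -1;
	if (size_bytes >= 0 && reserved_bytes > size_bytes)
		return -1;	/* <value> must be <= size */
	return reserved_bytes / hpage_size;
}
```

For example, with 2 MB huge pages, reserved=64M and no size option would
yield 32 reserved pages in this model, while size=32M,reserved=64M would be
rejected.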


* Re: [PATCH 0/4] hugetlbfs: optionally reserve all fs pages at mount time
From: Andi Kleen @ 2015-03-06 22:13 UTC
  To: Mike Kravetz
  Cc: linux-mm, linux-kernel, Andrew Morton, Davidlohr Bueso,
	Aneesh Kumar, Joonsoo Kim

Mike Kravetz <mike.kravetz@oracle.com> writes:

> hugetlbfs allocates huge pages from the global pool as needed.  Even if
> the global pool contains a sufficient number of pages for the filesystem
> size at mount time, those global pages could be grabbed for some other
> use.  As a result, filesystem huge page allocations may fail due to lack
> of pages.


What's the difference between this new option and simply doing

mount -t hugetlbfs none /huge
echo XXX > /proc/sys/vm/nr_hugepages

?

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only


* Re: [PATCH 0/4] hugetlbfs: optionally reserve all fs pages at mount time
From: Mike Kravetz @ 2015-03-06 22:30 UTC
  To: Andi Kleen
  Cc: linux-mm, linux-kernel, Andrew Morton, Davidlohr Bueso,
	Aneesh Kumar, Joonsoo Kim

On 03/06/2015 02:13 PM, Andi Kleen wrote:
> Mike Kravetz <mike.kravetz@oracle.com> writes:
>
>> hugetlbfs allocates huge pages from the global pool as needed.  Even if
>> the global pool contains a sufficient number of pages for the filesystem
>> size at mount time, those global pages could be grabbed for some other
>> use.  As a result, filesystem huge page allocations may fail due to lack
>> of pages.
>
>
> What's the difference between this new option and simply doing
>
> mount -t hugetlbfs none /huge
> echo XXX > /proc/sys/vm/nr_hugepages

In the above sequence, it is still possible for another user/application
to allocate some (or all) of the XXX huge pages.  There is no guarantee
that users of the filesystem will get all XXX pages.

I see the use of the reserved option to be:
# Make sure there are XXX huge pages in the global pool
echo XXX > /proc/sys/vm/nr_hugepages
# Mount/create the filesystem and reserve XXX huge pages
mount -t hugetlbfs -o size=XXX,reserved=XXX none /huge

If the mount is successful, then users of the filesystem know there are
XXX huge pages available for their use.

-- 
Mike Kravetz
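
For the "make sure there are XXX huge pages in the global pool" step, an
application could inspect the HugePages_Free and HugePages_Rsvd fields of
/proc/meminfo. The following is a minimal sketch that parses meminfo-style
text already read into a buffer; the model_* function names are hypothetical.
Without a reservation at mount time, such a check is only advisory, because
other users may take the pages afterwards.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Find "<key>: <number>" at the start of a line in meminfo-style text.
 * Returns the number, or -1 if the key is absent.
 */
static long model_meminfo_value(const char *text, const char *key)
{
	size_t klen;
	const char *line = text;

	assert(text && key);
	klen = strlen(key);

	while (line && *line) {
		long v;

		if (strncmp(line, key, klen) == 0 && line[klen] == ':' &&
		    sscanf(line + klen + 1, "%ld", &v) == 1)
			return v;
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}

/* Free pages that are not already promised to an existing reservation. */
static long model_available_hugepages(const char *meminfo)
{
	long free_pages = model_meminfo_value(meminfo, "HugePages_Free");
	long rsvd_pages = model_meminfo_value(meminfo, "HugePages_Rsvd");

	if (free_pages < 0 || rsvd_pages < 0)
		return -1;
	return free_pages - rsvd_pages;
}
```

A caller would read /proc/meminfo into a string, pass it to
model_available_hugepages(), and compare the result with the number of pages
it intends to use before attempting the mount.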
