linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V2 0/4]  hugetlbfs: add min_size filesystem mount option
@ 2015-03-16 23:53 Mike Kravetz
  2015-03-16 23:53 ` [PATCH V2 1/4] hugetlbfs: add minimum size tracking fields to subpool structure Mike Kravetz
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Mike Kravetz @ 2015-03-16 23:53 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Andrew Morton, Davidlohr Bueso, Aneesh Kumar, Joonsoo Kim, Mike Kravetz

hugetlbfs allocates huge pages from the global pool as needed.  Even if
the global pool contains a sufficient number pages for the filesystem
size at mount time, those global pages could be grabbed for some other
use.  As a result, filesystem huge page allocations may fail due to lack
of pages.

Applications such as a database want to use huge pages for performance
reasons.  hugetlbfs filesystem semantics with ownership and modes work
well to manage access to a pool of huge pages.  However, the application
would like some reasonable assurance that allocations will not fail due
to a lack of huge pages.  At application startup time, the application
would like to configure itself to use a specific number of huge pages.
Before starting, the application can check to make sure that enough huge
pages exist in the system global pools.  However, there are no guarantees
that those pages will be available when needed by the application.  What
the application wants is exclusive use of a subset of huge pages.

Add a new hugetlbfs mount option 'min_size=<value>' to indicate that
the specified number of pages will be available for use by the filesystem.
At mount time, this number of huge pages will be reserved for exclusive
use of the filesystem.  If there is not a sufficient number of free pages,
the mount will fail.  As pages are allocated to and freeed from the
filesystem, the number of reserved pages is adjusted so that the specified
minimum is maintained.

V2:
  Added ability to specify minimum size. (David Rientjes)
V1:
  Comments from RFC addressed/incorporated

Mike Kravetz (4):
  hugetlbfs: add minimum size tracking fields to subpool structure
  hugetlbfs: add minimum size accounting to subpools
  hugetlbfs: accept subpool min_size mount option and setup accordingly
  hugetlbfs: document min_size mount option

 Documentation/vm/hugetlbpage.txt |  21 ++++--
 fs/hugetlbfs/inode.c             |  75 ++++++++++++++++-----
 include/linux/hugetlb.h          |   5 +-
 mm/hugetlb.c                     | 138 ++++++++++++++++++++++++++++++++-------
 4 files changed, 190 insertions(+), 49 deletions(-)

-- 
2.1.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH V2 1/4] hugetlbfs: add minimum size tracking fields to subpool structure
  2015-03-16 23:53 [PATCH V2 0/4] hugetlbfs: add min_size filesystem mount option Mike Kravetz
@ 2015-03-16 23:53 ` Mike Kravetz
  2015-03-18 21:25   ` Andrew Morton
  2015-03-16 23:53 ` [PATCH V2 2/4] hugetlbfs: add minimum size accounting to subpools Mike Kravetz
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: Mike Kravetz @ 2015-03-16 23:53 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Andrew Morton, Davidlohr Bueso, Aneesh Kumar, Joonsoo Kim, Mike Kravetz

Add a field to the subpool structure to indicate the minimimum
number of huge pages to always be used by this subpool.  This
minimum count includes allocated pages as well as reserved pages.
If the minimum number of pages for the subpool have not been
allocated, pages are reserved up to this minimum.  An additional
field (rsv_hpages) is used to track the number of pages reserved
to meet this minimum size.  The hstate pointer in the subpool
is convenient to have when reserving and unreserving the pages.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 include/linux/hugetlb.h | 2 ++
 mm/hugetlb.c            | 3 +++
 2 files changed, 5 insertions(+)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 431b7fc..cfe13fd 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -23,6 +23,8 @@ struct hugepage_subpool {
 	spinlock_t lock;
 	long count;
 	long max_hpages, used_hpages;
+	struct hstate *hstate;
+	long min_hpages, rsv_hpages;
 };
 
 struct resv_map {
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 85032de..07b7226 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -85,6 +85,9 @@ struct hugepage_subpool *hugepage_new_subpool(long nr_blocks)
 	spool->count = 1;
 	spool->max_hpages = nr_blocks;
 	spool->used_hpages = 0;
+	spool->hstate = NULL;
+	spool->min_hpages = 0;
+	spool->rsv_hpages = 0;
 
 	return spool;
 }
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH V2 2/4] hugetlbfs: add minimum size accounting to subpools
  2015-03-16 23:53 [PATCH V2 0/4] hugetlbfs: add min_size filesystem mount option Mike Kravetz
  2015-03-16 23:53 ` [PATCH V2 1/4] hugetlbfs: add minimum size tracking fields to subpool structure Mike Kravetz
@ 2015-03-16 23:53 ` Mike Kravetz
  2015-03-18 21:43   ` Andrew Morton
  2015-03-16 23:53 ` [PATCH V2 3/4] hugetlbfs: accept subpool min_size mount option and setup accordingly Mike Kravetz
  2015-03-16 23:53 ` [PATCH V2 4/4] hugetlbfs: document min_size mount option Mike Kravetz
  3 siblings, 1 reply; 13+ messages in thread
From: Mike Kravetz @ 2015-03-16 23:53 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Andrew Morton, Davidlohr Bueso, Aneesh Kumar, Joonsoo Kim, Mike Kravetz

The same routines that perform subpool maximum size accounting
hugepage_subpool_get/put_pages() are modified to also perform
minimum size accounting.  When a delta value is passed to these
routines, calculate how global reservations must be adjusted
to maintain the subpool minimum size.  The routines now return
this global reserve count adjustment.  This global adjusted
reserve count is then passed to the global accounting routine
hugetlb_acct_memory().

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 mm/hugetlb.c | 115 ++++++++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 94 insertions(+), 21 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 07b7226..ab2ea1e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -100,36 +100,85 @@ void hugepage_put_subpool(struct hugepage_subpool *spool)
 	unlock_or_release_subpool(spool);
 }
 
-static int hugepage_subpool_get_pages(struct hugepage_subpool *spool,
+/*
+ * subpool accounting for allocating and reserving pages
+ * return -ENOMEM if there are not enough resources to satisfy the
+ * the request.  Otherwise, return the number of pages by which the
+ * global pools must be adjusted (upward).  The returned value may
+ * only be different than the passed value (delta) in the case where
+ * a subpool minimum size must be manitained.
+ */
+static long hugepage_subpool_get_pages(struct hugepage_subpool *spool,
 				      long delta)
 {
-	int ret = 0;
+	long ret = delta;
 
 	if (!spool)
-		return 0;
+		return ret;
 
 	spin_lock(&spool->lock);
-	if ((spool->used_hpages + delta) <= spool->max_hpages) {
-		spool->used_hpages += delta;
-	} else {
-		ret = -ENOMEM;
+
+	if (spool->max_hpages != -1) {		/* maximum size accounting */
+		if ((spool->used_hpages + delta) <= spool->max_hpages)
+			spool->used_hpages += delta;
+		else {
+			ret = -ENOMEM;
+			goto unlock_ret;
+		}
+	}
+
+	if (spool->min_hpages) {		/* minimum size accounting */
+		if (delta > spool->rsv_hpages) {
+			/* asking for more reserves than those already taken
+			 * on behalf of subpool. return difference */
+			ret = delta - spool->rsv_hpages;
+			spool->rsv_hpages = 0;
+		} else {
+			ret = 0;	/* reserves already accounted for */
+			spool->rsv_hpages -= delta;
+		}
 	}
-	spin_unlock(&spool->lock);
 
+unlock_ret:
+	spin_unlock(&spool->lock);
 	return ret;
 }
 
-static void hugepage_subpool_put_pages(struct hugepage_subpool *spool,
+/*
+ * subpool accounting for freeing and unreserving pages
+ * Return the number of global page reservations that must be dropped.
+ * The return value may only be different than the passed value (delta)
+ * in the case where a subpool minimum size must be maintained.
+ */
+static long hugepage_subpool_put_pages(struct hugepage_subpool *spool,
 				       long delta)
 {
+	long ret = delta;
+
 	if (!spool)
-		return;
+		return delta;
 
 	spin_lock(&spool->lock);
-	spool->used_hpages -= delta;
+
+	if (spool->max_hpages != -1)		/* maximum size accounting */
+		spool->used_hpages -= delta;
+
+	if (spool->min_hpages) {		/* minimum size accounting */
+		if (spool->rsv_hpages + delta <= spool->min_hpages)
+			ret = 0;
+		else
+			ret = spool->rsv_hpages + delta - spool->min_hpages;
+
+		spool->rsv_hpages += delta;
+		if (spool->rsv_hpages > spool->min_hpages)
+			spool->rsv_hpages = spool->min_hpages;
+	}
+
 	/* If hugetlbfs_put_super couldn't free spool due to
 	* an outstanding quota reference, free it now. */
 	unlock_or_release_subpool(spool);
+
+	return ret;
 }
 
 static inline struct hugepage_subpool *subpool_inode(struct inode *inode)
@@ -877,6 +926,14 @@ void free_huge_page(struct page *page)
 	restore_reserve = PagePrivate(page);
 	ClearPagePrivate(page);
 
+	/*
+	 * A return code of zero implies that the subpool will be under
+	 * it's minimum size if the reservation is not restored after
+	 * page is free.  Therefore, force restore_reserve operation.
+	 */
+	if (hugepage_subpool_put_pages(spool, 1) == 0)
+		restore_reserve = true;
+
 	spin_lock(&hugetlb_lock);
 	hugetlb_cgroup_uncharge_page(hstate_index(h),
 				     pages_per_huge_page(h), page);
@@ -894,7 +951,6 @@ void free_huge_page(struct page *page)
 		enqueue_huge_page(h, page);
 	}
 	spin_unlock(&hugetlb_lock);
-	hugepage_subpool_put_pages(spool, 1);
 }
 
 static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
@@ -1387,7 +1443,7 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 	if (chg < 0)
 		return ERR_PTR(-ENOMEM);
 	if (chg || avoid_reserve)
-		if (hugepage_subpool_get_pages(spool, 1))
+		if (hugepage_subpool_get_pages(spool, 1) < 0)
 			return ERR_PTR(-ENOSPC);
 
 	ret = hugetlb_cgroup_charge_cgroup(idx, pages_per_huge_page(h), &h_cg);
@@ -2455,6 +2511,7 @@ static void hugetlb_vm_op_close(struct vm_area_struct *vma)
 	struct resv_map *resv = vma_resv_map(vma);
 	struct hugepage_subpool *spool = subpool_vma(vma);
 	unsigned long reserve, start, end;
+	long gbl_reserve;
 
 	if (!resv || !is_vma_resv_set(vma, HPAGE_RESV_OWNER))
 		return;
@@ -2467,8 +2524,12 @@ static void hugetlb_vm_op_close(struct vm_area_struct *vma)
 	kref_put(&resv->refs, resv_map_release);
 
 	if (reserve) {
-		hugetlb_acct_memory(h, -reserve);
-		hugepage_subpool_put_pages(spool, reserve);
+		/*
+		 * decrement reserve counts.  The global reserve count
+		 * may be adjusted if the subpool has a minimum size.
+		 */
+		gbl_reserve = hugepage_subpool_put_pages(spool, reserve);
+		hugetlb_acct_memory(h, -gbl_reserve);
 	}
 }
 
@@ -3399,6 +3460,7 @@ int hugetlb_reserve_pages(struct inode *inode,
 	struct hstate *h = hstate_inode(inode);
 	struct hugepage_subpool *spool = subpool_inode(inode);
 	struct resv_map *resv_map;
+	long gbl_reserve;
 
 	/*
 	 * Only apply hugepage reservation if asked. At fault time, an
@@ -3435,8 +3497,13 @@ int hugetlb_reserve_pages(struct inode *inode,
 		goto out_err;
 	}
 
-	/* There must be enough pages in the subpool for the mapping */
-	if (hugepage_subpool_get_pages(spool, chg)) {
+	/*
+	 * There must be enough pages in the subpool for the mapping. If
+	 * the subpool has a minimum size, there may be some global
+	 * reservations already in place (gbl_reserve).
+	 */
+	gbl_reserve = hugepage_subpool_get_pages(spool, chg);
+	if (gbl_reserve < 0) {
 		ret = -ENOSPC;
 		goto out_err;
 	}
@@ -3445,9 +3512,10 @@ int hugetlb_reserve_pages(struct inode *inode,
 	 * Check enough hugepages are available for the reservation.
 	 * Hand the pages back to the subpool if there are not
 	 */
-	ret = hugetlb_acct_memory(h, chg);
+	ret = hugetlb_acct_memory(h, gbl_reserve);
 	if (ret < 0) {
-		hugepage_subpool_put_pages(spool, chg);
+		/* put back original number of pages, chg */
+		(void)hugepage_subpool_put_pages(spool, chg);
 		goto out_err;
 	}
 
@@ -3477,6 +3545,7 @@ void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed)
 	struct resv_map *resv_map = inode_resv_map(inode);
 	long chg = 0;
 	struct hugepage_subpool *spool = subpool_inode(inode);
+	long gbl_reserve;
 
 	if (resv_map)
 		chg = region_truncate(resv_map, offset);
@@ -3484,8 +3553,12 @@ void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed)
 	inode->i_blocks -= (blocks_per_huge_page(h) * freed);
 	spin_unlock(&inode->i_lock);
 
-	hugepage_subpool_put_pages(spool, (chg - freed));
-	hugetlb_acct_memory(h, -(chg - freed));
+	/*
+	 * If the subpool has a minimum size, the number of global
+	 * reservations to be released may be adjusted.
+	 */
+	gbl_reserve = hugepage_subpool_put_pages(spool, (chg - freed));
+	hugetlb_acct_memory(h, -gbl_reserve);
 }
 
 #ifdef CONFIG_ARCH_WANT_HUGE_PMD_SHARE
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH V2 3/4] hugetlbfs: accept subpool min_size mount option and setup accordingly
  2015-03-16 23:53 [PATCH V2 0/4] hugetlbfs: add min_size filesystem mount option Mike Kravetz
  2015-03-16 23:53 ` [PATCH V2 1/4] hugetlbfs: add minimum size tracking fields to subpool structure Mike Kravetz
  2015-03-16 23:53 ` [PATCH V2 2/4] hugetlbfs: add minimum size accounting to subpools Mike Kravetz
@ 2015-03-16 23:53 ` Mike Kravetz
  2015-03-18 21:40   ` Andrew Morton
  2015-03-16 23:53 ` [PATCH V2 4/4] hugetlbfs: document min_size mount option Mike Kravetz
  3 siblings, 1 reply; 13+ messages in thread
From: Mike Kravetz @ 2015-03-16 23:53 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Andrew Morton, Davidlohr Bueso, Aneesh Kumar, Joonsoo Kim, Mike Kravetz

Make 'min_size=' be an option when mounting a hugetlbfs.  This option
takes the same value as the 'size' option.  min_size can be specified
with specifying size.  If both are specified, min_size must be less
that or equal to size else the mount will fail.  If min_size is
specified, then at mount time an attempt is made to reserve min_size
pages.  If the reservation fails, the mount fails.  At umount time,
the reserved pages are released.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 fs/hugetlbfs/inode.c    | 75 ++++++++++++++++++++++++++++++++++++++-----------
 include/linux/hugetlb.h |  3 +-
 mm/hugetlb.c            | 26 +++++++++++++----
 3 files changed, 80 insertions(+), 24 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 5eba47f..7a20a1b 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -50,6 +50,7 @@ struct hugetlbfs_config {
 	long	nr_blocks;
 	long	nr_inodes;
 	struct hstate *hstate;
+	long    min_size;
 };
 
 struct hugetlbfs_inode_info {
@@ -73,7 +74,7 @@ int sysctl_hugetlb_shm_group;
 enum {
 	Opt_size, Opt_nr_inodes,
 	Opt_mode, Opt_uid, Opt_gid,
-	Opt_pagesize,
+	Opt_pagesize, Opt_min_size,
 	Opt_err,
 };
 
@@ -84,6 +85,7 @@ static const match_table_t tokens = {
 	{Opt_uid,	"uid=%u"},
 	{Opt_gid,	"gid=%u"},
 	{Opt_pagesize,	"pagesize=%s"},
+	{Opt_min_size,	"min_size=%s"},
 	{Opt_err,	NULL},
 };
 
@@ -761,14 +763,32 @@ static const struct super_operations hugetlbfs_ops = {
 	.show_options	= generic_show_options,
 };
 
+enum { NO_SIZE, SIZE_STD, SIZE_PERCENT };
+
+static bool
+hugetlbfs_options_setsize(struct hstate *h, long long *size, int setsize)
+{
+	if (setsize == NO_SIZE)
+		return false;
+
+	if (setsize == SIZE_PERCENT) {
+		*size <<= huge_page_shift(h);
+		*size *= h->max_huge_pages;
+		do_div(*size, 100);
+	}
+
+	*size >>= huge_page_shift(h);
+	return true;
+}
+
 static int
 hugetlbfs_parse_options(char *options, struct hugetlbfs_config *pconfig)
 {
 	char *p, *rest;
 	substring_t args[MAX_OPT_ARGS];
 	int option;
-	unsigned long long size = 0;
-	enum { NO_SIZE, SIZE_STD, SIZE_PERCENT } setsize = NO_SIZE;
+	unsigned long long max_size = 0, min_size = 0;
+	int max_setsize = NO_SIZE, min_setsize = NO_SIZE;
 
 	if (!options)
 		return 0;
@@ -806,10 +826,10 @@ hugetlbfs_parse_options(char *options, struct hugetlbfs_config *pconfig)
 			/* memparse() will accept a K/M/G without a digit */
 			if (!isdigit(*args[0].from))
 				goto bad_val;
-			size = memparse(args[0].from, &rest);
-			setsize = SIZE_STD;
+			max_size = memparse(args[0].from, &rest);
+			max_setsize = SIZE_STD;
 			if (*rest == '%')
-				setsize = SIZE_PERCENT;
+				max_setsize = SIZE_PERCENT;
 			break;
 		}
 
@@ -832,6 +852,17 @@ hugetlbfs_parse_options(char *options, struct hugetlbfs_config *pconfig)
 			break;
 		}
 
+		case Opt_min_size: {
+			/* memparse() will accept a K/M/G without a digit */
+			if (!isdigit(*args[0].from))
+				goto bad_val;
+			min_size = memparse(args[0].from, &rest);
+			min_setsize = SIZE_STD;
+			if (*rest == '%')
+				min_setsize = SIZE_PERCENT;
+			break;
+		}
+
 		default:
 			pr_err("Bad mount option: \"%s\"\n", p);
 			return -EINVAL;
@@ -839,15 +870,17 @@ hugetlbfs_parse_options(char *options, struct hugetlbfs_config *pconfig)
 		}
 	}
 
-	/* Do size after hstate is set up */
-	if (setsize > NO_SIZE) {
-		struct hstate *h = pconfig->hstate;
-		if (setsize == SIZE_PERCENT) {
-			size <<= huge_page_shift(h);
-			size *= h->max_huge_pages;
-			do_div(size, 100);
-		}
-		pconfig->nr_blocks = (size >> huge_page_shift(h));
+	/* Calculate number of huge pages based on hstate */
+	if (hugetlbfs_options_setsize(pconfig->hstate, &max_size, max_setsize))
+		pconfig->nr_blocks = max_size;
+	if (hugetlbfs_options_setsize(pconfig->hstate, &min_size, min_setsize))
+		pconfig->min_size = min_size;
+
+	/* If max_size specified, then min_size must be smaller */
+	if (max_setsize > NO_SIZE && min_setsize > NO_SIZE &&
+	    pconfig->min_size > pconfig->nr_blocks) {
+		pr_err("minimum size can not be greater than maximum size\n");
+		return -EINVAL;
 	}
 
 	return 0;
@@ -872,6 +905,7 @@ hugetlbfs_fill_super(struct super_block *sb, void *data, int silent)
 	config.gid = current_fsgid();
 	config.mode = 0755;
 	config.hstate = &default_hstate;
+	config.min_size = 0; /* No default minimum size */
 	ret = hugetlbfs_parse_options(data, &config);
 	if (ret)
 		return ret;
@@ -885,8 +919,15 @@ hugetlbfs_fill_super(struct super_block *sb, void *data, int silent)
 	sbinfo->max_inodes = config.nr_inodes;
 	sbinfo->free_inodes = config.nr_inodes;
 	sbinfo->spool = NULL;
-	if (config.nr_blocks != -1) {
-		sbinfo->spool = hugepage_new_subpool(config.nr_blocks);
+	/*
+	 * Allocate and initialize subpool if maximum or minimum size is
+	 * specified.  Any needed reservations (for minimim size) are taken
+	 * taken when the subpool is created.
+	 */
+	if (config.nr_blocks != -1 || config.min_size != 0) {
+		sbinfo->spool = hugepage_new_subpool(config.hstate,
+							config.nr_blocks,
+							config.min_size);
 		if (!sbinfo->spool)
 			goto out_free;
 	}
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index cfe13fd..6883fca 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -40,7 +40,8 @@ extern int hugetlb_max_hstate __read_mostly;
 #define for_each_hstate(h) \
 	for ((h) = hstates; (h) < &hstates[hugetlb_max_hstate]; (h)++)
 
-struct hugepage_subpool *hugepage_new_subpool(long nr_blocks);
+struct hugepage_subpool *hugepage_new_subpool(struct hstate *h, long nr_blocks,
+						long min_size);
 void hugepage_put_subpool(struct hugepage_subpool *spool);
 
 int PageHuge(struct page *page);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ab2ea1e..7d4be33 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -61,6 +61,9 @@ DEFINE_SPINLOCK(hugetlb_lock);
 static int num_fault_mutexes;
 static struct mutex *htlb_fault_mutex_table ____cacheline_aligned_in_smp;
 
+/* Forward declaration */
+static int hugetlb_acct_memory(struct hstate *h, long delta);
+
 static inline void unlock_or_release_subpool(struct hugepage_subpool *spool)
 {
 	bool free = (spool->count == 0) && (spool->used_hpages == 0);
@@ -68,12 +71,18 @@ static inline void unlock_or_release_subpool(struct hugepage_subpool *spool)
 	spin_unlock(&spool->lock);
 
 	/* If no pages are used, and no other handles to the subpool
-	 * remain, free the subpool the subpool remain */
-	if (free)
+	 * remain, give up any reservations mased on minimum size and
+	 * free the subpool */
+	if (free) {
+		if (spool->min_hpages)
+			hugetlb_acct_memory(spool->hstate,
+						-spool->min_hpages);
 		kfree(spool);
+	}
 }
 
-struct hugepage_subpool *hugepage_new_subpool(long nr_blocks)
+struct hugepage_subpool *hugepage_new_subpool(struct hstate *h, long nr_blocks,
+						long min_size)
 {
 	struct hugepage_subpool *spool;
 
@@ -85,9 +94,14 @@ struct hugepage_subpool *hugepage_new_subpool(long nr_blocks)
 	spool->count = 1;
 	spool->max_hpages = nr_blocks;
 	spool->used_hpages = 0;
-	spool->hstate = NULL;
-	spool->min_hpages = 0;
-	spool->rsv_hpages = 0;
+	spool->hstate = h;
+	spool->min_hpages = min_size;
+
+	if (min_size && hugetlb_acct_memory(h, min_size)) {
+		kfree(spool);
+		return NULL;
+	}
+	spool->rsv_hpages = min_size;
 
 	return spool;
 }
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH V2 4/4] hugetlbfs: document min_size mount option
  2015-03-16 23:53 [PATCH V2 0/4] hugetlbfs: add min_size filesystem mount option Mike Kravetz
                   ` (2 preceding siblings ...)
  2015-03-16 23:53 ` [PATCH V2 3/4] hugetlbfs: accept subpool min_size mount option and setup accordingly Mike Kravetz
@ 2015-03-16 23:53 ` Mike Kravetz
  2015-03-18 21:41   ` Andrew Morton
  3 siblings, 1 reply; 13+ messages in thread
From: Mike Kravetz @ 2015-03-16 23:53 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Andrew Morton, Davidlohr Bueso, Aneesh Kumar, Joonsoo Kim, Mike Kravetz

Update documentation for the hugetlbfs min_size mount option.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 Documentation/vm/hugetlbpage.txt | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/Documentation/vm/hugetlbpage.txt b/Documentation/vm/hugetlbpage.txt
index f2d3a10..83c0305 100644
--- a/Documentation/vm/hugetlbpage.txt
+++ b/Documentation/vm/hugetlbpage.txt
@@ -267,8 +267,8 @@ call, then it is required that system administrator mount a file system of
 type hugetlbfs:
 
   mount -t hugetlbfs \
-	-o uid=<value>,gid=<value>,mode=<value>,size=<value>,nr_inodes=<value> \
-	none /mnt/huge
+	-o uid=<value>,gid=<value>,mode=<value>,size=<value>,min_size=<value>, \
+	nr_inodes=<value> none /mnt/huge
 
 This command mounts a (pseudo) filesystem of type hugetlbfs on the directory
 /mnt/huge.  Any files created on /mnt/huge uses huge pages.  The uid and gid
@@ -277,11 +277,18 @@ the uid and gid of the current process are taken.  The mode option sets the
 mode of root of file system to value & 01777.  This value is given in octal.
 By default the value 0755 is picked. The size option sets the maximum value of
 memory (huge pages) allowed for that filesystem (/mnt/huge). The size is
-rounded down to HPAGE_SIZE.  The option nr_inodes sets the maximum number of
-inodes that /mnt/huge can use.  If the size or nr_inodes option is not
-provided on command line then no limits are set.  For size and nr_inodes
-options, you can use [G|g]/[M|m]/[K|k] to represent giga/mega/kilo. For
-example, size=2K has the same meaning as size=2048.
+rounded down to HPAGE_SIZE.  The min_size option sets the minimum value of
+memory (huge pages) allowed for the filesystem.  Like the size option,
+min_size is rounded down to HPAGE_SIZE.  At mount time, the number of huge
+pages specified by min_size are reserved for use by the filesystem.  If
+there are not enough free huge pages available, the mount will fail.  As
+huge pages are allocated to the filesystem and freed, the reserve count
+is adjusted so that the sum of allocated and reserved huge pages is always
+at least min_size.  The option nr_inodes sets the maximum number of
+inodes that /mnt/huge can use.  If the size, min_size or nr_inodes option
+is not provided on command line then no limits are set.  For size, min_size
+and nr_inodes options, you can use [G|g]/[M|m]/[K|k] to represent
+giga/mega/kilo. For example, size=2K has the same meaning as size=2048.
 
 While read system calls are supported on files that reside on hugetlb
 file systems, write system calls are not.
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH V2 1/4] hugetlbfs: add minimum size tracking fields to subpool structure
  2015-03-16 23:53 ` [PATCH V2 1/4] hugetlbfs: add minimum size tracking fields to subpool structure Mike Kravetz
@ 2015-03-18 21:25   ` Andrew Morton
  0 siblings, 0 replies; 13+ messages in thread
From: Andrew Morton @ 2015-03-18 21:25 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: linux-mm, linux-kernel, Davidlohr Bueso, Aneesh Kumar, Joonsoo Kim

On Mon, 16 Mar 2015 16:53:26 -0700 Mike Kravetz <mike.kravetz@oracle.com> wrote:

> Add a field to the subpool structure to indicate the minimimum
> number of huge pages to always be used by this subpool.  This
> minimum count includes allocated pages as well as reserved pages.
> If the minimum number of pages for the subpool have not been
> allocated, pages are reserved up to this minimum.  An additional
> field (rsv_hpages) is used to track the number of pages reserved
> to meet this minimum size.  The hstate pointer in the subpool
> is convenient to have when reserving and unreserving the pages.
> 
> ...
>
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -23,6 +23,8 @@ struct hugepage_subpool {
>  	spinlock_t lock;
>  	long count;
>  	long max_hpages, used_hpages;
> +	struct hstate *hstate;
> +	long min_hpages, rsv_hpages;
>  };

Let's leave room for the descriptive comments which aren't there.

--- a/include/linux/hugetlb.h~hugetlbfs-add-minimum-size-tracking-fields-to-subpool-structure-fix
+++ a/include/linux/hugetlb.h
@@ -22,9 +22,11 @@ struct mmu_gather;
 struct hugepage_subpool {
 	spinlock_t lock;
 	long count;
-	long max_hpages, used_hpages;
+	long max_hpagesl
+	long used_hpages;
 	struct hstate *hstate;
-	long min_hpages, rsv_hpages;
+	long min_hpages;
+	long rsv_hpages;
 };
 
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -85,6 +85,9 @@ struct hugepage_subpool *hugepage_new_subpool(long nr_blocks)
>  	spool->count = 1;
>  	spool->max_hpages = nr_blocks;
>  	spool->used_hpages = 0;
> +	spool->hstate = NULL;
> +	spool->min_hpages = 0;
> +	spool->rsv_hpages = 0;

Four strikes and you're out!

--- a/mm/hugetlb.c~hugetlbfs-add-minimum-size-tracking-fields-to-subpool-structure-fix
+++ a/mm/hugetlb.c
@@ -77,17 +77,13 @@ struct hugepage_subpool *hugepage_new_su
 {
 	struct hugepage_subpool *spool;
 
-	spool = kmalloc(sizeof(*spool), GFP_KERNEL);
+	spool = kzalloc(sizeof(*spool), GFP_KERNEL);
 	if (!spool)
 		return NULL;
 
 	spin_lock_init(&spool->lock);
 	spool->count = 1;
 	spool->max_hpages = nr_blocks;
-	spool->used_hpages = 0;
-	spool->hstate = NULL;
-	spool->min_hpages = 0;
-	spool->rsv_hpages = 0;
 
 	return spool;
 }
_


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH V2 3/4] hugetlbfs: accept subpool min_size mount option and setup accordingly
  2015-03-16 23:53 ` [PATCH V2 3/4] hugetlbfs: accept subpool min_size mount option and setup accordingly Mike Kravetz
@ 2015-03-18 21:40   ` Andrew Morton
  2015-03-19  1:34     ` Mike Kravetz
  0 siblings, 1 reply; 13+ messages in thread
From: Andrew Morton @ 2015-03-18 21:40 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: linux-mm, linux-kernel, Davidlohr Bueso, Aneesh Kumar, Joonsoo Kim

On Mon, 16 Mar 2015 16:53:28 -0700 Mike Kravetz <mike.kravetz@oracle.com> wrote:

> Make 'min_size=' be an option when mounting a hugetlbfs.  This option
> takes the same value as the 'size' option.  min_size can be specified
> with specifying size.  If both are specified, min_size must be less
> that or equal to size else the mount will fail.  If min_size is
> specified, then at mount time an attempt is made to reserve min_size
> pages.  If the reservation fails, the mount fails.  At umount time,
> the reserved pages are released.
> 
> ...
>
> @@ -761,14 +763,32 @@ static const struct super_operations hugetlbfs_ops = {
>  	.show_options	= generic_show_options,
>  };
>  
> +enum { NO_SIZE, SIZE_STD, SIZE_PERCENT };
> +
> +static bool
> +hugetlbfs_options_setsize(struct hstate *h, long long *size, int setsize)
> +{
> +	if (setsize == NO_SIZE)
> +		return false;
> +
> +	if (setsize == SIZE_PERCENT) {
> +		*size <<= huge_page_shift(h);
> +		*size *= h->max_huge_pages;
> +		do_div(*size, 100);

I suppose do_div() takes a long long.  u64 would be more conventional. 
I don't *think* all this code needed to use signed types.

> +	}
> +
> +	*size >>= huge_page_shift(h);
> +	return true;
> +}
> +
>  static int
>  hugetlbfs_parse_options(char *options, struct hugetlbfs_config *pconfig)
>  {
>  	char *p, *rest;
>  	substring_t args[MAX_OPT_ARGS];
>  	int option;
> -	unsigned long long size = 0;
> -	enum { NO_SIZE, SIZE_STD, SIZE_PERCENT } setsize = NO_SIZE;
> +	unsigned long long max_size = 0, min_size = 0;
> +	int max_setsize = NO_SIZE, min_setsize = NO_SIZE;
>  
>  	if (!options)
>  		return 0;
> @@ -806,10 +826,10 @@ hugetlbfs_parse_options(char *options, struct hugetlbfs_config *pconfig)
>  			/* memparse() will accept a K/M/G without a digit */
>  			if (!isdigit(*args[0].from))
>  				goto bad_val;
> -			size = memparse(args[0].from, &rest);
> -			setsize = SIZE_STD;
> +			max_size = memparse(args[0].from, &rest);
> +			max_setsize = SIZE_STD;
>  			if (*rest == '%')
> -				setsize = SIZE_PERCENT;
> +				max_setsize = SIZE_PERCENT;
>  			break;
>  		}
>  
> @@ -832,6 +852,17 @@ hugetlbfs_parse_options(char *options, struct hugetlbfs_config *pconfig)
>  			break;
>  		}
>  
> +		case Opt_min_size: {
> +			/* memparse() will accept a K/M/G without a digit */
> +			if (!isdigit(*args[0].from))
> +				goto bad_val;
> +			min_size = memparse(args[0].from, &rest);
> +			min_setsize = SIZE_STD;
> +			if (*rest == '%')
> +				min_setsize = SIZE_PERCENT;
> +			break;
> +		}
> +
>  		default:
>  			pr_err("Bad mount option: \"%s\"\n", p);
>  			return -EINVAL;
> @@ -839,15 +870,17 @@ hugetlbfs_parse_options(char *options, struct hugetlbfs_config *pconfig)
>  		}
>  	}
>  
> -	/* Do size after hstate is set up */
> -	if (setsize > NO_SIZE) {
> -		struct hstate *h = pconfig->hstate;
> -		if (setsize == SIZE_PERCENT) {
> -			size <<= huge_page_shift(h);
> -			size *= h->max_huge_pages;
> -			do_div(size, 100);
> -		}
> -		pconfig->nr_blocks = (size >> huge_page_shift(h));
> +	/* Calculate number of huge pages based on hstate */
> +	if (hugetlbfs_options_setsize(pconfig->hstate, &max_size, max_setsize))
> +		pconfig->nr_blocks = max_size;

So hugetlbfs_options_setsize takes an arg whichis in units of bytes,
modifies it in-place to b in units of pages and then copies it into
something which is in units of nr_blocks.


> +	if (hugetlbfs_options_setsize(pconfig->hstate, &min_size, min_setsize))
> +		pconfig->min_size = min_size;
> +
> +	/* If max_size specified, then min_size must be smaller */
> +	if (max_setsize > NO_SIZE && min_setsize > NO_SIZE &&
> +	    pconfig->min_size > pconfig->nr_blocks) {
> +		pr_err("minimum size can not be greater than maximum size\n");
> +		return -EINVAL;
>  	}
>  
>  	return 0;
> @@ -872,6 +905,7 @@ hugetlbfs_fill_super(struct super_block *sb, void *data, int silent)
>  	config.gid = current_fsgid();
>  	config.mode = 0755;
>  	config.hstate = &default_hstate;
> +	config.min_size = 0; /* No default minimum size */
>  	ret = hugetlbfs_parse_options(data, &config);
>  	if (ret)
>  		return ret;
> @@ -885,8 +919,15 @@ hugetlbfs_fill_super(struct super_block *sb, void *data, int silent)
>  	sbinfo->max_inodes = config.nr_inodes;
>  	sbinfo->free_inodes = config.nr_inodes;
>  	sbinfo->spool = NULL;
> -	if (config.nr_blocks != -1) {
> -		sbinfo->spool = hugepage_new_subpool(config.nr_blocks);
> +	/*
> +	 * Allocate and initialize subpool if maximum or minimum size is
> +	 * specified.  Any needed reservations (for minimim size) are taken
> +	 * taken when the subpool is created.
> +	 */
> +	if (config.nr_blocks != -1 || config.min_size != 0) {
> +		sbinfo->spool = hugepage_new_subpool(config.hstate,
> +							config.nr_blocks,
> +							config.min_size);

And hugepage_new_subpool() takes something in units of nr_blocks and
copies it into something whcih has units of nr-hugepages.

And it takes an arg called "size" which is no longer number-of-bytes
but is actually number-of-hpages.


It's all rather confusing and unclear.  A good philosophy would be
never to use a variable called "size", because the reader doesn't know
what units that size is measured in.  Instead, make sure that the name
reflects the variable's units.  max_bytes, min_hpages, nr_blocks, etc.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH V2 4/4] hugetlbfs: document min_size mount option
  2015-03-16 23:53 ` [PATCH V2 4/4] hugetlbfs: document min_size mount option Mike Kravetz
@ 2015-03-18 21:41   ` Andrew Morton
  2015-03-19  1:51     ` Mike Kravetz
  0 siblings, 1 reply; 13+ messages in thread
From: Andrew Morton @ 2015-03-18 21:41 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: linux-mm, linux-kernel, Davidlohr Bueso, Aneesh Kumar, Joonsoo Kim

On Mon, 16 Mar 2015 16:53:29 -0700 Mike Kravetz <mike.kravetz@oracle.com> wrote:

> Update documentation for the hugetlbfs min_size mount option.
> 
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> ---
>  Documentation/vm/hugetlbpage.txt | 21 ++++++++++++++-------
>  1 file changed, 14 insertions(+), 7 deletions(-)
> 
> diff --git a/Documentation/vm/hugetlbpage.txt b/Documentation/vm/hugetlbpage.txt
> index f2d3a10..83c0305 100644
> --- a/Documentation/vm/hugetlbpage.txt
> +++ b/Documentation/vm/hugetlbpage.txt
> @@ -267,8 +267,8 @@ call, then it is required that system administrator mount a file system of
>  type hugetlbfs:
>  
>    mount -t hugetlbfs \
> -	-o uid=<value>,gid=<value>,mode=<value>,size=<value>,nr_inodes=<value> \
> -	none /mnt/huge
> +	-o uid=<value>,gid=<value>,mode=<value>,size=<value>,min_size=<value>, \
> +	nr_inodes=<value> none /mnt/huge
>  
>  This command mounts a (pseudo) filesystem of type hugetlbfs on the directory
>  /mnt/huge.  Any files created on /mnt/huge uses huge pages.  The uid and gid
> @@ -277,11 +277,18 @@ the uid and gid of the current process are taken.  The mode option sets the
>  mode of root of file system to value & 01777.  This value is given in octal.
>  By default the value 0755 is picked. The size option sets the maximum value of
>  memory (huge pages) allowed for that filesystem (/mnt/huge). The size is
> -rounded down to HPAGE_SIZE.  The option nr_inodes sets the maximum number of
> -inodes that /mnt/huge can use.  If the size or nr_inodes option is not
> -provided on command line then no limits are set.  For size and nr_inodes
> -options, you can use [G|g]/[M|m]/[K|k] to represent giga/mega/kilo. For
> -example, size=2K has the same meaning as size=2048.
> +rounded down to HPAGE_SIZE.  The min_size option sets the minimum value of
> +memory (huge pages) allowed for the filesystem.  Like the size option,
> +min_size is rounded down to HPAGE_SIZE.  At mount time, the number of huge
> +pages specified by min_size are reserved for use by the filesystem.  If
> +there are not enough free huge pages available, the mount will fail.  As
> +huge pages are allocated to the filesystem and freed, the reserve count
> +is adjusted so that the sum of allocated and reserved huge pages is always
> +at least min_size.  The option nr_inodes sets the maximum number of
> +inodes that /mnt/huge can use.  If the size, min_size or nr_inodes option
> +is not provided on command line then no limits are set.  For size, min_size
> +and nr_inodes options, you can use [G|g]/[M|m]/[K|k] to represent
> +giga/mega/kilo. For example, size=2K has the same meaning as size=2048.

Nowhere here is the reader told the units of "size".  We should at
least describe that, and maybe even rename the thing to min_bytes.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH V2 2/4] hugetlbfs: add minimum size accounting to subpools
  2015-03-16 23:53 ` [PATCH V2 2/4] hugetlbfs: add minimum size accounting to subpools Mike Kravetz
@ 2015-03-18 21:43   ` Andrew Morton
  0 siblings, 0 replies; 13+ messages in thread
From: Andrew Morton @ 2015-03-18 21:43 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: linux-mm, linux-kernel, Davidlohr Bueso, Aneesh Kumar, Joonsoo Kim

On Mon, 16 Mar 2015 16:53:27 -0700 Mike Kravetz <mike.kravetz@oracle.com> wrote:

> The same routines that perform subpool maximum size accounting
> hugepage_subpool_get/put_pages() are modified to also perform
> minimum size accounting.  When a delta value is passed to these
> routines, calculate how global reservations must be adjusted
> to maintain the subpool minimum size.  The routines now return
> this global reserve count adjustment.  This global adjusted
> reserve count is then passed to the global accounting routine
> hugetlb_acct_memory().
> 

The comment layout is a bit chaotic.  Also, sentences start with
capital letters and end with little round things!  It's a bit
anal but heck, the kernel isn't written in linglish.


--- a/mm/hugetlb.c~hugetlbfs-add-minimum-size-accounting-to-subpools-fix
+++ a/mm/hugetlb.c
@@ -125,8 +125,10 @@ static long hugepage_subpool_get_pages(s
 
 	if (spool->min_hpages) {		/* minimum size accounting */
 		if (delta > spool->rsv_hpages) {
-			/* asking for more reserves than those already taken
-			 * on behalf of subpool. return difference */
+			/*
+			 * Asking for more reserves than those already taken on
+			 * behalf of subpool.  Return difference.
+			 */
 			ret = delta - spool->rsv_hpages;
 			spool->rsv_hpages = 0;
 		} else {
@@ -141,7 +143,7 @@ unlock_ret:
 }
 
 /*
- * subpool accounting for freeing and unreserving pages
+ * Subpool accounting for freeing and unreserving pages.
  * Return the number of global page reservations that must be dropped.
  * The return value may only be different than the passed value (delta)
  * in the case where a subpool minimum size must be maintained.
@@ -170,8 +172,10 @@ static long hugepage_subpool_put_pages(s
 			spool->rsv_hpages = spool->min_hpages;
 	}
 
-	/* If hugetlbfs_put_super couldn't free spool due to
-	* an outstanding quota reference, free it now. */
+	/*
+	 * If hugetlbfs_put_super couldn't free spool due to an outstanding
+	 * quota reference, free it now.
+	 */
 	unlock_or_release_subpool(spool);
 
 	return ret;
@@ -923,9 +927,9 @@ void free_huge_page(struct page *page)
 	ClearPagePrivate(page);
 
 	/*
-	 * A return code of zero implies that the subpool will be under
-	 * it's minimum size if the reservation is not restored after
-	 * page is free.  Therefore, force restore_reserve operation.
+	 * A return code of zero implies that the subpool will be under its
+	 * minimum size if the reservation is not restored after page is free.
+	 * Therefore, force restore_reserve operation.
 	 */
 	if (hugepage_subpool_put_pages(spool, 1) == 0)
 		restore_reserve = true;
@@ -2523,8 +2527,8 @@ static void hugetlb_vm_op_close(struct v
 
 	if (reserve) {
 		/*
-		 * decrement reserve counts.  The global reserve count
-		 * may be adjusted if the subpool has a minimum size.
+		 * Decrement reserve counts.  The global reserve count may be
+		 * adjusted if the subpool has a minimum size.
 		 */
 		gbl_reserve = hugepage_subpool_put_pages(spool, reserve);
 		hugetlb_acct_memory(h, -gbl_reserve);
_


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH V2 3/4] hugetlbfs: accept subpool min_size mount option and setup accordingly
  2015-03-18 21:40   ` Andrew Morton
@ 2015-03-19  1:34     ` Mike Kravetz
  0 siblings, 0 replies; 13+ messages in thread
From: Mike Kravetz @ 2015-03-19  1:34 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Davidlohr Bueso, Aneesh Kumar, Joonsoo Kim

On 03/18/2015 02:40 PM, Andrew Morton wrote:
> On Mon, 16 Mar 2015 16:53:28 -0700 Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
>> Make 'min_size=' be an option when mounting a hugetlbfs.  This option
>> takes the same value as the 'size' option.  min_size can be specified
>> with specifying size.  If both are specified, min_size must be less
>> that or equal to size else the mount will fail.  If min_size is
>> specified, then at mount time an attempt is made to reserve min_size
>> pages.  If the reservation fails, the mount fails.  At umount time,
>> the reserved pages are released.
>>
>> ...
>>
>> @@ -761,14 +763,32 @@ static const struct super_operations hugetlbfs_ops = {
>>   	.show_options	= generic_show_options,
>>   };
>>
>> +enum { NO_SIZE, SIZE_STD, SIZE_PERCENT };
>> +
>> +static bool
>> +hugetlbfs_options_setsize(struct hstate *h, long long *size, int setsize)
>> +{
>> +	if (setsize == NO_SIZE)
>> +		return false;
>> +
>> +	if (setsize == SIZE_PERCENT) {
>> +		*size <<= huge_page_shift(h);
>> +		*size *= h->max_huge_pages;
>> +		do_div(*size, 100);
>
> I suppose do_div() takes a long long.  u64 would be more conventional.
> I don't *think* all this code needed to use signed types.
>
>> +	}
>> +
>> +	*size >>= huge_page_shift(h);
>> +	return true;
>> +}
>> +
>>   static int
>>   hugetlbfs_parse_options(char *options, struct hugetlbfs_config *pconfig)
>>   {
>>   	char *p, *rest;
>>   	substring_t args[MAX_OPT_ARGS];
>>   	int option;
>> -	unsigned long long size = 0;
>> -	enum { NO_SIZE, SIZE_STD, SIZE_PERCENT } setsize = NO_SIZE;
>> +	unsigned long long max_size = 0, min_size = 0;
>> +	int max_setsize = NO_SIZE, min_setsize = NO_SIZE;
>>
>>   	if (!options)
>>   		return 0;
>> @@ -806,10 +826,10 @@ hugetlbfs_parse_options(char *options, struct hugetlbfs_config *pconfig)
>>   			/* memparse() will accept a K/M/G without a digit */
>>   			if (!isdigit(*args[0].from))
>>   				goto bad_val;
>> -			size = memparse(args[0].from, &rest);
>> -			setsize = SIZE_STD;
>> +			max_size = memparse(args[0].from, &rest);
>> +			max_setsize = SIZE_STD;
>>   			if (*rest == '%')
>> -				setsize = SIZE_PERCENT;
>> +				max_setsize = SIZE_PERCENT;
>>   			break;
>>   		}
>>
>> @@ -832,6 +852,17 @@ hugetlbfs_parse_options(char *options, struct hugetlbfs_config *pconfig)
>>   			break;
>>   		}
>>
>> +		case Opt_min_size: {
>> +			/* memparse() will accept a K/M/G without a digit */
>> +			if (!isdigit(*args[0].from))
>> +				goto bad_val;
>> +			min_size = memparse(args[0].from, &rest);
>> +			min_setsize = SIZE_STD;
>> +			if (*rest == '%')
>> +				min_setsize = SIZE_PERCENT;
>> +			break;
>> +		}
>> +
>>   		default:
>>   			pr_err("Bad mount option: \"%s\"\n", p);
>>   			return -EINVAL;
>> @@ -839,15 +870,17 @@ hugetlbfs_parse_options(char *options, struct hugetlbfs_config *pconfig)
>>   		}
>>   	}
>>
>> -	/* Do size after hstate is set up */
>> -	if (setsize > NO_SIZE) {
>> -		struct hstate *h = pconfig->hstate;
>> -		if (setsize == SIZE_PERCENT) {
>> -			size <<= huge_page_shift(h);
>> -			size *= h->max_huge_pages;
>> -			do_div(size, 100);
>> -		}
>> -		pconfig->nr_blocks = (size >> huge_page_shift(h));
>> +	/* Calculate number of huge pages based on hstate */
>> +	if (hugetlbfs_options_setsize(pconfig->hstate, &max_size, max_setsize))
>> +		pconfig->nr_blocks = max_size;
>
> So hugetlbfs_options_setsize takes an arg whichis in units of bytes,
> modifies it in-place to b in units of pages and then copies it into
> something which is in units of nr_blocks.
>
>
>> +	if (hugetlbfs_options_setsize(pconfig->hstate, &min_size, min_setsize))
>> +		pconfig->min_size = min_size;
>> +
>> +	/* If max_size specified, then min_size must be smaller */
>> +	if (max_setsize > NO_SIZE && min_setsize > NO_SIZE &&
>> +	    pconfig->min_size > pconfig->nr_blocks) {
>> +		pr_err("minimum size can not be greater than maximum size\n");
>> +		return -EINVAL;
>>   	}
>>
>>   	return 0;
>> @@ -872,6 +905,7 @@ hugetlbfs_fill_super(struct super_block *sb, void *data, int silent)
>>   	config.gid = current_fsgid();
>>   	config.mode = 0755;
>>   	config.hstate = &default_hstate;
>> +	config.min_size = 0; /* No default minimum size */
>>   	ret = hugetlbfs_parse_options(data, &config);
>>   	if (ret)
>>   		return ret;
>> @@ -885,8 +919,15 @@ hugetlbfs_fill_super(struct super_block *sb, void *data, int silent)
>>   	sbinfo->max_inodes = config.nr_inodes;
>>   	sbinfo->free_inodes = config.nr_inodes;
>>   	sbinfo->spool = NULL;
>> -	if (config.nr_blocks != -1) {
>> -		sbinfo->spool = hugepage_new_subpool(config.nr_blocks);
>> +	/*
>> +	 * Allocate and initialize subpool if maximum or minimum size is
>> +	 * specified.  Any needed reservations (for minimim size) are taken
>> +	 * taken when the subpool is created.
>> +	 */
>> +	if (config.nr_blocks != -1 || config.min_size != 0) {
>> +		sbinfo->spool = hugepage_new_subpool(config.hstate,
>> +							config.nr_blocks,
>> +							config.min_size);
>
> And hugepage_new_subpool() takes something in units of nr_blocks and
> copies it into something whcih has units of nr-hugepages.
>
> And it takes an arg called "size" which is no longer number-of-bytes
> but is actually number-of-hpages.
>
>
> It's all rather confusing and unclear.  A good philosophy would be
> never to use a variable called "size", because the reader doesn't know
> what units that size is measured in.  Instead, make sure that the name
> reflects the variable's units.  max_bytes, min_hpages, nr_blocks, etc.
>

Thanks for the comments.

I didn't want to cut/paste/duplicate the code used to parse the existing
size option.  But, it looks like I made it harder to understand.  I'll
take a pass as cleaning this up and making it more clear.

-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH V2 4/4] hugetlbfs: document min_size mount option
  2015-03-18 21:41   ` Andrew Morton
@ 2015-03-19  1:51     ` Mike Kravetz
  2015-03-19  2:23       ` Andrew Morton
  0 siblings, 1 reply; 13+ messages in thread
From: Mike Kravetz @ 2015-03-19  1:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Davidlohr Bueso, Aneesh Kumar, Joonsoo Kim

On 03/18/2015 02:41 PM, Andrew Morton wrote:
> On Mon, 16 Mar 2015 16:53:29 -0700 Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
>> Update documentation for the hugetlbfs min_size mount option.
>>
>> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
>> ---
>>   Documentation/vm/hugetlbpage.txt | 21 ++++++++++++++-------
>>   1 file changed, 14 insertions(+), 7 deletions(-)
>>
>> diff --git a/Documentation/vm/hugetlbpage.txt b/Documentation/vm/hugetlbpage.txt
>> index f2d3a10..83c0305 100644
>> --- a/Documentation/vm/hugetlbpage.txt
>> +++ b/Documentation/vm/hugetlbpage.txt
>> @@ -267,8 +267,8 @@ call, then it is required that system administrator mount a file system of
>>   type hugetlbfs:
>>
>>     mount -t hugetlbfs \
>> -	-o uid=<value>,gid=<value>,mode=<value>,size=<value>,nr_inodes=<value> \
>> -	none /mnt/huge
>> +	-o uid=<value>,gid=<value>,mode=<value>,size=<value>,min_size=<value>, \
>> +	nr_inodes=<value> none /mnt/huge
>>
>>   This command mounts a (pseudo) filesystem of type hugetlbfs on the directory
>>   /mnt/huge.  Any files created on /mnt/huge uses huge pages.  The uid and gid
>> @@ -277,11 +277,18 @@ the uid and gid of the current process are taken.  The mode option sets the
>>   mode of root of file system to value & 01777.  This value is given in octal.
>>   By default the value 0755 is picked. The size option sets the maximum value of
>>   memory (huge pages) allowed for that filesystem (/mnt/huge). The size is
>> -rounded down to HPAGE_SIZE.  The option nr_inodes sets the maximum number of
>> -inodes that /mnt/huge can use.  If the size or nr_inodes option is not
>> -provided on command line then no limits are set.  For size and nr_inodes
>> -options, you can use [G|g]/[M|m]/[K|k] to represent giga/mega/kilo. For
>> -example, size=2K has the same meaning as size=2048.
>> +rounded down to HPAGE_SIZE.  The min_size option sets the minimum value of
>> +memory (huge pages) allowed for the filesystem.  Like the size option,
>> +min_size is rounded down to HPAGE_SIZE.  At mount time, the number of huge
>> +pages specified by min_size are reserved for use by the filesystem.  If
>> +there are not enough free huge pages available, the mount will fail.  As
>> +huge pages are allocated to the filesystem and freed, the reserve count
>> +is adjusted so that the sum of allocated and reserved huge pages is always
>> +at least min_size.  The option nr_inodes sets the maximum number of
>> +inodes that /mnt/huge can use.  If the size, min_size or nr_inodes option
>> +is not provided on command line then no limits are set.  For size, min_size
>> +and nr_inodes options, you can use [G|g]/[M|m]/[K|k] to represent
>> +giga/mega/kilo. For example, size=2K has the same meaning as size=2048.
>
> Nowhere here is the reader told the units of "size".  We should at
> least describe that, and maybe even rename the thing to min_bytes.
>

Ok, I will add that the size is in unit of bytes.  My choice of
'min_size' as a name for the new mount option was influenced by
the existing 'size' mount option.  I'm open to any suggestions
for the name of this new mount option.

-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH V2 4/4] hugetlbfs: document min_size mount option
  2015-03-19  1:51     ` Mike Kravetz
@ 2015-03-19  2:23       ` Andrew Morton
  2015-03-20 16:24         ` Mike Kravetz
  0 siblings, 1 reply; 13+ messages in thread
From: Andrew Morton @ 2015-03-19  2:23 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: linux-mm, linux-kernel, Davidlohr Bueso, Aneesh Kumar, Joonsoo Kim

On Wed, 18 Mar 2015 18:51:22 -0700 Mike Kravetz <mike.kravetz@oracle.com> wrote:

> > Nowhere here is the reader told the units of "size".  We should at
> > least describe that, and maybe even rename the thing to min_bytes.
> >
> 
> Ok, I will add that the size is in unit of bytes.  My choice of
> 'min_size' as a name for the new mount option was influenced by
> the existing 'size' mount option.  I'm open to any suggestions
> for the name of this new mount option.

Yes, due to the preexisting "size" I think we're stuck with "min_size".
We could use min_size_bytes I guess, but the operator needs to go look
up the units of "size" anyway.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH V2 4/4] hugetlbfs: document min_size mount option
  2015-03-19  2:23       ` Andrew Morton
@ 2015-03-20 16:24         ` Mike Kravetz
  0 siblings, 0 replies; 13+ messages in thread
From: Mike Kravetz @ 2015-03-20 16:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Davidlohr Bueso, Aneesh Kumar, Joonsoo Kim

On 03/18/2015 07:23 PM, Andrew Morton wrote:
> On Wed, 18 Mar 2015 18:51:22 -0700 Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
>>> Nowhere here is the reader told the units of "size".  We should at
>>> least describe that, and maybe even rename the thing to min_bytes.
>>>
>>
>> Ok, I will add that the size is in unit of bytes.  My choice of
>> 'min_size' as a name for the new mount option was influenced by
>> the existing 'size' mount option.  I'm open to any suggestions
>> for the name of this new mount option.
>
> Yes, due to the preexisting "size" I think we're stuck with "min_size".
> We could use min_size_bytes I guess, but the operator needs to go look
> up the units of "size" anyway.
>

Well, the existing size option can also be specified as a percentage of
the huge page pool size.  This is in the current code.  There is a
mount option 'pagesize=' that allows one to select which huge page
(size) pool should be used. If none is specified the default huge page
pool is used.  There is no documentation for this pagesize option or
using size to specify a percentage of the huge page pool size.

I'll add this to the hugetlbpage.txt documentation.
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2015-03-20 16:24 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-16 23:53 [PATCH V2 0/4] hugetlbfs: add min_size filesystem mount option Mike Kravetz
2015-03-16 23:53 ` [PATCH V2 1/4] hugetlbfs: add minimum size tracking fields to subpool structure Mike Kravetz
2015-03-18 21:25   ` Andrew Morton
2015-03-16 23:53 ` [PATCH V2 2/4] hugetlbfs: add minimum size accounting to subpools Mike Kravetz
2015-03-18 21:43   ` Andrew Morton
2015-03-16 23:53 ` [PATCH V2 3/4] hugetlbfs: accept subpool min_size mount option and setup accordingly Mike Kravetz
2015-03-18 21:40   ` Andrew Morton
2015-03-19  1:34     ` Mike Kravetz
2015-03-16 23:53 ` [PATCH V2 4/4] hugetlbfs: document min_size mount option Mike Kravetz
2015-03-18 21:41   ` Andrew Morton
2015-03-19  1:51     ` Mike Kravetz
2015-03-19  2:23       ` Andrew Morton
2015-03-20 16:24         ` Mike Kravetz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).