* [RFC 0/3] hugetlbfs: optionally reserve all fs pages at mount time
@ 2015-02-27 22:58 ` Mike Kravetz
  0 siblings, 0 replies; 40+ messages in thread
From: Mike Kravetz @ 2015-02-27 22:58 UTC
  To: linux-mm, linux-kernel
  Cc: Nadia Yvette Chambers, Andrew Morton, Davidlohr Bueso,
	Aneesh Kumar, Joonsoo Kim, Mike Kravetz

hugetlbfs allocates huge pages from the global pool as needed.  Even if
the global pool contains a sufficient number of pages for the filesystem
size at mount time, those global pages could be grabbed for some other
use.  As a result, filesystem huge page allocations may fail due to lack
of pages.

Add a new hugetlbfs mount option 'reserved' to specify that the number
of pages associated with the size of the filesystem will be reserved.  If
there are insufficient pages, the mount will fail.  The reservation is
maintained for the duration of the filesystem so that, as pages are
allocated and freed, a sufficient number of pages remains reserved.

Mike Kravetz (3):
  hugetlbfs: add reserved mount fields to subpool structure
  hugetlbfs: coordinate global and subpool reserve accounting
  hugetlbfs: accept subpool reserved option and setup accordingly

 fs/hugetlbfs/inode.c    | 15 +++++++++++++--
 include/linux/hugetlb.h |  7 +++++++
 mm/hugetlb.c            | 37 +++++++++++++++++++++++++++++--------
 3 files changed, 49 insertions(+), 10 deletions(-)

-- 
2.1.0
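The problem and the proposed behaviour can be illustrated with a small user-space model of the global huge page pool. This is a simplified sketch, not kernel code: `struct pool_model` and the `pool_*()` helpers are hypothetical names, and only the free and reserved counters are modelled (no surplus pages, NUMA nodes, cgroups or locking). The one rule it borrows from mm/hugetlb.c is that a caller without a reservation may only take pages that are free and not reserved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the global pool for a single huge page size. */
struct pool_model {
	long free;	/* free huge pages in the pool */
	long resv;	/* free pages already promised to someone */
};

/* Reserve n pages (e.g. at mount); fails if too few unreserved pages. */
static bool pool_reserve(struct pool_model *p, long n)
{
	if (p->free - p->resv < n)
		return false;
	p->resv += n;
	return true;
}

/* Drop n reservations (e.g. at unmount). */
static void pool_unreserve(struct pool_model *p, long n)
{
	p->resv -= n;
	assert(p->resv >= 0);
}

/*
 * Allocate one page.  A caller holding a reservation consumes it;
 * any other caller may only take pages that nobody has reserved.
 */
static bool pool_alloc(struct pool_model *p, bool has_reservation)
{
	if (has_reservation) {
		assert(p->resv > 0);
		p->resv--;
	} else if (p->free - p->resv <= 0) {
		return false;
	}
	p->free--;
	return true;
}

/* Return one page to the pool, optionally restoring its reservation. */
static void pool_free(struct pool_model *p, bool restore_reserve)
{
	p->free++;
	if (restore_reserve)
		p->resv++;
}
```

For example, with four free pages, a four-page 'reserved' mount succeeds and an unrelated allocation is then refused, while the filesystem's own allocations still succeed and return their reservation when freed. After the reservation is dropped at unmount, the other user can allocate again. With only two free pages the same reservation fails, which corresponds to the mount failing.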


* [RFC 1/3] hugetlbfs: add reserved mount fields to subpool structure
  2015-02-27 22:58 ` Mike Kravetz
@ 2015-02-27 22:58   ` Mike Kravetz
From: Mike Kravetz @ 2015-02-27 22:58 UTC
  To: linux-mm, linux-kernel
  Cc: Nadia Yvette Chambers, Andrew Morton, Davidlohr Bueso,
	Aneesh Kumar, Joonsoo Kim, Mike Kravetz

Add a boolean to the subpool structure to indicate that the pages for
the subpool have been reserved.  The hstate pointer in the subpool is
convenient to have when it comes time to unreserve the pages.
subpool_reserved() is a handy way to check whether the subpool is
reserved while also handling a NULL subpool.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 include/linux/hugetlb.h | 6 ++++++
 mm/hugetlb.c            | 2 ++
 2 files changed, 8 insertions(+)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 431b7fc..605c648 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -23,6 +23,8 @@ struct hugepage_subpool {
 	spinlock_t lock;
 	long count;
 	long max_hpages, used_hpages;
+	struct hstate *hstate;
+	bool reserved;
 };
 
 struct resv_map {
@@ -38,6 +40,10 @@ extern int hugetlb_max_hstate __read_mostly;
 #define for_each_hstate(h) \
 	for ((h) = hstates; (h) < &hstates[hugetlb_max_hstate]; (h)++)
 
+static inline bool subpool_reserved(struct hugepage_subpool *spool)
+{
+	return spool && spool->reserved;
+}
 struct hugepage_subpool *hugepage_new_subpool(long nr_blocks);
 void hugepage_put_subpool(struct hugepage_subpool *spool);
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 85032de..c6adf65 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -85,6 +85,8 @@ struct hugepage_subpool *hugepage_new_subpool(long nr_blocks)
 	spool->count = 1;
 	spool->max_hpages = nr_blocks;
 	spool->used_hpages = 0;
+	spool->hstate = NULL;
+	spool->reserved = false;
 
 	return spool;
 }
-- 
2.1.0
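A user-space sketch of the new fields and the NULL-safe helper. The structure is trimmed (the spinlock is omitted and `struct hstate` is left opaque), and `init_subpool()` is a hypothetical stand-in for the initialisation this patch adds to hugepage_new_subpool(); treat it as an illustration of the check rather than the kernel definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct hstate;			/* opaque placeholder for the kernel type */

/* Trimmed copy of struct hugepage_subpool (lock omitted). */
struct hugepage_subpool {
	long count;
	long max_hpages, used_hpages;
	struct hstate *hstate;	/* which pool to unreserve from later */
	bool reserved;		/* pages were reserved at mount time */
};

static inline bool subpool_reserved(struct hugepage_subpool *spool)
{
	/* A NULL subpool (no size limit) is never reserved. */
	return spool && spool->reserved;
}

/* Hypothetical helper mirroring hugepage_new_subpool()'s setup. */
static void init_subpool(struct hugepage_subpool *spool, long nr_blocks)
{
	assert(spool != NULL);
	spool->count = 1;
	spool->max_hpages = nr_blocks;
	spool->used_hpages = 0;
	spool->hstate = NULL;
	spool->reserved = false;
}
```

Callers can therefore pass a subpool pointer straight through without first testing it for NULL, which is how the later patches in this series use the helper (for example in free_huge_page()).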


* [RFC 2/3] hugetlbfs: coordinate global and subpool reserve accounting
  2015-02-27 22:58 ` Mike Kravetz
@ 2015-02-27 22:58   ` Mike Kravetz
From: Mike Kravetz @ 2015-02-27 22:58 UTC
  To: linux-mm, linux-kernel
  Cc: Nadia Yvette Chambers, Andrew Morton, Davidlohr Bueso,
	Aneesh Kumar, Joonsoo Kim, Mike Kravetz

If the pages for a subpool are reserved, then the reservations have
already been accounted for in the global pool.  Therefore, when
requesting a new reservation (such as for a mapping) for the subpool,
do not count it again in the global pool.  However, when actually
allocating a page for the subpool, decrement the global reserve count
to correspond with the decrement in global free pages.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 mm/hugetlb.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c6adf65..4ef8379 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -879,7 +879,7 @@ void free_huge_page(struct page *page)
 	spin_lock(&hugetlb_lock);
 	hugetlb_cgroup_uncharge_page(hstate_index(h),
 				     pages_per_huge_page(h), page);
-	if (restore_reserve)
+	if (restore_reserve || subpool_reserved(spool))
 		h->resv_huge_pages++;
 
 	if (h->surplus_huge_pages_node[nid]) {
@@ -2466,7 +2466,8 @@ static void hugetlb_vm_op_close(struct vm_area_struct *vma)
 	kref_put(&resv->refs, resv_map_release);
 
 	if (reserve) {
-		hugetlb_acct_memory(h, -reserve);
+		if (!subpool_reserved(spool))
+			hugetlb_acct_memory(h, -reserve);
 		hugepage_subpool_put_pages(spool, reserve);
 	}
 }
@@ -3444,10 +3445,14 @@ int hugetlb_reserve_pages(struct inode *inode,
 	 * Check enough hugepages are available for the reservation.
 	 * Hand the pages back to the subpool if there are not
 	 */
-	ret = hugetlb_acct_memory(h, chg);
-	if (ret < 0) {
-		hugepage_subpool_put_pages(spool, chg);
-		goto out_err;
+	if (subpool_reserved(spool))
+		ret = 0;
+	else {
+		ret = hugetlb_acct_memory(h, chg);
+		if (ret < 0) {
+			hugepage_subpool_put_pages(spool, chg);
+			goto out_err;
+		}
 	}
 
 	/*
@@ -3483,7 +3488,8 @@ void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed)
 	inode->i_blocks -= (blocks_per_huge_page(h) * freed);
 	spin_unlock(&inode->i_lock);
 
-	hugepage_subpool_put_pages(spool, (chg - freed));
+	if (!subpool_reserved(spool))
+		hugepage_subpool_put_pages(spool, (chg - freed));
 	hugetlb_acct_memory(h, -(chg - freed));
 }
 
-- 
2.1.0
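The intent of this patch can be checked with a toy model of the two global counters involved. This is a simplified sketch under stated assumptions: `struct model` and its helpers are hypothetical, `global_acct()` stands in for hugetlb_acct_memory(), the subpool's own page counting is left out, and only the hugetlb_reserve_pages(), allocation and free_huge_page() paths are modelled. It shows that a mapping on a reserved subpool does not take a second global reservation, and that freeing a page hands the reservation back.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy counters; names are hypothetical, not kernel symbols. */
struct model {
	long free_pages;	/* like h->free_huge_pages */
	long resv_pages;	/* like h->resv_huge_pages */
	bool spool_reserved;	/* like subpool_reserved(spool) */
};

/* Stand-in for hugetlb_acct_memory(): reserve (delta > 0) or release. */
static int global_acct(struct model *m, long delta)
{
	if (delta > 0 && m->free_pages - m->resv_pages < delta)
		return -1;	/* the kernel returns -ENOMEM */
	m->resv_pages += delta;
	assert(m->resv_pages >= 0);
	return 0;
}

/* hugetlb_reserve_pages(): a reserved subpool was charged at mount. */
static int reserve_for_mapping(struct model *m, long chg)
{
	if (m->spool_reserved)
		return 0;
	return global_acct(m, chg);
}

/* Allocating against a reservation consumes one global reservation. */
static void alloc_reserved_page(struct model *m)
{
	assert(m->resv_pages > 0 && m->free_pages > 0);
	m->resv_pages--;
	m->free_pages--;
}

/* free_huge_page(): a reserved subpool gets its reservation back. */
static void free_page(struct model *m, bool restore_reserve)
{
	m->free_pages++;
	if (restore_reserve || m->spool_reserved)
		m->resv_pages++;
}
```

For example, after all four free pages have been reserved at mount, a four-page mapping succeeds only because reserve_for_mapping() skips the global charge; charging the global pool again would find no unreserved pages and fail.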


* [RFC 3/3] hugetlbfs: accept subpool reserved option and setup accordingly
  2015-02-27 22:58 ` Mike Kravetz
@ 2015-02-27 22:58   ` Mike Kravetz
From: Mike Kravetz @ 2015-02-27 22:58 UTC
  To: linux-mm, linux-kernel
  Cc: Nadia Yvette Chambers, Andrew Morton, Davidlohr Bueso,
	Aneesh Kumar, Joonsoo Kim, Mike Kravetz

Make 'reserved' an option when mounting a hugetlbfs filesystem.  The
reserved option is only valid if the size option is also specified.
On mount, reserve 'size' huge pages and note this in the subpool.
Unreserve the pages when the filesystem is unmounted.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 fs/hugetlbfs/inode.c    | 15 +++++++++++++--
 include/linux/hugetlb.h |  1 +
 mm/hugetlb.c            | 15 ++++++++++++++-
 3 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 5eba47f..99d0cec 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -50,6 +50,7 @@ struct hugetlbfs_config {
 	long	nr_blocks;
 	long	nr_inodes;
 	struct hstate *hstate;
+	bool	reserved;
 };
 
 struct hugetlbfs_inode_info {
@@ -73,7 +74,7 @@ int sysctl_hugetlb_shm_group;
 enum {
 	Opt_size, Opt_nr_inodes,
 	Opt_mode, Opt_uid, Opt_gid,
-	Opt_pagesize,
+	Opt_pagesize, Opt_reserved,
 	Opt_err,
 };
 
@@ -84,6 +85,7 @@ static const match_table_t tokens = {
 	{Opt_uid,	"uid=%u"},
 	{Opt_gid,	"gid=%u"},
 	{Opt_pagesize,	"pagesize=%s"},
+	{Opt_reserved,	"reserved"},
 	{Opt_err,	NULL},
 };
 
@@ -832,6 +834,10 @@ hugetlbfs_parse_options(char *options, struct hugetlbfs_config *pconfig)
 			break;
 		}
 
+		case Opt_reserved:
+			pconfig->reserved = true;
+			break;
+
 		default:
 			pr_err("Bad mount option: \"%s\"\n", p);
 			return -EINVAL;
@@ -872,6 +878,7 @@ hugetlbfs_fill_super(struct super_block *sb, void *data, int silent)
 	config.gid = current_fsgid();
 	config.mode = 0755;
 	config.hstate = &default_hstate;
+	config.reserved = false;
 	ret = hugetlbfs_parse_options(data, &config);
 	if (ret)
 		return ret;
@@ -889,7 +896,11 @@ hugetlbfs_fill_super(struct super_block *sb, void *data, int silent)
 		sbinfo->spool = hugepage_new_subpool(config.nr_blocks);
 		if (!sbinfo->spool)
 			goto out_free;
-	}
+		sbinfo->spool->hstate = config.hstate;
+		if (config.reserved && !reserve_hugepage_subpool(sbinfo->spool))
+			goto out_free;
+	} else if (config.reserved)
+		goto out_free;
 	sb->s_maxbytes = MAX_LFS_FILESIZE;
 	sb->s_blocksize = huge_page_size(config.hstate);
 	sb->s_blocksize_bits = huge_page_shift(config.hstate);
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 605c648..117e1bd 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -45,6 +45,7 @@ static inline bool subpool_reserved(struct hugepage_subpool *spool)
 	return spool && spool->reserved;
 }
 struct hugepage_subpool *hugepage_new_subpool(long nr_blocks);
+bool reserve_hugepage_subpool(struct hugepage_subpool *spool);
 void hugepage_put_subpool(struct hugepage_subpool *spool);
 
 int PageHuge(struct page *page);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 4ef8379..3ae3596 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -61,6 +61,8 @@ DEFINE_SPINLOCK(hugetlb_lock);
 static int num_fault_mutexes;
 static struct mutex *htlb_fault_mutex_table ____cacheline_aligned_in_smp;
 
+/* Forward declaration */
+static int hugetlb_acct_memory(struct hstate *h, long delta);
 static inline void unlock_or_release_subpool(struct hugepage_subpool *spool)
 {
 	bool free = (spool->count == 0) && (spool->used_hpages == 0);
@@ -69,8 +71,11 @@ static inline void unlock_or_release_subpool(struct hugepage_subpool *spool)
 
 	/* If no pages are used, and no other handles to the subpool
 	 * remain, free the subpool the subpool remain */
-	if (free)
+	if (free) {
+		if (spool->reserved)
+			hugetlb_acct_memory(spool->hstate, -spool->max_hpages);
 		kfree(spool);
+	}
 }
 
 struct hugepage_subpool *hugepage_new_subpool(long nr_blocks)
@@ -91,6 +96,14 @@ struct hugepage_subpool *hugepage_new_subpool(long nr_blocks)
 	return spool;
 }
 
+bool reserve_hugepage_subpool(struct hugepage_subpool *spool)
+{
+	if (hugetlb_acct_memory(spool->hstate, spool->max_hpages))
+		return false;
+	spool->reserved = true;
+	return true;
+}
+
 void hugepage_put_subpool(struct hugepage_subpool *spool)
 {
 	spin_lock(&spool->lock);
-- 
2.1.0
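The option handling and the mount-time check can be exercised outside the kernel with a small parser. This is a hypothetical sketch: `struct mount_cfg`, `parse_options()` and `mount_allowed()` are illustrative names, the tokenising uses plain strchr() instead of the kernel's match_token() table, and `size=` is simplified to a page count (the real option takes a byte size or percentage). `free_unreserved` stands in for the check done by reserve_hugepage_subpool() via hugetlb_acct_memory().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Simplified, hypothetical stand-in for struct hugetlbfs_config. */
struct mount_cfg {
	long nr_blocks;		/* -1 when no size= option was given */
	bool reserved;
};

/*
 * Parse a comma-separated option string in place.  Returns 0 on success
 * or -1 for an unknown or malformed option (the kernel uses -EINVAL).
 */
static int parse_options(char *opts, struct mount_cfg *cfg)
{
	cfg->nr_blocks = -1;
	cfg->reserved = false;

	while (opts && *opts) {
		char *next = strchr(opts, ',');

		if (next)
			*next++ = '\0';
		if (strncmp(opts, "size=", 5) == 0) {
			char *end;

			cfg->nr_blocks = strtol(opts + 5, &end, 10);
			if (end == opts + 5 || *end != '\0' || cfg->nr_blocks < 0)
				return -1;
		} else if (strcmp(opts, "reserved") == 0) {
			cfg->reserved = true;
		} else if (*opts != '\0') {	/* empty tokens are skipped */
			return -1;
		}
		opts = next;
	}
	return 0;
}

/* Mirrors fill_super: 'reserved' needs a size-limited subpool. */
static bool mount_allowed(const struct mount_cfg *cfg, long free_unreserved)
{
	assert(cfg != NULL);
	if (cfg->nr_blocks == -1)
		return !cfg->reserved;
	if (cfg->reserved)
		return free_unreserved >= cfg->nr_blocks;
	return true;
}
```

For example, "size=4,reserved" is accepted only if at least four unreserved pages are free, while a bare "reserved" is always rejected, matching the `else if (config.reserved) goto out_free;` branch above.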


* Re: [RFC 0/3] hugetlbfs: optionally reserve all fs pages at mount time
  2015-02-27 22:58 ` Mike Kravetz
@ 2015-03-02 23:10   ` Andrew Morton
From: Andrew Morton @ 2015-03-02 23:10 UTC
  To: Mike Kravetz
  Cc: linux-mm, linux-kernel, Nadia Yvette Chambers, Davidlohr Bueso,
	Aneesh Kumar, Joonsoo Kim

On Fri, 27 Feb 2015 14:58:08 -0800 Mike Kravetz <mike.kravetz@oracle.com> wrote:

> hugetlbfs allocates huge pages from the global pool as needed.  Even if
> the global pool contains a sufficient number of pages for the filesystem
> size at mount time, those global pages could be grabbed for some other
> use.  As a result, filesystem huge page allocations may fail due to lack
> of pages.

Well OK, but why is this a sufficiently serious problem to justify
kernel changes?  Please provide enough info for others to be able
to understand the value of the change.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC 1/3] hugetlbfs: add reserved mount fields to subpool structure
  2015-02-27 22:58   ` Mike Kravetz
@ 2015-03-02 23:10     ` Andrew Morton
  -1 siblings, 0 replies; 40+ messages in thread
From: Andrew Morton @ 2015-03-02 23:10 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: linux-mm, linux-kernel, Nadia Yvette Chambers, Davidlohr Bueso,
	Aneesh Kumar, Joonsoo Kim

On Fri, 27 Feb 2015 14:58:10 -0800 Mike Kravetz <mike.kravetz@oracle.com> wrote:

> Add a boolean to the subpool structure to indicate that the pages for
> subpool have been reserved.  The hstate pointer in the subpool is
> convienient to have when it comes time to unreserve the pages.
> subool_reserved() is a handy way to check if reserved and take into
> account a NULL subpool.
> 
> ...
>
> @@ -38,6 +40,10 @@ extern int hugetlb_max_hstate __read_mostly;
>  #define for_each_hstate(h) \
>  	for ((h) = hstates; (h) < &hstates[hugetlb_max_hstate]; (h)++)
>  
> +static inline bool subpool_reserved(struct hugepage_subpool *spool)
> +{
> +	return spool && spool->reserved;
> +}

"subpool_reserved" is not a good identifier.

>  struct hugepage_subpool *hugepage_new_subpool(long nr_blocks);
>  void hugepage_put_subpool(struct hugepage_subpool *spool);

See what they did?



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC 2/3] hugetlbfs: coordinate global and subpool reserve accounting
  2015-02-27 22:58   ` Mike Kravetz
@ 2015-03-02 23:10     ` Andrew Morton
  -1 siblings, 0 replies; 40+ messages in thread
From: Andrew Morton @ 2015-03-02 23:10 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: linux-mm, linux-kernel, Nadia Yvette Chambers, Davidlohr Bueso,
	Aneesh Kumar, Joonsoo Kim

On Fri, 27 Feb 2015 14:58:11 -0800 Mike Kravetz <mike.kravetz@oracle.com> wrote:

> If the pages for a subpool are reserved, then the reservations have
> already been accounted for in the global pool.  Therefore, when
> requesting a new reservation (such as for a mapping) for the subpool
> do not count again in global pool.  However, when actually allocating
> a page for the subpool decrement global reserve count to correspond to
> with decrement in global free pages.

The last sentence made my brain hurt.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC 3/3] hugetlbfs: accept subpool reserved option and setup accordingly
  2015-02-27 22:58   ` Mike Kravetz
@ 2015-03-02 23:10     ` Andrew Morton
  -1 siblings, 0 replies; 40+ messages in thread
From: Andrew Morton @ 2015-03-02 23:10 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: linux-mm, linux-kernel, Nadia Yvette Chambers, Davidlohr Bueso,
	Aneesh Kumar, Joonsoo Kim

On Fri, 27 Feb 2015 14:58:13 -0800 Mike Kravetz <mike.kravetz@oracle.com> wrote:

> Make reserved be an option when mounting a hugetlbfs.

New mount option triggers a user documentation update.  hugetlbfs isn't
well documented, but Documentation/vm/hugetlbpage.txt looks like the
place.


> reserved
> option is only possible if size option is also specified.

The code doesn't appear to check for this (maybe it does).  Probably it
should do so, and warn when it fails.



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC 0/3] hugetlbfs: optionally reserve all fs pages at mount time
  2015-03-02 23:10   ` Andrew Morton
@ 2015-03-03  1:18     ` Mike Kravetz
  -1 siblings, 0 replies; 40+ messages in thread
From: Mike Kravetz @ 2015-03-03  1:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Nadia Yvette Chambers, Aneesh Kumar, Joonsoo Kim

On 03/02/2015 03:10 PM, Andrew Morton wrote:
> On Fri, 27 Feb 2015 14:58:08 -0800 Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
>> hugetlbfs allocates huge pages from the global pool as needed.  Even if
>> the global pool contains a sufficient number pages for the filesystem
>> size at mount time, those global pages could be grabbed for some other
>> use.  As a result, filesystem huge page allocations may fail due to lack
>> of pages.
>
> Well OK, but why is this a sufficiently serious problem to justify
> kernel changes?  Please provide enough info for others to be able
> to understand the value of the change.
>

Thanks for taking a look.

Applications such as a database want to use huge pages for performance
reasons.  hugetlbfs filesystem semantics with ownership and modes work
well to manage access to a pool of huge pages.  However, the application
would like some reasonable assurance that allocations will not fail due
to a lack of huge pages.  Before starting, the application will ensure
that enough huge pages exist on the system in the global pools.  What
the application wants is exclusive use of a pool of huge pages.

One could argue that this is a system administration issue.  The global
huge page pools are only available to users with root privilege.
Therefore, exclusive use of a pool of huge pages can be obtained by
limiting access.  However, many applications are installed to run with
elevated privilege to take advantage of resources like huge pages.  It
is quite possible for one application to interfere with another,
especially in the case of huge pages, where the pool size is mostly
fixed.

Suggestions for other ways to approach this situation are appreciated.
I saw the existing support for "reservations" within hugetlbfs and
thought of extending this to cover the size of the filesystem.
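The idea of extending hugetlbfs reservations to the whole filesystem size can be pictured with a small userspace model. This is only a sketch: model_hstate and model_subpool are hypothetical stand-ins for struct hstate and struct hugepage_subpool, and model_acct_memory() is a simplified version of hugetlb_acct_memory() that ignores surplus page allocation and per-node checks. Reserving at mount mirrors reserve_hugepage_subpool() from the patch, and releasing mirrors the new branch in unlock_or_release_subpool().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins for struct hstate and struct
 * hugepage_subpool, reduced to the fields this sketch needs. */
struct model_hstate {
	long free_huge_pages;	/* pages in the global free pool */
	long resv_huge_pages;	/* free pages already promised to someone */
};

struct model_subpool {
	struct model_hstate *hstate;
	long max_hpages;	/* filesystem size (size=) in huge pages */
	bool reserved;
};

/* Simplified hugetlb_acct_memory(): a positive delta succeeds only if
 * enough unreserved free pages exist; a negative delta always succeeds.
 * Surplus page allocation and per-node checks are ignored. */
static int model_acct_memory(struct model_hstate *h, long delta)
{
	if (delta > 0 && h->free_huge_pages - h->resv_huge_pages < delta)
		return -ENOMEM;
	h->resv_huge_pages += delta;
	assert(h->resv_huge_pages >= 0);
	return 0;
}

/* Mount time: reserve the whole filesystem size, like
 * reserve_hugepage_subpool() in the patch. false means fail the mount. */
static bool model_reserve_subpool(struct model_subpool *spool)
{
	if (model_acct_memory(spool->hstate, spool->max_hpages))
		return false;
	spool->reserved = true;
	return true;
}

/* Release time: hand the reservation back, like the new branch in
 * unlock_or_release_subpool(). Assumes no pages are still in use. */
static void model_release_subpool(struct model_subpool *spool)
{
	if (spool->reserved)
		model_acct_memory(spool->hstate, -spool->max_hpages);
	spool->reserved = false;
}
```

With 8 free pages, a 6-page reserved mount succeeds, and a second 4-page reserved mount then fails until the first one is released. The shortage surfaces at mount time rather than later, during allocation.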

-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC 1/3] hugetlbfs: add reserved mount fields to subpool structure
  2015-03-02 23:10     ` Andrew Morton
@ 2015-03-03  1:20       ` Mike Kravetz
  -1 siblings, 0 replies; 40+ messages in thread
From: Mike Kravetz @ 2015-03-03  1:20 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Nadia Yvette Chambers, Aneesh Kumar, Joonsoo Kim

On 03/02/2015 03:10 PM, Andrew Morton wrote:
> On Fri, 27 Feb 2015 14:58:10 -0800 Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
>> Add a boolean to the subpool structure to indicate that the pages for
>> subpool have been reserved.  The hstate pointer in the subpool is
>> convienient to have when it comes time to unreserve the pages.
>> subool_reserved() is a handy way to check if reserved and take into
>> account a NULL subpool.
>>
>> ...
>>
>> @@ -38,6 +40,10 @@ extern int hugetlb_max_hstate __read_mostly;
>>   #define for_each_hstate(h) \
>>   	for ((h) = hstates; (h) < &hstates[hugetlb_max_hstate]; (h)++)
>>
>> +static inline bool subpool_reserved(struct hugepage_subpool *spool)
>> +{
>> +	return spool && spool->reserved;
>> +}
>
> "subpool_reserved" is not a good identifier.
>
>>   struct hugepage_subpool *hugepage_new_subpool(long nr_blocks);
>>   void hugepage_put_subpool(struct hugepage_subpool *spool);
>
> See what they did?

Got it, thanks.  I will rename it hugepage_subpool_reserved().
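For reference, the renamed helper would presumably keep the same body as subpool_reserved() in the RFC, only with the hugepage_subpool_ prefix shared by hugepage_new_subpool() and hugepage_put_subpool(). The struct below is a trimmed stand-in so the snippet compiles on its own; the real struct hugepage_subpool has more fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Trimmed stand-in for struct hugepage_subpool (assumption: only the
 * fields needed here). */
struct hugepage_subpool {
	long max_hpages;
	bool reserved;	/* field added by this series */
};

/* Same logic as the RFC's subpool_reserved(): a NULL subpool (e.g. a
 * mount without a size limit) is never reserved. */
static inline bool hugepage_subpool_reserved(struct hugepage_subpool *spool)
{
	return spool && spool->reserved;
}
```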

-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC 2/3] hugetlbfs: coordinate global and subpool reserve accounting
  2015-03-02 23:10     ` Andrew Morton
@ 2015-03-03  1:30       ` Mike Kravetz
  -1 siblings, 0 replies; 40+ messages in thread
From: Mike Kravetz @ 2015-03-03  1:30 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Nadia Yvette Chambers, Aneesh Kumar, Joonsoo Kim

On 03/02/2015 03:10 PM, Andrew Morton wrote:
> On Fri, 27 Feb 2015 14:58:11 -0800 Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
>> If the pages for a subpool are reserved, then the reservations have
>> already been accounted for in the global pool.  Therefore, when
>> requesting a new reservation (such as for a mapping) for the subpool
>> do not count again in global pool.  However, when actually allocating
>> a page for the subpool decrement global reserve count to correspond to
>> with decrement in global free pages.
>
> The last sentence made my brain hurt.
>

Sorry.  I was trying to point out that the global free and reserve
accounting for a page allocation is unchanged, even though the entire
size of the subpool was reserved at mount time.  When a page is
allocated, both the global free count and the global reserve count
are decremented.
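A simplified counter model may make the two cases easier to follow. It assumes only two global counters (free and reserved) and a hypothetical model_reserve_for_mapping() that skips global accounting when the subpool reserved its size at mount; it is a sketch, not the kernel code. The quantity that matters to other users is free minus reserved, and the model shows that it does not change when a reserved page is allocated.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two global counters in struct hstate. */
struct model_hstate {
	long free_huge_pages;
	long resv_huge_pages;
};

/* Pages other users can still claim: free, but not promised. */
static long model_unreserved(const struct model_hstate *h)
{
	return h->free_huge_pages - h->resv_huge_pages;
}

/* A new reservation, e.g. for a mapping.  If the subpool reserved its
 * whole size at mount time, those pages are already counted in
 * resv_huge_pages, so nothing more is charged to the global pool. */
static int model_reserve_for_mapping(struct model_hstate *h,
				     bool subpool_reserved, long pages)
{
	if (subpool_reserved)
		return 0;
	if (model_unreserved(h) < pages)
		return -ENOMEM;
	h->resv_huge_pages += pages;
	return 0;
}

/* Allocating a page that a reservation covers: it leaves the free pool
 * and is no longer merely promised, so both counters drop by one.  This
 * is the same whether the reservation came from a mapping or from the
 * mount-time subpool reservation. */
static void model_alloc_reserved_page(struct model_hstate *h)
{
	assert(h->free_huge_pages > 0 && h->resv_huge_pages > 0);
	h->free_huge_pages--;
	h->resv_huge_pages--;
}
```

For example, with 10 free pages of which 4 are reserved at mount for the filesystem, the unreserved count is 6 both before and after the filesystem allocates one page (free drops to 9, reserved to 3).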

-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC 3/3] hugetlbfs: accept subpool reserved option and setup accordingly
  2015-03-02 23:10     ` Andrew Morton
@ 2015-03-03  1:36       ` Mike Kravetz
  -1 siblings, 0 replies; 40+ messages in thread
From: Mike Kravetz @ 2015-03-03  1:36 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Nadia Yvette Chambers, Aneesh Kumar, Joonsoo Kim

On 03/02/2015 03:10 PM, Andrew Morton wrote:
> On Fri, 27 Feb 2015 14:58:13 -0800 Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
>> Make reserved be an option when mounting a hugetlbfs.
>
> New mount option triggers a user documentation update.  hugetlbfs isn't
> well documented, but Documentation/vm/hugetlbpage.txt looks like the
> place.
>

Will do.

>
>> reserved
>> option is only possible if size option is also specified.
>
> The code doesn't appear to check for this (maybe it does).  Probably it
> should do so, and warn when it fails.
>

It is hard to see from the diffs, but this case is covered.  If size is
not specified, it implies the size is "unlimited".  The code in the
patch actually makes the mount fail in this case.
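One way such a check, plus the warning Andrew asked for, could look. This is a sketch: struct model_mount_opts, the -1 convention for a missing size= option, the -EINVAL return value and the message text are all assumptions for illustration and are not taken from the patch. Kernel code would use pr_err() or similar instead of fprintf().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical parsed options. nr_blocks == -1 means size= was not
 * given, i.e. the filesystem size is unlimited. */
struct model_mount_opts {
	long nr_blocks;		/* filesystem size in huge pages, or -1 */
	bool reserved;		/* "reserved" option present */
};

/* An unlimited filesystem has no finite number of pages to set aside,
 * so "reserved" without "size=" is rejected with a warning. */
static int model_check_opts(const struct model_mount_opts *o)
{
	if (o->reserved && o->nr_blocks == -1) {
		fprintf(stderr, "hugetlbfs: 'reserved' requires 'size='\n");
		return -EINVAL;
	}
	return 0;
}
```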

-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC 0/3] hugetlbfs: optionally reserve all fs pages at mount time
  2015-03-03  1:18     ` Mike Kravetz
@ 2015-03-06 15:10       ` Michal Hocko
  -1 siblings, 0 replies; 40+ messages in thread
From: Michal Hocko @ 2015-03-06 15:10 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Andrew Morton, linux-mm, linux-kernel, Nadia Yvette Chambers,
	Aneesh Kumar, Joonsoo Kim

On Mon 02-03-15 17:18:14, Mike Kravetz wrote:
> On 03/02/2015 03:10 PM, Andrew Morton wrote:
> >On Fri, 27 Feb 2015 14:58:08 -0800 Mike Kravetz <mike.kravetz@oracle.com> wrote:
> >
> >>hugetlbfs allocates huge pages from the global pool as needed.  Even if
> >>the global pool contains a sufficient number pages for the filesystem
> >>size at mount time, those global pages could be grabbed for some other
> >>use.  As a result, filesystem huge page allocations may fail due to lack
> >>of pages.
> >
> >Well OK, but why is this a sufficiently serious problem to justify
> >kernel changes?  Please provide enough info for others to be able
> >to understand the value of the change.
> >
> 
> Thanks for taking a look.
> 
> Applications such as a database want to use huge pages for performance
> reasons.  hugetlbfs filesystem semantics with ownership and modes work
> well to manage access to a pool of huge pages.  However, the application
> would like some reasonable assurance that allocations will not fail due
> to a lack of huge pages.  Before starting, the application will ensure
> that enough huge pages exist on the system in the global pools.  What
> the application wants is exclusive use of a pool of huge pages.
> 
> One could argue that this is a system administration issue.  The global
> huge page pools are only available to users with root privilege.
> Therefore,  exclusive use of a pool of huge pages can be obtained by
> limiting access.  However, many applications are installed to run with
> elevated privilege to take advantage of resources like huge pages.  It
> is quite possible for one application to interfere another, especially
> in the case of something like huge pages where the pool size is mostly
> fixed.
> 
> Suggestions for other ways to approach this situation are appreciated.
> I saw the existing support for "reservations" within hugetlbfs and
> thought of extending this to cover the size of the filesystem.

Maybe I do not understand your use case properly, but wouldn't the hugetlb
cgroup controller (CONFIG_CGROUP_HUGETLB) help to guarantee the same? Just
configure limits for different users/applications (inside different
groups) so that they never overcommit the existing pool. Would that work
for you?
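As a rough illustration of the cgroup alternative, here is a sketch that assumes the cgroup v1 hugetlb controller. That controller exposes per-size files such as hugetlb.2MB.limit_in_bytes in each group's directory. The cgroup directory in the example is a placeholder. limits_fit_pool() is a hypothetical helper that expresses the "never overcommit the existing pool" condition: the per-group limits only amount to a guarantee if their sum does not exceed the pool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Build "<cgroup_dir>/hugetlb.<size_name>.limit_in_bytes" (cgroup v1
 * hugetlb controller). Returns 0 on success, -1 if buf is too small. */
static int hugetlb_limit_path(char *buf, size_t len, const char *cgroup_dir,
			      const char *size_name)
{
	int n = snprintf(buf, len, "%s/hugetlb.%s.limit_in_bytes",
			 cgroup_dir, size_name);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a limit of nr_pages huge pages of page_size bytes each.
 * Needs sufficient privilege and a mounted hugetlb cgroup. */
static int set_hugetlb_limit(const char *cgroup_dir, const char *size_name,
			     unsigned long nr_pages,
			     unsigned long long page_size)
{
	char path[4096];
	FILE *f;
	int ret;

	if (hugetlb_limit_path(path, sizeof(path), cgroup_dir, size_name))
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	ret = fprintf(f, "%llu\n", nr_pages * page_size) < 0 ? -1 : 0;
	if (fclose(f) != 0)
		ret = -1;
	return ret;
}

/* The limits only guarantee pages if, summed over all groups, they do
 * not exceed the number of huge pages in the pool. */
static bool limits_fit_pool(const unsigned long *limits, size_t n,
			    unsigned long pool_pages)
{
	unsigned long long sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += limits[i];
	return sum <= pool_pages;
}
```

For example, limits of 512 and 256 pages fit a 1024-page pool, while 512 and 600 do not. The second configuration leaves room for one group to starve the other.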

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC 0/3] hugetlbfs: optionally reserve all fs pages at mount time
  2015-03-06 15:10       ` Michal Hocko
@ 2015-03-06 18:58         ` Mike Kravetz
  -1 siblings, 0 replies; 40+ messages in thread
From: Mike Kravetz @ 2015-03-06 18:58 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, linux-mm, linux-kernel, Aneesh Kumar, Joonsoo Kim,
	David Rientjes

On 03/06/2015 07:10 AM, Michal Hocko wrote:
> On Mon 02-03-15 17:18:14, Mike Kravetz wrote:
>> On 03/02/2015 03:10 PM, Andrew Morton wrote:
>>> On Fri, 27 Feb 2015 14:58:08 -0800 Mike Kravetz <mike.kravetz@oracle.com> wrote:
>>>
>>>> hugetlbfs allocates huge pages from the global pool as needed.  Even if
>>>> the global pool contains a sufficient number pages for the filesystem
>>>> size at mount time, those global pages could be grabbed for some other
>>>> use.  As a result, filesystem huge page allocations may fail due to lack
>>>> of pages.
>>>
>>> Well OK, but why is this a sufficiently serious problem to justify
>>> kernel changes?  Please provide enough info for others to be able
>>> to understand the value of the change.
>>>
>>
>> Thanks for taking a look.
>>
>> Applications such as a database want to use huge pages for performance
>> reasons.  hugetlbfs filesystem semantics with ownership and modes work
>> well to manage access to a pool of huge pages.  However, the application
>> would like some reasonable assurance that allocations will not fail due
>> to a lack of huge pages.  Before starting, the application will ensure
>> that enough huge pages exist on the system in the global pools.  What
>> the application wants is exclusive use of a pool of huge pages.
>>
>> One could argue that this is a system administration issue.  The global
>> huge page pools are only available to users with root privilege.
>> Therefore,  exclusive use of a pool of huge pages can be obtained by
>> limiting access.  However, many applications are installed to run with
>> elevated privilege to take advantage of resources like huge pages.  It
>> is quite possible for one application to interfere another, especially
>> in the case of something like huge pages where the pool size is mostly
>> fixed.
>>
>> Suggestions for other ways to approach this situation are appreciated.
>> I saw the existing support for "reservations" within hugetlbfs and
>> thought of extending this to cover the size of the filesystem.
>
> Maybe I do not understand your usecase properly but wouldn't hugetlb
> cgroup (CONFIG_CGROUP_HUGETLB) help to guarantee the same? Just
> configure limits for different users/applications (inside different
> groups) so that they never overcommit the existing pool. Would that work
> for you?

Thanks for the CONFIG_CGROUP_HUGETLB suggestion; however, I do not
believe it will be a satisfactory solution for my use case.  As you
point out, cgroups could be set up (by a sysadmin) for every hugetlb
user/application.  In that case, the sysadmin needs to know about
every huge page user/application and configure each one appropriately.

I was approaching this from the point of view of the application.  The
application wants the guarantee of a minimum number of huge pages,
independent of other users/applications.  The "reserve" approach allows
the application to set aside those pages at initialization time.  If it
cannot get the pages it needs, it can refuse to start, configure itself
to use fewer pages, or take other action.
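The application-side choice described here (refuse to start, or run with fewer pages) could look roughly like the following. This is a sketch under stated assumptions. reserve_fn stands for whatever reservation mechanism is available, for example a mount with the proposed size=...,reserved options. The halving policy is arbitrary, and fake_pool_reserve() exists only to exercise the logic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reservation hook: returns true if nr_pages huge pages
 * were set aside for the application. */
typedef bool (*reserve_fn)(unsigned long nr_pages, void *ctx);

/* Ask for 'want' pages, halving the request after each failure but
 * never going below 'min'.  Returns the number of pages obtained, or 0
 * if even the minimum could not be reserved (refuse to start). */
static unsigned long reserve_with_fallback(reserve_fn try_reserve, void *ctx,
					   unsigned long want, unsigned long min)
{
	unsigned long n = want;

	while (n > 0 && n >= min) {
		if (try_reserve(n, ctx))
			return n;
		n /= 2;
	}
	return 0;
}

/* Test double: succeeds if the counter in ctx still has room. */
static bool fake_pool_reserve(unsigned long nr_pages, void *ctx)
{
	unsigned long *available = ctx;

	if (nr_pages > *available)
		return false;
	*available -= nr_pages;
	return true;
}
```

With 300 pages available, asking for 1024 with a floor of 128 yields 256. With only 100 available, the call returns 0, and the application would refuse to start.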

As you point out, the cgroup approach could also provide guarantees to
the application if set up properly.  I was trying for an approach that
would provide more control to the application independent of the
sysadmin and other users/applications.

-- 
Mike Kravetz

* Re: [RFC 0/3] hugetlbfs: optionally reserve all fs pages at mount time
  2015-03-06 18:58         ` Mike Kravetz
@ 2015-03-06 21:14           ` David Rientjes
  -1 siblings, 0 replies; 40+ messages in thread
From: David Rientjes @ 2015-03-06 21:14 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Michal Hocko, Andrew Morton, linux-mm, linux-kernel,
	Aneesh Kumar, Joonsoo Kim

On Fri, 6 Mar 2015, Mike Kravetz wrote:

> Thanks for the CONFIG_CGROUP_HUGETLB suggestion, however I do not
> believe this will be a satisfactory solution for my usecase.  As you
> point out, cgroups could be set up (by a sysadmin) for every hugetlb
> user/application.  In this case, the sysadmin needs to have knowledge
> of every huge page user/application and configure appropriately.
> 
> I was approaching this from the point of view of the application.  The
> application wants the guarantee of a minimum number of huge pages,
> independent of other users/applications.  The "reserve" approach allows
> the application to set aside those pages at initialization time.  If it
> can not get the pages it needs, it can refuse to start, or configure
> itself to use less, or take other action.
> 

Would it be too difficult to modify the application to mmap() the 
hugepages at startup so they are no longer free in the global pool but 
rather get marked as reserved so other applications cannot map them?  That 
should return MAP_FAILED if there is an insufficient number of hugepages 
available to be reserved (HugePages_Rsvd in /proc/meminfo).
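David's suggestion can be checked from user space by reading the hugetlb counters in /proc/meminfo before and after the startup mmap(). The kernel's hugetlbpage documentation describes HugePages_Rsvd as pages promised to mappings but not yet faulted in, and HugePages_Free still includes them. The C helpers below are a sketch. meminfo_value() and read_meminfo() are hypothetical names, and the parser assumes the usual "Key:   value" line layout.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return the numeric value of a "Key:   value" line
 * in a /proc/meminfo style buffer, or -1 if the key is absent.  The key
 * must match a whole field name at the start of a line.
 */
long meminfo_value(const char *buf, const char *key)
{
    size_t klen = strlen(key);
    const char *line = buf;

    while (line && *line) {
        /* Require "key:" so "HugePages" does not match "HugePages_Total". */
        if (strncmp(line, key, klen) == 0 && line[klen] == ':')
            return strtol(line + klen + 1, NULL, 10); /* skips spaces */
        line = strchr(line, '\n');
        if (line)
            line++;             /* move to the start of the next line */
    }
    return -1;
}

/* Read /proc/meminfo into buf as a NUL-terminated string; 0 on success. */
int read_meminfo(char *buf, size_t size)
{
    if (size == 0)
        return -1;
    FILE *f = fopen("/proc/meminfo", "r");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, size - 1, f);
    buf[n] = '\0';
    fclose(f);
    return 0;
}
```

For example, an application could record meminfo_value(buf, "HugePages_Rsvd") before its mmap() and confirm afterwards that it grew by the number of huge pages mapped.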

* Re: [RFC 0/3] hugetlbfs: optionally reserve all fs pages at mount time
  2015-03-06 21:14           ` David Rientjes
@ 2015-03-06 21:32             ` Mike Kravetz
  -1 siblings, 0 replies; 40+ messages in thread
From: Mike Kravetz @ 2015-03-06 21:32 UTC (permalink / raw)
  To: David Rientjes
  Cc: Michal Hocko, Andrew Morton, linux-mm, linux-kernel,
	Aneesh Kumar, Joonsoo Kim

On 03/06/2015 01:14 PM, David Rientjes wrote:
> On Fri, 6 Mar 2015, Mike Kravetz wrote:
>
>> Thanks for the CONFIG_CGROUP_HUGETLB suggestion, however I do not
>> believe this will be a satisfactory solution for my usecase.  As you
>> point out, cgroups could be set up (by a sysadmin) for every hugetlb
>> user/application.  In this case, the sysadmin needs to have knowledge
>> of every huge page user/application and configure appropriately.
>>
>> I was approaching this from the point of view of the application.  The
>> application wants the guarantee of a minimum number of huge pages,
>> independent of other users/applications.  The "reserve" approach allows
>> the application to set aside those pages at initialization time.  If it
>> can not get the pages it needs, it can refuse to start, or configure
>> itself to use less, or take other action.
>>
>
> Would it be too difficult to modify the application to mmap() the
> hugepages at startup so they are no longer free in the global pool but
> rather get marked as reserved so other applications cannot map them?  That
> should return MAP_FAILED if there is an insufficient number of hugepages
> available to be reserved (HugePages_Rsvd in /proc/meminfo).

The application is a database with multiple processes/tasks that will
come and go over time.  I thought about having one task do a big
mmap() at initialization time, but then the issue is how to coordinate
with the other tasks and their requests to allocate/free pages.

-- 
Mike Kravetz
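The coordination problem raised here can be made concrete. If one task makes the large mmap() at startup, the other tasks need a user-space allocator to share those pre-reserved pages. Below is a minimal sketch of such an allocator. It assumes the region is a MAP_SHARED huge page mapping inherited by worker processes forked from the initializing task, and that 64-bit C11 atomics are lock-free on the target, so they work across processes on shared memory. struct slot_pool and its functions are hypothetical. Neither participant in the thread proposed this code.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical bookkeeping for up to 64 huge-page-sized slots in a region
 * that one task reserved at startup.  The struct must itself live in
 * memory shared by all tasks (e.g. a small MAP_SHARED mapping created
 * before fork()).
 */
struct slot_pool {
    _Atomic uint64_t used;   /* bit i set => slot i is handed out */
    unsigned nslots;         /* number of usable slots (<= 64) */
};

void slot_pool_init(struct slot_pool *p, unsigned nslots)
{
    atomic_init(&p->used, 0);
    p->nslots = nslots > 64 ? 64 : nslots;
}

/* Claim the lowest free slot; returns its index, or -1 if none is free. */
int slot_claim(struct slot_pool *p)
{
    uint64_t cur = atomic_load(&p->used);

    for (;;) {
        int idx = -1;
        for (unsigned i = 0; i < p->nslots; i++) {
            if (!(cur & (UINT64_C(1) << i))) {
                idx = (int)i;
                break;
            }
        }
        if (idx < 0)
            return -1;
        /* On failure, cur is reloaded with the latest bitmap; retry. */
        if (atomic_compare_exchange_weak(&p->used, &cur,
                                         cur | (UINT64_C(1) << idx)))
            return idx;
    }
}

/* Give a slot back so another task can reuse the already reserved page. */
void slot_release(struct slot_pool *p, int idx)
{
    atomic_fetch_and(&p->used, ~(UINT64_C(1) << idx));
}
```

A task would turn a slot index into an address as base + (size_t)idx * huge_page_size. This is the kind of extra bookkeeping that the proposed 'reserved' mount option aims to avoid. As the RFC describes it, tasks could simply allocate and free hugetlbfs pages while the filesystem-wide reservation stays in place.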

* Re: [RFC 2/3] hugetlbfs: coordinate global and subpool reserve accounting
  2015-02-28  3:25 ` Hillf Danton
@ 2015-02-28 17:25   ` Mike Kravetz
  -1 siblings, 0 replies; 40+ messages in thread
From: Mike Kravetz @ 2015-02-28 17:25 UTC (permalink / raw)
  To: Hillf Danton
  Cc: linux-kernel, linux-mm, Andrew Morton, davidlohr,
	'Aneesh Kumar', 'Joonsoo Kim'

On 02/27/2015 07:25 PM, Hillf Danton wrote:
>> @@ -3444,10 +3445,14 @@ int hugetlb_reserve_pages(struct inode *inode,
>>   	 * Check enough hugepages are available for the reservation.
>>   	 * Hand the pages back to the subpool if there are not
>>   	 */
>
> Better if the comment is updated correspondingly.
> Hillf

Thanks Hillf.  I'll also take a look at other comments in the area
of 'accounting'.  As I discovered, it is only a matter of adjusting
the accounting to support reservation of pages for the entire filesystem.
-- 
Mike Kravetz

>> -	ret = hugetlb_acct_memory(h, chg);
>> -	if (ret < 0) {
>> -		hugepage_subpool_put_pages(spool, chg);
>> -		goto out_err;
>> +	if (subpool_reserved(spool))
>> +		ret = 0;
>> +	else {
>> +		ret = hugetlb_acct_memory(h, chg);
>> +		if (ret < 0) {
>> +			hugepage_subpool_put_pages(spool, chg);
>> +			goto out_err;
>> +		}
>>   	}
>>
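Mike's remark that supporting the option is "only a matter of adjusting the accounting" can be illustrated with a toy model. The C sketch below is a hypothetical user-space simulation, not kernel code. The structs and functions are invented for illustration, and the counters only loosely mirror HugePages_Free and HugePages_Rsvd. The model reserves the whole filesystem size at mount time, fails the mount if the pool cannot cover it, and keeps the reservation intact as pages are allocated and freed.

```c
#include <stdbool.h>

/* Hypothetical, simplified global pool counters. */
struct global_pool {
    long free_pages;   /* loosely like HugePages_Free */
    long resv_pages;   /* loosely like HugePages_Rsvd */
};

/* Hypothetical filesystem mounted with the proposed 'reserved' option. */
struct fs_model {
    long size_pages;   /* filesystem size in huge pages */
    long in_use;       /* pages currently allocated to files */
};

/* Pages a new reservation may still claim. */
static long pool_available(const struct global_pool *g)
{
    return g->free_pages - g->resv_pages;
}

/* Reserve the whole filesystem size up front, or fail the "mount". */
bool model_mount_reserved(struct global_pool *g, struct fs_model *fs,
                          long size_pages)
{
    if (size_pages < 0 || pool_available(g) < size_pages)
        return false;
    g->resv_pages += size_pages;
    fs->size_pages = size_pages;
    fs->in_use = 0;
    return true;
}

/* A file faults in a page: it leaves the free pool and uses one reservation. */
bool model_fs_alloc(struct global_pool *g, struct fs_model *fs)
{
    if (fs->in_use >= fs->size_pages)
        return false;            /* filesystem is full */
    fs->in_use++;
    g->free_pages--;
    g->resv_pages--;             /* available count for others is unchanged */
    return true;
}

/* A page is freed: it returns to the pool and is re-reserved at once. */
void model_fs_free(struct global_pool *g, struct fs_model *fs)
{
    fs->in_use--;
    g->free_pages++;
    g->resv_pages++;
}

/* Unmount: free remaining pages, then drop the filesystem's reservation. */
void model_umount(struct global_pool *g, struct fs_model *fs)
{
    while (fs->in_use > 0)
        model_fs_free(g, fs);
    g->resv_pages -= fs->size_pages;
    fs->size_pages = 0;
}
```

Allocating or freeing a page moves free_pages and resv_pages together, so other users never see more available pages than the pool minus the filesystem's remaining reservation.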

* Re: [RFC 2/3] hugetlbfs: coordinate global and subpool reserve accounting
@ 2015-02-28  3:25 ` Hillf Danton
  0 siblings, 0 replies; 40+ messages in thread
From: Hillf Danton @ 2015-02-28  3:25 UTC (permalink / raw)
  To: 'Mike Kravetz'
  Cc: linux-kernel, linux-mm, Andrew Morton, davidlohr,
	'Aneesh Kumar', 'Joonsoo Kim'

> @@ -3444,10 +3445,14 @@ int hugetlb_reserve_pages(struct inode *inode,
>  	 * Check enough hugepages are available for the reservation.
>  	 * Hand the pages back to the subpool if there are not
>  	 */

Better if the comment is updated correspondingly.
Hillf
> -	ret = hugetlb_acct_memory(h, chg);
> -	if (ret < 0) {
> -		hugepage_subpool_put_pages(spool, chg);
> -		goto out_err;
> +	if (subpool_reserved(spool))
> +		ret = 0;
> +	else {
> +		ret = hugetlb_acct_memory(h, chg);
> +		if (ret < 0) {
> +			hugepage_subpool_put_pages(spool, chg);
> +			goto out_err;
> +		}
>  	}
> 
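The control flow of the quoted hunk can be exercised outside the kernel. Below is a hypothetical user-space model in C. The model_* names are invented, and the counters are simplified stand-ins for the subpool and hstate fields, so treat it as a sketch of the logic rather than the patch itself. A subpool mounted 'reserved' skips the global check because its pages were already reserved at mount time. An ordinary subpool still charges the global pool and hands its pages back on failure. This is also why the quoted comment ("Hand the pages back to the subpool if there are not") now describes only the non-reserved branch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the structures used in the quoted hunk. */
struct model_subpool {
    long max_pages;    /* filesystem size limit in huge pages */
    long used_pages;
    bool reserved;     /* mounted with the proposed 'reserved' option */
};

struct model_hstate {
    long free_pages;
    long resv_pages;
};

/* Loosely mirrors hugepage_subpool_get_pages(): charge the size limit. */
static int model_subpool_get(struct model_subpool *s, long chg)
{
    if (s->used_pages + chg > s->max_pages)
        return -1;
    s->used_pages += chg;
    return 0;
}

/* Loosely mirrors hugepage_subpool_put_pages(). */
static void model_subpool_put(struct model_subpool *s, long chg)
{
    s->used_pages -= chg;
}

/* Loosely mirrors hugetlb_acct_memory(h, chg) for a positive chg. */
static int model_acct_memory(struct model_hstate *h, long chg)
{
    if (h->free_pages - h->resv_pages < chg)
        return -1;
    h->resv_pages += chg;
    return 0;
}

/* Same branch structure as the '+' lines of the quoted hunk. */
int model_reserve_pages(struct model_subpool *s, struct model_hstate *h,
                        long chg)
{
    int ret;

    if (model_subpool_get(s, chg) < 0)
        return -1;
    if (s->reserved)
        ret = 0;                 /* pages were reserved at mount time */
    else {
        ret = model_acct_memory(h, chg);
        if (ret < 0) {
            /* not enough global pages: hand them back to the subpool */
            model_subpool_put(s, chg);
            return ret;
        }
    }
    return ret;
}
```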


end of thread

Thread overview: 40+ messages
2015-02-27 22:58 [RFC 0/3] hugetlbfs: optionally reserve all fs pages at mount time Mike Kravetz
2015-02-27 22:58 ` Mike Kravetz
2015-02-27 22:58 ` [RFC 1/3] hugetlbfs: add reserved mount fields to subpool structure Mike Kravetz
2015-02-27 22:58   ` Mike Kravetz
2015-02-27 22:58 ` Mike Kravetz
2015-02-27 22:58   ` Mike Kravetz
2015-03-02 23:10   ` Andrew Morton
2015-03-02 23:10     ` Andrew Morton
2015-03-03  1:20     ` Mike Kravetz
2015-03-03  1:20       ` Mike Kravetz
2015-02-27 22:58 ` [RFC 2/3] hugetlbfs: coordinate global and subpool reserve accounting Mike Kravetz
2015-02-27 22:58   ` Mike Kravetz
2015-03-02 23:10   ` Andrew Morton
2015-03-02 23:10     ` Andrew Morton
2015-03-03  1:30     ` Mike Kravetz
2015-03-03  1:30       ` Mike Kravetz
2015-02-27 22:58 ` Mike Kravetz
2015-02-27 22:58   ` Mike Kravetz
2015-02-27 22:58 ` [RFC 3/3] hugetlbfs: accept subpool reserved option and setup accordingly Mike Kravetz
2015-02-27 22:58   ` Mike Kravetz
2015-03-02 23:10   ` Andrew Morton
2015-03-02 23:10     ` Andrew Morton
2015-03-03  1:36     ` Mike Kravetz
2015-03-03  1:36       ` Mike Kravetz
2015-03-02 23:10 ` [RFC 0/3] hugetlbfs: optionally reserve all fs pages at mount time Andrew Morton
2015-03-02 23:10   ` Andrew Morton
2015-03-03  1:18   ` Mike Kravetz
2015-03-03  1:18     ` Mike Kravetz
2015-03-06 15:10     ` Michal Hocko
2015-03-06 15:10       ` Michal Hocko
2015-03-06 18:58       ` Mike Kravetz
2015-03-06 18:58         ` Mike Kravetz
2015-03-06 21:14         ` David Rientjes
2015-03-06 21:14           ` David Rientjes
2015-03-06 21:32           ` Mike Kravetz
2015-03-06 21:32             ` Mike Kravetz
2015-02-28  3:25 [RFC 2/3] hugetlbfs: coordinate global and subpool reserve accounting Hillf Danton
2015-02-28  3:25 ` Hillf Danton
2015-02-28 17:25 ` Mike Kravetz
2015-02-28 17:25   ` Mike Kravetz
