From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pd0-f169.google.com (mail-pd0-f169.google.com [209.85.192.169]) by kanga.kvack.org (Postfix) with ESMTP id 993726B0038 for ; Fri, 20 Mar 2015 16:47:54 -0400 (EDT) Received: by pdbni2 with SMTP id ni2so118887544pdb.1 for ; Fri, 20 Mar 2015 13:47:54 -0700 (PDT) Received: from userp1040.oracle.com (userp1040.oracle.com. [156.151.31.81]) by mx.google.com with ESMTPS id p5si11525189par.19.2015.03.20.13.47.51 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 20 Mar 2015 13:47:53 -0700 (PDT) From: Mike Kravetz Subject: [PATCH V3 0/4] hugetlbfs: add min_size filesystem mount option Date: Fri, 20 Mar 2015 13:47:06 -0700 Message-Id: Sender: owner-linux-mm@kvack.org List-ID: To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: Andrew Morton , Davidlohr Bueso , Aneesh Kumar , Joonsoo Kim , Andi Kleen , David Rientjes , Mike Kravetz hugetlbfs allocates huge pages from the global pool as needed. Even if the global pool contains a sufficient number pages for the filesystem size at mount time, those global pages could be grabbed for some other use. As a result, filesystem huge page allocations may fail due to lack of pages. Applications such as a database want to use huge pages for performance reasons. hugetlbfs filesystem semantics with ownership and modes work well to manage access to a pool of huge pages. However, the application would like some reasonable assurance that allocations will not fail due to a lack of huge pages. At application startup time, the application would like to configure itself to use a specific number of huge pages. Before starting, the application can check to make sure that enough huge pages exist in the system global pools. However, there are no guarantees that those pages will be available when needed by the application. What the application wants is exclusive use of a subset of huge pages. Add a new hugetlbfs mount option 'min_size=' to indicate that the specified number of pages will be available for use by the filesystem. At mount time, this number of huge pages will be reserved for exclusive use of the filesystem. If there is not a sufficient number of free pages, the mount will fail. As pages are allocated to and freeed from the filesystem, the number of reserved pages is adjusted so that the specified minimum is maintained. V3: Struct init and comment cleanup as suggested by Andrew Morton Cleaned up size option argument parsing to hopefully be more clear Fixed kbuild warning introduced in V2 Made documentation more explicit and added descriptions for 'existing' pagesize options and ability to specify size as a percent V2: Added ability to specify minimum size. Suggsted by David Rientjes V1: Comments from RFC addressed/incorporated Mike Kravetz (4): hugetlbfs: add minimum size tracking fields to subpool structure hugetlbfs: add minimum size accounting to subpools hugetlbfs: accept subpool min_size mount option and setup accordingly hugetlbfs: document min_size mount option and cleanup Documentation/vm/hugetlbpage.txt | 31 +++++--- fs/hugetlbfs/inode.c | 90 ++++++++++++++++++----- include/linux/hugetlb.h | 11 ++- mm/hugetlb.c | 151 +++++++++++++++++++++++++++++++-------- 4 files changed, 224 insertions(+), 59 deletions(-) -- 2.1.0 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f45.google.com (mail-pa0-f45.google.com [209.85.220.45]) by kanga.kvack.org (Postfix) with ESMTP id 631666B006C for ; Fri, 20 Mar 2015 16:48:01 -0400 (EDT) Received: by pacwe9 with SMTP id we9so119637778pac.1 for ; Fri, 20 Mar 2015 13:48:01 -0700 (PDT) Received: from aserp1040.oracle.com (aserp1040.oracle.com. [141.146.126.69]) by mx.google.com with ESMTPS id w4si11142121pdi.213.2015.03.20.13.47.57 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Fri, 20 Mar 2015 13:47:59 -0700 (PDT) From: Mike Kravetz Subject: [PATCH V3 1/4] hugetlbfs: add minimum size tracking fields to subpool structure Date: Fri, 20 Mar 2015 13:47:07 -0700 Message-Id: <3b1775e06c1cb376570eab1084f482730ba8dd7d.1426880499.git.mike.kravetz@oracle.com> In-Reply-To: References: In-Reply-To: References: Sender: owner-linux-mm@kvack.org List-ID: To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: Andrew Morton , Davidlohr Bueso , Aneesh Kumar , Joonsoo Kim , Andi Kleen , David Rientjes , Mike Kravetz Add a field to the subpool structure to indicate the minimimum number of huge pages to always be used by this subpool. This minimum count includes allocated pages as well as reserved pages. If the minimum number of pages for the subpool have not been allocated, pages are reserved up to this minimum. An additional field (rsv_hpages) is used to track the number of pages reserved to meet this minimum size. The hstate pointer in the subpool is convenient to have when reserving and unreserving the pages. Signed-off-by: Mike Kravetz --- include/linux/hugetlb.h | 8 +++++++- mm/hugetlb.c | 3 +-- 2 files changed, 8 insertions(+), 3 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 431b7fc..2ec06a1 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -22,7 +22,13 @@ struct mmu_gather; struct hugepage_subpool { spinlock_t lock; long count; - long max_hpages, used_hpages; + long max_hpages; /* Maximum huge pages or -1 if no maximum. */ + long used_hpages; /* Used count against maximum, includes */ + /* both alloced and reserved pages. */ + struct hstate *hstate; + long min_hpages; /* Minimum huge pages or -1 if no minimum. */ + long rsv_hpages; /* Pages reserved against global pool to */ + /* sasitfy minimum size. */ }; struct resv_map { diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 85032de..0b4a01c 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -77,14 +77,13 @@ struct hugepage_subpool *hugepage_new_subpool(long nr_blocks) { struct hugepage_subpool *spool; - spool = kmalloc(sizeof(*spool), GFP_KERNEL); + spool = kzalloc(sizeof(*spool), GFP_KERNEL); if (!spool) return NULL; spin_lock_init(&spool->lock); spool->count = 1; spool->max_hpages = nr_blocks; - spool->used_hpages = 0; return spool; } -- 2.1.0 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f46.google.com (mail-pa0-f46.google.com [209.85.220.46]) by kanga.kvack.org (Postfix) with ESMTP id 52E316B006E for ; Fri, 20 Mar 2015 16:48:05 -0400 (EDT) Received: by pagj4 with SMTP id j4so28421321pag.2 for ; Fri, 20 Mar 2015 13:48:05 -0700 (PDT) Received: from aserp1040.oracle.com (aserp1040.oracle.com. [141.146.126.69]) by mx.google.com with ESMTPS id nx5si11219470pdb.178.2015.03.20.13.48.03 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Fri, 20 Mar 2015 13:48:04 -0700 (PDT) From: Mike Kravetz Subject: [PATCH V3 3/4] hugetlbfs: accept subpool min_size mount option and setup accordingly Date: Fri, 20 Mar 2015 13:47:09 -0700 Message-Id: <098483d9a9f198ed497ff313126c7f3c8d0d79d7.1426880499.git.mike.kravetz@oracle.com> In-Reply-To: References: In-Reply-To: References: Sender: owner-linux-mm@kvack.org List-ID: To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: Andrew Morton , Davidlohr Bueso , Aneesh Kumar , Joonsoo Kim , Andi Kleen , David Rientjes , Mike Kravetz Make 'min_size=' be an option when mounting a hugetlbfs. This option takes the same value as the 'size' option. min_size can be specified without specifying size. If both are specified, min_size must be less that or equal to size else the mount will fail. If min_size is specified, then at mount time an attempt is made to reserve min_size pages. If the reservation fails, the mount fails. At umount time, the reserved pages are released. Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c | 90 ++++++++++++++++++++++++++++++++++++++----------- include/linux/hugetlb.h | 3 +- mm/hugetlb.c | 25 +++++++++++--- 3 files changed, 94 insertions(+), 24 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 5eba47f..38d2bc83e 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -47,9 +47,10 @@ struct hugetlbfs_config { kuid_t uid; kgid_t gid; umode_t mode; - long nr_blocks; + long max_hpages; long nr_inodes; struct hstate *hstate; + long min_hpages; }; struct hugetlbfs_inode_info { @@ -73,7 +74,7 @@ int sysctl_hugetlb_shm_group; enum { Opt_size, Opt_nr_inodes, Opt_mode, Opt_uid, Opt_gid, - Opt_pagesize, + Opt_pagesize, Opt_min_size, Opt_err, }; @@ -84,6 +85,7 @@ static const match_table_t tokens = { {Opt_uid, "uid=%u"}, {Opt_gid, "gid=%u"}, {Opt_pagesize, "pagesize=%s"}, + {Opt_min_size, "min_size=%s"}, {Opt_err, NULL}, }; @@ -761,14 +763,38 @@ static const struct super_operations hugetlbfs_ops = { .show_options = generic_show_options, }; +enum { NO_SIZE, SIZE_STD, SIZE_PERCENT }; + +/* + * Convert size option passed from command line to number of huge pages + * in the pool specified by hstate. Size option could be in bytes + * (val_type == SIZE_STD) or percentage of the pool (val_type == SIZE_PERCENT). + */ +static long long +hugetlbfs_size_to_hpages(struct hstate *h, unsigned long long size_opt, + int val_type) +{ + if (val_type == NO_SIZE) + return -1; + + if (val_type == SIZE_PERCENT) { + size_opt <<= huge_page_shift(h); + size_opt *= h->max_huge_pages; + do_div(size_opt, 100); + } + + size_opt >>= huge_page_shift(h); + return size_opt; +} + static int hugetlbfs_parse_options(char *options, struct hugetlbfs_config *pconfig) { char *p, *rest; substring_t args[MAX_OPT_ARGS]; int option; - unsigned long long size = 0; - enum { NO_SIZE, SIZE_STD, SIZE_PERCENT } setsize = NO_SIZE; + unsigned long long max_size_opt = 0, min_size_opt = 0; + int max_val_type = NO_SIZE, min_val_type = NO_SIZE; if (!options) return 0; @@ -806,10 +832,10 @@ hugetlbfs_parse_options(char *options, struct hugetlbfs_config *pconfig) /* memparse() will accept a K/M/G without a digit */ if (!isdigit(*args[0].from)) goto bad_val; - size = memparse(args[0].from, &rest); - setsize = SIZE_STD; + max_size_opt = memparse(args[0].from, &rest); + max_val_type = SIZE_STD; if (*rest == '%') - setsize = SIZE_PERCENT; + max_val_type = SIZE_PERCENT; break; } @@ -832,6 +858,17 @@ hugetlbfs_parse_options(char *options, struct hugetlbfs_config *pconfig) break; } + case Opt_min_size: { + /* memparse() will accept a K/M/G without a digit */ + if (!isdigit(*args[0].from)) + goto bad_val; + min_size_opt = memparse(args[0].from, &rest); + min_val_type = SIZE_STD; + if (*rest == '%') + min_val_type = SIZE_PERCENT; + break; + } + default: pr_err("Bad mount option: \"%s\"\n", p); return -EINVAL; @@ -839,15 +876,22 @@ hugetlbfs_parse_options(char *options, struct hugetlbfs_config *pconfig) } } - /* Do size after hstate is set up */ - if (setsize > NO_SIZE) { - struct hstate *h = pconfig->hstate; - if (setsize == SIZE_PERCENT) { - size <<= huge_page_shift(h); - size *= h->max_huge_pages; - do_div(size, 100); - } - pconfig->nr_blocks = (size >> huge_page_shift(h)); + /* + * Use huge page pool size (in hstate) to convert the size + * options to number of huge pages. If NO_SIZE, -1 is returned. + */ + pconfig->max_hpages = hugetlbfs_size_to_hpages(pconfig->hstate, + max_size_opt, max_val_type); + pconfig->min_hpages = hugetlbfs_size_to_hpages(pconfig->hstate, + min_size_opt, min_val_type); + + /* + * If max_size was specified, then min_size must be smaller + */ + if (max_val_type > NO_SIZE && + pconfig->min_hpages > pconfig->max_hpages) { + pr_err("minimum size can not be greater than maximum size\n"); + return -EINVAL; } return 0; @@ -866,12 +910,13 @@ hugetlbfs_fill_super(struct super_block *sb, void *data, int silent) save_mount_options(sb, data); - config.nr_blocks = -1; /* No limit on size by default */ + config.max_hpages = -1; /* No limit on size by default */ config.nr_inodes = -1; /* No limit on number of inodes by default */ config.uid = current_fsuid(); config.gid = current_fsgid(); config.mode = 0755; config.hstate = &default_hstate; + config.min_hpages = -1; /* No default minimum size */ ret = hugetlbfs_parse_options(data, &config); if (ret) return ret; @@ -885,8 +930,15 @@ hugetlbfs_fill_super(struct super_block *sb, void *data, int silent) sbinfo->max_inodes = config.nr_inodes; sbinfo->free_inodes = config.nr_inodes; sbinfo->spool = NULL; - if (config.nr_blocks != -1) { - sbinfo->spool = hugepage_new_subpool(config.nr_blocks); + /* + * Allocate and initialize subpool if maximum or minimum size is + * specified. Any needed reservations (for minimim size) are taken + * taken when the subpool is created. + */ + if (config.max_hpages != -1 || config.min_hpages != -1) { + sbinfo->spool = hugepage_new_subpool(config.hstate, + config.max_hpages, + config.min_hpages); if (!sbinfo->spool) goto out_free; } diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 2ec06a1..d2b41f4 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -44,7 +44,8 @@ extern int hugetlb_max_hstate __read_mostly; #define for_each_hstate(h) \ for ((h) = hstates; (h) < &hstates[hugetlb_max_hstate]; (h)++) -struct hugepage_subpool *hugepage_new_subpool(long nr_blocks); +struct hugepage_subpool *hugepage_new_subpool(struct hstate *h, long max_hpages, + long min_hpages); void hugepage_put_subpool(struct hugepage_subpool *spool); int PageHuge(struct page *page); diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 28a15c3..84511b6 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -61,6 +61,9 @@ DEFINE_SPINLOCK(hugetlb_lock); static int num_fault_mutexes; static struct mutex *htlb_fault_mutex_table ____cacheline_aligned_in_smp; +/* Forward declaration */ +static int hugetlb_acct_memory(struct hstate *h, long delta); + static inline void unlock_or_release_subpool(struct hugepage_subpool *spool) { bool free = (spool->count == 0) && (spool->used_hpages == 0); @@ -68,12 +71,18 @@ static inline void unlock_or_release_subpool(struct hugepage_subpool *spool) spin_unlock(&spool->lock); /* If no pages are used, and no other handles to the subpool - * remain, free the subpool the subpool remain */ - if (free) + * remain, give up any reservations mased on minimum size and + * free the subpool */ + if (free) { + if (spool->min_hpages != -1) + hugetlb_acct_memory(spool->hstate, + -spool->min_hpages); kfree(spool); + } } -struct hugepage_subpool *hugepage_new_subpool(long nr_blocks) +struct hugepage_subpool *hugepage_new_subpool(struct hstate *h, long max_hpages, + long min_hpages) { struct hugepage_subpool *spool; @@ -83,7 +92,15 @@ struct hugepage_subpool *hugepage_new_subpool(long nr_blocks) spin_lock_init(&spool->lock); spool->count = 1; - spool->max_hpages = nr_blocks; + spool->max_hpages = max_hpages; + spool->hstate = h; + spool->min_hpages = min_hpages; + + if (min_hpages != -1 && hugetlb_acct_memory(h, min_hpages)) { + kfree(spool); + return NULL; + } + spool->rsv_hpages = min_hpages; return spool; } -- 2.1.0 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f46.google.com (mail-pa0-f46.google.com [209.85.220.46]) by kanga.kvack.org (Postfix) with ESMTP id 38EF46B0070 for ; Fri, 20 Mar 2015 16:48:06 -0400 (EDT) Received: by padcy3 with SMTP id cy3so119556966pad.3 for ; Fri, 20 Mar 2015 13:48:06 -0700 (PDT) Received: from aserp1040.oracle.com (aserp1040.oracle.com. [141.146.126.69]) by mx.google.com with ESMTPS id ql2si11270855pac.138.2015.03.20.13.48.03 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Fri, 20 Mar 2015 13:48:04 -0700 (PDT) From: Mike Kravetz Subject: [PATCH V3 2/4] hugetlbfs: add minimum size accounting to subpools Date: Fri, 20 Mar 2015 13:47:08 -0700 Message-Id: In-Reply-To: References: In-Reply-To: References: Sender: owner-linux-mm@kvack.org List-ID: To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: Andrew Morton , Davidlohr Bueso , Aneesh Kumar , Joonsoo Kim , Andi Kleen , David Rientjes , Mike Kravetz The same routines that perform subpool maximum size accounting hugepage_subpool_get/put_pages() are modified to also perform minimum size accounting. When a delta value is passed to these routines, calculate how global reservations must be adjusted to maintain the subpool minimum size. The routines now return this global reserve count adjustment. This global reserve count adjustment is then passed to the global accounting routine hugetlb_acct_memory(). Signed-off-by: Mike Kravetz --- mm/hugetlb.c | 123 ++++++++++++++++++++++++++++++++++++++++++++++++----------- 1 file changed, 100 insertions(+), 23 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 0b4a01c..28a15c3 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -96,36 +96,89 @@ void hugepage_put_subpool(struct hugepage_subpool *spool) unlock_or_release_subpool(spool); } -static int hugepage_subpool_get_pages(struct hugepage_subpool *spool, +/* + * Subpool accounting for allocating and reserving pages. + * Return -ENOMEM if there are not enough resources to satisfy the + * the request. Otherwise, return the number of pages by which the + * global pools must be adjusted (upward). The returned value may + * only be different than the passed value (delta) in the case where + * a subpool minimum size must be manitained. + */ +static long hugepage_subpool_get_pages(struct hugepage_subpool *spool, long delta) { - int ret = 0; + long ret = delta; if (!spool) - return 0; + return ret; spin_lock(&spool->lock); - if ((spool->used_hpages + delta) <= spool->max_hpages) { - spool->used_hpages += delta; - } else { - ret = -ENOMEM; + + if (spool->max_hpages != -1) { /* maximum size accounting */ + if ((spool->used_hpages + delta) <= spool->max_hpages) + spool->used_hpages += delta; + else { + ret = -ENOMEM; + goto unlock_ret; + } } - spin_unlock(&spool->lock); + if (spool->min_hpages != -1) { /* minimum size accounting */ + if (delta > spool->rsv_hpages) { + /* + * Asking for more reserves than those already taken on + * behalf of subpool. Return difference. + */ + ret = delta - spool->rsv_hpages; + spool->rsv_hpages = 0; + } else { + ret = 0; /* reserves already accounted for */ + spool->rsv_hpages -= delta; + } + } + +unlock_ret: + spin_unlock(&spool->lock); return ret; } -static void hugepage_subpool_put_pages(struct hugepage_subpool *spool, +/* + * Subpool accounting for freeing and unreserving pages. + * Return the number of global page reservations that must be dropped. + * The return value may only be different than the passed value (delta) + * in the case where a subpool minimum size must be maintained. + */ +static long hugepage_subpool_put_pages(struct hugepage_subpool *spool, long delta) { + long ret = delta; + if (!spool) - return; + return delta; spin_lock(&spool->lock); - spool->used_hpages -= delta; - /* If hugetlbfs_put_super couldn't free spool due to - * an outstanding quota reference, free it now. */ + + if (spool->max_hpages != -1) /* maximum size accounting */ + spool->used_hpages -= delta; + + if (spool->min_hpages != -1) { /* minimum size accounting */ + if (spool->rsv_hpages + delta <= spool->min_hpages) + ret = 0; + else + ret = spool->rsv_hpages + delta - spool->min_hpages; + + spool->rsv_hpages += delta; + if (spool->rsv_hpages > spool->min_hpages) + spool->rsv_hpages = spool->min_hpages; + } + + /* + * If hugetlbfs_put_super couldn't free spool due to an outstanding + * quota reference, free it now. + */ unlock_or_release_subpool(spool); + + return ret; } static inline struct hugepage_subpool *subpool_inode(struct inode *inode) @@ -873,6 +926,14 @@ void free_huge_page(struct page *page) restore_reserve = PagePrivate(page); ClearPagePrivate(page); + /* + * A return code of zero implies that the subpool will be under its + * minimum size if the reservation is not restored after page is free. + * Therefore, force restore_reserve operation. + */ + if (hugepage_subpool_put_pages(spool, 1) == 0) + restore_reserve = true; + spin_lock(&hugetlb_lock); hugetlb_cgroup_uncharge_page(hstate_index(h), pages_per_huge_page(h), page); @@ -890,7 +951,6 @@ void free_huge_page(struct page *page) enqueue_huge_page(h, page); } spin_unlock(&hugetlb_lock); - hugepage_subpool_put_pages(spool, 1); } static void prep_new_huge_page(struct hstate *h, struct page *page, int nid) @@ -1383,7 +1443,7 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma, if (chg < 0) return ERR_PTR(-ENOMEM); if (chg || avoid_reserve) - if (hugepage_subpool_get_pages(spool, 1)) + if (hugepage_subpool_get_pages(spool, 1) < 0) return ERR_PTR(-ENOSPC); ret = hugetlb_cgroup_charge_cgroup(idx, pages_per_huge_page(h), &h_cg); @@ -2451,6 +2511,7 @@ static void hugetlb_vm_op_close(struct vm_area_struct *vma) struct resv_map *resv = vma_resv_map(vma); struct hugepage_subpool *spool = subpool_vma(vma); unsigned long reserve, start, end; + long gbl_reserve; if (!resv || !is_vma_resv_set(vma, HPAGE_RESV_OWNER)) return; @@ -2463,8 +2524,12 @@ static void hugetlb_vm_op_close(struct vm_area_struct *vma) kref_put(&resv->refs, resv_map_release); if (reserve) { - hugetlb_acct_memory(h, -reserve); - hugepage_subpool_put_pages(spool, reserve); + /* + * Decrement reserve counts. The global reserve count may be + * adjusted if the subpool has a minimum size. + */ + gbl_reserve = hugepage_subpool_put_pages(spool, reserve); + hugetlb_acct_memory(h, -gbl_reserve); } } @@ -3395,6 +3460,7 @@ int hugetlb_reserve_pages(struct inode *inode, struct hstate *h = hstate_inode(inode); struct hugepage_subpool *spool = subpool_inode(inode); struct resv_map *resv_map; + long gbl_reserve; /* * Only apply hugepage reservation if asked. At fault time, an @@ -3431,8 +3497,13 @@ int hugetlb_reserve_pages(struct inode *inode, goto out_err; } - /* There must be enough pages in the subpool for the mapping */ - if (hugepage_subpool_get_pages(spool, chg)) { + /* + * There must be enough pages in the subpool for the mapping. If + * the subpool has a minimum size, there may be some global + * reservations already in place (gbl_reserve). + */ + gbl_reserve = hugepage_subpool_get_pages(spool, chg); + if (gbl_reserve < 0) { ret = -ENOSPC; goto out_err; } @@ -3441,9 +3512,10 @@ int hugetlb_reserve_pages(struct inode *inode, * Check enough hugepages are available for the reservation. * Hand the pages back to the subpool if there are not */ - ret = hugetlb_acct_memory(h, chg); + ret = hugetlb_acct_memory(h, gbl_reserve); if (ret < 0) { - hugepage_subpool_put_pages(spool, chg); + /* put back original number of pages, chg */ + (void)hugepage_subpool_put_pages(spool, chg); goto out_err; } @@ -3473,6 +3545,7 @@ void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed) struct resv_map *resv_map = inode_resv_map(inode); long chg = 0; struct hugepage_subpool *spool = subpool_inode(inode); + long gbl_reserve; if (resv_map) chg = region_truncate(resv_map, offset); @@ -3480,8 +3553,12 @@ void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed) inode->i_blocks -= (blocks_per_huge_page(h) * freed); spin_unlock(&inode->i_lock); - hugepage_subpool_put_pages(spool, (chg - freed)); - hugetlb_acct_memory(h, -(chg - freed)); + /* + * If the subpool has a minimum size, the number of global + * reservations to be released may be adjusted. + */ + gbl_reserve = hugepage_subpool_put_pages(spool, (chg - freed)); + hugetlb_acct_memory(h, -gbl_reserve); } #ifdef CONFIG_ARCH_WANT_HUGE_PMD_SHARE -- 2.1.0 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f52.google.com (mail-pa0-f52.google.com [209.85.220.52]) by kanga.kvack.org (Postfix) with ESMTP id 467216B0071 for ; Fri, 20 Mar 2015 16:48:22 -0400 (EDT) Received: by pacwe9 with SMTP id we9so119645605pac.1 for ; Fri, 20 Mar 2015 13:48:22 -0700 (PDT) Received: from aserp1040.oracle.com (aserp1040.oracle.com. [141.146.126.69]) by mx.google.com with ESMTPS id fe5si11515967pdb.39.2015.03.20.13.48.20 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Fri, 20 Mar 2015 13:48:21 -0700 (PDT) From: Mike Kravetz Subject: [PATCH V3 4/4] hugetlbfs: document min_size mount option and cleanup Date: Fri, 20 Mar 2015 13:47:10 -0700 Message-Id: <0a9330b14a050bdf8ebb9c02722e9bbdd13eaa3f.1426880500.git.mike.kravetz@oracle.com> In-Reply-To: References: In-Reply-To: References: Sender: owner-linux-mm@kvack.org List-ID: To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: Andrew Morton , Davidlohr Bueso , Aneesh Kumar , Joonsoo Kim , Andi Kleen , David Rientjes , Mike Kravetz Add min_size mount option to the hugetlbfs documentation. Also, add the missing pagesize option and mention that size can be specified as bytes or a percentage of huge page pool. Signed-off-by: Mike Kravetz --- Documentation/vm/hugetlbpage.txt | 31 ++++++++++++++++++++++--------- 1 file changed, 22 insertions(+), 9 deletions(-) diff --git a/Documentation/vm/hugetlbpage.txt b/Documentation/vm/hugetlbpage.txt index f2d3a10..b32b9cd 100644 --- a/Documentation/vm/hugetlbpage.txt +++ b/Documentation/vm/hugetlbpage.txt @@ -267,21 +267,34 @@ call, then it is required that system administrator mount a file system of type hugetlbfs: mount -t hugetlbfs \ - -o uid=,gid=,mode=,size=,nr_inodes= \ - none /mnt/huge + -o uid=,gid=,mode=,pagesize=,size=,\ + min_size=,nr_inodes= none /mnt/huge This command mounts a (pseudo) filesystem of type hugetlbfs on the directory /mnt/huge. Any files created on /mnt/huge uses huge pages. The uid and gid options sets the owner and group of the root of the file system. By default the uid and gid of the current process are taken. The mode option sets the mode of root of file system to value & 01777. This value is given in octal. -By default the value 0755 is picked. The size option sets the maximum value of -memory (huge pages) allowed for that filesystem (/mnt/huge). The size is -rounded down to HPAGE_SIZE. The option nr_inodes sets the maximum number of -inodes that /mnt/huge can use. If the size or nr_inodes option is not -provided on command line then no limits are set. For size and nr_inodes -options, you can use [G|g]/[M|m]/[K|k] to represent giga/mega/kilo. For -example, size=2K has the same meaning as size=2048. +By default the value 0755 is picked. If the paltform supports multiple huge +page sizes, the pagesize option can be used to specify the huge page size and +associated pool. pagesize is specified in bytes. If pagesize is not specified +the paltform's default huge page size and associated pool will be used. The +size option sets the maximum value of memory (huge pages) allowed for that +filesystem (/mnt/huge). The size option can be specified in bytes, or as a +percentage of the specified huge page pool (nr_hugepages). The size is +rounded down to HPAGE_SIZE boundary. The min_size option sets the minimum +value of memory (huge pages) allowed for the filesystem. min_size can be +specified in the same way as size, either bytes or a percentage of the +huge page pool. At mount time, the number of huge pages specified by +min_size are reserved for use by the filesystem. If there are not enough +free huge pages available, the mount will fail. As huge pages are allocated +to the filesystem and freed, the reserve count is adjusted so that the sum +of allocated and reserved huge pages is always at least min_size. The option +nr_inodes sets the maximum number of inodes that /mnt/huge can use. If the +size, min_size or nr_inodes option is not provided on command line then +no limits are set. For pagesize, size, min_size and nr_inodes options, you +can use [G|g]/[M|m]/[K|k] to represent giga/mega/kilo. For example, size=2K +has the same meaning as size=2048. While read system calls are supported on files that reside on hugetlb file systems, write system calls are not. -- 2.1.0 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org