linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] hugetlbfs: fix memory leak for resv_map
@ 2019-03-06  6:10 Yufen Yu
  2019-03-06 23:52 ` Mike Kravetz
  0 siblings, 1 reply; 3+ messages in thread
From: Yufen Yu @ 2019-03-06  6:10 UTC (permalink / raw)
  To: mike.kravetz, linux-mm

When .mknod create a block device file in hugetlbfs, it will
allocate an inode, and kmalloc a 'struct resv_map' in resv_map_alloc().
For now, inode->i_mapping->private_data is used to point the resv_map.
However, when open the device, bd_acquire() will set i_mapping as
bd_inode->imapping, result in resv_map memory leak.

We fix it by waiting until a call to hugetlb_reserve_pages() to allocate
the inode specific resv_map. We could then remove the resv_map allocation
at inode creation time.

Programs to reproduce:
	mount -t hugetlbfs nodev hugetlbfs
	mknod hugetlbfs/dev b 0 0
	exec 30<> hugetlbfs/dev
	umount hugetlbfs/

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
---
 fs/hugetlbfs/inode.c | 10 ++--------
 mm/hugetlb.c         |  6 ++++++
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index a7fa037b876b..eb87cfb80eb9 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -741,11 +741,6 @@ static struct inode *hugetlbfs_get_inode(struct super_block *sb,
 					umode_t mode, dev_t dev)
 {
 	struct inode *inode;
-	struct resv_map *resv_map;
-
-	resv_map = resv_map_alloc();
-	if (!resv_map)
-		return NULL;
 
 	inode = new_inode(sb);
 	if (inode) {
@@ -757,7 +752,7 @@ static struct inode *hugetlbfs_get_inode(struct super_block *sb,
 				&hugetlbfs_i_mmap_rwsem_key);
 		inode->i_mapping->a_ops = &hugetlbfs_aops;
 		inode->i_atime = inode->i_mtime = inode->i_ctime = current_time(inode);
-		inode->i_mapping->private_data = resv_map;
+		inode->i_mapping->private_data = NULL;
 		info->seals = F_SEAL_SEAL;
 		switch (mode & S_IFMT) {
 		default:
@@ -780,8 +775,7 @@ static struct inode *hugetlbfs_get_inode(struct super_block *sb,
 			break;
 		}
 		lockdep_annotate_inode_mutex_key(inode);
-	} else
-		kref_put(&resv_map->refs, resv_map_release);
+	}
 
 	return inode;
 }
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8dfdffc34a99..088fd1239d49 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4462,6 +4462,12 @@ int hugetlb_reserve_pages(struct inode *inode,
 	 */
 	if (!vma || vma->vm_flags & VM_MAYSHARE) {
 		resv_map = inode_resv_map(inode);
+		if (!resv_map) {
+			resv_map = resv_map_alloc();
+			if (!resv_map)
+				return -ENOMEM;
+			inode->i_mapping->private_data = resv_map;
+		}
 
 		chg = region_chg(resv_map, from, to);
 
-- 
2.16.2.dirty


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH v2] hugetlbfs: fix memory leak for resv_map
  2019-03-06  6:10 [PATCH v2] hugetlbfs: fix memory leak for resv_map Yufen Yu
@ 2019-03-06 23:52 ` Mike Kravetz
  2019-03-07 23:50   ` Mike Kravetz
  0 siblings, 1 reply; 3+ messages in thread
From: Mike Kravetz @ 2019-03-06 23:52 UTC (permalink / raw)
  To: Yufen Yu, linux-mm

On 3/5/19 10:10 PM, Yufen Yu wrote:
> When .mknod create a block device file in hugetlbfs, it will
> allocate an inode, and kmalloc a 'struct resv_map' in resv_map_alloc().
> For now, inode->i_mapping->private_data is used to point the resv_map.
> However, when open the device, bd_acquire() will set i_mapping as
> bd_inode->imapping, result in resv_map memory leak.
> 
> We fix it by waiting until a call to hugetlb_reserve_pages() to allocate
> the inode specific resv_map. We could then remove the resv_map allocation
> at inode creation time.
> 
> Programs to reproduce:
> 	mount -t hugetlbfs nodev hugetlbfs
> 	mknod hugetlbfs/dev b 0 0
> 	exec 30<> hugetlbfs/dev
> 	umount hugetlbfs/
> 
> Signed-off-by: Yufen Yu <yuyufen@huawei.com>

Thank you.  That is the approach I had in mind.

Unfortunately, this patch causes several regressions in the libhugetlbfs
test suite.  I have not debugged to determine exact cause.  

I was unsure about one thing with this approach.  We set
inode->i_mapping->private_data while holding the inode lock, so there
should be no problem there.  However, we access inode_resv_map() in the
page fault path without the inode lock.  The page fault path should get
NULL or a resv_map.  I just wonder if there may be some races where the
fault path may still be seeing NULL.

I can do more debug, but it will take a couple days as I am busy with
other things right now.
-- 
Mike Kravetz


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH v2] hugetlbfs: fix memory leak for resv_map
  2019-03-06 23:52 ` Mike Kravetz
@ 2019-03-07 23:50   ` Mike Kravetz
  0 siblings, 0 replies; 3+ messages in thread
From: Mike Kravetz @ 2019-03-07 23:50 UTC (permalink / raw)
  To: Yufen Yu, linux-mm, linux-kernel
  Cc: Kirill A. Shutemov, Naoya Horiguchi, Michal Hocko, Andrew Morton

Adding others on Cc to see if they have comments or opinions.

On 3/6/19 3:52 PM, Mike Kravetz wrote:
> On 3/5/19 10:10 PM, Yufen Yu wrote:
>> When .mknod create a block device file in hugetlbfs, it will
>> allocate an inode, and kmalloc a 'struct resv_map' in resv_map_alloc().
>> For now, inode->i_mapping->private_data is used to point the resv_map.
>> However, when open the device, bd_acquire() will set i_mapping as
>> bd_inode->imapping, result in resv_map memory leak.
>>
>> We fix it by waiting until a call to hugetlb_reserve_pages() to allocate
>> the inode specific resv_map. We could then remove the resv_map allocation
>> at inode creation time.
>>
>> Programs to reproduce:
>> 	mount -t hugetlbfs nodev hugetlbfs
>> 	mknod hugetlbfs/dev b 0 0
>> 	exec 30<> hugetlbfs/dev
>> 	umount hugetlbfs/
>>
>> Signed-off-by: Yufen Yu <yuyufen@huawei.com>
> 
> Thank you.  That is the approach I had in mind.
> 
> Unfortunately, this patch causes several regressions in the libhugetlbfs
> test suite.  I have not debugged to determine exact cause.  
> 
> I was unsure about one thing with this approach.  We set
> inode->i_mapping->private_data while holding the inode lock, so there
> should be no problem there.  However, we access inode_resv_map() in the
> page fault path without the inode lock.  The page fault path should get
> NULL or a resv_map.  I just wonder if there may be some races where the
> fault path may still be seeing NULL.
> 
> I can do more debug, but it will take a couple days as I am busy with
> other things right now.

My apologies.  Calling resv_map_alloc() only from hugetlb_reserve_pages()
is not going to work.  The reason why is that the reserv_map is used to track
page allocations even if there are not reservations.  So, if reservations
are not created other huge page accounting is impacted.

Sorry for suggesting that approach.

As mentioned, I do not like your original approach as it adds an extra word
to every hugetlbfs inode.  How about something like the following which
only adds the resv_map to inodes which can have associated page allocations?
I have only done limited regression testing with this.

From: Mike Kravetz <mike.kravetz@oracle.com>
Date: Thu, 7 Mar 2019 15:37:31 -0800
Subject: [PATCH] hugetlbfs: only allocate reserve map for inodes that can
 allocate pages

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 fs/hugetlbfs/inode.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index a7fa037b876b..a3a3d256fb0e 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -741,11 +741,17 @@ static struct inode *hugetlbfs_get_inode(struct super_block *sb,
 					umode_t mode, dev_t dev)
 {
 	struct inode *inode;
-	struct resv_map *resv_map;
+	struct resv_map *resv_map = NULL;
 
-	resv_map = resv_map_alloc();
-	if (!resv_map)
-		return NULL;
+	/*
+	 * Reserve maps are only needed for inodes that can have associated
+	 * page allocations.
+	 */
+	if (S_ISREG(mode) || S_ISLNK(mode)) {
+		resv_map = resv_map_alloc();
+		if (!resv_map)
+			return NULL;
+	}
 
 	inode = new_inode(sb);
 	if (inode) {
@@ -780,8 +786,10 @@ static struct inode *hugetlbfs_get_inode(struct super_block *sb,
 			break;
 		}
 		lockdep_annotate_inode_mutex_key(inode);
-	} else
-		kref_put(&resv_map->refs, resv_map_release);
+	} else {
+		if (resv_map)
+			kref_put(&resv_map->refs, resv_map_release);
+	}
 
 	return inode;
 }
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2019-03-07 23:51 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-06  6:10 [PATCH v2] hugetlbfs: fix memory leak for resv_map Yufen Yu
2019-03-06 23:52 ` Mike Kravetz
2019-03-07 23:50   ` Mike Kravetz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).