linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] hugetlbfs: move resv_map to hugetlbfs_inode_info
@ 2019-04-12  4:02 Yufen Yu
  2019-04-12 23:40 ` Mike Kravetz
  0 siblings, 1 reply; 15+ messages in thread
From: Yufen Yu @ 2019-04-12  4:02 UTC (permalink / raw)
  To: mike.kravetz, linux-mm; +Cc: mhocko, n-horiguchi, kirill.shutemov, yuyufen

Commit 58b6e5e8f1ad ("hugetlbfs: fix memory leak for resv_map")
fix memory leak for resv_map. It only allocate resv_map for inode mode
is 'S_ISREG' and 'S_ISLNK', avoiding i_mapping->private_data be set as
bdev_inode private_data.

However, for inode mode that is 'S_ISBLK', hugetlbfs_evict_inode() may
free or modify i_mapping->private_data that is owned by bdev inode,
which is not expected!

Fortunately, bdev filesystem have not actually use i_mapping->private_data.
Thus, this bug has not caused serious impact. But, we can not ensure
bdev will not use that field in the furture. And hugetlbfs should not
depend on bdev filesystem implementation.

We fix the problem by moving resv_map to hugetlbfs_inode_info. It may
be more reasonable.

Fixes: 58b6e5e8f1ad ("hugetlbfs: fix memory leak for resv_map")
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
---
 fs/hugetlbfs/inode.c    | 11 ++++++++---
 include/linux/hugetlb.h |  1 +
 mm/hugetlb.c            |  2 +-
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 9285dd4f4b1c..f1342a3fa716 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -497,12 +497,15 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
 static void hugetlbfs_evict_inode(struct inode *inode)
 {
 	struct resv_map *resv_map;
+	struct hugetlbfs_inode_info *info = HUGETLBFS_I(inode);
 
 	remove_inode_hugepages(inode, 0, LLONG_MAX);
-	resv_map = (struct resv_map *)inode->i_mapping->private_data;
+	resv_map = info->resv_map;
 	/* root inode doesn't have the resv_map, so we should check it */
-	if (resv_map)
+	if (resv_map) {
 		resv_map_release(&resv_map->refs);
+		info->resv_map = NULL;
+	}
 	clear_inode(inode);
 }
 
@@ -777,7 +780,7 @@ static struct inode *hugetlbfs_get_inode(struct super_block *sb,
 				&hugetlbfs_i_mmap_rwsem_key);
 		inode->i_mapping->a_ops = &hugetlbfs_aops;
 		inode->i_atime = inode->i_mtime = inode->i_ctime = current_time(inode);
-		inode->i_mapping->private_data = resv_map;
+		info->resv_map = resv_map;
 		info->seals = F_SEAL_SEAL;
 		switch (mode & S_IFMT) {
 		default:
@@ -1047,6 +1050,7 @@ static struct inode *hugetlbfs_alloc_inode(struct super_block *sb)
 	 * private inode.  This simplifies hugetlbfs_destroy_inode.
 	 */
 	mpol_shared_policy_init(&p->policy, NULL);
+	p->resv_map = NULL;
 
 	return &p->vfs_inode;
 }
@@ -1061,6 +1065,7 @@ static void hugetlbfs_destroy_inode(struct inode *inode)
 {
 	hugetlbfs_inc_free_inodes(HUGETLBFS_SB(inode->i_sb));
 	mpol_free_shared_policy(&HUGETLBFS_I(inode)->policy);
+	HUGETLBFS_I(inode)->resv_map = NULL;
 	call_rcu(&inode->i_rcu, hugetlbfs_i_callback);
 }
 
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 11943b60f208..584030631045 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -297,6 +297,7 @@ struct hugetlbfs_inode_info {
 	struct shared_policy policy;
 	struct inode vfs_inode;
 	unsigned int seals;
+	struct resv_map *resv_map;
 };
 
 static inline struct hugetlbfs_inode_info *HUGETLBFS_I(struct inode *inode)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index fe74f94e5327..a2648edffb02 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -740,7 +740,7 @@ void resv_map_release(struct kref *ref)
 
 static inline struct resv_map *inode_resv_map(struct inode *inode)
 {
-	return inode->i_mapping->private_data;
+	return HUGETLBFS_I(inode)->resv_map;
 }
 
 static struct resv_map *vma_resv_map(struct vm_area_struct *vma)
-- 
2.16.2.dirty


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] hugetlbfs: move resv_map to hugetlbfs_inode_info
  2019-04-12  4:02 [PATCH] hugetlbfs: move resv_map to hugetlbfs_inode_info Yufen Yu
@ 2019-04-12 23:40 ` Mike Kravetz
  2019-04-13 11:57   ` yuyufen
  2019-04-15  6:16   ` Naoya Horiguchi
  0 siblings, 2 replies; 15+ messages in thread
From: Mike Kravetz @ 2019-04-12 23:40 UTC (permalink / raw)
  To: Yufen Yu, linux-mm; +Cc: mhocko, n-horiguchi, kirill.shutemov

On 4/11/19 9:02 PM, Yufen Yu wrote:
> Commit 58b6e5e8f1ad ("hugetlbfs: fix memory leak for resv_map")
...
> However, for inode mode that is 'S_ISBLK', hugetlbfs_evict_inode() may
> free or modify i_mapping->private_data that is owned by bdev inode,
> which is not expected!
...
> We fix the problem by moving resv_map to hugetlbfs_inode_info. It may
> be more reasonable.

Your patches force me to consider these potential issues.  Thank you!

The root of all these problems (including the original leak) is that the
open of a block special inode will result in bd_acquire() overwriting the
value of inode->i_mapping.  Since hugetlbfs inodes normally contain a
resv_map at inode->i_mapping->private_data, a memory leak occurs if we do
not free the initially allocated resv_map.  In addition, when the
inode is evicted/destroyed inode->i_mapping may point to an address space
not associated with the hugetlbfs inode.  If code assumes inode->i_mapping
points to hugetlbfs inode address space at evict time, there may be bad
data references or worse.

This specific part of the patch made me think,

> @@ -497,12 +497,15 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
>  static void hugetlbfs_evict_inode(struct inode *inode)
>  {
>  	struct resv_map *resv_map;
> +	struct hugetlbfs_inode_info *info = HUGETLBFS_I(inode);
>  
>  	remove_inode_hugepages(inode, 0, LLONG_MAX);
> -	resv_map = (struct resv_map *)inode->i_mapping->private_data;
> +	resv_map = info->resv_map;
>  	/* root inode doesn't have the resv_map, so we should check it */
> -	if (resv_map)
> +	if (resv_map) {
>  		resv_map_release(&resv_map->refs);
> +		info->resv_map = NULL;
> +	}
>  	clear_inode(inode);
>  }

If inode->i_mapping may not be associated with the hugetlbfs inode, then
remove_inode_hugepages() will also have problems.  It will want to operate
on the address space associated with the inode.  So, there are more issues
than just the resv_map.  When I looked at the first few lines of
remove_inode_hugepages(), I was surprised to see:

	struct address_space *mapping = &inode->i_data;

So remove_inode_hugepages is explicitly using the original address space
that is embedded in the inode.  As a result, it is not impacted by changes
to inode->i_mapping.  Using git history I was unable to determine why
remove_inode_hugepages is the only place in hugetlbfs code doing this.

With this in mind, a simple change like the following will fix the original
leak issue as well as the potential issues mentioned in this patch.

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 53ea3cef526e..9f0719bad46f 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -511,6 +511,11 @@ static void hugetlbfs_evict_inode(struct inode *inode)
 {
 	struct resv_map *resv_map;
 
+	/*
+	 * Make sure we are operating on original hugetlbfs address space.
+	 */
+	inode->i_mapping = &inode->i_data;
+
 	remove_inode_hugepages(inode, 0, LLONG_MAX);
 	resv_map = (struct resv_map *)inode->i_mapping->private_data;
 	/* root inode doesn't have the resv_map, so we should check it */


I don't know why hugetlbfs code would ever want to operate on any address
space but the one embedded within the inode.  However, my uderstanding of
the vfs layer is somewhat limited.  I'm wondering if the hugetlbfs code
(helper routines mostly) should perhaps use &inode->i_data instead of
inode->i_mapping.  Does it ever make sense for hugetlbfs code to operate
on inode->i_mapping if inode->i_mapping != &inode->i_data ?
-- 
Mike Kravetz


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] hugetlbfs: move resv_map to hugetlbfs_inode_info
  2019-04-12 23:40 ` Mike Kravetz
@ 2019-04-13 11:57   ` yuyufen
  2019-04-15  6:16   ` Naoya Horiguchi
  1 sibling, 0 replies; 15+ messages in thread
From: yuyufen @ 2019-04-13 11:57 UTC (permalink / raw)
  To: Mike Kravetz, linux-mm; +Cc: mhocko, n-horiguchi, kirill.shutemov



On 2019/4/13 7:40, Mike Kravetz wrote:
> This specific part of the patch made me think,
>
>> @@ -497,12 +497,15 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
>>   static void hugetlbfs_evict_inode(struct inode *inode)
>>   {
>>   	struct resv_map *resv_map;
>> +	struct hugetlbfs_inode_info *info = HUGETLBFS_I(inode);
>>   
>>   	remove_inode_hugepages(inode, 0, LLONG_MAX);
>> -	resv_map = (struct resv_map *)inode->i_mapping->private_data;
>> +	resv_map = info->resv_map;
>>   	/* root inode doesn't have the resv_map, so we should check it */
>> -	if (resv_map)
>> +	if (resv_map) {
>>   		resv_map_release(&resv_map->refs);
>> +		info->resv_map = NULL;
>> +	}
>>   	clear_inode(inode);
>>   }
> If inode->i_mapping may not be associated with the hugetlbfs inode, then
> remove_inode_hugepages() will also have problems.  It will want to operate
> on the address space associated with the inode.  So, there are more issues
> than just the resv_map.  When I looked at the first few lines of
> remove_inode_hugepages(), I was surprised to see:
>
> 	struct address_space *mapping = &inode->i_data;

Good catch!

> So remove_inode_hugepages is explicitly using the original address space
> that is embedded in the inode.  As a result, it is not impacted by changes
> to inode->i_mapping.  Using git history I was unable to determine why
> remove_inode_hugepages is the only place in hugetlbfs code doing this.
>
> With this in mind, a simple change like the following will fix the original
> leak issue as well as the potential issues mentioned in this patch.
>
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index 53ea3cef526e..9f0719bad46f 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -511,6 +511,11 @@ static void hugetlbfs_evict_inode(struct inode *inode)
>   {
>   	struct resv_map *resv_map;
>   
> +	/*
> +	 * Make sure we are operating on original hugetlbfs address space.
> +	 */
> +	inode->i_mapping = &inode->i_data;
> +
>   	remove_inode_hugepages(inode, 0, LLONG_MAX);
>   	resv_map = (struct resv_map *)inode->i_mapping->private_data;
>   	/* root inode doesn't have the resv_map, so we should check it */
>
>
> I don't know why hugetlbfs code would ever want to operate on any address
> space but the one embedded within the inode.  However, my uderstanding of
> the vfs layer is somewhat limited.  I'm wondering if the hugetlbfs code
> (helper routines mostly) should perhaps use &inode->i_data instead of
> inode->i_mapping.  Does it ever make sense for hugetlbfs code to operate
> on inode->i_mapping if inode->i_mapping != &inode->i_data ?

I also feel very confused.

Yufen
thanks.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] hugetlbfs: move resv_map to hugetlbfs_inode_info
  2019-04-12 23:40 ` Mike Kravetz
  2019-04-13 11:57   ` yuyufen
@ 2019-04-15  6:16   ` Naoya Horiguchi
  2019-04-15  9:15     ` Michal Hocko
  1 sibling, 1 reply; 15+ messages in thread
From: Naoya Horiguchi @ 2019-04-15  6:16 UTC (permalink / raw)
  To: Mike Kravetz; +Cc: Yufen Yu, linux-mm, mhocko, kirill.shutemov

On Fri, Apr 12, 2019 at 04:40:01PM -0700, Mike Kravetz wrote:
> On 4/11/19 9:02 PM, Yufen Yu wrote:
> > Commit 58b6e5e8f1ad ("hugetlbfs: fix memory leak for resv_map")
> ...
> > However, for inode mode that is 'S_ISBLK', hugetlbfs_evict_inode() may
> > free or modify i_mapping->private_data that is owned by bdev inode,
> > which is not expected!
> ...
> > We fix the problem by moving resv_map to hugetlbfs_inode_info. It may
> > be more reasonable.
> 
> Your patches force me to consider these potential issues.  Thank you!
> 
> The root of all these problems (including the original leak) is that the
> open of a block special inode will result in bd_acquire() overwriting the
> value of inode->i_mapping.  Since hugetlbfs inodes normally contain a
> resv_map at inode->i_mapping->private_data, a memory leak occurs if we do
> not free the initially allocated resv_map.  In addition, when the
> inode is evicted/destroyed inode->i_mapping may point to an address space
> not associated with the hugetlbfs inode.  If code assumes inode->i_mapping
> points to hugetlbfs inode address space at evict time, there may be bad
> data references or worse.

Let me ask a kind of elementary question: is there any good reason/purpose
to create and use block special files on hugetlbfs?  I never heard about
such usecases.  I guess that the conflict of the usage of ->i_mapping is
discovered recently and that's because block special files on hugetlbfs are
just not considered until recently or well defined.  So I think that we might
be better to begin with defining it first.

I tried the procedure described in commit 58b6e5e8f1a ("hugetlbfs: fix
memory leak for resv_map") and I failed to open the block special file with
ENXIO. So I'm not clearly sure what to solve.
I tried a similar test on tmpfs (good reference for hugetlbfs) and that also
failed on opening block file on it (but with EACCESS), so simply fixing it
similarly could be an option if there's no reasonable usecase and we just
want to fix the memory leak. 

> 
> This specific part of the patch made me think,
> 
> > @@ -497,12 +497,15 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
> >  static void hugetlbfs_evict_inode(struct inode *inode)
> >  {
> >  	struct resv_map *resv_map;
> > +	struct hugetlbfs_inode_info *info = HUGETLBFS_I(inode);
> >  
> >  	remove_inode_hugepages(inode, 0, LLONG_MAX);
> > -	resv_map = (struct resv_map *)inode->i_mapping->private_data;
> > +	resv_map = info->resv_map;
> >  	/* root inode doesn't have the resv_map, so we should check it */
> > -	if (resv_map)
> > +	if (resv_map) {
> >  		resv_map_release(&resv_map->refs);
> > +		info->resv_map = NULL;
> > +	}
> >  	clear_inode(inode);
> >  }
> 
> If inode->i_mapping may not be associated with the hugetlbfs inode, then
> remove_inode_hugepages() will also have problems.  It will want to operate
> on the address space associated with the inode.  So, there are more issues
> than just the resv_map.  When I looked at the first few lines of
> remove_inode_hugepages(), I was surprised to see:
> 
> 	struct address_space *mapping = &inode->i_data;
> 
> So remove_inode_hugepages is explicitly using the original address space
> that is embedded in the inode.  As a result, it is not impacted by changes
> to inode->i_mapping.  Using git history I was unable to determine why
> remove_inode_hugepages is the only place in hugetlbfs code doing this.
> 
> With this in mind, a simple change like the following will fix the original
> leak issue as well as the potential issues mentioned in this patch.
> 
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index 53ea3cef526e..9f0719bad46f 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -511,6 +511,11 @@ static void hugetlbfs_evict_inode(struct inode *inode)
>  {
>  	struct resv_map *resv_map;
>  
> +	/*
> +	 * Make sure we are operating on original hugetlbfs address space.
> +	 */
> +	inode->i_mapping = &inode->i_data;
> +
>  	remove_inode_hugepages(inode, 0, LLONG_MAX);
>  	resv_map = (struct resv_map *)inode->i_mapping->private_data;
>  	/* root inode doesn't have the resv_map, so we should check it */
> 
> 
> I don't know why hugetlbfs code would ever want to operate on any address
> space but the one embedded within the inode.  However, my uderstanding of
> the vfs layer is somewhat limited.  I'm wondering if the hugetlbfs code
> (helper routines mostly) should perhaps use &inode->i_data instead of
> inode->i_mapping.  Does it ever make sense for hugetlbfs code to operate
> on inode->i_mapping if inode->i_mapping != &inode->i_data ?

I'm not a fs expert, but &inode->i_data seems safer pointer to reach to
the struct address_space because it's embedded in inode and always exists.

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] hugetlbfs: move resv_map to hugetlbfs_inode_info
  2019-04-15  6:16   ` Naoya Horiguchi
@ 2019-04-15  9:15     ` Michal Hocko
  2019-04-15 17:11       ` Mike Kravetz
  0 siblings, 1 reply; 15+ messages in thread
From: Michal Hocko @ 2019-04-15  9:15 UTC (permalink / raw)
  To: Naoya Horiguchi; +Cc: Mike Kravetz, Yufen Yu, linux-mm, kirill.shutemov

On Mon 15-04-19 06:16:15, Naoya Horiguchi wrote:
> On Fri, Apr 12, 2019 at 04:40:01PM -0700, Mike Kravetz wrote:
> > On 4/11/19 9:02 PM, Yufen Yu wrote:
> > > Commit 58b6e5e8f1ad ("hugetlbfs: fix memory leak for resv_map")
> > ...
> > > However, for inode mode that is 'S_ISBLK', hugetlbfs_evict_inode() may
> > > free or modify i_mapping->private_data that is owned by bdev inode,
> > > which is not expected!
> > ...
> > > We fix the problem by moving resv_map to hugetlbfs_inode_info. It may
> > > be more reasonable.
> > 
> > Your patches force me to consider these potential issues.  Thank you!
> > 
> > The root of all these problems (including the original leak) is that the
> > open of a block special inode will result in bd_acquire() overwriting the
> > value of inode->i_mapping.  Since hugetlbfs inodes normally contain a
> > resv_map at inode->i_mapping->private_data, a memory leak occurs if we do
> > not free the initially allocated resv_map.  In addition, when the
> > inode is evicted/destroyed inode->i_mapping may point to an address space
> > not associated with the hugetlbfs inode.  If code assumes inode->i_mapping
> > points to hugetlbfs inode address space at evict time, there may be bad
> > data references or worse.
> 
> Let me ask a kind of elementary question: is there any good reason/purpose
> to create and use block special files on hugetlbfs?  I never heard about
> such usecases.  I guess that the conflict of the usage of ->i_mapping is
> discovered recently and that's because block special files on hugetlbfs are
> just not considered until recently or well defined.  So I think that we might
> be better to begin with defining it first.

A absolutely agree. Hugetlbfs is overly complicated even without that.
So if this is merely "we have tried it and it has blown up" kinda thing
then just refuse the create blockdev files or document it as undefined.
You need a root to do so anyway.
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] hugetlbfs: move resv_map to hugetlbfs_inode_info
  2019-04-15  9:15     ` Michal Hocko
@ 2019-04-15 17:11       ` Mike Kravetz
  2019-04-15 23:59         ` Naoya Horiguchi
                           ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Mike Kravetz @ 2019-04-15 17:11 UTC (permalink / raw)
  To: Michal Hocko, Naoya Horiguchi; +Cc: Yufen Yu, linux-mm, kirill.shutemov

On 4/15/19 2:15 AM, Michal Hocko wrote:
> On Mon 15-04-19 06:16:15, Naoya Horiguchi wrote:
>> On Fri, Apr 12, 2019 at 04:40:01PM -0700, Mike Kravetz wrote:
>>> On 4/11/19 9:02 PM, Yufen Yu wrote:
>>>> Commit 58b6e5e8f1ad ("hugetlbfs: fix memory leak for resv_map")
>>> ...
>>>> However, for inode mode that is 'S_ISBLK', hugetlbfs_evict_inode() may
>>>> free or modify i_mapping->private_data that is owned by bdev inode,
>>>> which is not expected!
>>> ...
>>>> We fix the problem by moving resv_map to hugetlbfs_inode_info. It may
>>>> be more reasonable.
>>>
>>> Your patches force me to consider these potential issues.  Thank you!
>>>
>>> The root of all these problems (including the original leak) is that the
>>> open of a block special inode will result in bd_acquire() overwriting the
>>> value of inode->i_mapping.  Since hugetlbfs inodes normally contain a
>>> resv_map at inode->i_mapping->private_data, a memory leak occurs if we do
>>> not free the initially allocated resv_map.  In addition, when the
>>> inode is evicted/destroyed inode->i_mapping may point to an address space
>>> not associated with the hugetlbfs inode.  If code assumes inode->i_mapping
>>> points to hugetlbfs inode address space at evict time, there may be bad
>>> data references or worse.
>>
>> Let me ask a kind of elementary question: is there any good reason/purpose
>> to create and use block special files on hugetlbfs?  I never heard about
>> such usecases.

I am not aware of this as a common use case.  Yufen Yu may be able to provide
more details about how the issue was discovered.  My guess is that it was
discovered via code inspection.

>>                 I guess that the conflict of the usage of ->i_mapping is
>> discovered recently and that's because block special files on hugetlbfs are
>> just not considered until recently or well defined.  So I think that we might
>> be better to begin with defining it first.

Unless I am mistaken, this is just like creating a device special file
in any other filesystem.  Correct?  hugetlbfs is just some place for the
inode/file to reside.  What happens when you open/ioctl/close/etc the file
is really dependent on the vfs layer and underlying driver.

> A absolutely agree. Hugetlbfs is overly complicated even without that.
> So if this is merely "we have tried it and it has blown up" kinda thing
> then just refuse the create blockdev files or document it as undefined.
> You need a root to do so anyway.

Can we just refuse to create device special files in hugetlbfs?  Do we need
to worry about breaking any potential users?  I honestly do not know if anyone
does this today.  However, if they did I believe things would "just work".
The only known issue is leaking a resv_map structure when the inode is
destroyed.  I doubt anyone would notice that leak today.

Let me do a little more research.  I think this can all be cleaned up by
making hugetlbfs always operate on the address space embedded in the inode.
If nothing else, a change or explanation should be added as to why most code
operates on inode->mapping and one place operates on &inode->i_data.
-- 
Mike Kravetz


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] hugetlbfs: move resv_map to hugetlbfs_inode_info
  2019-04-15 17:11       ` Mike Kravetz
@ 2019-04-15 23:59         ` Naoya Horiguchi
  2019-04-16  0:37           ` Mike Kravetz
  2019-04-16  6:50         ` Michal Hocko
  2019-04-16 12:57         ` [PATCH] hugetlbfs: move resv_map to hugetlbfs_inode_info yuyufen
  2 siblings, 1 reply; 15+ messages in thread
From: Naoya Horiguchi @ 2019-04-15 23:59 UTC (permalink / raw)
  To: Mike Kravetz; +Cc: Michal Hocko, Yufen Yu, linux-mm, kirill.shutemov

On Mon, Apr 15, 2019 at 10:11:39AM -0700, Mike Kravetz wrote:
> On 4/15/19 2:15 AM, Michal Hocko wrote:
> > On Mon 15-04-19 06:16:15, Naoya Horiguchi wrote:
> >> On Fri, Apr 12, 2019 at 04:40:01PM -0700, Mike Kravetz wrote:
> >>> On 4/11/19 9:02 PM, Yufen Yu wrote:
> >>>> Commit 58b6e5e8f1ad ("hugetlbfs: fix memory leak for resv_map")
> >>> ...
> >>>> However, for inode mode that is 'S_ISBLK', hugetlbfs_evict_inode() may
> >>>> free or modify i_mapping->private_data that is owned by bdev inode,
> >>>> which is not expected!
> >>> ...
> >>>> We fix the problem by moving resv_map to hugetlbfs_inode_info. It may
> >>>> be more reasonable.
> >>>
> >>> Your patches force me to consider these potential issues.  Thank you!
> >>>
> >>> The root of all these problems (including the original leak) is that the
> >>> open of a block special inode will result in bd_acquire() overwriting the
> >>> value of inode->i_mapping.  Since hugetlbfs inodes normally contain a
> >>> resv_map at inode->i_mapping->private_data, a memory leak occurs if we do
> >>> not free the initially allocated resv_map.  In addition, when the
> >>> inode is evicted/destroyed inode->i_mapping may point to an address space
> >>> not associated with the hugetlbfs inode.  If code assumes inode->i_mapping
> >>> points to hugetlbfs inode address space at evict time, there may be bad
> >>> data references or worse.
> >>
> >> Let me ask a kind of elementary question: is there any good reason/purpose
> >> to create and use block special files on hugetlbfs?  I never heard about
> >> such usecases.
> 
> I am not aware of this as a common use case.  Yufen Yu may be able to provide
> more details about how the issue was discovered.  My guess is that it was
> discovered via code inspection.
> 
> >>                 I guess that the conflict of the usage of ->i_mapping is
> >> discovered recently and that's because block special files on hugetlbfs are
> >> just not considered until recently or well defined.  So I think that we might
> >> be better to begin with defining it first.
> 
> Unless I am mistaken, this is just like creating a device special file
> in any other filesystem.  Correct?  hugetlbfs is just some place for the
> inode/file to reside.  What happens when you open/ioctl/close/etc the file
> is really dependent on the vfs layer and underlying driver.
> 

OK. Generally speaking, "special files just work even on hugetlbfs" sounds
fine for me if it properly works.

> > A absolutely agree. Hugetlbfs is overly complicated even without that.
> > So if this is merely "we have tried it and it has blown up" kinda thing
> > then just refuse the create blockdev files or document it as undefined.
> > You need a root to do so anyway.
> 
> Can we just refuse to create device special files in hugetlbfs?  Do we need
> to worry about breaking any potential users?  I honestly do not know if anyone
> does this today.  However, if they did I believe things would "just work".
> The only known issue is leaking a resv_map structure when the inode is
> destroyed.  I doubt anyone would notice that leak today.

Thanks for explanation, so that's unclear now. 

> 
> Let me do a little more research.  I think this can all be cleaned up by
> making hugetlbfs always operate on the address space embedded in the inode.
> If nothing else, a change or explanation should be added as to why most code
> operates on inode->mapping and one place operates on &inode->i_data.

Sounds nice, thank you.

(Just for sharing point, not intending to block the fix ...)
My remaining concern is that this problem might not be hugetlbfs specific,
because what triggers the issue seems to be the usage of inode->i_mapping.
bd_acquire() are callable from any filesystem, so I'm wondering whether we
have something to generally prevent this kind of issue?

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] hugetlbfs: move resv_map to hugetlbfs_inode_info
  2019-04-15 23:59         ` Naoya Horiguchi
@ 2019-04-16  0:37           ` Mike Kravetz
  0 siblings, 0 replies; 15+ messages in thread
From: Mike Kravetz @ 2019-04-16  0:37 UTC (permalink / raw)
  To: Naoya Horiguchi; +Cc: Michal Hocko, Yufen Yu, linux-mm, kirill.shutemov

On 4/15/19 4:59 PM, Naoya Horiguchi wrote:
> On Mon, Apr 15, 2019 at 10:11:39AM -0700, Mike Kravetz wrote:
>> Let me do a little more research.  I think this can all be cleaned up by
>> making hugetlbfs always operate on the address space embedded in the inode.
>> If nothing else, a change or explanation should be added as to why most code
>> operates on inode->mapping and one place operates on &inode->i_data.
> 
> Sounds nice, thank you.
> 
> (Just for sharing point, not intending to block the fix ...)
> My remaining concern is that this problem might not be hugetlbfs specific,
> because what triggers the issue seems to be the usage of inode->i_mapping.
> bd_acquire() are callable from any filesystem, so I'm wondering whether we
> have something to generally prevent this kind of issue?

I have gone through most of the filesystems and have not found any others
where this may be an issue.  From what I have seen, it is fairly common in
filesystem evict_inode routines to explicitly use &inode->i_data to get at
the address space passed to the truncate routine.  As mentioned, even
hugetlbfs does this in remove_inode_hugepages() which was previously part
of the evict_inode routine.  In tmpfs, the evict_inode routuine cleverly
checks the address space ops to determine if the address space is associated
with tmpfs.  If not, it does not call the truncate routine.

One of things different for hugetlbfs is that the resv_map structure hangs
off the address space within the inode.  Yufen Yu's approach was to move the
resv_map pointer into the hugetlbfs inode extenstion hugetlbfs_inode_info.
With that change, we could make the hugetlbfs evict_inode routine look more
like the tmpfs routine.  I do not really like the idea of increasing the size
of hugetlbfs inodes as we already have a place to store the pointer.  In
addition, the reserv_map is used with the mappings within the address space
so one could argue that it makes sense for them to be together.
-- 
Mike Kravetz


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] hugetlbfs: move resv_map to hugetlbfs_inode_info
  2019-04-15 17:11       ` Mike Kravetz
  2019-04-15 23:59         ` Naoya Horiguchi
@ 2019-04-16  6:50         ` Michal Hocko
  2019-04-19 20:44           ` [PATCH] hugetlbfs: always use address space in inode for resv_map pointer Mike Kravetz
  2019-04-16 12:57         ` [PATCH] hugetlbfs: move resv_map to hugetlbfs_inode_info yuyufen
  2 siblings, 1 reply; 15+ messages in thread
From: Michal Hocko @ 2019-04-16  6:50 UTC (permalink / raw)
  To: Mike Kravetz; +Cc: Naoya Horiguchi, Yufen Yu, linux-mm, kirill.shutemov

On Mon 15-04-19 10:11:39, Mike Kravetz wrote:
> On 4/15/19 2:15 AM, Michal Hocko wrote:
> > On Mon 15-04-19 06:16:15, Naoya Horiguchi wrote:
> >> On Fri, Apr 12, 2019 at 04:40:01PM -0700, Mike Kravetz wrote:
> >>> On 4/11/19 9:02 PM, Yufen Yu wrote:
> >>>> Commit 58b6e5e8f1ad ("hugetlbfs: fix memory leak for resv_map")
> >>> ...
> >>>> However, for inode mode that is 'S_ISBLK', hugetlbfs_evict_inode() may
> >>>> free or modify i_mapping->private_data that is owned by bdev inode,
> >>>> which is not expected!
> >>> ...
> >>>> We fix the problem by moving resv_map to hugetlbfs_inode_info. It may
> >>>> be more reasonable.
> >>>
> >>> Your patches force me to consider these potential issues.  Thank you!
> >>>
> >>> The root of all these problems (including the original leak) is that the
> >>> open of a block special inode will result in bd_acquire() overwriting the
> >>> value of inode->i_mapping.  Since hugetlbfs inodes normally contain a
> >>> resv_map at inode->i_mapping->private_data, a memory leak occurs if we do
> >>> not free the initially allocated resv_map.  In addition, when the
> >>> inode is evicted/destroyed inode->i_mapping may point to an address space
> >>> not associated with the hugetlbfs inode.  If code assumes inode->i_mapping
> >>> points to hugetlbfs inode address space at evict time, there may be bad
> >>> data references or worse.
> >>
> >> Let me ask a kind of elementary question: is there any good reason/purpose
> >> to create and use block special files on hugetlbfs?  I never heard about
> >> such usecases.
> 
> I am not aware of this as a common use case.  Yufen Yu may be able to provide
> more details about how the issue was discovered.  My guess is that it was
> discovered via code inspection.
> 
> >>                 I guess that the conflict of the usage of ->i_mapping is
> >> discovered recently and that's because block special files on hugetlbfs are
> >> just not considered until recently or well defined.  So I think that we might
> >> be better to begin with defining it first.
> 
> Unless I am mistaken, this is just like creating a device special file
> in any other filesystem.  Correct?  hugetlbfs is just some place for the
> inode/file to reside.  What happens when you open/ioctl/close/etc the file
> is really dependent on the vfs layer and underlying driver.
> 
> > A absolutely agree. Hugetlbfs is overly complicated even without that.
> > So if this is merely "we have tried it and it has blown up" kinda thing
> > then just refuse the create blockdev files or document it as undefined.
> > You need a root to do so anyway.
> 
> Can we just refuse to create device special files in hugetlbfs?  Do we need
> to worry about breaking any potential users?  I honestly do not know if anyone
> does this today.  However, if they did I believe things would "just work".

But why would anybody do something like that? Is there any actual
semantical advantage to create device files on hugetlbfs? I would be
worried that some confused application might expect e.g. hugetlb backed
pagecache for a block device or something like that. I wouldn't be too
worried to outright disallow this and only allow on an explicit and
reasonable usecase.

> The only known issue is leaking a resv_map structure when the inode is
> destroyed.  I doubt anyone would notice that leak today.
> 
> Let me do a little more research.  I think this can all be cleaned up by
> making hugetlbfs always operate on the address space embedded in the inode.
> If nothing else, a change or explanation should be added as to why most code
> operates on inode->mapping and one place operates on &inode->i_data.

Yes, that makes sense.

Thanks!
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] hugetlbfs: move resv_map to hugetlbfs_inode_info
  2019-04-15 17:11       ` Mike Kravetz
  2019-04-15 23:59         ` Naoya Horiguchi
  2019-04-16  6:50         ` Michal Hocko
@ 2019-04-16 12:57         ` yuyufen
  2 siblings, 0 replies; 15+ messages in thread
From: yuyufen @ 2019-04-16 12:57 UTC (permalink / raw)
  To: Mike Kravetz, Michal Hocko, Naoya Horiguchi; +Cc: linux-mm, kirill.shutemov



On 2019/4/16 1:11, Mike Kravetz wrote:
> On 4/15/19 2:15 AM, Michal Hocko wrote:
>> On Mon 15-04-19 06:16:15, Naoya Horiguchi wrote:
>>> On Fri, Apr 12, 2019 at 04:40:01PM -0700, Mike Kravetz wrote:
>>>> On 4/11/19 9:02 PM, Yufen Yu wrote:
>>>>> Commit 58b6e5e8f1ad ("hugetlbfs: fix memory leak for resv_map")
>>>> ...
>>>>> However, for inode mode that is 'S_ISBLK', hugetlbfs_evict_inode() may
>>>>> free or modify i_mapping->private_data that is owned by bdev inode,
>>>>> which is not expected!
>>>> ...
>>>>> We fix the problem by moving resv_map to hugetlbfs_inode_info. It may
>>>>> be more reasonable.
>>>> Your patches force me to consider these potential issues.  Thank you!
>>>>
>>>> The root of all these problems (including the original leak) is that the
>>>> open of a block special inode will result in bd_acquire() overwriting the
>>>> value of inode->i_mapping.  Since hugetlbfs inodes normally contain a
>>>> resv_map at inode->i_mapping->private_data, a memory leak occurs if we do
>>>> not free the initially allocated resv_map.  In addition, when the
>>>> inode is evicted/destroyed inode->i_mapping may point to an address space
>>>> not associated with the hugetlbfs inode.  If code assumes inode->i_mapping
>>>> points to hugetlbfs inode address space at evict time, there may be bad
>>>> data references or worse.
>>> Let me ask a kind of elementary question: is there any good reason/purpose
>>> to create and use block special files on hugetlbfs?  I never heard about
>>> such usecases.
> I am not aware of this as a common use case.  Yufen Yu may be able to provide
> more details about how the issue was discovered.  My guess is that it was
> discovered via code inspection.

In fact, we discover the issue by running syzkaller. The program like:

15:39:59 executing program 0:
r0 = openat(0xffffffffffffff9c, &(0x7f0000000040)='./file0/file0\x00', 
0x44000, 0x1)
r1 = syz_open_dev$vcsn(&(0x7f00000000c0)='/dev/vcs#\x00', 0x3f, 0x202000)
renameat2(r0, &(0x7f0000000140)='./file0\x00', r0, 
&(0x7f0000000180)='./file0/file0/file0\x00', 0x4)
mkdir(&(0x7f0000000300)='./file0\x00', 0x0)
mount(0x0, &(0x7f0000000200)='./file0\x00', 
&(0x7f0000000240)='hugetlbfs\x00', 0x0, 0x0)
mknod$loop(&(0x7f0000000000)='./file0/file0\x00', 0x6000, 
0xffffffffffffffff)

Yufen
Thanks

>
>>>                  I guess that the conflict of the usage of ->i_mapping is
>>> discovered recently and that's because block special files on hugetlbfs are
>>> just not considered until recently or well defined.  So I think that we might
>>> be better to begin with defining it first.
> Unless I am mistaken, this is just like creating a device special file
> in any other filesystem.  Correct?  hugetlbfs is just some place for the
> inode/file to reside.  What happens when you open/ioctl/close/etc the file
> is really dependent on the vfs layer and underlying driver.
>
>> A absolutely agree. Hugetlbfs is overly complicated even without that.
>> So if this is merely "we have tried it and it has blown up" kinda thing
>> then just refuse the create blockdev files or document it as undefined.
>> You need a root to do so anyway.
> Can we just refuse to create device special files in hugetlbfs?  Do we need
> to worry about breaking any potential users?  I honestly do not know if anyone
> does this today.  However, if they did I believe things would "just work".
> The only known issue is leaking a resv_map structure when the inode is
> destroyed.  I doubt anyone would notice that leak today.
>
> Let me do a little more research.  I think this can all be cleaned up by
> making hugetlbfs always operate on the address space embedded in the inode.
> If nothing else, a change or explanation should be added as to why most code
> operates on inode->mapping and one place operates on &inode->i_data.



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH] hugetlbfs: always use address space in inode for resv_map pointer
  2019-04-16  6:50         ` Michal Hocko
@ 2019-04-19 20:44           ` Mike Kravetz
  2019-05-08  7:10             ` yuyufen
  0 siblings, 1 reply; 15+ messages in thread
From: Mike Kravetz @ 2019-04-19 20:44 UTC (permalink / raw)
  To: linux-mm, linux-kernel, Yufen Yu
  Cc: Michal Hocko, Naoya Horiguchi, Kirill A . Shutemov,
	Andrew Morton, Mike Kravetz

Continuing discussion about commit 58b6e5e8f1ad ("hugetlbfs: fix memory
leak for resv_map") brought up the issue that inode->i_mapping may not
point to the address space embedded within the inode at inode eviction
time.  The hugetlbfs truncate routine handles this by explicitly using
inode->i_data.  However, code cleaning up the resv_map will still use
the address space pointed to by inode->i_mapping.  Luckily, private_data
is NULL for address spaces in all such cases today but, there is no
guarantee this will continue.

Change all hugetlbfs code getting a resv_map pointer to explicitly get
it from the address space embedded within the inode.  In addition, add
more comments in the code to indicate why this is being done.

Reported-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 fs/hugetlbfs/inode.c | 11 +++++++++--
 mm/hugetlb.c         | 19 ++++++++++++++++++-
 2 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 9285dd4f4b1c..cbc649cd1722 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -499,8 +499,15 @@ static void hugetlbfs_evict_inode(struct inode *inode)
 	struct resv_map *resv_map;
 
 	remove_inode_hugepages(inode, 0, LLONG_MAX);
-	resv_map = (struct resv_map *)inode->i_mapping->private_data;
-	/* root inode doesn't have the resv_map, so we should check it */
+
+	/*
+	 * Get the resv_map from the address space embedded in the inode.
+	 * This is the address space which points to any resv_map allocated
+	 * at inode creation time.  If this is a device special inode,
+	 * i_mapping may not point to the original address space.
+	 */
+	resv_map = (struct resv_map *)(&inode->i_data)->private_data;
+	/* Only regular and link inodes have associated reserve maps */
 	if (resv_map)
 		resv_map_release(&resv_map->refs);
 	clear_inode(inode);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6cdc7b2d9100..b30e97b0ef37 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -740,7 +740,15 @@ void resv_map_release(struct kref *ref)
 
 static inline struct resv_map *inode_resv_map(struct inode *inode)
 {
-	return inode->i_mapping->private_data;
+	/*
+	 * At inode evict time, i_mapping may not point to the original
+	 * address space within the inode.  This original address space
+	 * contains the pointer to the resv_map.  So, always use the
+	 * address space embedded within the inode.
+	 * The VERY common case is inode->mapping == &inode->i_data but,
+	 * this may not be true for device special inodes.
+	 */
+	return (struct resv_map *)(&inode->i_data)->private_data;
 }
 
 static struct resv_map *vma_resv_map(struct vm_area_struct *vma)
@@ -4477,6 +4485,11 @@ int hugetlb_reserve_pages(struct inode *inode,
 	 * called to make the mapping read-write. Assume !vma is a shm mapping
 	 */
 	if (!vma || vma->vm_flags & VM_MAYSHARE) {
+		/*
+		 * resv_map can not be NULL as hugetlb_reserve_pages is only
+		 * called for inodes for which resv_maps were created (see
+		 * hugetlbfs_get_inode).
+		 */
 		resv_map = inode_resv_map(inode);
 
 		chg = region_chg(resv_map, from, to);
@@ -4568,6 +4581,10 @@ long hugetlb_unreserve_pages(struct inode *inode, long start, long end,
 	struct hugepage_subpool *spool = subpool_inode(inode);
 	long gbl_reserve;
 
+	/*
+	 * Since this routine can be called in the evict inode path for all
+	 * hugetlbfs inodes, resv_map could be NULL.
+	 */
 	if (resv_map) {
 		chg = region_del(resv_map, start, end);
 		/*
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] hugetlbfs: always use address space in inode for resv_map pointer
  2019-04-19 20:44           ` [PATCH] hugetlbfs: always use address space in inode for resv_map pointer Mike Kravetz
@ 2019-05-08  7:10             ` yuyufen
  2019-05-08 20:16               ` Mike Kravetz
  0 siblings, 1 reply; 15+ messages in thread
From: yuyufen @ 2019-05-08  7:10 UTC (permalink / raw)
  To: Mike Kravetz, linux-mm, linux-kernel
  Cc: Michal Hocko, Naoya Horiguchi, Kirill A . Shutemov,
	Andrew Morton, stable, yuyufen



On 2019/4/20 4:44, Mike Kravetz wrote:
> Continuing discussion about commit 58b6e5e8f1ad ("hugetlbfs: fix memory
> leak for resv_map") brought up the issue that inode->i_mapping may not
> point to the address space embedded within the inode at inode eviction
> time.  The hugetlbfs truncate routine handles this by explicitly using
> inode->i_data.  However, code cleaning up the resv_map will still use
> the address space pointed to by inode->i_mapping.  Luckily, private_data
> is NULL for address spaces in all such cases today but, there is no
> guarantee this will continue.
>
> Change all hugetlbfs code getting a resv_map pointer to explicitly get
> it from the address space embedded within the inode.  In addition, add
> more comments in the code to indicate why this is being done.
>
> Reported-by: Yufen Yu <yuyufen@huawei.com>
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> ---
>   fs/hugetlbfs/inode.c | 11 +++++++++--
>   mm/hugetlb.c         | 19 ++++++++++++++++++-
>   2 files changed, 27 insertions(+), 3 deletions(-)
>
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index 9285dd4f4b1c..cbc649cd1722 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -499,8 +499,15 @@ static void hugetlbfs_evict_inode(struct inode *inode)
>   	struct resv_map *resv_map;
>   
>   	remove_inode_hugepages(inode, 0, LLONG_MAX);
> -	resv_map = (struct resv_map *)inode->i_mapping->private_data;
> -	/* root inode doesn't have the resv_map, so we should check it */
> +
> +	/*
> +	 * Get the resv_map from the address space embedded in the inode.
> +	 * This is the address space which points to any resv_map allocated
> +	 * at inode creation time.  If this is a device special inode,
> +	 * i_mapping may not point to the original address space.
> +	 */
> +	resv_map = (struct resv_map *)(&inode->i_data)->private_data;
> +	/* Only regular and link inodes have associated reserve maps */
>   	if (resv_map)
>   		resv_map_release(&resv_map->refs);
>   	clear_inode(inode);
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 6cdc7b2d9100..b30e97b0ef37 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -740,7 +740,15 @@ void resv_map_release(struct kref *ref)
>   
>   static inline struct resv_map *inode_resv_map(struct inode *inode)
>   {
> -	return inode->i_mapping->private_data;
> +	/*
> +	 * At inode evict time, i_mapping may not point to the original
> +	 * address space within the inode.  This original address space
> +	 * contains the pointer to the resv_map.  So, always use the
> +	 * address space embedded within the inode.
> +	 * The VERY common case is inode->mapping == &inode->i_data but,
> +	 * this may not be true for device special inodes.
> +	 */
> +	return (struct resv_map *)(&inode->i_data)->private_data;
>   }
>   
>   static struct resv_map *vma_resv_map(struct vm_area_struct *vma)
> @@ -4477,6 +4485,11 @@ int hugetlb_reserve_pages(struct inode *inode,
>   	 * called to make the mapping read-write. Assume !vma is a shm mapping
>   	 */
>   	if (!vma || vma->vm_flags & VM_MAYSHARE) {
> +		/*
> +		 * resv_map can not be NULL as hugetlb_reserve_pages is only
> +		 * called for inodes for which resv_maps were created (see
> +		 * hugetlbfs_get_inode).
> +		 */
>   		resv_map = inode_resv_map(inode);
>   
>   		chg = region_chg(resv_map, from, to);
> @@ -4568,6 +4581,10 @@ long hugetlb_unreserve_pages(struct inode *inode, long start, long end,
>   	struct hugepage_subpool *spool = subpool_inode(inode);
>   	long gbl_reserve;
>   
> +	/*
> +	 * Since this routine can be called in the evict inode path for all
> +	 * hugetlbfs inodes, resv_map could be NULL.
> +	 */
>   	if (resv_map) {
>   		chg = region_del(resv_map, start, end);
>   		/*

Dose this patch have been applied?

I think it is better to add fixes label, like:
Fixes: 58b6e5e8f1ad ("hugetlbfs: fix memory leak for resv_map")

Since the commit 58b6e5e8f1a has been merged to stable, this patch also 
be needed.
https://www.spinics.net/lists/stable/msg298740.html






^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] hugetlbfs: always use address space in inode for resv_map pointer
  2019-05-08  7:10             ` yuyufen
@ 2019-05-08 20:16               ` Mike Kravetz
  2019-05-09 23:11                 ` Andrew Morton
  0 siblings, 1 reply; 15+ messages in thread
From: Mike Kravetz @ 2019-05-08 20:16 UTC (permalink / raw)
  To: yuyufen, linux-mm, linux-kernel
  Cc: Michal Hocko, Naoya Horiguchi, Kirill A . Shutemov,
	Andrew Morton, stable

On 5/8/19 12:10 AM, yuyufen wrote:
> On 2019/4/20 4:44, Mike Kravetz wrote:
>> Continuing discussion about commit 58b6e5e8f1ad ("hugetlbfs: fix memory
>> leak for resv_map") brought up the issue that inode->i_mapping may not
>> point to the address space embedded within the inode at inode eviction
>> time.  The hugetlbfs truncate routine handles this by explicitly using
>> inode->i_data.  However, code cleaning up the resv_map will still use
>> the address space pointed to by inode->i_mapping.  Luckily, private_data
>> is NULL for address spaces in all such cases today but, there is no
>> guarantee this will continue.
>>
>> Change all hugetlbfs code getting a resv_map pointer to explicitly get
>> it from the address space embedded within the inode.  In addition, add
>> more comments in the code to indicate why this is being done.
>>
>> Reported-by: Yufen Yu <yuyufen@huawei.com>
>> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
...
> 
> Dose this patch have been applied?

Andrew has pulled it into his tree.  However, I do not believe there has
been an ACK or Review, so am not sure of the exact status.

> I think it is better to add fixes label, like:
> Fixes: 58b6e5e8f1ad ("hugetlbfs: fix memory leak for resv_map")
> 
> Since the commit 58b6e5e8f1a has been merged to stable, this patch also be needed.
> https://www.spinics.net/lists/stable/msg298740.html

It must have been the AI that decided 58b6e5e8f1a needed to go to stable.
Even though this technically does not fix 58b6e5e8f1a, I'm OK with adding
the Fixes: to force this to go to the same stable trees.

-- 
Mike Kravetz


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] hugetlbfs: always use address space in inode for resv_map pointer
  2019-05-08 20:16               ` Mike Kravetz
@ 2019-05-09 23:11                 ` Andrew Morton
  2019-05-09 23:32                   ` Mike Kravetz
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2019-05-09 23:11 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: yuyufen, linux-mm, linux-kernel, Michal Hocko, Naoya Horiguchi,
	Kirill A . Shutemov, stable

On Wed, 8 May 2019 13:16:09 -0700 Mike Kravetz <mike.kravetz@oracle.com> wrote:

> > I think it is better to add fixes label, like:
> > Fixes: 58b6e5e8f1ad ("hugetlbfs: fix memory leak for resv_map")
> > 
> > Since the commit 58b6e5e8f1a has been merged to stable, this patch also be needed.
> > https://www.spinics.net/lists/stable/msg298740.html
> 
> It must have been the AI that decided 58b6e5e8f1a needed to go to stable.

grr.

> Even though this technically does not fix 58b6e5e8f1a, I'm OK with adding
> the Fixes: to force this to go to the same stable trees.

Why are we bothering with any of this, given that

: Luckily, private_data is NULL for address spaces in all such cases
: today but, there is no guarantee this will continue.

?

Even though 58b6e5e8f1ad was inappropriately backported, the above
still holds, so what problem does a backport of "hugetlbfs: always use
address space in inode for resv_map pointer" actually solve?

And yes, some review of this would be nice


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] hugetlbfs: always use address space in inode for resv_map pointer
  2019-05-09 23:11                 ` Andrew Morton
@ 2019-05-09 23:32                   ` Mike Kravetz
  0 siblings, 0 replies; 15+ messages in thread
From: Mike Kravetz @ 2019-05-09 23:32 UTC (permalink / raw)
  To: Andrew Morton
  Cc: yuyufen, linux-mm, linux-kernel, Michal Hocko, Naoya Horiguchi,
	Kirill A . Shutemov, stable

On 5/9/19 4:11 PM, Andrew Morton wrote:
> On Wed, 8 May 2019 13:16:09 -0700 Mike Kravetz <mike.kravetz@oracle.com> wrote:
> 
>>> I think it is better to add fixes label, like:
>>> Fixes: 58b6e5e8f1ad ("hugetlbfs: fix memory leak for resv_map")
>>>
>>> Since the commit 58b6e5e8f1a has been merged to stable, this patch also be needed.
>>> https://www.spinics.net/lists/stable/msg298740.html
>>
>> It must have been the AI that decided 58b6e5e8f1a needed to go to stable.
> 
> grr.
> 
>> Even though this technically does not fix 58b6e5e8f1a, I'm OK with adding
>> the Fixes: to force this to go to the same stable trees.
> 
> Why are we bothering with any of this, given that
> 
> : Luckily, private_data is NULL for address spaces in all such cases
> : today but, there is no guarantee this will continue.
> 
> ?

You are right.  For stable releases, I do not see any way for this to
be an issue.  We are lucky today (and in the past).  The patch is there
to guard against code changes which may cause this condition to change
in the future.

Yufen Yu, do you see this actually fixing a problem in stable releases?
I believe you originally said this is not a problem today, which would
also imply older releases.  Just want to make sure I am not missing something.
-- 
Mike Kravetz

> Even though 58b6e5e8f1ad was inappropriately backported, the above
> still holds, so what problem does a backport of "hugetlbfs: always use
> address space in inode for resv_map pointer" actually solve?
> 
> And yes, some review of this would be nice


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2019-05-09 23:32 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-12  4:02 [PATCH] hugetlbfs: move resv_map to hugetlbfs_inode_info Yufen Yu
2019-04-12 23:40 ` Mike Kravetz
2019-04-13 11:57   ` yuyufen
2019-04-15  6:16   ` Naoya Horiguchi
2019-04-15  9:15     ` Michal Hocko
2019-04-15 17:11       ` Mike Kravetz
2019-04-15 23:59         ` Naoya Horiguchi
2019-04-16  0:37           ` Mike Kravetz
2019-04-16  6:50         ` Michal Hocko
2019-04-19 20:44           ` [PATCH] hugetlbfs: always use address space in inode for resv_map pointer Mike Kravetz
2019-05-08  7:10             ` yuyufen
2019-05-08 20:16               ` Mike Kravetz
2019-05-09 23:11                 ` Andrew Morton
2019-05-09 23:32                   ` Mike Kravetz
2019-04-16 12:57         ` [PATCH] hugetlbfs: move resv_map to hugetlbfs_inode_info yuyufen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).