All of lore.kernel.org
 help / color / mirror / Atom feed
From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Yufen Yu <yuyufen@huawei.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"mhocko@kernel.org" <mhocko@kernel.org>,
	"kirill.shutemov@linux.intel.com"
	<kirill.shutemov@linux.intel.com>
Subject: Re: [PATCH] hugetlbfs: move resv_map to hugetlbfs_inode_info
Date: Mon, 15 Apr 2019 06:16:15 +0000	[thread overview]
Message-ID: <20190415061618.GA16061@hori.linux.bs1.fc.nec.co.jp> (raw)
In-Reply-To: <83a4e275-405f-f1d8-2245-d597bef2ec69@oracle.com>

On Fri, Apr 12, 2019 at 04:40:01PM -0700, Mike Kravetz wrote:
> On 4/11/19 9:02 PM, Yufen Yu wrote:
> > Commit 58b6e5e8f1ad ("hugetlbfs: fix memory leak for resv_map")
> ...
> > However, for inode mode that is 'S_ISBLK', hugetlbfs_evict_inode() may
> > free or modify i_mapping->private_data that is owned by bdev inode,
> > which is not expected!
> ...
> > We fix the problem by moving resv_map to hugetlbfs_inode_info. It may
> > be more reasonable.
> 
> Your patches force me to consider these potential issues.  Thank you!
> 
> The root of all these problems (including the original leak) is that the
> open of a block special inode will result in bd_acquire() overwriting the
> value of inode->i_mapping.  Since hugetlbfs inodes normally contain a
> resv_map at inode->i_mapping->private_data, a memory leak occurs if we do
> not free the initially allocated resv_map.  In addition, when the
> inode is evicted/destroyed inode->i_mapping may point to an address space
> not associated with the hugetlbfs inode.  If code assumes inode->i_mapping
> points to hugetlbfs inode address space at evict time, there may be bad
> data references or worse.

Let me ask a kind of elementary question: is there any good reason/purpose
to create and use block special files on hugetlbfs?  I never heard about
such usecases.  I guess that the conflict of the usage of ->i_mapping is
discovered recently and that's because block special files on hugetlbfs are
just not considered until recently or well defined.  So I think that we might
be better to begin with defining it first.

I tried the procedure described in commit 58b6e5e8f1a ("hugetlbfs: fix
memory leak for resv_map") and I failed to open the block special file with
ENXIO. So I'm not clearly sure what to solve.
I tried a similar test on tmpfs (good reference for hugetlbfs) and that also
failed on opening block file on it (but with EACCESS), so simply fixing it
similarly could be an option if there's no reasonable usecase and we just
want to fix the memory leak. 

> 
> This specific part of the patch made me think,
> 
> > @@ -497,12 +497,15 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
> >  static void hugetlbfs_evict_inode(struct inode *inode)
> >  {
> >  	struct resv_map *resv_map;
> > +	struct hugetlbfs_inode_info *info = HUGETLBFS_I(inode);
> >  
> >  	remove_inode_hugepages(inode, 0, LLONG_MAX);
> > -	resv_map = (struct resv_map *)inode->i_mapping->private_data;
> > +	resv_map = info->resv_map;
> >  	/* root inode doesn't have the resv_map, so we should check it */
> > -	if (resv_map)
> > +	if (resv_map) {
> >  		resv_map_release(&resv_map->refs);
> > +		info->resv_map = NULL;
> > +	}
> >  	clear_inode(inode);
> >  }
> 
> If inode->i_mapping may not be associated with the hugetlbfs inode, then
> remove_inode_hugepages() will also have problems.  It will want to operate
> on the address space associated with the inode.  So, there are more issues
> than just the resv_map.  When I looked at the first few lines of
> remove_inode_hugepages(), I was surprised to see:
> 
> 	struct address_space *mapping = &inode->i_data;
> 
> So remove_inode_hugepages is explicitly using the original address space
> that is embedded in the inode.  As a result, it is not impacted by changes
> to inode->i_mapping.  Using git history I was unable to determine why
> remove_inode_hugepages is the only place in hugetlbfs code doing this.
> 
> With this in mind, a simple change like the following will fix the original
> leak issue as well as the potential issues mentioned in this patch.
> 
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index 53ea3cef526e..9f0719bad46f 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -511,6 +511,11 @@ static void hugetlbfs_evict_inode(struct inode *inode)
>  {
>  	struct resv_map *resv_map;
>  
> +	/*
> +	 * Make sure we are operating on original hugetlbfs address space.
> +	 */
> +	inode->i_mapping = &inode->i_data;
> +
>  	remove_inode_hugepages(inode, 0, LLONG_MAX);
>  	resv_map = (struct resv_map *)inode->i_mapping->private_data;
>  	/* root inode doesn't have the resv_map, so we should check it */
> 
> 
> I don't know why hugetlbfs code would ever want to operate on any address
> space but the one embedded within the inode.  However, my uderstanding of
> the vfs layer is somewhat limited.  I'm wondering if the hugetlbfs code
> (helper routines mostly) should perhaps use &inode->i_data instead of
> inode->i_mapping.  Does it ever make sense for hugetlbfs code to operate
> on inode->i_mapping if inode->i_mapping != &inode->i_data ?

I'm not a fs expert, but &inode->i_data seems safer pointer to reach to
the struct address_space because it's embedded in inode and always exists.

Thanks,
Naoya Horiguchi

  parent reply	other threads:[~2019-04-15  6:18 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-12  4:02 [PATCH] hugetlbfs: move resv_map to hugetlbfs_inode_info Yufen Yu
2019-04-12 23:40 ` Mike Kravetz
2019-04-13 11:57   ` yuyufen
2019-04-15  6:16   ` Naoya Horiguchi [this message]
2019-04-15  9:15     ` Michal Hocko
2019-04-15 17:11       ` Mike Kravetz
2019-04-15 23:59         ` Naoya Horiguchi
2019-04-16  0:37           ` Mike Kravetz
2019-04-16  6:50         ` Michal Hocko
2019-04-19 20:44           ` [PATCH] hugetlbfs: always use address space in inode for resv_map pointer Mike Kravetz
2019-05-08  7:10             ` yuyufen
2019-05-08 20:16               ` Mike Kravetz
2019-05-09 23:11                 ` Andrew Morton
2019-05-09 23:32                   ` Mike Kravetz
2019-04-16 12:57         ` [PATCH] hugetlbfs: move resv_map to hugetlbfs_inode_info yuyufen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190415061618.GA16061@hori.linux.bs1.fc.nec.co.jp \
    --to=n-horiguchi@ah.jp.nec.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=mike.kravetz@oracle.com \
    --cc=yuyufen@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.