All of lore.kernel.org
* [PATCH] rmap: fix pgoff calculation to handle hugepage correctly
@ 2014-07-01 14:46 ` Naoya Horiguchi
  0 siblings, 0 replies; 26+ messages in thread
From: Naoya Horiguchi @ 2014-07-01 14:46 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Joonsoo Kim, Hugh Dickins, Rik van Riel, Hillf Danton,
	linux-kernel, linux-mm, Naoya Horiguchi

I triggered VM_BUG_ON() in vma_address() when I tried to migrate an anonymous
hugepage with mbind() on kernel v3.16-rc3. This happens because the pgoff
calculation in rmap_walk_anon() fails to account for compound_order() and so
produces an incorrect value. This patch fixes it.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 mm/rmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git v3.16-rc3.orig/mm/rmap.c v3.16-rc3/mm/rmap.c
index b7e94ebbd09e..8cc964c6bd8d 100644
--- v3.16-rc3.orig/mm/rmap.c
+++ v3.16-rc3/mm/rmap.c
@@ -1639,7 +1639,7 @@ static struct anon_vma *rmap_walk_anon_lock(struct page *page,
 static int rmap_walk_anon(struct page *page, struct rmap_walk_control *rwc)
 {
 	struct anon_vma *anon_vma;
-	pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
+	pgoff_t pgoff = page->index << compound_order(page);
 	struct anon_vma_chain *avc;
 	int ret = SWAP_AGAIN;
 
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH] rmap: fix pgoff calculation to handle hugepage correctly
  2014-07-01 14:46 ` Naoya Horiguchi
@ 2014-07-01 17:42   ` Rik van Riel
  -1 siblings, 0 replies; 26+ messages in thread
From: Rik van Riel @ 2014-07-01 17:42 UTC (permalink / raw)
  To: Naoya Horiguchi, Andrew Morton
  Cc: Joonsoo Kim, Hugh Dickins, Hillf Danton, linux-kernel, linux-mm,
	Naoya Horiguchi

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 07/01/2014 10:46 AM, Naoya Horiguchi wrote:
> I triggered VM_BUG_ON() in vma_address() when I tried to migrate an
> anonymous hugepage with mbind() on kernel v3.16-rc3. This happens
> because the pgoff calculation in rmap_walk_anon() fails to account
> for compound_order() and so produces an incorrect value. This patch
> fixes it.
> 
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

Acked-by: Rik van Riel <riel@redhat.com>

- -- 
All rights reversed
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBAgAGBQJTsvMDAAoJEM553pKExN6DFIkH/1JOOZq3VtVLMr4f5I7QsqkE
/g3QU5JKbCdQZKeV+X1jFbZOyU7/B5gwSTVJMBIlAfFjm/tu21OgUeKAafvgfQHl
EVzisjPg1kN5jgtRrO61u4Bnt0rkLWSobBhDpwxslU2nWHNjVlyTBV5Hu2WcpWPB
PHoo+qt1/W185tVe6jeb6VpSZwGDXHWj7AQXGCBO3llCIythpbi/+u1IeoL3jW7V
UG2/xWZiRVBAoKoD4myvJvY6ny8o+WLpZh505WWJKM6WuTvSPhyOt+K1tWV3Sttq
km1aVgnSBYiH6m7if1PwLFtFTlvwhiDI2VZwW+GwhXWak6dh6lS+TssxOw1M9U4=
=zYJe
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] rmap: fix pgoff calculation to handle hugepage correctly
  2014-07-01 14:46 ` Naoya Horiguchi
@ 2014-07-01 18:07   ` Kirill A. Shutemov
  -1 siblings, 0 replies; 26+ messages in thread
From: Kirill A. Shutemov @ 2014-07-01 18:07 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Andrew Morton, Joonsoo Kim, Hugh Dickins, Rik van Riel,
	Hillf Danton, linux-kernel, linux-mm, Naoya Horiguchi

On Tue, Jul 01, 2014 at 10:46:22AM -0400, Naoya Horiguchi wrote:
> I triggered VM_BUG_ON() in vma_address() when I tried to migrate an anonymous
> hugepage with mbind() on kernel v3.16-rc3. This happens because the pgoff
> calculation in rmap_walk_anon() fails to account for compound_order() and so
> produces an incorrect value. This patch fixes it.
> 
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> ---
>  mm/rmap.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git v3.16-rc3.orig/mm/rmap.c v3.16-rc3/mm/rmap.c
> index b7e94ebbd09e..8cc964c6bd8d 100644
> --- v3.16-rc3.orig/mm/rmap.c
> +++ v3.16-rc3/mm/rmap.c
> @@ -1639,7 +1639,7 @@ static struct anon_vma *rmap_walk_anon_lock(struct page *page,
>  static int rmap_walk_anon(struct page *page, struct rmap_walk_control *rwc)
>  {
>  	struct anon_vma *anon_vma;
> -	pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
> +	pgoff_t pgoff = page->index << compound_order(page);
>  	struct anon_vma_chain *avc;
>  	int ret = SWAP_AGAIN;

Hm. It will not work with THP: ->index there is in PAGE_SIZE units.

Why do we need this special case for hugetlb page ->index? Why not use
PAGE_SIZE units there too? Or am I missing something?

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] rmap: fix pgoff calculation to handle hugepage correctly
  2014-07-01 18:07   ` Kirill A. Shutemov
@ 2014-07-01 18:50     ` Naoya Horiguchi
  -1 siblings, 0 replies; 26+ messages in thread
From: Naoya Horiguchi @ 2014-07-01 18:50 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, Joonsoo Kim, Hugh Dickins, Rik van Riel,
	Hillf Danton, linux-kernel, linux-mm, Naoya Horiguchi

On Tue, Jul 01, 2014 at 09:07:39PM +0300, Kirill A. Shutemov wrote:
> On Tue, Jul 01, 2014 at 10:46:22AM -0400, Naoya Horiguchi wrote:
> > I triggered VM_BUG_ON() in vma_address() when I tried to migrate an anonymous
> > hugepage with mbind() on kernel v3.16-rc3. This happens because the pgoff
> > calculation in rmap_walk_anon() fails to account for compound_order() and so
> > produces an incorrect value. This patch fixes it.
> > 
> > Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> > ---
> >  mm/rmap.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git v3.16-rc3.orig/mm/rmap.c v3.16-rc3/mm/rmap.c
> > index b7e94ebbd09e..8cc964c6bd8d 100644
> > --- v3.16-rc3.orig/mm/rmap.c
> > +++ v3.16-rc3/mm/rmap.c
> > @@ -1639,7 +1639,7 @@ static struct anon_vma *rmap_walk_anon_lock(struct page *page,
> >  static int rmap_walk_anon(struct page *page, struct rmap_walk_control *rwc)
> >  {
> >  	struct anon_vma *anon_vma;
> > -	pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
> > +	pgoff_t pgoff = page->index << compound_order(page);
> >  	struct anon_vma_chain *avc;
> >  	int ret = SWAP_AGAIN;
> 
> Hm. It will not work with THP: ->index there is in PAGE_SIZE units.

I wrongly assumed that rmap is never used for THP, sorry.

> Why do we need this special case for hugetlb page ->index? Why not use
> PAGE_SIZE units there too? Or am I missing something?

hugetlb pages are never split, so we use a larger page cache unit for
hugetlbfs files (to avoid a large, sparse page cache tree). I'm not sure
whether we should do the same for anonymous hugepages, but I guess that
using a different cache size just for hugetlbfs makes the code complicated.

Anyway, I'll do some generalization to handle any type of page that
rmap_walk_anon() can be called on. Maybe something similar to
linear_page_index() will be added.

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] rmap: fix pgoff calculation to handle hugepage correctly
  2014-07-01 18:50     ` Naoya Horiguchi
@ 2014-07-01 20:15       ` Kirill A. Shutemov
  -1 siblings, 0 replies; 26+ messages in thread
From: Kirill A. Shutemov @ 2014-07-01 20:15 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Andrew Morton, Joonsoo Kim, Hugh Dickins, Rik van Riel,
	Hillf Danton, linux-kernel, linux-mm, Naoya Horiguchi

On Tue, Jul 01, 2014 at 02:50:21PM -0400, Naoya Horiguchi wrote:
> On Tue, Jul 01, 2014 at 09:07:39PM +0300, Kirill A. Shutemov wrote:
> > Why do we need this special case for hugetlb page ->index? Why not use
> > PAGE_SIZE units there too? Or am I missing something?
> 
> hugetlb pages are never split, so we use a larger page cache unit for
> hugetlbfs files (to avoid a large, sparse page cache tree).

For transparent huge page cache I would like to have native support in the
page cache radix tree: since huge pages are always naturally aligned, we
can create a leaf node for one several (RADIX_TREE_MAP_SHIFT -
HPAGE_PMD_ORDER) levels up the tree, which would cover all indexes in the
range the huge page represents. This approach should fit hugetlb too, and
it would remove one special case for hugetlb.
But I'm not sure when I'll get time to play with this...

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] rmap: fix pgoff calculation to handle hugepage correctly
  2014-07-01 20:15       ` Kirill A. Shutemov
@ 2014-07-02  4:30         ` Naoya Horiguchi
  -1 siblings, 0 replies; 26+ messages in thread
From: Naoya Horiguchi @ 2014-07-02  4:30 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, Joonsoo Kim, Hugh Dickins, Rik van Riel,
	Hillf Danton, linux-kernel, linux-mm, Naoya Horiguchi

On Tue, Jul 01, 2014 at 11:15:40PM +0300, Kirill A. Shutemov wrote:
> On Tue, Jul 01, 2014 at 02:50:21PM -0400, Naoya Horiguchi wrote:
> > On Tue, Jul 01, 2014 at 09:07:39PM +0300, Kirill A. Shutemov wrote:
> > > Why do we need this special case for hugetlb page ->index? Why not use
> > > PAGE_SIZE units there too? Or am I missing something?
> > 
> > hugetlb pages are never split, so we use a larger page cache unit for
> > hugetlbfs files (to avoid a large, sparse page cache tree).
> 
> For transparent huge page cache I would like to have native support in the
> page cache radix tree: since huge pages are always naturally aligned, we
> can create a leaf node for one several (RADIX_TREE_MAP_SHIFT -
> HPAGE_PMD_ORDER) levels up the tree, which would cover all indexes in the
> range the huge page represents. This approach should fit hugetlb too, and
> it would remove one special case for hugetlb.
> But I'm not sure when I'll get time to play with this...

So I'm OK with hugetlb pages having ->index in PAGE_CACHE_SIZE units
once transparent huge page cache is merged. I may try to write patches
on top of your tree after I've done a few series in my work queue.

To fix the current problem, I suggest page_to_pgoff() as a short-term
workaround. I found a few other call sites that can be reached with a
hugepage, so this function helps us track such callers.
A similar function (page_cache_index()) seems to be introduced in your
transparent huge page cache tree, so this function will eventually be
replaced by it.

Thanks,
Naoya Horiguchi
---
From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Date: Tue, 1 Jul 2014 21:38:22 -0400
Subject: [PATCH v2] rmap: fix pgoff calculation to handle hugepage correctly

I triggered VM_BUG_ON() in vma_address() when I tried to migrate an anonymous
hugepage with mbind() on kernel v3.16-rc3. This happens because the pgoff
calculation in rmap_walk_anon() fails to account for compound_order() and so
produces an incorrect value.

This patch introduces page_to_pgoff(), which returns the page's offset in
PAGE_CACHE_SIZE units. Kirill pointed out that the page cache tree should
natively handle hugepages, and that to make hugetlbfs fit it, page->index of
a hugetlbfs page should be in PAGE_CACHE_SIZE units. That is beyond this
patch, but page_to_pgoff() concentrates the point to be fixed in a single
function.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 include/linux/pagemap.h | 12 ++++++++++++
 mm/memory-failure.c     |  4 ++--
 mm/rmap.c               | 10 +++-------
 3 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 0a97b583ee8d..e1474ae18c88 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -399,6 +399,18 @@ static inline struct page *read_mapping_page(struct address_space *mapping,
 }
 
 /*
+ * Get the offset in PAGE_SIZE.
+ * (TODO: hugepage should have ->index in PAGE_SIZE)
+ */
+static inline pgoff_t page_to_pgoff(struct page *page)
+{
+	if (unlikely(PageHeadHuge(page)))
+		return page->index << compound_order(page);
+	else
+		return page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
+}
+
+/*
  * Return byte-offset into filesystem object for page.
  */
 static inline loff_t page_offset(struct page *page)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index cd8989c1027e..61f05d745e3d 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -435,7 +435,7 @@ static void collect_procs_anon(struct page *page, struct list_head *to_kill,
 	if (av == NULL)	/* Not actually mapped anymore */
 		return;
 
-	pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
+	pgoff = page_to_pgoff(page);
 	read_lock(&tasklist_lock);
 	for_each_process (tsk) {
 		struct anon_vma_chain *vmac;
@@ -469,7 +469,7 @@ static void collect_procs_file(struct page *page, struct list_head *to_kill,
 	mutex_lock(&mapping->i_mmap_mutex);
 	read_lock(&tasklist_lock);
 	for_each_process(tsk) {
-		pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
+		pgoff_t pgoff = page_to_pgoff(page);
 		struct task_struct *t = task_early_kill(tsk, force_early);
 
 		if (!t)
diff --git a/mm/rmap.c b/mm/rmap.c
index b7e94ebbd09e..22a4a7699cdb 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -517,11 +517,7 @@ void page_unlock_anon_vma_read(struct anon_vma *anon_vma)
 static inline unsigned long
 __vma_address(struct page *page, struct vm_area_struct *vma)
 {
-	pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
-
-	if (unlikely(is_vm_hugetlb_page(vma)))
-		pgoff = page->index << huge_page_order(page_hstate(page));
-
+	pgoff_t pgoff = page_to_pgoff(page);
 	return vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
 }
 
@@ -1639,7 +1635,7 @@ static struct anon_vma *rmap_walk_anon_lock(struct page *page,
 static int rmap_walk_anon(struct page *page, struct rmap_walk_control *rwc)
 {
 	struct anon_vma *anon_vma;
-	pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
+	pgoff_t pgoff = page_to_pgoff(page);
 	struct anon_vma_chain *avc;
 	int ret = SWAP_AGAIN;
 
@@ -1680,7 +1676,7 @@ static int rmap_walk_anon(struct page *page, struct rmap_walk_control *rwc)
 static int rmap_walk_file(struct page *page, struct rmap_walk_control *rwc)
 {
 	struct address_space *mapping = page->mapping;
-	pgoff_t pgoff = page->index << compound_order(page);
+	pgoff_t pgoff = page_to_pgoff(page);
 	struct vm_area_struct *vma;
 	int ret = SWAP_AGAIN;
 
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH] rmap: fix pgoff calculation to handle hugepage correctly
  2014-07-02  4:30         ` Naoya Horiguchi
@ 2014-07-03 11:41           ` Kirill A. Shutemov
  -1 siblings, 0 replies; 26+ messages in thread
From: Kirill A. Shutemov @ 2014-07-03 11:41 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Andrew Morton, Joonsoo Kim, Hugh Dickins, Rik van Riel,
	Hillf Danton, linux-kernel, linux-mm, Naoya Horiguchi

On Wed, Jul 02, 2014 at 12:30:57AM -0400, Naoya Horiguchi wrote:
> On Tue, Jul 01, 2014 at 11:15:40PM +0300, Kirill A. Shutemov wrote:
> > On Tue, Jul 01, 2014 at 02:50:21PM -0400, Naoya Horiguchi wrote:
> > > On Tue, Jul 01, 2014 at 09:07:39PM +0300, Kirill A. Shutemov wrote:
> > > > Why do we need this special case for hugetlb page ->index? Why not use
> > > > PAGE_SIZE units there too? Or am I missing something?
> > > 
> > > hugetlb pages are never split, so we use a larger page cache unit for
> > > hugetlbfs files (to avoid a large, sparse page cache tree).
> > 
> > For transparent huge page cache I would like to have native support in the
> > page cache radix tree: since huge pages are always naturally aligned, we
> > can create a leaf node for one several (RADIX_TREE_MAP_SHIFT -
> > HPAGE_PMD_ORDER) levels up the tree, which would cover all indexes in the
> > range the huge page represents. This approach should fit hugetlb too, and
> > it would remove one special case for hugetlb.
> > But I'm not sure when I'll get time to play with this...
> 
> So I'm OK with hugetlb pages having ->index in PAGE_CACHE_SIZE units
> once transparent huge page cache is merged. I may try to write patches
> on top of your tree after I've done a few series in my work queue.
> 
> To fix the current problem, I suggest page_to_pgoff() as a short-term
> workaround. I found a few other call sites that can be reached with a
> hugepage, so this function helps us track such callers.
> A similar function (page_cache_index()) seems to be introduced in your
> transparent huge page cache tree, so this function will eventually be
> replaced by it.
> 
> Thanks,
> Naoya Horiguchi
> ---
> From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Date: Tue, 1 Jul 2014 21:38:22 -0400
> Subject: [PATCH v2] rmap: fix pgoff calculation to handle hugepage correctly
> 
> I triggered VM_BUG_ON() in vma_address() when I tried to migrate an anonymous
> hugepage with mbind() on kernel v3.16-rc3. This happens because the pgoff
> calculation in rmap_walk_anon() fails to account for compound_order() and so
> produces an incorrect value.
> 
> This patch introduces page_to_pgoff(), which returns the page's offset in
> PAGE_CACHE_SIZE units. Kirill pointed out that the page cache tree should
> natively handle hugepages, and that to make hugetlbfs fit it, page->index of
> a hugetlbfs page should be in PAGE_CACHE_SIZE units. That is beyond this
> patch, but page_to_pgoff() concentrates the point to be fixed in a single
> function.
> 
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] rmap: fix pgoff calculation to handle hugepage correctly
  2014-07-02  4:30         ` Naoya Horiguchi
@ 2014-07-07 19:39           ` Andrew Morton
  -1 siblings, 0 replies; 26+ messages in thread
From: Andrew Morton @ 2014-07-07 19:39 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Kirill A. Shutemov, Joonsoo Kim, Hugh Dickins, Rik van Riel,
	Hillf Danton, linux-kernel, linux-mm, Naoya Horiguchi

On Wed, 2 Jul 2014 00:30:57 -0400 Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> wrote:

> Subject: [PATCH v2] rmap: fix pgoff calculation to handle hugepage correctly
> 
> I triggered VM_BUG_ON() in vma_address() when I tried to migrate an anonymous
> hugepage with mbind() in kernel v3.16-rc3. This is because the pgoff
> calculation in rmap_walk_anon() fails to consider compound_order() and so
> produces an incorrect value.
> 
> This patch introduces page_to_pgoff(), which gets the page's offset in
> PAGE_CACHE_SIZE units. Kirill pointed out that the page cache tree should
> natively handle hugepages, and that to make hugetlbfs fit it, page->index of
> a hugetlbfs page should be in PAGE_CACHE_SIZE units. That is beyond the
> scope of this patch, but page_to_pgoff() confines the point to be fixed to
> a single function.
> 
> ...
>
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -399,6 +399,18 @@ static inline struct page *read_mapping_page(struct address_space *mapping,
>  }
>  
>  /*
> + * Get the offset in PAGE_SIZE.
> + * (TODO: hugepage should have ->index in PAGE_SIZE)
> + */
> +static inline pgoff_t page_to_pgoff(struct page *page)
> +{
> +	if (unlikely(PageHeadHuge(page)))
> +		return page->index << compound_order(page);
> +	else
> +		return page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
> +}
> +

This is all a bit of a mess.

We have page_offset() which only works for regular pagecache pages and
not for huge pages.

We have page_file_offset() which works for regular pagecache as well
as swapcache but not for huge pages.

We have page_index() and page_file_index() which differ in undocumented
ways which I cannot be bothered working out.  The latter calls
__page_file_index() which is grossly misnamed.

Now we get a new page_to_pgoff() which is inconsistently named but has
a similarly crappy level of documentation and which works for hugepages
and regular pagecache pages but not for swapcache pages.


Sigh.

I'll merge this patch because it's a bugfix but could someone please
drive a truck through all this stuff and see if we can come up with
something tasteful and sane?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH -mm] mm: refactor page index/offset getters
  2014-07-07 19:39           ` Andrew Morton
@ 2014-07-15 16:41             ` Naoya Horiguchi
  -1 siblings, 0 replies; 26+ messages in thread
From: Naoya Horiguchi @ 2014-07-15 16:41 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Joonsoo Kim, Hugh Dickins, Rik van Riel,
	Hillf Danton, linux-kernel, linux-mm, Naoya Horiguchi

On Mon, Jul 07, 2014 at 12:39:23PM -0700, Andrew Morton wrote:
> On Wed, 2 Jul 2014 00:30:57 -0400 Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> wrote:
> 
> > Subject: [PATCH v2] rmap: fix pgoff calculation to handle hugepage correctly
> > 
> > I triggered VM_BUG_ON() in vma_address() when I tried to migrate an anonymous
> > hugepage with mbind() in kernel v3.16-rc3. This is because the pgoff
> > calculation in rmap_walk_anon() fails to consider compound_order() and so
> > produces an incorrect value.
> > 
> > This patch introduces page_to_pgoff(), which gets the page's offset in
> > PAGE_CACHE_SIZE units. Kirill pointed out that the page cache tree should
> > natively handle hugepages, and that to make hugetlbfs fit it, page->index of
> > a hugetlbfs page should be in PAGE_CACHE_SIZE units. That is beyond the
> > scope of this patch, but page_to_pgoff() confines the point to be fixed to
> > a single function.
> > 
> > ...
> >
> > --- a/include/linux/pagemap.h
> > +++ b/include/linux/pagemap.h
> > @@ -399,6 +399,18 @@ static inline struct page *read_mapping_page(struct address_space *mapping,
> >  }
> >  
> >  /*
> > + * Get the offset in PAGE_SIZE.
> > + * (TODO: hugepage should have ->index in PAGE_SIZE)
> > + */
> > +static inline pgoff_t page_to_pgoff(struct page *page)
> > +{
> > +	if (unlikely(PageHeadHuge(page)))
> > +		return page->index << compound_order(page);
> > +	else
> > +		return page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
> > +}
> > +
> 
> This is all a bit of a mess.
> 
> We have page_offset() which only works for regular pagecache pages and
> not for huge pages.
> 
> We have page_file_offset() which works for regular pagecache as well
> as swapcache but not for huge pages.
> 
> We have page_index() and page_file_index() which differ in undocumented
> ways which I cannot be bothered working out.  The latter calls
> __page_file_index() which is grossly misnamed.
> 
> Now we get a new page_to_pgoff() which is inconsistently named but has
> a similarly crappy level of documentation and which works for hugepages
> and regular pagecache pages but not for swapcache pages.
> 
> 
> Sigh.
> 
> I'll merge this patch because it's a bugfix but could someone please
> drive a truck through all this stuff and see if we can come up with
> something tasteful and sane?

I wrote a patch for this; could you take a look?

Thanks,
Naoya Horiguchi
---
From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Date: Tue, 15 Jul 2014 12:05:53 -0400
Subject: [PATCH] mm: refactor page index/offset getters

There has been a complaint about duplication among the fundamental
page index/offset getter routines.

page_(index|offset) and page_file_(index|offset) provide the same
functionality, so we can merge them as page_(index|offset), respectively.

This patch gives each getter a clear meaning:
 - page_index(): get page cache index (offset in relevant page size)
 - page_pgoff(): get 4kB page offset
 - page_offset(): get byte offset
All these functions are aware of regular pages, swapcaches, and hugepages.

The definition of PageHuge is relocated to include/linux/mm.h, because
some users of page_pgoff() don't include include/linux/hugetlb.h.

__page_file_index() is poorly named, because it's only for the swap cache.
So let's rename it to __page_swap_index().

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 fs/nfs/internal.h       |  6 +++---
 fs/nfs/pagelist.c       |  2 +-
 fs/nfs/read.c           |  2 +-
 fs/nfs/write.c          | 10 +++++-----
 fs/xfs/xfs_aops.c       |  2 +-
 include/linux/hugetlb.h |  7 -------
 include/linux/mm.h      | 26 ++++++++++----------------
 include/linux/pagemap.h | 22 +++++++++-------------
 mm/memory-failure.c     |  4 ++--
 mm/page_io.c            |  6 +++---
 mm/rmap.c               |  6 +++---
 mm/swapfile.c           |  4 ++--
 12 files changed, 40 insertions(+), 57 deletions(-)

diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index dd8bfc2e2464..82d942fead90 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -576,11 +576,11 @@ unsigned int nfs_page_length(struct page *page)
 	loff_t i_size = i_size_read(page_file_mapping(page)->host);
 
 	if (i_size > 0) {
-		pgoff_t page_index = page_file_index(page);
+		pgoff_t index = page_index(page);
 		pgoff_t end_index = (i_size - 1) >> PAGE_CACHE_SHIFT;
-		if (page_index < end_index)
+		if (index < end_index)
 			return PAGE_CACHE_SIZE;
-		if (page_index == end_index)
+		if (index == end_index)
 			return ((i_size - 1) & ~PAGE_CACHE_MASK) + 1;
 	}
 	return 0;
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 03ed984ab4d8..eace07a9104d 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -173,7 +173,7 @@ nfs_create_request(struct nfs_open_context *ctx, struct inode *inode,
 	 * long write-back delay. This will be adjusted in
 	 * update_nfs_request below if the region is not locked. */
 	req->wb_page    = page;
-	req->wb_index	= page_file_index(page);
+	req->wb_index	= page_index(page);
 	page_cache_get(page);
 	req->wb_offset  = offset;
 	req->wb_pgbase	= offset;
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index 411aedda14bb..94ff1cf21d2c 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -538,7 +538,7 @@ int nfs_readpage(struct file *file, struct page *page)
 	int		error;
 
 	dprintk("NFS: nfs_readpage (%p %ld@%lu)\n",
-		page, PAGE_CACHE_SIZE, page_file_index(page));
+		page, PAGE_CACHE_SIZE, page_index(page));
 	nfs_inc_stats(inode, NFSIOS_VFSREADPAGE);
 	nfs_add_stats(inode, NFSIOS_READPAGES, 1);
 
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index ffb9459f180b..e990dd527764 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -194,9 +194,9 @@ static void nfs_grow_file(struct page *page, unsigned int offset, unsigned int c
 	spin_lock(&inode->i_lock);
 	i_size = i_size_read(inode);
 	end_index = (i_size - 1) >> PAGE_CACHE_SHIFT;
-	if (i_size > 0 && page_file_index(page) < end_index)
+	if (i_size > 0 && page_index(page) < end_index)
 		goto out;
-	end = page_file_offset(page) + ((loff_t)offset+count);
+	end = page_offset(page) + ((loff_t)offset+count);
 	if (i_size >= end)
 		goto out;
 	i_size_write(inode, end);
@@ -337,7 +337,7 @@ static int nfs_do_writepage(struct page *page, struct writeback_control *wbc, st
 	nfs_inc_stats(inode, NFSIOS_VFSWRITEPAGE);
 	nfs_add_stats(inode, NFSIOS_WRITEPAGES, 1);
 
-	nfs_pageio_cond_complete(pgio, page_file_index(page));
+	nfs_pageio_cond_complete(pgio, page_index(page));
 	ret = nfs_page_async_flush(pgio, page, wbc->sync_mode == WB_SYNC_NONE);
 	if (ret == -EAGAIN) {
 		redirty_page_for_writepage(wbc, page);
@@ -961,7 +961,7 @@ int nfs_updatepage(struct file *file, struct page *page,
 	nfs_inc_stats(inode, NFSIOS_VFSUPDATEPAGE);
 
 	dprintk("NFS:       nfs_updatepage(%pD2 %d@%lld)\n",
-		file, count, (long long)(page_file_offset(page) + offset));
+		file, count, (long long)(page_offset(page) + offset));
 
 	if (nfs_can_extend_write(file, page, inode)) {
 		count = max(count + offset, nfs_page_length(page));
@@ -1817,7 +1817,7 @@ int nfs_wb_page_cancel(struct inode *inode, struct page *page)
  */
 int nfs_wb_page(struct inode *inode, struct page *page)
 {
-	loff_t range_start = page_file_offset(page);
+	loff_t range_start = page_offset(page);
 	loff_t range_end = range_start + (loff_t)(PAGE_CACHE_SIZE - 1);
 	struct writeback_control wbc = {
 		.sync_mode = WB_SYNC_ALL,
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 0479c32c5eb1..2f94687c2849 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -695,7 +695,7 @@ xfs_convert_page(
 	unsigned int		type;
 	int			len, page_dirty;
 	int			count = 0, done = 0, uptodate = 1;
- 	xfs_off_t		offset = page_offset(page);
+	xfs_off_t		offset = page_offset(page);
 
 	if (page->index != tindex)
 		goto fail;
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 41272bcf73f8..026d8b147027 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -41,8 +41,6 @@ extern int hugetlb_max_hstate __read_mostly;
 struct hugepage_subpool *hugepage_new_subpool(long nr_blocks);
 void hugepage_put_subpool(struct hugepage_subpool *spool);
 
-int PageHuge(struct page *page);
-
 void reset_vma_resv_huge_pages(struct vm_area_struct *vma);
 int hugetlb_sysctl_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *);
 int hugetlb_overcommit_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *);
@@ -108,11 +106,6 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 
 #else /* !CONFIG_HUGETLB_PAGE */
 
-static inline int PageHuge(struct page *page)
-{
-	return 0;
-}
-
 static inline void reset_vma_resv_huge_pages(struct vm_area_struct *vma)
 {
 }
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 84b2a6cf45f6..5a86ce1b4cd0 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -450,8 +450,13 @@ static inline int page_count(struct page *page)
 }
 
 #ifdef CONFIG_HUGETLB_PAGE
+extern int PageHuge(struct page *page);
 extern int PageHeadHuge(struct page *page_head);
 #else /* CONFIG_HUGETLB_PAGE */
+static inline int PageHuge(struct page *page)
+{
+	return 0;
+}
 static inline int PageHeadHuge(struct page *page_head)
 {
 	return 0;
@@ -980,27 +985,16 @@ static inline int PageAnon(struct page *page)
 	return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0;
 }
 
-/*
- * Return the pagecache index of the passed page.  Regular pagecache pages
- * use ->index whereas swapcache pages use ->private
- */
-static inline pgoff_t page_index(struct page *page)
-{
-	if (unlikely(PageSwapCache(page)))
-		return page_private(page);
-	return page->index;
-}
-
-extern pgoff_t __page_file_index(struct page *page);
+extern pgoff_t __page_swap_index(struct page *page);
 
 /*
- * Return the file index of the page. Regular pagecache pages use ->index
- * whereas swapcache pages use swp_offset(->private)
+ * Return the pagecache index of the passed page, which is the offset
+ * in the relevant page size.
  */
-static inline pgoff_t page_file_index(struct page *page)
+static inline pgoff_t page_index(struct page *page)
 {
 	if (unlikely(PageSwapCache(page)))
-		return __page_file_index(page);
+		return __page_swap_index(page);
 
 	return page->index;
 }
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index e1474ae18c88..3b27877ac6d0 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -399,28 +399,24 @@ static inline struct page *read_mapping_page(struct address_space *mapping,
 }
 
 /*
- * Get the offset in PAGE_SIZE.
+ * Return the 4kB page offset of the given page.
  * (TODO: hugepage should have ->index in PAGE_SIZE)
  */
-static inline pgoff_t page_to_pgoff(struct page *page)
+static inline pgoff_t page_pgoff(struct page *page)
 {
-	if (unlikely(PageHeadHuge(page)))
-		return page->index << compound_order(page);
-	else
-		return page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
+	if (unlikely(PageHuge(page))) {
+		VM_BUG_ON_PAGE(PageTail(page), page);
+		return page_index(page) << compound_order(page);
+	} else
+		return page_index(page) << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
 }
 
 /*
- * Return byte-offset into filesystem object for page.
+ * Return the byte offset of the given page.
  */
 static inline loff_t page_offset(struct page *page)
 {
-	return ((loff_t)page->index) << PAGE_CACHE_SHIFT;
-}
-
-static inline loff_t page_file_offset(struct page *page)
-{
-	return ((loff_t)page_file_index(page)) << PAGE_CACHE_SHIFT;
+	return ((loff_t)page_pgoff(page)) << PAGE_SHIFT;
 }
 
 extern pgoff_t linear_hugepage_index(struct vm_area_struct *vma,
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index e3e2f007946e..75acb65bd912 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -435,7 +435,7 @@ static void collect_procs_anon(struct page *page, struct list_head *to_kill,
 	if (av == NULL)	/* Not actually mapped anymore */
 		return;
 
-	pgoff = page_to_pgoff(page);
+	pgoff = page_pgoff(page);
 	read_lock(&tasklist_lock);
 	for_each_process (tsk) {
 		struct anon_vma_chain *vmac;
@@ -469,7 +469,7 @@ static void collect_procs_file(struct page *page, struct list_head *to_kill,
 	mutex_lock(&mapping->i_mmap_mutex);
 	read_lock(&tasklist_lock);
 	for_each_process(tsk) {
-		pgoff_t pgoff = page_to_pgoff(page);
+		pgoff_t pgoff = page_pgoff(page);
 		struct task_struct *t = task_early_kill(tsk, force_early);
 
 		if (!t)
diff --git a/mm/page_io.c b/mm/page_io.c
index 58b50d2901fe..4ca964f83718 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -250,7 +250,7 @@ int swap_writepage(struct page *page, struct writeback_control *wbc)
 
 static sector_t swap_page_sector(struct page *page)
 {
-	return (sector_t)__page_file_index(page) << (PAGE_CACHE_SHIFT - 9);
+	return (sector_t)__page_swap_index(page) << (PAGE_CACHE_SHIFT - 9);
 }
 
 int __swap_writepage(struct page *page, struct writeback_control *wbc,
@@ -270,7 +270,7 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc,
 		};
 
 		init_sync_kiocb(&kiocb, swap_file);
-		kiocb.ki_pos = page_file_offset(page);
+		kiocb.ki_pos = page_offset(page);
 		kiocb.ki_nbytes = PAGE_SIZE;
 
 		set_page_writeback(page);
@@ -296,7 +296,7 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc,
 			set_page_dirty(page);
 			ClearPageReclaim(page);
 			pr_err_ratelimited("Write error on dio swapfile (%Lu)\n",
-				page_file_offset(page));
+				page_offset(page));
 		}
 		end_page_writeback(page);
 		return ret;
diff --git a/mm/rmap.c b/mm/rmap.c
index 3e8491c504f8..7928ddd91b6e 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -517,7 +517,7 @@ void page_unlock_anon_vma_read(struct anon_vma *anon_vma)
 static inline unsigned long
 __vma_address(struct page *page, struct vm_area_struct *vma)
 {
-	pgoff_t pgoff = page_to_pgoff(page);
+	pgoff_t pgoff = page_pgoff(page);
 	return vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
 }
 
@@ -1615,7 +1615,7 @@ static struct anon_vma *rmap_walk_anon_lock(struct page *page,
 static int rmap_walk_anon(struct page *page, struct rmap_walk_control *rwc)
 {
 	struct anon_vma *anon_vma;
-	pgoff_t pgoff = page_to_pgoff(page);
+	pgoff_t pgoff = page_pgoff(page);
 	struct anon_vma_chain *avc;
 	int ret = SWAP_AGAIN;
 
@@ -1656,7 +1656,7 @@ static int rmap_walk_anon(struct page *page, struct rmap_walk_control *rwc)
 static int rmap_walk_file(struct page *page, struct rmap_walk_control *rwc)
 {
 	struct address_space *mapping = page->mapping;
-	pgoff_t pgoff = page_to_pgoff(page);
+	pgoff_t pgoff = page_pgoff(page);
 	struct vm_area_struct *vma;
 	int ret = SWAP_AGAIN;
 
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 8798b2e0ac59..1c4027c702fb 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2715,13 +2715,13 @@ struct address_space *__page_file_mapping(struct page *page)
 }
 EXPORT_SYMBOL_GPL(__page_file_mapping);
 
-pgoff_t __page_file_index(struct page *page)
+pgoff_t __page_swap_index(struct page *page)
 {
 	swp_entry_t swap = { .val = page_private(page) };
 	VM_BUG_ON_PAGE(!PageSwapCache(page), page);
 	return swp_offset(swap);
 }
-EXPORT_SYMBOL_GPL(__page_file_index);
+EXPORT_SYMBOL_GPL(__page_swap_index);
 
 /*
  * add_swap_count_continuation - called when a swap count is duplicated
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 26+ messages in thread

 {
 	return 0;
@@ -980,27 +985,16 @@ static inline int PageAnon(struct page *page)
 	return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0;
 }
 
-/*
- * Return the pagecache index of the passed page.  Regular pagecache pages
- * use ->index whereas swapcache pages use ->private
- */
-static inline pgoff_t page_index(struct page *page)
-{
-	if (unlikely(PageSwapCache(page)))
-		return page_private(page);
-	return page->index;
-}
-
-extern pgoff_t __page_file_index(struct page *page);
+extern pgoff_t __page_swap_index(struct page *page);
 
 /*
- * Return the file index of the page. Regular pagecache pages use ->index
- * whereas swapcache pages use swp_offset(->private)
+ * Return the pagecache index of the passed page, which is the offset
+ * in the relevant page size.
  */
-static inline pgoff_t page_file_index(struct page *page)
+static inline pgoff_t page_index(struct page *page)
 {
 	if (unlikely(PageSwapCache(page)))
-		return __page_file_index(page);
+		return __page_swap_index(page);
 
 	return page->index;
 }
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index e1474ae18c88..3b27877ac6d0 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -399,28 +399,24 @@ static inline struct page *read_mapping_page(struct address_space *mapping,
 }
 
 /*
- * Get the offset in PAGE_SIZE.
+ * Return the 4kB page offset of the given page.
  * (TODO: hugepage should have ->index in PAGE_SIZE)
  */
-static inline pgoff_t page_to_pgoff(struct page *page)
+static inline pgoff_t page_pgoff(struct page *page)
 {
-	if (unlikely(PageHeadHuge(page)))
-		return page->index << compound_order(page);
-	else
-		return page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
+	if (unlikely(PageHuge(page))) {
+		VM_BUG_ON_PAGE(PageTail(page), page);
+		return page_index(page) << compound_order(page);
+	} else
+		return page_index(page) << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
 }
 
 /*
- * Return byte-offset into filesystem object for page.
+ * Return the byte offset of the given page.
  */
 static inline loff_t page_offset(struct page *page)
 {
-	return ((loff_t)page->index) << PAGE_CACHE_SHIFT;
-}
-
-static inline loff_t page_file_offset(struct page *page)
-{
-	return ((loff_t)page_file_index(page)) << PAGE_CACHE_SHIFT;
+	return ((loff_t)page_pgoff(page)) << PAGE_SHIFT;
 }
 
 extern pgoff_t linear_hugepage_index(struct vm_area_struct *vma,
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index e3e2f007946e..75acb65bd912 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -435,7 +435,7 @@ static void collect_procs_anon(struct page *page, struct list_head *to_kill,
 	if (av == NULL)	/* Not actually mapped anymore */
 		return;
 
-	pgoff = page_to_pgoff(page);
+	pgoff = page_pgoff(page);
 	read_lock(&tasklist_lock);
 	for_each_process (tsk) {
 		struct anon_vma_chain *vmac;
@@ -469,7 +469,7 @@ static void collect_procs_file(struct page *page, struct list_head *to_kill,
 	mutex_lock(&mapping->i_mmap_mutex);
 	read_lock(&tasklist_lock);
 	for_each_process(tsk) {
-		pgoff_t pgoff = page_to_pgoff(page);
+		pgoff_t pgoff = page_pgoff(page);
 		struct task_struct *t = task_early_kill(tsk, force_early);
 
 		if (!t)
diff --git a/mm/page_io.c b/mm/page_io.c
index 58b50d2901fe..4ca964f83718 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -250,7 +250,7 @@ int swap_writepage(struct page *page, struct writeback_control *wbc)
 
 static sector_t swap_page_sector(struct page *page)
 {
-	return (sector_t)__page_file_index(page) << (PAGE_CACHE_SHIFT - 9);
+	return (sector_t)__page_swap_index(page) << (PAGE_CACHE_SHIFT - 9);
 }
 
 int __swap_writepage(struct page *page, struct writeback_control *wbc,
@@ -270,7 +270,7 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc,
 		};
 
 		init_sync_kiocb(&kiocb, swap_file);
-		kiocb.ki_pos = page_file_offset(page);
+		kiocb.ki_pos = page_offset(page);
 		kiocb.ki_nbytes = PAGE_SIZE;
 
 		set_page_writeback(page);
@@ -296,7 +296,7 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc,
 			set_page_dirty(page);
 			ClearPageReclaim(page);
 			pr_err_ratelimited("Write error on dio swapfile (%Lu)\n",
-				page_file_offset(page));
+				page_offset(page));
 		}
 		end_page_writeback(page);
 		return ret;
diff --git a/mm/rmap.c b/mm/rmap.c
index 3e8491c504f8..7928ddd91b6e 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -517,7 +517,7 @@ void page_unlock_anon_vma_read(struct anon_vma *anon_vma)
 static inline unsigned long
 __vma_address(struct page *page, struct vm_area_struct *vma)
 {
-	pgoff_t pgoff = page_to_pgoff(page);
+	pgoff_t pgoff = page_pgoff(page);
 	return vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
 }
 
@@ -1615,7 +1615,7 @@ static struct anon_vma *rmap_walk_anon_lock(struct page *page,
 static int rmap_walk_anon(struct page *page, struct rmap_walk_control *rwc)
 {
 	struct anon_vma *anon_vma;
-	pgoff_t pgoff = page_to_pgoff(page);
+	pgoff_t pgoff = page_pgoff(page);
 	struct anon_vma_chain *avc;
 	int ret = SWAP_AGAIN;
 
@@ -1656,7 +1656,7 @@ static int rmap_walk_anon(struct page *page, struct rmap_walk_control *rwc)
 static int rmap_walk_file(struct page *page, struct rmap_walk_control *rwc)
 {
 	struct address_space *mapping = page->mapping;
-	pgoff_t pgoff = page_to_pgoff(page);
+	pgoff_t pgoff = page_pgoff(page);
 	struct vm_area_struct *vma;
 	int ret = SWAP_AGAIN;
 
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 8798b2e0ac59..1c4027c702fb 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2715,13 +2715,13 @@ struct address_space *__page_file_mapping(struct page *page)
 }
 EXPORT_SYMBOL_GPL(__page_file_mapping);
 
-pgoff_t __page_file_index(struct page *page)
+pgoff_t __page_swap_index(struct page *page)
 {
 	swp_entry_t swap = { .val = page_private(page) };
 	VM_BUG_ON_PAGE(!PageSwapCache(page), page);
 	return swp_offset(swap);
 }
-EXPORT_SYMBOL_GPL(__page_file_index);
+EXPORT_SYMBOL_GPL(__page_swap_index);
 
 /*
  * add_swap_count_continuation - called when a swap count is duplicated
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH -mm] mm: refactor page index/offset getters
  2014-07-15 16:41             ` Naoya Horiguchi
@ 2014-07-23 21:39               ` Andrew Morton
  -1 siblings, 0 replies; 26+ messages in thread
From: Andrew Morton @ 2014-07-23 21:39 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Kirill A. Shutemov, Joonsoo Kim, Hugh Dickins, Rik van Riel,
	Hillf Danton, linux-kernel, linux-mm, Naoya Horiguchi

On Tue, 15 Jul 2014 12:41:12 -0400 Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> wrote:

> There have been complaints about duplication among the fundamental
> page index/offset getter routines.
> 
> page_(index|offset) and page_file_(index|offset) provide the same
> functionality, so we can merge them as page_(index|offset), respectively.
> 
> And this patch gives each getter a clear meaning:
>  - page_index(): get page cache index (offset in relevant page size)
>  - page_pgoff(): get 4kB page offset
>  - page_offset(): get byte offset
> All these functions are aware of regular pages, swapcaches, and hugepages.
> 
> The definition of PageHuge is relocated to include/linux/mm.h, because
> some users of page_pgoff() don't include include/linux/hugetlb.h.
> 
> __page_file_index() is not well named, because it's only for swap cache.
> So let's rename it to __page_swap_index().

Thanks, I guess that's better.  Could others please have a look-n-think?

I did this:

--- a/include/linux/pagemap.h~mm-refactor-page-index-offset-getters-fix
+++ a/include/linux/pagemap.h
@@ -412,7 +412,7 @@ static inline pgoff_t page_pgoff(struct
 }
 
 /*
- * Return the byte offset of the given page.
+ * Return the file offset of the given pagecache page, in bytes.
  */
 static inline loff_t page_offset(struct page *page)
 {



You had a random xfs_aops.c whitespace fix which I'll pretend I didn't
notice ;)




From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Subject: mm: refactor page index/offset getters

There have been complaints about duplication among the fundamental
page index/offset getter routines.

page_(index|offset) and page_file_(index|offset) provide the same
functionality, so we can merge them as page_(index|offset), respectively.

And this patch gives each getter a clear meaning:
 - page_index(): get page cache index (offset in relevant page size)
 - page_pgoff(): get 4kB page offset
 - page_offset(): get byte offset
All these functions are aware of regular pages, swapcaches, and hugepages.

The definition of PageHuge is relocated to include/linux/mm.h, because
some users of page_pgoff() don't include include/linux/hugetlb.h.

__page_file_index() is not well named, because it's only for swap cache.
So let's rename it to __page_swap_index().

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/nfs/internal.h       |    6 +++---
 fs/nfs/pagelist.c       |    2 +-
 fs/nfs/read.c           |    2 +-
 fs/nfs/write.c          |   10 +++++-----
 fs/xfs/xfs_aops.c       |    2 +-
 include/linux/hugetlb.h |    7 -------
 include/linux/mm.h      |   26 ++++++++++----------------
 include/linux/pagemap.h |   22 +++++++++-------------
 mm/memory-failure.c     |    4 ++--
 mm/page_io.c            |    6 +++---
 mm/rmap.c               |    6 +++---
 mm/swapfile.c           |    4 ++--
 12 files changed, 40 insertions(+), 57 deletions(-)

diff -puN fs/nfs/internal.h~mm-refactor-page-index-offset-getters fs/nfs/internal.h
--- a/fs/nfs/internal.h~mm-refactor-page-index-offset-getters
+++ a/fs/nfs/internal.h
@@ -566,11 +566,11 @@ unsigned int nfs_page_length(struct page
 	loff_t i_size = i_size_read(page_file_mapping(page)->host);
 
 	if (i_size > 0) {
-		pgoff_t page_index = page_file_index(page);
+		pgoff_t index = page_index(page);
 		pgoff_t end_index = (i_size - 1) >> PAGE_CACHE_SHIFT;
-		if (page_index < end_index)
+		if (index < end_index)
 			return PAGE_CACHE_SIZE;
-		if (page_index == end_index)
+		if (index == end_index)
 			return ((i_size - 1) & ~PAGE_CACHE_MASK) + 1;
 	}
 	return 0;
diff -puN fs/nfs/pagelist.c~mm-refactor-page-index-offset-getters fs/nfs/pagelist.c
--- a/fs/nfs/pagelist.c~mm-refactor-page-index-offset-getters
+++ a/fs/nfs/pagelist.c
@@ -333,7 +333,7 @@ nfs_create_request(struct nfs_open_conte
 	 * long write-back delay. This will be adjusted in
 	 * update_nfs_request below if the region is not locked. */
 	req->wb_page    = page;
-	req->wb_index	= page_file_index(page);
+	req->wb_index	= page_index(page);
 	page_cache_get(page);
 	req->wb_offset  = offset;
 	req->wb_pgbase	= offset;
diff -puN fs/nfs/read.c~mm-refactor-page-index-offset-getters fs/nfs/read.c
--- a/fs/nfs/read.c~mm-refactor-page-index-offset-getters
+++ a/fs/nfs/read.c
@@ -271,7 +271,7 @@ int nfs_readpage(struct file *file, stru
 	int		error;
 
 	dprintk("NFS: nfs_readpage (%p %ld@%lu)\n",
-		page, PAGE_CACHE_SIZE, page_file_index(page));
+		page, PAGE_CACHE_SIZE, page_index(page));
 	nfs_inc_stats(inode, NFSIOS_VFSREADPAGE);
 	nfs_add_stats(inode, NFSIOS_READPAGES, 1);
 
diff -puN fs/nfs/write.c~mm-refactor-page-index-offset-getters fs/nfs/write.c
--- a/fs/nfs/write.c~mm-refactor-page-index-offset-getters
+++ a/fs/nfs/write.c
@@ -153,9 +153,9 @@ static void nfs_grow_file(struct page *p
 	spin_lock(&inode->i_lock);
 	i_size = i_size_read(inode);
 	end_index = (i_size - 1) >> PAGE_CACHE_SHIFT;
-	if (i_size > 0 && page_file_index(page) < end_index)
+	if (i_size > 0 && page_index(page) < end_index)
 		goto out;
-	end = page_file_offset(page) + ((loff_t)offset+count);
+	end = page_offset(page) + ((loff_t)offset+count);
 	if (i_size >= end)
 		goto out;
 	i_size_write(inode, end);
@@ -569,7 +569,7 @@ static int nfs_do_writepage(struct page
 	nfs_inc_stats(inode, NFSIOS_VFSWRITEPAGE);
 	nfs_add_stats(inode, NFSIOS_WRITEPAGES, 1);
 
-	nfs_pageio_cond_complete(pgio, page_file_index(page));
+	nfs_pageio_cond_complete(pgio, page_index(page));
 	ret = nfs_page_async_flush(pgio, page, wbc->sync_mode == WB_SYNC_NONE);
 	if (ret == -EAGAIN) {
 		redirty_page_for_writepage(wbc, page);
@@ -1212,7 +1212,7 @@ int nfs_updatepage(struct file *file, st
 	nfs_inc_stats(inode, NFSIOS_VFSUPDATEPAGE);
 
 	dprintk("NFS:       nfs_updatepage(%pD2 %d@%lld)\n",
-		file, count, (long long)(page_file_offset(page) + offset));
+		file, count, (long long)(page_offset(page) + offset));
 
 	if (nfs_can_extend_write(file, page, inode)) {
 		count = max(count + offset, nfs_page_length(page));
@@ -1827,7 +1827,7 @@ int nfs_wb_page_cancel(struct inode *ino
  */
 int nfs_wb_page(struct inode *inode, struct page *page)
 {
-	loff_t range_start = page_file_offset(page);
+	loff_t range_start = page_offset(page);
 	loff_t range_end = range_start + (loff_t)(PAGE_CACHE_SIZE - 1);
 	struct writeback_control wbc = {
 		.sync_mode = WB_SYNC_ALL,
diff -puN fs/xfs/xfs_aops.c~mm-refactor-page-index-offset-getters fs/xfs/xfs_aops.c
--- a/fs/xfs/xfs_aops.c~mm-refactor-page-index-offset-getters
+++ a/fs/xfs/xfs_aops.c
@@ -695,7 +695,7 @@ xfs_convert_page(
 	unsigned int		type;
 	int			len, page_dirty;
 	int			count = 0, done = 0, uptodate = 1;
- 	xfs_off_t		offset = page_offset(page);
+	xfs_off_t		offset = page_offset(page);
 
 	if (page->index != tindex)
 		goto fail;
diff -puN include/linux/hugetlb.h~mm-refactor-page-index-offset-getters include/linux/hugetlb.h
--- a/include/linux/hugetlb.h~mm-refactor-page-index-offset-getters
+++ a/include/linux/hugetlb.h
@@ -41,8 +41,6 @@ extern int hugetlb_max_hstate __read_mos
 struct hugepage_subpool *hugepage_new_subpool(long nr_blocks);
 void hugepage_put_subpool(struct hugepage_subpool *spool);
 
-int PageHuge(struct page *page);
-
 void reset_vma_resv_huge_pages(struct vm_area_struct *vma);
 int hugetlb_sysctl_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *);
 int hugetlb_overcommit_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *);
@@ -108,11 +106,6 @@ unsigned long hugetlb_change_protection(
 
 #else /* !CONFIG_HUGETLB_PAGE */
 
-static inline int PageHuge(struct page *page)
-{
-	return 0;
-}
-
 static inline void reset_vma_resv_huge_pages(struct vm_area_struct *vma)
 {
 }
diff -puN include/linux/mm.h~mm-refactor-page-index-offset-getters include/linux/mm.h
--- a/include/linux/mm.h~mm-refactor-page-index-offset-getters
+++ a/include/linux/mm.h
@@ -456,8 +456,13 @@ static inline int page_count(struct page
 }
 
 #ifdef CONFIG_HUGETLB_PAGE
+extern int PageHuge(struct page *page);
 extern int PageHeadHuge(struct page *page_head);
 #else /* CONFIG_HUGETLB_PAGE */
+static inline int PageHuge(struct page *page)
+{
+	return 0;
+}
 static inline int PageHeadHuge(struct page *page_head)
 {
 	return 0;
@@ -986,27 +991,16 @@ static inline int PageAnon(struct page *
 	return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0;
 }
 
-/*
- * Return the pagecache index of the passed page.  Regular pagecache pages
- * use ->index whereas swapcache pages use ->private
- */
-static inline pgoff_t page_index(struct page *page)
-{
-	if (unlikely(PageSwapCache(page)))
-		return page_private(page);
-	return page->index;
-}
-
-extern pgoff_t __page_file_index(struct page *page);
+extern pgoff_t __page_swap_index(struct page *page);
 
 /*
- * Return the file index of the page. Regular pagecache pages use ->index
- * whereas swapcache pages use swp_offset(->private)
+ * Return the pagecache index of the passed page, which is the offset
+ * in the relevant page size.
  */
-static inline pgoff_t page_file_index(struct page *page)
+static inline pgoff_t page_index(struct page *page)
 {
 	if (unlikely(PageSwapCache(page)))
-		return __page_file_index(page);
+		return __page_swap_index(page);
 
 	return page->index;
 }
diff -puN include/linux/pagemap.h~mm-refactor-page-index-offset-getters include/linux/pagemap.h
--- a/include/linux/pagemap.h~mm-refactor-page-index-offset-getters
+++ a/include/linux/pagemap.h
@@ -399,28 +399,24 @@ static inline struct page *read_mapping_
 }
 
 /*
- * Get the offset in PAGE_SIZE.
+ * Return the 4kB page offset of the given page.
  * (TODO: hugepage should have ->index in PAGE_SIZE)
  */
-static inline pgoff_t page_to_pgoff(struct page *page)
+static inline pgoff_t page_pgoff(struct page *page)
 {
-	if (unlikely(PageHeadHuge(page)))
-		return page->index << compound_order(page);
-	else
-		return page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
+	if (unlikely(PageHuge(page))) {
+		VM_BUG_ON_PAGE(PageTail(page), page);
+		return page_index(page) << compound_order(page);
+	} else
+		return page_index(page) << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
 }
 
 /*
- * Return byte-offset into filesystem object for page.
+ * Return the byte offset of the given page.
  */
 static inline loff_t page_offset(struct page *page)
 {
-	return ((loff_t)page->index) << PAGE_CACHE_SHIFT;
-}
-
-static inline loff_t page_file_offset(struct page *page)
-{
-	return ((loff_t)page_file_index(page)) << PAGE_CACHE_SHIFT;
+	return ((loff_t)page_pgoff(page)) << PAGE_SHIFT;
 }
 
 extern pgoff_t linear_hugepage_index(struct vm_area_struct *vma,
diff -puN mm/memory-failure.c~mm-refactor-page-index-offset-getters mm/memory-failure.c
--- a/mm/memory-failure.c~mm-refactor-page-index-offset-getters
+++ a/mm/memory-failure.c
@@ -435,7 +435,7 @@ static void collect_procs_anon(struct pa
 	if (av == NULL)	/* Not actually mapped anymore */
 		return;
 
-	pgoff = page_to_pgoff(page);
+	pgoff = page_pgoff(page);
 	read_lock(&tasklist_lock);
 	for_each_process (tsk) {
 		struct anon_vma_chain *vmac;
@@ -469,7 +469,7 @@ static void collect_procs_file(struct pa
 	mutex_lock(&mapping->i_mmap_mutex);
 	read_lock(&tasklist_lock);
 	for_each_process(tsk) {
-		pgoff_t pgoff = page_to_pgoff(page);
+		pgoff_t pgoff = page_pgoff(page);
 		struct task_struct *t = task_early_kill(tsk, force_early);
 
 		if (!t)
diff -puN mm/page_io.c~mm-refactor-page-index-offset-getters mm/page_io.c
--- a/mm/page_io.c~mm-refactor-page-index-offset-getters
+++ a/mm/page_io.c
@@ -250,7 +250,7 @@ out:
 
 static sector_t swap_page_sector(struct page *page)
 {
-	return (sector_t)__page_file_index(page) << (PAGE_CACHE_SHIFT - 9);
+	return (sector_t)__page_swap_index(page) << (PAGE_CACHE_SHIFT - 9);
 }
 
 int __swap_writepage(struct page *page, struct writeback_control *wbc,
@@ -278,7 +278,7 @@ int __swap_writepage(struct page *page,
 		from.bvec = &bv;	/* older gcc versions are broken */
 
 		init_sync_kiocb(&kiocb, swap_file);
-		kiocb.ki_pos = page_file_offset(page);
+		kiocb.ki_pos = page_offset(page);
 		kiocb.ki_nbytes = PAGE_SIZE;
 
 		set_page_writeback(page);
@@ -303,7 +303,7 @@ int __swap_writepage(struct page *page,
 			set_page_dirty(page);
 			ClearPageReclaim(page);
 			pr_err_ratelimited("Write error on dio swapfile (%Lu)\n",
-				page_file_offset(page));
+				page_offset(page));
 		}
 		end_page_writeback(page);
 		return ret;
diff -puN mm/rmap.c~mm-refactor-page-index-offset-getters mm/rmap.c
--- a/mm/rmap.c~mm-refactor-page-index-offset-getters
+++ a/mm/rmap.c
@@ -517,7 +517,7 @@ void page_unlock_anon_vma_read(struct an
 static inline unsigned long
 __vma_address(struct page *page, struct vm_area_struct *vma)
 {
-	pgoff_t pgoff = page_to_pgoff(page);
+	pgoff_t pgoff = page_pgoff(page);
 	return vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
 }
 
@@ -1615,7 +1615,7 @@ static struct anon_vma *rmap_walk_anon_l
 static int rmap_walk_anon(struct page *page, struct rmap_walk_control *rwc)
 {
 	struct anon_vma *anon_vma;
-	pgoff_t pgoff = page_to_pgoff(page);
+	pgoff_t pgoff = page_pgoff(page);
 	struct anon_vma_chain *avc;
 	int ret = SWAP_AGAIN;
 
@@ -1656,7 +1656,7 @@ static int rmap_walk_anon(struct page *p
 static int rmap_walk_file(struct page *page, struct rmap_walk_control *rwc)
 {
 	struct address_space *mapping = page->mapping;
-	pgoff_t pgoff = page_to_pgoff(page);
+	pgoff_t pgoff = page_pgoff(page);
 	struct vm_area_struct *vma;
 	int ret = SWAP_AGAIN;
 
diff -puN mm/swapfile.c~mm-refactor-page-index-offset-getters mm/swapfile.c
--- a/mm/swapfile.c~mm-refactor-page-index-offset-getters
+++ a/mm/swapfile.c
@@ -2715,13 +2715,13 @@ struct address_space *__page_file_mappin
 }
 EXPORT_SYMBOL_GPL(__page_file_mapping);
 
-pgoff_t __page_file_index(struct page *page)
+pgoff_t __page_swap_index(struct page *page)
 {
 	swp_entry_t swap = { .val = page_private(page) };
 	VM_BUG_ON_PAGE(!PageSwapCache(page), page);
 	return swp_offset(swap);
 }
-EXPORT_SYMBOL_GPL(__page_file_index);
+EXPORT_SYMBOL_GPL(__page_swap_index);
 
 /*
  * add_swap_count_continuation - called when a swap count is duplicated
_


^ permalink raw reply	[flat|nested] 26+ messages in thread

+	return ((loff_t)page_pgoff(page)) << PAGE_SHIFT;
 }
 
 extern pgoff_t linear_hugepage_index(struct vm_area_struct *vma,
diff -puN mm/memory-failure.c~mm-refactor-page-index-offset-getters mm/memory-failure.c
--- a/mm/memory-failure.c~mm-refactor-page-index-offset-getters
+++ a/mm/memory-failure.c
@@ -435,7 +435,7 @@ static void collect_procs_anon(struct pa
 	if (av == NULL)	/* Not actually mapped anymore */
 		return;
 
-	pgoff = page_to_pgoff(page);
+	pgoff = page_pgoff(page);
 	read_lock(&tasklist_lock);
 	for_each_process (tsk) {
 		struct anon_vma_chain *vmac;
@@ -469,7 +469,7 @@ static void collect_procs_file(struct pa
 	mutex_lock(&mapping->i_mmap_mutex);
 	read_lock(&tasklist_lock);
 	for_each_process(tsk) {
-		pgoff_t pgoff = page_to_pgoff(page);
+		pgoff_t pgoff = page_pgoff(page);
 		struct task_struct *t = task_early_kill(tsk, force_early);
 
 		if (!t)
diff -puN mm/page_io.c~mm-refactor-page-index-offset-getters mm/page_io.c
--- a/mm/page_io.c~mm-refactor-page-index-offset-getters
+++ a/mm/page_io.c
@@ -250,7 +250,7 @@ out:
 
 static sector_t swap_page_sector(struct page *page)
 {
-	return (sector_t)__page_file_index(page) << (PAGE_CACHE_SHIFT - 9);
+	return (sector_t)__page_swap_index(page) << (PAGE_CACHE_SHIFT - 9);
 }
 
 int __swap_writepage(struct page *page, struct writeback_control *wbc,
@@ -278,7 +278,7 @@ int __swap_writepage(struct page *page,
 		from.bvec = &bv;	/* older gcc versions are broken */
 
 		init_sync_kiocb(&kiocb, swap_file);
-		kiocb.ki_pos = page_file_offset(page);
+		kiocb.ki_pos = page_offset(page);
 		kiocb.ki_nbytes = PAGE_SIZE;
 
 		set_page_writeback(page);
@@ -303,7 +303,7 @@ int __swap_writepage(struct page *page,
 			set_page_dirty(page);
 			ClearPageReclaim(page);
 			pr_err_ratelimited("Write error on dio swapfile (%Lu)\n",
-				page_file_offset(page));
+				page_offset(page));
 		}
 		end_page_writeback(page);
 		return ret;
diff -puN mm/rmap.c~mm-refactor-page-index-offset-getters mm/rmap.c
--- a/mm/rmap.c~mm-refactor-page-index-offset-getters
+++ a/mm/rmap.c
@@ -517,7 +517,7 @@ void page_unlock_anon_vma_read(struct an
 static inline unsigned long
 __vma_address(struct page *page, struct vm_area_struct *vma)
 {
-	pgoff_t pgoff = page_to_pgoff(page);
+	pgoff_t pgoff = page_pgoff(page);
 	return vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
 }
 
@@ -1615,7 +1615,7 @@ static struct anon_vma *rmap_walk_anon_l
 static int rmap_walk_anon(struct page *page, struct rmap_walk_control *rwc)
 {
 	struct anon_vma *anon_vma;
-	pgoff_t pgoff = page_to_pgoff(page);
+	pgoff_t pgoff = page_pgoff(page);
 	struct anon_vma_chain *avc;
 	int ret = SWAP_AGAIN;
 
@@ -1656,7 +1656,7 @@ static int rmap_walk_anon(struct page *p
 static int rmap_walk_file(struct page *page, struct rmap_walk_control *rwc)
 {
 	struct address_space *mapping = page->mapping;
-	pgoff_t pgoff = page_to_pgoff(page);
+	pgoff_t pgoff = page_pgoff(page);
 	struct vm_area_struct *vma;
 	int ret = SWAP_AGAIN;
 
diff -puN mm/swapfile.c~mm-refactor-page-index-offset-getters mm/swapfile.c
--- a/mm/swapfile.c~mm-refactor-page-index-offset-getters
+++ a/mm/swapfile.c
@@ -2715,13 +2715,13 @@ struct address_space *__page_file_mappin
 }
 EXPORT_SYMBOL_GPL(__page_file_mapping);
 
-pgoff_t __page_file_index(struct page *page)
+pgoff_t __page_swap_index(struct page *page)
 {
 	swp_entry_t swap = { .val = page_private(page) };
 	VM_BUG_ON_PAGE(!PageSwapCache(page), page);
 	return swp_offset(swap);
 }
-EXPORT_SYMBOL_GPL(__page_file_index);
+EXPORT_SYMBOL_GPL(__page_swap_index);
 
 /*
  * add_swap_count_continuation - called when a swap count is duplicated
_

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH -mm] mm: refactor page index/offset getters
  2014-07-23 21:39               ` Andrew Morton
@ 2014-07-23 21:45                 ` Naoya Horiguchi
  -1 siblings, 0 replies; 26+ messages in thread
From: Naoya Horiguchi @ 2014-07-23 21:45 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Joonsoo Kim, Hugh Dickins, Rik van Riel,
	Hillf Danton, linux-kernel, linux-mm, Naoya Horiguchi

On Wed, Jul 23, 2014 at 02:39:18PM -0700, Andrew Morton wrote:
> On Tue, 15 Jul 2014 12:41:12 -0400 Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> wrote:
> 
> > There is a complaint about duplication around the fundamental routines
> > of page index/offset getters.
> > 
> > page_(index|offset) and page_file_(index|offset) provide the same
> > functionality, so we can merge them as page_(index|offset), respectively.
> > 
> > And this patch gives the clear meaning to the getters:
> >  - page_index(): get page cache index (offset in relevant page size)
> >  - page_pgoff(): get 4kB page offset
> >  - page_offset(): get byte offset
> > All these functions are aware of regular pages, swapcaches, and hugepages.
> > 
> > The definition of PageHuge is relocated to include/linux/mm.h, because
> > some users of page_pgoff() don't include include/linux/hugetlb.h.
> > 
> > __page_file_index() is not well named, because it's only for swap cache.
> > So let's rename it with __page_swap_index().
> 
> Thanks, I guess that's better.  Could others please have a look-n-think?
> 
> I did this:
> 
> --- a/include/linux/pagemap.h~mm-refactor-page-index-offset-getters-fix
> +++ a/include/linux/pagemap.h
> @@ -412,7 +412,7 @@ static inline pgoff_t page_pgoff(struct
>  }
>  
>  /*
> - * Return the byte offset of the given page.
> + * Return the file offset of the given pagecache page, in bytes.

Thanks, it's clearer.

>  static inline loff_t page_offset(struct page *page)
>  {
> 
> 
> 
> You had a random xfs_aops.c whitespace fix which I'll pretend I didn't
> notice ;)

I just couldn't resist fixing it ;)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH -mm] mm: refactor page index/offset getters
  2014-07-15 16:41             ` Naoya Horiguchi
@ 2014-07-28 20:29               ` Johannes Weiner
  -1 siblings, 0 replies; 26+ messages in thread
From: Johannes Weiner @ 2014-07-28 20:29 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Andrew Morton, Kirill A. Shutemov, Joonsoo Kim, Hugh Dickins,
	Rik van Riel, Hillf Danton, linux-kernel, linux-mm,
	Naoya Horiguchi

On Tue, Jul 15, 2014 at 12:41:12PM -0400, Naoya Horiguchi wrote:
> @@ -399,28 +399,24 @@ static inline struct page *read_mapping_page(struct address_space *mapping,
>  }
>  
>  /*
> - * Get the offset in PAGE_SIZE.
> + * Return the 4kB page offset of the given page.
>   * (TODO: hugepage should have ->index in PAGE_SIZE)
>   */
> -static inline pgoff_t page_to_pgoff(struct page *page)
> +static inline pgoff_t page_pgoff(struct page *page)
>  {
> -	if (unlikely(PageHeadHuge(page)))
> -		return page->index << compound_order(page);
> -	else
> -		return page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
> +	if (unlikely(PageHuge(page))) {
> +		VM_BUG_ON_PAGE(PageTail(page), page);
> +		return page_index(page) << compound_order(page);
> +	} else
> +		return page_index(page) << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
>  }

I just bisected the VM refusing to swap and triggering OOM kills to
this patch, which is likely the same bug you reported a couple days
back when you had this patch in your private tree.

Changing page->index to page_index() makes this function return the
swap offset rather than the virtual PFN, but rmap uses this to index
into virtual address space.  Thus, swapcache pages can no longer be
found from try_to_unmap() and reclaim fails.

We can't simply change it back to page->index, however, because the
swapout path, which requires the swap offset, also uses this function
through page_offset().  Virtual address space functions and page cache
address space functions can't use the same helpers, and the helpers
should likely be named distinctly so that they are not confused and
it's clear what is being asked.  Plus, the patch forced every fs using
page_offset() to suddenly check PageHuge(), which is a function call.

How about

o page_offset() for use by filesystems, based on page->index

o page_virt_pgoff() for use on virtual memory math, based on
  page->index and respecting PageHuge()

o page_mapping_pgoff() for use by swapping and when working on
  mappings that could be swapper_space.

o page_mapping_offset() likewise, just in bytes

Hm?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH -mm] mm: refactor page index/offset getters
  2014-07-28 20:29               ` Johannes Weiner
@ 2014-07-29  0:42                 ` Naoya Horiguchi
  -1 siblings, 0 replies; 26+ messages in thread
From: Naoya Horiguchi @ 2014-07-29  0:42 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Kirill A. Shutemov, Joonsoo Kim, Hugh Dickins,
	Rik van Riel, Hillf Danton, linux-kernel, linux-mm,
	Naoya Horiguchi

On Mon, Jul 28, 2014 at 04:29:52PM -0400, Johannes Weiner wrote:
> On Tue, Jul 15, 2014 at 12:41:12PM -0400, Naoya Horiguchi wrote:
> > @@ -399,28 +399,24 @@ static inline struct page *read_mapping_page(struct address_space *mapping,
> >  }
> >  
> >  /*
> > - * Get the offset in PAGE_SIZE.
> > + * Return the 4kB page offset of the given page.
> >   * (TODO: hugepage should have ->index in PAGE_SIZE)
> >   */
> > -static inline pgoff_t page_to_pgoff(struct page *page)
> > +static inline pgoff_t page_pgoff(struct page *page)
> >  {
> > -	if (unlikely(PageHeadHuge(page)))
> > -		return page->index << compound_order(page);
> > -	else
> > -		return page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
> > +	if (unlikely(PageHuge(page))) {
> > +		VM_BUG_ON_PAGE(PageTail(page), page);
> > +		return page_index(page) << compound_order(page);
> > +	} else
> > +		return page_index(page) << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
> >  }
> 
> I just bisected the VM refusing to swap and triggering OOM kills to
> this patch, which is likely the same bug you reported a couple days
> back when you had this patch in your private tree.

Right, thanks.
And sorry for taking up your time because of my poor testing.

> Changing page->index to page_index() makes this function return the
> swap offset rather than the virtual PFN, but rmap uses this to index
> into virtual address space.  Thus, swapcache pages can no longer be
> found from try_to_unmap() and reclaim fails.

I missed the fact that swap code needs both swap offset (stored in
page->private) and virtual PFN (in page->index), so unifying offset
getters to a single helper is completely wrong.

> We can't simply change it back to page->index, however, because the
> swapout path, which requires the swap offset, also uses this function
> through page_offset().  Virtual address space functions and page cache
> address space functions can't use the same helpers, and the helpers
> should likely be named distinctly so that they are not confused and
> it's clear what is being asked.

OK.

>  Plus, the patch forced every fs using
> page_offset() to suddenly check PageHuge(), which is a function call.

OK, PageHuge() check should be done only in vm code.

> How about
> 
> o page_offset() for use by filesystems, based on page->index

And a lot of driver code and network code uses this, so I shouldn't have
touched it :(

> o page_virt_pgoff() for use on virtual memory math, based on
>   page->index and respecting PageHuge()

This is what the current code does with page_to_pgoff().

> 
> o page_mapping_pgoff() for use by swapping and when working on
>   mappings that could be swapper_space.

page_file_offset() does this.

So it seems to me OK to just rename them with enough comments.

Thanks,
Naoya Horiguchi

> o page_mapping_offset() likewise, just in bytes

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2014-07-29  1:04 UTC | newest]

Thread overview: 26+ messages
-- links below jump to the message on this page --
2014-07-01 14:46 [PATCH] rmap: fix pgoff calculation to handle hugepage correctly Naoya Horiguchi
2014-07-01 14:46 ` Naoya Horiguchi
2014-07-01 17:42 ` Rik van Riel
2014-07-01 17:42   ` Rik van Riel
2014-07-01 18:07 ` Kirill A. Shutemov
2014-07-01 18:07   ` Kirill A. Shutemov
2014-07-01 18:50   ` Naoya Horiguchi
2014-07-01 18:50     ` Naoya Horiguchi
2014-07-01 20:15     ` Kirill A. Shutemov
2014-07-01 20:15       ` Kirill A. Shutemov
2014-07-02  4:30       ` Naoya Horiguchi
2014-07-02  4:30         ` Naoya Horiguchi
2014-07-03 11:41         ` Kirill A. Shutemov
2014-07-03 11:41           ` Kirill A. Shutemov
2014-07-07 19:39         ` Andrew Morton
2014-07-07 19:39           ` Andrew Morton
2014-07-15 16:41           ` [PATCH -mm] mm: refactor page index/offset getters Naoya Horiguchi
2014-07-15 16:41             ` Naoya Horiguchi
2014-07-23 21:39             ` Andrew Morton
2014-07-23 21:39               ` Andrew Morton
2014-07-23 21:45               ` Naoya Horiguchi
2014-07-23 21:45                 ` Naoya Horiguchi
2014-07-28 20:29             ` Johannes Weiner
2014-07-28 20:29               ` Johannes Weiner
2014-07-29  0:42               ` Naoya Horiguchi
2014-07-29  0:42                 ` Naoya Horiguchi
