linux-kernel.vger.kernel.org archive mirror
* [PATCH] fix page_count in ->iomap_migrate_page()
@ 2018-12-14  5:56 zhangjun
  2018-12-14 11:25 ` Richard Weinberger
  2018-12-15  4:26 ` Gao Xiang
  0 siblings, 2 replies; 8+ messages in thread
From: zhangjun @ 2018-12-14  5:56 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-fsdevel, linux-kernel, Richard Weinberger,
	Darrick J . Wong, zhangjun

IOMAP uses PG_private a little differently from buffer_head based
filesystems.
It uses the flag as a marker, and when it is set the page count is not
incremented, whereas migrate_page_move_mapping() assumes that PG_private
implies an additional reference (+1).
So we have to pass an extra count of -1 to migrate_page_move_mapping()
if the flag is set.

Signed-off-by: zhangjun <openzhangj@gmail.com>
---
 fs/iomap.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/iomap.c b/fs/iomap.c
index 64ce240..352e58a 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -544,8 +544,17 @@ iomap_migrate_page(struct address_space *mapping, struct page *newpage,
 		struct page *page, enum migrate_mode mode)
 {
 	int ret;
+	int extra_count = 0;
 
-	ret = migrate_page_move_mapping(mapping, newpage, page, NULL, mode, 0);
+	/*
+	 * IOMAP uses PG_private as a marker and does not raise the page count.
+	 * migrate_page_move_mapping() expects an incremented count if
+	 * PG_private is set. Therefore pass -1 as extra_count in this case.
+	 */
+	if (page_has_private(page))
+		extra_count = -1;
+	ret = migrate_page_move_mapping(mapping, newpage, page,
+		       NULL, mode, extra_count);
 	if (ret != MIGRATEPAGE_SUCCESS)
 		return ret;
 
-- 
2.7.4



* Re: [PATCH] fix page_count in ->iomap_migrate_page()
  2018-12-14  5:56 [PATCH] fix page_count in ->iomap_migrate_page() zhangjun
@ 2018-12-14 11:25 ` Richard Weinberger
  2018-12-14 12:26   ` Gao Xiang
  2018-12-15 10:51   ` Christoph Hellwig
  2018-12-15  4:26 ` Gao Xiang
  1 sibling, 2 replies; 8+ messages in thread
From: Richard Weinberger @ 2018-12-14 11:25 UTC (permalink / raw)
  To: zhangjun
  Cc: Alexander Viro, linux-fsdevel, linux-kernel, Darrick J . Wong,
	hch, bfoster, darrick.wong, Dave Chinner, akpm, kirill.shutemov,
	mhocko, n-horiguchi, mgorman, aarcange, willy, linux, linux-mm,
	Gao Xiang

[CC'ing authors of the code plus mm folks]

On Friday, 14 December 2018, 06:56:01 CET, zhangjun wrote:
> IOMAP uses PG_private a little differently from buffer_head based
> filesystems.
> It uses the flag as a marker, and when it is set the page count is not
> incremented, whereas migrate_page_move_mapping() assumes that PG_private
> implies an additional reference (+1).
> So we have to pass an extra count of -1 to migrate_page_move_mapping()
> if the flag is set.
> 
> Signed-off-by: zhangjun <openzhangj@gmail.com>
> ---
>  fs/iomap.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/iomap.c b/fs/iomap.c
> index 64ce240..352e58a 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -544,8 +544,17 @@ iomap_migrate_page(struct address_space *mapping, struct page *newpage,
>  		struct page *page, enum migrate_mode mode)
>  {
>  	int ret;
> +	int extra_count = 0;
>  
> -	ret = migrate_page_move_mapping(mapping, newpage, page, NULL, mode, 0);
> +	/*
> +	 * IOMAP uses PG_private as a marker and does not raise the page count.
> +	 * migrate_page_move_mapping() expects an incremented count if
> +	 * PG_private is set. Therefore pass -1 as extra_count in this case.
> +	 */
> +	if (page_has_private(page))
> +		extra_count = -1;
> +	ret = migrate_page_move_mapping(mapping, newpage, page,
> +		       NULL, mode, extra_count);
>  	if (ret != MIGRATEPAGE_SUCCESS)
>  		return ret;

This is the third place which needs this workaround.
UBIFS, F2FS, and now iomap.

I agree with Dave that nobody can assume that PG_private implies an additional
page reference.
But page migration does that. Including parts of the write back code.
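
To make the mismatch concrete, here is a small userspace model of the count
check. It is only a sketch: struct model_page, model_expected_count() and
model_can_migrate() are hypothetical names, not kernel APIs, and the
arithmetic is a simplified reading of what migrate_page_move_mapping() does
for a non-huge page cache page.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct page; not the kernel type. */
struct model_page {
	int refcount;      /* models page_count(page) */
	bool has_private;  /* models page_has_private(page) */
};

/*
 * Simplified model of the reference count the migration code expects for a
 * page cache page: one for the migration caller, one for the page cache,
 * one more if PG_private is set, plus the caller-supplied extra_count.
 */
static int model_expected_count(const struct model_page *page, int extra_count)
{
	return 1 + extra_count + 1 + (page->has_private ? 1 : 0);
}

/* Migration proceeds only if the actual count matches the expected one. */
static bool model_can_migrate(const struct model_page *page, int extra_count)
{
	return page->refcount == model_expected_count(page, extra_count);
}
```

In this model, a page that sets the flag without taking a reference has a
count of 2 while 3 is expected, so the check only passes with an extra_count
of -1. A page that took the reference passes with 0.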

Thanks,
//richard




* Re: [PATCH] fix page_count in ->iomap_migrate_page()
  2018-12-14 11:25 ` Richard Weinberger
@ 2018-12-14 12:26   ` Gao Xiang
  2018-12-14 13:35     ` Richard Weinberger
  2018-12-15 10:51   ` Christoph Hellwig
  1 sibling, 1 reply; 8+ messages in thread
From: Gao Xiang @ 2018-12-14 12:26 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: zhangjun, Alexander Viro, linux-fsdevel, linux-kernel,
	Darrick J . Wong, hch, bfoster, Dave Chinner, akpm,
	kirill.shutemov, mhocko, n-horiguchi, mgorman, aarcange, willy,
	linux, linux-mm, Gao Xiang

Hi Richard,

On 2018/12/14 19:25, Richard Weinberger wrote:
> This is the third place which needs this workaround.
> UBIFS, F2FS, and now iomap.
> 
> I agree with Dave that nobody can assume that PG_private implies an additional
> page reference.
> But page migration does that. Including parts of the write back code.

It seems that it's clearly documented in
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/mm.h#n780

 * A pagecache page contains an opaque `private' member, which belongs to the
 * page's address_space. Usually, this is the address of a circular list of
 * the page's disk buffers. PG_private must be set to tell the VM to call
 * into the filesystem to release these pages.
 *
 * A page may belong to an inode's memory mapping. In this case, page->mapping
 * is the pointer to the inode, and page->index is the file offset of the page,
 * in units of PAGE_SIZE.
 *
 * If pagecache pages are not associated with an inode, they are said to be
 * anonymous pages. These may become associated with the swapcache, and in that
 * case PG_swapcache is set, and page->private is an offset into the swapcache.
 *
 * In either case (swapcache or inode backed), the pagecache itself holds one
 * reference to the page. Setting PG_private should also increment the
 * refcount. The each user mapping also has a reference to the page.

and when I looked into that, I found
https://lore.kernel.org/lkml/3CB3CA93.D141680B@zip.com.au/


Thanks,
Gao Xiang


* Re: [PATCH] fix page_count in ->iomap_migrate_page()
  2018-12-14 12:26   ` Gao Xiang
@ 2018-12-14 13:35     ` Richard Weinberger
  2018-12-14 13:55       ` Gao Xiang
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Weinberger @ 2018-12-14 13:35 UTC (permalink / raw)
  To: Gao Xiang, Artem Bityutskiy
  Cc: zhangjun, Alexander Viro, linux-fsdevel, linux-kernel,
	Darrick J . Wong, hch, bfoster, Dave Chinner, akpm,
	kirill.shutemov, mhocko, n-horiguchi, mgorman, aarcange, willy,
	linux, linux-mm, Gao Xiang

On Friday, 14 December 2018, 13:26:28 CET, Gao Xiang wrote:
> Hi Richard,
> 
> On 2018/12/14 19:25, Richard Weinberger wrote:
> > This is the third place which needs this workaround.
> > UBIFS, F2FS, and now iomap.
> > 
> > I agree with Dave that nobody can assume that PG_private implies an additional
> > page reference.
> > But page migration does that. Including parts of the write back code.
> 
> It seems that it's clearly documented in
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/mm.h#n780
> 
>  * A pagecache page contains an opaque `private' member, which belongs to the
>  * page's address_space. Usually, this is the address of a circular list of
>  * the page's disk buffers. PG_private must be set to tell the VM to call
>  * into the filesystem to release these pages.
>  *
>  * A page may belong to an inode's memory mapping. In this case, page->mapping
>  * is the pointer to the inode, and page->index is the file offset of the page,
>  * in units of PAGE_SIZE.
>  *
>  * If pagecache pages are not associated with an inode, they are said to be
>  * anonymous pages. These may become associated with the swapcache, and in that
>  * case PG_swapcache is set, and page->private is an offset into the swapcache.
>  *
>  * In either case (swapcache or inode backed), the pagecache itself holds one
>  * reference to the page. Setting PG_private should also increment the
>  * refcount. The each user mapping also has a reference to the page.
> 
> and when I looked into that, I found
> https://lore.kernel.org/lkml/3CB3CA93.D141680B@zip.com.au/

Hmm, in case of UBIFS it seems easy. We can add a get/put_page() around setting/clearing
the flag.
I did that now and so far none of my tests exploded.

Artem, do you remember why UBIFS never raised the page counter when setting PG_private?

Thanks,
//richard




* Re: [PATCH] fix page_count in ->iomap_migrate_page()
  2018-12-14 13:35     ` Richard Weinberger
@ 2018-12-14 13:55       ` Gao Xiang
  0 siblings, 0 replies; 8+ messages in thread
From: Gao Xiang @ 2018-12-14 13:55 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Artem Bityutskiy, zhangjun, Alexander Viro, linux-fsdevel,
	linux-kernel, Darrick J . Wong, hch, bfoster, Dave Chinner, akpm,
	kirill.shutemov, mhocko, n-horiguchi, mgorman, aarcange, willy,
	linux, linux-mm, Gao Xiang

Hi Richard,

On 2018/12/14 21:35, Richard Weinberger wrote:
> Hmm, in case of UBIFS it seems easy. We can add a get/put_page() around setting/clearing
> the flag.
> I did that now and so far none of my tests exploded.

Yes, a lot of existing code relies on this restriction so that pages can be freed race-free.
It has been that way ever since PG_private was first introduced by Andrew Morton in 2002,
across many Linux versions... and I don't think it's a bad rule. :)

Thanks,
Gao Xiang


* Re: [PATCH] fix page_count in ->iomap_migrate_page()
  2018-12-14  5:56 [PATCH] fix page_count in ->iomap_migrate_page() zhangjun
  2018-12-14 11:25 ` Richard Weinberger
@ 2018-12-15  4:26 ` Gao Xiang
  1 sibling, 0 replies; 8+ messages in thread
From: Gao Xiang @ 2018-12-15  4:26 UTC (permalink / raw)
  To: zhangjun
  Cc: Alexander Viro, linux-fsdevel, linux-kernel, Richard Weinberger,
	Darrick J . Wong, Andrew Morton


On 2018/12/14 13:56, zhangjun wrote:
> IOMAP uses PG_private a little differently from buffer_head based
> filesystems.
> It uses the flag as a marker, and when it is set the page count is not
> incremented, whereas migrate_page_move_mapping() assumes that PG_private
> implies an additional reference (+1).
> So we have to pass an extra count of -1 to migrate_page_move_mapping()
> if the flag is set.
> 
> Signed-off-by: zhangjun <openzhangj@gmail.com>
> ---

I found that this was already fixed in
https://patchwork.kernel.org/patch/10684835/
and has been merged in
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=61c6de667263184125d5ca75e894fcad632b0dd3

It seems it has been corrected by Piotr.

Thanks,
Gao Xiang


* Re: [PATCH] fix page_count in ->iomap_migrate_page()
  2018-12-14 11:25 ` Richard Weinberger
  2018-12-14 12:26   ` Gao Xiang
@ 2018-12-15 10:51   ` Christoph Hellwig
  2018-12-15 11:17     ` Richard Weinberger
  1 sibling, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2018-12-15 10:51 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: zhangjun, Alexander Viro, linux-fsdevel, linux-kernel,
	Darrick J . Wong, hch, bfoster, Dave Chinner, akpm,
	kirill.shutemov, mhocko, n-horiguchi, mgorman, aarcange, willy,
	linux, linux-mm, Gao Xiang

FYI, for iomap we got a patch to just increment the page count when
setting the private data, and it finally got merged into mainline after
a while.

Not that it totally makes sense to me, but it is what it is.  It would
just be nice if set_page_private took care of it and we had a
clear_page_private to undo it, making the whole scheme a lot more
obvious.
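
As an illustration of that idea, a paired helper could look roughly like the
userspace sketch below. The names struct sketch_page,
sketch_attach_private() and sketch_detach_private() are hypothetical and
only model the behaviour; this is not existing kernel code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct page; not the kernel type. */
struct sketch_page {
	int refcount;        /* models page_count(page) */
	bool has_private;    /* models the PG_private flag */
	void *private_data;  /* models page->private */
};

/* Take a reference, store the private data and set the flag in one place. */
static void sketch_attach_private(struct sketch_page *page, void *data)
{
	page->refcount++;          /* models get_page() */
	page->private_data = data; /* models set_page_private() */
	page->has_private = true;  /* models SetPagePrivate() */
}

/* Undo the attach: clear the flag and data, then drop the reference. */
static void *sketch_detach_private(struct sketch_page *page)
{
	void *data = page->private_data;

	if (!page->has_private)
		return NULL;

	page->has_private = false;  /* models ClearPagePrivate() */
	page->private_data = NULL;  /* models set_page_private(page, 0) */
	page->refcount--;           /* models put_page() */
	return data;
}
```

With both steps in one helper, a caller cannot set the flag without also
taking the reference that page migration expects.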


* Re: [PATCH] fix page_count in ->iomap_migrate_page()
  2018-12-15 10:51   ` Christoph Hellwig
@ 2018-12-15 11:17     ` Richard Weinberger
  0 siblings, 0 replies; 8+ messages in thread
From: Richard Weinberger @ 2018-12-15 11:17 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: zhangjun, Alexander Viro, linux-fsdevel, linux-kernel,
	Darrick J . Wong, bfoster, Dave Chinner, akpm, kirill.shutemov,
	mhocko, n-horiguchi, mgorman, aarcange, willy, linux, linux-mm,
	Gao Xiang

On Saturday, 15 December 2018, 11:51:12 CET, Christoph Hellwig wrote:
> FYI, for iomap we got a patch to just increment the page count when
> setting the private data, and it finally got merged into mainline after
> a while.
> 
> Not that it totally makes sense to me, but it is what it is.  It would
> just be nice if set_page_private took care of it and we had a
> clear_page_private to undo it, making the whole scheme a lot more
> obvious.

Yeah, UBIFS will go the same route. I already have a patch prepared
which increments the page count when UBIFS sets PG_private.

Thanks,
//richard




