linux-mm.kvack.org archive mirror
* Is shmem page accounting wrong on split?
@ 2020-08-28 14:25 Matthew Wilcox
  2020-08-28 14:55 ` Matthew Wilcox
  0 siblings, 1 reply; 6+ messages in thread
From: Matthew Wilcox @ 2020-08-28 14:25 UTC (permalink / raw)
  To: linux-mm; +Cc: Hugh Dickins, Yang Shi

If I understand truncate of a shmem THP correctly ...

Let's suppose the file has a single 2MB page at index 0, and is being
truncated down to 7 bytes in size.

shmem_setattr()
  i_size_write(7);
  shmem_truncate_range(7, -1);
    shmem_undo_range(7, -1)
      start = 1;
      page = &head[1];
      shmem_punch_compound();
        split_huge_page()
          end = DIV_ROUND_UP(i_size_read(mapping->host), PAGE_SIZE); # == 1
          __split_huge_page(..., 1, ...);
            __delete_from_page_cache(&head[1], ...);
      truncate_inode_page(page);
        delete_from_page_cache(page)
          __delete_from_page_cache(&head[1])
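
Both indices in the trace come from page-size arithmetic.  A minimal
sketch, assuming 4KiB base pages; the helper names are made up for
illustration and do not exist in the kernel:

```c
#include <stdint.h>

/* Assumption for illustration: 4KiB base pages, as on x86-64. */
#define PAGE_SHIFT 12
#define PAGE_SIZE (1UL << PAGE_SHIFT)
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical helper: first page index lying wholly beyond the new
 * size, i.e. the "start" computed in shmem_undo_range(). */
static unsigned long first_truncated_index(uint64_t new_size)
{
	return (unsigned long)((new_size + PAGE_SIZE - 1) >> PAGE_SHIFT);
}

/* Hypothetical helper: first subpage index that the split drops from
 * the page cache, i.e. the "end" computed in split_huge_page(). */
static unsigned long split_drop_from(uint64_t i_size)
{
	return (unsigned long)DIV_ROUND_UP(i_size, PAGE_SIZE);
}
```

For a 7-byte file both helpers give 1, so &head[1] is the first page
that truncate visits and also the first subpage that the split drops,
which is why __delete_from_page_cache() appears twice in the trace.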

I think the solution is to call truncate_inode_page() from within
shmem_punch_compound() if we don't call split_huge_page().  I came across
this while reusing all this infrastructure for the XFS THP patchset,
so I'm not in a great position to test this patch.

This solution actually makes my life harder because I have a different
function to call if the page doesn't need to be split.  But it's probably
the right solution for upstream today.

diff --git a/mm/shmem.c b/mm/shmem.c
index b2abca3f7f33..a0bc42974c2d 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -819,15 +819,18 @@ void shmem_unlock_mapping(struct address_space *mapping)
 static bool shmem_punch_compound(struct page *page, pgoff_t start, pgoff_t end)
 {
 	if (!PageTransCompound(page))
-		return true;
+		goto nosplit;
 
 	/* Just proceed to delete a huge page wholly within the range punched */
 	if (PageHead(page) &&
 	    page->index >= start && page->index + HPAGE_PMD_NR <= end)
-		return true;
+		goto nosplit;
 
 	/* Try to split huge page, so we can truly punch the hole or truncate */
 	return split_huge_page(page) >= 0;
+nosplit:
+	truncate_inode_page(page->mapping, page);
+	return true;
 }
 
 /*
@@ -883,8 +886,7 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 			if ((!unfalloc || !PageUptodate(page)) &&
 			    page_mapping(page) == mapping) {
 				VM_BUG_ON_PAGE(PageWriteback(page), page);
-				if (shmem_punch_compound(page, start, end))
-					truncate_inode_page(mapping, page);
+				shmem_punch_compound(page, start, end);
 			}
 			unlock_page(page);
 		}
@@ -966,9 +968,7 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 					break;
 				}
 				VM_BUG_ON_PAGE(PageWriteback(page), page);
-				if (shmem_punch_compound(page, start, end))
-					truncate_inode_page(mapping, page);
-				else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
+				if (!shmem_punch_compound(page, start, end)) {
 					/* Wipe the page and don't get stuck */
 					clear_highpage(page);
 					flush_dcache_page(page);





* Re: Is shmem page accounting wrong on split?
  2020-08-28 14:25 Is shmem page accounting wrong on split? Matthew Wilcox
@ 2020-08-28 14:55 ` Matthew Wilcox
  2020-08-28 15:43   ` Yang Shi
  0 siblings, 1 reply; 6+ messages in thread
From: Matthew Wilcox @ 2020-08-28 14:55 UTC (permalink / raw)
  To: linux-mm; +Cc: Hugh Dickins, Yang Shi

On Fri, Aug 28, 2020 at 03:25:46PM +0100, Matthew Wilcox wrote:
> If I understand truncate of a shmem THP correctly ...
> 
> Let's suppose the file has a single 2MB page at index 0, and is being
> truncated down to 7 bytes in size.
> 
> shmem_setattr()
>   i_size_write(7);
>   shmem_truncate_range(7, -1);
>     shmem_undo_range(7, -1)
>       start = 1;
>       page = &head[1];
>       shmem_punch_compound();
>         split_huge_page()
>           end = DIV_ROUND_UP(i_size_read(mapping->host), PAGE_SIZE); # == 1
>           __split_huge_page(..., 1, ...);
>             __delete_from_page_cache(&head[1], ...);
>       truncate_inode_page(page);
>         delete_from_page_cache(page)
>           __delete_from_page_cache(&head[1])
> 
> I think the solution is to call truncate_inode_page() from within
> shmem_punch_compound() if we don't call split_huge_page().  I came across
> this while reusing all this infrastructure for the XFS THP patchset,
> so I'm not in a great position to test this patch.

Oh, this works for truncate, but not hole-punch.  __split_huge_page()
won't call __delete_from_page_cache() for pages below the end of the
file.  So maybe this instead?

It's a bit cheesy ... maybe split_huge_page() could return 1 to indicate
that it actually disposed of the page passed in?

+++ b/mm/shmem.c
@@ -827,7 +827,7 @@ static bool shmem_punch_compound(struct page *page, pgoff_t start, pgoff_t end)
                return true;
 
        /* Try to split huge page, so we can truly punch the hole or truncate */
-       return split_huge_page(page) >= 0;
+       return split_huge_page(page) >= 0 && end < -1;
 }
 
 /*




* Re: Is shmem page accounting wrong on split?
  2020-08-28 14:55 ` Matthew Wilcox
@ 2020-08-28 15:43   ` Yang Shi
  2020-08-28 17:08     ` Hugh Dickins
  0 siblings, 1 reply; 6+ messages in thread
From: Yang Shi @ 2020-08-28 15:43 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Linux MM, Hugh Dickins, Yang Shi

On Fri, Aug 28, 2020 at 7:55 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Fri, Aug 28, 2020 at 03:25:46PM +0100, Matthew Wilcox wrote:
> > If I understand truncate of a shmem THP correctly ...
> >
> > Let's suppose the file has a single 2MB page at index 0, and is being
> > truncated down to 7 bytes in size.
> >
> > shmem_setattr()
> >   i_size_write(7);
> >   shmem_truncate_range(7, -1);
> >     shmem_undo_range(7, -1)
> >       start = 1;
> >       page = &head[1];
> >       shmem_punch_compound();
> >         split_huge_page()
> >           end = DIV_ROUND_UP(i_size_read(mapping->host), PAGE_SIZE); # == 1
> >           __split_huge_page(..., 1, ...);
> >             __delete_from_page_cache(&head[1], ...);
> >       truncate_inode_page(page);
> >         delete_from_page_cache(page)
> >           __delete_from_page_cache(&head[1])
> >
> > I think the solution is to call truncate_inode_page() from within
> > shmem_punch_compound() if we don't call split_huge_page().  I came across
> > this while reusing all this infrastructure for the XFS THP patchset,
> > so I'm not in a great position to test this patch.
>
> Oh, this works for truncate, but not hole-punch.  __split_huge_page()
> won't call __delete_from_page_cache() for pages below the end of the
> file.  So maybe this instead?
>
> It's a bit cheesy ... maybe split_huge_page() could return 1 to indicate
> that it actually disposed of the page passed in?

I'm fine with having split_huge_page() return 1.

>
> +++ b/mm/shmem.c
> @@ -827,7 +827,7 @@ static bool shmem_punch_compound(struct page *page, pgoff_t start, pgoff_t end)
>                 return true;
>
>         /* Try to split huge page, so we can truly punch the hole or truncate */
> -       return split_huge_page(page) >= 0;
> +       return split_huge_page(page) >= 0 && end < -1;

It would be clearer with a comment explaining what "-1" means. It took
me a little while to understand the magic number, but once I did, the
change looked straightforward to me.
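
For reference, this is how the magic number reads.  A sketch, assuming
"end" holds (pgoff_t)-1 when truncating to end of file, as the
shmem_undo_range(7, -1) call in the trace suggests; END_OF_FILE and
is_hole_punch() are hypothetical names, not kernel API:

```c
#include <stdbool.h>

typedef unsigned long pgoff_t;	/* stand-in for the kernel's pgoff_t */

/* Hypothetical name for the sentinel the patch spells as a bare -1.
 * pgoff_t is unsigned, so -1 converts to the largest possible index. */
#define END_OF_FILE ((pgoff_t)-1)

/* True for a bounded range (hole-punch), false for truncate to EOF. */
static bool is_hole_punch(pgoff_t end)
{
	return end < END_OF_FILE;	/* same comparison as "end < -1" */
}
```

With that reading, "end < -1" is false only for truncate, which is the
case where the split may already have removed the page.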

>  }
>
>  /*
>
>



* Re: Is shmem page accounting wrong on split?
  2020-08-28 15:43   ` Yang Shi
@ 2020-08-28 17:08     ` Hugh Dickins
  2020-08-28 17:31       ` Matthew Wilcox
  0 siblings, 1 reply; 6+ messages in thread
From: Hugh Dickins @ 2020-08-28 17:08 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Linux MM, Hugh Dickins, Yang Shi, Yang Shi

On Fri, 28 Aug 2020, Yang Shi wrote:
> On Fri, Aug 28, 2020 at 7:55 AM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Fri, Aug 28, 2020 at 03:25:46PM +0100, Matthew Wilcox wrote:
> > > If I understand truncate of a shmem THP correctly ...
> > >
> > > Let's suppose the file has a single 2MB page at index 0, and is being
> > > truncated down to 7 bytes in size.
> > >
> > > shmem_setattr()
> > >   i_size_write(7);
> > >   shmem_truncate_range(7, -1);
> > >     shmem_undo_range(7, -1)
> > >       start = 1;
> > >       page = &head[1];
> > >       shmem_punch_compound();
> > >         split_huge_page()
> > >           end = DIV_ROUND_UP(i_size_read(mapping->host), PAGE_SIZE); # == 1
> > >           __split_huge_page(..., 1, ...);
> > >             __delete_from_page_cache(&head[1], ...);
> > >       truncate_inode_page(page);
> > >         delete_from_page_cache(page)
> > >           __delete_from_page_cache(&head[1])
> > >
> > > I think the solution is to call truncate_inode_page() from within
> > > shmem_punch_compound() if we don't call split_huge_page().  I came across
> > > this while reusing all this infrastructure for the XFS THP patchset,
> > > so I'm not in a great position to test this patch.

It's a good observation of an oddity that I probably didn't think of,
but you haven't said which kind of shmem page accounting goes wrong here
(vm_enough_memory? df of filesystem? du of filesystem? memcg charge?
all of the above? observed in practice?), and what needs solving.

If that page has already been deleted from page cache when splitting,
truncate_inode_page() sees NULL page->mapping != mapping and returns
without doing anything.  What's the problem?
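
To illustrate, here is a simplified userspace model of that check.  The
struct layouts, the nrpages counter and the model_* names are
assumptions made for the sketch, and the -EIO return value is assumed
from mm/truncate.c of this era rather than taken from this thread:

```c
#include <errno.h>
#include <stddef.h>

/* Minimal stand-ins for the kernel structures (model only). */
struct address_space { int nrpages; };
struct page { struct address_space *mapping; };

/* Model: detach the page and drop it from the mapping's page count. */
static void model_delete_from_page_cache(struct page *page)
{
	page->mapping->nrpages--;
	page->mapping = NULL;
}

/* Model of truncate_inode_page(): a page whose mapping no longer
 * matches (for example NULL after the split removed it) is left
 * untouched, so a second call cannot delete it twice. */
static int model_truncate_inode_page(struct address_space *mapping,
				     struct page *page)
{
	if (page->mapping != mapping)
		return -EIO;
	model_delete_from_page_cache(page);
	return 0;
}
```

Calling the model twice on the same page changes nrpages only once.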

Hugh

> >
> > Oh, this works for truncate, but not hole-punch.  __split_huge_page()
> > won't call __delete_from_page_cache() for pages below the end of the
> > file.  So maybe this instead?
> >
> > It's a bit cheesy ... maybe split_huge_page() could return 1 to indicate
> > that it actually disposed of the page passed in?
> 
> > I'm fine with having split_huge_page() return 1.
> 
> >
> > +++ b/mm/shmem.c
> > @@ -827,7 +827,7 @@ static bool shmem_punch_compound(struct page *page, pgoff_t start, pgoff_t end)
> >                 return true;
> >
> >         /* Try to split huge page, so we can truly punch the hole or truncate */
> > -       return split_huge_page(page) >= 0;
> > +       return split_huge_page(page) >= 0 && end < -1;
> 
> > It would be clearer with a comment explaining what "-1" means. It took
> > me a little while to understand the magic number, but once I did, the
> > change looked straightforward to me.
> 
> >  }
> >
> >  /*



* Re: Is shmem page accounting wrong on split?
  2020-08-28 17:08     ` Hugh Dickins
@ 2020-08-28 17:31       ` Matthew Wilcox
  2020-08-28 18:01         ` Matthew Wilcox
  0 siblings, 1 reply; 6+ messages in thread
From: Matthew Wilcox @ 2020-08-28 17:31 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Linux MM, Yang Shi, Yang Shi

On Fri, Aug 28, 2020 at 10:08:52AM -0700, Hugh Dickins wrote:
> On Fri, 28 Aug 2020, Yang Shi wrote:
> > On Fri, Aug 28, 2020 at 7:55 AM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Fri, Aug 28, 2020 at 03:25:46PM +0100, Matthew Wilcox wrote:
> > > > If I understand truncate of a shmem THP correctly ...
> > > >
> > > > Let's suppose the file has a single 2MB page at index 0, and is being
> > > > truncated down to 7 bytes in size.
> > > >
> > > > shmem_setattr()
> > > >   i_size_write(7);
> > > >   shmem_truncate_range(7, -1);
> > > >     shmem_undo_range(7, -1)
> > > >       start = 1;
> > > >       page = &head[1];
> > > >       shmem_punch_compound();
> > > >         split_huge_page()
> > > >           end = DIV_ROUND_UP(i_size_read(mapping->host), PAGE_SIZE); # == 1
> > > >           __split_huge_page(..., 1, ...);
> > > >             __delete_from_page_cache(&head[1], ...);
> > > >       truncate_inode_page(page);
> > > >         delete_from_page_cache(page)
> > > >           __delete_from_page_cache(&head[1])
> > > >
> > > > I think the solution is to call truncate_inode_page() from within
> > > > shmem_punch_compound() if we don't call split_huge_page().  I came across
> > > > this while reusing all this infrastructure for the XFS THP patchset,
> > > > so I'm not in a great position to test this patch.
> 
> It's a good observation of an oddity that I probably didn't think of,
> but you haven't said which kind of shmem page accounting goes wrong here
> (vm_enough_memory? df of filesystem? du of filesystem? memcg charge?
> all of the above? observed in practice?), and what needs solving.
> 
> If that page has already been deleted from page cache when splitting,
> truncate_inode_page() sees NULL page->mapping != mapping and returns
> without doing anything.  What's the problem?

Ah!  I missed the check in truncate_inode_page().  This should be
fine then.

The problem I've observed in practice comes from following the same pattern in
truncate_inode_pages_range().  The call to delete_from_page_cache_batch()
trips the assertion that the page hasn't already been deleted from the
page cache.  I think the solution is obvious -- don't add the page to
locked_pvec if page->mapping is NULL.

                        if (thp_punch(page, lstart, lend))
                                pagevec_add(&locked_pvec, page);
                        else
                                unlock_page(page);
                }
                for (i = 0; i < pagevec_count(&locked_pvec); i++)
                        truncate_cleanup_page(mapping, locked_pvec.pages[i]);
                delete_from_page_cache_batch(mapping, &locked_pvec);
                for (i = 0; i < pagevec_count(&locked_pvec); i++)
                        unlock_page(locked_pvec.pages[i]);
                truncate_exceptional_pvec_entries(mapping, &pvec, indices);
                pagevec_release(&pvec);

(shmem_punch_compound() got renamed to thp_punch() and moved to truncate.c)
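
A userspace sketch of the proposed rule, under stated assumptions: the
structs are minimal stand-ins, MODEL_PVEC_SIZE is an arbitrary batch
size, and model_queue_for_delete() is a hypothetical helper, not
something in truncate.c:

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-ins for the kernel structures (model only). */
struct address_space { int nrpages; };
struct page { struct address_space *mapping; };

#define MODEL_PVEC_SIZE 15	/* arbitrary batch size for the model */

struct pagevec {
	unsigned int nr;
	struct page *pages[MODEL_PVEC_SIZE];
};

/* Queue a page for batch deletion only if it is still in the page
 * cache.  A page the split already removed has a NULL mapping and is
 * refused, so the batch never sees an already-deleted page. */
static bool model_queue_for_delete(struct pagevec *locked, struct page *page)
{
	if (!page->mapping)
		return false;	/* already gone; caller just unlocks it */
	if (locked->nr == MODEL_PVEC_SIZE)
		return false;	/* model only: a full batch refuses too */
	locked->pages[locked->nr++] = page;
	return true;
}
```

Only pages that pass this filter would reach the batch delete, which is
the condition its assertion expects.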



* Re: Is shmem page accounting wrong on split?
  2020-08-28 17:31       ` Matthew Wilcox
@ 2020-08-28 18:01         ` Matthew Wilcox
  0 siblings, 0 replies; 6+ messages in thread
From: Matthew Wilcox @ 2020-08-28 18:01 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Linux MM, Yang Shi, Yang Shi

On Fri, Aug 28, 2020 at 06:31:22PM +0100, Matthew Wilcox wrote:
> On Fri, Aug 28, 2020 at 10:08:52AM -0700, Hugh Dickins wrote:
> > It's a good observation of an oddity that I probably didn't think of,
> > but you haven't said which kind of shmem page accounting goes wrong here
> > (vm_enough_memory? df of filesystem? du of filesystem? memcg charge?
> > all of the above? observed in practice?), and what needs solving.

Oh, I forgot to say which 

> The problem I've observed in practice comes from following the same pattern in
> truncate_inode_pages_range().  The call to delete_from_page_cache_batch()
> trips the assertion that the page hasn't already been deleted from the
> page cache.  I think the solution is obvious -- don't add the page to
> locked_pvec if page->mapping is NULL.

Here's the change I'm currently testing.  It's survived about eight
minutes of xfstests so far, which is far longer than it was surviving
before.

-       /* Try to split huge page, so we can truly punch the hole or truncate */
-       return split_huge_page(page) >= 0;
+       /*
+        * split_huge_page may remove the page itself; if so, we should
+        * not attempt to remove it again.
+        */
+       return split_huge_page(page) >= 0 && page->mapping;




