* Data corruption problem with swapfiles and THP
@ 2021-08-12 15:07 Matthew Wilcox
From: Matthew Wilcox @ 2021-08-12 15:07 UTC
  To: Huang Ying, linux-kernel, linux-mm

There is an assumption in the swap writepage path that a THP is physically
contiguous on swap:

        bio->bi_iter.bi_sector = swap_page_sector(page);
        bio->bi_opf = REQ_OP_WRITE | REQ_SWAP | wbc_to_write_flags(wbc);
        bio->bi_end_io = end_write_func;
        bio_add_page(bio, page, thp_size(page), 0);

As far as I can tell, this is not necessarily true.  If a file is not
contiguous, we can have an extent which is 1MB long followed by an extent
somewhere else on storage that's 1MB long.  When we try to write a 2MB
page to swap, we overwrite whatever's on the block device after that
first 1MB extent.
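To make the failure mode concrete, here is a small user-space model (not kernel code) of how a swapfile's extent list turns a swap page offset into a device sector. All names (struct model_extent, model_page_sector(), model_range_contiguous()) are hypothetical, and the sketch assumes 4 KiB pages and 512-byte sectors. Issuing one bio for the whole THP at swap_page_sector(page) is only safe if every subpage lands exactly one page further along the device than the previous one. That is the property model_range_contiguous() checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one swapfile extent: a run of swap page offsets
 * that maps onto a run of 512-byte sectors on the block device. */
struct model_extent {
	uint64_t start_page;   /* first swap page offset covered */
	uint64_t nr_pages;     /* extent length in pages */
	uint64_t start_sector; /* device sector holding start_page */
};

#define MODEL_PAGE_SECTORS 8 /* assumes 4 KiB pages, 512-byte sectors */

/* Translate a swap page offset to a device sector via the extent list.
 * Returns false if no extent covers the offset. */
static bool model_page_sector(const struct model_extent *ext, size_t n,
			      uint64_t page, uint64_t *sector)
{
	for (size_t i = 0; i < n; i++) {
		if (page >= ext[i].start_page &&
		    page < ext[i].start_page + ext[i].nr_pages) {
			*sector = ext[i].start_sector +
				  (page - ext[i].start_page) * MODEL_PAGE_SECTORS;
			return true;
		}
	}
	return false;
}

/* True only if nr_pages pages starting at first_page sit back to back on
 * the device, i.e. one bio starting at the first page's sector is safe. */
static bool model_range_contiguous(const struct model_extent *ext, size_t n,
				   uint64_t first_page, uint64_t nr_pages)
{
	uint64_t base, s;

	assert(nr_pages > 0);
	if (!model_page_sector(ext, n, first_page, &base))
		return false;
	for (uint64_t i = 1; i < nr_pages; i++) {
		if (!model_page_sector(ext, n, first_page + i, &s) ||
		    s != base + i * MODEL_PAGE_SECTORS)
			return false;
	}
	return true;
}
```

For example, with two 1MB extents {0, 256, 1000} and {256, 256, 50000}, page 255 maps to sector 3040 and page 256 maps to sector 50000. A single 2MB write starting at sector 1000 would instead put page 256 at sector 3048, on whatever follows the first extent.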

(Came across this by code examination while looking at getting rid of
the bio path entirely; no attempt has been made to produce this problem;
something else may prevent it from actually happening)


* Re: Data corruption problem with swapfiles and THP
@ 2021-08-13  0:21 Huang, Ying
From: Huang, Ying @ 2021-08-13  0:21 UTC
  To: Matthew Wilcox; +Cc: linux-kernel, linux-mm

Matthew Wilcox <willy@infradead.org> writes:

> There is an assumption in the swap writepage path that a THP is physically
> contiguous on swap:
>
>         bio->bi_iter.bi_sector = swap_page_sector(page);
>         bio->bi_opf = REQ_OP_WRITE | REQ_SWAP | wbc_to_write_flags(wbc);
>         bio->bi_end_io = end_write_func;
>         bio_add_page(bio, page, thp_size(page), 0);
>
> As far as I can tell, this is not necessarily true.  If a file is not
> contiguous, we can have an extent which is 1MB long followed by an extent
> somewhere else on storage that's 1MB long.  When we try to write a 2MB
> page to swap, we overwrite whatever's on the block device after that
> first 1MB extent.
>
> (Came across this by code examination while looking at getting rid of
> the bio path entirely; no attempt has been made to produce this problem;
> something else may prevent it from actually happening)

Yes.  A THP needs to be split before it is swapped out to a swap device
backed by a file.  Please take a look at get_swap_pages():

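		if (size == SWAPFILE_CLUSTER) {
			/* Huge (cluster-sized) entries: block devices only */
			if (si->flags & SWP_BLKDEV)
				n_ret = swap_alloc_cluster(si, swp_entries);
			/* a file-backed device leaves n_ret at 0: no huge entry */
		} else
			n_ret = scan_swap_map_slots(si, SWAP_HAS_CACHE,
						    n_goal, swp_entries);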

If the swap device is backed by a file, (si->flags & SWP_BLKDEV) == 0, so
only normal (not huge) swap entries can be allocated.  As a result, the
THP is split.
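The decision in the quoted branch can be modelled in isolation. The sketch below is a simplified, hypothetical stand-in: model_get_swap_pages(), model_thp_must_split(), MODEL_SWP_BLKDEV and MODEL_CLUSTER_PAGES are not kernel names, and it assumes 4 KiB base pages and 2 MiB THPs. It ignores the loop over swap devices and the swap map scan. It only shows that a cluster-sized request yields an entry for a block device and 0 for a file-backed one, and that 0 is the signal for the THP to be split.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for SWP_BLKDEV and SWAPFILE_CLUSTER
 * (512 assumes 2 MiB THPs made of 4 KiB pages). */
#define MODEL_SWP_BLKDEV    (1u << 0)
#define MODEL_CLUSTER_PAGES 512u

/* Returns how many swap entries were allocated for a request of `size`
 * pages, mirroring only the branch quoted above. */
static int model_get_swap_pages(unsigned int si_flags, unsigned int size)
{
	int n_ret = 0;

	/* Only single pages or whole clusters are requested. */
	assert(size == 1 || size == MODEL_CLUSTER_PAGES);

	if (size == MODEL_CLUSTER_PAGES) {
		/* Parenthesised on purpose: `flags & BIT == 0` would parse
		 * as `flags & (BIT == 0)`, which is always 0. */
		if ((si_flags & MODEL_SWP_BLKDEV) != 0)
			n_ret = 1; /* one huge, cluster-sized entry */
	} else {
		n_ret = 1;     /* one ordinary entry */
	}
	return n_ret;
}

/* No huge entry means the THP has to be split and its subpages swapped
 * out as ordinary pages. */
static bool model_thp_must_split(unsigned int si_flags)
{
	return model_get_swap_pages(si_flags, MODEL_CLUSTER_PAGES) == 0;
}
```

With si_flags == 0 (a swapfile), model_thp_must_split() returns true. With MODEL_SWP_BLKDEV set, the whole THP gets a single cluster-sized entry.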

Best Regards,
Huang, Ying
