* FAILED: patch "[PATCH] mm: fix unexpected zeroed page mapping with zram swap" failed to apply to 5.4-stable tree
@ 2022-04-18 8:05 gregkh
2022-05-05 16:27 ` Minchan Kim
0 siblings, 1 reply; 3+ messages in thread
From: gregkh @ 2022-04-18 8:05 UTC (permalink / raw)
To: minchan, akpm, axboe, david, ivan, ngupta, senozhatsky, stable, torvalds
Cc: stable
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e914d8f00391520ecc4495dd0ca0124538ab7119 Mon Sep 17 00:00:00 2001
From: Minchan Kim <minchan@kernel.org>
Date: Thu, 14 Apr 2022 19:13:46 -0700
Subject: [PATCH] mm: fix unexpected zeroed page mapping with zram swap
Two processes under CLONE_VM cloning, user process can be corrupted by
seeing zeroed page unexpectedly.
CPU A CPU B
do_swap_page do_swap_page
SWP_SYNCHRONOUS_IO path SWP_SYNCHRONOUS_IO path
swap_readpage valid data
swap_slot_free_notify
delete zram entry
swap_readpage zeroed(invalid) data
pte_lock
map the *zero data* to userspace
pte_unlock
pte_lock
if (!pte_same)
goto out_nomap;
pte_unlock
return and next refault will
read zeroed data
The swap_slot_free_notify is bogus for CLONE_VM case since it doesn't
increase the refcount of swap slot at copy_mm so it couldn't catch up
whether it's safe or not to discard data from backing device. In the
case, only the lock it could rely on to synchronize swap slot freeing is
page table lock. Thus, this patch gets rid of the swap_slot_free_notify
function. With this patch, CPU A will see correct data.
CPU A CPU B
do_swap_page do_swap_page
SWP_SYNCHRONOUS_IO path SWP_SYNCHRONOUS_IO path
swap_readpage original data
pte_lock
map the original data
swap_free
swap_range_free
bd_disk->fops->swap_slot_free_notify
swap_readpage read zeroed data
pte_unlock
pte_lock
if (!pte_same)
goto out_nomap;
pte_unlock
return
on next refault will see mapped data by CPU B
The concern of the patch would increase memory consumption since it
could keep wasted memory with compressed form in zram as well as
uncompressed form in address space. However, most of cases of zram uses
no readahead and do_swap_page is followed by swap_free so it will free
the compressed form from in zram quickly.
Link: https://lkml.kernel.org/r/YjTVVxIAsnKAXjTd@google.com
Fixes: 0bcac06f27d7 ("mm, swap: skip swapcache for swapin of synchronous device")
Reported-by: Ivan Babrou <ivan@cloudflare.com>
Tested-by: Ivan Babrou <ivan@cloudflare.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: David Hildenbrand <david@redhat.com>
Cc: <stable@vger.kernel.org> [4.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
diff --git a/mm/page_io.c b/mm/page_io.c
index b417f000b49e..89fbf3cae30f 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -51,54 +51,6 @@ void end_swap_bio_write(struct bio *bio)
bio_put(bio);
}
-static void swap_slot_free_notify(struct page *page)
-{
- struct swap_info_struct *sis;
- struct gendisk *disk;
- swp_entry_t entry;
-
- /*
- * There is no guarantee that the page is in swap cache - the software
- * suspend code (at least) uses end_swap_bio_read() against a non-
- * swapcache page. So we must check PG_swapcache before proceeding with
- * this optimization.
- */
- if (unlikely(!PageSwapCache(page)))
- return;
-
- sis = page_swap_info(page);
- if (data_race(!(sis->flags & SWP_BLKDEV)))
- return;
-
- /*
- * The swap subsystem performs lazy swap slot freeing,
- * expecting that the page will be swapped out again.
- * So we can avoid an unnecessary write if the page
- * isn't redirtied.
- * This is good for real swap storage because we can
- * reduce unnecessary I/O and enhance wear-leveling
- * if an SSD is used as the as swap device.
- * But if in-memory swap device (eg zram) is used,
- * this causes a duplicated copy between uncompressed
- * data in VM-owned memory and compressed data in
- * zram-owned memory. So let's free zram-owned memory
- * and make the VM-owned decompressed page *dirty*,
- * so the page should be swapped out somewhere again if
- * we again wish to reclaim it.
- */
- disk = sis->bdev->bd_disk;
- entry.val = page_private(page);
- if (disk->fops->swap_slot_free_notify && __swap_count(entry) == 1) {
- unsigned long offset;
-
- offset = swp_offset(entry);
-
- SetPageDirty(page);
- disk->fops->swap_slot_free_notify(sis->bdev,
- offset);
- }
-}
-
static void end_swap_bio_read(struct bio *bio)
{
struct page *page = bio_first_page_all(bio);
@@ -114,7 +66,6 @@ static void end_swap_bio_read(struct bio *bio)
}
SetPageUptodate(page);
- swap_slot_free_notify(page);
out:
unlock_page(page);
WRITE_ONCE(bio->bi_private, NULL);
@@ -394,11 +345,6 @@ int swap_readpage(struct page *page, bool synchronous)
if (sis->flags & SWP_SYNCHRONOUS_IO) {
ret = bdev_read_page(sis->bdev, swap_page_sector(page), page);
if (!ret) {
- if (trylock_page(page)) {
- swap_slot_free_notify(page);
- unlock_page(page);
- }
-
count_vm_event(PSWPIN);
goto out;
}
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: FAILED: patch "[PATCH] mm: fix unexpected zeroed page mapping with zram swap" failed to apply to 5.4-stable tree
2022-04-18 8:05 FAILED: patch "[PATCH] mm: fix unexpected zeroed page mapping with zram swap" failed to apply to 5.4-stable tree gregkh
@ 2022-05-05 16:27 ` Minchan Kim
2022-05-10 11:31 ` Greg KH
0 siblings, 1 reply; 3+ messages in thread
From: Minchan Kim @ 2022-05-05 16:27 UTC (permalink / raw)
To: gregkh; +Cc: akpm, axboe, david, ivan, ngupta, senozhatsky, stable, torvalds
On Mon, Apr 18, 2022 at 10:05:15AM +0200, gregkh@linuxfoundation.org wrote:
>
> The patch below does not apply to the 5.4-stable tree.
> If someone wants it applied there, or to any other stable or longterm
> tree, then please email the backport, including the original git commit
> id to <stable@vger.kernel.org>.
>
> thanks,
>
> greg k-h
< snip >
This is backport for 5.4-stable.
From 587a2b370d803766fd57778cdd1f05511658a246 Mon Sep 17 00:00:00 2001
From: Minchan Kim <minchan@kernel.org>
Date: Thu, 14 Apr 2022 19:13:46 -0700
Subject: [PATCH] mm: fix unexpected zeroed page mapping with zram swap
e914d8f00391520ecc4495dd0ca0124538ab7119 upstream.
Two processes under CLONE_VM cloning, user process can be corrupted by
seeing zeroed page unexpectedly.
CPU A CPU B
do_swap_page do_swap_page
SWP_SYNCHRONOUS_IO path SWP_SYNCHRONOUS_IO path
swap_readpage valid data
swap_slot_free_notify
delete zram entry
swap_readpage zeroed(invalid) data
pte_lock
map the *zero data* to userspace
pte_unlock
pte_lock
if (!pte_same)
goto out_nomap;
pte_unlock
return and next refault will
read zeroed data
The swap_slot_free_notify is bogus for CLONE_VM case since it doesn't
increase the refcount of swap slot at copy_mm so it couldn't catch up
whether it's safe or not to discard data from backing device. In the
case, only the lock it could rely on to synchronize swap slot freeing is
page table lock. Thus, this patch gets rid of the swap_slot_free_notify
function. With this patch, CPU A will see correct data.
CPU A CPU B
do_swap_page do_swap_page
SWP_SYNCHRONOUS_IO path SWP_SYNCHRONOUS_IO path
swap_readpage original data
pte_lock
map the original data
swap_free
swap_range_free
bd_disk->fops->swap_slot_free_notify
swap_readpage read zeroed data
pte_unlock
pte_lock
if (!pte_same)
goto out_nomap;
pte_unlock
return
on next refault will see mapped data by CPU B
The concern of the patch would increase memory consumption since it
could keep wasted memory with compressed form in zram as well as
uncompressed form in address space. However, most of cases of zram uses
no readahead and do_swap_page is followed by swap_free so it will free
the compressed form from in zram quickly.
Link: https://lkml.kernel.org/r/YjTVVxIAsnKAXjTd@google.com
Fixes: 0bcac06f27d7 ("mm, swap: skip swapcache for swapin of synchronous device")
Reported-by: Ivan Babrou <ivan@cloudflare.com>
Tested-by: Ivan Babrou <ivan@cloudflare.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: David Hildenbrand <david@redhat.com>
Cc: <stable@vger.kernel.org> [4.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
mm/page_io.c | 54 ----------------------------------------------------
1 file changed, 54 deletions(-)
diff --git a/mm/page_io.c b/mm/page_io.c
index bcf27d057253..f0e3f2be7b44 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -69,54 +69,6 @@ void end_swap_bio_write(struct bio *bio)
bio_put(bio);
}
-static void swap_slot_free_notify(struct page *page)
-{
- struct swap_info_struct *sis;
- struct gendisk *disk;
- swp_entry_t entry;
-
- /*
- * There is no guarantee that the page is in swap cache - the software
- * suspend code (at least) uses end_swap_bio_read() against a non-
- * swapcache page. So we must check PG_swapcache before proceeding with
- * this optimization.
- */
- if (unlikely(!PageSwapCache(page)))
- return;
-
- sis = page_swap_info(page);
- if (!(sis->flags & SWP_BLKDEV))
- return;
-
- /*
- * The swap subsystem performs lazy swap slot freeing,
- * expecting that the page will be swapped out again.
- * So we can avoid an unnecessary write if the page
- * isn't redirtied.
- * This is good for real swap storage because we can
- * reduce unnecessary I/O and enhance wear-leveling
- * if an SSD is used as the as swap device.
- * But if in-memory swap device (eg zram) is used,
- * this causes a duplicated copy between uncompressed
- * data in VM-owned memory and compressed data in
- * zram-owned memory. So let's free zram-owned memory
- * and make the VM-owned decompressed page *dirty*,
- * so the page should be swapped out somewhere again if
- * we again wish to reclaim it.
- */
- disk = sis->bdev->bd_disk;
- entry.val = page_private(page);
- if (disk->fops->swap_slot_free_notify && __swap_count(entry) == 1) {
- unsigned long offset;
-
- offset = swp_offset(entry);
-
- SetPageDirty(page);
- disk->fops->swap_slot_free_notify(sis->bdev,
- offset);
- }
-}
-
static void end_swap_bio_read(struct bio *bio)
{
struct page *page = bio_first_page_all(bio);
@@ -132,7 +84,6 @@ static void end_swap_bio_read(struct bio *bio)
}
SetPageUptodate(page);
- swap_slot_free_notify(page);
out:
unlock_page(page);
WRITE_ONCE(bio->bi_private, NULL);
@@ -371,11 +322,6 @@ int swap_readpage(struct page *page, bool synchronous)
ret = bdev_read_page(sis->bdev, swap_page_sector(page), page);
if (!ret) {
- if (trylock_page(page)) {
- swap_slot_free_notify(page);
- unlock_page(page);
- }
-
count_vm_event(PSWPIN);
return 0;
}
--
2.36.0.512.ge40c2bad7a-goog
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: FAILED: patch "[PATCH] mm: fix unexpected zeroed page mapping with zram swap" failed to apply to 5.4-stable tree
2022-05-05 16:27 ` Minchan Kim
@ 2022-05-10 11:31 ` Greg KH
0 siblings, 0 replies; 3+ messages in thread
From: Greg KH @ 2022-05-10 11:31 UTC (permalink / raw)
To: Minchan Kim
Cc: akpm, axboe, david, ivan, ngupta, senozhatsky, stable, torvalds
On Thu, May 05, 2022 at 09:27:49AM -0700, Minchan Kim wrote:
> On Mon, Apr 18, 2022 at 10:05:15AM +0200, gregkh@linuxfoundation.org wrote:
> >
> > The patch below does not apply to the 5.4-stable tree.
> > If someone wants it applied there, or to any other stable or longterm
> > tree, then please email the backport, including the original git commit
> > id to <stable@vger.kernel.org>.
> >
> > thanks,
> >
> > greg k-h
>
> < snip >
>
> This is backport for 5.4-stable.
>
> >From 587a2b370d803766fd57778cdd1f05511658a246 Mon Sep 17 00:00:00 2001
> From: Minchan Kim <minchan@kernel.org>
> Date: Thu, 14 Apr 2022 19:13:46 -0700
> Subject: [PATCH] mm: fix unexpected zeroed page mapping with zram swap
>
> e914d8f00391520ecc4495dd0ca0124538ab7119 upstream.
>
> Two processes under CLONE_VM cloning, user process can be corrupted by
> seeing zeroed page unexpectedly.
>
> CPU A CPU B
>
> do_swap_page do_swap_page
> SWP_SYNCHRONOUS_IO path SWP_SYNCHRONOUS_IO path
> swap_readpage valid data
> swap_slot_free_notify
> delete zram entry
> swap_readpage zeroed(invalid) data
> pte_lock
> map the *zero data* to userspace
> pte_unlock
> pte_lock
> if (!pte_same)
> goto out_nomap;
> pte_unlock
> return and next refault will
> read zeroed data
>
> The swap_slot_free_notify is bogus for CLONE_VM case since it doesn't
> increase the refcount of swap slot at copy_mm so it couldn't catch up
> whether it's safe or not to discard data from backing device. In the
> case, only the lock it could rely on to synchronize swap slot freeing is
> page table lock. Thus, this patch gets rid of the swap_slot_free_notify
> function. With this patch, CPU A will see correct data.
>
> CPU A CPU B
>
> do_swap_page do_swap_page
> SWP_SYNCHRONOUS_IO path SWP_SYNCHRONOUS_IO path
> swap_readpage original data
> pte_lock
> map the original data
> swap_free
> swap_range_free
> bd_disk->fops->swap_slot_free_notify
> swap_readpage read zeroed data
> pte_unlock
> pte_lock
> if (!pte_same)
> goto out_nomap;
> pte_unlock
> return
> on next refault will see mapped data by CPU B
>
> The concern of the patch would increase memory consumption since it
> could keep wasted memory with compressed form in zram as well as
> uncompressed form in address space. However, most of cases of zram uses
> no readahead and do_swap_page is followed by swap_free so it will free
> the compressed form from in zram quickly.
>
> Link: https://lkml.kernel.org/r/YjTVVxIAsnKAXjTd@google.com
> Fixes: 0bcac06f27d7 ("mm, swap: skip swapcache for swapin of synchronous device")
> Reported-by: Ivan Babrou <ivan@cloudflare.com>
> Tested-by: Ivan Babrou <ivan@cloudflare.com>
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> Cc: Nitin Gupta <ngupta@vflare.org>
> Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: <stable@vger.kernel.org> [4.14+]
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> ---
> mm/page_io.c | 54 ----------------------------------------------------
> 1 file changed, 54 deletions(-)
>
> diff --git a/mm/page_io.c b/mm/page_io.c
> index bcf27d057253..f0e3f2be7b44 100644
> --- a/mm/page_io.c
> +++ b/mm/page_io.c
> @@ -69,54 +69,6 @@ void end_swap_bio_write(struct bio *bio)
> bio_put(bio);
> }
>
> -static void swap_slot_free_notify(struct page *page)
> -{
> - struct swap_info_struct *sis;
> - struct gendisk *disk;
> - swp_entry_t entry;
> -
> - /*
> - * There is no guarantee that the page is in swap cache - the software
> - * suspend code (at least) uses end_swap_bio_read() against a non-
> - * swapcache page. So we must check PG_swapcache before proceeding with
> - * this optimization.
> - */
> - if (unlikely(!PageSwapCache(page)))
> - return;
> -
> - sis = page_swap_info(page);
> - if (!(sis->flags & SWP_BLKDEV))
> - return;
> -
> - /*
> - * The swap subsystem performs lazy swap slot freeing,
> - * expecting that the page will be swapped out again.
> - * So we can avoid an unnecessary write if the page
> - * isn't redirtied.
> - * This is good for real swap storage because we can
> - * reduce unnecessary I/O and enhance wear-leveling
> - * if an SSD is used as the as swap device.
> - * But if in-memory swap device (eg zram) is used,
> - * this causes a duplicated copy between uncompressed
> - * data in VM-owned memory and compressed data in
> - * zram-owned memory. So let's free zram-owned memory
> - * and make the VM-owned decompressed page *dirty*,
> - * so the page should be swapped out somewhere again if
> - * we again wish to reclaim it.
> - */
> - disk = sis->bdev->bd_disk;
> - entry.val = page_private(page);
> - if (disk->fops->swap_slot_free_notify && __swap_count(entry) == 1) {
> - unsigned long offset;
> -
> - offset = swp_offset(entry);
> -
> - SetPageDirty(page);
> - disk->fops->swap_slot_free_notify(sis->bdev,
> - offset);
> - }
> -}
> -
> static void end_swap_bio_read(struct bio *bio)
> {
> struct page *page = bio_first_page_all(bio);
> @@ -132,7 +84,6 @@ static void end_swap_bio_read(struct bio *bio)
> }
>
> SetPageUptodate(page);
> - swap_slot_free_notify(page);
> out:
> unlock_page(page);
> WRITE_ONCE(bio->bi_private, NULL);
> @@ -371,11 +322,6 @@ int swap_readpage(struct page *page, bool synchronous)
>
> ret = bdev_read_page(sis->bdev, swap_page_sector(page), page);
> if (!ret) {
> - if (trylock_page(page)) {
> - swap_slot_free_notify(page);
> - unlock_page(page);
> - }
> -
> count_vm_event(PSWPIN);
> return 0;
> }
> --
> 2.36.0.512.ge40c2bad7a-goog
>
Both now queued up, thanks.
greg k-h
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2022-05-10 11:31 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-18 8:05 FAILED: patch "[PATCH] mm: fix unexpected zeroed page mapping with zram swap" failed to apply to 5.4-stable tree gregkh
2022-05-05 16:27 ` Minchan Kim
2022-05-10 11:31 ` Greg KH
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).