* [PATCH] mm/swap: consider max pages in iomap_swapfile_add_extent
@ 2021-08-18 11:36 Xu Yu
2021-08-18 17:01 ` Darrick J. Wong
2021-08-19 1:43 ` [PATCH] generic: add swapfile maxpages regression test Darrick J. Wong
0 siblings, 2 replies; 8+ messages in thread
From: Xu Yu @ 2021-08-18 11:36 UTC (permalink / raw)
To: linux-fsdevel, linux-mm; +Cc: hch, djwong, riteshh, tytso, gavin.dg
When the max pages (last_page in the swap header + 1) is smaller than
the total pages (inode size) of the swapfile, iomap_swapfile_activate
overwrites sis->max with total pages.
However, frontswap_map is allocated using max pages, but cleared using
sis->max in __frontswap_invalidate_area(). The consequence is that the
neighbors of frontswap_map bitmap may be overwritten, and then slab is
polluted.
This fixes the issue by considering the limitation of max pages of swap
info in iomap_swapfile_add_extent().
To reproduce the case, compile kernel with slub RED ZONE, then run test:
$ sudo stress-ng -a 1 -x softlockup,resources -t 72h --metrics --times \
--verify -v -Y /root/tmpdir/stress-ng/stress-statistic-12.yaml \
--log-file /root/tmpdir/stress-ng/stress-logfile-12.txt \
--temp-path /root/tmpdir/stress-ng/
We'll get the error log as below:
[ 1151.015141] =============================================================================
[ 1151.016489] BUG kmalloc-16 (Not tainted): Right Redzone overwritten
[ 1151.017486] -----------------------------------------------------------------------------
[ 1151.017486]
[ 1151.018997] Disabling lock debugging due to kernel taint
[ 1151.019873] INFO: 0x0000000084e43932-0x0000000098d17cae @offset=7392. First byte 0x0 instead of 0xcc
[ 1151.021303] INFO: Allocated in __do_sys_swapon+0xcf6/0x1170 age=43417 cpu=9 pid=3816
[ 1151.022538] __slab_alloc+0xe/0x20
[ 1151.023069] __kmalloc_node+0xfd/0x4b0
[ 1151.023704] __do_sys_swapon+0xcf6/0x1170
[ 1151.024346] do_syscall_64+0x33/0x40
[ 1151.024925] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1151.025749] INFO: Freed in put_cred_rcu+0xa1/0xc0 age=43424 cpu=3 pid=2041
[ 1151.026889] kfree+0x276/0x2b0
[ 1151.027405] put_cred_rcu+0xa1/0xc0
[ 1151.027949] rcu_do_batch+0x17d/0x410
[ 1151.028566] rcu_core+0x14e/0x2b0
[ 1151.029084] __do_softirq+0x101/0x29e
[ 1151.029645] asm_call_irq_on_stack+0x12/0x20
[ 1151.030381] do_softirq_own_stack+0x37/0x40
[ 1151.031037] do_softirq.part.15+0x2b/0x30
[ 1151.031710] __local_bh_enable_ip+0x4b/0x50
[ 1151.032412] copy_fpstate_to_sigframe+0x111/0x360
[ 1151.033197] __setup_rt_frame+0xce/0x480
[ 1151.033809] arch_do_signal+0x1a3/0x250
[ 1151.034463] exit_to_user_mode_prepare+0xcf/0x110
[ 1151.035242] syscall_exit_to_user_mode+0x27/0x190
[ 1151.035970] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1151.036795] INFO: Slab 0x000000003b9de4dc objects=44 used=9 fp=0x00000000539e349e flags=0xfffffc0010201
[ 1151.038323] INFO: Object 0x000000004855ba01 @offset=7376 fp=0x0000000000000000
[ 1151.038323]
[ 1151.039683] Redzone 000000008d0afd3d: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................
[ 1151.041180] Object 000000004855ba01: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[ 1151.042714] Redzone 0000000084e43932: 00 00 00 c0 cc cc cc cc ........
[ 1151.044120] Padding 000000000864c042: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
[ 1151.045615] CPU: 5 PID: 3816 Comm: stress-ng Tainted: G B 5.10.50+ #7
[ 1151.046846] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 1151.048633] Call Trace:
[ 1151.049072] dump_stack+0x57/0x6a
[ 1151.049585] check_bytes_and_report+0xed/0x110
[ 1151.050320] check_object+0x1eb/0x290
[ 1151.050924] ? __x64_sys_swapoff+0x39a/0x540
[ 1151.051646] free_debug_processing+0x151/0x350
[ 1151.052333] __slab_free+0x21a/0x3a0
[ 1151.052938] ? _cond_resched+0x2d/0x40
[ 1151.053529] ? __vunmap+0x1de/0x220
[ 1151.054139] ? __x64_sys_swapoff+0x39a/0x540
[ 1151.054796] ? kfree+0x276/0x2b0
[ 1151.055307] kfree+0x276/0x2b0
[ 1151.055832] __x64_sys_swapoff+0x39a/0x540
[ 1151.056466] do_syscall_64+0x33/0x40
[ 1151.057084] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1151.057866] RIP: 0033:0x150340b0ffb7
[ 1151.058481] Code: Unable to access opcode bytes at RIP 0x150340b0ff8d.
[ 1151.059537] RSP: 002b:00007fff7f4ee238 EFLAGS: 00000246 ORIG_RAX: 00000000000000a8
[ 1151.060768] RAX: ffffffffffffffda RBX: 00007fff7f4ee66c RCX: 0000150340b0ffb7
[ 1151.061904] RDX: 000000000000000a RSI: 0000000000018094 RDI: 00007fff7f4ee860
[ 1151.063033] RBP: 00007fff7f4ef980 R08: 0000000000000000 R09: 0000150340a672bd
[ 1151.064135] R10: 00007fff7f4edca0 R11: 0000000000000246 R12: 0000000000018094
[ 1151.065253] R13: 0000000000000005 R14: 000000000160d930 R15: 00007fff7f4ee66c
[ 1151.066413] FIX kmalloc-16: Restoring 0x0000000084e43932-0x0000000098d17cae=0xcc
[ 1151.066413]
[ 1151.067890] FIX kmalloc-16: Object at 0x000000004855ba01 not freed
Fixes: 0e6895ba00b7 ("ext4: implement swap_activate aops using iomap")
Signed-off-by: Gang Deng <gavin.dg@linux.alibaba.com>
Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
---
fs/iomap/swapfile.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/fs/iomap/swapfile.c b/fs/iomap/swapfile.c
index 6250ca6a1f85..4ecf4e1f68ef 100644
--- a/fs/iomap/swapfile.c
+++ b/fs/iomap/swapfile.c
@@ -31,11 +31,16 @@ static int iomap_swapfile_add_extent(struct iomap_swapfile_info *isi)
{
struct iomap *iomap = &isi->iomap;
unsigned long nr_pages;
+ unsigned long max_pages;
uint64_t first_ppage;
uint64_t first_ppage_reported;
uint64_t next_ppage;
int error;
+ if (unlikely(isi->nr_pages >= isi->sis->max))
+ return 0;
+ max_pages = isi->sis->max - isi->nr_pages;
+
/*
* Round the start up and the end down so that the physical
* extent aligns to a page boundary.
@@ -48,6 +53,7 @@ static int iomap_swapfile_add_extent(struct iomap_swapfile_info *isi)
if (first_ppage >= next_ppage)
return 0;
nr_pages = next_ppage - first_ppage;
+ nr_pages = min(nr_pages, max_pages);
/*
* Calculate how much swap space we're adding; the first page contains
--
2.20.1.2432.ga663e714
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] mm/swap: consider max pages in iomap_swapfile_add_extent
2021-08-18 11:36 [PATCH] mm/swap: consider max pages in iomap_swapfile_add_extent Xu Yu
@ 2021-08-18 17:01 ` Darrick J. Wong
2021-08-19 0:59 ` Yu Xu
2021-08-19 1:43 ` [PATCH] generic: add swapfile maxpages regression test Darrick J. Wong
1 sibling, 1 reply; 8+ messages in thread
From: Darrick J. Wong @ 2021-08-18 17:01 UTC (permalink / raw)
To: Xu Yu; +Cc: linux-fsdevel, linux-mm, hch, riteshh, tytso, gavin.dg
On Wed, Aug 18, 2021 at 07:36:51PM +0800, Xu Yu wrote:
> When the max pages (last_page in the swap header + 1) is smaller than
> the total pages (inode size) of the swapfile, iomap_swapfile_activate
> overwrites sis->max with total pages.
>
> However, frontswap_map is allocated using max pages, but cleared using
> sis->max in __frontswap_invalidate_area(). The consequence is that the
> neighbors of frontswap_map bitmap may be overwritten, and then slab is
> polluted.
>
> This fixes the issue by considering the limitation of max pages of swap
> info in iomap_swapfile_add_extent().
>
> To reproduce the case, compile kernel with slub RED ZONE, then run test:
> $ sudo stress-ng -a 1 -x softlockup,resources -t 72h --metrics --times \
> --verify -v -Y /root/tmpdir/stress-ng/stress-statistic-12.yaml \
> --log-file /root/tmpdir/stress-ng/stress-logfile-12.txt \
> --temp-path /root/tmpdir/stress-ng/
72 hours? That's not really a targeted reproducer test.
Oh, very interesting! maxpages comes from the swap header.
# fallocate -l 100m /mnt/a
# mkswap /mnt/a
mkswap: /mnt/a: insecure permissions 0644, 0600 suggested.
Setting up swapspace version 1, size = 100 MiB (104853504 bytes)
no label, UUID=c6c93d94-3b91-47cc-9684-9c0edbd04759
# swapon /mnt/a
swapon: /mnt/a: insecure permissions 0644, 0600 suggested.
[ 68.267848] Adding 102396k swap on /mnt/a. Priority:-2 extents:1 across:102396k SS
Ok, ~100M of swap...
# swapoff /mnt/a
# fallocate -l 200m /mnt/a
# swapon /mnt/a
swapon: /mnt/a: insecure permissions 0644, 0600 suggested.
[ 84.458713] Adding 204796k swap on /mnt/a. Priority:-2 extents:1 across:204796k SS
Huh. 200MB of swap, even though we haven't touched the swap header:
# dd if=/mnt/a bs=4096 count=1 | od -tx1 -Ad -c
0000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
\0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
*
0001024 01 00 00 00 ff 63 00 00 00 00 00 00 66 40 eb 42
001 \0 \0 \0 377 c \0 \0 \0 \0 \0 \0 f @ 353 B
Version 1 swap header, 0x63ff pages (~102396kb), 0 bad pages...
0001040 8e 81 4e 4e 92 a1 f9 98 d9 43 2f cc 00 00 00 00
216 201 N N 222 241 371 230 331 C / 314 \0 \0 \0 \0
0001056 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
\0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
*
0004080 00 00 00 00 00 00 53 57 41 50 53 50 41 43 45 32
\0 \0 \0 \0 \0 \0 S W A P S P A C E 2
0004096
So yes, iomap_swapfile_* is incorrect. Even better, we have a targeted
way to test this now. :)
> We'll get the error log as below:
>
> [ 1151.015141] =============================================================================
> [ 1151.016489] BUG kmalloc-16 (Not tainted): Right Redzone overwritten
> [ 1151.017486] -----------------------------------------------------------------------------
> [ 1151.017486]
> [ 1151.018997] Disabling lock debugging due to kernel taint
> [ 1151.019873] INFO: 0x0000000084e43932-0x0000000098d17cae @offset=7392. First byte 0x0 instead of 0xcc
> [ 1151.021303] INFO: Allocated in __do_sys_swapon+0xcf6/0x1170 age=43417 cpu=9 pid=3816
> [ 1151.022538] __slab_alloc+0xe/0x20
> [ 1151.023069] __kmalloc_node+0xfd/0x4b0
> [ 1151.023704] __do_sys_swapon+0xcf6/0x1170
> [ 1151.024346] do_syscall_64+0x33/0x40
> [ 1151.024925] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 1151.025749] INFO: Freed in put_cred_rcu+0xa1/0xc0 age=43424 cpu=3 pid=2041
> [ 1151.026889] kfree+0x276/0x2b0
> [ 1151.027405] put_cred_rcu+0xa1/0xc0
> [ 1151.027949] rcu_do_batch+0x17d/0x410
> [ 1151.028566] rcu_core+0x14e/0x2b0
> [ 1151.029084] __do_softirq+0x101/0x29e
> [ 1151.029645] asm_call_irq_on_stack+0x12/0x20
> [ 1151.030381] do_softirq_own_stack+0x37/0x40
> [ 1151.031037] do_softirq.part.15+0x2b/0x30
> [ 1151.031710] __local_bh_enable_ip+0x4b/0x50
> [ 1151.032412] copy_fpstate_to_sigframe+0x111/0x360
> [ 1151.033197] __setup_rt_frame+0xce/0x480
> [ 1151.033809] arch_do_signal+0x1a3/0x250
> [ 1151.034463] exit_to_user_mode_prepare+0xcf/0x110
> [ 1151.035242] syscall_exit_to_user_mode+0x27/0x190
> [ 1151.035970] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 1151.036795] INFO: Slab 0x000000003b9de4dc objects=44 used=9 fp=0x00000000539e349e flags=0xfffffc0010201
> [ 1151.038323] INFO: Object 0x000000004855ba01 @offset=7376 fp=0x0000000000000000
> [ 1151.038323]
> [ 1151.039683] Redzone 000000008d0afd3d: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................
> [ 1151.041180] Object 000000004855ba01: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> [ 1151.042714] Redzone 0000000084e43932: 00 00 00 c0 cc cc cc cc ........
> [ 1151.044120] Padding 000000000864c042: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
> [ 1151.045615] CPU: 5 PID: 3816 Comm: stress-ng Tainted: G B 5.10.50+ #7
> [ 1151.046846] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
> [ 1151.048633] Call Trace:
> [ 1151.049072] dump_stack+0x57/0x6a
> [ 1151.049585] check_bytes_and_report+0xed/0x110
> [ 1151.050320] check_object+0x1eb/0x290
> [ 1151.050924] ? __x64_sys_swapoff+0x39a/0x540
> [ 1151.051646] free_debug_processing+0x151/0x350
> [ 1151.052333] __slab_free+0x21a/0x3a0
> [ 1151.052938] ? _cond_resched+0x2d/0x40
> [ 1151.053529] ? __vunmap+0x1de/0x220
> [ 1151.054139] ? __x64_sys_swapoff+0x39a/0x540
> [ 1151.054796] ? kfree+0x276/0x2b0
> [ 1151.055307] kfree+0x276/0x2b0
> [ 1151.055832] __x64_sys_swapoff+0x39a/0x540
> [ 1151.056466] do_syscall_64+0x33/0x40
> [ 1151.057084] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 1151.057866] RIP: 0033:0x150340b0ffb7
> [ 1151.058481] Code: Unable to access opcode bytes at RIP 0x150340b0ff8d.
> [ 1151.059537] RSP: 002b:00007fff7f4ee238 EFLAGS: 00000246 ORIG_RAX: 00000000000000a8
> [ 1151.060768] RAX: ffffffffffffffda RBX: 00007fff7f4ee66c RCX: 0000150340b0ffb7
> [ 1151.061904] RDX: 000000000000000a RSI: 0000000000018094 RDI: 00007fff7f4ee860
> [ 1151.063033] RBP: 00007fff7f4ef980 R08: 0000000000000000 R09: 0000150340a672bd
> [ 1151.064135] R10: 00007fff7f4edca0 R11: 0000000000000246 R12: 0000000000018094
> [ 1151.065253] R13: 0000000000000005 R14: 000000000160d930 R15: 00007fff7f4ee66c
> [ 1151.066413] FIX kmalloc-16: Restoring 0x0000000084e43932-0x0000000098d17cae=0xcc
> [ 1151.066413]
> [ 1151.067890] FIX kmalloc-16: Object at 0x000000004855ba01 not freed
>
> Fixes: 0e6895ba00b7 ("ext4: implement swap_activate aops using iomap")
> Signed-off-by: Gang Deng <gavin.dg@linux.alibaba.com>
> Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
Looks good; I'll cc you both when I send out the regression test.
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
--D
> ---
> fs/iomap/swapfile.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/fs/iomap/swapfile.c b/fs/iomap/swapfile.c
> index 6250ca6a1f85..4ecf4e1f68ef 100644
> --- a/fs/iomap/swapfile.c
> +++ b/fs/iomap/swapfile.c
> @@ -31,11 +31,16 @@ static int iomap_swapfile_add_extent(struct iomap_swapfile_info *isi)
> {
> struct iomap *iomap = &isi->iomap;
> unsigned long nr_pages;
> + unsigned long max_pages;
> uint64_t first_ppage;
> uint64_t first_ppage_reported;
> uint64_t next_ppage;
> int error;
>
> + if (unlikely(isi->nr_pages >= isi->sis->max))
> + return 0;
> + max_pages = isi->sis->max - isi->nr_pages;
> +
> /*
> * Round the start up and the end down so that the physical
> * extent aligns to a page boundary.
> @@ -48,6 +53,7 @@ static int iomap_swapfile_add_extent(struct iomap_swapfile_info *isi)
> if (first_ppage >= next_ppage)
> return 0;
> nr_pages = next_ppage - first_ppage;
> + nr_pages = min(nr_pages, max_pages);
>
> /*
> * Calculate how much swap space we're adding; the first page contains
> --
> 2.20.1.2432.ga663e714
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] mm/swap: consider max pages in iomap_swapfile_add_extent
2021-08-18 17:01 ` Darrick J. Wong
@ 2021-08-19 0:59 ` Yu Xu
2021-08-19 1:45 ` Darrick J. Wong
0 siblings, 1 reply; 8+ messages in thread
From: Yu Xu @ 2021-08-19 0:59 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: linux-fsdevel, linux-mm, hch, riteshh, tytso, gavin.dg
On 8/19/21 1:01 AM, Darrick J. Wong wrote:
> On Wed, Aug 18, 2021 at 07:36:51PM +0800, Xu Yu wrote:
>> When the max pages (last_page in the swap header + 1) is smaller than
>> the total pages (inode size) of the swapfile, iomap_swapfile_activate
>> overwrites sis->max with total pages.
>>
>> However, frontswap_map is allocated using max pages, but cleared using
>> sis->max in __frontswap_invalidate_area(). The consequence is that the
>> neighbors of frontswap_map bitmap may be overwritten, and then slab is
>> polluted.
>>
>> This fixes the issue by considering the limitation of max pages of swap
>> info in iomap_swapfile_add_extent().
>>
>> To reproduce the case, compile kernel with slub RED ZONE, then run test:
>> $ sudo stress-ng -a 1 -x softlockup,resources -t 72h --metrics --times \
>> --verify -v -Y /root/tmpdir/stress-ng/stress-statistic-12.yaml \
>> --log-file /root/tmpdir/stress-ng/stress-logfile-12.txt \
>> --temp-path /root/tmpdir/stress-ng/
Hi, Darrick J. Wong,
This is the stress-ng test script from our nightly test, I just copied
as it is.
>
> 72 hours? That's not really a targeted reproducer test.
Actually, the system crashes soon, in less than 5 minutes on average.
>
> Oh, very interesting! maxpages comes from the swap header.
>
> # fallocate -l 100m /mnt/a
> # mkswap /mnt/a
> mkswap: /mnt/a: insecure permissions 0644, 0600 suggested.
> Setting up swapspace version 1, size = 100 MiB (104853504 bytes)
> no label, UUID=c6c93d94-3b91-47cc-9684-9c0edbd04759
> # swapon /mnt/a
> swapon: /mnt/a: insecure permissions 0644, 0600 suggested.
> [ 68.267848] Adding 102396k swap on /mnt/a. Priority:-2 extents:1 across:102396k SS
>
>
> Ok, ~100M of swap...
>
> # swapoff /mnt/a
> # fallocate -l 200m /mnt/a
> # swapon /mnt/a
> swapon: /mnt/a: insecure permissions 0644, 0600 suggested.
> [ 84.458713] Adding 204796k swap on /mnt/a. Priority:-2 extents:1 across:204796k SS
>
> Huh. 200MB of swap, even though we haven't touched the swap header:
>
> # dd if=/mnt/a bs=4096 count=1 | od -tx1 -Ad -c
> 0000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
> *
> 0001024 01 00 00 00 ff 63 00 00 00 00 00 00 66 40 eb 42
> 001 \0 \0 \0 377 c \0 \0 \0 \0 \0 \0 f @ 353 B
>
> Version 1 swap header, 0x63ff pages (~102396kb), 0 bad pages...
>
> 0001040 8e 81 4e 4e 92 a1 f9 98 d9 43 2f cc 00 00 00 00
> 216 201 N N 222 241 371 230 331 C / 314 \0 \0 \0 \0
> 0001056 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
> *
> 0004080 00 00 00 00 00 00 53 57 41 50 53 50 41 43 45 32
> \0 \0 \0 \0 \0 \0 S W A P S P A C E 2
> 0004096
>
> So yes, iomap_swapfile_* is incorrect. Even better, we have a targeted
> way to test this now. :)
Yes, this is more straightforward. I used to focus on the slab
corruption too much.
>
>> We'll get the error log as below:
>>
>> [ 1151.015141] =============================================================================
>> [ 1151.016489] BUG kmalloc-16 (Not tainted): Right Redzone overwritten
>> [ 1151.017486] -----------------------------------------------------------------------------
>> [ 1151.017486]
>> [ 1151.018997] Disabling lock debugging due to kernel taint
>> [ 1151.019873] INFO: 0x0000000084e43932-0x0000000098d17cae @offset=7392. First byte 0x0 instead of 0xcc
>> [ 1151.021303] INFO: Allocated in __do_sys_swapon+0xcf6/0x1170 age=43417 cpu=9 pid=3816
>> [ 1151.022538] __slab_alloc+0xe/0x20
>> [ 1151.023069] __kmalloc_node+0xfd/0x4b0
>> [ 1151.023704] __do_sys_swapon+0xcf6/0x1170
>> [ 1151.024346] do_syscall_64+0x33/0x40
>> [ 1151.024925] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> [ 1151.025749] INFO: Freed in put_cred_rcu+0xa1/0xc0 age=43424 cpu=3 pid=2041
>> [ 1151.026889] kfree+0x276/0x2b0
>> [ 1151.027405] put_cred_rcu+0xa1/0xc0
>> [ 1151.027949] rcu_do_batch+0x17d/0x410
>> [ 1151.028566] rcu_core+0x14e/0x2b0
>> [ 1151.029084] __do_softirq+0x101/0x29e
>> [ 1151.029645] asm_call_irq_on_stack+0x12/0x20
>> [ 1151.030381] do_softirq_own_stack+0x37/0x40
>> [ 1151.031037] do_softirq.part.15+0x2b/0x30
>> [ 1151.031710] __local_bh_enable_ip+0x4b/0x50
>> [ 1151.032412] copy_fpstate_to_sigframe+0x111/0x360
>> [ 1151.033197] __setup_rt_frame+0xce/0x480
>> [ 1151.033809] arch_do_signal+0x1a3/0x250
>> [ 1151.034463] exit_to_user_mode_prepare+0xcf/0x110
>> [ 1151.035242] syscall_exit_to_user_mode+0x27/0x190
>> [ 1151.035970] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> [ 1151.036795] INFO: Slab 0x000000003b9de4dc objects=44 used=9 fp=0x00000000539e349e flags=0xfffffc0010201
>> [ 1151.038323] INFO: Object 0x000000004855ba01 @offset=7376 fp=0x0000000000000000
>> [ 1151.038323]
>> [ 1151.039683] Redzone 000000008d0afd3d: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................
>> [ 1151.041180] Object 000000004855ba01: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
>> [ 1151.042714] Redzone 0000000084e43932: 00 00 00 c0 cc cc cc cc ........
>> [ 1151.044120] Padding 000000000864c042: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
>> [ 1151.045615] CPU: 5 PID: 3816 Comm: stress-ng Tainted: G B 5.10.50+ #7
>> [ 1151.046846] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
>> [ 1151.048633] Call Trace:
>> [ 1151.049072] dump_stack+0x57/0x6a
>> [ 1151.049585] check_bytes_and_report+0xed/0x110
>> [ 1151.050320] check_object+0x1eb/0x290
>> [ 1151.050924] ? __x64_sys_swapoff+0x39a/0x540
>> [ 1151.051646] free_debug_processing+0x151/0x350
>> [ 1151.052333] __slab_free+0x21a/0x3a0
>> [ 1151.052938] ? _cond_resched+0x2d/0x40
>> [ 1151.053529] ? __vunmap+0x1de/0x220
>> [ 1151.054139] ? __x64_sys_swapoff+0x39a/0x540
>> [ 1151.054796] ? kfree+0x276/0x2b0
>> [ 1151.055307] kfree+0x276/0x2b0
>> [ 1151.055832] __x64_sys_swapoff+0x39a/0x540
>> [ 1151.056466] do_syscall_64+0x33/0x40
>> [ 1151.057084] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> [ 1151.057866] RIP: 0033:0x150340b0ffb7
>> [ 1151.058481] Code: Unable to access opcode bytes at RIP 0x150340b0ff8d.
>> [ 1151.059537] RSP: 002b:00007fff7f4ee238 EFLAGS: 00000246 ORIG_RAX: 00000000000000a8
>> [ 1151.060768] RAX: ffffffffffffffda RBX: 00007fff7f4ee66c RCX: 0000150340b0ffb7
>> [ 1151.061904] RDX: 000000000000000a RSI: 0000000000018094 RDI: 00007fff7f4ee860
>> [ 1151.063033] RBP: 00007fff7f4ef980 R08: 0000000000000000 R09: 0000150340a672bd
>> [ 1151.064135] R10: 00007fff7f4edca0 R11: 0000000000000246 R12: 0000000000018094
>> [ 1151.065253] R13: 0000000000000005 R14: 000000000160d930 R15: 00007fff7f4ee66c
>> [ 1151.066413] FIX kmalloc-16: Restoring 0x0000000084e43932-0x0000000098d17cae=0xcc
>> [ 1151.066413]
>> [ 1151.067890] FIX kmalloc-16: Object at 0x000000004855ba01 not freed
>>
>> Fixes: 0e6895ba00b7 ("ext4: implement swap_activate aops using iomap")
>> Signed-off-by: Gang Deng <gavin.dg@linux.alibaba.com>
>> Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
>
> Looks good; I'll cc you both when I send out the regression test.
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
>
> --D
>
>> ---
>> fs/iomap/swapfile.c | 6 ++++++
>> 1 file changed, 6 insertions(+)
>>
>> diff --git a/fs/iomap/swapfile.c b/fs/iomap/swapfile.c
>> index 6250ca6a1f85..4ecf4e1f68ef 100644
>> --- a/fs/iomap/swapfile.c
>> +++ b/fs/iomap/swapfile.c
>> @@ -31,11 +31,16 @@ static int iomap_swapfile_add_extent(struct iomap_swapfile_info *isi)
>> {
>> struct iomap *iomap = &isi->iomap;
>> unsigned long nr_pages;
>> + unsigned long max_pages;
>> uint64_t first_ppage;
>> uint64_t first_ppage_reported;
>> uint64_t next_ppage;
>> int error;
>>
>> + if (unlikely(isi->nr_pages >= isi->sis->max))
>> + return 0;
>> + max_pages = isi->sis->max - isi->nr_pages;
>> +
>> /*
>> * Round the start up and the end down so that the physical
>> * extent aligns to a page boundary.
>> @@ -48,6 +53,7 @@ static int iomap_swapfile_add_extent(struct iomap_swapfile_info *isi)
>> if (first_ppage >= next_ppage)
>> return 0;
>> nr_pages = next_ppage - first_ppage;
>> + nr_pages = min(nr_pages, max_pages);
>>
>> /*
>> * Calculate how much swap space we're adding; the first page contains
>> --
>> 2.20.1.2432.ga663e714
>>
--
Thanks,
Yu
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH] generic: add swapfile maxpages regression test
2021-08-18 11:36 [PATCH] mm/swap: consider max pages in iomap_swapfile_add_extent Xu Yu
2021-08-18 17:01 ` Darrick J. Wong
@ 2021-08-19 1:43 ` Darrick J. Wong
2021-08-19 5:14 ` Zorro Lang
2021-08-19 8:22 ` Christoph Hellwig
1 sibling, 2 replies; 8+ messages in thread
From: Darrick J. Wong @ 2021-08-19 1:43 UTC (permalink / raw)
To: Xu Yu
Cc: linux-fsdevel, linux-mm, hch, riteshh, tytso, gavin.dg, fstests,
linux-btrfs
From: Darrick J. Wong <djwong@kernel.org>
Add regression test for "mm/swap: consider max pages in
iomap_swapfile_add_extent".
Cc: Gang Deng <gavin.dg@linux.alibaba.com>
Cc: Xu Yu <xuyu@linux.alibaba.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
tests/generic/727 | 62 +++++++++++++++++++++++++++++++++++++++++++++++++
tests/generic/727.out | 2 ++
2 files changed, 64 insertions(+)
create mode 100755 tests/generic/727
create mode 100644 tests/generic/727.out
diff --git a/tests/generic/727 b/tests/generic/727
new file mode 100755
index 00000000..a546ad51
--- /dev/null
+++ b/tests/generic/727
@@ -0,0 +1,62 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2021 Oracle. All Rights Reserved.
+#
+# FS QA Test 727
+#
+# Regression test for "mm/swap: consider max pages in iomap_swapfile_add_extent"
+
+# Xu Yu found that the iomap swapfile activation code failed to constrain
+# itself to activating however many swap pages that the mm asked us for. This
+# is an deviation in behavior from the classic swapfile code. It also leads to
+# kernel memory corruption if the swapfile is cleverly constructed.
+#
+. ./common/preamble
+_begin_fstest auto swap
+
+# Override the default cleanup function.
+_cleanup()
+{
+ cd /
+ rm -f $tmp.*
+ test -n "$swapfile" && swapoff $swapfile &> /dev/null
+}
+
+# real QA test starts here
+_supported_fs generic
+_require_scratch_swapfile
+
+_scratch_mkfs >> $seqres.full
+_scratch_mount >> $seqres.full
+
+# Assuming we're not borrowing a FAT16 partition from Windows 3.1, we need an
+# unlikely enough name that we can grep /proc/swaps for this.
+swapfile=$SCRATCH_MNT/386spart.par
+_format_swapfile $swapfile 1m >> $seqres.full
+
+swapfile_pages() {
+ local swapfile="$1"
+
+ grep "$swapfile" /proc/swaps | awk '{print $3}'
+}
+
+_swapon_file $swapfile
+before_pages=$(swapfile_pages "$swapfile")
+swapoff $swapfile
+
+# Extend the length of the swapfile but do not rewrite the header.
+# The subsequent swapon should set up 1MB worth of pages, not 2MB.
+$XFS_IO_PROG -f -c 'pwrite 1m 1m' $swapfile >> $seqres.full
+
+_swapon_file $swapfile
+after_pages=$(swapfile_pages "$swapfile")
+swapoff $swapfile
+
+# Both swapon attempts should have found the same number of pages.
+test "$before_pages" -eq "$after_pages" || \
+ echo "swapon added $after_pages pages, expected $before_pages"
+
+# success, all done
+echo Silence is golden
+status=0
+exit
diff --git a/tests/generic/727.out b/tests/generic/727.out
new file mode 100644
index 00000000..2de2b4b2
--- /dev/null
+++ b/tests/generic/727.out
@@ -0,0 +1,2 @@
+QA output created by 727
+Silence is golden
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] mm/swap: consider max pages in iomap_swapfile_add_extent
2021-08-19 0:59 ` Yu Xu
@ 2021-08-19 1:45 ` Darrick J. Wong
2021-08-19 6:31 ` Yu Xu
0 siblings, 1 reply; 8+ messages in thread
From: Darrick J. Wong @ 2021-08-19 1:45 UTC (permalink / raw)
To: Yu Xu; +Cc: linux-fsdevel, linux-mm, hch, riteshh, tytso, gavin.dg, linux-btrfs
On Thu, Aug 19, 2021 at 08:59:22AM +0800, Yu Xu wrote:
> On 8/19/21 1:01 AM, Darrick J. Wong wrote:
> > On Wed, Aug 18, 2021 at 07:36:51PM +0800, Xu Yu wrote:
> > > When the max pages (last_page in the swap header + 1) is smaller than
> > > the total pages (inode size) of the swapfile, iomap_swapfile_activate
> > > overwrites sis->max with total pages.
> > >
> > > However, frontswap_map is allocated using max pages, but cleared using
> > > sis->max in __frontswap_invalidate_area(). The consequence is that the
> > > neighbors of frontswap_map bitmap may be overwritten, and then slab is
> > > polluted.
> > >
> > > This fixes the issue by considering the limitation of max pages of swap
> > > info in iomap_swapfile_add_extent().
> > >
> > > To reproduce the case, compile kernel with slub RED ZONE, then run test:
> > > $ sudo stress-ng -a 1 -x softlockup,resources -t 72h --metrics --times \
> > > --verify -v -Y /root/tmpdir/stress-ng/stress-statistic-12.yaml \
> > > --log-file /root/tmpdir/stress-ng/stress-logfile-12.txt \
> > > --temp-path /root/tmpdir/stress-ng/
>
> Hi, Darrick J. Wong,
>
> This is the stress-ng test script from our nightly test, I just copied
> as it is.
>
> >
> > 72 hours? That's not really a targeted reproducer test.
>
> Actually, the system crashes soon, in less than 5 minutes on average.
Hehe.
I'm about to send out a regression fstest that (I think!) reproduces the
validation error. Would you mind running it in your environment to
confirm that it trips the same memory debugger warning?
Also ccing the btrfs list because the regression test fails on btrfs in
the same way that iomap failed. :(
--D
> >
> > Oh, very interesting! maxpages comes from the swap header.
> >
> > # fallocate -l 100m /mnt/a
> > # mkswap /mnt/a
> > mkswap: /mnt/a: insecure permissions 0644, 0600 suggested.
> > Setting up swapspace version 1, size = 100 MiB (104853504 bytes)
> > no label, UUID=c6c93d94-3b91-47cc-9684-9c0edbd04759
> > # swapon /mnt/a
> > swapon: /mnt/a: insecure permissions 0644, 0600 suggested.
> > [ 68.267848] Adding 102396k swap on /mnt/a. Priority:-2 extents:1 across:102396k SS
> >
> >
> > Ok, ~100M of swap...
> >
> > # swapoff /mnt/a
> > # fallocate -l 200m /mnt/a
> > # swapon /mnt/a
> > swapon: /mnt/a: insecure permissions 0644, 0600 suggested.
> > [ 84.458713] Adding 204796k swap on /mnt/a. Priority:-2 extents:1 across:204796k SS
> >
> > Huh. 200MB of swap, even though we haven't touched the swap header:
> >
> > # dd if=/mnt/a bs=4096 count=1 | od -tx1 -Ad -c
> > 0000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
> > *
> > 0001024 01 00 00 00 ff 63 00 00 00 00 00 00 66 40 eb 42
> > 001 \0 \0 \0 377 c \0 \0 \0 \0 \0 \0 f @ 353 B
> >
> > Version 1 swap header, 0x63ff pages (~102396kb), 0 bad pages...
> >
> > 0001040 8e 81 4e 4e 92 a1 f9 98 d9 43 2f cc 00 00 00 00
> > 216 201 N N 222 241 371 230 331 C / 314 \0 \0 \0 \0
> > 0001056 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
> > *
> > 0004080 00 00 00 00 00 00 53 57 41 50 53 50 41 43 45 32
> > \0 \0 \0 \0 \0 \0 S W A P S P A C E 2
> > 0004096
> >
> > So yes, iomap_swapfile_* is incorrect. Even better, we have a targeted
> > way to test this now. :)
>
> Yes, this is more straightforward. I used to focus on the slab
> corruption too much.
>
> >
> > > We'll get the error log as below:
> > >
> > > [ 1151.015141] =============================================================================
> > > [ 1151.016489] BUG kmalloc-16 (Not tainted): Right Redzone overwritten
> > > [ 1151.017486] -----------------------------------------------------------------------------
> > > [ 1151.017486]
> > > [ 1151.018997] Disabling lock debugging due to kernel taint
> > > [ 1151.019873] INFO: 0x0000000084e43932-0x0000000098d17cae @offset=7392. First byte 0x0 instead of 0xcc
> > > [ 1151.021303] INFO: Allocated in __do_sys_swapon+0xcf6/0x1170 age=43417 cpu=9 pid=3816
> > > [ 1151.022538] __slab_alloc+0xe/0x20
> > > [ 1151.023069] __kmalloc_node+0xfd/0x4b0
> > > [ 1151.023704] __do_sys_swapon+0xcf6/0x1170
> > > [ 1151.024346] do_syscall_64+0x33/0x40
> > > [ 1151.024925] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > [ 1151.025749] INFO: Freed in put_cred_rcu+0xa1/0xc0 age=43424 cpu=3 pid=2041
> > > [ 1151.026889] kfree+0x276/0x2b0
> > > [ 1151.027405] put_cred_rcu+0xa1/0xc0
> > > [ 1151.027949] rcu_do_batch+0x17d/0x410
> > > [ 1151.028566] rcu_core+0x14e/0x2b0
> > > [ 1151.029084] __do_softirq+0x101/0x29e
> > > [ 1151.029645] asm_call_irq_on_stack+0x12/0x20
> > > [ 1151.030381] do_softirq_own_stack+0x37/0x40
> > > [ 1151.031037] do_softirq.part.15+0x2b/0x30
> > > [ 1151.031710] __local_bh_enable_ip+0x4b/0x50
> > > [ 1151.032412] copy_fpstate_to_sigframe+0x111/0x360
> > > [ 1151.033197] __setup_rt_frame+0xce/0x480
> > > [ 1151.033809] arch_do_signal+0x1a3/0x250
> > > [ 1151.034463] exit_to_user_mode_prepare+0xcf/0x110
> > > [ 1151.035242] syscall_exit_to_user_mode+0x27/0x190
> > > [ 1151.035970] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > [ 1151.036795] INFO: Slab 0x000000003b9de4dc objects=44 used=9 fp=0x00000000539e349e flags=0xfffffc0010201
> > > [ 1151.038323] INFO: Object 0x000000004855ba01 @offset=7376 fp=0x0000000000000000
> > > [ 1151.038323]
> > > [ 1151.039683] Redzone 000000008d0afd3d: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................
> > > [ 1151.041180] Object 000000004855ba01: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> > > [ 1151.042714] Redzone 0000000084e43932: 00 00 00 c0 cc cc cc cc ........
> > > [ 1151.044120] Padding 000000000864c042: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
> > > [ 1151.045615] CPU: 5 PID: 3816 Comm: stress-ng Tainted: G B 5.10.50+ #7
> > > [ 1151.046846] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
> > > [ 1151.048633] Call Trace:
> > > [ 1151.049072] dump_stack+0x57/0x6a
> > > [ 1151.049585] check_bytes_and_report+0xed/0x110
> > > [ 1151.050320] check_object+0x1eb/0x290
> > > [ 1151.050924] ? __x64_sys_swapoff+0x39a/0x540
> > > [ 1151.051646] free_debug_processing+0x151/0x350
> > > [ 1151.052333] __slab_free+0x21a/0x3a0
> > > [ 1151.052938] ? _cond_resched+0x2d/0x40
> > > [ 1151.053529] ? __vunmap+0x1de/0x220
> > > [ 1151.054139] ? __x64_sys_swapoff+0x39a/0x540
> > > [ 1151.054796] ? kfree+0x276/0x2b0
> > > [ 1151.055307] kfree+0x276/0x2b0
> > > [ 1151.055832] __x64_sys_swapoff+0x39a/0x540
> > > [ 1151.056466] do_syscall_64+0x33/0x40
> > > [ 1151.057084] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > [ 1151.057866] RIP: 0033:0x150340b0ffb7
> > > [ 1151.058481] Code: Unable to access opcode bytes at RIP 0x150340b0ff8d.
> > > [ 1151.059537] RSP: 002b:00007fff7f4ee238 EFLAGS: 00000246 ORIG_RAX: 00000000000000a8
> > > [ 1151.060768] RAX: ffffffffffffffda RBX: 00007fff7f4ee66c RCX: 0000150340b0ffb7
> > > [ 1151.061904] RDX: 000000000000000a RSI: 0000000000018094 RDI: 00007fff7f4ee860
> > > [ 1151.063033] RBP: 00007fff7f4ef980 R08: 0000000000000000 R09: 0000150340a672bd
> > > [ 1151.064135] R10: 00007fff7f4edca0 R11: 0000000000000246 R12: 0000000000018094
> > > [ 1151.065253] R13: 0000000000000005 R14: 000000000160d930 R15: 00007fff7f4ee66c
> > > [ 1151.066413] FIX kmalloc-16: Restoring 0x0000000084e43932-0x0000000098d17cae=0xcc
> > > [ 1151.066413]
> > > [ 1151.067890] FIX kmalloc-16: Object at 0x000000004855ba01 not freed
> > >
> > > Fixes: 0e6895ba00b7 ("ext4: implement swap_activate aops using iomap")
> > > Signed-off-by: Gang Deng <gavin.dg@linux.alibaba.com>
> > > Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
> >
> > Looks good; I'll cc you both when I send out the regression test.
> > Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> >
> > --D
> >
> > > ---
> > > fs/iomap/swapfile.c | 6 ++++++
> > > 1 file changed, 6 insertions(+)
> > >
> > > diff --git a/fs/iomap/swapfile.c b/fs/iomap/swapfile.c
> > > index 6250ca6a1f85..4ecf4e1f68ef 100644
> > > --- a/fs/iomap/swapfile.c
> > > +++ b/fs/iomap/swapfile.c
> > > @@ -31,11 +31,16 @@ static int iomap_swapfile_add_extent(struct iomap_swapfile_info *isi)
> > > {
> > > struct iomap *iomap = &isi->iomap;
> > > unsigned long nr_pages;
> > > + unsigned long max_pages;
> > > uint64_t first_ppage;
> > > uint64_t first_ppage_reported;
> > > uint64_t next_ppage;
> > > int error;
> > > + if (unlikely(isi->nr_pages >= isi->sis->max))
> > > + return 0;
> > > + max_pages = isi->sis->max - isi->nr_pages;
> > > +
> > > /*
> > > * Round the start up and the end down so that the physical
> > > * extent aligns to a page boundary.
> > > @@ -48,6 +53,7 @@ static int iomap_swapfile_add_extent(struct iomap_swapfile_info *isi)
> > > if (first_ppage >= next_ppage)
> > > return 0;
> > > nr_pages = next_ppage - first_ppage;
> > > + nr_pages = min(nr_pages, max_pages);
> > > /*
> > > * Calculate how much swap space we're adding; the first page contains
> > > --
> > > 2.20.1.2432.ga663e714
> > >
>
> --
> Thanks,
> Yu
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] generic: add swapfile maxpages regression test
2021-08-19 1:43 ` [PATCH] generic: add swapfile maxpages regression test Darrick J. Wong
@ 2021-08-19 5:14 ` Zorro Lang
2021-08-19 8:22 ` Christoph Hellwig
1 sibling, 0 replies; 8+ messages in thread
From: Zorro Lang @ 2021-08-19 5:14 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Xu Yu, linux-fsdevel, linux-mm, hch, riteshh, tytso, gavin.dg,
fstests, linux-btrfs
On Wed, Aug 18, 2021 at 06:43:26PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
>
> Add regression test for "mm/swap: consider max pages in
> iomap_swapfile_add_extent".
>
> Cc: Gang Deng <gavin.dg@linux.alibaba.com>
> Cc: Xu Yu <xuyu@linux.alibaba.com>
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
The code logic looks good to me. And [1][2]... so I think this test
is good. But of course, wait for more review points from cc list.
Reviewed-by: Zorro Lang <zlang@redhat.com>
[1]
Test passed on old kernel without this regression (xfs fails):
# ./check generic/727
FSTYP -- ext4
PLATFORM -- Linux/x86_64 xx-xxxx-xx 4.18.0-xxx.el8.x86_64+debug #1 SMP Wed Jul 14 12:35:49 EDT 2021
MKFS_OPTIONS -- /dev/mapper/rhel-xx-xxxx-xx-xfscratch
MOUNT_OPTIONS -- -o acl,user_xattr -o context=system_u:object_r:root_t:s0 /dev/mapper/rhel-xx-xxxx-xx-xfscratch /mnt/scratch
generic/727 15s
Ran: generic/727
Passed all 1 tests
[2]
Reproduced on new kernel with this regression:
# ./check generic/727
FSTYP -- ext4
PLATFORM -- Linux/x86_64 xxx-xxxx-xx 5.14.0-rc4-xfs #14 SMP Thu Aug 12 00:56:07 CST 2021
MKFS_OPTIONS -- /dev/mapper/testvg-scratchdev
MOUNT_OPTIONS -- -o acl,user_xattr -o context=system_u:object_r:root_t:s0 /dev/mapper/testvg-scratchdev /mnt/scratch
generic/727 - output mismatch (see /root/git/xfstests-dev/results//generic/727.out.bad)
--- tests/generic/727.out 2021-08-19 11:20:14.677794743 +0800
+++ /root/git/xfstests-dev/results//generic/727.out.bad 2021-08-19 11:21:46.654450307 +0800
@@ -1,2 +1,3 @@
QA output created by 727
+swapon added 2044 pages, expected 1020
Silence is golden
...
(Run 'diff -u /root/git/xfstests-dev/tests/generic/727.out /root/git/xfstests-dev/results//generic/727.out.bad' to see the entire diff)
Ran: generic/727
Failures: generic/727
Failed 1 of 1 tests
> tests/generic/727 | 62 +++++++++++++++++++++++++++++++++++++++++++++++++
> tests/generic/727.out | 2 ++
> 2 files changed, 64 insertions(+)
> create mode 100755 tests/generic/727
> create mode 100644 tests/generic/727.out
>
> diff --git a/tests/generic/727 b/tests/generic/727
> new file mode 100755
> index 00000000..a546ad51
> --- /dev/null
> +++ b/tests/generic/727
> @@ -0,0 +1,62 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2021 Oracle. All Rights Reserved.
> +#
> +# FS QA Test 727
> +#
> +# Regression test for "mm/swap: consider max pages in iomap_swapfile_add_extent"
> +
> +# Xu Yu found that the iomap swapfile activation code failed to constrain
> +# itself to activating however many swap pages that the mm asked us for. This
> +# is an deviation in behavior from the classic swapfile code. It also leads to
> +# kernel memory corruption if the swapfile is cleverly constructed.
> +#
> +. ./common/preamble
> +_begin_fstest auto swap
> +
> +# Override the default cleanup function.
> +_cleanup()
> +{
> + cd /
> + rm -f $tmp.*
> + test -n "$swapfile" && swapoff $swapfile &> /dev/null
> +}
> +
> +# real QA test starts here
> +_supported_fs generic
> +_require_scratch_swapfile
> +
> +_scratch_mkfs >> $seqres.full
> +_scratch_mount >> $seqres.full
> +
> +# Assuming we're not borrowing a FAT16 partition from Windows 3.1, we need an
> +# unlikely enough name that we can grep /proc/swaps for this.
> +swapfile=$SCRATCH_MNT/386spart.par
> +_format_swapfile $swapfile 1m >> $seqres.full
> +
> +swapfile_pages() {
> + local swapfile="$1"
> +
> + grep "$swapfile" /proc/swaps | awk '{print $3}'
> +}
> +
> +_swapon_file $swapfile
> +before_pages=$(swapfile_pages "$swapfile")
> +swapoff $swapfile
> +
> +# Extend the length of the swapfile but do not rewrite the header.
> +# The subsequent swapon should set up 1MB worth of pages, not 2MB.
> +$XFS_IO_PROG -f -c 'pwrite 1m 1m' $swapfile >> $seqres.full
> +
> +_swapon_file $swapfile
> +after_pages=$(swapfile_pages "$swapfile")
> +swapoff $swapfile
> +
> +# Both swapon attempts should have found the same number of pages.
> +test "$before_pages" -eq "$after_pages" || \
> + echo "swapon added $after_pages pages, expected $before_pages"
> +
> +# success, all done
> +echo Silence is golden
> +status=0
> +exit
> diff --git a/tests/generic/727.out b/tests/generic/727.out
> new file mode 100644
> index 00000000..2de2b4b2
> --- /dev/null
> +++ b/tests/generic/727.out
> @@ -0,0 +1,2 @@
> +QA output created by 727
> +Silence is golden
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] mm/swap: consider max pages in iomap_swapfile_add_extent
2021-08-19 1:45 ` Darrick J. Wong
@ 2021-08-19 6:31 ` Yu Xu
0 siblings, 0 replies; 8+ messages in thread
From: Yu Xu @ 2021-08-19 6:31 UTC (permalink / raw)
To: Darrick J. Wong
Cc: linux-fsdevel, linux-mm, hch, riteshh, tytso, gavin.dg, linux-btrfs
On 8/19/21 9:45 AM, Darrick J. Wong wrote:
> On Thu, Aug 19, 2021 at 08:59:22AM +0800, Yu Xu wrote:
>> On 8/19/21 1:01 AM, Darrick J. Wong wrote:
>>> On Wed, Aug 18, 2021 at 07:36:51PM +0800, Xu Yu wrote:
>>>> When the max pages (last_page in the swap header + 1) is smaller than
>>>> the total pages (inode size) of the swapfile, iomap_swapfile_activate
>>>> overwrites sis->max with total pages.
>>>>
>>>> However, frontswap_map is allocated using max pages, but cleared using
>>>> sis->max in __frontswap_invalidate_area(). The consequence is that the
>>>> neighbors of frontswap_map bitmap may be overwritten, and then slab is
>>>> polluted.
>>>>
>>>> This fixes the issue by considering the limitation of max pages of swap
>>>> info in iomap_swapfile_add_extent().
>>>>
>>>> To reproduce the case, compile kernel with slub RED ZONE, then run test:
>>>> $ sudo stress-ng -a 1 -x softlockup,resources -t 72h --metrics --times \
>>>> --verify -v -Y /root/tmpdir/stress-ng/stress-statistic-12.yaml \
>>>> --log-file /root/tmpdir/stress-ng/stress-logfile-12.txt \
>>>> --temp-path /root/tmpdir/stress-ng/
>>
>> Hi, Darrick J. Wong,
>>
>> This is the stress-ng test script from our nightly test, I just copied
>> as it is.
>>
>>>
>>> 72 hours? That's not really a targeted reproducer test.
>>
>> Actually, the system crashes soon, in less than 5 minutes on average.
>
> Hehe.
>
> I'm about to send out a regression fstest that (I think!) reproduces the
> validation error. Would you mind running it in your environment to
> confirm that it trips the same memory debugger warning?
Hi, Darrick J. Wong,
Things are more complicated than I thought. I thought that slab redzone
overwritten happens at __frontswap_invalidate_area() when swapoff, and
your testcase should easily reproduce the debugger warning.
However, it did not.
Then I looked into the codes, and find that sis->max is reset to 0
before frontswap_invalidate_area(). And the slab redzone overwritten
should happen at __frontswap_clear(sis,offset) in
__frontswap_invalidate_page().
I did not reproduce the debugger warning solely with your testcase,
but your testcase should have the same effect as stress-swap.c in
stress-ng testsuite, i.e., maxpages in swap header != file size.
I think to replace stress-swap with your testcase should have the same
result, but I am not going to dig into it.
To summarize,
1) I think that your testcase makes sense, as Zorro Lang commented.
2) I will send PATCH v2 to correct the commit message.
>
> Also ccing the btrfs list because the regression test fails on btrfs in
> the same way that iomap failed. :(
>
> --D
>
>>>
>>> Oh, very interesting! maxpages comes from the swap header.
>>>
>>> # fallocate -l 100m /mnt/a
>>> # mkswap /mnt/a
>>> mkswap: /mnt/a: insecure permissions 0644, 0600 suggested.
>>> Setting up swapspace version 1, size = 100 MiB (104853504 bytes)
>>> no label, UUID=c6c93d94-3b91-47cc-9684-9c0edbd04759
>>> # swapon /mnt/a
>>> swapon: /mnt/a: insecure permissions 0644, 0600 suggested.
>>> [ 68.267848] Adding 102396k swap on /mnt/a. Priority:-2 extents:1 across:102396k SS
>>>
>>>
>>> Ok, ~100M of swap...
>>>
>>> # swapoff /mnt/a
>>> # fallocate -l 200m /mnt/a
>>> # swapon /mnt/a
>>> swapon: /mnt/a: insecure permissions 0644, 0600 suggested.
>>> [ 84.458713] Adding 204796k swap on /mnt/a. Priority:-2 extents:1 across:204796k SS
>>>
>>> Huh. 200MB of swap, even though we haven't touched the swap header:
>>>
>>> # dd if=/mnt/a bs=4096 count=1 | od -tx1 -Ad -c
>>> 0000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>> \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
>>> *
>>> 0001024 01 00 00 00 ff 63 00 00 00 00 00 00 66 40 eb 42
>>> 001 \0 \0 \0 377 c \0 \0 \0 \0 \0 \0 f @ 353 B
>>>
>>> Version 1 swap header, 0x63ff pages (~102396kb), 0 bad pages...
>>>
>>> 0001040 8e 81 4e 4e 92 a1 f9 98 d9 43 2f cc 00 00 00 00
>>> 216 201 N N 222 241 371 230 331 C / 314 \0 \0 \0 \0
>>> 0001056 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>> \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
>>> *
>>> 0004080 00 00 00 00 00 00 53 57 41 50 53 50 41 43 45 32
>>> \0 \0 \0 \0 \0 \0 S W A P S P A C E 2
>>> 0004096
>>>
>>> So yes, iomap_swapfile_* is incorrect. Even better, we have a targeted
>>> way to test this now. :)
>>
>> Yes, this is more straightforward. I used to focus on the slab
>> corruption too much.
>>
>>>
>>>> We'll get the error log as below:
>>>>
>>>> [ 1151.015141] =============================================================================
>>>> [ 1151.016489] BUG kmalloc-16 (Not tainted): Right Redzone overwritten
>>>> [ 1151.017486] -----------------------------------------------------------------------------
>>>> [ 1151.017486]
>>>> [ 1151.018997] Disabling lock debugging due to kernel taint
>>>> [ 1151.019873] INFO: 0x0000000084e43932-0x0000000098d17cae @offset=7392. First byte 0x0 instead of 0xcc
>>>> [ 1151.021303] INFO: Allocated in __do_sys_swapon+0xcf6/0x1170 age=43417 cpu=9 pid=3816
>>>> [ 1151.022538] __slab_alloc+0xe/0x20
>>>> [ 1151.023069] __kmalloc_node+0xfd/0x4b0
>>>> [ 1151.023704] __do_sys_swapon+0xcf6/0x1170
>>>> [ 1151.024346] do_syscall_64+0x33/0x40
>>>> [ 1151.024925] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>> [ 1151.025749] INFO: Freed in put_cred_rcu+0xa1/0xc0 age=43424 cpu=3 pid=2041
>>>> [ 1151.026889] kfree+0x276/0x2b0
>>>> [ 1151.027405] put_cred_rcu+0xa1/0xc0
>>>> [ 1151.027949] rcu_do_batch+0x17d/0x410
>>>> [ 1151.028566] rcu_core+0x14e/0x2b0
>>>> [ 1151.029084] __do_softirq+0x101/0x29e
>>>> [ 1151.029645] asm_call_irq_on_stack+0x12/0x20
>>>> [ 1151.030381] do_softirq_own_stack+0x37/0x40
>>>> [ 1151.031037] do_softirq.part.15+0x2b/0x30
>>>> [ 1151.031710] __local_bh_enable_ip+0x4b/0x50
>>>> [ 1151.032412] copy_fpstate_to_sigframe+0x111/0x360
>>>> [ 1151.033197] __setup_rt_frame+0xce/0x480
>>>> [ 1151.033809] arch_do_signal+0x1a3/0x250
>>>> [ 1151.034463] exit_to_user_mode_prepare+0xcf/0x110
>>>> [ 1151.035242] syscall_exit_to_user_mode+0x27/0x190
>>>> [ 1151.035970] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>> [ 1151.036795] INFO: Slab 0x000000003b9de4dc objects=44 used=9 fp=0x00000000539e349e flags=0xfffffc0010201
>>>> [ 1151.038323] INFO: Object 0x000000004855ba01 @offset=7376 fp=0x0000000000000000
>>>> [ 1151.038323]
>>>> [ 1151.039683] Redzone 000000008d0afd3d: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................
>>>> [ 1151.041180] Object 000000004855ba01: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
>>>> [ 1151.042714] Redzone 0000000084e43932: 00 00 00 c0 cc cc cc cc ........
>>>> [ 1151.044120] Padding 000000000864c042: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
>>>> [ 1151.045615] CPU: 5 PID: 3816 Comm: stress-ng Tainted: G B 5.10.50+ #7
>>>> [ 1151.046846] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
>>>> [ 1151.048633] Call Trace:
>>>> [ 1151.049072] dump_stack+0x57/0x6a
>>>> [ 1151.049585] check_bytes_and_report+0xed/0x110
>>>> [ 1151.050320] check_object+0x1eb/0x290
>>>> [ 1151.050924] ? __x64_sys_swapoff+0x39a/0x540
>>>> [ 1151.051646] free_debug_processing+0x151/0x350
>>>> [ 1151.052333] __slab_free+0x21a/0x3a0
>>>> [ 1151.052938] ? _cond_resched+0x2d/0x40
>>>> [ 1151.053529] ? __vunmap+0x1de/0x220
>>>> [ 1151.054139] ? __x64_sys_swapoff+0x39a/0x540
>>>> [ 1151.054796] ? kfree+0x276/0x2b0
>>>> [ 1151.055307] kfree+0x276/0x2b0
>>>> [ 1151.055832] __x64_sys_swapoff+0x39a/0x540
>>>> [ 1151.056466] do_syscall_64+0x33/0x40
>>>> [ 1151.057084] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>> [ 1151.057866] RIP: 0033:0x150340b0ffb7
>>>> [ 1151.058481] Code: Unable to access opcode bytes at RIP 0x150340b0ff8d.
>>>> [ 1151.059537] RSP: 002b:00007fff7f4ee238 EFLAGS: 00000246 ORIG_RAX: 00000000000000a8
>>>> [ 1151.060768] RAX: ffffffffffffffda RBX: 00007fff7f4ee66c RCX: 0000150340b0ffb7
>>>> [ 1151.061904] RDX: 000000000000000a RSI: 0000000000018094 RDI: 00007fff7f4ee860
>>>> [ 1151.063033] RBP: 00007fff7f4ef980 R08: 0000000000000000 R09: 0000150340a672bd
>>>> [ 1151.064135] R10: 00007fff7f4edca0 R11: 0000000000000246 R12: 0000000000018094
>>>> [ 1151.065253] R13: 0000000000000005 R14: 000000000160d930 R15: 00007fff7f4ee66c
>>>> [ 1151.066413] FIX kmalloc-16: Restoring 0x0000000084e43932-0x0000000098d17cae=0xcc
>>>> [ 1151.066413]
>>>> [ 1151.067890] FIX kmalloc-16: Object at 0x000000004855ba01 not freed
>>>>
>>>> Fixes: 0e6895ba00b7 ("ext4: implement swap_activate aops using iomap")
>>>> Signed-off-by: Gang Deng <gavin.dg@linux.alibaba.com>
>>>> Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
>>>
>>> Looks good; I'll cc you both when I send out the regression test.
>>> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
>>>
>>> --D
>>>
>>>> ---
>>>> fs/iomap/swapfile.c | 6 ++++++
>>>> 1 file changed, 6 insertions(+)
>>>>
>>>> diff --git a/fs/iomap/swapfile.c b/fs/iomap/swapfile.c
>>>> index 6250ca6a1f85..4ecf4e1f68ef 100644
>>>> --- a/fs/iomap/swapfile.c
>>>> +++ b/fs/iomap/swapfile.c
>>>> @@ -31,11 +31,16 @@ static int iomap_swapfile_add_extent(struct iomap_swapfile_info *isi)
>>>> {
>>>> struct iomap *iomap = &isi->iomap;
>>>> unsigned long nr_pages;
>>>> + unsigned long max_pages;
>>>> uint64_t first_ppage;
>>>> uint64_t first_ppage_reported;
>>>> uint64_t next_ppage;
>>>> int error;
>>>> + if (unlikely(isi->nr_pages >= isi->sis->max))
>>>> + return 0;
>>>> + max_pages = isi->sis->max - isi->nr_pages;
>>>> +
>>>> /*
>>>> * Round the start up and the end down so that the physical
>>>> * extent aligns to a page boundary.
>>>> @@ -48,6 +53,7 @@ static int iomap_swapfile_add_extent(struct iomap_swapfile_info *isi)
>>>> if (first_ppage >= next_ppage)
>>>> return 0;
>>>> nr_pages = next_ppage - first_ppage;
>>>> + nr_pages = min(nr_pages, max_pages);
>>>> /*
>>>> * Calculate how much swap space we're adding; the first page contains
>>>> --
>>>> 2.20.1.2432.ga663e714
>>>>
>>
>> --
>> Thanks,
>> Yu
--
Thanks,
Yu
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] generic: add swapfile maxpages regression test
2021-08-19 1:43 ` [PATCH] generic: add swapfile maxpages regression test Darrick J. Wong
2021-08-19 5:14 ` Zorro Lang
@ 2021-08-19 8:22 ` Christoph Hellwig
1 sibling, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2021-08-19 8:22 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Xu Yu, linux-fsdevel, linux-mm, hch, riteshh, tytso, gavin.dg,
fstests, linux-btrfs
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2021-08-19 8:24 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-18 11:36 [PATCH] mm/swap: consider max pages in iomap_swapfile_add_extent Xu Yu
2021-08-18 17:01 ` Darrick J. Wong
2021-08-19 0:59 ` Yu Xu
2021-08-19 1:45 ` Darrick J. Wong
2021-08-19 6:31 ` Yu Xu
2021-08-19 1:43 ` [PATCH] generic: add swapfile maxpages regression test Darrick J. Wong
2021-08-19 5:14 ` Zorro Lang
2021-08-19 8:22 ` Christoph Hellwig
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.