From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-17.2 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,NICE_REPLY_A,SPF_HELO_NONE,SPF_PASS,UNPARSEABLE_RELAY, USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 61780C4338F for ; Thu, 19 Aug 2021 00:59:33 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id D75E8608FE for ; Thu, 19 Aug 2021 00:59:32 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org D75E8608FE Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id 715396B006C; Wed, 18 Aug 2021 20:59:32 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 6C5758D0001; Wed, 18 Aug 2021 20:59:32 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 5DAE06B0072; Wed, 18 Aug 2021 20:59:32 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 40A536B006C for ; Wed, 18 Aug 2021 20:59:32 -0400 (EDT) Received: from smtpin01.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id BEAEF267D8 for ; Thu, 19 Aug 2021 00:59:31 +0000 (UTC) X-FDA: 78490022142.01.A851043 Received: from out30-45.freemail.mail.aliyun.com (out30-45.freemail.mail.aliyun.com [115.124.30.45]) by imf03.hostedemail.com (Postfix) with ESMTP id 53DD9300E57D for ; Thu, 19 Aug 2021 00:59:28 +0000 (UTC) X-Alimail-AntiSpam:AC=PASS;BC=-1|-1;BR=01201311R201e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04423;MF=xuyu@linux.alibaba.com;NM=1;PH=DS;RN=8;SR=0;TI=SMTPD_---0UjtBNTB_1629334762; Received: from 192.168.0.102(mailfrom:xuyu@linux.alibaba.com fp:SMTPD_---0UjtBNTB_1629334762) by smtp.aliyun-inc.com(127.0.0.1); Thu, 19 Aug 2021 08:59:23 +0800 Subject: Re: [PATCH] mm/swap: consider max pages in iomap_swapfile_add_extent To: "Darrick J. Wong" Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, hch@infradead.org, riteshh@linux.ibm.com, tytso@mit.edu, gavin.dg@linux.alibaba.com References: <20210818170152.GG12664@magnolia> From: Yu Xu Message-ID: Date: Thu, 19 Aug 2021 08:59:22 +0800 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Thunderbird/78.12.0 MIME-Version: 1.0 In-Reply-To: <20210818170152.GG12664@magnolia> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit X-Rspamd-Queue-Id: 53DD9300E57D Authentication-Results: imf03.hostedemail.com; dkim=none; dmarc=pass (policy=none) header.from=alibaba.com; spf=pass (imf03.hostedemail.com: domain of xuyu@linux.alibaba.com designates 115.124.30.45 as permitted sender) smtp.mailfrom=xuyu@linux.alibaba.com X-Rspamd-Server: rspam04 X-Stat-Signature: 9csw8xq7os5t7jbhwo3ny8tmzsso88ui X-HE-Tag: 1629334768-126090 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 8/19/21 1:01 AM, Darrick J. Wong wrote: > On Wed, Aug 18, 2021 at 07:36:51PM +0800, Xu Yu wrote: >> When the max pages (last_page in the swap header + 1) is smaller than >> the total pages (inode size) of the swapfile, iomap_swapfile_activate >> overwrites sis->max with total pages. >> >> However, frontswap_map is allocated using max pages, but cleared using >> sis->max in __frontswap_invalidate_area(). The consequence is that the >> neighbors of frontswap_map bitmap may be overwritten, and then slab is >> polluted. >> >> This fixes the issue by considering the limitation of max pages of swap >> info in iomap_swapfile_add_extent(). >> >> To reproduce the case, compile kernel with slub RED ZONE, then run test: >> $ sudo stress-ng -a 1 -x softlockup,resources -t 72h --metrics --times \ >> --verify -v -Y /root/tmpdir/stress-ng/stress-statistic-12.yaml \ >> --log-file /root/tmpdir/stress-ng/stress-logfile-12.txt \ >> --temp-path /root/tmpdir/stress-ng/ Hi, Darrick J. Wong, This is the stress-ng test script from our nightly test, I just copied as it is. > > 72 hours? That's not really a targeted reproducer test. Actually, the system crashes soon, in less than 5 minutes on average. > > Oh, very interesting! maxpages comes from the swap header. > > # fallocate -l 100m /mnt/a > # mkswap /mnt/a > mkswap: /mnt/a: insecure permissions 0644, 0600 suggested. > Setting up swapspace version 1, size = 100 MiB (104853504 bytes) > no label, UUID=c6c93d94-3b91-47cc-9684-9c0edbd04759 > # swapon /mnt/a > swapon: /mnt/a: insecure permissions 0644, 0600 suggested. > [ 68.267848] Adding 102396k swap on /mnt/a. Priority:-2 extents:1 across:102396k SS > > > Ok, ~100M of swap... > > # swapoff /mnt/a > # fallocate -l 200m /mnt/a > # swapon /mnt/a > swapon: /mnt/a: insecure permissions 0644, 0600 suggested. > [ 84.458713] Adding 204796k swap on /mnt/a. Priority:-2 extents:1 across:204796k SS > > Huh. 200MB of swap, even though we haven't touched the swap header: > > # dd if=/mnt/a bs=4096 count=1 | od -tx1 -Ad -c > 0000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 > * > 0001024 01 00 00 00 ff 63 00 00 00 00 00 00 66 40 eb 42 > 001 \0 \0 \0 377 c \0 \0 \0 \0 \0 \0 f @ 353 B > > Version 1 swap header, 0x63ff pages (~102396kb), 0 bad pages... > > 0001040 8e 81 4e 4e 92 a1 f9 98 d9 43 2f cc 00 00 00 00 > 216 201 N N 222 241 371 230 331 C / 314 \0 \0 \0 \0 > 0001056 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 > * > 0004080 00 00 00 00 00 00 53 57 41 50 53 50 41 43 45 32 > \0 \0 \0 \0 \0 \0 S W A P S P A C E 2 > 0004096 > > So yes, iomap_swapfile_* is incorrect. Even better, we have a targeted > way to test this now. :) Yes, this is more straightforward. I used to focus on the slab corruption too much. > >> We'll get the error log as below: >> >> [ 1151.015141] ============================================================================= >> [ 1151.016489] BUG kmalloc-16 (Not tainted): Right Redzone overwritten >> [ 1151.017486] ----------------------------------------------------------------------------- >> [ 1151.017486] >> [ 1151.018997] Disabling lock debugging due to kernel taint >> [ 1151.019873] INFO: 0x0000000084e43932-0x0000000098d17cae @offset=7392. First byte 0x0 instead of 0xcc >> [ 1151.021303] INFO: Allocated in __do_sys_swapon+0xcf6/0x1170 age=43417 cpu=9 pid=3816 >> [ 1151.022538] __slab_alloc+0xe/0x20 >> [ 1151.023069] __kmalloc_node+0xfd/0x4b0 >> [ 1151.023704] __do_sys_swapon+0xcf6/0x1170 >> [ 1151.024346] do_syscall_64+0x33/0x40 >> [ 1151.024925] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >> [ 1151.025749] INFO: Freed in put_cred_rcu+0xa1/0xc0 age=43424 cpu=3 pid=2041 >> [ 1151.026889] kfree+0x276/0x2b0 >> [ 1151.027405] put_cred_rcu+0xa1/0xc0 >> [ 1151.027949] rcu_do_batch+0x17d/0x410 >> [ 1151.028566] rcu_core+0x14e/0x2b0 >> [ 1151.029084] __do_softirq+0x101/0x29e >> [ 1151.029645] asm_call_irq_on_stack+0x12/0x20 >> [ 1151.030381] do_softirq_own_stack+0x37/0x40 >> [ 1151.031037] do_softirq.part.15+0x2b/0x30 >> [ 1151.031710] __local_bh_enable_ip+0x4b/0x50 >> [ 1151.032412] copy_fpstate_to_sigframe+0x111/0x360 >> [ 1151.033197] __setup_rt_frame+0xce/0x480 >> [ 1151.033809] arch_do_signal+0x1a3/0x250 >> [ 1151.034463] exit_to_user_mode_prepare+0xcf/0x110 >> [ 1151.035242] syscall_exit_to_user_mode+0x27/0x190 >> [ 1151.035970] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >> [ 1151.036795] INFO: Slab 0x000000003b9de4dc objects=44 used=9 fp=0x00000000539e349e flags=0xfffffc0010201 >> [ 1151.038323] INFO: Object 0x000000004855ba01 @offset=7376 fp=0x0000000000000000 >> [ 1151.038323] >> [ 1151.039683] Redzone 000000008d0afd3d: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................ >> [ 1151.041180] Object 000000004855ba01: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ >> [ 1151.042714] Redzone 0000000084e43932: 00 00 00 c0 cc cc cc cc ........ >> [ 1151.044120] Padding 000000000864c042: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ >> [ 1151.045615] CPU: 5 PID: 3816 Comm: stress-ng Tainted: G B 5.10.50+ #7 >> [ 1151.046846] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 >> [ 1151.048633] Call Trace: >> [ 1151.049072] dump_stack+0x57/0x6a >> [ 1151.049585] check_bytes_and_report+0xed/0x110 >> [ 1151.050320] check_object+0x1eb/0x290 >> [ 1151.050924] ? __x64_sys_swapoff+0x39a/0x540 >> [ 1151.051646] free_debug_processing+0x151/0x350 >> [ 1151.052333] __slab_free+0x21a/0x3a0 >> [ 1151.052938] ? _cond_resched+0x2d/0x40 >> [ 1151.053529] ? __vunmap+0x1de/0x220 >> [ 1151.054139] ? __x64_sys_swapoff+0x39a/0x540 >> [ 1151.054796] ? kfree+0x276/0x2b0 >> [ 1151.055307] kfree+0x276/0x2b0 >> [ 1151.055832] __x64_sys_swapoff+0x39a/0x540 >> [ 1151.056466] do_syscall_64+0x33/0x40 >> [ 1151.057084] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >> [ 1151.057866] RIP: 0033:0x150340b0ffb7 >> [ 1151.058481] Code: Unable to access opcode bytes at RIP 0x150340b0ff8d. >> [ 1151.059537] RSP: 002b:00007fff7f4ee238 EFLAGS: 00000246 ORIG_RAX: 00000000000000a8 >> [ 1151.060768] RAX: ffffffffffffffda RBX: 00007fff7f4ee66c RCX: 0000150340b0ffb7 >> [ 1151.061904] RDX: 000000000000000a RSI: 0000000000018094 RDI: 00007fff7f4ee860 >> [ 1151.063033] RBP: 00007fff7f4ef980 R08: 0000000000000000 R09: 0000150340a672bd >> [ 1151.064135] R10: 00007fff7f4edca0 R11: 0000000000000246 R12: 0000000000018094 >> [ 1151.065253] R13: 0000000000000005 R14: 000000000160d930 R15: 00007fff7f4ee66c >> [ 1151.066413] FIX kmalloc-16: Restoring 0x0000000084e43932-0x0000000098d17cae=0xcc >> [ 1151.066413] >> [ 1151.067890] FIX kmalloc-16: Object at 0x000000004855ba01 not freed >> >> Fixes: 0e6895ba00b7 ("ext4: implement swap_activate aops using iomap") >> Signed-off-by: Gang Deng >> Signed-off-by: Xu Yu > > Looks good; I'll cc you both when I send out the regression test. > Reviewed-by: Darrick J. Wong > > --D > >> --- >> fs/iomap/swapfile.c | 6 ++++++ >> 1 file changed, 6 insertions(+) >> >> diff --git a/fs/iomap/swapfile.c b/fs/iomap/swapfile.c >> index 6250ca6a1f85..4ecf4e1f68ef 100644 >> --- a/fs/iomap/swapfile.c >> +++ b/fs/iomap/swapfile.c >> @@ -31,11 +31,16 @@ static int iomap_swapfile_add_extent(struct iomap_swapfile_info *isi) >> { >> struct iomap *iomap = &isi->iomap; >> unsigned long nr_pages; >> + unsigned long max_pages; >> uint64_t first_ppage; >> uint64_t first_ppage_reported; >> uint64_t next_ppage; >> int error; >> >> + if (unlikely(isi->nr_pages >= isi->sis->max)) >> + return 0; >> + max_pages = isi->sis->max - isi->nr_pages; >> + >> /* >> * Round the start up and the end down so that the physical >> * extent aligns to a page boundary. >> @@ -48,6 +53,7 @@ static int iomap_swapfile_add_extent(struct iomap_swapfile_info *isi) >> if (first_ppage >= next_ppage) >> return 0; >> nr_pages = next_ppage - first_ppage; >> + nr_pages = min(nr_pages, max_pages); >> >> /* >> * Calculate how much swap space we're adding; the first page contains >> -- >> 2.20.1.2432.ga663e714 >> -- Thanks, Yu