Date: Thu, 19 Aug 2021 09:48:22 -0700
From: "Darrick J. Wong"
To: Xu Yu
Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, hch@infradead.org,
 riteshh@linux.ibm.com, tytso@mit.edu, gavin.dg@linux.alibaba.com
Subject: Re: [PATCH v2] mm/swap: consider max pages in iomap_swapfile_add_extent
Message-ID: <20210819164822.GE12597@magnolia>
References: <6dac22b2a8a22254fc538054cb42a32e53d2482b.1629355682.git.xuyu@linux.alibaba.com>
In-Reply-To: <6dac22b2a8a22254fc538054cb42a32e53d2482b.1629355682.git.xuyu@linux.alibaba.com>
X-Mailing-List: linux-fsdevel@vger.kernel.org

On Thu, Aug 19, 2021 at 02:52:05PM +0800, Xu Yu wrote:
> When the max pages (last_page in the swap header + 1) is smaller than
> the total pages (inode size) of the swapfile, iomap_swapfile_activate
> overwrites sis->max with total pages.
>
> However, frontswap_map is allocated using max pages. When test and clear
> the sis offset, which is larger than max pages, of frontswap_map in
> __frontswap_invalidate_page(), neighbors of frontswap_map may be
> overwritten, i.e., slab is polluted.
>
> This fixes the issue by considering the limitation of max pages of swap
> info in iomap_swapfile_add_extent().
>
> To reproduce the case, compile kernel with slub RED ZONE, then run test:
> $ sudo stress-ng -a 1 -x softlockup,resources -t 1h --metrics --times \
>     --verify -v -Y /root/tmpdir/stress-ng/stress-statistic-12.yaml \
>     --log-file /root/tmpdir/stress-ng/stress-logfile-12.txt \
>     --temp-path /root/tmpdir/stress-ng/
>
> We'll get the following error log usually within 5 minutes.
>
> [ 1151.015141] =============================================================================
> [ 1151.016489] BUG kmalloc-16 (Not tainted): Right Redzone overwritten
> [ 1151.017486] -----------------------------------------------------------------------------
> [ 1151.017486]
> [ 1151.018997] Disabling lock debugging due to kernel taint
> [ 1151.019873] INFO: 0x0000000084e43932-0x0000000098d17cae @offset=7392. First byte 0x0 instead of 0xcc
> [ 1151.021303] INFO: Allocated in __do_sys_swapon+0xcf6/0x1170 age=43417 cpu=9 pid=3816
> [ 1151.022538] __slab_alloc+0xe/0x20
> [ 1151.023069] __kmalloc_node+0xfd/0x4b0
> [ 1151.023704] __do_sys_swapon+0xcf6/0x1170
> [ 1151.024346] do_syscall_64+0x33/0x40
> [ 1151.024925] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 1151.025749] INFO: Freed in put_cred_rcu+0xa1/0xc0 age=43424 cpu=3 pid=2041
> [ 1151.026889] kfree+0x276/0x2b0
> [ 1151.027405] put_cred_rcu+0xa1/0xc0
> [ 1151.027949] rcu_do_batch+0x17d/0x410
> [ 1151.028566] rcu_core+0x14e/0x2b0
> [ 1151.029084] __do_softirq+0x101/0x29e
> [ 1151.029645] asm_call_irq_on_stack+0x12/0x20
> [ 1151.030381] do_softirq_own_stack+0x37/0x40
> [ 1151.031037] do_softirq.part.15+0x2b/0x30
> [ 1151.031710] __local_bh_enable_ip+0x4b/0x50
> [ 1151.032412] copy_fpstate_to_sigframe+0x111/0x360
> [ 1151.033197] __setup_rt_frame+0xce/0x480
> [ 1151.033809] arch_do_signal+0x1a3/0x250
> [ 1151.034463] exit_to_user_mode_prepare+0xcf/0x110
> [ 1151.035242] syscall_exit_to_user_mode+0x27/0x190
> [ 1151.035970] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 1151.036795] INFO: Slab 0x000000003b9de4dc objects=44 used=9 fp=0x00000000539e349e flags=0xfffffc0010201
> [ 1151.038323] INFO: Object 0x000000004855ba01 @offset=7376 fp=0x0000000000000000
> [ 1151.038323]
> [ 1151.039683] Redzone 000000008d0afd3d: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................
> [ 1151.041180] Object 000000004855ba01: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> [ 1151.042714] Redzone 0000000084e43932: 00 00 00 c0 cc cc cc cc ........
> [ 1151.044120] Padding 000000000864c042: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
> [ 1151.045615] CPU: 5 PID: 3816 Comm: stress-ng Tainted: G B 5.10.50+ #7
> [ 1151.046846] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
> [ 1151.048633] Call Trace:
> [ 1151.049072] dump_stack+0x57/0x6a
> [ 1151.049585] check_bytes_and_report+0xed/0x110
> [ 1151.050320] check_object+0x1eb/0x290
> [ 1151.050924] ? __x64_sys_swapoff+0x39a/0x540
> [ 1151.051646] free_debug_processing+0x151/0x350
> [ 1151.052333] __slab_free+0x21a/0x3a0
> [ 1151.052938] ? _cond_resched+0x2d/0x40
> [ 1151.053529] ? __vunmap+0x1de/0x220
> [ 1151.054139] ? __x64_sys_swapoff+0x39a/0x540
> [ 1151.054796] ? kfree+0x276/0x2b0
> [ 1151.055307] kfree+0x276/0x2b0
> [ 1151.055832] __x64_sys_swapoff+0x39a/0x540
> [ 1151.056466] do_syscall_64+0x33/0x40
> [ 1151.057084] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 1151.057866] RIP: 0033:0x150340b0ffb7
> [ 1151.058481] Code: Unable to access opcode bytes at RIP 0x150340b0ff8d.
> [ 1151.059537] RSP: 002b:00007fff7f4ee238 EFLAGS: 00000246 ORIG_RAX: 00000000000000a8
> [ 1151.060768] RAX: ffffffffffffffda RBX: 00007fff7f4ee66c RCX: 0000150340b0ffb7
> [ 1151.061904] RDX: 000000000000000a RSI: 0000000000018094 RDI: 00007fff7f4ee860
> [ 1151.063033] RBP: 00007fff7f4ef980 R08: 0000000000000000 R09: 0000150340a672bd
> [ 1151.064135] R10: 00007fff7f4edca0 R11: 0000000000000246 R12: 0000000000018094
> [ 1151.065253] R13: 0000000000000005 R14: 000000000160d930 R15: 00007fff7f4ee66c
> [ 1151.066413] FIX kmalloc-16: Restoring 0x0000000084e43932-0x0000000098d17cae=0xcc
> [ 1151.066413]
> [ 1151.067890] FIX kmalloc-16: Object at 0x000000004855ba01 not freed
>
> Fixes: 0e6895ba00b7 ("ext4: implement swap_activate aops using iomap")
> Signed-off-by: Gang Deng
> Signed-off-by: Xu Yu
> Reviewed-by: Darrick J. Wong

Applied, thanks.  I've reworked the second paragraph slightly:

"However, frontswap_map is a swap page state bitmap allocated using the
initial sis->max page count read from the swap header.  If swapfile
activation increases sis->max, it's possible for the frontswap code to
walk off the end of the bitmap, thereby corrupting kernel memory."

--D

> ---
> v1->v2:
> - update the commit log
> - add Reviewed-by of Darrick J. Wong
> ---
>  fs/iomap/swapfile.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/fs/iomap/swapfile.c b/fs/iomap/swapfile.c
> index a5e478de1417..2ceea45aefd8 100644
> --- a/fs/iomap/swapfile.c
> +++ b/fs/iomap/swapfile.c
> @@ -30,11 +30,16 @@ static int iomap_swapfile_add_extent(struct iomap_swapfile_info *isi)
>  {
>  	struct iomap *iomap = &isi->iomap;
>  	unsigned long nr_pages;
> +	unsigned long max_pages;
>  	uint64_t first_ppage;
>  	uint64_t first_ppage_reported;
>  	uint64_t next_ppage;
>  	int error;
>
> +	if (unlikely(isi->nr_pages >= isi->sis->max))
> +		return 0;
> +	max_pages = isi->sis->max - isi->nr_pages;
> +
>  	/*
>  	 * Round the start up and the end down so that the physical
>  	 * extent aligns to a page boundary.
> @@ -47,6 +52,7 @@ static int iomap_swapfile_add_extent(struct iomap_swapfile_info *isi)
>  	if (first_ppage >= next_ppage)
>  		return 0;
>  	nr_pages = next_ppage - first_ppage;
> +	nr_pages = min(nr_pages, max_pages);
>
>  	/*
>  	 * Calculate how much swap space we're adding; the first page contains
> --
> 2.20.1.2432.ga663e714
>
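
For anyone following along, the clamping the patch adds can be sketched in
plain userspace C.  This is only an illustration under assumptions, not the
kernel code: struct swap_state, clamp_extent_pages() and add_extent() are
hypothetical names, the two fields merely stand in for sis->max and
isi->nr_pages, and the real function's page rounding and header-page
accounting are left out.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two counters the patch consults. */
struct swap_state {
	unsigned long max;	/* page limit from the swap header (sis->max) */
	unsigned long nr_pages;	/* pages accepted so far (isi->nr_pages) */
};

/*
 * Return how many pages of the extent [first_ppage, next_ppage) may be
 * added without pushing the running total past st->max.
 */
static unsigned long clamp_extent_pages(const struct swap_state *st,
					uint64_t first_ppage,
					uint64_t next_ppage)
{
	unsigned long max_pages, nr_pages;

	/* Limit already reached: accept nothing more. */
	if (st->nr_pages >= st->max)
		return 0;
	max_pages = st->max - st->nr_pages;

	/* Empty or inverted extent. */
	if (first_ppage >= next_ppage)
		return 0;
	nr_pages = (unsigned long)(next_ppage - first_ppage);

	/* Same effect as min(nr_pages, max_pages) in the patch. */
	return nr_pages < max_pages ? nr_pages : max_pages;
}

/* Account for one extent and return the number of pages accepted. */
static unsigned long add_extent(struct swap_state *st,
				uint64_t first_ppage, uint64_t next_ppage)
{
	unsigned long n = clamp_extent_pages(st, first_ppage, next_ppage);

	st->nr_pages += n;
	return n;
}
```

With max = 100, a 60-page extent followed by another 60-page extent is
accepted as 60 and then 40 pages, and every later extent contributes 0, so
the running total never exceeds the size the bitmap was allocated for.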