Date: Fri, 3 Nov 2017 10:27:03 +0100
From: Michal Hocko
To: Pavel Tatashin
Cc: steven.sistare@oracle.com, daniel.m.jordan@oracle.com,
 akpm@linux-foundation.org, mgorman@techsingularity.net,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 1/1] mm: buddy page accessed before initialized
Message-ID: <20171103092703.63qyafmg7rnpoqab@dhcp22.suse.cz>
References: <20171102170221.7401-1-pasha.tatashin@oracle.com>
 <20171102170221.7401-2-pasha.tatashin@oracle.com>
In-Reply-To: <20171102170221.7401-2-pasha.tatashin@oracle.com>

On Thu 02-11-17 13:02:21, Pavel Tatashin wrote:
> This problem is seen when machine is rebooted after kexec:
> A message like this is printed:
> ==========================================================================
> WARNING: CPU: 21 PID: 249 at linux/lib/list_debug.c:53__listd+0x83/0xa0
> Modules linked in:
> CPU: 21 PID: 249 Comm: pgdatinit0 Not tainted 4.14.0-rc6_pt_deferred #90
> Hardware name: Oracle Corporation ORACLE SERVER X6-2/ASM,MOTHERBOARD,1U,
> BIOS 3016
> node 1 initialised, 32444607 pages in 1679ms
> task: ffff880180e75a00 task.stack: ffffc9000cdb0000
> RIP: 0010:__list_del_entry_valid+0x83/0xa0
> RSP: 0000:ffffc9000cdb3d18 EFLAGS: 00010046
> RAX: 0000000000000054 RBX: 0000000000000009 RCX: ffffffff81c5f3e8
> RDX: 0000000000000000 RSI: 0000000000000086 RDI: 0000000000000046
> RBP: ffffc9000cdb3d18 R08: 00000000fffffffe R09: 0000000000000154
> R10: 0000000000000005 R11: 0000000000000153 R12: 0000000001fcdc00
> R13: 0000000001fcde00 R14: ffff88207ffded00 R15: ffffea007f370000
> FS: 0000000000000000(0000) GS:ffff881fffac0000(0000) knlGS:0
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000000 CR3: 000000407ec09001 CR4: 00000000003606e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  free_one_page+0x103/0x390
>  __free_pages_ok+0x1cf/0x2d0
>  __free_pages+0x19/0x30
>  __free_pages_boot_core+0xae/0xba
>  deferred_free_range+0x60/0x94
>  deferred_init_memmap+0x324/0x372
>  kthread+0x109/0x140
>  ? __free_pages_bootmem+0x2e/0x2e
>  ? kthread_park+0x60/0x60
>  ret_from_fork+0x25/0x30
>
> list_del corruption. next->prev should be ffffea007f428020, but was
> ffffea007f1d8020
> ==========================================================================
>
> The problem happens in this path:
>
> page_alloc_init_late
>  deferred_init_memmap
>   deferred_init_range
>    __def_free
>     deferred_free_range
>      __free_pages_boot_core(page, order)
>       __free_pages()
>        __free_pages_ok()
>         free_one_page()
>          __free_one_page(page, pfn, zone, order, migratetype);
>
> deferred_init_range() initializes one page at a time by calling
> __init_single_page(), once it initializes pageblock_nr_pages pages, it
> calls deferred_free_range() to free the initialized pages to the buddy
> allocator. Eventually, we reach __free_one_page(), where we compute buddy
> page:
>	buddy_pfn = __find_buddy_pfn(pfn, order);
>	buddy = page + (buddy_pfn - pfn);
>
> buddy_pfn is computed as pfn ^ (1 << order), or pfn + pageblock_nr_pages.
> Therefore, buddy page becomes a page one after the range that currently was
> initialized, and we access this page in this function. Also, later when we
> return back to deferred_init_range(), the buddy page is initialized again.
>
> So, in order to avoid this issue, we must initialize the buddy page prior
> to calling deferred_free_range().

Have you measured any negative performance impact with this change?

> Signed-off-by: Pavel Tatashin

The patch looks good to me otherwise. So if this doesn't introduce a
noticeable overhead, which I hope it doesn't, then feel free to add
Acked-by: Michal Hocko

-- 
Michal Hocko
SUSE Labs