Date: Thu, 13 Feb 2020 16:19:41 +0800
From: Baoquan He
To: kabe@vega.pgw.jp
Cc: bugzilla-daemon@bugzilla.kernel.org, akpm@linux-foundation.org, richardw.yang@linux.intel.com, david@redhat.com, mhocko@kernel.org, n-horiguchi@ah.jp.nec.com, linux-mm@kvack.org, kkabe@vega.pgw.jp
Subject: Re: [Bug 206401] kernel panic on Hyper-V after 5 minutes due to memory hot-add
Message-ID: <20200213081941.GA19207@MiWiFi-R3L-srv>
References: <20200212073123.GG8965@MiWiFi-R3L-srv> <200213132206.M0106897@vega.pgw.jp>
In-Reply-To: <200213132206.M0106897@vega.pgw.jp>

On 02/13/20 at 01:22pm, kabe@vega.pgw.jp wrote:
> bhe@redhat.com said in <20200212073123.GG8965@MiWiFi-R3L-srv>
>
> >> On 02/11/20 at 04:41pm, Andrew Morton wrote:
> >> > On Tue, 11 Feb 2020 07:07:41 +0800 Wei Yang wrote:
> >> >
> >> > > On Mon, Feb 10, 2020 at 02:15:51PM +0800, Baoquan He wrote:
> >> > > >On 02/10/20 at 02:09pm, Baoquan He wrote:
> >> > > >> On 02/09/20 at 09:56pm, Andrew Morton wrote:
> >> > > >> > On Mon, 10 Feb 2020 13:40:27 +0800 Baoquan He wrote:
> >> > > >> >
> >> > > >> > > Hi Andrew,
> >> > > >> > >
> >> > > >> > > On 02/09/20 at 09:32pm, Andrew Morton wrote:
> >> > > >> > > > On Tue, 04 Feb 2020 11:25:48 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:
> >> > > >> > > >
> >> > > >> > > > > https://bugzilla.kernel.org/show_bug.cgi?id=206401
> >> > > >> > > > >
> >> > > >> > > >
> >> > > >> > > > An oops during mem hotadd. Could someone please take a look when
> >> > > >> > > > convenient?
> >> > > >> > >
> >> > > >> > > This has been addressed by Wei Yang's patch, please check it here:
> >> > > >> > >
> >> > > >> > > http://lkml.kernel.org/r/20200209104826.3385-7-bhe@redhat.com
> >> > > >> > >
> >> > > >> >
> >> > > >> > hm, OK, thanks. It's unfortunate that a 5.5 fix is buried in a
> >> > > >> > six-patch series which is still in progress! Can we please merge that
> >> > > >> > as a standalone fix with a cc:stable, Fixes:, etc?
> >> > > >
> >> > > >Maybe we can add a Fixes tag as follows when merging:
> >> > > >
> >> > > >Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
> >> > > >
> >> >
> >> > The reporter (cc'ed here) is still seeing issues:
> >> > https://bugzilla.kernel.org/show_bug.cgi?id=206401
> >> >
> >> > Could we please continue this investigation via emailed reply-to-all,
> >> > rather than via the bugzilla interface?
> >>
> >> Yes, people prefer the mailing list for discussing issues.
> >>
> >> Hi T.Kabe,
> >>
> >> Could you provide the call trace again after the patch below is applied?
> >> Comment #9 in bugzilla is not very clear to me.
> >>
> >> mm/sparsemem: pfn_to_page is not valid yet on SPARSEMEM
> >> http://lkml.kernel.org/r/20200209104826.3385-7-bhe@redhat.com
> >>
> >> And, as you said, applying the above patch and not calling
> >> __free_pages_core() in generic_online_page() makes it work. I doubt
> >> it, because without __free_pages_core(), your added pages are not
> >> handed over to the buddy allocator to manage. I think we should
> >> understand this problem clearly first, so that we don't introduce a
> >> new problem with an improper workaround, and then check the next
> >> step.
> >>
> >> Thanks
> >> Baoquan
>
> Got it. I restarted fresh from kernel-5.6-rc1,
> applied the patch
> >> http://lkml.kernel.org/r/20200209104826.3385-7-bhe@redhat.com
> and got the following panic.
>
> The diagnostic printk's for add_memory() et al. are not there, but I
> guess the memory hot-add request from the hypervisor is returning
> "success", corrupting something else, and bombing out later.
>
>
> [ 24.289967] Not activating Mandatory Access Control as /sbin/tomoyo-init does not exist.
> [ 302.263730] hv_balloon: Max. dynamic memory size: 1048576 MB
> [ 635.216014] BUG: unable to handle page fault for address: d13ff000
> [ 635.216058] #PF: supervisor write access in kernel mode
> [ 635.216076] #PF: error_code(0x0002) - not-present page
> [ 635.216106] *pde = 00000000

Thanks for the info.

What architecture is your system? Could you attach your kernel config
and paste the output of 'readelf -l /proc/kcore'? The pmd entry is not
populated, and I want to check which address range the kernel is
accessing. Please also attach the full dmesg log. My guess is that the
address is in the hot-added page area, since this time the preceding
trace is different from the one in comment #9.

> [ 635.216139] Oops: 0002 [#1] SMP
> [ 635.216171] CPU: 0 PID: 470 Comm: systemd-udevd Not tainted 5.6.0-rc1.el8.i586 #1
> [ 635.216199] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090006 05/23/2012
> [ 635.216233] EIP: wp_page_copy+0x8e/0x750
> [ 635.216253] Code: 03 00 00 8b 45 d0 85 c0 0f 84 46 05 00 00 e8 d9 85 e5 ff 89 45 bc 89 f8 e8 cf 85 e5 ff 8b 55 bc 8d 78 04 8b 0a 83 e7 fc 89 d6 <89> 08 8b 8a fc 0f 00 00 89 88 fc 0f 00 00 89 c1 29 f9 89 55 bc 29
> [ 635.216293] EAX: d13ff000 EBX: c3743f28 ECX: 00000000 EDX: c10c9000
> [ 635.216314] ESI: c10c9000 EDI: d13ff004 EBP: c3743eec ESP: c3743ea8
> [ 635.216336] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210282
> [ 635.216368] CR0: 80050033 CR2: d13ff000 CR3: 03add000 CR4: 003406d0
> [ 635.216389] Call Trace:
> [ 635.216407] ? reuse_swap_page+0x83/0x390
> [ 635.216425] do_wp_page+0x87/0x6e0
> [ 635.216438] ? __do_sys_fstat64+0x4a/0x60
> [ 635.216453] handle_mm_fault+0x808/0xe30
> [ 635.216468] do_page_fault+0x19f/0x4d0
> [ 635.216484] ? do_kern_addr_fault+0x80/0x80
> [ 635.216500] common_exception_read_cr2+0x15a/0x15f
> [ 635.216521] EIP: 0xb7b28104
> [ 635.216538] Code: 29 f9 89 4c 24 10 83 f9 0f 0f 86 92 00 00 00 8b 45 40 8d 14 3e 8b 4c 24 0c 39 48 0c 75 74 8b 4c 24 0c 81 7c 24 10 ef 03 00 00 <89> 42 08 89 4a 0c 89 55 40 89 50 0c 76 0e c7 42 10 00 00 00 00 c7
> [ 635.216591] EAX: b7c4e7d8 EBX: 000011a0 ECX: b7c4e7d8 EDX: 01994178
> [ 635.216606] ESI: 01993168 EDI: 00001010 EBP: b7c4e7a0 ESP: bfcc9f00
> [ 635.216628] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00210293
> [ 635.216661] Modules linked in: rfkill intel_rapl_msr intel_rapl_common snd_pcm snd_timer snd soundcore crc32_pclmul intel_rapl_perf sg pcspkr hv_netvsc joydev i2c_piix4 hyperv_fb hv_utils hv_balloon ip_tables ext4 mbcache jbd2 sd_mod t10_pi sr_mod cdrom ata_generic hyperv_keyboard hid_hyperv hv_storvsc scsi_transport_fc ata_piix crc32c_intel serio_raw hv_vmbus libata
> [ 635.216758] CR2: 00000000d13ff000
> [ 635.216769] ---[ end trace dee4a93859538102 ]---
> [ 635.216785] EIP: wp_page_copy+0x8e/0x750
> [ 635.216811] Code: 03 00 00 8b 45 d0 85 c0 0f 84 46 05 00 00 e8 d9 85 e5 ff 89 45 bc 89 f8 e8 cf 85 e5 ff 8b 55 bc 8d 78 04 8b 0a 83 e7 fc 89 d6 <89> 08 8b 8a fc 0f 00 00 89 88 fc 0f 00 00 89 c1 29 f9 89 55 bc 29
> [ 635.216847] EAX: d13ff000 EBX: c3743f28 ECX: 00000000 EDX: c10c9000
> [ 635.216864] ESI: c10c9000 EDI: d13ff004 EBP: c3743eec ESP: c3743ea8
> [ 635.216883] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210282
> [ 635.216899] CR0: 80050033 CR2: d13ff000 CR3: 03add000 CR4: 003406d0
> [ 635.216914] Kernel panic - not syncing: Fatal exception
> [ 635.216926] Kernel Offset: 0x1400000 from 0xc1000000 (relocation range: 0xc0000000-0xcafeffff)
> [ 635.216946] ---[ end Kernel panic - not syncing: Fatal exception ]---
>
> -- 
> kabe
>