From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 14 Feb 2020 23:01:17 +0800
From: Baoquan He
To: kkabe@vega.pgw.jp
Cc: bugzilla-daemon@bugzilla.kernel.org, akpm@linux-foundation.org,
 richardw.yang@linux.intel.com, david@redhat.com, mhocko@kernel.org,
 n-horiguchi@ah.jp.nec.com, linux-mm@kvack.org
Subject: Re: [Bug 206401] kernel panic on Hyper-V after 5 minutes due to memory hot-add
Message-ID: <20200214150117.GK26758@MiWiFi-R3L-srv>
References:
 <20200213081941.GA19207@MiWiFi-R3L-srv>
 <200214232629.M0108877@vega.pgw.jp>
 <20200214144857.GA4816@MiWiFi-R3L-srv>
MIME-Version: 1.0
In-Reply-To: <20200214144857.GA4816@MiWiFi-R3L-srv>
User-Agent: Mutt/1.10.1 (2018-07-13)
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Sender: owner-linux-mm@kvack.org
Precedence: bulk

On 02/14/20 at 10:48pm, Baoquan He wrote:
> On 02/14/20 at 11:26pm, kkabe@vega.pgw.jp wrote:
> > bhe@redhat.com sed in <20200213081941.GA19207@MiWiFi-R3L-srv>
> >
> > >> On 02/13/20 at 01:22pm, kabe@vega.pgw.jp wrote:
> > >> > bhe@redhat.com sed in <20200212073123.GG8965@MiWiFi-R3L-srv>
> > >> >
> > >> > >> On 02/11/20 at 04:41pm, Andrew Morton wrote:
> > >> > >> > On Tue, 11 Feb 2020 07:07:41 +0800 Wei Yang wrote:
> > >> > >> >
> > >> > >> > > On Mon, Feb 10, 2020 at 02:15:51PM +0800, Baoquan He wrote:
> > >> > >> > > >On 02/10/20 at 02:09pm, Baoquan He wrote:
> > >> > >> > > >> On 02/09/20 at 09:56pm, Andrew Morton wrote:
> > >> > >> > > >> > On Mon, 10 Feb 2020 13:40:27 +0800 Baoquan He wrote:
> > >> > >> > > >> >
> > >> > >> > > >> > > Hi Andrew,
> > >> > >> > > >> > >
> > >> > >> > > >> > > On 02/09/20 at 09:32pm, Andrew Morton wrote:
> > >> > >> > > >> > > > On Tue, 04 Feb 2020 11:25:48 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:
> > >> > >> > > >> > > >
> > >> > >> > > >> > > > > https://bugzilla.kernel.org/show_bug.cgi?id=206401
> > >> > >> > > >> > > > >
> > >> > >> > > >> > > >
> > >> > >> > > >> > > > An oops during mem hotadd. Could someone please take a look when
> > >> > >> > > >> > > > convenient?
> > >> > >> > > >> > >
> > >> > >> > > >> > > This has been addressed by Wei Yang's patch, please check it here:
> > >> > >> > > >> > >
> > >> > >> > > >> > > http://lkml.kernel.org/r/20200209104826.3385-7-bhe@redhat.com
> > >> > >> > > >> > >
> > >> > >> > > >> >
> > >> > >> > > >> > hm, OK, thanks. It's unfortunate that a 5.5 fix is buried in a
> > >> > >> > > >> > six-patch series which is still in progress! Can we please merge that
> > >> > >> > > >> > as a standalone fix with a cc:stable, Fixes:, etc?
> > >> > >> > > >
> > >> > >> > > >Maybe can add Fixes tag as follow when merge:
> > >> > >> > > >
> > >> > >> > > >Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
> > >> > >> > > >
> > >> > >> >
> > >> > >> > The reporter (cc'ed here) is still seeing issues:
> > >> > >> > https://bugzilla.kernel.org/show_bug.cgi?id=206401
> > >> > >> >
> > >> > >> > Could we please continue this investigation via emailed reply-to-all,
> > >> > >> > rather than via the bugzilla interface?
> > >> > >>
> > >> > >> Yes, people prefer mailing list to discuss issues.
> > >> > >>
> > >> > >> Hi T.Kabe,
> > >> > >>
> > >> > >> Could you provide the call trace again after below patch is applied?
> > >> > >> The comment #9 in bugzilla is not very clear to me.
> > >> > >>
> > >> > >> mm/sparsemem: pfn_to_page is not valid yet on SPARSEMEM
> > >> > >> http://lkml.kernel.org/r/20200209104826.3385-7-bhe@redhat.com
> > >> > >>
> > >> > >> And, as you said, applying above patch, and do not call
> > >> > >> __free_pages_core() in generic_online_page() will work. I doubt it,
> > >> > >> because without __free_pages_core(), your added pages are not added
> > >> > >> into buddy for managing. I think we should make clear this problem
> > >> > >> firstly, in order not to introduce new problem by improper workaround,
> > >> > >> then check next.
> > >> > >>
> > >> > >> Thanks
> > >> > >> Baoquan
> > >> >
> > >> > Got it, I restarted off fresh from kernel-5.6-rc1,
> > >> > applied patch
> > >> > >> http://lkml.kernel.org/r/20200209104826.3385-7-bhe@redhat.com
> > >> > and got the following panic.
> > >> >
> > >> > Diag printk's for add_memory() et al is not there, but I guess
> > >> > memory hot-add request from hypervisor is returning "success",
> > >> > corrupting something else and bombing out later.
> > >> >
> > >> >
> > >> > [ 24.289967] Not activating Mandatory Access Control as /sbin/tomoyo-init does not exist.
> > >> > [ 302.263730] hv_balloon: Max. dynamic memory size: 1048576 MB
> > >> > [ 635.216014] BUG: unable to handle page fault for address: d13ff000
> > >> > [ 635.216058] #PF: supervisor write access in kernel mode
> > >> > [ 635.216076] #PF: error_code(0x0002) - not-present page
> > >> > [ 635.216106] *pde = 00000000
> > >>
> > >> Thanks for the info. What ARCH is your system? Could you attach your
> > >> kernel config and paste the output of executing 'readelf /proc/kcore'?
> >
> > Arch is i386(i586), non-PAE.
> >
> > I'll attach the "readelf -a /proc/kcore", dmesg and .config .
> > The stack trace is different this time also;
> > it seems to have a slightly different panic trace every time
> > after handle_mm_fault().
>
> Sorry, I didn't say it clearly. 'readelf -l /proc/kcore' is OK, and the
> relevant call trace.

No need to provide them separately; I can find them in the 'readelf -a'
output. I'll check and see if I can find anything. Thanks for the info.

>
> >
> > I've temporarily added pr_info() before and after add_memory() in hv_balloon.ko,
> > so it says it's tainting the kernel.
> > add_memory() itself is returning 0 (success).
> >
> >
>
>