bhe@redhat.com sed in <20200213081941.GA19207@MiWiFi-R3L-srv>
>> On 02/13/20 at 01:22pm, kabe@vega.pgw.jp wrote:
>> > bhe@redhat.com sed in <20200212073123.GG8965@MiWiFi-R3L-srv>
>> >
>> > >> On 02/11/20 at 04:41pm, Andrew Morton wrote:
>> > >> > On Tue, 11 Feb 2020 07:07:41 +0800 Wei Yang wrote:
>> > >> >
>> > >> > > On Mon, Feb 10, 2020 at 02:15:51PM +0800, Baoquan He wrote:
>> > >> > > >On 02/10/20 at 02:09pm, Baoquan He wrote:
>> > >> > > >> On 02/09/20 at 09:56pm, Andrew Morton wrote:
>> > >> > > >> > On Mon, 10 Feb 2020 13:40:27 +0800 Baoquan He wrote:
>> > >> > > >> >
>> > >> > > >> > > Hi Andrew,
>> > >> > > >> > >
>> > >> > > >> > > On 02/09/20 at 09:32pm, Andrew Morton wrote:
>> > >> > > >> > > > On Tue, 04 Feb 2020 11:25:48 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:
>> > >> > > >> > > >
>> > >> > > >> > > > > https://bugzilla.kernel.org/show_bug.cgi?id=206401
>> > >> > > >> > > > >
>> > >> > > >> > > >
>> > >> > > >> > > > An oops during mem hotadd. Could someone please take a look when
>> > >> > > >> > > > convenient?
>> > >> > > >> > >
>> > >> > > >> > > This has been addressed by Wei Yang's patch, please check it here:
>> > >> > > >> > >
>> > >> > > >> > > http://lkml.kernel.org/r/20200209104826.3385-7-bhe@redhat.com
>> > >> > > >> > >
>> > >> > > >> >
>> > >> > > >> > hm, OK, thanks. It's unfortunate that a 5.5 fix is buried in a
>> > >> > > >> > six-patch series which is still in progress! Can we please merge that
>> > >> > > >> > as a standalone fix with a cc:stable, Fixes:, etc?
>> > >> > > >
>> > >> > > >Maybe can add Fixes tag as follow when merge:
>> > >> > > >
>> > >> > > >Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
>> > >> > > >
>> > >> >
>> > >> > The reporter (cc'ed here) is still seeing issues:
>> > >> > https://bugzilla.kernel.org/show_bug.cgi?id=206401
>> > >> >
>> > >> > Could we please continue this investigation via emailed reply-to-all,
>> > >> > rather than via the bugzilla interface?
>> > >>
>> > >> Yes, people prefer mailing list to discuss issues.
>> > >>
>> > >> Hi T.Kabe,
>> > >>
>> > >> Could you provide the call trace again after below patch is applied?
>> > >> The comment #9 in bugzilla is not very clear to me.
>> > >>
>> > >> mm/sparsemem: pfn_to_page is not valid yet on SPARSEMEM
>> > >> http://lkml.kernel.org/r/20200209104826.3385-7-bhe@redhat.com
>> > >>
>> > >> And, as you said, applying above patch, and do not call
>> > >> __free_pages_core() in generic_online_page() will work. I doubt it,
>> > >> because without __free_pages_core(), your added pages are not added
>> > >> into buddy for managing. I think we should make clear this problem
>> > >> firstly, in order not to introduce new problem by improper work around,
>> > >> then check next.
>> > >>
>> > >> Thanks
>> > >> Baoquan
>> >
>> > Got it, I restarted off fresh from kernel-5.6-rc1,
>> > applied patch
>> > >> http://lkml.kernel.org/r/20200209104826.3385-7-bhe@redhat.com
>> > and got the following panic.
>> >
>> > Diag printk's for add_memory() et al is not there, but I guess
>> > memory hot-add request from hypervisor is returning "success",
>> > corrupting something else and bombing out later.
>> >
>> >
>> > [ 24.289967] Not activating Mandatory Access Control as /sbin/tomoyo-init does not exist.
>> > [ 302.263730] hv_balloon: Max. dynamic memory size: 1048576 MB
>> > [ 635.216014] BUG: unable to handle page fault for address: d13ff000
>> > [ 635.216058] #PF: supervisor write access in kernel mode
>> > [ 635.216076] #PF: error_code(0x0002) - not-present page
>> > [ 635.216106] *pde = 00000000
>>
>> Thanks for the info. What ARCH is your system? Could you attach your
>> kernel config and paste the output of executing 'readelf /proc/kcore'?

Arch is i386 (i586), non-PAE.
I'll attach the "readelf -a /proc/kcore", dmesg and .config.

The stack trace is different this time also; it seems to have
a slightly different panic trace every time after handle_mm_fault().
I've temporarily added pr_info() before and after add_memory() in
hv_balloon.ko, so it says it's tainting the kernel.
add_memory() itself is returning 0 (success).

>> The pmd entry is not filled, I want to check which address range the kernel
>> is accessing, and please attach the log of dmesg. Probably it's hot added
>> page area, I guess, since this time the preceding trace is different
>> with comment #9.
>>
>> > [ 635.216139] Oops: 0002 [#1] SMP
>> > [ 635.216171] CPU: 0 PID: 470 Comm: systemd-udevd Not tainted 5.6.0-rc1.el8.i586 #1
>> > [ 635.216199] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090006 05/23/2012
>> > [ 635.216233] EIP: wp_page_copy+0x8e/0x750
>> > [ 635.216253] Code: 03 00 00 8b 45 d0 85 c0 0f 84 46 05 00 00 e8 d9 85 e5 ff 89 45 bc 89 f8 e8 cf 85 e5 ff 8b 55 bc 8d 78 04 8b 0a 83 e7 fc 89 d6 <89> 08 8b 8a fc 0f 00 00 89 88 fc 0f 00 00 89 c1 29 f9 89 55 bc 29
>> > [ 635.216293] EAX: d13ff000 EBX: c3743f28 ECX: 00000000 EDX: c10c9000
>> > [ 635.216314] ESI: c10c9000 EDI: d13ff004 EBP: c3743eec ESP: c3743ea8
>> > [ 635.216336] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210282
>> > [ 635.216368] CR0: 80050033 CR2: d13ff000 CR3: 03add000 CR4: 003406d0
>> > [ 635.216389] Call Trace:
>> > [ 635.216407] ? reuse_swap_page+0x83/0x390
>> > [ 635.216425] do_wp_page+0x87/0x6e0
>> > [ 635.216438] ? __do_sys_fstat64+0x4a/0x60
>> > [ 635.216453] handle_mm_fault+0x808/0xe30
>> > [ 635.216468] do_page_fault+0x19f/0x4d0
>> > [ 635.216484] ? do_kern_addr_fault+0x80/0x80
>> > [ 635.216500] common_exception_read_cr2+0x15a/0x15f
>> > [ 635.216521] EIP: 0xb7b28104 [redacted]
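
For reference when reading the oops above, here is a rough sketch of how
the fault error code and the faulting address (CR2) can be decoded by hand.
It assumes the standard x86 #PF error-code bits and the non-PAE 32-bit
10/10/12 address split; the helper names decode_pf_error() and
split_nonpae() are made up for illustration and are not kernel functions.

```python
# x86 page-fault error code bits (low three bits only; sketch, not exhaustive).
PF_PROT = 1 << 0   # 0: page not present, 1: protection violation
PF_WRITE = 1 << 1  # 0: read access, 1: write access
PF_USER = 1 << 2   # 0: supervisor mode, 1: user mode


def decode_pf_error(code):
    """Hypothetical helper: describe a #PF error code in words."""
    mode = "user" if code & PF_USER else "supervisor"
    access = "write" if code & PF_WRITE else "read"
    cause = "protection violation" if code & PF_PROT else "not-present page"
    return f"{mode} {access} access, {cause}"


def split_nonpae(vaddr):
    """Hypothetical helper: split a 32-bit address, non-PAE 10/10/12 layout."""
    pde_index = (vaddr >> 22) & 0x3FF  # top 10 bits: page-directory entry
    pte_index = (vaddr >> 12) & 0x3FF  # next 10 bits: page-table entry
    offset = vaddr & 0xFFF             # low 12 bits: offset within the page
    return pde_index, pte_index, offset


print(decode_pf_error(0x0002))    # error_code from the oops
print(split_nonpae(0xD13FF000))   # CR2 from the oops
```

Under those assumptions, error code 0x0002 reads as a supervisor write to a
not-present page, and d13ff000 falls in page-directory slot 836, which
covers d1000000-d13fffff; that is the entry printed as "*pde = 00000000".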