From: Baoquan He <bhe@redhat.com>
To: kkabe@vega.pgw.jp
Cc: bugzilla-daemon@bugzilla.kernel.org, akpm@linux-foundation.org,
	richardw.yang@linux.intel.com, david@redhat.com,
	mhocko@kernel.org, n-horiguchi@ah.jp.nec.com, linux-mm@kvack.org
Subject: Re: [Bug 206401] kernel panic on Hyper-V after 5 minutes due to memory hot-add
Date: Mon, 17 Feb 2020 15:44:57 +0800
Message-ID: <20200217074457.GB19207@MiWiFi-R3L-srv>
In-Reply-To: <200217144627.M0113305@vega.pgw.jp>

On 02/17/20 at 02:46pm, kkabe@vega.pgw.jp wrote:
> bhe@redhat.com sed in <20200212073123.GG8965@MiWiFi-R3L-srv>
> 
> >> On 02/11/20 at 04:41pm, Andrew Morton wrote:
> >> > On Tue, 11 Feb 2020 07:07:41 +0800 Wei Yang <richardw.yang@linux.intel.com> wrote:
> >> > 
> >> > > On Mon, Feb 10, 2020 at 02:15:51PM +0800, Baoquan He wrote:
> >> > > >On 02/10/20 at 02:09pm, Baoquan He wrote:
> >> > > >> On 02/09/20 at 09:56pm, Andrew Morton wrote:
> >> > > >> > On Mon, 10 Feb 2020 13:40:27 +0800 Baoquan He <bhe@redhat.com> wrote:
> >> > > >> > 
> >> > > >> > > Hi Andrew,
> >> > > >> > > 
> >> > > >> > > On 02/09/20 at 09:32pm, Andrew Morton wrote:
> >> > > >> > > > On Tue, 04 Feb 2020 11:25:48 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:
> >> > > >> > > > 
> >> > > >> > > > > https://bugzilla.kernel.org/show_bug.cgi?id=206401
> >> > > >> > > > > 
> >> > > >> > > > 
> >> > > >> > > > An oops during mem hotadd.  Could someone please take a look when
> >> > > >> > > > convenient?
> >> > > >> > > 
> >> > > >> > > This has been addressed by Wei Yang's patch, please check it here:
> >> > > >> > > 
> >> > > >> > > http://lkml.kernel.org/r/20200209104826.3385-7-bhe@redhat.com
> >> > > >> > > 
> >> > > >> > 
> >> > > >> > hm, OK, thanks.  It's unfortunate that a 5.5 fix is buried in a
> >> > > >> > six-patch series which is still in progress!  Can we please merge that
> >> > > >> > as a standalone fix with a cc:stable, Fixes:, etc?
> >> > > >
> >> > > >Maybe can add Fixes tag as follow when merge:
> >> > > >
> >> > > >Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
> >> > > >
> >> > 
> >> > The reporter (cc'ed here) is still seeing issues:
> >> > https://bugzilla.kernel.org/show_bug.cgi?id=206401
> >> > 
> >> > Could we please continue this investigation via emailed reply-to-all,
> >> > rather than via the bugzilla interface?
> >> 
> >> Yes, people prefer mailing list to discuss issues.
> 
> 
> I found perplexing behavior in populate_section_memmap().
> 
> populate_section_memmap() calls alloc_pages(), and if that fails,
> falls back to vmalloc().
> 
> But according to the trace, populate_section_memmap() seems to
> throw out the alloc_pages() result and always falls back to vmalloc(),
> which could be a wrong area to use.
> 
> I sprinkled pr_info() in mm/sparse.c:populate_section_memmap() as below:
> 
> ===========================================
> struct page * __meminit populate_section_memmap(unsigned long pfn,
>                 unsigned long nr_pages, int nid, struct vmem_altmap *altmap)
> {
>         struct page *page, *ret;
>         unsigned long memmap_size = sizeof(struct page) * PAGES_PER_SECTION;
> 
>         page = alloc_pages(GFP_KERNEL|__GFP_NOWARN, get_order(memmap_size));
>         if (page) {
>                 goto got_map_page;
>         }
> pr_info("%s: alloc_pages() returned 0x%p (should be 0), reverting to vmalloc(memmap_size=%lu)\n", __func__, page, memmap_size);
> BUG_ON(page != 0);
> 
>         ret = vmalloc(memmap_size);
> pr_info("%s: vmalloc(%lu) returned 0x%p\n", __func__, memmap_size, ret);
>         if (ret) {
>                 goto got_map_ptr;
>         }
> 
>         return NULL;
> got_map_page:
>         ret = (struct page *)pfn_to_kaddr(page_to_pfn(page));
> pr_info("%s: allocated struct page *page=0x%p\n", __func__, page);
> got_map_ptr:
> 
> pr_info("%s: returning struct page * =0x%p\n", __func__, ret);
>         return ret;
> }
> ==================================================
> 
> and got a following panic.
> It even ignores BUG_ON() (perhaps optimized out).
> 
> Is this worth investigating?
> Disassembly doesn't reveal anything suspicious, but I have feeling that
> I'm looking at disassembly different than that the CPU is seeing.
> It's too trivial to be a compiler bug.
> 
> 
> ==================================================
> [root@localhost ~]# readelf -l /proc/kcore
> 
> Elf file type is CORE (Core file)
> Entry point 0x0
> There are 3 program headers, starting at offset 52
> 
> Program Headers:
>   Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
>   NOTE           0x000094 0x00000000 0x00000000 0x01304 0x00000     0
>   LOAD           0xaff2000 0xcaff0000 0xffffffff 0x3400e000 0x3400e000 RWE 0x1000

This should be the vmalloc area. The region covers
[0xcaff0000, 0xcaff0000 + 0x3400e000] = [0xcaff0000, 0xfeffe000].

>   LOAD           0x002000 0xc0000000 0x00000000 0xa7f0000 0xa7f0000 RWE 0x1000
This should be the direct mapping starting from 0xc0000000. It covers the
boot memory you set for the guest kernel, 168M: [0xc0000000, 0xca7f0000].

Since the system only detects your boot memory, max_pfn is at 168M, and
the vmalloc area starts right after it:
VMALLOC_START = high_memory + VMALLOC_OFFSET;

So any hot-added memory will be treated as high memory. Sorry, I have
forgotten most of the details of i386; this is just my rough
understanding of it.


> 
> 
> [  302.784196] hv_balloon: Max. dynamic memory size: 1048576 MB
> [  643.475080] hv_balloon: hv_mem_hot_add: calling add_memory(nid=0, ((start_pfn=0x10000) << PAGE_SHIFT)=0x10000000, (HA_CHUNK << PAGE_SHIFT)=134217728)
> [  643.513804] populate_section_memmap: alloc_pages() returned 0xb1a7c4b2 (should be 0), reverting to vmalloc(memmap_size=655360)

This pr_info output is truly weird: the page pointer is printed as
non-zero, yet the code still fell through to vmalloc().

> [  643.513849] populate_section_memmap: vmalloc(655360) returned 0x11b0e715
> [  643.513872] populate_section_memmap: returning struct page * =0x11b0e715

But here the returned page address is 0x11b0e715, which is also bizarre.
Kernel addresses should be above 3G, right?

> [  643.525352] populate_section_memmap: alloc_pages() returned 0xb1a7c4b2 (should be 0), reverting to vmalloc(memmap_size=655360)
> [  643.536698] populate_section_memmap: vmalloc(655360) returned 0xf2ba6510
> [  643.536722] populate_section_memmap: returning struct page * =0xf2ba6510

Here, the returned page address looks regular.

> [  643.536749] hv_balloon: hv_mem_hot_add: add_memory() returned 0
> [  645.394458] BUG: unable to handle page fault for address: d13ff000
> [  645.394518] #PF: supervisor write access in kernel mode
> [  645.394565] #PF: error_code(0x0002) - not-present page
> [  645.394584] *pde = 00000000
> [  645.394601] Oops: 0002 [#1] SMP
> [  645.394614] CPU: 0 PID: 361 Comm: systemd-udevd Not tainted 5.6.0-rc1.el8.i586 #1
> [  645.394636] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090006  05/23/2012
> [  645.394670] EIP: wp_page_copy+0x8e/0x750
> [  645.394690] Code: 03 00 00 8b 45 d0 85 c0 0f 84 46 05 00 00 e8 d9 85 e5 ff 89 45 bc 89 f8 e8 cf 85 e5 ff 8b 55 bc 8d 78 04 8b 0a 83 e7 fc 89 d6 <89> 08 8b 8a fc 0f 00 00 89 88 fc 0f 00 00 89 c1 29 f9 89 55 bc 29
> [  645.394739] EAX: d13ff000 EBX: c752df28 ECX: 00000000 EDX: c5e0d000
> [  645.394767] ESI: c5e0d000 EDI: d13ff004 EBP: c752deec ESP: c752dea8
> [  645.394790] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210282
> [  645.394815] CR0: 80050033 CR2: d13ff000 CR3: 08e5a000 CR4: 003406d0
> [  645.394840] Call Trace:
> [  645.394852]  ? reuse_swap_page+0x83/0x390
> [  645.394873]  do_wp_page+0x87/0x6e0
> [  645.394885]  handle_mm_fault+0x808/0xe30
> [  645.394893]  do_page_fault+0x19f/0x4d0
> [  645.394901]  ? do_kern_addr_fault+0x80/0x80
> [  645.394915]  common_exception_read_cr2+0x15a/0x15f
> [  645.394930] EIP: 0xb7aaf8bb
> [  645.394944] Code: 24 0c e3 2c 89 d7 83 e2 03 74 11 7a 04 aa 49 74 1f aa 49 74 1b 83 f2 01 75 02 aa 49 89 ca c1 e9 02 83 e2 03 69 c0 01 01 01 01 <f3> ab 89 d1 f3 aa 8b 44 24 08 5f c3 66 90 66 90 66 90 66 90 90 f3
> [  645.394973] EAX: 00000000 EBX: b7f05f60 ECX: 0000000d EDX: 00000000
> [  645.394988] ESI: 02194db4 EDI: 02194db4 EBP: b7f05db4 ESP: bffed978
> [  645.395003] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00210206
> [  645.395018] Modules linked in: rfkill intel_rapl_msr intel_rapl_common crc32_pclmul snd_pcm snd_timer snd soundcore intel_rapl_perf sg pcspkr hv_netvsc i2c_piix4 hyperv_fb hv_utils hv_balloon joydev ip_tables ext4 mbcache jbd2 sr_mod cdrom sd_mod t10_pi ata_generic hyperv_keyboard hid_hyperv hv_storvsc scsi_transport_fc ata_piix crc32c_intel serio_raw hv_vmbus libata
> [  645.395101] CR2: 00000000d13ff000
> [  645.395121] ---[ end trace 3bb1d66cb8b20841 ]---
> [  645.395144] EIP: wp_page_copy+0x8e/0x750
> [  645.395157] Code: 03 00 00 8b 45 d0 85 c0 0f 84 46 05 00 00 e8 d9 85 e5 ff 89 45 bc 89 f8 e8 cf 85 e5 ff 8b 55 bc 8d 78 04 8b 0a 83 e7 fc 89 d6 <89> 08 8b 8a fc 0f 00 00 89 88 fc 0f 00 00 89 c1 29 f9 89 55 bc 29
> [  645.395206] EAX: d13ff000 EBX: c752df28 ECX: 00000000 EDX: c5e0d000
> [  645.395235] ESI: c5e0d000 EDI: d13ff004 EBP: c752deec ESP: c752dea8
> [  645.395261] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210282
> [  645.395278] CR0: 80050033 CR2: d13ff000 CR3: 08e5a000 CR4: 003406d0
> [  645.395308] Kernel panic - not syncing: Fatal exception
> [  645.395329] Kernel Offset: 0x3e00000 from 0xc1000000 (relocation range: 0xc0000000-0xcafeffff)
> [  645.395354] ---[ end Kernel panic - not syncing: Fatal exception ]---
> ==================================================
> 


