bhe@redhat.com said in <20200217112054.GA9823@MiWiFi-R3L-srv>:

>> On 02/17/20 at 11:38am, David Hildenbrand wrote:
>> > On 17.02.20 11:33, Baoquan He wrote:
>> > > On 02/17/20 at 11:24am, David Hildenbrand wrote:
>> > >> On 17.02.20 11:13, Baoquan He wrote:
>> > >>> On 02/17/20 at 10:34am, Oscar Salvador wrote:
>> > >>>> On Mon, Feb 17, 2020 at 02:46:27PM +0900, kkabe@vega.pgw.jp wrote:
>> > >>>>> ===========================================
>> > >>>>> struct page * __meminit populate_section_memmap(unsigned long pfn,
>> > >>>>> 		unsigned long nr_pages, int nid, struct vmem_altmap *altmap)
>> > >>>>> {
>> > >>>>> 	struct page *page, *ret;
>> > >>>>> 	unsigned long memmap_size = sizeof(struct page) * PAGES_PER_SECTION;
>> > >>>>>
>> > >>>>> 	page = alloc_pages(GFP_KERNEL|__GFP_NOWARN, get_order(memmap_size));
>> > >>>>> 	if (page) {
>> > >>>>> 		goto got_map_page;
>> > >>>>> 	}
>> > >>>>> 	pr_info("%s: alloc_pages() returned 0x%p (should be 0), reverting to vmalloc(memmap_size=%lu)\n",
>> > >>>>> 		__func__, page, memmap_size);
>> > >>>>> 	BUG_ON(page != 0);
>> > >>>>>
>> > >>>>> 	ret = vmalloc(memmap_size);
>> > >>>>> 	pr_info("%s: vmalloc(%lu) returned 0x%p\n", __func__, memmap_size, ret);
>> > >>>>> 	if (ret) {
>> > >>>>> 		goto got_map_ptr;
>> > >>>>> 	}
>> > >>>>>
>> > >>>>> 	return NULL;
>> > >>>>> got_map_page:
>> > >>>>> 	ret = (struct page *)pfn_to_kaddr(page_to_pfn(page));
>> > >>>>> 	pr_info("%s: allocated struct page *page=0x%p\n", __func__, page);
>> > >>>>> got_map_ptr:
>> > >>>>> 	pr_info("%s: returning struct page * =0x%p\n", __func__, ret);
>> > >>>>> 	return ret;
>> > >>>>> }
>> > >>>>
>> > >>>> Could you please replace %p with %px?  With the first, pointers are
>> > >>>> hashed, so it is trickier to get an overview of the meaning.
>> > >>>>
>> > >>>> David could be right about ZONE_NORMAL vs ZONE_HIGHMEM.
>> > >>>> IIUC, default_kernel_zone_for_pfn and default_zone_for_pfn seem to
>> > >>>> only deal with (ZONE_DMA,ZONE_NORMAL] or ZONE_MOVABLE.
>> > >>>
>> > >>> Ah, I think you both have spotted the problem.
>> > >>>
>> > >>> On i386, without memory hot add, normal memory only includes what
>> > >>> lies below 896 MB, and it is added into the normal zone.  The rest
>> > >>> is added into the highmem zone.
>> > >>>
>> > >>> How does this influence page allocation?
>> > >>>
>> > >>> Hugely.  As we know, on i386 normal memory can be accessed through
>> > >>> the direct mapping (phys_to_virt(), namely PAGE_OFFSET + phys), but
>> > >>> highmem has to be accessed with kmap.  However, later hot-added
>> > >>> memory is all put into the normal zone, so accessing it will
>> > >>> stumble into the vmalloc area, I would say.
>> > >>>
>> > >>> So, i386 doesn't support memory hot add well.  Not sure if the
>> > >>> below change can make it work normally.
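(To make the quoted lowmem/highmem distinction concrete, here is my own
minimal sketch, not code from the thread, assuming the stock i386
CONFIG_HIGHMEM layout with PAGE_OFFSET = 0xc0000000 and the direct map
covering roughly the first 896 MB:)

	#include <linux/highmem.h>
	#include <linux/mm.h>
	#include <linux/string.h>

	/* Zero one page, taking the path its zone placement dictates. */
	static void touch_page(struct page *page)
	{
		if (!PageHighMem(page)) {
			/*
			 * Lowmem: permanently mapped, so the virtual
			 * address is simply PAGE_OFFSET + phys.
			 */
			memset(page_address(page), 0, PAGE_SIZE);
		} else {
			/* Highmem: map temporarily, use, unmap. */
			void *va = kmap_atomic(page);

			memset(va, 0, PAGE_SIZE);
			kunmap_atomic(va);
		}
	}

A page that sits in ZONE_NORMAL but is physically above the lowmem limit
takes the first branch, and PAGE_OFFSET + phys then points past the
direct map into the vmalloc area -- exactly the breakage described above.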
>>
>> Please try the below code instead and see if it works.  However, as
>> David and Michal said in the other reply, if there is no real use case,
>> we may not be so eager to support memory hotplug on i386.
>>
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 475d0d68a32c..9faf47bd026e 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -715,15 +715,20 @@ static struct zone *default_kernel_zone_for_pfn(int nid, unsigned long start_pfn
>>  {
>>  	struct pglist_data *pgdat = NODE_DATA(nid);
>>  	int zid;
>> +	enum zone_type default_zone = ZONE_NORMAL;
>>
>> -	for (zid = 0; zid <= ZONE_NORMAL; zid++) {
>> +#ifdef CONFIG_HIGHMEM
>> +	default_zone = ZONE_HIGHMEM;
>> +#endif
>> +
>> +	for (zid = 0; zid <= default_zone; zid++) {
>>  		struct zone *zone = &pgdat->node_zones[zid];
>>
>>  		if (zone_intersects(zone, start_pfn, nr_pages))
>>  			return zone;
>>  	}
>>
>> -	return &pgdat->node_zones[ZONE_NORMAL];
>> +	return &pgdat->node_zones[default_zone];
>>  }
>>
>>  static inline struct zone *default_zone_for_pfn(int nid, unsigned long start_pfn,

Tried out the above patch.  It seems to be working: no panic, total
memory has increased, and the hot-added memory is added as HIGHMEM.
I had to back out the first hunk of Oscar's patch
(https://bugzilla.kernel.org/show_bug.cgi?id=206401#c28) since it spams
the console too much and bogs down systemd.

A minimal install with 168 MB of memory worked, so this time the sample
is running the anaconda installer starting at 512 MB.  Eventually memory
was hot-added up to around 1.2 GB.  The weird pr_info() output from
populate_section_memmap() is still there, though.  (The alloc_pages()
value printed is the same 0x56164d26 every time; since %p hashes pointer
values, that is presumably just what a NULL return looks like after
hashing -- which is why Oscar asked for %px.)

The 2nd parameter of add_memory() (phys_addr_t, 32-bit on non-PAE) is
already going up to 0x60000000, so drivers/hv/hv_balloon.c:hv_mem_hot_add()
may need a limit check so that it does not overflow 4 GB under heavier
usage; a sketch of such a guard follows below.  (Yes, you should limit
it in the hypervisor dialog, but the default is 1 TB.)

Do we need modifications to arch/x86/mm/init_32.c:arch_add_memory() so
that hot-added memory always lands in the highmem area?  Currently it
just shifts the given parameters right by PAGE_SHIFT and calls the
generic __add_pages().
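As for the 4 GB overflow guard mentioned above: something like this (an
untested sketch of mine, reusing the start_pfn/HA_CHUNK names from
drivers/hv/hv_balloon.c -- not the driver's actual code) could refuse a
hot-add that a 32-bit phys_addr_t cannot represent:

	/* Widen to u64 *before* shifting, so the math itself cannot wrap. */
	u64 start = (u64)start_pfn << PAGE_SHIFT;
	u64 size  = (u64)HA_CHUNK << PAGE_SHIFT;

	if (start + size - 1 > (u64)(phys_addr_t)~0) {
		/* Would exceed a 32-bit phys_addr_t (non-PAE i386). */
		pr_warn("hv_balloon: refusing hot-add above addressable memory\n");
		return;
	}
	ret = add_memory(nid, start, size);

On PAE and 64-bit kernels phys_addr_t is 64 bits wide, so the check is
effectively a no-op there.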
=======================
readelf -l /proc/kcore:

Elf file type is CORE (Core file)
Entry point 0x0
There are 3 program headers, starting at offset 52

Program Headers:
  Type   Offset     VirtAddr   PhysAddr   FileSiz    MemSiz     Flg Align
  NOTE   0x000094   0x00000000 0x00000000 0x01304    0x00000        0
  LOAD   0x207f2000 0xe07f0000 0xffffffff 0x1e80e000 0x1e80e000 RWE 0x1000
  LOAD   0x002000   0xc0000000 0x00000000 0x1fff0000 0x1fff0000 RWE 0x1000
========================

dmesg excerpt:

[  302.503487] hv_balloon: Max. dynamic memory size: 1048576 MB
[  303.171640] hv_balloon: hv_mem_hot_add: calling add_memory(nid=0, ((start_pfn=0x28000) << PAGE_SHIFT)=0x28000000, (HA_CHUNK << PAGE_SHIFT)=134217728)
[  303.173031] populate_section_memmap: alloc_pages() returned 0x56164d26 (should be 0), reverting to vmalloc(memmap_size=655360)
[  303.173031] populate_section_memmap: vmalloc(655360) returned 0x912eede0
[  303.173031] populate_section_memmap: returning struct page * =0x912eede0
[  303.173032] populate_section_memmap: alloc_pages() returned 0x56164d26 (should be 0), reverting to vmalloc(memmap_size=655360)
[  303.173032] populate_section_memmap: vmalloc(655360) returned 0x900acc37
[  303.173032] populate_section_memmap: returning struct page * =0x900acc37
[  303.173033] hv_balloon: hv_mem_hot_add: add_memory() returned 0
[  303.213109] online_pages: pfn: 28000 - 2c000 (zone: HighMem)
[  303.223135] Built 1 zonelists, mobility grouping on. Total pages: 123131
[  303.223139] online_pages: pfn: 2c000 - 30000 (zone: HighMem)
....
[  305.124224] hv_balloon: hv_mem_hot_add: calling add_memory(nid=0, ((start_pfn=0x60000) << PAGE_SHIFT)=0x60000000, (HA_CHUNK << PAGE_SHIFT)=134217728)
[  305.124239] populate_section_memmap: alloc_pages() returned 0x56164d26 (should be 0), reverting to vmalloc(memmap_size=655360)
[  305.124240] populate_section_memmap: vmalloc(655360) returned 0x5dd5170c
[  305.124240] populate_section_memmap: returning struct page * =0x5dd5170c
[  305.124254] populate_section_memmap: alloc_pages() returned 0x56164d26 (should be 0), reverting to vmalloc(memmap_size=655360)
[  305.124254] populate_section_memmap: vmalloc(655360) returned 0xf8ef699a
[  305.124254] populate_section_memmap: returning struct page * =0xf8ef699a
[  305.124256] hv_balloon: hv_mem_hot_add: add_memory() returned 0
[  305.143791] online_pages: pfn: 60000 - 64000 (zone: HighMem)
[  305.153186] online_pages: pfn: 64000 - 68000 (zone: HighMem)
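(Decoding the add_memory() arguments above -- my arithmetic, not
anything from the log: start_pfn 0x28000 << PAGE_SHIFT is 0x28000000 =
640 MiB, HA_CHUNK << PAGE_SHIFT = 134217728 = 128 MiB per hot-add chunk,
and the later call at 0x60000000 is already 1.5 GiB.  On a 32-bit
unsigned long the shifted value wraps at start_pfn 0x100000, i.e. at the
4 GiB mark, which is the overflow worried about above.)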
=======================
/proc/zoneinfo before hot-add:

Node 0, zone      DMA
  per-node stats
      nr_inactive_anon 12069
      nr_active_anon 11288
      nr_inactive_file 13748
      nr_active_file 17527
      nr_unevictable 6734
      nr_slab_reclaimable 4337
      nr_slab_unreclaimable 8457
      nr_isolated_anon 0
      nr_isolated_file 0
      workingset_nodes 2262
      workingset_refault 223120
      workingset_activate 208515
      workingset_restore 137786
      workingset_nodereclaim 707
      nr_anon_pages 26686
      nr_mapped 10129
      nr_file_pages 34688
      nr_dirty 1
      nr_writeback 231
      nr_writeback_temp 0
      nr_shmem 942
      nr_shmem_hugepages 0
      nr_shmem_pmdmapped 0
      nr_file_hugepages 0
      nr_file_pmdmapped 0
      nr_anon_transparent_hugepages 0
      nr_unstable 0
      nr_vmscan_write 71210
      nr_vmscan_immediate_reclaim 3265
      nr_dirtied 6588
      nr_written 77555
      nr_kernel_misc_reclaimable 0
  pages free     403
        min      1049
        low      1055
        high     1061
        spanned  4095
        present  3998
        managed  3979
        protection: (0, 357, 357, 357)
      nr_free_pages 403
      nr_zone_inactive_anon 371
      nr_zone_active_anon 321
      nr_zone_inactive_file 544
      nr_zone_active_file 683
      nr_zone_unevictable 164
      nr_zone_write_pending 0
      nr_mlock 164
      nr_page_table_pages 11
      nr_kernel_stack 360
      nr_bounce 0
      nr_zspages 583
      nr_free_cma 0
  pagesets
    cpu: 0
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 2
  node_unreclaimable:  0
  start_pfn:           1

Node 0, zone   Normal
  pages free     1400
        min      592
        low      740
        high     888
        spanned  126960
        present  126960
        managed  105057
        protection: (0, 0, 0, 0)
      nr_free_pages 1400
      nr_zone_inactive_anon 11695
      nr_zone_active_anon 10961
      nr_zone_inactive_file 13204
      nr_zone_active_file 16844
      nr_zone_unevictable 6570
      nr_zone_write_pending 235
      nr_mlock 6570
      nr_page_table_pages 514
      nr_kernel_stack 1272
      nr_bounce 0
      nr_zspages 22175
      nr_free_cma 0
  pagesets
    cpu: 0
              count: 44
              high:  186
              batch: 31
  vm stats threshold: 6
  node_unreclaimable:  0
  start_pfn:           4096

Node 0, zone  HighMem
  pages free     0
        min      32
        low      32
        high     32
        spanned  0
        present  0
        managed  0
        protection: (0, 0, 0, 0)

Node 0, zone  Movable
  pages free     0
        min      0
        low      0
        high     0
        spanned  0
        present  0
        managed  0
        protection: (0, 0, 0, 0)

============================
/proc/zoneinfo after hot-add:

Node 0, zone      DMA
  per-node stats
      nr_inactive_anon 13438
      nr_active_anon 10249
      nr_inactive_file 6955
      nr_active_file 26815
      nr_unevictable 6734
      nr_slab_reclaimable 4442
      nr_slab_unreclaimable 8670
      nr_isolated_anon 0
      nr_isolated_file 0
      workingset_nodes 2174
      workingset_refault 635931
      workingset_activate 594855
      workingset_restore 486703
      workingset_nodereclaim 1247
      nr_anon_pages 25862
      nr_mapped 12441
      nr_file_pages 38352
      nr_dirty 8
      nr_writeback 0
      nr_writeback_temp 0
      nr_shmem 2136
      nr_shmem_hugepages 0
      nr_shmem_pmdmapped 0
      nr_file_hugepages 0
      nr_file_pmdmapped 0
      nr_anon_transparent_hugepages 0
      nr_unstable 0
      nr_vmscan_write 123858
      nr_vmscan_immediate_reclaim 12156
      nr_dirtied 7219
      nr_written 130953
      nr_kernel_misc_reclaimable 0
  pages free     1380
        min      23
        low      28
        high     33
        spanned  4095
        present  3998
        managed  3979
        protection: (0, 410, 1306, 1306)
      nr_free_pages 1380
      nr_zone_inactive_anon 27
      nr_zone_active_anon 102
      nr_zone_inactive_file 122
      nr_zone_active_file 238
      nr_zone_unevictable 164
      nr_zone_write_pending 0
      nr_mlock 164
      nr_page_table_pages 13
      nr_kernel_stack 328
      nr_bounce 0
      nr_zspages 660
      nr_free_cma 0
  pagesets
    cpu: 0
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 2
  node_unreclaimable:  0
  start_pfn:           1

Node 0, zone   Normal
  pages free     20635
        min      633
        low      791
        high     949
        spanned  126960
        present  126960
        managed  105057
        protection: (0, 0, 7168, 7168)
      nr_free_pages 20635
      nr_zone_inactive_anon 8967
      nr_zone_active_anon 7980
      nr_zone_inactive_file 6309
      nr_zone_active_file 5881
      nr_zone_unevictable 6570
      nr_zone_write_pending 8
      nr_mlock 6570
      nr_page_table_pages 537
      nr_kernel_stack 1176
      nr_bounce 0
      nr_zspages 25936
      nr_free_cma 0
  pagesets
    cpu: 0
              count: 97
              high:  186
              batch: 31
  vm stats threshold: 6
  node_unreclaimable:  0
  start_pfn:           4096

Node 0, zone  HighMem
  pages free     199096
        min      128
        low      473
        high     818
        spanned  262144
        present  262144
        managed  229376
        protection: (0, 0, 0, 0)
      nr_free_pages 199096
      nr_zone_inactive_anon 4444
      nr_zone_active_anon 2167
      nr_zone_inactive_file 524
      nr_zone_active_file 20691
      nr_zone_unevictable 0
      nr_zone_write_pending 0
      nr_mlock 0
      nr_page_table_pages 0
      nr_kernel_stack 0
      nr_bounce 0
      nr_zspages 122
      nr_free_cma 0
  pagesets
    cpu: 0
              count: 67
              high:  378
              batch: 63
  vm stats threshold: 8
  node_unreclaimable:  0
  start_pfn:           163840

Node 0, zone  Movable
  pages free     0
        min      0
        low      0
        high     0
        spanned  0
        present  0
        managed  0
        protection: (0, 0, 0, 0)

-- 
kabe