On 02.05.22 02:14, Liam Howlett wrote:
> * Andrew Morton [220428 21:16]:
>> On Fri, 29 Apr 2022 00:38:50 +0000 Liam Howlett wrote:
>>
>>>> mm/mmap.c: In function 'do_brk_flags':
>>>> mm/mmap.c:2908:17: error: implicit declaration of function
>>>> 'khugepaged_enter_vma_merge'; did you mean 'khugepaged_enter_vma'?
>>>>
>>>> It appears that this is later fixed, but it hurts bisectability
>>>> (and prevents me from finding the actual build failure in linux-next
>>>> when trying to build corenet64_smp_defconfig).
>>>
>>> Yeah, that khugepaged_enter_vma_merge was renamed in another patch set.
>>> Andrew made the correction but kept the patch as it was. I think the
>>> suggested change is right.. if you read the commit that introduced
>>> khugepaged_enter_vma(), it seems right at least.
>>
>> Things are a bit crazy lately. Merge issues with mapletree, merge
>> issues with mglru on mapletree, me doing a bunch of retooling to start
>> publishing/merging via git, mapletree runtime issues, etc.
>>
>> I've dropped the mapletree patches again. Please scoop up all known
>> fixes and redo against the (non-rebasing) mm-stable branch at
>> git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
>
> Okay, sounds good.
>
> I have been porting my patches over and hit a bit of a snag. It looked
> like my patches were not booting on the s390 - but not all the time. So
> I reverted back to mm-stable (059342d1dd4e) and found that also failed
> to boot sometimes on my qemu setup. When it fails it's ~4-5sec into
> booting. The last thing I see is:
>
> "[ 4.668916] Spectre V2 mitigation: execute trampolines"
>
> I've bisected back to commit e553f62f10d9 (mm, page_alloc: fix
> build_zonerefs_node())
>
> With this commit, I am unable to boot one out of three times. When
> using the previous commit I was not able to get it to hang after trying
> 10+ times. This is a qemu s390 install with KASAN on and I see no error
> messages. I think it's likely it is this patch, but not guaranteed.

This sounds like a race condition during the setup of memory zones. I
could imagine my patch is triggering this problem, but it should not be
the real root cause.

I'm no expert regarding zone setup, but I think it might help to print
some zone data when the problem occurs. I have no real idea which data
is needed, but maybe someone else can help here.

The following diff should detect the problematic case (it might show
false positives, though):

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0e42038382c1..23f029f39985 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6132,6 +6132,9 @@ static int build_zonerefs_node(pg_data_t *pgdat, struct zoneref *zonerefs)
 		zone_type--;
 		zone = pgdat->node_zones + zone_type;
 		if (populated_zone(zone)) {
+			if (!managed_zone(zone)) {
+				/* Print some data regarding the zone. */
+			}
 			zoneref_set_zone(zone, &zonerefs[nr_zones++]);
 			check_highest_zone(zone_type);
 		}


Juergen
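
The placeholder comment in the diff leaves open what to print. As a rough
userspace sketch only (not kernel code), the check could look like the
following. It assumes a cut-down, hypothetical struct zone holding just the
page counters, and it mirrors the kernel's distinction that populated_zone()
looks at present pages while managed_zone() looks at managed pages. The
helper name report_unmanaged_zone() and the choice of printed fields are
assumptions; in the kernel the output would go through a printk variant
rather than printf().

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for the kernel's struct zone. */
struct zone {
	const char *name;
	unsigned long zone_start_pfn;
	unsigned long spanned_pages;
	unsigned long present_pages;
	unsigned long managed_pages;
};

/* A zone is populated if it has any present pages. */
static bool populated_zone(const struct zone *zone)
{
	return zone->present_pages != 0;
}

/* A zone is managed if it has pages handed to the page allocator. */
static bool managed_zone(const struct zone *zone)
{
	return zone->managed_pages != 0;
}

/*
 * Report a zone that is populated but not managed, i.e. the case the
 * diff above tries to catch. Returns true if a line was printed.
 */
static bool report_unmanaged_zone(int nid, const struct zone *zone)
{
	if (!populated_zone(zone) || managed_zone(zone))
		return false;

	printf("node %d zone %s: start_pfn=%lu spanned=%lu present=%lu managed=%lu\n",
	       nid, zone->name, zone->zone_start_pfn, zone->spanned_pages,
	       zone->present_pages, zone->managed_pages);
	return true;
}
```

Only zones with present pages but no managed pages are reported, so fully
empty zones and normally initialized zones stay silent.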