* mm/compaction: BUG: NULL pointer dereference
From: Suzuki K Poulose @ 2019-05-24 9:20 UTC
To: linux-mm
Cc: mgorman, akpm, mhocko, cai, linux-kernel, marc.zyngier, kvmarm, kvm, Suzuki K Poulose

Hi,

We are hitting NULL pointer dereferences while running stress tests with KVM.
See splat [0]. The test is to spawn 100 VMs all doing standard debian
installation (Thanks to Marc's automated scripts, available here [1] ).
The problem has been reproduced with a better rate of success from 5.1-rc6
onwards.

The issue is only reproducible with swapping enabled and the entire
memory used up, when swapping heavily. Also, this issue is reproducible
on only one server with 128GB, which has the following memory layout:

	[32GB@4GB, hole , 96GB@544GB]

Here is my non-expert analysis of the issue so far.

Under extreme memory pressure, kswapd could trigger reset_isolation_suitable()
to figure out the cached values for migrate/free pfn for a zone, by scanning through
the entire zone. On our server it does so in the range of [ 0x10_0000, 0xa00_0000 ],
with the following area of holes : [ 0x20_0000, 0x880_0000 ].
In the failing case, we end up setting the cached migrate pfn as : 0x508_0000, which
is right in the center of the zone pfn range, i.e. ( 0x10_0000 + 0xa00_0000 ) / 2,
with reset_migrate = 0x88_4e00, reset_free = 0x10_0000.

Now these cached values are used by fast_isolate_freepages() to find a pfn. However,
since we can't find anything during the search we fall back to using the page belonging
to the min_pfn (which is the migrate_pfn), without proper checks to see if that is a valid
PFN or not. This is then passed on to fast_isolate_around() which tries to do :
set_pageblock_skip(page) on the page, which blows up due to a NULL mem_section pointer.

The following patch seems to fix the issue for me, but I am not quite convinced that
it is the right fix. Thoughts ?

diff --git a/mm/compaction.c b/mm/compaction.c
index 9febc8c..9e1b9ac 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1399,7 +1399,7 @@ fast_isolate_freepages(struct compact_control *cc)
 			page = pfn_to_page(highest);
 			cc->free_pfn = highest;
 		} else {
-			if (cc->direct_compaction) {
+			if (cc->direct_compaction && pfn_valid(min_pfn)) {
 				page = pfn_to_page(min_pfn);
 				cc->free_pfn = min_pfn;
 			}

Suzuki

[ 0 ] Kernel splat

 Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008 [47/1825]
 Mem abort info:
   ESR = 0x96000004
   Exception class = DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
 Data abort info:
   ISV = 0, ISS = 0x00000004
   CM = 0, WnR = 0
 user pgtable: 4k pages, 48-bit VAs, pgdp = 0000000082f94ae9
 [0000000000000008] pgd=0000000000000000
 Internal error: Oops: 96000004 [#1] SMP
 ...
 CPU: 10 PID: 6080 Comm: qemu-system-aar Not tainted 510-rc1+ #6
 Hardware name: AmpereComputing(R) OSPREY EV-883832-X3-0001/OSPREY, BIOS 4819 09/25/2018
 pstate: 60000005 (nZCv daif -PAN -UAO)
 pc : set_pfnblock_flags_mask+0x58/0xe8
 lr : compaction_alloc+0x300/0x950
 sp : ffff00001fc03010
 x29: ffff00001fc03010 x28: 0000000000000000
 x27: 0000000000000000 x26: ffff000010bf7000
 x25: 0000000006445000 x24: 0000000006444e00
 x23: ffff7e018f138000 x22: 0000000000000003
 x21: 0000000000000001 x20: 0000000006444e00
 x19: 0000000000000001 x18: 0000000000000000
 x17: 0000000000000000 x16: ffff809f7fe97268
 x15: 0000000191138000 x14: 0000000000000000
 x13: 0000000000000070 x12: 0000000000000000
 x11: ffff00001fc03108 x10: 0000000000000000
 x9 : 0000000009222400 x8 : 0000000000000187
 x7 : 00000000063c4e00 x6 : 0000000006444e00
 x5 : 0000000000080000 x4 : 0000000000000001
 x3 : 0000000000000003 x2 : ffff809f7fe92840
 x1 : 0000000000000220 x0 : 0000000000000000
 Process qemu-system-aar (pid: 6080, stack limit = 0x0000000095070da5)
 Call trace:
  set_pfnblock_flags_mask+0x58/0xe8
  compaction_alloc+0x300/0x950
  migrate_pages+0x1a4/0xbb0
  compact_zone+0x750/0xde8
  compact_zone_order+0xd8/0x118
  try_to_compact_pages+0xb4/0x290
  __alloc_pages_direct_compact+0x84/0x1e0
  __alloc_pages_nodemask+0x5e0/0xe18
  alloc_pages_vma+0x1cc/0x210
  do_huge_pmd_anonymous_page+0x108/0x7c8
  __handle_mm_fault+0xdd4/0x1190
  handle_mm_fault+0x114/0x1c0
  __get_user_pages+0x198/0x3c0
  get_user_pages_unlocked+0xb4/0x1d8
  __gfn_to_pfn_memslot+0x12c/0x3b8
  gfn_to_pfn_prot+0x4c/0x60
  kvm_handle_guest_abort+0x4b0/0xcd8
  handle_exit+0x140/0x1b8
  kvm_arch_vcpu_ioctl_run+0x260/0x768
  kvm_vcpu_ioctl+0x490/0x898
  do_vfs_ioctl+0xc4/0x898
  ksys_ioctl+0x8c/0xa0
  __arm64_sys_ioctl+0x28/0x38
  el0_svc_common+0x74/0x118
  el0_svc_handler+0x38/0x78
  el0_svc+0x8/0xc
 Code: f8607840 f100001f 8b011401 9a801020 (f9400400)
 ---[ end trace af6a35219325a9b6 ]---

[1] https://git.kernel.org/pub/scm/linux/kernel/git/maz/vminstall.git/

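The arithmetic in the analysis above can be checked with a small userspace sketch. This is not kernel code: `struct pfn_range`, `gb_to_pfn()`, `model_pfn_valid()` and `zone_midpoint()` are hypothetical helpers introduced only for illustration, and the hole is assumed to be the half-open PFN range [0x20_0000, 0x880_0000) as quoted in the report, with 4k pages as shown in the splat.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12 /* 4k pages, as reported in the splat */

/* Hypothetical model of one populated range of PFNs: [start, end). */
struct pfn_range {
	uint64_t start;
	uint64_t end;
};

/* Convert an address given in GB to a page frame number. */
static uint64_t gb_to_pfn(uint64_t gb)
{
	return (gb << 30) >> MODEL_PAGE_SHIFT;
}

/*
 * Userspace stand-in for pfn_valid(): true only if the PFN lies inside
 * one of the populated ranges. The real pfn_valid() consults the
 * kernel's memory model; this only mimics the outcome for a layout
 * with a hole.
 */
static bool model_pfn_valid(const struct pfn_range *ranges, int nr,
			    uint64_t pfn)
{
	for (int i = 0; i < nr; i++) {
		if (pfn >= ranges[i].start && pfn < ranges[i].end)
			return true;
	}
	return false;
}

/* Midpoint of the zone's PFN span, as computed in the analysis. */
static uint64_t zone_midpoint(uint64_t zone_start, uint64_t zone_end)
{
	assert(zone_start <= zone_end);
	return (zone_start + zone_end) / 2;
}
```

With the populated ranges taken as [0x10_0000, 0x20_0000) and [0x880_0000, 0xa00_0000), `zone_midpoint(0x100000, 0xa000000)` gives 0x508_0000, and `model_pfn_valid()` rejects that value because it falls inside the hole. `gb_to_pfn(4)` and `gb_to_pfn(544)` give the 0x10_0000 and 0x880_0000 boundaries quoted in the report.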
* Re: mm/compaction: BUG: NULL pointer dereference
From: Mel Gorman @ 2019-05-24 10:39 UTC
To: Suzuki K Poulose
Cc: linux-mm, akpm, mhocko, cai, linux-kernel, marc.zyngier, kvmarm, kvm

On Fri, May 24, 2019 at 10:20:19AM +0100, Suzuki K Poulose wrote:
> Hi,
>
> We are hitting NULL pointer dereferences while running stress tests with KVM.
> See splat [0]. The test is to spawn 100 VMs all doing standard debian
> installation (Thanks to Marc's automated scripts, available here [1] ).
> The problem has been reproduced with a better rate of success from 5.1-rc6
> onwards.
>
> The issue is only reproducible with swapping enabled and the entire
> memory used up, when swapping heavily. Also, this issue is reproducible
> on only one server with 128GB, which has the following memory layout:
>
> [32GB@4GB, hole , 96GB@544GB]
>
> Here is my non-expert analysis of the issue so far.
>
> Under extreme memory pressure, kswapd could trigger reset_isolation_suitable()
> to figure out the cached values for migrate/free pfn for a zone, by scanning through
> the entire zone. On our server it does so in the range of [ 0x10_0000, 0xa00_0000 ],
> with the following area of holes : [ 0x20_0000, 0x880_0000 ].
> In the failing case, we end up setting the cached migrate pfn as : 0x508_0000, which
> is right in the center of the zone pfn range, i.e. ( 0x10_0000 + 0xa00_0000 ) / 2,
> with reset_migrate = 0x88_4e00, reset_free = 0x10_0000.
>
> Now these cached values are used by fast_isolate_freepages() to find a pfn. However,
> since we can't find anything during the search we fall back to using the page belonging
> to the min_pfn (which is the migrate_pfn), without proper checks to see if that is a valid
> PFN or not. This is then passed on to fast_isolate_around() which tries to do :
> set_pageblock_skip(page) on the page, which blows up due to a NULL mem_section pointer.
>
> The following patch seems to fix the issue for me, but I am not quite convinced that
> it is the right fix. Thoughts ?
>

I think the patch is valid and the alternatives would be unnecessarily
complicated. During a normal scan for free pages to isolate, there
is a check for pageblock_pfn_to_page() which uses a pfn_valid check
for non-contiguous zones in __pageblock_pfn_to_page. Now, while the
non-contiguous check could be made in the area you highlight, it would be a
relatively small optimisation that would be unmeasurable overall. However,
it is definitely the case that if the PFN you highlight is invalid that
badness happens. If you want to express this as a signed-off patch with
an adjusted changelog then I'd be happy to add

Reviewed-by: Mel Gorman <mgorman@techsingularity.net>

If you are not comfortable with rewriting the changelog and formatting it
as a patch then I can do it on your behalf and preserve your Signed-off-by.
Just let me know.

Thanks for researching this, I think it also applies to other people but
had not found the time to track it down.

-- 
Mel Gorman
SUSE Labs

* Re: mm/compaction: BUG: NULL pointer dereference
From: Suzuki K Poulose @ 2019-05-24 10:42 UTC
To: mgorman
Cc: linux-mm, akpm, mhocko, cai, linux-kernel, marc.zyngier, kvmarm, kvm

Hi Mel,

Thanks for your quick response.

On 24/05/2019 11:39, Mel Gorman wrote:
> On Fri, May 24, 2019 at 10:20:19AM +0100, Suzuki K Poulose wrote:
>> Hi,
>>
>> We are hitting NULL pointer dereferences while running stress tests with KVM.
>> See splat [0]. The test is to spawn 100 VMs all doing standard debian
>> installation (Thanks to Marc's automated scripts, available here [1] ).
>> The problem has been reproduced with a better rate of success from 5.1-rc6
>> onwards.
>>
>> The issue is only reproducible with swapping enabled and the entire
>> memory used up, when swapping heavily. Also, this issue is reproducible
>> on only one server with 128GB, which has the following memory layout:
>>
>> [32GB@4GB, hole , 96GB@544GB]
>>
>> Here is my non-expert analysis of the issue so far.
>>
>> Under extreme memory pressure, kswapd could trigger reset_isolation_suitable()
>> to figure out the cached values for migrate/free pfn for a zone, by scanning through
>> the entire zone. On our server it does so in the range of [ 0x10_0000, 0xa00_0000 ],
>> with the following area of holes : [ 0x20_0000, 0x880_0000 ].
>> In the failing case, we end up setting the cached migrate pfn as : 0x508_0000, which
>> is right in the center of the zone pfn range, i.e. ( 0x10_0000 + 0xa00_0000 ) / 2,
>> with reset_migrate = 0x88_4e00, reset_free = 0x10_0000.
>>
>> Now these cached values are used by fast_isolate_freepages() to find a pfn. However,
>> since we can't find anything during the search we fall back to using the page belonging
>> to the min_pfn (which is the migrate_pfn), without proper checks to see if that is a valid
>> PFN or not. This is then passed on to fast_isolate_around() which tries to do :
>> set_pageblock_skip(page) on the page, which blows up due to a NULL mem_section pointer.
>>
>> The following patch seems to fix the issue for me, but I am not quite convinced that
>> it is the right fix. Thoughts ?
>>
>
> I think the patch is valid and the alternatives would be unnecessarily
> complicated. During a normal scan for free pages to isolate, there
> is a check for pageblock_pfn_to_page() which uses a pfn_valid check
> for non-contiguous zones in __pageblock_pfn_to_page. Now, while the

I had the initial version with pageblock_pfn_to_page(), but as you said,
it is a complicated way of performing the same check as pfn_valid().

> non-contiguous check could be made in the area you highlight, it would be a
> relatively small optimisation that would be unmeasurable overall. However,
> it is definitely the case that if the PFN you highlight is invalid that
> badness happens. If you want to express this as a signed-off patch with
> an adjusted changelog then I'd be happy to add

Sure, will send it right away.

>
> Reviewed-by: Mel Gorman <mgorman@techsingularity.net>
>

Thanks.
Suzuki

* [PATCH] mm, compaction: Make sure we isolate a valid PFN
From: Suzuki K Poulose @ 2019-05-24 15:31 UTC
To: linux-mm
Cc: mgorman, akpm, mhocko, cai, linux-kernel, marc.zyngier, kvmarm, kvm, suzuki.poulose

When we have holes in a normal memory zone, we could end up having
cached_migrate_pfns which may not necessarily be valid, under heavy memory
pressure with swapping enabled ( via __reset_isolation_suitable(), triggered
by kswapd).

Later if we fail to find a page via fast_isolate_freepages(), we may
end up using the migrate_pfn we started the search with, as a valid
page. This could lead to NULL pointer dereferences like below,
due to an invalid mem_section pointer.

 Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008 [47/1825]
 Mem abort info:
   ESR = 0x96000004
   Exception class = DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
 Data abort info:
   ISV = 0, ISS = 0x00000004
   CM = 0, WnR = 0
 user pgtable: 4k pages, 48-bit VAs, pgdp = 0000000082f94ae9
 [0000000000000008] pgd=0000000000000000
 Internal error: Oops: 96000004 [#1] SMP
 ...
 CPU: 10 PID: 6080 Comm: qemu-system-aar Not tainted 510-rc1+ #6
 Hardware name: AmpereComputing(R) OSPREY EV-883832-X3-0001/OSPREY, BIOS 4819 09/25/2018
 pstate: 60000005 (nZCv daif -PAN -UAO)
 pc : set_pfnblock_flags_mask+0x58/0xe8
 lr : compaction_alloc+0x300/0x950
 [...]
 Process qemu-system-aar (pid: 6080, stack limit = 0x0000000095070da5)
 Call trace:
  set_pfnblock_flags_mask+0x58/0xe8
  compaction_alloc+0x300/0x950
  migrate_pages+0x1a4/0xbb0
  compact_zone+0x750/0xde8
  compact_zone_order+0xd8/0x118
  try_to_compact_pages+0xb4/0x290
  __alloc_pages_direct_compact+0x84/0x1e0
  __alloc_pages_nodemask+0x5e0/0xe18
  alloc_pages_vma+0x1cc/0x210
  do_huge_pmd_anonymous_page+0x108/0x7c8
  __handle_mm_fault+0xdd4/0x1190
  handle_mm_fault+0x114/0x1c0
  __get_user_pages+0x198/0x3c0
  get_user_pages_unlocked+0xb4/0x1d8
  __gfn_to_pfn_memslot+0x12c/0x3b8
  gfn_to_pfn_prot+0x4c/0x60
  kvm_handle_guest_abort+0x4b0/0xcd8
  handle_exit+0x140/0x1b8
  kvm_arch_vcpu_ioctl_run+0x260/0x768
  kvm_vcpu_ioctl+0x490/0x898
  do_vfs_ioctl+0xc4/0x898
  ksys_ioctl+0x8c/0xa0
  __arm64_sys_ioctl+0x28/0x38
  el0_svc_common+0x74/0x118
  el0_svc_handler+0x38/0x78
  el0_svc+0x8/0xc
 Code: f8607840 f100001f 8b011401 9a801020 (f9400400)
 ---[ end trace af6a35219325a9b6 ]---

The issue was reported on an arm64 server with 128GB with holes in the zone
(e.g., [32GB@4GB, 96GB@544GB]), with a swap device enabled, while running 100 KVM
guest instances.

This patch fixes the issue by ensuring that the page belongs to a valid PFN
when we fall back to using the lower limit of the scan range upon failure in
fast_isolate_freepages().

Fixes: 5a811889de10f1eb ("mm, compaction: use free lists to quickly locate a migration target")
Reported-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 mm/compaction.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 9febc8c..9e1b9ac 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1399,7 +1399,7 @@ fast_isolate_freepages(struct compact_control *cc)
 			page = pfn_to_page(highest);
 			cc->free_pfn = highest;
 		} else {
-			if (cc->direct_compaction) {
+			if (cc->direct_compaction && pfn_valid(min_pfn)) {
 				page = pfn_to_page(min_pfn);
 				cc->free_pfn = min_pfn;
 			}
-- 
2.7.4

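The effect of the one-line change can be modelled in userspace. The sketch below is only an illustration under stated assumptions, not the kernel implementation: `struct model_cc`, `example_pfn_valid()` and `pick_fallback_pfn()` are hypothetical names, the hole is assumed to be [0x20_0000, 0x880_0000) inside a zone spanning [0x10_0000, 0xa00_0000), and a return value of 0 is used here to mean "no page selected". Only the fallback branch shown in the diff is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct compact_control. */
struct model_cc {
	bool direct_compaction;
	uint64_t free_pfn;
};

typedef bool (*pfn_valid_fn)(uint64_t pfn);

/*
 * Assumed layout for the example: zone [0x100000, 0xa000000) with a
 * hole at [0x200000, 0x8800000). Stands in for pfn_valid().
 */
static bool example_pfn_valid(uint64_t pfn)
{
	if (pfn < 0x100000 || pfn >= 0xa000000)
		return false;
	return pfn < 0x200000 || pfn >= 0x8800000;
}

/*
 * Models only the fallback at the end of the fast search:
 * prefer "highest" if the search recorded one, otherwise use min_pfn,
 * but only for direct compaction and only if min_pfn is valid.
 * Returns the chosen PFN, or 0 when nothing is chosen (model convention).
 */
static uint64_t pick_fallback_pfn(struct model_cc *cc, uint64_t highest,
				  uint64_t min_pfn, pfn_valid_fn valid)
{
	assert(cc != NULL && valid != NULL);

	if (highest) {
		cc->free_pfn = highest;
		return highest;
	}
	if (cc->direct_compaction && valid(min_pfn)) {
		cc->free_pfn = min_pfn;
		return min_pfn;
	}
	return 0; /* leave cc->free_pfn untouched */
}
```

In this model, a `min_pfn` of 0x508_0000 (inside the assumed hole) selects nothing and leaves `free_pfn` unchanged, whereas dropping the `valid(min_pfn)` test would hand that PFN on to later code, which corresponds to the situation described in the changelog.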
* Re: [PATCH] mm, compaction: Make sure we isolate a valid PFN
From: Mel Gorman @ 2019-05-24 15:51 UTC
To: Suzuki K Poulose
Cc: linux-mm, akpm, mhocko, cai, linux-kernel, marc.zyngier, kvmarm, kvm

On Fri, May 24, 2019 at 04:31:48PM +0100, Suzuki K Poulose wrote:
> When we have holes in a normal memory zone, we could end up having
> cached_migrate_pfns which may not necessarily be valid, under heavy memory
> pressure with swapping enabled ( via __reset_isolation_suitable(), triggered
> by kswapd).
>
> Later if we fail to find a page via fast_isolate_freepages(), we may
> end up using the migrate_pfn we started the search with, as a valid
> page. This could lead to NULL pointer dereferences like below,
> due to an invalid mem_section pointer.
>
> Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008 [47/1825]
> Mem abort info:
>   ESR = 0x96000004
>   Exception class = DABT (current EL), IL = 32 bits
>   SET = 0, FnV = 0
>   EA = 0, S1PTW = 0
> Data abort info:
>   ISV = 0, ISS = 0x00000004
>   CM = 0, WnR = 0
> user pgtable: 4k pages, 48-bit VAs, pgdp = 0000000082f94ae9
> [0000000000000008] pgd=0000000000000000
> Internal error: Oops: 96000004 [#1] SMP
> ...
> CPU: 10 PID: 6080 Comm: qemu-system-aar Not tainted 510-rc1+ #6
> Hardware name: AmpereComputing(R) OSPREY EV-883832-X3-0001/OSPREY, BIOS 4819 09/25/2018
> pstate: 60000005 (nZCv daif -PAN -UAO)
> pc : set_pfnblock_flags_mask+0x58/0xe8
> lr : compaction_alloc+0x300/0x950
> [...]
> Process qemu-system-aar (pid: 6080, stack limit = 0x0000000095070da5)
> Call trace:
>  set_pfnblock_flags_mask+0x58/0xe8
>  compaction_alloc+0x300/0x950
>  migrate_pages+0x1a4/0xbb0
>  compact_zone+0x750/0xde8
>  compact_zone_order+0xd8/0x118
>  try_to_compact_pages+0xb4/0x290
>  __alloc_pages_direct_compact+0x84/0x1e0
>  __alloc_pages_nodemask+0x5e0/0xe18
>  alloc_pages_vma+0x1cc/0x210
>  do_huge_pmd_anonymous_page+0x108/0x7c8
>  __handle_mm_fault+0xdd4/0x1190
>  handle_mm_fault+0x114/0x1c0
>  __get_user_pages+0x198/0x3c0
>  get_user_pages_unlocked+0xb4/0x1d8
>  __gfn_to_pfn_memslot+0x12c/0x3b8
>  gfn_to_pfn_prot+0x4c/0x60
>  kvm_handle_guest_abort+0x4b0/0xcd8
>  handle_exit+0x140/0x1b8
>  kvm_arch_vcpu_ioctl_run+0x260/0x768
>  kvm_vcpu_ioctl+0x490/0x898
>  do_vfs_ioctl+0xc4/0x898
>  ksys_ioctl+0x8c/0xa0
>  __arm64_sys_ioctl+0x28/0x38
>  el0_svc_common+0x74/0x118
>  el0_svc_handler+0x38/0x78
>  el0_svc+0x8/0xc
> Code: f8607840 f100001f 8b011401 9a801020 (f9400400)
> ---[ end trace af6a35219325a9b6 ]---
>
> The issue was reported on an arm64 server with 128GB with holes in the zone
> (e.g., [32GB@4GB, 96GB@544GB]), with a swap device enabled, while running 100 KVM
> guest instances.
>
> This patch fixes the issue by ensuring that the page belongs to a valid PFN
> when we fall back to using the lower limit of the scan range upon failure in
> fast_isolate_freepages().
>
> Fixes: 5a811889de10f1eb ("mm, compaction: use free lists to quickly locate a migration target")
> Reported-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>

Reviewed-by: Mel Gorman <mgorman@techsingularity.net>

-- 
Mel Gorman
SUSE Labs
* Re: [PATCH] mm, compaction: Make sure we isolate a valid PFN
  2019-05-24 15:31 ` Suzuki K Poulose
@ 2019-05-27 5:38 ` Anshuman Khandual
From: Anshuman Khandual @ 2019-05-27 5:38 UTC
To: Suzuki K Poulose, linux-mm
Cc: mgorman, akpm, mhocko, cai, linux-kernel, marc.zyngier, kvmarm, kvm

On 05/24/2019 09:01 PM, Suzuki K Poulose wrote:
> When we have holes in a normal memory zone, we could end up having
> cached_migrate_pfns which may not necessarily be valid, under heavy memory
> pressure with swapping enabled (via __reset_isolation_suitable(), triggered
> by kswapd).
>
> Later if we fail to find a page via fast_isolate_freepages(), we may
> end up using the migrate_pfn we started the search with, as a valid
> page. This could lead to NULL pointer dereferences like the one below,
> due to an invalid mem_section pointer.
>
> Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008 [47/1825]
> Mem abort info:
>   ESR = 0x96000004
>   Exception class = DABT (current EL), IL = 32 bits
>   SET = 0, FnV = 0
>   EA = 0, S1PTW = 0
> Data abort info:
>   ISV = 0, ISS = 0x00000004
>   CM = 0, WnR = 0
> user pgtable: 4k pages, 48-bit VAs, pgdp = 0000000082f94ae9
> [0000000000000008] pgd=0000000000000000
> Internal error: Oops: 96000004 [#1] SMP
> ...
> CPU: 10 PID: 6080 Comm: qemu-system-aar Not tainted 510-rc1+ #6
> Hardware name: AmpereComputing(R) OSPREY EV-883832-X3-0001/OSPREY, BIOS 4819 09/25/2018
> pstate: 60000005 (nZCv daif -PAN -UAO)
> pc : set_pfnblock_flags_mask+0x58/0xe8
> lr : compaction_alloc+0x300/0x950
> [...]
> Process qemu-system-aar (pid: 6080, stack limit = 0x0000000095070da5)
> Call trace:
>  set_pfnblock_flags_mask+0x58/0xe8
>  compaction_alloc+0x300/0x950
>  migrate_pages+0x1a4/0xbb0
>  compact_zone+0x750/0xde8
>  compact_zone_order+0xd8/0x118
>  try_to_compact_pages+0xb4/0x290
>  __alloc_pages_direct_compact+0x84/0x1e0
>  __alloc_pages_nodemask+0x5e0/0xe18
>  alloc_pages_vma+0x1cc/0x210
>  do_huge_pmd_anonymous_page+0x108/0x7c8
>  __handle_mm_fault+0xdd4/0x1190
>  handle_mm_fault+0x114/0x1c0
>  __get_user_pages+0x198/0x3c0
>  get_user_pages_unlocked+0xb4/0x1d8
>  __gfn_to_pfn_memslot+0x12c/0x3b8
>  gfn_to_pfn_prot+0x4c/0x60
>  kvm_handle_guest_abort+0x4b0/0xcd8
>  handle_exit+0x140/0x1b8
>  kvm_arch_vcpu_ioctl_run+0x260/0x768
>  kvm_vcpu_ioctl+0x490/0x898
>  do_vfs_ioctl+0xc4/0x898
>  ksys_ioctl+0x8c/0xa0
>  __arm64_sys_ioctl+0x28/0x38
>  el0_svc_common+0x74/0x118
>  el0_svc_handler+0x38/0x78
>  el0_svc+0x8/0xc
> Code: f8607840 f100001f 8b011401 9a801020 (f9400400)
> ---[ end trace af6a35219325a9b6 ]---
>
> The issue was reported on an arm64 server with 128GB with holes in the zone
> (e.g., [32GB@4GB, 96GB@544GB]), with a swap device enabled, while running 100 KVM
> guest instances.
>
> This patch fixes the issue by ensuring that the page belongs to a valid PFN
> when we fall back to using the lower limit of the scan range upon failure in
> fast_isolate_freepages().
>
> Fixes: 5a811889de10f1eb ("mm, compaction: use free lists to quickly locate a migration target")
> Reported-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>

Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
* Re: mm/compaction: BUG: NULL pointer dereference
  2019-05-24 9:20 ` Suzuki K Poulose
@ 2019-05-24 10:56 ` Anshuman Khandual
From: Anshuman Khandual @ 2019-05-24 10:56 UTC
To: Suzuki K Poulose, linux-mm
Cc: mgorman, akpm, mhocko, cai, linux-kernel, marc.zyngier, kvmarm, kvm

On 05/24/2019 02:50 PM, Suzuki K Poulose wrote:
> Hi,
>
> We are hitting NULL pointer dereferences while running stress tests with KVM.
> See splat [0]. The test is to spawn 100 VMs all doing standard debian
> installation (Thanks to Marc's automated scripts, available here [1] ).
> The problem has been reproduced with a better rate of success from 5.1-rc6
> onwards.
>
> The issue is only reproducible with swapping enabled and the entire
> memory is used up, when swapping heavily. Also this issue is only reproducible
> on only one server with 128GB, which has the following memory layout:
>
> [32GB@4GB, hole , 96GB@544GB]
>
> Here is my non-expert analysis of the issue so far.
>
> Under extreme memory pressure, the kswapd could trigger reset_isolation_suitable()
> to figure out the cached values for migrate/free pfn for a zone, by scanning through
> the entire zone. On our server it does so in the range of [ 0x10_0000, 0xa00_0000 ],
> with the following area of holes : [ 0x20_0000, 0x880_0000 ].
> In the failing case, we end up setting the cached migrate pfn as : 0x508_0000, which
> is right in the center of the zone pfn range. i.e ( 0x10_0000 + 0xa00_0000 ) / 2,
> with reset_migrate = 0x88_4e00, reset_free = 0x10_0000.
>
> Now these cached values are used by the fast_isolate_freepages() to find a pfn. However,
> since we can't find anything during the search we fall back to using the page belonging
> to the min_pfn (which is the migrate_pfn), without proper checks to see if that is valid
> PFN or not. This is then passed on to fast_isolate_around() which tries to do :
> set_pageblock_skip(page) on the page which blows up due to a NULL mem_section pointer.
>
> The following patch seems to fix the issue for me, but I am not quite convinced that
> it is the right fix. Thoughts ?
>
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 9febc8c..9e1b9ac 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1399,7 +1399,7 @@ fast_isolate_freepages(struct compact_control *cc)
>                                 page = pfn_to_page(highest);
>                                 cc->free_pfn = highest;
>                         } else {
> -                               if (cc->direct_compaction) {
> +                               if (cc->direct_compaction && pfn_valid(min_pfn)) {
>                                         page = pfn_to_page(min_pfn);

pfn_to_online_page() here would be better as it does not add pfn_valid() cost on
architectures which do not subscribe to CONFIG_HOLES_IN_ZONE. But regardless, if
the compaction is trying to scan pfns in zone holes, then it should be avoided.
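The arithmetic in the quoted analysis can be checked with a few lines of userspace C. This is only a sketch under stated assumptions: zone_midpoint() and pfn_in_range() are hypothetical helper names, the values are the PFNs from the report, and the hole is treated as a half-open range.

```c
#include <assert.h>
#include <stdbool.h>

/* Midpoint of a PFN range, written so the sum cannot overflow. */
static unsigned long zone_midpoint(unsigned long start_pfn,
                                   unsigned long end_pfn)
{
    assert(start_pfn <= end_pfn);
    return start_pfn + (end_pfn - start_pfn) / 2;
}

/* True when pfn lies in the half-open range [start, end). */
static bool pfn_in_range(unsigned long pfn, unsigned long start,
                         unsigned long end)
{
    return pfn >= start && pfn < end;
}
```

For the reported zone, zone_midpoint(0x100000UL, 0xa000000UL) gives 0x5080000, and pfn_in_range(0x5080000UL, 0x200000UL, 0x8800000UL) is true, which matches the statement that the cached migrate pfn landed inside the hole.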
* Re: mm/compaction: BUG: NULL pointer dereference
  2019-05-24 10:56 ` Anshuman Khandual
@ 2019-05-24 12:30 ` Mel Gorman
From: Mel Gorman @ 2019-05-24 12:30 UTC
To: Anshuman Khandual
Cc: Suzuki K Poulose, linux-mm, akpm, mhocko, cai, linux-kernel, marc.zyngier, kvmarm, kvm

On Fri, May 24, 2019 at 04:26:16PM +0530, Anshuman Khandual wrote:
>
>
> On 05/24/2019 02:50 PM, Suzuki K Poulose wrote:
> > Hi,
> >
> > We are hitting NULL pointer dereferences while running stress tests with KVM.
> > See splat [0]. The test is to spawn 100 VMs all doing standard debian
> > installation (Thanks to Marc's automated scripts, available here [1] ).
> > The problem has been reproduced with a better rate of success from 5.1-rc6
> > onwards.
> >
> > The issue is only reproducible with swapping enabled and the entire
> > memory is used up, when swapping heavily. Also this issue is only reproducible
> > on only one server with 128GB, which has the following memory layout:
> >
> > [32GB@4GB, hole , 96GB@544GB]
> >
> > Here is my non-expert analysis of the issue so far.
> >
> > Under extreme memory pressure, the kswapd could trigger reset_isolation_suitable()
> > to figure out the cached values for migrate/free pfn for a zone, by scanning through
> > the entire zone. On our server it does so in the range of [ 0x10_0000, 0xa00_0000 ],
> > with the following area of holes : [ 0x20_0000, 0x880_0000 ].
> > In the failing case, we end up setting the cached migrate pfn as : 0x508_0000, which
> > is right in the center of the zone pfn range. i.e ( 0x10_0000 + 0xa00_0000 ) / 2,
> > with reset_migrate = 0x88_4e00, reset_free = 0x10_0000.
> >
> > Now these cached values are used by the fast_isolate_freepages() to find a pfn. However,
> > since we can't find anything during the search we fall back to using the page belonging
> > to the min_pfn (which is the migrate_pfn), without proper checks to see if that is valid
> > PFN or not. This is then passed on to fast_isolate_around() which tries to do :
> > set_pageblock_skip(page) on the page which blows up due to a NULL mem_section pointer.
> >
> > The following patch seems to fix the issue for me, but I am not quite convinced that
> > it is the right fix. Thoughts ?
> >
> >
> > diff --git a/mm/compaction.c b/mm/compaction.c
> > index 9febc8c..9e1b9ac 100644
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -1399,7 +1399,7 @@ fast_isolate_freepages(struct compact_control *cc)
> >                                 page = pfn_to_page(highest);
> >                                 cc->free_pfn = highest;
> >                         } else {
> > -                               if (cc->direct_compaction) {
> > +                               if (cc->direct_compaction && pfn_valid(min_pfn)) {
> >                                         page = pfn_to_page(min_pfn);
>
> pfn_to_online_page() here would be better as it does not add pfn_valid() cost on
> architectures which do not subscribe to CONFIG_HOLES_IN_ZONE. But regardless, if
> the compaction is trying to scan pfns in zone holes, then it should be avoided.

CONFIG_HOLES_IN_ZONE typically applies in special cases where an arch
punches holes within a section. As both do a section lookup, the cost is
similar but pfn_valid in general is less subtle in this case. Normally
pfn_valid_within is only ok when a pfn_valid check has been made on the
max_order aligned range as well as a zone boundary check. In this case,
it's much more straightforward to leave it as pfn_valid.

--
Mel Gorman
SUSE Labs
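The "section lookup" and the NULL mem_section pointer mentioned in this thread can be pictured with a toy table of section pointers. This is a deliberately simplified userspace sketch, not the kernel's data structures: every toy_* name, the section size of 16 PFNs and the table of 8 sections are hypothetical values chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PFN_SECTION_SHIFT 4 /* hypothetical: 16 PFNs per section */
#define TOY_NR_SECTIONS       8

struct toy_section {
    unsigned long flags;
};

static struct toy_section toy_present[2];

/* Sections 0 and 7 are populated; sections 1..6 form a hole (NULL). */
static struct toy_section *toy_sections[TOY_NR_SECTIONS] = {
    [0] = &toy_present[0],
    [7] = &toy_present[1],
};

/* The lookup shared by the validity check and the flag setter. */
static struct toy_section *toy_pfn_to_section(unsigned long pfn)
{
    unsigned long nr = pfn >> TOY_PFN_SECTION_SHIFT;

    return nr < TOY_NR_SECTIONS ? toy_sections[nr] : NULL;
}

static bool toy_pfn_valid(unsigned long pfn)
{
    return toy_pfn_to_section(pfn) != NULL;
}

/* Unguarded setter: the caller must already know the PFN is valid. */
static void toy_set_skip(unsigned long pfn)
{
    struct toy_section *s = toy_pfn_to_section(pfn);

    assert(s != NULL); /* a PFN in the hole would be a NULL dereference */
    s->flags |= 1UL;
}

/* Guarded variant: refuses PFNs that fall in the hole. */
static bool toy_set_skip_checked(unsigned long pfn)
{
    if (!toy_pfn_valid(pfn))
        return false;
    toy_set_skip(pfn);
    return true;
}
```

In this model the validity check costs one table lookup, the same lookup the setter performs anyway, so guarding the fallback path adds little work while preventing the NULL dereference.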
* Re: mm/compaction: BUG: NULL pointer dereference
  2019-05-24 12:30 ` Mel Gorman
@ 2019-05-24 13:13 ` Anshuman Khandual
From: Anshuman Khandual @ 2019-05-24 13:13 UTC
To: Mel Gorman
Cc: Suzuki K Poulose, linux-mm, akpm, mhocko, cai, linux-kernel, marc.zyngier, kvmarm, kvm

On 05/24/2019 06:00 PM, Mel Gorman wrote:
> On Fri, May 24, 2019 at 04:26:16PM +0530, Anshuman Khandual wrote:
>>
>>
>> On 05/24/2019 02:50 PM, Suzuki K Poulose wrote:
>>> Hi,
>>>
>>> We are hitting NULL pointer dereferences while running stress tests with KVM.
>>> See splat [0]. The test is to spawn 100 VMs all doing standard debian
>>> installation (Thanks to Marc's automated scripts, available here [1] ).
>>> The problem has been reproduced with a better rate of success from 5.1-rc6
>>> onwards.
>>>
>>> The issue is only reproducible with swapping enabled and the entire
>>> memory is used up, when swapping heavily. Also this issue is only reproducible
>>> on only one server with 128GB, which has the following memory layout:
>>>
>>> [32GB@4GB, hole , 96GB@544GB]
>>>
>>> Here is my non-expert analysis of the issue so far.
>>>
>>> Under extreme memory pressure, the kswapd could trigger reset_isolation_suitable()
>>> to figure out the cached values for migrate/free pfn for a zone, by scanning through
>>> the entire zone. On our server it does so in the range of [ 0x10_0000, 0xa00_0000 ],
>>> with the following area of holes : [ 0x20_0000, 0x880_0000 ].
>>> In the failing case, we end up setting the cached migrate pfn as : 0x508_0000, which
>>> is right in the center of the zone pfn range. i.e ( 0x10_0000 + 0xa00_0000 ) / 2,
>>> with reset_migrate = 0x88_4e00, reset_free = 0x10_0000.
>>>
>>> Now these cached values are used by the fast_isolate_freepages() to find a pfn. However,
>>> since we can't find anything during the search we fall back to using the page belonging
>>> to the min_pfn (which is the migrate_pfn), without proper checks to see if that is valid
>>> PFN or not. This is then passed on to fast_isolate_around() which tries to do :
>>> set_pageblock_skip(page) on the page which blows up due to a NULL mem_section pointer.
>>>
>>> The following patch seems to fix the issue for me, but I am not quite convinced that
>>> it is the right fix. Thoughts ?
>>>
>>>
>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>> index 9febc8c..9e1b9ac 100644
>>> --- a/mm/compaction.c
>>> +++ b/mm/compaction.c
>>> @@ -1399,7 +1399,7 @@ fast_isolate_freepages(struct compact_control *cc)
>>>                                 page = pfn_to_page(highest);
>>>                                 cc->free_pfn = highest;
>>>                         } else {
>>> -                               if (cc->direct_compaction) {
>>> +                               if (cc->direct_compaction && pfn_valid(min_pfn)) {
>>>                                         page = pfn_to_page(min_pfn);
>>
>> pfn_to_online_page() here would be better as it does not add pfn_valid() cost on
>> architectures which do not subscribe to CONFIG_HOLES_IN_ZONE. But regardless, if
>> the compaction is trying to scan pfns in zone holes, then it should be avoided.
>
> CONFIG_HOLES_IN_ZONE typically applies in special cases where an arch
> punches holes within a section. As both do a section lookup, the cost is
> similar but pfn_valid in general is less subtle in this case. Normally
> pfn_valid_within is only ok when a pfn_valid check has been made on the
> max_order aligned range as well as a zone boundary check. In this case,
> it's much more straightforward to leave it as pfn_valid.

Sure, makes sense.
end of thread (newest message: 2019-05-27 5:38 UTC)

Thread overview: 18+ messages
2019-05-24  9:20 mm/compaction: BUG: NULL pointer dereference, Suzuki K Poulose
2019-05-24 10:39 Mel Gorman
2019-05-24 10:42 Suzuki K Poulose
2019-05-24 15:31 [PATCH] mm, compaction: Make sure we isolate a valid PFN, Suzuki K Poulose
2019-05-24 15:51 Mel Gorman
2019-05-27  5:38 Anshuman Khandual
2019-05-24 10:56 mm/compaction: BUG: NULL pointer dereference, Anshuman Khandual
2019-05-24 12:30 Mel Gorman
2019-05-24 13:13 Anshuman Khandual