Kirill A. Shutemov wrote:
> On Tue, Feb 07, 2017 at 09:11:05AM -0600, Zi Yan wrote:
>>>> This causes memory leak or kernel crashing, if VM_BUG_ON() is enabled.
>>> The problem is that numabalancing calls change_huge_pmd() under
>>> down_read(mmap_sem), not down_write(mmap_sem) as the rest of users do.
>>> It makes numabalancing the only code path beyond page fault that can turn
>>> pmd_none() into pmd_trans_huge() under down_read(mmap_sem).
>>>
>>> This can lead to race when MADV_DONTNEED miss THP. That's not critical for
>>> pagefault vs. MADV_DONTNEED race as we will end up with clear page in that
>>> case. Not so much for change_huge_pmd().
>>>
>>> Looks like we need pmdp_modify() or something to modify protection bits
>>> inplace, without clearing pmd.
>>>
>>> Not sure how to get crash scenario.
>>>
>>> BTW, Zi, have you observed the crash? Or is it based on code inspection?
>>> Any backtraces?
>> The problem should be very rare in the upstream kernel. I discover the
>> problem in my customized kernel which does very frequent page migration
>> and uses numa_protnone.
>>
>> The crash scenario I guess is like:
>> 1. A huge page pmd entry is in the middle of being changed into either a
>> pmd_protnone or a pmd_migration_entry. It is cleared to pmd_none.
>>
>> 2. At the same time, the application frees the vma this page belongs to.
>
> Em... no.
>
> This shouldn't be possible: your 1. must be done under down_read(mmap_sem).
> And we only be able to remove vma under down_write(mmap_sem), so the
> scenario should be excluded.
>
> What do I miss?

You are right. This problem will not happen in the upstream kernel.

The problem comes from my customized kernel, where I migrate pages away
instead of reclaiming them when memory is under pressure. I did not take
any mmap_sem when I migrate pages. So I got this error.

It is a false alarm. Sorry about that. Thanks for clarifying the problem.

--
Best Regards,
Yan Zi