* [PATCH v2 0/2] A couple hugetlbfs fixes @ 2019-03-28 23:47 Mike Kravetz 2019-03-28 23:47 ` [PATCH v2 1/2] huegtlbfs: on restore reserve error path retain subpool reservation Mike Kravetz ` (3 more replies) 0 siblings, 4 replies; 8+ messages in thread From: Mike Kravetz @ 2019-03-28 23:47 UTC (permalink / raw) To: linux-kernel, linux-mm, Davidlohr Bueso Cc: Joonsoo Kim, Michal Hocko, Naoya Horiguchi, Kirill A . Shutemov, Andrew Morton, Mike Kravetz I stumbled on these two hugetlbfs issues while looking at other things: - The 'restore reserve' functionality at page free time should not be adjusting subpool counts. - A BUG can be triggered (not easily) due to temporarily mapping a page before doing a COW. Both are described in detail in the commit message of the patches. I would appreciate comments from Davidlohr Bueso as one patch is directly related to code he added in commit 8382d914ebf7. I did not cc stable as the first problem has been around since reserves were added to hugetlbfs and nobody has noticed. The second is very hard to hit/reproduce. v2 - Update definition and all callers of hugetlb_fault_mutex_hash as the arguments mm and vma are no longer used or necessary. Mike Kravetz (2): huegtlbfs: on restore reserve error path retain subpool reservation hugetlb: use same fault hash key for shared and private mappings fs/hugetlbfs/inode.c | 7 ++----- include/linux/hugetlb.h | 4 +--- mm/hugetlb.c | 43 +++++++++++++++++++++-------------------- mm/userfaultfd.c | 3 +-- 4 files changed, 26 insertions(+), 31 deletions(-) -- 2.20.1 ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH v2 1/2] huegtlbfs: on restore reserve error path retain subpool reservation 2019-03-28 23:47 [PATCH v2 0/2] A couple hugetlbfs fixes Mike Kravetz @ 2019-03-28 23:47 ` Mike Kravetz 2019-03-28 23:47 ` [PATCH v2 2/2] hugetlb: use same fault hash key for shared and private mappings Mike Kravetz ` (2 subsequent siblings) 3 siblings, 0 replies; 8+ messages in thread From: Mike Kravetz @ 2019-03-28 23:47 UTC (permalink / raw) To: linux-kernel, linux-mm, Davidlohr Bueso Cc: Joonsoo Kim, Michal Hocko, Naoya Horiguchi, Kirill A . Shutemov, Andrew Morton, Mike Kravetz When a huge page is allocated, PagePrivate() is set if the allocation consumed a reservation. When freeing a huge page, PagePrivate is checked. If set, it indicates the reservation should be restored. PagePrivate being set at free huge page time mostly happens on error paths. When huge page reservations are created, a check is made to determine if the mapping is associated with an explicitly mounted filesystem. If so, pages are also reserved within the filesystem. The default action when freeing a huge page is to decrement the usage count in any associated explicitly mounted filesystem. However, if the reservation is to be restored the reservation/use count within the filesystem should not be decrementd. Otherwise, a subsequent page allocation and free for the same mapping location will cause the file filesystem usage to go 'negative'. Filesystem Size Used Avail Use% Mounted on nodev 4.0G -4.0M 4.1G - /opt/hugepool To fix, when freeing a huge page do not adjust filesystem usage if PagePrivate() is set to indicate the reservation should be restored. Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> --- mm/hugetlb.c | 21 ++++++++++++++++----- 1 file changed, 16 insertions(+), 5 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index f79ae4e42159..8651d6a602f9 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1268,12 +1268,23 @@ void free_huge_page(struct page *page) ClearPagePrivate(page); /* - * A return code of zero implies that the subpool will be under its - * minimum size if the reservation is not restored after page is free. - * Therefore, force restore_reserve operation. + * If PagePrivate() was set on page, page allocation consumed a + * reservation. If the page was associated with a subpool, there + * would have been a page reserved in the subpool before allocation + * via hugepage_subpool_get_pages(). Since we are 'restoring' the + * reservtion, do not call hugepage_subpool_put_pages() as this will + * remove the reserved page from the subpool. */ - if (hugepage_subpool_put_pages(spool, 1) == 0) - restore_reserve = true; + if (!restore_reserve) { + /* + * A return code of zero implies that the subpool will be + * under its minimum size if the reservation is not restored + * after page is free. Therefore, force restore_reserve + * operation. + */ + if (hugepage_subpool_put_pages(spool, 1) == 0) + restore_reserve = true; + } spin_lock(&hugetlb_lock); clear_page_huge_active(page); -- 2.20.1 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH v2 2/2] hugetlb: use same fault hash key for shared and private mappings 2019-03-28 23:47 [PATCH v2 0/2] A couple hugetlbfs fixes Mike Kravetz 2019-03-28 23:47 ` [PATCH v2 1/2] huegtlbfs: on restore reserve error path retain subpool reservation Mike Kravetz @ 2019-03-28 23:47 ` Mike Kravetz 2019-04-11 18:32 ` Mike Kravetz 2019-04-03 5:29 ` [PATCH v2 0/2] A couple hugetlbfs fixes Naoya Horiguchi 2019-04-08 19:48 ` Davidlohr Bueso 3 siblings, 1 reply; 8+ messages in thread From: Mike Kravetz @ 2019-03-28 23:47 UTC (permalink / raw) To: linux-kernel, linux-mm, Davidlohr Bueso Cc: Joonsoo Kim, Michal Hocko, Naoya Horiguchi, Kirill A . Shutemov, Andrew Morton, Mike Kravetz hugetlb uses a fault mutex hash table to prevent page faults of the same pages concurrently. The key for shared and private mappings is different. Shared keys off address_space and file index. Private keys off mm and virtual address. Consider a private mappings of a populated hugetlbfs file. A write fault will first map the page from the file and then do a COW to map a writable page. Hugetlbfs hole punch uses the fault mutex to prevent mappings of file pages. It uses the address_space file index key. However, private mappings will use a different key and could temporarily map the file page before COW. This causes problems (BUG) for the hole punch code as it expects the mutex to prevent additional uses/mappings of the page. There seems to be another potential COW issue/race with this approach of different private and shared keys as notes in commit 8382d914ebf7 ("mm, hugetlb: improve page-fault scalability"). Since every hugetlb mapping (even anon and private) is actually a file mapping, just use the address_space index key for all mappings. This results in potentially more hash collisions. However, this should not be the common case. Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> --- fs/hugetlbfs/inode.c | 7 ++----- include/linux/hugetlb.h | 4 +--- mm/hugetlb.c | 22 ++++++---------------- mm/userfaultfd.c | 3 +-- 4 files changed, 10 insertions(+), 26 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index ec32fece5e1e..6189ba80b57b 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -440,9 +440,7 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart, u32 hash; index = page->index; - hash = hugetlb_fault_mutex_hash(h, current->mm, - &pseudo_vma, - mapping, index, 0); + hash = hugetlb_fault_mutex_hash(h, mapping, index, 0); mutex_lock(&hugetlb_fault_mutex_table[hash]); /* @@ -639,8 +637,7 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset, addr = index * hpage_size; /* mutex taken here, fault path and hole punch */ - hash = hugetlb_fault_mutex_hash(h, mm, &pseudo_vma, mapping, - index, addr); + hash = hugetlb_fault_mutex_hash(h, mapping, index, addr); mutex_lock(&hugetlb_fault_mutex_table[hash]); /* See if already present in mapping to avoid alloc/free */ diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index ea35263eb76b..3bc0d02649fe 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -123,9 +123,7 @@ void move_hugetlb_state(struct page *oldpage, struct page *newpage, int reason); void free_huge_page(struct page *page); void hugetlb_fix_reserve_counts(struct inode *inode); extern struct mutex *hugetlb_fault_mutex_table; -u32 hugetlb_fault_mutex_hash(struct hstate *h, struct mm_struct *mm, - struct vm_area_struct *vma, - struct address_space *mapping, +u32 hugetlb_fault_mutex_hash(struct hstate *h, struct address_space *mapping, pgoff_t idx, unsigned long address); pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud); diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 8651d6a602f9..4409a87434f1 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -3837,8 +3837,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, * handling userfault. Reacquire after handling * fault to make calling code simpler. */ - hash = hugetlb_fault_mutex_hash(h, mm, vma, mapping, - idx, haddr); + hash = hugetlb_fault_mutex_hash(h, mapping, idx, haddr); mutex_unlock(&hugetlb_fault_mutex_table[hash]); ret = handle_userfault(&vmf, VM_UFFD_MISSING); mutex_lock(&hugetlb_fault_mutex_table[hash]); @@ -3946,21 +3945,14 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, } #ifdef CONFIG_SMP -u32 hugetlb_fault_mutex_hash(struct hstate *h, struct mm_struct *mm, - struct vm_area_struct *vma, - struct address_space *mapping, +u32 hugetlb_fault_mutex_hash(struct hstate *h, struct address_space *mapping, pgoff_t idx, unsigned long address) { unsigned long key[2]; u32 hash; - if (vma->vm_flags & VM_SHARED) { - key[0] = (unsigned long) mapping; - key[1] = idx; - } else { - key[0] = (unsigned long) mm; - key[1] = address >> huge_page_shift(h); - } + key[0] = (unsigned long) mapping; + key[1] = idx; hash = jhash2((u32 *)&key, sizeof(key)/sizeof(u32), 0); @@ -3971,9 +3963,7 @@ u32 hugetlb_fault_mutex_hash(struct hstate *h, struct mm_struct *mm, * For uniprocesor systems we always use a single mutex, so just * return 0 and avoid the hashing overhead. */ -u32 hugetlb_fault_mutex_hash(struct hstate *h, struct mm_struct *mm, - struct vm_area_struct *vma, - struct address_space *mapping, +u32 hugetlb_fault_mutex_hash(struct hstate *h, struct address_space *mapping, pgoff_t idx, unsigned long address) { return 0; @@ -4018,7 +4008,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, * get spurious allocation failures if two CPUs race to instantiate * the same page in the page cache. */ - hash = hugetlb_fault_mutex_hash(h, mm, vma, mapping, idx, haddr); + hash = hugetlb_fault_mutex_hash(h, mapping, idx, haddr); mutex_lock(&hugetlb_fault_mutex_table[hash]); entry = huge_ptep_get(ptep); diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index d59b5a73dfb3..9932d5755e4c 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -271,8 +271,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, */ idx = linear_page_index(dst_vma, dst_addr); mapping = dst_vma->vm_file->f_mapping; - hash = hugetlb_fault_mutex_hash(h, dst_mm, dst_vma, mapping, - idx, dst_addr); + hash = hugetlb_fault_mutex_hash(h, mapping, idx, dst_addr); mutex_lock(&hugetlb_fault_mutex_table[hash]); err = -ENOMEM; -- 2.20.1 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH v2 2/2] hugetlb: use same fault hash key for shared and private mappings 2019-03-28 23:47 ` [PATCH v2 2/2] hugetlb: use same fault hash key for shared and private mappings Mike Kravetz @ 2019-04-11 18:32 ` Mike Kravetz 2019-04-12 16:52 ` Davidlohr Bueso 0 siblings, 1 reply; 8+ messages in thread From: Mike Kravetz @ 2019-04-11 18:32 UTC (permalink / raw) To: linux-kernel, linux-mm, Davidlohr Bueso, Andrew Morton Cc: Joonsoo Kim, Michal Hocko, Naoya Horiguchi, Kirill A . Shutemov On 3/28/19 4:47 PM, Mike Kravetz wrote: > hugetlb uses a fault mutex hash table to prevent page faults of the > same pages concurrently. The key for shared and private mappings is > different. Shared keys off address_space and file index. Private > keys off mm and virtual address. Consider a private mappings of a > populated hugetlbfs file. A write fault will first map the page from > the file and then do a COW to map a writable page. Davidlohr suggested adding the stack trace to the commit log. When I originally 'discovered' this issue I was debugging something else. The routine remove_inode_hugepages() contains the following: * ... * This race can only happen in the hole punch case. * Getting here in a truncate operation is a bug. */ if (unlikely(page_mapped(page))) { BUG_ON(truncate_op); i_mmap_lock_write(mapping); hugetlb_vmdelete_list(&mapping->i_mmap, index * pages_per_huge_page(h), (index + 1) * pages_per_huge_page(h)); i_mmap_unlock_write(mapping); } lock_page(page); /* * We must free the huge page and remove from page * ... */ VM_BUG_ON(PagePrivate(page)); remove_huge_page(page); freed++; I observed that the page could be mapped (again) before the call to lock_page if we raced with a private write fault. However, for COW faults the faulting code is holding the page lock until it unmaps the file page. Hence, we will not call remove_huge_page() with the page mapped. That is good. However, for simple read faults the page remains mapped after releasing the page lock and we can call remove_huge_page with a mapped page and BUG. Sorry, the original commit message was not completely accurate in describing the issue. I was basing the change on behavior experienced during debug of a another issue. Actually, it is MUCH easier to BUG by making private read faults race with hole punch. As a result, I now think this should go to stable. Andrew, below is an updated commit message. No changes to code. Would you like me to send an updated patch? Also, need to add stable. hugetlb uses a fault mutex hash table to prevent page faults of the same pages concurrently. The key for shared and private mappings is different. Shared keys off address_space and file index. Private keys off mm and virtual address. Consider a private mappings of a populated hugetlbfs file. A fault will map the page from the file and if needed do a COW to map a writable page. Hugetlbfs hole punch uses the fault mutex to prevent mappings of file pages. It uses the address_space file index key. However, private mappings will use a different key and could race with this code to map the file page. This causes problems (BUG) for the page cache remove code as it expects the page to be unmapped. A sample stack is: page dumped because: VM_BUG_ON_PAGE(page_mapped(page)) kernel BUG at mm/filemap.c:169! ... RIP: 0010:unaccount_page_cache_page+0x1b8/0x200 ... Call Trace: __delete_from_page_cache+0x39/0x220 delete_from_page_cache+0x45/0x70 remove_inode_hugepages+0x13c/0x380 ? __add_to_page_cache_locked+0x162/0x380 hugetlbfs_fallocate+0x403/0x540 ? _cond_resched+0x15/0x30 ? __inode_security_revalidate+0x5d/0x70 ? selinux_file_permission+0x100/0x130 vfs_fallocate+0x13f/0x270 ksys_fallocate+0x3c/0x80 __x64_sys_fallocate+0x1a/0x20 do_syscall_64+0x5b/0x180 entry_SYSCALL_64_after_hwframe+0x44/0xa9 There seems to be another potential COW issue/race with this approach of different private and shared keys as noted in commit 8382d914ebf7 ("mm, hugetlb: improve page-fault scalability"). Since every hugetlb mapping (even anon and private) is actually a file mapping, just use the address_space index key for all mappings. This results in potentially more hash collisions. However, this should not be the common case. Fixes: b5cec28d36f5 ("hugetlbfs: truncate_hugepages() takes a range of pages") Cc: <stable@vger.kernel.org> -- Mike Kravetz ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 2/2] hugetlb: use same fault hash key for shared and private mappings 2019-04-11 18:32 ` Mike Kravetz @ 2019-04-12 16:52 ` Davidlohr Bueso 0 siblings, 0 replies; 8+ messages in thread From: Davidlohr Bueso @ 2019-04-12 16:52 UTC (permalink / raw) To: Mike Kravetz Cc: linux-kernel, linux-mm, Andrew Morton, Joonsoo Kim, Michal Hocko, Naoya Horiguchi, Kirill A . Shutemov On Thu, 11 Apr 2019, Mike Kravetz wrote: >On 3/28/19 4:47 PM, Mike Kravetz wrote: >> hugetlb uses a fault mutex hash table to prevent page faults of the >> same pages concurrently. The key for shared and private mappings is >> different. Shared keys off address_space and file index. Private >> keys off mm and virtual address. Consider a private mappings of a >> populated hugetlbfs file. A write fault will first map the page from >> the file and then do a COW to map a writable page. > >Davidlohr suggested adding the stack trace to the commit log. When I >originally 'discovered' this issue I was debugging something else. The >routine remove_inode_hugepages() contains the following: > > * ... > * This race can only happen in the hole punch case. > * Getting here in a truncate operation is a bug. > */ > if (unlikely(page_mapped(page))) { > BUG_ON(truncate_op); > > i_mmap_lock_write(mapping); > hugetlb_vmdelete_list(&mapping->i_mmap, > index * pages_per_huge_page(h), > (index + 1) * pages_per_huge_page(h)); > i_mmap_unlock_write(mapping); > } > > lock_page(page); > /* > * We must free the huge page and remove from page > * ... > */ > VM_BUG_ON(PagePrivate(page)); > remove_huge_page(page); > freed++; > >I observed that the page could be mapped (again) before the call to lock_page >if we raced with a private write fault. However, for COW faults the faulting >code is holding the page lock until it unmaps the file page. Hence, we will >not call remove_huge_page() with the page mapped. That is good. However, for >simple read faults the page remains mapped after releasing the page lock and >we can call remove_huge_page with a mapped page and BUG. > >Sorry, the original commit message was not completely accurate in describing >the issue. I was basing the change on behavior experienced during debug of >a another issue. Actually, it is MUCH easier to BUG by making private read >faults race with hole punch. As a result, I now think this should go to >stable. > >Andrew, below is an updated commit message. No changes to code. Would you >like me to send an updated patch? Also, need to add stable. > >hugetlb uses a fault mutex hash table to prevent page faults of the >same pages concurrently. The key for shared and private mappings is >different. Shared keys off address_space and file index. Private >keys off mm and virtual address. Consider a private mappings of a >populated hugetlbfs file. A fault will map the page from the file >and if needed do a COW to map a writable page. > >Hugetlbfs hole punch uses the fault mutex to prevent mappings of file >pages. It uses the address_space file index key. However, private >mappings will use a different key and could race with this code to map >the file page. This causes problems (BUG) for the page cache remove >code as it expects the page to be unmapped. A sample stack is: > >page dumped because: VM_BUG_ON_PAGE(page_mapped(page)) >kernel BUG at mm/filemap.c:169! >... >RIP: 0010:unaccount_page_cache_page+0x1b8/0x200 >... >Call Trace: >__delete_from_page_cache+0x39/0x220 >delete_from_page_cache+0x45/0x70 >remove_inode_hugepages+0x13c/0x380 >? __add_to_page_cache_locked+0x162/0x380 >hugetlbfs_fallocate+0x403/0x540 >? _cond_resched+0x15/0x30 >? __inode_security_revalidate+0x5d/0x70 >? selinux_file_permission+0x100/0x130 >vfs_fallocate+0x13f/0x270 >ksys_fallocate+0x3c/0x80 >__x64_sys_fallocate+0x1a/0x20 >do_syscall_64+0x5b/0x180 >entry_SYSCALL_64_after_hwframe+0x44/0xa9 > >There seems to be another potential COW issue/race with this approach >of different private and shared keys as noted in commit 8382d914ebf7 >("mm, hugetlb: improve page-fault scalability"). > >Since every hugetlb mapping (even anon and private) is actually a file >mapping, just use the address_space index key for all mappings. This >results in potentially more hash collisions. However, this should not >be the common case. This is fair enough as most mappings will be shared anyway (it would be lovely to have some machinery to measure collisions in kernel hash tables, in general). >Fixes: b5cec28d36f5 ("hugetlbfs: truncate_hugepages() takes a range of pages") Ok the issue was introduced after we had the mutex table. >Cc: <stable@vger.kernel.org> Thanks for the details, I'm definitely seeing the idx mismatch issue now. Reviewed-by: Davidlohr Bueso <dbueso@suse.de> ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 0/2] A couple hugetlbfs fixes 2019-03-28 23:47 [PATCH v2 0/2] A couple hugetlbfs fixes Mike Kravetz 2019-03-28 23:47 ` [PATCH v2 1/2] huegtlbfs: on restore reserve error path retain subpool reservation Mike Kravetz 2019-03-28 23:47 ` [PATCH v2 2/2] hugetlb: use same fault hash key for shared and private mappings Mike Kravetz @ 2019-04-03 5:29 ` Naoya Horiguchi 2019-04-08 19:48 ` Davidlohr Bueso 3 siblings, 0 replies; 8+ messages in thread From: Naoya Horiguchi @ 2019-04-03 5:29 UTC (permalink / raw) To: Mike Kravetz Cc: linux-kernel, linux-mm, Davidlohr Bueso, Joonsoo Kim, Michal Hocko, Kirill A . Shutemov, Andrew Morton On Thu, Mar 28, 2019 at 04:47:02PM -0700, Mike Kravetz wrote: > I stumbled on these two hugetlbfs issues while looking at other things: > - The 'restore reserve' functionality at page free time should not > be adjusting subpool counts. > - A BUG can be triggered (not easily) due to temporarily mapping a > page before doing a COW. > > Both are described in detail in the commit message of the patches. > I would appreciate comments from Davidlohr Bueso as one patch is > directly related to code he added in commit 8382d914ebf7. > > I did not cc stable as the first problem has been around since reserves > were added to hugetlbfs and nobody has noticed. The second is very hard > to hit/reproduce. > > v2 - Update definition and all callers of hugetlb_fault_mutex_hash as > the arguments mm and vma are no longer used or necessary. > > Mike Kravetz (2): > huegtlbfs: on restore reserve error path retain subpool reservation > hugetlb: use same fault hash key for shared and private mappings > > fs/hugetlbfs/inode.c | 7 ++----- > include/linux/hugetlb.h | 4 +--- > mm/hugetlb.c | 43 +++++++++++++++++++++-------------------- > mm/userfaultfd.c | 3 +-- > 4 files changed, 26 insertions(+), 31 deletions(-) Both fixes look fine to me. Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 0/2] A couple hugetlbfs fixes 2019-03-28 23:47 [PATCH v2 0/2] A couple hugetlbfs fixes Mike Kravetz ` (2 preceding siblings ...) 2019-04-03 5:29 ` [PATCH v2 0/2] A couple hugetlbfs fixes Naoya Horiguchi @ 2019-04-08 19:48 ` Davidlohr Bueso 2019-04-09 3:30 ` Mike Kravetz 3 siblings, 1 reply; 8+ messages in thread From: Davidlohr Bueso @ 2019-04-08 19:48 UTC (permalink / raw) To: Mike Kravetz Cc: linux-kernel, linux-mm, Joonsoo Kim, Michal Hocko, Naoya Horiguchi, Kirill A . Shutemov, Andrew Morton On Thu, 28 Mar 2019, Mike Kravetz wrote: >- A BUG can be triggered (not easily) due to temporarily mapping a > page before doing a COW. But you actually _have_ seen it? Do you have the traces? I ask not because of the patches perse, but because it would be nice to have a real snipplet in the Changelog for patch 2. Thanks, Davidlohr ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 0/2] A couple hugetlbfs fixes 2019-04-08 19:48 ` Davidlohr Bueso @ 2019-04-09 3:30 ` Mike Kravetz 0 siblings, 0 replies; 8+ messages in thread From: Mike Kravetz @ 2019-04-09 3:30 UTC (permalink / raw) To: linux-kernel, linux-mm, Joonsoo Kim, Michal Hocko, Naoya Horiguchi, Kirill A . Shutemov, Andrew Morton On 4/8/19 12:48 PM, Davidlohr Bueso wrote: > On Thu, 28 Mar 2019, Mike Kravetz wrote: > >> - A BUG can be triggered (not easily) due to temporarily mapping a >> page before doing a COW. > > But you actually _have_ seen it? Do you have the traces? I ask > not because of the patches perse, but because it would be nice > to have a real snipplet in the Changelog for patch 2. Yes, I actually saw this problem. It happened while I was debugging and testing some patches for hugetlb migration. The BUG I hit was in unaccount_page_cache_page(): VM_BUG_ON_PAGE(page_mapped(page), page). Stack trace was something like: unaccount_page_cache_page __delete_from_page_cache delete_from_page_cache remove_huge_page remove_inode_hugepages hugetlbfs_punch_hole hugetlbfs_fallocate When I hit that, it took me a while to figure out how it could happen. i.e. How could a page be mapped at that point in remove_inode_hugepages? It checks page_mapped and we are holding the fault mutex. With some additional debug code (strategic udelays) I could hit the issue on a somewhat regular basis and verified another thread was in the hugetlb_no_page/hugetlb_cow path for the same page at the same time. Unfortunately, I did not save the traces. I am trying to recreate now. However, my test system was recently updated and it might take a little time to recreate. -- Mike Kravetz ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2019-04-12 16:52 UTC | newest] Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-03-28 23:47 [PATCH v2 0/2] A couple hugetlbfs fixes Mike Kravetz 2019-03-28 23:47 ` [PATCH v2 1/2] huegtlbfs: on restore reserve error path retain subpool reservation Mike Kravetz 2019-03-28 23:47 ` [PATCH v2 2/2] hugetlb: use same fault hash key for shared and private mappings Mike Kravetz 2019-04-11 18:32 ` Mike Kravetz 2019-04-12 16:52 ` Davidlohr Bueso 2019-04-03 5:29 ` [PATCH v2 0/2] A couple hugetlbfs fixes Naoya Horiguchi 2019-04-08 19:48 ` Davidlohr Bueso 2019-04-09 3:30 ` Mike Kravetz
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).