On Thu, Aug 16, 2012 at 09:27:38PM +0200, Andrea Arcangeli wrote:
> On Thu, Aug 09, 2012 at 12:08:18PM +0300, Kirill A. Shutemov wrote:
> > +static void __split_huge_zero_page_pmd(struct mm_struct *mm, pmd_t *pmd,
> > +		unsigned long address)
> > +{
> > +	pgtable_t pgtable;
> > +	pmd_t _pmd;
> > +	unsigned long haddr = address & HPAGE_PMD_MASK;
> > +	struct vm_area_struct *vma;
> > +	int i;
> > +
> > +	vma = find_vma(mm, address);
> > +	VM_BUG_ON(vma == NULL);
> 
> I think you can use BUG_ON here just in case but see below how I would
> change it.
> 
> > +	pmdp_clear_flush_notify(vma, haddr, pmd);
> > +	/* leave pmd empty until pte is filled */
> > +
> > +	pgtable = get_pmd_huge_pte(mm);
> > +	pmd_populate(mm, &_pmd, pgtable);
> > +
> > +	for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
> > +		pte_t *pte, entry;
> > +		entry = pfn_pte(my_zero_pfn(haddr), vma->vm_page_prot);
> > +		entry = pte_mkspecial(entry);
> > +		pte = pte_offset_map(&_pmd, haddr);
> > +		VM_BUG_ON(!pte_none(*pte));
> > +		set_pte_at(mm, haddr, pte, entry);
> > +		pte_unmap(pte);
> > +	}
> > +	smp_wmb(); /* make pte visible before pmd */
> > +	pmd_populate(mm, pmd, pgtable);
> > +}
> > +
> 
> The last pmd_populate will corrupt memory.

Nice catch, thank you. I used do_huge_pmd_wp_page_fallback() as a
template for my code. What's the difference between these two code
paths? Why is do_huge_pmd_wp_page_fallback() safe?

> > +	if (is_huge_zero_pmd(*pmd)) {
> > +		__split_huge_zero_page_pmd(mm, pmd, address);
> 
> This will work fine but it's a bit sad having to add "address" at
> every call, just to run a find_vma(). The only place that doesn't have
> a vma already on the caller stack is actually pagewalk, all other
> places already have a vma on the stack without having to find it with
> the rbtree.
> 
> I think it may be better to change the param to
> split_huge_page_pmd(vma, pmd).
> Then have the standard split_huge_page_pmd obtain the mm with vma->vm_mm
> (most callers already call it with split_huge_page_pmd(vma->vm_mm)
> so it won't alter the cost to do vma->vm_mm in caller or callee).
> 
> split_huge_page_address also should take the vma (all callers are
> invoking it as split_huge_page_address(vma->vm_mm) so it'll be a
> zero-cost change).
> 
> Then we can add a split_huge_page_pmd_mm(mm, address, pmd) or
> split_huge_page_pmd_address(mm, address, pmd) (call it as you
> prefer...) only for the pagewalk caller that will do the find_vma and
> BUG_ON if it's not found.
> 
> In that new split_huge_page_pmd_mm you can also add a BUG_ON checking
> vma->vm_start to be <= haddr and vma->vm_end >= haddr+HPAGE_PMD_SIZE,
> in addition to the BUG_ON(!vma) above, for more robustness. I'm not
> aware of any place calling it without mmap_sem held at least for
> reading, and the vma must be stable, but more debug checks won't hurt.

Looks reasonable. I'll update it in the next revision.

-- 
 Kirill A. Shutemov