From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755545AbbCRPDA (ORCPT ); Wed, 18 Mar 2015 11:03:00 -0400
Received: from cantor2.suse.de ([195.135.220.15]:60475 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1750916AbbCRPC7 (ORCPT ); Wed, 18 Mar 2015 11:02:59 -0400
Date: Wed, 18 Mar 2015 16:02:57 +0100
From: Michal Hocko
To: Vlastimil Babka
Cc: Andrew Morton , Johannes Weiner , linux-mm@kvack.org, LKML
Subject: Re: [PATCH] mm, memcg: sync allocation and memcg charge gfp flags for THP
Message-ID: <20150318150257.GL17241@dhcp22.suse.cz>
References: <1426514892-7063-1-git-send-email-mhocko@suse.cz>
 <55098D0A.8090605@suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <55098D0A.8090605@suse.cz>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed 18-03-15 15:34:50, Vlastimil Babka wrote:
> On 03/16/2015 03:08 PM, Michal Hocko wrote:
> >memcg currently uses hardcoded GFP_TRANSHUGE gfp flags for all THP
> >charges. THP allocations, however, might be using different flags
> >depending on /sys/kernel/mm/transparent_hugepage/{,khugepaged/}defrag
> >and the current allocation context.
> >
> >The primary difference is that defrag configured to "madvise" value will
> >clear __GFP_WAIT flag from the core gfp mask to make the allocation
> >lighter for all mappings which are not backed by VM_HUGEPAGE vmas.
> >If memcg charge path ignores this fact we will get a light allocation but
> >a potential memcg reclaim would kill the whole point of the
> >configuration.
> >
> >Fix the mismatch by providing the same gfp mask used for the
> >allocation to the charge functions.
> >This is quite easy for all
> >paths except for khugepaged kernel thread with !CONFIG_NUMA which is
> >doing a pre-allocation long before the allocated page is used in
> >collapse_huge_page via khugepaged_alloc_page. To avoid cluttering
> >the whole code path from khugepaged_do_scan we simply return the current
> >flags as per khugepaged_defrag() value which might have changed since
> >the preallocation. If somebody changed the value of the knob we would
> >charge differently but this shouldn't happen often and it is definitely
> >not critical because it would only lead to a reduced success rate of
> >one-off THP promotion.
> >
> >Signed-off-by: Michal Hocko
>
> Acked-by: Vlastimil Babka

Thanks!

[...]
> >@@ -1080,6 +1080,7 @@ int do_huge_pmd_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
> >  	unsigned long haddr;
> >  	unsigned long mmun_start;	/* For mmu_notifiers */
> >  	unsigned long mmun_end;		/* For mmu_notifiers */
> >+	gfp_t huge_gfp = GFP_TRANSHUGE;	/* for allocation and charge */
>
> This value is actually never used. Is it here because the compiler emits a
> spurious non-initialized value warning otherwise? It should be easy for it
> to prove that setting new_page to something non-null implies initializing
> huge_gfp (in the hunk below), and NULL new_page means it doesn't reach the
> mem_cgroup_try_charge() call?

No, I haven't tried to work around the compiler. It just made the code more
obvious to me. I can remove the initialization if you prefer, of course.
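
The control-flow argument above can be sketched in userspace C. This is
only a rough model under stated assumptions, not the kernel code: the
gfp_t typedef, the MODEL_* flag values and the model_* functions are all
made up for illustration, and the "allocation" always succeeds. It shows
that the initializer of huge_gfp is overwritten on every path that reaches
the charge step, and that the charge then sees the same mask as the
allocation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel flags; the values are made up. */
typedef unsigned int gfp_t;
#define MODEL_GFP_WAIT      0x1u
#define MODEL_GFP_TRANSHUGE (0x10u | MODEL_GFP_WAIT)

/* Clear the "may wait" bit when defrag is off, as the changelog describes. */
static gfp_t model_hugepage_gfpmask(bool defrag)
{
	return defrag ? MODEL_GFP_TRANSHUGE
		      : (MODEL_GFP_TRANSHUGE & ~MODEL_GFP_WAIT);
}

/*
 * Model of the alloc/charge sequence. Returns true and stores the mask
 * handed to the charge step in *charged when a page was "allocated";
 * returns false, leaving *charged untouched, on the fallback path.
 */
static bool model_wp_page(bool thp_enabled, bool defrag, gfp_t *charged)
{
	static int fake_page;			/* stands in for struct page */
	int *new_page;
	gfp_t huge_gfp = MODEL_GFP_TRANSHUGE;	/* initializer never observed */

	if (thp_enabled) {
		huge_gfp = model_hugepage_gfpmask(defrag);
		new_page = &fake_page;		/* pretend allocation succeeded */
	} else {
		new_page = NULL;
	}

	if (!new_page)
		return false;			/* huge_gfp is never read here */

	*charged = huge_gfp;			/* charge with the allocation mask */
	return true;
}
```

With defrag off, the mask reaching the charge step has MODEL_GFP_WAIT
cleared, so the default assigned at declaration never leaks through. With
THP disabled, the charge step is not reached at all.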
> >  	ptl = pmd_lockptr(mm, pmd);
> >  	VM_BUG_ON_VMA(!vma->anon_vma, vma);
> >@@ -1106,10 +1107,8 @@ int do_huge_pmd_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
> >  alloc:
> >  	if (transparent_hugepage_enabled(vma) &&
> >  	    !transparent_hugepage_debug_cow()) {
> >-		gfp_t gfp;
> >-
> >-		gfp = alloc_hugepage_gfpmask(transparent_hugepage_defrag(vma), 0);
> >-		new_page = alloc_hugepage_vma(gfp, vma, haddr, HPAGE_PMD_ORDER);
> >+		huge_gfp = alloc_hugepage_gfpmask(transparent_hugepage_defrag(vma), 0);
> >+		new_page = alloc_hugepage_vma(huge_gfp, vma, haddr, HPAGE_PMD_ORDER);
> >  	} else
> >  		new_page = NULL;
> >
> >@@ -1131,7 +1130,7 @@ alloc:
> >  	}
> >
> >  	if (unlikely(mem_cgroup_try_charge(new_page, mm,
> >-					   GFP_TRANSHUGE, &memcg))) {
> >+					   huge_gfp, &memcg))) {
> >  		put_page(new_page);
> >  		if (page) {
> >  			split_huge_page(page);
>
-- 
Michal Hocko
SUSE Labs