From patchwork Tue Nov 24 19:49:23 2020
From: Rik van Riel
To: hughd@google.com
Cc: xuyu@linux.alibaba.com, akpm@linux-foundation.org, mgorman@suse.de,
    aarcange@redhat.com, willy@infradead.org, linux-kernel@vger.kernel.org,
    kernel-team@fb.com, linux-mm@kvack.org, vbabka@suse.cz, mhocko@suse.com,
    Rik van Riel
Subject: [PATCH 1/3] mm,thp,shmem: limit shmem THP alloc gfp_mask
Date: Tue, 24 Nov 2020 14:49:23 -0500
Message-Id: <20201124194925.623931-2-riel@surriel.com>
In-Reply-To: <20201124194925.623931-1-riel@surriel.com>
References: <20201124194925.623931-1-riel@surriel.com>

The allocation flags of anonymous transparent huge pages can be
controlled through /sys/kernel/mm/transparent_hugepage/defrag, which
can keep the system from getting bogged down in the page reclaim and
compaction code when many THPs are allocated simultaneously.

However, the gfp_mask for shmem THP allocations was not limited by
those configuration settings, and some workloads ended up with all
CPUs stuck on the LRU lock in the page reclaim code, trying to
allocate dozens of THPs simultaneously.

This patch applies the same configured limitation to shmem hugepage
allocations, to prevent that from happening.

Controlling the gfp_mask of THP allocations through the knobs in
sysfs allows users to determine the balance between how aggressively
the system tries to allocate THPs at fault time, and how much the
application may end up stalling while attempting those allocations.

This way a THP defrag setting of "never" or "defer+madvise" will
result in quick allocation failures without direct reclaim when no
2MB free pages are available.

With this patch applied, THP allocations for tmpfs will be a little
more aggressive than today for files mmapped with MADV_HUGEPAGE, and
a little less aggressive for files that are not mmapped or mapped
without that flag.
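Only the first case of the renamed helper is visible in the
mm/huge_memory.c hunk below. For context, the full mapping from
defrag settings to gfp masks looks roughly like this; a sketch
reconstructed from the surrounding kernel code of that era, not part
of this diff:

    /* Sketch: how each transparent_hugepage/defrag setting maps to a gfp mask. */
    gfp_t vma_thp_gfp_mask(struct vm_area_struct *vma)
    {
            const bool vma_madvised = vma && (vma->vm_flags & VM_HUGEPAGE);

            /* "always": always do synchronous compaction */
            if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG,
                         &transparent_hugepage_flags))
                    return GFP_TRANSHUGE | (vma_madvised ? 0 : __GFP_NORETRY);

            /* "defer": kick kcompactd and fail the fault quickly */
            if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG,
                         &transparent_hugepage_flags))
                    return GFP_TRANSHUGE_LIGHT | __GFP_KSWAPD_RECLAIM;

            /* "defer+madvise": synchronous compaction if madvised, else kcompactd */
            if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG,
                         &transparent_hugepage_flags))
                    return GFP_TRANSHUGE_LIGHT |
                           (vma_madvised ? __GFP_DIRECT_RECLAIM :
                                           __GFP_KSWAPD_RECLAIM);

            /* "madvise": only do synchronous compaction if madvised */
            if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG,
                         &transparent_hugepage_flags))
                    return GFP_TRANSHUGE_LIGHT |
                           (vma_madvised ? __GFP_DIRECT_RECLAIM : 0);

            /* "never": never stall for any thp allocation */
            return GFP_TRANSHUGE_LIGHT;
    }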
Signed-off-by: Rik van Riel
Acked-by: Vlastimil Babka
Acked-by: Michal Hocko
---
 include/linux/gfp.h | 2 ++
 mm/huge_memory.c    | 6 +++---
 mm/shmem.c          | 8 +++++---
 3 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index c603237e006c..c7615c9ba03c 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -614,6 +614,8 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask);
 extern void pm_restrict_gfp_mask(void);
 extern void pm_restore_gfp_mask(void);
 
+extern gfp_t vma_thp_gfp_mask(struct vm_area_struct *vma);
+
 #ifdef CONFIG_PM_SLEEP
 extern bool pm_suspended_storage(void);
 #else
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9474dbc150ed..c5d03b2f2f2f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -649,9 +649,9 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
  *		    available
  * never: never stall for any thp allocation
  */
-static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma)
+gfp_t vma_thp_gfp_mask(struct vm_area_struct *vma)
 {
-	const bool vma_madvised = !!(vma->vm_flags & VM_HUGEPAGE);
+	const bool vma_madvised = vma && (vma->vm_flags & VM_HUGEPAGE);
 
 	/* Always do synchronous compaction */
 	if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags))
@@ -744,7 +744,7 @@ vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf)
 		pte_free(vma->vm_mm, pgtable);
 		return ret;
 	}
-	gfp = alloc_hugepage_direct_gfpmask(vma);
+	gfp = vma_thp_gfp_mask(vma);
 	page = alloc_hugepage_vma(gfp, vma, haddr, HPAGE_PMD_ORDER);
 	if (unlikely(!page)) {
 		count_vm_event(THP_FAULT_FALLBACK);
diff --git a/mm/shmem.c b/mm/shmem.c
index 537c137698f8..6c3cb192a88d 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1545,8 +1545,8 @@ static struct page *shmem_alloc_hugepage(gfp_t gfp,
 		return NULL;
 
 	shmem_pseudo_vma_init(&pvma, info, hindex);
-	page = alloc_pages_vma(gfp | __GFP_COMP | __GFP_NORETRY | __GFP_NOWARN,
-			       HPAGE_PMD_ORDER, &pvma, 0, numa_node_id(), true);
+	page = alloc_pages_vma(gfp, HPAGE_PMD_ORDER, &pvma, 0, numa_node_id(),
+			       true);
 	shmem_pseudo_vma_destroy(&pvma);
 	if (page)
 		prep_transhuge_page(page);
@@ -1802,6 +1802,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
 	struct page *page;
 	enum sgp_type sgp_huge = sgp;
 	pgoff_t hindex = index;
+	gfp_t huge_gfp;
 	int error;
 	int once = 0;
 	int alloced = 0;
@@ -1887,7 +1888,8 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
 	}
 
 alloc_huge:
-	page = shmem_alloc_and_acct_page(gfp, inode, index, true);
+	huge_gfp = vma_thp_gfp_mask(vma);
+	page = shmem_alloc_and_acct_page(huge_gfp, inode, index, true);
 	if (IS_ERR(page)) {
 alloc_nohuge:
 		page = shmem_alloc_and_acct_page(gfp, inode,
From patchwork Tue Nov 24 19:49:24 2020
From: Rik van Riel
To: hughd@google.com
Cc: xuyu@linux.alibaba.com, akpm@linux-foundation.org, mgorman@suse.de,
    aarcange@redhat.com, willy@infradead.org, linux-kernel@vger.kernel.org,
    kernel-team@fb.com, linux-mm@kvack.org, vbabka@suse.cz, mhocko@suse.com,
    Rik van Riel
Subject: [PATCH 2/3] mm,thp,shm: limit gfp mask to no more than specified
Date: Tue, 24 Nov 2020 14:49:24 -0500
Message-Id: <20201124194925.623931-3-riel@surriel.com>
In-Reply-To: <20201124194925.623931-1-riel@surriel.com>
References: <20201124194925.623931-1-riel@surriel.com>

Matthew Wilcox pointed out that the i915 driver opportunistically
allocates tmpfs memory, but will happily reclaim some of its pool if
no memory is available.

Make sure the gfp mask used to opportunistically allocate a THP is
always at least as restrictive as the original gfp mask.

Signed-off-by: Rik van Riel
Suggested-by: Matthew Wilcox
---
 mm/shmem.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/mm/shmem.c b/mm/shmem.c
index 6c3cb192a88d..ee3cea10c2a4 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1531,6 +1531,26 @@ static struct page *shmem_swapin(swp_entry_t swap, gfp_t gfp,
 	return page;
 }
 
+/*
+ * Make sure huge_gfp is always more limited than limit_gfp.
+ * Some of the flags set permissions, while others set limitations.
+ */
+static gfp_t limit_gfp_mask(gfp_t huge_gfp, gfp_t limit_gfp)
+{
+	gfp_t allowflags = __GFP_IO | __GFP_FS | __GFP_RECLAIM;
+	gfp_t denyflags = __GFP_NOWARN | __GFP_NORETRY;
+	gfp_t result = huge_gfp & ~allowflags;
+
+	/*
+	 * Minimize the result gfp by taking the union with the deny flags,
+	 * and the intersection of the allow flags.
+	 */
+	result |= (limit_gfp & denyflags);
+	result |= (huge_gfp & limit_gfp) & allowflags;
+
+	return result;
+}
+
 static struct page *shmem_alloc_hugepage(gfp_t gfp,
 		struct shmem_inode_info *info, pgoff_t index)
 {
@@ -1889,6 +1909,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
 
 alloc_huge:
 	huge_gfp = vma_thp_gfp_mask(vma);
+	huge_gfp = limit_gfp_mask(huge_gfp, gfp);
 	page = shmem_alloc_and_acct_page(huge_gfp, inode, index, true);
 	if (IS_ERR(page)) {
 alloc_nohuge:
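To make the effect of limit_gfp_mask() concrete, here is a
stand-alone userspace sketch of the same combining logic. The flag
values are invented for the demonstration and do not match the real
<linux/gfp.h> bit positions; only the set/clear logic mirrors the
patch:

    #include <stdio.h>

    typedef unsigned int gfp_t;

    /* Demo values only; the real bits live in <linux/gfp.h>. */
    #define __GFP_IO      0x01u
    #define __GFP_FS      0x02u
    #define __GFP_RECLAIM 0x04u  /* stands in for both reclaim bits */
    #define __GFP_NOWARN  0x08u
    #define __GFP_NORETRY 0x10u

    static gfp_t limit_gfp_mask(gfp_t huge_gfp, gfp_t limit_gfp)
    {
            gfp_t allowflags = __GFP_IO | __GFP_FS | __GFP_RECLAIM;
            gfp_t denyflags = __GFP_NOWARN | __GFP_NORETRY;
            gfp_t result = huge_gfp & ~allowflags;

            /* Deny flags are inherited from the limiting mask; allow
             * flags survive only when both masks grant them. */
            result |= (limit_gfp & denyflags);
            result |= (huge_gfp & limit_gfp) & allowflags;

            return result;
    }

    int main(void)
    {
            /* The THP mask wants IO, FS and reclaim; the caller's mask
             * forbids __GFP_FS and asks for __GFP_NORETRY. */
            gfp_t huge_gfp = __GFP_IO | __GFP_FS | __GFP_RECLAIM;
            gfp_t limit_gfp = __GFP_IO | __GFP_RECLAIM | __GFP_NORETRY;

            /* Prints 0x15: __GFP_FS dropped, __GFP_NORETRY inherited. */
            printf("0x%02x\n", limit_gfp_mask(huge_gfp, limit_gfp));
            return 0;
    }

The THP allocation can thus end up with fewer permissions than the
defrag setting would grant, but never with more than the caller of
shmem_getpage_gfp originally allowed.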
+	 */
+	result |= (limit_gfp & denyflags);
+	result |= (huge_gfp & limit_gfp) & allowflags;
+
+	return result;
+}
+

From patchwork Tue Nov 24 19:49:25 2020
From: Rik van Riel
To: hughd@google.com
Cc: xuyu@linux.alibaba.com, akpm@linux-foundation.org, mgorman@suse.de,
    aarcange@redhat.com, willy@infradead.org, linux-kernel@vger.kernel.org,
    kernel-team@fb.com, linux-mm@kvack.org, vbabka@suse.cz, mhocko@suse.com,
    Rik van Riel
Subject: [PATCH 3/3] mm,thp,shmem: make khugepaged obey tmpfs mount flags
Date: Tue, 24 Nov 2020 14:49:25 -0500
Message-Id: <20201124194925.623931-4-riel@surriel.com>
In-Reply-To: <20201124194925.623931-1-riel@surriel.com>
References: <20201124194925.623931-1-riel@surriel.com>

Currently, with thp enabled=[madvise], mounting a tmpfs filesystem
with huge=always and mmapping files from that tmpfs does not result
in khugepaged collapsing those mappings, despite the mount flag
indicating that it should.

Fix that by breaking up the blocks of tests in hugepage_vma_check a
little bit, and testing things in the correct order.
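The behavior change can be seen in a toy userspace model of the old
and new check ordering. The struct fields and helpers are stand-ins
for the kernel predicates (shmem_file, shmem_huge_enabled,
khugepaged_always), not real kernel API, and the alignment and
read-only-file branches are omitted:

    #include <stdbool.h>
    #include <stdio.h>

    struct toy_vma {
            bool is_shmem;    /* shmem_file(vma->vm_file) */
            bool shmem_huge;  /* shmem_huge_enabled(vma): huge=always mount */
            bool madvised;    /* vm_flags & VM_HUGEPAGE */
            bool thp_always;  /* sysfs enabled=always */
    };

    /* Old ordering: the madvise test ran before the shmem test. */
    static bool old_check(const struct toy_vma *v)
    {
            if (!v->madvised && !v->thp_always)
                    return false;   /* shmem test never reached */
            return v->is_shmem;
    }

    /* New ordering: shmem mount flags are honored before the madvise test. */
    static bool new_check(const struct toy_vma *v)
    {
            if (v->is_shmem && v->shmem_huge)
                    return true;
            if (!v->madvised && !v->thp_always)
                    return false;
            return v->is_shmem;
    }

    int main(void)
    {
            /* enabled=madvise, tmpfs mounted huge=always, no madvise() call */
            struct toy_vma v = { true, true, false, false };

            printf("old: %d, new: %d\n", old_check(&v), new_check(&v));
            return 0;   /* prints "old: 0, new: 1" */
    }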
Signed-off-by: Rik van Riel
Fixes: c2231020ea7b ("mm: thp: register mm for khugepaged when merging vma for shmem")
---
 include/linux/khugepaged.h |  2 ++
 mm/khugepaged.c            | 22 ++++++++++++++++------
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
index c941b7377321..2fcc01891b47 100644
--- a/include/linux/khugepaged.h
+++ b/include/linux/khugepaged.h
@@ -3,6 +3,7 @@
 #define _LINUX_KHUGEPAGED_H
 
 #include <linux/sched/coredump.h> /* MMF_VM_HUGEPAGE */
+#include <linux/shmem_fs.h>
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
@@ -57,6 +58,7 @@ static inline int khugepaged_enter(struct vm_area_struct *vma,
 {
 	if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags))
 		if ((khugepaged_always() ||
+		     (shmem_file(vma->vm_file) && shmem_huge_enabled(vma)) ||
 		     (khugepaged_req_madv() && (vm_flags & VM_HUGEPAGE))) &&
 		    !(vm_flags & VM_NOHUGEPAGE) &&
 		    !test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 4e3dff13eb70..abab394c4206 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -440,18 +440,28 @@ static inline int khugepaged_test_exit(struct mm_struct *mm)
 static bool hugepage_vma_check(struct vm_area_struct *vma,
 			       unsigned long vm_flags)
 {
-	if ((!(vm_flags & VM_HUGEPAGE) && !khugepaged_always()) ||
-	    (vm_flags & VM_NOHUGEPAGE) ||
+	/* Explicitly disabled through madvise. */
+	if ((vm_flags & VM_NOHUGEPAGE) ||
 	    test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
 		return false;
 
-	if (shmem_file(vma->vm_file) ||
-	    (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
-	     vma->vm_file &&
-	     (vm_flags & VM_DENYWRITE))) {
+	/* Enabled via shmem mount options or sysfs settings. */
+	if (shmem_file(vma->vm_file) && shmem_huge_enabled(vma)) {
 		return IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
 				HPAGE_PMD_NR);
 	}
+
+	/* THP settings require madvise. */
+	if (!(vm_flags & VM_HUGEPAGE) && !khugepaged_always())
+		return false;
+
+	/* Read-only file mappings need to be aligned for THP to work. */
+	if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && vma->vm_file &&
+	    (vm_flags & VM_DENYWRITE)) {
+		return IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
+				HPAGE_PMD_NR);
+	}
+
 	if (!vma->anon_vma || vma->vm_ops)
 		return false;
 	if (vma_is_temporary_stack(vma))

From patchwork Wed Feb 24 17:10:16 2021
Date: Wed, 24 Feb 2021 12:10:16 -0500
From: Rik van Riel
To: Hugh Dickins
Cc: Vlastimil Babka, Andrew Morton, xuyu@linux.alibaba.com, mgorman@suse.de,
    aarcange@redhat.com, willy@infradead.org, linux-kernel@vger.kernel.org,
    kernel-team@fb.com, linux-mm@kvack.org, mhocko@suse.com
Subject: [PATCH 4/3] mm,shmem,thp: limit shmem THP allocations to requested zones
Message-ID: <20210224121016.1314ed6d@imladris.surriel.com>
References: <20201124194925.623931-1-riel@surriel.com>

On Wed, 24 Feb 2021 08:55:40 -0800 (PST) Hugh Dickins wrote:
> On Wed, 24 Feb 2021, Rik van Riel wrote:
> > On Wed, 2021-02-24 at 00:41 -0800, Hugh Dickins wrote:
> > > Oh, I'd forgotten all about that gma500 aspect:
> > > well, I can send a fixup later on.
> >
> > I already have code to fix that, which somebody earlier
> > in this discussion convinced me to throw away. Want me
> > to send it as a patch 4/3 ?
>
> If Andrew wants it all, yes, please do add that - thanks Rik.

Trivial patch to fix the gma500 thing below:

---8<---
mm,shmem,thp: limit shmem THP allocations to requested zones

Hugh pointed out that the gma500 driver uses shmem pages, but needs
to limit them to the DMA32 zone. Ensure the allocations resulting
from the gfp_mask returned by limit_gfp_mask use the zone flags that
were originally passed to shmem_getpage_gfp.
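Extending the earlier toy sketch with a pretend zone mask shows the
effect. Again, the flag values here are invented for the demo; the
real GFP_ZONEMASK and zone bits are defined in <linux/gfp.h>:

    #include <stdio.h>

    typedef unsigned int gfp_t;

    /* Demo values only, matching the earlier sketch. */
    #define __GFP_IO      0x01u
    #define __GFP_FS      0x02u
    #define __GFP_RECLAIM 0x04u
    #define __GFP_NOWARN  0x08u
    #define __GFP_NORETRY 0x10u
    #define __GFP_DMA32   0x20u
    #define GFP_ZONEMASK  0x60u  /* pretend the zone bits live here */

    static gfp_t limit_gfp_mask(gfp_t huge_gfp, gfp_t limit_gfp)
    {
            gfp_t allowflags = __GFP_IO | __GFP_FS | __GFP_RECLAIM;
            gfp_t denyflags = __GFP_NOWARN | __GFP_NORETRY;
            gfp_t zoneflags = limit_gfp & GFP_ZONEMASK;
            gfp_t result = huge_gfp & ~(allowflags | GFP_ZONEMASK);

            /* The zone bits of the limiting mask replace whatever
             * zone bits the THP mask carried. */
            result |= zoneflags;

            result |= (limit_gfp & denyflags);
            result |= (huge_gfp & limit_gfp) & allowflags;

            return result;
    }

    int main(void)
    {
            /* gma500-style caller: allocations must stay in DMA32. */
            gfp_t huge_gfp = __GFP_IO | __GFP_FS | __GFP_RECLAIM;
            gfp_t limit_gfp = __GFP_DMA32 | __GFP_IO | __GFP_FS | __GFP_RECLAIM;

            /* __GFP_DMA32 now survives into the THP mask: prints 0x27. */
            printf("0x%02x\n", limit_gfp_mask(huge_gfp, limit_gfp));
            return 0;
    }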
Signed-off-by: Rik van Riel
Suggested-by: Hugh Dickins
Acked-by: Vlastimil Babka
---
 mm/shmem.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index ee3cea10c2a4..876fec89686f 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1539,7 +1539,11 @@ static gfp_t limit_gfp_mask(gfp_t huge_gfp, gfp_t limit_gfp)
 {
 	gfp_t allowflags = __GFP_IO | __GFP_FS | __GFP_RECLAIM;
 	gfp_t denyflags = __GFP_NOWARN | __GFP_NORETRY;
-	gfp_t result = huge_gfp & ~allowflags;
+	gfp_t zoneflags = limit_gfp & GFP_ZONEMASK;
+	gfp_t result = huge_gfp & ~(allowflags | GFP_ZONEMASK);
+
+	/* Allow allocations only from the originally specified zones. */
+	result |= zoneflags;
 
 	/*
 	 * Minimize the result gfp by taking the union with the deny flags,