From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965087AbbESOhc (ORCPT ); Tue, 19 May 2015 10:37:32 -0400 Received: from cantor2.suse.de ([195.135.220.15]:36079 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932990AbbESOh3 (ORCPT ); Tue, 19 May 2015 10:37:29 -0400 Message-ID: <555B4AA5.7000504@suse.cz> Date: Tue, 19 May 2015 16:37:25 +0200 From: Vlastimil Babka User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.6.0 MIME-Version: 1.0 To: "Kirill A. Shutemov" CC: "Kirill A. Shutemov" , Andrew Morton , Andrea Arcangeli , Hugh Dickins , Dave Hansen , Mel Gorman , Rik van Riel , Christoph Lameter , Naoya Horiguchi , Steve Capper , "Aneesh Kumar K.V" , Johannes Weiner , Michal Hocko , Jerome Marchand , Sasha Levin , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: [PATCHv5 07/28] thp, mlock: do not allow huge pages in mlocked area References: <1429823043-157133-1-git-send-email-kirill.shutemov@linux.intel.com> <1429823043-157133-8-git-send-email-kirill.shutemov@linux.intel.com> <5555ED0A.5010702@suse.cz> <20150515134103.GC6625@node.dhcp.inet.fi> In-Reply-To: <20150515134103.GC6625@node.dhcp.inet.fi> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 05/15/2015 03:41 PM, Kirill A. Shutemov wrote: > On Fri, May 15, 2015 at 02:56:42PM +0200, Vlastimil Babka wrote: >> On 04/23/2015 11:03 PM, Kirill A. Shutemov wrote: >>> With new refcounting THP can belong to several VMAs. This makes tricky >>> to track THP pages, when they partially mlocked. It can lead to leaking >>> mlocked pages to non-VM_LOCKED vmas and other problems. >>> With this patch we will split all pages on mlock and avoid >>> fault-in/collapse new THP in VM_LOCKED vmas. >>> >>> I've tried alternative approach: do not mark THP pages mlocked and keep >>> them on normal LRUs. This way vmscan could try to split huge pages on >>> memory pressure and free up subpages which doesn't belong to VM_LOCKED >>> vmas. But this is user-visible change: we screw up Mlocked accouting >>> reported in meminfo, so I had to leave this approach aside. >>> >>> We can bring something better later, but this should be good enough for >>> now. >> >> I can imagine people won't be happy about losing benefits of THP's when they >> mlock(). >> How difficult would it be to support mlocked THP pages without splitting >> until something actually tries to do a partial (un)mapping, and only then do >> the split? That will support the most common case, no? > > Yes, it will. > > But what will we do if we fail to split huge page on munmap()? Fail > munmap() with -EBUSY? We could just unmlock the whole THP page and if we could make the deferred split done ASAP, and not waiting for memory pressure, the window with NR_MLOCK being undercounted would be minimized. Since the RLIMIT_MEMLOCK is tracked independently from NR_MLOCK, there should be no danger wrt breaching the limit due to undercounting here? From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wi0-f172.google.com (mail-wi0-f172.google.com [209.85.212.172]) by kanga.kvack.org (Postfix) with ESMTP id 1DDB56B00BF for ; Tue, 19 May 2015 10:37:30 -0400 (EDT) Received: by wibt6 with SMTP id t6so25277683wib.0 for ; Tue, 19 May 2015 07:37:29 -0700 (PDT) Received: from mx2.suse.de (cantor2.suse.de. [195.135.220.15]) by mx.google.com with ESMTPS id eq3si4847688wjd.142.2015.05.19.07.37.28 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 19 May 2015 07:37:28 -0700 (PDT) Message-ID: <555B4AA5.7000504@suse.cz> Date: Tue, 19 May 2015 16:37:25 +0200 From: Vlastimil Babka MIME-Version: 1.0 Subject: Re: [PATCHv5 07/28] thp, mlock: do not allow huge pages in mlocked area References: <1429823043-157133-1-git-send-email-kirill.shutemov@linux.intel.com> <1429823043-157133-8-git-send-email-kirill.shutemov@linux.intel.com> <5555ED0A.5010702@suse.cz> <20150515134103.GC6625@node.dhcp.inet.fi> In-Reply-To: <20150515134103.GC6625@node.dhcp.inet.fi> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: "Kirill A. Shutemov" Cc: "Kirill A. Shutemov" , Andrew Morton , Andrea Arcangeli , Hugh Dickins , Dave Hansen , Mel Gorman , Rik van Riel , Christoph Lameter , Naoya Horiguchi , Steve Capper , "Aneesh Kumar K.V" , Johannes Weiner , Michal Hocko , Jerome Marchand , Sasha Levin , linux-kernel@vger.kernel.org, linux-mm@kvack.org On 05/15/2015 03:41 PM, Kirill A. Shutemov wrote: > On Fri, May 15, 2015 at 02:56:42PM +0200, Vlastimil Babka wrote: >> On 04/23/2015 11:03 PM, Kirill A. Shutemov wrote: >>> With new refcounting THP can belong to several VMAs. This makes tricky >>> to track THP pages, when they partially mlocked. It can lead to leaking >>> mlocked pages to non-VM_LOCKED vmas and other problems. >>> With this patch we will split all pages on mlock and avoid >>> fault-in/collapse new THP in VM_LOCKED vmas. >>> >>> I've tried alternative approach: do not mark THP pages mlocked and keep >>> them on normal LRUs. This way vmscan could try to split huge pages on >>> memory pressure and free up subpages which doesn't belong to VM_LOCKED >>> vmas. But this is user-visible change: we screw up Mlocked accouting >>> reported in meminfo, so I had to leave this approach aside. >>> >>> We can bring something better later, but this should be good enough for >>> now. >> >> I can imagine people won't be happy about losing benefits of THP's when they >> mlock(). >> How difficult would it be to support mlocked THP pages without splitting >> until something actually tries to do a partial (un)mapping, and only then do >> the split? That will support the most common case, no? > > Yes, it will. > > But what will we do if we fail to split huge page on munmap()? Fail > munmap() with -EBUSY? We could just unmlock the whole THP page and if we could make the deferred split done ASAP, and not waiting for memory pressure, the window with NR_MLOCK being undercounted would be minimized. Since the RLIMIT_MEMLOCK is tracked independently from NR_MLOCK, there should be no danger wrt breaching the limit due to undercounting here? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org