Subject: Re: [PATCH v17 00/10] mm: introduce memfd_secret system call to create "secret" memory areas
From: David Hildenbrand
To: Michal Hocko
Cc: Mike Rapoport, Andrew Morton, Alexander Viro, Andy Lutomirski, Arnd Bergmann, Borislav Petkov, Catalin Marinas, Christopher Lameter, Dave Hansen, Elena Reshetova, "H. Peter Anvin", Ingo Molnar, James Bottomley, "Kirill A. Shutemov", Matthew Wilcox, Mark Rutland, Mike Rapoport, Michael Kerrisk, Palmer Dabbelt, Paul Walmsley, Peter Zijlstra, Rick Edgecombe, Roman Gushchin, Shakeel Butt, Shuah Khan, Thomas Gleixner, Tycho Andersen, Will Deacon, linux-api@vger.kernel.org, linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-nvdimm@lists.01.org, linux-riscv@lists.infradead.org, x86@kernel.org
References: <20210208211326.GV242749@kernel.org> <1F6A73CF-158A-4261-AA6C-1F5C77F4F326@redhat.com> <662b5871-b461-0896-697f-5e903c23d7b9@redhat.com>
Organization: Red Hat GmbH
Date: Tue, 9 Feb 2021 11:30:53 +0100
List-Id: "Linux-nvdimm developer list."

On 09.02.21 11:23, David Hildenbrand wrote:
>>>> A lot of unevictable memory is a concern regardless of CMA/ZONE_MOVABLE.
>>>> As I've said it is quite easy to land at the similar situation even with
>>>> tmpfs/MAP_ANON|MAP_SHARED on swapless system. Neither of the two is
>>>> really uncommon. It would be even worse that those would be allowed to
>>>> consume both CMA/ZONE_MOVABLE.
>>>
>>> IIRC, tmpfs/MAP_ANON|MAP_SHARED memory
>>> a) Is movable, can land in ZONE_MOVABLE/CMA
>>> b) Can be limited by sizing tmpfs appropriately
>>>
>>> AFAIK, what you describe is a problem with memory overcommit, not with zone
>>> imbalances (below). Or what am I missing?
>>
>> It can be problem for both. If you have just too much of shm (do not
>> forget about MAP_SHARED|MAP_ANON which is much harder to size from an
>> admin POV) then migrateability doesn't really help because you need a
>> free memory to migrate. Without reclaimability this can easily become a
>> problem. That is why I am saying this is not really a new problem.
>> Swapless systems are not all that uncommon.
>
> I get your point, it's similar but still different. "no memory in the
> system" vs. "plenty of unusable free memory available in the system".
>
> In many setups, memory for user space applications can go to
> ZONE_MOVABLE just fine. ZONE_NORMAL etc. can be used for supporting user
> space memory (e.g., page tables) and other kernel stuff.
>
> Like, have 4GB of ZONE_MOVABLE with 2GB of ZONE_NORMAL. Have an
> application (database) that allocates 4GB of memory. Works just fine.
> The zone ratio ends up being a problem for example with many processes
> (-> many page tables).
>
> Not being able to put user space memory into the movable zone is a
> special case. And we are introducing yet another special case here
> (besides vfio, rdma, unmigratable huge pages like gigantic pages).
>
> With plenty of secretmem, looking at /proc/meminfo Total vs. Free can be
> a big lie of how your system behaves.
>
>>
>>>> One has to be very careful when relying on CMA or movable zones. This is
>>>> definitely worth a comment in the kernel command line parameter
>>>> documentation. But this is not a new problem.
>>>
>>> I see the following thing worth documenting:
>>>
>>> Assume you have a system with 2GB of ZONE_NORMAL/ZONE_DMA and 4GB of
>>> ZONE_MOVABLE/CMA.
>>>
>>> Assume you make use of 1.5GB of secretmem. Your system might run into OOM
>>> any time although you still have plenty of memory on ZONE_MOVABLE (and even
>>> swap!), simply because you are making excessive use of unmovable allocations
>>> (for user space!) in an environment where you should not make excessive use
>>> of unmovable allocations (e.g., where should page tables go?).
>>
>> yes, you are right of course and I am not really disputing this. But I
>> would argue that 2:1 Movable/Normal is something to expect problems
>> already. "Lowmem" allocations can easily trigger OOM even without secret
>> mem in the picture. It all just takes to allocate a lot of GFP_KERNEL or
>> even GFP_{HIGH}USER.
>> Really, it is CMA/MOVABLE that are elephant in the
>> room and one has to be really careful when relying on them.
>
> Right, it's all about what the setup actually needs. Sure, there are
> cases where you need significantly more GFP_KERNEL/GFP_{HIGH}USER such
> that a 2:1 ratio is not feasible. But I claim that these are corner cases.
>
> Secretmem gives user space the option to allocate a lot of
> GFP_{HIGH}USER memory. If I am not wrong, "ulimit -a" tells me that each
> application on F33 can allocate 16 GiB (!) of secretmem.

Got to learn to do my math. It's 16 MiB - so as a default it's less
dangerous than I thought!

--
Thanks,

David / dhildenb
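
The GiB/MiB slip in the message above is easy to check mechanically. What follows is only a minimal sketch under stated assumptions: that the figure was read from the "max locked memory" row of "ulimit -a" (RLIMIT_MEMLOCK, which bash prints in KiB), and that secretmem is accounted against that limit. The thread does not say which row was read. The reading 16384 and the helper names ulimit_kib_to_bytes and describe are illustrative and not taken from the thread.

```python
import resource

KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def ulimit_kib_to_bytes(kib):
    """Convert a value as printed by `ulimit -l` (units of KiB) to bytes."""
    return kib * KIB


def describe(nbytes):
    """Render a byte count in the largest binary unit that fits."""
    if nbytes == resource.RLIM_INFINITY:
        return "unlimited"
    for unit, name in ((GIB, "GiB"), (MIB, "MiB"), (KIB, "KiB")):
        if nbytes >= unit:
            return f"{nbytes / unit:g} {name}"
    return f"{nbytes} B"


if __name__ == "__main__":
    # Hypothetical reading: 16384 in the "max locked memory" row (assumption).
    print(describe(ulimit_kib_to_bytes(16384)))

    # getrlimit() reports bytes, so no KiB scaling is needed here.
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    print("RLIMIT_MEMLOCK soft:", describe(soft), "hard:", describe(hard))
```

Reading the limit through getrlimit() avoids the unit question entirely, since the value comes back in bytes rather than in the KiB that the shell builtin prints.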