To: Michal Hocko
Cc: Mike Rapoport, Andrew Morton, Alexander Viro, Andy Lutomirski,
    Arnd Bergmann, Borislav Petkov, Catalin Marinas, Christopher Lameter,
    Dan Williams, Dave Hansen, Elena Reshetova, "H. Peter Anvin",
    Ingo Molnar, James Bottomley, "Kirill A. Shutemov", Matthew Wilcox,
    Mark Rutland, Mike Rapoport, Michael Kerrisk, Palmer Dabbelt,
    Paul Walmsley, Peter Zijlstra, Rick Edgecombe, Roman Gushchin,
    Shakeel Butt, Shuah Khan, Thomas Gleixner, Tycho Andersen, Will Deacon,
    linux-api@vger.kernel.org, linux-arch@vger.kernel.org,
    linux-arm-kernel@lists.infradead.org, linux-fsdevel@vger.kernel.org,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    linux-kselftest@vger.kernel.org, linux-nvdimm@lists.01.org,
    linux-riscv@lists.infradead.org, x86@kernel.org
References: <20210208211326.GV242749@kernel.org>
    <1F6A73CF-158A-4261-AA6C-1F5C77F4F326@redhat.com>
From: David Hildenbrand
Organization: Red Hat GmbH
Subject: Re: [PATCH v17 00/10] mm: introduce memfd_secret system call to create "secret" memory areas
Message-ID: <662b5871-b461-0896-697f-5e903c23d7b9@redhat.com>
Date: Tue, 9 Feb 2021 10:15:17 +0100

On 09.02.21 09:59, Michal Hocko wrote:
> On Mon 08-02-21 22:38:03, David Hildenbrand wrote:
>>
>>> On 08.02.2021 at 22:13, Mike Rapoport wrote:
>>>
>>> On Mon, Feb 08, 2021 at 10:27:18AM +0100, David Hildenbrand wrote:
>>>> On 08.02.21 09:49, Mike Rapoport wrote:
>>>>
>>>> Some questions (and request to document the answers) as we now allow to have
>>>> unmovable allocations all over the place and I don't see a single comment
>>>> regarding that in the cover letter:
>>>>
>>>> 1. How will the issue of plenty of unmovable allocations for user space be
>>>> tackled in the future?
>>>>
>>>> 2. How has this issue been documented?
>>>> E.g., interaction with ZONE_MOVABLE
>>>> and CMA, alloc_contig_range()/alloc_contig_pages()?
>>>
>>> Secretmem sets the mappings gfp mask to GFP_HIGHUSER, so it does not
>>> allocate movable pages at the first place.
>>
>> That is not the point. Secretmem cannot go on CMA / ZONE_MOVABLE
>> memory and behaves like long-term pinnings in that sense. This is a
>> real issue when using a lot of secretmem.
>
> A lot of unevictable memory is a concern regardless of CMA/ZONE_MOVABLE.
> As I've said it is quite easy to land at the similar situation even with
> tmpfs/MAP_ANON|MAP_SHARED on swapless system. Neither of the two is
> really uncommon. It would be even worse that those would be allowed to
> consume both CMA/ZONE_MOVABLE.

IIRC, tmpfs/MAP_ANON|MAP_SHARED memory
a) Is movable, can land in ZONE_MOVABLE/CMA
b) Can be limited by sizing tmpfs appropriately

AFAIK, what you describe is a problem with memory overcommit, not with
zone imbalances (below). Or what am I missing?

>
> One has to be very careful when relying on CMA or movable zones. This is
> definitely worth a comment in the kernel command line parameter
> documentation. But this is not a new problem.

I see the following thing worth documenting:

Assume you have a system with 2GB of ZONE_NORMAL/ZONE_DMA and 4GB of
ZONE_MOVABLE/CMA.

Assume you make use of 1.5GB of secretmem. Your system might run into
OOM any time although you still have plenty of memory on ZONE_MOVABLE
(and even swap!), simply because you are making excessive use of
unmovable allocations (for user space!) in an environment where you
should not make excessive use of unmovable allocations (e.g., where
should page tables go?).

The existing controls (mlock limit) don't really match the current
semantics of that memory. I repeat it once again: secretmem *currently*
resembles long-term pinned memory, not mlocked memory. Things will
change when implementing migration support for secretmem pages.
Until then, the semantics are different and this should be spelled out.

For long-term pinnings this is kind of obvious, still we're now
documenting it because it's dangerous to not be aware of. Secretmem
behaves exactly the same and I think this is worth spelling out:
secretmem has the potential of being used much more often than fairly
special vfio/rdma/ ...

Looking at a cover letter that doesn't even mention the issue of
unmovable allocations makes me think that we are either trying to ignore
the problem or are not aware of the problem.

-- 
Thanks,

David / dhildenb
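
The 2GB/4GB scenario in the message can be put into numbers with a toy model. This is only a sketch of the arithmetic, not of how the page allocator accounts for zones. The names `Zones`, `unmovable_headroom` and `can_allocate_unmovable` are hypothetical, and the 1 GiB request for further unmovable memory (page tables, other kernel allocations) is an assumed figure chosen for illustration.

```python
# Toy model of the zone imbalance described in the message.
# Assumption: unmovable allocations (secretmem, page tables, ...) can only
# be served from ZONE_NORMAL/ZONE_DMA, never from ZONE_MOVABLE/CMA.
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class Zones:
    unmovable_capable: int  # bytes in ZONE_NORMAL/ZONE_DMA
    movable_only: int       # bytes in ZONE_MOVABLE/CMA


def unmovable_headroom(zones: Zones, unmovable_used: int) -> int:
    """Bytes still available for further unmovable allocations."""
    return max(zones.unmovable_capable - unmovable_used, 0)


def can_allocate_unmovable(zones: Zones, unmovable_used: int, request: int) -> bool:
    """True if an unmovable request fits; free movable memory does not help."""
    return request <= unmovable_headroom(zones, unmovable_used)


# Numbers from the message: 2GB unmovable-capable, 4GB movable, 1.5GB secretmem.
zones = Zones(unmovable_capable=2 * GIB, movable_only=4 * GIB)
secretmem = int(1.5 * GIB)

headroom = unmovable_headroom(zones, secretmem)
print(f"headroom for unmovable allocations: {headroom / GIB} GiB")
print(f"free memory overall: {(headroom + zones.movable_only) / GIB} GiB")

# Hypothetical 1 GiB of further unmovable demand (assumed figure).
print("1 GiB unmovable request fits:",
      can_allocate_unmovable(zones, secretmem, 1 * GIB))
```

With these inputs only 0.5 GiB remains for everything else that must be unmovable, while 4.5 GiB is free overall, which is the imbalance the message asks to have documented.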