To: Michal Hocko,
	Mike Rapoport
Cc: Andrew Morton, Alexander Viro, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Catalin Marinas, Christopher Lameter, Dan Williams,
	Dave Hansen, Elena Reshetova, "H. Peter Anvin", Ingo Molnar,
	James Bottomley, "Kirill A. Shutemov", Matthew Wilcox, Mark Rutland,
	Mike Rapoport, Michael Kerrisk, Palmer Dabbelt, Paul Walmsley,
	Peter Zijlstra, Rick Edgecombe, Roman Gushchin, Shakeel Butt,
	Shuah Khan, Thomas Gleixner, Tycho Andersen, Will Deacon,
	linux-api@vger.kernel.org, linux-arch@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	linux-kselftest@vger.kernel.org, linux-nvdimm@lists.01.org,
	linux-riscv@lists.infradead.org, x86@kernel.org,
	Hagen Paul Pfeifer, Palmer Dabbelt
References: <20210121122723.3446-1-rppt@kernel.org>
	<20210121122723.3446-8-rppt@kernel.org>
	<20210126114657.GL827@dhcp22.suse.cz>
From: David Hildenbrand
Organization: Red Hat GmbH
Subject: Re: [PATCH v16 07/11] secretmem: use PMD-size pages to amortize direct map fragmentation
Message-ID: <303f348d-e494-e386-d1f5-14505b5da254@redhat.com>
Date: Tue, 26 Jan 2021 12:56:48 +0100
In-Reply-To: <20210126114657.GL827@dhcp22.suse.cz>

On 26.01.21 12:46, Michal Hocko wrote:
> On Thu 21-01-21 14:27:19, Mike Rapoport wrote:
>> From: Mike Rapoport
>>
>> Removing a PAGE_SIZE page from the direct map every time such a page is
>> allocated for a secret memory mapping will cause severe fragmentation of
>> the direct map. This fragmentation can be reduced by using PMD-size pages
>> as a pool of small pages for secret memory mappings.
>>
>> Add a gen_pool per secretmem inode and lazily populate this pool with
>> PMD-size pages.
>>
>> As pages allocated by secretmem become unmovable, use CMA to back the
>> large page caches so that the page allocator won't be surprised by a
>> failing attempt to migrate these pages.
>>
>> The CMA area used by secretmem is controlled by the "secretmem=" kernel
>> parameter. This allows explicit control over the memory available for
>> secretmem and provides a hard upper limit on secretmem consumption.
>
> OK, so I have finally had a closer look at this, and it is really not
> acceptable. I have already mentioned this in a response to another patch:
> any task is able to deprive other tasks of access to secret memory and
> trigger the OOM killer, which wouldn't ever really recover and could
> potentially panic the system. You could be less drastic and only raise
> SIGBUS on fault, but that would still be quite terrible. There is a very
> good reason why hugetlb implements its non-trivial reservation system to
> avoid exactly these problems.
>
> So unless I am really misreading the code
> Nacked-by: Michal Hocko
>
> That doesn't mean I reject the whole idea. There are some details to
> sort out, as mentioned elsewhere, but you cannot really depend on a
> pre-allocated pool that can fail at fault time like that.
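
Just to make sure we are talking about the same mechanism, here is a
stripped-down userspace model of the pool scheme the patch describes (this
is not the kernel code; names and numbers are made up): a fixed "secretmem="
budget is lazily carved into PMD-size chunks, PAGE_SIZE pages are handed out
from those chunks at fault time, and no individual mapping holds any
guarantee.

/* Toy userspace model of the secretmem pool scheme (not kernel code). */
#include <stdio.h>

#define PAGE_SIZE   (4UL << 10)
#define PMD_SIZE    (2UL << 20)
#define CMA_BUDGET  (8UL << 20)         /* think "secretmem=8M" */

struct toy_pool {
        unsigned long cma_left;         /* bytes still reservable from CMA */
        unsigned long chunk_left;       /* bytes left in the current PMD chunk */
};

/* Lazily grab another PMD-size chunk from the CMA budget. */
static int pool_refill(struct toy_pool *pool)
{
        if (pool->cma_left < PMD_SIZE)
                return -1;              /* CMA exhausted, nothing left to hand out */
        pool->cma_left -= PMD_SIZE;
        pool->chunk_left = PMD_SIZE;
        return 0;
}

/* Fault time: hand out one PAGE_SIZE page from the pool. */
static int pool_alloc_page(struct toy_pool *pool)
{
        if (pool->chunk_left == 0 && pool_refill(pool) != 0)
                return -1;              /* the faulting task gets SIGBUS/OOM here */
        pool->chunk_left -= PAGE_SIZE;
        return 0;
}

int main(void)
{
        struct toy_pool pool = { .cma_left = CMA_BUDGET, .chunk_left = 0 };
        unsigned long faults = 0;

        /* One greedy mapping can drain the whole budget... */
        while (pool_alloc_page(&pool) == 0)
                faults++;

        /* ...and every later fault, in any mapping, now fails. */
        printf("served %lu faults, next one fails: %s\n",
               faults, pool_alloc_page(&pool) ? "yes" : "no");
        return 0;
}

The actual patch puts a gen_pool per secretmem inode on top of a CMA area,
but the accounting issue is the same: the budget is first come, first served.
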
So, to do it similarly to hugetlbfs (e.g., with CMA), there would have to
be a mechanism to actually try pre-reserving (e.g., from the CMA area), at
which point the pages would get moved to the secretmem pool, and a
mechanism for mmap() etc. to "reserve" from this secretmem pool, such that
there are guarantees at fault time?

What we have right now feels like some kind of overcommit (read: as with
overcommitting huge pages, we might get SIGBUS at fault time).

TBH, the SIGBUS thingy doesn't sound terrible to me - as long as
applications using this right now expect that behavior and can handle it -
no guarantees. I fully agree that some kind of reservation/guarantee
mechanism would be preferable (a toy sketch of the kind of accounting I
mean follows below).

-- 
Thanks,

David / dhildenb
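
P.S.: to make the reservation idea concrete, here is a toy model of
hugetlbfs-style accounting (made-up names, only a sketch of the idea, not a
proposal for the actual code): mmap() either reserves its whole range up
front from the secretmem pool or fails cleanly, so a fault only ever
consumes a page that was already promised and cannot run dry.

/* Toy model of hugetlbfs-style reservations for secretmem (not kernel code). */
#include <stdio.h>

#define POOL_PAGES      2048UL          /* pages backing the secretmem pool */

struct toy_reservations {
        unsigned long unreserved;       /* pages not yet promised to any mapping */
};

/* mmap() time: promise the whole range up front, or fail cleanly. */
static int toy_reserve(struct toy_reservations *r, unsigned long npages)
{
        if (r->unreserved < npages)
                return -1;              /* mmap() fails up front, no SIGBUS later */
        r->unreserved -= npages;
        return 0;
}

/* munmap()/inode teardown: hand the promise back. */
static void toy_unreserve(struct toy_reservations *r, unsigned long npages)
{
        r->unreserved += npages;
}

int main(void)
{
        struct toy_reservations res = { .unreserved = POOL_PAGES };

        /* A too-large mapping is refused at mmap() time, not at fault time
         * deep inside some other task. */
        if (toy_reserve(&res, 4096) != 0)
                printf("mmap(4096 pages): refused up front\n");

        /* A smaller mapping gets a hard guarantee for all of its faults. */
        if (toy_reserve(&res, 1024) == 0)
                printf("mmap(1024 pages): every future fault is covered\n");

        toy_unreserve(&res, 1024);
        return 0;
}

hugetlbfs itself is much more involved (per-VMA reserve maps, different
rules for private and shared mappings); the sketch only shows where the
failure would move - from fault time back to mmap() time.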