From: "Longpeng (Mike, Cloud Infrastructure Service Product Dept.)"
To: Steven Sistare, Anthony Yznaga
CC: "Gonglei (Arei)"
Subject: Re: [RFC PATCH 0/5] madvise MADV_DOEXEC
Date: Mon, 12 Jul 2021 09:05:45 +0800
References: <1595869887-23307-1-git-send-email-anthony.yznaga@oracle.com> <43471cbb-67c6-f189-ef12-0f8302e81b06@oracle.com>
In-Reply-To: <43471cbb-67c6-f189-ef12-0f8302e81b06@oracle.com>
Sender: owner-linux-mm@kvack.org

Hi Steven,

On 2021/7/8 20:48, Steven Sistare
wrote:
> On 7/8/2021 5:52 AM, Longpeng (Mike, Cloud Infrastructure Service Product Dept.) wrote:
>> Hi Anthony and Steven,
>>
>> On 2020/7/28 1:11, Anthony Yznaga wrote:
>>> This patchset adds support for preserving an anonymous memory range across
>>> exec(3) using a new madvise MADV_DOEXEC argument.  The primary benefit for
>>> sharing memory in this manner, as opposed to re-attaching to a named shared
>>> memory segment, is to ensure it is mapped at the same virtual address in
>>> the new process as it was in the old one.  An intended use for this is to
>>> preserve guest memory for guests using vfio while qemu exec's an updated
>>> version of itself.  By ensuring the memory is preserved at a fixed address,
>>> vfio mappings and their associated kernel data structures can remain valid.
>>> In addition, for the qemu use case, qemu instances that back guest RAM with
>>> anonymous memory can be updated.
>>
>> We have a requirement like yours, but ours seems more complex. We want to
>> isolate some memory regions from the VM's memory space and then start a child
>> process that will use these memory regions.
>>
>> I've written a draft to support this feature, but I just found that my draft is
>> much like yours.
>>
>> It seems that you've already abandoned this patchset. Why?
>
> Hi Longpeng,
>   The reviewers did not like the proposal for several reasons, but the showstopper
> was that they did not want to add complexity to the exec path in the kernel.  You
> can read the email archive for details.
>

I've read the archive and studied it over the past few days; this solution
may be more suitable for my use case.

Let me describe my use case more clearly (feel free to ignore this if you're
not interested):

1. Prog A mmap()s 4GB of memory (anonymous or file-backed); suppose the
   allocated VA range is [0x40000000,0x140000000).

2. Prog A specifies that [0x48000000,0x50000000) and
   [0x80000000,0x100000000) will be shared with its child.

3. Prog A fork()s Prog B, and Prog B then exec()s a new ELF binary.

4. Prog B notices the shared ranges (e.g. via input parameters or ...) and
   remaps them to a contiguous VA range.

Do you have any suggestions?

> We solved part of our problem by adding new vfio interfaces: VFIO_DMA_UNMAP_FLAG_VADDR
> and VFIO_DMA_MAP_FLAG_VADDR.  That solves the vfio problem for shared memory, but not
> for mmap MAP_ANON memory.
>
> - Steve
>
>>> Patches 1 and 2 ensure that loading of ELF load segments does not silently
>>> clobber existing VMAs, and remove assumptions that the stack is the only
>>> VMA in the mm when the stack is set up.  Patch 1 re-introduces the use of
>>> MAP_FIXED_NOREPLACE to load ELF binaries that addresses the previous issues
>>> and could be considered on its own.
>>>
>>> Patches 3, 4, and 5 introduce the feature and an opt-in method for its use
>>> using an ELF note.
>>>
>>> Anthony Yznaga (5):
>>>   elf: reintroduce using MAP_FIXED_NOREPLACE for elf executable mappings
>>>   mm: do not assume only the stack vma exists in setup_arg_pages()
>>>   mm: introduce VM_EXEC_KEEP
>>>   exec, elf: require opt-in for accepting preserved mem
>>>   mm: introduce MADV_DOEXEC
>>>
>>>  arch/x86/Kconfig                       |   1 +
>>>  fs/binfmt_elf.c                        | 196 +++++++++++++++++++++++++--------
>>>  fs/exec.c                              |  33 +++++-
>>>  include/linux/binfmts.h                |   7 +-
>>>  include/linux/mm.h                     |   5 +
>>>  include/uapi/asm-generic/mman-common.h |   3 +
>>>  kernel/fork.c                          |   2 +-
>>>  mm/madvise.c                           |  25 +++++
>>>  mm/mmap.c                              |  47 ++++++++
>>>  9 files changed, 266 insertions(+), 53 deletions(-)
>>>
>>
> .
>
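
For the file-mapping variant of steps 1 to 4 above, here is a minimal
userspace sketch of the remap in step 4. It assumes the region is backed by
a file descriptor that Prog B inherits across exec(). struct shared_range
and both function names are made up for illustration, the sizes are scaled
down to 16 pages, and everything runs in one process to keep it short. It
does not cover the MAP_ANON case, since private anonymous memory is not
preserved across exec().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical descriptor for one shared range: page-aligned byte
 * offset into the backing file, and page-aligned length. */
struct shared_range {
    off_t  offset;
    size_t len;
};

/* Step 4: map every range of fd back to back into one contiguous VA
 * window. Returns the window base, or NULL on failure. */
void *map_contiguous(int fd, const struct shared_range *r, size_t n)
{
    size_t total = 0, pos = 0;

    assert(r != NULL && n > 0);
    for (size_t i = 0; i < n; i++)
        total += r[i].len;

    /* Reserve the whole window first so the pieces land inside
     * address space that already belongs to us. */
    char *base = mmap(NULL, total, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    for (size_t i = 0; i < n; i++) {
        /* MAP_FIXED only replaces part of our own reservation here. */
        void *p = mmap(base + pos, r[i].len, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_FIXED, fd, r[i].offset);
        if (p == MAP_FAILED) {
            munmap(base, total);
            return NULL;
        }
        pos += r[i].len;
    }
    return base;
}

/* Single-process stand-in for steps 1-4. Returns 0 if data written
 * through "Prog A"'s mapping is visible in the contiguous window. */
int demo_shared_ranges(void)
{
    size_t pg = (size_t)sysconf(_SC_PAGESIZE);
    char path[] = "/tmp/shared-ranges-XXXXXX";
    int fd = mkstemp(path);          /* fd is not close-on-exec */
    if (fd < 0)
        return -1;
    unlink(path);
    if (ftruncate(fd, (off_t)(16 * pg)) < 0)
        return -1;

    /* Step 1: map the whole region. */
    char *a = mmap(NULL, 16 * pg, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    if (a == MAP_FAILED)
        return -1;

    /* Step 2: two disjoint ranges picked for sharing. */
    struct shared_range r[2] = {
        { (off_t)(2 * pg), 2 * pg },
        { (off_t)(8 * pg), 4 * pg },
    };
    a[2 * pg] = 'x';
    a[8 * pg] = 'y';

    /* Steps 3-4 would run in the exec'ed child with the inherited fd. */
    char *w = map_contiguous(fd, r, 2);
    if (w == NULL)
        return -1;
    int ok = (w[0] == 'x' && w[2 * pg] == 'y');

    munmap(w, 6 * pg);
    munmap(a, 16 * pg);
    close(fd);
    return ok ? 0 : -1;
}
```

In a real flow Prog A would hand the fd number and the range list to Prog B
(for example on the command line), and Prog B would call map_contiguous()
after exec().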
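
As a side note on patch 1 in the quoted cover letter, the difference
between MAP_FIXED and MAP_FIXED_NOREPLACE can be seen from userspace. The
sketch below is only an illustration of the flag semantics documented in
mmap(2), not code from the patchset. The function name is made up. It
assumes a libc that defines MAP_FIXED_NOREPLACE.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Create a one-page mapping, mark it, then request a second mapping at
 * the same address using fixed_flag. Returns the first byte seen at the
 * original address afterwards, or -1 if the initial mmap() failed. */
int first_byte_after_remap(int fixed_flag)
{
    assert(fixed_flag == MAP_FIXED || fixed_flag == MAP_FIXED_NOREPLACE);

    size_t pg = (size_t)sysconf(_SC_PAGESIZE);
    char *old = mmap(NULL, pg, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (old == MAP_FAILED)
        return -1;
    old[0] = 'k';

    /* MAP_FIXED silently replaces the existing page with a fresh zero
     * page. MAP_FIXED_NOREPLACE refuses with EEXIST on kernels that
     * know the flag and leaves the existing mapping alone. */
    void *p = mmap(old, pg, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | fixed_flag, -1, 0);

    int byte = old[0];

    /* A kernel that does not know the flag treats the address as a
     * hint and may place the mapping elsewhere; clean that up too. */
    if (p != MAP_FAILED && p != (void *)old)
        munmap(p, pg);
    munmap(old, pg);
    return byte;
}
```

With MAP_FIXED the marker byte is gone after the second mmap(), which is
the silent clobbering the cover letter refers to. With MAP_FIXED_NOREPLACE
the marker is still there.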