From: "Longpeng (Mike, Cloud Infrastructure Service Product Dept.)"
To: Matthew Wilcox, David Hildenbrand
CC: Khalid Aziz, Steven Sistare, Anthony Yznaga, "linux-kernel@vger.kernel.org", "linux-mm@kvack.org", "Gonglei (Arei)"
Subject: RE: [RFC PATCH 0/5] madvise MADV_DOEXEC
Date: Tue, 17 Aug 2021 00:47:19 +0000
Message-ID: <3cdccacab6244dd3ac9d491ac7233b43@huawei.com>
References: <1595869887-23307-1-git-send-email-anthony.yznaga@oracle.com> <43471cbb-67c6-f189-ef12-0f8302e81b06@oracle.com> <55720e1b39cff0a0f882d8610e7906dc80ea0a01.camel@oracle.com>

> -----Original Message-----
> From: Matthew Wilcox [mailto:willy@infradead.org]
> Sent: Monday, August 16, 2021 8:08 PM
> To: David Hildenbrand
> Cc: Khalid Aziz; Longpeng (Mike, Cloud Infrastructure
> Service Product Dept.); Steven Sistare; Anthony Yznaga;
> linux-kernel@vger.kernel.org; linux-mm@kvack.org; Gonglei (Arei)
> Subject: Re: [RFC PATCH 0/5] madvise MADV_DOEXEC
>
> On Mon, Aug 16, 2021 at 10:02:22AM +0200, David Hildenbrand wrote:
> > > Mappings within this address range behave as if they were shared
> > > between threads, so a write to a MAP_PRIVATE mapping will create a
> > > page which is shared between all the sharers. The first process that
> > > declares an address range mshare'd can continue to map objects in
> > > the shared area. All other processes that want mshare'd access to
> > > this memory area can do so by calling mshare(). After this call, the
> > > address range given by mshare becomes a shared range in its address
> > > space. Anonymous mappings will be shared and not COWed.
> >
> > Did I understand correctly that you want to share actual page tables
> > between processes and consequently different MMs? That sounds like a
> > very bad idea.
>
> That is the entire point. Consider a machine with 10,000 instances of an
> application running (process model, not thread model). If each application
> wants to map 1TB of RAM using 2MB pages, that's 4MB of page tables per
> process or 40GB of RAM for the whole machine.
>
> There's a reason hugetlbfs was enhanced to allow this page table sharing.
> I'm not a fan of the implementation as it gets some locks upside down, so
> this is an attempt to generalise the concept beyond hugetlbfs.
>
> Think of it like partial threading. You get to share some parts, but not
> all, of your address space with your fellow processes. Obviously you don't
> want to expose this to random other processes, only to other instances of
> yourself being run as the same user.

I understand your intent now: you want to share memory ranges by sharing the
relevant page table pages.

I implemented a similar idea about four years ago (in late 2017) to support
QEMU live upgrade:

https://patents.google.com/patent/US20210089345A1

"""
[0131]
In a first possible implementation, the generation unit includes a copying
subunit configured to copy an entry corresponding to the virtual memory area
in a PGD page table of the first virtual machine to an entry corresponding to
the virtual memory area in a PGD page table of the second virtual machine.
"""

We wanted to share anonymous memory between the old QEMU process and the new
one, so we restricted QEMU to mmap the VM's memory in the 4T-8T address range
and then shared that memory by directly copying the PGD entries (the real
implementation is much more complicated than this description).

Besides saving memory, this approach lets a large memory range be shared
quickly.