Date: Wed, 26 Jan 2022 16:42:47 +0300
From: "Kirill A. Shutemov"
To: Matthew Wilcox
Cc: Khalid Aziz, akpm@linux-foundation.org, longpeng2@huawei.com,
 arnd@arndb.de, dave.hansen@linux.intel.com, david@redhat.com,
 rppt@kernel.org, surenb@google.com, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org
Subject: Re: [RFC PATCH 0/6] Add support for shared PTEs across processes
Message-ID: <20220126134247.fadtwbvyknh3ejpe@box.shutemov.name>
References: <20220125114212.ks2qtncaahi6foan@box.shutemov.name>
 <20220125135917.ezi6itozrchsdcxg@box.shutemov.name>
 <20220125185705.wf7p2l77vggipfry@box.shutemov.name>

On Wed, Jan 26, 2022 at 04:04:48AM +0000, Matthew Wilcox wrote:
> On Tue, Jan 25, 2022 at 06:59:50PM +0000, Matthew Wilcox wrote:
> > On Tue, Jan 25, 2022 at 09:57:05PM +0300, Kirill A. Shutemov wrote:
> > > On Tue, Jan 25, 2022 at 02:09:47PM +0000, Matthew Wilcox wrote:
> > > > > I think zero-API approach (plus madvise() hints to tweak it) is worth
> > > > > considering.
> > > >
> > > > I think the zero-API approach actually misses out on a lot of
> > > > possibilities that the mshare() approach offers. For example, mshare()
> > > > allows you to mmap() many small files in the shared region -- you
> > > > can't do that with zeroAPI.
> > >
> > > Do you consider a use-case for many small files to be common?
> > > I would think that the main consumer of the feature to be mmap of huge
> > > files. And in this case zero enabling burden on userspace side sounds
> > > like a sweet deal.
> >
> > mmap() of huge files is certainly the Oracle use-case. With occasional
> > funny business like mprotect() of a single page in the middle of a 1GB
> > hugepage.
>
> Bill and I were talking about this earlier and realised that this is
> the key point. There's a requirement that when one process mprotects
> a page that it gets protected in all processes. You can't do that
> without *some* API because that's different behaviour than any existing
> API would produce.

"hurr, durr, we are Oracle" :P

Sounds like a very niche requirement. I doubt there will be more than a
single-digit number of users for the feature. Maybe only the DB.

> So how about something like this ...
>
> int mcreate(const char *name, int flags, mode_t mode);
>
> creates a new mm_struct with a refcount of 2. returns an fd (one
> of the two refcounts) and creates a name for it (inside msharefs,
> holds the other refcount).
>
> You can then mmap() that fd to attach it to a chunk of your address
> space. Once attached, you can start to populate it by calling
> mmap() and specifying an address inside the attached mm as the first
> argument to mmap().

That is not what mmap() would normally do to an existing mapping. So it
requires special treatment.

In general, mmap() of an mm_struct scares me. I can't wrap my head around
the implications.

Like, how does it work on fork()?

How does accounting work? What happens on OOM?

What prevents creating loops, like mapping an mm_struct inside itself?

What do mremap()/munmap() do to such a mapping? Will they affect the
mapping of the mm_struct, or will they target a mapping inside the
mm_struct?

Maybe it just didn't click for me, I dunno.

> Maybe mcreate() is just a library call, and it's really a thin wrapper
> around open() that happens to know where msharefs is mounted.

-- 
 Kirill A. Shutemov