From: David Hildenbrand
Organization: Red Hat
To: Ralf Ramsauer, linux-mm@kvack.org
Cc: Wolfgang Mauerer, Mario Mintel
Subject: Re: [EXT] Re: COW in userspace
Date: Mon, 23 Aug 2021 12:33:45 +0200

On 23.08.21 12:16, Ralf Ramsauer wrote:
>
>
> On 23/08/2021 10:02, David Hildenbrand wrote:
>> On 20.08.21 15:13, Ralf Ramsauer wrote:
>>> Dear mm folks,
>>>
>>> I have an issue, where it would be great to have a COW-backed virtual
>>> memory area within an userspace process.
>>> I know there's the possibility
>>> to have a file-backed MAP_SHARED vma, which is later duplicated with
>>> MAP_PRIVATE, but that's not exactly what I'm looking for.
>>>
>>> Say I have an anonymous page-aligned VMA a, with MAP_PRIVATE and
>>> PROT_RW. Userspace happily writes to/reads from it. At some point in
>>> time, I want to 'snapshot' that single VMA within the context of the
>>> process and without the need to fork(). Say there's something like
>>>
>>>    a = mmap(0, len, PROT_RW, MAP_ANON | MAP_POPULATE, -1, 0);
>>>    [... fill a ...]
>>>
>>>    b = mmdup(a, len, PROT_READ);
>>>
>>> b shall be the new base pointer of a new VMA that is backed by COW
>>> mechanisms. After mmdup, those regular COW mechanisms do the rest: both
>>> VMAs (a and b) will fault on subsequent writes and duplicate the
>>> previously shared physical mapping, pretty much what cow_fault or
>>> shared_fault does.
>>>
>>> Afaict, this, or at least something like this is currently not supported
>>> by the kernel. Is that correct? If so, why? Generally spoken, is it a
>>> bad idea?
>>
>> Not sure if it helps (most probably not), QEMU uses uffd-wp for
>> background snapshots of VM memory. It's different, though, as you'll
>> only have a single mapping and will be catching modifications to your
>> single mapping, such that you can "safe away" relevant snapshot pages
>> before any modifications.
>
> Thanks for the pointer, David. I'll have a look.
>
>>
>> You mention "both VMAs (a and b) will fault on subsequent writes", so
>> would you actually be allowing PROT_WRITE access to b ("snapshot")?
>>
>
> In general, yes, both should be allowed to be PROT_WRITE. So no matter
> "which side" causes the fault, simply both will lead to duplication.
>
> If it would make things easier, then it would also be absolutely fine to
> have the snapshot PROT_READ, which would suffice my requirements as well.

I recall that Redis has very similar requirements for live snapshotting.
As I was told, they used to handle it via fork(), just as you described.
I don't know whether they have switched to uffd-wp by now, but I would
guess they have, because they were another excellent use case for uffd-wp:

https://lists.gnu.org/archive/html/qemu-devel/2016-10/msg02955.html

You can handle COW manually in user space that way:

1. Create a second anonymous mapping
2. Register a UFFD-WP handler on the original mapping
3. WP-protect the original mapping via UFFD
4. Track in a bitmap which pages were already copied

So when you get notified about a WP event, you copy the page manually to
the second mapping, un-protect the page, and remember in the bitmap that
the page has been copied.

When reading the snapshot, you have to look at the bitmap to figure out
whether to read a specific page from the original or from the second
mapping. You won't be able to just read the second mapping, though. (The
question would be whether that is really required or can be worked
around.)

-- 
Thanks,

David / dhildenb
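
For illustration only, here is a minimal sketch of the bookkeeping half of
the scheme above: the copy-on-first-write step and the bitmap lookup when
reading the snapshot. It deliberately leaves out the userfaultfd part
(registering the range and write-protecting or un-protecting pages), so
snap_on_write_fault() only stands in for what a WP handler would do before
un-protecting the page. All names (struct snapshot, snap_init,
snap_on_write_fault, snap_page, SNAP_PAGE_SIZE) are hypothetical, the
4096-byte page size is an assumption, and malloc() stands in for the second
anonymous mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SNAP_PAGE_SIZE 4096u /* assumed page size for this sketch */

struct snapshot {
    unsigned char *orig;   /* live mapping; the application keeps writing here */
    unsigned char *copy;   /* second buffer holding preserved snapshot pages */
    unsigned char *copied; /* bitmap, one bit per page: 1 = page was copied */
    size_t npages;
};

/* Allocate the second buffer and an all-zero bitmap. Returns 0 on success. */
static int snap_init(struct snapshot *s, unsigned char *orig, size_t npages)
{
    s->orig = orig;
    s->npages = npages;
    s->copy = malloc(npages * SNAP_PAGE_SIZE);
    s->copied = calloc((npages + 7) / 8, 1);
    if (!s->copy || !s->copied) {
        free(s->copy);
        free(s->copied);
        return -1;
    }
    return 0;
}

/*
 * What a WP handler would do for the faulting page before un-protecting
 * it: preserve the old contents once and record that in the bitmap.
 */
static void snap_on_write_fault(struct snapshot *s, size_t page)
{
    assert(page < s->npages);
    if (s->copied[page / 8] & (1u << (page % 8)))
        return; /* already preserved by an earlier fault */
    memcpy(s->copy + page * SNAP_PAGE_SIZE,
           s->orig + page * SNAP_PAGE_SIZE, SNAP_PAGE_SIZE);
    s->copied[page / 8] |= (unsigned char)(1u << (page % 8));
}

/* Snapshot view of one page: the copy if it exists, else the original. */
static const unsigned char *snap_page(const struct snapshot *s, size_t page)
{
    assert(page < s->npages);
    if (s->copied[page / 8] & (1u << (page % 8)))
        return s->copy + page * SNAP_PAGE_SIZE;
    return s->orig + page * SNAP_PAGE_SIZE;
}
```

This sketch is single-threaded. With a real handler thread and a concurrent
snapshot reader, the bitmap updates and lookups would need proper
synchronization (for example atomic operations), and the page may only be
un-protected after the copy and the bitmap update are visible to the reader.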