Date: Tue, 16 May 2023 12:43:15 -0400
From: Peter Xu
To: Lokesh Gidra
Cc: Andrea Arcangeli, Axel Rasmussen, Andrew Morton,
    "open list:MEMORY MANAGEMENT", linux-kernel, "Kirill A . Shutemov",
    "Kirill A. Shutemov", Brian Geffon, Suren Baghdasaryan, Kalesh Singh,
    Nicolas Geoffray, Jared Duke, android-mm, Blake Caldwell, Mike Rapoport
Subject: Re: RFC for new feature to move pages from one vma to another without split

On Mon, May 08, 2023 at 03:56:50PM -0700, Lokesh Gidra wrote:
> On Tue, Apr 11, 2023 at 8:14 AM Peter Xu wrote:
> >
> > On Mon, Apr 10, 2023 at 12:41:31AM -0700, Lokesh Gidra wrote:
> > > On Thu, Apr 6, 2023 at 10:29 AM Peter Xu wrote:
> > > >
> > > > Hi, Lokesh,
> > > >
> > > > Sorry for the late reply.  Copy Blake Caldwell and Mike too.
> > >
> > > Thanks for the reply. It's extremely helpful.
> > > >
> > > > On Thu, Feb 16, 2023 at 02:27:11PM -0800, Lokesh Gidra wrote:
> > > > > I) SUMMARY:
> > > > > Requesting comments on a new feature which remaps pages from one
> > > > > private anonymous mapping to another, without altering the vmas
> > > > > involved. Two alternatives exist but both have drawbacks:
> > > > > 1. userfaultfd ioctls allocate new pages, copy data and free the old
> > > > > ones even when updates could be done in-place;
> > > > > 2. mremap results in vma splitting in most of the cases due to 'pgoff' mismatch.
> > > >
> > > > Personally it was always a mystery to me how vm_pgoff works with
> > > > anonymous vmas and why it needs to be set up with vm_start >> PAGE_SHIFT.
> > > >
> > > > Just now I tried to apply the one-liner change below:
> > > >
> > > > @@ -1369,7 +1369,7 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
> > > >                         /*
> > > >                          * Set pgoff according to addr for anon_vma.
> > > >                          */
> > > > -                       pgoff = addr >> PAGE_SHIFT;
> > > > +                       pgoff = 0;
> > > >                         break;
> > > >                 default:
> > > >                         return -EINVAL;
> > > >
> > > > The kernel even boots without a major problem so far.
> > > >
> > > > I have a feeling that I'm missing something else here; it would be great
> > > > if anyone knows.
> > > >
> > > > Anyway, I agree mremap() is definitely not the best way to do page level
> > > > operations like this, no matter whether vm_pgoff can match or not.
> > > >
> > > > >
> > > > > Proposing a new mremap flag or userfaultfd ioctl which enables
> > > > > remapping pages without these drawbacks. Such a feature, as described
> > > > > below, would be very helpful in efficient implementation of concurrent
> > > > > compaction algorithms.
> > > >
> > > > After I read the proposal, I had a feeling that you're not aware that we
> > > > have similar proposals adding UFFDIO_REMAP.
> > >
> > > Yes, I wasn't aware of this. Thanks a lot for sharing the details.
> > > >
> > > > I think it started with Andrea's initial proposal on the whole uffd:
> > > >
> > > > https://lore.kernel.org/linux-mm/1425575884-2574-1-git-send-email-aarcange@redhat.com/
> > > >
> > > > Then for some reason it was not merged in the initial version, but at
> > > > least it was proposed again here (even though the goal seems slightly
> > > > different; that one may want to move pages out instead of moving them in):
> > > >
> > > > https://lore.kernel.org/linux-mm/cover.1547251023.git.blake.caldwell@colorado.edu/
> > >
> > > Yeah, this seems to be the opposite of what I'm looking for. IIUC,
> > > page-out REMAP can't satisfy any MISSING userfault. In fact, it
> > > enables MISSING faults in the future.
> > > Maybe a flag can be added to the uffdio_remap struct to accommodate this
> > > case, if it is still being pursued.
> >
> > Yes, I don't think that's a major problem if the use cases share mostly the
> > same foundation.
> >
> > > >
> > > > Also worth checking the latest commit that Andrea maintains himself (I
> > > > doubt there are major changes, but just to be complete):
> > > >
> > > > https://gitlab.com/aarcange/aa/-/commit/2aec7aea56b10438a3881a20a411aa4b1fc19e92
> > > >
> > > > So far I think that's what you're looking for.  I'm not sure whether the
> > > > limitations mentioned in the old UFFDIO_REMAP proposals will be a
> > > > problem, though.  For example, it required all src pages to be not only
> > > > anonymous but also mapcount==1.  But maybe that's not a problem here
> > > > either.
> > >
> > > Yes, this is exactly what I am looking for. The mapcount==1 is not a
> > > problem either. Any idea why the patch isn't merged?
> >
> > The initial version of the discussion mentioned the lack of use cases as
> > one of the reasons:
> >
> > https://lore.kernel.org/linux-mm/20150305185112.GL4280@redhat.com/
> >
> Thanks for sharing the link. I assume the 20% performance gap in
> UFFDIO_COPY vs UFFDIO_REMAP is
> just for ioctl calls. But (at least) in case of compaction (our use
> case), COPY increases other overheads.

Per my read:

        Yes, we already measured the UFFDIO_COPY is faster than UFFDIO_REMAP,
        the userfault latency decreases -20%.

That was the fault latency, so it can include more than the pure ioctl
cost.

However, I think the point is valid: this specific use case is not purely
adding memory but also includes removals.  It does seem a proper use case
to me, at least from what I can see now.

> It leads to more page allocations, mem-copies, and madvises than required.
> OTOH, with REMAP:
>
> 1) Page allocations can be mostly avoided by recycling the pages as
> they are freed during compaction
> 2) Memcpy (for compacting objects) into the page (from (1)) is needed
> only once (as compared to COPY wherein it does another memcpy).
> Furthermore, as described in the RFC, sometimes even 1 memcpy isn't
> required (with REMAP)
> 3) As pages are being recycled in userspace, there would be far fewer
> pages to madvise at the end of compaction.
>
> Also, as described in the RFC, REMAP allows moving pages within heap
> for page-level coarse-grained compaction, which helps by avoiding
> swapping in the page. This wouldn't be possible with COPY.

Please feel free to pick up the work if you think that's the right one for
you.  IMHO it'll be very helpful if you can justify how REMAP could improve
the use case in the cover letter with some real numbers.

Thanks,

-- 
Peter Xu