From: Axel Rasmussen
Date: Wed, 22 Sep 2021 15:29:42 -0700
Subject: Re: [PATCH 1/3] userfaultfd/selftests: fix feature support detection
To: Peter Xu
Cc: Andrew Morton, Shuah Khan, Linux MM, Linuxkselftest, LKML, James Houghton
References: <20210921163323.944352-1-axelrasmussen@google.com>

On Wed, Sep 22, 2021 at 2:52 PM Peter Xu wrote:
>
> On Wed, Sep 22, 2021 at 01:54:53PM -0700, Axel Rasmussen wrote:
> > On Wed, Sep 22, 2021 at 10:33 AM Peter Xu wrote:
> > >
> > > Hello, Axel,
> > >
> > > On Wed, Sep 22, 2021 at 10:04:03AM -0700, Axel Rasmussen wrote:
> > > > Thanks for discussing the design Peter. I have some ideas which might
> > > > make for a nicer v2; I'll massage the code a bit and see what I can
> > > > come up with.
> > >
> > > Sure thing. Note again that as I don't have a strong opinion on that, feel
> > > free to keep it. However if you provide v2, I'll read.
> > >
> > > [off-topic below]
> > >
> > > Another thing I probably have forgot but need your confirmation is, when you
> > > worked on uffd minor mode, did you explicitly disable thp, or is it allowed?
> >
> > I gave a more detailed answer in the other thread, but: currently it
> > is allowed, but this was a bug / oversight on my part. :) THP collapse
> > can break the guarantees minor fault registration is trying to
> > provide.
>
> I've replied there:
>
> https://lore.kernel.org/linux-mm/YUueOUfoamxOvEyO@t490s/
>
> We can try to keep the discussion unified there regarding this.
>
> > But there's another scenario: what if the collapse happened well
> > before registration happened?
>
> Maybe yes, but my understanding of the current uffd-minor scenario tells me
> that this is fine too. Meanwhile I actually have another idea regarding minor
> mode, please continue reading.
>
> Firstly, let me try to re-cap on how minor mode is used in your production
> systems: I believe there should have two processes A and B, if A is the main
> process, B could be the migration process. B migrates pages in the background,
> while A so far should have been stopped and never ran. When we want to start
> A, we should register A with uffd-minor upon the whole range (note: I think so
> far A does not have any pgtable mapped within uffd-minor range). Then any page
> access of A should kick B and asking "whether it is the latest page", if yes
> then UFFDIO_CONTINUE, if no then B modifies the page, plus UFFDIO_CONTINUE
> afterwards. Am I right above?
>
> So if that's the case, then A should have no page table at all.
>
> Then, is that a problem if the shmem file that A maps contains huge thps? I
> think no - because UFFDIO_CONTINUE will only install small pages.
>
> Let me know if I'm understanding it right above; I'll be happy to be corrected.

Right, except that our use case is even more similar to QEMU: the code
doing UFFDIO_CONTINUE / demand paging, and the code running the vCPUs,
are in the same process (same mm) - just different threads.

>
> Actually besides this scenario, I'm also thinking of another scenario of using
> minor fault in a single process - that's mostly what QEMU is doing right now,
> as QEMU has the vcpu threads and migration thread sharing a single mm/pgtable.
> So I think it'll be great to have a new madvise(MADV_ZAP) which will tear down
> all the file-backed memory pgtables of a specific range. I think it'll suite
> perfectly for the minor fault use case, and it can be used for other things
> too. Let me know what you think about this idea, and whether that'll help in
> your case too (e.g., if you worry a current process A mapped huge shmem thp
> somewhere, we can use madvise(MADV_ZAP) to drop it).

Yes, this would be convenient for our implementation too. :) There are
workarounds if the feature doesn't exist, but it would be nice to
have.

It's also useful for memory poisoning, I think, if the host decides
some page(s) are "bad" and wants to intercept any future guest
accesses to those page(s).

>
> > I *think* the existing code deals with THPs correctly in that case, but then
> > again I don't think our selftest really covers this case, and it's not
> > something I've tested in production either (to work around the other bug, we
> > currently MADV_NOHUGEPAGE the area until after VM demand paging completes,
> > and the UFFD registration is removed), so I am not super confident this is
> > the case.
>
> In all cases, enhancing the test program will always be welcomed.
>
> Thanks,
>
> --
> Peter Xu
>
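
For illustration only, the MADV_NOHUGEPAGE workaround mentioned in the quoted text could be sketched roughly as below. The helper names (map_guest_region, thp_disable, thp_enable) are hypothetical and are not taken from the selftest or from any production code; the anonymous shared mapping is just an assumed stand-in for guest memory. Only mmap(2) and madvise(2) with MADV_NOHUGEPAGE / MADV_HUGEPAGE are real interfaces here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: create an anonymous shared (shmem-backed) mapping
 * that stands in for the guest memory region. Returns NULL on failure.
 */
static void *map_guest_region(size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_SHARED | MAP_ANONYMOUS, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}

/*
 * Hypothetical helper: ask the kernel not to use THPs for this range.
 * Returns 0 on success or a negative errno value.
 */
static int thp_disable(void *addr, size_t len)
{
	assert(addr != NULL && len > 0);
	return madvise(addr, len, MADV_NOHUGEPAGE) ? -errno : 0;
}

/*
 * Hypothetical helper: allow THPs again for this range.
 * Returns 0 on success or a negative errno value.
 */
static int thp_enable(void *addr, size_t len)
{
	assert(addr != NULL && len > 0);
	return madvise(addr, len, MADV_HUGEPAGE) ? -errno : 0;
}
```

Under these assumptions, thp_disable() would run on the region before the minor fault registration, and thp_enable() only after demand paging has completed and the UFFD registration has been removed. Both calls may fail with EINVAL on kernels built without THP support, so a caller would probably want to treat that as non-fatal.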