From: Axel Rasmussen <axelrasmussen@google.com>
Date: Mon, 27 Sep 2021 10:49:39 -0700
Subject: Re: [PATCH] mm/userfaultfd: selftests: Fix memory corruption with thp enabled
To: Peter Xu
Cc: Linux MM, LKML, Andrew Morton, Andrea Arcangeli, Nadav Amit, Li Wang
References: <20210923232512.210092-1-peterx@redhat.com>

On Mon, Sep 27, 2021 at 10:36 AM Peter Xu wrote:
>
> On Fri, Sep 24, 2021 at 03:59:32PM -0400, Peter Xu wrote:
> > On Fri, Sep 24, 2021 at 10:21:30AM -0700, Axel Rasmussen wrote:
> > > On Thu, Sep 23, 2021 at 4:25 PM Peter Xu wrote:
> > > >
> > > > In RHEL's gating selftests we've encountered memory corruption in the uffd
> > > > event test even with upstream kernel:
> > > >
> > > >         # ./userfaultfd anon 128 4
> > > >         nr_pages: 32768, nr_pages_per_cpu: 32768
> > > >         bounces: 3, mode: rnd racing read, userfaults: 6240 missing (6240) 14729 wp (14729)
> > > >         bounces: 2, mode: racing read, userfaults: 1444 missing (1444) 28877 wp (28877)
> > > >         bounces: 1, mode: rnd read, userfaults: 6055 missing (6055) 14699 wp (14699)
> > > >         bounces: 0, mode: read, userfaults: 82 missing (82) 25196 wp (25196)
> > > >         testing uffd-wp with pagemap (pgsize=4096): done
> > > >         testing uffd-wp with pagemap (pgsize=2097152): done
> > > >         testing events (fork, remap, remove): ERROR: nr 32427 memory corruption 0 1 (errno=0, line=963)
> > > >         ERROR: faulting process failed (errno=0, line=1117)
> > > >
> > > > It can be easily reproduced when global thp is enabled, which is the default for
> > > > RHEL.
> > > >
> > > > It's also known as a side effect of commit 0db282ba2c12 ("selftest: use mmap
> > > > instead of posix_memalign to allocate memory", 2021-07-23), which is imho right
> > > > itself on using mmap() to make sure the addresses will be untagged even on arm.
> > > >
> > > > The problem is, for each test we allocate buffers using two allocate_area()
> > > > calls.
> > > > We assumed these two buffers won't affect each other, however they
> > > > could, because mmap() could have found that the two buffers are near each other
> > > > and have the same VMA flags, so they got merged into one VMA.
> > > >
> > > > It won't be a big problem if thp is not enabled, but when thp is aggressively
> > > > enabled it means when initializing the src buffer it could accidentally set up
> > > > part of the dest buffer too when there's a shared THP that overlaps the two
> > > > regions.  Then some of the dest buffer won't be able to be trapped by
> > > > userfaultfd missing mode, then it'll cause memory corruption as described.
> > > >
> > > > To fix it, do release_pages() after initializing the src buffer.
> > >
> > > But, if I understand correctly, release_pages() will just free the
> > > physical pages, but not touch the VMA(s). So, with the right
> > > max_ptes_none setting, why couldn't khugepaged just decide to
> > > re-collapse (with zero pages) immediately after we release the pages,
> > > causing the same problem? It seems to me this change just
> > > significantly narrows the race window (which explains why we see less
> > > of the issue), but doesn't fix it fundamentally.
> >
> > Did you mean you can reproduce the issue even with this patch?
> >
> > It is a good point anyway, indeed I don't see anything that stops it from happening.
> >
> > I wanted to prepare a v2 by releasing the pages after uffdio registration where
> > we'll do the vma split, but it won't simply work because release_pages() will
> > cause the process to hang to death since that test registers with EVENT_REMOVE,
> > and release_pages() upon the thp will trigger synchronous EVENT_REMOVE which
> > cannot be handled by anyone.
> >
> > Another solution is to map some PROT_NONE regions between the buffers, to make
> > sure they won't share a VMA.  I'll need to think more about which is better..
>
> Axel, let me know if you can reproduce an issue with this patch, or otherwise
> would you mind if we keep this patch in -mm and fix the issue first?  I can never
> reproduce any issue with the current patch even if I agree you're probably right,
> however before the patch it is almost 100% reproducible to fail.

Totally fair, if nothing else the patch at least makes it a lot
better. :) Keeping it in the mm tree or even merging it seems fine to
me, we can continue iterating later.

One small comment: I'd prefer to keep the
"uffd_test_ops->release_pages(area_src);" above, to ensure the src
region is empty. It's not immediately obvious to me that we overwrite
*all* of the bytes in src when we initialize it. (I'd have to go look
at the definition of area_count and read the loop carefully.) It may
not be technically needed, but it makes the guarantee that we're
starting with a clean slate, free from any changes from previous test
cases, very clear + explicit.

Moving the release_pages(area_dst) down as you've done seems correct to me.

Either way you can take:

Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>

>
> It's just that after the weekend when I look back I still don't see a 100%
> clean way to fix it yet.  Mapping 4K PROT_NONE before/after each allocation is
> the most ideal but still looks tricky to me.
>
> Would you have time to look for a better solution, so as to (see it as a way
> to) complete what commit 8ba6e8640844 wants to do afterwards?

Sure, it seems related to the other THP investigations we talked about
in the other thread, so I'm happy to look into it. Just to set
expectations, progress may be slightly slow as I'm balancing other
work my employer wants done, and some upcoming time off. But, I think
with your patch the test is at least stable (not flaky) enough that
there is no *urgent* need for this, so it should be fine.

>
> Thanks,
>
> --
> Peter Xu
>
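
For reference, the "PROT_NONE before/after each allocation" idea could
look roughly like the sketch below. This is only an illustration under
stated assumptions, not the selftest's allocate_area(): the names
struct guarded_pair, alloc_guarded_pair() and free_guarded_pair() are
hypothetical. It reserves one PROT_NONE range and then opens up only the
two buffers with mprotect(), so an inaccessible gap always sits between
src and dst. Since the gap has different protection bits, the two
buffers should end up in separate VMAs, and a huge page cannot span more
than one VMA.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper type for this sketch; not part of the selftest. */
struct guarded_pair {
	char *src;	/* first usable buffer */
	char *dst;	/* second usable buffer */
	void *base;	/* start of the whole reservation */
	size_t total;	/* size of the whole reservation */
};

/*
 * Reserve "guard | src | guard | dst | guard" as one PROT_NONE mapping,
 * then make only src and dst readable and writable.  len and guard must
 * be non-zero multiples of the base page size.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int alloc_guarded_pair(struct guarded_pair *p, size_t len, size_t guard)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	size_t total;
	char *base;

	assert(p != NULL);
	if (!len || !guard || len % page || guard % page) {
		errno = EINVAL;
		return -1;
	}

	total = 3 * guard + 2 * len;
	base = mmap(NULL, total, PROT_NONE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return -1;

	p->base = base;
	p->total = total;
	p->src = base + guard;
	p->dst = p->src + len + guard;

	/* Open up only the two buffers; the gaps stay PROT_NONE. */
	if (mprotect(p->src, len, PROT_READ | PROT_WRITE) ||
	    mprotect(p->dst, len, PROT_READ | PROT_WRITE)) {
		int saved = errno;

		munmap(base, total);
		errno = saved;
		return -1;
	}
	return 0;
}

/* Drop the whole reservation, guards included. */
static void free_guarded_pair(struct guarded_pair *p)
{
	munmap(p->base, p->total);
}
```

A caller would pass the buffer size in bytes and a guard of at least one
base page, e.g. alloc_guarded_pair(&p, nr_pages * page_size, page_size).
Whether a 4K guard is enough, or whether the buffers should also be
aligned to the huge page size, is left open here, as in the discussion
above.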