Message-ID: <25e5c12d-6038-4482-b6da-bb41af5a9486@redhat.com>
Date: Tue, 4 Jul 2023 10:03:11 +0200
Subject: Re: [PATCH 1/1] mm: disable CONFIG_PER_VMA_LOCK by default until its fixed
From: David Hildenbrand <david@redhat.com>
Organization: Red Hat
To: Suren Baghdasaryan
Cc: akpm@linux-foundation.org, jirislaby@kernel.org, jacobly.alt@gmail.com, holger@applied-asynchrony.com, michel@lespinasse.org, jglisse@google.com, mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org, mgorman@techsingularity.net, dave@stgolabs.net, willy@infradead.org, liam.howlett@oracle.com, peterz@infradead.org, ldufour@linux.ibm.com, paulmck@kernel.org, mingo@redhat.com, will@kernel.org, luto@kernel.org, songliubraving@fb.com, peterx@redhat.com, dhowells@redhat.com, hughd@google.com, bigeasy@linutronix.de, kent.overstreet@linux.dev, punit.agrawal@bytedance.com, lstoakes@gmail.com, peterjung1337@gmail.com, rientjes@google.com, chriscli@google.com, axelrasmussen@google.com, joelaf@google.com, minchan@google.com, rppt@kernel.org, jannh@google.com, shakeelb@google.com, tatashin@google.com, edumazet@google.com, gthelen@google.com, linux-mm@kvack.org
References: <20230703182150.2193578-1-surenb@google.com> <7e3f35cc-59b9-bf12-b8b1-4ed78223844a@redhat.com>
On 04.07.23 09:34, Suren Baghdasaryan wrote:
> On Tue, Jul 4, 2023 at 12:18 AM David Hildenbrand wrote:
>>
>> On 04.07.23 08:50, Suren Baghdasaryan wrote:
>>> On Mon, Jul 3, 2023 at 10:39 PM Suren Baghdasaryan wrote:
>>>>
>>>> On Mon, Jul 3, 2023 at 8:30 PM David Hildenbrand wrote:
>>>>>
>>>>> On 03.07.23 20:21, Suren Baghdasaryan wrote:
>>>>>> A memory corruption was reported in [1] with bisection pointing to the
>>>>>> patch [2] enabling per-VMA locks for x86.
>>>>>> Disable per-VMA locks config to prevent this issue while the problem is
>>>>>> being investigated. This is expected to be a temporary measure.
>>>>>>
>>>>>> [1] https://bugzilla.kernel.org/show_bug.cgi?id=217624
>>>>>> [2] https://lore.kernel.org/all/20230227173632.3292573-30-surenb@google.com
>>>>>>
>>>>>> Reported-by: Jiri Slaby
>>>>>> Reported-by: Jacob Young
>>>>>> Fixes: 0bff0aaea03e ("x86/mm: try VMA lock-based page fault handling first")
>>>>>> Signed-off-by: Suren Baghdasaryan
>>>>>> ---
>>>>>>  mm/Kconfig | 2 +-
>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/mm/Kconfig b/mm/Kconfig
>>>>>> index 09130434e30d..de94b2497600 100644
>>>>>> --- a/mm/Kconfig
>>>>>> +++ b/mm/Kconfig
>>>>>> @@ -1224,7 +1224,7 @@ config ARCH_SUPPORTS_PER_VMA_LOCK
>>>>>>  	def_bool n
>>>>>>
>>>>>>  config PER_VMA_LOCK
>>>>>> -	def_bool y
>>>>>> +	bool "Enable per-vma locking during page fault handling."
>>>>>>  	depends on ARCH_SUPPORTS_PER_VMA_LOCK && MMU && SMP
>>>>>>  	help
>>>>>>  	  Allow per-vma locking during page fault handling.
>>>>>
>>>>> As raised at LSF/MM, I was "surprised" that we can now handle page faults
>>>>> concurrent to fork() and was expecting something to be broken already.
>>>>>
>>>>> What probably happens is that we wr-protected the page in the parent process
>>>>> and COW-shared an anon page with the child using copy_present_pte().
>>>>>
>>>>> But we only flush the parent MM TLB before we drop the parent MM lock in
>>>>> dup_mmap().
>>>>>
>>>>> If we get a write fault before that TLB flush in the parent, and we end up
>>>>> replacing that anon page in the parent process in do_wp_page() [because it is
>>>>> COW-shared with the child], this might be problematic: some stale writable
>>>>> TLB entries can target the wrong (old) page.
>>>>
>>>> Hi David,
>>>> Thanks for the detailed explanation. Let me check if this is indeed
>>>> what's happening here. If that's indeed the cause, I think we can
>>>> write-lock the VMAs being dup'ed until the TLB is flushed and
>>>> mmap_write_unlock(oldmm) unlocks them all and lets page faults
>>>> proceed. If that works we at least will know the reason for the memory
>>>> corruption.
>>>
>>> Yep, locking the VMAs being copied inside dup_mmap() seems to fix the issue:
>>>
>>>          for_each_vma(old_vmi, mpnt) {
>>>                  struct file *file;
>>>
>>> +                vma_start_write(mpnt);
>>>                  if (mpnt->vm_flags & VM_DONTCOPY) {
>>>                          vm_stat_account(mm, mpnt->vm_flags, -vma_pages(mpnt));
>>>                          continue;
>>>                  }
>>>
>>> At least the reproducer at
>>> https://bugzilla.kernel.org/show_bug.cgi?id=217624 is working now. But
>>> I wonder if that's the best way to fix this. It's surely simple, but
>>> locking every VMA is not free and doing that on every fork might
>>> regress performance.
>>
>> That would mean that we can possibly still get page faults concurrent to
>> fork(), on the yet unprocessed part. While that fixes the issue at hand,
>> I cannot reliably tell if this doesn't mess with some other fork()
>> corner case.
>>
>> I'd suggest write-locking all VMAs upfront, before doing any kind of
>> fork-mm operation. Just like the old code did. See below.
>>
>>>
>>>> Thanks,
>>>> Suren.
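[Editor's note: the stale-TLB race being diagnosed above can be spelled out as an interleaving. This is an illustrative sketch, not kernel code; the CPU assignment and step names are hypothetical.]

```
CPU 0 (parent)                 CPU 1 (parent)              dup_mmap() for fork()
--------------                 --------------              ---------------------
                               writes page via a
                               writable TLB entry
                                                           copy_present_pte():
                                                             wr-protect parent PTE,
                                                             COW-share anon page
                                                             with the child
                                                           (parent TLB flush is
                                                            deferred until the end
                                                            of dup_mmap())
write fault (per-VMA lock
path: no mmap_lock taken)
do_wp_page(): page is still
COW-shared with the child,
so replace it with a new
copy in the parent
                               write through the stale,
                               still-writable TLB entry
                               lands in the OLD page,
                               now mapped only by the
                               child => corruption
```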
>>>>
>>>>>
>>>>> We had similar issues in the past with userfaultfd, see the comment at the
>>>>> beginning of do_wp_page():
>>>>>
>>>>> 	if (likely(!unshare)) {
>>>>> 		if (userfaultfd_pte_wp(vma, *vmf->pte)) {
>>>>> 			pte_unmap_unlock(vmf->pte, vmf->ptl);
>>>>> 			return handle_userfault(vmf, VM_UFFD_WP);
>>>>> 		}
>>>>>
>>>>> 		/*
>>>>> 		 * Userfaultfd write-protect can defer flushes. Ensure the TLB
>>>>> 		 * is flushed in this case before copying.
>>>>> 		 */
>>>>> 		if (unlikely(userfaultfd_wp(vmf->vma) &&
>>>>> 			     mm_tlb_flush_pending(vmf->vma->vm_mm)))
>>>>> 			flush_tlb_page(vmf->vma, vmf->address);
>>>>> 	}
>>>
>>> If do_wp_page() could identify that vmf->vma is being copied, we could
>>> simply return VM_FAULT_RETRY and retry the page fault under mmap_lock,
>>> which would block until dup_mmap() is done... Maybe we could use
>>> mm_tlb_flush_pending() for that? WDYT?
>>
>> I'm not convinced that we should be making that code more complicated
>> simply to speed up fork() with concurrent page faults.
>>
>> My gut feeling is that many operations that could possibly take the VMA
>> lock in the future (page pinning triggering unsharing) should not run
>> concurrent with fork().
>>
>> So IMHO, keep the old behavior of fork() -- i.e., no concurrent page
>> faults -- and unlock that eventually in the future when deemed really
>> required (but people should really avoid fork() in performance-sensitive
>> applications if not absolutely required).
>
> Thanks for the input, David. Yeah, that sounds reasonable. I'll test
> some more tomorrow morning and if everything looks good will post a
> patch to lock the VMAs and another one to re-enable
> CONFIG_PER_VMA_LOCK.
> Thanks for all the help!

Fortunately, I spotted fork() in the reproducer and remembered that there
is something nasty about COW page replacement and TLB flushes :)

Can we avoid the temporary disabling of per-vma lock by a simple
"lock all VMAs" patch, or is that patch (here) already upstream/on its
way upstream?
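[Editor's note: the "write-lock all VMAs upfront" approach discussed above might look roughly like the following inside dup_mmap(). This is a sketch against 6.4-era kernel code, not the actual upstream patch; the iterator-reset helper and surrounding details may differ between kernel versions. As noted in the thread, mmap_write_unlock(oldmm) at the end of dup_mmap() releases all per-VMA write locks, so no explicit unlock pass is needed.]

```
static int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm)
{
	struct vm_area_struct *mpnt;
	VMA_ITERATOR(old_vmi, oldmm, 0);
	...
	/*
	 * Write-lock every VMA of the parent before any fork-mm work,
	 * so no page fault can run concurrently with the copy below.
	 * mmap_write_lock(oldmm) is already held at this point.
	 */
	for_each_vma(old_vmi, mpnt)
		vma_start_write(mpnt);
	vma_iter_set(&old_vmi, 0);	/* rewind the iterator */

	for_each_vma(old_vmi, mpnt) {
		/* ... existing per-VMA copy loop, unchanged ... */
	}
	...
}
```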
--
Cheers,

David / dhildenb