Date: Tue, 22 Dec 2020 11:30:37 -0700
From: Yu Zhao
To: Nadav Amit
Cc: Peter Zijlstra, Minchan Kim, Linus Torvalds, Peter Xu,
	Andrea Arcangeli, linux-mm, lkml, Pavel Emelyanov, Mike Kravetz,
	Mike Rapoport, stable, Andy Lutomirski, Will Deacon
Subject: Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect
References: <20201221172711.GE6640@xz-x1>
	<76B4F49B-ED61-47EA-9BE4-7F17A26B610D@gmail.com>
	<9E301C7C-882A-4E0F-8D6D-1170E792065A@gmail.com>

On Tue, Dec 22, 2020 at 04:40:32AM -0800, Nadav Amit wrote:
> > On Dec 21, 2020, at 1:24 PM, Yu Zhao wrote:
> >
> > On Mon, Dec 21, 2020 at 12:26:22PM -0800, Linus Torvalds wrote:
> >> On Mon, Dec 21, 2020 at 12:23 PM Nadav Amit wrote:
> >>> Using mmap_write_lock() was my initial fix and there was a strong pushback
> >>> on this approach due to its potential impact on performance.
> >>
> >> From whom?
> >>
> >> Somebody who doesn't understand that correctness is more important
> >> than performance? And that userfaultfd is not the most important part
> >> of the system?
> >>
> >> The fact is, userfaultfd is CLEARLY BUGGY.
> >>
> >>            Linus
> >
> > Fair enough.
> >
> > Nadav, for your patch (you might want to update the commit message).
> >
> > Reviewed-by: Yu Zhao
> >
> > While we are all here, there is also clear_soft_dirty() that could
> > use a similar fix…
>
> Just an update as for why I have still not sent v2: I fixed
> clear_soft_dirty(), created a reproducer, and the reproducer kept failing.
>
> So after some debugging, it appears that clear_refs_write() does not flush
> the TLB. It does call tlb_finish_mmu(), but since commit 0758cd830494
> ("asm-generic/tlb: avoid potential double flush"), tlb_finish_mmu() does not
> flush the TLB, because clear_refs_write() never calls
> __tlb_adjust_range() (unless nested TLB flushes are pending).

Sorry Nadav, I assumed you knew this existing problem fixed by:

https://patchwork.kernel.org/project/linux-mm/cover/20201210121110.10094-1-will@kernel.org/

> So I have a patch for this issue too: arguably the tlb_gather interface is
> not the right one for clear_refs_write(), which does not clear PTEs but
> changes them.
>
> Yet, sadly, my reproducer keeps failing (less frequently, but still). So I
> will keep debugging to see what goes wrong. I will send v2 once I figure out
> what the heck is wrong in the code or my reproducer.
>
> For the reference, here is my reproducer:

Thanks. This would be helpful in case any other breakages happen in
the future.

> -- >8 --
>
> #define _GNU_SOURCE
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <fcntl.h>
> #include <unistd.h>
> #include <assert.h>
> #include <threads.h>
> #include <stdatomic.h>
> #include <sys/mman.h>
> #include <sys/types.h>
> #include <sys/stat.h>
>
> #define PAGE_SIZE	(4096)
> #define TLB_SIZE	(2000)
> #define N_PAGES		(300000)
> #define ITERATIONS	(100)
> #define N_THREADS	(2)
>
> static atomic_int stop;
> static char *m;
>
> static int writer(void *argp)
> {
> 	unsigned long t_idx = (unsigned long)argp;
> 	int i, cnt = 0;
>
> 	while (!atomic_load(&stop)) {
> 		cnt++;
> 		atomic_fetch_add((atomic_int *)m, 1);
>
> 		/*
> 		 * First thread only accesses the page to have it cached in the
> 		 * TLB.
> 		 */
> 		if (t_idx == 0)
> 			continue;
>
> 		/*
> 		 * Other threads access enough entries to cause eviction from
> 		 * the TLB and trigger #PF upon the next access (before the TLB
> 		 * flush of clear_ref actually takes place).
> 		 */
> 		for (i = 1; i < TLB_SIZE; i++) {
> 			if (atomic_load((atomic_int *)(m + PAGE_SIZE * i))) {
> 				fprintf(stderr, "unexpected error\n");
> 				exit(1);
> 			}
> 		}
> 	}
> 	return cnt;
> }
>
> /*
>  * Runs mlock/munlock in the background to raise the page-count of the page and
>  * force copying instead of reusing the page.
>  */
> static int do_mlock(void *argp)
> {
> 	while (!atomic_load(&stop)) {
> 		if (mlock(m, PAGE_SIZE) || munlock(m, PAGE_SIZE)) {
> 			perror("mlock/munlock");
> 			exit(1);
> 		}
> 	}
> 	return 0;
> }
>
> int main(void)
> {
> 	int r, cnt, fd, total = 0;
> 	long i;
> 	thrd_t thr[N_THREADS];
> 	thrd_t mlock_thr[N_THREADS];
>
> 	fd = open("/proc/self/clear_refs", O_WRONLY, 0666);
> 	if (fd < 0) {
> 		perror("open");
> 		exit(1);
> 	}
>
> 	/*
> 	 * Have large memory for clear_ref, so there would be some time between
> 	 * the unmap and the actual deferred flush.
> 	 */
> 	m = mmap(NULL, PAGE_SIZE * N_PAGES, PROT_READ|PROT_WRITE,
> 		 MAP_PRIVATE|MAP_ANONYMOUS|MAP_POPULATE, -1, 0);
> 	if (m == MAP_FAILED) {
> 		perror("mmap");
> 		exit(1);
> 	}
>
> 	for (i = 0; i < N_THREADS; i++) {
> 		r = thrd_create(&thr[i], writer, (void *)i);
> 		assert(r == thrd_success);
> 	}
>
> 	for (i = 0; i < N_THREADS; i++) {
> 		r = thrd_create(&mlock_thr[i], do_mlock, (void *)i);
> 		assert(r == thrd_success);
> 	}
>
> 	for (i = 0; i < ITERATIONS; i++) {
> 		r = pwrite(fd, "4", 1, 0);
> 		if (r < 0) {
> 			perror("pwrite");
> 			exit(1);
> 		}
> 	}
>
> 	atomic_store(&stop, 1);
>
> 	for (i = 0; i < N_THREADS; i++) {
> 		r = thrd_join(mlock_thr[i], NULL);
> 		assert(r == thrd_success);
> 	}
>
> 	for (i = 0; i < N_THREADS; i++) {
> 		r = thrd_join(thr[i], &cnt);
> 		assert(r == thrd_success);
> 		total += cnt;
> 	}
>
> 	r = atomic_load((atomic_int *)(m));
> 	if (r != total) {
> 		fprintf(stderr, "failed - expected=%d actual=%d\n", total, r);
> 		exit(-1);
> 	}
>
> 	fprintf(stderr, "ok\n");
> 	return 0;
> }