From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id DF431C433F5 for ; Mon, 25 Oct 2021 16:19:41 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 5707460EFF for ; Mon, 25 Oct 2021 16:19:41 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 5707460EFF Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id CAE82940009; Mon, 25 Oct 2021 12:19:40 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id C5DC0940008; Mon, 25 Oct 2021 12:19:40 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B26E7940009; Mon, 25 Oct 2021 12:19:40 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0109.hostedemail.com [216.40.44.109]) by kanga.kvack.org (Postfix) with ESMTP id 8911A940008 for ; Mon, 25 Oct 2021 12:19:40 -0400 (EDT) Received: from smtpin26.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 3D391180588C4 for ; Mon, 25 Oct 2021 16:19:40 +0000 (UTC) X-FDA: 78735470520.26.F9D219D Received: from mail-pl1-f173.google.com (mail-pl1-f173.google.com [209.85.214.173]) by imf24.hostedemail.com (Postfix) with ESMTP id C5344B00009F for ; Mon, 25 Oct 2021 16:19:39 +0000 (UTC) Received: by mail-pl1-f173.google.com with SMTP id r5so3704100pls.1 for ; Mon, 25 Oct 2021 09:19:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=CV9xop57N44Xoc3ELL1tfkVbPt6ec4IJCZX741zNY1A=; b=IVxTXSin5BglUw4bXJHX+dRiUU5dDRS/G8xMx7vrIAKQGjUrGJoafbgQcoyySh8izD QOHQyTU2CzqkphZ631WAQqBJFo3HqbzSPddnHSqGtyhAvIzzqqpRCnD9EcKXn5pQ0DZc gRWIcFSA9GRZLWqHqxTjLYOPVWcrdV453x9sZKnHvJgSJUqKgcIaTRph9nJ0bzV7w1Bz 6eL5sn21ezzZ2AMqBJOEyIr90Hhcdu/A3YvL5tFpGCLZjnWGCLhTTxwzL5SlOkXP4BbT IWeqM5ekgmOkOFHbVV8yyCIlg0Es5Ebss7lPx0ECXG302ElTCnfd2VLON4PqHymGPzgU 0Bbg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=CV9xop57N44Xoc3ELL1tfkVbPt6ec4IJCZX741zNY1A=; b=Iu1T6b982OdIytF488Zk/2/WfmeuyTiIJzSnXQ90JpvIJe3vtxiAsc7WhO51errp95 v1Kcun45qZeFbwX6T6vlZZhAXiIa6F5KsbFNZfpiXqui/UQ6xKEmHMmXoGWaWVZww/LW +fnTH2dcfBU+eXtA4c1Vn5oEV1mVy5yxy3qICqO/LjxFIl+9MM98Hy4jHtPcZPQQsrTl oIu5s8uAE6qt6ikI2oanVb4cYfb0OxQ5DSST0STG4oDMteqvLucWAf4B2CsTWAI6hwhU sBkY/Y2ufMXL0AZqQbbvVL2sysdyoSVxdLj6ZTZwCPVkJ+r4EHfb5itNIEj/XqCePUqh dLOA== X-Gm-Message-State: AOAM53346fymoKGyaqFBFEaZ5SW7TYNJvxLyrqbiv+Jvop+OCj1VQxR5 +bt2ffRt3noc9lA39PH/9Bc= X-Google-Smtp-Source: ABdhPJyMTuhfDNWd3bvHGuBmf0uJfF0cYE7S48+Y4SY1LpFsL5Dsb1AM/rwGEpkbzKD67eBZ00Tvcw== X-Received: by 2002:a17:90a:c088:: with SMTP id o8mr17219078pjs.1.1635178778487; Mon, 25 Oct 2021 09:19:38 -0700 (PDT) Received: from smtpclient.apple (c-24-6-216-183.hsd1.ca.comcast.net. [24.6.216.183]) by smtp.gmail.com with ESMTPSA id gp21sm3727491pjb.47.2021.10.25.09.19.36 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 25 Oct 2021 09:19:37 -0700 (PDT) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 14.0 \(3654.120.0.1.13\)) Subject: Re: [PATCH v2 3/5] x86/mm: check exec permissions on fault From: Nadav Amit In-Reply-To: Date: Mon, 25 Oct 2021 09:19:35 -0700 Cc: Linux-MM , LKML , Andrea Arcangeli , Andrew Cooper , Andrew Morton , Andy Lutomirski , Dave Hansen , Peter Xu , Peter Zijlstra , Thomas Gleixner , Will Deacon , Yu Zhao , Nick Piggin , x86@kernel.org Content-Transfer-Encoding: quoted-printable Message-Id: <00C2DC4B-A77D-4B32-B7F7-2291830BC2D2@gmail.com> References: <20211021122112.592634-1-namit@vmware.com> <20211021122112.592634-4-namit@vmware.com> To: Dave Hansen X-Mailer: Apple Mail (2.3654.120.0.1.13) X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: C5344B00009F X-Stat-Signature: 9pbn6gpr9wrdswpikh34ck8z6c9owawa Authentication-Results: imf24.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=IVxTXSin; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf24.hostedemail.com: domain of nadav.amit@gmail.com designates 209.85.214.173 as permitted sender) smtp.mailfrom=nadav.amit@gmail.com X-HE-Tag: 1635178779-161787 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: > On Oct 25, 2021, at 7:20 AM, Dave Hansen = wrote: >=20 > On 10/21/21 5:21 AM, Nadav Amit wrote: >> access_error() currently does not check for execution permission >> violation.=20 > Ye >=20 >> As a result, spurious page-faults due to execution permission >> violation cause SIGSEGV. >=20 > While I could totally believe that something is goofy when VMAs are > being changed underneath a page fault, I'm having trouble figuring out > why the "if (error_code & X86_PF_WRITE)" code is being modified. In the scenario I mentioned the VMAs are not changed underneath the page-fault. They change *before* the page-fault, but there are residues of the old PTE in the TLB.=20 >=20 >> It appears not to be an issue so far, but the next patches avoid TLB >> flushes on permission promotion, which can lead to this scenario. = nodejs >> for instance crashes when TLB flush is avoided on permission = promotion. >=20 > Just to be clear, "promotion" is going from something like: >=20 > W=3D0->W=3D1 > or > NX=3D1->NX=3D0 >=20 > right? I tend to call that "relaxing" permissions. I specifically talk about NX=3D1>NX=3D0. I can change the language to =E2=80=9Crelaxing=E2=80=9D. >=20 > Currently, X86_PF_WRITE faults are considered an access error unless = the > VMA to which the write occurred allows writes. Returning "no access > error" permits continuing and handling the copy-on-write. >=20 > It sounds like you want to expand that. You want to add a whole class > of new faults that can be ignored: not just that some COW handling = might > be necessary, but that the PTE itself might be out of date. Just = like > a "COW fault" may just result in setting the PTE.W=3D1 and moving on = with > our day, an instruction fault might now just end up with setting > PTE.NX=3D0 and also moving on with our day. You raise an interesting idea (which can easily be implemented with = uffd), but no - I had none of that in my mind. My only purpose is to deal with actual spurious page-faults that I encountered when I removed the TLB flush the happens after NX=3D1->NX=3D0.= I am actually surprised that the kernel makes such a strong assumption that every change of NX=3D1->NX=3D0 would be followed by a TLB flush, = and that during these changes the mm is locked for write. But that is the case. If you do not have this change and a PTE is changed from NX=3D1->NX=3D0 and *later* you access the page, you can have a = page-fault due to stale PTE, and get a SIGSEGV since access_error() is wrong to assume that this is an invalid access. I did not change and there are no changes to the VMA during the page-fault. The page-fault handler would do pretty much nothing and return to user-space which would retry the instruction. [ page-fault triggers an implicit TLB flush of the offending PTE ] >=20 > I'm really confused why the "error_code & X86_PF_WRITE" case is = getting > modified. I would have expected it to be something like just adding: >=20 > /* read, instruction fetch */ > if (error_code & X86_PF_INSN) { > /* Avoid enforcing access error if spurious: */ > if (unlikely(!(vma->vm_flags & VM_EXEC))) > return 1; > return 0; > } >=20 > I'm really confused what X86_PF_WRITE and X86_PF_INSN have in common > other than both being able to (now) be generated spuriously. That was my first version, but I was concerned that perhaps there is some strange scenario in which both X86_PF_WRITE and X86_PF_INSN can be set. That is the reason that Peter asked you whether this is something that might happen. If you confirm they cannot be both set, I would the version you just mentioned.