From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH 1/2] mm/mprotect: Call arch_validate_prot under mmap_lock and with length
From: Khalid Aziz
Organization: Oracle Corp
Date: Wed, 14 Oct 2020 15:21:16 -0600
To: Catalin Marinas
Cc: Jann Horn, "David S. Miller", sparclinux@vger.kernel.org, Andrew Morton,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org, Christoph Hellwig,
 Anthony Yznaga, Will Deacon, linux-arm-kernel@lists.infradead.org,
 Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
 linuxppc-dev@lists.ozlabs.org
References: <20201007073932.865218-1-jannh@google.com> <20201010110949.GA32545@gaia> <20201012172218.GE6493@gaia> <20c85633-b559-c299-3e57-ae136b201526@oracle.com> <20201013091638.GA10778@gaia>
In-Reply-To: <20201013091638.GA10778@gaia>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8

On 10/13/20 3:16 AM, Catalin Marinas wrote:
> On Mon, Oct 12, 2020 at 01:14:50PM -0600, Khalid Aziz wrote:
>> On 10/12/20 11:22 AM, Catalin Marinas wrote:
>>> On Mon, Oct 12, 2020 at 11:03:33AM -0600, Khalid Aziz wrote:
>>>> On 10/10/20 5:09 AM, Catalin Marinas wrote:
>>>>> On Wed, Oct 07, 2020 at 02:14:09PM -0600, Khalid Aziz wrote:
>>>>>> On 10/7/20 1:39 AM, Jann Horn wrote:
>>>>>>> arch_validate_prot() is a hook that can validate whether a given set of
>>>>>>> protection flags is valid in an mprotect() operation. It is given the set
>>>>>>> of protection flags and the address being modified.
>>>>>>>
>>>>>>> However, the address being modified can currently not actually be used in
>>>>>>> a meaningful way because:
>>>>>>>
>>>>>>> 1. Only the address is given, but not the length, and the operation can
>>>>>>>    span multiple VMAs. Therefore, the callee can't actually tell which
>>>>>>>    virtual address range, or which VMAs, are being targeted.
>>>>>>> 2. The mmap_lock is not held, meaning that if the callee were to check
>>>>>>>    the VMA at @addr, that VMA would be unrelated to the one the
>>>>>>>    operation is performed on.
>>>>>>>
>>>>>>> Currently, custom arch_validate_prot() handlers are defined by
>>>>>>> arm64, powerpc and sparc.
>>>>>>> arm64 and powerpc don't care about the address range, they just check the
>>>>>>> flags against CPU support masks.
>>>>>>> sparc's arch_validate_prot() attempts to look at the VMA, but doesn't take
>>>>>>> the mmap_lock.
>>>>>>>
>>>>>>> Change the function signature to also take a length, and move the
>>>>>>> arch_validate_prot() call in mm/mprotect.c down into the locked region.
>>>>> [...]
>>>>>> As Chris pointed out, the call to arch_validate_prot() from do_mmap2()
>>>>>> is made without holding mmap_lock. The lock is not acquired until
>>>>>> vm_mmap_pgoff(). This variance is uncomfortable, but I am more
>>>>>> uncomfortable forcing all implementations of validate_prot to require
>>>>>> mmap_lock be held when non-sparc implementations do not have such a need
>>>>>> yet. Since do_mmap2() is in powerpc-specific code, for now this patch
>>>>>> solves a current problem.
>>>>>
>>>>> I still think sparc should avoid walking the vmas in
>>>>> arch_validate_prot(). The core code already has the vmas, though not
>>>>> when calling arch_validate_prot(). That's one of the reasons I added
>>>>> arch_validate_flags() with the MTE patches. For sparc, this could be
>>>>> (untested, just copied the arch_validate_prot() code):
>>>>
>>>> I am a little uncomfortable with the idea of validating protection bits
>>>> inside the VMA walk loop in do_mprotect_pkey(). When ADI is being
>>>> enabled across multiple VMAs and arch_validate_flags() fails on a VMA
>>>> later, do_mprotect_pkey() will bail out with an error, leaving ADI enabled
>>>> on earlier VMAs. This will apply to protection bits other than ADI as
>>>> well, of course. This becomes a partial failure of the mprotect() call. I
>>>> think it should be all or nothing with mprotect() - when one calls
>>>> mprotect() from userspace, either the entire address range passed in
>>>> gets its protection bits updated or none of it does. That requires
>>>> validating protection bits upfront, or undoing what earlier iterations of
>>>> the VMA walk loop might have done.
>>>
>>> I thought the same initially, but mprotect() already does this with the
>>> VM_MAY* flag checking. If you ask it for an mprotect() that crosses
>>> multiple vmas and one of them fails, it doesn't roll back the changes to
>>> the prior ones. I considered that a similar approach is fine for MTE
>>> (it's most likely a user error).
>>
>> You are right about the current behavior with VM_MAY* flags, but that is
>> not the right behavior. Adding more cases to this just perpetuates
>> incorrect behavior. It is not easy to roll back changes after VMAs have
>> potentially been split/merged, which is probably why the current code
>> simply throws in the towel and returns with a partially modified address
>> space. It is a lot easier to do all the checks upfront and then proceed or
>> not proceed with modifying VMAs. One approach might be to call
>> arch_validate_flags() in a loop before modifying VMAs and walk all VMAs
>> with a read lock held. The current code also bails out with ENOMEM if it
>> finds a hole in the address range and leaves any modifications already
>> made in place. This is another case where a hole could have been
>> detected earlier.
>
> This should be ideal indeed, though with the risk of breaking the current
> ABI (FWIW, FreeBSD seems to do a first pass to check for violations:
> https://github.com/freebsd/freebsd/blob/master/sys/vm/vm_map.c#L2630).

I am not sure I understand where the ABI breakage would be. Are we aware
of apps that intentionally modify the address space partially using the
current behavior? What FreeBSD does seems like a reasonable thing to do.
Anyway, the first thing to do is to update sparc to use
arch_validate_flags() and update sparc_validate_prot() to not peek into
the vma without the lock. I can do that, unless Jann wants to rework this
2-patch series with these changes.

>
> However, I'm not sure it's worth the hassle. Do we expect the user to
> call mprotect() across multiple mixed-type mappings while relying on no
> change if an error is returned? We should probably at least document the
> current behaviour in the mprotect man page.
>

Yes, documenting the current behavior is definitely a good thing to do.
-- Khalid From mboxrd@z Thu Jan 1 00:00:00 1970 From: Khalid Aziz Date: Wed, 14 Oct 2020 21:21:16 +0000 Subject: Re: [PATCH 1/2] mm/mprotect: Call arch_validate_prot under mmap_lock and with length Message-Id: List-Id: References: <20201007073932.865218-1-jannh@google.com> <20201010110949.GA32545@gaia> <20201012172218.GE6493@gaia> <20c85633-b559-c299-3e57-ae136b201526@oracle.com> <20201013091638.GA10778@gaia> In-Reply-To: <20201013091638.GA10778@gaia> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: Catalin Marinas Cc: Jann Horn , Michael Ellerman , linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org, Christoph Hellwig , linux-mm@kvack.org, Paul Mackerras , Benjamin Herrenschmidt , sparclinux@vger.kernel.org, Anthony Yznaga , Andrew Morton , Will Deacon , "David S. Miller" , linux-arm-kernel@lists.infradead.org On 10/13/20 3:16 AM, Catalin Marinas wrote: > On Mon, Oct 12, 2020 at 01:14:50PM -0600, Khalid Aziz wrote: >> On 10/12/20 11:22 AM, Catalin Marinas wrote: >>> On Mon, Oct 12, 2020 at 11:03:33AM -0600, Khalid Aziz wrote: >>>> On 10/10/20 5:09 AM, Catalin Marinas wrote: >>>>> On Wed, Oct 07, 2020 at 02:14:09PM -0600, Khalid Aziz wrote: >>>>>> On 10/7/20 1:39 AM, Jann Horn wrote: >>>>>>> arch_validate_prot() is a hook that can validate whether a given set of >>>>>>> protection flags is valid in an mprotect() operation. It is given the set >>>>>>> of protection flags and the address being modified. >>>>>>> >>>>>>> However, the address being modified can currently not actually be used in >>>>>>> a meaningful way because: >>>>>>> >>>>>>> 1. Only the address is given, but not the length, and the operation can >>>>>>> span multiple VMAs. Therefore, the callee can't actually tell which >>>>>>> virtual address range, or which VMAs, are being targeted. >>>>>>> 2. 
The mmap_lock is not held, meaning that if the callee were to check >>>>>>> the VMA at @addr, that VMA would be unrelated to the one the >>>>>>> operation is performed on. >>>>>>> >>>>>>> Currently, custom arch_validate_prot() handlers are defined by >>>>>>> arm64, powerpc and sparc. >>>>>>> arm64 and powerpc don't care about the address range, they just check the >>>>>>> flags against CPU support masks. >>>>>>> sparc's arch_validate_prot() attempts to look at the VMA, but doesn't take >>>>>>> the mmap_lock. >>>>>>> >>>>>>> Change the function signature to also take a length, and move the >>>>>>> arch_validate_prot() call in mm/mprotect.c down into the locked region. >>>>> [...] >>>>>> As Chris pointed out, the call to arch_validate_prot() from do_mmap2() >>>>>> is made without holding mmap_lock. Lock is not acquired until >>>>>> vm_mmap_pgoff(). This variance is uncomfortable but I am more >>>>>> uncomfortable forcing all implementations of validate_prot to require >>>>>> mmap_lock be held when non-sparc implementations do not have such need >>>>>> yet. Since do_mmap2() is in powerpc specific code, for now this patch >>>>>> solves a current problem. >>>>> >>>>> I still think sparc should avoid walking the vmas in >>>>> arch_validate_prot(). The core code already has the vmas, though not >>>>> when calling arch_validate_prot(). That's one of the reasons I added >>>>> arch_validate_flags() with the MTE patches. For sparc, this could be >>>>> (untested, just copied the arch_validate_prot() code): >>>> >>>> I am little uncomfortable with the idea of validating protection bits >>>> inside the VMA walk loop in do_mprotect_pkey(). When ADI is being >>>> enabled across multiple VMAs and arch_validate_flags() fails on a VMA >>>> later, do_mprotect_pkey() will bail out with error leaving ADI enabled >>>> on earlier VMAs. This will apply to protection bits other than ADI as >>>> well of course. This becomes a partial failure of mprotect() call. 
I >>>> think it should be all or nothing with mprotect() - when one calls >>>> mprotect() from userspace, either the entire address range passed in >>>> gets its protection bits updated or none of it does. That requires >>>> validating protection bits upfront or undoing what earlier iterations of >>>> VMA walk loop might have done. >>> >>> I thought the same initially but mprotect() already does this with the >>> VM_MAY* flag checking. If you ask it for an mprotect() that crosses >>> multiple vmas and one of them fails, it doesn't roll back the changes to >>> the prior ones. I considered that a similar approach is fine for MTE >>> (it's most likely a user error). >> >> You are right about the current behavior with VM_MAY* flags, but that is >> not the right behavior. Adding more cases to this just perpetuates >> incorrect behavior. It is not easy to roll back changes after VMAs have >> potentially been split/merged which is probably why the current code >> simply throws in the towel and returns with partially modified address >> space. It is lot easier to do all the checks upfront and then proceed or >> not proceed with modifying VMAs. One approach might be to call >> arch_validate_flags() in a loop before modifying VMAs and walk all VMAs >> with a read lock held. Current code also bails out with ENOMEM if it >> finds a hole in the address range and leaves any modifications already >> made in place. This is another case where a hole could have been >> detected earlier. > > This should be ideal indeed though with the risk of breaking the current > ABI (FWIW, FreeBSD seems to do a first pass to check for violations: > https://github.com/freebsd/freebsd/blob/master/sys/vm/vm_map.c#L2630). I am not sure I understand where the ABI breakage would be. Are we aware of apps that intentionally modify address space partially using the current code? What FreeBSD does seems like a reasonable thing to do. 
Any way first thing to do is to update sparc to use arch_validate_flags() and update sparc_validate_prot() to not peek into vma without lock. I can do that unless Jann wants to rework this 2 patch series with these changes. > > However, I'm not sure it's worth the hassle. Do we expect the user to > call mprotect() across multiple mixed type mappings while relying on no > change if an error is returned? We should probably at least document the > current behaviour in the mprotect man page. > Yes, documenting current behavior is definitely a good thing to do. -- Khalid From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-10.1 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, MENTIONS_GIT_HOSTING,NICE_REPLY_A,SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1 autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id BBE17C433DF for ; Wed, 14 Oct 2020 21:46:37 +0000 (UTC) Received: from lists.ozlabs.org (lists.ozlabs.org [203.11.71.2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 020CE21D81 for ; Wed, 14 Oct 2020 21:46:36 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="IbW8Otvl" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 020CE21D81 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=oracle.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=linuxppc-dev-bounces+linuxppc-dev=archiver.kernel.org@lists.ozlabs.org Received: from bilbo.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) by lists.ozlabs.org 
(Postfix) with ESMTP id 4CBQwV2rqTzDrDf for ; Thu, 15 Oct 2020 08:46:34 +1100 (AEDT) Authentication-Results: lists.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=oracle.com (client-ip=141.146.126.78; helo=aserp2120.oracle.com; envelope-from=khalid.aziz@oracle.com; receiver=) Authentication-Results: lists.ozlabs.org; dmarc=pass (p=none dis=none) header.from=oracle.com Authentication-Results: lists.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=oracle.com header.i=@oracle.com header.a=rsa-sha256 header.s=corp-2020-01-29 header.b=IbW8Otvl; dkim-atps=neutral Received: from aserp2120.oracle.com (aserp2120.oracle.com [141.146.126.78]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 4CBQRX3b67zDrDD for ; Thu, 15 Oct 2020 08:24:56 +1100 (AEDT) Received: from pps.filterd (aserp2120.oracle.com [127.0.0.1]) by aserp2120.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 09ELEgF5054525; Wed, 14 Oct 2020 21:24:12 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=subject : to : cc : references : from : message-id : date : mime-version : in-reply-to : content-type : content-transfer-encoding; s=corp-2020-01-29; bh=51BNKsiW5qE6NaIm6zaY6u/ayB56rwx6A3kLDW5+QNk=; b=IbW8OtvlhkDvE//wE2IUCqFyQxF1DWqKqmVPPwQ8zPpdQcEcU8Jft8onRFZ5vO47EoXB 2XSXHKhd/ZvQK5mQXIQACqSSGArm9ML1WsKZCSxltF7nd6eWW5ymsxa3fgF16jXfHXwv TZ3EYYaf/5IVkhEAbTPVB5aA/zgZkr6hRDD6iqDyBKBCXFb/aeyDJNGK8uAb/FsuO5QQ rW5cRg3ky70yp8sdsVlnyw0wccz64kyxswt8PcgntGC54UHrzeMXvTRz0oMZ//7WZ6Q7 fC6PCwHPLahDZ81y592NAkGB8A2iOJY5Oa9/wwlkP9SQgv6XuuVyFd/FYHRkGFLm39R6 fQ== Received: from userp3020.oracle.com (userp3020.oracle.com [156.151.31.79]) by aserp2120.oracle.com with ESMTP id 3434wksv92-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Wed, 14 Oct 2020 21:24:12 +0000 Received: from pps.filterd (userp3020.oracle.com [127.0.0.1]) by userp3020.oracle.com 
(8.16.0.42/8.16.0.42) with SMTP id 09ELFaQH174951; Wed, 14 Oct 2020 21:22:11 GMT Received: from userv0121.oracle.com (userv0121.oracle.com [156.151.31.72]) by userp3020.oracle.com with ESMTP id 344by47uag-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 14 Oct 2020 21:22:11 +0000 Received: from abhmp0019.oracle.com (abhmp0019.oracle.com [141.146.116.25]) by userv0121.oracle.com (8.14.4/8.13.8) with ESMTP id 09ELM1Yu022077; Wed, 14 Oct 2020 21:22:06 GMT Received: from [10.65.149.55] (/10.65.149.55) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Wed, 14 Oct 2020 14:22:01 -0700 Subject: Re: [PATCH 1/2] mm/mprotect: Call arch_validate_prot under mmap_lock and with length To: Catalin Marinas References: <20201007073932.865218-1-jannh@google.com> <20201010110949.GA32545@gaia> <20201012172218.GE6493@gaia> <20c85633-b559-c299-3e57-ae136b201526@oracle.com> <20201013091638.GA10778@gaia> From: Khalid Aziz Organization: Oracle Corp X-Pep-Version: 2.0 Message-ID: Date: Wed, 14 Oct 2020 15:21:16 -0600 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0 MIME-Version: 1.0 In-Reply-To: <20201013091638.GA10778@gaia> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: quoted-printable X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9774 signatures=668682 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 mlxlogscore=999 spamscore=0 suspectscore=0 mlxscore=0 malwarescore=0 adultscore=0 bulkscore=0 phishscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2009150000 definitions=main-2010140148 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9774 signatures=668682 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 lowpriorityscore=0 mlxscore=0 malwarescore=0 phishscore=0 suspectscore=0 impostorscore=0 clxscore=1015 spamscore=0 priorityscore=1501 bulkscore=0 adultscore=0 mlxlogscore=999 classifier=spam 
adjust=0 reason=mlx scancount=1 engine=8.12.0-2009150000 definitions=main-2010140148 X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Jann Horn , linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org, Christoph Hellwig , linux-mm@kvack.org, Paul Mackerras , sparclinux@vger.kernel.org, Anthony Yznaga , Andrew Morton , Will Deacon , "David S. Miller" , linux-arm-kernel@lists.infradead.org Errors-To: linuxppc-dev-bounces+linuxppc-dev=archiver.kernel.org@lists.ozlabs.org Sender: "Linuxppc-dev" On 10/13/20 3:16 AM, Catalin Marinas wrote: > On Mon, Oct 12, 2020 at 01:14:50PM -0600, Khalid Aziz wrote: >> On 10/12/20 11:22 AM, Catalin Marinas wrote: >>> On Mon, Oct 12, 2020 at 11:03:33AM -0600, Khalid Aziz wrote: >>>> On 10/10/20 5:09 AM, Catalin Marinas wrote: >>>>> On Wed, Oct 07, 2020 at 02:14:09PM -0600, Khalid Aziz wrote: >>>>>> On 10/7/20 1:39 AM, Jann Horn wrote: >>>>>>> arch_validate_prot() is a hook that can validate whether a given = set of >>>>>>> protection flags is valid in an mprotect() operation. It is given= the set >>>>>>> of protection flags and the address being modified. >>>>>>> >>>>>>> However, the address being modified can currently not actually be= used in >>>>>>> a meaningful way because: >>>>>>> >>>>>>> 1. Only the address is given, but not the length, and the operati= on can >>>>>>> span multiple VMAs. Therefore, the callee can't actually tell = which >>>>>>> virtual address range, or which VMAs, are being targeted. >>>>>>> 2. The mmap_lock is not held, meaning that if the callee were to = check >>>>>>> the VMA at @addr, that VMA would be unrelated to the one the >>>>>>> operation is performed on. >>>>>>> >>>>>>> Currently, custom arch_validate_prot() handlers are defined by >>>>>>> arm64, powerpc and sparc. 
>>>>>>> arm64 and powerpc don't care about the address range, they just c= heck the >>>>>>> flags against CPU support masks. >>>>>>> sparc's arch_validate_prot() attempts to look at the VMA, but doe= sn't take >>>>>>> the mmap_lock. >>>>>>> >>>>>>> Change the function signature to also take a length, and move the= >>>>>>> arch_validate_prot() call in mm/mprotect.c down into the locked r= egion. >>>>> [...] >>>>>> As Chris pointed out, the call to arch_validate_prot() from do_mma= p2() >>>>>> is made without holding mmap_lock. Lock is not acquired until >>>>>> vm_mmap_pgoff(). This variance is uncomfortable but I am more >>>>>> uncomfortable forcing all implementations of validate_prot to requ= ire >>>>>> mmap_lock be held when non-sparc implementations do not have such = need >>>>>> yet. Since do_mmap2() is in powerpc specific code, for now this pa= tch >>>>>> solves a current problem. >>>>> >>>>> I still think sparc should avoid walking the vmas in >>>>> arch_validate_prot(). The core code already has the vmas, though no= t >>>>> when calling arch_validate_prot(). That's one of the reasons I adde= d >>>>> arch_validate_flags() with the MTE patches. For sparc, this could b= e >>>>> (untested, just copied the arch_validate_prot() code): >>>> >>>> I am little uncomfortable with the idea of validating protection bit= s >>>> inside the VMA walk loop in do_mprotect_pkey(). When ADI is being >>>> enabled across multiple VMAs and arch_validate_flags() fails on a VM= A >>>> later, do_mprotect_pkey() will bail out with error leaving ADI enabl= ed >>>> on earlier VMAs. This will apply to protection bits other than ADI a= s >>>> well of course. This becomes a partial failure of mprotect() call. I= >>>> think it should be all or nothing with mprotect() - when one calls >>>> mprotect() from userspace, either the entire address range passed in= >>>> gets its protection bits updated or none of it does. 
That requires >>>> validating protection bits upfront or undoing what earlier iteration= s of >>>> VMA walk loop might have done. >>> >>> I thought the same initially but mprotect() already does this with th= e >>> VM_MAY* flag checking. If you ask it for an mprotect() that crosses >>> multiple vmas and one of them fails, it doesn't roll back the changes= to >>> the prior ones. I considered that a similar approach is fine for MTE >>> (it's most likely a user error). >> >> You are right about the current behavior with VM_MAY* flags, but that = is >> not the right behavior. Adding more cases to this just perpetuates >> incorrect behavior. It is not easy to roll back changes after VMAs hav= e >> potentially been split/merged which is probably why the current code >> simply throws in the towel and returns with partially modified address= >> space. It is lot easier to do all the checks upfront and then proceed = or >> not proceed with modifying VMAs. One approach might be to call >> arch_validate_flags() in a loop before modifying VMAs and walk all VMA= s >> with a read lock held. Current code also bails out with ENOMEM if it >> finds a hole in the address range and leaves any modifications already= >> made in place. This is another case where a hole could have been >> detected earlier. >=20 > This should be ideal indeed though with the risk of breaking the curren= t > ABI (FWIW, FreeBSD seems to do a first pass to check for violations: > https://github.com/freebsd/freebsd/blob/master/sys/vm/vm_map.c#L2630). I am not sure I understand where the ABI breakage would be. Are we aware of apps that intentionally modify address space partially using the current code? What FreeBSD does seems like a reasonable thing to do. Any way first thing to do is to update sparc to use arch_validate_flags() and update sparc_validate_prot() to not peek into vma without lock. I can do that unless Jann wants to rework this 2 patch series with these changes. 
>=20 > However, I'm not sure it's worth the hassle. Do we expect the user to > call mprotect() across multiple mixed type mappings while relying on no= > change if an error is returned? We should probably at least document th= e > current behaviour in the mprotect man page. >=20 Yes, documenting current behavior is definitely a good thing to do. -- Khalid From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-10.3 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, MENTIONS_GIT_HOSTING,NICE_REPLY_A,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3A64CC433E7 for ; Wed, 14 Oct 2020 21:26:10 +0000 (UTC) Received: from merlin.infradead.org (merlin.infradead.org [205.233.59.134]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id A1B9821D7F for ; Wed, 14 Oct 2020 21:26:09 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=lists.infradead.org header.i=@lists.infradead.org header.b="T4G4S18r"; dkim=fail reason="signature verification failed" (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="IbW8Otvl" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A1B9821D7F Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=oracle.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=merlin.20170209; h=Sender:Content-Transfer-Encoding: 
Content-Type:Cc:List-Subscribe:List-Help:List-Post:List-Archive: List-Unsubscribe:List-Id:In-Reply-To:MIME-Version:Date:Message-ID:From: References:To:Subject:Reply-To:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=G9bUjbyDPbJiSQGVa6E1FMTMXx9A3bC5rT+NCTYNn6Q=; b=T4G4S18rrrYvN6rpqgXGPHjDc ZHBKqAvubfXpzGlXJ5aHW1XdOd4beForlraQzbF+xDXOJ8S6xgXff6G874a4W9AvMrARw7+K7znKI pX3pgG6X3u6kEk7+1lnQq3BzdY4OBWuZDAv3A9Y+YWQrP8XQ3gLOHZagBlO98yEUiZl93Xr2dphc8 6JlAperAcd+R+xv8RDeK2mBhd1ah2gccPpSYRRtUj3AE3kAk6/JIsdLM36VkhmRmZ8vStyR90Wk6B /VxF3PDQVv+dOwhIfDfgH/UAUkbwQq8+FiBID2IpzFvK2Ew1ANttV7dx4a4YTCIIEFv2HIXl6Z3L2 rFRL6aiLg==; Received: from localhost ([::1] helo=merlin.infradead.org) by merlin.infradead.org with esmtp (Exim 4.92.3 #3 (Red Hat Linux)) id 1kSoGO-0006PX-Sn; Wed, 14 Oct 2020 21:24:48 +0000 Received: from aserp2120.oracle.com ([141.146.126.78]) by merlin.infradead.org with esmtps (Exim 4.92.3 #3 (Red Hat Linux)) id 1kSoGL-0006Ol-EQ for linux-arm-kernel@lists.infradead.org; Wed, 14 Oct 2020 21:24:46 +0000 Received: from pps.filterd (aserp2120.oracle.com [127.0.0.1]) by aserp2120.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 09ELEgF5054525; Wed, 14 Oct 2020 21:24:12 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=subject : to : cc : references : from : message-id : date : mime-version : in-reply-to : content-type : content-transfer-encoding; s=corp-2020-01-29; bh=51BNKsiW5qE6NaIm6zaY6u/ayB56rwx6A3kLDW5+QNk=; b=IbW8OtvlhkDvE//wE2IUCqFyQxF1DWqKqmVPPwQ8zPpdQcEcU8Jft8onRFZ5vO47EoXB 2XSXHKhd/ZvQK5mQXIQACqSSGArm9ML1WsKZCSxltF7nd6eWW5ymsxa3fgF16jXfHXwv TZ3EYYaf/5IVkhEAbTPVB5aA/zgZkr6hRDD6iqDyBKBCXFb/aeyDJNGK8uAb/FsuO5QQ rW5cRg3ky70yp8sdsVlnyw0wccz64kyxswt8PcgntGC54UHrzeMXvTRz0oMZ//7WZ6Q7 fC6PCwHPLahDZ81y592NAkGB8A2iOJY5Oa9/wwlkP9SQgv6XuuVyFd/FYHRkGFLm39R6 fQ== Received: from userp3020.oracle.com (userp3020.oracle.com [156.151.31.79]) by aserp2120.oracle.com with ESMTP id 
3434wksv92-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Wed, 14 Oct 2020 21:24:12 +0000 Received: from pps.filterd (userp3020.oracle.com [127.0.0.1]) by userp3020.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 09ELFaQH174951; Wed, 14 Oct 2020 21:22:11 GMT Received: from userv0121.oracle.com (userv0121.oracle.com [156.151.31.72]) by userp3020.oracle.com with ESMTP id 344by47uag-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 14 Oct 2020 21:22:11 +0000 Received: from abhmp0019.oracle.com (abhmp0019.oracle.com [141.146.116.25]) by userv0121.oracle.com (8.14.4/8.13.8) with ESMTP id 09ELM1Yu022077; Wed, 14 Oct 2020 21:22:06 GMT Received: from [10.65.149.55] (/10.65.149.55) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Wed, 14 Oct 2020 14:22:01 -0700 Subject: Re: [PATCH 1/2] mm/mprotect: Call arch_validate_prot under mmap_lock and with length To: Catalin Marinas References: <20201007073932.865218-1-jannh@google.com> <20201010110949.GA32545@gaia> <20201012172218.GE6493@gaia> <20c85633-b559-c299-3e57-ae136b201526@oracle.com> <20201013091638.GA10778@gaia> From: Khalid Aziz Organization: Oracle Corp X-Pep-Version: 2.0 Message-ID: Date: Wed, 14 Oct 2020 15:21:16 -0600 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0 MIME-Version: 1.0 In-Reply-To: <20201013091638.GA10778@gaia> Content-Language: en-US X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9774 signatures=668682 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 mlxlogscore=999 spamscore=0 suspectscore=0 mlxscore=0 malwarescore=0 adultscore=0 bulkscore=0 phishscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2009150000 definitions=main-2010140148 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9774 signatures=668682 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 lowpriorityscore=0 mlxscore=0 malwarescore=0 phishscore=0 suspectscore=0 
Cc: Jann Horn, Michael Ellerman, linuxppc-dev@lists.ozlabs.org,
 linux-kernel@vger.kernel.org, Christoph Hellwig, linux-mm@kvack.org,
 Paul Mackerras, Benjamin Herrenschmidt, sparclinux@vger.kernel.org,
 Anthony Yznaga, Andrew Morton, Will Deacon, "David S. Miller",
 linux-arm-kernel@lists.infradead.org

On 10/13/20 3:16 AM, Catalin Marinas wrote:
> On Mon, Oct 12, 2020 at 01:14:50PM -0600, Khalid Aziz wrote:
>> On 10/12/20 11:22 AM, Catalin Marinas wrote:
>>> On Mon, Oct 12, 2020 at 11:03:33AM -0600, Khalid Aziz wrote:
>>>> On 10/10/20 5:09 AM, Catalin Marinas wrote:
>>>>> On Wed, Oct 07, 2020 at 02:14:09PM -0600, Khalid Aziz wrote:
>>>>>> On 10/7/20 1:39 AM, Jann Horn wrote:
>>>>>>> arch_validate_prot() is a hook that can validate whether a given set of
>>>>>>> protection flags is valid in an mprotect() operation. It is given the
>>>>>>> set of protection flags and the address being modified.
>>>>>>>
>>>>>>> However, the address being modified can currently not actually be used
>>>>>>> in a meaningful way because:
>>>>>>>
>>>>>>> 1. Only the address is given, but not the length, and the operation can
>>>>>>>    span multiple VMAs.
>>>>>>>    Therefore, the callee can't actually tell which
>>>>>>>    virtual address range, or which VMAs, are being targeted.
>>>>>>> 2. The mmap_lock is not held, meaning that if the callee were to check
>>>>>>>    the VMA at @addr, that VMA would be unrelated to the one the
>>>>>>>    operation is performed on.
>>>>>>>
>>>>>>> Currently, custom arch_validate_prot() handlers are defined by
>>>>>>> arm64, powerpc and sparc.
>>>>>>> arm64 and powerpc don't care about the address range, they just check
>>>>>>> the flags against CPU support masks.
>>>>>>> sparc's arch_validate_prot() attempts to look at the VMA, but doesn't
>>>>>>> take the mmap_lock.
>>>>>>>
>>>>>>> Change the function signature to also take a length, and move the
>>>>>>> arch_validate_prot() call in mm/mprotect.c down into the locked region.
>>>>> [...]
>>>>>> As Chris pointed out, the call to arch_validate_prot() from do_mmap2()
>>>>>> is made without holding mmap_lock. Lock is not acquired until
>>>>>> vm_mmap_pgoff(). This variance is uncomfortable, but I am more
>>>>>> uncomfortable forcing all implementations of validate_prot to require
>>>>>> mmap_lock be held when non-sparc implementations do not have such a
>>>>>> need yet. Since do_mmap2() is in powerpc-specific code, for now this
>>>>>> patch solves a current problem.
>>>>>
>>>>> I still think sparc should avoid walking the vmas in
>>>>> arch_validate_prot(). The core code already has the vmas, though not
>>>>> when calling arch_validate_prot(). That's one of the reasons I added
>>>>> arch_validate_flags() with the MTE patches. For sparc, this could be
>>>>> (untested, just copied the arch_validate_prot() code):
>>>>> [...]
>>>>
>>>> I am a little uncomfortable with the idea of validating protection bits
>>>> inside the VMA walk loop in do_mprotect_pkey(). When ADI is being
>>>> enabled across multiple VMAs and arch_validate_flags() fails on a VMA
>>>> later, do_mprotect_pkey() will bail out with an error, leaving ADI
>>>> enabled on earlier VMAs.
>>>> This will apply to protection bits other than ADI as well, of course.
>>>> This becomes a partial failure of the mprotect() call. I think it
>>>> should be all or nothing with mprotect() - when one calls mprotect()
>>>> from userspace, either the entire address range passed in gets its
>>>> protection bits updated or none of it does. That requires validating
>>>> protection bits upfront or undoing what earlier iterations of the VMA
>>>> walk loop might have done.
>>>
>>> I thought the same initially, but mprotect() already does this with the
>>> VM_MAY* flag checking. If you ask it for an mprotect() that crosses
>>> multiple vmas and one of them fails, it doesn't roll back the changes to
>>> the prior ones. I considered that a similar approach is fine for MTE
>>> (it's most likely a user error).
>>
>> You are right about the current behavior with VM_MAY* flags, but that is
>> not the right behavior. Adding more cases to this just perpetuates
>> incorrect behavior. It is not easy to roll back changes after VMAs have
>> potentially been split/merged, which is probably why the current code
>> simply throws in the towel and returns with a partially modified address
>> space. It is a lot easier to do all the checks upfront and then proceed
>> or not proceed with modifying VMAs. One approach might be to call
>> arch_validate_flags() in a loop before modifying VMAs and walk all VMAs
>> with a read lock held. Current code also bails out with ENOMEM if it
>> finds a hole in the address range and leaves any modifications already
>> made in place. This is another case where a hole could have been
>> detected earlier.
>
> This should be ideal indeed, though with the risk of breaking the current
> ABI (FWIW, FreeBSD seems to do a first pass to check for violations:
> https://github.com/freebsd/freebsd/blob/master/sys/vm/vm_map.c#L2630).

I am not sure I understand where the ABI breakage would be. Are we aware
of apps that intentionally modify the address space partially using the
current code?
What FreeBSD does seems like a reasonable thing to do. Anyway, the first
thing to do is to update sparc to use arch_validate_flags() and update
sparc_validate_prot() to not peek into the vma without the lock. I can do
that unless Jann wants to rework this 2-patch series with these changes.

> However, I'm not sure it's worth the hassle. Do we expect the user to
> call mprotect() across multiple mixed type mappings while relying on no
> change if an error is returned? We should probably at least document the
> current behaviour in the mprotect man page.

Yes, documenting the current behavior is definitely a good thing to do.

--
Khalid
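For what it's worth, the "all or nothing" semantics discussed in this thread (the kind of first validation pass FreeBSD makes) can be sketched in plain C. This is purely an illustrative userspace model with made-up, simplified structures and a stand-in for arch_validate_flags() - not the actual kernel code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for kernel structures. */
struct vma {
    unsigned long start, end;   /* covers [start, end) */
    unsigned long flags;        /* current protection bits */
    unsigned long valid_mask;   /* bits this "arch" allows for this vma */
};

/* Stand-in for arch_validate_flags(): flags must be a subset of valid_mask. */
static bool validate_flags(const struct vma *v, unsigned long flags)
{
    return (flags & ~v->valid_mask) == 0;
}

/*
 * Two-pass mprotect() model: first walk the VMAs covering [start, end)
 * and validate (also failing on holes), then walk again and apply.
 * Either every VMA in the range is updated, or none is.
 */
static int mprotect_all_or_nothing(struct vma *vmas, size_t n,
                                   unsigned long start, unsigned long end,
                                   unsigned long flags)
{
    unsigned long pos = start;
    size_t i;

    /* Pass 1: validate only; nothing has been modified if we bail here. */
    for (i = 0; i < n && pos < end; i++) {
        if (vmas[i].end <= pos)
            continue;               /* vma entirely before the range */
        if (vmas[i].start > pos)
            return -1;              /* hole in the range: fail up front */
        if (!validate_flags(&vmas[i], flags))
            return -1;              /* invalid flags for this vma */
        pos = vmas[i].end;
    }
    if (pos < end)
        return -1;                  /* range extends past the last vma */

    /* Pass 2: apply; cannot fail at this point. */
    for (i = 0; i < n; i++)
        if (vmas[i].start < end && vmas[i].end > start)
            vmas[i].flags = flags;
    return 0;
}
```

With two adjacent VMAs where only the second rejects the requested bits, this fails the whole call and leaves the first VMA's flags untouched - unlike the single-pass walk in today's do_mprotect_pkey(), which would already have modified it.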