Date: Sat, 27 Nov 2021 15:21:06 +0000
From: Catalin Marinas
To: Andreas Gruenbacher
Cc: Matthew Wilcox, Linus Torvalds, Josef Bacik, David Sterba, Al Viro,
 Andrew Morton, Will Deacon, linux-fsdevel, LKML, Linux ARM, linux-btrfs
Subject: Re: [PATCH 3/3] btrfs: Avoid live-lock in search_ioctl() on hardware with sub-page faults
References: <20211124192024.2408218-1-catalin.marinas@arm.com>
 <20211124192024.2408218-4-catalin.marinas@arm.com>
 <20211127123958.588350-1-agruenba@redhat.com>
In-Reply-To: <20211127123958.588350-1-agruenba@redhat.com>

On Sat, Nov 27, 2021 at 01:39:58PM +0100, Andreas Gruenbacher wrote:
> On Sat, Nov 27, 2021 at 4:52 AM Andreas Gruenbacher wrote:
> > On Sat, Nov 27, 2021 at 12:06 AM Catalin Marinas wrote:
> > > If we know that the arch copy_to_user() has an error of say maximum 16
> > > bytes (or 15 rather on arm64), we can instead get fault_in_writeable()
> > > to probe the first 16 bytes rather than 1.
> >
> > That isn't going to help one bit: [raw_]copy_to_user() is allowed to
> > copy as little or as much as it wants as long as it follows the rules
> > documented in include/linux/uaccess.h:
> >
> > [] If copying succeeds, the return value must be 0.
> > [] If some data cannot be
> > [] fetched, it is permitted to copy less than had been fetched; the only
> > [] hard requirement is that not storing anything at all (i.e. returning size)
> > [] should happen only when nothing could be copied. In other words, you don't
> > [] have to squeeze as much as possible - it is allowed, but not necessary.
> >
> > When fault_in_writeable() tells us that an address range is accessible
> > in principle, that doesn't mean that copy_to_user() will allow us to
> > access it in arbitrary chunks. It's also not the case that
> > fault_in_writeable(addr, size) is always followed by
> > copy_to_user(addr, ..., size) for the exact same address range, not
> > even in this case.
> >
> > These alignment restrictions have nothing to do with page or sub-page faults.
> >
> > I'm also fairly sure that passing in an unaligned buffer will send
> > search_ioctl into an endless loop on architectures with copy_to_user()
> > alignment restrictions; there don't seem to be any buffer alignment
> > checks.
>
> Let me retract that ...
>
> The description in include/linux/uaccess.h leaves out permissible
> reasons for fetching/storing less than requested. Thinking about it, if
> the address range passed to one of the copy functions includes an
> address that faults, it kind of makes sense to allow the copy function
> to stop short instead of copying every last byte right up to the address
> that fails.
>
> If that's the only reason, then it would be great to have that included
> in the description. And then we can indeed deal with the alignment
> effects in fault_in_writeable().

Ah, I started replying last night and sent it today without seeing your
follow-up.

> > > I attempted the above here and it works OK:
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=devel/btrfs-live-lock-fix
> > >
> > > but too late to post it this evening, I'll do it in the next day or so
> > > as an alternative to this series.
>
> I've taken a quick look. Under the assumption that alignment effects
> are tied to page / sub-page faults, I think we can really solve this
> generically as Willy has proposed.

I think Willy's proposal stopped at the page boundary; it should go
beyond.

> Maybe as shown below; no need for arch-specific code.
>
> diff --git a/mm/gup.c b/mm/gup.c
> index 2c51e9748a6a..a9b3d916b625 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1658,6 +1658,8 @@ static long __get_user_pages_locked(struct mm_struct *mm, unsigned long start,
>  }
>  #endif /* !CONFIG_MMU */
>
> +#define SUBPAGE_FAULT_SIZE 16
> +
>  /**
>   * fault_in_writeable - fault in userspace address range for writing
>   * @uaddr: start of address range
> @@ -1673,8 +1675,19 @@ size_t fault_in_writeable(char __user *uaddr, size_t size)
>  	if (unlikely(size == 0))
>  		return 0;
>  	if (!PAGE_ALIGNED(uaddr)) {
> +		if (SUBPAGE_FAULT_SIZE &&
> +		    !IS_ALIGNED((unsigned long)uaddr, SUBPAGE_FAULT_SIZE)) {
> +			end = PTR_ALIGN(uaddr, SUBPAGE_FAULT_SIZE);
> +			if (end - uaddr < size) {
> +				if (unlikely(__put_user(0, uaddr) != 0))
> +					return size;
> +				uaddr = end;
> +				if (unlikely(!end))
> +					goto out;
> +			}
> +		}
>  		if (unlikely(__put_user(0, uaddr) != 0))
> -			return size;
> +			goto out;
>  		uaddr = (char __user *)PAGE_ALIGN((unsigned long)uaddr);
>  	}
>  	end = (char __user *)PAGE_ALIGN((unsigned long)start + size);

That's similar, somehow, to the arch-specific probing in one of my
patches: [1]. We could do the above if we can guarantee that the maximum
error margin in copy_to_user() is smaller than SUBPAGE_FAULT_SIZE. For
arm64 copy_to_user() it is fine, but for copy_from_user(), if we ever
need to handle fault_in_readable(), it isn't: on arm64 the margin is up
to 64 bytes even when the buffer is aligned, since reads of large blocks
are done as 4 x 16-byte loads, and if one of them faults (e.g. because
of a 16-byte sub-page fault), no write is done at all, hence a delta
larger than 16.
If you want something in the generic fault_in_writeable(), we probably
need a loop over UACCESS_MAX_WRITE_ERROR in SUBPAGE_FAULT_SIZE
increments. But I thought I'd rather keep this in the arch-specific
code.

Of course, the above fault_in_writeable() still needs the btrfs
search_ioctl() counterpart changed to probe at the actual fault address
or offset.

In the general case (uaccess error margin larger), I'm not entirely
convinced we can skip the probing when PAGE_ALIGNED(uaddr). I should
probably run this logic through CBMC (or TLA+); I can't think it
through. Thanks.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/commit/?h=devel/btrfs-live-lock-fix&id=af7e96d9e9537d9f9cc014f388b7b2bb4a5bc343

-- 
Catalin