Date: Tue, 12 Oct 2021 18:27:06 +0100
From: Catalin Marinas
To: Linus Torvalds
Cc: Al Viro, Andreas Gruenbacher, Christoph Hellwig, "Darrick J. Wong", Jan Kara, Matthew Wilcox, cluster-devel, linux-fsdevel, Linux Kernel Mailing List, "ocfs2-devel@oss.oracle.com", Josef Bacik, Will Deacon
Subject: Re: [RFC][arm64] possible infinite loop in btrfs search_ioctl()

On Mon, Oct 11, 2021 at 04:59:28PM -0700, Linus Torvalds wrote:
> On Mon, Oct 11, 2021 at 2:08 PM Catalin Marinas wrote:
> > +#ifdef CONFIG_ARM64_MTE
> > +#define FAULT_GRANULE_SIZE (16)
> > +#define FAULT_GRANULE_MASK (~(FAULT_GRANULE_SIZE-1))
> > [...]
> >
> > If this looks in the right direction, I'll do some proper patches
> > tomorrow.
>
> Looks fine to me. It's going to be quite expensive and bad for caches,
> though.
>
> That said, fault_in_writable() is _supposed_ to all be for the slow
> path when things go south and the normal path didn't work out, so I
> think it's fine.
>
> I do wonder how the sub-page granularity works. Is it sufficient to
> just read from it?

For arm64 MTE and I think SPARC ADI, just reading should be sufficient.
There is CHERI in the long run, if it takes off, where the user can set
independent read/write permissions and uaccess would use the capability
rather than a match-all pointer (hence checked).

> Because then a _slightly_ better option might be to
> do one write per page (to catch page table writability) and then one
> read per "granule" (to catch pointer coloring or cache poisoning
> issues)?
>
> That said, since this is all preparatory to us wanting to write to it
> eventually anyway, maybe marking it all dirty in the caches is only
> good.

It depends on how much would be written in the actual copy. For
significant memcpy on arm CPUs, write streaming usually kicks in and the
cache dirtying is skipped. This probably matters more for
copy_page_to_iter_iovec() than the btrfs search ioctl.

Apart from fault_in_pages_*(), there's also fault_in_user_writeable()
called from the futex code which uses the GUP mechanism as the write
would be destructive. It looks like it could potentially trigger the
same infinite loop on -EFAULT. For arm64 MTE, we get away with this by
disabling the tag checking around the arch futex code (we did it for an
unrelated issue - we don't have LDXR/STXR that would run with user
permissions in kernel mode like we do with LDTR/STTR).

I wonder whether we should actually just disable tag checking around the
problematic accesses. What these callers seem to have in common is using
pagefault_disable/enable(). We could abuse this to disable tag checking
or maybe in_atomic() when handling the exception to lazily disable such
faults temporarily.

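
For illustration only, the "one write per page, one read per granule"
probing pattern discussed above could look roughly like the user-space
sketch below. This is not the kernel patch: plain volatile accesses
stand in for the uaccess helpers, the 4096-byte page size is an
assumption, and SKETCH_PAGE_SIZE, struct probe_stats, probe_range() and
demo_buf are hypothetical names introduced here. Only
FAULT_GRANULE_SIZE and FAULT_GRANULE_MASK come from the quoted diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size; the real value depends on the kernel configuration. */
#define SKETCH_PAGE_SIZE   ((uintptr_t)4096)
#define SKETCH_PAGE_MASK   (~(SKETCH_PAGE_SIZE - 1))

/* Granule size and mask as in the quoted diff (16 bytes for arm64 MTE). */
#define FAULT_GRANULE_SIZE ((uintptr_t)16)
#define FAULT_GRANULE_MASK (~(FAULT_GRANULE_SIZE - 1))

/* Hypothetical bookkeeping so the number of probes can be inspected. */
struct probe_stats {
	size_t writes;	/* one per page touched */
	size_t reads;	/* one per granule touched */
};

/* Hypothetical page-aligned buffer for trying the sketch out. */
_Alignas(4096) unsigned char demo_buf[2 * 4096];

/*
 * Probe [buf, buf + size): one write per page (would catch a read-only
 * mapping) and one read per granule (would catch a tag mismatch).
 * In the kernel these accesses would go through the uaccess helpers and
 * could fail; here they are plain volatile accesses that cannot.
 */
struct probe_stats probe_range(unsigned char *buf, size_t size)
{
	struct probe_stats st = { 0, 0 };
	uintptr_t start = (uintptr_t)buf;
	uintptr_t end = start + size;
	uintptr_t addr;

	/* First byte of the range, then the first byte of each later page. */
	for (addr = start; addr < end;
	     addr = (addr & SKETCH_PAGE_MASK) + SKETCH_PAGE_SIZE) {
		volatile unsigned char *p = (volatile unsigned char *)addr;

		*p = *p;	/* non-destructive write-back, sketch only */
		st.writes++;
	}

	/* First byte of the range, then the first byte of each later granule. */
	for (addr = start; addr < end;
	     addr = (addr & FAULT_GRANULE_MASK) + FAULT_GRANULE_SIZE) {
		volatile unsigned char *p = (volatile unsigned char *)addr;

		(void)*p;
		st.reads++;
	}

	return st;
}
```

Calling probe_range(demo_buf, sizeof(demo_buf)) on the page-aligned
8 KiB buffer performs 2 writes and 512 reads, which gives a feel for the
cost difference between per-page and per-granule probing.
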
A more invasive change would be to return a different error for such
faults, like -EACCES, and treat them differently in the caller.

-- 
Catalin