Date: Fri, 27 Aug 2021 16:22:46 -0700
From: "Luck, Tony"
To: Al Viro
Cc: Linus Torvalds, Andreas Gruenbacher, Christoph Hellwig,
    "Darrick J. Wong", Jan Kara, Matthew Wilcox, cluster-devel,
    linux-fsdevel, Linux Kernel Mailing List, ocfs2-devel@oss.oracle.com
Subject: Re: [PATCH v7 05/19] iov_iter: Introduce fault_in_iov_iter_writeable
Message-ID: <20210827232246.GA1668365@agluck-desk2.amr.corp.intel.com>
References: <20210827164926.1726765-1-agruenba@redhat.com>
    <20210827164926.1726765-6-agruenba@redhat.com>

On Fri, Aug 27, 2021 at 09:57:10PM +0000, Al Viro wrote:
> On Fri, Aug 27, 2021 at 09:48:55PM +0000, Al Viro wrote:
>
> > 	[btrfs]search_ioctl()
> > Broken with memory poisoning, for either variant of semantics. Same for
> > arm64 sub-page permission differences, I think.
> >
> > So we have 3 callers where we want all-or-nothing semantics - two in
> > arch/x86/kernel/fpu/signal.c and one in btrfs. HWPOISON will be a problem
> > for all 3, AFAICS...
> >
> > IOW, it looks like we have two different things mixed here - one that wants
> > to try and fault stuff in, with callers caring only about having _something_
> > faulted in (most of the users) and one that wants to make sure we *can* do
> > stores or loads on each byte in the affected area.
> >
> > Just accessing a byte in each page really won't suffice for the second kind.
> > Neither will g-u-p use, unless we teach it about HWPOISON and other fun
> > beasts... Looks like we want that thing to be a separate primitive; for
> > btrfs I'd probably replace fault_in_pages_writeable() with clear_user()
> > as a quick fix for now...
> >
> > Comments?
>
> Wait a sec... Wasn't HWPOISON a per-page thing? arm64 definitely does have
> smaller-than-page areas with different permissions, so btrfs search_ioctl()
> has a problem there, but arch/x86/kernel/fpu/signal.c doesn't have to deal
> with that...
>
> Sigh... I really need more coffee...

On Intel, poison is tracked at cache line granularity. Linux inflates
that to per-page (because it can only take a whole page away).

For faults triggered in ring3 this is pretty much the same thing, because
mm/memory-failure.c unmaps the page ... so while you see a #MC on first
access, you get a #PF when you retry. The x86 fault handler sees a magic
signature in the page table and sends a SIGBUS.

But it's all different if the #MC is triggered from ring0. The machine
check handler can't unmap the page. It just schedules task_work to do the
unmap when next returning to user. But if your kernel code loops and
tries again without a return to user, then you get another #MC.

-Tony
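
The ring3 versus ring0 difference described above can be sketched as a toy
user-space state machine. This is only an illustration, not kernel code: the
names toy_page, toy_user_access(), toy_kernel_access(), toy_return_to_user()
and toy_kernel_retry_loop() are all hypothetical, and the model assumes a
single poisoned page and ignores everything else the real machine check and
memory-failure paths do.

```c
#include <assert.h>
#include <stdbool.h>

/* Outcome of one access to the page in this toy model. */
enum access_result { ACCESS_OK, ACCESS_MC, ACCESS_PF };

/* Hypothetical stand-in for the state of one page; not a kernel structure. */
struct toy_page {
	bool poisoned;      /* the cache line being touched is poisoned */
	bool unmapped;      /* page has been taken away from the task */
	bool unmap_pending; /* unmap queued, runs on next return to user */
};

/* Access from ring3: the page is unmapped as part of handling the #MC. */
enum access_result toy_user_access(struct toy_page *p)
{
	if (p->unmapped)
		return ACCESS_PF; /* retry faults; the real handler sends SIGBUS */
	if (!p->poisoned)
		return ACCESS_OK;
	p->unmapped = true;
	return ACCESS_MC;
}

/* Access from ring0: the handler can only defer the unmap. */
enum access_result toy_kernel_access(struct toy_page *p)
{
	if (p->unmapped)
		return ACCESS_PF; /* what the caller does next is outside the model */
	if (!p->poisoned)
		return ACCESS_OK;
	p->unmap_pending = true; /* models scheduling task_work */
	return ACCESS_MC;
}

/* Models the deferred work running when the task next returns to user. */
void toy_return_to_user(struct toy_page *p)
{
	if (p->unmap_pending) {
		p->unmapped = true;
		p->unmap_pending = false;
	}
}

/* Kernel code that retries without returning to user; counts the #MCs. */
int toy_kernel_retry_loop(struct toy_page *p, int max_tries)
{
	int mc_count = 0;

	assert(max_tries >= 0);
	for (int i = 0; i < max_tries; i++) {
		if (toy_kernel_access(p) != ACCESS_MC)
			break;
		mc_count++;
	}
	return mc_count;
}
```

In this model a poisoned page touched through toy_user_access() gives one
ACCESS_MC and then ACCESS_PF on the retry, while toy_kernel_retry_loop() keeps
getting ACCESS_MC on every iteration until toy_return_to_user() has run.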