From: Jann Horn
Date: Fri, 31 Jan 2020 13:03:40 +0100
Subject: Re: [kernel-hardening] [PATCH 09/38] usercopy: Mark kmalloc caches as usercopy caches
In-Reply-To: <202001300945.7D465B5F5@keescook>
To: Kees Cook
Cc: Christian Borntraeger, Christoph Hellwig, Christopher Lameter, Jiri Slaby, Julian
  Wiedmann, Ursula Braun, Alexander Viro, kernel list, David Windsor,
  Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton, Linux-MM,
  linux-xfs@vger.kernel.org, Linus Torvalds, Andy Lutomirski,
  "David S. Miller", Laura Abbott, Mark Rutland, "Martin K. Petersen",
  Paolo Bonzini, Christoffer Dall, Dave Kleikamp, Jan Kara,
  Luis de Bethencourt, Marc Zyngier, Rik van Riel, Matthew Garrett,
  linux-fsdevel, linux-arch, Network Development, Kernel Hardening,
  Vlastimil Babka, Michal Kubecek

On Thu, Jan 30, 2020 at 8:23 PM Kees Cook wrote:
> On Wed, Jan 29, 2020 at 06:19:56PM +0100, Christian Borntraeger wrote:
> > On 29.01.20 18:09, Christoph Hellwig wrote:
> > > On Wed, Jan 29, 2020 at 06:07:14PM +0100, Christian Borntraeger wrote:
> > >>> DMA can be done to NORMAL memory as well.
> > >>
> > >> Exactly.
> > >> I think iucv uses GFP_DMA because z/VM needs those buffers to
> > >> reside below 2GB (which is ZONE_DMA for s390).
> > >
> > > The normal way to allocate memory with addressing limits would be to
> > > use dma_alloc_coherent and friends. Any chance to switch iucv over to
> > > that? Or is there no device associated with it?
> >
> > There is not necessarily a device for that. It is a hypervisor
> > interface (an instruction that is interpreted by z/VM). We do have the
> > netiucv driver that creates a virtual NIC, but there is also AF_IUCV,
> > which works without a device.
> >
> > But back to the original question: if we mark kmalloc caches as
> > usercopy caches, we should do the same for DMA kmalloc caches. As
> > outlined by Christoph, this has nothing to do with device DMA.
>
> Hm, looks like it's allocated from the low 16MB. Seems like poor naming!

The physical address limit of the DMA zone depends on the architecture
(and the kernel version); e.g. on Linux 4.4 on arm64 (which is used on
the Pixel 2), the DMA zone goes up to 4 GiB. Later, arm64 switched to
using the DMA32 zone for that instead (as was already the case on e.g.
x86-64); but recently (commit 1a8e1cef7603), arm64 started using the
DMA zone again, now for up to 1 GiB. (AFAICS the DMA32 zone can't be
used with kmalloc at all; that only works with the DMA zone.)

> :) There seems to be a LOT of stuff using GFP_DMA, and it seems unlikely
> those are all expecting low addresses?

I think there's a bunch of (especially really old) hardware where the
hardware can only talk to low physical addresses, e.g. stuff that uses
the ISA bus. However, there aren't *that* many users of GFP_DMA that
actually cause kmalloc allocations with GFP_DMA: many users of GFP_DMA
just pass that flag to dma_alloc_coherent()/dma_pool_alloc(), where it
is filtered away and the allocation ultimately doesn't go through the
slab allocator AFAICS, or they pass it directly to the page allocator,
where it has no effect on the usercopy stuff.
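To make that distinction concrete, here is a rough sketch of the two
allocation paths (not taken from any real driver; the demo_alloc()
name, the struct device pointer and the 512-byte size are made up for
illustration):

#include <linux/dma-mapping.h>
#include <linux/gfp.h>
#include <linux/slab.h>

static int demo_alloc(struct device *dev)
{
	dma_addr_t dma_handle;
	void *coherent;
	void *buf;

	/*
	 * Served by the slab allocator from a dma-kmalloc-* cache, so
	 * the usercopy whitelisting discussed here applies to it.
	 */
	buf = kmalloc(512, GFP_KERNEL | GFP_DMA);
	if (!buf)
		return -ENOMEM;

	/*
	 * Served by the DMA API, never by a kmalloc slab, so hardened
	 * usercopy never sees it; as noted above, a GFP_DMA passed in
	 * here would just get filtered away.
	 */
	coherent = dma_alloc_coherent(dev, 512, &dma_handle, GFP_KERNEL);
	if (!coherent) {
		kfree(buf);
		return -ENOMEM;
	}

	dma_free_coherent(dev, 512, coherent, dma_handle);
	kfree(buf);
	return 0;
}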
Looking on my workstation, there are zero objects allocated in
dma-kmalloc-* slabs:

/sys/kernel/slab# for name in dma-kmalloc-*; do echo "$name: $(cat $name/objects)"; done
dma-kmalloc-128: 0
dma-kmalloc-16: 0
dma-kmalloc-192: 0
dma-kmalloc-1k: 0
dma-kmalloc-256: 0
dma-kmalloc-2k: 0
dma-kmalloc-32: 0
dma-kmalloc-4k: 0
dma-kmalloc-512: 0
dma-kmalloc-64: 0
dma-kmalloc-8: 0
dma-kmalloc-8k: 0
dma-kmalloc-96: 0

On a Pixel 2, there are a whole five objects allocated across the DMA
zone kmalloc caches:

walleye:/sys/kernel/slab # for name in dma-kmalloc-*; do echo "$name: $(cat $name/objects)"; done
dma-kmalloc-1024: 0
dma-kmalloc-128: 0
dma-kmalloc-2048: 2
dma-kmalloc-256: 0
dma-kmalloc-4096: 3
dma-kmalloc-512: 0
dma-kmalloc-8192: 0

> Since this has only been a problem on s390, should just s390 gain the
> weakening of the usercopy restriction? Something like:
>
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 1907cb2903c7..c5bbc141f20b 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -1303,7 +1303,9 @@ void __init create_kmalloc_caches(slab_flags_t flags)
>  			kmalloc_caches[KMALLOC_DMA][i] = create_kmalloc_cache(
>  				kmalloc_info[i].name[KMALLOC_DMA],
>  				kmalloc_info[i].size,
> -				SLAB_CACHE_DMA | flags, 0, 0);
> +				SLAB_CACHE_DMA | flags, 0,
> +				IS_ENABLED(CONFIG_S390) ?
> +					kmalloc_info[i].size : 0);
>  		}
>  	}
>  #endif

I think dma-kmalloc slabs should be handled the same way as normal
kmalloc slabs. When a dma-kmalloc allocation is freshly created, it is
just normal kernel memory (even if it might later be used for DMA), and
it should be perfectly fine to copy_from_user() into such allocations
at that point, and to copy_to_user() out of them at the end. If you
look at the places where such allocations are created, you can see
things like kmemdup(), memcpy() and so on - all normal operations that
shouldn't conceptually be different from usercopy in any relevant way.
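To illustrate that last point, here is a made-up pair of helpers (the
function names and the __user source pointer are invented for this
sketch, not taken from any real code); the only difference between
them is where the bytes come from:

#include <linux/err.h>
#include <linux/slab.h>
#include <linux/string.h>
#include <linux/uaccess.h>

/* The usual pattern today: duplicate kernel data into a GFP_DMA buffer. */
static void *dma_buf_from_kernel(const void *src, size_t len)
{
	return kmemdup(src, len, GFP_KERNEL | GFP_DMA);
}

/*
 * Conceptually the same operation, except that the bytes come from
 * userspace; at this point the buffer is still just freshly allocated
 * kernel memory.
 */
static void *dma_buf_from_user(const void __user *src, size_t len)
{
	void *buf = kmalloc(len, GFP_KERNEL | GFP_DMA);

	if (!buf)
		return ERR_PTR(-ENOMEM);
	if (copy_from_user(buf, src, len)) {
		kfree(buf);
		return ERR_PTR(-EFAULT);
	}
	return buf;
}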