Date: Wed, 3 Apr 2019 14:00:37 -0600
From: Alex Williamson
To: Jerome Glisse
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, eric.auger@redhat.com, cohuck@redhat.com, peterx@redhat.com
Subject: Re: [PATCH v2] vfio/type1: Limit DMA mappings per container
Message-ID: <20190403140037.15fcd764@x1.home>
In-Reply-To: <20190403192426.GA16117@redhat.com>
References: <155422160029.16896.1992475589398080933.stgit@gimli.home> <20190403192426.GA16117@redhat.com>
Organization: Red Hat

On Wed, 3 Apr 2019 15:24:26 -0400
Jerome Glisse wrote:

> On Tue, Apr 02, 2019 at 10:15:38AM -0600, Alex Williamson wrote:
> > Memory backed DMA mappings are accounted against a user's locked
> > memory limit, including multiple mappings of the same memory. This
> > accounting bounds the number of such mappings that a user can create.
> > However, DMA mappings that are not backed by memory, such as DMA
> > mappings of device MMIO via mmaps, do not make use of page pinning
> > and therefore do not count against the user's locked memory limit.
> > These mappings still consume memory, but the memory is not well
> > associated to the process for the purpose of oom killing a task.
> >
> > To add bounding on this use case, we introduce a limit to the total
> > number of concurrent DMA mappings that a user is allowed to create.
> > This limit is exposed as a tunable module option where the default
> > value of 64K is expected to be well in excess of any reasonable use
> > case (a large virtual machine configuration would typically only make
> > use of tens of concurrent mappings).
> >
> > This fixes CVE-2019-3882.
> >
> > Signed-off-by: Alex Williamson
>
> Have you tested with GPU passthrough ? GPU have huge BAR from
> hundred of mega bytes to giga bytes (some driver resize them
> to cover the whole GPU memory). Driver need to map those to
> properly work. I am not sure what path is taken by mmap of
> mmio BAR by a guest on the host but i just thought i would
> point that out.
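For context, the bounding described in the quoted patch comes down to a
per-container counter that is checked on each map request and returned
on unmap.  A minimal sketch of that approach follows; the identifier
names (dma_entry_limit, dma_avail, vfio_iommu_sketch) are illustrative
here, not the exact upstream diff:

/*
 * Sketch of the per-container mapping bound described in the quoted
 * patch.  Names are illustrative, not the exact upstream code.
 */
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/errno.h>

static unsigned int dma_entry_limit __read_mostly = U16_MAX;
module_param_named(dma_entry_limit, dma_entry_limit, uint, 0644);
MODULE_PARM_DESC(dma_entry_limit,
		 "Maximum number of user DMA mappings per container (65535)");

struct vfio_iommu_sketch {
	unsigned int	dma_avail;	/* set to dma_entry_limit at open */
	/* existing per-container fields elided */
};

/* On VFIO_IOMMU_MAP_DMA: refuse new mappings once the quota is used up. */
static int sketch_map_dma(struct vfio_iommu_sketch *iommu)
{
	if (!iommu->dma_avail)
		return -ENOSPC;
	iommu->dma_avail--;
	/* normal vfio_dma setup continues here */
	return 0;
}

/* When a mapping is removed, the slot is handed back to the user. */
static void sketch_remove_dma(struct vfio_iommu_sketch *iommu)
{
	iommu->dma_avail++;
}

Because the parameter in this sketch is writable (0644), an
administrator could also raise the limit at runtime through the
module's parameters directory under /sys/module/ if a legitimate
workload ever needed more entries.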
The limit introduced is the number of mappings that a user can have
outstanding, not the size of the mappings.  We don't try to estimate
the overhead of a mapping based on the mapping size since IOMMU
super-page support can make a 1GB mapping comparable in overhead to a
4KB mapping.

QEMU will generally try to map a BAR with a single mapping, unless it's
split by something like an MSI-X vector table or quirks, which still
results in a low single digit number of mappings per BAR.  This does
not affect how the guest drivers use the device; BARs cannot be
partially enabled from a DMA address space perspective.

If a userspace driver were trying to map a large GPU BAR with separate
4K mappings, it could indeed hit the limit, but that's far from the
common or expected use case, and the module tunable could be used to
provide this functionality if it were really necessary.

There's really no support for resizable BARs through vfio-pci right
now; we get the device in its base configuration, QEMU maps that and
exposes a rather fixed device to the VM.  If this is something we need
to address for GPU assignment, let's talk.  Thanks,

Alex
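To make the single-mapping-per-BAR point concrete, a minimal userspace
sketch is shown below.  It assumes a container and device fd already
configured for the type1 IOMMU and a region info struct filled by
VFIO_DEVICE_GET_REGION_INFO; the helper name and the chosen IOVA are
illustrative:

/*
 * One mmap'd BAR handed to the IOMMU with a single VFIO_IOMMU_MAP_DMA
 * call, so the BAR's size does not change the number of mapping entries
 * counted against the limit.
 */
#include <linux/vfio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <stdint.h>

static int map_whole_bar(int container_fd, int device_fd,
			 const struct vfio_region_info *bar_info,
			 uint64_t iova)
{
	void *vaddr = mmap(NULL, bar_info->size, PROT_READ | PROT_WRITE,
			   MAP_SHARED, device_fd, bar_info->offset);
	if (vaddr == MAP_FAILED)
		return -1;

	struct vfio_iommu_type1_dma_map map = {
		.argsz = sizeof(map),
		.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
		.vaddr = (uint64_t)(uintptr_t)vaddr,
		.iova  = iova,
		.size  = bar_info->size,
	};

	/* A single DMA mapping regardless of BAR size. */
	return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
}

Whether the BAR is a few hundred megabytes or several gigabytes, this
still consumes exactly one entry against the mapping limit; only
carving the BAR into pieces, for example around an MSI-X vector table,
would add more.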