From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.9 required=3.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6EEDFC2BA83 for ; Wed, 12 Feb 2020 12:39:30 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 22D9320714 for ; Wed, 12 Feb 2020 12:39:30 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="ZoiwioTs" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 22D9320714 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id B0D4C6B0440; Wed, 12 Feb 2020 07:39:29 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id ABDF56B0441; Wed, 12 Feb 2020 07:39:29 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9AC5B6B0442; Wed, 12 Feb 2020 07:39:29 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0100.hostedemail.com [216.40.44.100]) by kanga.kvack.org (Postfix) with ESMTP id 817926B0440 for ; Wed, 12 Feb 2020 07:39:29 -0500 (EST) Received: from smtpin10.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 400268245571 for ; Wed, 12 Feb 2020 12:39:29 +0000 (UTC) X-FDA: 76481430858.10.nose05_18810147f6b5f X-HE-Tag: nose05_18810147f6b5f X-Filterd-Recvd-Size: 8357 Received: from us-smtp-1.mimecast.com (us-smtp-delivery-1.mimecast.com [205.139.110.120]) by imf43.hostedemail.com (Postfix) with ESMTP for ; Wed, 12 Feb 2020 12:39:28 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1581511167; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=lrOf0m5SvxP5QCA86tTGN9hUolA1Hd0GGjFfSXWjAYA=; b=ZoiwioTsuOiMiNpJW9yQv4i32Q9jWS4GwFKD9eWVpkG/Qx33AmPZNJZ9udc0GTSf/e33iP p2S+v5QM2gNtkTAmvSH0enExBIIP71HiNj9Yjku/GpWWT+y4f3f6q4owfDGM0xMqrYddMV pu9FYAZ9ob4ewS+Jvifljfm2Oucqo8w= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-208-kW5tyTK4OfajhoaOohWPdg-1; Wed, 12 Feb 2020 07:39:16 -0500 X-MC-Unique: kW5tyTK4OfajhoaOohWPdg-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 33328107ACC7; Wed, 12 Feb 2020 12:39:15 +0000 (UTC) Received: from gondolin (dhcp-192-195.str.redhat.com [10.33.192.195]) by smtp.corp.redhat.com (Postfix) with ESMTP id 5E0B0388; Wed, 12 Feb 2020 12:39:10 +0000 (UTC) Date: Wed, 12 Feb 2020 13:39:08 +0100 From: Cornelia Huck To: Christian Borntraeger Cc: david@redhat.com, Ulrich.Weigand@de.ibm.com, aarcange@redhat.com, akpm@linux-foundation.org, frankja@linux.vnet.ibm.com, gor@linux.ibm.com, imbrenda@linux.ibm.com, kvm@vger.kernel.org, linux-mm@kvack.org, linux-s390@vger.kernel.org, mimu@linux.ibm.com, thuth@redhat.com Subject: Re: [PATCH v2 RFC] KVM: s390/interrupt: do not pin adapter interrupt pages Message-ID: <20200212133908.6c6c9072.cohuck@redhat.com> In-Reply-To: <20200211092341.3965-1-borntraeger@de.ibm.com> References: <567B980B-BDA5-4EF3-A96E-1542D11F2BD4@redhat.com> <20200211092341.3965-1-borntraeger@de.ibm.com> Organization: Red Hat GmbH MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Tue, 11 Feb 2020 04:23:41 -0500 Christian Borntraeger wrote: > From: Ulrich Weigand > > The adapter interrupt page containing the indicator bits is currently > pinned. That means that a guest with many devices can pin a lot of > memory pages in the host. This also complicates the reference tracking > which is needed for memory management handling of protected virtual > machines. > We can simply try to get the userspace page set the bits and free the > page. By storing the userspace address in the irq routing entry instead > of the guest address we can actually avoid many lookups and list walks > so that this variant is very likely not slower. > > Signed-off-by: Ulrich Weigand > [borntraeger@de.ibm.com: patch simplification] > Signed-off-by: Christian Borntraeger > --- > quick and dirty, how this could look like > > > arch/s390/include/asm/kvm_host.h | 3 - > arch/s390/kvm/interrupt.c | 146 +++++++++++-------------------- > 2 files changed, 49 insertions(+), 100 deletions(-) > (...) > @@ -2488,83 +2485,26 @@ int kvm_s390_mask_adapter(struct kvm *kvm, unsigned int id, bool masked) > > static int kvm_s390_adapter_map(struct kvm *kvm, unsigned int id, __u64 addr) > { > - struct s390_io_adapter *adapter = get_io_adapter(kvm, id); > - struct s390_map_info *map; > - int ret; > - > - if (!adapter || !addr) > - return -EINVAL; > - > - map = kzalloc(sizeof(*map), GFP_KERNEL); > - if (!map) { > - ret = -ENOMEM; > - goto out; > - } > - INIT_LIST_HEAD(&map->list); > - map->guest_addr = addr; > - map->addr = gmap_translate(kvm->arch.gmap, addr); > - if (map->addr == -EFAULT) { > - ret = -EFAULT; > - goto out; > - } > - ret = get_user_pages_fast(map->addr, 1, FOLL_WRITE, &map->page); > - if (ret < 0) > - goto out; > - BUG_ON(ret != 1); > - down_write(&adapter->maps_lock); > - if (atomic_inc_return(&adapter->nr_maps) < MAX_S390_ADAPTER_MAPS) { > - list_add_tail(&map->list, &adapter->maps); > - ret = 0; > - } else { > - put_page(map->page); > - ret = -EINVAL; > + /* > + * We resolve the gpa to hva when setting the IRQ routing. If userspace > + * decides to mess with the memslots it better also updates the irq > + * routing. Otherwise we will write to the wrong userspace address. > + */ > + return 0; Given that this function now always returns 0, we basically get a completely useless roundtrip into the kernel when userspace is trying to setup the mappings. Can we define a new IO_ADAPTER_MAPPING_NOT_NEEDED or so capability that userspace can check? This change in behaviour probably wants a change in the documentation as well. > } > - up_write(&adapter->maps_lock); > -out: > - if (ret) > - kfree(map); > - return ret; > -} > > static int kvm_s390_adapter_unmap(struct kvm *kvm, unsigned int id, __u64 addr) > { > - struct s390_io_adapter *adapter = get_io_adapter(kvm, id); > - struct s390_map_info *map, *tmp; > - int found = 0; > - > - if (!adapter || !addr) > - return -EINVAL; > - > - down_write(&adapter->maps_lock); > - list_for_each_entry_safe(map, tmp, &adapter->maps, list) { > - if (map->guest_addr == addr) { > - found = 1; > - atomic_dec(&adapter->nr_maps); > - list_del(&map->list); > - put_page(map->page); > - kfree(map); > - break; > - } > - } > - up_write(&adapter->maps_lock); > - > - return found ? 0 : -EINVAL; > + return 0; Same here. > } > > void kvm_s390_destroy_adapters(struct kvm *kvm) > { > int i; > - struct s390_map_info *map, *tmp; > > for (i = 0; i < MAX_S390_IO_ADAPTERS; i++) { > if (!kvm->arch.adapters[i]) > continue; > - list_for_each_entry_safe(map, tmp, > - &kvm->arch.adapters[i]->maps, list) { > - list_del(&map->list); > - put_page(map->page); > - kfree(map); > - } > kfree(kvm->arch.adapters[i]); Call kfree() unconditionally? > } > } > @@ -2831,19 +2771,25 @@ static unsigned long get_ind_bit(__u64 addr, unsigned long bit_nr, bool swap) > return swap ? (bit ^ (BITS_PER_LONG - 1)) : bit; > } > > -static struct s390_map_info *get_map_info(struct s390_io_adapter *adapter, > - u64 addr) > +static struct page *get_map_page(struct kvm *kvm, > + struct s390_io_adapter *adapter, > + u64 uaddr) > { > - struct s390_map_info *map; > + struct page *page; > + int ret; > > if (!adapter) > return NULL; > - > - list_for_each_entry(map, &adapter->maps, list) { > - if (map->guest_addr == addr) > - return map; > - } > - return NULL; > + page = NULL; > + if (!uaddr) > + return NULL; > + down_read(&kvm->mm->mmap_sem); > + ret = get_user_pages_remote(NULL, kvm->mm, uaddr, 1, FOLL_WRITE, > + &page, NULL, NULL); > + if (ret < 1) > + page = NULL; > + up_read(&kvm->mm->mmap_sem); > + return page; > } > > static int adapter_indicators_set(struct kvm *kvm, (...) > @@ -2951,12 +2900,15 @@ int kvm_set_routing_entry(struct kvm *kvm, > const struct kvm_irq_routing_entry *ue) > { > int ret; > + u64 uaddr; > > switch (ue->type) { > case KVM_IRQ_ROUTING_S390_ADAPTER: > e->set = set_adapter_int; > - e->adapter.summary_addr = ue->u.adapter.summary_addr; > - e->adapter.ind_addr = ue->u.adapter.ind_addr; > + uaddr = gmap_translate(kvm->arch.gmap, ue->u.adapter.summary_addr); Can gmap_translate() return -EFAULT here? The code above only seems to check for 0... do we want to return an error here? > + e->adapter.summary_addr = uaddr; > + uaddr = gmap_translate(kvm->arch.gmap, ue->u.adapter.ind_addr); > + e->adapter.ind_addr = uaddr; > e->adapter.summary_offset = ue->u.adapter.summary_offset; > e->adapter.ind_offset = ue->u.adapter.ind_offset; > e->adapter.adapter_id = ue->u.adapter.adapter_id;