From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-0.6 required=3.0 tests=DKIM_INVALID,DKIM_SIGNED, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7D103C2BA2B for ; Mon, 13 Apr 2020 22:21:45 +0000 (UTC) Received: from whitealder.osuosl.org (smtp1.osuosl.org [140.211.166.138]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 3EAFB20678 for ; Mon, 13 Apr 2020 22:21:45 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="eXFQ5Dmw" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3EAFB20678 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=iommu-bounces@lists.linux-foundation.org Received: from localhost (localhost [127.0.0.1]) by whitealder.osuosl.org (Postfix) with ESMTP id 14B56864FB; Mon, 13 Apr 2020 22:21:45 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from whitealder.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 4IiKI9Aexg2d; Mon, 13 Apr 2020 22:21:44 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by whitealder.osuosl.org (Postfix) with ESMTP id 0EE80864F5; Mon, 13 Apr 2020 22:21:44 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id E7ED2C1D7D; Mon, 13 Apr 2020 22:21:43 +0000 (UTC) Received: from whitealder.osuosl.org (smtp1.osuosl.org [140.211.166.138]) by lists.linuxfoundation.org (Postfix) with ESMTP id BEA29C0172 for ; Mon, 13 Apr 2020 22:21:42 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by whitealder.osuosl.org (Postfix) with ESMTP id AD743864FB for ; Mon, 13 Apr 2020 22:21:42 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from whitealder.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id EnvgaUfPpC+e for ; Mon, 13 Apr 2020 22:21:40 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6 Received: from us-smtp-1.mimecast.com (us-smtp-delivery-1.mimecast.com [205.139.110.120]) by whitealder.osuosl.org (Postfix) with ESMTPS id 531CF864F5 for ; Mon, 13 Apr 2020 22:21:40 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1586816499; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=YUKZXXSYeFudW02TCBb+YJ5vlqz4fbhPl0ENrJ4E7Ok=; b=eXFQ5DmwTIzZcAaATlBsbjNLtkBThssWfuZnkfRm8UxWJ8o54ROaIL6HdscHArwLgFE9Ns u1qa0K6UPueOd00hUFSb+hz6frbZ64GsWF5Fz4mM/czq2hmtBBqFrs2V5cFq/loy3JL4u7 msZR+qBmJRHAbzWjmZNRxbolh+6mCs8= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-98-aIdoVem1PjqPjFgWhFaP8w-1; Mon, 13 Apr 2020 18:21:33 -0400 X-MC-Unique: aIdoVem1PjqPjFgWhFaP8w-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 2547A18C43C1; Mon, 13 Apr 2020 22:21:32 +0000 (UTC) Received: from w520.home (ovpn-112-162.phx2.redhat.com [10.3.112.162]) by smtp.corp.redhat.com (Postfix) with ESMTP id 759535D9C9; Mon, 13 Apr 2020 22:21:30 +0000 (UTC) Date: Mon, 13 Apr 2020 16:21:29 -0600 From: Alex Williamson To: Jacob Pan Subject: Re: [PATCH v2 1/3] iommu/uapi: Define uapi version and capabilities Message-ID: <20200413162129.313b3b5a@w520.home> In-Reply-To: <20200413134157.395981a6@jacob-builder> References: <1585178227-17061-1-git-send-email-jacob.jun.pan@linux.intel.com> <1585178227-17061-2-git-send-email-jacob.jun.pan@linux.intel.com> <20200326092316.GA31648@infradead.org> <20200326094442.5be042ce@jacob-builder> <20200327074702.GA27959@infradead.org> <20200327165335.397f24a3@jacob-builder> <20200330090746.23c5599c@jacob-builder> <20200331085444.44bee0bb@jacob-builder> <20200402113604.6eea1e6f@jacob-builder> <20200413134157.395981a6@jacob-builder> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 Cc: "Tian, Kevin" , "Raj, Ashok" , Jean-Philippe Brucker , LKML , "iommu@lists.linux-foundation.org" , David Woodhouse X-BeenThere: iommu@lists.linux-foundation.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: Development issues for Linux IOMMU support List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Errors-To: iommu-bounces@lists.linux-foundation.org Sender: "iommu" On Mon, 13 Apr 2020 13:41:57 -0700 Jacob Pan wrote: > Hi All, > > Just a gentle reminder, any feedback on the options I listed below? New > ideas will be even better. > > Christoph, does the explanation make sense to you? We do have the > capability/flag based scheme for IOMMU API extension, the version is > mainly used for size lookup. Compatibility checking is another use of > the version, it makes checking easy when a vIOMMU is launched. > > Thanks, > > Jacob > > On Thu, 2 Apr 2020 11:36:04 -0700 > Jacob Pan wrote: > > > On Wed, 1 Apr 2020 05:32:21 +0000 > > "Tian, Kevin" wrote: > > > > > > From: Jacob Pan > > > > Sent: Tuesday, March 31, 2020 11:55 PM > > > > > > > > On Tue, 31 Mar 2020 06:06:38 +0000 > > > > "Tian, Kevin" wrote: > > > > > > > > > > From: Jacob Pan > > > > > > Sent: Tuesday, March 31, 2020 12:08 AM > > > > > > > > > > > > On Mon, 30 Mar 2020 05:40:40 +0000 > > > > > > "Tian, Kevin" wrote: > > > > > > > > > > > > > > From: Jacob Pan > > > > > > > > Sent: Saturday, March 28, 2020 7:54 AM > > > > > > > > > > > > > > > > On Fri, 27 Mar 2020 00:47:02 -0700 > > > > > > > > Christoph Hellwig wrote: > > > > > > > > > > > > > > > > > On Fri, Mar 27, 2020 at 02:49:55AM +0000, Tian, Kevin > > > > > > > > > wrote: > > > > > > > > > > If those API calls are inter-dependent for composing a > > > > > > > > > > feature (e.g. SVA), shouldn't we need a way to check > > > > > > > > > > them together before exposing the feature to the > > > > > > > > > > guest, e.g. through a iommu_get_uapi_capabilities > > > > > > > > > > interface? > > > > > > > > > > > > > > > > > > Yes, that makes sense. The important bit is to have a > > > > > > > > > capability flags and not version numbers. > > > > > > > > > > > > > > > > The challenge is that there are two consumers in the > > > > > > > > kernel for this. 1. VFIO only look for compatibility, and > > > > > > > > size of each data struct such that it can copy_from_user. > > > > > > > > > > > > > > > > 2. IOMMU driver, the "real consumer" of the content. > > > > > > > > > > > > > > > > For 2, I agree and we do plan to use the capability flags > > > > > > > > to check content and maintain backward compatibility etc. > > > > > > > > > > > > > > > > For VFIO, it is difficult to do size look up based on > > > > > > > > capability flags. > > > > > > > > > > > > > > Can you elaborate the difficulty in VFIO? if, as Christoph > > > > > > > Hellwig pointed out, version number is already avoided > > > > > > > everywhere, it is interesting to know whether this work > > > > > > > becomes a real exception or just requires a different > > > > > > > mindset. > > > > > > From VFIO p.o.v. the IOMMU UAPI data is opaque, it only needs > > > > > > to do two things: > > > > > > 1. is the UAPI compatible? > > > > > > 2. what is the size to copy? > > > > > > > > > > > > If you look at the version number, this is really a "version > > > > > > as size" lookup, as provided by the helper function in this > > > > > > patch. An example can be the newly introduced clone3 syscall. > > > > > > https://lwn.net/Articles/792628/ > > > > > > In clone3, new version must have new size. The slight > > > > > > difference here is that, unlike clone3, we have multiple data > > > > > > structures instead of a single struct clone_args {}. And each > > > > > > struct has flags to enumerate its contents besides size. > > > > > > > > > > Thanks for providing that link. However clone3 doesn't include a > > > > > version field to do "version as size" lookup. Instead, as you > > > > > said, it includes a size parameter which sounds like the option > > > > > 3 (argsz) listed below. > > > > > > > > > Right, there is no version in clone3. size = version. I view this > > > > as a 1:1 lookup. > > > > > > > > > > > > > > > > Besides breaching data abstraction, if VFIO has to check IOMMU > > > > > > flags to determine the sizes, it has many combinations. > > > > > > > > > > > > We also separate the responsibilities into two parts > > > > > > 1. compatibility - version, size by VFIO > > > > > > 2. sanity check - capability flags - by IOMMU > > > > > > > > > > I feel argsz+flags approach can perfectly meet above > > > > > requirement. The userspace set the size and flags for whatever > > > > > capabilities it uses, and VFIO simply copies the parameters by > > > > > size and pass to IOMMU for further sanity check. Of course the > > > > > assumption is that we do provide an interface for userspace to > > > > > enumerate all supported capabilities. > > > > You cannot trust user for argsz. the size to be copied from user > > > > must be based on knowledge in kernel. That is why we have this > > > > version to size lookup. > > > > > > > > In VFIO, the size to copy is based on knowledge of each VFIO UAPI > > > > structures and VFIO flags. But here the flags are IOMMU UAPI > > > > flags. As you pointed out in another thread, VFIO is one user. > > > > > > If that is the case, can we let VFIO only copy its own UAPI fields > > > while simply passing the user pointer of IOMMU UAPI structure to > > > IOMMU driver for further size check and copy? Otherwise we are > > > entering a dead end that VFIO doesn't want to parse a structure > > > which is not defined by him while using version to represent the > > > black box size is considered as a discarded scheme and doesn't > > > scale well... > > I think this could be an other viable option. Let me try to summarize > > since this has been a long discussion since the original version. > > > > Problem statements: > > 1. When launching vIOMMU in the guest, how can we ensure the host has > > compatible support upfront? as compared to fail later. This sounds like a feature/extension interface, both KVM and vfio have them to allow userspace to check support of specific features. > > 2. As UAPI data gets extended (both in size and flags), how can we > > know the size to copy For vfio we of course use the argsz/flags trick where the user tells us how big the buffer is and flags in the header tell us what fields beyond the base specification are enabled. This can get tricky to extend and there can be confusion whether a flag indicates the presence of a field or the validity of a field. We also have interfaces where the ioctl is a header plus a data blob where flags tell us what the data is. These can serve double duty as a extension check too as we've done for VFIO_DEVICE_FEATURE. This doesn't really support extension of a defined feature though, rather we'd be more likely to create a set of flags that indicate the data object is feature-v2 and redefine the structure, or of course we revisit the entire featuring question within the structure of that data blob. We also implement capability chains, though they're more meant for passing data to the user, where the user provides a buffer and we link capabilities together within that buffer for the user to walk. We've defined a mechanism through -ENOSPC and argsz to tell the user how large a buffer is necessary. I dare mention we have a version per capability as these are largely modeled after capability chains in PCI config space. We haven't actually incremented any versions, but I imagine we'd do so like PCI, maintaining backwards compatibility and only defining unused bits and adding fields as the version increases. Is the objection to a global version or to any version fields? I don't really understand the global version, I'd think a mechanism to check extensions plus a per structure flags/version would be preferred. The former should resolve how userspace can test support for features requiring multiple interfaces. A global version also implies that we're only ever adding features and never removing. For example, feature Foo is added in version 4, but it's replaced by feature Bar in version 5, now userspace can't simply test version >= 4 must include feature Foo. It seems to me that version and flags can also be complimentary, for example a field might be defined by a version but a flag could indicate if it's implemented. With only the flag, we'd infer the field from the flag, with only the version we'd need to assume the field is always implemented. So I have a hard time making a blanket statement that all versions fields should be avoided. > > 3. Maintain backward compatibility while allowing extensions? > > > > I think we all agreed that using flags (capability or types) is the > > way to address #3. As Christoph pointed out, version number should > > not be used for this purpose. > > > > So for problem 1 & 2, we have the following options: > > 1. Have a version-size mapping as proposed in this set. VFIO copies > > from user the correct size based on version-type lookup. Processing > > of the data is based on flags in IOMMU driver. > > > > 2. VFIO copy its own minsz then pass the user pointer to IOMMU driver > > for further copy_from_user based on flags. (by Kevin) > > > > 3. Adopt VFIO argsz scheme, caller fills in argsz for the offset the > > variable size union. VFIO do not check argsz in that it requires IOMMU > > specific knowledge. IOMMU driver Use flags to handle the variable > > size.(by Alex). I think this what we have in Yi's VFIO & QEMU patch. > > argsz filled by QEMU includes bind_data. > > > > 4. Do not use a unified version, have a fixed size of all UAPI > > structures, padding in struct and union. (Wasteful, not preferred per > > V1 discussion) > > > > For both 2 & 3, a unified version is not used, each API > > treated separately. vIOMMU will be launched w/o assurance of > > compatibility of all APIs. Fault handling may be more complex in > > normal operations. > > > > Appreciate everyone's input. Joerg and Alex, could you help to make a > > decision here? As above, I think using a global API version number to imply support for a feature is doomed to fail, we should instead expose an interface to check for specific features. In any of the proposed solutions, the IOMMU driver is ultimately responsible for validating the user data, so do we want vfio performing the copy_from_user() to an object that could later be assumed to be sanitized, or should vfio just pass a user pointer to make it obvious that the consumer is responsible for all the user protections? Seems like the latter. That still really doesn't address what's in that user data blob yet, but the vfio interface could be: struct { __u32 argsz; __u32 flags; __u8 data[]; } Where flags might be partitioned like we do for DEVICE_FEATURE to indicate the format of data and what vfio should do with it, and data might simply be defined as a (__u64 __user *). This user pointer would then likely be an IOMMU UAPI struct, so I've only just gotten back the the IOMMU UAPI question at hand, but I don't really see the disadvantage to including both version and flags fields per structure. Perhaps this is choice 1. above, but with a version at a per structure level indicating the backwards compatible size and layout of the structure and flags being used to indicate support for optional features within those fields. Is a version field still taboo for such a use case? Thanks, Alex _______________________________________________ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu