Date: Tue, 14 May 2019 09:20:40 +0200
From: Erik Skultety
To: Yan Zhao
Cc: Cornelia Huck, "Dr. David Alan Gilbert", Alex Williamson,
 "intel-gvt-dev@lists.freedesktop.org", "arei.gonglei@huawei.com",
 "aik@ozlabs.ru", "Zhengxiao.zx@alibaba-inc.com",
 "shuangtai.tst@alibaba-inc.com", "qemu-devel@nongnu.org",
 "eauger@redhat.com", "Liu, Yi L", "Yang, Ziye", "mlevitsk@redhat.com",
 "pasic@linux.ibm.com", "felipe@nutanix.com", "Liu, Changpeng",
 "Ken.Xue@amd.com", "jonathan.davies@nutanix.com", "He, Shaopeng",
 "kvm@vger.kernel.org", "linux-kernel@vger.kernel.org",
 "libvir-list@redhat.com", "Tian, Kevin", "zhenyuw@linux.intel.com",
 "Wang, Zhi A", "cjia@nvidia.com", "kwankhede@nvidia.com",
 "berrange@redhat.com", "dinechin@redhat.com"
Subject: Re: [PATCH v2 1/2] vfio/mdev: add version attribute for mdev device
Message-ID: <20190514072039.GA2089@beluga.usersys.redhat.com>
References: <20190507151826.502be009@x1.home>
 <20190509173839.2b9b2b46.cohuck@redhat.com>
 <20190509154857.GF2868@work-vm>
 <20190509175404.512ae7aa.cohuck@redhat.com>
 <20190509164825.GG2868@work-vm>
 <20190510110838.2df4c4d0.cohuck@redhat.com>
 <20190510093608.GD2854@work-vm>
 <20190510114838.7e16c3d6.cohuck@redhat.com>
 <20190513132804.GD11139@beluga.usersys.redhat.com>
 <20190514061235.GC20407@joy-OptiPlex-7040>
In-Reply-To: <20190514061235.GC20407@joy-OptiPlex-7040>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, May 14, 2019 at 02:12:35AM -0400, Yan Zhao wrote:
> On Mon, May 13, 2019 at 09:28:04PM +0800, Erik Skultety wrote:
> > On Fri, May 10, 2019 at 11:48:38AM +0200, Cornelia Huck wrote:
> > > On Fri, 10 May 2019 10:36:09 +0100
> > > "Dr. David Alan Gilbert" wrote:
> > >
> > > > * Cornelia Huck (cohuck@redhat.com) wrote:
> > > > > On Thu, 9 May 2019 17:48:26 +0100
> > > > > "Dr. David Alan Gilbert" wrote:
> > > > >
> > > > > > * Cornelia Huck (cohuck@redhat.com) wrote:
> > > > > > > On Thu, 9 May 2019 16:48:57 +0100
> > > > > > > "Dr. David Alan Gilbert" wrote:
> > > > > > >
> > > > > > > > * Cornelia Huck (cohuck@redhat.com) wrote:
> > > > > > > > > On Tue, 7 May 2019 15:18:26 -0600
> > > > > > > > > Alex Williamson wrote:
> > > > > > > > >
> > > > > > > > > > On Sun, 5 May 2019 21:49:04 -0400
> > > > > > > > > > Yan Zhao wrote:
> > > > > > > > > >
> > > > > > > > > > > + Errno:
> > > > > > > > > > > + If vendor driver wants to claim a mdev device incompatible to all other mdev
> > > > > > > > > > > + devices, it should not register version attribute for this mdev device. But if
> > > > > > > > > > > + a vendor driver has already registered version attribute and it wants to claim
> > > > > > > > > > > + a mdev device incompatible to all other mdev devices, it needs to return
> > > > > > > > > > > + -ENODEV on access to this mdev device's version attribute.
> > > > > > > > > > > + If a mdev device is only incompatible to certain mdev devices, write of
> > > > > > > > > > > + incompatible mdev devices's version strings to its version attribute should
> > > > > > > > > > > + return -EINVAL;
> > > > > > > > > >
> > > > > > > > > > I think it's best not to define the specific errno returned for a
> > > > > > > > > > specific situation, let the vendor driver decide, userspace simply
> > > > > > > > > > needs to know that an errno on read indicates the device does not
> > > > > > > > > > support migration version comparison and that an errno on write
> > > > > > > > > > indicates the devices are incompatible or the target doesn't support
> > > > > > > > > > migration versions.
> > > > > > > > > >
> > > > > > > > > I think I have to disagree here: It's probably valuable to have an
> > > > > > > > > agreed error for 'cannot migrate at all' vs 'cannot migrate between
> > > > > > > > > those two particular devices'. Userspace might want to do different
> > > > > > > > > things (e.g. trying with different device pairs).
> > > > > > > >
> > > > > > > > Trying to stuff these things down an errno seems a bad idea; we can't
> > > > > > > > get much information that way.
> > > > > > >
> > > > > > > So, what would be a reasonable approach? Userspace should first read
> > > > > > > the version attributes on both devices (to find out whether migration
> > > > > > > is supported at all), and only then figure out via writing whether they
> > > > > > > are compatible?
> > > > > > >
> > > > > > > (Or just go ahead and try, if it does not care about the reason.)
> > > > > >
> > > > > > Well, I'm OK with something like writing to test whether it's
> > > > > > compatible, it's just we need a better way of saying 'no'.
> > > > > > I'm not sure if that involves reading back from somewhere after
> > > > > > the write or what.
> > > > >
> > > > > Hm, so I basically see two ways of doing that:
> > > > > - standardize on some error codes... problem: error codes can be hard
> > > > >   to fit to reasons
> > > > > - make the error available in some attribute that can be read
> > > > >
> > > > > I'm not sure how we can serialize the readback with the last write,
> > > > > though (this looks inherently racy).
> > > > >
> > > > > How important is detailed error reporting here?
> > > >
> > > > I think we need something, otherwise we're just going to get vague
> > > > user reports of 'but my VM doesn't migrate'; I'd like the error to be
> > > > good enough to point most users to something they can understand
> > > > (e.g. wrong card family/too old a driver etc).
> > >
> > > Ok, that sounds like a reasonable point. Not that I have a better idea
> > > how to achieve that, though... we could also log a more verbose error
> > > message to the kernel log, but that's not necessarily where a user will
> > > look first.
> >
> > In case of libvirt checking the compatibility, it won't matter how good the
> > error message in the kernel log is and regardless of how many error states you
> > want to handle, libvirt's only limited to errno here, since we're going to do
> > plain read/write, so our internal error message returned to the user is only
> > going to contain what the errno says - okay, of course we can (and we DO)
> > provide libvirt specific string, further specifying the error but like I
> > mentioned, depending on how many error cases we want to distinguish this may be
> > hard for anyone to figure out solely on the error code, as apps will most
> > probably not parse the
> > logs.
> >
> > Regards,
> > Erik
> hi Erik
> do you mean you are agreeing on defining common errors and only returning errno?

In a sense, yes. While it is highly desirable to have logs with descriptive
messages, which will help tremendously in troubleshooting, I wanted to point
out that spending time on error logs may not be that worthwhile, especially
since most apps (like libvirt) will rely solely on read(2)/write(2) to sysfs.
That means we're limited by the errnos available, so apart from reporting the
generic system message we can't do any more magic in terms of the error
messages. The driver therefore needs to ensure that a proper message is
propagated to the journal, and at best libvirt can direct the user (consumer)
to look through the system logs for more info. I also agree with the point
mentioned above that defining a specific errno is IMO not the way to go, as
these would be just too specific for the read(2)/write(2) use case.
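
A minimal sketch of that limitation, assuming a consumer that does a plain
read of a sysfs-style attribute: on failure all it learns is the errno, so the
best it can do is attach the generic system message and point the user at the
logs. The helper names and the message wording are hypothetical and are not
taken from libvirt.

```python
import errno
import os


def describe_sysfs_failure(action: str, path: str, err: OSError) -> str:
    """Build the only kind of message a plain read/write consumer can produce.

    The text comes from the errno alone; any vendor-specific reason would
    have to be looked up in the system logs.
    """
    # Symbolic name such as "EINVAL"; fall back for codes Python does not know.
    code = errno.errorcode.get(err.errno, "unknown errno")
    return (
        f"{action} of {path} failed: {os.strerror(err.errno)} ({code}); "
        "check the system logs for details from the vendor driver"
    )


def read_attribute(path: str) -> str:
    """Read an attribute file, converting OSError into a readable message."""
    try:
        with open(path, "r") as attr:
            return attr.read().strip()
    except OSError as err:
        raise RuntimeError(describe_sysfs_failure("read", path, err)) from err
```

Whatever errno the driver picks, the user-visible text stays this generic,
which is why the descriptive message has to come from the driver's own log
output.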

That said, from libvirt's POV as a consumer, I'd expect there to be truly only
2 errors (I believe Alex has mentioned something similar in one of his
responses in one of the threads):

    a) a read error indicating that an mdev type doesn't support migration
       - I assume that if one type doesn't support migration, none of the
         other types exposed on the parent device do; is that a fair
         assumption?

    b) a write error indicating that the mdev types are incompatible for
       migration

Regards,
Erik
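
The two-error model above can be sketched as follows, assuming each device
exposes the proposed version attribute as a file whose path the caller
supplies. The names `Compat` and `check_migration_compat` are hypothetical. A
failed read is taken to mean no migration support, and any failure on the
write side is taken to mean the pair cannot be migrated, regardless of which
errno the vendor driver returned.

```python
import enum
import os


class Compat(enum.Enum):
    COMPATIBLE = "compatible"
    UNSUPPORTED = "source does not support migration"  # read failed
    INCOMPATIBLE = "devices are incompatible"          # write failed


def check_migration_compat(src_attr: str, dst_attr: str) -> Compat:
    """Read the source version string and write it to the target attribute."""
    try:
        with open(src_attr, "rb") as src:
            version = src.read()
    except OSError:
        # Case a): any errno on read means "no migration support".
        return Compat.UNSUPPORTED

    try:
        # O_WRONLY without O_CREAT: a missing target attribute is an error.
        fd = os.open(dst_attr, os.O_WRONLY)
        try:
            os.write(fd, version)
        finally:
            os.close(fd)
    except OSError:
        # Case b): any errno on write means the pair cannot be migrated.
        return Compat.INCOMPATIBLE

    return Compat.COMPATIBLE
```

Deliberately ignoring the specific errno keeps the consumer independent of
vendor driver choices, at the cost of not being able to say why a pair was
rejected.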