Date: Sun, 10 Jan 2021 10:47:35 +0200
From: Leon Romanovsky
To: Alex Williamson
Cc: Don Dutile, Bjorn Helgaas, Bjorn Helgaas, Saeed Mahameed, Jason Gunthorpe,
 Jakub Kicinski, linux-pci@vger.kernel.org, linux-rdma@vger.kernel.org,
 netdev@vger.kernel.org
Subject: Re: [PATCH mlx5-next 1/4] PCI: Configure number of MSI-X vectors for SR-IOV VFs
Message-ID: <20210110084735.GH31158@unreal>
References: <20210108005721.GA1403391@bjorn-Precision-5520>
 <20210108072525.GB31158@unreal>
 <20210108092145.7c70ff74@omen.home>
In-Reply-To: <20210108092145.7c70ff74@omen.home>
List-ID: linux-rdma@vger.kernel.org

On Fri, Jan 08, 2021 at 09:21:45AM -0700, Alex Williamson wrote:
> On Fri, 8 Jan 2021 09:25:25 +0200
> Leon Romanovsky wrote:
>
> > On Thu, Jan 07, 2021 at 10:54:38PM -0500, Don Dutile wrote:
> > > On 1/7/21 7:57 PM, Bjorn Helgaas wrote:
> > > > [+cc Alex, Don]
> > > >
> > > > <...>
> > > >
> > > > Help me connect the dots here. Is this required because of something
> > > > peculiar to mlx5, or is something like this required for all SR-IOV
> > > > devices because of the way the PCIe spec is written?
> > > So, overall, I'm guessing the mlx5 device can have 1000's of MSI-X
> > > vectors -- say, one per send/receive/completion queue.
> > > This device capability may exceed the max number of MSI-X vectors a VM
> > > can have/support (depending on the guest OS).
> > > So, a sysfs tunable is used to set the max MSI-X available, and thus the
> > > device puts more than one send/rcv/completion queue interrupt on a given
> > > MSI-X vector.
> > >
> > > ok, time for Leon to better state what this patch does,
> > > and why it's needed on mlx5 (and may be applicable to other/future
> > > high-MSI-X devices assigned to VMs (NVMe?)).
> > > Hmmm, now that I said it, why is it SR-IOV-centric and not PCI-device
> > > centric (one can pass a PF with a high number of MSI-X vectors to a VM)?
> >
> > Thanks Don and Bjorn,
> >
> > I will answer all the comments a little later, when I return to the
> > office (Sunday).
> >
> > However, it is important for me to present the use case.
> >
> > Our mlx5 SR-IOV devices have always been capable of driving many MSI-X
> > vectors (up to 2K, don't hold me to the exact number), but when the user
> > creates VFs, the FW has no knowledge of how those VFs will be used. So the
> > FW had no choice but to statically and equally assign the same number of
> > MSI-X vectors to all VFs.
> >
> > After SR-IOV VF creation, the user will bind those new VFs to VMs, but the
> > VMs have different numbers of CPUs, and despite the HW being able to
> > deliver all the needed vectors (in the mlx5 netdev world, number of
> > channels == number of CPUs == number of vectors), we will be limited by
> > the already-set low number of vectors.
> >
> > So it is not about vector reduction, but more about vector re-partition.
> >
> > As an example, imagine mlx5 with two VFs. One VF is bound to a VM with 200
> > CPUs and another is bound to a VM with 1 CPU. They need different numbers
> > of MSI-X vectors.
> >
> > Hope that I succeeded to explain :).
>
> The idea is not unreasonable imo, but without knowing the size of the
> vector pool, range available per vf, or ultimately whether the vf
> supports this feature before we try to configure it, I don't see how
> userspace is expected to make use of this in the general case. If the
> configuration requires such specific vf vector usage and pf driver
> specific knowledge, I'm not sure it's fit as a generic pci-sysfs
> interface. Thanks,

I didn't prohibit reads of the newly created sysfs file, but if I change the
implementation of vf_msix_vec_show() to return -EOPNOTSUPP for a device that
doesn't support the feature, software will be able to distinguish supported
from not-supported. SW will read this file:
 -> success -> feature supported
 -> failure -> feature not supported

There is one extra sysfs file needed: vf_total_msix. That file will give the
total number of MSI-X vectors that it is possible to configure. The same
supported/not-supported logic is applicable here as well.

The feature itself will be used by orchestration software, which will make
decisions based on the values already configured, future promises, and the
overall total number. The positive outcome of this scheme is that the driver
stays lean. A rough sketch of that flow, from the orchestration side, is
appended after the quoted sign-off below.

Thanks

>
> Alex
>
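
The sketch below only illustrates the probe/re-partition flow described
above; it is not part of the patch. The sysfs file names (vf_msix_vec per VF,
vf_total_msix on the PF), their exact location under
/sys/bus/pci/devices/<BDF>/, and the example device addresses are assumptions
taken from this thread, not a final ABI.

#!/usr/bin/env python3
# Hypothetical orchestration-side sketch of the flow discussed above.
# File names (vf_msix_vec, vf_total_msix), their location under
# /sys/bus/pci/devices/<BDF>/ and the example BDFs are assumptions taken
# from this thread, not a final ABI.
import os

SYSFS_PCI = "/sys/bus/pci/devices"


def read_msix_attr(bdf, name):
    """Return the attribute value, or None if the driver reports the
    feature as unsupported (the read fails, e.g. with -EOPNOTSUPP)."""
    try:
        with open(os.path.join(SYSFS_PCI, bdf, name)) as f:
            return int(f.read())
    except OSError:
        return None


def set_vf_msix(vf_bdf, count):
    """Ask the PF driver to re-partition 'count' MSI-X vectors to this VF."""
    with open(os.path.join(SYSFS_PCI, vf_bdf, "vf_msix_vec"), "w") as f:
        f.write(str(count))


if __name__ == "__main__":
    # Example from the thread: one VF bound to a 200-CPU VM, one to a 1-CPU VM.
    pf, big_vf, small_vf = "0000:08:00.0", "0000:08:00.2", "0000:08:00.3"

    pool = read_msix_attr(pf, "vf_total_msix")
    if pool is None:
        print("PF driver does not support MSI-X re-partitioning")
    else:
        set_vf_msix(big_vf, min(200, pool - 1))  # leave at least one vector for the other VF
        set_vf_msix(small_vf, 1)

The point relevant to the interface discussion is that a failed read of
either file is itself the supported/not-supported probe, so the orchestration
software needs no PF-driver-specific knowledge beyond these two files.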