Date: Mon, 4 Mar 2019 18:11:07 -0800
From: Jakub Kicinski
To: Jason Gunthorpe
Cc: Jiri Pirko, "davem@davemloft.net", "oss-drivers@netronome.com",
 "netdev@vger.kernel.org", Parav Pandit
Subject: Re: [PATCH net-next 4/8] devlink: allow subports on devlink PCI ports
Message-ID: <20190304181107.1379e358@cakuba.netronome.com>
In-Reply-To: <20190305013013.GK8627@mellanox.com>
References: <20190226182436.23811-1-jakub.kicinski@netronome.com>
 <20190226182436.23811-5-jakub.kicinski@netronome.com>
 <20190227123753.GB2240@nanopsycho>
 <20190227103000.6ea6f7c0@cakuba.netronome.com>
 <20190304161510.GO8627@mellanox.com>
 <20190304170320.10e40255@cakuba.netronome.com>
 <20190305013013.GK8627@mellanox.com>
Organization: Netronome Systems, Ltd.
X-Mailing-List: netdev@vger.kernel.org

On Tue, 5 Mar 2019 01:30:19 +0000, Jason Gunthorpe wrote:
> On Mon, Mar 04, 2019 at 05:03:20PM -0800, Jakub Kicinski wrote:
>
> > > Don't we already have devlink instances for every mlx5 physical port
> > > and VF as they are unique PCI functions?
> >
> > That's a very NIC-centric view of the world, though.  Equating devlink
> > instances to ports, and further to PCI devices.  It's fundamentally
> > different from what switches and some NICs do, where all ports are under
> > a single devlink instance.
>
> I think, as a practical matter, it is a bit hard to recombine an asic
> that presents multiple PCI BDFs into a single SW object. It is tricky
> to give stable labels to things, to leave gaps to allow for uncertain
> discovery, to co-ordinate between multiple struct pci_device drivers
> probe functions, etc.

It is tricky indeed, hence my so far unsuccessful search for a stable
handle :/

One thing which would not make things easier tho, is if the objects we
use to model this scenario don't have clear meanings...

> And at least with devlink, if you have an object layer that is broader
> than PCI BDF, how do the devlink commands work? Are all BDFs just an
> alias for this hidden super object?

My thinking was that they'd alias.

> Do any drivers attempt to provide a single instance made up of merged
> BDFs?

Not yet, but our NFP can do it.  NFP used to be single PF per host,
which made life easier, but the silicon team was persuaded to remove
that comfort :)

> In other words, is a PCI BDF really the largest granularity that
> devlink can address today?

Yes, DBDF is the largest today, _but_ most advanced devices (mlxsw,
nfp) have only one PF per host.  IOW we existed blissfully in a world
where devices either pipelined from port to PF or had only one PF.

> At least in RDMA we have drivers doing all combinations of this:
> multiple ports per BDF, one port per BDF, and one composite RDMA
> device formed by combining multiple BDFs worth of ports together.

Right, last but not least we have the case where there is one port but
multiple links (for NUMA, or just because 1 PCIe link can't really cope
with 200Gbps).  In that case which DBDF would the port go to? :(
Does all the internal info of the ASIC (health, regions, sbs) get
registered twice?

> > > > You guys come from the RDMA side of the world, with which I'm less
> > > > familiar, and the soft bus + spawning devices seems to be a popular
> > > > design there.  Could you describe the advantages of that model for
> > > > the sake of the netdev-only folks? :)
> > >
> > > I don't think we do this in RDMA at all yet, or maybe I'm not sure
> > > what you are thinking of?
> >
> > Mm.. I caught an Intel patch set recently which was talking about buses
> > and spawning devices.  It must have been a different kettle of fish.
>
> That sounds like scalable iov..
>
> Jason
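
For readers following the "they'd alias" idea above, here is a minimal
sketch of what aliasing several DBDFs to one shared object could look
like. It is a toy model in Python, not kernel code and not existing
devlink API. The names Dbdf, AsicInstance and AliasRegistry are
hypothetical, as are the example addresses, and the handle form
"pci/0000:05:00.0" is assumed to follow the usual bus-name/device-name
layout.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Dbdf:
    """PCI address: domain, bus, device, function (hypothetical helper)."""
    domain: int
    bus: int
    device: int
    function: int

    @classmethod
    def parse(cls, text):
        # Expected form "DDDD:BB:DD.F", every field hexadecimal.
        domain, bus, devfn = text.split(":")
        device, function = devfn.split(".")
        return cls(int(domain, 16), int(bus, 16),
                   int(device, 16), int(function, 16))

    def __str__(self):
        return (f"{self.domain:04x}:{self.bus:02x}:"
                f"{self.device:02x}.{self.function:x}")


@dataclass
class AsicInstance:
    """Hypothetical per-ASIC object owning state shared by all its PFs."""
    name: str
    ports: list = field(default_factory=list)
    health_reporters: list = field(default_factory=list)


class AliasRegistry:
    """Maps every DBDF of a multi-PF ASIC to the same shared instance."""

    def __init__(self):
        self._by_dbdf = {}

    def add_alias(self, dbdf, asic):
        existing = self._by_dbdf.get(dbdf)
        if existing is not None and existing is not asic:
            raise ValueError(f"{dbdf} already aliases {existing.name}")
        self._by_dbdf[dbdf] = asic

    def lookup(self, handle):
        # Assumed handle layout: "<bus name>/<device name>".
        bus_name, _, dev_name = handle.partition("/")
        if bus_name != "pci":
            raise ValueError(f"unsupported bus: {bus_name}")
        return self._by_dbdf[Dbdf.parse(dev_name)]


# Two PFs of one ASIC (made-up addresses) resolve to the same object,
# so shared state such as a health reporter is registered only once.
asic = AsicInstance("asic0")
registry = AliasRegistry()
registry.add_alias(Dbdf.parse("0000:05:00.0"), asic)
registry.add_alias(Dbdf.parse("0000:82:00.0"), asic)
registry.lookup("pci/0000:05:00.0").health_reporters.append("fw")
```

In this model a command issued against either handle reaches the same
AsicInstance, which is one way to avoid registering health, regions or
shared buffers twice. It leaves open the question raised above of which
DBDF a single port with multiple links should be listed under.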