Date: Mon, 26 Oct 2020 14:00:01 +0100
From: Andrew Lunn
To: Xu Yilun
Cc: jesse.brandeburg@intel.com, anthony.l.nguyen@intel.com, davem@davemloft.net, kuba@kernel.org, mdf@kernel.org, lee.jones@linaro.org, linux-kernel@vger.kernel.org, linux-fpga@vger.kernel.org, netdev@vger.kernel.org, trix@redhat.com, lgoncalv@redhat.com, hao.wu@intel.com
Subject: Re: [RFC PATCH 1/6] docs: networking: add the document for DFL Ether Group driver
Message-ID: <20201026130001.GC836546@lunn.ch>
References: <1603442745-13085-1-git-send-email-yilun.xu@intel.com> <1603442745-13085-2-git-send-email-yilun.xu@intel.com> <20201023153731.GC718124@lunn.ch> <20201026085246.GC25281@yilunxu-OptiPlex-7050>
In-Reply-To: <20201026085246.GC25281@yilunxu-OptiPlex-7050>

> > > +The Intel(R) PAC N3000 is an FPGA-based SmartNIC platform for multi-workload
> > > +networking application acceleration. A simple diagram of the board:
> > > +
> > > +                     +----------------------------------------+
> > > +                     |                  FPGA                  |
> > > ++----+   +-------+   +-----------+  +----------+  +-----------+   +----------+
> > > +|QSFP|---|retimer|---|Line Side  |--|User logic|--|Host Side  |---|XL710     |
> > > ++----+   +-------+   |Ether Group|  |          |  |Ether Group|   |Ethernet  |
> > > +                     |(PHY + MAC)|  |wiring &  |  |(MAC + PHY)|   |Controller|
> > > +                     +-----------+  |offloading|  +-----------+   +----------+
> > > +                     |              +----------+              |
> > > +                     |                                        |
> > > +                     +----------------------------------------+
> >
> > Is XL710 required? I assume any MAC with the correct MII interface
> > will work?
>
> The XL710 is required for this implementation, in which the Host Side
> Ether Group faces the host. The Host Side Ether Group actually contains
> the same IP blocks as the Line Side. It contains the compact MAC and PHY
> functionality for the 25G/40G case. The 25G MAC-PHY soft IP specification
> can be found at:
>
> https://www.intel.com/content/www/us/en/programmable/documentation/ewo1447742896786.html
>
> So raw serial data is output from the Host Side of the FPGA, and the
> XL710 can handle this.

What I have seen with Marvell Ethernet switches is that Marvell normally
recommends connecting them to the Ethernet interfaces of Marvell SoCs. But
the switch just needs a compatible MII interface, and lots of boards make
use of non-Marvell MAC chips. Freescale FEC is very popular.

What I'm trying to say is that ideally we need a collection of generic
drivers for the different major components on the board, and a board
driver which glues it all together. That then allows somebody to build
other boards, or to integrate the FPGA into an embedded system directly
connected to a SoC, etc.

> > Do you really mean PHY? I actually expect it is PCS?
>
> For this implementation, yes.

Yes, you have a PHY? Or yes, it is a PCS?

To me, as the phylib maintainer, having a PHY means you have a BASE-T
interface: 25GBASE-T, 40GBASE-T? That would be an odd and expensive
architecture when you should be able to just connect SERDES interfaces
together.

> > > +The DFL Ether Group driver registers a netdev for each line side link.
> > > +Users can use standard commands (ethtool, ip, ifconfig) for configuration
> > > +and for reading link state and statistics. Host side links are always
> > > +connected to the host Ethernet controller, so they always have the same
> > > +features as the host Ethernet controller. There is no need to register
> > > +netdevs for them.
> >
> > So let's say the XL710 is eth0. The line side netif is eth1. Where do I
> > put the IP address? What interface do I add to Quagga OSPF?
>
> The IP address should be put on eth0. eth0 should always be used for the
> tools.

That was what I was afraid of :-)

> The line/host side Ether Group is not the endpoint of the network data
> stream. eth1 will not participate in the network data exchange with the
> host.
>
> The main purposes of eth1 are:
> 1. For users to monitor the network statistics on the Line Side. By
> comparing the statistics of eth0 and eth1, users can get some idea of how
> the User logic is functioning.
>
> 2. To get the link state of the front panel. The XL710 is now connected
> to the Host Side of the FPGA, and its link state will always be up. So to
> check the link state of the front panel, we need to query eth1.

This is very non-intuitive. We try to avoid this in the kernel and in the
API to userspace. Ethernet switches are always modelled as accelerators
for what the Linux network stack can already do. You configure an
Ethernet switch port in just the same way you configure any other netdev.
You add an IP address to the switch port, you get the Ethernet statistics
from the switch port, and routing protocols use the switch port.
Your design needs to be the same. All configuration needs to happen via
eth1.

Please look at the DSA architecture. What you have here is very similar to
a two port DSA switch. In DSA terminology, we would call eth0 the master
interface. It needs to be up, but otherwise the user does not configure
it. eth1 is the slave interface. It is the user-facing interface of the
switch. All configuration happens on this interface. Linux can also
send/receive packets on this netdev. The slave TX function forwards the
frame to the master interface netdev, via a DSA tagger. Frames which eth0
receives are passed through the tagger and then passed to the slave
interface.

All the infrastructure you need is already in place. Please use it. I'm
not saying you need to write a DSA driver, but you should make use of the
same ideas and low level hooks in the network stack which DSA uses.

> > What about the QSFP socket? Can the host get access to the I2C bus?
> > The pins for TX enable, etc. ethtool -m?
>
> No, the QSFP/I2C are also managed by the BMC firmware, and the host has
> no interface to talk to the BMC firmware about the QSFP.

So can I even tell which SFP is in the socket?

> > > +Speed/Duplex
> > > +------------
> > > +The Ether Group doesn't support auto-negotiation. The link speed is fixed to
> > > +10G, 25G or 40G full duplex according to which Ether Group IP is programmed.
> >
> > So that means, if I pop out the SFP and put in a different one which
> > supports a different speed, it is expected to be broken until the FPGA
> > is reloaded?
>
> It is expected to be broken.

And since I have no access to the SFP information, I have no idea what is
actually broken? How should I configure the various layers?

> Now the line side is expected to be configured as 4x10G, 4x25G, 2x25G or
> 1x25G. The host side is expected to be 4x10G or 2x40G for the XL710.
>
> So a 4-channel SFP is expected to be inserted into the front panel, and
> we should use a 4x25G SFP, which is compatible with a 4x10G connection.
So if you had exported the SFP to Linux, phylink could have handled some
of this for you. It would probably need some extensions to phylink, but
Russell King would probably have helped you with those. phylink has a good
idea how to decode the SFP EEPROM and figure out the link mode. It has
interfaces to configure PCS blocks, so it could probably deal with the
line side and host side PCS. And it would have been easy to send a udev
notification that the SFP has changed; maybe user space needs to download
a different FPGA bit file? Then the user would not see a broken interface,
because the hardware could be reconfigured on the fly.

This is one problem I have with this driver. It is based around this
somewhat broken reference design. phylib, along with the hacks you have,
is enough for this reference design. But really you want to make use of
phylink in order to support the less limited designs which will follow.
Or you need to push a lot more into the BMC, and not use phylib at all.

Andrew
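A minimal sketch, in C, of the kind of EEPROM decoding mentioned above, assuming the host could read the module's 256-byte SFF-8636 map (lower page plus upper page 00h), which on this board it cannot, since the BMC owns the I2C bus. It reports the per-lane rate a QSFP-family module advertises, which a driver could compare with the rate the loaded FPGA image was built for before deciding to notify user space. The function name qsfp_lane_rate_gbps() and the reduced set of compliance codes are illustrative assumptions; this is not phylink's API.

```c
#include <stddef.h>
#include <stdint.h>

/* SFF-8024 identifier values found in byte 0 of the module EEPROM. */
#define SFF8024_ID_QSFP       0x0C
#define SFF8024_ID_QSFP_PLUS  0x0D
#define SFF8024_ID_QSFP28     0x11

/* SFF-8636 offsets into the 256-byte map (lower page + upper page 00h). */
#define SFF8636_ETH_COMPLIANCE 131 /* 10/40G/100G Ethernet compliance */
#define SFF8636_EXT_COMPLIANCE 192 /* extended compliance (SFF-8024) */

/* Bits of byte 131. */
#define ETH_40GBASE_LR4 (1u << 1)
#define ETH_40GBASE_SR4 (1u << 2)
#define ETH_40GBASE_CR4 (1u << 3)
#define ETH_10GBASE_SR  (1u << 4)
#define ETH_10GBASE_LR  (1u << 5)
#define ETH_EXTENDED    (1u << 7) /* see byte 192 */

/*
 * Hypothetical helper: return the per-lane rate in Gb/s advertised by a
 * QSFP-family module, or 0 if the module type or compliance code is not
 * one this sketch recognises.
 */
static unsigned int qsfp_lane_rate_gbps(const uint8_t *ee, size_t len)
{
    uint8_t eth;

    if (ee == NULL || len < 256)
        return 0;

    switch (ee[0]) {
    case SFF8024_ID_QSFP:
    case SFF8024_ID_QSFP_PLUS:
    case SFF8024_ID_QSFP28:
        break;
    default:
        return 0; /* not a QSFP-family module */
    }

    eth = ee[SFF8636_ETH_COMPLIANCE];

    /* Check the extended code first; fall back to the 10G/40G bits. */
    if (eth & ETH_EXTENDED) {
        switch (ee[SFF8636_EXT_COMPLIANCE]) {
        case 0x02: /* 100GBASE-SR4 or 25GBASE-SR */
        case 0x03: /* 100GBASE-LR4 or 25GBASE-LR */
        case 0x0B: /* 100GBASE-CR4 or 25GBASE-CR */
            return 25;
        default:
            break;
        }
    }

    /* 40GBASE-*4 modules run four 10G lanes; 10GBASE-SR/LR are 10G. */
    if (eth & (ETH_40GBASE_LR4 | ETH_40GBASE_SR4 | ETH_40GBASE_CR4 |
               ETH_10GBASE_SR | ETH_10GBASE_LR))
        return 10;

    return 0;
}
```

Whether a 25G module also runs at 10G depends on the module, so the policy for comparing the result with the FPGA configuration is deliberately left out of the sketch.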