From: Vladimir Oltean
To: broonie@kernel.org, h.feurstein@gmail.com, mlichvar@redhat.com, richardcochran@gmail.com, andrew@lunn.ch, f.fainelli@gmail.com
Cc: linux-spi@vger.kernel.org, netdev@vger.kernel.org, Vladimir Oltean
Subject: [PATCH spi for-5.4 0/5] Deterministic SPI latency with NXP DSPI driver
Date: Sun, 18 Aug 2019 21:25:55 +0300
Message-Id: <20190818182600.3047-1-olteanv@gmail.com>

This patchset proposes an interface in the SPI subsystem for software timestamping of SPI transfers. The core provides a default implementation, as well as a mechanism for SPI slave drivers to check after the fact which byte was actually timestamped. The patchset also adds the first user of this interface (the NXP DSPI driver in TCFQ mode).
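
To make the idea concrete, here is a minimal userspace sketch of timestamping around a single word of a transfer and recording whether that word was actually covered. It is not the code from this series: the struct, its fields and every demo_* function are hypothetical names chosen for illustration, and the "hardware" is a stub.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for a transfer descriptor; not the kernel's struct. */
struct demo_transfer {
	const uint8_t *tx_buf;
	size_t len;
	size_t sts_word;      /* index of the word the caller wants timestamped */
	int64_t sts_pre_ns;   /* system time just before that word was sent */
	int64_t sts_post_ns;  /* system time just after that word was sent */
	bool timestamped;     /* set only if the snapshot covered sts_word */
};

static int64_t demo_now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Placeholder for pushing one word to the hardware. */
static void demo_send_word(uint8_t word)
{
	(void)word;
}

/* Word-by-word transfer that snapshots system time around one word only. */
static void demo_transfer_one(struct demo_transfer *t)
{
	size_t i;

	assert(t->tx_buf != NULL);
	t->timestamped = false;
	for (i = 0; i < t->len; i++) {
		if (i == t->sts_word)
			t->sts_pre_ns = demo_now_ns();
		demo_send_word(t->tx_buf[i]);
		if (i == t->sts_word) {
			t->sts_post_ns = demo_now_ns();
			t->timestamped = true;
		}
	}
}
```

A caller would check the flag after the transfer completes and discard the snapshot if the requested word was never reached, which is one possible reading of "check which byte was timestamped".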
The interface is somewhat similar to Hubert Feurstein's proposal for the MDIO subsystem: https://lkml.org/lkml/2019/8/16/638

Original cover letter below. Also provided at the end are some results from an extra test (J - phc2sys using the timestamps taken by the SPI core).

===========================================================

Continuing the discussion started by Hubert Feurstein around the mv88e6xxx driver for MDIO-controlled switches (https://lkml.org/lkml/2019/8/2/1364), this patchset takes a similar approach for the NXP LS1021A-TSN board, which has a SPI-controlled DSA switch (SJA1105).

The patchset is motivated by some experiments done with a logic analyzer, trying to understand the source of latency (and especially of the jitter). SJA1105 SPI messages for reading the PTP clock are 12 bytes in length: 4 for the SPI header and 8 for the timestamp. When looking at the messages with a scope, there's jitter basically everywhere: between bits of a frame and between frames in a transfer. The inter-bit jitter comes from the hardware and impacts us to a lesser extent (it is smaller and caused by the PVT stability of the oscillators, PLLs, etc). We will focus on the latency between consecutive SPI frames within a 12-byte transfer.

As a preface, revisions of the DSPI controller IP are integrated in many Freescale/NXP devices. As a result, the driver has 3 modes of operation:
- TCFQ (Transfer Complete Flag mode): The controller signals software that data has been sent/received after each individual word.
- EOQ (End of Queue mode): The driver can implement batching by making use of the controller's 4-word deep FIFO.
- DMA (Direct Memory Access mode): The SPI controller's FIFO is no longer in direct interaction with the driver, but is used to trigger the RX and TX channels of the eDMA module on the SoC.

In LS1021A, the driver works in the least efficient mode of the 3 (TCFQ). There is a well-known erratum that the DSPI controller is broken in conjunction with the eDMA module.
As for the EOQ mode, I have tried unsuccessfully for a few days to make use of the 4-entry FIFO, and the hardware simply fails to reliably acknowledge the transmission when the FIFO gets full. So it looks like we're stuck with the TCFQ mode.

The problem with phc2sys on the LS1021A-TSN board is that in order for the gettime64() call to complete on the sja1105, the system has to service 12 IRQs. Intuitively that is excessive and is the main source of jitter, but let's not get ahead of ourselves.

An outline of the experiments that were done (unless otherwise mentioned, all of these ran for 120 seconds; offset and delay values are in nanoseconds, as printed by phc2sys):

A. First I measured the (poor) performance of phc2sys under current conditions (DSPI driver in IRQ mode, no PTP system timestamping).
   offset: min -53310 max 16107 mean -1737.18 std dev 11444.3
   delay: min 163680 max 237360 mean 201149 std dev 22446.6
   lost servo lock 1 times

B. I switched the .gettime64 callback to .gettimex64, snapshotting the PTP system timestamp within the sja1105 driver.
   offset: min -48923 max 64217 mean -904.137 std dev 17358.1
   delay: min 149600 max 203840 mean 169045 std dev 17993.3
   lost servo lock 8 times

C. I patched "struct spi_transfer" to contain the PTP system timestamp, and from the sja1105 driver, I passed this structure to be snapshotted by the SPI controller's driver (spi-fsl-dspi). This is the "transfer-level" snapshot.
   offset: min -64979 max 38979 mean -416.197 std dev 15367.9
   delay: min 125120 max 168320 mean 150286 std dev 17675.3
   lost servo lock 10 times

D. I changed the placement of the transfer snapshotting within the DSPI driver, from "transfer-level" to "byte-level".
   offset: min -9021 max 7149 mean -0.418803 std dev 3529.81
   delay: min 7840 max 23920 mean 14493.7 std dev 5982.17
   lost servo lock 0 times

E. I moved the DSPI driver to poll mode. I went back to collecting the PTP system timestamps from the sja1105 driver (same as B).
   offset: min -4199 max 46643 mean 418.214 std dev 4554.01
   delay: min 84000 max 194000 mean 99463.2 std dev 12936.5
   lost servo lock 1 times

F. Transfer-level snapshotting in the DSPI driver (same as C), but in poll mode.
   offset: min -24244 max 1115 mean -230.478 std dev 2297.28
   delay: min 69440 max 119040 mean 70312.9 std dev 8065.34
   lost servo lock 1 times

G. Byte-level snapshotting (same as D) but in poll mode.
   offset: min -314 max 288 mean -2.48718 std dev 118.045
   delay: min 4880 max 6000 mean 5118.63 std dev 507.258
   lost servo lock 0 times

   This seemed suspiciously good to me, so I let it run for longer (58 minutes):
   offset: min -26251 max 16416 mean -21.8672 std dev 863.416
   delay: min 4720 max 57280 mean 5182.49 std dev 1607.19
   lost servo lock 3 times

H. Transfer-level snapshotting (same as F), but with IRQs disabled. This ran for 86 minutes.
   offset: min -1927 max 1843 mean -0.209203 std dev 529.398
   delay: min 85440 max 93680 mean 88245 std dev 1454.71
   lost servo lock 0 times

I. Byte-level snapshotting (same as G), but with IRQs disabled. This ran for 102 minutes.
   offset: min -378 max 381 mean -0.0083089 std dev 101.495
   delay: min 4720 max 5920 mean 5129.38 std dev 154.899
   lost servo lock 0 times

J. Default snapshotting taken by the SPI core, with the DSPI driver running in poll mode, IRQs enabled. This ran for 274 minutes.
   offset: min -42568 max 44576 mean 2.91646 std dev 947.467
   delay: min 58480 max 171040 mean 80750.7 std dev 2001.61
   lost servo lock 3 times

As a result, this patchset proposes the implementation of scenario I. The others were done through temporary patches which are not presented here due to the difficulty of presenting a coherent git history without resorting to reverts etc. The gist of each experiment should be clear though.

The raw data is available for dissection at https://drive.google.com/open?id=1r9raU9ZeqOqkqts6Lb-ISf5ubLDLP3wk.
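
For readers comparing the offset and delay figures in the experiments, the following is a rough sketch in plain userspace C of how one (pre, device, post) timestamp triple can be reduced to a delay sample and an offset sample. It assumes the midpoint estimate that tools such as phc2sys apply to extended PHC readings. The struct and function names are hypothetical, and the sign convention (system time minus device time) is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for one reading; not a kernel structure. */
struct sts_sample {
	int64_t sys_pre_ns;   /* system time taken before the device was read */
	int64_t phc_ns;       /* PTP clock value read from the switch */
	int64_t sys_post_ns;  /* system time taken after the device was read */
};

/* Width of the window in which the device time was sampled. */
static int64_t sample_delay_ns(const struct sts_sample *s)
{
	assert(s->sys_post_ns >= s->sys_pre_ns);
	return s->sys_post_ns - s->sys_pre_ns;
}

/* Assume the device latched its clock at the middle of the window. */
static int64_t sample_offset_ns(const struct sts_sample *s)
{
	int64_t mid = s->sys_pre_ns + sample_delay_ns(s) / 2;

	return mid - s->phc_ns;
}
```

Under this assumption the true sampling instant can be anywhere inside the window, so the midpoint is wrong by at most half the delay. Shrinking the window from roughly 200 us (experiment A) to roughly 5 us (experiment I) therefore shrinks the worst-case error by the same factor.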
The logic analyzer captures can be opened with a free-as-in-beer program provided by Saleae: https://www.saleae.com/downloads/. In the capture data one can find the MOSI and SCK SPI signals, as well as a debug GPIO which was toggled at the same time as the PTP system timestamp was taken, to give the viewer an impression of what the software is capturing compared to the actual timing of the SPI transfer.

Attached are also some close-up screenshots of transfers where there is a clear and huge delay in-between frames of the same 12-byte SPI transfer. As it turns out, these were all caused by the CPU getting interrupted by some other IRQ. Approaches H and I are the only ones that get rid of these glitches. In theory, the byte-level snapshotting should be less vulnerable to an IRQ interrupting the SPI transfer (because the time window is much smaller) but as the 58-minute experiment shows, it is not immune.

Vladimir Oltean (5):
  spi: Use an abbreviated pointer to ctlr->cur_msg in __spi_pump_messages
  spi: Add a PTP system timestamp to the transfer structure
  spi: spi-fsl-dspi: Use poll mode in case the platform IRQ is missing
  spi: spi-fsl-dspi: Implement the PTP system timestamping for TCFQ mode
  spi: spi-fsl-dspi: Disable interrupts and preemption during poll mode transfer

 drivers/spi/spi-fsl-dspi.c | 117 +++++++++++++++++++++++++++++++------
 drivers/spi/spi.c          |  85 +++++++++++++++++++++++----
 include/linux/spi/spi.h    |  38 ++++++++++++
 3 files changed, 210 insertions(+), 30 deletions(-)

-- 
2.17.1