netdev.vger.kernel.org archive mirror
From: Jacob Keller <jacob.e.keller@intel.com>
To: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Cc: Saeed Mahameed <saeed@kernel.org>,
	Leon Romanovsky <leon@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	"Jonathan Corbet" <corbet@lwn.net>,
	Richard Cochran <richardcochran@gmail.com>,
	"Tariq Toukan" <tariqt@nvidia.com>, Gal Pressman <gal@nvidia.com>,
	Vadim Fedorenko <vadim.fedorenko@linux.dev>,
	Andrew Lunn <andrew@lunn.ch>,
	Heiner Kallweit <hkallweit1@gmail.com>,
	Przemek Kitszel <przemyslaw.kitszel@intel.com>,
	"Ahmed Zaki" <ahmed.zaki@intel.com>,
	Alexander Lobakin <aleksander.lobakin@intel.com>,
	Hangbin Liu <liuhangbin@gmail.com>,
	"Paul Greenwalt" <paul.greenwalt@intel.com>,
	Justin Stitt <justinstitt@google.com>,
	Randy Dunlap <rdunlap@infradead.org>,
	Maxime Chevallier <maxime.chevallier@bootlin.com>,
	Kory Maincent <kory.maincent@bootlin.com>,
	Wojciech Drewek <wojciech.drewek@intel.com>,
	Vladimir Oltean <vladimir.oltean@nxp.com>,
	Jiri Pirko <jiri@resnulli.us>,
	Alexandre Torgue <alexandre.torgue@foss.st.com>,
	Jose Abreu <joabreu@synopsys.com>,
	"Dragos Tatulea" <dtatulea@nvidia.com>, <netdev@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>, <linux-doc@vger.kernel.org>
Subject: Re: [PATCH RFC net-next v1 1/6] ethtool: add interface to read Tx hardware timestamping statistics
Date: Fri, 8 Mar 2024 14:28:01 -0800
Message-ID: <b2862eca-bff1-4def-9ac2-2ee426d3c6dd@intel.com>
In-Reply-To: <87edcljnp0.fsf@nvidia.com>



On 3/7/2024 9:09 PM, Rahul Rameshbabu wrote:
> 
> On Thu, 07 Mar, 2024 19:29:08 -0800 Jacob Keller <jacob.e.keller@intel.com> wrote:
>> On 3/7/2024 10:47 AM, Rahul Rameshbabu wrote:
>>> Hi Jacob,
>>>
>>> On Mon, 26 Feb, 2024 11:54:49 -0800 Jacob Keller <jacob.e.keller@intel.com> wrote:
>>>> On 2/23/2024 3:43 PM, Rahul Rameshbabu wrote:
>>>>>
>>>>> On Fri, 23 Feb, 2024 14:48:51 -0800 Jacob Keller <jacob.e.keller@intel.com>
>>>>> wrote:
>>>>>> On 2/23/2024 2:21 PM, Rahul Rameshbabu wrote:
>>>>>>> Do you have any example of a case of skipping timestamp information that
>>>>>>> is not related to lack of delivery over time? I am wondering whether this
>>>>>>> case is more like a hardware error, or more like the hardware being busy
>>>>>>> because recording the timestamp information would impact line rate?
>>>>>>>
>>>>>>
>>>>>> The main example for skipped is the event where all our slots are full
>>>>>> at point of timestamp request.
>>>>>
>>>>> This is what I was guessing as the main (if not the only) reason. For this
>>>>> specific reason, I think a general "busy" stats counter makes sense.
>>>>> mlx5 does not need this counter, but I can see a lot of other hw
>>>>> implementations needing this. (The skipped counter name obviously should
>>>>> be left only in the ice driver. Just felt "busy" was easy to understand
>>>>> for generalized counters.)
>>>>
>>>> Yea, I don't expect this would be required for all hardware but it seems
>>>> like a common approach if you have limited slots for Tx timestamps
>>>> available.
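For illustration, the "limited slots, count the overflow" model discussed here can be sketched as a small userspace C program. This is a hypothetical model, not ice driver code: TS_SLOTS, struct ts_tracker, ts_request() and ts_release() are made-up names, and the slot count is arbitrary. When every slot is occupied, the packet is assumed to go out without a hardware timestamp while a busy (ice: "skipped") counter records the event.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical PHY with a fixed number of Tx timestamp slots. */
#define TS_SLOTS 4

struct ts_tracker {
	bool in_use[TS_SLOTS]; /* slot is waiting for its timestamp */
	uint64_t busy;         /* requests that found every slot occupied */
};

/* Reserve a slot for a Tx timestamp request. Returns the slot index, or
 * -1 when all slots are full; the packet is then assumed to be sent
 * without a hardware timestamp and the busy counter records the event. */
static int ts_request(struct ts_tracker *t)
{
	for (int i = 0; i < TS_SLOTS; i++) {
		if (!t->in_use[i]) {
			t->in_use[i] = true;
			return i;
		}
	}
	t->busy++;
	return -1;
}

/* Free @slot once its timestamp has been read back from hardware. */
static void ts_release(struct ts_tracker *t, int slot)
{
	assert(slot >= 0 && slot < TS_SLOTS && t->in_use[slot]);
	t->in_use[slot] = false;
}
```

In this model the busy counter only grows when more requests are outstanding than there are slots, which matches the point that ptp4l, with one outstanding request at a time, rarely hits it.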
>>>>
>>> Sorry to bump this thread once more, but I had a question about the
>>> Intel driver. Instead of having a busy case when all the slots are full,
>>> would it make sense to stop the netdev queues? We actually do this in
>>> mlx5 (though keep in mind that we have a dedicated queue just for
>>> port/phy timestamping that we start/stop).
>>>
>>> Maybe in your case, you can have a mix of HW timestamping and non-HW
>>> timestamping in the same queue, which is why you have a busy case?
>>>
>>
>> We don't use a dedicated queue. The issue isn't queue capacity so much
>> as it is the number of slots in the PHY where it can save the
>> timestamp data.
> 
> In mlx5, we use a dedicated queue just for the purpose of HW
> timestamping because we actually do have a similar slot mechanism. We
> call it metadata. We have a limit of 256 entries. We steer PTP traffic
> specifically (though we will be changing this to any HW timestamped
> traffic with the work Kory is doing) to this queue by matching against
> the protocol and port. All other traffic goes to the normal queues that
> cannot consume the timestamping slots. When all the slots are occupied,
> we stop the timestamping queue rather than throwing some busy error.
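A rough sketch of the steering step, assuming the driver already has a parsed summary of each outgoing packet. The constants are the IEEE 1588 identifiers (EtherType 0x88F7, UDP event port 319, UDP general port 320); struct pkt_info, is_ptp() and pick_txq() are hypothetical helpers, not the mlx5 code. Steering on any skb that requests a hardware timestamp (the SKBTX_HW_TSTAMP flag) instead would replace is_ptp() with a single flag test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Well-known PTP identifiers (IEEE 1588) plus the EtherTypes we inspect. */
#define PTP_ETHERTYPE    0x88F7
#define ETHERTYPE_IPV4   0x0800
#define ETHERTYPE_IPV6   0x86DD
#define IP_PROTO_UDP     17
#define PTP_EVENT_PORT   319
#define PTP_GENERAL_PORT 320

/* Hypothetical, already-parsed summary of an outgoing packet. */
struct pkt_info {
	uint16_t ethertype;
	uint8_t ip_proto;   /* meaningful for IPv4/IPv6 only */
	uint16_t udp_dport; /* meaningful when ip_proto == IP_PROTO_UDP */
};

/* Match PTP by protocol and port: L2 PTP, or UDP to port 319/320. */
static bool is_ptp(const struct pkt_info *p)
{
	if (p->ethertype == PTP_ETHERTYPE)
		return true;
	if (p->ethertype != ETHERTYPE_IPV4 && p->ethertype != ETHERTYPE_IPV6)
		return false;
	return p->ip_proto == IP_PROTO_UDP &&
	       (p->udp_dport == PTP_EVENT_PORT ||
		p->udp_dport == PTP_GENERAL_PORT);
}

/* PTP goes to the dedicated timestamping queue, assumed to sit right after
 * the @num_regular ordinary queues; other traffic is spread over the
 * regular queues by a caller-supplied flow hash and never uses a slot. */
static unsigned int pick_txq(const struct pkt_info *p, uint32_t flow_hash,
			     unsigned int num_regular)
{
	assert(num_regular > 0);
	if (is_ptp(p))
		return num_regular;
	return flow_hash % num_regular;
}
```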
> 
>>
>> In practice the most common application (ptp4l) synchronously waits for
>> timestamps, and only has one outstanding at a time. Likely due to
>> limitations with original hardware that only supported one outstanding
>> Tx timestamp.
>>
>>> Wanted to inquire about this before sending out a RFC v2.
>>
>> That's actually an interesting approach: changing to a dedicated queue
>> which we could lock, and start/stop when the indexes are full. How
>> does that interact with the UDP and Ethernet stacks? Presumably
>> when you go to transmit, you'd need to pick a queue, and if it's stopped
>> you'd have to drop or tell the stack?
> 
> Let me share a pointer in mlx5 for how we do the queue selection. Like I
> mentioned, we steer PTP traffic specifically, but we can change this to
> just steer any skb that indicates hw timestamping.
> 
> * https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h?id=3aaa8ce7a3350d95b241046ae2401103a4384ba2#n71
> 
> Then, here is how we manage stopping and waking the queue (we tell the
> core stack about this so we do not have to drop traffic due to some kind
> of busy state because our metadata/slots are all consumed).
> 

Makes sense.

> * https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c?id=3aaa8ce7a3350d95b241046ae2401103a4384ba2#n775
> * https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c?id=3aaa8ce7a3350d95b241046ae2401103a4384ba2#n257
> * https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c?id=3aaa8ce7a3350d95b241046ae2401103a4384ba2#n397
> 
>>
>> I think I remember someone experimenting with returning NETDEV_TX_BUSY
>> when the slots were full, but in practice this caused a lot of issues.
>> None of the other devices we have with only a single slot (one set of
>> registers, ixgbe, i40e, igb, e1000) did that either.
> 
> In our experiments, even with a single slot (we had reasons for
> testing this), the dedicated queue for timestamping worked out nicely. I
> would really suggest investigating this model, since I think it might
> play out nicely for the Intel family.
> 
>>
>> If this queue model behaves in a sane way (or if we can communicate
>> something similar by reporting back up the stack without needing a
>> dedicated queue?) that could be better than the current situation.
> 
> I personally really like the dedicated queue in the device drivers, but
> if we want to instead model this slot management work in the core netdev
> stack, I do not think that is a bad endeavor either (when slots are
> full, hw timestamping traffic is held back till they become available).
> I do think the netif_tx_wake_queue/netif_tx_stop_queue + dedicated HW
> timestamping queue does work out nicely.

Ok, so if I understand this right, the stack calls .ndo_select_queue to
pick a queue, and we'd implement it to check the SKB flag
(SKBTX_HW_TSTAMP). Then whenever the slots for the queue are full we
issue netif_tx_stop_queue, and whenever slots are released and open
again we issue netif_tx_wake_queue.
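The stop/wake flow described above can be modeled roughly as follows. This is a userspace sketch under stated assumptions: struct ts_txq, ts_txq_xmit() and ts_txq_complete() are hypothetical, the stopped flag stands in for netif_tx_stop_queue()/netif_tx_wake_queue() on the dedicated queue, and the slot count is arbitrary (mlx5 mentions 256 metadata entries).

```c
#include <assert.h>
#include <stdbool.h>

#define TS_SLOTS 4 /* illustrative slot count */

/* Hypothetical state of a dedicated timestamping Tx queue. */
struct ts_txq {
	int free_slots;
	bool stopped;       /* stands in for the netdev queue's stopped state */
	unsigned long sent;
};

/* Transmit one timestamp-requesting packet. The stack only calls this
 * while the queue is running, so a free slot is guaranteed. */
static void ts_txq_xmit(struct ts_txq *q)
{
	assert(!q->stopped && q->free_slots > 0);
	q->free_slots--;
	q->sent++;
	/* Stop as soon as the last slot is taken, so the next packet waits
	 * in the stack instead of reaching the driver with no slot. */
	if (q->free_slots == 0)
		q->stopped = true;  /* kernel: netif_tx_stop_queue(txq) */
}

/* Timestamp completion: the slot is free again, so restart the queue. */
static void ts_txq_complete(struct ts_txq *q)
{
	assert(q->free_slots < TS_SLOTS);
	q->free_slots++;
	if (q->stopped)
		q->stopped = false; /* kernel: netif_tx_wake_queue(txq) */
}
```

Stopping when the last slot is consumed, rather than when a packet arrives and finds no slot, means the xmit path never has to drop the packet or return NETDEV_TX_BUSY. A real driver also has to handle the race between the completion path waking the queue and the xmit path stopping it; include/net/netdev_queues.h has helpers aimed at that pattern.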

While the queue is stopped, the stack basically just buffers requests
and doesn't call the ndo_start_xmit routine for that queue until the
queue is ready again?
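To illustrate the buffering question, here is a toy model of the stack side, assuming a qdisc-like backlog with a finite packet limit sits in front of the driver queue. All names (struct drv_q, struct stack_q, stack_send(), drv_complete()) and the limit are hypothetical. The model only shows that nothing reaches the driver's xmit routine while its queue is stopped, that the backlog drains once the completion path wakes the queue, and that packets are dropped only when the backlog limit is exceeded.

```c
#include <assert.h>
#include <stdbool.h>

#define BACKLOG_LIMIT 8 /* stand-in for a qdisc's packet limit */

/* Hypothetical driver queue with a fixed number of timestamp slots. */
struct drv_q {
	int free_slots;
	bool stopped;
	int xmit_calls;
};

/* Hypothetical stack-side buffer playing the role of the qdisc. */
struct stack_q {
	int backlog; /* packets waiting; only the count matters here */
	int drops;   /* packets dropped because the backlog was full */
	struct drv_q *drv;
};

/* Driver xmit: consume a slot and stop the queue when none are left. */
static void drv_xmit(struct drv_q *d)
{
	assert(!d->stopped && d->free_slots > 0);
	d->xmit_calls++;
	if (--d->free_slots == 0)
		d->stopped = true;
}

/* Hand queued packets to the driver only while its queue is running. */
static void stack_run(struct stack_q *s)
{
	while (s->backlog > 0 && !s->drv->stopped) {
		s->backlog--;
		drv_xmit(s->drv);
	}
}

/* Enqueue one packet, dropping it only if the backlog is already full. */
static void stack_send(struct stack_q *s)
{
	if (s->backlog >= BACKLOG_LIMIT) {
		s->drops++;
		return;
	}
	s->backlog++;
	stack_run(s);
}

/* Completion path: free a slot, wake the queue, let the stack resume. */
static void drv_complete(struct stack_q *s)
{
	s->drv->free_slots++;
	s->drv->stopped = false;
	stack_run(s);
}
```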

Using a dedicated queue has some other advantages in that it could be
programmed with different priority both from the hardware side (prefer
packets waiting in the timestamp queue) and from the software side
(prioritize CPUs running the threads for processing it). That could be
useful in some applications too...

> 
> Let me know your thoughts on this. If you think it's an interesting idea
> to explore, let's not add the busy counter now in this series. I already
> dropped the late counter. We can add the busy counter later on if you
> feel this model I have shared is not viable for Intel. I wanted to avoid
> introducing too many counters pre-emptively that might not actually be
> consumed widely. I had a thought that what you presented with slots is
> very similar to what we have with metadata in mlx5, so I thought that
> maybe handling the management of these slots in a different way with
> something like a dedicated queue for HW timestamping could make the
> design cleaner.
> 

I think I agree with the queue model, though I'm not sure when I could
get to working on implementing this. I'm fine with dropping the busy
counter from this series.

> --
> Thanks,
> 
> Rahul Rameshbabu


Thread overview: 57+ messages
2024-02-23 19:24 [PATCH RFC net-next v1 0/6] ethtool HW timestamping statistics Rahul Rameshbabu
2024-02-23 19:24 ` [PATCH RFC net-next v1 1/6] ethtool: add interface to read Tx hardware " Rahul Rameshbabu
2024-02-23 21:07   ` Jacob Keller
2024-02-23 22:21     ` Rahul Rameshbabu
2024-02-23 22:48       ` Jacob Keller
2024-02-23 23:43         ` Rahul Rameshbabu
2024-02-26 19:54           ` Jacob Keller
2024-03-07 18:47             ` Rahul Rameshbabu
2024-03-08  3:29               ` Jacob Keller
2024-03-08  5:09                 ` Rahul Rameshbabu
2024-03-08 22:28                   ` Jacob Keller [this message]
2024-03-08 22:30                     ` Rahul Rameshbabu
2024-02-26  8:59   ` Köry Maincent
2024-02-26 10:09   ` Köry Maincent
2024-02-29  2:05   ` Jakub Kicinski
2024-02-29 22:20     ` Rahul Rameshbabu
2024-02-23 19:24 ` [PATCH RFC net-next v1 2/6] net/mlx5e: Introduce lost_cqe statistic counter for PTP Tx port timestamping CQ Rahul Rameshbabu
2024-02-23 19:24 ` [PATCH RFC net-next v1 3/6] net/mlx5e: Introduce timestamps statistic counter for Tx DMA layer Rahul Rameshbabu
2024-02-23 19:24 ` [PATCH RFC net-next v1 4/6] net/mlx5e: Implement ethtool hardware timestamping statistics Rahul Rameshbabu
2024-02-26  9:26   ` Köry Maincent
2024-02-23 19:24 ` [PATCH RFC net-next v1 5/6] tools: ynl: ethtool.py: Make tool invokable from any CWD Rahul Rameshbabu
2024-02-23 21:08   ` Jacob Keller
2024-02-23 22:39     ` Rahul Rameshbabu
2024-02-29  2:08       ` Jakub Kicinski
2024-02-23 19:24 ` [PATCH RFC net-next v1 6/6] tools: ynl: ethtool.py: Add ts ethtool statistics group Rahul Rameshbabu
2024-02-23 21:00 ` [PATCH RFC net-next v1 0/6] ethtool HW timestamping statistics Jacob Keller
2024-02-23 21:12   ` Jacob Keller
2024-02-23 22:47   ` Rahul Rameshbabu
2024-03-09  8:44 ` [PATCH RFC v2 " Rahul Rameshbabu
2024-03-09  8:44   ` [PATCH RFC v2 1/6] ethtool: add interface to read Tx hardware " Rahul Rameshbabu
2024-03-12 23:53     ` Jakub Kicinski
2024-03-14  0:26       ` Rahul Rameshbabu
2024-03-14  0:41         ` Jakub Kicinski
2024-03-14  0:50           ` Rahul Rameshbabu
2024-03-14  1:40             ` Jakub Kicinski
2024-03-14  4:19               ` Rahul Rameshbabu
2024-03-14 17:50               ` Keller, Jacob E
2024-03-14 18:48                 ` Rahul Rameshbabu
2024-03-14 17:01       ` Rahul Rameshbabu
2024-03-14 17:59         ` Jakub Kicinski
2024-03-14 18:43           ` Rahul Rameshbabu
2024-03-14 19:06             ` Jakub Kicinski
2024-03-14 20:16               ` Rahul Rameshbabu
2024-03-09  8:44   ` [PATCH RFC v2 2/6] net/mlx5e: Introduce lost_cqe statistic counter for PTP Tx port timestamping CQ Rahul Rameshbabu
2024-03-09  8:44   ` [PATCH RFC v2 3/6] net/mlx5e: Introduce timestamps statistic counter for Tx DMA layer Rahul Rameshbabu
2024-03-09  8:44   ` [PATCH RFC v2 4/6] net/mlx5e: Implement ethtool hardware timestamping statistics Rahul Rameshbabu
2024-03-09  8:44   ` [PATCH RFC v2 5/6] tools: ynl: ethtool.py: Make tool invokable from any CWD Rahul Rameshbabu
2024-03-11 12:43     ` Köry Maincent
2024-03-09  8:44   ` [PATCH RFC v2 6/6] tools: ynl: ethtool.py: Output timestamping statistics from tsinfo-get operation Rahul Rameshbabu
2024-03-12 23:55     ` Jakub Kicinski
2024-03-14  0:22       ` Rahul Rameshbabu
2024-03-14  0:47         ` Jakub Kicinski
2024-03-14  6:07           ` Rahul Rameshbabu
2024-03-14 18:05             ` Jakub Kicinski
2024-03-14 18:39               ` Rahul Rameshbabu
2024-03-14 20:04           ` Jakub Kicinski
2024-03-14 20:05             ` Rahul Rameshbabu
