netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sudarsana Reddy Kalluru <skalluru@marvell.com>
To: "Guilherme G. Piccoli" <gpiccoli@canonical.com>,
	GR-everest-linux-l2 <GR-everest-linux-l2@marvell.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>
Cc: Ariel Elior <aelior@marvell.com>,
	"jay.vosburgh@canonical.com" <jay.vosburgh@canonical.com>
Subject: RE: [EXT] [PATCH V3] bnx2x: Prevent ptp_task to be rescheduled indefinitely
Date: Thu, 27 Jun 2019 10:09:44 +0000	[thread overview]
Message-ID: <MN2PR18MB2528E0CB660FC35C475816E1D3FD0@MN2PR18MB2528.namprd18.prod.outlook.com> (raw)
In-Reply-To: <20190626201835.31660-1-gpiccoli@canonical.com>


> -----Original Message-----
> From: Guilherme G. Piccoli <gpiccoli@canonical.com>
> Sent: Thursday, June 27, 2019 1:49 AM
> To: GR-everest-linux-l2 <GR-everest-linux-l2@marvell.com>;
> netdev@vger.kernel.org; Sudarsana Reddy Kalluru <skalluru@marvell.com>
> Cc: Ariel Elior <aelior@marvell.com>; gpiccoli@canonical.com;
> jay.vosburgh@canonical.com
> Subject: [EXT] [PATCH V3] bnx2x: Prevent ptp_task to be rescheduled
> indefinitely
> 
> External Email
> 
> ----------------------------------------------------------------------
> Currently bnx2x ptp worker tries to read a register with timestamp
> information in case of TX packet timestamping and in case it fails, the routine
> reschedules itself indefinitely. This was reported as a kworker always at 100%
> of CPU usage, which was narrowed down to be bnx2x ptp_task.
> 
> By following the ioctl handler, we could narrow down the problem to an NTP
> tool (chrony) requesting HW timestamping from bnx2x NIC with RX filter
> zeroed; this isn't reproducible for example with ptp4l (from linuxptp) since
> this tool requests a supported RX filter.
> It seems NIC FW timestamp mechanism cannot work well with
> RX_FILTER_NONE - driver's PTP filter init routine skips a register write to the
> adapter if there's not a supported filter request.
> 
> This patch addresses the problem of bnx2x ptp thread's everlasting
> reschedule by retrying the register read 10 times; between the read
> attempts the thread sleeps for an increasing amount of time starting in 1ms
> to give FW some time to perform the timestamping. If it still fails after all
> retries, we bail out in order to prevent an unbound resource consumption
> from bnx2x.
> 
> The patch also adds an ethtool statistic for accounting the skipped TX
> timestamp packets and it reduces the priority of timestamping error
> messages to prevent log flooding. The code was tested using both linuxptp
> and chrony.
> 
> Reported-and-tested-by: Przemyslaw Hausman
> <przemyslaw.hausman@canonical.com>
> Suggested-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
> ---
> 
> Sudarsana, thanks again for your feedback. I've reduced the sleep times to
> start in 1ms and goes up to 512ms - the sum of sleep times is 1023ms, but
> due to the inherent overhead in sleeping/waking-up procedure, I've
> measured the total delay in the register read loop (on bnx2x_ptp_task) to be
> ~1.6 seconds.  It is almost the 2s value you first suggested as PTP_TIMEOUT.
> 
>  .../net/ethernet/broadcom/bnx2x/bnx2x_cmn.c   | 12 +++++--
>  .../ethernet/broadcom/bnx2x/bnx2x_ethtool.c   |  4 ++-
>  .../net/ethernet/broadcom/bnx2x/bnx2x_main.c  | 36 ++++++++++++++----
> -  .../net/ethernet/broadcom/bnx2x/bnx2x_stats.h |  3 ++
>  4 files changed, 43 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
> b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
> index 008ad0ca89ba..6751cd04e8d8 100644
> --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
> +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
> @@ -3857,9 +3857,17 @@ netdev_tx_t bnx2x_start_xmit(struct sk_buff
> *skb, struct net_device *dev)
> 
>  	if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP)) {
>  		if (!(bp->flags & TX_TIMESTAMPING_EN)) {
> -			BNX2X_ERR("Tx timestamping was not enabled, this
> packet will not be timestamped\n");
> +			bp->eth_stats.ptp_skip_tx_ts++;
> +			netdev_err_once(bp->dev,
> +					"Tx timestamping isn't enabled, this
> packet won't be timestamped\n");
> +			DP(BNX2X_MSG_PTP,
> +			   "Tx timestamping isn't enabled, this packet won't
> be
> +timestamped\n");

Hitting this path is very unlikely and also PTP packets arrive once in a second in general. Either retain BNX2X_ERR() statement or remove the extra call netdev_err_once().

>  		} else if (bp->ptp_tx_skb) {
> -			BNX2X_ERR("The device supports only a single
> outstanding packet to timestamp, this packet will not be timestamped\n");
> +			bp->eth_stats.ptp_skip_tx_ts++;
> +			netdev_err_once(bp->dev,
> +					"Device supports only a single
> outstanding packet to timestamp, this packet won't be timestamped\n");
> +			DP(BNX2X_MSG_PTP,
> +			   "Device supports only a single outstanding packet to
> timestamp,
> +this packet won't be timestamped\n");
Same as above.

>  		} else {
>  			skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;
>  			/* schedule check for Tx timestamp */ diff --git
> a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
> b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
> index 51fc845de31a..4a0ba6801c9e 100644
> --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
> +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
> @@ -182,7 +182,9 @@ static const struct {
>  	{ STATS_OFFSET32(driver_filtered_tx_pkt),
>  				4, false, "driver_filtered_tx_pkt" },
>  	{ STATS_OFFSET32(eee_tx_lpi),
> -				4, true, "Tx LPI entry count"}
> +				4, true, "Tx LPI entry count"},
> +	{ STATS_OFFSET32(ptp_skip_tx_ts),
> +				4, false, "ptp_skipped_tx_tstamp" },
>  };
> 
>  #define BNX2X_NUM_STATS		ARRAY_SIZE(bnx2x_stats_arr)
> diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
> b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
> index 03ac10b1cd1e..af6e7a950a28 100644
> --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
> +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
> @@ -15214,11 +15214,27 @@ static void bnx2x_ptp_task(struct work_struct
> *work)
>  	u32 val_seq;
>  	u64 timestamp, ns;
>  	struct skb_shared_hwtstamps shhwtstamps;
> +	bool bail = true;
> +	int i;
> 
> -	/* Read Tx timestamp registers */
> -	val_seq = REG_RD(bp, port ? NIG_REG_P1_TLLH_PTP_BUF_SEQID :
> -			 NIG_REG_P0_TLLH_PTP_BUF_SEQID);
> -	if (val_seq & 0x10000) {
> +	/* FW may take a while to complete timestamping; try a bit and if it's
> +	 * still not complete, may indicate an error state - bail out then.
> +	 */
> +	for (i = 0; i < 10; i++) {
> +		/* Read Tx timestamp registers */
> +		val_seq = REG_RD(bp, port ?
> NIG_REG_P1_TLLH_PTP_BUF_SEQID :
> +				 NIG_REG_P0_TLLH_PTP_BUF_SEQID);
> +		if (val_seq & 0x10000) {
> +			bail = false;
> +			break;
> +		}
> +
> +		if (!(i % 4)) /* Avoid log flood */
> +			DP(BNX2X_MSG_PTP, "There's no valid Tx timestamp
> yet\n");
This debug statement is not required as we anyway capture it in the error path below.

> +		msleep(1 << i);
> +	}
> +
> +	if (!bail) {
>  		/* There is a valid timestamp value */
>  		timestamp = REG_RD(bp, port ?
> NIG_REG_P1_TLLH_PTP_BUF_TS_MSB :
>  				   NIG_REG_P0_TLLH_PTP_BUF_TS_MSB);
> @@ -15233,16 +15249,18 @@ static void bnx2x_ptp_task(struct work_struct
> *work)
>  		memset(&shhwtstamps, 0, sizeof(shhwtstamps));
>  		shhwtstamps.hwtstamp = ns_to_ktime(ns);
>  		skb_tstamp_tx(bp->ptp_tx_skb, &shhwtstamps);
> -		dev_kfree_skb_any(bp->ptp_tx_skb);
> -		bp->ptp_tx_skb = NULL;
> 
>  		DP(BNX2X_MSG_PTP, "Tx timestamp, timestamp cycles =
> %llu, ns = %llu\n",
>  		   timestamp, ns);
>  	} else {
> -		DP(BNX2X_MSG_PTP, "There is no valid Tx timestamp
> yet\n");
> -		/* Reschedule to keep checking for a valid timestamp value
> */
> -		schedule_work(&bp->ptp_task);
> +		DP(BNX2X_MSG_PTP,
> +		   "Tx timestamp is not recorded (register read=%u)\n",
> +		   val_seq);
> +		bp->eth_stats.ptp_skip_tx_ts++;
>  	}
> +
> +	dev_kfree_skb_any(bp->ptp_tx_skb);
> +	bp->ptp_tx_skb = NULL;
>  }
> 
>  void bnx2x_set_rx_ts(struct bnx2x *bp, struct sk_buff *skb) diff --git
> a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.h
> b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.h
> index b2644ed13d06..d55e63692cf3 100644
> --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.h
> +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.h
> @@ -207,6 +207,9 @@ struct bnx2x_eth_stats {
>  	u32 driver_filtered_tx_pkt;
>  	/* src: Clear-on-Read register; Will not survive PMF Migration */
>  	u32 eee_tx_lpi;
> +
> +	/* PTP */
> +	u32 ptp_skip_tx_ts;
The value need to be cleared in the case of internal reload e.g., mtu change, ifconfig-down/up. If this is not happening, please reset it in the nic load path.

>  };
> 
>  struct bnx2x_eth_q_stats {
> --
> 2.22.0


  reply	other threads:[~2019-06-27 10:10 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-26 20:18 [PATCH V3] bnx2x: Prevent ptp_task to be rescheduled indefinitely Guilherme G. Piccoli
2019-06-27 10:09 ` Sudarsana Reddy Kalluru [this message]
2019-06-27 16:20   ` [EXT] " Guilherme G. Piccoli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=MN2PR18MB2528E0CB660FC35C475816E1D3FD0@MN2PR18MB2528.namprd18.prod.outlook.com \
    --to=skalluru@marvell.com \
    --cc=GR-everest-linux-l2@marvell.com \
    --cc=aelior@marvell.com \
    --cc=gpiccoli@canonical.com \
    --cc=jay.vosburgh@canonical.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).