linux-scsi.vger.kernel.org archive mirror
From: Manish Rangankar <mrangankar@marvell.com>
To: Mike Christie <michael.christie@oracle.com>,
	"martin.petersen@oracle.com" <martin.petersen@oracle.com>,
	"lduncan@suse.com" <lduncan@suse.com>,
	"cleech@redhat.com" <cleech@redhat.com>
Cc: "linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	GR-QLogic-Storage-Upstream
	<GR-QLogic-Storage-Upstream@marvell.com>
Subject: RE: [EXT] Re: [PATCH] qedi: Fix cmd_cleanup_cmpl counter mismatch issue.
Date: Wed, 24 Nov 2021 06:05:28 +0000
Message-ID: <PH0PR18MB4425F4F08057B89453C2222ED8619@PH0PR18MB4425.namprd18.prod.outlook.com>
In-Reply-To: <9c21c019-d6ff-a908-80e5-51b9c765d118@oracle.com>

> >
> >  check_cleanup_reqs:
> >  	if (qedi_conn->cmd_cleanup_req > 0) {
> > -		QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_TID,
> > -			  "Freeing tid=0x%x for cid=0x%x\n",
> > -			  cqe->itid, qedi_conn->iscsi_conn_id);
> > -		qedi_conn->cmd_cleanup_cmpl++;
> > +		++qedi_conn->cmd_cleanup_cmpl;
> > +		QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_SCSI_TM,
> > +			  "Freeing tid=0x%x for cid=0x%x cleanup count=%d\n",
> > +			  cqe->itid, qedi_conn->iscsi_conn_id,
> > +			  qedi_conn->cmd_cleanup_cmpl);
> 
> Is the issue that cmd_cleanup_cmpl's increment is not seen by
> qedi_cleanup_all_io's wait_event_interruptible_timeout call when it wakes up,
> and your patch fixes this by doing a pre increment?
> 

Yes, cmd_cleanup_cmpl's increment is not seen by qedi_cleanup_all_io's
wait_event_interruptible_timeout call when it wakes up, even after the firmware
posts all the ISCSI_CQE_TYPE_TASK_CLEANUP events for the requested cmd_cleanup_req.
And yes, the pre-increment did address this issue. Do you feel otherwise?

> Does doing a pre increment give you barrier like behavior and is that why this
> works? I thought if wake_up ends up waking up the other thread it does a barrier
> already, so it's not clear to me how changing to a pre-increment helps.
> 
> Is doing a pre-increment a common way to handle this? It looks like we do a
> post increment and wake_up* in other places. However, like in the scsi layer we
> do wake_up_process and memory-barriers.txt says that always does a general
> barrier, so is that why we can do a post increment there?
> 
> Does pre-increment give you barrier like behavior, and is the wake_up call not
> waking up the process so we didn't get a barrier from that, and so that's why this
> works?
> 

The issue happens before wake_up is called. When we get a surge of ISCSI_CQE_TYPE_TASK_CLEANUP
events on multiple Rx threads, cmd_cleanup_cmpl tends to miss increments. The scenario is closer to
multiple threads racing on cmd_cleanup_cmpl during the postfix increment: two threads can read the
same value at the same time, and both then write back the same incremented value, so one update is lost.

Now that I am explaining it, I think that instead of pre-incrementing cmd_cleanup_cmpl,
we should make it an atomic variable. Do you see any issue with that?

From the logs:
-------------------------------------------------------
[root@rhel82-leo RHEL90_LOGS]# grep -inr "qedi_iscsi_cleanup_task:2160" conn_err.log | wc -l
99

[root@rhel82-leo RHEL90_LOGS]# grep -inr "qedi_cleanup_all_io:1215" conn_err.log | wc -l
99

[root@rhel82-leo RHEL90_LOGS]# grep -inr "qedi_fp_process_cqes:925" conn_err.log | wc -l
99

[root@rhel82-leo RHEL90_LOGS]# grep -inr "qedi_fp_process_cqes:922" conn_err.log | wc -l
99

[Thu Oct 21 22:03:32 2021] [0000:a5:00.5]:[qedi_cleanup_all_io:1246]:18: i/o cmd_cleanup_req=99, not equal to cmd_cleanup_cmpl=97, cid=0x0   <<<
[Thu Oct 21 22:03:38 2021] [0000:a5:00.5]:[qedi_clearsq:1299]:18: fatal error, need hard reset, cid=0x0
-----------------------------------------------------


Thread overview: 6+ messages
2021-11-23 12:21 [PATCH] qedi: Fix cmd_cleanup_cmpl counter mismatch issue Manish Rangankar
2021-11-23 18:04 ` Lee Duncan
2021-11-23 21:16 ` Mike Christie
2021-11-24  6:05   ` Manish Rangankar [this message]
2021-11-24 17:41     ` [EXT] " michael.christie
2021-11-25  5:34       ` Manish Rangankar
