From: Noa Osherovich <noaos@mellanox.com>
To: Christoph Hellwig <hch@lst.de>
Cc: sagi@grimberg.me, linux-rdma@vger.kernel.org,
	Majd Dibbiny <majd@mellanox.com>,
	tj@kernel.org, linux-kernel@vger.kernel.org
Subject: Re: Poll CQ syncing problem
Date: Wed, 1 Mar 2017 17:28:41 +0200
Message-ID: <67049755-56c9-d2ba-c7c1-4a1593a5706f@mellanox.com>
In-Reply-To: <20170301145124.GA12121@lst.de>

On 3/1/2017 4:51 PM, Christoph Hellwig wrote:

> On Wed, Mar 01, 2017 at 04:30:26PM +0200, Noa Osherovich wrote:
>> Analysis:
>> Since ib_comp_wq isn't single threaded, two works can run in parallel for the same CQ,
>> executing __ib_process_cq.
> They shouldn't.  Each CQ has a single work_struct, and any given work_struct
> should only be executing once at any given time:
>
> "Note that the flag ``WQ_NON_REENTRANT`` no longer exists as all
> workqueues are now non-reentrant - any work item is guaranteed to be
> executed by at most one worker system-wide at any given time."
>
>> Since this function isn't thread safe and the wc array is shared, it causes a data corruption
>> which eventually crashes in the MAD layer due to a double list_del of the same element.
> This should not be the case.  What kernel version are you testing and does
> it contain any patches touching core kernel code?

Thanks Christoph for the quick response.

Currently we see this only in old kernels. I'll investigate further and report back.
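
The non-reentrancy guarantee quoted above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code: `struct demo_work`, `demo_queue_work()`, `demo_worker()` and `demo_max_concurrency()` are hypothetical names, and a pthread mutex plus condition variable stand in for the real workqueue machinery. Several workers compete for one work item, but a worker claims it only when no other worker is executing it, so the handler never overlaps with itself.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

#define NUM_WORKERS        4
#define NUM_QUEUE_ATTEMPTS 1000

/* Hypothetical userspace stand-in for one work item (not work_struct). */
struct demo_work {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool pending;    /* queued, not yet claimed by a worker */
	bool running;    /* a worker is inside the handler right now */
	bool stop;       /* tells the workers to exit */
	int active;      /* handlers executing at this moment */
	int max_active;  /* highest value 'active' ever reached */
};

/* Queue the item; returns false if it was already pending. */
static bool demo_queue_work(struct demo_work *w)
{
	bool queued = false;

	pthread_mutex_lock(&w->lock);
	if (!w->pending) {
		w->pending = true;
		queued = true;
		pthread_cond_broadcast(&w->cond);
	}
	pthread_mutex_unlock(&w->lock);
	return queued;
}

static void *demo_worker(void *arg)
{
	struct demo_work *w = arg;

	pthread_mutex_lock(&w->lock);
	for (;;) {
		/* Claim the item only if it is pending and nobody runs it. */
		while (!w->stop && !(w->pending && !w->running))
			pthread_cond_wait(&w->cond, &w->lock);
		if (w->stop)
			break;
		w->pending = false;
		w->running = true;
		if (++w->active > w->max_active)
			w->max_active = w->active;
		pthread_mutex_unlock(&w->lock);

		sched_yield();  /* handler body runs without the lock held */

		pthread_mutex_lock(&w->lock);
		w->active--;
		w->running = false;
		pthread_cond_broadcast(&w->cond);
	}
	pthread_mutex_unlock(&w->lock);
	return NULL;
}

/* Runs the model and returns the peak number of concurrent handlers. */
int demo_max_concurrency(void)
{
	struct demo_work w = { .pending = false };
	pthread_t workers[NUM_WORKERS];
	int i, rc;

	pthread_mutex_init(&w.lock, NULL);
	pthread_cond_init(&w.cond, NULL);
	for (i = 0; i < NUM_WORKERS; i++) {
		rc = pthread_create(&workers[i], NULL, demo_worker, &w);
		assert(rc == 0);
	}
	for (i = 0; i < NUM_QUEUE_ATTEMPTS; i++) {
		demo_queue_work(&w);  /* re-queue, possibly while running */
		sched_yield();
	}

	/* Drain: wait until the item is neither pending nor running. */
	pthread_mutex_lock(&w.lock);
	while (w.pending || w.running)
		pthread_cond_wait(&w.cond, &w.lock);
	w.stop = true;
	pthread_cond_broadcast(&w.cond);
	pthread_mutex_unlock(&w.lock);

	for (i = 0; i < NUM_WORKERS; i++)
		pthread_join(workers[i], NULL);
	pthread_cond_destroy(&w.cond);
	pthread_mutex_destroy(&w.lock);
	return w.max_active;
}
```

Calling `demo_max_concurrency()` from a `main()` (built with `-pthread`) should report a peak of one concurrent handler even though four workers are available. If an old kernel lets two workers enter `__ib_process_cq` for the same CQ, it is the equivalent of the `running` check in this model that is not being honoured.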

