From mboxrd@z Thu Jan  1 00:00:00 1970
From: Keith Busch <keith.busch@intel.com>
Subject: Re: [PATCH] nvme: Acknowledge completion queue on each iteration
Date: Tue, 18 Jul 2017 10:36:17 -0400
Message-ID: <20170718143617.GA7613@localhost.localdomain>
References: <1500330983-27501-1-git-send-email-okaya@codeaurora.org>
 <20170717224551.GA1496@localhost.localdomain>
 <6d10032c-35ec-978c-6b8f-1ab9c07adf7f@codeaurora.org>
 <20170717225615.GB1496@localhost.localdomain>
 <79413407294645f0e1252112c3435a29@codeaurora.org>
In-Reply-To: <79413407294645f0e1252112c3435a29@codeaurora.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
To: okaya@codeaurora.org
Cc: linux-nvme@lists.infradead.org, timur@codeaurora.org,
 linux-arm-msm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
 Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-kernel@vger.kernel.org

On Mon, Jul 17, 2017 at 07:07:00PM -0400, okaya@codeaurora.org wrote:
> Maybe I need to understand the design better. I was curious why the
> completion and submission queues were protected by a single lock,
> causing lock contention.

Ideally the queues are tied to CPUs, so you couldn't have one thread
submitting to a particular queue pair while another thread is reaping
completions from it. Such a setup wouldn't see lock contention.

Some machines have so many CPUs, though, that sharing hardware queues
is required. We've experimented with separate submission and completion
locks for such cases, but I've never seen improved performance as a
result.
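[Editor's note: a minimal, hypothetical sketch of the two locking schemes
under discussion. The struct and field names below are illustrative only,
not the in-tree nvme driver's definitions.]

#include <linux/spinlock.h>
#include <linux/types.h>

/*
 * Scheme 1: one lock serializes both submission and completion state.
 * With per-CPU queue pairs, submit and reap almost never race on the
 * same queue, so this lock is effectively uncontended.
 */
struct nvmeq_single_lock {
	spinlock_t q_lock;	/* covers both SQ and CQ state */
	u16 sq_tail;
	u16 cq_head;
};

/*
 * Scheme 2: the experiment described above -- split locks for the
 * shared-queue case, so a submitter and a reaper on the same queue
 * pair need not serialize against each other.
 */
struct nvmeq_split_lock {
	spinlock_t sq_lock;	/* protects sq_tail and the SQ doorbell */
	spinlock_t cq_lock;	/* protects cq_head and the CQ doorbell */
	u16 sq_tail;
	u16 cq_head;
};

When queues are per-CPU, both schemes take an uncontended lock on the
fast path, which is consistent with the split showing no measured
improvement.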