All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tejun Heo <tj@kernel.org>
To: linux-kernel@vger.kernel.org
Cc: Tejun Heo <tj@kernel.org>, Sage Weil <sage@newdream.net>,
	ceph-devel@vger.kernel.org
Subject: [PATCH 23/32] net/ceph: make ceph_msgr_wq non-reentrant
Date: Mon,  3 Jan 2011 14:49:46 +0100	[thread overview]
Message-ID: <1294062595-30097-24-git-send-email-tj@kernel.org> (raw)
In-Reply-To: <1294062595-30097-1-git-send-email-tj@kernel.org>

ceph messenger code does a rather complex dancing around multithread
workqueue to make sure the same work item isn't executed concurrently
on different CPUs.  This restriction can be provided by workqueue with
WQ_NON_REENTRANT.

Make ceph_msgr_wq non-reentrant workqueue with the default concurrency
level and remove the QUEUED/BUSY logic.

* This removes backoff handling in con_work() but it couldn't reliably
  block execution of con_work() to begin with - queue_con() can be
  called after the work started but before BUSY is set.  It seems that
  it was an optimization for a rather cold path and can be safely
  removed.

* The number of concurrent work items is bound by the number of
  connections and connetions are independent from each other.  With
  the default concurrency level, different connections will be
  executed independently.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Sage Weil <sage@newdream.net>
Cc: ceph-devel@vger.kernel.org
---
Only compile tested.  I think the dropping of backoff logic is safe
but am not completely sure.  Please verify it's actually okay.  Feel
free to take it into the subsystem tree or simply ack - I'll route it
through the wq tree.

Thanks.

 include/linux/ceph/messenger.h |    5 ----
 net/ceph/messenger.c           |   46 +--------------------------------------
 2 files changed, 2 insertions(+), 49 deletions(-)

diff --git a/include/linux/ceph/messenger.h b/include/linux/ceph/messenger.h
index a108b42..c3011be 100644
--- a/include/linux/ceph/messenger.h
+++ b/include/linux/ceph/messenger.h
@@ -110,17 +110,12 @@ struct ceph_msg_pos {
 
 /*
  * ceph_connection state bit flags
- *
- * QUEUED and BUSY are used together to ensure that only a single
- * thread is currently opening, reading or writing data to the socket.
  */
 #define LOSSYTX         0  /* we can close channel or drop messages on errors */
 #define CONNECTING	1
 #define NEGOTIATING	2
 #define KEEPALIVE_PENDING      3
 #define WRITE_PENDING	4  /* we have data ready to send */
-#define QUEUED          5  /* there is work queued on this connection */
-#define BUSY            6  /* work is being done */
 #define STANDBY		8  /* no outgoing messages, socket closed.  we keep
 			    * the ceph_connection around to maintain shared
 			    * state with the peer. */
diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
index b6ff4a1..dff633d 100644
--- a/net/ceph/messenger.c
+++ b/net/ceph/messenger.c
@@ -96,7 +96,7 @@ struct workqueue_struct *ceph_msgr_wq;
 
 int ceph_msgr_init(void)
 {
-	ceph_msgr_wq = create_workqueue("ceph-msgr");
+	ceph_msgr_wq = alloc_workqueue("ceph-msgr", WQ_NON_REENTRANT, 0);
 	if (!ceph_msgr_wq) {
 		pr_err("msgr_init failed to create workqueue\n");
 		return -ENOMEM;
@@ -1920,20 +1920,6 @@ bad_tag:
 /*
  * Atomically queue work on a connection.  Bump @con reference to
  * avoid races with connection teardown.
- *
- * There is some trickery going on with QUEUED and BUSY because we
- * only want a _single_ thread operating on each connection at any
- * point in time, but we want to use all available CPUs.
- *
- * The worker thread only proceeds if it can atomically set BUSY.  It
- * clears QUEUED and does it's thing.  When it thinks it's done, it
- * clears BUSY, then rechecks QUEUED.. if it's set again, it loops
- * (tries again to set BUSY).
- *
- * To queue work, we first set QUEUED, _then_ if BUSY isn't set, we
- * try to queue work.  If that fails (work is already queued, or BUSY)
- * we give up (work also already being done or is queued) but leave QUEUED
- * set so that the worker thread will loop if necessary.
  */
 static void queue_con(struct ceph_connection *con)
 {
@@ -1948,11 +1934,7 @@ static void queue_con(struct ceph_connection *con)
 		return;
 	}
 
-	set_bit(QUEUED, &con->state);
-	if (test_bit(BUSY, &con->state)) {
-		dout("queue_con %p - already BUSY\n", con);
-		con->ops->put(con);
-	} else if (!queue_work(ceph_msgr_wq, &con->work.work)) {
+	if (!queue_delayed_work(ceph_msgr_wq, &con->work, 0)) {
 		dout("queue_con %p - already queued\n", con);
 		con->ops->put(con);
 	} else {
@@ -1967,15 +1949,6 @@ static void con_work(struct work_struct *work)
 {
 	struct ceph_connection *con = container_of(work, struct ceph_connection,
 						   work.work);
-	int backoff = 0;
-
-more:
-	if (test_and_set_bit(BUSY, &con->state) != 0) {
-		dout("con_work %p BUSY already set\n", con);
-		goto out;
-	}
-	dout("con_work %p start, clearing QUEUED\n", con);
-	clear_bit(QUEUED, &con->state);
 
 	mutex_lock(&con->mutex);
 
@@ -1994,28 +1967,13 @@ more:
 	    try_read(con) < 0 ||
 	    try_write(con) < 0) {
 		mutex_unlock(&con->mutex);
-		backoff = 1;
 		ceph_fault(con);     /* error/fault path */
 		goto done_unlocked;
 	}
 
 done:
 	mutex_unlock(&con->mutex);
-
 done_unlocked:
-	clear_bit(BUSY, &con->state);
-	dout("con->state=%lu\n", con->state);
-	if (test_bit(QUEUED, &con->state)) {
-		if (!backoff || test_bit(OPENING, &con->state)) {
-			dout("con_work %p QUEUED reset, looping\n", con);
-			goto more;
-		}
-		dout("con_work %p QUEUED reset, but just faulted\n", con);
-		clear_bit(QUEUED, &con->state);
-	}
-	dout("con_work %p done\n", con);
-
-out:
 	con->ops->put(con);
 }
 
-- 
1.7.1


  parent reply	other threads:[~2011-01-03 13:52 UTC|newest]

Thread overview: 119+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-01-03 13:49 [PATCHSET] workqueue: update workqueue users - replace create_workqueue() Tejun Heo
2011-01-03 13:49 ` [PATCH 01/32] arm/omap: use system_wq in mailbox Tejun Heo
2011-01-03 21:35   ` Kanigeri, Hari
2011-01-04  5:24     ` Tejun Heo
2011-01-25 13:47       ` Tejun Heo
2011-01-25 15:34         ` Hari Kanigeri
2011-01-25 15:34           ` Hari Kanigeri
2011-01-25 15:37           ` Tejun Heo
2011-01-03 13:49 ` [PATCH 02/32] powerpc/cell: use system_wq in cpufreq_spudemand Tejun Heo
2011-01-03 13:49   ` Tejun Heo
2011-01-03 13:49 ` [PATCH 03/32] block: make kblockd_workqueue smarter Tejun Heo
2011-01-03 14:00   ` Jens Axboe
2011-01-03 13:49 ` [PATCH 04/32] bio-integrity: mark kintegrityd_wq highpri and CPU intensive Tejun Heo
2011-01-03 13:49 ` [PATCH 05/32] crypto: mark crypto workqueues CPU_INTENSIVE Tejun Heo
2011-01-03 13:49   ` Tejun Heo
2011-01-04  4:38   ` Herbert Xu
2011-01-04  4:38     ` Herbert Xu
2011-01-03 13:49 ` [PATCH 06/32] acpi: kacpi*_wq don't need WQ_MEM_RECLAIM Tejun Heo
2011-01-03 13:49 ` [PATCH 07/32] cpufreq: use system_wq instead of dedicated workqueues Tejun Heo
2011-01-03 13:49 ` [PATCH 08/32] drm/nouveau: use system_wq instead of dev_priv->wq Tejun Heo
2011-01-03 13:49   ` Tejun Heo
2011-01-05  1:07   ` Ben Skeggs
2011-01-05  1:16     ` Ben Skeggs
2011-01-06 17:29       ` Tejun Heo
2011-01-26 16:49         ` [PATCH UPDATED " Tejun Heo
2011-02-01 10:41           ` Tejun Heo
2011-02-04  1:53             ` Ben Skeggs
2011-02-04 11:03               ` Tejun Heo
2011-01-03 13:49 ` [PATCH 09/32] drm/radeon: " Tejun Heo
2011-01-03 13:49   ` Tejun Heo
2011-01-05  0:21   ` Alex Deucher
2011-01-06  4:31     ` Dave Airlie
2011-01-03 13:49 ` [PATCH 10/32] input/tps6507x-ts: use system_wq instead of dedicated workqueue Tejun Heo
2011-01-03 14:39   ` Dan Carpenter
2011-01-03 16:34     ` Todd Fischer
2011-01-25 14:19       ` Tejun Heo
2011-01-25 16:13         ` Todd Fischer
2011-01-25 16:50           ` Dmitry Torokhov
2011-01-26 10:43             ` Tejun Heo
2011-01-03 13:49 ` [PATCH 11/32] v4l/cx18: update workqueue usage Tejun Heo
2011-01-04  0:54   ` Andy Walls
2011-01-04  8:36     ` Tejun Heo
2011-01-04 13:21       ` Andy Walls
2011-01-08 17:03   ` Andy Walls
2011-01-03 13:49 ` [PATCH 12/32] i2o: use alloc_workqueue() instead of create_workqueue() Tejun Heo
2011-01-03 13:49 ` [PATCH 13/32] misc/iwmc3200top: use system_wq instead of dedicated workqueues Tejun Heo
2011-01-03 13:49 ` [PATCH 14/32] wireless/ipw2x00: " Tejun Heo
2011-01-06 20:51   ` John W. Linville
2011-01-03 13:49 ` [PATCH 15/32] wireless/libertas[_tf]: " Tejun Heo
2011-02-01 10:52   ` Tejun Heo
2011-01-03 13:49 ` [PATCH 16/32] scsi/be2iscsi,qla2xxx: convert to alloc_workqueue() Tejun Heo
2011-02-01 23:45   ` Mike Christie
2011-02-02 10:25     ` Tejun Heo
2011-02-02 20:41       ` Mike Christie
2011-02-03  9:31         ` Tejun Heo
2011-01-03 13:49 ` [PATCH 17/32] scsi/ibmvstgt: use system_wq instead of vtgtd workqueue Tejun Heo
2011-01-03 17:45   ` Bart Van Assche
2011-01-04  5:20     ` Tejun Heo
2011-01-24 16:09   ` Bart Van Assche
2011-01-24 16:24     ` Tejun Heo
2011-02-01 10:40       ` Tejun Heo
2011-02-01 14:18         ` FUJITA Tomonori
2011-02-01 14:25           ` James Bottomley
2011-02-01 14:29           ` Tejun Heo
2011-01-03 13:49 ` [PATCH 18/32] scsi/scsi_tgt_lib: scsi_tgtd isn't used in memory reclaim path Tejun Heo
2011-01-03 13:49 ` [PATCH 19/32] usb/ueagle-atm: use system_wq instead of dedicated workqueues Tejun Heo
2011-01-03 13:49 ` [PATCH 20/32] video/msm_fb: " Tejun Heo
     [not found]   ` <1294062595-30097-21-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2011-01-03 17:06     ` Stanislaw Gruszka
2011-01-03 17:06       ` Stanislaw Gruszka
     [not found]       ` <20110103170615.GB2285-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2011-01-03 18:10         ` Daniel Walker
2011-01-03 18:10           ` Daniel Walker
     [not found]           ` <1294078245.18295.5.camel-y5Owza0q8UhBVvN7MMdr1KRtKmQZhJ7pQQ4Iyu8u01E@public.gmane.org>
2011-01-25 13:45             ` Tejun Heo
2011-01-25 13:45               ` Tejun Heo
     [not found]               ` <20110125134558.GY27510-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2011-01-25 18:00                 ` Daniel Walker
2011-01-25 18:00                   ` Daniel Walker
2011-01-25 19:14                   ` David Brown
2011-01-25 19:16                     ` Daniel Walker
     [not found]                       ` <1295982976.25496.29.camel-y5Owza0q8UhBVvN7MMdr1KRtKmQZhJ7pQQ4Iyu8u01E@public.gmane.org>
2011-01-25 22:08                         ` Carl Vanderlip
2011-01-25 22:08                           ` Carl Vanderlip
2011-01-25 23:21   ` David Brown
2011-01-26 10:41     ` Tejun Heo
2011-01-26 18:04       ` David Brown
2011-01-03 13:49 ` [PATCH 21/32] fs/aio: aio_wq isn't used in memory reclaim path Tejun Heo
2011-01-04 15:56   ` Jeff Moyer
2011-01-05 11:28     ` Tejun Heo
2011-01-05 14:50       ` Jeff Moyer
2011-01-05 15:00         ` Benjamin LaHaise
2011-01-05 15:49           ` Jeff Moyer
2011-01-26 11:21   ` [PATCH UPDATED " Tejun Heo
2011-01-26 16:29     ` Jeff Moyer
2011-01-26 16:38       ` Tejun Heo
2011-01-03 13:49 ` [PATCH 22/32] ceph: fsc->*_wq's aren't " Tejun Heo
2011-01-03 17:58   ` Sage Weil
2011-01-03 13:49 ` Tejun Heo [this message]
2011-01-03 17:58   ` [PATCH 23/32] net/ceph: make ceph_msgr_wq non-reentrant Sage Weil
2011-01-03 13:49 ` [PATCH 24/32] dlm: dlm workqueues aren't used in memory reclaim path Tejun Heo
2011-01-03 13:58   ` Steven Whitehouse
2011-01-03 13:58     ` [Cluster-devel] " Steven Whitehouse
2011-01-03 14:01     ` Tejun Heo
2011-01-03 14:21       ` Steven Whitehouse
2011-01-03 14:21         ` [Cluster-devel] " Steven Whitehouse
2011-01-03 14:27         ` Tejun Heo
2011-01-03 14:39           ` Steven Whitehouse
2011-01-03 14:39             ` [Cluster-devel] " Steven Whitehouse
2011-01-03 14:44             ` Tejun Heo
2011-01-03 13:49 ` [PATCH 25/32] ext4: convert to alloc_workqueue() Tejun Heo
2011-01-03 13:49 ` [PATCH 26/32] ocfs2: use system_wq instead of ocfs2_quota_wq Tejun Heo
2011-01-03 13:49 ` [PATCH 27/32] reiserfs: make commit_wq use the default concurrency level Tejun Heo
2011-01-03 13:49 ` [PATCH 28/32] xfs: convert to alloc_workqueue() Tejun Heo
2011-01-03 13:49 ` [PATCH 29/32] net/9p: use system_wq instead of p9_mux_wq Tejun Heo
2011-01-03 13:49 ` [PATCH 30/32] net/9p: replace p9_poll_task with a work Tejun Heo
2011-01-03 13:49 ` [PATCH 31/32] rds/ib: use system_wq instead of rds_ib_fmr_wq Tejun Heo
2011-01-03 13:49 ` [PATCH 32/32] rxrpc: rxrpc_workqueue isn't used during memory reclaim Tejun Heo
2011-01-25 14:29 ` [PATCHSET] workqueue: update workqueue users - replace create_workqueue() Tejun Heo
2011-01-25 16:26   ` Dave Jones
2011-01-26 10:40     ` Tejun Heo
2011-01-25 17:47   ` Madhu Iyengar
2011-01-26  4:46   ` Greg KH
2011-02-01 10:44   ` Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1294062595-30097-24-git-send-email-tj@kernel.org \
    --to=tj@kernel.org \
    --cc=ceph-devel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=sage@newdream.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.