linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/17] DRBD updates
@ 2017-08-24 21:22 Philipp Reisner
  2017-08-24 21:22 ` [PATCH 01/17] drbd: introduce drbd_recv_header_maybe_unplug Philipp Reisner
                   ` (16 more replies)
  0 siblings, 17 replies; 21+ messages in thread
From: Philipp Reisner @ 2017-08-24 21:22 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel; +Cc: drbd-dev

Hi Jens,

Please consider these patches for your for-4.14 branch.

The first and third patch help with request merging on DRBD's secondary side.
That can improves performance for some workloads.

The other patches are fixes and random mentenance.


Baoyou Xie (1):
  drbd: mark symbols static where possible

Geliang Tang (1):
  drbd: Use setup_timer() instead of init_timer() to simplify the code.

Greg Kroah-Hartman (1):
  drbd: rename "usermode_helper" to "drbd_usermode_helper"

Lars Ellenberg (9):
  drbd: introduce drbd_recv_header_maybe_unplug
  drbd: change list_for_each_safe to while(list_first_entry_or_null)
  drbd: add explicit plugging when submitting batches
  drbd: Send P_NEG_ACK upon write error in protocol != C
  drbd: new disk-option disable-write-same
  drbd: fix potential get_ldev/put_ldev refcount imbalance during attach
  drbd: fix rmmod cleanup, remove _all_ debugfs entries
  drbd: fix potential deadlock when trying to detach during handshake
  drbd: fix race between handshake and admin disconnect/down

Markus Elfring (1):
  drbd: A single dot should be put into a sequence.

Philipp Reisner (1):
  drbd: Fix resource role for newly created resources in events2

Roland Kammerer (3):
  drbd: move global variables to drbd namespace and make some static
  drbd: abort drbd_start_resync if there is no connection
  drbd: switch from kmalloc() to kmalloc_array()

 drivers/block/drbd/drbd_int.h      |  27 +++++-----
 drivers/block/drbd/drbd_main.c     | 106 +++++++++++++++++++++----------------
 drivers/block/drbd/drbd_nl.c       |  60 +++++++++------------
 drivers/block/drbd/drbd_proc.c     |  10 ++--
 drivers/block/drbd/drbd_receiver.c |  56 +++++++++++++++++---
 drivers/block/drbd/drbd_req.c      |  80 ++++++++++++++++++++++++++--
 drivers/block/drbd/drbd_req.h      |   6 +++
 drivers/block/drbd/drbd_state.c    |  48 ++++++++++++++++-
 drivers/block/drbd/drbd_state.h    |   8 +++
 drivers/block/drbd/drbd_worker.c   |  46 ++++++++++++----
 include/linux/drbd.h               |   2 +-
 include/linux/drbd_genl.h          |   3 +-
 include/linux/drbd_limits.h        |   8 ++-
 13 files changed, 333 insertions(+), 127 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 01/17] drbd: introduce drbd_recv_header_maybe_unplug
  2017-08-24 21:22 [PATCH 00/17] DRBD updates Philipp Reisner
@ 2017-08-24 21:22 ` Philipp Reisner
  2017-08-25 17:26   ` Jens Axboe
  2017-08-24 21:22 ` [PATCH 02/17] drbd: change list_for_each_safe to while(list_first_entry_or_null) Philipp Reisner
                   ` (15 subsequent siblings)
  16 siblings, 1 reply; 21+ messages in thread
From: Philipp Reisner @ 2017-08-24 21:22 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel; +Cc: drbd-dev

From: Lars Ellenberg <lars.ellenberg@linbit.com>

Recently, drbd_recv_header() was changed to potentially
implicitly "unplug" the backend device(s), in case there
is currently nothing to receive.

Be more explicit about it: re-introduce the original drbd_recv_header(),
and introduce a new drbd_recv_header_maybe_unplug() for use by the
receiver "main loop".

Using explicit plugging via blk_start_plug(); blk_finish_plug();
really helps the io-scheduler of the backend with merging requests.

Wrap the receiver "main loop" with such a plug.
Also catch unplug events on the Primary,
and try to propagate.

This is performance relevant.  Without this, if the receiving side does
not merge requests, number of IOPS on the peer can me significantly
higher than IOPS on the Primary, and can easily become the bottleneck.

Together, both changes should help to reduce the number of IOPS
as seen on the backend of the receiving side, by increasing
the chance of merging mergable requests, without trading latency
for more throughput.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

diff --git a/drivers/block/drbd/drbd_int.h b/drivers/block/drbd/drbd_int.h
index 819f9d0..74a7d0b 100644
--- a/drivers/block/drbd/drbd_int.h
+++ b/drivers/block/drbd/drbd_int.h
@@ -745,6 +745,8 @@ struct drbd_connection {
 	unsigned current_tle_writes;	/* writes seen within this tl epoch */
 
 	unsigned long last_reconnect_jif;
+	/* empty member on older kernels without blk_start_plug() */
+	struct blk_plug receiver_plug;
 	struct drbd_thread receiver;
 	struct drbd_thread worker;
 	struct drbd_thread ack_receiver;
@@ -1131,7 +1133,8 @@ extern void conn_send_sr_reply(struct drbd_connection *connection, enum drbd_sta
 extern int drbd_send_rs_deallocated(struct drbd_peer_device *, struct drbd_peer_request *);
 extern void drbd_backing_dev_free(struct drbd_device *device, struct drbd_backing_dev *ldev);
 extern void drbd_device_cleanup(struct drbd_device *device);
-void drbd_print_uuids(struct drbd_device *device, const char *text);
+extern void drbd_print_uuids(struct drbd_device *device, const char *text);
+extern void drbd_queue_unplug(struct drbd_device *device);
 
 extern void conn_md_sync(struct drbd_connection *connection);
 extern void drbd_md_write(struct drbd_device *device, void *buffer);
diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index e2ed28d..a3b2ee7 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -1952,6 +1952,19 @@ static void drbd_release(struct gendisk *gd, fmode_t mode)
 	mutex_unlock(&drbd_main_mutex);
 }
 
+/* need to hold resource->req_lock */
+void drbd_queue_unplug(struct drbd_device *device)
+{
+	if (device->state.pdsk >= D_INCONSISTENT && device->state.conn >= C_CONNECTED) {
+		D_ASSERT(device, device->state.role == R_PRIMARY);
+		if (test_and_clear_bit(UNPLUG_REMOTE, &device->flags)) {
+			drbd_queue_work_if_unqueued(
+				&first_peer_device(device)->connection->sender_work,
+				&device->unplug_work);
+		}
+	}
+}
+
 static void drbd_set_defaults(struct drbd_device *device)
 {
 	/* Beware! The actual layout differs
diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c
index ece6e5d..1b3f439 100644
--- a/drivers/block/drbd/drbd_receiver.c
+++ b/drivers/block/drbd/drbd_receiver.c
@@ -1194,6 +1194,14 @@ static int decode_header(struct drbd_connection *connection, void *header, struc
 	return 0;
 }
 
+static void drbd_unplug_all_devices(struct drbd_connection *connection)
+{
+	if (current->plug == &connection->receiver_plug) {
+		blk_finish_plug(&connection->receiver_plug);
+		blk_start_plug(&connection->receiver_plug);
+	} /* else: maybe just schedule() ?? */
+}
+
 static int drbd_recv_header(struct drbd_connection *connection, struct packet_info *pi)
 {
 	void *buffer = connection->data.rbuf;
@@ -1209,6 +1217,36 @@ static int drbd_recv_header(struct drbd_connection *connection, struct packet_in
 	return err;
 }
 
+static int drbd_recv_header_maybe_unplug(struct drbd_connection *connection, struct packet_info *pi)
+{
+	void *buffer = connection->data.rbuf;
+	unsigned int size = drbd_header_size(connection);
+	int err;
+
+	err = drbd_recv_short(connection->data.socket, buffer, size, MSG_NOSIGNAL|MSG_DONTWAIT);
+	if (err != size) {
+		/* If we have nothing in the receive buffer now, to reduce
+		 * application latency, try to drain the backend queues as
+		 * quickly as possible, and let remote TCP know what we have
+		 * received so far. */
+		if (err == -EAGAIN) {
+			drbd_tcp_quickack(connection->data.socket);
+			drbd_unplug_all_devices(connection);
+		}
+		if (err > 0) {
+			buffer += err;
+			size -= err;
+		}
+		err = drbd_recv_all_warn(connection, buffer, size);
+		if (err)
+			return err;
+	}
+
+	err = decode_header(connection, connection->data.rbuf, pi);
+	connection->last_received = jiffies;
+
+	return err;
+}
 /* This is blkdev_issue_flush, but asynchronous.
  * We want to submit to all component volumes in parallel,
  * then wait for all completions.
@@ -4882,8 +4920,8 @@ static void drbdd(struct drbd_connection *connection)
 		struct data_cmd const *cmd;
 
 		drbd_thread_current_set_cpu(&connection->receiver);
-		update_receiver_timing_details(connection, drbd_recv_header);
-		if (drbd_recv_header(connection, &pi))
+		update_receiver_timing_details(connection, drbd_recv_header_maybe_unplug);
+		if (drbd_recv_header_maybe_unplug(connection, &pi))
 			goto err_out;
 
 		cmd = &drbd_cmd_handler[pi.cmd];
@@ -5375,8 +5413,11 @@ int drbd_receiver(struct drbd_thread *thi)
 		}
 	} while (h == 0);
 
-	if (h > 0)
+	if (h > 0) {
+		blk_start_plug(&connection->receiver_plug);
 		drbdd(connection);
+		blk_finish_plug(&connection->receiver_plug);
+	}
 
 	conn_disconnect(connection);
 
diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
index 447c975..5955ab8 100644
--- a/drivers/block/drbd/drbd_req.c
+++ b/drivers/block/drbd/drbd_req.c
@@ -1279,6 +1279,62 @@ static bool may_do_writes(struct drbd_device *device)
 	return s.disk == D_UP_TO_DATE || s.pdsk == D_UP_TO_DATE;
 }
 
+#ifndef blk_queue_plugged
+struct drbd_plug_cb {
+	struct blk_plug_cb cb;
+	struct drbd_request *most_recent_req;
+	/* do we need more? */
+};
+
+static void drbd_unplug(struct blk_plug_cb *cb, bool from_schedule)
+{
+	struct drbd_plug_cb *plug = container_of(cb, struct drbd_plug_cb, cb);
+	struct drbd_resource *resource = plug->cb.data;
+	struct drbd_request *req = plug->most_recent_req;
+
+	if (!req)
+		return;
+
+	spin_lock_irq(&resource->req_lock);
+	/* In case the sender did not process it yet, raise the flag to
+	 * have it followed with P_UNPLUG_REMOTE just after. */
+	req->rq_state |= RQ_UNPLUG;
+	/* but also queue a generic unplug */
+	drbd_queue_unplug(req->device);
+	spin_unlock_irq(&resource->req_lock);
+	kref_put(&req->kref, drbd_req_destroy);
+}
+
+static struct drbd_plug_cb* drbd_check_plugged(struct drbd_resource *resource)
+{
+	/* A lot of text to say
+	 * return (struct drbd_plug_cb*)blk_check_plugged(); */
+	struct drbd_plug_cb *plug;
+	struct blk_plug_cb *cb = blk_check_plugged(drbd_unplug, resource, sizeof(*plug));
+
+	if (cb)
+		plug = container_of(cb, struct drbd_plug_cb, cb);
+	else
+		plug = NULL;
+	return plug;
+}
+
+static void drbd_update_plug(struct drbd_plug_cb *plug, struct drbd_request *req)
+{
+	struct drbd_request *tmp = plug->most_recent_req;
+	/* Will be sent to some peer.
+	 * Remember to tag it with UNPLUG_REMOTE on unplug */
+	kref_get(&req->kref);
+	plug->most_recent_req = req;
+	if (tmp)
+		kref_put(&tmp->kref, drbd_req_destroy);
+}
+#else
+struct drbd_plug_cb { };
+static void * drbd_check_plugged(struct drbd_resource *resource) { return NULL; };
+static void drbd_update_plug(struct drbd_plug_cb *plug, struct drbd_request *req) { };
+#endif
+
 static void drbd_send_and_submit(struct drbd_device *device, struct drbd_request *req)
 {
 	struct drbd_resource *resource = device->resource;
@@ -1287,6 +1343,8 @@ static void drbd_send_and_submit(struct drbd_device *device, struct drbd_request
 	bool no_remote = false;
 	bool submit_private_bio = false;
 
+	struct drbd_plug_cb *plug = drbd_check_plugged(resource);
+
 	spin_lock_irq(&resource->req_lock);
 	if (rw == WRITE) {
 		/* This may temporarily give up the req_lock,
@@ -1351,6 +1409,9 @@ static void drbd_send_and_submit(struct drbd_device *device, struct drbd_request
 			no_remote = true;
 	}
 
+	if (plug != NULL && no_remote == false)
+		drbd_update_plug(plug, req);
+
 	/* If it took the fast path in drbd_request_prepare, add it here.
 	 * The slow path has added it already. */
 	if (list_empty(&req->req_pending_master_completion))
diff --git a/drivers/block/drbd/drbd_req.h b/drivers/block/drbd/drbd_req.h
index 9e1866a..a2254f8 100644
--- a/drivers/block/drbd/drbd_req.h
+++ b/drivers/block/drbd/drbd_req.h
@@ -212,6 +212,11 @@ enum drbd_req_state_bits {
 	/* Should call drbd_al_complete_io() for this request... */
 	__RQ_IN_ACT_LOG,
 
+	/* This was the most recent request during some blk_finish_plug()
+	 * or its implicit from-schedule equivalent.
+	 * We may use it as hint to send a P_UNPLUG_REMOTE */
+	__RQ_UNPLUG,
+
 	/* The peer has sent a retry ACK */
 	__RQ_POSTPONED,
 
@@ -249,6 +254,7 @@ enum drbd_req_state_bits {
 #define RQ_WSAME           (1UL << __RQ_WSAME)
 #define RQ_UNMAP           (1UL << __RQ_UNMAP)
 #define RQ_IN_ACT_LOG      (1UL << __RQ_IN_ACT_LOG)
+#define RQ_UNPLUG          (1UL << __RQ_UNPLUG)
 #define RQ_POSTPONED	   (1UL << __RQ_POSTPONED)
 #define RQ_COMPLETION_SUSP (1UL << __RQ_COMPLETION_SUSP)
 #define RQ_EXP_RECEIVE_ACK (1UL << __RQ_EXP_RECEIVE_ACK)
diff --git a/drivers/block/drbd/drbd_worker.c b/drivers/block/drbd/drbd_worker.c
index c268d88..2745db2 100644
--- a/drivers/block/drbd/drbd_worker.c
+++ b/drivers/block/drbd/drbd_worker.c
@@ -1382,18 +1382,22 @@ static int drbd_send_barrier(struct drbd_connection *connection)
 	return conn_send_command(connection, sock, P_BARRIER, sizeof(*p), NULL, 0);
 }
 
+static int pd_send_unplug_remote(struct drbd_peer_device *pd)
+{
+	struct drbd_socket *sock = &pd->connection->data;
+	if (!drbd_prepare_command(pd, sock))
+		return -EIO;
+	return drbd_send_command(pd, sock, P_UNPLUG_REMOTE, 0, NULL, 0);
+}
+
 int w_send_write_hint(struct drbd_work *w, int cancel)
 {
 	struct drbd_device *device =
 		container_of(w, struct drbd_device, unplug_work);
-	struct drbd_socket *sock;
 
 	if (cancel)
 		return 0;
-	sock = &first_peer_device(device)->connection->data;
-	if (!drbd_prepare_command(first_peer_device(device), sock))
-		return -EIO;
-	return drbd_send_command(first_peer_device(device), sock, P_UNPLUG_REMOTE, 0, NULL, 0);
+	return pd_send_unplug_remote(first_peer_device(device));
 }
 
 static void re_init_if_first_write(struct drbd_connection *connection, unsigned int epoch)
@@ -1455,6 +1459,7 @@ int w_send_dblock(struct drbd_work *w, int cancel)
 	struct drbd_device *device = req->device;
 	struct drbd_peer_device *const peer_device = first_peer_device(device);
 	struct drbd_connection *connection = peer_device->connection;
+	bool do_send_unplug = req->rq_state & RQ_UNPLUG;
 	int err;
 
 	if (unlikely(cancel)) {
@@ -1470,6 +1475,9 @@ int w_send_dblock(struct drbd_work *w, int cancel)
 	err = drbd_send_dblock(peer_device, req);
 	req_mod(req, err ? SEND_FAILED : HANDED_OVER_TO_NETWORK);
 
+	if (do_send_unplug && !err)
+		pd_send_unplug_remote(peer_device);
+
 	return err;
 }
 
@@ -1484,6 +1492,7 @@ int w_send_read_req(struct drbd_work *w, int cancel)
 	struct drbd_device *device = req->device;
 	struct drbd_peer_device *const peer_device = first_peer_device(device);
 	struct drbd_connection *connection = peer_device->connection;
+	bool do_send_unplug = req->rq_state & RQ_UNPLUG;
 	int err;
 
 	if (unlikely(cancel)) {
@@ -1501,6 +1510,9 @@ int w_send_read_req(struct drbd_work *w, int cancel)
 
 	req_mod(req, err ? SEND_FAILED : HANDED_OVER_TO_NETWORK);
 
+	if (do_send_unplug && !err)
+		pd_send_unplug_remote(peer_device);
+
 	return err;
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 02/17] drbd: change list_for_each_safe to while(list_first_entry_or_null)
  2017-08-24 21:22 [PATCH 00/17] DRBD updates Philipp Reisner
  2017-08-24 21:22 ` [PATCH 01/17] drbd: introduce drbd_recv_header_maybe_unplug Philipp Reisner
@ 2017-08-24 21:22 ` Philipp Reisner
  2017-08-24 21:23 ` [PATCH 03/17] drbd: add explicit plugging when submitting batches Philipp Reisner
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 21+ messages in thread
From: Philipp Reisner @ 2017-08-24 21:22 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel; +Cc: drbd-dev

From: Lars Ellenberg <lars.ellenberg@linbit.com>

Two instances of list_for_each_safe can drop their tmp element, they
really just peel off each element in turn from the start of the list.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
index 5955ab8..85e05ee 100644
--- a/drivers/block/drbd/drbd_req.c
+++ b/drivers/block/drbd/drbd_req.c
@@ -1485,12 +1485,12 @@ static bool prepare_al_transaction_nonblock(struct drbd_device *device,
 					    struct list_head *pending,
 					    struct list_head *later)
 {
-	struct drbd_request *req, *tmp;
+	struct drbd_request *req;
 	int wake = 0;
 	int err;
 
 	spin_lock_irq(&device->al_lock);
-	list_for_each_entry_safe(req, tmp, incoming, tl_requests) {
+	while ((req = list_first_entry_or_null(incoming, struct drbd_request, tl_requests))) {
 		err = drbd_al_begin_io_nonblock(device, &req->i);
 		if (err == -ENOBUFS)
 			break;
@@ -1509,9 +1509,9 @@ static bool prepare_al_transaction_nonblock(struct drbd_device *device,
 
 void send_and_submit_pending(struct drbd_device *device, struct list_head *pending)
 {
-	struct drbd_request *req, *tmp;
+	struct drbd_request *req;
 
-	list_for_each_entry_safe(req, tmp, pending, tl_requests) {
+	while ((req = list_first_entry_or_null(pending, struct drbd_request, tl_requests))) {
 		req->rq_state |= RQ_IN_ACT_LOG;
 		req->in_actlog_jif = jiffies;
 		atomic_dec(&device->ap_actlog_cnt);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 03/17] drbd: add explicit plugging when submitting batches
  2017-08-24 21:22 [PATCH 00/17] DRBD updates Philipp Reisner
  2017-08-24 21:22 ` [PATCH 01/17] drbd: introduce drbd_recv_header_maybe_unplug Philipp Reisner
  2017-08-24 21:22 ` [PATCH 02/17] drbd: change list_for_each_safe to while(list_first_entry_or_null) Philipp Reisner
@ 2017-08-24 21:23 ` Philipp Reisner
  2017-08-24 21:23 ` [PATCH 04/17] drbd: Send P_NEG_ACK upon write error in protocol != C Philipp Reisner
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 21+ messages in thread
From: Philipp Reisner @ 2017-08-24 21:23 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel; +Cc: drbd-dev

From: Lars Ellenberg <lars.ellenberg@linbit.com>

When submitting batches of requests which had been queued on the
submitter thread, typically because they needed to wait for an
activity log transactions, use explicit plugging to help potential
merging of requests in the backend io-scheduler.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
index 85e05ee..2c82330 100644
--- a/drivers/block/drbd/drbd_req.c
+++ b/drivers/block/drbd/drbd_req.c
@@ -1292,6 +1292,7 @@ static void drbd_unplug(struct blk_plug_cb *cb, bool from_schedule)
 	struct drbd_resource *resource = plug->cb.data;
 	struct drbd_request *req = plug->most_recent_req;
 
+	kfree(cb);
 	if (!req)
 		return;
 
@@ -1301,8 +1302,8 @@ static void drbd_unplug(struct blk_plug_cb *cb, bool from_schedule)
 	req->rq_state |= RQ_UNPLUG;
 	/* but also queue a generic unplug */
 	drbd_queue_unplug(req->device);
-	spin_unlock_irq(&resource->req_lock);
 	kref_put(&req->kref, drbd_req_destroy);
+	spin_unlock_irq(&resource->req_lock);
 }
 
 static struct drbd_plug_cb* drbd_check_plugged(struct drbd_resource *resource)
@@ -1343,8 +1344,6 @@ static void drbd_send_and_submit(struct drbd_device *device, struct drbd_request
 	bool no_remote = false;
 	bool submit_private_bio = false;
 
-	struct drbd_plug_cb *plug = drbd_check_plugged(resource);
-
 	spin_lock_irq(&resource->req_lock);
 	if (rw == WRITE) {
 		/* This may temporarily give up the req_lock,
@@ -1409,8 +1408,11 @@ static void drbd_send_and_submit(struct drbd_device *device, struct drbd_request
 			no_remote = true;
 	}
 
-	if (plug != NULL && no_remote == false)
-		drbd_update_plug(plug, req);
+	if (no_remote == false) {
+		struct drbd_plug_cb *plug = drbd_check_plugged(resource);
+		if (plug)
+			drbd_update_plug(plug, req);
+	}
 
 	/* If it took the fast path in drbd_request_prepare, add it here.
 	 * The slow path has added it already. */
@@ -1460,7 +1462,10 @@ void __drbd_make_request(struct drbd_device *device, struct bio *bio, unsigned l
 
 static void submit_fast_path(struct drbd_device *device, struct list_head *incoming)
 {
+	struct blk_plug plug;
 	struct drbd_request *req, *tmp;
+
+	blk_start_plug(&plug);
 	list_for_each_entry_safe(req, tmp, incoming, tl_requests) {
 		const int rw = bio_data_dir(req->master_bio);
 
@@ -1478,6 +1483,7 @@ static void submit_fast_path(struct drbd_device *device, struct list_head *incom
 		list_del_init(&req->tl_requests);
 		drbd_send_and_submit(device, req);
 	}
+	blk_finish_plug(&plug);
 }
 
 static bool prepare_al_transaction_nonblock(struct drbd_device *device,
@@ -1507,10 +1513,12 @@ static bool prepare_al_transaction_nonblock(struct drbd_device *device,
 	return !list_empty(pending);
 }
 
-void send_and_submit_pending(struct drbd_device *device, struct list_head *pending)
+static void send_and_submit_pending(struct drbd_device *device, struct list_head *pending)
 {
+	struct blk_plug plug;
 	struct drbd_request *req;
 
+	blk_start_plug(&plug);
 	while ((req = list_first_entry_or_null(pending, struct drbd_request, tl_requests))) {
 		req->rq_state |= RQ_IN_ACT_LOG;
 		req->in_actlog_jif = jiffies;
@@ -1518,6 +1526,7 @@ void send_and_submit_pending(struct drbd_device *device, struct list_head *pendi
 		list_del_init(&req->tl_requests);
 		drbd_send_and_submit(device, req);
 	}
+	blk_finish_plug(&plug);
 }
 
 void do_submit(struct work_struct *ws)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 04/17] drbd: Send P_NEG_ACK upon write error in protocol != C
  2017-08-24 21:22 [PATCH 00/17] DRBD updates Philipp Reisner
                   ` (2 preceding siblings ...)
  2017-08-24 21:23 ` [PATCH 03/17] drbd: add explicit plugging when submitting batches Philipp Reisner
@ 2017-08-24 21:23 ` Philipp Reisner
  2017-08-24 21:23 ` [PATCH 05/17] drbd: mark symbols static where possible Philipp Reisner
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 21+ messages in thread
From: Philipp Reisner @ 2017-08-24 21:23 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel; +Cc: drbd-dev

From: Lars Ellenberg <lars.ellenberg@linbit.com>

In protocol != C, we forgot to send the P_NEG_ACK for failing writes.

Once we no longer submit to local disk, because we already "detached",
due to the typical "on-io-error detach;" config setting,
we already send the neg acks right away.

Only those requests that have been submitted,
and have been error-completed by the local disk,
would forget to send the neg-ack,
and only in asynchronous replication (protocol != C).
Unless this happened during resync,
where we already always send acks, regardless of protocol.

The primary side needs the P_NEG_ACK in order to mark
the affected block(s) for resync in its out-of-sync bitmap.

If the blocks in question are not re-written again,
we may miss to resync them later, causing data inconsistencies.

This patch will always send the neg-acks, and also at least try to
persist the out-of-sync status on the local node already.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

diff --git a/drivers/block/drbd/drbd_worker.c b/drivers/block/drbd/drbd_worker.c
index 2745db2..72cb0bd 100644
--- a/drivers/block/drbd/drbd_worker.c
+++ b/drivers/block/drbd/drbd_worker.c
@@ -128,6 +128,14 @@ void drbd_endio_write_sec_final(struct drbd_peer_request *peer_req) __releases(l
 	block_id = peer_req->block_id;
 	peer_req->flags &= ~EE_CALL_AL_COMPLETE_IO;
 
+	if (peer_req->flags & EE_WAS_ERROR) {
+		/* In protocol != C, we usually do not send write acks.
+		 * In case of a write error, send the neg ack anyways. */
+		if (!__test_and_set_bit(__EE_SEND_WRITE_ACK, &peer_req->flags))
+			inc_unacked(device);
+		drbd_set_out_of_sync(device, peer_req->i.sector, peer_req->i.size);
+	}
+
 	spin_lock_irqsave(&device->resource->req_lock, flags);
 	device->writ_cnt += peer_req->i.size >> 9;
 	list_move_tail(&peer_req->w.list, &device->done_ee);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 05/17] drbd: mark symbols static where possible
  2017-08-24 21:22 [PATCH 00/17] DRBD updates Philipp Reisner
                   ` (3 preceding siblings ...)
  2017-08-24 21:23 ` [PATCH 04/17] drbd: Send P_NEG_ACK upon write error in protocol != C Philipp Reisner
@ 2017-08-24 21:23 ` Philipp Reisner
  2017-08-24 21:23 ` [PATCH 06/17] drbd: Fix resource role for newly created resources in events2 Philipp Reisner
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 21+ messages in thread
From: Philipp Reisner @ 2017-08-24 21:23 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel; +Cc: drbd-dev

From: Baoyou Xie <baoyou.xie@linaro.org>

We get a few warnings when building kernel with W=1:
drbd/drbd_receiver.c:1224:6: warning: no previous prototype for 'one_flush_endio' [-Wmissing-prototypes]
drbd/drbd_req.c:1450:6: warning: no previous prototype for 'send_and_submit_pending' [-Wmissing-prototypes]
drbd/drbd_main.c:924:6: warning: no previous prototype for 'assign_p_sizes_qlim' [-Wmissing-prototypes]
....

In fact, these functions are only used in the file in which they are
declared and don't need a declaration, but can be made static.
So this patch marks these functions with 'static'.

Signed-off-by: Baoyou Xie <baoyou.xie@linaro.org>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index a3b2ee7..11f3852 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -923,7 +923,9 @@ void drbd_gen_and_send_sync_uuid(struct drbd_peer_device *peer_device)
 }
 
 /* communicated if (agreed_features & DRBD_FF_WSAME) */
-void assign_p_sizes_qlim(struct drbd_device *device, struct p_sizes *p, struct request_queue *q)
+static void
+assign_p_sizes_qlim(struct drbd_device *device, struct p_sizes *p,
+					struct request_queue *q)
 {
 	if (q) {
 		p->qlim->physical_block_size = cpu_to_be32(queue_physical_block_size(q));
diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c
index 1b3f439..2489667 100644
--- a/drivers/block/drbd/drbd_receiver.c
+++ b/drivers/block/drbd/drbd_receiver.c
@@ -1261,7 +1261,7 @@ struct one_flush_context {
 	struct issue_flush_context *ctx;
 };
 
-void one_flush_endio(struct bio *bio)
+static void one_flush_endio(struct bio *bio)
 {
 	struct one_flush_context *octx = bio->bi_private;
 	struct drbd_device *device = octx->device;
diff --git a/drivers/block/drbd/drbd_worker.c b/drivers/block/drbd/drbd_worker.c
index 72cb0bd..e48012d 100644
--- a/drivers/block/drbd/drbd_worker.c
+++ b/drivers/block/drbd/drbd_worker.c
@@ -203,7 +203,8 @@ void drbd_peer_request_endio(struct bio *bio)
 	}
 }
 
-void drbd_panic_after_delayed_completion_of_aborted_request(struct drbd_device *device)
+static void
+drbd_panic_after_delayed_completion_of_aborted_request(struct drbd_device *device)
 {
 	panic("drbd%u %s/%u potential random memory corruption caused by delayed completion of aborted local request\n",
 		device->minor, device->resource->name, device->vnr);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 06/17] drbd: Fix resource role for newly created resources in events2
  2017-08-24 21:22 [PATCH 00/17] DRBD updates Philipp Reisner
                   ` (4 preceding siblings ...)
  2017-08-24 21:23 ` [PATCH 05/17] drbd: mark symbols static where possible Philipp Reisner
@ 2017-08-24 21:23 ` Philipp Reisner
  2017-08-24 21:23 ` [PATCH 07/17] drbd: new disk-option disable-write-same Philipp Reisner
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 21+ messages in thread
From: Philipp Reisner @ 2017-08-24 21:23 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel; +Cc: drbd-dev

The conn_higest_role() (a terribly misnamed function) returns
the role of the resource. It returned R_UNKNOWN as long as the
resource has not a single device.

Resources without devices are short living objects.

But it matters for the NOTIFY_CREATE netwlink message. It makes
a lot more sense to report R_SECONDARY for the newly created
resource than R_UNKNOWN.

I reviewd all call sites of conn_highest_role(), that change
does not matter for the other call sites.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

diff --git a/drivers/block/drbd/drbd_state.c b/drivers/block/drbd/drbd_state.c
index eea0c4a..306f116 100644
--- a/drivers/block/drbd/drbd_state.c
+++ b/drivers/block/drbd/drbd_state.c
@@ -346,7 +346,7 @@ static enum drbd_role min_role(enum drbd_role role1, enum drbd_role role2)
 
 enum drbd_role conn_highest_role(struct drbd_connection *connection)
 {
-	enum drbd_role role = R_UNKNOWN;
+	enum drbd_role role = R_SECONDARY;
 	struct drbd_peer_device *peer_device;
 	int vnr;
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 07/17] drbd: new disk-option disable-write-same
  2017-08-24 21:22 [PATCH 00/17] DRBD updates Philipp Reisner
                   ` (5 preceding siblings ...)
  2017-08-24 21:23 ` [PATCH 06/17] drbd: Fix resource role for newly created resources in events2 Philipp Reisner
@ 2017-08-24 21:23 ` Philipp Reisner
  2017-08-24 21:23 ` [PATCH 08/17] drbd: fix potential get_ldev/put_ldev refcount imbalance during attach Philipp Reisner
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 21+ messages in thread
From: Philipp Reisner @ 2017-08-24 21:23 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel; +Cc: drbd-dev

From: Lars Ellenberg <lars.ellenberg@linbit.com>

Some backend devices claim to support write-same,
but would fail actual write-same requests.

Allow to set (or toggle) whether or not DRBD tries to support write-same.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index ad0fcb4..c383b6c 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -1236,12 +1236,18 @@ static void fixup_discard_if_not_supported(struct request_queue *q)
 
 static void decide_on_write_same_support(struct drbd_device *device,
 			struct request_queue *q,
-			struct request_queue *b, struct o_qlim *o)
+			struct request_queue *b, struct o_qlim *o,
+			bool disable_write_same)
 {
 	struct drbd_peer_device *peer_device = first_peer_device(device);
 	struct drbd_connection *connection = peer_device->connection;
 	bool can_do = b ? b->limits.max_write_same_sectors : true;
 
+	if (can_do && disable_write_same) {
+		can_do = false;
+		drbd_info(peer_device, "WRITE_SAME disabled by config\n");
+	}
+
 	if (can_do && connection->cstate >= C_CONNECTED && !(connection->agreed_features & DRBD_FF_WSAME)) {
 		can_do = false;
 		drbd_info(peer_device, "peer does not support WRITE_SAME\n");
@@ -1302,6 +1308,7 @@ static void drbd_setup_queue_param(struct drbd_device *device, struct drbd_backi
 	struct request_queue *b = NULL;
 	struct disk_conf *dc;
 	bool discard_zeroes_if_aligned = true;
+	bool disable_write_same = false;
 
 	if (bdev) {
 		b = bdev->backing_bdev->bd_disk->queue;
@@ -1311,6 +1318,7 @@ static void drbd_setup_queue_param(struct drbd_device *device, struct drbd_backi
 		dc = rcu_dereference(device->ldev->disk_conf);
 		max_segments = dc->max_bio_bvecs;
 		discard_zeroes_if_aligned = dc->discard_zeroes_if_aligned;
+		disable_write_same = dc->disable_write_same;
 		rcu_read_unlock();
 
 		blk_set_stacking_limits(&q->limits);
@@ -1321,7 +1329,7 @@ static void drbd_setup_queue_param(struct drbd_device *device, struct drbd_backi
 	blk_queue_max_segments(q, max_segments ? max_segments : BLK_MAX_SEGMENTS);
 	blk_queue_segment_boundary(q, PAGE_SIZE-1);
 	decide_on_discard_support(device, q, b, discard_zeroes_if_aligned);
-	decide_on_write_same_support(device, q, b, o);
+	decide_on_write_same_support(device, q, b, o, disable_write_same);
 
 	if (b) {
 		blk_queue_stack_limits(q, b);
@@ -1612,7 +1620,8 @@ int drbd_adm_disk_opts(struct sk_buff *skb, struct genl_info *info)
 	if (write_ordering_changed(old_disk_conf, new_disk_conf))
 		drbd_bump_write_ordering(device->resource, NULL, WO_BDEV_FLUSH);
 
-	if (old_disk_conf->discard_zeroes_if_aligned != new_disk_conf->discard_zeroes_if_aligned)
+	if (old_disk_conf->discard_zeroes_if_aligned != new_disk_conf->discard_zeroes_if_aligned
+	||  old_disk_conf->disable_write_same != new_disk_conf->disable_write_same)
 		drbd_reconsider_queue_parameters(device, device->ldev, NULL);
 
 	drbd_md_sync(device);
diff --git a/include/linux/drbd_genl.h b/include/linux/drbd_genl.h
index 2896f93..4e6d4d4 100644
--- a/include/linux/drbd_genl.h
+++ b/include/linux/drbd_genl.h
@@ -132,7 +132,8 @@ GENL_struct(DRBD_NLA_DISK_CONF, 3, disk_conf,
 	__flg_field_def(18, DRBD_GENLA_F_MANDATORY,	disk_drain, DRBD_DISK_DRAIN_DEF)
 	__flg_field_def(19, DRBD_GENLA_F_MANDATORY,	md_flushes, DRBD_MD_FLUSHES_DEF)
 	__flg_field_def(23,     0 /* OPTIONAL */,	al_updates, DRBD_AL_UPDATES_DEF)
-	__flg_field_def(24,     0 /* OPTIONAL */,	discard_zeroes_if_aligned, DRBD_DISCARD_ZEROES_IF_ALIGNED)
+	__flg_field_def(24,     0 /* OPTIONAL */,	discard_zeroes_if_aligned, DRBD_DISCARD_ZEROES_IF_ALIGNED_DEF)
+	__flg_field_def(26,     0 /* OPTIONAL */,	disable_write_same, DRBD_DISABLE_WRITE_SAME_DEF)
 )
 
 GENL_struct(DRBD_NLA_RESOURCE_OPTS, 4, res_opts,
diff --git a/include/linux/drbd_limits.h b/include/linux/drbd_limits.h
index ddac684..24ae1b9 100644
--- a/include/linux/drbd_limits.h
+++ b/include/linux/drbd_limits.h
@@ -209,12 +209,18 @@
 #define DRBD_MD_FLUSHES_DEF	1
 #define DRBD_TCP_CORK_DEF	1
 #define DRBD_AL_UPDATES_DEF     1
+
 /* We used to ignore the discard_zeroes_data setting.
  * To not change established (and expected) behaviour,
  * by default assume that, for discard_zeroes_data=0,
  * we can make that an effective discard_zeroes_data=1,
  * if we only explicitly zero-out unaligned partial chunks. */
-#define DRBD_DISCARD_ZEROES_IF_ALIGNED 1
+#define DRBD_DISCARD_ZEROES_IF_ALIGNED_DEF 1
+
+/* Some backends pretend to support WRITE SAME,
+ * but fail such requests when they are actually submitted.
+ * This is to tell DRBD to not even try. */
+#define DRBD_DISABLE_WRITE_SAME_DEF 0
 
 #define DRBD_ALLOW_TWO_PRIMARIES_DEF	0
 #define DRBD_ALWAYS_ASBP_DEF	0
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 08/17] drbd: fix potential get_ldev/put_ldev refcount imbalance during attach
  2017-08-24 21:22 [PATCH 00/17] DRBD updates Philipp Reisner
                   ` (6 preceding siblings ...)
  2017-08-24 21:23 ` [PATCH 07/17] drbd: new disk-option disable-write-same Philipp Reisner
@ 2017-08-24 21:23 ` Philipp Reisner
  2017-08-24 21:23 ` [PATCH 09/17] drbd: Use setup_timer() instead of init_timer() to simplify the code Philipp Reisner
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 21+ messages in thread
From: Philipp Reisner @ 2017-08-24 21:23 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel; +Cc: drbd-dev

From: Lars Ellenberg <lars.ellenberg@linbit.com>

Race:

drbd_adm_attach()               | async drbd_md_endio()
                                |
device->ldev is still NULL.     |
                                |
drbd_md_read(                   |
 .endio = drbd_md_endio;        |
 submit;                        |
 ....                           |
 wait for done == 1;            |       done = 1;
);                              |       wake_up();
.. lot of other stuff,          |
.. includeing taking and        |
...giving up locks,             |
.. doing further IO,            |
.. stuff that takes "some time" |
                                | while in this context,
                                | this is the next statement.
                                | which means this context was scheduled
.. only then, finally,          | away for "some time".
device->ldev = nbc;             |
                                |       if (device->ldev)
                                |               put_ldev()

Unlikely, but possible. I was able to provoke it "reliably"
by adding an mdelay(500); after the wake_up().
Fixed by moving the if (!NULL) put_ldev() before done = 1;

Impact of the bug was that the resulting refcount imbalance
could lead to premature destruction of the object, potentially
causing a NULL pointer dereference during a subsequent detach.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

diff --git a/drivers/block/drbd/drbd_worker.c b/drivers/block/drbd/drbd_worker.c
index e48012d..f0717a9 100644
--- a/drivers/block/drbd/drbd_worker.c
+++ b/drivers/block/drbd/drbd_worker.c
@@ -65,6 +65,11 @@ void drbd_md_endio(struct bio *bio)
 	device = bio->bi_private;
 	device->md_io.error = blk_status_to_errno(bio->bi_status);
 
+	/* special case: drbd_md_read() during drbd_adm_attach() */
+	if (device->ldev)
+		put_ldev(device);
+	bio_put(bio);
+
 	/* We grabbed an extra reference in _drbd_md_sync_page_io() to be able
 	 * to timeout on the lower level device, and eventually detach from it.
 	 * If this io completion runs after that timeout expired, this
@@ -79,9 +84,6 @@ void drbd_md_endio(struct bio *bio)
 	drbd_md_put_buffer(device);
 	device->md_io.done = 1;
 	wake_up(&device->misc_wait);
-	bio_put(bio);
-	if (device->ldev) /* special case: drbd_md_read() during drbd_adm_attach() */
-		put_ldev(device);
 }
 
 /* reads on behalf of the partner,
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 09/17] drbd: Use setup_timer() instead of init_timer() to simplify the code.
  2017-08-24 21:22 [PATCH 00/17] DRBD updates Philipp Reisner
                   ` (7 preceding siblings ...)
  2017-08-24 21:23 ` [PATCH 08/17] drbd: fix potential get_ldev/put_ldev refcount imbalance during attach Philipp Reisner
@ 2017-08-24 21:23 ` Philipp Reisner
  2017-08-24 21:23 ` [PATCH 10/17] drbd: fix rmmod cleanup, remove _all_ debugfs entries Philipp Reisner
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 21+ messages in thread
From: Philipp Reisner @ 2017-08-24 21:23 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel; +Cc: drbd-dev

From: Geliang Tang <geliangtang@gmail.com>

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Roland Kammerer <roland.kammerer@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>

diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index 11f3852..056d9ab 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -2023,18 +2023,14 @@ void drbd_init_set_defaults(struct drbd_device *device)
 	device->unplug_work.cb  = w_send_write_hint;
 	device->bm_io_work.w.cb = w_bitmap_io;
 
-	init_timer(&device->resync_timer);
-	init_timer(&device->md_sync_timer);
-	init_timer(&device->start_resync_timer);
-	init_timer(&device->request_timer);
-	device->resync_timer.function = resync_timer_fn;
-	device->resync_timer.data = (unsigned long) device;
-	device->md_sync_timer.function = md_sync_timer_fn;
-	device->md_sync_timer.data = (unsigned long) device;
-	device->start_resync_timer.function = start_resync_timer_fn;
-	device->start_resync_timer.data = (unsigned long) device;
-	device->request_timer.function = request_timer_fn;
-	device->request_timer.data = (unsigned long) device;
+	setup_timer(&device->resync_timer, resync_timer_fn,
+			(unsigned long)device);
+	setup_timer(&device->md_sync_timer, md_sync_timer_fn,
+			(unsigned long)device);
+	setup_timer(&device->start_resync_timer, start_resync_timer_fn,
+			(unsigned long)device);
+	setup_timer(&device->request_timer, request_timer_fn,
+			(unsigned long)device);
 
 	init_waitqueue_head(&device->misc_wait);
 	init_waitqueue_head(&device->state_wait);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 10/17] drbd: fix rmmod cleanup, remove _all_ debugfs entries
  2017-08-24 21:22 [PATCH 00/17] DRBD updates Philipp Reisner
                   ` (8 preceding siblings ...)
  2017-08-24 21:23 ` [PATCH 09/17] drbd: Use setup_timer() instead of init_timer() to simplify the code Philipp Reisner
@ 2017-08-24 21:23 ` Philipp Reisner
  2017-08-24 21:23 ` [PATCH 11/17] drbd: A single dot should be put into a sequence Philipp Reisner
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 21+ messages in thread
From: Philipp Reisner @ 2017-08-24 21:23 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel; +Cc: drbd-dev

From: Lars Ellenberg <lars.ellenberg@linbit.com>

If there are still resources defined, but "empty", no more volumes
or connections configured, they don't hold module reference counts,
so rmmod is possible.

To avoid DRBD leftovers in debugfs, we need to call our global
drbd_debugfs_cleanup() only after all resources have been cleaned up.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index 056d9ab..8b8dd82 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -2420,7 +2420,6 @@ static void drbd_cleanup(void)
 		destroy_workqueue(retry.wq);
 
 	drbd_genl_unregister();
-	drbd_debugfs_cleanup();
 
 	idr_for_each_entry(&drbd_devices, device, i)
 		drbd_delete_device(device);
@@ -2431,6 +2430,8 @@ static void drbd_cleanup(void)
 		drbd_free_resource(resource);
 	}
 
+	drbd_debugfs_cleanup();
+
 	drbd_destroy_mempools();
 	unregister_blkdev(DRBD_MAJOR, "drbd");
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 11/17] drbd: A single dot should be put into a sequence.
  2017-08-24 21:22 [PATCH 00/17] DRBD updates Philipp Reisner
                   ` (9 preceding siblings ...)
  2017-08-24 21:23 ` [PATCH 10/17] drbd: fix rmmod cleanup, remove _all_ debugfs entries Philipp Reisner
@ 2017-08-24 21:23 ` Philipp Reisner
  2017-08-24 21:23 ` [PATCH 12/17] drbd: fix potential deadlock when trying to detach during handshake Philipp Reisner
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 21+ messages in thread
From: Philipp Reisner @ 2017-08-24 21:23 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel; +Cc: drbd-dev

From: Markus Elfring <elfring@users.sourceforge.net>

Thus use the corresponding function "seq_putc".

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Roland Kammerer <roland.kammerer@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

diff --git a/drivers/block/drbd/drbd_proc.c b/drivers/block/drbd/drbd_proc.c
index 8378142..fc0f627 100644
--- a/drivers/block/drbd/drbd_proc.c
+++ b/drivers/block/drbd/drbd_proc.c
@@ -127,7 +127,7 @@ static void drbd_syncer_progress(struct drbd_device *device, struct seq_file *se
 		seq_putc(seq, '=');
 	seq_putc(seq, '>');
 	for (i = 0; i < y; i++)
-		seq_printf(seq, ".");
+		seq_putc(seq, '.');
 	seq_puts(seq, "] ");
 
 	if (state.conn == C_VERIFY_S || state.conn == C_VERIFY_T)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 12/17] drbd: fix potential deadlock when trying to detach during handshake
  2017-08-24 21:22 [PATCH 00/17] DRBD updates Philipp Reisner
                   ` (10 preceding siblings ...)
  2017-08-24 21:23 ` [PATCH 11/17] drbd: A single dot should be put into a sequence Philipp Reisner
@ 2017-08-24 21:23 ` Philipp Reisner
  2017-08-24 21:23 ` [PATCH 13/17] drbd: fix race between handshake and admin disconnect/down Philipp Reisner
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 21+ messages in thread
From: Philipp Reisner @ 2017-08-24 21:23 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel; +Cc: drbd-dev

From: Lars Ellenberg <lars.ellenberg@linbit.com>

When requesting a detach, we first suspend IO, and also inhibit meta-data IO
by means of drbd_md_get_buffer(), because we don't want to "fail" the disk
while there is IO in-flight: the transition into D_FAILED for detach purposes
may get misinterpreted as actual IO error in a confused endio function.

We wrap it all into wait_event(), to retry in case the drbd_req_state()
returns SS_IN_TRANSIENT_STATE, as it does for example during an ongoing
connection handshake.

In that example, the receiver thread may need to grab drbd_md_get_buffer()
during the handshake to make progress.  To avoid potential deadlock with
detach, detach needs to grab and release the meta data buffer inside of
that wait_event retry loop. To avoid lock inversion between
mutex_lock(&device->state_mutex) and drbd_md_get_buffer(device),
introduce a new enum chg_state_flag CS_INHIBIT_MD_IO, and move the
call to drbd_md_get_buffer() inside the state_mutex grabbed in
drbd_req_state().

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index c383b6c..6bb58a6 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -2149,34 +2149,13 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
 
 static int adm_detach(struct drbd_device *device, int force)
 {
-	enum drbd_state_rv retcode;
-	void *buffer;
-	int ret;
-
 	if (force) {
 		set_bit(FORCE_DETACH, &device->flags);
 		drbd_force_state(device, NS(disk, D_FAILED));
-		retcode = SS_SUCCESS;
-		goto out;
+		return SS_SUCCESS;
 	}
 
-	drbd_suspend_io(device); /* so no-one is stuck in drbd_al_begin_io */
-	buffer = drbd_md_get_buffer(device, __func__); /* make sure there is no in-flight meta-data IO */
-	if (buffer) {
-		retcode = drbd_request_state(device, NS(disk, D_FAILED));
-		drbd_md_put_buffer(device);
-	} else /* already <= D_FAILED */
-		retcode = SS_NOTHING_TO_DO;
-	/* D_FAILED will transition to DISKLESS. */
-	drbd_resume_io(device);
-	ret = wait_event_interruptible(device->misc_wait,
-			device->state.disk != D_FAILED);
-	if ((int)retcode == (int)SS_IS_DISKLESS)
-		retcode = SS_NOTHING_TO_DO;
-	if (ret)
-		retcode = ERR_INTR;
-out:
-	return retcode;
+	return drbd_request_detach_interruptible(device);
 }
 
 /* Detaching the disk is a process in multiple stages.  First we need to lock
diff --git a/drivers/block/drbd/drbd_state.c b/drivers/block/drbd/drbd_state.c
index 306f116..0813c65 100644
--- a/drivers/block/drbd/drbd_state.c
+++ b/drivers/block/drbd/drbd_state.c
@@ -579,11 +579,14 @@ drbd_req_state(struct drbd_device *device, union drbd_state mask,
 	unsigned long flags;
 	union drbd_state os, ns;
 	enum drbd_state_rv rv;
+	void *buffer = NULL;
 
 	init_completion(&done);
 
 	if (f & CS_SERIALIZE)
 		mutex_lock(device->state_mutex);
+	if (f & CS_INHIBIT_MD_IO)
+		buffer = drbd_md_get_buffer(device, __func__);
 
 	spin_lock_irqsave(&device->resource->req_lock, flags);
 	os = drbd_read_state(device);
@@ -636,6 +639,8 @@ drbd_req_state(struct drbd_device *device, union drbd_state mask,
 	}
 
 abort:
+	if (buffer)
+		drbd_md_put_buffer(device);
 	if (f & CS_SERIALIZE)
 		mutex_unlock(device->state_mutex);
 
@@ -664,6 +669,47 @@ _drbd_request_state(struct drbd_device *device, union drbd_state mask,
 	return rv;
 }
 
+/*
+ * We grab drbd_md_get_buffer(), because we don't want to "fail" the disk while
+ * there is IO in-flight: the transition into D_FAILED for detach purposes
+ * may get misinterpreted as actual IO error in a confused endio function.
+ *
+ * We wrap it all into wait_event(), to retry in case the drbd_req_state()
+ * returns SS_IN_TRANSIENT_STATE.
+ *
+ * To avoid potential deadlock with e.g. the receiver thread trying to grab
+ * drbd_md_get_buffer() while trying to get out of the "transient state", we
+ * need to grab and release the meta data buffer inside of that wait_event loop.
+ */
+static enum drbd_state_rv
+request_detach(struct drbd_device *device)
+{
+	return drbd_req_state(device, NS(disk, D_FAILED),
+			CS_VERBOSE | CS_ORDERED | CS_INHIBIT_MD_IO);
+}
+
+enum drbd_state_rv
+drbd_request_detach_interruptible(struct drbd_device *device)
+{
+	enum drbd_state_rv rv;
+	int ret;
+
+	drbd_suspend_io(device); /* so no-one is stuck in drbd_al_begin_io */
+	wait_event_interruptible(device->state_wait,
+		(rv = request_detach(device)) != SS_IN_TRANSIENT_STATE);
+	drbd_resume_io(device);
+
+	ret = wait_event_interruptible(device->misc_wait,
+			device->state.disk != D_FAILED);
+
+	if (rv == SS_IS_DISKLESS)
+		rv = SS_NOTHING_TO_DO;
+	if (ret)
+		rv = ERR_INTR;
+
+	return rv;
+}
+
 enum drbd_state_rv
 _drbd_request_state_holding_state_mutex(struct drbd_device *device, union drbd_state mask,
 		    union drbd_state val, enum chg_state_flags f)
diff --git a/drivers/block/drbd/drbd_state.h b/drivers/block/drbd/drbd_state.h
index 6c9d5d4..0276c98 100644
--- a/drivers/block/drbd/drbd_state.h
+++ b/drivers/block/drbd/drbd_state.h
@@ -71,6 +71,10 @@ enum chg_state_flags {
 	CS_DC_SUSP       = 1 << 10,
 	CS_DC_MASK       = CS_DC_ROLE + CS_DC_PEER + CS_DC_CONN + CS_DC_DISK + CS_DC_PDSK,
 	CS_IGN_OUTD_FAIL = 1 << 11,
+
+	/* Make sure no meta data IO is in flight, by calling
+	 * drbd_md_get_buffer().  Used for graceful detach. */
+	CS_INHIBIT_MD_IO = 1 << 12,
 };
 
 /* drbd_dev_state and drbd_state are different types. This is to stress the
@@ -156,6 +160,10 @@ static inline int drbd_request_state(struct drbd_device *device,
 	return _drbd_request_state(device, mask, val, CS_VERBOSE + CS_ORDERED);
 }
 
+/* for use in adm_detach() (drbd_adm_detach(), drbd_adm_down()) */
+enum drbd_state_rv
+drbd_request_detach_interruptible(struct drbd_device *device);
+
 enum drbd_role conn_highest_role(struct drbd_connection *connection);
 enum drbd_role conn_highest_peer(struct drbd_connection *connection);
 enum drbd_disk_state conn_highest_disk(struct drbd_connection *connection);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 13/17] drbd: fix race between handshake and admin disconnect/down
  2017-08-24 21:22 [PATCH 00/17] DRBD updates Philipp Reisner
                   ` (11 preceding siblings ...)
  2017-08-24 21:23 ` [PATCH 12/17] drbd: fix potential deadlock when trying to detach during handshake Philipp Reisner
@ 2017-08-24 21:23 ` Philipp Reisner
  2017-08-24 21:23 ` [PATCH 14/17] drbd: rename "usermode_helper" to "drbd_usermode_helper" Philipp Reisner
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 21+ messages in thread
From: Philipp Reisner @ 2017-08-24 21:23 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel; +Cc: drbd-dev

From: Lars Ellenberg <lars.ellenberg@linbit.com>

conn_try_disconnect() could potentialy hit the BUG_ON()
in _conn_set_state() where it iterates over _drbd_set_state()
and "asserts" via BUG_ON() that the latter was successful.

If the STATE_SENT bit was not yet visible to conn_is_valid_transition()
early in _conn_request_state(), but became visible before conn_set_state()
later in that call path, we could hit the BUG_ON() after _drbd_set_state(),
because it returned SS_IN_TRANSIENT_STATE.

To avoid that race, we better protect set_bit(SENT_STATE) with the spinlock.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c
index 2489667..5e090a1 100644
--- a/drivers/block/drbd/drbd_receiver.c
+++ b/drivers/block/drbd/drbd_receiver.c
@@ -1100,7 +1100,10 @@ static int conn_connect(struct drbd_connection *connection)
 	idr_for_each_entry(&connection->peer_devices, peer_device, vnr)
 		mutex_lock(peer_device->device->state_mutex);
 
+	/* avoid a race with conn_request_state( C_DISCONNECTING ) */
+	spin_lock_irq(&connection->resource->req_lock);
 	set_bit(STATE_SENT, &connection->flags);
+	spin_unlock_irq(&connection->resource->req_lock);
 
 	idr_for_each_entry(&connection->peer_devices, peer_device, vnr)
 		mutex_unlock(peer_device->device->state_mutex);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 14/17] drbd: rename "usermode_helper" to "drbd_usermode_helper"
  2017-08-24 21:22 [PATCH 00/17] DRBD updates Philipp Reisner
                   ` (12 preceding siblings ...)
  2017-08-24 21:23 ` [PATCH 13/17] drbd: fix race between handshake and admin disconnect/down Philipp Reisner
@ 2017-08-24 21:23 ` Philipp Reisner
  2017-08-24 21:23 ` [PATCH 15/17] drbd: move global variables to drbd namespace and make some static Philipp Reisner
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 21+ messages in thread
From: Philipp Reisner @ 2017-08-24 21:23 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel; +Cc: drbd-dev

From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Nothing like having a very generic global variable in a tiny driver
subsystem to make a mess of the global namespace...

Note, there are many other "generic" named global variables in the drbd
subsystem, someone should fix those up one day before they hit a linking
error.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

diff --git a/drivers/block/drbd/drbd_int.h b/drivers/block/drbd/drbd_int.h
index 74a7d0b..61596af 100644
--- a/drivers/block/drbd/drbd_int.h
+++ b/drivers/block/drbd/drbd_int.h
@@ -75,7 +75,7 @@ extern int fault_rate;
 extern int fault_devs;
 #endif
 
-extern char usermode_helper[];
+extern char drbd_usermode_helper[];
 
 
 /* This is used to stop/restart our threads.
diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index 8b8dd82..bdd9ab2 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -109,9 +109,9 @@ int proc_details;       /* Detail level in proc drbd*/
 
 /* Module parameter for setting the user mode helper program
  * to run. Default is /sbin/drbdadm */
-char usermode_helper[80] = "/sbin/drbdadm";
+char drbd_usermode_helper[80] = "/sbin/drbdadm";
 
-module_param_string(usermode_helper, usermode_helper, sizeof(usermode_helper), 0644);
+module_param_string(usermode_helper, drbd_usermode_helper, sizeof(drbd_usermode_helper), 0644);
 
 /* in 2.6.x, our device mapping and config info contains our virtual gendisks
  * as member "struct gendisk *vdisk;"
diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index 6bb58a6..a12f77e 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -344,7 +344,7 @@ int drbd_khelper(struct drbd_device *device, char *cmd)
 			 (char[60]) { }, /* address */
 			NULL };
 	char mb[14];
-	char *argv[] = {usermode_helper, cmd, mb, NULL };
+	char *argv[] = {drbd_usermode_helper, cmd, mb, NULL };
 	struct drbd_connection *connection = first_peer_device(device)->connection;
 	struct sib_info sib;
 	int ret;
@@ -359,19 +359,19 @@ int drbd_khelper(struct drbd_device *device, char *cmd)
 	 * write out any unsynced meta data changes now */
 	drbd_md_sync(device);
 
-	drbd_info(device, "helper command: %s %s %s\n", usermode_helper, cmd, mb);
+	drbd_info(device, "helper command: %s %s %s\n", drbd_usermode_helper, cmd, mb);
 	sib.sib_reason = SIB_HELPER_PRE;
 	sib.helper_name = cmd;
 	drbd_bcast_event(device, &sib);
 	notify_helper(NOTIFY_CALL, device, connection, cmd, 0);
-	ret = call_usermodehelper(usermode_helper, argv, envp, UMH_WAIT_PROC);
+	ret = call_usermodehelper(drbd_usermode_helper, argv, envp, UMH_WAIT_PROC);
 	if (ret)
 		drbd_warn(device, "helper command: %s %s %s exit code %u (0x%x)\n",
-				usermode_helper, cmd, mb,
+				drbd_usermode_helper, cmd, mb,
 				(ret >> 8) & 0xff, ret);
 	else
 		drbd_info(device, "helper command: %s %s %s exit code %u (0x%x)\n",
-				usermode_helper, cmd, mb,
+				drbd_usermode_helper, cmd, mb,
 				(ret >> 8) & 0xff, ret);
 	sib.sib_reason = SIB_HELPER_POST;
 	sib.helper_exit_code = ret;
@@ -396,24 +396,24 @@ enum drbd_peer_state conn_khelper(struct drbd_connection *connection, char *cmd)
 			 (char[60]) { }, /* address */
 			NULL };
 	char *resource_name = connection->resource->name;
-	char *argv[] = {usermode_helper, cmd, resource_name, NULL };
+	char *argv[] = {drbd_usermode_helper, cmd, resource_name, NULL };
 	int ret;
 
 	setup_khelper_env(connection, envp);
 	conn_md_sync(connection);
 
-	drbd_info(connection, "helper command: %s %s %s\n", usermode_helper, cmd, resource_name);
+	drbd_info(connection, "helper command: %s %s %s\n", drbd_usermode_helper, cmd, resource_name);
 	/* TODO: conn_bcast_event() ?? */
 	notify_helper(NOTIFY_CALL, NULL, connection, cmd, 0);
 
-	ret = call_usermodehelper(usermode_helper, argv, envp, UMH_WAIT_PROC);
+	ret = call_usermodehelper(drbd_usermode_helper, argv, envp, UMH_WAIT_PROC);
 	if (ret)
 		drbd_warn(connection, "helper command: %s %s %s exit code %u (0x%x)\n",
-			  usermode_helper, cmd, resource_name,
+			  drbd_usermode_helper, cmd, resource_name,
 			  (ret >> 8) & 0xff, ret);
 	else
 		drbd_info(connection, "helper command: %s %s %s exit code %u (0x%x)\n",
-			  usermode_helper, cmd, resource_name,
+			  drbd_usermode_helper, cmd, resource_name,
 			  (ret >> 8) & 0xff, ret);
 	/* TODO: conn_bcast_event() ?? */
 	notify_helper(NOTIFY_RESPONSE, NULL, connection, cmd, ret);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 15/17] drbd: move global variables to drbd namespace and make some static
  2017-08-24 21:22 [PATCH 00/17] DRBD updates Philipp Reisner
                   ` (13 preceding siblings ...)
  2017-08-24 21:23 ` [PATCH 14/17] drbd: rename "usermode_helper" to "drbd_usermode_helper" Philipp Reisner
@ 2017-08-24 21:23 ` Philipp Reisner
  2017-08-24 21:23 ` [PATCH 16/17] drbd: abort drbd_start_resync if there is no connection Philipp Reisner
  2017-08-24 21:23 ` [PATCH 17/17] drbd: switch from kmalloc() to kmalloc_array() Philipp Reisner
  16 siblings, 0 replies; 21+ messages in thread
From: Philipp Reisner @ 2017-08-24 21:23 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel; +Cc: drbd-dev

From: Roland Kammerer <roland.kammerer@linbit.com>

This is a follow-up to Gregs complaints that drbd clutteres the global
namespace.
Some of DRBD's module parameters are only used within one compilation
unit. Make these static.

Signed-off-by: Roland Kammerer <roland.kammerer@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

diff --git a/drivers/block/drbd/drbd_int.h b/drivers/block/drbd/drbd_int.h
index 61596af..7e8589c 100644
--- a/drivers/block/drbd/drbd_int.h
+++ b/drivers/block/drbd/drbd_int.h
@@ -63,19 +63,15 @@
 # define __must_hold(x)
 #endif
 
-/* module parameter, defined in drbd_main.c */
-extern unsigned int minor_count;
-extern bool disable_sendpage;
-extern bool allow_oos;
-void tl_abort_disk_io(struct drbd_device *device);
-
+/* shared module parameters, defined in drbd_main.c */
 #ifdef CONFIG_DRBD_FAULT_INJECTION
-extern int enable_faults;
-extern int fault_rate;
-extern int fault_devs;
+extern int drbd_enable_faults;
+extern int drbd_fault_rate;
 #endif
 
+extern unsigned int drbd_minor_count;
 extern char drbd_usermode_helper[];
+extern int drbd_proc_details;
 
 
 /* This is used to stop/restart our threads.
@@ -181,8 +177,8 @@ _drbd_insert_fault(struct drbd_device *device, unsigned int type);
 static inline int
 drbd_insert_fault(struct drbd_device *device, unsigned int type) {
 #ifdef CONFIG_DRBD_FAULT_INJECTION
-	return fault_rate &&
-		(enable_faults & (1<<type)) &&
+	return drbd_fault_rate &&
+		(drbd_enable_faults & (1<<type)) &&
 		_drbd_insert_fault(device, type);
 #else
 	return 0;
@@ -1466,8 +1462,6 @@ extern struct drbd_resource *drbd_find_resource(const char *name);
 extern void drbd_destroy_resource(struct kref *kref);
 extern void conn_free_crypto(struct drbd_connection *connection);
 
-extern int proc_details;
-
 /* drbd_req */
 extern void do_submit(struct work_struct *ws);
 extern void __drbd_make_request(struct drbd_device *, struct bio *, unsigned long);
diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index bdd9ab2..56a5436 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -77,40 +77,40 @@ MODULE_PARM_DESC(minor_count, "Approximate number of drbd devices ("
 MODULE_ALIAS_BLOCKDEV_MAJOR(DRBD_MAJOR);
 
 #include <linux/moduleparam.h>
-/* allow_open_on_secondary */
-MODULE_PARM_DESC(allow_oos, "DONT USE!");
 /* thanks to these macros, if compiled into the kernel (not-module),
- * this becomes the boot parameter drbd.minor_count */
-module_param(minor_count, uint, 0444);
-module_param(disable_sendpage, bool, 0644);
-module_param(allow_oos, bool, 0);
-module_param(proc_details, int, 0644);
+ * these become boot parameters (e.g., drbd.minor_count) */
 
 #ifdef CONFIG_DRBD_FAULT_INJECTION
-int enable_faults;
-int fault_rate;
-static int fault_count;
-int fault_devs;
+int drbd_enable_faults;
+int drbd_fault_rate;
+static int drbd_fault_count;
+static int drbd_fault_devs;
 /* bitmap of enabled faults */
-module_param(enable_faults, int, 0664);
+module_param_named(enable_faults, drbd_enable_faults, int, 0664);
 /* fault rate % value - applies to all enabled faults */
-module_param(fault_rate, int, 0664);
+module_param_named(fault_rate, drbd_fault_rate, int, 0664);
 /* count of faults inserted */
-module_param(fault_count, int, 0664);
+module_param_named(fault_count, drbd_fault_count, int, 0664);
 /* bitmap of devices to insert faults on */
-module_param(fault_devs, int, 0644);
+module_param_named(fault_devs, drbd_fault_devs, int, 0644);
 #endif
 
-/* module parameter, defined */
-unsigned int minor_count = DRBD_MINOR_COUNT_DEF;
-bool disable_sendpage;
-bool allow_oos;
-int proc_details;       /* Detail level in proc drbd*/
-
+/* module parameters we can keep static */
+static bool drbd_allow_oos; /* allow_open_on_secondary */
+static bool drbd_disable_sendpage;
+MODULE_PARM_DESC(allow_oos, "DONT USE!");
+module_param_named(allow_oos, drbd_allow_oos, bool, 0);
+module_param_named(disable_sendpage, drbd_disable_sendpage, bool, 0644);
+
+/* module parameters we share */
+int drbd_proc_details; /* Detail level in proc drbd*/
+module_param_named(proc_details, drbd_proc_details, int, 0644);
+/* module parameters shared with defaults */
+unsigned int drbd_minor_count = DRBD_MINOR_COUNT_DEF;
 /* Module parameter for setting the user mode helper program
  * to run. Default is /sbin/drbdadm */
 char drbd_usermode_helper[80] = "/sbin/drbdadm";
-
+module_param_named(minor_count, drbd_minor_count, uint, 0444);
 module_param_string(usermode_helper, drbd_usermode_helper, sizeof(drbd_usermode_helper), 0644);
 
 /* in 2.6.x, our device mapping and config info contains our virtual gendisks
@@ -1562,7 +1562,7 @@ static int _drbd_send_page(struct drbd_peer_device *peer_device, struct page *pa
 	 * put_page(); and would cause either a VM_BUG directly, or
 	 * __page_cache_release a page that would actually still be referenced
 	 * by someone, leading to some obscure delayed Oops somewhere else. */
-	if (disable_sendpage || (page_count(page) < 1) || PageSlab(page))
+	if (drbd_disable_sendpage || (page_count(page) < 1) || PageSlab(page))
 		return _drbd_no_send_page(peer_device, page, offset, size, msg_flags);
 
 	msg_flags |= MSG_NOSIGNAL;
@@ -1934,7 +1934,7 @@ static int drbd_open(struct block_device *bdev, fmode_t mode)
 	if (device->state.role != R_PRIMARY) {
 		if (mode & FMODE_WRITE)
 			rv = -EROFS;
-		else if (!allow_oos)
+		else if (!drbd_allow_oos)
 			rv = -EMEDIUMTYPE;
 	}
 
@@ -2142,7 +2142,7 @@ static void drbd_destroy_mempools(void)
 static int drbd_create_mempools(void)
 {
 	struct page *page;
-	const int number = (DRBD_MAX_BIO_SIZE/PAGE_SIZE) * minor_count;
+	const int number = (DRBD_MAX_BIO_SIZE/PAGE_SIZE) * drbd_minor_count;
 	int i;
 
 	/* prepare our caches and mempools */
@@ -2984,8 +2984,8 @@ static int __init drbd_init(void)
 {
 	int err;
 
-	if (minor_count < DRBD_MINOR_COUNT_MIN || minor_count > DRBD_MINOR_COUNT_MAX) {
-		pr_err("invalid minor_count (%d)\n", minor_count);
+	if (drbd_minor_count < DRBD_MINOR_COUNT_MIN || drbd_minor_count > DRBD_MINOR_COUNT_MAX) {
+		pr_err("invalid minor_count (%d)\n", drbd_minor_count);
 #ifdef MODULE
 		return -EINVAL;
 #else
@@ -3912,12 +3912,12 @@ _drbd_insert_fault(struct drbd_device *device, unsigned int type)
 	static struct fault_random_state rrs = {0, 0};
 
 	unsigned int ret = (
-		(fault_devs == 0 ||
-			((1 << device_to_minor(device)) & fault_devs) != 0) &&
-		(((_drbd_fault_random(&rrs) % 100) + 1) <= fault_rate));
+		(drbd_fault_devs == 0 ||
+			((1 << device_to_minor(device)) & drbd_fault_devs) != 0) &&
+		(((_drbd_fault_random(&rrs) % 100) + 1) <= drbd_fault_rate));
 
 	if (ret) {
-		fault_count++;
+		drbd_fault_count++;
 
 		if (__ratelimit(&drbd_ratelimit_state))
 			drbd_warn(device, "***Simulating %s failure\n",
diff --git a/drivers/block/drbd/drbd_proc.c b/drivers/block/drbd/drbd_proc.c
index fc0f627..582caeb 100644
--- a/drivers/block/drbd/drbd_proc.c
+++ b/drivers/block/drbd/drbd_proc.c
@@ -179,7 +179,7 @@ static void drbd_syncer_progress(struct drbd_device *device, struct seq_file *se
 	seq_printf_with_thousands_grouping(seq, dbdt);
 	seq_puts(seq, " (");
 	/* ------------------------- ~3s average ------------------------ */
-	if (proc_details >= 1) {
+	if (drbd_proc_details >= 1) {
 		/* this is what drbd_rs_should_slow_down() uses */
 		i = (device->rs_last_mark + DRBD_SYNC_MARKS-1) % DRBD_SYNC_MARKS;
 		dt = (jiffies - device->rs_mark_time[i]) / HZ;
@@ -209,7 +209,7 @@ static void drbd_syncer_progress(struct drbd_device *device, struct seq_file *se
 	}
 	seq_printf(seq, " K/sec%s\n", stalled ? " (stalled)" : "");
 
-	if (proc_details >= 1) {
+	if (drbd_proc_details >= 1) {
 		/* 64 bit:
 		 * we convert to sectors in the display below. */
 		unsigned long bm_bits = drbd_bm_bits(device);
@@ -332,13 +332,13 @@ static int drbd_seq_show(struct seq_file *seq, void *v)
 		    state.conn == C_VERIFY_T)
 			drbd_syncer_progress(device, seq, state);
 
-		if (proc_details >= 1 && get_ldev_if_state(device, D_FAILED)) {
+		if (drbd_proc_details >= 1 && get_ldev_if_state(device, D_FAILED)) {
 			lc_seq_printf_stats(seq, device->resync);
 			lc_seq_printf_stats(seq, device->act_log);
 			put_ldev(device);
 		}
 
-		if (proc_details >= 2)
+		if (drbd_proc_details >= 2)
 			seq_printf(seq, "\tblocked on activity log: %d\n", atomic_read(&device->ap_actlog_cnt));
 	}
 	rcu_read_unlock();
diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c
index 5e090a1..4e8a543 100644
--- a/drivers/block/drbd/drbd_receiver.c
+++ b/drivers/block/drbd/drbd_receiver.c
@@ -332,7 +332,7 @@ static void drbd_free_pages(struct drbd_device *device, struct page *page, int i
 	if (page == NULL)
 		return;
 
-	if (drbd_pp_vacant > (DRBD_MAX_BIO_SIZE/PAGE_SIZE) * minor_count)
+	if (drbd_pp_vacant > (DRBD_MAX_BIO_SIZE/PAGE_SIZE) * drbd_minor_count)
 		i = page_chain_free(page);
 	else {
 		struct page *tmp;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 16/17] drbd: abort drbd_start_resync if there is no connection
  2017-08-24 21:22 [PATCH 00/17] DRBD updates Philipp Reisner
                   ` (14 preceding siblings ...)
  2017-08-24 21:23 ` [PATCH 15/17] drbd: move global variables to drbd namespace and make some static Philipp Reisner
@ 2017-08-24 21:23 ` Philipp Reisner
  2017-08-24 21:23 ` [PATCH 17/17] drbd: switch from kmalloc() to kmalloc_array() Philipp Reisner
  16 siblings, 0 replies; 21+ messages in thread
From: Philipp Reisner @ 2017-08-24 21:23 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel; +Cc: drbd-dev

From: Roland Kammerer <roland.kammerer@linbit.com>

This was found by a static analysis tool. While highly unlikely, be sure
to return without dereferencing the NULL pointer.

Reported-by: Shaobo <shaobo@cs.utah.edu>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

diff --git a/drivers/block/drbd/drbd_worker.c b/drivers/block/drbd/drbd_worker.c
index f0717a9..03471b3 100644
--- a/drivers/block/drbd/drbd_worker.c
+++ b/drivers/block/drbd/drbd_worker.c
@@ -1756,6 +1756,11 @@ void drbd_start_resync(struct drbd_device *device, enum drbd_conns side)
 		return;
 	}
 
+	if (!connection) {
+		drbd_err(device, "No connection to peer, aborting!\n");
+		return;
+	}
+
 	if (!test_bit(B_RS_H_DONE, &device->flags)) {
 		if (side == C_SYNC_TARGET) {
 			/* Since application IO was locked out during C_WF_BITMAP_T and
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 17/17] drbd: switch from kmalloc() to kmalloc_array()
  2017-08-24 21:22 [PATCH 00/17] DRBD updates Philipp Reisner
                   ` (15 preceding siblings ...)
  2017-08-24 21:23 ` [PATCH 16/17] drbd: abort drbd_start_resync if there is no connection Philipp Reisner
@ 2017-08-24 21:23 ` Philipp Reisner
  16 siblings, 0 replies; 21+ messages in thread
From: Philipp Reisner @ 2017-08-24 21:23 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel; +Cc: drbd-dev

From: Roland Kammerer <roland.kammerer@linbit.com>

We had one call to kmalloc that actually allocates an array. Switch that
one to the kmalloc_array() function.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c
index 4e8a543..796eaf3 100644
--- a/drivers/block/drbd/drbd_receiver.c
+++ b/drivers/block/drbd/drbd_receiver.c
@@ -4126,7 +4126,7 @@ static int receive_uuids(struct drbd_connection *connection, struct packet_info
 		return config_unknown_volume(connection, pi);
 	device = peer_device->device;
 
-	p_uuid = kmalloc(sizeof(u64)*UI_EXTENDED_SIZE, GFP_NOIO);
+	p_uuid = kmalloc_array(UI_EXTENDED_SIZE, sizeof(*p_uuid), GFP_NOIO);
 	if (!p_uuid) {
 		drbd_err(device, "kmalloc of p_uuid failed\n");
 		return false;
diff --git a/include/linux/drbd.h b/include/linux/drbd.h
index 002611c..2d02593 100644
--- a/include/linux/drbd.h
+++ b/include/linux/drbd.h
@@ -51,7 +51,7 @@
 #endif
 
 extern const char *drbd_buildtag(void);
-#define REL_VERSION "8.4.7"
+#define REL_VERSION "8.4.10"
 #define API_VERSION 1
 #define PRO_VERSION_MIN 86
 #define PRO_VERSION_MAX 101
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 01/17] drbd: introduce drbd_recv_header_maybe_unplug
  2017-08-24 21:22 ` [PATCH 01/17] drbd: introduce drbd_recv_header_maybe_unplug Philipp Reisner
@ 2017-08-25 17:26   ` Jens Axboe
  2017-08-28 13:35     ` Philipp Reisner
  0 siblings, 1 reply; 21+ messages in thread
From: Jens Axboe @ 2017-08-25 17:26 UTC (permalink / raw)
  To: Philipp Reisner, linux-kernel; +Cc: drbd-dev

On 08/24/2017 03:22 PM, Philipp Reisner wrote:
> +#ifndef blk_queue_plugged
> +struct drbd_plug_cb {
> +	struct blk_plug_cb cb;
> +	struct drbd_request *most_recent_req;
> +	/* do we need more? */
> +};

What is this blk_queue_plugged ifdef?

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 01/17] drbd: introduce drbd_recv_header_maybe_unplug
  2017-08-25 17:26   ` Jens Axboe
@ 2017-08-28 13:35     ` Philipp Reisner
  2017-08-28 13:43       ` Philipp Reisner
  0 siblings, 1 reply; 21+ messages in thread
From: Philipp Reisner @ 2017-08-28 13:35 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel, drbd-dev

Am Freitag, 25. August 2017, 19:26:20 CEST schrieb Jens Axboe:
> On 08/24/2017 03:22 PM, Philipp Reisner wrote:
> > +#ifndef blk_queue_plugged
> > +struct drbd_plug_cb {
> > +	struct blk_plug_cb cb;
> > +	struct drbd_request *most_recent_req;
> > +	/* do we need more? */
> > +};
> 
> What is this blk_queue_plugged ifdef?

Facepalm. That escaped from out out-of-tree code. It has compat code
so that it can be compiled with older (distro) kernels. I will resend
it without that.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 01/17] drbd: introduce drbd_recv_header_maybe_unplug
  2017-08-28 13:35     ` Philipp Reisner
@ 2017-08-28 13:43       ` Philipp Reisner
  0 siblings, 0 replies; 21+ messages in thread
From: Philipp Reisner @ 2017-08-28 13:43 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel; +Cc: drbd-dev

From: Lars Ellenberg <lars.ellenberg@linbit.com>

Recently, drbd_recv_header() was changed to potentially
implicitly "unplug" the backend device(s), in case there
is currently nothing to receive.

Be more explicit about it: re-introduce the original drbd_recv_header(),
and introduce a new drbd_recv_header_maybe_unplug() for use by the
receiver "main loop".

Using explicit plugging via blk_start_plug(); blk_finish_plug();
really helps the io-scheduler of the backend with merging requests.

Wrap the receiver "main loop" with such a plug.
Also catch unplug events on the Primary,
and try to propagate.

This is performance relevant.  Without this, if the receiving side does
not merge requests, number of IOPS on the peer can me significantly
higher than IOPS on the Primary, and can easily become the bottleneck.

Together, both changes should help to reduce the number of IOPS
as seen on the backend of the receiving side, by increasing
the chance of merging mergable requests, without trading latency
for more throughput.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
---
 drivers/block/drbd/drbd_int.h      |  5 +++-
 drivers/block/drbd/drbd_main.c     | 13 +++++++++
 drivers/block/drbd/drbd_receiver.c | 47 +++++++++++++++++++++++++++++---
 drivers/block/drbd/drbd_req.c      | 55 ++++++++++++++++++++++++++++++++++++++
 drivers/block/drbd/drbd_req.h      |  6 +++++
 drivers/block/drbd/drbd_worker.c   | 22 +++++++++++----
 6 files changed, 139 insertions(+), 9 deletions(-)

diff --git a/drivers/block/drbd/drbd_int.h b/drivers/block/drbd/drbd_int.h
index 819f9d0..74a7d0b 100644
--- a/drivers/block/drbd/drbd_int.h
+++ b/drivers/block/drbd/drbd_int.h
@@ -745,6 +745,8 @@ struct drbd_connection {
 	unsigned current_tle_writes;	/* writes seen within this tl epoch */
 
 	unsigned long last_reconnect_jif;
+	/* empty member on older kernels without blk_start_plug() */
+	struct blk_plug receiver_plug;
 	struct drbd_thread receiver;
 	struct drbd_thread worker;
 	struct drbd_thread ack_receiver;
@@ -1131,7 +1133,8 @@ extern void conn_send_sr_reply(struct drbd_connection *connection, enum drbd_sta
 extern int drbd_send_rs_deallocated(struct drbd_peer_device *, struct drbd_peer_request *);
 extern void drbd_backing_dev_free(struct drbd_device *device, struct drbd_backing_dev *ldev);
 extern void drbd_device_cleanup(struct drbd_device *device);
-void drbd_print_uuids(struct drbd_device *device, const char *text);
+extern void drbd_print_uuids(struct drbd_device *device, const char *text);
+extern void drbd_queue_unplug(struct drbd_device *device);
 
 extern void conn_md_sync(struct drbd_connection *connection);
 extern void drbd_md_write(struct drbd_device *device, void *buffer);
diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index e2ed28d..a3b2ee7 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -1952,6 +1952,19 @@ static void drbd_release(struct gendisk *gd, fmode_t mode)
 	mutex_unlock(&drbd_main_mutex);
 }
 
+/* need to hold resource->req_lock */
+void drbd_queue_unplug(struct drbd_device *device)
+{
+	if (device->state.pdsk >= D_INCONSISTENT && device->state.conn >= C_CONNECTED) {
+		D_ASSERT(device, device->state.role == R_PRIMARY);
+		if (test_and_clear_bit(UNPLUG_REMOTE, &device->flags)) {
+			drbd_queue_work_if_unqueued(
+				&first_peer_device(device)->connection->sender_work,
+				&device->unplug_work);
+		}
+	}
+}
+
 static void drbd_set_defaults(struct drbd_device *device)
 {
 	/* Beware! The actual layout differs
diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c
index ece6e5d..1b3f439 100644
--- a/drivers/block/drbd/drbd_receiver.c
+++ b/drivers/block/drbd/drbd_receiver.c
@@ -1194,6 +1194,14 @@ static int decode_header(struct drbd_connection *connection, void *header, struc
 	return 0;
 }
 
+static void drbd_unplug_all_devices(struct drbd_connection *connection)
+{
+	if (current->plug == &connection->receiver_plug) {
+		blk_finish_plug(&connection->receiver_plug);
+		blk_start_plug(&connection->receiver_plug);
+	} /* else: maybe just schedule() ?? */
+}
+
 static int drbd_recv_header(struct drbd_connection *connection, struct packet_info *pi)
 {
 	void *buffer = connection->data.rbuf;
@@ -1209,6 +1217,36 @@ static int drbd_recv_header(struct drbd_connection *connection, struct packet_in
 	return err;
 }
 
+static int drbd_recv_header_maybe_unplug(struct drbd_connection *connection, struct packet_info *pi)
+{
+	void *buffer = connection->data.rbuf;
+	unsigned int size = drbd_header_size(connection);
+	int err;
+
+	err = drbd_recv_short(connection->data.socket, buffer, size, MSG_NOSIGNAL|MSG_DONTWAIT);
+	if (err != size) {
+		/* If we have nothing in the receive buffer now, to reduce
+		 * application latency, try to drain the backend queues as
+		 * quickly as possible, and let remote TCP know what we have
+		 * received so far. */
+		if (err == -EAGAIN) {
+			drbd_tcp_quickack(connection->data.socket);
+			drbd_unplug_all_devices(connection);
+		}
+		if (err > 0) {
+			buffer += err;
+			size -= err;
+		}
+		err = drbd_recv_all_warn(connection, buffer, size);
+		if (err)
+			return err;
+	}
+
+	err = decode_header(connection, connection->data.rbuf, pi);
+	connection->last_received = jiffies;
+
+	return err;
+}
 /* This is blkdev_issue_flush, but asynchronous.
  * We want to submit to all component volumes in parallel,
  * then wait for all completions.
@@ -4882,8 +4920,8 @@ static void drbdd(struct drbd_connection *connection)
 		struct data_cmd const *cmd;
 
 		drbd_thread_current_set_cpu(&connection->receiver);
-		update_receiver_timing_details(connection, drbd_recv_header);
-		if (drbd_recv_header(connection, &pi))
+		update_receiver_timing_details(connection, drbd_recv_header_maybe_unplug);
+		if (drbd_recv_header_maybe_unplug(connection, &pi))
 			goto err_out;
 
 		cmd = &drbd_cmd_handler[pi.cmd];
@@ -5375,8 +5413,11 @@ int drbd_receiver(struct drbd_thread *thi)
 		}
 	} while (h == 0);
 
-	if (h > 0)
+	if (h > 0) {
+		blk_start_plug(&connection->receiver_plug);
 		drbdd(connection);
+		blk_finish_plug(&connection->receiver_plug);
+	}
 
 	conn_disconnect(connection);
 
diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
index 447c975..5cf43f13 100644
--- a/drivers/block/drbd/drbd_req.c
+++ b/drivers/block/drbd/drbd_req.c
@@ -1279,6 +1279,56 @@ static bool may_do_writes(struct drbd_device *device)
 	return s.disk == D_UP_TO_DATE || s.pdsk == D_UP_TO_DATE;
 }
 
+struct drbd_plug_cb {
+	struct blk_plug_cb cb;
+	struct drbd_request *most_recent_req;
+	/* do we need more? */
+};
+
+static void drbd_unplug(struct blk_plug_cb *cb, bool from_schedule)
+{
+	struct drbd_plug_cb *plug = container_of(cb, struct drbd_plug_cb, cb);
+	struct drbd_resource *resource = plug->cb.data;
+	struct drbd_request *req = plug->most_recent_req;
+
+	if (!req)
+		return;
+
+	spin_lock_irq(&resource->req_lock);
+	/* In case the sender did not process it yet, raise the flag to
+	 * have it followed with P_UNPLUG_REMOTE just after. */
+	req->rq_state |= RQ_UNPLUG;
+	/* but also queue a generic unplug */
+	drbd_queue_unplug(req->device);
+	spin_unlock_irq(&resource->req_lock);
+	kref_put(&req->kref, drbd_req_destroy);
+}
+
+static struct drbd_plug_cb* drbd_check_plugged(struct drbd_resource *resource)
+{
+	/* A lot of text to say
+	 * return (struct drbd_plug_cb*)blk_check_plugged(); */
+	struct drbd_plug_cb *plug;
+	struct blk_plug_cb *cb = blk_check_plugged(drbd_unplug, resource, sizeof(*plug));
+
+	if (cb)
+		plug = container_of(cb, struct drbd_plug_cb, cb);
+	else
+		plug = NULL;
+	return plug;
+}
+
+static void drbd_update_plug(struct drbd_plug_cb *plug, struct drbd_request *req)
+{
+	struct drbd_request *tmp = plug->most_recent_req;
+	/* Will be sent to some peer.
+	 * Remember to tag it with UNPLUG_REMOTE on unplug */
+	kref_get(&req->kref);
+	plug->most_recent_req = req;
+	if (tmp)
+		kref_put(&tmp->kref, drbd_req_destroy);
+}
+
 static void drbd_send_and_submit(struct drbd_device *device, struct drbd_request *req)
 {
 	struct drbd_resource *resource = device->resource;
@@ -1287,6 +1337,8 @@ static void drbd_send_and_submit(struct drbd_device *device, struct drbd_request
 	bool no_remote = false;
 	bool submit_private_bio = false;
 
+	struct drbd_plug_cb *plug = drbd_check_plugged(resource);
+
 	spin_lock_irq(&resource->req_lock);
 	if (rw == WRITE) {
 		/* This may temporarily give up the req_lock,
@@ -1351,6 +1403,9 @@ static void drbd_send_and_submit(struct drbd_device *device, struct drbd_request
 			no_remote = true;
 	}
 
+	if (plug != NULL && no_remote == false)
+		drbd_update_plug(plug, req);
+
 	/* If it took the fast path in drbd_request_prepare, add it here.
 	 * The slow path has added it already. */
 	if (list_empty(&req->req_pending_master_completion))
diff --git a/drivers/block/drbd/drbd_req.h b/drivers/block/drbd/drbd_req.h
index 9e1866a..a2254f8 100644
--- a/drivers/block/drbd/drbd_req.h
+++ b/drivers/block/drbd/drbd_req.h
@@ -212,6 +212,11 @@ enum drbd_req_state_bits {
 	/* Should call drbd_al_complete_io() for this request... */
 	__RQ_IN_ACT_LOG,
 
+	/* This was the most recent request during some blk_finish_plug()
+	 * or its implicit from-schedule equivalent.
+	 * We may use it as hint to send a P_UNPLUG_REMOTE */
+	__RQ_UNPLUG,
+
 	/* The peer has sent a retry ACK */
 	__RQ_POSTPONED,
 
@@ -249,6 +254,7 @@ enum drbd_req_state_bits {
 #define RQ_WSAME           (1UL << __RQ_WSAME)
 #define RQ_UNMAP           (1UL << __RQ_UNMAP)
 #define RQ_IN_ACT_LOG      (1UL << __RQ_IN_ACT_LOG)
+#define RQ_UNPLUG          (1UL << __RQ_UNPLUG)
 #define RQ_POSTPONED	   (1UL << __RQ_POSTPONED)
 #define RQ_COMPLETION_SUSP (1UL << __RQ_COMPLETION_SUSP)
 #define RQ_EXP_RECEIVE_ACK (1UL << __RQ_EXP_RECEIVE_ACK)
diff --git a/drivers/block/drbd/drbd_worker.c b/drivers/block/drbd/drbd_worker.c
index c268d88..2745db2 100644
--- a/drivers/block/drbd/drbd_worker.c
+++ b/drivers/block/drbd/drbd_worker.c
@@ -1382,18 +1382,22 @@ static int drbd_send_barrier(struct drbd_connection *connection)
 	return conn_send_command(connection, sock, P_BARRIER, sizeof(*p), NULL, 0);
 }
 
+static int pd_send_unplug_remote(struct drbd_peer_device *pd)
+{
+	struct drbd_socket *sock = &pd->connection->data;
+	if (!drbd_prepare_command(pd, sock))
+		return -EIO;
+	return drbd_send_command(pd, sock, P_UNPLUG_REMOTE, 0, NULL, 0);
+}
+
 int w_send_write_hint(struct drbd_work *w, int cancel)
 {
 	struct drbd_device *device =
 		container_of(w, struct drbd_device, unplug_work);
-	struct drbd_socket *sock;
 
 	if (cancel)
 		return 0;
-	sock = &first_peer_device(device)->connection->data;
-	if (!drbd_prepare_command(first_peer_device(device), sock))
-		return -EIO;
-	return drbd_send_command(first_peer_device(device), sock, P_UNPLUG_REMOTE, 0, NULL, 0);
+	return pd_send_unplug_remote(first_peer_device(device));
 }
 
 static void re_init_if_first_write(struct drbd_connection *connection, unsigned int epoch)
@@ -1455,6 +1459,7 @@ int w_send_dblock(struct drbd_work *w, int cancel)
 	struct drbd_device *device = req->device;
 	struct drbd_peer_device *const peer_device = first_peer_device(device);
 	struct drbd_connection *connection = peer_device->connection;
+	bool do_send_unplug = req->rq_state & RQ_UNPLUG;
 	int err;
 
 	if (unlikely(cancel)) {
@@ -1470,6 +1475,9 @@ int w_send_dblock(struct drbd_work *w, int cancel)
 	err = drbd_send_dblock(peer_device, req);
 	req_mod(req, err ? SEND_FAILED : HANDED_OVER_TO_NETWORK);
 
+	if (do_send_unplug && !err)
+		pd_send_unplug_remote(peer_device);
+
 	return err;
 }
 
@@ -1484,6 +1492,7 @@ int w_send_read_req(struct drbd_work *w, int cancel)
 	struct drbd_device *device = req->device;
 	struct drbd_peer_device *const peer_device = first_peer_device(device);
 	struct drbd_connection *connection = peer_device->connection;
+	bool do_send_unplug = req->rq_state & RQ_UNPLUG;
 	int err;
 
 	if (unlikely(cancel)) {
@@ -1501,6 +1510,9 @@ int w_send_read_req(struct drbd_work *w, int cancel)
 
 	req_mod(req, err ? SEND_FAILED : HANDED_OVER_TO_NETWORK);
 
+	if (do_send_unplug && !err)
+		pd_send_unplug_remote(peer_device);
+
 	return err;
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2017-08-28 13:51 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-24 21:22 [PATCH 00/17] DRBD updates Philipp Reisner
2017-08-24 21:22 ` [PATCH 01/17] drbd: introduce drbd_recv_header_maybe_unplug Philipp Reisner
2017-08-25 17:26   ` Jens Axboe
2017-08-28 13:35     ` Philipp Reisner
2017-08-28 13:43       ` Philipp Reisner
2017-08-24 21:22 ` [PATCH 02/17] drbd: change list_for_each_safe to while(list_first_entry_or_null) Philipp Reisner
2017-08-24 21:23 ` [PATCH 03/17] drbd: add explicit plugging when submitting batches Philipp Reisner
2017-08-24 21:23 ` [PATCH 04/17] drbd: Send P_NEG_ACK upon write error in protocol != C Philipp Reisner
2017-08-24 21:23 ` [PATCH 05/17] drbd: mark symbols static where possible Philipp Reisner
2017-08-24 21:23 ` [PATCH 06/17] drbd: Fix resource role for newly created resources in events2 Philipp Reisner
2017-08-24 21:23 ` [PATCH 07/17] drbd: new disk-option disable-write-same Philipp Reisner
2017-08-24 21:23 ` [PATCH 08/17] drbd: fix potential get_ldev/put_ldev refcount imbalance during attach Philipp Reisner
2017-08-24 21:23 ` [PATCH 09/17] drbd: Use setup_timer() instead of init_timer() to simplify the code Philipp Reisner
2017-08-24 21:23 ` [PATCH 10/17] drbd: fix rmmod cleanup, remove _all_ debugfs entries Philipp Reisner
2017-08-24 21:23 ` [PATCH 11/17] drbd: A single dot should be put into a sequence Philipp Reisner
2017-08-24 21:23 ` [PATCH 12/17] drbd: fix potential deadlock when trying to detach during handshake Philipp Reisner
2017-08-24 21:23 ` [PATCH 13/17] drbd: fix race between handshake and admin disconnect/down Philipp Reisner
2017-08-24 21:23 ` [PATCH 14/17] drbd: rename "usermode_helper" to "drbd_usermode_helper" Philipp Reisner
2017-08-24 21:23 ` [PATCH 15/17] drbd: move global variables to drbd namespace and make some static Philipp Reisner
2017-08-24 21:23 ` [PATCH 16/17] drbd: abort drbd_start_resync if there is no connection Philipp Reisner
2017-08-24 21:23 ` [PATCH 17/17] drbd: switch from kmalloc() to kmalloc_array() Philipp Reisner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).