* [Cluster-devel] [PATCH dlm/next 0/4] fs: dlm: cleanup plock code
@ 2022-02-16 15:53 Alexander Aring
  2022-02-16 15:53 ` [Cluster-devel] [PATCH dlm/next 1/4] fs: dlm: replace sanity checks with WARN_ON Alexander Aring
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Alexander Aring @ 2022-02-16 15:53 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,

this patch series cleans up some of the plock code handling and removes
the plock_op vs plock_xop casting, which has caused trouble in the past.
It also adds a logging improvement to make it visible when a plock op
was not found because its wait was interrupted earlier.

This follow-up cleanup series should be applied on top of the previous
stable patch "[PATCH dlm/next] fs: dlm: fix plock invalid read".

Thanks.

- Alex

Alexander Aring (4):
  fs: dlm: replace sanity checks with WARN_ON
  fs: dlm: cleanup plock_op vs plock_xop
  fs: dlm: rearrange async condition return
  fs: dlm: improve plock logging if interrupted

 fs/dlm/plock.c | 137 +++++++++++++++++++++++--------------------------
 1 file changed, 64 insertions(+), 73 deletions(-)

-- 
2.31.1



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Cluster-devel] [PATCH dlm/next 1/4] fs: dlm: replace sanity checks with WARN_ON
  2022-02-16 15:53 [Cluster-devel] [PATCH dlm/next 0/4] fs: dlm: cleanup plock code Alexander Aring
@ 2022-02-16 15:53 ` Alexander Aring
  2022-02-16 16:08   ` Andreas Gruenbacher
  2022-02-16 15:53 ` [Cluster-devel] [PATCH dlm/next 2/4] fs: dlm: cleanup plock_op vs plock_xop Alexander Aring
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Alexander Aring @ 2022-02-16 15:53 UTC (permalink / raw)
  To: cluster-devel.redhat.com

The dlm plock handling contains several sanity checks, along with
recovery handling in case they trigger. They should never trigger;
otherwise there is a bug in the code. To make such bugs more visible,
remove the recovery handling and add a WARN_ON() to those sanity checks.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
 fs/dlm/plock.c | 32 ++++----------------------------
 1 file changed, 4 insertions(+), 28 deletions(-)

diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c
index a10d2bcfe75a..55fba2f0234f 100644
--- a/fs/dlm/plock.c
+++ b/fs/dlm/plock.c
@@ -157,13 +157,7 @@ int dlm_posix_lock(dlm_lockspace_t *lockspace, u64 number, struct file *file,
 		goto out;
 	}
 
-	spin_lock(&ops_lock);
-	if (!list_empty(&op->list)) {
-		log_error(ls, "dlm_posix_lock: op on list %llx",
-			  (unsigned long long)number);
-		list_del(&op->list);
-	}
-	spin_unlock(&ops_lock);
+	WARN_ON(!list_empty(&op->list));
 
 	rv = op->info.rv;
 
@@ -190,13 +184,7 @@ static int dlm_plock_callback(struct plock_op *op)
 	struct plock_xop *xop = (struct plock_xop *)op;
 	int rv = 0;
 
-	spin_lock(&ops_lock);
-	if (!list_empty(&op->list)) {
-		log_print("dlm_plock_callback: op on list %llx",
-			  (unsigned long long)op->info.number);
-		list_del(&op->list);
-	}
-	spin_unlock(&ops_lock);
+	WARN_ON(!list_empty(&op->list));
 
 	/* check if the following 2 are still valid or make a copy */
 	file = xop->file;
@@ -289,13 +277,7 @@ int dlm_posix_unlock(dlm_lockspace_t *lockspace, u64 number, struct file *file,
 	send_op(op);
 	wait_event(recv_wq, (op->done != 0));
 
-	spin_lock(&ops_lock);
-	if (!list_empty(&op->list)) {
-		log_error(ls, "dlm_posix_unlock: op on list %llx",
-			  (unsigned long long)number);
-		list_del(&op->list);
-	}
-	spin_unlock(&ops_lock);
+	WARN_ON(!list_empty(&op->list));
 
 	rv = op->info.rv;
 
@@ -343,13 +325,7 @@ int dlm_posix_get(dlm_lockspace_t *lockspace, u64 number, struct file *file,
 	send_op(op);
 	wait_event(recv_wq, (op->done != 0));
 
-	spin_lock(&ops_lock);
-	if (!list_empty(&op->list)) {
-		log_error(ls, "dlm_posix_get: op on list %llx",
-			  (unsigned long long)number);
-		list_del(&op->list);
-	}
-	spin_unlock(&ops_lock);
+	WARN_ON(!list_empty(&op->list));
 
 	/* info.rv from userspace is 1 for conflict, 0 for no-conflict,
 	   -ENOENT if there are no locks on the file */
-- 
2.31.1




* [Cluster-devel] [PATCH dlm/next 2/4] fs: dlm: cleanup plock_op vs plock_xop
  2022-02-16 15:53 [Cluster-devel] [PATCH dlm/next 0/4] fs: dlm: cleanup plock code Alexander Aring
  2022-02-16 15:53 ` [Cluster-devel] [PATCH dlm/next 1/4] fs: dlm: replace sanity checks with WARN_ON Alexander Aring
@ 2022-02-16 15:53 ` Alexander Aring
  2022-02-16 15:53 ` [Cluster-devel] [PATCH dlm/next 3/4] fs: dlm: rearrange async condition return Alexander Aring
  2022-02-16 15:53 ` [Cluster-devel] [PATCH dlm/next 4/4] fs: dlm: improve plock logging if interrupted Alexander Aring
  3 siblings, 0 replies; 15+ messages in thread
From: Alexander Aring @ 2022-02-16 15:53 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Lately, the casting between plock_op and plock_xop, and the list holders
involved, have shown some issues that were hard to see. This patch
removes the "plock_xop" structure and introduces a "struct
plock_async_data", which is set in "struct plock_op" in the asynchronous
lock handling case that the original "plock_xop" was made for. There is
no longer any need to cast pointers around to reach additional fields in
the asynchronous case. The disadvantage is an additional allocation, but
it is only needed in the asynchronous case, which is currently only used
in combination with nfs lockd.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
 fs/dlm/plock.c | 77 ++++++++++++++++++++++++++++++--------------------
 1 file changed, 46 insertions(+), 31 deletions(-)

diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c
index 55fba2f0234f..4e60dd657cb6 100644
--- a/fs/dlm/plock.c
+++ b/fs/dlm/plock.c
@@ -19,20 +19,20 @@ static struct list_head recv_list;
 static wait_queue_head_t send_wq;
 static wait_queue_head_t recv_wq;
 
-struct plock_op {
-	struct list_head list;
-	int done;
-	struct dlm_plock_info info;
-	int (*callback)(struct file_lock *fl, int result);
-};
-
-struct plock_xop {
-	struct plock_op xop;
+struct plock_async_data {
 	void *fl;
 	void *file;
 	struct file_lock flc;
+	int (*callback)(struct file_lock *fl, int result);
 };
 
+struct plock_op {
+	struct list_head list;
+	int done;
+	struct dlm_plock_info info;
+	/* if set indicates async handling */
+	struct plock_async_data *data;
+};
 
 static inline void set_version(struct dlm_plock_info *info)
 {
@@ -58,6 +58,12 @@ static int check_version(struct dlm_plock_info *info)
 	return 0;
 }
 
+static void dlm_release_plock_op(struct plock_op *op)
+{
+	kfree(op->data);
+	kfree(op);
+}
+
 static void send_op(struct plock_op *op)
 {
 	set_version(&op->info);
@@ -101,22 +107,21 @@ static void do_unlock_close(struct dlm_ls *ls, u64 number,
 int dlm_posix_lock(dlm_lockspace_t *lockspace, u64 number, struct file *file,
 		   int cmd, struct file_lock *fl)
 {
+	struct plock_async_data *op_data;
 	struct dlm_ls *ls;
 	struct plock_op *op;
-	struct plock_xop *xop;
 	int rv;
 
 	ls = dlm_find_lockspace_local(lockspace);
 	if (!ls)
 		return -EINVAL;
 
-	xop = kzalloc(sizeof(*xop), GFP_NOFS);
-	if (!xop) {
+	op = kzalloc(sizeof(*op), GFP_NOFS);
+	if (!op) {
 		rv = -ENOMEM;
 		goto out;
 	}
 
-	op = &xop->xop;
 	op->info.optype		= DLM_PLOCK_OP_LOCK;
 	op->info.pid		= fl->fl_pid;
 	op->info.ex		= (fl->fl_type == F_WRLCK);
@@ -125,22 +130,32 @@ int dlm_posix_lock(dlm_lockspace_t *lockspace, u64 number, struct file *file,
 	op->info.number		= number;
 	op->info.start		= fl->fl_start;
 	op->info.end		= fl->fl_end;
+	/* async handling */
 	if (fl->fl_lmops && fl->fl_lmops->lm_grant) {
+		op_data = kzalloc(sizeof(*op_data), GFP_NOFS);
+		if (!op_data) {
+			dlm_release_plock_op(op);
+			rv = -ENOMEM;
+			goto out;
+		}
+
 		/* fl_owner is lockd which doesn't distinguish
 		   processes on the nfs client */
 		op->info.owner	= (__u64) fl->fl_pid;
-		op->callback	= fl->fl_lmops->lm_grant;
-		locks_init_lock(&xop->flc);
-		locks_copy_lock(&xop->flc, fl);
-		xop->fl		= fl;
-		xop->file	= file;
+		op_data->callback = fl->fl_lmops->lm_grant;
+		locks_init_lock(&op_data->flc);
+		locks_copy_lock(&op_data->flc, fl);
+		op_data->fl		= fl;
+		op_data->file	= file;
+
+		op->data = op_data;
 	} else {
 		op->info.owner	= (__u64)(long) fl->fl_owner;
 	}
 
 	send_op(op);
 
-	if (!op->callback) {
+	if (!op->data) {
 		rv = wait_event_interruptible(recv_wq, (op->done != 0));
 		if (rv == -ERESTARTSYS) {
 			log_debug(ls, "dlm_posix_lock: wait killed %llx",
@@ -148,7 +163,7 @@ int dlm_posix_lock(dlm_lockspace_t *lockspace, u64 number, struct file *file,
 			spin_lock(&ops_lock);
 			list_del(&op->list);
 			spin_unlock(&ops_lock);
-			kfree(xop);
+			dlm_release_plock_op(op);
 			do_unlock_close(ls, number, file, fl);
 			goto out;
 		}
@@ -167,7 +182,7 @@ int dlm_posix_lock(dlm_lockspace_t *lockspace, u64 number, struct file *file,
 				  (unsigned long long)number);
 	}
 
-	kfree(xop);
+	dlm_release_plock_op(op);
 out:
 	dlm_put_lockspace(ls);
 	return rv;
@@ -177,20 +192,20 @@ EXPORT_SYMBOL_GPL(dlm_posix_lock);
 /* Returns failure iff a successful lock operation should be canceled */
 static int dlm_plock_callback(struct plock_op *op)
 {
+	struct plock_async_data *op_data = op->data;
 	struct file *file;
 	struct file_lock *fl;
 	struct file_lock *flc;
 	int (*notify)(struct file_lock *fl, int result) = NULL;
-	struct plock_xop *xop = (struct plock_xop *)op;
 	int rv = 0;
 
 	WARN_ON(!list_empty(&op->list));
 
 	/* check if the following 2 are still valid or make a copy */
-	file = xop->file;
-	flc = &xop->flc;
-	fl = xop->fl;
-	notify = op->callback;
+	file = op_data->file;
+	flc = &op_data->flc;
+	fl = op_data->fl;
+	notify = op_data->callback;
 
 	if (op->info.rv) {
 		notify(fl, op->info.rv);
@@ -221,7 +236,7 @@ static int dlm_plock_callback(struct plock_op *op)
 	}
 
 out:
-	kfree(xop);
+	dlm_release_plock_op(op);
 	return rv;
 }
 
@@ -285,7 +300,7 @@ int dlm_posix_unlock(dlm_lockspace_t *lockspace, u64 number, struct file *file,
 		rv = 0;
 
 out_free:
-	kfree(op);
+	dlm_release_plock_op(op);
 out:
 	dlm_put_lockspace(ls);
 	fl->fl_flags = fl_flags;
@@ -345,7 +360,7 @@ int dlm_posix_get(dlm_lockspace_t *lockspace, u64 number, struct file *file,
 		rv = 0;
 	}
 
-	kfree(op);
+	dlm_release_plock_op(op);
 out:
 	dlm_put_lockspace(ls);
 	return rv;
@@ -381,7 +396,7 @@ static ssize_t dev_read(struct file *file, char __user *u, size_t count,
 	   (the process did not make an unlock call). */
 
 	if (op->info.flags & DLM_PLOCK_FL_CLOSE)
-		kfree(op);
+		dlm_release_plock_op(op);
 
 	if (copy_to_user(u, &info, sizeof(info)))
 		return -EFAULT;
@@ -413,7 +428,7 @@ static ssize_t dev_write(struct file *file, const char __user *u, size_t count,
 		    op->info.owner == info.owner) {
 			list_del_init(&op->list);
 			memcpy(&op->info, &info, sizeof(info));
-			if (op->callback)
+			if (op->data)
 				do_callback = 1;
 			else
 				op->done = 1;
-- 
2.31.1




* [Cluster-devel] [PATCH dlm/next 3/4] fs: dlm: rearrange async condition return
  2022-02-16 15:53 [Cluster-devel] [PATCH dlm/next 0/4] fs: dlm: cleanup plock code Alexander Aring
  2022-02-16 15:53 ` [Cluster-devel] [PATCH dlm/next 1/4] fs: dlm: replace sanity checks with WARN_ON Alexander Aring
  2022-02-16 15:53 ` [Cluster-devel] [PATCH dlm/next 2/4] fs: dlm: cleanup plock_op vs plock_xop Alexander Aring
@ 2022-02-16 15:53 ` Alexander Aring
  2022-02-16 15:53 ` [Cluster-devel] [PATCH dlm/next 4/4] fs: dlm: improve plock logging if interrupted Alexander Aring
  3 siblings, 0 replies; 15+ messages in thread
From: Alexander Aring @ 2022-02-16 15:53 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This patch returns FILE_LOCK_DEFERRED directly in the asynchronous
branch instead of checking again afterwards whether the request was
asynchronous.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
 fs/dlm/plock.c | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c
index 4e60dd657cb6..757d9013788a 100644
--- a/fs/dlm/plock.c
+++ b/fs/dlm/plock.c
@@ -149,26 +149,25 @@ int dlm_posix_lock(dlm_lockspace_t *lockspace, u64 number, struct file *file,
 		op_data->file	= file;
 
 		op->data = op_data;
+
+		send_op(op);
+		rv = FILE_LOCK_DEFERRED;
+		goto out;
 	} else {
 		op->info.owner	= (__u64)(long) fl->fl_owner;
 	}
 
 	send_op(op);
 
-	if (!op->data) {
-		rv = wait_event_interruptible(recv_wq, (op->done != 0));
-		if (rv == -ERESTARTSYS) {
-			log_debug(ls, "dlm_posix_lock: wait killed %llx",
-				  (unsigned long long)number);
-			spin_lock(&ops_lock);
-			list_del(&op->list);
-			spin_unlock(&ops_lock);
-			dlm_release_plock_op(op);
-			do_unlock_close(ls, number, file, fl);
-			goto out;
-		}
-	} else {
-		rv = FILE_LOCK_DEFERRED;
+	rv = wait_event_interruptible(recv_wq, (op->done != 0));
+	if (rv == -ERESTARTSYS) {
+		log_debug(ls, "%s: wait killed %llx", __func__,
+			  (unsigned long long)number);
+		spin_lock(&ops_lock);
+		list_del(&op->list);
+		spin_unlock(&ops_lock);
+		dlm_release_plock_op(op);
+		do_unlock_close(ls, number, file, fl);
 		goto out;
 	}
 
-- 
2.31.1




* [Cluster-devel] [PATCH dlm/next 4/4] fs: dlm: improve plock logging if interrupted
  2022-02-16 15:53 [Cluster-devel] [PATCH dlm/next 0/4] fs: dlm: cleanup plock code Alexander Aring
                   ` (2 preceding siblings ...)
  2022-02-16 15:53 ` [Cluster-devel] [PATCH dlm/next 3/4] fs: dlm: rearrange async condition return Alexander Aring
@ 2022-02-16 15:53 ` Alexander Aring
  3 siblings, 0 replies; 15+ messages in thread
From: Alexander Aring @ 2022-02-16 15:53 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This patch changes the log level from debug to info when a plock is
removed because its wait was interrupted. Additionally, it now signals
that the plock entity was removed, to let the user know what is
happening.

If a pending plock cannot be found during dev_write(), it will now
signal that the plock might have been removed because its wait was
interrupted.

Before this patch there could be a "dev_write no op ..." info message,
and users could only guess that the plock had been removed earlier
because its wait was interrupted. To make sure that is the case, we log
both messages at the same log level.

Both messages are logged at the info level because this should not
happen often, and when it does, it should be clear why the op was not
found.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
 fs/dlm/plock.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c
index 757d9013788a..ce1af7986e16 100644
--- a/fs/dlm/plock.c
+++ b/fs/dlm/plock.c
@@ -161,11 +161,12 @@ int dlm_posix_lock(dlm_lockspace_t *lockspace, u64 number, struct file *file,
 
 	rv = wait_event_interruptible(recv_wq, (op->done != 0));
 	if (rv == -ERESTARTSYS) {
-		log_debug(ls, "%s: wait killed %llx", __func__,
-			  (unsigned long long)number);
 		spin_lock(&ops_lock);
 		list_del(&op->list);
 		spin_unlock(&ops_lock);
+		log_print("%s: wait interrupted %x %llx, op removed",
+			  __func__, ls->ls_global_id,
+			  (unsigned long long)number);
 		dlm_release_plock_op(op);
 		do_unlock_close(ls, number, file, fl);
 		goto out;
@@ -443,8 +444,8 @@ static ssize_t dev_write(struct file *file, const char __user *u, size_t count,
 		else
 			wake_up(&recv_wq);
 	} else
-		log_print("dev_write no op %x %llx", info.fsid,
-			  (unsigned long long)info.number);
+		log_print("%s: no op %x %llx - may got interrupted?", __func__,
+			  info.fsid, (unsigned long long)info.number);
 	return count;
 }
 
-- 
2.31.1




* [Cluster-devel] [PATCH dlm/next 1/4] fs: dlm: replace sanity checks with WARN_ON
  2022-02-16 15:53 ` [Cluster-devel] [PATCH dlm/next 1/4] fs: dlm: replace sanity checks with WARN_ON Alexander Aring
@ 2022-02-16 16:08   ` Andreas Gruenbacher
  2022-02-16 16:16     ` Alexander Aring
  0 siblings, 1 reply; 15+ messages in thread
From: Andreas Gruenbacher @ 2022-02-16 16:08 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Wed, Feb 16, 2022 at 4:53 PM Alexander Aring <aahringo@redhat.com> wrote:
> The dlm plock handling contains several sanity checks, along with
> recovery handling in case they trigger. They should never trigger;
> otherwise there is a bug in the code. To make such bugs more visible,
> remove the recovery handling and add a WARN_ON() to those sanity checks.
>
> Signed-off-by: Alexander Aring <aahringo@redhat.com>
> ---
>  fs/dlm/plock.c | 32 ++++----------------------------
>  1 file changed, 4 insertions(+), 28 deletions(-)
>
> diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c
> index a10d2bcfe75a..55fba2f0234f 100644
> --- a/fs/dlm/plock.c
> +++ b/fs/dlm/plock.c
> @@ -157,13 +157,7 @@ int dlm_posix_lock(dlm_lockspace_t *lockspace, u64 number, struct file *file,
>                 goto out;
>         }
>
> -       spin_lock(&ops_lock);
> -       if (!list_empty(&op->list)) {
> -               log_error(ls, "dlm_posix_lock: op on list %llx",
> -                         (unsigned long long)number);
> -               list_del(&op->list);
> -       }
> -       spin_unlock(&ops_lock);
> +       WARN_ON(!list_empty(&op->list));

Why don't those checks need the ops_lock spin lock anymore?
Why does it make sense to get rid of the list_del calls?

>         rv = op->info.rv;
>
> @@ -190,13 +184,7 @@ static int dlm_plock_callback(struct plock_op *op)
>         struct plock_xop *xop = (struct plock_xop *)op;
>         int rv = 0;
>
> -       spin_lock(&ops_lock);
> -       if (!list_empty(&op->list)) {
> -               log_print("dlm_plock_callback: op on list %llx",
> -                         (unsigned long long)op->info.number);
> -               list_del(&op->list);
> -       }
> -       spin_unlock(&ops_lock);
> +       WARN_ON(!list_empty(&op->list));
>
>         /* check if the following 2 are still valid or make a copy */
>         file = xop->file;
> @@ -289,13 +277,7 @@ int dlm_posix_unlock(dlm_lockspace_t *lockspace, u64 number, struct file *file,
>         send_op(op);
>         wait_event(recv_wq, (op->done != 0));
>
> -       spin_lock(&ops_lock);
> -       if (!list_empty(&op->list)) {
> -               log_error(ls, "dlm_posix_unlock: op on list %llx",
> -                         (unsigned long long)number);
> -               list_del(&op->list);
> -       }
> -       spin_unlock(&ops_lock);
> +       WARN_ON(!list_empty(&op->list));
>
>         rv = op->info.rv;
>
> @@ -343,13 +325,7 @@ int dlm_posix_get(dlm_lockspace_t *lockspace, u64 number, struct file *file,
>         send_op(op);
>         wait_event(recv_wq, (op->done != 0));
>
> -       spin_lock(&ops_lock);
> -       if (!list_empty(&op->list)) {
> -               log_error(ls, "dlm_posix_get: op on list %llx",
> -                         (unsigned long long)number);
> -               list_del(&op->list);
> -       }
> -       spin_unlock(&ops_lock);
> +       WARN_ON(!list_empty(&op->list));
>
>         /* info.rv from userspace is 1 for conflict, 0 for no-conflict,
>            -ENOENT if there are no locks on the file */
> --
> 2.31.1
>

Thanks,
Andreas




* [Cluster-devel] [PATCH dlm/next 1/4] fs: dlm: replace sanity checks with WARN_ON
  2022-02-16 16:08   ` Andreas Gruenbacher
@ 2022-02-16 16:16     ` Alexander Aring
  2022-02-16 16:25       ` Alexander Aring
                         ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Alexander Aring @ 2022-02-16 16:16 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,

On Wed, Feb 16, 2022 at 11:08 AM Andreas Gruenbacher
<agruenba@redhat.com> wrote:
>
> On Wed, Feb 16, 2022 at 4:53 PM Alexander Aring <aahringo@redhat.com> wrote:
> > The dlm plock handling contains several sanity checks, along with
> > recovery handling in case they trigger. They should never trigger;
> > otherwise there is a bug in the code. To make such bugs more visible,
> > remove the recovery handling and add a WARN_ON() to those sanity checks.
> >
> > Signed-off-by: Alexander Aring <aahringo@redhat.com>
> > ---
> >  fs/dlm/plock.c | 32 ++++----------------------------
> >  1 file changed, 4 insertions(+), 28 deletions(-)
> >
> > diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c
> > index a10d2bcfe75a..55fba2f0234f 100644
> > --- a/fs/dlm/plock.c
> > +++ b/fs/dlm/plock.c
> > @@ -157,13 +157,7 @@ int dlm_posix_lock(dlm_lockspace_t *lockspace, u64 number, struct file *file,
> >                 goto out;
> >         }
> >
> > -       spin_lock(&ops_lock);
> > -       if (!list_empty(&op->list)) {
> > -               log_error(ls, "dlm_posix_lock: op on list %llx",
> > -                         (unsigned long long)number);
> > -               list_del(&op->list);
> > -       }
> > -       spin_unlock(&ops_lock);
> > +       WARN_ON(!list_empty(&op->list));
>
> Why don't those checks need the ops_lock spin lock anymore?
> Why does it make sense to get rid of the list_del calls?

My understanding is that those list_del() calls try to recover
something if a sanity check hits. The list_empty() check should
always be true at this point, no matter whether the lock is held or not.
Therefore no lock is required here to do some sanity checking.
If those checks hit, there is a bug, and trying to recover from it makes
no sense from my point of view.
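To illustrate with a hypothetical userspace model of the list_head
behaviour (plain C, not the dlm code itself): the
WARN_ON(!list_empty(&op->list)) pattern relies on the reader side
removing the op with list_del_init(), which re-links the node to
itself so a later list_empty() on the node is true again; a plain
unlink would leave the node looking non-empty:

```c
#include <assert.h>

/* Minimal stand-in for the kernel's struct list_head. The names mirror
 * the kernel API, but this is only an illustrative userspace model. */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *n)
{
	n->next = n->prev = n;
}

static void list_add_tail(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink only, like the kernel's list_del() (minus pointer poisoning):
 * the node's own pointers still reference the old neighbours. */
static void list_del(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/* Unlink and re-initialise, like list_del_init(): afterwards the node
 * points back at itself, so list_empty() on the node is true again. */
static void list_del_init(struct list_node *n)
{
	list_del(n);
	list_init(n);
}

static int list_empty(const struct list_node *n)
{
	return n->next == n;
}
```

Since the reader side removes ops with list_del_init(), an op that
completed normally always satisfies list_empty() afterwards; the open
question is only whether that write is guaranteed to be visible.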

- Alex




* [Cluster-devel] [PATCH dlm/next 1/4] fs: dlm: replace sanity checks with WARN_ON
  2022-02-16 16:16     ` Alexander Aring
@ 2022-02-16 16:25       ` Alexander Aring
  2022-02-16 16:26         ` Alexander Aring
  2022-02-17  0:36       ` Andreas Gruenbacher
  2022-02-17  0:41       ` Andreas Gruenbacher
  2 siblings, 1 reply; 15+ messages in thread
From: Alexander Aring @ 2022-02-16 16:25 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,

On Wed, Feb 16, 2022 at 11:16 AM Alexander Aring <aahringo@redhat.com> wrote:
>
> Hi,
>
> On Wed, Feb 16, 2022 at 11:08 AM Andreas Gruenbacher
> <agruenba@redhat.com> wrote:
> >
> > On Wed, Feb 16, 2022 at 4:53 PM Alexander Aring <aahringo@redhat.com> wrote:
> > > The dlm plock handling contains several sanity checks, along with
> > > recovery handling in case they trigger. They should never trigger;
> > > otherwise there is a bug in the code. To make such bugs more visible,
> > > remove the recovery handling and add a WARN_ON() to those sanity checks.
> > >
> > > Signed-off-by: Alexander Aring <aahringo@redhat.com>
> > > ---
> > >  fs/dlm/plock.c | 32 ++++----------------------------
> > >  1 file changed, 4 insertions(+), 28 deletions(-)
> > >
> > > diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c
> > > index a10d2bcfe75a..55fba2f0234f 100644
> > > --- a/fs/dlm/plock.c
> > > +++ b/fs/dlm/plock.c
> > > @@ -157,13 +157,7 @@ int dlm_posix_lock(dlm_lockspace_t *lockspace, u64 number, struct file *file,
> > >                 goto out;
> > >         }
> > >
> > > -       spin_lock(&ops_lock);
> > > -       if (!list_empty(&op->list)) {
> > > -               log_error(ls, "dlm_posix_lock: op on list %llx",
> > > -                         (unsigned long long)number);
> > > -               list_del(&op->list);
> > > -       }
> > > -       spin_unlock(&ops_lock);
> > > +       WARN_ON(!list_empty(&op->list));
> >
> > Why don't those checks need the ops_lock spin lock anymore?
> > Why does it make sense to get rid of the list_del calls?
>
> My understanding is that those list_del() calls try to recover
> something if a sanity check hits. The list_empty() check should
> always be true at this point no matter if lock is held or not.

s/true/false, but !list_empty() is true :)

 - Alex




* [Cluster-devel] [PATCH dlm/next 1/4] fs: dlm: replace sanity checks with WARN_ON
  2022-02-16 16:25       ` Alexander Aring
@ 2022-02-16 16:26         ` Alexander Aring
  0 siblings, 0 replies; 15+ messages in thread
From: Alexander Aring @ 2022-02-16 16:26 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,

On Wed, Feb 16, 2022 at 11:25 AM Alexander Aring <aahringo@redhat.com> wrote:
>
> Hi,
>
> On Wed, Feb 16, 2022 at 11:16 AM Alexander Aring <aahringo@redhat.com> wrote:
> >
> > Hi,
> >
> > On Wed, Feb 16, 2022 at 11:08 AM Andreas Gruenbacher
> > <agruenba@redhat.com> wrote:
> > >
> > > On Wed, Feb 16, 2022 at 4:53 PM Alexander Aring <aahringo@redhat.com> wrote:
> > > > The dlm plock handling contains several sanity checks, along with
> > > > recovery handling in case they trigger. They should never trigger;
> > > > otherwise there is a bug in the code. To make such bugs more visible,
> > > > remove the recovery handling and add a WARN_ON() to those sanity checks.
> > > >
> > > > Signed-off-by: Alexander Aring <aahringo@redhat.com>
> > > > ---
> > > >  fs/dlm/plock.c | 32 ++++----------------------------
> > > >  1 file changed, 4 insertions(+), 28 deletions(-)
> > > >
> > > > diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c
> > > > index a10d2bcfe75a..55fba2f0234f 100644
> > > > --- a/fs/dlm/plock.c
> > > > +++ b/fs/dlm/plock.c
> > > > @@ -157,13 +157,7 @@ int dlm_posix_lock(dlm_lockspace_t *lockspace, u64 number, struct file *file,
> > > >                 goto out;
> > > >         }
> > > >
> > > > -       spin_lock(&ops_lock);
> > > > -       if (!list_empty(&op->list)) {
> > > > -               log_error(ls, "dlm_posix_lock: op on list %llx",
> > > > -                         (unsigned long long)number);
> > > > -               list_del(&op->list);
> > > > -       }
> > > > -       spin_unlock(&ops_lock);
> > > > +       WARN_ON(!list_empty(&op->list));
> > >
> > > Why don't those checks need the ops_lock spin lock anymore?
> > > Why does it make sense to get rid of the list_del calls?
> >
> > My understanding is that those list_del() calls try to recover
> > something if a sanity check hits. The list_empty() check should
> > always be true at this point no matter if lock is held or not.
>
> s/true/false, but !list_empty() is true :)

forget that, originally it was correct....

- Alex




* [Cluster-devel] [PATCH dlm/next 1/4] fs: dlm: replace sanity checks with WARN_ON
  2022-02-16 16:16     ` Alexander Aring
  2022-02-16 16:25       ` Alexander Aring
@ 2022-02-17  0:36       ` Andreas Gruenbacher
  2022-02-17 14:02         ` Alexander Aring
  2022-02-17 16:54         ` David Teigland
  2022-02-17  0:41       ` Andreas Gruenbacher
  2 siblings, 2 replies; 15+ messages in thread
From: Andreas Gruenbacher @ 2022-02-17  0:36 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Wed, Feb 16, 2022 at 5:16 PM Alexander Aring <aahringo@redhat.com> wrote:
>
> Hi,
>
> On Wed, Feb 16, 2022 at 11:08 AM Andreas Gruenbacher
> <agruenba@redhat.com> wrote:
> >
> > On Wed, Feb 16, 2022 at 4:53 PM Alexander Aring <aahringo@redhat.com> wrote:
> > > The dlm plock handling contains several sanity checks, along with
> > > recovery handling in case they trigger. They should never trigger;
> > > otherwise there is a bug in the code. To make such bugs more visible,
> > > remove the recovery handling and add a WARN_ON() to those sanity checks.
> > >
> > > Signed-off-by: Alexander Aring <aahringo@redhat.com>
> > > ---
> > >  fs/dlm/plock.c | 32 ++++----------------------------
> > >  1 file changed, 4 insertions(+), 28 deletions(-)
> > >
> > > diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c
> > > index a10d2bcfe75a..55fba2f0234f 100644
> > > --- a/fs/dlm/plock.c
> > > +++ b/fs/dlm/plock.c
> > > @@ -157,13 +157,7 @@ int dlm_posix_lock(dlm_lockspace_t *lockspace, u64 number, struct file *file,
> > >                 goto out;
> > >         }
> > >
> > > -       spin_lock(&ops_lock);
> > > -       if (!list_empty(&op->list)) {
> > > -               log_error(ls, "dlm_posix_lock: op on list %llx",
> > > -                         (unsigned long long)number);
> > > -               list_del(&op->list);
> > > -       }
> > > -       spin_unlock(&ops_lock);
> > > +       WARN_ON(!list_empty(&op->list));
> >
> > Why don't those checks need the ops_lock spin lock anymore?
> > Why does it make sense to get rid of the list_del calls?
>
> My understanding is that those list_del() calls try to recover
> something if a sanity check hits. The list_empty() check should
> always be true at this point no matter if lock is held or not.
> Therefore no lock is required here to do some sanity checking.

I don't immediately see what, other than the spin lock, would
guarantee a consistent memory view. In other words, without taking the
spin lock, 'list_empty(&op->list)' might still be true on one CPU even
though another CPU has already added 'op' to a list. So please, when
changing the locking somewhere, explain why the change is correct
beyond just stating that the locking isn't needed.
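The guarantee being asked for can be sketched in userspace C11 terms (a
hypothetical model, not the dlm code): the unlocked check is only safe
if some release/acquire pair orders the list manipulation before the
read, for example:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Illustrative model only: "on_list" stands in for the op being linked
 * on send_list/recv_list, "done" for op->done. The release store paired
 * with the acquire load is the edge that guarantees a reader seeing
 * done != 0 also sees the list removal; without such an edge, the
 * unlocked check could observe a stale view of the list. */
struct op_model {
	int on_list;
	atomic_int done;
};

static void *op_completer(void *arg)
{
	struct op_model *op = arg;

	op->on_list = 0;	/* models list_del_init(&op->list) */
	atomic_store_explicit(&op->done, 1, memory_order_release);
	return NULL;
}

static int wait_then_check(struct op_model *op)
{
	/* models wait_event(recv_wq, op->done != 0) */
	while (!atomic_load_explicit(&op->done, memory_order_acquire))
		;
	/* ordered after the removal by the acquire/release pair */
	return op->on_list;
}
```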

Thanks,
Andreas

> If those checks hit there is a bug and trying to recover from it makes
> no sense from my point of view.




* [Cluster-devel] [PATCH dlm/next 1/4] fs: dlm: replace sanity checks with WARN_ON
  2022-02-16 16:16     ` Alexander Aring
  2022-02-16 16:25       ` Alexander Aring
  2022-02-17  0:36       ` Andreas Gruenbacher
@ 2022-02-17  0:41       ` Andreas Gruenbacher
  2022-02-17 14:03         ` Alexander Aring
  2 siblings, 1 reply; 15+ messages in thread
From: Andreas Gruenbacher @ 2022-02-17  0:41 UTC (permalink / raw)
  To: cluster-devel.redhat.com

There's also an unnecessary INIT_LIST_HEAD() in send_op().

Andreas

---
 fs/dlm/plock.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c
index ce1af7986e16..ff439d780cb1 100644
--- a/fs/dlm/plock.c
+++ b/fs/dlm/plock.c
@@ -67,7 +67,6 @@ static void dlm_release_plock_op(struct plock_op *op)
 static void send_op(struct plock_op *op)
 {
 	set_version(&op->info);
-	INIT_LIST_HEAD(&op->list);
 	spin_lock(&ops_lock);
 	list_add_tail(&op->list, &send_list);
 	spin_unlock(&ops_lock);
-- 
2.33.1




* [Cluster-devel] [PATCH dlm/next 1/4] fs: dlm: replace sanity checks with WARN_ON
  2022-02-17  0:36       ` Andreas Gruenbacher
@ 2022-02-17 14:02         ` Alexander Aring
  2022-02-17 16:54         ` David Teigland
  1 sibling, 0 replies; 15+ messages in thread
From: Alexander Aring @ 2022-02-17 14:02 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,

On Wed, Feb 16, 2022 at 7:37 PM Andreas Gruenbacher <agruenba@redhat.com> wrote:
>
> On Wed, Feb 16, 2022 at 5:16 PM Alexander Aring <aahringo@redhat.com> wrote:
> >
> > Hi,
> >
> > On Wed, Feb 16, 2022 at 11:08 AM Andreas Gruenbacher
> > <agruenba@redhat.com> wrote:
> > >
> > > On Wed, Feb 16, 2022 at 4:53 PM Alexander Aring <aahringo@redhat.com> wrote:
> > > > The dlm plock handling contains several sanity checks, along with
> > > > recovery handling in case they trigger. They should never trigger;
> > > > otherwise there is a bug in the code. To make such bugs more visible,
> > > > remove the recovery handling and add a WARN_ON() to those sanity checks.
> > > >
> > > > Signed-off-by: Alexander Aring <aahringo@redhat.com>
> > > > ---
> > > >  fs/dlm/plock.c | 32 ++++----------------------------
> > > >  1 file changed, 4 insertions(+), 28 deletions(-)
> > > >
> > > > diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c
> > > > index a10d2bcfe75a..55fba2f0234f 100644
> > > > --- a/fs/dlm/plock.c
> > > > +++ b/fs/dlm/plock.c
> > > > @@ -157,13 +157,7 @@ int dlm_posix_lock(dlm_lockspace_t *lockspace, u64 number, struct file *file,
> > > >                 goto out;
> > > >         }
> > > >
> > > > -       spin_lock(&ops_lock);
> > > > -       if (!list_empty(&op->list)) {
> > > > -               log_error(ls, "dlm_posix_lock: op on list %llx",
> > > > -                         (unsigned long long)number);
> > > > -               list_del(&op->list);
> > > > -       }
> > > > -       spin_unlock(&ops_lock);
> > > > +       WARN_ON(!list_empty(&op->list));
> > >
> > > Why don't those checks need the ops_lock spin lock anymore?
> > > Why does it make sense to get rid of the list_del calls?
> >
> > My understanding is that those list_del() calls try to recover
> > something if a sanity check hits. The list_empty() check should
> > always be true at this point, whether or not the lock is held.
> > Therefore no lock is required here for the sanity check.
>
> I don't immediately see what, other than the spin lock, would
> guarantee a consistent memory view. In other words, without taking the
> spin lock, 'list_empty(&op->list)' might still be true on one CPU even
> though another CPU has already added 'op' to a list. So please, when
> changing the locking somewhere, explain why the change is correct
> beyond just stating that the locking isn't needed.
>

Three of the WARN_ON()s sit behind wait_event()/wait_event_interruptible()
calls, which I assume provide the barriers needed here for a consistent
memory view.

The remaining one, in dlm_plock_callback(), is the one that made me think
about what the code is trying to do here. My guess is that the whole
locking concept relies on certain posix lock API/misc char device calls
never running in parallel and occurring in a specific sequence (maybe a
sequence that is only guaranteed by dlm_controld?).
However, dlm_plock_callback() does a "list_del_init(&op->list);" right
before the WARN_ON(), and I believe no other list manipulation should
interfere with it, otherwise the preceding sanity check would already
have hit.

I would remove those checks entirely, but I don't fully trust the code's
assumption that certain things cannot run in parallel (as the API claims)
and always happen in a specific sequence; if that assumption were wrong,
there would be multiple use-after-frees. My assumption here is that those
sanity checks are there to confirm exactly that.

Does this (assuming the explanation is correct) justify and explain the patch well enough?

- Alex




* [Cluster-devel] [PATCH dlm/next 1/4] fs: dlm: replace sanity checks with WARN_ON
  2022-02-17  0:41       ` Andreas Gruenbacher
@ 2022-02-17 14:03         ` Alexander Aring
  2022-02-17 14:47           ` Andreas Gruenbacher
  0 siblings, 1 reply; 15+ messages in thread
From: Alexander Aring @ 2022-02-17 14:03 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,

On Wed, Feb 16, 2022 at 7:41 PM Andreas Gruenbacher <agruenba@redhat.com> wrote:
>
> There's also an unnecessary INIT_LIST_HEAD() in send_op().
>

please send a patch in a form that makes it easy to apply. Also,
please explain exactly why the "INIT_LIST_HEAD()" is not needed here
rather than just stating that it isn't.

Thanks.

- Alex




* [Cluster-devel] [PATCH dlm/next 1/4] fs: dlm: replace sanity checks with WARN_ON
  2022-02-17 14:03         ` Alexander Aring
@ 2022-02-17 14:47           ` Andreas Gruenbacher
  0 siblings, 0 replies; 15+ messages in thread
From: Andreas Gruenbacher @ 2022-02-17 14:47 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Thu, Feb 17, 2022 at 3:03 PM Alexander Aring <aahringo@redhat.com> wrote:
> On Wed, Feb 16, 2022 at 7:41 PM Andreas Gruenbacher <agruenba@redhat.com> wrote:
> >
> > There's also an unnecessary INIT_LIST_HEAD() in send_op().
> >
>
> please send a patch in a form which makes it easy to apply. Also
> please explain exactly why an "INIT_LIST_HEAD()" is not needed here
> rather than just stating that the init isn't needed.

The INIT_LIST_HEAD is unnecessary because it initializes op->list, and
two lines further down, list_add_tail overwrites op->list anyway.

Do with this report whatever you want, but don't expect me to spend
extra time on it.

Andreas




* [Cluster-devel] [PATCH dlm/next 1/4] fs: dlm: replace sanity checks with WARN_ON
  2022-02-17  0:36       ` Andreas Gruenbacher
  2022-02-17 14:02         ` Alexander Aring
@ 2022-02-17 16:54         ` David Teigland
  1 sibling, 0 replies; 15+ messages in thread
From: David Teigland @ 2022-02-17 16:54 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Thu, Feb 17, 2022 at 01:36:44AM +0100, Andreas Gruenbacher wrote:
> On Wed, Feb 16, 2022 at 5:16 PM Alexander Aring <aahringo@redhat.com> wrote:
> > > > -       spin_lock(&ops_lock);
> > > > -       if (!list_empty(&op->list)) {
> > > > -               log_error(ls, "dlm_posix_lock: op on list %llx",
> > > > -                         (unsigned long long)number);
> > > > -               list_del(&op->list);
> > > > -       }
> > > > -       spin_unlock(&ops_lock);
> > > > +       WARN_ON(!list_empty(&op->list));
> > >
> > > Why don't those checks need the ops_lock spin lock anymore?
> > > Why does it make sense to get rid of the list_del calls?
> >
> > My understanding is that those list_del() calls try to recover
> > something if a sanity check hits. The list_empty() check should
> > always be true at this point, whether or not the lock is held.
> > Therefore no lock is required here for the sanity check.
> 
> I don't immediately see what, other than the spin lock, would
> guarantee a consistent memory view. In other words, without taking the
> spin lock, 'list_empty(&op->list)' might still be true on one CPU even
> though another CPU has already added 'op' to a list. 

I'm not sure what thread contexts are running on the CPUs in your example.

> So please, when changing the locking somewhere, explain why the change
> is correct beyond just stating that the locking isn't needed.

Since the removed locking was not actually doing anything useful, there's
a limited amount that can be said about what it changes.

It's clear that the ops_lock protects the lists.  The op is not on any
list at this point, and if it were, the code would be broken.  WARN_ON seems like
the preferred way to indicate failed assumptions in the code.  In other
words, the code was making a shallow and cosmetic attempt to look robust
rather than broken.

Dave




end of thread, other threads:[~2022-02-17 16:54 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-16 15:53 [Cluster-devel] [PATCH dlm/next 0/4] fs: dlm: cleanup plock code Alexander Aring
2022-02-16 15:53 ` [Cluster-devel] [PATCH dlm/next 1/4] fs: dlm: replace sanity checks with WARN_ON Alexander Aring
2022-02-16 16:08   ` Andreas Gruenbacher
2022-02-16 16:16     ` Alexander Aring
2022-02-16 16:25       ` Alexander Aring
2022-02-16 16:26         ` Alexander Aring
2022-02-17  0:36       ` Andreas Gruenbacher
2022-02-17 14:02         ` Alexander Aring
2022-02-17 16:54         ` David Teigland
2022-02-17  0:41       ` Andreas Gruenbacher
2022-02-17 14:03         ` Alexander Aring
2022-02-17 14:47           ` Andreas Gruenbacher
2022-02-16 15:53 ` [Cluster-devel] [PATCH dlm/next 2/4] fs: dlm: cleanup plock_op vs plock_xop Alexander Aring
2022-02-16 15:53 ` [Cluster-devel] [PATCH dlm/next 3/4] fs: dlm: rearrange async condition return Alexander Aring
2022-02-16 15:53 ` [Cluster-devel] [PATCH dlm/next 4/4] fs: dlm: improve plock logging if interrupted Alexander Aring
