* [Cluster-devel] [PATCH dlm/next 1/6] fs: dlm: plock use list_first_entry
From: Alexander Aring @ 2022-05-26 20:15 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Use the list helper list_first_entry() instead of list_entry() to get
the first element of a list.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
 fs/dlm/plock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c
index 0993eebf2060..7cab5d27132b 100644
--- a/fs/dlm/plock.c
+++ b/fs/dlm/plock.c
@@ -378,7 +378,7 @@ static ssize_t dev_read(struct file *file, char __user *u, size_t count,
 
 	spin_lock(&ops_lock);
 	if (!list_empty(&send_list)) {
-		op = list_entry(send_list.next, struct plock_op, list);
+		op = list_first_entry(&send_list, struct plock_op, list);
 		if (op->info.flags & DLM_PLOCK_FL_CLOSE)
 			list_del(&op->list);
 		else
-- 
2.31.1



* [Cluster-devel] [PATCH dlm/next 2/6] fs: dlm: remove may interrupted message
From: Alexander Aring @ 2022-05-26 20:15 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Commit bcfad4265ced ("dlm: improve plock logging if interrupted")
improved the printout to tell the user why a dlm plock op could no
longer be found after the kernel had sent an operation request to user
space.

However, the message still confused users. A new approach will avoid
the situation entirely; if the message still appears, it indicates a
bug.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
 fs/dlm/plock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c
index 7cab5d27132b..32eda1f43d22 100644
--- a/fs/dlm/plock.c
+++ b/fs/dlm/plock.c
@@ -443,7 +443,7 @@ static ssize_t dev_write(struct file *file, const char __user *u, size_t count,
 		else
 			wake_up(&recv_wq);
 	} else
-		log_print("%s: no op %x %llx - may got interrupted?", __func__,
+		log_print("%s: no op %x %llx", __func__,
 			  info.fsid, (unsigned long long)info.number);
 	return count;
 }
-- 
2.31.1



* [Cluster-devel] [PATCH dlm/next 3/6] fs: dlm: add pid to debug log
From: Alexander Aring @ 2022-05-26 20:15 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This patch adds the pid of the process that requested the lock
operation to the debug log output.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
 fs/dlm/plock.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c
index 32eda1f43d22..a58598c3edc5 100644
--- a/fs/dlm/plock.c
+++ b/fs/dlm/plock.c
@@ -163,9 +163,9 @@ int dlm_posix_lock(dlm_lockspace_t *lockspace, u64 number, struct file *file,
 		spin_lock(&ops_lock);
 		list_del(&op->list);
 		spin_unlock(&ops_lock);
-		log_print("%s: wait interrupted %x %llx, op removed",
+		log_print("%s: wait interrupted %x %llx pid %d, op removed",
 			  __func__, ls->ls_global_id,
-			  (unsigned long long)number);
+			  (unsigned long long)number, op->info.pid);
 		dlm_release_plock_op(op);
 		do_unlock_close(ls, number, file, fl);
 		goto out;
-- 
2.31.1



* [Cluster-devel] [PATCH dlm/next 4/6] fs: dlm: change log output to debug again
From: Alexander Aring @ 2022-05-26 20:15 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This patch reverts commit bcfad4265ced ("dlm: improve plock logging if
interrupted") by moving the message back to debug level, while still
noting to the user that an op was removed from the kernel/user-space
op handling.

A follow-up change will ensure that an op is never removed while
waiting for an answer from user space.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
 fs/dlm/plock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c
index a58598c3edc5..868940c48e3a 100644
--- a/fs/dlm/plock.c
+++ b/fs/dlm/plock.c
@@ -163,7 +163,7 @@ int dlm_posix_lock(dlm_lockspace_t *lockspace, u64 number, struct file *file,
 		spin_lock(&ops_lock);
 		list_del(&op->list);
 		spin_unlock(&ops_lock);
-		log_print("%s: wait interrupted %x %llx pid %d, op removed",
+		log_debug(ls, "%s: wait interrupted %x %llx pid %d",
 			  __func__, ls->ls_global_id,
 			  (unsigned long long)number, op->info.pid);
 		dlm_release_plock_op(op);
-- 
2.31.1



* [Cluster-devel] [PATCH dlm/next 5/6] fs: dlm: use dlm_plock_info for do_unlock_close
From: Alexander Aring @ 2022-05-26 20:15 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This patch refactors do_unlock_close() to take only a struct
dlm_plock_info as its parameter. do_unlock_close() has a single
caller, which already copies the same values into the dlm_plock_info
structure it passes in.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
 fs/dlm/plock.c | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c
index 868940c48e3a..cf7bba461bfd 100644
--- a/fs/dlm/plock.c
+++ b/fs/dlm/plock.c
@@ -79,8 +79,7 @@ static void send_op(struct plock_op *op)
    abandoned waiter.  So, we have to insert the unlock-close when the
    lock call is interrupted. */
 
-static void do_unlock_close(struct dlm_ls *ls, u64 number,
-			    struct file *file, struct file_lock *fl)
+static void do_unlock_close(const struct dlm_plock_info *info)
 {
 	struct plock_op *op;
 
@@ -89,15 +88,12 @@ static void do_unlock_close(struct dlm_ls *ls, u64 number,
 		return;
 
 	op->info.optype		= DLM_PLOCK_OP_UNLOCK;
-	op->info.pid		= fl->fl_pid;
-	op->info.fsid		= ls->ls_global_id;
-	op->info.number		= number;
+	op->info.pid		= info->pid;
+	op->info.fsid		= info->fsid;
+	op->info.number		= info->number;
 	op->info.start		= 0;
 	op->info.end		= OFFSET_MAX;
-	if (fl->fl_lmops && fl->fl_lmops->lm_grant)
-		op->info.owner	= (__u64) fl->fl_pid;
-	else
-		op->info.owner	= (__u64)(long) fl->fl_owner;
+	op->info.owner		= info->owner;
 
 	op->info.flags |= DLM_PLOCK_FL_CLOSE;
 	send_op(op);
@@ -167,7 +163,7 @@ int dlm_posix_lock(dlm_lockspace_t *lockspace, u64 number, struct file *file,
 			  __func__, ls->ls_global_id,
 			  (unsigned long long)number, op->info.pid);
 		dlm_release_plock_op(op);
-		do_unlock_close(ls, number, file, fl);
+		do_unlock_close(&op->info);
 		goto out;
 	}
 
-- 
2.31.1



* [Cluster-devel] [PATCH dlm/next 6/6] fs: dlm: change posix lock sigint handling
From: Alexander Aring @ 2022-05-26 20:15 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This patch changes the handling of a plock operation that gets
interrupted while waiting for a reply from user space (usually
dlm_controld). This does not cover the time the posix lock waits in
its blocking state, which is handled by locks_lock_file_wait(); the
lock operation should remain interruptible no matter which wait is
currently blocking the process.

Currently, when an interrupt occurs while waiting for a user space
reply, we remove the already transmitted operation request from the
list that is used to look it up when a reply comes back. As a side
effect we see messages like:

dev_write no op...

in the kernel log because the lookup failed. This is easily reproducible
by running:

stress-ng --fcntl 100

and pressing Ctrl-C afterwards.

Instead of removing the op from the lookup list, we wait until the
operation completes. Once it has completed, we check whether the wait
was interrupted; if so, we stop handling the request and clean up the
original lock request. If "dev_write no op" messages still appear,
they signal a bug: an op was removed before it completed, which
should never happen.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
 fs/dlm/plock.c | 29 ++++++++++++++++++++++++++---
 1 file changed, 26 insertions(+), 3 deletions(-)

diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c
index cf7bba461bfd..737f185aad8d 100644
--- a/fs/dlm/plock.c
+++ b/fs/dlm/plock.c
@@ -29,6 +29,8 @@ struct plock_async_data {
 struct plock_op {
 	struct list_head list;
 	int done;
+	/* if lock op got interrupted while waiting dlm_controld reply */
+	bool sigint;
 	struct dlm_plock_info info;
 	/* if set indicates async handling */
 	struct plock_async_data *data;
@@ -157,16 +159,24 @@ int dlm_posix_lock(dlm_lockspace_t *lockspace, u64 number, struct file *file,
 	rv = wait_event_interruptible(recv_wq, (op->done != 0));
 	if (rv == -ERESTARTSYS) {
 		spin_lock(&ops_lock);
-		list_del(&op->list);
+		/* recheck under ops_lock if we got a done != 0,
+		 * if so this interrupt case should be ignored
+		 */
+		if (op->done != 0) {
+			spin_unlock(&ops_lock);
+			goto do_lock_wait;
+		}
+
+		op->sigint = true;
 		spin_unlock(&ops_lock);
 		log_debug(ls, "%s: wait interrupted %x %llx pid %d",
 			  __func__, ls->ls_global_id,
 			  (unsigned long long)number, op->info.pid);
-		dlm_release_plock_op(op);
-		do_unlock_close(&op->info);
 		goto out;
 	}
 
+do_lock_wait:
+
 	WARN_ON(!list_empty(&op->list));
 
 	rv = op->info.rv;
@@ -421,6 +431,19 @@ static ssize_t dev_write(struct file *file, const char __user *u, size_t count,
 		if (iter->info.fsid == info.fsid &&
 		    iter->info.number == info.number &&
 		    iter->info.owner == info.owner) {
+			if (iter->sigint) {
+				list_del(&iter->list);
+				spin_unlock(&ops_lock);
+
+				pr_debug("%s: sigint cleanup %x %llx pid %d",
+					  __func__, iter->info.fsid,
+					  (unsigned long long)iter->info.number,
+					  iter->info.pid);
+				do_unlock_close(&iter->info);
+				memcpy(&iter->info, &info, sizeof(info));
+				dlm_release_plock_op(iter);
+				return count;
+			}
 			list_del_init(&iter->list);
 			memcpy(&iter->info, &info, sizeof(info));
 			if (iter->data)
-- 
2.31.1


