* [Ocfs2-devel] [PATCH 0/1] test case for patch 1/1
@ 2022-06-08 10:48 Heming Zhao via Ocfs2-devel
  2022-06-08 10:48 ` [Ocfs2-devel] [PATCH 1/1] ocfs2: fix ocfs2_find_slot repeats alloc same slot issue Heming Zhao via Ocfs2-devel
  2022-06-18  2:35 ` [Ocfs2-devel] [PATCH 0/1] test case for patch 1/1 Joseph Qi via Ocfs2-devel
  0 siblings, 2 replies; 17+ messages in thread
From: Heming Zhao via Ocfs2-devel @ 2022-06-08 10:48 UTC (permalink / raw)
  To: ocfs2-devel, joseph.qi

==== test cases ====

<1> mount again on the same node (first mount in cluster mode)

mount -t ocfs2 /dev/vdb /mnt
mount -t ocfs2 /dev/vdb /mnt              <=== failure
mount -t ocfs2 -o nocluster /dev/vdb /mnt <=== failure

<2> mount again on the same node (first mount in nocluster mode)

mount -t ocfs2 -o nocluster /dev/vdb /mnt
mount -t ocfs2 /dev/vdb /mnt              <=== failure
mount -t ocfs2 -o nocluster /dev/vdb /mnt <=== failure

<3> mount on another node (first mount in cluster mode, on node2)

node2:
mount -t ocfs2 /dev/vdb /mnt

node1:
mount -t ocfs2 /dev/vdb /mnt  <== success
umount
mount -t ocfs2 -o nocluster /dev/vdb /mnt <== failure

<4> mount on another node (first mount in nocluster mode, on node2)

node2:
mount -t ocfs2 -o nocluster /dev/vdb /mnt

node1:
mount -t ocfs2 /dev/vdb /mnt              <== failure
mount -t ocfs2 -o nocluster /dev/vdb /mnt <== success, see the comment below

<5> simulate the post-crash state for a cluster mount

(all steps below are run on node1; node2 stays unmounted. The dd offset
points at the slot map; see the note after test case <6>.)
mount -t ocfs2 /dev/vdb /mnt
dd if=/dev/vdb bs=1 count=8 skip=76058624 of=/root/slotmap.cluster.mnted
umount /mnt
dd if=/root/slotmap.cluster.mnted of=/dev/vdb seek=76058624 bs=1 count=8
mount -t ocfs2 -o nocluster /dev/vdb /mnt   <== failure
mount -t ocfs2 /dev/vdb /mnt && umount /mnt <== clean slot 0
mount -t ocfs2 -o nocluster /dev/vdb /mnt   <== success

<6> simulate the post-crash state for a nocluster mount

(all steps below are run on node1; node2 stays unmounted)
mount -t ocfs2 -o nocluster /dev/vdb /mnt
dd if=/dev/vdb bs=1 count=8 skip=76058624 of=/root/slotmap.nocluster.mnted
umount /mnt
dd if=/root/slotmap.nocluster.mnted of=/dev/vdb seek=76058624 bs=1 count=8
mount -t ocfs2 /dev/vdb /mnt   <== failure
mount -t ocfs2 -o nocluster /dev/vdb /mnt && umount /mnt <== clean slot 0
mount -t ocfs2 /dev/vdb /mnt   <== success
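
Note: the byte offset 76058624 used by the dd commands in <5> and <6> is
simply where the slot map lives on this particular test volume. On a
different volume the offset must be located first; one possible way,
assuming debugfs.ocfs2 from ocfs2-tools (exact output format may vary):

debugfs.ocfs2 -R "stat //slot_map" /dev/vdb  <== extent list of //slot_map; block number * block size = dd offset
debugfs.ocfs2 -R "stats" /dev/vdb            <== superblock dump, shows the block size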


-----
For test case <4>, the kernel side of the job is done, but some userspace
work is still left to do.
In my view, mount.ocfs2 needs to ask for a second confirmation in this scenario.

current behavior:
```
# mount -t ocfs2 -o nocluster /dev/vdb /mnt && umount /mnt
Warning: to mount a clustered volume without the cluster stack.
Please make sure you only mount the file system from one node.
Otherwise, the file system may be damaged.
Proceed (y/N): y
```

I plan to change it to:
```
# mount -t ocfs2 -o nocluster /dev/vdb /mnt && umount /mnt
Warning: to mount a clustered volume without the cluster stack.
Please make sure you only mount the file system from one node.
Otherwise, the file system may be damaged.
Proceed (y/N): y
Warning: this volume appears to be already mounted in nocluster mode.
Is it mounted on another node?
Please confirm that you want to mount this volume on this node.
Proceed (y/N): y
```
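
A minimal sketch of what the second confirmation could look like, assuming
the slot-map state has already been read from disk into slot0_state. The
helper and the flag handling below are illustrative placeholders, not the
actual ocfs2-tools code:
```
/* Sketch only: double confirmation for a nocluster mount in mount.ocfs2.
 * Reading the on-disk slot map into slot0_state is left out here;
 * OCFS2_SLOTMAP_NOCLUSTER matches the value added by the kernel patch 1/1. */
#include <stdio.h>

#define OCFS2_SLOTMAP_NOCLUSTER 2

static int ask_yes_no(const char *prompt)
{
	char buf[8] = "";

	fputs(prompt, stdout);
	fflush(stdout);
	if (!fgets(buf, sizeof(buf), stdin))
		return 0;
	return buf[0] == 'y' || buf[0] == 'Y';
}

static int confirm_nocluster_mount(int slot0_state)
{
	/* existing warning: mounting a clustered volume without the stack */
	if (!ask_yes_no("Warning: to mount a clustered volume without the "
			"cluster stack.\nProceed (y/N): "))
		return 0;

	/* proposed extra prompt for test case <4>: slot 0 already records a
	 * nocluster mount, possibly made from another node */
	if (slot0_state == OCFS2_SLOTMAP_NOCLUSTER &&
	    !ask_yes_no("Warning: volume already mounted in nocluster mode, "
			"possibly on another node.\nProceed (y/N): "))
		return 0;

	return 1;
}
```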

Heming Zhao (1):
  ocfs2: fix ocfs2_find_slot repeats alloc same slot issue

 fs/ocfs2/dlmglue.c  |  3 ++
 fs/ocfs2/ocfs2_fs.h |  3 ++
 fs/ocfs2/slot_map.c | 70 ++++++++++++++++++++++++++++++++++++---------
 3 files changed, 62 insertions(+), 14 deletions(-)

-- 
2.34.1



* [Ocfs2-devel] [PATCH 1/1] ocfs2: fix ocfs2_find_slot repeats alloc same slot issue
  2022-06-08 10:48 [Ocfs2-devel] [PATCH 0/1] test case for patch 1/1 Heming Zhao via Ocfs2-devel
@ 2022-06-08 10:48 ` Heming Zhao via Ocfs2-devel
  2022-06-12 14:16   ` Joseph Qi via Ocfs2-devel
  2022-06-18  2:35 ` [Ocfs2-devel] [PATCH 0/1] test case for patch 1/1 Joseph Qi via Ocfs2-devel
  1 sibling, 1 reply; 17+ messages in thread
From: Heming Zhao via Ocfs2-devel @ 2022-06-08 10:48 UTC (permalink / raw)
  To: ocfs2-devel, joseph.qi

The commit log below is copied from Junxiao's patch:
https://oss.oracle.com/pipermail/ocfs2-devel/2022-June/000107.html

Junxiao planned to revert commit 912f655d78c5 ("ocfs2: mount shared volume
without ha stack"), but the maintainer and I preferred to keep the feature and fix the bug.

-------------------------- snip  --------------------------
This commit introduced a regression that can cause mount hung.
The changes in __ocfs2_find_empty_slot causes that any node with
none-zero node number can grab the slot that was already taken by
node 0, so node 1 will access the same journal with node 0, when it
try to grab journal cluster lock, it will hung because it was already
acquired by node 0.
It's very easy to reproduce this, in one cluster, mount node 0 first,
then node 1, you will see the following call trace from node 1.

[13148.735424] INFO: task mount.ocfs2:53045 blocked for more than 122 seconds.
[13148.739691]       Not tainted 5.15.0-2148.0.4.el8uek.mountracev2.x86_64 #2
[13148.742560] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[13148.745846] task:mount.ocfs2     state:D stack:    0 pid:53045 ppid: 53044 flags:0x00004000
[13148.749354] Call Trace:
[13148.750718]  <TASK>
[13148.752019]  ? usleep_range+0x90/0x89
[13148.753882]  __schedule+0x210/0x567
[13148.755684]  schedule+0x44/0xa8
[13148.757270]  schedule_timeout+0x106/0x13c
[13148.759273]  ? __prepare_to_swait+0x53/0x78
[13148.761218]  __wait_for_common+0xae/0x163
[13148.763144]  __ocfs2_cluster_lock.constprop.0+0x1d6/0x870 [ocfs2]
[13148.765780]  ? ocfs2_inode_lock_full_nested+0x18d/0x398 [ocfs2]
[13148.768312]  ocfs2_inode_lock_full_nested+0x18d/0x398 [ocfs2]
[13148.770968]  ocfs2_journal_init+0x91/0x340 [ocfs2]
[13148.773202]  ocfs2_check_volume+0x39/0x461 [ocfs2]
[13148.775401]  ? iput+0x69/0xba
[13148.777047]  ocfs2_mount_volume.isra.0.cold+0x40/0x1f5 [ocfs2]
[13148.779646]  ocfs2_fill_super+0x54b/0x853 [ocfs2]
[13148.781756]  mount_bdev+0x190/0x1b7
[13148.783443]  ? ocfs2_remount+0x440/0x440 [ocfs2]
[13148.785634]  legacy_get_tree+0x27/0x48
[13148.787466]  vfs_get_tree+0x25/0xd0
[13148.789270]  do_new_mount+0x18c/0x2d9
[13148.791046]  __x64_sys_mount+0x10e/0x142
[13148.792911]  do_syscall_64+0x3b/0x89
[13148.794667]  entry_SYSCALL_64_after_hwframe+0x170/0x0
[13148.797051] RIP: 0033:0x7f2309f6e26e
[13148.798784] RSP: 002b:00007ffdcee7d408 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[13148.801974] RAX: ffffffffffffffda RBX: 00007ffdcee7d4a0 RCX: 00007f2309f6e26e
[13148.804815] RDX: 0000559aa762a8ae RSI: 0000559aa939d340 RDI: 0000559aa93a22b0
[13148.807719] RBP: 00007ffdcee7d5b0 R08: 0000559aa93a2290 R09: 00007f230a0b4820
[13148.810659] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffdcee7d420
[13148.813609] R13: 0000000000000000 R14: 0000559aa939f000 R15: 0000000000000000
[13148.816564]  </TASK>

To fix it, we can just fix __ocfs2_find_empty_slot. But original commit
introduced the feature to mount ocfs2 locally even it is cluster based,
that is a very dangerous, it can easily cause serious data corruption,
there is no way to stop other nodes mounting the fs and corrupting it.
Setup ha or other cluster-aware stack is just the cost that we have to
take for avoiding corruption, otherwise we have to do it in kernel.
-------------------------- snip  --------------------------

** analysis **

Per Junxiao's call trace, the 'if' condition in __ocfs2_find_empty_slot()
is wrong: sl_node_num can legitimately be 0 in an o2cb environment.

With the information currently available, the likely trigger flow is:
1>
node1 mounts with 'node_num = 0' and succeeds.
At this point the slotmap extent block contains es_valid:1 and
es_node_num:0 for node1, and ocfs2_update_disk_slot() writes the slotmap
info back to disk.

2>
node2 then mounts with 'node_num = 1':

ocfs2_find_slot
 + ocfs2_update_slot_info //read slotmap info from disk
 |  + sets si->si_slots[0].sl_valid = 1 & si->si_slots[0].sl_node_num = 0
 |
 + __ocfs2_node_num_to_slot //returns -ENOENT
    __ocfs2_find_empty_slot
     + the preferred-slot search (node_num:1) fails
     + '!si->si_slots[0].sl_node_num' is true, which triggers the 'break' condition
     + slot 0 is returned, so node2 grabs node1's journal dlm lock and hangs
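
For reference, this is the pre-patch check in __ocfs2_find_empty_slot()
(the lines removed by the diff below); because node1 legitimately has
node_num 0, the second test treats its occupied slot as free:

    /* a slot owned by node 0 also satisfies '!sl_node_num', so it is
     * handed out again and node2 ends up with slot 0 */
    if (!si->si_slots[i].sl_valid ||
        !si->si_slots[i].sl_node_num) {
        ret = i;
        break;
    }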

** how to fix **

To simplify the code logic, we set a rule:
if the last mount was not cleanly unmounted (e.g. a crash happened), the
next mount MUST use the same mount type.

All possible cases when entering ocfs2_find_slot():

1. All slots are empty: a [cluster|nocluster] mount will succeed.
   This is the clean ocfs2 volume case.

2. For a nocluster mount:
   - slot 0 is empty, but another slot is in use:
     - mount fails (the volume must be in a clustered env; deny the mixed-mount case)
   - slot 0 is in use, all other slots are empty:
     - slot 0 is nocluster type: mount succeeds (the last mount may have crashed)
     - slot 0 is cluster type: mount fails (deny the mixed-mount case)

3. For a cluster mount:
   - slot 0 is empty, but another slot is in use:
     - mount succeeds
   - slot 0 is in use, all other slots are empty:
     - slot 0 is nocluster type: mount fails (deny the mixed-mount case)
     - slot 0 is cluster type: mount succeeds (the last mount may have crashed)

The same cases in simplified form:
1.
clean partition => nocluster/cluster@any_node    - success

2.
cluster@any_node => nocluster@any_node          - failure
nocluster@node1 => crash => nocluster@node1     - success
nocluster@node2 => nocluster@node1              - success [*]
cluster@any_node => crash => nocluster@any_node - failure

3.
cluster@any_node => cluster@any_node            - success
cluster@any_node => crash => cluster@any_node   - success
nocluster@any_node => crash => cluster@any_node - failure

[*]: this is the only case with a risk of data corruption. We allow it because:
- node2 may have crashed or be broken, and fail to boot up anymore.
- node2 then cannot access the ocfs2 volume, which has to be managed from another node.
- mount.ocfs2 will print a special warning for this case.

Fixes: 912f655d78c5 ("ocfs2: mount shared volume without ha stack")
Reported-by: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Heming Zhao <heming.zhao@suse.com>
---
 fs/ocfs2/dlmglue.c  |  3 ++
 fs/ocfs2/ocfs2_fs.h |  3 ++
 fs/ocfs2/slot_map.c | 70 ++++++++++++++++++++++++++++++++++++---------
 3 files changed, 62 insertions(+), 14 deletions(-)

diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 801e60bab955..6b017ae46145 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -3403,6 +3403,9 @@ void ocfs2_dlm_shutdown(struct ocfs2_super *osb,
 	ocfs2_lock_res_free(&osb->osb_nfs_sync_lockres);
 	ocfs2_lock_res_free(&osb->osb_orphan_scan.os_lockres);
 
+	if (ocfs2_mount_local(osb))
+		return;
+
 	ocfs2_cluster_disconnect(osb->cconn, hangup_pending);
 	osb->cconn = NULL;
 
diff --git a/fs/ocfs2/ocfs2_fs.h b/fs/ocfs2/ocfs2_fs.h
index 638d875eccc7..4fe42e638309 100644
--- a/fs/ocfs2/ocfs2_fs.h
+++ b/fs/ocfs2/ocfs2_fs.h
@@ -534,6 +534,9 @@ struct ocfs2_slot_map {
  */
 };
 
+#define OCFS2_SLOTMAP_CLUSTER   1
+#define OCFS2_SLOTMAP_NOCLUSTER 2
+
 struct ocfs2_extended_slot {
 /*00*/	__u8	es_valid;
 	__u8	es_reserved1[3];
diff --git a/fs/ocfs2/slot_map.c b/fs/ocfs2/slot_map.c
index 0b0ae3ebb0cf..a5c06d3ecb27 100644
--- a/fs/ocfs2/slot_map.c
+++ b/fs/ocfs2/slot_map.c
@@ -26,7 +26,7 @@
 
 
 struct ocfs2_slot {
-	int sl_valid;
+	u8 sl_valid;
 	unsigned int sl_node_num;
 };
 
@@ -52,11 +52,11 @@ static void ocfs2_invalidate_slot(struct ocfs2_slot_info *si,
 }
 
 static void ocfs2_set_slot(struct ocfs2_slot_info *si,
-			   int slot_num, unsigned int node_num)
+			   int slot_num, unsigned int node_num, u8 valid)
 {
 	BUG_ON((slot_num < 0) || (slot_num >= si->si_num_slots));
 
-	si->si_slots[slot_num].sl_valid = 1;
+	si->si_slots[slot_num].sl_valid = valid;
 	si->si_slots[slot_num].sl_node_num = node_num;
 }
 
@@ -75,7 +75,8 @@ static void ocfs2_update_slot_info_extended(struct ocfs2_slot_info *si)
 		     i++, slotno++) {
 			if (se->se_slots[i].es_valid)
 				ocfs2_set_slot(si, slotno,
-					       le32_to_cpu(se->se_slots[i].es_node_num));
+					       le32_to_cpu(se->se_slots[i].es_node_num),
+						   le32_to_cpu(se->se_slots[i].es_valid));
 			else
 				ocfs2_invalidate_slot(si, slotno);
 		}
@@ -97,7 +98,7 @@ static void ocfs2_update_slot_info_old(struct ocfs2_slot_info *si)
 		if (le16_to_cpu(sm->sm_slots[i]) == (u16)OCFS2_INVALID_SLOT)
 			ocfs2_invalidate_slot(si, i);
 		else
-			ocfs2_set_slot(si, i, le16_to_cpu(sm->sm_slots[i]));
+			ocfs2_set_slot(si, i, le16_to_cpu(sm->sm_slots[i]), OCFS2_SLOTMAP_CLUSTER);
 	}
 }
 
@@ -252,16 +253,14 @@ static int __ocfs2_find_empty_slot(struct ocfs2_slot_info *si,
 	int i, ret = -ENOSPC;
 
 	if ((preferred >= 0) && (preferred < si->si_num_slots)) {
-		if (!si->si_slots[preferred].sl_valid ||
-		    !si->si_slots[preferred].sl_node_num) {
+		if (!si->si_slots[preferred].sl_valid) {
 			ret = preferred;
 			goto out;
 		}
 	}
 
 	for(i = 0; i < si->si_num_slots; i++) {
-		if (!si->si_slots[i].sl_valid ||
-		    !si->si_slots[i].sl_node_num) {
+		if (!si->si_slots[i].sl_valid) {
 			ret = i;
 			break;
 		}
@@ -270,6 +269,20 @@ static int __ocfs2_find_empty_slot(struct ocfs2_slot_info *si,
 	return ret;
 }
 
+static int __ocfs2_find_used_slot(struct ocfs2_slot_info *si)
+{
+	int i, ret = -ENOENT;
+
+	for (i = 0; i < si->si_num_slots; i++) {
+		if (si->si_slots[i].sl_valid) {
+			ret = i;
+			break;
+		}
+	}
+
+	return ret;
+}
+
 int ocfs2_node_num_to_slot(struct ocfs2_super *osb, unsigned int node_num)
 {
 	int slot;
@@ -449,17 +462,45 @@ int ocfs2_find_slot(struct ocfs2_super *osb)
 {
 	int status;
 	int slot;
+	int nocluster_mnt = 0;
 	struct ocfs2_slot_info *si;
 
 	si = osb->slot_info;
 
 	spin_lock(&osb->osb_lock);
 	ocfs2_update_slot_info(si);
+	slot = __ocfs2_find_used_slot(si);
+	if (slot == 0 && (si->si_slots[0].sl_valid == OCFS2_SLOTMAP_NOCLUSTER))
+		nocluster_mnt = 1;
 
-	if (ocfs2_mount_local(osb))
-		/* use slot 0 directly in local mode */
-		slot = 0;
-	else {
+	/*
+	 * We set a rule:
+	 * if last mount didn't do unmount, (eg: crash happened), the next mount
+	 * MUST be same mount type.
+	 */
+	if (ocfs2_mount_local(osb)) {
+		/* empty slotmap, or partition didn't unmount last time */
+		if ((slot == -ENOENT) || nocluster_mnt) {
+			/* use slot 0 directly in local mode */
+			slot = 0;
+			nocluster_mnt = 1;
+		} else {
+			spin_unlock(&osb->osb_lock);
+			mlog(ML_ERROR, "found clustered mount slot in noclustered env!\n");
+			mlog(ML_ERROR, "please clean slotmap info for mount.\n");
+			mlog(ML_ERROR, "eg. remount then unmount with clustered mode\n");
+			status = -EINVAL;
+			goto bail;
+		}
+	} else {
+		if (nocluster_mnt) {
+			spin_unlock(&osb->osb_lock);
+			mlog(ML_ERROR, "found noclustered mount slot in clustered env!\n");
+			mlog(ML_ERROR, "please clean slotmap info for mount.\n");
+			mlog(ML_ERROR, "eg. remount then unmount with noclustered mode\n");
+			status = -EINVAL;
+			goto bail;
+		}
 		/* search for ourselves first and take the slot if it already
 		 * exists. Perhaps we need to mark this in a variable for our
 		 * own journal recovery? Possibly not, though we certainly
@@ -481,7 +522,8 @@ int ocfs2_find_slot(struct ocfs2_super *osb)
 			       slot, osb->dev_str);
 	}
 
-	ocfs2_set_slot(si, slot, osb->node_num);
+	ocfs2_set_slot(si, slot, osb->node_num, nocluster_mnt ?
+					OCFS2_SLOTMAP_NOCLUSTER : OCFS2_SLOTMAP_CLUSTER);
 	osb->slot_num = slot;
 	spin_unlock(&osb->osb_lock);
 
-- 
2.34.1



* Re: [Ocfs2-devel] [PATCH 1/1] ocfs2: fix ocfs2_find_slot repeats alloc same slot issue
  2022-06-08 10:48 ` [Ocfs2-devel] [PATCH 1/1] ocfs2: fix ocfs2_find_slot repeats alloc same slot issue Heming Zhao via Ocfs2-devel
@ 2022-06-12 14:16   ` Joseph Qi via Ocfs2-devel
  2022-06-13  7:59     ` heming.zhao--- via Ocfs2-devel
  0 siblings, 1 reply; 17+ messages in thread
From: Joseph Qi via Ocfs2-devel @ 2022-06-12 14:16 UTC (permalink / raw)
  To: Heming Zhao, ocfs2-devel

Hi,

Why can't we use a local mount? I don't remember if we discussed this.

Thanks,
Joseph

On 6/8/22 6:48 PM, Heming Zhao wrote:
> Below commit log copied from Junxiao's patch:
> https://oss.oracle.com/pipermail/ocfs2-devel/2022-June/000107.html
> 
> Junxiao planed to revert commit 912f655d78c5("ocfs2: mount shared volume
> without ha stack"), but maintainer & I preferred to keep and fix this bug.
> 
> -------------------------- snip  --------------------------
> This commit introduced a regression that can cause mount hung.
> The changes in __ocfs2_find_empty_slot causes that any node with
> none-zero node number can grab the slot that was already taken by
> node 0, so node 1 will access the same journal with node 0, when it
> try to grab journal cluster lock, it will hung because it was already
> acquired by node 0.
> It's very easy to reproduce this, in one cluster, mount node 0 first,
> then node 1, you will see the following call trace from node 1.
> 
> [13148.735424] INFO: task mount.ocfs2:53045 blocked for more than 122 seconds.
> [13148.739691]       Not tainted 5.15.0-2148.0.4.el8uek.mountracev2.x86_64 #2
> [13148.742560] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [13148.745846] task:mount.ocfs2     state:D stack:    0 pid:53045 ppid: 53044 flags:0x00004000
> [13148.749354] Call Trace:
> [13148.750718]  <TASK>
> [13148.752019]  ? usleep_range+0x90/0x89
> [13148.753882]  __schedule+0x210/0x567
> [13148.755684]  schedule+0x44/0xa8
> [13148.757270]  schedule_timeout+0x106/0x13c
> [13148.759273]  ? __prepare_to_swait+0x53/0x78
> [13148.761218]  __wait_for_common+0xae/0x163
> [13148.763144]  __ocfs2_cluster_lock.constprop.0+0x1d6/0x870 [ocfs2]
> [13148.765780]  ? ocfs2_inode_lock_full_nested+0x18d/0x398 [ocfs2]
> [13148.768312]  ocfs2_inode_lock_full_nested+0x18d/0x398 [ocfs2]
> [13148.770968]  ocfs2_journal_init+0x91/0x340 [ocfs2]
> [13148.773202]  ocfs2_check_volume+0x39/0x461 [ocfs2]
> [13148.775401]  ? iput+0x69/0xba
> [13148.777047]  ocfs2_mount_volume.isra.0.cold+0x40/0x1f5 [ocfs2]
> [13148.779646]  ocfs2_fill_super+0x54b/0x853 [ocfs2]
> [13148.781756]  mount_bdev+0x190/0x1b7
> [13148.783443]  ? ocfs2_remount+0x440/0x440 [ocfs2]
> [13148.785634]  legacy_get_tree+0x27/0x48
> [13148.787466]  vfs_get_tree+0x25/0xd0
> [13148.789270]  do_new_mount+0x18c/0x2d9
> [13148.791046]  __x64_sys_mount+0x10e/0x142
> [13148.792911]  do_syscall_64+0x3b/0x89
> [13148.794667]  entry_SYSCALL_64_after_hwframe+0x170/0x0
> [13148.797051] RIP: 0033:0x7f2309f6e26e
> [13148.798784] RSP: 002b:00007ffdcee7d408 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
> [13148.801974] RAX: ffffffffffffffda RBX: 00007ffdcee7d4a0 RCX: 00007f2309f6e26e
> [13148.804815] RDX: 0000559aa762a8ae RSI: 0000559aa939d340 RDI: 0000559aa93a22b0
> [13148.807719] RBP: 00007ffdcee7d5b0 R08: 0000559aa93a2290 R09: 00007f230a0b4820
> [13148.810659] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffdcee7d420
> [13148.813609] R13: 0000000000000000 R14: 0000559aa939f000 R15: 0000000000000000
> [13148.816564]  </TASK>
> 
> To fix it, we can just fix __ocfs2_find_empty_slot. But original commit
> introduced the feature to mount ocfs2 locally even it is cluster based,
> that is a very dangerous, it can easily cause serious data corruption,
> there is no way to stop other nodes mounting the fs and corrupting it.
> Setup ha or other cluster-aware stack is just the cost that we have to
> take for avoiding corruption, otherwise we have to do it in kernel.
> -------------------------- snip  --------------------------
> 
> ** analysis **
> 
> Under Junxiao's call trace, in __ocfs2_find_empty_slot(), the 'if'
> accessment is wrong. sl_node_num could be 0 at o2cb env.
> 
> with current information, the trigger flow (may):
> 1>
> node1 with 'node_num = 0' for mounting. it will succeed.
> at this time, slotmap extent block will contains es_valid:1 &
> es_node_num:0 for node1
> then ocfs2_update_disk_slot() will write back slotmap info to disk.
> 
> 2>
> then, node2 with 'node_num = 1' for mounting
> 
> ocfs2_find_slot
>  + ocfs2_update_slot_info //read slotmap info from disk
>  |  + set si->si_slots[0].es_valid = 1 & si->si_slots[0].sl_node_num = 0
>  |
>  + __ocfs2_node_num_to_slot //will return -ENOENT.
>     __ocfs2_find_empty_slot
>      + search preferred (node_num:1) failed
>      + 'si->si_slots[0].sl_node_num' is false. trigger 'break' condition.
>      + return slot 0 will cause node2 grab node1 journal dlm lock, then trigger hung.
> 
> ** how to fix **
> 
> For simplifing code logic design, We make a rule:
> If last mount didn't do umount, (eg: crash happened), the next mount MUST
> be same mount type.
> 
> All possible cases, when enter ocfs2_find_slot():
> 
> 1. all slots are empty, [cluster|nocluster] mode mount will succeed.
>    this is clean ocfs2 volume case.
> 
> 2. for nocluster mount action
>    - slot 0 is empty, but another slot is not empty:
>      - mount failure. (there should be in clustered env, deny mixed mount case)
>    - slot 0 is not empty, all other slots are empty:
>      - slot 0 is nocluster type: mount success  (may crash last time)
>      - slot 0 is cluster type: mount failure (deny mixed mount case)
> 
> 3. for cluster mount action
>    - slot 0 is empty, but another slot is not empty:
>      - mount success
>    - slot 0 is not empty, all other slots are empty:
>      - slot 0 is nocluster type: mount failure (deny mixed mount case)
>      - slot 0 is cluster type: mount success (may crash last time)
> 
> above with simplified form:
> 1.
> clean parition => nocluster/cluster@any_node    - success
> 
> 2.
> cluster@any_node => nocluster@any_node          - failure
> nocluster@node1 => crash => nocluster@node1     - success
> nocluster@node2 => nocluster@node1              - success [*]
> cluster@any_node => crash => nocluster@any_node - failure
> 
> 3.
> cluster@any_node => cluster@any_node            - success
> cluster@any_node => crash => cluster@any_node   - success
> nocluster@any_node => crash => cluster@any_node - failure
> 
> [*]: this is the only risk to corrupt data. we allow this case happen:
> - node2 may crash or borken, and fails to bootup anymore.
> - node2 fails to access ocfs2 volume, needs to manage from another node.
> - mount.ocfs2 will give special warning for this case.
> 
> Fixes: 912f655d78c5("ocfs2: mount shared volume without ha stack")
> Reported-by: Junxiao Bi <junxiao.bi@oracle.com>
> Signed-off-by: Heming Zhao <heming.zhao@suse.com>
> ---
>  fs/ocfs2/dlmglue.c  |  3 ++
>  fs/ocfs2/ocfs2_fs.h |  3 ++
>  fs/ocfs2/slot_map.c | 70 ++++++++++++++++++++++++++++++++++++---------
>  3 files changed, 62 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
> index 801e60bab955..6b017ae46145 100644
> --- a/fs/ocfs2/dlmglue.c
> +++ b/fs/ocfs2/dlmglue.c
> @@ -3403,6 +3403,9 @@ void ocfs2_dlm_shutdown(struct ocfs2_super *osb,
>  	ocfs2_lock_res_free(&osb->osb_nfs_sync_lockres);
>  	ocfs2_lock_res_free(&osb->osb_orphan_scan.os_lockres);
>  
> +	if (ocfs2_mount_local(osb))
> +		return;
> +
>  	ocfs2_cluster_disconnect(osb->cconn, hangup_pending);
>  	osb->cconn = NULL;
>  
> diff --git a/fs/ocfs2/ocfs2_fs.h b/fs/ocfs2/ocfs2_fs.h
> index 638d875eccc7..4fe42e638309 100644
> --- a/fs/ocfs2/ocfs2_fs.h
> +++ b/fs/ocfs2/ocfs2_fs.h
> @@ -534,6 +534,9 @@ struct ocfs2_slot_map {
>   */
>  };
>  
> +#define OCFS2_SLOTMAP_CLUSTER   1
> +#define OCFS2_SLOTMAP_NOCLUSTER 2
> +
>  struct ocfs2_extended_slot {
>  /*00*/	__u8	es_valid;
>  	__u8	es_reserved1[3];
> diff --git a/fs/ocfs2/slot_map.c b/fs/ocfs2/slot_map.c
> index 0b0ae3ebb0cf..a5c06d3ecb27 100644
> --- a/fs/ocfs2/slot_map.c
> +++ b/fs/ocfs2/slot_map.c
> @@ -26,7 +26,7 @@
>  
>  
>  struct ocfs2_slot {
> -	int sl_valid;
> +	u8 sl_valid;
>  	unsigned int sl_node_num;
>  };
>  
> @@ -52,11 +52,11 @@ static void ocfs2_invalidate_slot(struct ocfs2_slot_info *si,
>  }
>  
>  static void ocfs2_set_slot(struct ocfs2_slot_info *si,
> -			   int slot_num, unsigned int node_num)
> +			   int slot_num, unsigned int node_num, u8 valid)
>  {
>  	BUG_ON((slot_num < 0) || (slot_num >= si->si_num_slots));
>  
> -	si->si_slots[slot_num].sl_valid = 1;
> +	si->si_slots[slot_num].sl_valid = valid;
>  	si->si_slots[slot_num].sl_node_num = node_num;
>  }
>  
> @@ -75,7 +75,8 @@ static void ocfs2_update_slot_info_extended(struct ocfs2_slot_info *si)
>  		     i++, slotno++) {
>  			if (se->se_slots[i].es_valid)
>  				ocfs2_set_slot(si, slotno,
> -					       le32_to_cpu(se->se_slots[i].es_node_num));
> +					       le32_to_cpu(se->se_slots[i].es_node_num),
> +						   le32_to_cpu(se->se_slots[i].es_valid));
>  			else
>  				ocfs2_invalidate_slot(si, slotno);
>  		}
> @@ -97,7 +98,7 @@ static void ocfs2_update_slot_info_old(struct ocfs2_slot_info *si)
>  		if (le16_to_cpu(sm->sm_slots[i]) == (u16)OCFS2_INVALID_SLOT)
>  			ocfs2_invalidate_slot(si, i);
>  		else
> -			ocfs2_set_slot(si, i, le16_to_cpu(sm->sm_slots[i]));
> +			ocfs2_set_slot(si, i, le16_to_cpu(sm->sm_slots[i]), OCFS2_SLOTMAP_CLUSTER);
>  	}
>  }
>  
> @@ -252,16 +253,14 @@ static int __ocfs2_find_empty_slot(struct ocfs2_slot_info *si,
>  	int i, ret = -ENOSPC;
>  
>  	if ((preferred >= 0) && (preferred < si->si_num_slots)) {
> -		if (!si->si_slots[preferred].sl_valid ||
> -		    !si->si_slots[preferred].sl_node_num) {
> +		if (!si->si_slots[preferred].sl_valid) {
>  			ret = preferred;
>  			goto out;
>  		}
>  	}
>  
>  	for(i = 0; i < si->si_num_slots; i++) {
> -		if (!si->si_slots[i].sl_valid ||
> -		    !si->si_slots[i].sl_node_num) {
> +		if (!si->si_slots[i].sl_valid) {
>  			ret = i;
>  			break;
>  		}
> @@ -270,6 +269,20 @@ static int __ocfs2_find_empty_slot(struct ocfs2_slot_info *si,
>  	return ret;
>  }
>  
> +static int __ocfs2_find_used_slot(struct ocfs2_slot_info *si)
> +{
> +	int i, ret = -ENOENT;
> +
> +	for (i = 0; i < si->si_num_slots; i++) {
> +		if (si->si_slots[i].sl_valid) {
> +			ret = i;
> +			break;
> +		}
> +	}
> +
> +	return ret;
> +}
> +
>  int ocfs2_node_num_to_slot(struct ocfs2_super *osb, unsigned int node_num)
>  {
>  	int slot;
> @@ -449,17 +462,45 @@ int ocfs2_find_slot(struct ocfs2_super *osb)
>  {
>  	int status;
>  	int slot;
> +	int nocluster_mnt = 0;
>  	struct ocfs2_slot_info *si;
>  
>  	si = osb->slot_info;
>  
>  	spin_lock(&osb->osb_lock);
>  	ocfs2_update_slot_info(si);
> +	slot = __ocfs2_find_used_slot(si);
> +	if (slot == 0 && (si->si_slots[0].sl_valid == OCFS2_SLOTMAP_NOCLUSTER))
> +		nocluster_mnt = 1;
>  
> -	if (ocfs2_mount_local(osb))
> -		/* use slot 0 directly in local mode */
> -		slot = 0;
> -	else {
> +	/*
> +	 * We set a rule:
> +	 * if last mount didn't do unmount, (eg: crash happened), the next mount
> +	 * MUST be same mount type.
> +	 */
> +	if (ocfs2_mount_local(osb)) {
> +		/* empty slotmap, or partition didn't unmount last time */
> +		if ((slot == -ENOENT) || nocluster_mnt) {
> +			/* use slot 0 directly in local mode */
> +			slot = 0;
> +			nocluster_mnt = 1;
> +		} else {
> +			spin_unlock(&osb->osb_lock);
> +			mlog(ML_ERROR, "found clustered mount slot in noclustered env!\n");
> +			mlog(ML_ERROR, "please clean slotmap info for mount.\n");
> +			mlog(ML_ERROR, "eg. remount then unmount with clustered mode\n");
> +			status = -EINVAL;
> +			goto bail;
> +		}
> +	} else {
> +		if (nocluster_mnt) {
> +			spin_unlock(&osb->osb_lock);
> +			mlog(ML_ERROR, "found noclustered mount slot in clustered env!\n");
> +			mlog(ML_ERROR, "please clean slotmap info for mount.\n");
> +			mlog(ML_ERROR, "eg. remount then unmount with noclustered mode\n");
> +			status = -EINVAL;
> +			goto bail;
> +		}
>  		/* search for ourselves first and take the slot if it already
>  		 * exists. Perhaps we need to mark this in a variable for our
>  		 * own journal recovery? Possibly not, though we certainly
> @@ -481,7 +522,8 @@ int ocfs2_find_slot(struct ocfs2_super *osb)
>  			       slot, osb->dev_str);
>  	}
>  
> -	ocfs2_set_slot(si, slot, osb->node_num);
> +	ocfs2_set_slot(si, slot, osb->node_num, nocluster_mnt ?
> +					OCFS2_SLOTMAP_NOCLUSTER : OCFS2_SLOTMAP_CLUSTER);
>  	osb->slot_num = slot;
>  	spin_unlock(&osb->osb_lock);
>  


* Re: [Ocfs2-devel] [PATCH 1/1] ocfs2: fix ocfs2_find_slot repeats alloc same slot issue
  2022-06-12 14:16   ` Joseph Qi via Ocfs2-devel
@ 2022-06-13  7:59     ` heming.zhao--- via Ocfs2-devel
  2022-06-13  8:21       ` Joseph Qi via Ocfs2-devel
  0 siblings, 1 reply; 17+ messages in thread
From: heming.zhao--- via Ocfs2-devel @ 2022-06-13  7:59 UTC (permalink / raw)
  To: Joseph Qi, ocfs2-devel

On 6/12/22 22:16, Joseph Qi wrote:
> Hi,
> 
> Why can't use local mount? I don't remember if we discuss about this.
> 
Sorry, I can't follow your question.
Do you mean why we should revert commit 912f655d78c5?

Or are you interested in the local mount feature?
A local mount is created by mkfs.ocfs2 and can't be converted to clustered;
see the mkfs.ocfs2(8) '-M' option.

/Heming

> 
> On 6/8/22 6:48 PM, Heming Zhao wrote:
>> Below commit log copied from Junxiao's patch:
>> https://oss.oracle.com/pipermail/ocfs2-devel/2022-June/000107.html
>>
>> Junxiao planed to revert commit 912f655d78c5("ocfs2: mount shared volume
>> without ha stack"), but maintainer & I preferred to keep and fix this bug.
>>
>> -------------------------- snip  --------------------------
>> This commit introduced a regression that can cause mount hung.
>> The changes in __ocfs2_find_empty_slot causes that any node with
>> none-zero node number can grab the slot that was already taken by
>> node 0, so node 1 will access the same journal with node 0, when it
>> try to grab journal cluster lock, it will hung because it was already
>> acquired by node 0.
>> It's very easy to reproduce this, in one cluster, mount node 0 first,
>> then node 1, you will see the following call trace from node 1.
>>
>> [13148.735424] INFO: task mount.ocfs2:53045 blocked for more than 122 seconds.
>> [13148.739691]       Not tainted 5.15.0-2148.0.4.el8uek.mountracev2.x86_64 #2
>> [13148.742560] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [13148.745846] task:mount.ocfs2     state:D stack:    0 pid:53045 ppid: 53044 flags:0x00004000
>> [13148.749354] Call Trace:
>> [13148.750718]  <TASK>
>> [13148.752019]  ? usleep_range+0x90/0x89
>> [13148.753882]  __schedule+0x210/0x567
>> [13148.755684]  schedule+0x44/0xa8
>> [13148.757270]  schedule_timeout+0x106/0x13c
>> [13148.759273]  ? __prepare_to_swait+0x53/0x78
>> [13148.761218]  __wait_for_common+0xae/0x163
>> [13148.763144]  __ocfs2_cluster_lock.constprop.0+0x1d6/0x870 [ocfs2]
>> [13148.765780]  ? ocfs2_inode_lock_full_nested+0x18d/0x398 [ocfs2]
>> [13148.768312]  ocfs2_inode_lock_full_nested+0x18d/0x398 [ocfs2]
>> [13148.770968]  ocfs2_journal_init+0x91/0x340 [ocfs2]
>> [13148.773202]  ocfs2_check_volume+0x39/0x461 [ocfs2]
>> [13148.775401]  ? iput+0x69/0xba
>> [13148.777047]  ocfs2_mount_volume.isra.0.cold+0x40/0x1f5 [ocfs2]
>> [13148.779646]  ocfs2_fill_super+0x54b/0x853 [ocfs2]
>> [13148.781756]  mount_bdev+0x190/0x1b7
>> [13148.783443]  ? ocfs2_remount+0x440/0x440 [ocfs2]
>> [13148.785634]  legacy_get_tree+0x27/0x48
>> [13148.787466]  vfs_get_tree+0x25/0xd0
>> [13148.789270]  do_new_mount+0x18c/0x2d9
>> [13148.791046]  __x64_sys_mount+0x10e/0x142
>> [13148.792911]  do_syscall_64+0x3b/0x89
>> [13148.794667]  entry_SYSCALL_64_after_hwframe+0x170/0x0
>> [13148.797051] RIP: 0033:0x7f2309f6e26e
>> [13148.798784] RSP: 002b:00007ffdcee7d408 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
>> [13148.801974] RAX: ffffffffffffffda RBX: 00007ffdcee7d4a0 RCX: 00007f2309f6e26e
>> [13148.804815] RDX: 0000559aa762a8ae RSI: 0000559aa939d340 RDI: 0000559aa93a22b0
>> [13148.807719] RBP: 00007ffdcee7d5b0 R08: 0000559aa93a2290 R09: 00007f230a0b4820
>> [13148.810659] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffdcee7d420
>> [13148.813609] R13: 0000000000000000 R14: 0000559aa939f000 R15: 0000000000000000
>> [13148.816564]  </TASK>
>>
>> To fix it, we can just fix __ocfs2_find_empty_slot. But original commit
>> introduced the feature to mount ocfs2 locally even it is cluster based,
>> that is a very dangerous, it can easily cause serious data corruption,
>> there is no way to stop other nodes mounting the fs and corrupting it.
>> Setup ha or other cluster-aware stack is just the cost that we have to
>> take for avoiding corruption, otherwise we have to do it in kernel.
>> -------------------------- snip  --------------------------
>>
>> ** analysis **
>>
>> Under Junxiao's call trace, in __ocfs2_find_empty_slot(), the 'if'
>> accessment is wrong. sl_node_num could be 0 at o2cb env.
>>
>> with current information, the trigger flow (may):
>> 1>
>> node1 with 'node_num = 0' for mounting. it will succeed.
>> at this time, slotmap extent block will contains es_valid:1 &
>> es_node_num:0 for node1
>> then ocfs2_update_disk_slot() will write back slotmap info to disk.
>>
>> 2>
>> then, node2 with 'node_num = 1' for mounting
>>
>> ocfs2_find_slot
>>   + ocfs2_update_slot_info //read slotmap info from disk
>>   |  + set si->si_slots[0].es_valid = 1 & si->si_slots[0].sl_node_num = 0
>>   |
>>   + __ocfs2_node_num_to_slot //will return -ENOENT.
>>      __ocfs2_find_empty_slot
>>       + search preferred (node_num:1) failed
>>       + 'si->si_slots[0].sl_node_num' is false. trigger 'break' condition.
>>       + return slot 0 will cause node2 grab node1 journal dlm lock, then trigger hung.
>>
>> ** how to fix **
>>
>> For simplifing code logic design, We make a rule:
>> If last mount didn't do umount, (eg: crash happened), the next mount MUST
>> be same mount type.
>>
>> All possible cases, when enter ocfs2_find_slot():
>>
>> 1. all slots are empty, [cluster|nocluster] mode mount will succeed.
>>     this is clean ocfs2 volume case.
>>
>> 2. for nocluster mount action
>>     - slot 0 is empty, but another slot is not empty:
>>       - mount failure. (there should be in clustered env, deny mixed mount case)
>>     - slot 0 is not empty, all other slots are empty:
>>       - slot 0 is nocluster type: mount success  (may crash last time)
>>       - slot 0 is cluster type: mount failure (deny mixed mount case)
>>
>> 3. for cluster mount action
>>     - slot 0 is empty, but another slot is not empty:
>>       - mount success
>>     - slot 0 is not empty, all other slots are empty:
>>       - slot 0 is nocluster type: mount failure (deny mixed mount case)
>>       - slot 0 is cluster type: mount success (may crash last time)
>>
>> above with simplified form:
>> 1.
>> clean parition => nocluster/cluster@any_node    - success
>>
>> 2.
>> cluster@any_node => nocluster@any_node          - failure
>> nocluster@node1 => crash => nocluster@node1     - success
>> nocluster@node2 => nocluster@node1              - success [*]
>> cluster@any_node => crash => nocluster@any_node - failure
>>
>> 3.
>> cluster@any_node => cluster@any_node            - success
>> cluster@any_node => crash => cluster@any_node   - success
>> nocluster@any_node => crash => cluster@any_node - failure
>>
>> [*]: this is the only risk to corrupt data. we allow this case happen:
>> - node2 may crash or borken, and fails to bootup anymore.
>> - node2 fails to access ocfs2 volume, needs to manage from another node.
>> - mount.ocfs2 will give special warning for this case.
>>
>> Fixes: 912f655d78c5("ocfs2: mount shared volume without ha stack")
>> Reported-by: Junxiao Bi <junxiao.bi@oracle.com>
>> Signed-off-by: Heming Zhao <heming.zhao@suse.com>
>> ---
>>   fs/ocfs2/dlmglue.c  |  3 ++
>>   fs/ocfs2/ocfs2_fs.h |  3 ++
>>   fs/ocfs2/slot_map.c | 70 ++++++++++++++++++++++++++++++++++++---------
>>   3 files changed, 62 insertions(+), 14 deletions(-)
>>
>> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
>> index 801e60bab955..6b017ae46145 100644
>> --- a/fs/ocfs2/dlmglue.c
>> +++ b/fs/ocfs2/dlmglue.c
>> @@ -3403,6 +3403,9 @@ void ocfs2_dlm_shutdown(struct ocfs2_super *osb,
>>   	ocfs2_lock_res_free(&osb->osb_nfs_sync_lockres);
>>   	ocfs2_lock_res_free(&osb->osb_orphan_scan.os_lockres);
>>   
>> +	if (ocfs2_mount_local(osb))
>> +		return;
>> +
>>   	ocfs2_cluster_disconnect(osb->cconn, hangup_pending);
>>   	osb->cconn = NULL;
>>   
>> diff --git a/fs/ocfs2/ocfs2_fs.h b/fs/ocfs2/ocfs2_fs.h
>> index 638d875eccc7..4fe42e638309 100644
>> --- a/fs/ocfs2/ocfs2_fs.h
>> +++ b/fs/ocfs2/ocfs2_fs.h
>> @@ -534,6 +534,9 @@ struct ocfs2_slot_map {
>>    */
>>   };
>>   
>> +#define OCFS2_SLOTMAP_CLUSTER   1
>> +#define OCFS2_SLOTMAP_NOCLUSTER 2
>> +
>>   struct ocfs2_extended_slot {
>>   /*00*/	__u8	es_valid;
>>   	__u8	es_reserved1[3];
>> diff --git a/fs/ocfs2/slot_map.c b/fs/ocfs2/slot_map.c
>> index 0b0ae3ebb0cf..a5c06d3ecb27 100644
>> --- a/fs/ocfs2/slot_map.c
>> +++ b/fs/ocfs2/slot_map.c
>> @@ -26,7 +26,7 @@
>>   
>>   
>>   struct ocfs2_slot {
>> -	int sl_valid;
>> +	u8 sl_valid;
>>   	unsigned int sl_node_num;
>>   };
>>   
>> @@ -52,11 +52,11 @@ static void ocfs2_invalidate_slot(struct ocfs2_slot_info *si,
>>   }
>>   
>>   static void ocfs2_set_slot(struct ocfs2_slot_info *si,
>> -			   int slot_num, unsigned int node_num)
>> +			   int slot_num, unsigned int node_num, u8 valid)
>>   {
>>   	BUG_ON((slot_num < 0) || (slot_num >= si->si_num_slots));
>>   
>> -	si->si_slots[slot_num].sl_valid = 1;
>> +	si->si_slots[slot_num].sl_valid = valid;
>>   	si->si_slots[slot_num].sl_node_num = node_num;
>>   }
>>   
>> @@ -75,7 +75,8 @@ static void ocfs2_update_slot_info_extended(struct ocfs2_slot_info *si)
>>   		     i++, slotno++) {
>>   			if (se->se_slots[i].es_valid)
>>   				ocfs2_set_slot(si, slotno,
>> -					       le32_to_cpu(se->se_slots[i].es_node_num));
>> +					       le32_to_cpu(se->se_slots[i].es_node_num),
>> +						   le32_to_cpu(se->se_slots[i].es_valid));
>>   			else
>>   				ocfs2_invalidate_slot(si, slotno);
>>   		}
>> @@ -97,7 +98,7 @@ static void ocfs2_update_slot_info_old(struct ocfs2_slot_info *si)
>>   		if (le16_to_cpu(sm->sm_slots[i]) == (u16)OCFS2_INVALID_SLOT)
>>   			ocfs2_invalidate_slot(si, i);
>>   		else
>> -			ocfs2_set_slot(si, i, le16_to_cpu(sm->sm_slots[i]));
>> +			ocfs2_set_slot(si, i, le16_to_cpu(sm->sm_slots[i]), OCFS2_SLOTMAP_CLUSTER);
>>   	}
>>   }
>>   
>> @@ -252,16 +253,14 @@ static int __ocfs2_find_empty_slot(struct ocfs2_slot_info *si,
>>   	int i, ret = -ENOSPC;
>>   
>>   	if ((preferred >= 0) && (preferred < si->si_num_slots)) {
>> -		if (!si->si_slots[preferred].sl_valid ||
>> -		    !si->si_slots[preferred].sl_node_num) {
>> +		if (!si->si_slots[preferred].sl_valid) {
>>   			ret = preferred;
>>   			goto out;
>>   		}
>>   	}
>>   
>>   	for(i = 0; i < si->si_num_slots; i++) {
>> -		if (!si->si_slots[i].sl_valid ||
>> -		    !si->si_slots[i].sl_node_num) {
>> +		if (!si->si_slots[i].sl_valid) {
>>   			ret = i;
>>   			break;
>>   		}
>> @@ -270,6 +269,20 @@ static int __ocfs2_find_empty_slot(struct ocfs2_slot_info *si,
>>   	return ret;
>>   }
>>   
>> +static int __ocfs2_find_used_slot(struct ocfs2_slot_info *si)
>> +{
>> +	int i, ret = -ENOENT;
>> +
>> +	for (i = 0; i < si->si_num_slots; i++) {
>> +		if (si->si_slots[i].sl_valid) {
>> +			ret = i;
>> +			break;
>> +		}
>> +	}
>> +
>> +	return ret;
>> +}
>> +
>>   int ocfs2_node_num_to_slot(struct ocfs2_super *osb, unsigned int node_num)
>>   {
>>   	int slot;
>> @@ -449,17 +462,45 @@ int ocfs2_find_slot(struct ocfs2_super *osb)
>>   {
>>   	int status;
>>   	int slot;
>> +	int nocluster_mnt = 0;
>>   	struct ocfs2_slot_info *si;
>>   
>>   	si = osb->slot_info;
>>   
>>   	spin_lock(&osb->osb_lock);
>>   	ocfs2_update_slot_info(si);
>> +	slot = __ocfs2_find_used_slot(si);
>> +	if (slot == 0 && (si->si_slots[0].sl_valid == OCFS2_SLOTMAP_NOCLUSTER))
>> +		nocluster_mnt = 1;
>>   
>> -	if (ocfs2_mount_local(osb))
>> -		/* use slot 0 directly in local mode */
>> -		slot = 0;
>> -	else {
>> +	/*
>> +	 * We set a rule:
>> +	 * if last mount didn't do unmount, (eg: crash happened), the next mount
>> +	 * MUST be same mount type.
>> +	 */
>> +	if (ocfs2_mount_local(osb)) {
>> +		/* empty slotmap, or partition didn't unmount last time */
>> +		if ((slot == -ENOENT) || nocluster_mnt) {
>> +			/* use slot 0 directly in local mode */
>> +			slot = 0;
>> +			nocluster_mnt = 1;
>> +		} else {
>> +			spin_unlock(&osb->osb_lock);
>> +			mlog(ML_ERROR, "found clustered mount slot in noclustered env!\n");
>> +			mlog(ML_ERROR, "please clean slotmap info for mount.\n");
>> +			mlog(ML_ERROR, "eg. remount then unmount with clustered mode\n");
>> +			status = -EINVAL;
>> +			goto bail;
>> +		}
>> +	} else {
>> +		if (nocluster_mnt) {
>> +			spin_unlock(&osb->osb_lock);
>> +			mlog(ML_ERROR, "found noclustered mount slot in clustered env!\n");
>> +			mlog(ML_ERROR, "please clean slotmap info for mount.\n");
>> +			mlog(ML_ERROR, "eg. remount then unmount with noclustered mode\n");
>> +			status = -EINVAL;
>> +			goto bail;
>> +		}
>>   		/* search for ourselves first and take the slot if it already
>>   		 * exists. Perhaps we need to mark this in a variable for our
>>   		 * own journal recovery? Possibly not, though we certainly
>> @@ -481,7 +522,8 @@ int ocfs2_find_slot(struct ocfs2_super *osb)
>>   			       slot, osb->dev_str);
>>   	}
>>   
>> -	ocfs2_set_slot(si, slot, osb->node_num);
>> +	ocfs2_set_slot(si, slot, osb->node_num, nocluster_mnt ?
>> +					OCFS2_SLOTMAP_NOCLUSTER : OCFS2_SLOTMAP_CLUSTER);
>>   	osb->slot_num = slot;
>>   	spin_unlock(&osb->osb_lock);
>>   



* Re: [Ocfs2-devel] [PATCH 1/1] ocfs2: fix ocfs2_find_slot repeats alloc same slot issue
  2022-06-13  7:59     ` heming.zhao--- via Ocfs2-devel
@ 2022-06-13  8:21       ` Joseph Qi via Ocfs2-devel
  2022-06-13  8:48         ` heming.zhao--- via Ocfs2-devel
  0 siblings, 1 reply; 17+ messages in thread
From: Joseph Qi via Ocfs2-devel @ 2022-06-13  8:21 UTC (permalink / raw)
  To: heming.zhao, ocfs2-devel



On 6/13/22 3:59 PM, heming.zhao@suse.com wrote:
> On 6/12/22 22:16, Joseph Qi wrote:
>> Hi,
>>
>> Why can't use local mount? I don't remember if we discuss about this.
>>
> Sorry, I can't follow your question.
> Do you mean why revert commit 912f655d78c5?
> 
> or you are interest with the feature local mount?
> the local mount is created by mkfs.ocfs2, which can't be converted to clustered.
> see mkfs.ocfs2(8) '-M' option.
> 
Junxiao's main concern is data corruption, so I'm afraid we have to
introduce an on-disk feature bit to prevent mixed nocluster and cluster
mounts, similar to local mount.
Another scenario to consider is journal replay after a crash.

Thanks,
Joseph


* Re: [Ocfs2-devel] [PATCH 1/1] ocfs2: fix ocfs2_find_slot repeats alloc same slot issue
  2022-06-13  8:21       ` Joseph Qi via Ocfs2-devel
@ 2022-06-13  8:48         ` heming.zhao--- via Ocfs2-devel
  2022-06-13 15:43           ` Junxiao Bi via Ocfs2-devel
  0 siblings, 1 reply; 17+ messages in thread
From: heming.zhao--- via Ocfs2-devel @ 2022-06-13  8:48 UTC (permalink / raw)
  To: Joseph Qi, ocfs2-devel

On 6/13/22 16:21, Joseph Qi wrote:
> 
> 
> On 6/13/22 3:59 PM, heming.zhao@suse.com wrote:
>> On 6/12/22 22:16, Joseph Qi wrote:
>>> Hi,
>>>
>>> Why can't use local mount? I don't remember if we discuss about this.
>>>
>> Sorry, I can't follow your question.
>> Do you mean why revert commit 912f655d78c5?
>>
>> or you are interest with the feature local mount?
>> the local mount is created by mkfs.ocfs2, which can't be converted to clustered.
>> see mkfs.ocfs2(8) '-M' option.
>>
> What Junxiao's main concern is data corruption, so I'm afraid we have to
> introduce an ondisk feature bit to prevent mixed nocluster and cluster
> mount, similar to local mount.

This patch defines two new flags:
#define OCFS2_SLOTMAP_CLUSTER   1
#define OCFS2_SLOTMAP_NOCLUSTER 2

(I expect that) with OCFS2_SLOTMAP_CLUSTER nothing needs to change for the
clustered mount mode, so compatibility is preserved.

OCFS2_SLOTMAP_NOCLUSTER introduces a new value in the slotmap area.
This new value only takes effect after a successful nocluster mount.
(Please correct me if I'm wrong:) existing kernel/userspace code does no special
handling for the nocluster mount mode in the slotmap area, so the new value is
also compatible.

The patch also prevents mixed mounts; the related code is in ocfs2_find_slot().
The logic is:
- a nocluster mount is allowed only if the slotmap is empty or the volume was already mounted nocluster
- a cluster mount is allowed only if the slotmap is empty or the volume was already mounted clustered
- all other combinations are denied

> Another scenario is journal replay after crash.
> 

This patch sets a rule:
if the last mount was not cleanly unmounted (e.g. a crash happened), the next
mount MUST use the same mount type (please also see the 'logic' lines above).

In my view this rule is enough to handle the crash scenario.
My patch may need polishing here and there, but it is workable.

Thanks,
Heming


* Re: [Ocfs2-devel] [PATCH 1/1] ocfs2: fix ocfs2_find_slot repeats alloc same slot issue
  2022-06-13  8:48         ` heming.zhao--- via Ocfs2-devel
@ 2022-06-13 15:43           ` Junxiao Bi via Ocfs2-devel
  2022-06-14  2:59             ` heming.zhao--- via Ocfs2-devel
  0 siblings, 1 reply; 17+ messages in thread
From: Junxiao Bi via Ocfs2-devel @ 2022-06-13 15:43 UTC (permalink / raw)
  To: heming.zhao; +Cc: ocfs2-devel


> On 13 Jun 2022, at 01:48, heming.zhao@suse.com wrote:
> 
> On 6/13/22 16:21, Joseph Qi wrote:
>>> On 6/13/22 3:59 PM, heming.zhao@suse.com wrote:
>>> On 6/12/22 22:16, Joseph Qi wrote:
>>>> Hi,
>>>> 
>>>> Why can't use local mount? I don't remember if we discuss about this.
>>>> 
>>> Sorry, I can't follow your question.
>>> Do you mean why revert commit 912f655d78c5?
>>> 
>>> or you are interest with the feature local mount?
>>> the local mount is created by mkfs.ocfs2, which can't be converted to clustered.
>>> see mkfs.ocfs2(8) '-M' option.
>>> 
>> What Junxiao's main concern is data corruption, so I'm afraid we have to
>> introduce an ondisk feature bit to prevent mixed nocluster and cluster
>> mount, similar to local mount.
> 
> this patch defined two new variants/flags:
> #define OCFS2_SLOTMAP_CLUSTER   1
> #define OCFS2_SLOTMAP_NOCLUSTER 2
> 
> (I expect) Under OCFS2_SLOTMAP_CLUSTER, for clustered mount mode and for compatibility,
> anything doesn't need to be changed.
> 
> OCFS2_SLOTMAP_NOCLUSTER will introduce a new value for slotmap area.
> this new value only take effect after a successfully nocluster mount.
> (pls fix me), existed kernel/user space code don't do any special handle for
> noclustered mount mode in slotmap area. So the new value is also compatibility.
> 
> And the patch can also prevent mixed mount, the related code is in ocfs2_find_slot().
> code logic:
> - noclustered mount condition: slotmap is empty or already mounted with noclustered
> - clustered mount condition: slotmap is empty or already mounted with clustered.
> - all other conditions will be denied.
Finding a slot requires reading the slot map and then updating it. That is not
atomic, so you can't prevent a mixed mount unless you hold a cluster lock.
> 
>> Another scenario is journal replay after crash.
> 
> this patch set a rule:
> If last mount didn't do umount, (eg: crash happened), the next mount MUST be same mount type.
> (please also check above lines of 'code logic'.)
> 
> In my view, this rule is enough to handle crash scenario.
> So my patch should be polished in somewhere, but it is workable.
> 
> Thanks,
> Heming

* Re: [Ocfs2-devel] [PATCH 1/1] ocfs2: fix ocfs2_find_slot repeats alloc same slot issue
  2022-06-13 15:43           ` Junxiao Bi via Ocfs2-devel
@ 2022-06-14  2:59             ` heming.zhao--- via Ocfs2-devel
  2022-06-14  3:27               ` Joseph Qi via Ocfs2-devel
  2022-06-14  6:22               ` Junxiao Bi via Ocfs2-devel
  0 siblings, 2 replies; 17+ messages in thread
From: heming.zhao--- via Ocfs2-devel @ 2022-06-14  2:59 UTC (permalink / raw)
  To: Junxiao Bi; +Cc: ocfs2-devel

On 6/13/22 23:43, Junxiao Bi wrote:
> 
>> On 13 Jun 2022, at 01:48, heming.zhao@suse.com wrote:
>>
>> On 6/13/22 16:21, Joseph Qi wrote:
>>>> On 6/13/22 3:59 PM, heming.zhao@suse.com wrote:
>>>> On 6/12/22 22:16, Joseph Qi wrote:
>>>>> Hi,
>>>>>
>>>>> Why can't use local mount? I don't remember if we discuss about this.
>>>>>
>>>> Sorry, I can't follow your question.
>>>> Do you mean why revert commit 912f655d78c5?
>>>>
>>>> or you are interest with the feature local mount?
>>>> the local mount is created by mkfs.ocfs2, which can't be converted to clustered.
>>>> see mkfs.ocfs2(8) '-M' option.
>>>>
>>> What Junxiao's main concern is data corruption, so I'm afraid we have to
>>> introduce an ondisk feature bit to prevent mixed nocluster and cluster
>>> mount, similar to local mount.
>>
>> this patch defined two new variants/flags:
>> #define OCFS2_SLOTMAP_CLUSTER   1
>> #define OCFS2_SLOTMAP_NOCLUSTER 2
>>
>> (I expect) Under OCFS2_SLOTMAP_CLUSTER, for clustered mount mode and for compatibility,
>> anything doesn't need to be changed.
>>
>> OCFS2_SLOTMAP_NOCLUSTER will introduce a new value for slotmap area.
>> this new value only take effect after a successfully nocluster mount.
>> (pls fix me), existed kernel/user space code don't do any special handle for
>> noclustered mount mode in slotmap area. So the new value is also compatibility.
>>
>> And the patch can also prevent mixed mount, the related code is in ocfs2_find_slot().
>> code logic:
>> - noclustered mount condition: slotmap is empty or already mounted with noclustered
>> - clustered mount condition: slotmap is empty or already mounted with clustered.
>> - all other conditions will be denied.
> Finding slot required reading slot map and then  update slot map. It is not atomic , you can’t prevent mixed mount until you have a cluster lock.

May I say that the use case you mention is invalid?
I believe all (yes, *all*) ocfs2 users run this filesystem in cluster mode in
their production environments.
The nocluster mount is only used in maintenance environments (e.g. backup, fsck).

We are only concerned with two situations:
1. The user forgets to unmount (e.g. after a crash) before mounting in the other mode.
2. While the ocfs2 volume is in use, mounting it in the other mode must be denied;
   this can happen when a user mistakenly runs a command or script.

Based on 1 & 2 above, the use case Junxiao mentioned only happens in one scenario:
the volume is not mounted on any node, and then users mistakenly mount the volume
in cluster and nocluster mode at the same time. I believe this use case is invalid,
and I guess gfs2 may have the same issue.

With this patch, ocfs2 has more sanity-check ability than other filesystems, e.g. xfs and ext4.
The SUSE HA stack with corosync+pacemaker also supports running xfs/ext4 in active/passive mode.
xfs/ext4 have no code to detect mount status, so the mixed mount Junxiao mentioned
can also happen on those filesystems. How do the xfs/ext4/HA maintainers handle it?
Given how those filesystems mount, their maintainers also treat a mixed mount as an
invalid use case and do not handle it.

/Heming

>>
>>> Another scenario is journal replay after crash.
>>
>> this patch set a rule:
>> If last mount didn't do umount, (eg: crash happened), the next mount MUST be same mount type.
>> (please also check above lines of 'code logic'.)
>>
>> In my view, this rule is enough to handle crash scenario.
>> So my patch should be polished in somewhere, but it is workable.
>>
>> Thanks,
>> Heming



* Re: [Ocfs2-devel] [PATCH 1/1] ocfs2: fix ocfs2_find_slot repeats alloc same slot issue
  2022-06-14  2:59             ` heming.zhao--- via Ocfs2-devel
@ 2022-06-14  3:27               ` Joseph Qi via Ocfs2-devel
  2022-06-14 12:13                 ` heming.zhao--- via Ocfs2-devel
  2022-06-14  6:22               ` Junxiao Bi via Ocfs2-devel
  1 sibling, 1 reply; 17+ messages in thread
From: Joseph Qi via Ocfs2-devel @ 2022-06-14  3:27 UTC (permalink / raw)
  To: heming.zhao, Junxiao Bi; +Cc: ocfs2-devel



On 6/14/22 10:59 AM, heming.zhao@suse.com wrote:
> On 6/13/22 23:43, Junxiao Bi wrote:
>>
>>> On 13 Jun 2022, at 01:48, heming.zhao@suse.com wrote:
>>>
>>> On 6/13/22 16:21, Joseph Qi wrote:
>>>>> On 6/13/22 3:59 PM, heming.zhao@suse.com wrote:
>>>>> On 6/12/22 22:16, Joseph Qi wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Why can't use local mount? I don't remember if we discuss about this.
>>>>>>
>>>>> Sorry, I can't follow your question.
>>>>> Do you mean why revert commit 912f655d78c5?
>>>>>
>>>>> or you are interest with the feature local mount?
>>>>> the local mount is created by mkfs.ocfs2, which can't be converted to clustered.
>>>>> see mkfs.ocfs2(8) '-M' option.
>>>>>
>>>> What Junxiao's main concern is data corruption, so I'm afraid we have to
>>>> introduce an ondisk feature bit to prevent mixed nocluster and cluster
>>>> mount, similar to local mount.
>>>
>>> this patch defined two new variants/flags:
>>> #define OCFS2_SLOTMAP_CLUSTER   1
>>> #define OCFS2_SLOTMAP_NOCLUSTER 2
>>>
>>> (I expect) Under OCFS2_SLOTMAP_CLUSTER, for clustered mount mode and for compatibility,
>>> anything doesn't need to be changed.
>>>
>>> OCFS2_SLOTMAP_NOCLUSTER will introduce a new value for slotmap area.
>>> this new value only take effect after a successfully nocluster mount.
>>> (pls fix me), existed kernel/user space code don't do any special handle for
>>> noclustered mount mode in slotmap area. So the new value is also compatibility.
>>>
>>> And the patch can also prevent mixed mount, the related code is in ocfs2_find_slot().
>>> code logic:
>>> - noclustered mount condition: slotmap is empty or already mounted with noclustered
>>> - clustered mount condition: slotmap is empty or already mounted with clustered.
>>> - all other conditions will be denied.
>> Finding slot required reading slot map and then  update slot map. It is not atomic , you can’t prevent mixed mount until you have a cluster lock.
> 
> Could I say your mentioned use case is invalid.
> I believe all (yes, *all*) the ocfs2 users use this fs under cluster mode in
> their product env.
> The nocluster mount is only used on maintained env (eg. backup, fsck).
> 
> We only concern two ways:
> 1. user forgets to unmount (eg crash) before using another mount mode.
> 2. when ocfs2 volume is working, which should deny volume is mounted by another mode.
>    this may happened user mistakenly runs command or script.
> 
> Base on above 1 & 2, Junxiao above mentioned use case only happens on one scenario:
> the volume doesn't be mounted by any node, then user mistakenly mounts volume
> with [no]cluster mode at same time. I believe this use case is invalid. And I guess
> gfs2 may also has the same issue.
> 
> With this patch, ocfs2 has more sanity check ability than other fs, eg: xfs, ext4.
> SUSE HA stack with corosync+pacemaker also supports running xfs/ext4 with A/P mode.
> The xfs/ext4 never have detecting mount status code, Junxiao mentioned mixed mount
> can also happens on these fs. How do xfs/ext4/HA maintainers handle it? Under these
> fs mounting behavior, these fields maintainers also treat the mixed mount as
> invalid use case and ignore handle it.

Ext4 has the mmp (multiple mount protection) feature for this.
For more details, please refer to:
https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout
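
For reference, mmp is an on-disk feature flag; a sketch of enabling and
checking it on an unmounted ext4 volume (the device name is only an example):

tune2fs -O mmp /dev/vdb       <== enable multiple mount protection
dumpe2fs -h /dev/vdb          <== features line / MMP block show it is on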

Thanks,
Joseph


* Re: [Ocfs2-devel] [PATCH 1/1] ocfs2: fix ocfs2_find_slot repeats alloc same slot issue
  2022-06-14  2:59             ` heming.zhao--- via Ocfs2-devel
  2022-06-14  3:27               ` Joseph Qi via Ocfs2-devel
@ 2022-06-14  6:22               ` Junxiao Bi via Ocfs2-devel
  1 sibling, 0 replies; 17+ messages in thread
From: Junxiao Bi via Ocfs2-devel @ 2022-06-14  6:22 UTC (permalink / raw)
  To: heming.zhao; +Cc: ocfs2-devel

On 6/13/22 7:59 PM, heming.zhao@suse.com wrote:

> On 6/13/22 23:43, Junxiao Bi wrote:
>>
>>> 在 2022年6月13日,上午1:48,heming.zhao@suse.com 写道:
>>>
>>> On 6/13/22 16:21, Joseph Qi wrote:
>>>>> On 6/13/22 3:59 PM, heming.zhao@suse.com wrote:
>>>>> On 6/12/22 22:16, Joseph Qi wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Why can't use local mount? I don't remember if we discuss about 
>>>>>> this.
>>>>>>
>>>>> Sorry, I can't follow your question.
>>>>> Do you mean why revert commit 912f655d78c5?
>>>>>
>>>>> or you are interest with the feature local mount?
>>>>> the local mount is created by mkfs.ocfs2, which can't be converted 
>>>>> to clustered.
>>>>> see mkfs.ocfs2(8) '-M' option.
>>>>>
>>>> What Junxiao's main concern is data corruption, so I'm afraid we 
>>>> have to
>>>> introduce an ondisk feature bit to prevent mixed nocluster and cluster
>>>> mount, similar to local mount.
>>>
>>> this patch defined two new variants/flags:
>>> #define OCFS2_SLOTMAP_CLUSTER   1
>>> #define OCFS2_SLOTMAP_NOCLUSTER 2
>>>
>>> (I expect) Under OCFS2_SLOTMAP_CLUSTER, for clustered mount mode and 
>>> for compatibility,
>>> anything doesn't need to be changed.
>>>
>>> OCFS2_SLOTMAP_NOCLUSTER will introduce a new value for slotmap area.
>>> this new value only take effect after a successfully nocluster mount.
>>> (pls fix me), existed kernel/user space code don't do any special 
>>> handle for
>>> noclustered mount mode in slotmap area. So the new value is also 
>>> compatibility.
>>>
>>> And the patch can also prevent mixed mount, the related code is in 
>>> ocfs2_find_slot().
>>> code logic:
>>> - noclustered mount condition: slotmap is empty or already mounted 
>>> with noclustered
>>> - clustered mount condition: slotmap is empty or already mounted 
>>> with clustered.
>>> - all other conditions will be denied.
>> Finding slot required reading slot map and then  update slot map. It 
>> is not atomic , you can’t prevent mixed mount until you have a 
>> cluster lock.
>
> Could I say your mentioned use case is invalid.
> I believe all (yes, *all*) the ocfs2 users use this fs under cluster 
> mode in
> their product env.
> The nocluster mount is only used on maintained env (eg. backup, fsck).
>
> We only concern two ways:
> 1. user forgets to unmount (eg crash) before using another mount mode.
> 2. when ocfs2 volume is working, which should deny volume is mounted 
> by another mode.
>    this may happened user mistakenly runs command or script.
>
> Base on above 1 & 2, Junxiao above mentioned use case only happens on 
> one scenario:
> the volume doesn't be mounted by any node, then user mistakenly mounts 
> volume
> with [no]cluster mode at same time. I believe this use case is 
> invalid. And I guess
> gfs2 may also has the same issue.

You missed the point; I never said that was the use case. The point is
that with your feature ocfs2 could be corrupted by a customer mistake.
Think about why fsck.ocfs2/mkfs.ocfs2 check for this before changing
anything.

You'd better make sure gfs2 really has that issue; if so, I think you
should raise a bug with them.

Thanks,

Junxiao.

> With this patch, ocfs2 has more sanity check ability than other fs, 
> eg: xfs, ext4.
> SUSE HA stack with corosync+pacemaker also supports running xfs/ext4 
> with A/P mode.
> The xfs/ext4 never have detecting mount status code, Junxiao mentioned 
> mixed mount
> can also happens on these fs. How do xfs/ext4/HA maintainers handle 
> it? Under these
> fs mounting behavior, these fields maintainers also treat the mixed 
> mount as
> invalid use case and ignore handle it.
>
> /Heming
>
>>>
>>>> Another scenario is journal replay after crash.
>>>
>>> this patch set a rule:
>>> If last mount didn't do umount, (eg: crash happened), the next mount 
>>> MUST be same mount type.
>>> (please also check above lines of 'code logic'.)
>>>
>>> In my view, this rule is enough to handle crash scenario.
>>> So my patch should be polished in somewhere, but it is workable.
>>>
>>> Thanks,
>>> Heming
>

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Ocfs2-devel] [PATCH 1/1] ocfs2: fix ocfs2_find_slot repeats alloc same slot issue
  2022-06-14  3:27               ` Joseph Qi via Ocfs2-devel
@ 2022-06-14 12:13                 ` heming.zhao--- via Ocfs2-devel
  2022-06-15  2:03                   ` Joseph Qi via Ocfs2-devel
  0 siblings, 1 reply; 17+ messages in thread
From: heming.zhao--- via Ocfs2-devel @ 2022-06-14 12:13 UTC (permalink / raw)
  To: Joseph Qi, Junxiao Bi; +Cc: ocfs2-devel

On 6/14/22 11:27, Joseph Qi wrote:
> 
> 
> On 6/14/22 10:59 AM, heming.zhao@suse.com wrote:
>> On 6/13/22 23:43, Junxiao Bi wrote:
>>>
>>>> 在 2022年6月13日,上午1:48,heming.zhao@suse.com 写道:
>>>>
>>>> On 6/13/22 16:21, Joseph Qi wrote:
>>>>>> On 6/13/22 3:59 PM, heming.zhao@suse.com wrote:
>>>>>> On 6/12/22 22:16, Joseph Qi wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Why can't use local mount? I don't remember if we discuss about this.
>>>>>>>
>>>>>> Sorry, I can't follow your question.
>>>>>> Do you mean why revert commit 912f655d78c5?
>>>>>>
>>>>>> or you are interest with the feature local mount?
>>>>>> the local mount is created by mkfs.ocfs2, which can't be converted to clustered.
>>>>>> see mkfs.ocfs2(8) '-M' option.
>>>>>>
>>>>> What Junxiao's main concern is data corruption, so I'm afraid we have to
>>>>> introduce an ondisk feature bit to prevent mixed nocluster and cluster
>>>>> mount, similar to local mount.
>>>>
>>>> this patch defined two new variants/flags:
>>>> #define OCFS2_SLOTMAP_CLUSTER   1
>>>> #define OCFS2_SLOTMAP_NOCLUSTER 2
>>>>
>>>> (I expect) Under OCFS2_SLOTMAP_CLUSTER, for clustered mount mode and for compatibility,
>>>> anything doesn't need to be changed.
>>>>
>>>> OCFS2_SLOTMAP_NOCLUSTER will introduce a new value for slotmap area.
>>>> this new value only take effect after a successfully nocluster mount.
>>>> (pls fix me), existed kernel/user space code don't do any special handle for
>>>> noclustered mount mode in slotmap area. So the new value is also compatibility.
>>>>
>>>> And the patch can also prevent mixed mount, the related code is in ocfs2_find_slot().
>>>> code logic:
>>>> - noclustered mount condition: slotmap is empty or already mounted with noclustered
>>>> - clustered mount condition: slotmap is empty or already mounted with clustered.
>>>> - all other conditions will be denied.
>>> Finding slot required reading slot map and then  update slot map. It is not atomic , you can’t prevent mixed mount until you have a cluster lock.
>>
>> Could I say your mentioned use case is invalid.
>> I believe all (yes, *all*) the ocfs2 users use this fs under cluster mode in
>> their product env.
>> The nocluster mount is only used on maintained env (eg. backup, fsck).
>>
>> We only concern two ways:
>> 1. user forgets to unmount (eg crash) before using another mount mode.
>> 2. when ocfs2 volume is working, which should deny volume is mounted by another mode.
>>     this may happened user mistakenly runs command or script.
>>
>> Base on above 1 & 2, Junxiao above mentioned use case only happens on one scenario:
>> the volume doesn't be mounted by any node, then user mistakenly mounts volume
>> with [no]cluster mode at same time. I believe this use case is invalid. And I guess
>> gfs2 may also has the same issue.
>>
>> With this patch, ocfs2 has more sanity check ability than other fs, eg: xfs, ext4.
>> SUSE HA stack with corosync+pacemaker also supports running xfs/ext4 with A/P mode.
>> The xfs/ext4 never have detecting mount status code, Junxiao mentioned mixed mount
>> can also happens on these fs. How do xfs/ext4/HA maintainers handle it? Under these
>> fs mounting behavior, these fields maintainers also treat the mixed mount as
>> invalid use case and ignore handle it.
> 
> Ext4 has a feature mmp (multiple mount protection) to do this.
> More details please refer to:
> https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout
> 

Hi Joseph,

I checked the mmp feature related code; it can help ext4 avoid the multiple-mount issue.

From my code reading:
- the mmp feature performs an mmp check at mount time.
- the mmp feature starts a kernel thread (kmmpd) to monitor the mount status.
- kmmpd does its detection work via I/O against the underlying device.

We could re-use the mmp idea for ocfs2 (a toy sketch of the monitor part is below).
Do you plan to add it? If yes, I could take on this porting job.
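To illustrate the kmmpd part above, here is a toy user-space model of what such a
monitor does. The struct fields, helper names and intervals below are my own
placeholders, not the real ext4 or ocfs2 code:

```
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

struct mmp_block {
	uint32_t seq;			/* liveness counter owned by the mounting node */
	uint32_t check_interval;	/* seconds between updates */
};

/* Stand-ins for block reads/writes against the shared device. */
static struct mmp_block disk = { .seq = 1, .check_interval = 1 };
static void mmp_read(struct mmp_block *m)        { *m = disk; }
static void mmp_write(const struct mmp_block *m) { disk = *m; }

/*
 * Toy monitor: keep bumping the counter so other nodes can see this mount
 * is alive, and bail out if anything else writes to the mmp area.
 */
static int monitor(uint32_t my_seq, int rounds)
{
	struct mmp_block m;

	while (rounds-- > 0) {
		mmp_read(&m);
		if (m.seq != my_seq) {
			fprintf(stderr, "mmp conflict: seq %u != %u\n",
				(unsigned)m.seq, (unsigned)my_seq);
			return -1;
		}
		m.seq = ++my_seq;	/* prove we are still alive */
		mmp_write(&m);
		sleep(m.check_interval);
	}
	return 0;
}

int main(void)
{
	/* Single-process demo: nothing interferes, so this just runs 3 rounds. */
	return monitor(disk.seq, 3) ? 1 : 0;
}
```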

Thanks,
Heming

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Ocfs2-devel] [PATCH 1/1] ocfs2: fix ocfs2_find_slot repeats alloc same slot issue
  2022-06-14 12:13                 ` heming.zhao--- via Ocfs2-devel
@ 2022-06-15  2:03                   ` Joseph Qi via Ocfs2-devel
  2022-06-15  4:37                     ` heming.zhao--- via Ocfs2-devel
  0 siblings, 1 reply; 17+ messages in thread
From: Joseph Qi via Ocfs2-devel @ 2022-06-15  2:03 UTC (permalink / raw)
  To: heming.zhao, Junxiao Bi; +Cc: ocfs2-devel



On 6/14/22 8:13 PM, heming.zhao@suse.com wrote:
> On 6/14/22 11:27, Joseph Qi wrote:
>>
>>
>> On 6/14/22 10:59 AM, heming.zhao@suse.com wrote:
>>> On 6/13/22 23:43, Junxiao Bi wrote:
>>>>
>>>>> 在 2022年6月13日,上午1:48,heming.zhao@suse.com 写道:
>>>>>
>>>>> On 6/13/22 16:21, Joseph Qi wrote:
>>>>>>> On 6/13/22 3:59 PM, heming.zhao@suse.com wrote:
>>>>>>> On 6/12/22 22:16, Joseph Qi wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Why can't use local mount? I don't remember if we discuss about this.
>>>>>>>>
>>>>>>> Sorry, I can't follow your question.
>>>>>>> Do you mean why revert commit 912f655d78c5?
>>>>>>>
>>>>>>> or you are interest with the feature local mount?
>>>>>>> the local mount is created by mkfs.ocfs2, which can't be converted to clustered.
>>>>>>> see mkfs.ocfs2(8) '-M' option.
>>>>>>>
>>>>>> What Junxiao's main concern is data corruption, so I'm afraid we have to
>>>>>> introduce an ondisk feature bit to prevent mixed nocluster and cluster
>>>>>> mount, similar to local mount.
>>>>>
>>>>> this patch defined two new variants/flags:
>>>>> #define OCFS2_SLOTMAP_CLUSTER   1
>>>>> #define OCFS2_SLOTMAP_NOCLUSTER 2
>>>>>
>>>>> (I expect) Under OCFS2_SLOTMAP_CLUSTER, for clustered mount mode and for compatibility,
>>>>> anything doesn't need to be changed.
>>>>>
>>>>> OCFS2_SLOTMAP_NOCLUSTER will introduce a new value for slotmap area.
>>>>> this new value only take effect after a successfully nocluster mount.
>>>>> (pls fix me), existed kernel/user space code don't do any special handle for
>>>>> noclustered mount mode in slotmap area. So the new value is also compatibility.
>>>>>
>>>>> And the patch can also prevent mixed mount, the related code is in ocfs2_find_slot().
>>>>> code logic:
>>>>> - noclustered mount condition: slotmap is empty or already mounted with noclustered
>>>>> - clustered mount condition: slotmap is empty or already mounted with clustered.
>>>>> - all other conditions will be denied.
>>>> Finding slot required reading slot map and then  update slot map. It is not atomic , you can’t prevent mixed mount until you have a cluster lock.
>>>
>>> Could I say your mentioned use case is invalid.
>>> I believe all (yes, *all*) the ocfs2 users use this fs under cluster mode in
>>> their product env.
>>> The nocluster mount is only used on maintained env (eg. backup, fsck).
>>>
>>> We only concern two ways:
>>> 1. user forgets to unmount (eg crash) before using another mount mode.
>>> 2. when ocfs2 volume is working, which should deny volume is mounted by another mode.
>>>     this may happened user mistakenly runs command or script.
>>>
>>> Base on above 1 & 2, Junxiao above mentioned use case only happens on one scenario:
>>> the volume doesn't be mounted by any node, then user mistakenly mounts volume
>>> with [no]cluster mode at same time. I believe this use case is invalid. And I guess
>>> gfs2 may also has the same issue.
>>>
>>> With this patch, ocfs2 has more sanity check ability than other fs, eg: xfs, ext4.
>>> SUSE HA stack with corosync+pacemaker also supports running xfs/ext4 with A/P mode.
>>> The xfs/ext4 never have detecting mount status code, Junxiao mentioned mixed mount
>>> can also happens on these fs. How do xfs/ext4/HA maintainers handle it? Under these
>>> fs mounting behavior, these fields maintainers also treat the mixed mount as
>>> invalid use case and ignore handle it.
>>
>> Ext4 has a feature mmp (multiple mount protection) to do this.
>> More details please refer to:
>> https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout
>>
> 
> Hi Joseph,
> 
> I checked mmp feature related code, it can help ext4 to avoid multiple mount issue.
> 
> From my code reading:
> - the mmp feature did mmp check when mouting.
> - the mmp feature will start a kernel thread (kmmpd) to monitor mount status.
> - kmmpd does detecting underlying dev IOs task.
> 
> We could re-use mmp feature idea for ocfs2, do you plan to add it?
> If yes, I could take this feature porting job.
> 
Umm... I referred to it just for your information.
ocfs2 is a cluster filesystem; in normal user scenarios we intend to mount it
on multiple nodes.
For using it as a local filesystem, we have local mount mode.
So the 'nocluster' mode looks a bit weird to me. I'm not sure it is a frequent
user case. Or can we do it in the cluster stack, as Junxiao suggested? Oracle
and I (previously) only use o2cb, so we don't have much experience with
corosync + pacemaker.

Thanks,
Joseph

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Ocfs2-devel] [PATCH 1/1] ocfs2: fix ocfs2_find_slot repeats alloc same slot issue
  2022-06-15  2:03                   ` Joseph Qi via Ocfs2-devel
@ 2022-06-15  4:37                     ` heming.zhao--- via Ocfs2-devel
  0 siblings, 0 replies; 17+ messages in thread
From: heming.zhao--- via Ocfs2-devel @ 2022-06-15  4:37 UTC (permalink / raw)
  To: Joseph Qi, Junxiao Bi; +Cc: ocfs2-devel

On 6/15/22 10:03, Joseph Qi wrote:
> 
> 
> On 6/14/22 8:13 PM, heming.zhao@suse.com wrote:
>> On 6/14/22 11:27, Joseph Qi wrote:
>>>
>>>
>>> On 6/14/22 10:59 AM, heming.zhao@suse.com wrote:
>>>> On 6/13/22 23:43, Junxiao Bi wrote:
>>>>>
>>>>>> 在 2022年6月13日,上午1:48,heming.zhao@suse.com 写道:
>>>>>>
>>>>>> On 6/13/22 16:21, Joseph Qi wrote:
>>>>>>>> On 6/13/22 3:59 PM, heming.zhao@suse.com wrote:
>>>>>>>> On 6/12/22 22:16, Joseph Qi wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Why can't use local mount? I don't remember if we discuss about this.
>>>>>>>>>
>>>>>>>> Sorry, I can't follow your question.
>>>>>>>> Do you mean why revert commit 912f655d78c5?
>>>>>>>>
>>>>>>>> or you are interest with the feature local mount?
>>>>>>>> the local mount is created by mkfs.ocfs2, which can't be converted to clustered.
>>>>>>>> see mkfs.ocfs2(8) '-M' option.
>>>>>>>>
>>>>>>> What Junxiao's main concern is data corruption, so I'm afraid we have to
>>>>>>> introduce an ondisk feature bit to prevent mixed nocluster and cluster
>>>>>>> mount, similar to local mount.
>>>>>>
>>>>>> this patch defined two new variants/flags:
>>>>>> #define OCFS2_SLOTMAP_CLUSTER   1
>>>>>> #define OCFS2_SLOTMAP_NOCLUSTER 2
>>>>>>
>>>>>> (I expect) Under OCFS2_SLOTMAP_CLUSTER, for clustered mount mode and for compatibility,
>>>>>> anything doesn't need to be changed.
>>>>>>
>>>>>> OCFS2_SLOTMAP_NOCLUSTER will introduce a new value for slotmap area.
>>>>>> this new value only take effect after a successfully nocluster mount.
>>>>>> (pls fix me), existed kernel/user space code don't do any special handle for
>>>>>> noclustered mount mode in slotmap area. So the new value is also compatibility.
>>>>>>
>>>>>> And the patch can also prevent mixed mount, the related code is in ocfs2_find_slot().
>>>>>> code logic:
>>>>>> - noclustered mount condition: slotmap is empty or already mounted with noclustered
>>>>>> - clustered mount condition: slotmap is empty or already mounted with clustered.
>>>>>> - all other conditions will be denied.
>>>>> Finding slot required reading slot map and then  update slot map. It is not atomic , you can’t prevent mixed mount until you have a cluster lock.
>>>>
>>>> Could I say your mentioned use case is invalid.
>>>> I believe all (yes, *all*) the ocfs2 users use this fs under cluster mode in
>>>> their product env.
>>>> The nocluster mount is only used on maintained env (eg. backup, fsck).
>>>>
>>>> We only concern two ways:
>>>> 1. user forgets to unmount (eg crash) before using another mount mode.
>>>> 2. when ocfs2 volume is working, which should deny volume is mounted by another mode.
>>>>      this may happened user mistakenly runs command or script.
>>>>
>>>> Base on above 1 & 2, Junxiao above mentioned use case only happens on one scenario:
>>>> the volume doesn't be mounted by any node, then user mistakenly mounts volume
>>>> with [no]cluster mode at same time. I believe this use case is invalid. And I guess
>>>> gfs2 may also has the same issue.
>>>>
>>>> With this patch, ocfs2 has more sanity check ability than other fs, eg: xfs, ext4.
>>>> SUSE HA stack with corosync+pacemaker also supports running xfs/ext4 with A/P mode.
>>>> The xfs/ext4 never have detecting mount status code, Junxiao mentioned mixed mount
>>>> can also happens on these fs. How do xfs/ext4/HA maintainers handle it? Under these
>>>> fs mounting behavior, these fields maintainers also treat the mixed mount as
>>>> invalid use case and ignore handle it.
>>>
>>> Ext4 has a feature mmp (multiple mount protection) to do this.
>>> More details please refer to:
>>> https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout
>>>
>>
>> Hi Joseph,
>>
>> I checked mmp feature related code, it can help ext4 to avoid multiple mount issue.
>>
>>  From my code reading:
>> - the mmp feature did mmp check when mouting.
>> - the mmp feature will start a kernel thread (kmmpd) to monitor mount status.
>> - kmmpd does detecting underlying dev IOs task.
>>
>> We could re-use mmp feature idea for ocfs2, do you plan to add it?
>> If yes, I could take this feature porting job.
>>
> Umm... I refer it just for your information.
> ocfs2 is cluster filesystem, for normal user scenarios, we intend to
> mount on multiple nodes.
> For use it as local filesystem, we have local mount mode.

Do you think anyone uses ocfs2 in local mount mode?
In my view, there are many filesystems more capable than ocfs2 for local use,
and users are very likely to choose one of them instead (e.g. xfs, ext4, btrfs).

The local mount mode is fixed at mkfs.ocfs2 time and can't be changed later.
(Please correct me if that is wrong.)
The nocluster feature is based on the local mount feature, and the nocluster
mount mode is more useful than local mount.
I even have a bold idea: ocfs2 could drop local mount support and only keep
the cluster and nocluster mount modes.

> So it seems 'nocluster' mode looks a bit weird. I'm not sure if this is
> a frequently happened user case. Or can we do it in cluster stack as
> Junxiao suggested? Oracle and I (before) only use o2cb, so we don't have
> much experience on corosync + pacemaker.

I explained the use cases for nocluster in previous mails. SUSE's customers
use this feature for maintenance (e.g. fsck) or backup. Do you remember
ocfs2-tools commit 7085e9177adc7197250d872c50a05dfc9c531bdc, which came from
the real world?
I also agree that customers could do all the nocluster operations under the
cluster stack; the key point of nocluster is to avoid setting up the HA stack.

For a minimal code change, this patch [1/1] is enough (a rough model of the
rule it enforces follows below).
And for fixing the simultaneous multi-mount case Junxiao is concerned about,
we could partly reuse the ext4 mmp feature/idea. The mmp mounting-phase code
(before the kmmpd thread is created) is enough to fix this issue, and I
estimate the mmp code for ocfs2 would be about ~100 lines.
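Here is that model: the rule patch 1/1 adds in ocfs2_find_slot(), rewritten as a
stand-alone user-space check. The enum and function names are simplified stand-ins
of mine, not the kernel code:

```
#include <stdbool.h>
#include <stdio.h>

/* Simplified view of what the slot map currently records. */
enum slotmap_state {
	SLOTMAP_EMPTY,		/* no slot in use */
	SLOTMAP_CLUSTER,	/* slots last used by clustered mounts */
	SLOTMAP_NOCLUSTER,	/* slot last used by a nocluster mount */
};

/*
 * Rule described in this thread:
 *  - nocluster mount: allowed if the slot map is empty or was last used
 *    by a nocluster mount;
 *  - cluster mount:   allowed if the slot map is empty or was last used
 *    by clustered mounts;
 *  - everything else is denied.
 */
static bool mount_allowed(enum slotmap_state state, bool want_nocluster)
{
	if (state == SLOTMAP_EMPTY)
		return true;
	return want_nocluster ? state == SLOTMAP_NOCLUSTER
			      : state == SLOTMAP_CLUSTER;
}

int main(void)
{
	/* e.g. a crashed nocluster mount left SLOTMAP_NOCLUSTER on disk */
	printf("cluster mount after nocluster crash:   %s\n",
	       mount_allowed(SLOTMAP_NOCLUSTER, false) ? "allowed" : "denied");
	printf("nocluster mount after nocluster crash: %s\n",
	       mount_allowed(SLOTMAP_NOCLUSTER, true) ? "allowed" : "denied");
	return 0;
}
```

This is the behavior that test case <6> in the cover letter exercises after the
simulated crash.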

Thanks,
Heming

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Ocfs2-devel] [PATCH 0/1] test case for patch 1/1
  2022-06-08 10:48 [Ocfs2-devel] [PATCH 0/1] test case for patch 1/1 Heming Zhao via Ocfs2-devel
  2022-06-08 10:48 ` [Ocfs2-devel] [PATCH 1/1] ocfs2: fix ocfs2_find_slot repeats alloc same slot issue Heming Zhao via Ocfs2-devel
@ 2022-06-18  2:35 ` Joseph Qi via Ocfs2-devel
  2022-06-18 10:18   ` heming.zhao--- via Ocfs2-devel
  1 sibling, 1 reply; 17+ messages in thread
From: Joseph Qi via Ocfs2-devel @ 2022-06-18  2:35 UTC (permalink / raw)
  To: Heming Zhao, ocfs2-devel



On 6/8/22 6:48 PM, Heming Zhao wrote:
> === test cases ====
> 
> <1> remount on local node for cluster env
> 
> mount -t ocfs2 /dev/vdb /mnt
> mount -t ocfs2 /dev/vdb /mnt              <=== failure
> mount -t ocfs2 -o nocluster /dev/vdb /mnt <=== failure
> 

This is mounting multiple times, not remounting.
I don't see how it relates to your changes.

> <2> remount on local node for nocluster env
> 
> mount -t ocfs2 -o nocluster /dev/vdb /mnt
> mount -t ocfs2 /dev/vdb /mnt              <=== failure
> mount -t ocfs2 -o nocluster /dev/vdb /mnt <=== failure
> 
> <3> remount on another node for cluster env
> 
> node2:
> mount -t ocfs2 /dev/vdb /mnt
> 
> node1:
> mount -t ocfs2 /dev/vdb /mnt  <== success
> umount
> mount -t ocfs2 -o nocluster /dev/vdb /mnt <== failure
> 
> <4> remount on another node for nocluster env
> 
> node2:
> mount -t ocfs2 -o nocluster /dev/vdb /mnt
> 
> node1:
> mount -t ocfs2 /dev/vdb /mnt              <== failure
> mount -t ocfs2 -o nocluster /dev/vdb /mnt <== success, see below comment
> 
Why are two nodes allowed to mount in nocluster mode successfully?
Since no cluster lock is enabled, this will corrupt data.

> <5> simulate after crash status for cluster env
> 
> (below all steps did on node1. node2 is unmount status)
> mount -t ocfs2 /dev/vdb /mnt
> dd if=/dev/vdb bs=1 count=8 skip=76058624 of=/root/slotmap.cluster.mnted
> umount /mnt
> dd if=/root/slotmap.cluster.mnted of=/dev/vdb seek=76058624 bs=1 count=8
> mount -t ocfs2 -o nocluster /dev/vdb /mnt   <== failure
> mount -t ocfs2 /dev/vdb /mnt && umount /mnt <== clean slot 0
> mount -t ocfs2 -o nocluster /dev/vdb /mnt   <== success
> 
> <6> simulate after crash status for nocluster env
> 
> (below all steps did on node1. node2 is unmount status)
> mount -t ocfs2 -o nocluster /dev/vdb /mnt
> dd if=/dev/vdb bs=1 count=8 skip=76058624 of=/root/slotmap.nocluster.mnted
> umount /mnt
> dd if=/root/slotmap.nocluster.mnted of=/dev/vdb seek=76058624 bs=1 count=8
> mount -t ocfs2 /dev/vdb /mnt   <== failure
> mount -t ocfs2 -o nocluster /dev/vdb /mnt && umount /mnt <== clean slot 0
> mount -t ocfs2 /dev/vdb /mnt   <== success
> 
'bs=1 count=8 skip=76058624': is this for backing up the slot map?

Thanks,
Joseph

> 
> -----
> For test case <4>, the kernel job is done, but there still left
> userspace work todo. 
> In my view, mount.ocfs2 needs add double confirm for this scenario.
> 
> current style:
> ```
> # mount -t ocfs2 -o nocluster /dev/vdb /mnt && umount /mnt
> Warning: to mount a clustered volume without the cluster stack.
> Please make sure you only mount the file system from one node.
> Otherwise, the file system may be damaged.
> Proceed (y/N): y
> ```
> 
> I plan to change as:
> ```
> # mount -t ocfs2 -o nocluster /dev/vdb /mnt && umount /mnt
> Warning: to mount a clustered volume without the cluster stack.
> Please make sure you only mount the file system from one node.
> Otherwise, the file system may be damaged.
> Proceed (y/N): y
> Warning: detect volume already mounted as nocluster mode.
> Do you mount this volume on another node?
> Please confirm you want to mount this volume on this node.
> Proceed (y/N): y
> ```
> 
> Heming Zhao (1):
>   ocfs2: fix ocfs2_find_slot repeats alloc same slot issue
> 
>  fs/ocfs2/dlmglue.c  |  3 ++
>  fs/ocfs2/ocfs2_fs.h |  3 ++
>  fs/ocfs2/slot_map.c | 70 ++++++++++++++++++++++++++++++++++++---------
>  3 files changed, 62 insertions(+), 14 deletions(-)
> 

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Ocfs2-devel] [PATCH 0/1] test case for patch 1/1
  2022-06-18  2:35 ` [Ocfs2-devel] [PATCH 0/1] test case for patch 1/1 Joseph Qi via Ocfs2-devel
@ 2022-06-18 10:18   ` heming.zhao--- via Ocfs2-devel
  2022-06-25 12:47     ` Joseph Qi via Ocfs2-devel
  0 siblings, 1 reply; 17+ messages in thread
From: heming.zhao--- via Ocfs2-devel @ 2022-06-18 10:18 UTC (permalink / raw)
  To: Joseph Qi, ocfs2-devel

On 6/18/22 10:35, Joseph Qi wrote:
> 
> 
> On 6/8/22 6:48 PM, Heming Zhao wrote:
>> === test cases ====
>>
>> <1> remount on local node for cluster env
>>
>> mount -t ocfs2 /dev/vdb /mnt
>> mount -t ocfs2 /dev/vdb /mnt              <=== failure
>> mount -t ocfs2 -o nocluster /dev/vdb /mnt <=== failure
>>
> 
> This is mount multiple times, not remount.
> Don't get it has any relations with your changes.

Yes, it is not related to my patch.
I included this test only to record the result of mounting again.

> 
>> <2> remount on local node for nocluster env
>>
>> mount -t ocfs2 -o nocluster /dev/vdb /mnt
>> mount -t ocfs2 /dev/vdb /mnt              <=== failure
>> mount -t ocfs2 -o nocluster /dev/vdb /mnt <=== failure
>>
>> <3> remount on another node for cluster env
>>
>> node2:
>> mount -t ocfs2 /dev/vdb /mnt
>>
>> node1:
>> mount -t ocfs2 /dev/vdb /mnt  <== success
>> umount
>> mount -t ocfs2 -o nocluster /dev/vdb /mnt <== failure
>>
>> <4> remount on another node for nocluster env
>>
>> node2:
>> mount -t ocfs2 -o nocluster /dev/vdb /mnt
>>
>> node1:
>> mount -t ocfs2 /dev/vdb /mnt              <== failure
>> mount -t ocfs2 -o nocluster /dev/vdb /mnt <== success, see below comment
>>
> Why allow two nodes mount nocluster sucessfully?
> Since there is no cluster lock enabled, it will corrupt data.

I didn't know about the ext4 mmp feature at that time. My thinking was that if
ext4 allows mounting on different machines (which can corrupt data), ocfs2 could
also allow this case to happen.
But following ext4 mmp, we had better add some similar code to block multiple
mounts.
> 
>> <5> simulate after crash status for cluster env
>>
>> (below all steps did on node1. node2 is unmount status)
>> mount -t ocfs2 /dev/vdb /mnt
>> dd if=/dev/vdb bs=1 count=8 skip=76058624 of=/root/slotmap.cluster.mnted
>> umount /mnt
>> dd if=/root/slotmap.cluster.mnted of=/dev/vdb seek=76058624 bs=1 count=8
>> mount -t ocfs2 -o nocluster /dev/vdb /mnt   <== failure
>> mount -t ocfs2 /dev/vdb /mnt && umount /mnt <== clean slot 0
>> mount -t ocfs2 -o nocluster /dev/vdb /mnt   <== success
>>
>> <6> simulate after crash status for nocluster env
>>
>> (below all steps did on node1. node2 is unmount status)
>> mount -t ocfs2 -o nocluster /dev/vdb /mnt
>> dd if=/dev/vdb bs=1 count=8 skip=76058624 of=/root/slotmap.nocluster.mnted
>> umount /mnt
>> dd if=/root/slotmap.nocluster.mnted of=/dev/vdb seek=76058624 bs=1 count=8
>> mount -t ocfs2 /dev/vdb /mnt   <== failure
>> mount -t ocfs2 -o nocluster /dev/vdb /mnt && umount /mnt <== clean slot 0
>> mount -t ocfs2 /dev/vdb /mnt   <== success
>>
> 'bs=1 count=8 skip=76058624', is this for slotmap backup?

Sorry, I forgot to explain the meaning of this magic number. Your guess is right.

How to calculate it:
```
my test disk is a 500M raw file, attached to kvm-qemu in shared mode.
(my env) block size: 1K, cluster size: 4K, '//slot_map' inode number: 0xD.
debugfs: stat //slot_map
         Inode: 13   Mode: 0644   Generation: 4183895025 (0xf9612bf1)
         FS Generation: 4183895025 (0xf9612bf1)
         CRC32: 00000000   ECC: 0000
         Type: Regular   Attr: 0x0   Flags: Valid System
         Dynamic Features: (0x0)
         User: 0 (root)   Group: 0 (root)   Size: 4096
         Links: 1   Clusters: 1
         ctime: 0x62286e49 0x0 -- Wed Mar  9 17:07:21.0 2022
         atime: 0x62286e49 0x0 -- Wed Mar  9 17:07:21.0 2022
         mtime: 0x62286e4a 0x0 -- Wed Mar  9 17:07:22.0 2022
         dtime: 0x0 -- Thu Jan  1 08:00:00 1970
         Refcount Block: 0
         Last Extblk: 0   Orphan Slot: 0
         Sub Alloc Slot: Global   Sub Alloc Bit: 5
         Tree Depth: 0   Count: 51   Next Free Rec: 1
         ## Offset        Clusters       Block#          Flags
         0  0             1              74276           0x0

74276 (slot_map Block#) * 1024 (block size) => 76058624 (0x4889000)
```
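If it helps anyone reproducing the tests, a tiny helper like the one below can dump
those 8 bytes for inspection. The offset is hard-coded to this example layout, and it
only prints raw hex; it does not try to interpret the slot map format:

```
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	/* block 74276 * 1K block size, taken from the debugfs output above */
	const off_t off = 76058624;
	unsigned char buf[8];
	int fd;

	if (argc < 2) {
		fprintf(stderr, "usage: %s /dev/vdb\n", argv[0]);
		return 1;
	}
	fd = open(argv[1], O_RDONLY);
	if (fd < 0 || pread(fd, buf, sizeof(buf), off) != (ssize_t)sizeof(buf)) {
		perror("read slot_map");
		return 1;
	}
	for (int i = 0; i < (int)sizeof(buf); i++)
		printf("%02x ", buf[i]);
	printf("\n");
	close(fd);
	return 0;
}
```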

Finally, do you think I could send a v2 patch which includes part of the ext4 mmp
feature? I plan to follow ext4_multi_mount_protect(), and won't include the code
that spawns the kmmpd kthread.
Btw, to be honest, I haven't fully understood the idea of kmmpd. kmmpd periodically
updates/checks the mmp area, and ext4_multi_mount_protect() already blocks new
mount attempts. So in my view, the kmmpd updates/checks only matter when a user or
something else directly modifies the on-disk mmp area. If my guess is right, kmmpd
is not necessary.
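To make the mounting-phase part concrete, below is a rough single-file model of the
check I have in mind. Everything here, including the field names, the marker value
and the in-memory 'disk', is my own placeholder, not actual ext4 or ocfs2 code:

```
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define MMP_SEQ_CLEAN 0xff4d4d50u	/* placeholder "cleanly unmounted" marker */

struct mmp_block {
	uint32_t seq;			/* bumped by the node that owns the mount */
	uint32_t check_interval;	/* seconds to wait before re-checking */
};

/* Stand-ins for reading/writing the mmp block on the shared device. */
static struct mmp_block disk = { MMP_SEQ_CLEAN, 1 };
static void mmp_read(struct mmp_block *m)        { *m = disk; }
static void mmp_write(const struct mmp_block *m) { disk = *m; }

/* Mount-time check only; nothing here keeps updating seq afterwards. */
static bool mmp_mount_allowed(void)
{
	struct mmp_block m1, m2;

	mmp_read(&m1);
	if (m1.seq != MMP_SEQ_CLEAN) {
		/* possibly mounted elsewhere: wait one interval and re-check */
		sleep(m1.check_interval);
		mmp_read(&m2);
		if (m2.seq != m1.seq)
			return false;	/* seq still moving, live mount elsewhere */
	}
	/* claim the device, then verify nobody raced with us */
	m1.seq = ((uint32_t)rand() << 1) | 1;	/* never equals the clean marker */
	mmp_write(&m1);
	sleep(m1.check_interval);
	mmp_read(&m2);
	return m2.seq == m1.seq;
}

int main(void)
{
	printf("mount attempt: %s\n", mmp_mount_allowed() ? "allowed" : "denied");
	return 0;
}
```

If I read my own toy right, after a successful claim nothing updates seq any more,
so a later mount attempt that waits one interval sees a static value and could get
through; maybe that is exactly the gap kmmpd closes, so the question above still
stands.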

Thanks,
Heming
> 
> Thanks,
> Joseph
> 
>>
>> -----
>> For test case <4>, the kernel job is done, but there still left
>> userspace work todo.
>> In my view, mount.ocfs2 needs add double confirm for this scenario.
>>
>> current style:
>> ```
>> # mount -t ocfs2 -o nocluster /dev/vdb /mnt && umount /mnt
>> Warning: to mount a clustered volume without the cluster stack.
>> Please make sure you only mount the file system from one node.
>> Otherwise, the file system may be damaged.
>> Proceed (y/N): y
>> ```
>>
>> I plan to change as:
>> ```
>> # mount -t ocfs2 -o nocluster /dev/vdb /mnt && umount /mnt
>> Warning: to mount a clustered volume without the cluster stack.
>> Please make sure you only mount the file system from one node.
>> Otherwise, the file system may be damaged.
>> Proceed (y/N): y
>> Warning: detect volume already mounted as nocluster mode.
>> Do you mount this volume on another node?
>> Please confirm you want to mount this volume on this node.
>> Proceed (y/N): y
>> ```
>>
>> Heming Zhao (1):
>>    ocfs2: fix ocfs2_find_slot repeats alloc same slot issue
>>
>>   fs/ocfs2/dlmglue.c  |  3 ++
>>   fs/ocfs2/ocfs2_fs.h |  3 ++
>>   fs/ocfs2/slot_map.c | 70 ++++++++++++++++++++++++++++++++++++---------
>>   3 files changed, 62 insertions(+), 14 deletions(-)
>>


_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Ocfs2-devel] [PATCH 0/1] test case for patch 1/1
  2022-06-18 10:18   ` heming.zhao--- via Ocfs2-devel
@ 2022-06-25 12:47     ` Joseph Qi via Ocfs2-devel
  2022-06-25 15:02       ` heming.zhao--- via Ocfs2-devel
  0 siblings, 1 reply; 17+ messages in thread
From: Joseph Qi via Ocfs2-devel @ 2022-06-25 12:47 UTC (permalink / raw)
  To: heming.zhao, ocfs2-devel



On 6/18/22 6:18 PM, heming.zhao@suse.com wrote:
> On 6/18/22 10:35, Joseph Qi wrote:
>>
>>
>> On 6/8/22 6:48 PM, Heming Zhao wrote:
>>> === test cases ====
>>>
>>> <1> remount on local node for cluster env
>>>
>>> mount -t ocfs2 /dev/vdb /mnt
>>> mount -t ocfs2 /dev/vdb /mnt              <=== failure
>>> mount -t ocfs2 -o nocluster /dev/vdb /mnt <=== failure
>>>
>>
>> This is mount multiple times, not remount.
>> Don't get it has any relations with your changes.
> 
> Yes. not related with my patch.
> I include this test only for watching remount result.
> 
>>
>>> <2> remount on local node for nocluster env
>>>
>>> mount -t ocfs2 -o nocluster /dev/vdb /mnt
>>> mount -t ocfs2 /dev/vdb /mnt              <=== failure
>>> mount -t ocfs2 -o nocluster /dev/vdb /mnt <=== failure
>>>
>>> <3> remount on another node for cluster env
>>>
>>> node2:
>>> mount -t ocfs2 /dev/vdb /mnt
>>>
>>> node1:
>>> mount -t ocfs2 /dev/vdb /mnt  <== success
>>> umount
>>> mount -t ocfs2 -o nocluster /dev/vdb /mnt <== failure
>>>
>>> <4> remount on another node for nocluster env
>>>
>>> node2:
>>> mount -t ocfs2 -o nocluster /dev/vdb /mnt
>>>
>>> node1:
>>> mount -t ocfs2 /dev/vdb /mnt              <== failure
>>> mount -t ocfs2 -o nocluster /dev/vdb /mnt <== success, see below comment
>>>
>> Why allow two nodes mount nocluster sucessfully?
>> Since there is no cluster lock enabled, it will corrupt data.
> 
> I didn't know about ext4 mmp feature at that time. If ext4 allows mount on
> different machine (can corrupt data), ocfs2 also allow this case to happen.
> But following ext4 mmp, we could better to add some similar code for blocking
> multi mount.
>>
>>> <5> simulate after crash status for cluster env
>>>
>>> (below all steps did on node1. node2 is unmount status)
>>> mount -t ocfs2 /dev/vdb /mnt
>>> dd if=/dev/vdb bs=1 count=8 skip=76058624 of=/root/slotmap.cluster.mnted
>>> umount /mnt
>>> dd if=/root/slotmap.cluster.mnted of=/dev/vdb seek=76058624 bs=1 count=8
>>> mount -t ocfs2 -o nocluster /dev/vdb /mnt   <== failure
>>> mount -t ocfs2 /dev/vdb /mnt && umount /mnt <== clean slot 0
>>> mount -t ocfs2 -o nocluster /dev/vdb /mnt   <== success
>>>
>>> <6> simulate after crash status for nocluster env
>>>
>>> (below all steps did on node1. node2 is unmount status)
>>> mount -t ocfs2 -o nocluster /dev/vdb /mnt
>>> dd if=/dev/vdb bs=1 count=8 skip=76058624 of=/root/slotmap.nocluster.mnted
>>> umount /mnt
>>> dd if=/root/slotmap.nocluster.mnted of=/dev/vdb seek=76058624 bs=1 count=8
>>> mount -t ocfs2 /dev/vdb /mnt   <== failure
>>> mount -t ocfs2 -o nocluster /dev/vdb /mnt && umount /mnt <== clean slot 0
>>> mount -t ocfs2 /dev/vdb /mnt   <== success
>>>
>> 'bs=1 count=8 skip=76058624', is this for slotmap backup?
> 
> sorry, I forgot to explain this secret number meaning. Your guess is right.
> 
> how to calculate:
> ```
> my test disk is 500M raw file, attached to kvm-qemu with shared mode.
> (my env) block size: 1K cluster size: 4K '//slot_map' inode number: 0xD.
> debugfs: stat //slot_map
>         Inode: 13   Mode: 0644   Generation: 4183895025 (0xf9612bf1)
>         FS Generation: 4183895025 (0xf9612bf1)
>         CRC32: 00000000   ECC: 0000
>         Type: Regular   Attr: 0x0   Flags: Valid System
>         Dynamic Features: (0x0)
>         User: 0 (root)   Group: 0 (root)   Size: 4096
>         Links: 1   Clusters: 1
>         ctime: 0x62286e49 0x0 -- Wed Mar  9 17:07:21.0 2022
>         atime: 0x62286e49 0x0 -- Wed Mar  9 17:07:21.0 2022
>         mtime: 0x62286e4a 0x0 -- Wed Mar  9 17:07:22.0 2022
>         dtime: 0x0 -- Thu Jan  1 08:00:00 1970
>         Refcount Block: 0
>         Last Extblk: 0   Orphan Slot: 0
>         Sub Alloc Slot: Global   Sub Alloc Bit: 5
>         Tree Depth: 0   Count: 51   Next Free Rec: 1
>         ## Offset        Clusters       Block#          Flags
>         0  0             1              74276           0x0
> 
> 74276 * 1024 => 76058624 (0x4889000)
> ```
> 
> At last, do you think I could send v2 patch which includes part of ext4 mmp feature.
> I plan to copy ext4_multi_mount_protect(), and won't include generating kmmpd kthread
> code.
> btw, to be honest, I can't totally got the idea of kmmpd. kmmpd does periodically
> update/detect mmp area. And ext4_multi_mount_protect() already blocks new mounting
> action. So in my view, kmmpd update/detect actions only work for user/something
> directly modifies disk mmp area case. if my guess is right, the kmmpd is not necessary.
> 

Sorry for the late reply.
Since this feature is currently incomplete and o2cb is the default stack, I'd
like to take Junxiao's suggestion: revert this feature first to quickly fix
the regression.
We can take it up again in the future once the feature is mature.

Thanks,
Joseph

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Ocfs2-devel] [PATCH 0/1] test case for patch 1/1
  2022-06-25 12:47     ` Joseph Qi via Ocfs2-devel
@ 2022-06-25 15:02       ` heming.zhao--- via Ocfs2-devel
  0 siblings, 0 replies; 17+ messages in thread
From: heming.zhao--- via Ocfs2-devel @ 2022-06-25 15:02 UTC (permalink / raw)
  To: Joseph Qi, ocfs2-devel

On 6/25/22 20:47, Joseph Qi wrote:
> 
> 
> On 6/18/22 6:18 PM, heming.zhao@suse.com wrote:
>> On 6/18/22 10:35, Joseph Qi wrote:
>>>
>>>
>>> On 6/8/22 6:48 PM, Heming Zhao wrote:
>>>> === test cases ====
>>>>
>>>> <1> remount on local node for cluster env
>>>>
>>>> mount -t ocfs2 /dev/vdb /mnt
>>>> ... ...
>> how to calculate:
>> ```
>> my test disk is 500M raw file, attached to kvm-qemu with shared mode.
>> (my env) block size: 1K cluster size: 4K '//slot_map' inode number: 0xD.
>> debugfs: stat //slot_map
>>          Inode: 13   Mode: 0644   Generation: 4183895025 (0xf9612bf1)
>>          FS Generation: 4183895025 (0xf9612bf1)
>>          CRC32: 00000000   ECC: 0000
>>          Type: Regular   Attr: 0x0   Flags: Valid System
>>          Dynamic Features: (0x0)
>>          User: 0 (root)   Group: 0 (root)   Size: 4096
>>          Links: 1   Clusters: 1
>>          ctime: 0x62286e49 0x0 -- Wed Mar  9 17:07:21.0 2022
>>          atime: 0x62286e49 0x0 -- Wed Mar  9 17:07:21.0 2022
>>          mtime: 0x62286e4a 0x0 -- Wed Mar  9 17:07:22.0 2022
>>          dtime: 0x0 -- Thu Jan  1 08:00:00 1970
>>          Refcount Block: 0
>>          Last Extblk: 0   Orphan Slot: 0
>>          Sub Alloc Slot: Global   Sub Alloc Bit: 5
>>          Tree Depth: 0   Count: 51   Next Free Rec: 1
>>          ## Offset        Clusters       Block#          Flags
>>          0  0             1              74276           0x0
>>
>> 74276 * 1024 => 76058624 (0x4889000)
>> ```
>>
>> At last, do you think I could send v2 patch which includes part of ext4 mmp feature.
>> I plan to copy ext4_multi_mount_protect(), and won't include generating kmmpd kthread
>> code.
>> btw, to be honest, I can't totally got the idea of kmmpd. kmmpd does periodically
>> update/detect mmp area. And ext4_multi_mount_protect() already blocks new mounting
>> action. So in my view, kmmpd update/detect actions only work for user/something
>> directly modifies disk mmp area case. if my guess is right, the kmmpd is not necessary.
>>
> 
> Sorry for the late reply.
> Since now this feature is incomplete and o2cb is the default stack, I'd
> like take Junxiao's suggestion, we revert this feature first to quickly
> fix the regression.
> And we can take it again in the future once the feature is mature.
> 

I agree with your decision. I will file new patch(es) for noclustered mount.

Thanks,
Heming

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2022-06-25 15:03 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-08 10:48 [Ocfs2-devel] [PATCH 0/1] test case for patch 1/1 Heming Zhao via Ocfs2-devel
2022-06-08 10:48 ` [Ocfs2-devel] [PATCH 1/1] ocfs2: fix ocfs2_find_slot repeats alloc same slot issue Heming Zhao via Ocfs2-devel
2022-06-12 14:16   ` Joseph Qi via Ocfs2-devel
2022-06-13  7:59     ` heming.zhao--- via Ocfs2-devel
2022-06-13  8:21       ` Joseph Qi via Ocfs2-devel
2022-06-13  8:48         ` heming.zhao--- via Ocfs2-devel
2022-06-13 15:43           ` Junxiao Bi via Ocfs2-devel
2022-06-14  2:59             ` heming.zhao--- via Ocfs2-devel
2022-06-14  3:27               ` Joseph Qi via Ocfs2-devel
2022-06-14 12:13                 ` heming.zhao--- via Ocfs2-devel
2022-06-15  2:03                   ` Joseph Qi via Ocfs2-devel
2022-06-15  4:37                     ` heming.zhao--- via Ocfs2-devel
2022-06-14  6:22               ` Junxiao Bi via Ocfs2-devel
2022-06-18  2:35 ` [Ocfs2-devel] [PATCH 0/1] test case for patch 1/1 Joseph Qi via Ocfs2-devel
2022-06-18 10:18   ` heming.zhao--- via Ocfs2-devel
2022-06-25 12:47     ` Joseph Qi via Ocfs2-devel
2022-06-25 15:02       ` heming.zhao--- via Ocfs2-devel
