From: Heming Zhao via Ocfs2-devel <ocfs2-devel@oss.oracle.com>
To: ocfs2-devel@oss.oracle.com, joseph.qi@linux.alibaba.com
Subject: [Ocfs2-devel] [PATCH 0/4] re-enable non-clustered mount & add MMP support
Date: Sat, 30 Jul 2022 09:14:07 +0800
Message-ID: <20220730011411.11214-1-heming.zhao@suse.com>

This patch series re-enables the ocfs2 non-clustered mount feature.

A previous commit, c80af0c250c8 (Revert "ocfs2: mount shared volume
without ha stack"), reverted Gang's non-clustered mount patch. This
series brings that feature back.

The key difference between local mount and non-clustered mount:
the local mount feature (tunefs.ocfs2 --fs-features=[no]local) cannot
do the conversion without the ha stack, while the non-clustered mount
feature runs entirely without the ha stack.

To avoid data corruption when non-clustered and clustered mounts
happen at the same time, this series also introduces an MMP feature.
The MMP (Multiple Mount Protection) idea comes from ext4 MMP
(fs/ext4/mmp.c), which protects a filesystem from being mounted more
than once. Because ocfs2 is a clustered fs, and to stay compatible
with the existing slotmap feature, I made some optimizations and
modifications when porting ext4 MMP to ocfs2.

The related userspace code for MMP support has been sent to GitHub
for review:
- https://github.com/markfasheh/ocfs2-tools/pull/58

Enabling MMP and checking its status with ocfs2-tools:

```
# enable MMP
tunefs.ocfs2 --fs-features=mmp /dev/vdb

# check the command result
tunefs.ocfs2 -Q "%H\n" /dev/vdb | grep MMP

# activate MMP with a nocluster mount
mount -t ocfs2 -o nocluster /dev/vdb /mnt

# check slotmap info
echo slotmap | PAGER=cat debugfs.ocfs2 /dev/vdb
```

=== Test cases for this series ===

<1> non-clustered mount vs local mount

1.1 tunefs.ocfs2 can't convert local/nolocal mount without ha stack.

```
(on ha stack env)
mkfs.ocfs2 --cluster-stack=pcmk --cluster-name=hacluster -N 4 /dev/vdb
tunefs.ocfs2 --fs-features=local /dev/vdb  (<== success)
tunefs.ocfs2 --fs-features=nolocal /dev/vdb  (<== success)
(on another node without ha stack)
tunefs.ocfs2 --fs-features=local /dev/vdb  (<== failure)
```

1.2 non-clustered mount can run without ha stack.
```
(on ha stack env)
mkfs.ocfs2 --cluster-stack=pcmk --cluster-name=hacluster -N 4 /dev/vdb
(on another node without ha stack)
mount -t ocfs2 -o nocluster /dev/vdb /mnt  (<== success)
```


<2> do clustered & non-clustered mount on same node

2.1  non-clustered mount => clustered mount

```
mkfs.ocfs2 --cluster-stack=pcmk --cluster-name=hacluster -N 4 /dev/vdb
mount -t ocfs2 -o nocluster /dev/vdb /mnt
mount -t ocfs2 /dev/vdb /mnt               (<=== failure)
```

2.2 clustered mount => non-clustered mount

```
mkfs.ocfs2 --cluster-stack=pcmk --cluster-name=hacluster -N 4 /dev/vdb
mount -t ocfs2 /dev/vdb /mnt
mount -t ocfs2 -o nocluster /dev/vdb /mnt  (<=== failure)
```

<3> one node does clustered mount, another does non-clustered mount

test rule: a clustered mount and a non-clustered mount cannot exist at
the same time.

3.1 clustered mount @node1 => [no]clustered mount @node2

```
node1:
mkfs.ocfs2 --cluster-stack=pcmk --cluster-name=hacluster -N 4 /dev/vdb
mount -t ocfs2 /dev/vdb /mnt

node2:
mount -t ocfs2 /dev/vdb /mnt              (<== success)
umount /mnt
mount -t ocfs2 -o nocluster /dev/vdb /mnt (<== failure)
```

3.2 enable mmp, repeat case 3.1

```
node1:
mkfs.ocfs2 --cluster-stack=pcmk --cluster-name=hacluster -N 4 /dev/vdb
tunefs.ocfs2 --fs-features=mmp /dev/vdb   (<== enable mmp)
mount -t ocfs2 /dev/vdb /mnt

node2:
mount -t ocfs2 /dev/vdb /mnt              (<== wait ~22s [*] for mmp, then success)
umount /mnt
mount -t ocfs2 -o nocluster /dev/vdb /mnt (<== failure)
```

[*] 22s:
(OCFS2_MMP_MIN_CHECK_INTERVAL * 2 + 1) seconds per wait, 2 waits (each
one a call to schedule_timeout_interruptible)
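
The ~22s figure can be reproduced with a little shell arithmetic. This
is only a sketch: it assumes OCFS2_MMP_MIN_CHECK_INTERVAL is 5 seconds
(the value ext4 uses for EXT4_MMP_MIN_CHECK_INTERVAL, and the value
implied by the 22s above), and mmp_wait_seconds is a hypothetical
helper name, not something from the patches.

```shell
#!/bin/sh
# Assumed value, in seconds; implied by 22 = (5 * 2 + 1) * 2.
OCFS2_MMP_MIN_CHECK_INTERVAL=5

# mmp_wait_seconds ROUNDS
# Total sleep time for ROUNDS waits of (interval * 2 + 1) seconds each.
mmp_wait_seconds() {
    rounds=$1
    echo $(( (OCFS2_MMP_MIN_CHECK_INTERVAL * 2 + 1) * rounds ))
}

mmp_wait_seconds 2   # two waits, as in case 3.2; prints 22
```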

3.3 noclustered mount @node1 => [no]clustered mount @node2

```
node1:
mkfs.ocfs2 --cluster-stack=pcmk --cluster-name=hacluster -N 4 /dev/vdb
mount -t ocfs2 -o nocluster /dev/vdb /mnt

node2:
mount -t ocfs2 /dev/vdb /mnt              (<== failure)
mount -t ocfs2 -o nocluster /dev/vdb /mnt (<== success, without mmp enabled)
umount /mnt               (<== will ZERO out slotmap area while node1 is still mounted)
```

3.4 enable mmp, repeat case 3.3

```
node1:
mkfs.ocfs2 --cluster-stack=pcmk --cluster-name=hacluster -N 4 /dev/vdb
tunefs.ocfs2 --fs-features=mmp /dev/vdb   (<== enable mmp)
mount -t ocfs2 -o nocluster /dev/vdb /mnt

node2:
mount -t ocfs2 /dev/vdb /mnt              (<== failure)
mount -t ocfs2 -o nocluster /dev/vdb /mnt (<== failure, denied by mmp)
```
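
The results of cases 3.1 to 3.4 can be condensed into a small lookup.
This is a sketch of the expected outcomes only, not code from the
patches; second_mount_result is a hypothetical name, and it covers only
the two-node cases listed above (the same-node cases in <2> always
fail).

```shell
#!/bin/sh
# second_mount_result FIRST SECOND MMP
#   FIRST  - mount type already active on node1: cluster | nocluster
#   SECOND - mount type attempted on node2:      cluster | nocluster
#   MMP    - on | off
# Prints the expected result of the node2 mount, per cases 3.1-3.4.
second_mount_result() {
    case "$1:$2:$3" in
        cluster:cluster:on|cluster:cluster:off)
            echo success ;;   # 3.1, 3.2 (with mmp: after ~22s)
        cluster:nocluster:on|cluster:nocluster:off)
            echo failure ;;   # 3.1, 3.2
        nocluster:cluster:on|nocluster:cluster:off)
            echo failure ;;   # 3.3, 3.4
        nocluster:nocluster:off)
            echo success ;;   # 3.3: nothing stops the second mount
        nocluster:nocluster:on)
            echo failure ;;   # 3.4: denied by mmp
        *)
            echo "usage: second_mount_result cluster|nocluster cluster|nocluster on|off" >&2
            return 2 ;;
    esac
}

second_mount_result nocluster nocluster on   # prints failure
```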

<4> simulate mounting after machine crash

info:
- all steps below run on one node
- address 287387648 is the '//slot_map' extent address.
- rule under test: if the last mount was not unmounted (e.g. a crash),
  the next mount MUST be the same mount type.

4.0 how to calculate '//slot_map' extent address

```
# PAGER=cat debugfs.ocfs2 -R "stats" /dev/vdb | grep "Block Size Bits"
        Block Size Bits: 12   Cluster Size Bits: 12

# PAGER=cat debugfs.ocfs2 -R "stat //slot_map" /dev/vdb | grep -A1 "Block#"
        ## Offset        Clusters       Block#          Flags
        0  0             1              70163           0x0
```

70163 * (1<<12) = 70163 * 4096 = 287387648
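
The same calculation as a shell helper, for volumes with a different
block size or extent location. This is a sketch; slot_map_offset is a
hypothetical name, and the two arguments are the "Block#" and
"Block Size Bits" values read from the debugfs.ocfs2 output above.

```shell
#!/bin/sh
# slot_map_offset BLOCK BLOCK_SIZE_BITS
# Byte offset of a block: BLOCK * 2^BLOCK_SIZE_BITS.
slot_map_offset() {
    echo $(( $1 << $2 ))
}

slot_map_offset 70163 12   # prints 287387648
```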


4.1 clustered mount => crash => non-clustered mount fails => clean
slotmap => non-clustered mount succeeds

```
mkfs.ocfs2 --cluster-stack=pcmk --cluster-name=hacluster -N 4 /dev/vdb
mount -t ocfs2 /dev/vdb /mnt
dd if=/dev/vdb bs=1 count=32 skip=287387648 of=/root/slotmap.cluster.mnted  (<== backup slot info)
umount /mnt
dd if=/root/slotmap.cluster.mnted of=/dev/vdb seek=287387648 bs=1 count=32 (<== overwrite)

mount -t ocfs2 -o nocluster /dev/vdb /mnt   <== failure
mount -t ocfs2 /dev/vdb /mnt && umount /mnt <== clean slot 0
mount -t ocfs2 -o nocluster /dev/vdb /mnt   <== success
```

4.2  non-clustered mount => crash => clustered mount fails => clean
slotmap => clustered mount succeeds

```
mkfs.ocfs2 --cluster-stack=pcmk --cluster-name=hacluster -N 4 /dev/vdb
mount -t ocfs2 -o nocluster /dev/vdb /mnt
dd if=/dev/vdb bs=1 count=32 skip=287387648 of=/root/slotmap.nocluster.mnted
umount /mnt
dd if=/root/slotmap.nocluster.mnted of=/dev/vdb seek=287387648 bs=1 count=32

mount -t ocfs2 /dev/vdb /mnt   <== failure
mount -t ocfs2 -o nocluster /dev/vdb /mnt && umount /mnt <== clean slot 0
mount -t ocfs2 /dev/vdb /mnt   <== success
```
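
The save and restore steps in 4.1 and 4.2 could be wrapped in two small
helpers. This is a sketch, not part of the patches: slotmap_save and
slotmap_restore are hypothetical names, and conv=notrunc is added only
so the helpers can also be tried against a regular file instead of a
real device.

```shell
#!/bin/sh
# Number of bytes saved and restored, as in the dd commands above.
SLOT_BYTES=32

# slotmap_save DEV OFFSET OUT
# Copy SLOT_BYTES bytes starting at byte OFFSET of DEV into file OUT.
slotmap_save() {
    dd if="$1" of="$3" bs=1 count="$SLOT_BYTES" skip="$2" 2>/dev/null
}

# slotmap_restore IN DEV OFFSET
# Write the saved bytes in file IN back at byte OFFSET of DEV.
# conv=notrunc keeps dd from truncating DEV when it is a regular file.
slotmap_restore() {
    dd if="$1" of="$2" bs=1 count="$SLOT_BYTES" seek="$3" conv=notrunc 2>/dev/null
}

# Usage matching case 4.1 (needs a real ocfs2 volume, so left commented out):
#   slotmap_save /dev/vdb 287387648 /root/slotmap.cluster.mnted
#   umount /mnt
#   slotmap_restore /root/slotmap.cluster.mnted /dev/vdb 287387648
```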

<5> MMP test

5.1 node1 noclustered mount => node 2 noclustered mount

with mmp disabled:
```
node1:
mkfs.ocfs2 --cluster-stack=pcmk --cluster-name=hacluster -N 4 /dev/vdb
mount -t ocfs2 -o nocluster /dev/vdb /mnt

node2:
mount -t ocfs2 -o nocluster /dev/vdb /mnt (<== success)
```

with mmp enabled:
```
node1:
mkfs.ocfs2 --cluster-stack=pcmk --cluster-name=hacluster -N 4 /dev/vdb
tunefs.ocfs2 --fs-features=mmp /dev/vdb
mount -t ocfs2 -o nocluster /dev/vdb /mnt

node2:
mount -t ocfs2 -o nocluster /dev/vdb /mnt (<== wait ~12s [*], failure by mmp)
```

[*] 12s:
sleep (OCFS2_MMP_MIN_CHECK_INTERVAL * 2 + 1) seconds, then detect that
mmp_seq has changed and fail the mount.

5.2 node1 clustered mount => node 2 clustered mount

see case 3.2

5.3 node1 noclustered mount => node 2 noclustered mount

see case 3.4

5.4 remount test

5.4.1 non-clustered mount (run commands on same node)

```
mkfs.ocfs2 --cluster-stack=pcmk --cluster-name=hacluster -N 4 /dev/vdb
tunefs.ocfs2 --fs-features=mmp /dev/vdb

mount -t ocfs2 -o nocluster /dev/vdb /mnt
ps axj | grep kmmpd                            (<== will show kmmpd)
PAGER=cat debugfs.ocfs2 -R "slotmap" /dev/vdb  (<== show 'OCFS2_MMP_SEQ')

mount -o remount,ro,nocluster /dev/vdb /mnt    (<== kmmpd will stop)
ps axj | grep kmmpd                            (<== won't show kmmpd)
PAGER=cat debugfs.ocfs2 -R "slotmap" /dev/vdb  (<== show 'OCFS2_MMP_SEQ_CLEAN')

mount -o remount,rw,nocluster /dev/vdb /mnt    (<== kmmpd will start)
ps axj | grep kmmpd                            (<== will show kmmpd)
PAGER=cat debugfs.ocfs2 -R "slotmap" /dev/vdb  (<== show 'OCFS2_MMP_SEQ')
```
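
One caveat when scripting the kmmpd check: `ps axj | grep kmmpd` can
match the grep process itself, because grep's own command line contains
"kmmpd". A common workaround, sketched here, is the bracket pattern
'[k]mmpd'. kmmpd_running is a hypothetical helper name, not something
from the patches.

```shell
#!/bin/sh
# kmmpd_running: exit status 0 if the ps output contains "kmmpd".
# The pattern '[k]mmpd' matches the text "kmmpd" but not the literal
# "[k]mmpd" in grep's own command line, so grep never matches itself.
kmmpd_running() {
    ps axj 2>/dev/null | grep -q '[k]mmpd'
}

if kmmpd_running; then
    echo "kmmpd is running"
else
    echo "kmmpd is not running"
fi
```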

5.4.2 clustered mount

```
mkfs.ocfs2 --cluster-stack=pcmk --cluster-name=hacluster -N 4 /dev/vdb
tunefs.ocfs2 --fs-features=mmp /dev/vdb

mount -t ocfs2 /dev/vdb /mnt                   (<== clustered mount won't create kmmpd)
PAGER=cat debugfs.ocfs2 -R "slotmap" /dev/vdb  (<== show 'OCFS2_VALID_CLUSTER')

mount -o remount,ro /dev/vdb /mnt
PAGER=cat debugfs.ocfs2 -R "slotmap" /dev/vdb  (<== show 'OCFS2_VALID_CLUSTER')

mount -o remount,rw /dev/vdb /mnt              (<== waits ~22s for mmp to start)
PAGER=cat debugfs.ocfs2 -R "slotmap" /dev/vdb  (<== show 'OCFS2_VALID_CLUSTER')
```

Heming Zhao (4):
  ocfs2: Fix freeing uninitialized resource on ocfs2_dlm_shutdown
  ocfs2: add mlog ML_WARNING support
  re-enable "ocfs2: mount shared volume without ha stack"
  ocfs2: introduce ext4 MMP feature

 fs/ocfs2/cluster/masklog.c |   3 +
 fs/ocfs2/cluster/masklog.h |   9 +-
 fs/ocfs2/dlmglue.c         |   3 +
 fs/ocfs2/ocfs2.h           |   6 +-
 fs/ocfs2/ocfs2_fs.h        |  13 +-
 fs/ocfs2/slot_map.c        | 479 +++++++++++++++++++++++++++++++++++--
 fs/ocfs2/slot_map.h        |   3 +
 fs/ocfs2/super.c           |  42 +++-
 8 files changed, 527 insertions(+), 31 deletions(-)

-- 
2.37.1

