* [PATCH -next v5 1/3] ext4: compare to local seq and nodename when check conflict
2021-10-20 3:17 [PATCH -next v5 0/3] Fix some issues about mmp Ye Bin
@ 2021-10-20 3:18 ` Ye Bin
2021-10-20 3:18 ` [PATCH -next v5 2/3] ext4: remove useless bh_check variable Ye Bin
` (3 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Ye Bin @ 2021-10-20 3:18 UTC (permalink / raw)
To: tytso, adilger.kernel, linux-ext4; +Cc: linux-kernel, jack, Ye Bin
As mmp and check_mmp is point to the same data, so there will never
detect conflict.
To solve this issue just compare to local data.
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
fs/ext4/ext4.h | 5 ++++-
fs/ext4/mmp.c | 9 +++++----
2 files changed, 9 insertions(+), 5 deletions(-)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 404dd50856e5..9a487a558787 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2601,6 +2601,8 @@ struct ext4_features {
#define EXT4_MMP_SEQ_FSCK 0xE24D4D50U /* mmp_seq value when being fscked */
#define EXT4_MMP_SEQ_MAX 0xE24D4D4FU /* maximum valid mmp_seq value */
+#define EXT4_MMP_NODENAME_LEN 64 /* mmp_nodename length */
+
struct mmp_struct {
__le32 mmp_magic; /* Magic number for MMP */
__le32 mmp_seq; /* Sequence no. updated periodically */
@@ -2610,7 +2612,8 @@ struct mmp_struct {
* purposes and do not affect the correctness of the algorithm
*/
__le64 mmp_time; /* Time last updated */
- char mmp_nodename[64]; /* Node which last updated MMP block */
+ /* Node which last updated MMP block */
+ char mmp_nodename[EXT4_MMP_NODENAME_LEN];
char mmp_bdevname[32]; /* Bdev which last updated MMP block */
/*
diff --git a/fs/ext4/mmp.c b/fs/ext4/mmp.c
index cebea4270817..97d5a8136eb2 100644
--- a/fs/ext4/mmp.c
+++ b/fs/ext4/mmp.c
@@ -138,6 +138,7 @@ static int kmmpd(void *data)
unsigned mmp_check_interval;
unsigned long last_update_time;
unsigned long diff;
+ char nodename[EXT4_MMP_NODENAME_LEN];
int retval = 0;
mmp_block = le64_to_cpu(es->s_mmp_block);
@@ -153,8 +154,8 @@ static int kmmpd(void *data)
BUILD_BUG_ON(sizeof(mmp->mmp_bdevname) < BDEVNAME_SIZE);
bdevname(bh->b_bdev, mmp->mmp_bdevname);
- memcpy(mmp->mmp_nodename, init_utsname()->nodename,
- sizeof(mmp->mmp_nodename));
+ memcpy(nodename, init_utsname()->nodename, sizeof(nodename));
+ memcpy(mmp->mmp_nodename, nodename, sizeof(mmp->mmp_nodename));
while (!kthread_should_stop() && !sb_rdonly(sb)) {
if (!ext4_has_feature_mmp(sb)) {
@@ -206,8 +207,8 @@ static int kmmpd(void *data)
}
mmp_check = (struct mmp_struct *)(bh_check->b_data);
- if (mmp->mmp_seq != mmp_check->mmp_seq ||
- memcmp(mmp->mmp_nodename, mmp_check->mmp_nodename,
+ if (seq != mmp_check->mmp_seq ||
+ memcmp(nodename, mmp_check->mmp_nodename,
sizeof(mmp->mmp_nodename))) {
dump_mmp_msg(sb, mmp_check,
"Error while updating MMP info. "
--
2.31.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH -next v5 2/3] ext4: remove useless bh_check variable
2021-10-20 3:17 [PATCH -next v5 0/3] Fix some issues about mmp Ye Bin
2021-10-20 3:18 ` [PATCH -next v5 1/3] ext4: compare to local seq and nodename when check conflict Ye Bin
@ 2021-10-20 3:18 ` Ye Bin
2021-10-20 3:18 ` [PATCH -next v5 3/3] ext4: simplify read_mmp_block fucntion Ye Bin
` (2 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Ye Bin @ 2021-10-20 3:18 UTC (permalink / raw)
To: tytso, adilger.kernel, linux-ext4; +Cc: linux-kernel, jack, Ye Bin
Since we initialize 'bh_check' to NULL and pass it to read_mmp_block(), that
function will just call sb_getblk() which will just return the buffer_head
we have in 'bh'. So just remove the pointless 'bh_check' variable and use
'bh' directly.
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
fs/ext4/mmp.c | 17 ++++++-----------
1 file changed, 6 insertions(+), 11 deletions(-)
diff --git a/fs/ext4/mmp.c b/fs/ext4/mmp.c
index 97d5a8136eb2..9788c617e593 100644
--- a/fs/ext4/mmp.c
+++ b/fs/ext4/mmp.c
@@ -195,10 +195,7 @@ static int kmmpd(void *data)
*/
diff = jiffies - last_update_time;
if (diff > mmp_check_interval * HZ) {
- struct buffer_head *bh_check = NULL;
- struct mmp_struct *mmp_check;
-
- retval = read_mmp_block(sb, &bh_check, mmp_block);
+ retval = read_mmp_block(sb, &bh, mmp_block);
if (retval) {
ext4_error_err(sb, -retval,
"error reading MMP data: %d",
@@ -206,20 +203,18 @@ static int kmmpd(void *data)
goto wait_to_exit;
}
- mmp_check = (struct mmp_struct *)(bh_check->b_data);
- if (seq != mmp_check->mmp_seq ||
- memcmp(nodename, mmp_check->mmp_nodename,
- sizeof(mmp->mmp_nodename))) {
- dump_mmp_msg(sb, mmp_check,
+ mmp = (struct mmp_struct *)(bh->b_data);
+ if (seq != le32_to_cpu(mmp->mmp_seq) ||
+ memcmp(nodename, mmp->mmp_nodename,
+ sizeof(nodename))) {
+ dump_mmp_msg(sb, mmp,
"Error while updating MMP info. "
"The filesystem seems to have been"
" multiply mounted.");
ext4_error_err(sb, EBUSY, "abort");
- put_bh(bh_check);
retval = -EBUSY;
goto wait_to_exit;
}
- put_bh(bh_check);
}
/*
--
2.31.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH -next v5 3/3] ext4: simplify read_mmp_block fucntion
2021-10-20 3:17 [PATCH -next v5 0/3] Fix some issues about mmp Ye Bin
2021-10-20 3:18 ` [PATCH -next v5 1/3] ext4: compare to local seq and nodename when check conflict Ye Bin
2021-10-20 3:18 ` [PATCH -next v5 2/3] ext4: remove useless bh_check variable Ye Bin
@ 2021-10-20 3:18 ` Ye Bin
2021-11-08 2:11 ` [PATCH -next v5 0/3] Fix some issues about mmp yebin
2021-12-01 7:39 ` yebin
4 siblings, 0 replies; 6+ messages in thread
From: Ye Bin @ 2021-10-20 3:18 UTC (permalink / raw)
To: tytso, adilger.kernel, linux-ext4; +Cc: linux-kernel, jack, Ye Bin
This patch is according to Jan Kara's suggestion:
I guess I would just get rid of sb_getblk() in read_mmp_block() and always
expect valid bh passed. The only place that passes NULL bh after this
patch is one case in ext4_multi_mount_protect() and that can call
sb_getblk() on its own. That way we can also simplify read_mmp_block()
prototype to:
static int read_mmp_block(struct super_block *sb, struct buffer_head *bh);
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
fs/ext4/mmp.c | 46 +++++++++++++++++++---------------------------
1 file changed, 19 insertions(+), 27 deletions(-)
diff --git a/fs/ext4/mmp.c b/fs/ext4/mmp.c
index 9788c617e593..d46239a0e36b 100644
--- a/fs/ext4/mmp.c
+++ b/fs/ext4/mmp.c
@@ -64,33 +64,26 @@ static int write_mmp_block(struct super_block *sb, struct buffer_head *bh)
/*
* Read the MMP block. It _must_ be read from disk and hence we clear the
* uptodate flag on the buffer.
+ * Caller must ensure pass valid 'bh'.
*/
-static int read_mmp_block(struct super_block *sb, struct buffer_head **bh,
- ext4_fsblk_t mmp_block)
+static int read_mmp_block(struct super_block *sb, struct buffer_head *bh)
{
struct mmp_struct *mmp;
int ret;
- if (*bh)
- clear_buffer_uptodate(*bh);
-
- /* This would be sb_bread(sb, mmp_block), except we need to be sure
- * that the MD RAID device cache has been bypassed, and that the read
- * is not blocked in the elevator. */
- if (!*bh) {
- *bh = sb_getblk(sb, mmp_block);
- if (!*bh) {
- ret = -ENOMEM;
- goto warn_exit;
- }
+ if (!bh) {
+ ret = -EINVAL;
+ goto warn_exit;
}
- lock_buffer(*bh);
- ret = ext4_read_bh(*bh, REQ_META | REQ_PRIO, NULL);
+ clear_buffer_uptodate(bh);
+
+ lock_buffer(bh);
+ ret = ext4_read_bh(bh, REQ_META | REQ_PRIO, NULL);
if (ret)
goto warn_exit;
- mmp = (struct mmp_struct *)((*bh)->b_data);
+ mmp = (struct mmp_struct *)((bh)->b_data);
if (le32_to_cpu(mmp->mmp_magic) != EXT4_MMP_MAGIC) {
ret = -EFSCORRUPTED;
goto warn_exit;
@@ -101,10 +94,7 @@ static int read_mmp_block(struct super_block *sb, struct buffer_head **bh,
}
return 0;
warn_exit:
- brelse(*bh);
- *bh = NULL;
- ext4_warning(sb, "Error %d while reading MMP block %llu",
- ret, mmp_block);
+ ext4_warning(sb, "Error %d while reading MMP block", ret);
return ret;
}
@@ -131,7 +121,6 @@ static int kmmpd(void *data)
struct ext4_super_block *es = EXT4_SB(sb)->s_es;
struct buffer_head *bh = EXT4_SB(sb)->s_mmp_bh;
struct mmp_struct *mmp;
- ext4_fsblk_t mmp_block;
u32 seq = 0;
unsigned long failed_writes = 0;
int mmp_update_interval = le16_to_cpu(es->s_mmp_update_interval);
@@ -141,7 +130,6 @@ static int kmmpd(void *data)
char nodename[EXT4_MMP_NODENAME_LEN];
int retval = 0;
- mmp_block = le64_to_cpu(es->s_mmp_block);
mmp = (struct mmp_struct *)(bh->b_data);
mmp->mmp_time = cpu_to_le64(ktime_get_real_seconds());
/*
@@ -195,7 +183,7 @@ static int kmmpd(void *data)
*/
diff = jiffies - last_update_time;
if (diff > mmp_check_interval * HZ) {
- retval = read_mmp_block(sb, &bh, mmp_block);
+ retval = read_mmp_block(sb, bh);
if (retval) {
ext4_error_err(sb, -retval,
"error reading MMP data: %d",
@@ -289,7 +277,11 @@ int ext4_multi_mount_protect(struct super_block *sb,
goto failed;
}
- retval = read_mmp_block(sb, &bh, mmp_block);
+ bh = sb_getblk(sb, mmp_block);
+ if (!bh)
+ goto failed;
+
+ retval = read_mmp_block(sb, bh);
if (retval)
goto failed;
@@ -327,7 +319,7 @@ int ext4_multi_mount_protect(struct super_block *sb,
goto failed;
}
- retval = read_mmp_block(sb, &bh, mmp_block);
+ retval = read_mmp_block(sb, bh);
if (retval)
goto failed;
mmp = (struct mmp_struct *)(bh->b_data);
@@ -356,7 +348,7 @@ int ext4_multi_mount_protect(struct super_block *sb,
goto failed;
}
- retval = read_mmp_block(sb, &bh, mmp_block);
+ retval = read_mmp_block(sb, bh);
if (retval)
goto failed;
mmp = (struct mmp_struct *)(bh->b_data);
--
2.31.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH -next v5 0/3] Fix some issues about mmp
2021-10-20 3:17 [PATCH -next v5 0/3] Fix some issues about mmp Ye Bin
` (2 preceding siblings ...)
2021-10-20 3:18 ` [PATCH -next v5 3/3] ext4: simplify read_mmp_block fucntion Ye Bin
@ 2021-11-08 2:11 ` yebin
2021-12-01 7:39 ` yebin
4 siblings, 0 replies; 6+ messages in thread
From: yebin @ 2021-11-08 2:11 UTC (permalink / raw)
To: tytso, adilger.kernel, linux-ext4; +Cc: linux-kernel, jack
On 2021/10/20 11:17, Ye Bin wrote:
> I test mmp function as follow steps:
> 1. Inject delay 5s in ext4_multi_mount_protect function after
> "skip:" label.
> 2. Share HostA block device(sda) with HostB(nbd0) by NBD.
> 3. Enable mmp feature when mkfs.ext4 sda.
> 4. Mount sda and nbd0 at the same time.
>
> I found kmmpd never trigger detect multi-mount. Reason is as follow:
> 1. Kmmpd init seq with 0, if two host have same nodename, will lead to
> detect confliction very slow, even never detect confliction.
> 2. When detect confliction in kmmpd, we get 'check_bh' is same with 'bh'.
> so we compare data with itself.
> 3. We only trigger detect when ”diff > mmp_check_interval * HZ“,
> 'mmp_check_interval' is double of 'mmp_update_interval', 'diff' is
> about 'mmp_update_interval'. So 'diff' is little than 'mmp_check_interval * HZ'
> normaly. As Jan Kara explain as follows:
> "I think the check is there only for the case where write_mmp_block() +
> sleep took longer than mmp_check_interval. I agree that should rarely
> happen but on a really busy system it is possible and in that case we would
> miss updating mmp block for too long and so another node could have started
> using the filesystem. "
>
> v1->v2:
> Fix 'last_check_time' not initialized before checking.
>
> v2->v3:
> 1. drop commit "ext4: introduce last_check_time record previous check time"
> As Ted explain as follows:
> "I'd like Andreas to comment here. My understanding is that MMP
> originally intended as a safety mechanism which would be used as part
> of a primary/backup high availability system, but not as the *primary*
> system where you might try to have two servers simultaneously try to
> mount the file system and use MMP as the "election" mechanism to
> decide which server is going to be the primary system, and which would
> be the backup system.
>
> The cost of being able to handle this particular race is it would slow
> down the mounts of cleanly unmounted systems.
>
> There *are* better systems to implement leader elections[1] than using
> MMP. Most of these more efficient leader elections assume that you
> have a working IP network, and so if you have a separate storage
> network (including a shared SCSI bus) from your standard IP network,
> then MMP is a useful failsafe in the face of a network partition of
> your IP network. The question is whether MMP should be useful for
> more than that. And if it isn't, then we should probably document
> what MMP is and isn't good for, and give advice in the form of an
> application note for how MMP should be used in the context of a larger
> system."
> 2. drop commit "ext4: fix possible store wrong check interval value in disk when umount"
> 3. simplify read_mmp_block fucntion to avoid UAF
>
> v3->v4:
> 1. drop commit "ext4: init 'seq' with the value which set in 'ext4_multi_mount_protect'"
> 2. merge "ext4: get buffer head before read_mmp_block" to
> "ext4: simplify read_mmp_block fucntion"
> 3. rename "ext4: avoid to re-read mmp check data get from page cache" to
> "ext4: remove useless bh_check variable"
> 4. reorder "ext4: remove useless bh_check variable" and
> "ext4: simplify read_mmp_block fucntion"
>
> v4->v5:
> 1. Fix follow warning:
>>> fs/ext4/mmp.c:124:15: warning: variable 'mmp_block' set but not used [-Wunused-but-set-variable]
> ext4_fsblk_t mmp_block;
> 2. Fix incorrect judgement in 'ext4_multi_mount_protect'.
>
> Ye Bin (3):
> ext4: compare to local seq and nodename when check conflict
> ext4: remove useless bh_check variable
> ext4: simplify read_mmp_block fucntion
>
> fs/ext4/ext4.h | 5 +++-
> fs/ext4/mmp.c | 66 +++++++++++++++++++++-----------------------------
> 2 files changed, 31 insertions(+), 40 deletions(-)
>
ping...
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH -next v5 0/3] Fix some issues about mmp
2021-10-20 3:17 [PATCH -next v5 0/3] Fix some issues about mmp Ye Bin
` (3 preceding siblings ...)
2021-11-08 2:11 ` [PATCH -next v5 0/3] Fix some issues about mmp yebin
@ 2021-12-01 7:39 ` yebin
4 siblings, 0 replies; 6+ messages in thread
From: yebin @ 2021-12-01 7:39 UTC (permalink / raw)
To: tytso, adilger.kernel, linux-ext4; +Cc: linux-kernel, jack
On 2021/10/20 11:17, Ye Bin wrote:
> I test mmp function as follow steps:
> 1. Inject delay 5s in ext4_multi_mount_protect function after
> "skip:" label.
> 2. Share HostA block device(sda) with HostB(nbd0) by NBD.
> 3. Enable mmp feature when mkfs.ext4 sda.
> 4. Mount sda and nbd0 at the same time.
>
> I found kmmpd never trigger detect multi-mount. Reason is as follow:
> 1. Kmmpd init seq with 0, if two host have same nodename, will lead to
> detect confliction very slow, even never detect confliction.
> 2. When detect confliction in kmmpd, we get 'check_bh' is same with 'bh'.
> so we compare data with itself.
> 3. We only trigger detect when ”diff > mmp_check_interval * HZ“,
> 'mmp_check_interval' is double of 'mmp_update_interval', 'diff' is
> about 'mmp_update_interval'. So 'diff' is little than 'mmp_check_interval * HZ'
> normaly. As Jan Kara explain as follows:
> "I think the check is there only for the case where write_mmp_block() +
> sleep took longer than mmp_check_interval. I agree that should rarely
> happen but on a really busy system it is possible and in that case we would
> miss updating mmp block for too long and so another node could have started
> using the filesystem. "
>
> v1->v2:
> Fix 'last_check_time' not initialized before checking.
>
> v2->v3:
> 1. drop commit "ext4: introduce last_check_time record previous check time"
> As Ted explain as follows:
> "I'd like Andreas to comment here. My understanding is that MMP
> originally intended as a safety mechanism which would be used as part
> of a primary/backup high availability system, but not as the *primary*
> system where you might try to have two servers simultaneously try to
> mount the file system and use MMP as the "election" mechanism to
> decide which server is going to be the primary system, and which would
> be the backup system.
>
> The cost of being able to handle this particular race is it would slow
> down the mounts of cleanly unmounted systems.
>
> There *are* better systems to implement leader elections[1] than using
> MMP. Most of these more efficient leader elections assume that you
> have a working IP network, and so if you have a separate storage
> network (including a shared SCSI bus) from your standard IP network,
> then MMP is a useful failsafe in the face of a network partition of
> your IP network. The question is whether MMP should be useful for
> more than that. And if it isn't, then we should probably document
> what MMP is and isn't good for, and give advice in the form of an
> application note for how MMP should be used in the context of a larger
> system."
> 2. drop commit "ext4: fix possible store wrong check interval value in disk when umount"
> 3. simplify read_mmp_block fucntion to avoid UAF
>
> v3->v4:
> 1. drop commit "ext4: init 'seq' with the value which set in 'ext4_multi_mount_protect'"
> 2. merge "ext4: get buffer head before read_mmp_block" to
> "ext4: simplify read_mmp_block fucntion"
> 3. rename "ext4: avoid to re-read mmp check data get from page cache" to
> "ext4: remove useless bh_check variable"
> 4. reorder "ext4: remove useless bh_check variable" and
> "ext4: simplify read_mmp_block fucntion"
>
> v4->v5:
> 1. Fix follow warning:
>>> fs/ext4/mmp.c:124:15: warning: variable 'mmp_block' set but not used [-Wunused-but-set-variable]
> ext4_fsblk_t mmp_block;
> 2. Fix incorrect judgement in 'ext4_multi_mount_protect'.
>
> Ye Bin (3):
> ext4: compare to local seq and nodename when check conflict
> ext4: remove useless bh_check variable
> ext4: simplify read_mmp_block fucntion
>
> fs/ext4/ext4.h | 5 +++-
> fs/ext4/mmp.c | 66 +++++++++++++++++++++-----------------------------
> 2 files changed, 31 insertions(+), 40 deletions(-)
Hi, Ted
I've tested this patch set. If there are no other questions, can it be
merged?
^ permalink raw reply [flat|nested] 6+ messages in thread