* [Ocfs2-devel] [PATCH v2] ocfs2: fix kernel crash after mounting when dlm doesn't ready
@ 2022-03-10 10:58 Heming Zhao via Ocfs2-devel
2022-03-11 2:33 ` Joseph Qi via Ocfs2-devel
0 siblings, 1 reply; 3+ messages in thread
From: Heming Zhao via Ocfs2-devel @ 2022-03-10 10:58 UTC (permalink / raw)
To: ocfs2-devel, joseph.qi
Call trace:
ocfs2: Registered cluster interface user
dlm: no local IP address has been set
dlm: cannot start dlm lowcomms -107
(mount.ocfs2,2225,0):ocfs2_dlm_init:3355 ERROR: status = -107
(mount.ocfs2,2225,0):ocfs2_mount_volume:1817 ERROR: status = -107
(mount.ocfs2,2225,0):ocfs2_fill_super:1186 ERROR: status = -107
BUG: kernel NULL pointer dereference, address: 0000000000000030
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 2225 Comm: mount.ocfs2 Not tainted 5.16.2-1-default #1 openSUSE Tumbleweed b40...e84
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS ...
RIP: 0010:ocfs2_clear_inode+0x2e9/0x720 [ocfs2]
...
Call Trace:
<TASK>
? ocfs2_evict_inode+0x1fe/0x630 [ocfs2 411bc..281]
evict+0xc0/0x1c0
ocfs2_release_system_inodes+0x21/0xc0 [ocfs2 411bc..281]
ocfs2_dismount_volume+0x10b/0x2d0 [ocfs2 411bc..281]
ocfs2_fill_super+0xaf/0x19e0 [ocfs2 411bc..281]
mount_bdev+0x182/0x1b0
? ocfs2_initialize_super.isra.0+0xf50/0xf50 [ocfs2 411bc..281]
legacy_get_tree+0x24/0x40
vfs_get_tree+0x22/0xb0
path_mount+0x465/0xac0
__x64_sys_mount+0x103/0x140
do_syscall_64+0x59/0x80
? syscall_exit_to_user_mode+0x18/0x40
? do_syscall_64+0x69/0x80
? syscall_exit_to_user_mode+0x18/0x40
? do_syscall_64+0x69/0x80
? exc_page_fault+0x68/0x150
entry_SYSCALL_64_after_hwframe+0x44/0xae
How to trigger:
tb-ocfs1 # dd if=/dev/zero of=/dev/vdb bs=1M count=20 oflag=direct
tb-ocfs1 # mkfs.ocfs2 --cluster-stack=pcmk -N 4 /dev/vdb --cluster-name=ocfstst
== there is no dlm running ==
tb-ocfs1 # mount -t ocfs2 /dev/vdb /mnt
== kernel crash ==
Analysis:
After commit da5e7c87827e8 ("ocfs2: cleanup journal init and shutdown"),
Journal init later than before, it makes NULL pointer access in free
routine.
ocfs2_fill_super
+ ocfs2_mount_volume
| ocfs2_dlm_init //failed, osb->journal still NULL.
+ ...
+ ocfs2_dismount_volume
ocfs2_release_system_inodes
...
evict
...
ocfs2_clear_inode
ocfs2_checkpoint_inode
ocfs2_ci_fully_checkpointed
time_after(journal->j_trans_id, ci->ci_last_trans)
+ journal is empty, crash!
The fix should assess osb->journal before journal operation. And the fix
should be only related with abnormal case before ocfs2_journal_init().
Fixes: da5e7c87827e8 ("ocfs2: cleanup journal init and shutdown")
Signed-off-by: Heming Zhao <heming.zhao@suse.com>
---
v2: revise commit log
add checking in ocfs2_dismount_volume()
---
fs/ocfs2/inode.c | 3 ++-
fs/ocfs2/journal.h | 2 +-
fs/ocfs2/super.c | 7 ++++---
3 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c
index 6c2411c2afcf..3826ab7eab3e 100644
--- a/fs/ocfs2/inode.c
+++ b/fs/ocfs2/inode.c
@@ -1205,7 +1205,8 @@ static void ocfs2_clear_inode(struct inode *inode)
* the journal is flushed before journal shutdown. Thus it is safe to
* have inodes get cleaned up after journal shutdown.
*/
- jbd2_journal_release_jbd_inode(osb->journal->j_journal,
+ if (osb->journal)
+ jbd2_journal_release_jbd_inode(osb->journal->j_journal,
&oi->ip_jinode);
}
diff --git a/fs/ocfs2/journal.h b/fs/ocfs2/journal.h
index 8dcb2f2cadbc..1610d49b4112 100644
--- a/fs/ocfs2/journal.h
+++ b/fs/ocfs2/journal.h
@@ -189,7 +189,7 @@ static inline void ocfs2_checkpoint_inode(struct inode *inode)
{
struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
- if (ocfs2_mount_local(osb))
+ if (!osb->journal || ocfs2_mount_local(osb))
return;
if (!ocfs2_ci_fully_checkpointed(INODE_CACHE(inode))) {
diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index 2772dec9dcea..01f1ab7c8de8 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -1886,9 +1886,10 @@ static void ocfs2_dismount_volume(struct super_block *sb, int mnt_err)
/* Wait for worker to be done with the work structure in osb */
cancel_work_sync(&osb->dquot_drop_work);
- ocfs2_shutdown_local_alloc(osb);
-
- ocfs2_truncate_log_shutdown(osb);
+ if (osb->journal) {
+ ocfs2_shutdown_local_alloc(osb);
+ ocfs2_truncate_log_shutdown(osb);
+ }
/* This will disable recovery and flush any recovery work. */
ocfs2_recovery_exit(osb);
--
2.34.1
_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [Ocfs2-devel] [PATCH v2] ocfs2: fix kernel crash after mounting when dlm doesn't ready
2022-03-10 10:58 [Ocfs2-devel] [PATCH v2] ocfs2: fix kernel crash after mounting when dlm doesn't ready Heming Zhao via Ocfs2-devel
@ 2022-03-11 2:33 ` Joseph Qi via Ocfs2-devel
2022-03-11 8:10 ` heming.zhao--- via Ocfs2-devel
0 siblings, 1 reply; 3+ messages in thread
From: Joseph Qi via Ocfs2-devel @ 2022-03-11 2:33 UTC (permalink / raw)
To: Heming Zhao, ocfs2-devel
Hi,
To prevent the crash, I think these changes are fine.
But it looks wried to spread the check of ocfs2 journal.
Can we intialize the ocfs2 journal much earlier?
Thanks,
Joseph
On 3/10/22 6:58 PM, Heming Zhao wrote:
> Call trace:
>
> ocfs2: Registered cluster interface user
> dlm: no local IP address has been set
> dlm: cannot start dlm lowcomms -107
> (mount.ocfs2,2225,0):ocfs2_dlm_init:3355 ERROR: status = -107
> (mount.ocfs2,2225,0):ocfs2_mount_volume:1817 ERROR: status = -107
> (mount.ocfs2,2225,0):ocfs2_fill_super:1186 ERROR: status = -107
> BUG: kernel NULL pointer dereference, address: 0000000000000030
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 0 P4D 0
> Oops: 0000 [#1] PREEMPT SMP NOPTI
> CPU: 0 PID: 2225 Comm: mount.ocfs2 Not tainted 5.16.2-1-default #1 openSUSE Tumbleweed b40...e84
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS ...
> RIP: 0010:ocfs2_clear_inode+0x2e9/0x720 [ocfs2]
> ...
> Call Trace:
> <TASK>
> ? ocfs2_evict_inode+0x1fe/0x630 [ocfs2 411bc..281]
> evict+0xc0/0x1c0
> ocfs2_release_system_inodes+0x21/0xc0 [ocfs2 411bc..281]
> ocfs2_dismount_volume+0x10b/0x2d0 [ocfs2 411bc..281]
> ocfs2_fill_super+0xaf/0x19e0 [ocfs2 411bc..281]
> mount_bdev+0x182/0x1b0
> ? ocfs2_initialize_super.isra.0+0xf50/0xf50 [ocfs2 411bc..281]
> legacy_get_tree+0x24/0x40
> vfs_get_tree+0x22/0xb0
> path_mount+0x465/0xac0
> __x64_sys_mount+0x103/0x140
> do_syscall_64+0x59/0x80
> ? syscall_exit_to_user_mode+0x18/0x40
> ? do_syscall_64+0x69/0x80
> ? syscall_exit_to_user_mode+0x18/0x40
> ? do_syscall_64+0x69/0x80
> ? exc_page_fault+0x68/0x150
> entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> How to trigger:
>
> tb-ocfs1 # dd if=/dev/zero of=/dev/vdb bs=1M count=20 oflag=direct
> tb-ocfs1 # mkfs.ocfs2 --cluster-stack=pcmk -N 4 /dev/vdb --cluster-name=ocfstst
>
> == there is no dlm running ==
> tb-ocfs1 # mount -t ocfs2 /dev/vdb /mnt
> == kernel crash ==
>
> Analysis:
>
> After commit da5e7c87827e8 ("ocfs2: cleanup journal init and shutdown"),
> Journal init later than before, it makes NULL pointer access in free
> routine.
>
> ocfs2_fill_super
> + ocfs2_mount_volume
> | ocfs2_dlm_init //failed, osb->journal still NULL.
> + ...
> + ocfs2_dismount_volume
> ocfs2_release_system_inodes
> ...
> evict
> ...
> ocfs2_clear_inode
> ocfs2_checkpoint_inode
> ocfs2_ci_fully_checkpointed
> time_after(journal->j_trans_id, ci->ci_last_trans)
> + journal is empty, crash!
>
> The fix should assess osb->journal before journal operation. And the fix
> should be only related with abnormal case before ocfs2_journal_init().
>
> Fixes: da5e7c87827e8 ("ocfs2: cleanup journal init and shutdown")
> Signed-off-by: Heming Zhao <heming.zhao@suse.com>
> ---
> v2: revise commit log
> add checking in ocfs2_dismount_volume()
> ---
> fs/ocfs2/inode.c | 3 ++-
> fs/ocfs2/journal.h | 2 +-
> fs/ocfs2/super.c | 7 ++++---
> 3 files changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c
> index 6c2411c2afcf..3826ab7eab3e 100644
> --- a/fs/ocfs2/inode.c
> +++ b/fs/ocfs2/inode.c
> @@ -1205,7 +1205,8 @@ static void ocfs2_clear_inode(struct inode *inode)
> * the journal is flushed before journal shutdown. Thus it is safe to
> * have inodes get cleaned up after journal shutdown.
> */
> - jbd2_journal_release_jbd_inode(osb->journal->j_journal,
> + if (osb->journal)
> + jbd2_journal_release_jbd_inode(osb->journal->j_journal,
> &oi->ip_jinode);
> }
>
> diff --git a/fs/ocfs2/journal.h b/fs/ocfs2/journal.h
> index 8dcb2f2cadbc..1610d49b4112 100644
> --- a/fs/ocfs2/journal.h
> +++ b/fs/ocfs2/journal.h
> @@ -189,7 +189,7 @@ static inline void ocfs2_checkpoint_inode(struct inode *inode)
> {
> struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>
> - if (ocfs2_mount_local(osb))
> + if (!osb->journal || ocfs2_mount_local(osb))
> return;
>
> if (!ocfs2_ci_fully_checkpointed(INODE_CACHE(inode))) {
> diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
> index 2772dec9dcea..01f1ab7c8de8 100644
> --- a/fs/ocfs2/super.c
> +++ b/fs/ocfs2/super.c
> @@ -1886,9 +1886,10 @@ static void ocfs2_dismount_volume(struct super_block *sb, int mnt_err)
> /* Wait for worker to be done with the work structure in osb */
> cancel_work_sync(&osb->dquot_drop_work);
>
> - ocfs2_shutdown_local_alloc(osb);
> -
> - ocfs2_truncate_log_shutdown(osb);
> + if (osb->journal) {
> + ocfs2_shutdown_local_alloc(osb);
> + ocfs2_truncate_log_shutdown(osb);
> + }
>
> /* This will disable recovery and flush any recovery work. */
> ocfs2_recovery_exit(osb);
_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [Ocfs2-devel] [PATCH v2] ocfs2: fix kernel crash after mounting when dlm doesn't ready
2022-03-11 2:33 ` Joseph Qi via Ocfs2-devel
@ 2022-03-11 8:10 ` heming.zhao--- via Ocfs2-devel
0 siblings, 0 replies; 3+ messages in thread
From: heming.zhao--- via Ocfs2-devel @ 2022-03-11 8:10 UTC (permalink / raw)
To: Joseph Qi, ocfs2-devel
On 3/11/22 10:33, Joseph Qi wrote:
> Hi,
> To prevent the crash, I think these changes are fine.
> But it looks wried to spread the check of ocfs2 journal.
Yes, I have same feeling when I'm making patch.
> Can we intialize the ocfs2 journal much earlier?
>
calling it earlier only makes the gap more narrow not disappear.
(another way is to revert the commit da5e7c87827e8, it will make author unhappy)
I'm just learning ocfs2 code, don't have too much knowledge.
please forgive me for below foolish writing.
IMO all dirty data is useless on mounting phase, and can be safely dropped.
(or IOW, all the dirty data before journal init should be dropped directly)
I have tried to mark readonly when error happen, want to use readonly to skip
write operations but still crash.
I checked __ext4_fill_super, who calls ext4_load_journal lately too, but it uses
specific free steps before function return. Maybe we could do the same clean job
in ocfs2_fill_super or ocfs2_mount_volume(I prefer this func). This idea looks
need to write some new functions.
Thanks
>
> On 3/10/22 6:58 PM, Heming Zhao wrote:
>> Call trace:
>>
>> ocfs2: Registered cluster interface user
>> dlm: no local IP address has been set
>> dlm: cannot start dlm lowcomms -107
>> (mount.ocfs2,2225,0):ocfs2_dlm_init:3355 ERROR: status = -107
>> (mount.ocfs2,2225,0):ocfs2_mount_volume:1817 ERROR: status = -107
>> (mount.ocfs2,2225,0):ocfs2_fill_super:1186 ERROR: status = -107
>> BUG: kernel NULL pointer dereference, address: 0000000000000030
>> #PF: supervisor read access in kernel mode
>> #PF: error_code(0x0000) - not-present page
>> PGD 0 P4D 0
>> Oops: 0000 [#1] PREEMPT SMP NOPTI
>> CPU: 0 PID: 2225 Comm: mount.ocfs2 Not tainted 5.16.2-1-default #1 openSUSE Tumbleweed b40...e84
>> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS ...
>> RIP: 0010:ocfs2_clear_inode+0x2e9/0x720 [ocfs2]
>> ...
>> Call Trace:
>> <TASK>
>> ? ocfs2_evict_inode+0x1fe/0x630 [ocfs2 411bc..281]
>> evict+0xc0/0x1c0
>> ocfs2_release_system_inodes+0x21/0xc0 [ocfs2 411bc..281]
>> ocfs2_dismount_volume+0x10b/0x2d0 [ocfs2 411bc..281]
>> ocfs2_fill_super+0xaf/0x19e0 [ocfs2 411bc..281]
>> mount_bdev+0x182/0x1b0
>> ? ocfs2_initialize_super.isra.0+0xf50/0xf50 [ocfs2 411bc..281]
>> legacy_get_tree+0x24/0x40
>> vfs_get_tree+0x22/0xb0
>> path_mount+0x465/0xac0
>> __x64_sys_mount+0x103/0x140
>> do_syscall_64+0x59/0x80
>> ? syscall_exit_to_user_mode+0x18/0x40
>> ? do_syscall_64+0x69/0x80
>> ? syscall_exit_to_user_mode+0x18/0x40
>> ? do_syscall_64+0x69/0x80
>> ? exc_page_fault+0x68/0x150
>> entry_SYSCALL_64_after_hwframe+0x44/0xae
>>
>> How to trigger:
>>
>> tb-ocfs1 # dd if=/dev/zero of=/dev/vdb bs=1M count=20 oflag=direct
>> tb-ocfs1 # mkfs.ocfs2 --cluster-stack=pcmk -N 4 /dev/vdb --cluster-name=ocfstst
>>
>> == there is no dlm running ==
>> tb-ocfs1 # mount -t ocfs2 /dev/vdb /mnt
>> == kernel crash ==
>>
>> Analysis:
>>
>> After commit da5e7c87827e8 ("ocfs2: cleanup journal init and shutdown"),
>> Journal init later than before, it makes NULL pointer access in free
>> routine.
>>
>> ocfs2_fill_super
>> + ocfs2_mount_volume
>> | ocfs2_dlm_init //failed, osb->journal still NULL.
>> + ...
>> + ocfs2_dismount_volume
>> ocfs2_release_system_inodes
>> ...
>> evict
>> ...
>> ocfs2_clear_inode
>> ocfs2_checkpoint_inode
>> ocfs2_ci_fully_checkpointed
>> time_after(journal->j_trans_id, ci->ci_last_trans)
>> + journal is empty, crash!
>>
>> The fix should assess osb->journal before journal operation. And the fix
>> should be only related with abnormal case before ocfs2_journal_init().
>>
>> Fixes: da5e7c87827e8 ("ocfs2: cleanup journal init and shutdown")
>> Signed-off-by: Heming Zhao <heming.zhao@suse.com>
>> ---
>> v2: revise commit log
>> add checking in ocfs2_dismount_volume()
>> ---
>> fs/ocfs2/inode.c | 3 ++-
>> fs/ocfs2/journal.h | 2 +-
>> fs/ocfs2/super.c | 7 ++++---
>> 3 files changed, 7 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c
>> index 6c2411c2afcf..3826ab7eab3e 100644
>> --- a/fs/ocfs2/inode.c
>> +++ b/fs/ocfs2/inode.c
>> @@ -1205,7 +1205,8 @@ static void ocfs2_clear_inode(struct inode *inode)
>> * the journal is flushed before journal shutdown. Thus it is safe to
>> * have inodes get cleaned up after journal shutdown.
>> */
>> - jbd2_journal_release_jbd_inode(osb->journal->j_journal,
>> + if (osb->journal)
>> + jbd2_journal_release_jbd_inode(osb->journal->j_journal,
>> &oi->ip_jinode);
>> }
>>
>> diff --git a/fs/ocfs2/journal.h b/fs/ocfs2/journal.h
>> index 8dcb2f2cadbc..1610d49b4112 100644
>> --- a/fs/ocfs2/journal.h
>> +++ b/fs/ocfs2/journal.h
>> @@ -189,7 +189,7 @@ static inline void ocfs2_checkpoint_inode(struct inode *inode)
>> {
>> struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>>
>> - if (ocfs2_mount_local(osb))
>> + if (!osb->journal || ocfs2_mount_local(osb))
>> return;
>>
>> if (!ocfs2_ci_fully_checkpointed(INODE_CACHE(inode))) {
>> diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
>> index 2772dec9dcea..01f1ab7c8de8 100644
>> --- a/fs/ocfs2/super.c
>> +++ b/fs/ocfs2/super.c
>> @@ -1886,9 +1886,10 @@ static void ocfs2_dismount_volume(struct super_block *sb, int mnt_err)
>> /* Wait for worker to be done with the work structure in osb */
>> cancel_work_sync(&osb->dquot_drop_work);
>>
>> - ocfs2_shutdown_local_alloc(osb);
>> -
>> - ocfs2_truncate_log_shutdown(osb);
>> + if (osb->journal) {
>> + ocfs2_shutdown_local_alloc(osb);
>> + ocfs2_truncate_log_shutdown(osb);
>> + }
>>
>> /* This will disable recovery and flush any recovery work. */
>> ocfs2_recovery_exit(osb);
>
_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2022-03-11 8:10 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-10 10:58 [Ocfs2-devel] [PATCH v2] ocfs2: fix kernel crash after mounting when dlm doesn't ready Heming Zhao via Ocfs2-devel
2022-03-11 2:33 ` Joseph Qi via Ocfs2-devel
2022-03-11 8:10 ` heming.zhao--- via Ocfs2-devel
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).