* [f2fs-dev] [Bug 210745] New: kernel crash during umounting a partition with f2fs filesystem
@ 2020-12-17  6:43 bugzilla-daemon
  2020-12-18 10:27 ` [f2fs-dev] [Bug 210745] " bugzilla-daemon
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: bugzilla-daemon @ 2020-12-17  6:43 UTC (permalink / raw)
  To: linux-f2fs-devel

https://bugzilla.kernel.org/show_bug.cgi?id=210745

            Bug ID: 210745
           Summary: kernel crash during umounting a partition with f2fs
                    filesystem
           Product: File System
           Version: 2.5
    Kernel Version: 4.14.193
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: high
          Priority: P1
         Component: f2fs
          Assignee: filesystem_f2fs@kernel-bugs.kernel.org
          Reporter: Zhiguo.Niu@unisoc.com
        Regression: No

Hi,
When we run a reboot stress test on a device, we occasionally encounter the
following kernel crash.


[   42.035226] c6 Unable to handle kernel NULL pointer dereference at virtual address 0000000a
[   43.437464] c6  __list_del_entry_valid+0xc/0xd8
[   43.441962] c6  f2fs_destroy_node_manager+0x218/0x398
[   43.446984] c6  f2fs_put_super+0x19c/0x2b8
[   43.451052] c6  generic_shutdown_super+0x70/0xf8
[   43.455635] c6  kill_block_super+0x2c/0x5c
[   43.459702] c6  kill_f2fs_super+0xac/0xd8
[   43.463684] c6  deactivate_locked_super+0x5c/0x124
[   43.468442] c6  deactivate_super+0x5c/0x68
[   43.472512] c6  cleanup_mnt+0x9c/0x118
[   43.476231] c6  __cleanup_mnt+0x1c/0x28
[   43.480043] c6  task_work_run+0x88/0xa8
[   43.483850] c6  do_notify_resume+0x39c/0x1c88
[   43.488174] c6  work_pending+0x8/0x14

The code at the crash point is in f2fs/node.c:

void f2fs_destroy_node_manager(struct f2fs_sb_info *sbi)
{
        ...
        while ((found = __gang_lookup_nat_cache(nm_i,
                                        nid, NATVEC_SIZE, natvec))) {
                unsigned idx;

                nid = nat_get_nid(natvec[found - 1]) + 1;
                for (idx = 0; idx < found; idx++) {
                        spin_lock(&nm_i->nat_list_lock);
>                       list_del(&natvec[idx]->list);
                        spin_unlock(&nm_i->nat_list_lock);

                        __del_from_nat_cache(nm_i, natvec[idx]);
                }
        }

The crash happens because the nat entry in natvec[idx] is an invalid pointer,
or its list member has a NULL next pointer.

We have encountered this issue several times on both Android Q and R.

My analysis of the issue is as follows:

1. The current nat entry can be found on the stack; note the value "a":
ffffff800806b8d0:  ffffffc0af33cbc0 ffffffc0af4869a0 
> ffffff800806b8e0:  ffffffc0f49baa00 000000000000000a 
ffffff800806b8f0:  ffffffc0af33c040 ffffffc0c69f0e20 
ffffff800806b900:  ffffffc0c695abc0 ffffffc01e2a4460 

2. These invalid entries can be found in the nat_root radix tree of f2fs_nm_info.

3. I have reviewed the code around nat_tree_lock and found no clues.

Please let me know if you need any other information.
Thanks a lot.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


* [f2fs-dev] [Bug 210745] kernel crash during umounting a partition with f2fs filesystem
  2020-12-17  6:43 [f2fs-dev] [Bug 210745] New: kernel crash during umounting a partition with f2fs filesystem bugzilla-daemon
@ 2020-12-18 10:27 ` bugzilla-daemon
  2020-12-21  8:09 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: bugzilla-daemon @ 2020-12-18 10:27 UTC (permalink / raw)
  To: linux-f2fs-devel

https://bugzilla.kernel.org/show_bug.cgi?id=210745

Chao Yu (chao@kernel.org) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO
                 CC|                            |chao@kernel.org

--- Comment #1 from Chao Yu (chao@kernel.org) ---
Hi,

I checked the 4.14.193 code and have no clue why this can happen. I don't
recall any such corruption condition on the nid list, since all updates to
it are made under nat_tree_lock. Let me know if I missed something.

Have you applied any private patches on top of 4.14.193?

-- 
You may reply to this email to add a comment.

* [f2fs-dev] [Bug 210745] kernel crash during umounting a partition with f2fs filesystem
  2020-12-17  6:43 [f2fs-dev] [Bug 210745] New: kernel crash during umounting a partition with f2fs filesystem bugzilla-daemon
  2020-12-18 10:27 ` [f2fs-dev] [Bug 210745] " bugzilla-daemon
@ 2020-12-21  8:09 ` bugzilla-daemon
  2020-12-21  8:29 ` bugzilla-daemon
  2020-12-21  8:44 ` bugzilla-daemon
  3 siblings, 0 replies; 5+ messages in thread
From: bugzilla-daemon @ 2020-12-21  8:09 UTC (permalink / raw)
  To: linux-f2fs-devel

https://bugzilla.kernel.org/show_bug.cgi?id=210745

--- Comment #2 from Zhiguo.Niu (Zhiguo.Niu@unisoc.com) ---
(In reply to Chao Yu from comment #1)
> Hi,
> 
> I checked the 4.14.193 code and have no clue why this can happen. I don't
> recall any such corruption condition on the nid list, since all updates to
> it are made under nat_tree_lock. Let me know if I missed something.
> 
> Have you applied any private patches on top of 4.14.193?


Hi Chao,

Thanks for your reply. I have checked my codebase; there are no other
private patches in the current version.

I found that the local variables natvec & setvec in f2fs_destroy_node_manager
may be initialized to the patterns 0xaa / 0xaaaaaaaaaaaaaaaa, e.g.:

void f2fs_destroy_node_manager(struct f2fs_sb_info *sbi)
{
        struct f2fs_nm_info *nm_i = NM_I(sbi);
        struct free_nid *i, *next_i;
        struct nat_entry *natvec[NATVEC_SIZE];
        struct nat_entry_set *setvec[SETVEC_SIZE];

dis:
crash_arm64> dis f2fs_destroy_node_manager
0xffffff800842e2a8 <f2fs_destroy_node_manager>:         stp     x29, x30, [sp,#-96]!
0xffffff800842e2ac <f2fs_destroy_node_manager+4>:       stp     x28, x27, [sp,#16]
0xffffff800842e2b0 <f2fs_destroy_node_manager+8>:       stp     x26, x25, [sp,#32]
0xffffff800842e2b4 <f2fs_destroy_node_manager+12>:      stp     x24, x23, [sp,#48]
0xffffff800842e2b8 <f2fs_destroy_node_manager+16>:      stp     x22, x21, [sp,#64]
0xffffff800842e2bc <f2fs_destroy_node_manager+20>:      stp     x20, x19, [sp,#80]
0xffffff800842e2c0 <f2fs_destroy_node_manager+24>:      mov     x29, sp
0xffffff800842e2c4 <f2fs_destroy_node_manager+28>:      sub     sp, sp, #0x320
0xffffff800842e2c8 <f2fs_destroy_node_manager+32>:      adrp    x8, 0xffffff800947e000 <xt_connlimit_locks+768>
0xffffff800842e2cc <f2fs_destroy_node_manager+36>:      ldr     x8, [x8,#264]
0xffffff800842e2d0 <f2fs_destroy_node_manager+40>:      mov     x27, x0
0xffffff800842e2d4 <f2fs_destroy_node_manager+44>:      str     x8, [x29,#-16]
0xffffff800842e2d8 <f2fs_destroy_node_manager+48>:      nop
0xffffff800842e2dc <f2fs_destroy_node_manager+52>:      ldr     x20, [x27,#112]
0xffffff800842e2e0 <f2fs_destroy_node_manager+56>:      add     x0, sp, #0x110
0xffffff800842e2e4 <f2fs_destroy_node_manager+60>:      mov     w1, #0xaa                       // #170
0xffffff800842e2e8 <f2fs_destroy_node_manager+64>:      mov     w2, #0x200                      // #512
0xffffff800842e2ec <f2fs_destroy_node_manager+68>:      bl      0xffffff8008be6b80 <__memset>
0xffffff800842e2f0 <f2fs_destroy_node_manager+72>:      mov     x8, #0xaaaaaaaaaaaaaaaa         // #-6148914691236517206
0xffffff800842e2f4 <f2fs_destroy_node_manager+76>:      stp     x8, x8, [sp,#256]
0xffffff800842e2f8 <f2fs_destroy_node_manager+80>:      stp     x8, x8, [sp,#240]
0xffffff800842e2fc <f2fs_destroy_node_manager+84>:      stp     x8, x8, [sp,#224]
0xffffff800842e300 <f2fs_destroy_node_manager+88>:      stp     x8, x8, [sp,#208]
0xffffff800842e304 <f2fs_destroy_node_manager+92>:      stp     x8, x8, [sp,#192]
0xffffff800842e308 <f2fs_destroy_node_manager+96>:      stp     x8, x8, [sp,#176]
0xffffff800842e30c <f2fs_destroy_node_manager+100>:     stp     x8, x8, [sp,#160]
0xffffff800842e310 <f2fs_destroy_node_manager+104>:     stp     x8, x8, [sp,#144]
0xffffff800842e314 <f2fs_destroy_node_manager+108>:     stp     x8, x8, [sp,#128]
0xffffff800842e318 <f2fs_destroy_node_manager+112>:     stp     x8, x8, [sp,#112]
0xffffff800842e31c <f2fs_destroy_node_manager+116>:     stp     x8, x8, [sp,#96]
0xffffff800842e320 <f2fs_destroy_node_manager+120>:     stp     x8, x8, [sp,#80]
0xffffff800842e324 <f2fs_destroy_node_manager+124>:     stp     x8, x8, [sp,#64]
0xffffff800842e328 <f2fs_destroy_node_manager+128>:     stp     x8, x8, [sp,#48]
0xffffff800842e32c <f2fs_destroy_node_manager+132>:     stp     x8, x8, [sp,#32]
0xffffff800842e330 <f2fs_destroy_node_manager+136>:     stp     x8, x8, [sp,#16]

I am not sure this is the root cause of the issue, because these invalid
entries can also be found in the nat_root radix tree of f2fs_nm_info.

thanks!


* [f2fs-dev] [Bug 210745] kernel crash during umounting a partition with f2fs filesystem
  2020-12-17  6:43 [f2fs-dev] [Bug 210745] New: kernel crash during umounting a partition with f2fs filesystem bugzilla-daemon
  2020-12-18 10:27 ` [f2fs-dev] [Bug 210745] " bugzilla-daemon
  2020-12-21  8:09 ` bugzilla-daemon
@ 2020-12-21  8:29 ` bugzilla-daemon
  2020-12-21  8:44 ` bugzilla-daemon
  3 siblings, 0 replies; 5+ messages in thread
From: bugzilla-daemon @ 2020-12-21  8:29 UTC (permalink / raw)
  To: linux-f2fs-devel

https://bugzilla.kernel.org/show_bug.cgi?id=210745

--- Comment #3 from Chao Yu (chao@kernel.org) ---
nm_i->nat_list_lock was introduced in 4.19; are you sure your codebase is
4.14.193?


* [f2fs-dev] [Bug 210745] kernel crash during umounting a partition with f2fs filesystem
  2020-12-17  6:43 [f2fs-dev] [Bug 210745] New: kernel crash during umounting a partition with f2fs filesystem bugzilla-daemon
                   ` (2 preceding siblings ...)
  2020-12-21  8:29 ` bugzilla-daemon
@ 2020-12-21  8:44 ` bugzilla-daemon
  3 siblings, 0 replies; 5+ messages in thread
From: bugzilla-daemon @ 2020-12-21  8:44 UTC (permalink / raw)
  To: linux-f2fs-devel

https://bugzilla.kernel.org/show_bug.cgi?id=210745

--- Comment #4 from Chao Yu (chao@kernel.org) ---
(In reply to Zhiguo.Niu from comment #2)
> Hi Chao, 
> 
> Thanks for your reply. I have checked my codebase; there are no other
> private patches in the current version.
> 
> I found that the local variables natvec & setvec in f2fs_destroy_node_manager
> may be initialized to the patterns 0xaa / 0xaaaaaaaaaaaaaaaa, e.g.:
> 
> void f2fs_destroy_node_manager(struct f2fs_sb_info *sbi)
> {
>       struct f2fs_nm_info *nm_i = NM_I(sbi);
>       struct free_nid *i, *next_i;
>       struct nat_entry *natvec[NATVEC_SIZE];
>       struct nat_entry_set *setvec[SETVEC_SIZE];
> 

I don't think so; the natvec array is filled by __gang_lookup_nat_cache(),
and natvec[0..found - 1] will be valid. In the "destroy nat cache" loop, we
never access the natvec array out of range.

Can you please check whether @found is valid (@found should be less than or
equal to NATVEC_SIZE)?

BTW, one possible cause could be stack overflow, but would that really
happen during umount()?


end of thread, other threads:[~2020-12-21  8:44 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-17  6:43 [f2fs-dev] [Bug 210745] New: kernel crash during umounting a partition with f2fs filesystem bugzilla-daemon
2020-12-18 10:27 ` [f2fs-dev] [Bug 210745] " bugzilla-daemon
2020-12-21  8:09 ` bugzilla-daemon
2020-12-21  8:29 ` bugzilla-daemon
2020-12-21  8:44 ` bugzilla-daemon
