All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] nilfs2: fix NULL pointer dereference in nilfs_segctor_prepare_write()
@ 2022-11-08  2:29 ` Liu Shixin
  0 siblings, 0 replies; 23+ messages in thread
From: Liu Shixin @ 2022-11-08  2:29 UTC (permalink / raw)
  To: Ryusuke Konishi, Andrew Morton; +Cc: linux-nilfs, linux-kernel, Liu Shixin

Syzbot reported a NULL pointer dereference:

 Unable to handle kernel NULL pointer dereference at virtual address 0000000000000168
 Mem abort info:
   ESR = 0x0000000096000004
   EC = 0x25: DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
   FSC = 0x04: level 0 translation fault
 Data abort info:
   ISV = 0, ISS = 0x00000004
   CM = 0, WnR = 0
 user pgtable: 4k pages, 48-bit VAs, pgdp=0000000108bcf000
 [0000000000000168] pgd=0000000000000000, p4d=0000000000000000
 Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
 Modules linked in:
 CPU: 1 PID: 3032 Comm: segctord Not tainted 6.0.0-rc7-syzkaller-18095-gbbed346d5a96 #0
 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/30/2022
 pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 pc : _compound_head include/linux/page-flags.h:253 [inline]
 pc : lock_page+0x28/0x1e0 include/linux/pagemap.h:958
 lr : lock_page+0x28/0x1e0 include/linux/pagemap.h:956
 sp : ffff80001290bc00
 x29: ffff80001290bc00 x28: ffff80001290bde0 x27: 000000000000001b
 x26: fffffc000330d7c0 x25: ffff0000caa56d68 x24: ffff0000ca9fb1c0
 x23: 0000000000000080 x22: ffff0000ca9fb130 x21: 0000000000000160
 x20: ffff0000c91e10b8 x19: 0000000000000160 x18: 00000000000000c0
 x17: ffff80000dd0b198 x16: ffff80000db49158 x15: ffff0000c3e63500
 x14: 0000000000000000 x13: 00000000ffffffff x12: ffff0000c3e63500
 x11: ff808000095d1a0c x10: 0000000000000000 x9 : 0000000000000000
 x8 : 0000000000000000 x7 : ffff80000856806c x6 : 0000000000000000
 x5 : 0000000000000080 x4 : 0000000000000000 x3 : 0000000000000000
 x2 : 0000000000000000 x1 : ffff80000cb431b1 x0 : 0000000000000000
 Call trace:
  lock_page+0x28/0x1e0 include/linux/pagemap.h:956
  nilfs_segctor_prepare_write+0x6c/0x21c fs/nilfs2/segment.c:1658
  nilfs_segctor_do_construct+0x9f4/0xee8 fs/nilfs2/segment.c:2068
  nilfs_segctor_construct+0xa0/0x380 fs/nilfs2/segment.c:2375
  nilfs_segctor_thread_construct fs/nilfs2/segment.c:2483 [inline]
  nilfs_segctor_thread+0x180/0x660 fs/nilfs2/segment.c:2566
  kthread+0x12c/0x158 kernel/kthread.c:376
  ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:860

If didn't call nilfs_sufile_alloc() in nilfs_segctor_begin_construction(),
nilfs_sufile_header's sh_last_alloc is not updated. In such case, we will
add a bh in two segbuf->sb_segsum_buffers. And finally cause list error.

Reported-by: syzbot+77e4f005cb899d4268d1@syzkaller.appspotmail.com
Fixes: 9ff05123e3bf ("nilfs2: segment constructor")
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
---
 fs/nilfs2/segment.c | 1 +
 fs/nilfs2/sufile.c  | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index b4cebad21b48..7be632c15f91 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -1371,6 +1371,7 @@ static int nilfs_segctor_extend_segments(struct nilfs_sc_info *sci,
 		sci->sc_segbuf_nblocks += segbuf->sb_rest_blocks;
 
 		/* allocate the next next full segment */
+		nextnextnum = segbuf->sb_segnum;
 		err = nilfs_sufile_alloc(sufile, &nextnextnum);
 		if (unlikely(err))
 			goto failed_segbuf;
diff --git a/fs/nilfs2/sufile.c b/fs/nilfs2/sufile.c
index 77ff8e95421f..853a8212114f 100644
--- a/fs/nilfs2/sufile.c
+++ b/fs/nilfs2/sufile.c
@@ -317,7 +317,7 @@ int nilfs_sufile_alloc(struct inode *sufile, __u64 *segnump)
 		goto out_sem;
 	kaddr = kmap_atomic(header_bh->b_page);
 	header = kaddr + bh_offset(header_bh);
-	last_alloc = le64_to_cpu(header->sh_last_alloc);
+	last_alloc = max(le64_to_cpu(header->sh_last_alloc), *segnump);
 	kunmap_atomic(kaddr);
 
 	nsegments = nilfs_sufile_get_nsegments(sufile);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH] nilfs2: fix NULL pointer dereference in nilfs_segctor_prepare_write()
@ 2022-11-08  2:29 ` Liu Shixin
  0 siblings, 0 replies; 23+ messages in thread
From: Liu Shixin @ 2022-11-08  2:29 UTC (permalink / raw)
  To: Ryusuke Konishi, Andrew Morton
  Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Liu Shixin

Syzbot reported a NULL pointer dereference:

 Unable to handle kernel NULL pointer dereference at virtual address 0000000000000168
 Mem abort info:
   ESR = 0x0000000096000004
   EC = 0x25: DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
   FSC = 0x04: level 0 translation fault
 Data abort info:
   ISV = 0, ISS = 0x00000004
   CM = 0, WnR = 0
 user pgtable: 4k pages, 48-bit VAs, pgdp=0000000108bcf000
 [0000000000000168] pgd=0000000000000000, p4d=0000000000000000
 Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
 Modules linked in:
 CPU: 1 PID: 3032 Comm: segctord Not tainted 6.0.0-rc7-syzkaller-18095-gbbed346d5a96 #0
 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/30/2022
 pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 pc : _compound_head include/linux/page-flags.h:253 [inline]
 pc : lock_page+0x28/0x1e0 include/linux/pagemap.h:958
 lr : lock_page+0x28/0x1e0 include/linux/pagemap.h:956
 sp : ffff80001290bc00
 x29: ffff80001290bc00 x28: ffff80001290bde0 x27: 000000000000001b
 x26: fffffc000330d7c0 x25: ffff0000caa56d68 x24: ffff0000ca9fb1c0
 x23: 0000000000000080 x22: ffff0000ca9fb130 x21: 0000000000000160
 x20: ffff0000c91e10b8 x19: 0000000000000160 x18: 00000000000000c0
 x17: ffff80000dd0b198 x16: ffff80000db49158 x15: ffff0000c3e63500
 x14: 0000000000000000 x13: 00000000ffffffff x12: ffff0000c3e63500
 x11: ff808000095d1a0c x10: 0000000000000000 x9 : 0000000000000000
 x8 : 0000000000000000 x7 : ffff80000856806c x6 : 0000000000000000
 x5 : 0000000000000080 x4 : 0000000000000000 x3 : 0000000000000000
 x2 : 0000000000000000 x1 : ffff80000cb431b1 x0 : 0000000000000000
 Call trace:
  lock_page+0x28/0x1e0 include/linux/pagemap.h:956
  nilfs_segctor_prepare_write+0x6c/0x21c fs/nilfs2/segment.c:1658
  nilfs_segctor_do_construct+0x9f4/0xee8 fs/nilfs2/segment.c:2068
  nilfs_segctor_construct+0xa0/0x380 fs/nilfs2/segment.c:2375
  nilfs_segctor_thread_construct fs/nilfs2/segment.c:2483 [inline]
  nilfs_segctor_thread+0x180/0x660 fs/nilfs2/segment.c:2566
  kthread+0x12c/0x158 kernel/kthread.c:376
  ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:860

If didn't call nilfs_sufile_alloc() in nilfs_segctor_begin_construction(),
nilfs_sufile_header's sh_last_alloc is not updated. In such case, we will
add a bh in two segbuf->sb_segsum_buffers. And finally cause list error.

Reported-by: syzbot+77e4f005cb899d4268d1-Pl5Pbv+GP7P466ipTTIvnc23WoclnBCfAL8bYrjMMd8@public.gmane.org
Fixes: 9ff05123e3bf ("nilfs2: segment constructor")
Signed-off-by: Liu Shixin <liushixin2-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
 fs/nilfs2/segment.c | 1 +
 fs/nilfs2/sufile.c  | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index b4cebad21b48..7be632c15f91 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -1371,6 +1371,7 @@ static int nilfs_segctor_extend_segments(struct nilfs_sc_info *sci,
 		sci->sc_segbuf_nblocks += segbuf->sb_rest_blocks;
 
 		/* allocate the next next full segment */
+		nextnextnum = segbuf->sb_segnum;
 		err = nilfs_sufile_alloc(sufile, &nextnextnum);
 		if (unlikely(err))
 			goto failed_segbuf;
diff --git a/fs/nilfs2/sufile.c b/fs/nilfs2/sufile.c
index 77ff8e95421f..853a8212114f 100644
--- a/fs/nilfs2/sufile.c
+++ b/fs/nilfs2/sufile.c
@@ -317,7 +317,7 @@ int nilfs_sufile_alloc(struct inode *sufile, __u64 *segnump)
 		goto out_sem;
 	kaddr = kmap_atomic(header_bh->b_page);
 	header = kaddr + bh_offset(header_bh);
-	last_alloc = le64_to_cpu(header->sh_last_alloc);
+	last_alloc = max(le64_to_cpu(header->sh_last_alloc), *segnump);
 	kunmap_atomic(kaddr);
 
 	nsegments = nilfs_sufile_get_nsegments(sufile);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH] nilfs2: fix NULL pointer dereference in nilfs_segctor_prepare_write()
@ 2022-11-08  4:41   ` Ryusuke Konishi
  0 siblings, 0 replies; 23+ messages in thread
From: Ryusuke Konishi @ 2022-11-08  4:41 UTC (permalink / raw)
  To: Liu Shixin; +Cc: Andrew Morton, linux-nilfs, linux-kernel

Hi Liu Shixin,

On Tue, Nov 8, 2022 at 10:41 AM Liu Shixin wrote:
>
> Syzbot reported a NULL pointer dereference:
>
>  Unable to handle kernel NULL pointer dereference at virtual address 0000000000000168
>  Mem abort info:
>    ESR = 0x0000000096000004
>    EC = 0x25: DABT (current EL), IL = 32 bits
>    SET = 0, FnV = 0
>    EA = 0, S1PTW = 0
>    FSC = 0x04: level 0 translation fault
>  Data abort info:
>    ISV = 0, ISS = 0x00000004
>    CM = 0, WnR = 0
>  user pgtable: 4k pages, 48-bit VAs, pgdp=0000000108bcf000
>  [0000000000000168] pgd=0000000000000000, p4d=0000000000000000
>  Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
>  Modules linked in:
>  CPU: 1 PID: 3032 Comm: segctord Not tainted 6.0.0-rc7-syzkaller-18095-gbbed346d5a96 #0
>  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/30/2022
>  pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>  pc : _compound_head include/linux/page-flags.h:253 [inline]
>  pc : lock_page+0x28/0x1e0 include/linux/pagemap.h:958
>  lr : lock_page+0x28/0x1e0 include/linux/pagemap.h:956
>  sp : ffff80001290bc00
>  x29: ffff80001290bc00 x28: ffff80001290bde0 x27: 000000000000001b
>  x26: fffffc000330d7c0 x25: ffff0000caa56d68 x24: ffff0000ca9fb1c0
>  x23: 0000000000000080 x22: ffff0000ca9fb130 x21: 0000000000000160
>  x20: ffff0000c91e10b8 x19: 0000000000000160 x18: 00000000000000c0
>  x17: ffff80000dd0b198 x16: ffff80000db49158 x15: ffff0000c3e63500
>  x14: 0000000000000000 x13: 00000000ffffffff x12: ffff0000c3e63500
>  x11: ff808000095d1a0c x10: 0000000000000000 x9 : 0000000000000000
>  x8 : 0000000000000000 x7 : ffff80000856806c x6 : 0000000000000000
>  x5 : 0000000000000080 x4 : 0000000000000000 x3 : 0000000000000000
>  x2 : 0000000000000000 x1 : ffff80000cb431b1 x0 : 0000000000000000
>  Call trace:
>   lock_page+0x28/0x1e0 include/linux/pagemap.h:956
>   nilfs_segctor_prepare_write+0x6c/0x21c fs/nilfs2/segment.c:1658
>   nilfs_segctor_do_construct+0x9f4/0xee8 fs/nilfs2/segment.c:2068
>   nilfs_segctor_construct+0xa0/0x380 fs/nilfs2/segment.c:2375
>   nilfs_segctor_thread_construct fs/nilfs2/segment.c:2483 [inline]
>   nilfs_segctor_thread+0x180/0x660 fs/nilfs2/segment.c:2566
>   kthread+0x12c/0x158 kernel/kthread.c:376
>   ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:860
>
> If didn't call nilfs_sufile_alloc() in nilfs_segctor_begin_construction(),
> nilfs_sufile_header's sh_last_alloc is not updated. In such case, we will
> add a bh in two segbuf->sb_segsum_buffers. And finally cause list error.
>
> Reported-by: syzbot+77e4f005cb899d4268d1@syzkaller.appspotmail.com
> Fixes: 9ff05123e3bf ("nilfs2: segment constructor")
> Signed-off-by: Liu Shixin <liushixin2@huawei.com>
> ---
>  fs/nilfs2/segment.c | 1 +
>  fs/nilfs2/sufile.c  | 2 +-
>  2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
> index b4cebad21b48..7be632c15f91 100644
> --- a/fs/nilfs2/segment.c
> +++ b/fs/nilfs2/segment.c
> @@ -1371,6 +1371,7 @@ static int nilfs_segctor_extend_segments(struct nilfs_sc_info *sci,
>                 sci->sc_segbuf_nblocks += segbuf->sb_rest_blocks;
>
>                 /* allocate the next next full segment */
> +               nextnextnum = segbuf->sb_segnum;
>                 err = nilfs_sufile_alloc(sufile, &nextnextnum);
>                 if (unlikely(err))
>                         goto failed_segbuf;
> diff --git a/fs/nilfs2/sufile.c b/fs/nilfs2/sufile.c
> index 77ff8e95421f..853a8212114f 100644
> --- a/fs/nilfs2/sufile.c
> +++ b/fs/nilfs2/sufile.c
> @@ -317,7 +317,7 @@ int nilfs_sufile_alloc(struct inode *sufile, __u64 *segnump)
>                 goto out_sem;
>         kaddr = kmap_atomic(header_bh->b_page);
>         header = kaddr + bh_offset(header_bh);
> -       last_alloc = le64_to_cpu(header->sh_last_alloc);
> +       last_alloc = max(le64_to_cpu(header->sh_last_alloc), *segnump);
>         kunmap_atomic(kaddr);
>
>         nsegments = nilfs_sufile_get_nsegments(sufile);
> --
> 2.25.1

Thank you for your help.   I have a few questions, so I'll ask them below.

> If didn't call nilfs_sufile_alloc() in nilfs_segctor_begin_construction(),
> nilfs_sufile_header's sh_last_alloc is not updated. In such case, we will
> add a bh in two segbuf->sb_segsum_buffers.

If nilfs_sufile_alloc() succeeds to allocate a segment, sh_last_alloc
is updated.
all segment allocation must be done through nilfs_sufile_alloc().
And, the allocated segment is marked dirty on the sufile not to be
reallocated until it's freed.

So, why is it happening that the same segment is allocated twice in a log ?
Is it hard to fix the problem by correcting the calling sequence of
nilfs_sufile_alloc()/free()/etc without touching nilfs_sufile_alloc()
?

I haven't looked closely at this patch yet, but I'm concerned about
the impact on other places as well.
nilfs_sufile_alloc() is also used in
nilfs_segctor_begin_construction() and
nilfs_prepare_segment_for_recovery().  Are there any side effects?

This patch turns an output-only argument into both input and output,
and that input value is always used in the calculation of
"last_alloc".
So, this change requires all callers to pass a meaningful initial
value (at least a valid value) to *segnump.

Another question, will this work near the end of the segments ?
Since segments are used cyclically, wouldn't comparison with the max
function break down there?
(I mean it seems that sh_last_alloc may be chosen unintentionally at the end.)

Regards,
Ryusuke Konishi

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] nilfs2: fix NULL pointer dereference in nilfs_segctor_prepare_write()
@ 2022-11-08  4:41   ` Ryusuke Konishi
  0 siblings, 0 replies; 23+ messages in thread
From: Ryusuke Konishi @ 2022-11-08  4:41 UTC (permalink / raw)
  To: Liu Shixin
  Cc: Andrew Morton, linux-nilfs-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

Hi Liu Shixin,

On Tue, Nov 8, 2022 at 10:41 AM Liu Shixin wrote:
>
> Syzbot reported a NULL pointer dereference:
>
>  Unable to handle kernel NULL pointer dereference at virtual address 0000000000000168
>  Mem abort info:
>    ESR = 0x0000000096000004
>    EC = 0x25: DABT (current EL), IL = 32 bits
>    SET = 0, FnV = 0
>    EA = 0, S1PTW = 0
>    FSC = 0x04: level 0 translation fault
>  Data abort info:
>    ISV = 0, ISS = 0x00000004
>    CM = 0, WnR = 0
>  user pgtable: 4k pages, 48-bit VAs, pgdp=0000000108bcf000
>  [0000000000000168] pgd=0000000000000000, p4d=0000000000000000
>  Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
>  Modules linked in:
>  CPU: 1 PID: 3032 Comm: segctord Not tainted 6.0.0-rc7-syzkaller-18095-gbbed346d5a96 #0
>  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/30/2022
>  pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>  pc : _compound_head include/linux/page-flags.h:253 [inline]
>  pc : lock_page+0x28/0x1e0 include/linux/pagemap.h:958
>  lr : lock_page+0x28/0x1e0 include/linux/pagemap.h:956
>  sp : ffff80001290bc00
>  x29: ffff80001290bc00 x28: ffff80001290bde0 x27: 000000000000001b
>  x26: fffffc000330d7c0 x25: ffff0000caa56d68 x24: ffff0000ca9fb1c0
>  x23: 0000000000000080 x22: ffff0000ca9fb130 x21: 0000000000000160
>  x20: ffff0000c91e10b8 x19: 0000000000000160 x18: 00000000000000c0
>  x17: ffff80000dd0b198 x16: ffff80000db49158 x15: ffff0000c3e63500
>  x14: 0000000000000000 x13: 00000000ffffffff x12: ffff0000c3e63500
>  x11: ff808000095d1a0c x10: 0000000000000000 x9 : 0000000000000000
>  x8 : 0000000000000000 x7 : ffff80000856806c x6 : 0000000000000000
>  x5 : 0000000000000080 x4 : 0000000000000000 x3 : 0000000000000000
>  x2 : 0000000000000000 x1 : ffff80000cb431b1 x0 : 0000000000000000
>  Call trace:
>   lock_page+0x28/0x1e0 include/linux/pagemap.h:956
>   nilfs_segctor_prepare_write+0x6c/0x21c fs/nilfs2/segment.c:1658
>   nilfs_segctor_do_construct+0x9f4/0xee8 fs/nilfs2/segment.c:2068
>   nilfs_segctor_construct+0xa0/0x380 fs/nilfs2/segment.c:2375
>   nilfs_segctor_thread_construct fs/nilfs2/segment.c:2483 [inline]
>   nilfs_segctor_thread+0x180/0x660 fs/nilfs2/segment.c:2566
>   kthread+0x12c/0x158 kernel/kthread.c:376
>   ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:860
>
> If didn't call nilfs_sufile_alloc() in nilfs_segctor_begin_construction(),
> nilfs_sufile_header's sh_last_alloc is not updated. In such case, we will
> add a bh in two segbuf->sb_segsum_buffers. And finally cause list error.
>
> Reported-by: syzbot+77e4f005cb899d4268d1-Pl5Pbv+GP7P466ipTTIvnc23WoclnBCfAL8bYrjMMd8@public.gmane.org
> Fixes: 9ff05123e3bf ("nilfs2: segment constructor")
> Signed-off-by: Liu Shixin <liushixin2-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
> ---
>  fs/nilfs2/segment.c | 1 +
>  fs/nilfs2/sufile.c  | 2 +-
>  2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
> index b4cebad21b48..7be632c15f91 100644
> --- a/fs/nilfs2/segment.c
> +++ b/fs/nilfs2/segment.c
> @@ -1371,6 +1371,7 @@ static int nilfs_segctor_extend_segments(struct nilfs_sc_info *sci,
>                 sci->sc_segbuf_nblocks += segbuf->sb_rest_blocks;
>
>                 /* allocate the next next full segment */
> +               nextnextnum = segbuf->sb_segnum;
>                 err = nilfs_sufile_alloc(sufile, &nextnextnum);
>                 if (unlikely(err))
>                         goto failed_segbuf;
> diff --git a/fs/nilfs2/sufile.c b/fs/nilfs2/sufile.c
> index 77ff8e95421f..853a8212114f 100644
> --- a/fs/nilfs2/sufile.c
> +++ b/fs/nilfs2/sufile.c
> @@ -317,7 +317,7 @@ int nilfs_sufile_alloc(struct inode *sufile, __u64 *segnump)
>                 goto out_sem;
>         kaddr = kmap_atomic(header_bh->b_page);
>         header = kaddr + bh_offset(header_bh);
> -       last_alloc = le64_to_cpu(header->sh_last_alloc);
> +       last_alloc = max(le64_to_cpu(header->sh_last_alloc), *segnump);
>         kunmap_atomic(kaddr);
>
>         nsegments = nilfs_sufile_get_nsegments(sufile);
> --
> 2.25.1

Thank you for your help.   I have a few questions, so I'll ask them below.

> If didn't call nilfs_sufile_alloc() in nilfs_segctor_begin_construction(),
> nilfs_sufile_header's sh_last_alloc is not updated. In such case, we will
> add a bh in two segbuf->sb_segsum_buffers.

If nilfs_sufile_alloc() succeeds to allocate a segment, sh_last_alloc
is updated.
all segment allocation must be done through nilfs_sufile_alloc().
And, the allocated segment is marked dirty on the sufile not to be
reallocated until it's freed.

So, why is it happening that the same segment is allocated twice in a log ?
Is it hard to fix the problem by correcting the calling sequence of
nilfs_sufile_alloc()/free()/etc without touching nilfs_sufile_alloc()
?

I haven't looked closely at this patch yet, but I'm concerned about
the impact on other places as well.
nilfs_sufile_alloc() is also used in
nilfs_segctor_begin_construction() and
nilfs_prepare_segment_for_recovery().  Are there any side effects?

This patch turns an output-only argument into both input and output,
and that input value is always used in the calculation of
"last_alloc".
So, this change requires all callers to pass a meaningful initial
value (at least a valid value) to *segnump.

Another question, will this work near the end of the segments ?
Since segments are used cyclically, wouldn't comparison with the max
function break down there?
(I mean it seems that sh_last_alloc may be chosen unintentionally at the end.)

Regards,
Ryusuke Konishi

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] nilfs2: fix NULL pointer dereference in nilfs_segctor_prepare_write()
@ 2022-11-08  6:19     ` Ryusuke Konishi
  0 siblings, 0 replies; 23+ messages in thread
From: Ryusuke Konishi @ 2022-11-08  6:19 UTC (permalink / raw)
  To: Liu Shixin; +Cc: Andrew Morton, linux-nilfs, linux-kernel

On Tue, Nov 8, 2022 at 1:41 PM Ryusuke Konishi wrote:
>
> Hi Liu Shixin,
>
> On Tue, Nov 8, 2022 at 10:41 AM Liu Shixin wrote:
> >
> > Syzbot reported a NULL pointer dereference:
> >
> >  Unable to handle kernel NULL pointer dereference at virtual address 0000000000000168
> >  Mem abort info:
> >    ESR = 0x0000000096000004
> >    EC = 0x25: DABT (current EL), IL = 32 bits
> >    SET = 0, FnV = 0
> >    EA = 0, S1PTW = 0
> >    FSC = 0x04: level 0 translation fault
> >  Data abort info:
> >    ISV = 0, ISS = 0x00000004
> >    CM = 0, WnR = 0
> >  user pgtable: 4k pages, 48-bit VAs, pgdp=0000000108bcf000
> >  [0000000000000168] pgd=0000000000000000, p4d=0000000000000000
> >  Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> >  Modules linked in:
> >  CPU: 1 PID: 3032 Comm: segctord Not tainted 6.0.0-rc7-syzkaller-18095-gbbed346d5a96 #0
> >  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/30/2022
> >  pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> >  pc : _compound_head include/linux/page-flags.h:253 [inline]
> >  pc : lock_page+0x28/0x1e0 include/linux/pagemap.h:958
> >  lr : lock_page+0x28/0x1e0 include/linux/pagemap.h:956
> >  sp : ffff80001290bc00
> >  x29: ffff80001290bc00 x28: ffff80001290bde0 x27: 000000000000001b
> >  x26: fffffc000330d7c0 x25: ffff0000caa56d68 x24: ffff0000ca9fb1c0
> >  x23: 0000000000000080 x22: ffff0000ca9fb130 x21: 0000000000000160
> >  x20: ffff0000c91e10b8 x19: 0000000000000160 x18: 00000000000000c0
> >  x17: ffff80000dd0b198 x16: ffff80000db49158 x15: ffff0000c3e63500
> >  x14: 0000000000000000 x13: 00000000ffffffff x12: ffff0000c3e63500
> >  x11: ff808000095d1a0c x10: 0000000000000000 x9 : 0000000000000000
> >  x8 : 0000000000000000 x7 : ffff80000856806c x6 : 0000000000000000
> >  x5 : 0000000000000080 x4 : 0000000000000000 x3 : 0000000000000000
> >  x2 : 0000000000000000 x1 : ffff80000cb431b1 x0 : 0000000000000000
> >  Call trace:
> >   lock_page+0x28/0x1e0 include/linux/pagemap.h:956
> >   nilfs_segctor_prepare_write+0x6c/0x21c fs/nilfs2/segment.c:1658
> >   nilfs_segctor_do_construct+0x9f4/0xee8 fs/nilfs2/segment.c:2068
> >   nilfs_segctor_construct+0xa0/0x380 fs/nilfs2/segment.c:2375
> >   nilfs_segctor_thread_construct fs/nilfs2/segment.c:2483 [inline]
> >   nilfs_segctor_thread+0x180/0x660 fs/nilfs2/segment.c:2566
> >   kthread+0x12c/0x158 kernel/kthread.c:376
> >   ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:860
> >
> > If didn't call nilfs_sufile_alloc() in nilfs_segctor_begin_construction(),
> > nilfs_sufile_header's sh_last_alloc is not updated. In such case, we will
> > add a bh in two segbuf->sb_segsum_buffers. And finally cause list error.
> >
> > Reported-by: syzbot+77e4f005cb899d4268d1@syzkaller.appspotmail.com
> > Fixes: 9ff05123e3bf ("nilfs2: segment constructor")
> > Signed-off-by: Liu Shixin <liushixin2@huawei.com>
> > ---
> >  fs/nilfs2/segment.c | 1 +
> >  fs/nilfs2/sufile.c  | 2 +-
> >  2 files changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
> > index b4cebad21b48..7be632c15f91 100644
> > --- a/fs/nilfs2/segment.c
> > +++ b/fs/nilfs2/segment.c
> > @@ -1371,6 +1371,7 @@ static int nilfs_segctor_extend_segments(struct nilfs_sc_info *sci,
> >                 sci->sc_segbuf_nblocks += segbuf->sb_rest_blocks;
> >
> >                 /* allocate the next next full segment */
> > +               nextnextnum = segbuf->sb_segnum;
> >                 err = nilfs_sufile_alloc(sufile, &nextnextnum);
> >                 if (unlikely(err))
> >                         goto failed_segbuf;
> > diff --git a/fs/nilfs2/sufile.c b/fs/nilfs2/sufile.c
> > index 77ff8e95421f..853a8212114f 100644
> > --- a/fs/nilfs2/sufile.c
> > +++ b/fs/nilfs2/sufile.c
> > @@ -317,7 +317,7 @@ int nilfs_sufile_alloc(struct inode *sufile, __u64 *segnump)
> >                 goto out_sem;
> >         kaddr = kmap_atomic(header_bh->b_page);
> >         header = kaddr + bh_offset(header_bh);
> > -       last_alloc = le64_to_cpu(header->sh_last_alloc);
> > +       last_alloc = max(le64_to_cpu(header->sh_last_alloc), *segnump);
> >         kunmap_atomic(kaddr);
> >
> >         nsegments = nilfs_sufile_get_nsegments(sufile);
> > --
> > 2.25.1
>
> Thank you for your help.   I have a few questions, so I'll ask them below.
>
> > If didn't call nilfs_sufile_alloc() in nilfs_segctor_begin_construction(),
> > nilfs_sufile_header's sh_last_alloc is not updated. In such case, we will
> > add a bh in two segbuf->sb_segsum_buffers.
>
> If nilfs_sufile_alloc() succeeds to allocate a segment, sh_last_alloc
> is updated.
> all segment allocation must be done through nilfs_sufile_alloc().
> And, the allocated segment is marked dirty on the sufile not to be
> reallocated until it's freed.
>
> So, why is it happening that the same segment is allocated twice in a log ?
> Is it hard to fix the problem by correcting the calling sequence of
> nilfs_sufile_alloc()/free()/etc without touching nilfs_sufile_alloc() ?

If the cause of this problem is a sufile state inconsistency on disk,
rather than an imperfection in the call sequence, then some sanity
check should be added to treat the inconsistency as an error.
Since this came from syzbot, I suspect that it may have created an
on-disk metadata anomaly and hit this.

For example, if the next (or next next) segment that the latest
segment summary points to is not in the allocated (dirty) state and it
is missed and is causing this oops, I think we should change it to
check for that at mount time.

Either way, I feel like we need to look a little more at the root cause.

Ryusuke Konishi

>
> I haven't looked closely at this patch yet, but I'm concerned about
> the impact on other places as well.
> nilfs_sufile_alloc() is also used in
> nilfs_segctor_begin_construction() and
> nilfs_prepare_segment_for_recovery().  Are there any side effects?
>
> This patch turns an output-only argument into both input and output,
> and that input value is always used in the calculation of
> "last_alloc".
> So, this change requires all callers to pass a meaningful initial
> value (at least a valid value) to *segnump.
>
> Another question, will this work near the end of the segments ?
> Since segments are used cyclically, wouldn't comparison with the max
> function break down there?
> (I mean it seems that sh_last_alloc may be chosen unintentionally at the end.)
>
> Regards,
> Ryusuke Konishi

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] nilfs2: fix NULL pointer dereference in nilfs_segctor_prepare_write()
@ 2022-11-08  6:19     ` Ryusuke Konishi
  0 siblings, 0 replies; 23+ messages in thread
From: Ryusuke Konishi @ 2022-11-08  6:19 UTC (permalink / raw)
  To: Liu Shixin
  Cc: Andrew Morton, linux-nilfs-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Tue, Nov 8, 2022 at 1:41 PM Ryusuke Konishi wrote:
>
> Hi Liu Shixin,
>
> On Tue, Nov 8, 2022 at 10:41 AM Liu Shixin wrote:
> >
> > Syzbot reported a NULL pointer dereference:
> >
> >  Unable to handle kernel NULL pointer dereference at virtual address 0000000000000168
> >  Mem abort info:
> >    ESR = 0x0000000096000004
> >    EC = 0x25: DABT (current EL), IL = 32 bits
> >    SET = 0, FnV = 0
> >    EA = 0, S1PTW = 0
> >    FSC = 0x04: level 0 translation fault
> >  Data abort info:
> >    ISV = 0, ISS = 0x00000004
> >    CM = 0, WnR = 0
> >  user pgtable: 4k pages, 48-bit VAs, pgdp=0000000108bcf000
> >  [0000000000000168] pgd=0000000000000000, p4d=0000000000000000
> >  Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> >  Modules linked in:
> >  CPU: 1 PID: 3032 Comm: segctord Not tainted 6.0.0-rc7-syzkaller-18095-gbbed346d5a96 #0
> >  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/30/2022
> >  pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> >  pc : _compound_head include/linux/page-flags.h:253 [inline]
> >  pc : lock_page+0x28/0x1e0 include/linux/pagemap.h:958
> >  lr : lock_page+0x28/0x1e0 include/linux/pagemap.h:956
> >  sp : ffff80001290bc00
> >  x29: ffff80001290bc00 x28: ffff80001290bde0 x27: 000000000000001b
> >  x26: fffffc000330d7c0 x25: ffff0000caa56d68 x24: ffff0000ca9fb1c0
> >  x23: 0000000000000080 x22: ffff0000ca9fb130 x21: 0000000000000160
> >  x20: ffff0000c91e10b8 x19: 0000000000000160 x18: 00000000000000c0
> >  x17: ffff80000dd0b198 x16: ffff80000db49158 x15: ffff0000c3e63500
> >  x14: 0000000000000000 x13: 00000000ffffffff x12: ffff0000c3e63500
> >  x11: ff808000095d1a0c x10: 0000000000000000 x9 : 0000000000000000
> >  x8 : 0000000000000000 x7 : ffff80000856806c x6 : 0000000000000000
> >  x5 : 0000000000000080 x4 : 0000000000000000 x3 : 0000000000000000
> >  x2 : 0000000000000000 x1 : ffff80000cb431b1 x0 : 0000000000000000
> >  Call trace:
> >   lock_page+0x28/0x1e0 include/linux/pagemap.h:956
> >   nilfs_segctor_prepare_write+0x6c/0x21c fs/nilfs2/segment.c:1658
> >   nilfs_segctor_do_construct+0x9f4/0xee8 fs/nilfs2/segment.c:2068
> >   nilfs_segctor_construct+0xa0/0x380 fs/nilfs2/segment.c:2375
> >   nilfs_segctor_thread_construct fs/nilfs2/segment.c:2483 [inline]
> >   nilfs_segctor_thread+0x180/0x660 fs/nilfs2/segment.c:2566
> >   kthread+0x12c/0x158 kernel/kthread.c:376
> >   ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:860
> >
> > If didn't call nilfs_sufile_alloc() in nilfs_segctor_begin_construction(),
> > nilfs_sufile_header's sh_last_alloc is not updated. In such case, we will
> > add a bh in two segbuf->sb_segsum_buffers. And finally cause list error.
> >
> > Reported-by: syzbot+77e4f005cb899d4268d1-Pl5Pbv+GP7P466ipTTIvnc23WoclnBCfAL8bYrjMMd8@public.gmane.org
> > Fixes: 9ff05123e3bf ("nilfs2: segment constructor")
> > Signed-off-by: Liu Shixin <liushixin2-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
> > ---
> >  fs/nilfs2/segment.c | 1 +
> >  fs/nilfs2/sufile.c  | 2 +-
> >  2 files changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
> > index b4cebad21b48..7be632c15f91 100644
> > --- a/fs/nilfs2/segment.c
> > +++ b/fs/nilfs2/segment.c
> > @@ -1371,6 +1371,7 @@ static int nilfs_segctor_extend_segments(struct nilfs_sc_info *sci,
> >                 sci->sc_segbuf_nblocks += segbuf->sb_rest_blocks;
> >
> >                 /* allocate the next next full segment */
> > +               nextnextnum = segbuf->sb_segnum;
> >                 err = nilfs_sufile_alloc(sufile, &nextnextnum);
> >                 if (unlikely(err))
> >                         goto failed_segbuf;
> > diff --git a/fs/nilfs2/sufile.c b/fs/nilfs2/sufile.c
> > index 77ff8e95421f..853a8212114f 100644
> > --- a/fs/nilfs2/sufile.c
> > +++ b/fs/nilfs2/sufile.c
> > @@ -317,7 +317,7 @@ int nilfs_sufile_alloc(struct inode *sufile, __u64 *segnump)
> >                 goto out_sem;
> >         kaddr = kmap_atomic(header_bh->b_page);
> >         header = kaddr + bh_offset(header_bh);
> > -       last_alloc = le64_to_cpu(header->sh_last_alloc);
> > +       last_alloc = max(le64_to_cpu(header->sh_last_alloc), *segnump);
> >         kunmap_atomic(kaddr);
> >
> >         nsegments = nilfs_sufile_get_nsegments(sufile);
> > --
> > 2.25.1
>
> Thank you for your help.   I have a few questions, so I'll ask them below.
>
> > If didn't call nilfs_sufile_alloc() in nilfs_segctor_begin_construction(),
> > nilfs_sufile_header's sh_last_alloc is not updated. In such case, we will
> > add a bh in two segbuf->sb_segsum_buffers.
>
> If nilfs_sufile_alloc() succeeds to allocate a segment, sh_last_alloc
> is updated.
> all segment allocation must be done through nilfs_sufile_alloc().
> And, the allocated segment is marked dirty on the sufile not to be
> reallocated until it's freed.
>
> So, why is it happening that the same segment is allocated twice in a log ?
> Is it hard to fix the problem by correcting the calling sequence of
> nilfs_sufile_alloc()/free()/etc without touching nilfs_sufile_alloc() ?

If the cause of this problem is a sufile state inconsistency on disk,
rather than an imperfection in the call sequence, then some sanity
check should be added to treat the inconsistency as an error.
Since this came from syzbot, I suspect that it may have created an
on-disk metadata anomaly and hit this.

For example, if the next (or next next) segment that the latest
segment summary points to is not in the allocated (dirty) state and it
is missed and is causing this oops, I think we should change it to
check for that at mount time.

Either way, I feel like we need to look a little more at the root cause.

Ryusuke Konishi

>
> I haven't looked closely at this patch yet, but I'm concerned about
> the impact on other places as well.
> nilfs_sufile_alloc() is also used in
> nilfs_segctor_begin_construction() and
> nilfs_prepare_segment_for_recovery().  Are there any side effects?
>
> This patch turns an output-only argument into both input and output,
> and that input value is always used in the calculation of
> "last_alloc".
> So, this change requires all callers to pass a meaningful initial
> value (at least a valid value) to *segnump.
>
> Another question, will this work near the end of the segments ?
> Since segments are used cyclically, wouldn't comparison with the max
> function break down there?
> (I mean it seems that sh_last_alloc may be chosen unintentionally at the end.)
>
> Regards,
> Ryusuke Konishi

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] nilfs2: fix NULL pointer dereference in nilfs_segctor_prepare_write()
@ 2022-11-08  6:49     ` Liu Shixin
  0 siblings, 0 replies; 23+ messages in thread
From: Liu Shixin @ 2022-11-08  6:49 UTC (permalink / raw)
  To: Ryusuke Konishi; +Cc: Andrew Morton, linux-nilfs, linux-kernel



On 2022/11/8 12:41, Ryusuke Konishi wrote:
> Hi Liu Shixin,
>
> On Tue, Nov 8, 2022 at 10:41 AM Liu Shixin wrote:
>> Syzbot reported a NULL pointer dereference:
>>
>>  Unable to handle kernel NULL pointer dereference at virtual address 0000000000000168
>>  Mem abort info:
>>    ESR = 0x0000000096000004
>>    EC = 0x25: DABT (current EL), IL = 32 bits
>>    SET = 0, FnV = 0
>>    EA = 0, S1PTW = 0
>>    FSC = 0x04: level 0 translation fault
>>  Data abort info:
>>    ISV = 0, ISS = 0x00000004
>>    CM = 0, WnR = 0
>>  user pgtable: 4k pages, 48-bit VAs, pgdp=0000000108bcf000
>>  [0000000000000168] pgd=0000000000000000, p4d=0000000000000000
>>  Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
>>  Modules linked in:
>>  CPU: 1 PID: 3032 Comm: segctord Not tainted 6.0.0-rc7-syzkaller-18095-gbbed346d5a96 #0
>>  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/30/2022
>>  pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>  pc : _compound_head include/linux/page-flags.h:253 [inline]
>>  pc : lock_page+0x28/0x1e0 include/linux/pagemap.h:958
>>  lr : lock_page+0x28/0x1e0 include/linux/pagemap.h:956
>>  sp : ffff80001290bc00
>>  x29: ffff80001290bc00 x28: ffff80001290bde0 x27: 000000000000001b
>>  x26: fffffc000330d7c0 x25: ffff0000caa56d68 x24: ffff0000ca9fb1c0
>>  x23: 0000000000000080 x22: ffff0000ca9fb130 x21: 0000000000000160
>>  x20: ffff0000c91e10b8 x19: 0000000000000160 x18: 00000000000000c0
>>  x17: ffff80000dd0b198 x16: ffff80000db49158 x15: ffff0000c3e63500
>>  x14: 0000000000000000 x13: 00000000ffffffff x12: ffff0000c3e63500
>>  x11: ff808000095d1a0c x10: 0000000000000000 x9 : 0000000000000000
>>  x8 : 0000000000000000 x7 : ffff80000856806c x6 : 0000000000000000
>>  x5 : 0000000000000080 x4 : 0000000000000000 x3 : 0000000000000000
>>  x2 : 0000000000000000 x1 : ffff80000cb431b1 x0 : 0000000000000000
>>  Call trace:
>>   lock_page+0x28/0x1e0 include/linux/pagemap.h:956
>>   nilfs_segctor_prepare_write+0x6c/0x21c fs/nilfs2/segment.c:1658
>>   nilfs_segctor_do_construct+0x9f4/0xee8 fs/nilfs2/segment.c:2068
>>   nilfs_segctor_construct+0xa0/0x380 fs/nilfs2/segment.c:2375
>>   nilfs_segctor_thread_construct fs/nilfs2/segment.c:2483 [inline]
>>   nilfs_segctor_thread+0x180/0x660 fs/nilfs2/segment.c:2566
>>   kthread+0x12c/0x158 kernel/kthread.c:376
>>   ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:860
>>
>> If didn't call nilfs_sufile_alloc() in nilfs_segctor_begin_construction(),
>> nilfs_sufile_header's sh_last_alloc is not updated. In such case, we will
>> add a bh in two segbuf->sb_segsum_buffers. And finally cause list error.
>>
>> Reported-by: syzbot+77e4f005cb899d4268d1@syzkaller.appspotmail.com
>> Fixes: 9ff05123e3bf ("nilfs2: segment constructor")
>> Signed-off-by: Liu Shixin <liushixin2@huawei.com>
>> ---
>>  fs/nilfs2/segment.c | 1 +
>>  fs/nilfs2/sufile.c  | 2 +-
>>  2 files changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
>> index b4cebad21b48..7be632c15f91 100644
>> --- a/fs/nilfs2/segment.c
>> +++ b/fs/nilfs2/segment.c
>> @@ -1371,6 +1371,7 @@ static int nilfs_segctor_extend_segments(struct nilfs_sc_info *sci,
>>                 sci->sc_segbuf_nblocks += segbuf->sb_rest_blocks;
>>
>>                 /* allocate the next next full segment */
>> +               nextnextnum = segbuf->sb_segnum;
>>                 err = nilfs_sufile_alloc(sufile, &nextnextnum);
>>                 if (unlikely(err))
>>                         goto failed_segbuf;
>> diff --git a/fs/nilfs2/sufile.c b/fs/nilfs2/sufile.c
>> index 77ff8e95421f..853a8212114f 100644
>> --- a/fs/nilfs2/sufile.c
>> +++ b/fs/nilfs2/sufile.c
>> @@ -317,7 +317,7 @@ int nilfs_sufile_alloc(struct inode *sufile, __u64 *segnump)
>>                 goto out_sem;
>>         kaddr = kmap_atomic(header_bh->b_page);
>>         header = kaddr + bh_offset(header_bh);
>> -       last_alloc = le64_to_cpu(header->sh_last_alloc);
>> +       last_alloc = max(le64_to_cpu(header->sh_last_alloc), *segnump);
>>         kunmap_atomic(kaddr);
>>
>>         nsegments = nilfs_sufile_get_nsegments(sufile);
>> --
>> 2.25.1
> Thank you for your help.   I have a few questions, so I'll ask them below.
>
>> If didn't call nilfs_sufile_alloc() in nilfs_segctor_begin_construction(),
>> nilfs_sufile_header's sh_last_alloc is not updated. In such case, we will
>> add a bh in two segbuf->sb_segsum_buffers.
> If nilfs_sufile_alloc() succeeds to allocate a segment, sh_last_alloc
> is updated.
> all segment allocation must be done through nilfs_sufile_alloc().
> And, the allocated segment is marked dirty on the sufile not to be
> reallocated until it's freed.
>
> So, why is it happening that the same segment is allocated twice in a log ?
> Is it hard to fix the problem by correcting the calling sequence of
> nilfs_sufile_alloc()/free()/etc without touching nilfs_sufile_alloc()
> ?
The problem happened when we call nilfs_segctor_begin_construction() and satisfied
condition nilfs->ns_segnum != nilfs->ns_nextnum. In such scenario, nilfs_sufile_alloc()
will be skipped, but we call nilfs_segbuf_map() and nilfs_segbuf_set_next_segnum()
all the time, so last_alloc is not updated.
Then in nilfs_segctor_extend_segments(), we set sb_segnum by prev->sb_nextnum directly,
and calculate next sb_segnum by nilfs_sufile_alloc(), since last_alloc is not updated,
we will get sb_segnum again.

By the way, I still don't understand why skip nilfs_sufile_alloc() in some cases and why
nilfs->ns_segnum != nilfs->ns_nextnum. Do you have any ideas?
>
> I haven't looked closely at this patch yet, but I'm concerned about
> the impact on other places as well.
> nilfs_sufile_alloc() is also used in
> nilfs_segctor_begin_construction() and
> nilfs_prepare_segment_for_recovery().  Are there any side effects?
>
> This patch turns an output-only argument into both input and output,
> and that input value is always used in the calculation of
> "last_alloc".
> So, this change requires all callers to pass a meaningful initial
> value (at least a valid value) to *segnump.
>
> Another question, will this work near the end of the segments ?
> Since segments are used cyclically, wouldn't comparison with the max
> function break down there?
> (I mean it seems that sh_last_alloc may be chosen unintentionally at the end.)
Thanks for the heads-up,I need to look at it again. This patch can only prevent this problem,
and seems to need improvement. Maybe there is a more reasonable solution.

Thanks,
>
> Regards,
> Ryusuke Konishi
> .
>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] nilfs2: fix NULL pointer dereference in nilfs_segctor_prepare_write()
@ 2022-11-08  6:49     ` Liu Shixin
  0 siblings, 0 replies; 23+ messages in thread
From: Liu Shixin @ 2022-11-08  6:49 UTC (permalink / raw)
  To: Ryusuke Konishi
  Cc: Andrew Morton, linux-nilfs-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA



On 2022/11/8 12:41, Ryusuke Konishi wrote:
> Hi Liu Shixin,
>
> On Tue, Nov 8, 2022 at 10:41 AM Liu Shixin wrote:
>> Syzbot reported a NULL pointer dereference:
>>
>>  Unable to handle kernel NULL pointer dereference at virtual address 0000000000000168
>>  Mem abort info:
>>    ESR = 0x0000000096000004
>>    EC = 0x25: DABT (current EL), IL = 32 bits
>>    SET = 0, FnV = 0
>>    EA = 0, S1PTW = 0
>>    FSC = 0x04: level 0 translation fault
>>  Data abort info:
>>    ISV = 0, ISS = 0x00000004
>>    CM = 0, WnR = 0
>>  user pgtable: 4k pages, 48-bit VAs, pgdp=0000000108bcf000
>>  [0000000000000168] pgd=0000000000000000, p4d=0000000000000000
>>  Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
>>  Modules linked in:
>>  CPU: 1 PID: 3032 Comm: segctord Not tainted 6.0.0-rc7-syzkaller-18095-gbbed346d5a96 #0
>>  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/30/2022
>>  pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>  pc : _compound_head include/linux/page-flags.h:253 [inline]
>>  pc : lock_page+0x28/0x1e0 include/linux/pagemap.h:958
>>  lr : lock_page+0x28/0x1e0 include/linux/pagemap.h:956
>>  sp : ffff80001290bc00
>>  x29: ffff80001290bc00 x28: ffff80001290bde0 x27: 000000000000001b
>>  x26: fffffc000330d7c0 x25: ffff0000caa56d68 x24: ffff0000ca9fb1c0
>>  x23: 0000000000000080 x22: ffff0000ca9fb130 x21: 0000000000000160
>>  x20: ffff0000c91e10b8 x19: 0000000000000160 x18: 00000000000000c0
>>  x17: ffff80000dd0b198 x16: ffff80000db49158 x15: ffff0000c3e63500
>>  x14: 0000000000000000 x13: 00000000ffffffff x12: ffff0000c3e63500
>>  x11: ff808000095d1a0c x10: 0000000000000000 x9 : 0000000000000000
>>  x8 : 0000000000000000 x7 : ffff80000856806c x6 : 0000000000000000
>>  x5 : 0000000000000080 x4 : 0000000000000000 x3 : 0000000000000000
>>  x2 : 0000000000000000 x1 : ffff80000cb431b1 x0 : 0000000000000000
>>  Call trace:
>>   lock_page+0x28/0x1e0 include/linux/pagemap.h:956
>>   nilfs_segctor_prepare_write+0x6c/0x21c fs/nilfs2/segment.c:1658
>>   nilfs_segctor_do_construct+0x9f4/0xee8 fs/nilfs2/segment.c:2068
>>   nilfs_segctor_construct+0xa0/0x380 fs/nilfs2/segment.c:2375
>>   nilfs_segctor_thread_construct fs/nilfs2/segment.c:2483 [inline]
>>   nilfs_segctor_thread+0x180/0x660 fs/nilfs2/segment.c:2566
>>   kthread+0x12c/0x158 kernel/kthread.c:376
>>   ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:860
>>
>> If didn't call nilfs_sufile_alloc() in nilfs_segctor_begin_construction(),
>> nilfs_sufile_header's sh_last_alloc is not updated. In such case, we will
>> add a bh in two segbuf->sb_segsum_buffers. And finally cause list error.
>>
>> Reported-by: syzbot+77e4f005cb899d4268d1-Pl5Pbv+GP7P466ipTTIvnc23WoclnBCfAL8bYrjMMd8@public.gmane.org
>> Fixes: 9ff05123e3bf ("nilfs2: segment constructor")
>> Signed-off-by: Liu Shixin <liushixin2-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
>> ---
>>  fs/nilfs2/segment.c | 1 +
>>  fs/nilfs2/sufile.c  | 2 +-
>>  2 files changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
>> index b4cebad21b48..7be632c15f91 100644
>> --- a/fs/nilfs2/segment.c
>> +++ b/fs/nilfs2/segment.c
>> @@ -1371,6 +1371,7 @@ static int nilfs_segctor_extend_segments(struct nilfs_sc_info *sci,
>>                 sci->sc_segbuf_nblocks += segbuf->sb_rest_blocks;
>>
>>                 /* allocate the next next full segment */
>> +               nextnextnum = segbuf->sb_segnum;
>>                 err = nilfs_sufile_alloc(sufile, &nextnextnum);
>>                 if (unlikely(err))
>>                         goto failed_segbuf;
>> diff --git a/fs/nilfs2/sufile.c b/fs/nilfs2/sufile.c
>> index 77ff8e95421f..853a8212114f 100644
>> --- a/fs/nilfs2/sufile.c
>> +++ b/fs/nilfs2/sufile.c
>> @@ -317,7 +317,7 @@ int nilfs_sufile_alloc(struct inode *sufile, __u64 *segnump)
>>                 goto out_sem;
>>         kaddr = kmap_atomic(header_bh->b_page);
>>         header = kaddr + bh_offset(header_bh);
>> -       last_alloc = le64_to_cpu(header->sh_last_alloc);
>> +       last_alloc = max(le64_to_cpu(header->sh_last_alloc), *segnump);
>>         kunmap_atomic(kaddr);
>>
>>         nsegments = nilfs_sufile_get_nsegments(sufile);
>> --
>> 2.25.1
> Thank you for your help.   I have a few questions, so I'll ask them below.
>
>> If didn't call nilfs_sufile_alloc() in nilfs_segctor_begin_construction(),
>> nilfs_sufile_header's sh_last_alloc is not updated. In such case, we will
>> add a bh in two segbuf->sb_segsum_buffers.
> If nilfs_sufile_alloc() succeeds to allocate a segment, sh_last_alloc
> is updated.
> all segment allocation must be done through nilfs_sufile_alloc().
> And, the allocated segment is marked dirty on the sufile not to be
> reallocated until it's freed.
>
> So, why is it happening that the same segment is allocated twice in a log ?
> Is it hard to fix the problem by correcting the calling sequence of
> nilfs_sufile_alloc()/free()/etc without touching nilfs_sufile_alloc()
> ?
The problem happened when we call nilfs_segctor_begin_construction() and satisfied
condition nilfs->ns_segnum != nilfs->ns_nextnum. In such scenario, nilfs_sufile_alloc()
will be skipped, but we call nilfs_segbuf_map() and nilfs_segbuf_set_next_segnum()
all the time, so last_alloc is not updated.
Then in nilfs_segctor_extend_segments(), we set sb_segnum by prev->sb_nextnum directly,
and calculate next sb_segnum by nilfs_sufile_alloc(), since last_alloc is not updated,
we will get sb_segnum again.

By the way, I still don't understand why skip nilfs_sufile_alloc() in some cases and why
nilfs->ns_segnum != nilfs->ns_nextnum. Do you have any ideas?
>
> I haven't looked closely at this patch yet, but I'm concerned about
> the impact on other places as well.
> nilfs_sufile_alloc() is also used in
> nilfs_segctor_begin_construction() and
> nilfs_prepare_segment_for_recovery().  Are there any side effects?
>
> This patch turns an output-only argument into both input and output,
> and that input value is always used in the calculation of
> "last_alloc".
> So, this change requires all callers to pass a meaningful initial
> value (at least a valid value) to *segnump.
>
> Another question, will this work near the end of the segments ?
> Since segments are used cyclically, wouldn't comparison with the max
> function break down there?
> (I mean it seems that sh_last_alloc may be chosen unintentionally at the end.)
Thanks for the heads-up,I need to look at it again. This patch can only prevent this problem,
and seems to need improvement. Maybe there is a more reasonable solution.

Thanks,
>
> Regards,
> Ryusuke Konishi
> .
>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] nilfs2: fix NULL pointer dereference in nilfs_segctor_prepare_write()
@ 2022-11-08 10:33       ` Ryusuke Konishi
  0 siblings, 0 replies; 23+ messages in thread
From: Ryusuke Konishi @ 2022-11-08 10:33 UTC (permalink / raw)
  To: Liu Shixin; +Cc: Andrew Morton, linux-nilfs, linux-kernel

On Tue, Nov 8, 2022 at 3:49 PM Liu Shixin wrote:
>
>
>
> On 2022/11/8 12:41, Ryusuke Konishi wrote:
> > Hi Liu Shixin,
> >
> > On Tue, Nov 8, 2022 at 10:41 AM Liu Shixin wrote:
> >> Syzbot reported a NULL pointer dereference:
> >>
> >>  Unable to handle kernel NULL pointer dereference at virtual address 0000000000000168
> >>  Mem abort info:
> >>    ESR = 0x0000000096000004
> >>    EC = 0x25: DABT (current EL), IL = 32 bits
> >>    SET = 0, FnV = 0
> >>    EA = 0, S1PTW = 0
> >>    FSC = 0x04: level 0 translation fault
> >>  Data abort info:
> >>    ISV = 0, ISS = 0x00000004
> >>    CM = 0, WnR = 0
> >>  user pgtable: 4k pages, 48-bit VAs, pgdp=0000000108bcf000
> >>  [0000000000000168] pgd=0000000000000000, p4d=0000000000000000
> >>  Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> >>  Modules linked in:
> >>  CPU: 1 PID: 3032 Comm: segctord Not tainted 6.0.0-rc7-syzkaller-18095-gbbed346d5a96 #0
> >>  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/30/2022
> >>  pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> >>  pc : _compound_head include/linux/page-flags.h:253 [inline]
> >>  pc : lock_page+0x28/0x1e0 include/linux/pagemap.h:958
> >>  lr : lock_page+0x28/0x1e0 include/linux/pagemap.h:956
> >>  sp : ffff80001290bc00
> >>  x29: ffff80001290bc00 x28: ffff80001290bde0 x27: 000000000000001b
> >>  x26: fffffc000330d7c0 x25: ffff0000caa56d68 x24: ffff0000ca9fb1c0
> >>  x23: 0000000000000080 x22: ffff0000ca9fb130 x21: 0000000000000160
> >>  x20: ffff0000c91e10b8 x19: 0000000000000160 x18: 00000000000000c0
> >>  x17: ffff80000dd0b198 x16: ffff80000db49158 x15: ffff0000c3e63500
> >>  x14: 0000000000000000 x13: 00000000ffffffff x12: ffff0000c3e63500
> >>  x11: ff808000095d1a0c x10: 0000000000000000 x9 : 0000000000000000
> >>  x8 : 0000000000000000 x7 : ffff80000856806c x6 : 0000000000000000
> >>  x5 : 0000000000000080 x4 : 0000000000000000 x3 : 0000000000000000
> >>  x2 : 0000000000000000 x1 : ffff80000cb431b1 x0 : 0000000000000000
> >>  Call trace:
> >>   lock_page+0x28/0x1e0 include/linux/pagemap.h:956
> >>   nilfs_segctor_prepare_write+0x6c/0x21c fs/nilfs2/segment.c:1658
> >>   nilfs_segctor_do_construct+0x9f4/0xee8 fs/nilfs2/segment.c:2068
> >>   nilfs_segctor_construct+0xa0/0x380 fs/nilfs2/segment.c:2375
> >>   nilfs_segctor_thread_construct fs/nilfs2/segment.c:2483 [inline]
> >>   nilfs_segctor_thread+0x180/0x660 fs/nilfs2/segment.c:2566
> >>   kthread+0x12c/0x158 kernel/kthread.c:376
> >>   ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:860
> >>
> >> If didn't call nilfs_sufile_alloc() in nilfs_segctor_begin_construction(),
> >> nilfs_sufile_header's sh_last_alloc is not updated. In such case, we will
> >> add a bh in two segbuf->sb_segsum_buffers. And finally cause list error.
> >>
> >> Reported-by: syzbot+77e4f005cb899d4268d1@syzkaller.appspotmail.com
> >> Fixes: 9ff05123e3bf ("nilfs2: segment constructor")
> >> Signed-off-by: Liu Shixin <liushixin2@huawei.com>
> >> ---
> >>  fs/nilfs2/segment.c | 1 +
> >>  fs/nilfs2/sufile.c  | 2 +-
> >>  2 files changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
> >> index b4cebad21b48..7be632c15f91 100644
> >> --- a/fs/nilfs2/segment.c
> >> +++ b/fs/nilfs2/segment.c
> >> @@ -1371,6 +1371,7 @@ static int nilfs_segctor_extend_segments(struct nilfs_sc_info *sci,
> >>                 sci->sc_segbuf_nblocks += segbuf->sb_rest_blocks;
> >>
> >>                 /* allocate the next next full segment */
> >> +               nextnextnum = segbuf->sb_segnum;
> >>                 err = nilfs_sufile_alloc(sufile, &nextnextnum);
> >>                 if (unlikely(err))
> >>                         goto failed_segbuf;
> >> diff --git a/fs/nilfs2/sufile.c b/fs/nilfs2/sufile.c
> >> index 77ff8e95421f..853a8212114f 100644
> >> --- a/fs/nilfs2/sufile.c
> >> +++ b/fs/nilfs2/sufile.c
> >> @@ -317,7 +317,7 @@ int nilfs_sufile_alloc(struct inode *sufile, __u64 *segnump)
> >>                 goto out_sem;
> >>         kaddr = kmap_atomic(header_bh->b_page);
> >>         header = kaddr + bh_offset(header_bh);
> >> -       last_alloc = le64_to_cpu(header->sh_last_alloc);
> >> +       last_alloc = max(le64_to_cpu(header->sh_last_alloc), *segnump);
> >>         kunmap_atomic(kaddr);
> >>
> >>         nsegments = nilfs_sufile_get_nsegments(sufile);
> >> --
> >> 2.25.1
> > Thank you for your help.   I have a few questions, so I'll ask them below.
> >
> >> If didn't call nilfs_sufile_alloc() in nilfs_segctor_begin_construction(),
> >> nilfs_sufile_header's sh_last_alloc is not updated. In such case, we will
> >> add a bh in two segbuf->sb_segsum_buffers.
> > If nilfs_sufile_alloc() succeeds to allocate a segment, sh_last_alloc
> > is updated.
> > all segment allocation must be done through nilfs_sufile_alloc().
> > And, the allocated segment is marked dirty on the sufile not to be
> > reallocated until it's freed.
> >
> > So, why is it happening that the same segment is allocated twice in a log ?
> > Is it hard to fix the problem by correcting the calling sequence of
> > nilfs_sufile_alloc()/free()/etc without touching nilfs_sufile_alloc()
> > ?
> The problem happened when we call nilfs_segctor_begin_construction() and satisfied
> condition nilfs->ns_segnum != nilfs->ns_nextnum. In such scenario, nilfs_sufile_alloc()
> will be skipped, but we call nilfs_segbuf_map() and nilfs_segbuf_set_next_segnum()
> all the time, so last_alloc is not updated.
> Then in nilfs_segctor_extend_segments(), we set sb_segnum by prev->sb_nextnum directly,
> and calculate next sb_segnum by nilfs_sufile_alloc(), since last_alloc is not updated,
> we will get sb_segnum again.
>
> By the way, I still don't understand why skip nilfs_sufile_alloc() in some cases and why
> nilfs->ns_segnum != nilfs->ns_nextnum. Do you have any ideas?
> >
> > I haven't looked closely at this patch yet, but I'm concerned about
> > the impact on other places as well.
> > nilfs_sufile_alloc() is also used in
> > nilfs_segctor_begin_construction() and
> > nilfs_prepare_segment_for_recovery().  Are there any side effects?
> >
> > This patch turns an output-only argument into both input and output,
> > and that input value is always used in the calculation of
> > "last_alloc".
> > So, this change requires all callers to pass a meaningful initial
> > value (at least a valid value) to *segnump.
> >
> > Another question, will this work near the end of the segments ?
> > Since segments are used cyclically, wouldn't comparison with the max
> > function break down there?
> > (I mean it seems that sh_last_alloc may be chosen unintentionally at the end.)
> Thanks for the heads-up,I need to look at it again. This patch can only prevent this problem,
> and seems to need improvement. Maybe there is a more reasonable solution.
>
> Thanks,

I will explain step by step.

> By the way, I still don't understand why skip nilfs_sufile_alloc() in some cases and why
> nilfs->ns_segnum != nilfs->ns_nextnum. Do you have any ideas?

nilfs->ns_segnum holds the segment number where nilfs2 currently
writes logs to, and nilfs->ns_nextnum holds the next segment number
where nilfs2 will select to write logs when the current segment gets
full.
Here, a segment is a fixed sized container of logs (*), in which
multiple logs can be appended and written in.
(*) Exceptionally, segment #0 is shorter than others to exclude the
superblock area.

The reason "nextnum" is needed is that mount-time recovery processing
needs to know which segment the logs follow.

Both segments (nilfs->ns_segnum and nilfs->ns_nextnum) must be already
allocated on the sufile.
Here, "allocated" means the corresponding segment usage entry is
marked "dirty" on the sufile.
In nilfs_sufile_alloc() this is done with the following line:

                        nilfs_segment_usage_set_dirty(su);

On the other hand, the test whether the segment is allocatable or not
is done at the following part:

                        if (!nilfs_segment_usage_clean(su))
                                continue;
                        /* found a clean segment */

Coming back to your last question, if nilfs->ns_segnum !=
nilfs->ns_nextnum, it indicates that nilfs2 is still writing a log to
the segment of ns_segnum.
Both should be in the allocated state, so there is no need to call
nilfs_sufile_alloc() in this case.

If nilfs->ns_segnum == nilfs->ns_nextnum in
nilfs_segctor_begin_construction(), then nilfs_shift_to_next_segment()
sets ns_segnum to ns_nextnum because the current segment (ns_segnum)
is full and no more logs can be written.
In this case, a new segment must be allocated for new ns_nextnum,
therefore nilfs_sufile_alloc() will be called.

> The problem happened when we call nilfs_segctor_begin_construction() and satisfied
> condition nilfs->ns_segnum != nilfs->ns_nextnum. In such scenario, nilfs_sufile_alloc()
> will be skipped, but we call nilfs_segbuf_map() and nilfs_segbuf_set_next_segnum()
> all the time, so last_alloc is not updated.

As I explained above, it is expected behavior that
nilfs_sufile_alloc() is not called and last_alloc is not updated if
nilfs->ns_segnum != nilfs->ns_nextnum.

nilfs_segbuf_map() is always called there to set the geometry
information in the segbuf.  This is expected behavior.
And, nilfs_segbuf_set_next_segnum() is called later to set the
determined ns_nextnum to the segbuf, so this is expected behavior as
well.

> Then in nilfs_segctor_extend_segments(), we set sb_segnum by prev->sb_nextnum directly,
> and calculate next sb_segnum by nilfs_sufile_alloc(), since last_alloc is not updated,
> we will get sb_segnum again.

nilfs_segctor_extend_segments() pre-allocates one or more segments
large enough to store updated blocks of metadata files that need to be
written in a series of logs at once, and sets up a chain of segbufs.
(Sorry for the missing function comment).

sb_segnum is set by prev->sb_nextnum to form a chain of buffers for
segments.  This is expected behavior.
And, the sb_nextnum (= next sb_segnum) will be given by
nilfs_sufile_alloc().   This is also expected.
It looks like the problem is that nilfs_sufile_alloc() here allocates
the same segnum again.

Because sb_segnum is set by prev->sb_nextnum which is allocated by the
previous nilfs_sufile_alloc() call,
this usually does not happen.

A possible anomaly is if the segment pointed by the first nextnum (or
segnum) was not marked dirty on sufile.
This may happen if the sufile is corrupted on the test disk image that
syzbot provided (mounted).

Can you confirm if this is actually happening?

Regards,
Ryusuke Konishi

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] nilfs2: fix NULL pointer dereference in nilfs_segctor_prepare_write()
@ 2022-11-08 10:33       ` Ryusuke Konishi
  0 siblings, 0 replies; 23+ messages in thread
From: Ryusuke Konishi @ 2022-11-08 10:33 UTC (permalink / raw)
  To: Liu Shixin
  Cc: Andrew Morton, linux-nilfs-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Tue, Nov 8, 2022 at 3:49 PM Liu Shixin wrote:
>
>
>
> On 2022/11/8 12:41, Ryusuke Konishi wrote:
> > Hi Liu Shixin,
> >
> > On Tue, Nov 8, 2022 at 10:41 AM Liu Shixin wrote:
> >> Syzbot reported a NULL pointer dereference:
> >>
> >>  Unable to handle kernel NULL pointer dereference at virtual address 0000000000000168
> >>  Mem abort info:
> >>    ESR = 0x0000000096000004
> >>    EC = 0x25: DABT (current EL), IL = 32 bits
> >>    SET = 0, FnV = 0
> >>    EA = 0, S1PTW = 0
> >>    FSC = 0x04: level 0 translation fault
> >>  Data abort info:
> >>    ISV = 0, ISS = 0x00000004
> >>    CM = 0, WnR = 0
> >>  user pgtable: 4k pages, 48-bit VAs, pgdp=0000000108bcf000
> >>  [0000000000000168] pgd=0000000000000000, p4d=0000000000000000
> >>  Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> >>  Modules linked in:
> >>  CPU: 1 PID: 3032 Comm: segctord Not tainted 6.0.0-rc7-syzkaller-18095-gbbed346d5a96 #0
> >>  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/30/2022
> >>  pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> >>  pc : _compound_head include/linux/page-flags.h:253 [inline]
> >>  pc : lock_page+0x28/0x1e0 include/linux/pagemap.h:958
> >>  lr : lock_page+0x28/0x1e0 include/linux/pagemap.h:956
> >>  sp : ffff80001290bc00
> >>  x29: ffff80001290bc00 x28: ffff80001290bde0 x27: 000000000000001b
> >>  x26: fffffc000330d7c0 x25: ffff0000caa56d68 x24: ffff0000ca9fb1c0
> >>  x23: 0000000000000080 x22: ffff0000ca9fb130 x21: 0000000000000160
> >>  x20: ffff0000c91e10b8 x19: 0000000000000160 x18: 00000000000000c0
> >>  x17: ffff80000dd0b198 x16: ffff80000db49158 x15: ffff0000c3e63500
> >>  x14: 0000000000000000 x13: 00000000ffffffff x12: ffff0000c3e63500
> >>  x11: ff808000095d1a0c x10: 0000000000000000 x9 : 0000000000000000
> >>  x8 : 0000000000000000 x7 : ffff80000856806c x6 : 0000000000000000
> >>  x5 : 0000000000000080 x4 : 0000000000000000 x3 : 0000000000000000
> >>  x2 : 0000000000000000 x1 : ffff80000cb431b1 x0 : 0000000000000000
> >>  Call trace:
> >>   lock_page+0x28/0x1e0 include/linux/pagemap.h:956
> >>   nilfs_segctor_prepare_write+0x6c/0x21c fs/nilfs2/segment.c:1658
> >>   nilfs_segctor_do_construct+0x9f4/0xee8 fs/nilfs2/segment.c:2068
> >>   nilfs_segctor_construct+0xa0/0x380 fs/nilfs2/segment.c:2375
> >>   nilfs_segctor_thread_construct fs/nilfs2/segment.c:2483 [inline]
> >>   nilfs_segctor_thread+0x180/0x660 fs/nilfs2/segment.c:2566
> >>   kthread+0x12c/0x158 kernel/kthread.c:376
> >>   ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:860
> >>
> >> If didn't call nilfs_sufile_alloc() in nilfs_segctor_begin_construction(),
> >> nilfs_sufile_header's sh_last_alloc is not updated. In such case, we will
> >> add a bh in two segbuf->sb_segsum_buffers. And finally cause list error.
> >>
> >> Reported-by: syzbot+77e4f005cb899d4268d1-Pl5Pbv+GP7P466ipTTIvnc23WoclnBCfAL8bYrjMMd8@public.gmane.org
> >> Fixes: 9ff05123e3bf ("nilfs2: segment constructor")
> >> Signed-off-by: Liu Shixin <liushixin2-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
> >> ---
> >>  fs/nilfs2/segment.c | 1 +
> >>  fs/nilfs2/sufile.c  | 2 +-
> >>  2 files changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
> >> index b4cebad21b48..7be632c15f91 100644
> >> --- a/fs/nilfs2/segment.c
> >> +++ b/fs/nilfs2/segment.c
> >> @@ -1371,6 +1371,7 @@ static int nilfs_segctor_extend_segments(struct nilfs_sc_info *sci,
> >>                 sci->sc_segbuf_nblocks += segbuf->sb_rest_blocks;
> >>
> >>                 /* allocate the next next full segment */
> >> +               nextnextnum = segbuf->sb_segnum;
> >>                 err = nilfs_sufile_alloc(sufile, &nextnextnum);
> >>                 if (unlikely(err))
> >>                         goto failed_segbuf;
> >> diff --git a/fs/nilfs2/sufile.c b/fs/nilfs2/sufile.c
> >> index 77ff8e95421f..853a8212114f 100644
> >> --- a/fs/nilfs2/sufile.c
> >> +++ b/fs/nilfs2/sufile.c
> >> @@ -317,7 +317,7 @@ int nilfs_sufile_alloc(struct inode *sufile, __u64 *segnump)
> >>                 goto out_sem;
> >>         kaddr = kmap_atomic(header_bh->b_page);
> >>         header = kaddr + bh_offset(header_bh);
> >> -       last_alloc = le64_to_cpu(header->sh_last_alloc);
> >> +       last_alloc = max(le64_to_cpu(header->sh_last_alloc), *segnump);
> >>         kunmap_atomic(kaddr);
> >>
> >>         nsegments = nilfs_sufile_get_nsegments(sufile);
> >> --
> >> 2.25.1
> > Thank you for your help.   I have a few questions, so I'll ask them below.
> >
> >> If didn't call nilfs_sufile_alloc() in nilfs_segctor_begin_construction(),
> >> nilfs_sufile_header's sh_last_alloc is not updated. In such case, we will
> >> add a bh in two segbuf->sb_segsum_buffers.
> > If nilfs_sufile_alloc() succeeds to allocate a segment, sh_last_alloc
> > is updated.
> > all segment allocation must be done through nilfs_sufile_alloc().
> > And, the allocated segment is marked dirty on the sufile not to be
> > reallocated until it's freed.
> >
> > So, why is it happening that the same segment is allocated twice in a log ?
> > Is it hard to fix the problem by correcting the calling sequence of
> > nilfs_sufile_alloc()/free()/etc without touching nilfs_sufile_alloc()
> > ?
> The problem happened when we call nilfs_segctor_begin_construction() and satisfied
> condition nilfs->ns_segnum != nilfs->ns_nextnum. In such scenario, nilfs_sufile_alloc()
> will be skipped, but we call nilfs_segbuf_map() and nilfs_segbuf_set_next_segnum()
> all the time, so last_alloc is not updated.
> Then in nilfs_segctor_extend_segments(), we set sb_segnum by prev->sb_nextnum directly,
> and calculate next sb_segnum by nilfs_sufile_alloc(), since last_alloc is not updated,
> we will get sb_segnum again.
>
> By the way, I still don't understand why skip nilfs_sufile_alloc() in some cases and why
> nilfs->ns_segnum != nilfs->ns_nextnum. Do you have any ideas?
> >
> > I haven't looked closely at this patch yet, but I'm concerned about
> > the impact on other places as well.
> > nilfs_sufile_alloc() is also used in
> > nilfs_segctor_begin_construction() and
> > nilfs_prepare_segment_for_recovery().  Are there any side effects?
> >
> > This patch turns an output-only argument into both input and output,
> > and that input value is always used in the calculation of
> > "last_alloc".
> > So, this change requires all callers to pass a meaningful initial
> > value (at least a valid value) to *segnump.
> >
> > Another question, will this work near the end of the segments ?
> > Since segments are used cyclically, wouldn't comparison with the max
> > function break down there?
> > (I mean it seems that sh_last_alloc may be chosen unintentionally at the end.)
> Thanks for the heads-up,I need to look at it again. This patch can only prevent this problem,
> and seems to need improvement. Maybe there is a more reasonable solution.
>
> Thanks,

I will explain step by step.

> By the way, I still don't understand why skip nilfs_sufile_alloc() in some cases and why
> nilfs->ns_segnum != nilfs->ns_nextnum. Do you have any ideas?

nilfs->ns_segnum holds the segment number where nilfs2 currently
writes logs to, and nilfs->ns_nextnum holds the next segment number
where nilfs2 will select to write logs when the current segment gets
full.
Here, a segment is a fixed sized container of logs (*), in which
multiple logs can be appended and written in.
(*) Exceptionally, segment #0 is shorter than others to exclude the
superblock area.

The reason "nextnum" is needed is that mount-time recovery processing
needs to know which segment the logs follow.

Both segments (nilfs->ns_segnum and nilfs->ns_nextnum) must be already
allocated on the sufile.
Here, "allocated" means the corresponding segment usage entry is
marked "dirty" on the sufile.
In nilfs_sufile_alloc() this is done with the following line:

                        nilfs_segment_usage_set_dirty(su);

On the other hand, the test whether the segment is allocatable or not
is done at the following part:

                        if (!nilfs_segment_usage_clean(su))
                                continue;
                        /* found a clean segment */

Coming back to your last question, if nilfs->ns_segnum !=
nilfs->ns_nextnum, it indicates that nilfs2 is still writing a log to
the segment of ns_segnum.
Both should be in the allocated state, so there is no need to call
nilfs_sufile_alloc() in this case.

If nilfs->ns_segnum == nilfs->ns_nextnum in
nilfs_segctor_begin_construction(), then nilfs_shift_to_next_segment()
sets ns_segnum to ns_nextnum because the current segment (ns_segnum)
is full and no more logs can be written.
In this case, a new segment must be allocated for new ns_nextnum,
therefore nilfs_sufile_alloc() will be called.

> The problem happened when we call nilfs_segctor_begin_construction() and satisfied
> condition nilfs->ns_segnum != nilfs->ns_nextnum. In such scenario, nilfs_sufile_alloc()
> will be skipped, but we call nilfs_segbuf_map() and nilfs_segbuf_set_next_segnum()
> all the time, so last_alloc is not updated.

As I explained above, it is expected behavior that
nilfs_sufile_alloc() is not called and last_alloc is not updated if
nilfs->ns_segnum != nilfs->ns_nextnum.

nilfs_segbuf_map() is always called there to set the geometry
information in the segbuf.  This is expected behavior.
And, nilfs_segbuf_set_next_segnum() is called later to set the
determined ns_nextnum to the segbuf, so this is expected behavior as
well.

> Then in nilfs_segctor_extend_segments(), we set sb_segnum by prev->sb_nextnum directly,
> and calculate next sb_segnum by nilfs_sufile_alloc(), since last_alloc is not updated,
> we will get sb_segnum again.

nilfs_segctor_extend_segments() pre-allocates one or more segments
large enough to store updated blocks of metadata files that need to be
written in a series of logs at once, and sets up a chain of segbufs.
(Sorry for the missing function comment).

sb_segnum is set by prev->sb_nextnum to form a chain of buffers for
segments.  This is expected behavior.
And, the sb_nextnum (= next sb_segnum) will be given by
nilfs_sufile_alloc().   This is also expected.
It looks like the problem is that nilfs_sufile_alloc() here allocates
the same segnum again.

Because sb_segnum is set by prev->sb_nextnum which is allocated by the
previous nilfs_sufile_alloc() call,
this usually does not happen.

A possible anomaly is if the segment pointed by the first nextnum (or
segnum) was not marked dirty on sufile.
This may happen if the sufile is corrupted on the test disk image that
syzbot provided (mounted).

Can you confirm if this is actually happening?

Regards,
Ryusuke Konishi

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] nilfs2: fix NULL pointer dereference in nilfs_segctor_prepare_write()
  2022-11-08 10:33       ` Ryusuke Konishi
  (?)
@ 2022-11-08 11:10       ` Ryusuke Konishi
  -1 siblings, 0 replies; 23+ messages in thread
From: Ryusuke Konishi @ 2022-11-08 11:10 UTC (permalink / raw)
  To: Liu Shixin; +Cc: Andrew Morton, linux-nilfs, linux-kernel

On Tue, Nov 8, 2022 at 7:33 PM Ryusuke Konishi  wrote:
> On Tue, Nov 8, 2022 at 3:49 PM Liu Shixin wrote:
> > Then in nilfs_segctor_extend_segments(), we set sb_segnum by prev->sb_nextnum directly,
> > and calculate next sb_segnum by nilfs_sufile_alloc(), since last_alloc is not updated,
> > we will get sb_segnum again.
>
> nilfs_segctor_extend_segments() pre-allocates one or more segments
> large enough to store updated blocks of metadata files that need to be
> written in a series of logs at once, and sets up a chain of segbufs.
> (Sorry for the missing function comment).
>
> sb_segnum is set by prev->sb_nextnum to form a chain of buffers for
> segments.  This is expected behavior.
> And, the sb_nextnum (= next sb_segnum) will be given by
> nilfs_sufile_alloc().   This is also expected.
> It looks like the problem is that nilfs_sufile_alloc() here allocates
> the same segnum again.
>
> Because sb_segnum is set by prev->sb_nextnum which is allocated by the
> previous nilfs_sufile_alloc() call,
> this usually does not happen.
>
> A possible anomaly is if the segment pointed by the first nextnum (or
> segnum) was not marked dirty on sufile.
> This may happen if the sufile is corrupted on the test disk image that
> syzbot provided (mounted).
>
> Can you confirm if this is actually happening?

If we can mount the test disk image, the state of sufile can be
confirmed quickly with lssu command:

$ lssu
              SEGNUM        DATE     TIME STAT     NBLOCKS
                   3  2022-11-04 23:23:49  -d-        2048
                   4  2022-11-04 23:23:50  ad-         103
                   5  ---------- --:--:--  ad-           0

Here, the flag "d" in STAT means the segment is dirty (in-use) and the
segment of ns_segnum or ns_nextnum is indicated with the "a" flag.
This is an example of a normal disk image.
Or, if it's easy to insert debug code to check, that's fine too.

Regards,
Ryusuke Konishi

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] nilfs2: fix NULL pointer dereference in nilfs_segctor_prepare_write()
@ 2022-11-08 11:53         ` Liu Shixin
  0 siblings, 0 replies; 23+ messages in thread
From: Liu Shixin @ 2022-11-08 11:53 UTC (permalink / raw)
  To: Ryusuke Konishi; +Cc: Andrew Morton, linux-nilfs, linux-kernel



On 2022/11/8 18:33, Ryusuke Konishi wrote:
> On Tue, Nov 8, 2022 at 3:49 PM Liu Shixin wrote:
>>
>>
>> On 2022/11/8 12:41, Ryusuke Konishi wrote:
>>> Hi Liu Shixin,
>>>
>>> On Tue, Nov 8, 2022 at 10:41 AM Liu Shixin wrote:
>>>> Syzbot reported a NULL pointer dereference:
>>>>
>>>>  Unable to handle kernel NULL pointer dereference at virtual address 0000000000000168
>>>>  Mem abort info:
>>>>    ESR = 0x0000000096000004
>>>>    EC = 0x25: DABT (current EL), IL = 32 bits
>>>>    SET = 0, FnV = 0
>>>>    EA = 0, S1PTW = 0
>>>>    FSC = 0x04: level 0 translation fault
>>>>  Data abort info:
>>>>    ISV = 0, ISS = 0x00000004
>>>>    CM = 0, WnR = 0
>>>>  user pgtable: 4k pages, 48-bit VAs, pgdp=0000000108bcf000
>>>>  [0000000000000168] pgd=0000000000000000, p4d=0000000000000000
>>>>  Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
>>>>  Modules linked in:
>>>>  CPU: 1 PID: 3032 Comm: segctord Not tainted 6.0.0-rc7-syzkaller-18095-gbbed346d5a96 #0
>>>>  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/30/2022
>>>>  pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>>>  pc : _compound_head include/linux/page-flags.h:253 [inline]
>>>>  pc : lock_page+0x28/0x1e0 include/linux/pagemap.h:958
>>>>  lr : lock_page+0x28/0x1e0 include/linux/pagemap.h:956
>>>>  sp : ffff80001290bc00
>>>>  x29: ffff80001290bc00 x28: ffff80001290bde0 x27: 000000000000001b
>>>>  x26: fffffc000330d7c0 x25: ffff0000caa56d68 x24: ffff0000ca9fb1c0
>>>>  x23: 0000000000000080 x22: ffff0000ca9fb130 x21: 0000000000000160
>>>>  x20: ffff0000c91e10b8 x19: 0000000000000160 x18: 00000000000000c0
>>>>  x17: ffff80000dd0b198 x16: ffff80000db49158 x15: ffff0000c3e63500
>>>>  x14: 0000000000000000 x13: 00000000ffffffff x12: ffff0000c3e63500
>>>>  x11: ff808000095d1a0c x10: 0000000000000000 x9 : 0000000000000000
>>>>  x8 : 0000000000000000 x7 : ffff80000856806c x6 : 0000000000000000
>>>>  x5 : 0000000000000080 x4 : 0000000000000000 x3 : 0000000000000000
>>>>  x2 : 0000000000000000 x1 : ffff80000cb431b1 x0 : 0000000000000000
>>>>  Call trace:
>>>>   lock_page+0x28/0x1e0 include/linux/pagemap.h:956
>>>>   nilfs_segctor_prepare_write+0x6c/0x21c fs/nilfs2/segment.c:1658
>>>>   nilfs_segctor_do_construct+0x9f4/0xee8 fs/nilfs2/segment.c:2068
>>>>   nilfs_segctor_construct+0xa0/0x380 fs/nilfs2/segment.c:2375
>>>>   nilfs_segctor_thread_construct fs/nilfs2/segment.c:2483 [inline]
>>>>   nilfs_segctor_thread+0x180/0x660 fs/nilfs2/segment.c:2566
>>>>   kthread+0x12c/0x158 kernel/kthread.c:376
>>>>   ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:860
>>>>
>>>> If didn't call nilfs_sufile_alloc() in nilfs_segctor_begin_construction(),
>>>> nilfs_sufile_header's sh_last_alloc is not updated. In such case, we will
>>>> add a bh in two segbuf->sb_segsum_buffers. And finally cause list error.
>>>>
>>>> Reported-by: syzbot+77e4f005cb899d4268d1@syzkaller.appspotmail.com
>>>> Fixes: 9ff05123e3bf ("nilfs2: segment constructor")
>>>> Signed-off-by: Liu Shixin <liushixin2@huawei.com>
>>>> ---
>>>>  fs/nilfs2/segment.c | 1 +
>>>>  fs/nilfs2/sufile.c  | 2 +-
>>>>  2 files changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
>>>> index b4cebad21b48..7be632c15f91 100644
>>>> --- a/fs/nilfs2/segment.c
>>>> +++ b/fs/nilfs2/segment.c
>>>> @@ -1371,6 +1371,7 @@ static int nilfs_segctor_extend_segments(struct nilfs_sc_info *sci,
>>>>                 sci->sc_segbuf_nblocks += segbuf->sb_rest_blocks;
>>>>
>>>>                 /* allocate the next next full segment */
>>>> +               nextnextnum = segbuf->sb_segnum;
>>>>                 err = nilfs_sufile_alloc(sufile, &nextnextnum);
>>>>                 if (unlikely(err))
>>>>                         goto failed_segbuf;
>>>> diff --git a/fs/nilfs2/sufile.c b/fs/nilfs2/sufile.c
>>>> index 77ff8e95421f..853a8212114f 100644
>>>> --- a/fs/nilfs2/sufile.c
>>>> +++ b/fs/nilfs2/sufile.c
>>>> @@ -317,7 +317,7 @@ int nilfs_sufile_alloc(struct inode *sufile, __u64 *segnump)
>>>>                 goto out_sem;
>>>>         kaddr = kmap_atomic(header_bh->b_page);
>>>>         header = kaddr + bh_offset(header_bh);
>>>> -       last_alloc = le64_to_cpu(header->sh_last_alloc);
>>>> +       last_alloc = max(le64_to_cpu(header->sh_last_alloc), *segnump);
>>>>         kunmap_atomic(kaddr);
>>>>
>>>>         nsegments = nilfs_sufile_get_nsegments(sufile);
>>>> --
>>>> 2.25.1
>>> Thank you for your help.   I have a few questions, so I'll ask them below.
>>>
>>>> If didn't call nilfs_sufile_alloc() in nilfs_segctor_begin_construction(),
>>>> nilfs_sufile_header's sh_last_alloc is not updated. In such case, we will
>>>> add a bh in two segbuf->sb_segsum_buffers.
>>> If nilfs_sufile_alloc() succeeds to allocate a segment, sh_last_alloc
>>> is updated.
>>> all segment allocation must be done through nilfs_sufile_alloc().
>>> And, the allocated segment is marked dirty on the sufile not to be
>>> reallocated until it's freed.
>>>
>>> So, why is it happening that the same segment is allocated twice in a log ?
>>> Is it hard to fix the problem by correcting the calling sequence of
>>> nilfs_sufile_alloc()/free()/etc without touching nilfs_sufile_alloc()
>>> ?
>> The problem happened when we call nilfs_segctor_begin_construction() and satisfied
>> condition nilfs->ns_segnum != nilfs->ns_nextnum. In such scenario, nilfs_sufile_alloc()
>> will be skipped, but we call nilfs_segbuf_map() and nilfs_segbuf_set_next_segnum()
>> all the time, so last_alloc is not updated.
>> Then in nilfs_segctor_extend_segments(), we set sb_segnum by prev->sb_nextnum directly,
>> and calculate next sb_segnum by nilfs_sufile_alloc(), since last_alloc is not updated,
>> we will get sb_segnum again.
>>
>> By the way, I still don't understand why skip nilfs_sufile_alloc() in some cases and why
>> nilfs->ns_segnum != nilfs->ns_nextnum. Do you have any ideas?
>>> I haven't looked closely at this patch yet, but I'm concerned about
>>> the impact on other places as well.
>>> nilfs_sufile_alloc() is also used in
>>> nilfs_segctor_begin_construction() and
>>> nilfs_prepare_segment_for_recovery().  Are there any side effects?
>>>
>>> This patch turns an output-only argument into both input and output,
>>> and that input value is always used in the calculation of
>>> "last_alloc".
>>> So, this change requires all callers to pass a meaningful initial
>>> value (at least a valid value) to *segnump.
>>>
>>> Another question, will this work near the end of the segments ?
>>> Since segments are used cyclically, wouldn't comparison with the max
>>> function break down there?
>>> (I mean it seems that sh_last_alloc may be chosen unintentionally at the end.)
>> Thanks for the heads-up,I need to look at it again. This patch can only prevent this problem,
>> and seems to need improvement. Maybe there is a more reasonable solution.
>>
>> Thanks,
> I will explain step by step.
>
>> By the way, I still don't understand why skip nilfs_sufile_alloc() in some cases and why
>> nilfs->ns_segnum != nilfs->ns_nextnum. Do you have any ideas?
> nilfs->ns_segnum holds the segment number where nilfs2 currently
> writes logs to, and nilfs->ns_nextnum holds the next segment number
> where nilfs2 will select to write logs when the current segment gets
> full.
> Here, a segment is a fixed sized container of logs (*), in which
> multiple logs can be appended and written in.
> (*) Exceptionally, segment #0 is shorter than others to exclude the
> superblock area.
>
> The reason "nextnum" is needed is that mount-time recovery processing
> needs to know which segment the logs follow.
>
> Both segments (nilfs->ns_segnum and nilfs->ns_nextnum) must be already
> allocated on the sufile.
> Here, "allocated" means the corresponding segment usage entry is
> marked "dirty" on the sufile.
> In nilfs_sufile_alloc() this is done with the following line:
>
>                         nilfs_segment_usage_set_dirty(su);
>
> On the other hand, the test whether the segment is allocatable or not
> is done at the following part:
>
>                         if (!nilfs_segment_usage_clean(su))
>                                 continue;
>                         /* found a clean segment */
>
> Coming back to your last question, if nilfs->ns_segnum !=
> nilfs->ns_nextnum, it indicates that nilfs2 is still writing a log to
> the segment of ns_segnum.
> Both should be in the allocated state, so there is no need to call
> nilfs_sufile_alloc() in this case.
>
> If nilfs->ns_segnum == nilfs->ns_nextnum in
> nilfs_segctor_begin_construction(), then nilfs_shift_to_next_segment()
> sets ns_segnum to ns_nextnum because the current segment (ns_segnum)
> is full and no more logs can be written.
> In this case, a new segment must be allocated for new ns_nextnum,
> therefore nilfs_sufile_alloc() will be called.
>
>> The problem happened when we call nilfs_segctor_begin_construction() and satisfied
>> condition nilfs->ns_segnum != nilfs->ns_nextnum. In such scenario, nilfs_sufile_alloc()
>> will be skipped, but we call nilfs_segbuf_map() and nilfs_segbuf_set_next_segnum()
>> all the time, so last_alloc is not updated.
> As I explained above, it is expected behavior that
> nilfs_sufile_alloc() is not called and last_alloc is not updated if
> nilfs->ns_segnum != nilfs->ns_nextnum.
>
> nilfs_segbuf_map() is always called there to set the geometry
> information in the segbuf.  This is expected behavior.
> And, nilfs_segbuf_set_next_segnum() is called later to set the
> determined ns_nextnum to the segbuf, so this is expected behavior as
> well.
>
>> Then in nilfs_segctor_extend_segments(), we set sb_segnum by prev->sb_nextnum directly,
>> and calculate next sb_segnum by nilfs_sufile_alloc(), since last_alloc is not updated,
>> we will get sb_segnum again.
> nilfs_segctor_extend_segments() pre-allocates one or more segments
> large enough to store updated blocks of metadata files that need to be
> written in a series of logs at once, and sets up a chain of segbufs.
> (Sorry for the missing function comment).
>
> sb_segnum is set by prev->sb_nextnum to form a chain of buffers for
> segments.  This is expected behavior.
> And, the sb_nextnum (= next sb_segnum) will be given by
> nilfs_sufile_alloc().   This is also expected.
> It looks like the problem is that nilfs_sufile_alloc() here allocates
> the same segnum again.
Thanks to your series of explanations, I learned a lot. So I can assume that
the skip of nilfs_sufile_alloc() here is because it's already been allocated. So
in next time, we will skip previous sb_segnum natually. And the problem
should not be happend.

>
> Because sb_segnum is set by prev->sb_nextnum which is allocated by the
> previous nilfs_sufile_alloc() call,
> this usually does not happen.
>
> A possible anomaly is if the segment pointed by the first nextnum (or
> segnum) was not marked dirty on sufile.
> This may happen if the sufile is corrupted on the test disk image that
> syzbot provided (mounted).
>
> Can you confirm if this is actually happening?
This problem was discovered through syzbot and can be reproduced. The root
reason I think is data corruption rather than logical error. So we should add
some protection instead of modifying the logic, right?
>
> Regards,
> Ryusuke Konishi
> .
>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] nilfs2: fix NULL pointer dereference in nilfs_segctor_prepare_write()
@ 2022-11-08 11:53         ` Liu Shixin
  0 siblings, 0 replies; 23+ messages in thread
From: Liu Shixin @ 2022-11-08 11:53 UTC (permalink / raw)
  To: Ryusuke Konishi
  Cc: Andrew Morton, linux-nilfs-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA



On 2022/11/8 18:33, Ryusuke Konishi wrote:
> On Tue, Nov 8, 2022 at 3:49 PM Liu Shixin wrote:
>>
>>
>> On 2022/11/8 12:41, Ryusuke Konishi wrote:
>>> Hi Liu Shixin,
>>>
>>> On Tue, Nov 8, 2022 at 10:41 AM Liu Shixin wrote:
>>>> Syzbot reported a NULL pointer dereference:
>>>>
>>>>  Unable to handle kernel NULL pointer dereference at virtual address 0000000000000168
>>>>  Mem abort info:
>>>>    ESR = 0x0000000096000004
>>>>    EC = 0x25: DABT (current EL), IL = 32 bits
>>>>    SET = 0, FnV = 0
>>>>    EA = 0, S1PTW = 0
>>>>    FSC = 0x04: level 0 translation fault
>>>>  Data abort info:
>>>>    ISV = 0, ISS = 0x00000004
>>>>    CM = 0, WnR = 0
>>>>  user pgtable: 4k pages, 48-bit VAs, pgdp=0000000108bcf000
>>>>  [0000000000000168] pgd=0000000000000000, p4d=0000000000000000
>>>>  Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
>>>>  Modules linked in:
>>>>  CPU: 1 PID: 3032 Comm: segctord Not tainted 6.0.0-rc7-syzkaller-18095-gbbed346d5a96 #0
>>>>  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/30/2022
>>>>  pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>>>  pc : _compound_head include/linux/page-flags.h:253 [inline]
>>>>  pc : lock_page+0x28/0x1e0 include/linux/pagemap.h:958
>>>>  lr : lock_page+0x28/0x1e0 include/linux/pagemap.h:956
>>>>  sp : ffff80001290bc00
>>>>  x29: ffff80001290bc00 x28: ffff80001290bde0 x27: 000000000000001b
>>>>  x26: fffffc000330d7c0 x25: ffff0000caa56d68 x24: ffff0000ca9fb1c0
>>>>  x23: 0000000000000080 x22: ffff0000ca9fb130 x21: 0000000000000160
>>>>  x20: ffff0000c91e10b8 x19: 0000000000000160 x18: 00000000000000c0
>>>>  x17: ffff80000dd0b198 x16: ffff80000db49158 x15: ffff0000c3e63500
>>>>  x14: 0000000000000000 x13: 00000000ffffffff x12: ffff0000c3e63500
>>>>  x11: ff808000095d1a0c x10: 0000000000000000 x9 : 0000000000000000
>>>>  x8 : 0000000000000000 x7 : ffff80000856806c x6 : 0000000000000000
>>>>  x5 : 0000000000000080 x4 : 0000000000000000 x3 : 0000000000000000
>>>>  x2 : 0000000000000000 x1 : ffff80000cb431b1 x0 : 0000000000000000
>>>>  Call trace:
>>>>   lock_page+0x28/0x1e0 include/linux/pagemap.h:956
>>>>   nilfs_segctor_prepare_write+0x6c/0x21c fs/nilfs2/segment.c:1658
>>>>   nilfs_segctor_do_construct+0x9f4/0xee8 fs/nilfs2/segment.c:2068
>>>>   nilfs_segctor_construct+0xa0/0x380 fs/nilfs2/segment.c:2375
>>>>   nilfs_segctor_thread_construct fs/nilfs2/segment.c:2483 [inline]
>>>>   nilfs_segctor_thread+0x180/0x660 fs/nilfs2/segment.c:2566
>>>>   kthread+0x12c/0x158 kernel/kthread.c:376
>>>>   ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:860
>>>>
>>>> If didn't call nilfs_sufile_alloc() in nilfs_segctor_begin_construction(),
>>>> nilfs_sufile_header's sh_last_alloc is not updated. In such case, we will
>>>> add a bh in two segbuf->sb_segsum_buffers. And finally cause list error.
>>>>
>>>> Reported-by: syzbot+77e4f005cb899d4268d1-Pl5Pbv+GP7P466ipTTIvnc23WoclnBCfAL8bYrjMMd8@public.gmane.org
>>>> Fixes: 9ff05123e3bf ("nilfs2: segment constructor")
>>>> Signed-off-by: Liu Shixin <liushixin2-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
>>>> ---
>>>>  fs/nilfs2/segment.c | 1 +
>>>>  fs/nilfs2/sufile.c  | 2 +-
>>>>  2 files changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
>>>> index b4cebad21b48..7be632c15f91 100644
>>>> --- a/fs/nilfs2/segment.c
>>>> +++ b/fs/nilfs2/segment.c
>>>> @@ -1371,6 +1371,7 @@ static int nilfs_segctor_extend_segments(struct nilfs_sc_info *sci,
>>>>                 sci->sc_segbuf_nblocks += segbuf->sb_rest_blocks;
>>>>
>>>>                 /* allocate the next next full segment */
>>>> +               nextnextnum = segbuf->sb_segnum;
>>>>                 err = nilfs_sufile_alloc(sufile, &nextnextnum);
>>>>                 if (unlikely(err))
>>>>                         goto failed_segbuf;
>>>> diff --git a/fs/nilfs2/sufile.c b/fs/nilfs2/sufile.c
>>>> index 77ff8e95421f..853a8212114f 100644
>>>> --- a/fs/nilfs2/sufile.c
>>>> +++ b/fs/nilfs2/sufile.c
>>>> @@ -317,7 +317,7 @@ int nilfs_sufile_alloc(struct inode *sufile, __u64 *segnump)
>>>>                 goto out_sem;
>>>>         kaddr = kmap_atomic(header_bh->b_page);
>>>>         header = kaddr + bh_offset(header_bh);
>>>> -       last_alloc = le64_to_cpu(header->sh_last_alloc);
>>>> +       last_alloc = max(le64_to_cpu(header->sh_last_alloc), *segnump);
>>>>         kunmap_atomic(kaddr);
>>>>
>>>>         nsegments = nilfs_sufile_get_nsegments(sufile);
>>>> --
>>>> 2.25.1
>>> Thank you for your help.   I have a few questions, so I'll ask them below.
>>>
>>>> If didn't call nilfs_sufile_alloc() in nilfs_segctor_begin_construction(),
>>>> nilfs_sufile_header's sh_last_alloc is not updated. In such case, we will
>>>> add a bh in two segbuf->sb_segsum_buffers.
>>> If nilfs_sufile_alloc() succeeds to allocate a segment, sh_last_alloc
>>> is updated.
>>> all segment allocation must be done through nilfs_sufile_alloc().
>>> And, the allocated segment is marked dirty on the sufile not to be
>>> reallocated until it's freed.
>>>
>>> So, why is it happening that the same segment is allocated twice in a log ?
>>> Is it hard to fix the problem by correcting the calling sequence of
>>> nilfs_sufile_alloc()/free()/etc without touching nilfs_sufile_alloc()
>>> ?
>> The problem happened when we call nilfs_segctor_begin_construction() and satisfied
>> condition nilfs->ns_segnum != nilfs->ns_nextnum. In such scenario, nilfs_sufile_alloc()
>> will be skipped, but we call nilfs_segbuf_map() and nilfs_segbuf_set_next_segnum()
>> all the time, so last_alloc is not updated.
>> Then in nilfs_segctor_extend_segments(), we set sb_segnum by prev->sb_nextnum directly,
>> and calculate next sb_segnum by nilfs_sufile_alloc(), since last_alloc is not updated,
>> we will get sb_segnum again.
>>
>> By the way, I still don't understand why skip nilfs_sufile_alloc() in some cases and why
>> nilfs->ns_segnum != nilfs->ns_nextnum. Do you have any ideas?
>>> I haven't looked closely at this patch yet, but I'm concerned about
>>> the impact on other places as well.
>>> nilfs_sufile_alloc() is also used in
>>> nilfs_segctor_begin_construction() and
>>> nilfs_prepare_segment_for_recovery().  Are there any side effects?
>>>
>>> This patch turns an output-only argument into both input and output,
>>> and that input value is always used in the calculation of
>>> "last_alloc".
>>> So, this change requires all callers to pass a meaningful initial
>>> value (at least a valid value) to *segnump.
>>>
>>> Another question, will this work near the end of the segments ?
>>> Since segments are used cyclically, wouldn't comparison with the max
>>> function break down there?
>>> (I mean it seems that sh_last_alloc may be chosen unintentionally at the end.)
>> Thanks for the heads-up,I need to look at it again. This patch can only prevent this problem,
>> and seems to need improvement. Maybe there is a more reasonable solution.
>>
>> Thanks,
> I will explain step by step.
>
>> By the way, I still don't understand why skip nilfs_sufile_alloc() in some cases and why
>> nilfs->ns_segnum != nilfs->ns_nextnum. Do you have any ideas?
> nilfs->ns_segnum holds the segment number where nilfs2 currently
> writes logs to, and nilfs->ns_nextnum holds the next segment number
> where nilfs2 will select to write logs when the current segment gets
> full.
> Here, a segment is a fixed sized container of logs (*), in which
> multiple logs can be appended and written in.
> (*) Exceptionally, segment #0 is shorter than others to exclude the
> superblock area.
>
> The reason "nextnum" is needed is that mount-time recovery processing
> needs to know which segment the logs follow.
>
> Both segments (nilfs->ns_segnum and nilfs->ns_nextnum) must be already
> allocated on the sufile.
> Here, "allocated" means the corresponding segment usage entry is
> marked "dirty" on the sufile.
> In nilfs_sufile_alloc() this is done with the following line:
>
>                         nilfs_segment_usage_set_dirty(su);
>
> On the other hand, the test whether the segment is allocatable or not
> is done at the following part:
>
>                         if (!nilfs_segment_usage_clean(su))
>                                 continue;
>                         /* found a clean segment */
>
> Coming back to your last question, if nilfs->ns_segnum !=
> nilfs->ns_nextnum, it indicates that nilfs2 is still writing a log to
> the segment of ns_segnum.
> Both should be in the allocated state, so there is no need to call
> nilfs_sufile_alloc() in this case.
>
> If nilfs->ns_segnum == nilfs->ns_nextnum in
> nilfs_segctor_begin_construction(), then nilfs_shift_to_next_segment()
> sets ns_segnum to ns_nextnum because the current segment (ns_segnum)
> is full and no more logs can be written.
> In this case, a new segment must be allocated for new ns_nextnum,
> therefore nilfs_sufile_alloc() will be called.
>
>> The problem happened when we call nilfs_segctor_begin_construction() and satisfied
>> condition nilfs->ns_segnum != nilfs->ns_nextnum. In such scenario, nilfs_sufile_alloc()
>> will be skipped, but we call nilfs_segbuf_map() and nilfs_segbuf_set_next_segnum()
>> all the time, so last_alloc is not updated.
> As I explained above, it is expected behavior that
> nilfs_sufile_alloc() is not called and last_alloc is not updated if
> nilfs->ns_segnum != nilfs->ns_nextnum.
>
> nilfs_segbuf_map() is always called there to set the geometry
> information in the segbuf.  This is expected behavior.
> And, nilfs_segbuf_set_next_segnum() is called later to set the
> determined ns_nextnum to the segbuf, so this is expected behavior as
> well.
>
>> Then in nilfs_segctor_extend_segments(), we set sb_segnum by prev->sb_nextnum directly,
>> and calculate next sb_segnum by nilfs_sufile_alloc(), since last_alloc is not updated,
>> we will get sb_segnum again.
> nilfs_segctor_extend_segments() pre-allocates one or more segments
> large enough to store updated blocks of metadata files that need to be
> written in a series of logs at once, and sets up a chain of segbufs.
> (Sorry for the missing function comment).
>
> sb_segnum is set by prev->sb_nextnum to form a chain of buffers for
> segments.  This is expected behavior.
> And, the sb_nextnum (= next sb_segnum) will be given by
> nilfs_sufile_alloc().   This is also expected.
> It looks like the problem is that nilfs_sufile_alloc() here allocates
> the same segnum again.
Thanks to your series of explanations, I learned a lot. So I can assume that
the skip of nilfs_sufile_alloc() here is because it's already been allocated. So
in next time, we will skip previous sb_segnum natually. And the problem
should not be happend.

>
> Because sb_segnum is set by prev->sb_nextnum which is allocated by the
> previous nilfs_sufile_alloc() call,
> this usually does not happen.
>
> A possible anomaly is if the segment pointed by the first nextnum (or
> segnum) was not marked dirty on sufile.
> This may happen if the sufile is corrupted on the test disk image that
> syzbot provided (mounted).
>
> Can you confirm if this is actually happening?
This problem was discovered through syzbot and can be reproduced. The root
reason I think is data corruption rather than logical error. So we should add
some protection instead of modifying the logic, right?
>
> Regards,
> Ryusuke Konishi
> .
>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] nilfs2: fix NULL pointer dereference in nilfs_segctor_prepare_write()
@ 2022-11-08 12:24           ` Ryusuke Konishi
  0 siblings, 0 replies; 23+ messages in thread
From: Ryusuke Konishi @ 2022-11-08 12:24 UTC (permalink / raw)
  To: Liu Shixin; +Cc: Andrew Morton, linux-nilfs, linux-kernel

On Tue, Nov 8, 2022 at 8:53 PM Liu Shixin wrote:
>
>
>
> On 2022/11/8 18:33, Ryusuke Konishi wrote:
> > On Tue, Nov 8, 2022 at 3:49 PM Liu Shixin wrote:
> >>
> >>
> >> On 2022/11/8 12:41, Ryusuke Konishi wrote:
> >>> Hi Liu Shixin,
> >>>
> >>> On Tue, Nov 8, 2022 at 10:41 AM Liu Shixin wrote:
> >>>> Syzbot reported a NULL pointer dereference:
> >>>>
> >>>>  Unable to handle kernel NULL pointer dereference at virtual address 0000000000000168
> >>>>  Mem abort info:
> >>>>    ESR = 0x0000000096000004
> >>>>    EC = 0x25: DABT (current EL), IL = 32 bits
> >>>>    SET = 0, FnV = 0
> >>>>    EA = 0, S1PTW = 0
> >>>>    FSC = 0x04: level 0 translation fault
> >>>>  Data abort info:
> >>>>    ISV = 0, ISS = 0x00000004
> >>>>    CM = 0, WnR = 0
> >>>>  user pgtable: 4k pages, 48-bit VAs, pgdp=0000000108bcf000
> >>>>  [0000000000000168] pgd=0000000000000000, p4d=0000000000000000
> >>>>  Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> >>>>  Modules linked in:
> >>>>  CPU: 1 PID: 3032 Comm: segctord Not tainted 6.0.0-rc7-syzkaller-18095-gbbed346d5a96 #0
> >>>>  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/30/2022
> >>>>  pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> >>>>  pc : _compound_head include/linux/page-flags.h:253 [inline]
> >>>>  pc : lock_page+0x28/0x1e0 include/linux/pagemap.h:958
> >>>>  lr : lock_page+0x28/0x1e0 include/linux/pagemap.h:956
> >>>>  sp : ffff80001290bc00
> >>>>  x29: ffff80001290bc00 x28: ffff80001290bde0 x27: 000000000000001b
> >>>>  x26: fffffc000330d7c0 x25: ffff0000caa56d68 x24: ffff0000ca9fb1c0
> >>>>  x23: 0000000000000080 x22: ffff0000ca9fb130 x21: 0000000000000160
> >>>>  x20: ffff0000c91e10b8 x19: 0000000000000160 x18: 00000000000000c0
> >>>>  x17: ffff80000dd0b198 x16: ffff80000db49158 x15: ffff0000c3e63500
> >>>>  x14: 0000000000000000 x13: 00000000ffffffff x12: ffff0000c3e63500
> >>>>  x11: ff808000095d1a0c x10: 0000000000000000 x9 : 0000000000000000
> >>>>  x8 : 0000000000000000 x7 : ffff80000856806c x6 : 0000000000000000
> >>>>  x5 : 0000000000000080 x4 : 0000000000000000 x3 : 0000000000000000
> >>>>  x2 : 0000000000000000 x1 : ffff80000cb431b1 x0 : 0000000000000000
> >>>>  Call trace:
> >>>>   lock_page+0x28/0x1e0 include/linux/pagemap.h:956
> >>>>   nilfs_segctor_prepare_write+0x6c/0x21c fs/nilfs2/segment.c:1658
> >>>>   nilfs_segctor_do_construct+0x9f4/0xee8 fs/nilfs2/segment.c:2068
> >>>>   nilfs_segctor_construct+0xa0/0x380 fs/nilfs2/segment.c:2375
> >>>>   nilfs_segctor_thread_construct fs/nilfs2/segment.c:2483 [inline]
> >>>>   nilfs_segctor_thread+0x180/0x660 fs/nilfs2/segment.c:2566
> >>>>   kthread+0x12c/0x158 kernel/kthread.c:376
> >>>>   ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:860
> >>>>
> >>>> If didn't call nilfs_sufile_alloc() in nilfs_segctor_begin_construction(),
> >>>> nilfs_sufile_header's sh_last_alloc is not updated. In such case, we will
> >>>> add a bh in two segbuf->sb_segsum_buffers. And finally cause list error.
> >>>>
> >>>> Reported-by: syzbot+77e4f005cb899d4268d1@syzkaller.appspotmail.com
> >>>> Fixes: 9ff05123e3bf ("nilfs2: segment constructor")
> >>>> Signed-off-by: Liu Shixin <liushixin2@huawei.com>
> >>>> ---
> >>>>  fs/nilfs2/segment.c | 1 +
> >>>>  fs/nilfs2/sufile.c  | 2 +-
> >>>>  2 files changed, 2 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
> >>>> index b4cebad21b48..7be632c15f91 100644
> >>>> --- a/fs/nilfs2/segment.c
> >>>> +++ b/fs/nilfs2/segment.c
> >>>> @@ -1371,6 +1371,7 @@ static int nilfs_segctor_extend_segments(struct nilfs_sc_info *sci,
> >>>>                 sci->sc_segbuf_nblocks += segbuf->sb_rest_blocks;
> >>>>
> >>>>                 /* allocate the next next full segment */
> >>>> +               nextnextnum = segbuf->sb_segnum;
> >>>>                 err = nilfs_sufile_alloc(sufile, &nextnextnum);
> >>>>                 if (unlikely(err))
> >>>>                         goto failed_segbuf;
> >>>> diff --git a/fs/nilfs2/sufile.c b/fs/nilfs2/sufile.c
> >>>> index 77ff8e95421f..853a8212114f 100644
> >>>> --- a/fs/nilfs2/sufile.c
> >>>> +++ b/fs/nilfs2/sufile.c
> >>>> @@ -317,7 +317,7 @@ int nilfs_sufile_alloc(struct inode *sufile, __u64 *segnump)
> >>>>                 goto out_sem;
> >>>>         kaddr = kmap_atomic(header_bh->b_page);
> >>>>         header = kaddr + bh_offset(header_bh);
> >>>> -       last_alloc = le64_to_cpu(header->sh_last_alloc);
> >>>> +       last_alloc = max(le64_to_cpu(header->sh_last_alloc), *segnump);
> >>>>         kunmap_atomic(kaddr);
> >>>>
> >>>>         nsegments = nilfs_sufile_get_nsegments(sufile);
> >>>> --
> >>>> 2.25.1
> >>> Thank you for your help.   I have a few questions, so I'll ask them below.
> >>>
> >>>> If didn't call nilfs_sufile_alloc() in nilfs_segctor_begin_construction(),
> >>>> nilfs_sufile_header's sh_last_alloc is not updated. In such case, we will
> >>>> add a bh in two segbuf->sb_segsum_buffers.
> >>> If nilfs_sufile_alloc() succeeds to allocate a segment, sh_last_alloc
> >>> is updated.
> >>> all segment allocation must be done through nilfs_sufile_alloc().
> >>> And, the allocated segment is marked dirty on the sufile not to be
> >>> reallocated until it's freed.
> >>>
> >>> So, why is it happening that the same segment is allocated twice in a log ?
> >>> Is it hard to fix the problem by correcting the calling sequence of
> >>> nilfs_sufile_alloc()/free()/etc without touching nilfs_sufile_alloc()
> >>> ?
> >> The problem happened when we call nilfs_segctor_begin_construction() and satisfied
> >> condition nilfs->ns_segnum != nilfs->ns_nextnum. In such scenario, nilfs_sufile_alloc()
> >> will be skipped, but we call nilfs_segbuf_map() and nilfs_segbuf_set_next_segnum()
> >> all the time, so last_alloc is not updated.
> >> Then in nilfs_segctor_extend_segments(), we set sb_segnum by prev->sb_nextnum directly,
> >> and calculate next sb_segnum by nilfs_sufile_alloc(), since last_alloc is not updated,
> >> we will get sb_segnum again.
> >>
> >> By the way, I still don't understand why skip nilfs_sufile_alloc() in some cases and why
> >> nilfs->ns_segnum != nilfs->ns_nextnum. Do you have any ideas?
> >>> I haven't looked closely at this patch yet, but I'm concerned about
> >>> the impact on other places as well.
> >>> nilfs_sufile_alloc() is also used in
> >>> nilfs_segctor_begin_construction() and
> >>> nilfs_prepare_segment_for_recovery().  Are there any side effects?
> >>>
> >>> This patch turns an output-only argument into both input and output,
> >>> and that input value is always used in the calculation of
> >>> "last_alloc".
> >>> So, this change requires all callers to pass a meaningful initial
> >>> value (at least a valid value) to *segnump.
> >>>
> >>> Another question, will this work near the end of the segments ?
> >>> Since segments are used cyclically, wouldn't comparison with the max
> >>> function break down there?
> >>> (I mean it seems that sh_last_alloc may be chosen unintentionally at the end.)
> >> Thanks for the heads-up,I need to look at it again. This patch can only prevent this problem,
> >> and seems to need improvement. Maybe there is a more reasonable solution.
> >>
> >> Thanks,
> > I will explain step by step.
> >
> >> By the way, I still don't understand why skip nilfs_sufile_alloc() in some cases and why
> >> nilfs->ns_segnum != nilfs->ns_nextnum. Do you have any ideas?
> > nilfs->ns_segnum holds the segment number where nilfs2 currently
> > writes logs to, and nilfs->ns_nextnum holds the next segment number
> > where nilfs2 will select to write logs when the current segment gets
> > full.
> > Here, a segment is a fixed sized container of logs (*), in which
> > multiple logs can be appended and written in.
> > (*) Exceptionally, segment #0 is shorter than others to exclude the
> > superblock area.
> >
> > The reason "nextnum" is needed is that mount-time recovery processing
> > needs to know which segment the logs follow.
> >
> > Both segments (nilfs->ns_segnum and nilfs->ns_nextnum) must be already
> > allocated on the sufile.
> > Here, "allocated" means the corresponding segment usage entry is
> > marked "dirty" on the sufile.
> > In nilfs_sufile_alloc() this is done with the following line:
> >
> >                         nilfs_segment_usage_set_dirty(su);
> >
> > On the other hand, the test whether the segment is allocatable or not
> > is done at the following part:
> >
> >                         if (!nilfs_segment_usage_clean(su))
> >                                 continue;
> >                         /* found a clean segment */
> >
> > Coming back to your last question, if nilfs->ns_segnum !=
> > nilfs->ns_nextnum, it indicates that nilfs2 is still writing a log to
> > the segment of ns_segnum.
> > Both should be in the allocated state, so there is no need to call
> > nilfs_sufile_alloc() in this case.
> >
> > If nilfs->ns_segnum == nilfs->ns_nextnum in
> > nilfs_segctor_begin_construction(), then nilfs_shift_to_next_segment()
> > sets ns_segnum to ns_nextnum because the current segment (ns_segnum)
> > is full and no more logs can be written.
> > In this case, a new segment must be allocated for new ns_nextnum,
> > therefore nilfs_sufile_alloc() will be called.
> >
> >> The problem happened when we call nilfs_segctor_begin_construction() and satisfied
> >> condition nilfs->ns_segnum != nilfs->ns_nextnum. In such scenario, nilfs_sufile_alloc()
> >> will be skipped, but we call nilfs_segbuf_map() and nilfs_segbuf_set_next_segnum()
> >> all the time, so last_alloc is not updated.
> > As I explained above, it is expected behavior that
> > nilfs_sufile_alloc() is not called and last_alloc is not updated if
> > nilfs->ns_segnum != nilfs->ns_nextnum.
> >
> > nilfs_segbuf_map() is always called there to set the geometry
> > information in the segbuf.  This is expected behavior.
> > And, nilfs_segbuf_set_next_segnum() is called later to set the
> > determined ns_nextnum to the segbuf, so this is expected behavior as
> > well.
> >
> >> Then in nilfs_segctor_extend_segments(), we set sb_segnum by prev->sb_nextnum directly,
> >> and calculate next sb_segnum by nilfs_sufile_alloc(), since last_alloc is not updated,
> >> we will get sb_segnum again.
> > nilfs_segctor_extend_segments() pre-allocates one or more segments
> > large enough to store updated blocks of metadata files that need to be
> > written in a series of logs at once, and sets up a chain of segbufs.
> > (Sorry for the missing function comment).
> >
> > sb_segnum is set by prev->sb_nextnum to form a chain of buffers for
> > segments.  This is expected behavior.
> > And, the sb_nextnum (= next sb_segnum) will be given by
> > nilfs_sufile_alloc().   This is also expected.
> > It looks like the problem is that nilfs_sufile_alloc() here allocates
> > the same segnum again.
> Thanks to your series of explanations, I learned a lot. So I can assume that
> the skip of nilfs_sufile_alloc() here is because it's already been allocated. So
> in next time, we will skip previous sb_segnum natually. And the problem
> should not be happend.
>
> >
> > Because sb_segnum is set by prev->sb_nextnum which is allocated by the
> > previous nilfs_sufile_alloc() call,
> > this usually does not happen.
> >
> > A possible anomaly is if the segment pointed by the first nextnum (or
> > segnum) was not marked dirty on sufile.
> > This may happen if the sufile is corrupted on the test disk image that
> > syzbot provided (mounted).
> >
> > Can you confirm if this is actually happening?
> This problem was discovered through syzbot and can be reproduced. The root
> reason I think is data corruption rather than logical error. So we should add
> some protection instead of modifying the logic, right?

I think the most likely cause is metadata corruption.  If so, we
should fix it by adding a proper sanity check, yes.
However, there is still the possibility that the error retry logic
after detecting errors has a flaw. (I believe at least this is not a
problem with normal paths.)
The sufile state inconsistency above is hypothetical for now.
Either way, I'd like to make sure what's actually happening (and where
the anomaly is coming from) so we can decide how to fix it.

Regards,
Ryusuke Konishi

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] nilfs2: fix NULL pointer dereference in nilfs_segctor_prepare_write()
@ 2022-11-08 12:24           ` Ryusuke Konishi
  0 siblings, 0 replies; 23+ messages in thread
From: Ryusuke Konishi @ 2022-11-08 12:24 UTC (permalink / raw)
  To: Liu Shixin
  Cc: Andrew Morton, linux-nilfs-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Tue, Nov 8, 2022 at 8:53 PM Liu Shixin wrote:
>
>
>
> On 2022/11/8 18:33, Ryusuke Konishi wrote:
> > On Tue, Nov 8, 2022 at 3:49 PM Liu Shixin wrote:
> >>
> >>
> >> On 2022/11/8 12:41, Ryusuke Konishi wrote:
> >>> Hi Liu Shixin,
> >>>
> >>> On Tue, Nov 8, 2022 at 10:41 AM Liu Shixin wrote:
> >>>> Syzbot reported a NULL pointer dereference:
> >>>>
> >>>>  Unable to handle kernel NULL pointer dereference at virtual address 0000000000000168
> >>>>  Mem abort info:
> >>>>    ESR = 0x0000000096000004
> >>>>    EC = 0x25: DABT (current EL), IL = 32 bits
> >>>>    SET = 0, FnV = 0
> >>>>    EA = 0, S1PTW = 0
> >>>>    FSC = 0x04: level 0 translation fault
> >>>>  Data abort info:
> >>>>    ISV = 0, ISS = 0x00000004
> >>>>    CM = 0, WnR = 0
> >>>>  user pgtable: 4k pages, 48-bit VAs, pgdp=0000000108bcf000
> >>>>  [0000000000000168] pgd=0000000000000000, p4d=0000000000000000
> >>>>  Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> >>>>  Modules linked in:
> >>>>  CPU: 1 PID: 3032 Comm: segctord Not tainted 6.0.0-rc7-syzkaller-18095-gbbed346d5a96 #0
> >>>>  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/30/2022
> >>>>  pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> >>>>  pc : _compound_head include/linux/page-flags.h:253 [inline]
> >>>>  pc : lock_page+0x28/0x1e0 include/linux/pagemap.h:958
> >>>>  lr : lock_page+0x28/0x1e0 include/linux/pagemap.h:956
> >>>>  sp : ffff80001290bc00
> >>>>  x29: ffff80001290bc00 x28: ffff80001290bde0 x27: 000000000000001b
> >>>>  x26: fffffc000330d7c0 x25: ffff0000caa56d68 x24: ffff0000ca9fb1c0
> >>>>  x23: 0000000000000080 x22: ffff0000ca9fb130 x21: 0000000000000160
> >>>>  x20: ffff0000c91e10b8 x19: 0000000000000160 x18: 00000000000000c0
> >>>>  x17: ffff80000dd0b198 x16: ffff80000db49158 x15: ffff0000c3e63500
> >>>>  x14: 0000000000000000 x13: 00000000ffffffff x12: ffff0000c3e63500
> >>>>  x11: ff808000095d1a0c x10: 0000000000000000 x9 : 0000000000000000
> >>>>  x8 : 0000000000000000 x7 : ffff80000856806c x6 : 0000000000000000
> >>>>  x5 : 0000000000000080 x4 : 0000000000000000 x3 : 0000000000000000
> >>>>  x2 : 0000000000000000 x1 : ffff80000cb431b1 x0 : 0000000000000000
> >>>>  Call trace:
> >>>>   lock_page+0x28/0x1e0 include/linux/pagemap.h:956
> >>>>   nilfs_segctor_prepare_write+0x6c/0x21c fs/nilfs2/segment.c:1658
> >>>>   nilfs_segctor_do_construct+0x9f4/0xee8 fs/nilfs2/segment.c:2068
> >>>>   nilfs_segctor_construct+0xa0/0x380 fs/nilfs2/segment.c:2375
> >>>>   nilfs_segctor_thread_construct fs/nilfs2/segment.c:2483 [inline]
> >>>>   nilfs_segctor_thread+0x180/0x660 fs/nilfs2/segment.c:2566
> >>>>   kthread+0x12c/0x158 kernel/kthread.c:376
> >>>>   ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:860
> >>>>
> >>>> If didn't call nilfs_sufile_alloc() in nilfs_segctor_begin_construction(),
> >>>> nilfs_sufile_header's sh_last_alloc is not updated. In such case, we will
> >>>> add a bh in two segbuf->sb_segsum_buffers. And finally cause list error.
> >>>>
> >>>> Reported-by: syzbot+77e4f005cb899d4268d1-Pl5Pbv+GP7P466ipTTIvnc23WoclnBCfAL8bYrjMMd8@public.gmane.org
> >>>> Fixes: 9ff05123e3bf ("nilfs2: segment constructor")
> >>>> Signed-off-by: Liu Shixin <liushixin2-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
> >>>> ---
> >>>>  fs/nilfs2/segment.c | 1 +
> >>>>  fs/nilfs2/sufile.c  | 2 +-
> >>>>  2 files changed, 2 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
> >>>> index b4cebad21b48..7be632c15f91 100644
> >>>> --- a/fs/nilfs2/segment.c
> >>>> +++ b/fs/nilfs2/segment.c
> >>>> @@ -1371,6 +1371,7 @@ static int nilfs_segctor_extend_segments(struct nilfs_sc_info *sci,
> >>>>                 sci->sc_segbuf_nblocks += segbuf->sb_rest_blocks;
> >>>>
> >>>>                 /* allocate the next next full segment */
> >>>> +               nextnextnum = segbuf->sb_segnum;
> >>>>                 err = nilfs_sufile_alloc(sufile, &nextnextnum);
> >>>>                 if (unlikely(err))
> >>>>                         goto failed_segbuf;
> >>>> diff --git a/fs/nilfs2/sufile.c b/fs/nilfs2/sufile.c
> >>>> index 77ff8e95421f..853a8212114f 100644
> >>>> --- a/fs/nilfs2/sufile.c
> >>>> +++ b/fs/nilfs2/sufile.c
> >>>> @@ -317,7 +317,7 @@ int nilfs_sufile_alloc(struct inode *sufile, __u64 *segnump)
> >>>>                 goto out_sem;
> >>>>         kaddr = kmap_atomic(header_bh->b_page);
> >>>>         header = kaddr + bh_offset(header_bh);
> >>>> -       last_alloc = le64_to_cpu(header->sh_last_alloc);
> >>>> +       last_alloc = max(le64_to_cpu(header->sh_last_alloc), *segnump);
> >>>>         kunmap_atomic(kaddr);
> >>>>
> >>>>         nsegments = nilfs_sufile_get_nsegments(sufile);
> >>>> --
> >>>> 2.25.1
> >>> Thank you for your help.   I have a few questions, so I'll ask them below.
> >>>
> >>>> If didn't call nilfs_sufile_alloc() in nilfs_segctor_begin_construction(),
> >>>> nilfs_sufile_header's sh_last_alloc is not updated. In such case, we will
> >>>> add a bh in two segbuf->sb_segsum_buffers.
> >>> If nilfs_sufile_alloc() succeeds to allocate a segment, sh_last_alloc
> >>> is updated.
> >>> all segment allocation must be done through nilfs_sufile_alloc().
> >>> And, the allocated segment is marked dirty on the sufile not to be
> >>> reallocated until it's freed.
> >>>
> >>> So, why is it happening that the same segment is allocated twice in a log ?
> >>> Is it hard to fix the problem by correcting the calling sequence of
> >>> nilfs_sufile_alloc()/free()/etc without touching nilfs_sufile_alloc()
> >>> ?
> >> The problem happened when we call nilfs_segctor_begin_construction() and satisfied
> >> condition nilfs->ns_segnum != nilfs->ns_nextnum. In such scenario, nilfs_sufile_alloc()
> >> will be skipped, but we call nilfs_segbuf_map() and nilfs_segbuf_set_next_segnum()
> >> all the time, so last_alloc is not updated.
> >> Then in nilfs_segctor_extend_segments(), we set sb_segnum by prev->sb_nextnum directly,
> >> and calculate next sb_segnum by nilfs_sufile_alloc(), since last_alloc is not updated,
> >> we will get sb_segnum again.
> >>
> >> By the way, I still don't understand why skip nilfs_sufile_alloc() in some cases and why
> >> nilfs->ns_segnum != nilfs->ns_nextnum. Do you have any ideas?
> >>> I haven't looked closely at this patch yet, but I'm concerned about
> >>> the impact on other places as well.
> >>> nilfs_sufile_alloc() is also used in
> >>> nilfs_segctor_begin_construction() and
> >>> nilfs_prepare_segment_for_recovery().  Are there any side effects?
> >>>
> >>> This patch turns an output-only argument into both input and output,
> >>> and that input value is always used in the calculation of
> >>> "last_alloc".
> >>> So, this change requires all callers to pass a meaningful initial
> >>> value (at least a valid value) to *segnump.
> >>>
> >>> Another question, will this work near the end of the segments ?
> >>> Since segments are used cyclically, wouldn't comparison with the max
> >>> function break down there?
> >>> (I mean it seems that sh_last_alloc may be chosen unintentionally at the end.)
> >> Thanks for the heads-up,I need to look at it again. This patch can only prevent this problem,
> >> and seems to need improvement. Maybe there is a more reasonable solution.
> >>
> >> Thanks,
> > I will explain step by step.
> >
> >> By the way, I still don't understand why skip nilfs_sufile_alloc() in some cases and why
> >> nilfs->ns_segnum != nilfs->ns_nextnum. Do you have any ideas?
> > nilfs->ns_segnum holds the segment number where nilfs2 currently
> > writes logs to, and nilfs->ns_nextnum holds the next segment number
> > where nilfs2 will select to write logs when the current segment gets
> > full.
> > Here, a segment is a fixed sized container of logs (*), in which
> > multiple logs can be appended and written in.
> > (*) Exceptionally, segment #0 is shorter than others to exclude the
> > superblock area.
> >
> > The reason "nextnum" is needed is that mount-time recovery processing
> > needs to know which segment the logs follow.
> >
> > Both segments (nilfs->ns_segnum and nilfs->ns_nextnum) must be already
> > allocated on the sufile.
> > Here, "allocated" means the corresponding segment usage entry is
> > marked "dirty" on the sufile.
> > In nilfs_sufile_alloc() this is done with the following line:
> >
> >                         nilfs_segment_usage_set_dirty(su);
> >
> > On the other hand, the test whether the segment is allocatable or not
> > is done at the following part:
> >
> >                         if (!nilfs_segment_usage_clean(su))
> >                                 continue;
> >                         /* found a clean segment */
> >
> > Coming back to your last question, if nilfs->ns_segnum !=
> > nilfs->ns_nextnum, it indicates that nilfs2 is still writing a log to
> > the segment of ns_segnum.
> > Both should be in the allocated state, so there is no need to call
> > nilfs_sufile_alloc() in this case.
> >
> > If nilfs->ns_segnum == nilfs->ns_nextnum in
> > nilfs_segctor_begin_construction(), then nilfs_shift_to_next_segment()
> > sets ns_segnum to ns_nextnum because the current segment (ns_segnum)
> > is full and no more logs can be written.
> > In this case, a new segment must be allocated for new ns_nextnum,
> > therefore nilfs_sufile_alloc() will be called.
> >
> >> The problem happened when we call nilfs_segctor_begin_construction() and satisfied
> >> condition nilfs->ns_segnum != nilfs->ns_nextnum. In such scenario, nilfs_sufile_alloc()
> >> will be skipped, but we call nilfs_segbuf_map() and nilfs_segbuf_set_next_segnum()
> >> all the time, so last_alloc is not updated.
> > As I explained above, it is expected behavior that
> > nilfs_sufile_alloc() is not called and last_alloc is not updated if
> > nilfs->ns_segnum != nilfs->ns_nextnum.
> >
> > nilfs_segbuf_map() is always called there to set the geometry
> > information in the segbuf.  This is expected behavior.
> > And, nilfs_segbuf_set_next_segnum() is called later to set the
> > determined ns_nextnum to the segbuf, so this is expected behavior as
> > well.
> >
> >> Then in nilfs_segctor_extend_segments(), we set sb_segnum by prev->sb_nextnum directly,
> >> and calculate next sb_segnum by nilfs_sufile_alloc(), since last_alloc is not updated,
> >> we will get sb_segnum again.
> > nilfs_segctor_extend_segments() pre-allocates one or more segments
> > large enough to store updated blocks of metadata files that need to be
> > written in a series of logs at once, and sets up a chain of segbufs.
> > (Sorry for the missing function comment).
> >
> > sb_segnum is set by prev->sb_nextnum to form a chain of buffers for
> > segments.  This is expected behavior.
> > And, the sb_nextnum (= next sb_segnum) will be given by
> > nilfs_sufile_alloc().   This is also expected.
> > It looks like the problem is that nilfs_sufile_alloc() here allocates
> > the same segnum again.
> Thanks to your series of explanations, I learned a lot. So I can assume that
> the skip of nilfs_sufile_alloc() here is because it's already been allocated. So
> in next time, we will skip previous sb_segnum natually. And the problem
> should not be happend.
>
> >
> > Because sb_segnum is set by prev->sb_nextnum which is allocated by the
> > previous nilfs_sufile_alloc() call,
> > this usually does not happen.
> >
> > A possible anomaly is if the segment pointed by the first nextnum (or
> > segnum) was not marked dirty on sufile.
> > This may happen if the sufile is corrupted on the test disk image that
> > syzbot provided (mounted).
> >
> > Can you confirm if this is actually happening?
> This problem was discovered through syzbot and can be reproduced. The root
> reason I think is data corruption rather than logical error. So we should add
> some protection instead of modifying the logic, right?

I think the most likely cause is metadata corruption.  If so, we
should fix it by adding a proper sanity check, yes.
However, there is still the possibility that the error retry logic
after detecting errors has a flaw. (I believe at least this is not a
problem with normal paths.)
The sufile state inconsistency above is hypothetical for now.
Either way, I'd like to make sure what's actually happening (and where
the anomaly is coming from) so we can decide how to fix it.

Regards,
Ryusuke Konishi

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] nilfs2: fix NULL pointer dereference in nilfs_segctor_prepare_write()
@ 2022-11-08 15:13             ` Ryusuke Konishi
  0 siblings, 0 replies; 23+ messages in thread
From: Ryusuke Konishi @ 2022-11-08 15:13 UTC (permalink / raw)
  To: Liu Shixin; +Cc: Andrew Morton, linux-nilfs, linux-kernel

Hi Liu Shixin,

On Tue, Nov 8, 2022 at 9:24 PM Ryusuke Konishi wrote:
>
> On Tue, Nov 8, 2022 at 8:53 PM Liu Shixin wrote:
> >
> >
> >
> > On 2022/11/8 18:33, Ryusuke Konishi wrote:
> > > On Tue, Nov 8, 2022 at 3:49 PM Liu Shixin wrote:
> > >>
> > >>
> > >> On 2022/11/8 12:41, Ryusuke Konishi wrote:
> > >>> Hi Liu Shixin,
> > >>>
> > >>> On Tue, Nov 8, 2022 at 10:41 AM Liu Shixin wrote:
> > >>>> Syzbot reported a NULL pointer dereference:
> > >>>>
> > >>>>  Unable to handle kernel NULL pointer dereference at virtual address 0000000000000168
> > >>>>  Mem abort info:
> > >>>>    ESR = 0x0000000096000004
> > >>>>    EC = 0x25: DABT (current EL), IL = 32 bits
> > >>>>    SET = 0, FnV = 0
> > >>>>    EA = 0, S1PTW = 0
> > >>>>    FSC = 0x04: level 0 translation fault
> > >>>>  Data abort info:
> > >>>>    ISV = 0, ISS = 0x00000004
> > >>>>    CM = 0, WnR = 0
> > >>>>  user pgtable: 4k pages, 48-bit VAs, pgdp=0000000108bcf000
> > >>>>  [0000000000000168] pgd=0000000000000000, p4d=0000000000000000
> > >>>>  Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> > >>>>  Modules linked in:
> > >>>>  CPU: 1 PID: 3032 Comm: segctord Not tainted 6.0.0-rc7-syzkaller-18095-gbbed346d5a96 #0
> > >>>>  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/30/2022
> > >>>>  pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > >>>>  pc : _compound_head include/linux/page-flags.h:253 [inline]
> > >>>>  pc : lock_page+0x28/0x1e0 include/linux/pagemap.h:958
> > >>>>  lr : lock_page+0x28/0x1e0 include/linux/pagemap.h:956
> > >>>>  sp : ffff80001290bc00
> > >>>>  x29: ffff80001290bc00 x28: ffff80001290bde0 x27: 000000000000001b
> > >>>>  x26: fffffc000330d7c0 x25: ffff0000caa56d68 x24: ffff0000ca9fb1c0
> > >>>>  x23: 0000000000000080 x22: ffff0000ca9fb130 x21: 0000000000000160
> > >>>>  x20: ffff0000c91e10b8 x19: 0000000000000160 x18: 00000000000000c0
> > >>>>  x17: ffff80000dd0b198 x16: ffff80000db49158 x15: ffff0000c3e63500
> > >>>>  x14: 0000000000000000 x13: 00000000ffffffff x12: ffff0000c3e63500
> > >>>>  x11: ff808000095d1a0c x10: 0000000000000000 x9 : 0000000000000000
> > >>>>  x8 : 0000000000000000 x7 : ffff80000856806c x6 : 0000000000000000
> > >>>>  x5 : 0000000000000080 x4 : 0000000000000000 x3 : 0000000000000000
> > >>>>  x2 : 0000000000000000 x1 : ffff80000cb431b1 x0 : 0000000000000000
> > >>>>  Call trace:
> > >>>>   lock_page+0x28/0x1e0 include/linux/pagemap.h:956
> > >>>>   nilfs_segctor_prepare_write+0x6c/0x21c fs/nilfs2/segment.c:1658
> > >>>>   nilfs_segctor_do_construct+0x9f4/0xee8 fs/nilfs2/segment.c:2068
> > >>>>   nilfs_segctor_construct+0xa0/0x380 fs/nilfs2/segment.c:2375
> > >>>>   nilfs_segctor_thread_construct fs/nilfs2/segment.c:2483 [inline]
> > >>>>   nilfs_segctor_thread+0x180/0x660 fs/nilfs2/segment.c:2566
> > >>>>   kthread+0x12c/0x158 kernel/kthread.c:376
> > >>>>   ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:860
> > >>>>
> > >>>> If didn't call nilfs_sufile_alloc() in nilfs_segctor_begin_construction(),
> > >>>> nilfs_sufile_header's sh_last_alloc is not updated. In such case, we will
> > >>>> add a bh in two segbuf->sb_segsum_buffers. And finally cause list error.
> > >>>>
> > >>>> Reported-by: syzbot+77e4f005cb899d4268d1@syzkaller.appspotmail.com
> > >>>> Fixes: 9ff05123e3bf ("nilfs2: segment constructor")
> > >>>> Signed-off-by: Liu Shixin <liushixin2@huawei.com>
> > >>>> ---
> > >>>>  fs/nilfs2/segment.c | 1 +
> > >>>>  fs/nilfs2/sufile.c  | 2 +-
> > >>>>  2 files changed, 2 insertions(+), 1 deletion(-)
> > >>>>
> > >>>> diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
> > >>>> index b4cebad21b48..7be632c15f91 100644
> > >>>> --- a/fs/nilfs2/segment.c
> > >>>> +++ b/fs/nilfs2/segment.c
> > >>>> @@ -1371,6 +1371,7 @@ static int nilfs_segctor_extend_segments(struct nilfs_sc_info *sci,
> > >>>>                 sci->sc_segbuf_nblocks += segbuf->sb_rest_blocks;
> > >>>>
> > >>>>                 /* allocate the next next full segment */
> > >>>> +               nextnextnum = segbuf->sb_segnum;
> > >>>>                 err = nilfs_sufile_alloc(sufile, &nextnextnum);
> > >>>>                 if (unlikely(err))
> > >>>>                         goto failed_segbuf;
> > >>>> diff --git a/fs/nilfs2/sufile.c b/fs/nilfs2/sufile.c
> > >>>> index 77ff8e95421f..853a8212114f 100644
> > >>>> --- a/fs/nilfs2/sufile.c
> > >>>> +++ b/fs/nilfs2/sufile.c
> > >>>> @@ -317,7 +317,7 @@ int nilfs_sufile_alloc(struct inode *sufile, __u64 *segnump)
> > >>>>                 goto out_sem;
> > >>>>         kaddr = kmap_atomic(header_bh->b_page);
> > >>>>         header = kaddr + bh_offset(header_bh);
> > >>>> -       last_alloc = le64_to_cpu(header->sh_last_alloc);
> > >>>> +       last_alloc = max(le64_to_cpu(header->sh_last_alloc), *segnump);
> > >>>>         kunmap_atomic(kaddr);
> > >>>>
> > >>>>         nsegments = nilfs_sufile_get_nsegments(sufile);
> > >>>> --
> > >>>> 2.25.1
> > >>> Thank you for your help.   I have a few questions, so I'll ask them below.
> > >>>
> > >>>> If didn't call nilfs_sufile_alloc() in nilfs_segctor_begin_construction(),
> > >>>> nilfs_sufile_header's sh_last_alloc is not updated. In such case, we will
> > >>>> add a bh in two segbuf->sb_segsum_buffers.
> > >>> If nilfs_sufile_alloc() succeeds to allocate a segment, sh_last_alloc
> > >>> is updated.
> > >>> all segment allocation must be done through nilfs_sufile_alloc().
> > >>> And, the allocated segment is marked dirty on the sufile not to be
> > >>> reallocated until it's freed.
> > >>>
> > >>> So, why is it happening that the same segment is allocated twice in a log ?
> > >>> Is it hard to fix the problem by correcting the calling sequence of
> > >>> nilfs_sufile_alloc()/free()/etc without touching nilfs_sufile_alloc()
> > >>> ?
> > >> The problem happened when we call nilfs_segctor_begin_construction() and satisfied
> > >> condition nilfs->ns_segnum != nilfs->ns_nextnum. In such scenario, nilfs_sufile_alloc()
> > >> will be skipped, but we call nilfs_segbuf_map() and nilfs_segbuf_set_next_segnum()
> > >> all the time, so last_alloc is not updated.
> > >> Then in nilfs_segctor_extend_segments(), we set sb_segnum by prev->sb_nextnum directly,
> > >> and calculate next sb_segnum by nilfs_sufile_alloc(), since last_alloc is not updated,
> > >> we will get sb_segnum again.
> > >>
> > >> By the way, I still don't understand why skip nilfs_sufile_alloc() in some cases and why
> > >> nilfs->ns_segnum != nilfs->ns_nextnum. Do you have any ideas?
> > >>> I haven't looked closely at this patch yet, but I'm concerned about
> > >>> the impact on other places as well.
> > >>> nilfs_sufile_alloc() is also used in
> > >>> nilfs_segctor_begin_construction() and
> > >>> nilfs_prepare_segment_for_recovery().  Are there any side effects?
> > >>>
> > >>> This patch turns an output-only argument into both input and output,
> > >>> and that input value is always used in the calculation of
> > >>> "last_alloc".
> > >>> So, this change requires all callers to pass a meaningful initial
> > >>> value (at least a valid value) to *segnump.
> > >>>
> > >>> Another question, will this work near the end of the segments ?
> > >>> Since segments are used cyclically, wouldn't comparison with the max
> > >>> function break down there?
> > >>> (I mean it seems that sh_last_alloc may be chosen unintentionally at the end.)
> > >> Thanks for the heads-up,I need to look at it again. This patch can only prevent this problem,
> > >> and seems to need improvement. Maybe there is a more reasonable solution.
> > >>
> > >> Thanks,
> > > I will explain step by step.
> > >
> > >> By the way, I still don't understand why skip nilfs_sufile_alloc() in some cases and why
> > >> nilfs->ns_segnum != nilfs->ns_nextnum. Do you have any ideas?
> > > nilfs->ns_segnum holds the segment number where nilfs2 currently
> > > writes logs to, and nilfs->ns_nextnum holds the next segment number
> > > where nilfs2 will select to write logs when the current segment gets
> > > full.
> > > Here, a segment is a fixed sized container of logs (*), in which
> > > multiple logs can be appended and written in.
> > > (*) Exceptionally, segment #0 is shorter than others to exclude the
> > > superblock area.
> > >
> > > The reason "nextnum" is needed is that mount-time recovery processing
> > > needs to know which segment the logs follow.
> > >
> > > Both segments (nilfs->ns_segnum and nilfs->ns_nextnum) must be already
> > > allocated on the sufile.
> > > Here, "allocated" means the corresponding segment usage entry is
> > > marked "dirty" on the sufile.
> > > In nilfs_sufile_alloc() this is done with the following line:
> > >
> > >                         nilfs_segment_usage_set_dirty(su);
> > >
> > > On the other hand, the test whether the segment is allocatable or not
> > > is done at the following part:
> > >
> > >                         if (!nilfs_segment_usage_clean(su))
> > >                                 continue;
> > >                         /* found a clean segment */
> > >
> > > Coming back to your last question, if nilfs->ns_segnum !=
> > > nilfs->ns_nextnum, it indicates that nilfs2 is still writing a log to
> > > the segment of ns_segnum.
> > > Both should be in the allocated state, so there is no need to call
> > > nilfs_sufile_alloc() in this case.
> > >
> > > If nilfs->ns_segnum == nilfs->ns_nextnum in
> > > nilfs_segctor_begin_construction(), then nilfs_shift_to_next_segment()
> > > sets ns_segnum to ns_nextnum because the current segment (ns_segnum)
> > > is full and no more logs can be written.
> > > In this case, a new segment must be allocated for new ns_nextnum,
> > > therefore nilfs_sufile_alloc() will be called.
> > >
> > >> The problem happened when we call nilfs_segctor_begin_construction() and satisfied
> > >> condition nilfs->ns_segnum != nilfs->ns_nextnum. In such scenario, nilfs_sufile_alloc()
> > >> will be skipped, but we call nilfs_segbuf_map() and nilfs_segbuf_set_next_segnum()
> > >> all the time, so last_alloc is not updated.
> > > As I explained above, it is expected behavior that
> > > nilfs_sufile_alloc() is not called and last_alloc is not updated if
> > > nilfs->ns_segnum != nilfs->ns_nextnum.
> > >
> > > nilfs_segbuf_map() is always called there to set the geometry
> > > information in the segbuf.  This is expected behavior.
> > > And, nilfs_segbuf_set_next_segnum() is called later to set the
> > > determined ns_nextnum to the segbuf, so this is expected behavior as
> > > well.
> > >
> > >> Then in nilfs_segctor_extend_segments(), we set sb_segnum by prev->sb_nextnum directly,
> > >> and calculate next sb_segnum by nilfs_sufile_alloc(), since last_alloc is not updated,
> > >> we will get sb_segnum again.
> > > nilfs_segctor_extend_segments() pre-allocates one or more segments
> > > large enough to store updated blocks of metadata files that need to be
> > > written in a series of logs at once, and sets up a chain of segbufs.
> > > (Sorry for the missing function comment).
> > >
> > > sb_segnum is set by prev->sb_nextnum to form a chain of buffers for
> > > segments.  This is expected behavior.
> > > And, the sb_nextnum (= next sb_segnum) will be given by
> > > nilfs_sufile_alloc().   This is also expected.
> > > It looks like the problem is that nilfs_sufile_alloc() here allocates
> > > the same segnum again.
> > Thanks to your series of explanations, I learned a lot. So I can assume that
> > the skip of nilfs_sufile_alloc() here is because it's already been allocated. So
> > in next time, we will skip previous sb_segnum natually. And the problem
> > should not be happend.
> >
> > >
> > > Because sb_segnum is set by prev->sb_nextnum which is allocated by the
> > > previous nilfs_sufile_alloc() call,
> > > this usually does not happen.
> > >
> > > A possible anomaly is if the segment pointed by the first nextnum (or
> > > segnum) was not marked dirty on sufile.
> > > This may happen if the sufile is corrupted on the test disk image that
> > > syzbot provided (mounted).
> > >
> > > Can you confirm if this is actually happening?
> > This problem was discovered through syzbot and can be reproduced. The root
> > reason I think is data corruption rather than logical error. So we should add
> > some protection instead of modifying the logic, right?
>
> I think the most likely cause is metadata corruption.  If so, we
> should fix it by adding a proper sanity check, yes.
> However, there is still the possibility that the error retry logic
> after detecting errors has a flaw. (I believe at least this is not a
> problem with normal paths.)
> The sufile state inconsistency above is hypothetical for now.
> Either way, I'd like to make sure what's actually happening (and where
> the anomaly is coming from) so we can decide how to fix it.

I noticed that this syzbot report has a disk image "mount_0.gz", so I
tried to mount it read-only.
The result was as follows:

$ sudo mount -t nilfs2 -r ./mount_0 /mnt/test
$ lssu
              SEGNUM        DATE     TIME STAT     NBLOCKS
                   0  26760730-10-29 19:40:00  -de   527958016
                   1  26760730-11-01 20:29:04  -de   527958016
                   2  1176433641-11-01 02:01:52  -de  2983038235
                   3  -1158249729-11-01 04:57:19  a--       25375
                   4  1970-01-02 21:24:32  a--           0
                   5  -1415215929-01-02 00:19:15  --e  1631451365
                   6  841067190-01-02 13:02:59  -d-  3101461260
                   7  1495233207-01-02 01:31:11  -de  1697748441
                   8  1875753393-01-02 21:54:14  -de   337757600
                   9  648878284-01-02 21:06:08  ---  2133388152
                  10  2122778986-01-02 17:49:43  --e   874605800
                  11  55846197-01-02 09:50:53  -de  4262365368
                  12  1872396026-01-02 06:45:18  ---  1051768258
                  13  820920473-01-02 19:28:35  --e  2932336675
                  14  2128306755-01-02 10:17:45  ---  3568501216
                  15  1457063063-01-02 01:24:17  --e  2330511560
                  16  586864775-01-02 16:08:15  ---   355283425
                  17  -824870041-01-02 22:47:26  -d-  4177999057
                  18  1562176264-01-02 08:06:45  ---  1312597355
                  19  -392925420-01-02 17:08:27  -d-  3811075948
                  20  1927575458-01-02 21:26:51  -de  1384562525
                  21  2139923505-01-02 08:16:04  -d-    41861305

Here,  we can see that neither segment #3 (= ns_segnum) nor #4 (=
ns_nextnum) has the dirty flag
("d" in STAT).   So, as expected, this seems to be the root cause of
the duplicate assignments and oops.
The proper correction would be, therefore, to check and reject (or
fix) this anomaly while outputting an error message
(or warning if fix it at the same time).
It should be inserted in mount time logic because the segments that
nilfs2 itself allocates are always marked dirty with
nilfs_sufile_alloc().

Regards,
Ryusuke Konishi

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] nilfs2: fix NULL pointer dereference in nilfs_segctor_prepare_write()
@ 2022-11-08 15:13             ` Ryusuke Konishi
  0 siblings, 0 replies; 23+ messages in thread
From: Ryusuke Konishi @ 2022-11-08 15:13 UTC (permalink / raw)
  To: Liu Shixin
  Cc: Andrew Morton, linux-nilfs-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

Hi Liu Shixin,

On Tue, Nov 8, 2022 at 9:24 PM Ryusuke Konishi wrote:
>
> On Tue, Nov 8, 2022 at 8:53 PM Liu Shixin wrote:
> >
> >
> >
> > On 2022/11/8 18:33, Ryusuke Konishi wrote:
> > > On Tue, Nov 8, 2022 at 3:49 PM Liu Shixin wrote:
> > >>
> > >>
> > >> On 2022/11/8 12:41, Ryusuke Konishi wrote:
> > >>> Hi Liu Shixin,
> > >>>
> > >>> On Tue, Nov 8, 2022 at 10:41 AM Liu Shixin wrote:
> > >>>> Syzbot reported a NULL pointer dereference:
> > >>>>
> > >>>>  Unable to handle kernel NULL pointer dereference at virtual address 0000000000000168
> > >>>>  Mem abort info:
> > >>>>    ESR = 0x0000000096000004
> > >>>>    EC = 0x25: DABT (current EL), IL = 32 bits
> > >>>>    SET = 0, FnV = 0
> > >>>>    EA = 0, S1PTW = 0
> > >>>>    FSC = 0x04: level 0 translation fault
> > >>>>  Data abort info:
> > >>>>    ISV = 0, ISS = 0x00000004
> > >>>>    CM = 0, WnR = 0
> > >>>>  user pgtable: 4k pages, 48-bit VAs, pgdp=0000000108bcf000
> > >>>>  [0000000000000168] pgd=0000000000000000, p4d=0000000000000000
> > >>>>  Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> > >>>>  Modules linked in:
> > >>>>  CPU: 1 PID: 3032 Comm: segctord Not tainted 6.0.0-rc7-syzkaller-18095-gbbed346d5a96 #0
> > >>>>  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/30/2022
> > >>>>  pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > >>>>  pc : _compound_head include/linux/page-flags.h:253 [inline]
> > >>>>  pc : lock_page+0x28/0x1e0 include/linux/pagemap.h:958
> > >>>>  lr : lock_page+0x28/0x1e0 include/linux/pagemap.h:956
> > >>>>  sp : ffff80001290bc00
> > >>>>  x29: ffff80001290bc00 x28: ffff80001290bde0 x27: 000000000000001b
> > >>>>  x26: fffffc000330d7c0 x25: ffff0000caa56d68 x24: ffff0000ca9fb1c0
> > >>>>  x23: 0000000000000080 x22: ffff0000ca9fb130 x21: 0000000000000160
> > >>>>  x20: ffff0000c91e10b8 x19: 0000000000000160 x18: 00000000000000c0
> > >>>>  x17: ffff80000dd0b198 x16: ffff80000db49158 x15: ffff0000c3e63500
> > >>>>  x14: 0000000000000000 x13: 00000000ffffffff x12: ffff0000c3e63500
> > >>>>  x11: ff808000095d1a0c x10: 0000000000000000 x9 : 0000000000000000
> > >>>>  x8 : 0000000000000000 x7 : ffff80000856806c x6 : 0000000000000000
> > >>>>  x5 : 0000000000000080 x4 : 0000000000000000 x3 : 0000000000000000
> > >>>>  x2 : 0000000000000000 x1 : ffff80000cb431b1 x0 : 0000000000000000
> > >>>>  Call trace:
> > >>>>   lock_page+0x28/0x1e0 include/linux/pagemap.h:956
> > >>>>   nilfs_segctor_prepare_write+0x6c/0x21c fs/nilfs2/segment.c:1658
> > >>>>   nilfs_segctor_do_construct+0x9f4/0xee8 fs/nilfs2/segment.c:2068
> > >>>>   nilfs_segctor_construct+0xa0/0x380 fs/nilfs2/segment.c:2375
> > >>>>   nilfs_segctor_thread_construct fs/nilfs2/segment.c:2483 [inline]
> > >>>>   nilfs_segctor_thread+0x180/0x660 fs/nilfs2/segment.c:2566
> > >>>>   kthread+0x12c/0x158 kernel/kthread.c:376
> > >>>>   ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:860
> > >>>>
> > >>>> If didn't call nilfs_sufile_alloc() in nilfs_segctor_begin_construction(),
> > >>>> nilfs_sufile_header's sh_last_alloc is not updated. In such case, we will
> > >>>> add a bh in two segbuf->sb_segsum_buffers. And finally cause list error.
> > >>>>
> > >>>> Reported-by: syzbot+77e4f005cb899d4268d1-Pl5Pbv+GP7P466ipTTIvnc23WoclnBCfAL8bYrjMMd8@public.gmane.org
> > >>>> Fixes: 9ff05123e3bf ("nilfs2: segment constructor")
> > >>>> Signed-off-by: Liu Shixin <liushixin2-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
> > >>>> ---
> > >>>>  fs/nilfs2/segment.c | 1 +
> > >>>>  fs/nilfs2/sufile.c  | 2 +-
> > >>>>  2 files changed, 2 insertions(+), 1 deletion(-)
> > >>>>
> > >>>> diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
> > >>>> index b4cebad21b48..7be632c15f91 100644
> > >>>> --- a/fs/nilfs2/segment.c
> > >>>> +++ b/fs/nilfs2/segment.c
> > >>>> @@ -1371,6 +1371,7 @@ static int nilfs_segctor_extend_segments(struct nilfs_sc_info *sci,
> > >>>>                 sci->sc_segbuf_nblocks += segbuf->sb_rest_blocks;
> > >>>>
> > >>>>                 /* allocate the next next full segment */
> > >>>> +               nextnextnum = segbuf->sb_segnum;
> > >>>>                 err = nilfs_sufile_alloc(sufile, &nextnextnum);
> > >>>>                 if (unlikely(err))
> > >>>>                         goto failed_segbuf;
> > >>>> diff --git a/fs/nilfs2/sufile.c b/fs/nilfs2/sufile.c
> > >>>> index 77ff8e95421f..853a8212114f 100644
> > >>>> --- a/fs/nilfs2/sufile.c
> > >>>> +++ b/fs/nilfs2/sufile.c
> > >>>> @@ -317,7 +317,7 @@ int nilfs_sufile_alloc(struct inode *sufile, __u64 *segnump)
> > >>>>                 goto out_sem;
> > >>>>         kaddr = kmap_atomic(header_bh->b_page);
> > >>>>         header = kaddr + bh_offset(header_bh);
> > >>>> -       last_alloc = le64_to_cpu(header->sh_last_alloc);
> > >>>> +       last_alloc = max(le64_to_cpu(header->sh_last_alloc), *segnump);
> > >>>>         kunmap_atomic(kaddr);
> > >>>>
> > >>>>         nsegments = nilfs_sufile_get_nsegments(sufile);
> > >>>> --
> > >>>> 2.25.1
> > >>> Thank you for your help.   I have a few questions, so I'll ask them below.
> > >>>
> > >>>> If didn't call nilfs_sufile_alloc() in nilfs_segctor_begin_construction(),
> > >>>> nilfs_sufile_header's sh_last_alloc is not updated. In such case, we will
> > >>>> add a bh in two segbuf->sb_segsum_buffers.
> > >>> If nilfs_sufile_alloc() succeeds to allocate a segment, sh_last_alloc
> > >>> is updated.
> > >>> all segment allocation must be done through nilfs_sufile_alloc().
> > >>> And, the allocated segment is marked dirty on the sufile not to be
> > >>> reallocated until it's freed.
> > >>>
> > >>> So, why is it happening that the same segment is allocated twice in a log ?
> > >>> Is it hard to fix the problem by correcting the calling sequence of
> > >>> nilfs_sufile_alloc()/free()/etc without touching nilfs_sufile_alloc()
> > >>> ?
> > >> The problem happened when we call nilfs_segctor_begin_construction() and satisfied
> > >> condition nilfs->ns_segnum != nilfs->ns_nextnum. In such scenario, nilfs_sufile_alloc()
> > >> will be skipped, but we call nilfs_segbuf_map() and nilfs_segbuf_set_next_segnum()
> > >> all the time, so last_alloc is not updated.
> > >> Then in nilfs_segctor_extend_segments(), we set sb_segnum by prev->sb_nextnum directly,
> > >> and calculate next sb_segnum by nilfs_sufile_alloc(), since last_alloc is not updated,
> > >> we will get sb_segnum again.
> > >>
> > >> By the way, I still don't understand why skip nilfs_sufile_alloc() in some cases and why
> > >> nilfs->ns_segnum != nilfs->ns_nextnum. Do you have any ideas?
> > >>> I haven't looked closely at this patch yet, but I'm concerned about
> > >>> the impact on other places as well.
> > >>> nilfs_sufile_alloc() is also used in
> > >>> nilfs_segctor_begin_construction() and
> > >>> nilfs_prepare_segment_for_recovery().  Are there any side effects?
> > >>>
> > >>> This patch turns an output-only argument into both input and output,
> > >>> and that input value is always used in the calculation of
> > >>> "last_alloc".
> > >>> So, this change requires all callers to pass a meaningful initial
> > >>> value (at least a valid value) to *segnump.
> > >>>
> > >>> Another question, will this work near the end of the segments ?
> > >>> Since segments are used cyclically, wouldn't comparison with the max
> > >>> function break down there?
> > >>> (I mean it seems that sh_last_alloc may be chosen unintentionally at the end.)
> > >> Thanks for the heads-up,I need to look at it again. This patch can only prevent this problem,
> > >> and seems to need improvement. Maybe there is a more reasonable solution.
> > >>
> > >> Thanks,
> > > I will explain step by step.
> > >
> > >> By the way, I still don't understand why skip nilfs_sufile_alloc() in some cases and why
> > >> nilfs->ns_segnum != nilfs->ns_nextnum. Do you have any ideas?
> > > nilfs->ns_segnum holds the segment number where nilfs2 currently
> > > writes logs to, and nilfs->ns_nextnum holds the next segment number
> > > where nilfs2 will select to write logs when the current segment gets
> > > full.
> > > Here, a segment is a fixed sized container of logs (*), in which
> > > multiple logs can be appended and written in.
> > > (*) Exceptionally, segment #0 is shorter than others to exclude the
> > > superblock area.
> > >
> > > The reason "nextnum" is needed is that mount-time recovery processing
> > > needs to know which segment the logs follow.
> > >
> > > Both segments (nilfs->ns_segnum and nilfs->ns_nextnum) must be already
> > > allocated on the sufile.
> > > Here, "allocated" means the corresponding segment usage entry is
> > > marked "dirty" on the sufile.
> > > In nilfs_sufile_alloc() this is done with the following line:
> > >
> > >                         nilfs_segment_usage_set_dirty(su);
> > >
> > > On the other hand, the test whether the segment is allocatable or not
> > > is done at the following part:
> > >
> > >                         if (!nilfs_segment_usage_clean(su))
> > >                                 continue;
> > >                         /* found a clean segment */
> > >
> > > Coming back to your last question, if nilfs->ns_segnum !=
> > > nilfs->ns_nextnum, it indicates that nilfs2 is still writing a log to
> > > the segment of ns_segnum.
> > > Both should be in the allocated state, so there is no need to call
> > > nilfs_sufile_alloc() in this case.
> > >
> > > If nilfs->ns_segnum == nilfs->ns_nextnum in
> > > nilfs_segctor_begin_construction(), then nilfs_shift_to_next_segment()
> > > sets ns_segnum to ns_nextnum because the current segment (ns_segnum)
> > > is full and no more logs can be written.
> > > In this case, a new segment must be allocated for new ns_nextnum,
> > > therefore nilfs_sufile_alloc() will be called.
> > >
> > >> The problem happened when we call nilfs_segctor_begin_construction() and satisfied
> > >> condition nilfs->ns_segnum != nilfs->ns_nextnum. In such scenario, nilfs_sufile_alloc()
> > >> will be skipped, but we call nilfs_segbuf_map() and nilfs_segbuf_set_next_segnum()
> > >> all the time, so last_alloc is not updated.
> > > As I explained above, it is expected behavior that
> > > nilfs_sufile_alloc() is not called and last_alloc is not updated if
> > > nilfs->ns_segnum != nilfs->ns_nextnum.
> > >
> > > nilfs_segbuf_map() is always called there to set the geometry
> > > information in the segbuf.  This is expected behavior.
> > > And, nilfs_segbuf_set_next_segnum() is called later to set the
> > > determined ns_nextnum to the segbuf, so this is expected behavior as
> > > well.
> > >
> > >> Then in nilfs_segctor_extend_segments(), we set sb_segnum by prev->sb_nextnum directly,
> > >> and calculate next sb_segnum by nilfs_sufile_alloc(), since last_alloc is not updated,
> > >> we will get sb_segnum again.
> > > nilfs_segctor_extend_segments() pre-allocates one or more segments
> > > large enough to store updated blocks of metadata files that need to be
> > > written in a series of logs at once, and sets up a chain of segbufs.
> > > (Sorry for the missing function comment).
> > >
> > > sb_segnum is set by prev->sb_nextnum to form a chain of buffers for
> > > segments.  This is expected behavior.
> > > And, the sb_nextnum (= next sb_segnum) will be given by
> > > nilfs_sufile_alloc().   This is also expected.
> > > It looks like the problem is that nilfs_sufile_alloc() here allocates
> > > the same segnum again.
> > Thanks to your series of explanations, I learned a lot. So I can assume that
> > the skip of nilfs_sufile_alloc() here is because it's already been allocated. So
> > in next time, we will skip previous sb_segnum natually. And the problem
> > should not be happend.
> >
> > >
> > > Because sb_segnum is set by prev->sb_nextnum which is allocated by the
> > > previous nilfs_sufile_alloc() call,
> > > this usually does not happen.
> > >
> > > A possible anomaly is if the segment pointed by the first nextnum (or
> > > segnum) was not marked dirty on sufile.
> > > This may happen if the sufile is corrupted on the test disk image that
> > > syzbot provided (mounted).
> > >
> > > Can you confirm if this is actually happening?
> > This problem was discovered through syzbot and can be reproduced. The root
> > reason I think is data corruption rather than logical error. So we should add
> > some protection instead of modifying the logic, right?
>
> I think the most likely cause is metadata corruption.  If so, we
> should fix it by adding a proper sanity check, yes.
> However, there is still the possibility that the error retry logic
> after detecting errors has a flaw. (I believe at least this is not a
> problem with normal paths.)
> The sufile state inconsistency above is hypothetical for now.
> Either way, I'd like to make sure what's actually happening (and where
> the anomaly is coming from) so we can decide how to fix it.

I noticed that this syzbot report has a disk image "mount_0.gz", so I
tried to mount it read-only.
The result was as follows:

$ sudo mount -t nilfs2 -r ./mount_0 /mnt/test
$ lssu
              SEGNUM        DATE     TIME STAT     NBLOCKS
                   0  26760730-10-29 19:40:00  -de   527958016
                   1  26760730-11-01 20:29:04  -de   527958016
                   2  1176433641-11-01 02:01:52  -de  2983038235
                   3  -1158249729-11-01 04:57:19  a--       25375
                   4  1970-01-02 21:24:32  a--           0
                   5  -1415215929-01-02 00:19:15  --e  1631451365
                   6  841067190-01-02 13:02:59  -d-  3101461260
                   7  1495233207-01-02 01:31:11  -de  1697748441
                   8  1875753393-01-02 21:54:14  -de   337757600
                   9  648878284-01-02 21:06:08  ---  2133388152
                  10  2122778986-01-02 17:49:43  --e   874605800
                  11  55846197-01-02 09:50:53  -de  4262365368
                  12  1872396026-01-02 06:45:18  ---  1051768258
                  13  820920473-01-02 19:28:35  --e  2932336675
                  14  2128306755-01-02 10:17:45  ---  3568501216
                  15  1457063063-01-02 01:24:17  --e  2330511560
                  16  586864775-01-02 16:08:15  ---   355283425
                  17  -824870041-01-02 22:47:26  -d-  4177999057
                  18  1562176264-01-02 08:06:45  ---  1312597355
                  19  -392925420-01-02 17:08:27  -d-  3811075948
                  20  1927575458-01-02 21:26:51  -de  1384562525
                  21  2139923505-01-02 08:16:04  -d-    41861305

Here,  we can see that neither segment #3 (= ns_segnum) nor #4 (=
ns_nextnum) has the dirty flag
("d" in STAT).   So, as expected, this seems to be the root cause of
the duplicate assignments and oops.
The proper correction would be, therefore, to check and reject (or
fix) this anomaly while outputting an error message
(or warning if fix it at the same time).
It should be inserted in mount time logic because the segments that
nilfs2 itself allocates are always marked dirty with
nilfs_sufile_alloc().

Regards,
Ryusuke Konishi

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] nilfs2: fix NULL pointer dereference in nilfs_segctor_prepare_write()
@ 2022-11-08 17:50               ` Ryusuke Konishi
  0 siblings, 0 replies; 23+ messages in thread
From: Ryusuke Konishi @ 2022-11-08 17:50 UTC (permalink / raw)
  To: Liu Shixin; +Cc: Andrew Morton, linux-nilfs, linux-kernel

Hi Liu Shixin,

I'm sorry for my repeated emails.

On Wed, Nov 9, 2022 at 12:13 AM Ryusuke Konishi wrote:
> >
> > I think the most likely cause is metadata corruption.  If so, we
> > should fix it by adding a proper sanity check, yes.
> > However, there is still the possibility that the error retry logic
> > after detecting errors has a flaw. (I believe at least this is not a
> > problem with normal paths.)
> > The sufile state inconsistency above is hypothetical for now.
> > Either way, I'd like to make sure what's actually happening (and where
> > the anomaly is coming from) so we can decide how to fix it.
>
> I noticed that this syzbot report has a disk image "mount_0.gz", so I
> tried to mount it read-only.
> The result was as follows:
>
> $ sudo mount -t nilfs2 -r ./mount_0 /mnt/test
> $ lssu
>               SEGNUM        DATE     TIME STAT     NBLOCKS
>                    0  26760730-10-29 19:40:00  -de   527958016
>                    1  26760730-11-01 20:29:04  -de   527958016
>                    2  1176433641-11-01 02:01:52  -de  2983038235
>                    3  -1158249729-11-01 04:57:19  a--       25375
>                    4  1970-01-02 21:24:32  a--           0
>                    5  -1415215929-01-02 00:19:15  --e  1631451365
>                    6  841067190-01-02 13:02:59  -d-  3101461260
>                    7  1495233207-01-02 01:31:11  -de  1697748441
>                    8  1875753393-01-02 21:54:14  -de   337757600
>                    9  648878284-01-02 21:06:08  ---  2133388152
>                   10  2122778986-01-02 17:49:43  --e   874605800
>                   11  55846197-01-02 09:50:53  -de  4262365368
>                   12  1872396026-01-02 06:45:18  ---  1051768258
>                   13  820920473-01-02 19:28:35  --e  2932336675
>                   14  2128306755-01-02 10:17:45  ---  3568501216
>                   15  1457063063-01-02 01:24:17  --e  2330511560
>                   16  586864775-01-02 16:08:15  ---   355283425
>                   17  -824870041-01-02 22:47:26  -d-  4177999057
>                   18  1562176264-01-02 08:06:45  ---  1312597355
>                   19  -392925420-01-02 17:08:27  -d-  3811075948
>                   20  1927575458-01-02 21:26:51  -de  1384562525
>                   21  2139923505-01-02 08:16:04  -d-    41861305
>
> Here,  we can see that neither segment #3 (= ns_segnum) nor #4 (=
> ns_nextnum) has the dirty flag
> ("d" in STAT).   So, as expected, this seems to be the root cause of
> the duplicate assignments and oops.
> The proper correction would be, therefore, to check and reject (or
> fix) this anomaly while outputting an error message
> (or warning if fix it at the same time).

> It should be inserted in mount time logic because the segments that
> nilfs2 itself allocates are always marked dirty with
> nilfs_sufile_alloc().

I will change my opinion.

Considering the possibility of sufile corruption at runtime, it seems
that the sanity check should be done on every nilfs_sufile_alloc()
call.

I now think nilfs_sufile_alloc() should call nilfs_error() and return
-EIO if the number of a newly found vacant segment matches
nilfs->ns_segnum or nilfs->ns_nextnum.

This test should not need to be done for every segbuf's sb_segnum
because metadata blocks that contain the information about allocated
segments are kept in memory because of its dirty state and will not be
destroyed until they are finally written out.

One correction then.  Just checking the output of the lssu command
wasn't enough because the following test is done on all flags except
the active flag "a".  (this flag is not persistent and visible only
when seeing via ioctl.)

>                         if (!nilfs_segment_usage_clean(su))
>                                 continue;
>                         /* found a clean segment */

We also had to see the invisible flags as well to confirm if that is
the root cause, but lssu doens't show them all.  So I checked the dump
data of the mount_0 file.
As a result, the segment at ns_segnum was not clean, but that of
ns_nextnum was clean, which means it's the root cause as expected.

Regards,
Ryusuke Konishi

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] nilfs2: fix NULL pointer dereference in nilfs_segctor_prepare_write()
@ 2022-11-08 17:50               ` Ryusuke Konishi
  0 siblings, 0 replies; 23+ messages in thread
From: Ryusuke Konishi @ 2022-11-08 17:50 UTC (permalink / raw)
  To: Liu Shixin
  Cc: Andrew Morton, linux-nilfs-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

Hi Liu Shixin,

I'm sorry for my repeated emails.

On Wed, Nov 9, 2022 at 12:13 AM Ryusuke Konishi wrote:
> >
> > I think the most likely cause is metadata corruption.  If so, we
> > should fix it by adding a proper sanity check, yes.
> > However, there is still the possibility that the error retry logic
> > after detecting errors has a flaw. (I believe at least this is not a
> > problem with normal paths.)
> > The sufile state inconsistency above is hypothetical for now.
> > Either way, I'd like to make sure what's actually happening (and where
> > the anomaly is coming from) so we can decide how to fix it.
>
> I noticed that this syzbot report has a disk image "mount_0.gz", so I
> tried to mount it read-only.
> The result was as follows:
>
> $ sudo mount -t nilfs2 -r ./mount_0 /mnt/test
> $ lssu
>               SEGNUM        DATE     TIME STAT     NBLOCKS
>                    0  26760730-10-29 19:40:00  -de   527958016
>                    1  26760730-11-01 20:29:04  -de   527958016
>                    2  1176433641-11-01 02:01:52  -de  2983038235
>                    3  -1158249729-11-01 04:57:19  a--       25375
>                    4  1970-01-02 21:24:32  a--           0
>                    5  -1415215929-01-02 00:19:15  --e  1631451365
>                    6  841067190-01-02 13:02:59  -d-  3101461260
>                    7  1495233207-01-02 01:31:11  -de  1697748441
>                    8  1875753393-01-02 21:54:14  -de   337757600
>                    9  648878284-01-02 21:06:08  ---  2133388152
>                   10  2122778986-01-02 17:49:43  --e   874605800
>                   11  55846197-01-02 09:50:53  -de  4262365368
>                   12  1872396026-01-02 06:45:18  ---  1051768258
>                   13  820920473-01-02 19:28:35  --e  2932336675
>                   14  2128306755-01-02 10:17:45  ---  3568501216
>                   15  1457063063-01-02 01:24:17  --e  2330511560
>                   16  586864775-01-02 16:08:15  ---   355283425
>                   17  -824870041-01-02 22:47:26  -d-  4177999057
>                   18  1562176264-01-02 08:06:45  ---  1312597355
>                   19  -392925420-01-02 17:08:27  -d-  3811075948
>                   20  1927575458-01-02 21:26:51  -de  1384562525
>                   21  2139923505-01-02 08:16:04  -d-    41861305
>
> Here,  we can see that neither segment #3 (= ns_segnum) nor #4 (=
> ns_nextnum) has the dirty flag
> ("d" in STAT).   So, as expected, this seems to be the root cause of
> the duplicate assignments and oops.
> The proper correction would be, therefore, to check and reject (or
> fix) this anomaly while outputting an error message
> (or warning if fix it at the same time).

> It should be inserted in mount time logic because the segments that
> nilfs2 itself allocates are always marked dirty with
> nilfs_sufile_alloc().

I will change my opinion.

Considering the possibility of sufile corruption at runtime, it seems
that the sanity check should be done on every nilfs_sufile_alloc()
call.

I now think nilfs_sufile_alloc() should call nilfs_error() and return
-EIO if the number of a newly found vacant segment matches
nilfs->ns_segnum or nilfs->ns_nextnum.

This test should not need to be done for every segbuf's sb_segnum
because metadata blocks that contain the information about allocated
segments are kept in memory because of its dirty state and will not be
destroyed until they are finally written out.

One correction then.  Just checking the output of the lssu command
wasn't enough because the following test is done on all flags except
the active flag "a".  (this flag is not persistent and visible only
when seeing via ioctl.)

>                         if (!nilfs_segment_usage_clean(su))
>                                 continue;
>                         /* found a clean segment */

We also had to see the invisible flags as well to confirm if that is
the root cause, but lssu doens't show them all.  So I checked the dump
data of the mount_0 file.
As a result, the segment at ns_segnum was not clean, but that of
ns_nextnum was clean, which means it's the root cause as expected.

Regards,
Ryusuke Konishi

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] nilfs2: fix NULL pointer dereference in nilfs_segctor_prepare_write()
@ 2022-11-09  3:26                 ` Liu Shixin
  0 siblings, 0 replies; 23+ messages in thread
From: Liu Shixin @ 2022-11-09  3:26 UTC (permalink / raw)
  To: Ryusuke Konishi; +Cc: Andrew Morton, linux-nilfs, linux-kernel



On 2022/11/9 1:50, Ryusuke Konishi wrote:
> Hi Liu Shixin,
>
> I'm sorry for my repeated emails.
>
> On Wed, Nov 9, 2022 at 12:13 AM Ryusuke Konishi wrote:
>>> I think the most likely cause is metadata corruption.  If so, we
>>> should fix it by adding a proper sanity check, yes.
>>> However, there is still the possibility that the error retry logic
>>> after detecting errors has a flaw. (I believe at least this is not a
>>> problem with normal paths.)
>>> The sufile state inconsistency above is hypothetical for now.
>>> Either way, I'd like to make sure what's actually happening (and where
>>> the anomaly is coming from) so we can decide how to fix it.
>> I noticed that this syzbot report has a disk image "mount_0.gz", so I
>> tried to mount it read-only.
>> The result was as follows:
>>
>> $ sudo mount -t nilfs2 -r ./mount_0 /mnt/test
>> $ lssu
>>               SEGNUM        DATE     TIME STAT     NBLOCKS
>>                    0  26760730-10-29 19:40:00  -de   527958016
>>                    1  26760730-11-01 20:29:04  -de   527958016
>>                    2  1176433641-11-01 02:01:52  -de  2983038235
>>                    3  -1158249729-11-01 04:57:19  a--       25375
>>                    4  1970-01-02 21:24:32  a--           0
>>                    5  -1415215929-01-02 00:19:15  --e  1631451365
>>                    6  841067190-01-02 13:02:59  -d-  3101461260
>>                    7  1495233207-01-02 01:31:11  -de  1697748441
>>                    8  1875753393-01-02 21:54:14  -de   337757600
>>                    9  648878284-01-02 21:06:08  ---  2133388152
>>                   10  2122778986-01-02 17:49:43  --e   874605800
>>                   11  55846197-01-02 09:50:53  -de  4262365368
>>                   12  1872396026-01-02 06:45:18  ---  1051768258
>>                   13  820920473-01-02 19:28:35  --e  2932336675
>>                   14  2128306755-01-02 10:17:45  ---  3568501216
>>                   15  1457063063-01-02 01:24:17  --e  2330511560
>>                   16  586864775-01-02 16:08:15  ---   355283425
>>                   17  -824870041-01-02 22:47:26  -d-  4177999057
>>                   18  1562176264-01-02 08:06:45  ---  1312597355
>>                   19  -392925420-01-02 17:08:27  -d-  3811075948
>>                   20  1927575458-01-02 21:26:51  -de  1384562525
>>                   21  2139923505-01-02 08:16:04  -d-    41861305
>>
>> Here,  we can see that neither segment #3 (= ns_segnum) nor #4 (=
>> ns_nextnum) has the dirty flag
>> ("d" in STAT).   So, as expected, this seems to be the root cause of
>> the duplicate assignments and oops.
>> The proper correction would be, therefore, to check and reject (or
>> fix) this anomaly while outputting an error message
>> (or warning if fix it at the same time).
>> It should be inserted in mount time logic because the segments that
>> nilfs2 itself allocates are always marked dirty with
>> nilfs_sufile_alloc().
> I will change my opinion.
>
> Considering the possibility of sufile corruption at runtime, it seems
> that the sanity check should be done on every nilfs_sufile_alloc()
> call.
>
> I now think nilfs_sufile_alloc() should call nilfs_error() and return
> -EIO if the number of a newly found vacant segment matches
> nilfs->ns_segnum or nilfs->ns_nextnum.
Before we add the first segbuf into sci->sc_segbufs in nilfs_segctor_begin_construction(),
there is no possibility existing duplicate segnum. And the subsequent process should
not be affected by sufile corruption. So I think it's enough to only check for case alloc==0.
I don't find any other possible wrong scenarios.

diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index 7be632c15f91..7b91c790b798 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -1326,7 +1326,13 @@ static int nilfs_segctor_begin_construction(struct nilfs_sc_info *sci,
                err = nilfs_sufile_alloc(nilfs->ns_sufile, &nextnum);
                if (err)
                        goto failed;
+       } else {
+               /* Check the next segment has alreadly been allocated */
+               err = nilfs_sufile_test_allocated(nilfs->ns_sufile, nextnum);
+               if (err)
+                       goto failed;
        }
+
        nilfs_segbuf_set_next_segnum(segbuf, nextnum, nilfs);
 
        BUG_ON(!list_empty(&sci->sc_segbufs));
diff --git a/fs/nilfs2/sufile.c b/fs/nilfs2/sufile.c
index 853a8212114f..8dff12c56891 100644
--- a/fs/nilfs2/sufile.c
+++ b/fs/nilfs2/sufile.c
@@ -399,6 +399,36 @@ int nilfs_sufile_alloc(struct inode *sufile, __u64 *segnump)
        return ret;
 }
 
+int nilfs_sufile_test_allocated(struct inode *sufile, __u64 *segnump)
+{
+       struct buffer_head *su_bh;
+       struct nilfs_segment_usage *su;
+       void *kaddr;
+       int ret;
+
+       down_write(&NILFS_MDT(sufile)->mi_sem);
+
+       ret = nilfs_sufile_get_segment_usage_block(sufile, segnump, 1,
+                                                  &su_bh);
+       if (ret < 0)
+               goto out_sem;
+
+       kaddr = kmap_atomic(su_bh->b_page);
+       su = nilfs_sufile_block_get_segment_usage(
+               sufile, segnump, su_bh, kaddr);
+
+       if (nilfs_segment_usage_clean(su))
+               ret = -EIO;
+
+       kunmap_atomic(kaddr);
+
+       brelse(su_bh);
+
+out_sem:
+       up_write(&NILFS_MDT(sufile)->mi_sem);
+       return ret;
+}
+
 void nilfs_sufile_do_cancel_free(struct inode *sufile, __u64 segnum,
                                 struct buffer_head *header_bh,
                                 struct buffer_head *su_bh)
diff --git a/fs/nilfs2/sufile.h b/fs/nilfs2/sufile.h
index 8e8a1a5a0402..02b61ca6f318 100644
--- a/fs/nilfs2/sufile.h
+++ b/fs/nilfs2/sufile.h
@@ -24,6 +24,7 @@ unsigned long nilfs_sufile_get_ncleansegs(struct inode *sufile);
 
 int nilfs_sufile_set_alloc_range(struct inode *sufile, __u64 start, __u64 end);
 int nilfs_sufile_alloc(struct inode *, __u64 *);
+int nilfs_sufile_test_allocated(struct inode *, __u64 *);
 int nilfs_sufile_mark_dirty(struct inode *sufile, __u64 segnum);
 int nilfs_sufile_set_segment_usage(struct inode *sufile, __u64 segnum,
                                   unsigned long nblocks, time64_t modtime);

>
> This test should not need to be done for every segbuf's sb_segnum
> because metadata blocks that contain the information about allocated
> segments are kept in memory because of its dirty state and will not be
> destroyed until they are finally written out.
>
> One correction then.  Just checking the output of the lssu command
> wasn't enough because the following test is done on all flags except
> the active flag "a".  (this flag is not persistent and visible only
> when seeing via ioctl.)
>
>>                         if (!nilfs_segment_usage_clean(su))
>>                                 continue;
>>                         /* found a clean segment */
> We also had to see the invisible flags as well to confirm if that is
> the root cause, but lssu doens't show them all.  So I checked the dump
> data of the mount_0 file.
> As a result, the segment at ns_segnum was not clean, but that of
> ns_nextnum was clean, which means it's the root cause as expected.
Thanks for your confirmation.

Regards,
Liu Shixin
.
>
> Regards,
> Ryusuke Konishi
> .
>


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH] nilfs2: fix NULL pointer dereference in nilfs_segctor_prepare_write()
@ 2022-11-09  3:26                 ` Liu Shixin
  0 siblings, 0 replies; 23+ messages in thread
From: Liu Shixin @ 2022-11-09  3:26 UTC (permalink / raw)
  To: Ryusuke Konishi
  Cc: Andrew Morton, linux-nilfs-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA



On 2022/11/9 1:50, Ryusuke Konishi wrote:
> Hi Liu Shixin,
>
> I'm sorry for my repeated emails.
>
> On Wed, Nov 9, 2022 at 12:13 AM Ryusuke Konishi wrote:
>>> I think the most likely cause is metadata corruption.  If so, we
>>> should fix it by adding a proper sanity check, yes.
>>> However, there is still the possibility that the error retry logic
>>> after detecting errors has a flaw. (I believe at least this is not a
>>> problem with normal paths.)
>>> The sufile state inconsistency above is hypothetical for now.
>>> Either way, I'd like to make sure what's actually happening (and where
>>> the anomaly is coming from) so we can decide how to fix it.
>> I noticed that this syzbot report has a disk image "mount_0.gz", so I
>> tried to mount it read-only.
>> The result was as follows:
>>
>> $ sudo mount -t nilfs2 -r ./mount_0 /mnt/test
>> $ lssu
>>               SEGNUM        DATE     TIME STAT     NBLOCKS
>>                    0  26760730-10-29 19:40:00  -de   527958016
>>                    1  26760730-11-01 20:29:04  -de   527958016
>>                    2  1176433641-11-01 02:01:52  -de  2983038235
>>                    3  -1158249729-11-01 04:57:19  a--       25375
>>                    4  1970-01-02 21:24:32  a--           0
>>                    5  -1415215929-01-02 00:19:15  --e  1631451365
>>                    6  841067190-01-02 13:02:59  -d-  3101461260
>>                    7  1495233207-01-02 01:31:11  -de  1697748441
>>                    8  1875753393-01-02 21:54:14  -de   337757600
>>                    9  648878284-01-02 21:06:08  ---  2133388152
>>                   10  2122778986-01-02 17:49:43  --e   874605800
>>                   11  55846197-01-02 09:50:53  -de  4262365368
>>                   12  1872396026-01-02 06:45:18  ---  1051768258
>>                   13  820920473-01-02 19:28:35  --e  2932336675
>>                   14  2128306755-01-02 10:17:45  ---  3568501216
>>                   15  1457063063-01-02 01:24:17  --e  2330511560
>>                   16  586864775-01-02 16:08:15  ---   355283425
>>                   17  -824870041-01-02 22:47:26  -d-  4177999057
>>                   18  1562176264-01-02 08:06:45  ---  1312597355
>>                   19  -392925420-01-02 17:08:27  -d-  3811075948
>>                   20  1927575458-01-02 21:26:51  -de  1384562525
>>                   21  2139923505-01-02 08:16:04  -d-    41861305
>>
>> Here,  we can see that neither segment #3 (= ns_segnum) nor #4 (=
>> ns_nextnum) has the dirty flag
>> ("d" in STAT).   So, as expected, this seems to be the root cause of
>> the duplicate assignments and oops.
>> The proper correction would be, therefore, to check and reject (or
>> fix) this anomaly while outputting an error message
>> (or warning if fix it at the same time).
>> It should be inserted in mount time logic because the segments that
>> nilfs2 itself allocates are always marked dirty with
>> nilfs_sufile_alloc().
> I will change my opinion.
>
> Considering the possibility of sufile corruption at runtime, it seems
> that the sanity check should be done on every nilfs_sufile_alloc()
> call.
>
> I now think nilfs_sufile_alloc() should call nilfs_error() and return
> -EIO if the number of a newly found vacant segment matches
> nilfs->ns_segnum or nilfs->ns_nextnum.
Before we add the first segbuf into sci->sc_segbufs in nilfs_segctor_begin_construction(),
there is no possibility existing duplicate segnum. And the subsequent process should
not be affected by sufile corruption. So I think it's enough to only check for case alloc==0.
I don't find any other possible wrong scenarios.

diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index 7be632c15f91..7b91c790b798 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -1326,7 +1326,13 @@ static int nilfs_segctor_begin_construction(struct nilfs_sc_info *sci,
                err = nilfs_sufile_alloc(nilfs->ns_sufile, &nextnum);
                if (err)
                        goto failed;
+       } else {
+               /* Check the next segment has alreadly been allocated */
+               err = nilfs_sufile_test_allocated(nilfs->ns_sufile, nextnum);
+               if (err)
+                       goto failed;
        }
+
        nilfs_segbuf_set_next_segnum(segbuf, nextnum, nilfs);
 
        BUG_ON(!list_empty(&sci->sc_segbufs));
diff --git a/fs/nilfs2/sufile.c b/fs/nilfs2/sufile.c
index 853a8212114f..8dff12c56891 100644
--- a/fs/nilfs2/sufile.c
+++ b/fs/nilfs2/sufile.c
@@ -399,6 +399,36 @@ int nilfs_sufile_alloc(struct inode *sufile, __u64 *segnump)
        return ret;
 }
 
+int nilfs_sufile_test_allocated(struct inode *sufile, __u64 *segnump)
+{
+       struct buffer_head *su_bh;
+       struct nilfs_segment_usage *su;
+       void *kaddr;
+       int ret;
+
+       down_write(&NILFS_MDT(sufile)->mi_sem);
+
+       ret = nilfs_sufile_get_segment_usage_block(sufile, segnump, 1,
+                                                  &su_bh);
+       if (ret < 0)
+               goto out_sem;
+
+       kaddr = kmap_atomic(su_bh->b_page);
+       su = nilfs_sufile_block_get_segment_usage(
+               sufile, segnump, su_bh, kaddr);
+
+       if (nilfs_segment_usage_clean(su))
+               ret = -EIO;
+
+       kunmap_atomic(kaddr);
+
+       brelse(su_bh);
+
+out_sem:
+       up_write(&NILFS_MDT(sufile)->mi_sem);
+       return ret;
+}
+
 void nilfs_sufile_do_cancel_free(struct inode *sufile, __u64 segnum,
                                 struct buffer_head *header_bh,
                                 struct buffer_head *su_bh)
diff --git a/fs/nilfs2/sufile.h b/fs/nilfs2/sufile.h
index 8e8a1a5a0402..02b61ca6f318 100644
--- a/fs/nilfs2/sufile.h
+++ b/fs/nilfs2/sufile.h
@@ -24,6 +24,7 @@ unsigned long nilfs_sufile_get_ncleansegs(struct inode *sufile);
 
 int nilfs_sufile_set_alloc_range(struct inode *sufile, __u64 start, __u64 end);
 int nilfs_sufile_alloc(struct inode *, __u64 *);
+int nilfs_sufile_test_allocated(struct inode *, __u64 *);
 int nilfs_sufile_mark_dirty(struct inode *sufile, __u64 segnum);
 int nilfs_sufile_set_segment_usage(struct inode *sufile, __u64 segnum,
                                   unsigned long nblocks, time64_t modtime);

>
> This test should not need to be done for every segbuf's sb_segnum
> because metadata blocks that contain the information about allocated
> segments are kept in memory because of its dirty state and will not be
> destroyed until they are finally written out.
>
> One correction then.  Just checking the output of the lssu command
> wasn't enough because the following test is done on all flags except
> the active flag "a".  (this flag is not persistent and visible only
> when seeing via ioctl.)
>
>>                         if (!nilfs_segment_usage_clean(su))
>>                                 continue;
>>                         /* found a clean segment */
> We also had to see the invisible flags as well to confirm if that is
> the root cause, but lssu doens't show them all.  So I checked the dump
> data of the mount_0 file.
> As a result, the segment at ns_segnum was not clean, but that of
> ns_nextnum was clean, which means it's the root cause as expected.
Thanks for your confirmation.

Regards,
Liu Shixin
.
>
> Regards,
> Ryusuke Konishi
> .
>


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH] nilfs2: fix NULL pointer dereference in nilfs_segctor_prepare_write()
@ 2022-11-09 14:00                   ` Ryusuke Konishi
  0 siblings, 0 replies; 23+ messages in thread
From: Ryusuke Konishi @ 2022-11-09 14:00 UTC (permalink / raw)
  To: Liu Shixin; +Cc: Andrew Morton, linux-nilfs, linux-kernel

On Wed, Nov 9, 2022 at 12:26 PM Liu Shixin wrote:
> On 2022/11/9 1:50, Ryusuke Konishi wrote:
> > Hi Liu Shixin,
> >
> > I'm sorry for my repeated emails.
> >
> > On Wed, Nov 9, 2022 at 12:13 AM Ryusuke Konishi wrote:
> >>> I think the most likely cause is metadata corruption.  If so, we
> >>> should fix it by adding a proper sanity check, yes.
> >>> However, there is still the possibility that the error retry logic
> >>> after detecting errors has a flaw. (I believe at least this is not a
> >>> problem with normal paths.)
> >>> The sufile state inconsistency above is hypothetical for now.
> >>> Either way, I'd like to make sure what's actually happening (and where
> >>> the anomaly is coming from) so we can decide how to fix it.
> >> I noticed that this syzbot report has a disk image "mount_0.gz", so I
> >> tried to mount it read-only.
> >> The result was as follows:
> >>
> >> $ sudo mount -t nilfs2 -r ./mount_0 /mnt/test
> >> $ lssu
> >>               SEGNUM        DATE     TIME STAT     NBLOCKS
> >>                    0  26760730-10-29 19:40:00  -de   527958016
> >>                    1  26760730-11-01 20:29:04  -de   527958016
> >>                    2  1176433641-11-01 02:01:52  -de  2983038235
> >>                    3  -1158249729-11-01 04:57:19  a--       25375
> >>                    4  1970-01-02 21:24:32  a--           0
> >>                    5  -1415215929-01-02 00:19:15  --e  1631451365
> >>                    6  841067190-01-02 13:02:59  -d-  3101461260
> >>                    7  1495233207-01-02 01:31:11  -de  1697748441
> >>                    8  1875753393-01-02 21:54:14  -de   337757600
> >>                    9  648878284-01-02 21:06:08  ---  2133388152
> >>                   10  2122778986-01-02 17:49:43  --e   874605800
> >>                   11  55846197-01-02 09:50:53  -de  4262365368
> >>                   12  1872396026-01-02 06:45:18  ---  1051768258
> >>                   13  820920473-01-02 19:28:35  --e  2932336675
> >>                   14  2128306755-01-02 10:17:45  ---  3568501216
> >>                   15  1457063063-01-02 01:24:17  --e  2330511560
> >>                   16  586864775-01-02 16:08:15  ---   355283425
> >>                   17  -824870041-01-02 22:47:26  -d-  4177999057
> >>                   18  1562176264-01-02 08:06:45  ---  1312597355
> >>                   19  -392925420-01-02 17:08:27  -d-  3811075948
> >>                   20  1927575458-01-02 21:26:51  -de  1384562525
> >>                   21  2139923505-01-02 08:16:04  -d-    41861305
> >>
> >> Here,  we can see that neither segment #3 (= ns_segnum) nor #4 (=
> >> ns_nextnum) has the dirty flag
> >> ("d" in STAT).   So, as expected, this seems to be the root cause of
> >> the duplicate assignments and oops.
> >> The proper correction would be, therefore, to check and reject (or
> >> fix) this anomaly while outputting an error message
> >> (or warning if fix it at the same time).
> >> It should be inserted in mount time logic because the segments that
> >> nilfs2 itself allocates are always marked dirty with
> >> nilfs_sufile_alloc().
> > I will change my opinion.
> >
> > Considering the possibility of sufile corruption at runtime, it seems
> > that the sanity check should be done on every nilfs_sufile_alloc()
> > call.
> >
> > I now think nilfs_sufile_alloc() should call nilfs_error() and return
> > -EIO if the number of a newly found vacant segment matches
> > nilfs->ns_segnum or nilfs->ns_nextnum.

> Before we add the first segbuf into sci->sc_segbufs in nilfs_segctor_begin_construction(),
> there is no possibility existing duplicate segnum. And the subsequent process should
> not be affected by sufile corruption. So I think it's enough to only check for case alloc==0.
> I don't find any other possible wrong scenarios.

I think the problem is not yet fixed.
With your patch, I still think there are scenarios that cause
duplicate segment assignments:
(1)  If the segment at nilfs->ns_segnum is in a clean state due to
sufile data corruption, and
if sci->sc_write_logs is empty and nilfs->ns_segnum == nilfs->nextnum
to start writing a segment from the beginning, then
nilfs_sufile_alloc() is called in nilfs_segctor_begin_construction()
and nextnum can be
equal to nilfs->ns_segnum.
This nextnum is stored in segbuf->sb_nextnum, and if segbuf is spliced
with nilfs_segctor_extend_segments(), the segbuf->sb_nextnum will be
used for sb_segnum of the added segbuf, and the list corruption of
sb_segsum_buffers can happen between these two segbufs.

(2)  If the segment at nilfs->ns_segnum is in a clean state due to
sufile data corruption, and
if sci->sc_write_logs is not empty and segbuf->sb_rest_blocks <
NILFS_PSEG_MIN_BLOCKS,
then nilfs_sufile_alloc() is called in
nilfs_segctor_begin_construction(), and the retrieved nextnum is set
to
sb_nextnum of the appended segbuf there, and this sb_nextnum can be
equal to nilfs->ns_segnum.
Then, if the segbufs are spliced with nilfs_segctor_extend_segments(),
the last segbuf->sb_nextnum will be used for sb_segnum of the appended
segbuf.  Therefore, list corruption of sb_segsum_buffers can happen.

(3) If the segment at nilfs->ns_segnum is in a clean state due to
sufile data corruption, and then the segbuf is spliced with
nilfs_segctor_extend_segments(), nilfs_sufile_alloc() is called, and
the retrieved "nextnextnum" is set to sb_nextnum of the appended
segbuf, but this can be equal to nilfs->ns_segnum.
If one more segbuf is appended to the segbufs in
nilfs_segctor_extend_segments(), list corruption of
sb_segsum_buffers can happen due to the segnum duplication.

And please call nilfs_error() after detecting the metadata anomaly to
output this critical situation and  degrade the filesystem
appropriately (read-only for example).  This metadata corruption is a
critical situation and writes will continually fail.  File system
operations should not continue as if nothing happened.

I would like to ask you to reconsider how to fix it, but will also
comment on the current patch below.

>
> diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
> index 7be632c15f91..7b91c790b798 100644
> --- a/fs/nilfs2/segment.c
> +++ b/fs/nilfs2/segment.c
> @@ -1326,7 +1326,13 @@ static int nilfs_segctor_begin_construction(struct nilfs_sc_info *sci,
>                 err = nilfs_sufile_alloc(nilfs->ns_sufile, &nextnum);
>                 if (err)
>                         goto failed;
> +       } else {
> +               /* Check the next segment has alreadly been allocated */

Here is a typo:  "alreadly" -> "already"

> +               err = nilfs_sufile_test_allocated(nilfs->ns_sufile, nextnum);
> +               if (err)
> +                       goto failed;
>         }
> +
>         nilfs_segbuf_set_next_segnum(segbuf, nextnum, nilfs);
>
>         BUG_ON(!list_empty(&sci->sc_segbufs));
> diff --git a/fs/nilfs2/sufile.c b/fs/nilfs2/sufile.c
> index 853a8212114f..8dff12c56891 100644
> --- a/fs/nilfs2/sufile.c
> +++ b/fs/nilfs2/sufile.c
> @@ -399,6 +399,36 @@ int nilfs_sufile_alloc(struct inode *sufile, __u64 *segnump)
>         return ret;
>  }
>
> +int nilfs_sufile_test_allocated(struct inode *sufile, __u64 *segnump)

The second argument should be "__u64 segnum".  The type is different
from the above caller.

> +{
> +       struct buffer_head *su_bh;
> +       struct nilfs_segment_usage *su;
> +       void *kaddr;
> +       int ret;
> +
> +       down_write(&NILFS_MDT(sufile)->mi_sem);

Please use down_read()/up_read() since the operation of this function
is not mutative.
See nilfs_sufile_get_stat() for an example.

> +
> +       ret = nilfs_sufile_get_segment_usage_block(sufile, segnump, 1,

"segnump" should be "segnum".  If you use a pointer, this must be "*segnump".
Either way this is wrong.

> +                                                  &su_bh);
> +       if (ret < 0)
> +               goto out_sem;
> +
> +       kaddr = kmap_atomic(su_bh->b_page);
> +       su = nilfs_sufile_block_get_segment_usage(
> +               sufile, segnump, su_bh, kaddr);

Ditto.   "segnump" here should be "segnum".

> +
> +       if (nilfs_segment_usage_clean(su))
> +               ret = -EIO;
> +
> +       kunmap_atomic(kaddr);
> +
> +       brelse(su_bh);
> +
> +out_sem:
> +       up_write(&NILFS_MDT(sufile)->mi_sem);
> +       return ret;
> +}
> +
>  void nilfs_sufile_do_cancel_free(struct inode *sufile, __u64 segnum,
>                                  struct buffer_head *header_bh,
>                                  struct buffer_head *su_bh)
> diff --git a/fs/nilfs2/sufile.h b/fs/nilfs2/sufile.h
> index 8e8a1a5a0402..02b61ca6f318 100644
> --- a/fs/nilfs2/sufile.h
> +++ b/fs/nilfs2/sufile.h
> @@ -24,6 +24,7 @@ unsigned long nilfs_sufile_get_ncleansegs(struct inode *sufile);
>
>  int nilfs_sufile_set_alloc_range(struct inode *sufile, __u64 start, __u64 end);
>  int nilfs_sufile_alloc(struct inode *, __u64 *);

> +int nilfs_sufile_test_allocated(struct inode *, __u64 *);

This should be:

int nilfs_sufile_test_allocated(struct inode *sufile, __u64 segnum);

Please include the variable name in the argument for newly added or
modified declarations.

Regards,
Ryusuke Konishi


>  int nilfs_sufile_mark_dirty(struct inode *sufile, __u64 segnum);
>  int nilfs_sufile_set_segment_usage(struct inode *sufile, __u64 segnum,
>                                    unsigned long nblocks, time64_t modtime);
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] nilfs2: fix NULL pointer dereference in nilfs_segctor_prepare_write()
@ 2022-11-09 14:00                   ` Ryusuke Konishi
  0 siblings, 0 replies; 23+ messages in thread
From: Ryusuke Konishi @ 2022-11-09 14:00 UTC (permalink / raw)
  To: Liu Shixin
  Cc: Andrew Morton, linux-nilfs-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Wed, Nov 9, 2022 at 12:26 PM Liu Shixin wrote:
> On 2022/11/9 1:50, Ryusuke Konishi wrote:
> > Hi Liu Shixin,
> >
> > I'm sorry for my repeated emails.
> >
> > On Wed, Nov 9, 2022 at 12:13 AM Ryusuke Konishi wrote:
> >>> I think the most likely cause is metadata corruption.  If so, we
> >>> should fix it by adding a proper sanity check, yes.
> >>> However, there is still the possibility that the error retry logic
> >>> after detecting errors has a flaw. (I believe at least this is not a
> >>> problem with normal paths.)
> >>> The sufile state inconsistency above is hypothetical for now.
> >>> Either way, I'd like to make sure what's actually happening (and where
> >>> the anomaly is coming from) so we can decide how to fix it.
> >> I noticed that this syzbot report has a disk image "mount_0.gz", so I
> >> tried to mount it read-only.
> >> The result was as follows:
> >>
> >> $ sudo mount -t nilfs2 -r ./mount_0 /mnt/test
> >> $ lssu
> >>               SEGNUM        DATE     TIME STAT     NBLOCKS
> >>                    0  26760730-10-29 19:40:00  -de   527958016
> >>                    1  26760730-11-01 20:29:04  -de   527958016
> >>                    2  1176433641-11-01 02:01:52  -de  2983038235
> >>                    3  -1158249729-11-01 04:57:19  a--       25375
> >>                    4  1970-01-02 21:24:32  a--           0
> >>                    5  -1415215929-01-02 00:19:15  --e  1631451365
> >>                    6  841067190-01-02 13:02:59  -d-  3101461260
> >>                    7  1495233207-01-02 01:31:11  -de  1697748441
> >>                    8  1875753393-01-02 21:54:14  -de   337757600
> >>                    9  648878284-01-02 21:06:08  ---  2133388152
> >>                   10  2122778986-01-02 17:49:43  --e   874605800
> >>                   11  55846197-01-02 09:50:53  -de  4262365368
> >>                   12  1872396026-01-02 06:45:18  ---  1051768258
> >>                   13  820920473-01-02 19:28:35  --e  2932336675
> >>                   14  2128306755-01-02 10:17:45  ---  3568501216
> >>                   15  1457063063-01-02 01:24:17  --e  2330511560
> >>                   16  586864775-01-02 16:08:15  ---   355283425
> >>                   17  -824870041-01-02 22:47:26  -d-  4177999057
> >>                   18  1562176264-01-02 08:06:45  ---  1312597355
> >>                   19  -392925420-01-02 17:08:27  -d-  3811075948
> >>                   20  1927575458-01-02 21:26:51  -de  1384562525
> >>                   21  2139923505-01-02 08:16:04  -d-    41861305
> >>
> >> Here,  we can see that neither segment #3 (= ns_segnum) nor #4 (=
> >> ns_nextnum) has the dirty flag
> >> ("d" in STAT).   So, as expected, this seems to be the root cause of
> >> the duplicate assignments and oops.
> >> The proper correction would be, therefore, to check and reject (or
> >> fix) this anomaly while outputting an error message
> >> (or warning if fix it at the same time).
> >> It should be inserted in mount time logic because the segments that
> >> nilfs2 itself allocates are always marked dirty with
> >> nilfs_sufile_alloc().
> > I will change my opinion.
> >
> > Considering the possibility of sufile corruption at runtime, it seems
> > that the sanity check should be done on every nilfs_sufile_alloc()
> > call.
> >
> > I now think nilfs_sufile_alloc() should call nilfs_error() and return
> > -EIO if the number of a newly found vacant segment matches
> > nilfs->ns_segnum or nilfs->ns_nextnum.

> Before we add the first segbuf into sci->sc_segbufs in nilfs_segctor_begin_construction(),
> there is no possibility existing duplicate segnum. And the subsequent process should
> not be affected by sufile corruption. So I think it's enough to only check for case alloc==0.
> I don't find any other possible wrong scenarios.

I think the problem is not yet fixed.
With your patch, I still think there are scenarios that cause
duplicate segment assignments:
(1)  If the segment at nilfs->ns_segnum is in a clean state due to
sufile data corruption, and
if sci->sc_write_logs is empty and nilfs->ns_segnum == nilfs->nextnum
to start writing a segment from the beginning, then
nilfs_sufile_alloc() is called in nilfs_segctor_begin_construction()
and nextnum can be
equal to nilfs->ns_segnum.
This nextnum is stored in segbuf->sb_nextnum, and if segbuf is spliced
with nilfs_segctor_extend_segments(), the segbuf->sb_nextnum will be
used for sb_segnum of the added segbuf, and the list corruption of
sb_segsum_buffers can happen between these two segbufs.

(2)  If the segment at nilfs->ns_segnum is in a clean state due to
sufile data corruption, and
if sci->sc_write_logs is not empty and segbuf->sb_rest_blocks <
NILFS_PSEG_MIN_BLOCKS,
then nilfs_sufile_alloc() is called in
nilfs_segctor_begin_construction(), and the retrieved nextnum is set
to
sb_nextnum of the appended segbuf there, and this sb_nextnum can be
equal to nilfs->ns_segnum.
Then, if the segbufs are spliced with nilfs_segctor_extend_segments(),
the last segbuf->sb_nextnum will be used for sb_segnum of the appended
segbuf.  Therefore, list corruption of sb_segsum_buffers can happen.

(3) If the segment at nilfs->ns_segnum is in a clean state due to
sufile data corruption, and then the segbuf is spliced with
nilfs_segctor_extend_segments(), nilfs_sufile_alloc() is called, and
the retrieved "nextnextnum" is set to sb_nextnum of the appended
segbuf, but this can be equal to nilfs->ns_segnum.
If one more segbuf is appended to the segbufs in
nilfs_segctor_extend_segments(), list corruption of
sb_segsum_buffers can happen due to the segnum duplication.

And please call nilfs_error() after detecting the metadata anomaly to
output this critical situation and  degrade the filesystem
appropriately (read-only for example).  This metadata corruption is a
critical situation and writes will continually fail.  File system
operations should not continue as if nothing happened.

I would like to ask you to reconsider how to fix it, but will also
comment on the current patch below.

>
> diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
> index 7be632c15f91..7b91c790b798 100644
> --- a/fs/nilfs2/segment.c
> +++ b/fs/nilfs2/segment.c
> @@ -1326,7 +1326,13 @@ static int nilfs_segctor_begin_construction(struct nilfs_sc_info *sci,
>                 err = nilfs_sufile_alloc(nilfs->ns_sufile, &nextnum);
>                 if (err)
>                         goto failed;
> +       } else {
> +               /* Check the next segment has alreadly been allocated */

Here is a typo:  "alreadly" -> "already"

> +               err = nilfs_sufile_test_allocated(nilfs->ns_sufile, nextnum);
> +               if (err)
> +                       goto failed;
>         }
> +
>         nilfs_segbuf_set_next_segnum(segbuf, nextnum, nilfs);
>
>         BUG_ON(!list_empty(&sci->sc_segbufs));
> diff --git a/fs/nilfs2/sufile.c b/fs/nilfs2/sufile.c
> index 853a8212114f..8dff12c56891 100644
> --- a/fs/nilfs2/sufile.c
> +++ b/fs/nilfs2/sufile.c
> @@ -399,6 +399,36 @@ int nilfs_sufile_alloc(struct inode *sufile, __u64 *segnump)
>         return ret;
>  }
>
> +int nilfs_sufile_test_allocated(struct inode *sufile, __u64 *segnump)

The second argument should be "__u64 segnum".  The type is different
from the above caller.

> +{
> +       struct buffer_head *su_bh;
> +       struct nilfs_segment_usage *su;
> +       void *kaddr;
> +       int ret;
> +
> +       down_write(&NILFS_MDT(sufile)->mi_sem);

Please use down_read()/up_read() since the operation of this function
is not mutative.
See nilfs_sufile_get_stat() for an example.

> +
> +       ret = nilfs_sufile_get_segment_usage_block(sufile, segnump, 1,

"segnump" should be "segnum".  If you use a pointer, this must be "*segnump".
Either way this is wrong.

> +                                                  &su_bh);
> +       if (ret < 0)
> +               goto out_sem;
> +
> +       kaddr = kmap_atomic(su_bh->b_page);
> +       su = nilfs_sufile_block_get_segment_usage(
> +               sufile, segnump, su_bh, kaddr);

Ditto.   "segnump" here should be "segnum".

> +
> +       if (nilfs_segment_usage_clean(su))
> +               ret = -EIO;
> +
> +       kunmap_atomic(kaddr);
> +
> +       brelse(su_bh);
> +
> +out_sem:
> +       up_write(&NILFS_MDT(sufile)->mi_sem);
> +       return ret;
> +}
> +
>  void nilfs_sufile_do_cancel_free(struct inode *sufile, __u64 segnum,
>                                  struct buffer_head *header_bh,
>                                  struct buffer_head *su_bh)
> diff --git a/fs/nilfs2/sufile.h b/fs/nilfs2/sufile.h
> index 8e8a1a5a0402..02b61ca6f318 100644
> --- a/fs/nilfs2/sufile.h
> +++ b/fs/nilfs2/sufile.h
> @@ -24,6 +24,7 @@ unsigned long nilfs_sufile_get_ncleansegs(struct inode *sufile);
>
>  int nilfs_sufile_set_alloc_range(struct inode *sufile, __u64 start, __u64 end);
>  int nilfs_sufile_alloc(struct inode *, __u64 *);

> +int nilfs_sufile_test_allocated(struct inode *, __u64 *);

This should be:

int nilfs_sufile_test_allocated(struct inode *sufile, __u64 segnum);

Please include the variable name in the argument for newly added or
modified declarations.

Regards,
Ryusuke Konishi


>  int nilfs_sufile_mark_dirty(struct inode *sufile, __u64 segnum);
>  int nilfs_sufile_set_segment_usage(struct inode *sufile, __u64 segnum,
>                                    unsigned long nblocks, time64_t modtime);
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2022-11-09 14:00 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-08  2:29 [PATCH] nilfs2: fix NULL pointer dereference in nilfs_segctor_prepare_write() Liu Shixin
2022-11-08  2:29 ` Liu Shixin
2022-11-08  4:41 ` Ryusuke Konishi
2022-11-08  4:41   ` Ryusuke Konishi
2022-11-08  6:19   ` Ryusuke Konishi
2022-11-08  6:19     ` Ryusuke Konishi
2022-11-08  6:49   ` Liu Shixin
2022-11-08  6:49     ` Liu Shixin
2022-11-08 10:33     ` Ryusuke Konishi
2022-11-08 10:33       ` Ryusuke Konishi
2022-11-08 11:10       ` Ryusuke Konishi
2022-11-08 11:53       ` Liu Shixin
2022-11-08 11:53         ` Liu Shixin
2022-11-08 12:24         ` Ryusuke Konishi
2022-11-08 12:24           ` Ryusuke Konishi
2022-11-08 15:13           ` Ryusuke Konishi
2022-11-08 15:13             ` Ryusuke Konishi
2022-11-08 17:50             ` Ryusuke Konishi
2022-11-08 17:50               ` Ryusuke Konishi
2022-11-09  3:26               ` Liu Shixin
2022-11-09  3:26                 ` Liu Shixin
2022-11-09 14:00                 ` Ryusuke Konishi
2022-11-09 14:00                   ` Ryusuke Konishi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.