All of lore.kernel.org
 help / color / mirror / Atom feed
* [syzbot] [overlayfs?] general protection fault in d_path
@ 2023-07-05 12:00 syzbot
  2023-07-05 13:05 ` Christian Brauner
                   ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: syzbot @ 2023-07-05 12:00 UTC (permalink / raw)
  To: amir73il, linux-fsdevel, linux-kernel, linux-unionfs, miklos,
	syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    d528014517f2 Revert ".gitignore: ignore *.cover and *.mbx"
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=14fad002a80000
kernel config:  https://syzkaller.appspot.com/x/.config?x=1085b4238c9eb6ba
dashboard link: https://syzkaller.appspot.com/bug?extid=a67fc5321ffb4b311c98
compiler:       Debian clang version 15.0.7, GNU ld (GNU Binutils for Debian) 2.35.2

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/fef94e788067/disk-d5280145.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/576412ea518b/vmlinux-d5280145.xz
kernel image: https://storage.googleapis.com/syzbot-assets/685a0e4be06b/bzImage-d5280145.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+a67fc5321ffb4b311c98@syzkaller.appspotmail.com

general protection fault, probably for non-canonical address 0xdffffc000000000a: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000050-0x0000000000000057]
CPU: 1 PID: 10127 Comm: syz-executor.3 Not tainted 6.4.0-syzkaller-11478-gd528014517f2 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023
RIP: 0010:__lock_acquire+0x10d/0x7f70 kernel/locking/lockdep.c:5012
Code: 85 75 18 00 00 83 3d 15 c8 2c 0d 00 48 89 9c 24 10 01 00 00 0f 84 f8 0f 00 00 83 3d 5c de b3 0b 00 74 34 48 89 d0 48 c1 e8 03 <42> 80 3c 00 00 74 1a 48 89 d7 e8 b4 51 79 00 48 8b 94 24 80 00 00
RSP: 0018:ffffc900169be9e0 EFLAGS: 00010006
RAX: 000000000000000a RBX: 1ffff92002d37d60 RCX: 0000000000000002
RDX: 0000000000000050 RSI: 0000000000000000 RDI: 0000000000000050
RBP: ffffc900169beca8 R08: dffffc0000000000 R09: 0000000000000001
R10: dffffc0000000000 R11: fffffbfff1d2fe76 R12: 0000000000000000
R13: 0000000000000001 R14: 0000000000000002 R15: ffff88802091d940
FS:  0000000000000000(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa22a3fe000 CR3: 000000004b5e1000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 lock_acquire+0x1e3/0x520 kernel/locking/lockdep.c:5761
 seqcount_lockdep_reader_access+0x139/0x220 include/linux/seqlock.h:102
 get_fs_root_rcu fs/d_path.c:244 [inline]
 d_path+0x2f0/0x6e0 fs/d_path.c:286
 audit_log_d_path+0xd3/0x310 kernel/audit.c:2139
 dump_common_audit_data security/lsm_audit.c:224 [inline]
 common_lsm_audit+0x7cf/0x1a90 security/lsm_audit.c:458
 smack_log+0x421/0x540 security/smack/smack_access.c:383
 smk_tskacc+0x2ff/0x360 security/smack/smack_access.c:253
 smack_inode_getattr+0x203/0x270 security/smack/smack_lsm.c:1202
 security_inode_getattr+0xd3/0x120 security/security.c:2114
 vfs_getattr+0x25/0x70 fs/stat.c:167
 ovl_getattr+0x1b1/0xf70 fs/overlayfs/inode.c:174
 ima_check_last_writer security/integrity/ima/ima_main.c:171 [inline]
 ima_file_free+0x26e/0x4b0 security/integrity/ima/ima_main.c:203
 __fput+0x36a/0x950 fs/file_table.c:378
 task_work_run+0x24a/0x300 kernel/task_work.c:179
 exit_task_work include/linux/task_work.h:38 [inline]
 do_exit+0x68f/0x2290 kernel/exit.c:874
 do_group_exit+0x206/0x2c0 kernel/exit.c:1024
 get_signal+0x1709/0x17e0 kernel/signal.c:2877
 arch_do_signal_or_restart+0x91/0x670 arch/x86/kernel/signal.c:308
 exit_to_user_mode_loop+0x6a/0x100 kernel/entry/common.c:168
 exit_to_user_mode_prepare+0xb1/0x140 kernel/entry/common.c:204
 __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
 syscall_exit_to_user_mode+0x64/0x280 kernel/entry/common.c:297
 do_syscall_64+0x4d/0xc0 arch/x86/entry/common.c:86
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f7f3ca8c389
Code: Unable to access opcode bytes at 0x7f7f3ca8c35f.
RSP: 002b:00007f7f3d741168 EFLAGS: 00000246 ORIG_RAX: 0000000000000052
RAX: fffffffffffffffb RBX: 00007f7f3cbac050 RCX: 00007f7f3ca8c389
RDX: 0000000000000000 RSI: 00000000200001c0 RDI: 0000000020000180
RBP: 00007f7f3cad7493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fff8432555f R14: 00007f7f3d741300 R15: 0000000000022000
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:__lock_acquire+0x10d/0x7f70 kernel/locking/lockdep.c:5012
Code: 85 75 18 00 00 83 3d 15 c8 2c 0d 00 48 89 9c 24 10 01 00 00 0f 84 f8 0f 00 00 83 3d 5c de b3 0b 00 74 34 48 89 d0 48 c1 e8 03 <42> 80 3c 00 00 74 1a 48 89 d7 e8 b4 51 79 00 48 8b 94 24 80 00 00
RSP: 0018:ffffc900169be9e0 EFLAGS: 00010006
RAX: 000000000000000a RBX: 1ffff92002d37d60 RCX: 0000000000000002
RDX: 0000000000000050 RSI: 0000000000000000 RDI: 0000000000000050
RBP: ffffc900169beca8 R08: dffffc0000000000 R09: 0000000000000001
R10: dffffc0000000000 R11: fffffbfff1d2fe76 R12: 0000000000000000
R13: 0000000000000001 R14: 0000000000000002 R15: ffff88802091d940
FS:  0000000000000000(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa22a3fe000 CR3: 000000004b5e1000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
----------------
Code disassembly (best guess):
   0:	85 75 18             	test   %esi,0x18(%rbp)
   3:	00 00                	add    %al,(%rax)
   5:	83 3d 15 c8 2c 0d 00 	cmpl   $0x0,0xd2cc815(%rip)        # 0xd2cc821
   c:	48 89 9c 24 10 01 00 	mov    %rbx,0x110(%rsp)
  13:	00
  14:	0f 84 f8 0f 00 00    	je     0x1012
  1a:	83 3d 5c de b3 0b 00 	cmpl   $0x0,0xbb3de5c(%rip)        # 0xbb3de7d
  21:	74 34                	je     0x57
  23:	48 89 d0             	mov    %rdx,%rax
  26:	48 c1 e8 03          	shr    $0x3,%rax
* 2a:	42 80 3c 00 00       	cmpb   $0x0,(%rax,%r8,1) <-- trapping instruction
  2f:	74 1a                	je     0x4b
  31:	48 89 d7             	mov    %rdx,%rdi
  34:	e8 b4 51 79 00       	callq  0x7951ed
  39:	48                   	rex.W
  3a:	8b                   	.byte 0x8b
  3b:	94                   	xchg   %eax,%esp
  3c:	24 80                	and    $0x80,%al


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the bug is already fixed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to change bug's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the bug is a duplicate of another bug, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [syzbot] [overlayfs?] general protection fault in d_path
  2023-07-05 12:00 [syzbot] [overlayfs?] general protection fault in d_path syzbot
@ 2023-07-05 13:05 ` Christian Brauner
  2023-07-05 13:39   ` Amir Goldstein
  2023-07-05 13:41   ` Jeff Layton
  2023-09-12 22:10 ` syzbot
  2023-09-18  0:04 ` syzbot
  2 siblings, 2 replies; 35+ messages in thread
From: Christian Brauner @ 2023-07-05 13:05 UTC (permalink / raw)
  To: syzbot, Jeff Layton, amir73il
  Cc: linux-fsdevel, linux-kernel, linux-unionfs, miklos, syzkaller-bugs

On Wed, Jul 05, 2023 at 05:00:45AM -0700, syzbot wrote:
> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:    d528014517f2 Revert ".gitignore: ignore *.cover and *.mbx"
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=14fad002a80000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=1085b4238c9eb6ba
> dashboard link: https://syzkaller.appspot.com/bug?extid=a67fc5321ffb4b311c98
> compiler:       Debian clang version 15.0.7, GNU ld (GNU Binutils for Debian) 2.35.2
> 
> Unfortunately, I don't have any reproducer for this issue yet.
> 
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/fef94e788067/disk-d5280145.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/576412ea518b/vmlinux-d5280145.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/685a0e4be06b/bzImage-d5280145.xz
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+a67fc5321ffb4b311c98@syzkaller.appspotmail.com
> 
> general protection fault, probably for non-canonical address 0xdffffc000000000a: 0000 [#1] PREEMPT SMP KASAN
> KASAN: null-ptr-deref in range [0x0000000000000050-0x0000000000000057]
> CPU: 1 PID: 10127 Comm: syz-executor.3 Not tainted 6.4.0-syzkaller-11478-gd528014517f2 #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023
> RIP: 0010:__lock_acquire+0x10d/0x7f70 kernel/locking/lockdep.c:5012
> Code: 85 75 18 00 00 83 3d 15 c8 2c 0d 00 48 89 9c 24 10 01 00 00 0f 84 f8 0f 00 00 83 3d 5c de b3 0b 00 74 34 48 89 d0 48 c1 e8 03 <42> 80 3c 00 00 74 1a 48 89 d7 e8 b4 51 79 00 48 8b 94 24 80 00 00
> RSP: 0018:ffffc900169be9e0 EFLAGS: 00010006
> RAX: 000000000000000a RBX: 1ffff92002d37d60 RCX: 0000000000000002
> RDX: 0000000000000050 RSI: 0000000000000000 RDI: 0000000000000050
> RBP: ffffc900169beca8 R08: dffffc0000000000 R09: 0000000000000001
> R10: dffffc0000000000 R11: fffffbfff1d2fe76 R12: 0000000000000000
> R13: 0000000000000001 R14: 0000000000000002 R15: ffff88802091d940
> FS:  0000000000000000(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007fa22a3fe000 CR3: 000000004b5e1000 CR4: 00000000003506e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  <TASK>
>  lock_acquire+0x1e3/0x520 kernel/locking/lockdep.c:5761
>  seqcount_lockdep_reader_access+0x139/0x220 include/linux/seqlock.h:102
>  get_fs_root_rcu fs/d_path.c:244 [inline]
>  d_path+0x2f0/0x6e0 fs/d_path.c:286
>  audit_log_d_path+0xd3/0x310 kernel/audit.c:2139
>  dump_common_audit_data security/lsm_audit.c:224 [inline]
>  common_lsm_audit+0x7cf/0x1a90 security/lsm_audit.c:458
>  smack_log+0x421/0x540 security/smack/smack_access.c:383
>  smk_tskacc+0x2ff/0x360 security/smack/smack_access.c:253
>  smack_inode_getattr+0x203/0x270 security/smack/smack_lsm.c:1202
>  security_inode_getattr+0xd3/0x120 security/security.c:2114
>  vfs_getattr+0x25/0x70 fs/stat.c:167
>  ovl_getattr+0x1b1/0xf70 fs/overlayfs/inode.c:174
>  ima_check_last_writer security/integrity/ima/ima_main.c:171 [inline]
>  ima_file_free+0x26e/0x4b0 security/integrity/ima/ima_main.c:203

Ugh, I think the root of this might all be the call back into
vfs_getattr() that happens on overlayfs:

__fput()
-> ima_file_free()
   -> mutex_lock()
   -> vfs_getattr_nosec()
      -> i_op->getattr() == ovl_getattr()
         -> vfs_getattr()
	    -> security_inode_getattr()
   -> mutex_unlock()

So either overlayfs needs to call vfs_getattr_nosec() when the request
comes from vfs_getattr_nosec() or this needs to use
backing_file_real_path() to operate on the real underlying path.

Thoughts?

>  __fput+0x36a/0x950 fs/file_table.c:378
>  task_work_run+0x24a/0x300 kernel/task_work.c:179
>  exit_task_work include/linux/task_work.h:38 [inline]
>  do_exit+0x68f/0x2290 kernel/exit.c:874
>  do_group_exit+0x206/0x2c0 kernel/exit.c:1024
>  get_signal+0x1709/0x17e0 kernel/signal.c:2877
>  arch_do_signal_or_restart+0x91/0x670 arch/x86/kernel/signal.c:308
>  exit_to_user_mode_loop+0x6a/0x100 kernel/entry/common.c:168
>  exit_to_user_mode_prepare+0xb1/0x140 kernel/entry/common.c:204
>  __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
>  syscall_exit_to_user_mode+0x64/0x280 kernel/entry/common.c:297
>  do_syscall_64+0x4d/0xc0 arch/x86/entry/common.c:86
>  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> RIP: 0033:0x7f7f3ca8c389
> Code: Unable to access opcode bytes at 0x7f7f3ca8c35f.
> RSP: 002b:00007f7f3d741168 EFLAGS: 00000246 ORIG_RAX: 0000000000000052
> RAX: fffffffffffffffb RBX: 00007f7f3cbac050 RCX: 00007f7f3ca8c389
> RDX: 0000000000000000 RSI: 00000000200001c0 RDI: 0000000020000180
> RBP: 00007f7f3cad7493 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> R13: 00007fff8432555f R14: 00007f7f3d741300 R15: 0000000000022000
>  </TASK>
> Modules linked in:
> ---[ end trace 0000000000000000 ]---
> RIP: 0010:__lock_acquire+0x10d/0x7f70 kernel/locking/lockdep.c:5012
> Code: 85 75 18 00 00 83 3d 15 c8 2c 0d 00 48 89 9c 24 10 01 00 00 0f 84 f8 0f 00 00 83 3d 5c de b3 0b 00 74 34 48 89 d0 48 c1 e8 03 <42> 80 3c 00 00 74 1a 48 89 d7 e8 b4 51 79 00 48 8b 94 24 80 00 00
> RSP: 0018:ffffc900169be9e0 EFLAGS: 00010006
> RAX: 000000000000000a RBX: 1ffff92002d37d60 RCX: 0000000000000002
> RDX: 0000000000000050 RSI: 0000000000000000 RDI: 0000000000000050
> RBP: ffffc900169beca8 R08: dffffc0000000000 R09: 0000000000000001
> R10: dffffc0000000000 R11: fffffbfff1d2fe76 R12: 0000000000000000
> R13: 0000000000000001 R14: 0000000000000002 R15: ffff88802091d940
> FS:  0000000000000000(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007fa22a3fe000 CR3: 000000004b5e1000 CR4: 00000000003506e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> ----------------
> Code disassembly (best guess):
>    0:	85 75 18             	test   %esi,0x18(%rbp)
>    3:	00 00                	add    %al,(%rax)
>    5:	83 3d 15 c8 2c 0d 00 	cmpl   $0x0,0xd2cc815(%rip)        # 0xd2cc821
>    c:	48 89 9c 24 10 01 00 	mov    %rbx,0x110(%rsp)
>   13:	00
>   14:	0f 84 f8 0f 00 00    	je     0x1012
>   1a:	83 3d 5c de b3 0b 00 	cmpl   $0x0,0xbb3de5c(%rip)        # 0xbb3de7d
>   21:	74 34                	je     0x57
>   23:	48 89 d0             	mov    %rdx,%rax
>   26:	48 c1 e8 03          	shr    $0x3,%rax
> * 2a:	42 80 3c 00 00       	cmpb   $0x0,(%rax,%r8,1) <-- trapping instruction
>   2f:	74 1a                	je     0x4b
>   31:	48 89 d7             	mov    %rdx,%rdi
>   34:	e8 b4 51 79 00       	callq  0x7951ed
>   39:	48                   	rex.W
>   3a:	8b                   	.byte 0x8b
>   3b:	94                   	xchg   %eax,%esp
>   3c:	24 80                	and    $0x80,%al
> 
> 
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@googlegroups.com.
> 
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> 
> If the bug is already fixed, let syzbot know by replying with:
> #syz fix: exact-commit-title
> 
> If you want to change bug's subsystems, reply with:
> #syz set subsystems: new-subsystem
> (See the list of subsystem names on the web dashboard)
> 
> If the bug is a duplicate of another bug, reply with:
> #syz dup: exact-subject-of-another-report
> 
> If you want to undo deduplication, reply with:
> #syz undup

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [syzbot] [overlayfs?] general protection fault in d_path
  2023-07-05 13:05 ` Christian Brauner
@ 2023-07-05 13:39   ` Amir Goldstein
  2023-07-05 13:41   ` Jeff Layton
  1 sibling, 0 replies; 35+ messages in thread
From: Amir Goldstein @ 2023-07-05 13:39 UTC (permalink / raw)
  To: Christian Brauner
  Cc: syzbot, Jeff Layton, linux-fsdevel, linux-kernel, linux-unionfs,
	miklos, syzkaller-bugs, Mimi Zohar

On Wed, Jul 5, 2023 at 4:06 PM Christian Brauner <brauner@kernel.org> wrote:
>
> On Wed, Jul 05, 2023 at 05:00:45AM -0700, syzbot wrote:
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit:    d528014517f2 Revert ".gitignore: ignore *.cover and *.mbx"
> > git tree:       upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=14fad002a80000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=1085b4238c9eb6ba
> > dashboard link: https://syzkaller.appspot.com/bug?extid=a67fc5321ffb4b311c98
> > compiler:       Debian clang version 15.0.7, GNU ld (GNU Binutils for Debian) 2.35.2
> >
> > Unfortunately, I don't have any reproducer for this issue yet.
> >
> > Downloadable assets:
> > disk image: https://storage.googleapis.com/syzbot-assets/fef94e788067/disk-d5280145.raw.xz
> > vmlinux: https://storage.googleapis.com/syzbot-assets/576412ea518b/vmlinux-d5280145.xz
> > kernel image: https://storage.googleapis.com/syzbot-assets/685a0e4be06b/bzImage-d5280145.xz
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+a67fc5321ffb4b311c98@syzkaller.appspotmail.com
> >
> > general protection fault, probably for non-canonical address 0xdffffc000000000a: 0000 [#1] PREEMPT SMP KASAN
> > KASAN: null-ptr-deref in range [0x0000000000000050-0x0000000000000057]
> > CPU: 1 PID: 10127 Comm: syz-executor.3 Not tainted 6.4.0-syzkaller-11478-gd528014517f2 #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023
> > RIP: 0010:__lock_acquire+0x10d/0x7f70 kernel/locking/lockdep.c:5012
> > Code: 85 75 18 00 00 83 3d 15 c8 2c 0d 00 48 89 9c 24 10 01 00 00 0f 84 f8 0f 00 00 83 3d 5c de b3 0b 00 74 34 48 89 d0 48 c1 e8 03 <42> 80 3c 00 00 74 1a 48 89 d7 e8 b4 51 79 00 48 8b 94 24 80 00 00
> > RSP: 0018:ffffc900169be9e0 EFLAGS: 00010006
> > RAX: 000000000000000a RBX: 1ffff92002d37d60 RCX: 0000000000000002
> > RDX: 0000000000000050 RSI: 0000000000000000 RDI: 0000000000000050
> > RBP: ffffc900169beca8 R08: dffffc0000000000 R09: 0000000000000001
> > R10: dffffc0000000000 R11: fffffbfff1d2fe76 R12: 0000000000000000
> > R13: 0000000000000001 R14: 0000000000000002 R15: ffff88802091d940
> > FS:  0000000000000000(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00007fa22a3fe000 CR3: 000000004b5e1000 CR4: 00000000003506e0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > Call Trace:
> >  <TASK>
> >  lock_acquire+0x1e3/0x520 kernel/locking/lockdep.c:5761
> >  seqcount_lockdep_reader_access+0x139/0x220 include/linux/seqlock.h:102
> >  get_fs_root_rcu fs/d_path.c:244 [inline]
> >  d_path+0x2f0/0x6e0 fs/d_path.c:286
> >  audit_log_d_path+0xd3/0x310 kernel/audit.c:2139
> >  dump_common_audit_data security/lsm_audit.c:224 [inline]
> >  common_lsm_audit+0x7cf/0x1a90 security/lsm_audit.c:458
> >  smack_log+0x421/0x540 security/smack/smack_access.c:383
> >  smk_tskacc+0x2ff/0x360 security/smack/smack_access.c:253
> >  smack_inode_getattr+0x203/0x270 security/smack/smack_lsm.c:1202
> >  security_inode_getattr+0xd3/0x120 security/security.c:2114
> >  vfs_getattr+0x25/0x70 fs/stat.c:167
> >  ovl_getattr+0x1b1/0xf70 fs/overlayfs/inode.c:174
> >  ima_check_last_writer security/integrity/ima/ima_main.c:171 [inline]
> >  ima_file_free+0x26e/0x4b0 security/integrity/ima/ima_main.c:203
>
> Ugh, I think the root of this might all be the call back into
> vfs_getattr() that happens on overlayfs:
>
> __fput()
> -> ima_file_free()
>    -> mutex_lock()
>    -> vfs_getattr_nosec()
>       -> i_op->getattr() == ovl_getattr()
>          -> vfs_getattr()
>             -> security_inode_getattr()
>    -> mutex_unlock()
>
> So either overlayfs needs to call vfs_getattr_nosec() when the request
> comes from vfs_getattr_nosec() or this needs to use
> backing_file_real_path() to operate on the real underlying path.
>
> Thoughts?

The latter.

IMA code cannot operate on a mixture of real inode (file_inode())
real dentry (file_dentry()) and ovl path, especially for reading
stat.change_cookie which is not really well defined in ovl.

At least those direct f_path references need to be fixed:

security/integrity/ima/ima_main.c:
vfs_getattr_nosec(&file->f_path, &stat,
security/integrity/ima/ima_api.c:       result =
vfs_getattr_nosec(&file->f_path, &stat, STATX_CHANGE_COOKIE,
security/integrity/ima/ima_crypto.c:            f =
dentry_open(&file->f_path, flags, file->f_cred);

and then all the places that format full path for audit logs:
security/integrity/ima/ima_main.c:      *pathname =
ima_d_path(&file->f_path, pathbuf, filename);

Need to decide if it is prefered to log the full ovl path or the
relative real path (relative to the private mount clone of the ovl layer).

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [syzbot] [overlayfs?] general protection fault in d_path
  2023-07-05 13:05 ` Christian Brauner
  2023-07-05 13:39   ` Amir Goldstein
@ 2023-07-05 13:41   ` Jeff Layton
  2023-07-05 13:54     ` Amir Goldstein
  1 sibling, 1 reply; 35+ messages in thread
From: Jeff Layton @ 2023-07-05 13:41 UTC (permalink / raw)
  To: Christian Brauner, syzbot, amir73il
  Cc: linux-fsdevel, linux-kernel, linux-unionfs, miklos, syzkaller-bugs

On Wed, 2023-07-05 at 15:05 +0200, Christian Brauner wrote:
> On Wed, Jul 05, 2023 at 05:00:45AM -0700, syzbot wrote:
> > Hello,
> > 
> > syzbot found the following issue on:
> > 
> > HEAD commit:    d528014517f2 Revert ".gitignore: ignore *.cover and *.mbx"
> > git tree:       upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=14fad002a80000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=1085b4238c9eb6ba
> > dashboard link: https://syzkaller.appspot.com/bug?extid=a67fc5321ffb4b311c98
> > compiler:       Debian clang version 15.0.7, GNU ld (GNU Binutils for Debian) 2.35.2
> > 
> > Unfortunately, I don't have any reproducer for this issue yet.
> > 
> > Downloadable assets:
> > disk image: https://storage.googleapis.com/syzbot-assets/fef94e788067/disk-d5280145.raw.xz
> > vmlinux: https://storage.googleapis.com/syzbot-assets/576412ea518b/vmlinux-d5280145.xz
> > kernel image: https://storage.googleapis.com/syzbot-assets/685a0e4be06b/bzImage-d5280145.xz
> > 
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+a67fc5321ffb4b311c98@syzkaller.appspotmail.com
> > 
> > general protection fault, probably for non-canonical address 0xdffffc000000000a: 0000 [#1] PREEMPT SMP KASAN
> > KASAN: null-ptr-deref in range [0x0000000000000050-0x0000000000000057]
> > CPU: 1 PID: 10127 Comm: syz-executor.3 Not tainted 6.4.0-syzkaller-11478-gd528014517f2 #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023
> > RIP: 0010:__lock_acquire+0x10d/0x7f70 kernel/locking/lockdep.c:5012
> > Code: 85 75 18 00 00 83 3d 15 c8 2c 0d 00 48 89 9c 24 10 01 00 00 0f 84 f8 0f 00 00 83 3d 5c de b3 0b 00 74 34 48 89 d0 48 c1 e8 03 <42> 80 3c 00 00 74 1a 48 89 d7 e8 b4 51 79 00 48 8b 94 24 80 00 00
> > RSP: 0018:ffffc900169be9e0 EFLAGS: 00010006
> > RAX: 000000000000000a RBX: 1ffff92002d37d60 RCX: 0000000000000002
> > RDX: 0000000000000050 RSI: 0000000000000000 RDI: 0000000000000050
> > RBP: ffffc900169beca8 R08: dffffc0000000000 R09: 0000000000000001
> > R10: dffffc0000000000 R11: fffffbfff1d2fe76 R12: 0000000000000000
> > R13: 0000000000000001 R14: 0000000000000002 R15: ffff88802091d940
> > FS:  0000000000000000(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00007fa22a3fe000 CR3: 000000004b5e1000 CR4: 00000000003506e0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > Call Trace:
> >  <TASK>
> >  lock_acquire+0x1e3/0x520 kernel/locking/lockdep.c:5761
> >  seqcount_lockdep_reader_access+0x139/0x220 include/linux/seqlock.h:102
> >  get_fs_root_rcu fs/d_path.c:244 [inline]
> >  d_path+0x2f0/0x6e0 fs/d_path.c:286
> >  audit_log_d_path+0xd3/0x310 kernel/audit.c:2139
> >  dump_common_audit_data security/lsm_audit.c:224 [inline]
> >  common_lsm_audit+0x7cf/0x1a90 security/lsm_audit.c:458
> >  smack_log+0x421/0x540 security/smack/smack_access.c:383
> >  smk_tskacc+0x2ff/0x360 security/smack/smack_access.c:253
> >  smack_inode_getattr+0x203/0x270 security/smack/smack_lsm.c:1202
> >  security_inode_getattr+0xd3/0x120 security/security.c:2114
> >  vfs_getattr+0x25/0x70 fs/stat.c:167
> >  ovl_getattr+0x1b1/0xf70 fs/overlayfs/inode.c:174
> >  ima_check_last_writer security/integrity/ima/ima_main.c:171 [inline]
> >  ima_file_free+0x26e/0x4b0 security/integrity/ima/ima_main.c:203
> 
> Ugh, I think the root of this might all be the call back into
> vfs_getattr() that happens on overlayfs:
> 
> __fput()
> -> ima_file_free()
>    -> mutex_lock()
>    -> vfs_getattr_nosec()
>       -> i_op->getattr() == ovl_getattr()
>          -> vfs_getattr()
> 	    -> security_inode_getattr()
>    -> mutex_unlock()
> 
> So either overlayfs needs to call vfs_getattr_nosec() when the request
> comes from vfs_getattr_nosec() or this needs to use
> backing_file_real_path() to operate on the real underlying path.
> 
> Thoughts?
> 

When you say "this needs to use backing_file_real_path()", what do you
mean by "this"? IMA?

That said, passing some sort of NOSEC flag to vfs_getattr that
designates the call as kernel-internal seems like the more correct thing
to do here, and might be useful in other weird stacking cases like this.

> >  __fput+0x36a/0x950 fs/file_table.c:378
> >  task_work_run+0x24a/0x300 kernel/task_work.c:179
> >  exit_task_work include/linux/task_work.h:38 [inline]
> >  do_exit+0x68f/0x2290 kernel/exit.c:874
> >  do_group_exit+0x206/0x2c0 kernel/exit.c:1024
> >  get_signal+0x1709/0x17e0 kernel/signal.c:2877
> >  arch_do_signal_or_restart+0x91/0x670 arch/x86/kernel/signal.c:308
> >  exit_to_user_mode_loop+0x6a/0x100 kernel/entry/common.c:168
> >  exit_to_user_mode_prepare+0xb1/0x140 kernel/entry/common.c:204
> >  __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
> >  syscall_exit_to_user_mode+0x64/0x280 kernel/entry/common.c:297
> >  do_syscall_64+0x4d/0xc0 arch/x86/entry/common.c:86
> >  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > RIP: 0033:0x7f7f3ca8c389
> > Code: Unable to access opcode bytes at 0x7f7f3ca8c35f.
> > RSP: 002b:00007f7f3d741168 EFLAGS: 00000246 ORIG_RAX: 0000000000000052
> > RAX: fffffffffffffffb RBX: 00007f7f3cbac050 RCX: 00007f7f3ca8c389
> > RDX: 0000000000000000 RSI: 00000000200001c0 RDI: 0000000020000180
> > RBP: 00007f7f3cad7493 R08: 0000000000000000 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> > R13: 00007fff8432555f R14: 00007f7f3d741300 R15: 0000000000022000
> >  </TASK>
> > Modules linked in:
> > ---[ end trace 0000000000000000 ]---
> > RIP: 0010:__lock_acquire+0x10d/0x7f70 kernel/locking/lockdep.c:5012
> > Code: 85 75 18 00 00 83 3d 15 c8 2c 0d 00 48 89 9c 24 10 01 00 00 0f 84 f8 0f 00 00 83 3d 5c de b3 0b 00 74 34 48 89 d0 48 c1 e8 03 <42> 80 3c 00 00 74 1a 48 89 d7 e8 b4 51 79 00 48 8b 94 24 80 00 00
> > RSP: 0018:ffffc900169be9e0 EFLAGS: 00010006
> > RAX: 000000000000000a RBX: 1ffff92002d37d60 RCX: 0000000000000002
> > RDX: 0000000000000050 RSI: 0000000000000000 RDI: 0000000000000050
> > RBP: ffffc900169beca8 R08: dffffc0000000000 R09: 0000000000000001
> > R10: dffffc0000000000 R11: fffffbfff1d2fe76 R12: 0000000000000000
> > R13: 0000000000000001 R14: 0000000000000002 R15: ffff88802091d940
> > FS:  0000000000000000(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00007fa22a3fe000 CR3: 000000004b5e1000 CR4: 00000000003506e0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > ----------------
> > Code disassembly (best guess):
> >    0:	85 75 18             	test   %esi,0x18(%rbp)
> >    3:	00 00                	add    %al,(%rax)
> >    5:	83 3d 15 c8 2c 0d 00 	cmpl   $0x0,0xd2cc815(%rip)        # 0xd2cc821
> >    c:	48 89 9c 24 10 01 00 	mov    %rbx,0x110(%rsp)
> >   13:	00
> >   14:	0f 84 f8 0f 00 00    	je     0x1012
> >   1a:	83 3d 5c de b3 0b 00 	cmpl   $0x0,0xbb3de5c(%rip)        # 0xbb3de7d
> >   21:	74 34                	je     0x57
> >   23:	48 89 d0             	mov    %rdx,%rax
> >   26:	48 c1 e8 03          	shr    $0x3,%rax
> > * 2a:	42 80 3c 00 00       	cmpb   $0x0,(%rax,%r8,1) <-- trapping instruction
> >   2f:	74 1a                	je     0x4b
> >   31:	48 89 d7             	mov    %rdx,%rdi
> >   34:	e8 b4 51 79 00       	callq  0x7951ed
> >   39:	48                   	rex.W
> >   3a:	8b                   	.byte 0x8b
> >   3b:	94                   	xchg   %eax,%esp
> >   3c:	24 80                	and    $0x80,%al
> > 
> > 
> > ---
> > This report is generated by a bot. It may contain errors.
> > See https://goo.gl/tpsmEJ for more information about syzbot.
> > syzbot engineers can be reached at syzkaller@googlegroups.com.
> > 
> > syzbot will keep track of this issue. See:
> > https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> > 
> > If the bug is already fixed, let syzbot know by replying with:
> > #syz fix: exact-commit-title
> > 
> > If you want to change bug's subsystems, reply with:
> > #syz set subsystems: new-subsystem
> > (See the list of subsystem names on the web dashboard)
> > 
> > If the bug is a duplicate of another bug, reply with:
> > #syz dup: exact-subject-of-another-report
> > 
> > If you want to undo deduplication, reply with:
> > #syz undup

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [syzbot] [overlayfs?] general protection fault in d_path
  2023-07-05 13:41   ` Jeff Layton
@ 2023-07-05 13:54     ` Amir Goldstein
  2023-07-05 14:10       ` Jeff Layton
  0 siblings, 1 reply; 35+ messages in thread
From: Amir Goldstein @ 2023-07-05 13:54 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Christian Brauner, syzbot, linux-fsdevel, linux-kernel,
	linux-unionfs, miklos, syzkaller-bugs

On Wed, Jul 5, 2023 at 4:41 PM Jeff Layton <jlayton@kernel.org> wrote:
>
> On Wed, 2023-07-05 at 15:05 +0200, Christian Brauner wrote:
> > On Wed, Jul 05, 2023 at 05:00:45AM -0700, syzbot wrote:
> > > Hello,
> > >
> > > syzbot found the following issue on:
> > >
> > > HEAD commit:    d528014517f2 Revert ".gitignore: ignore *.cover and *.mbx"
> > > git tree:       upstream
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=14fad002a80000
> > > kernel config:  https://syzkaller.appspot.com/x/.config?x=1085b4238c9eb6ba
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=a67fc5321ffb4b311c98
> > > compiler:       Debian clang version 15.0.7, GNU ld (GNU Binutils for Debian) 2.35.2
> > >
> > > Unfortunately, I don't have any reproducer for this issue yet.
> > >
> > > Downloadable assets:
> > > disk image: https://storage.googleapis.com/syzbot-assets/fef94e788067/disk-d5280145.raw.xz
> > > vmlinux: https://storage.googleapis.com/syzbot-assets/576412ea518b/vmlinux-d5280145.xz
> > > kernel image: https://storage.googleapis.com/syzbot-assets/685a0e4be06b/bzImage-d5280145.xz
> > >
> > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > Reported-by: syzbot+a67fc5321ffb4b311c98@syzkaller.appspotmail.com
> > >
> > > general protection fault, probably for non-canonical address 0xdffffc000000000a: 0000 [#1] PREEMPT SMP KASAN
> > > KASAN: null-ptr-deref in range [0x0000000000000050-0x0000000000000057]
> > > CPU: 1 PID: 10127 Comm: syz-executor.3 Not tainted 6.4.0-syzkaller-11478-gd528014517f2 #0
> > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023
> > > RIP: 0010:__lock_acquire+0x10d/0x7f70 kernel/locking/lockdep.c:5012
> > > Code: 85 75 18 00 00 83 3d 15 c8 2c 0d 00 48 89 9c 24 10 01 00 00 0f 84 f8 0f 00 00 83 3d 5c de b3 0b 00 74 34 48 89 d0 48 c1 e8 03 <42> 80 3c 00 00 74 1a 48 89 d7 e8 b4 51 79 00 48 8b 94 24 80 00 00
> > > RSP: 0018:ffffc900169be9e0 EFLAGS: 00010006
> > > RAX: 000000000000000a RBX: 1ffff92002d37d60 RCX: 0000000000000002
> > > RDX: 0000000000000050 RSI: 0000000000000000 RDI: 0000000000000050
> > > RBP: ffffc900169beca8 R08: dffffc0000000000 R09: 0000000000000001
> > > R10: dffffc0000000000 R11: fffffbfff1d2fe76 R12: 0000000000000000
> > > R13: 0000000000000001 R14: 0000000000000002 R15: ffff88802091d940
> > > FS:  0000000000000000(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
> > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > CR2: 00007fa22a3fe000 CR3: 000000004b5e1000 CR4: 00000000003506e0
> > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > Call Trace:
> > >  <TASK>
> > >  lock_acquire+0x1e3/0x520 kernel/locking/lockdep.c:5761
> > >  seqcount_lockdep_reader_access+0x139/0x220 include/linux/seqlock.h:102
> > >  get_fs_root_rcu fs/d_path.c:244 [inline]
> > >  d_path+0x2f0/0x6e0 fs/d_path.c:286
> > >  audit_log_d_path+0xd3/0x310 kernel/audit.c:2139
> > >  dump_common_audit_data security/lsm_audit.c:224 [inline]
> > >  common_lsm_audit+0x7cf/0x1a90 security/lsm_audit.c:458
> > >  smack_log+0x421/0x540 security/smack/smack_access.c:383
> > >  smk_tskacc+0x2ff/0x360 security/smack/smack_access.c:253
> > >  smack_inode_getattr+0x203/0x270 security/smack/smack_lsm.c:1202
> > >  security_inode_getattr+0xd3/0x120 security/security.c:2114
> > >  vfs_getattr+0x25/0x70 fs/stat.c:167
> > >  ovl_getattr+0x1b1/0xf70 fs/overlayfs/inode.c:174
> > >  ima_check_last_writer security/integrity/ima/ima_main.c:171 [inline]
> > >  ima_file_free+0x26e/0x4b0 security/integrity/ima/ima_main.c:203
> >
> > Ugh, I think the root of this might all be the call back into
> > vfs_getattr() that happens on overlayfs:
> >
> > __fput()
> > -> ima_file_free()
> >    -> mutex_lock()
> >    -> vfs_getattr_nosec()
> >       -> i_op->getattr() == ovl_getattr()
> >          -> vfs_getattr()
> >           -> security_inode_getattr()
> >    -> mutex_unlock()
> >
> > So either overlayfs needs to call vfs_getattr_nosec() when the request
> > comes from vfs_getattr_nosec() or this needs to use
> > backing_file_real_path() to operate on the real underlying path.
> >
> > Thoughts?
> >
>
> When you say "this needs to use backing_file_real_path()", what do you
> mean by "this"? IMA?
>
> That said, passing some sort of NOSEC flag to vfs_getattr that
> designates the call as kernel-internal seems like the more correct thing
> to do here, and might be useful in other weird stacking cases like this.
>

I don't think that NOSEC is the root cause.

If you ever noticed file_dentry() sprinkled through fs code,
it is only there because if that code were to call use helpers
that rely on file_inode() and d_inode(file->f_path.dentry) being
the same - bad things will happen and NOSEC will not cover
all those bad things.

IMA code also has file_dentry() sprinkled.
But it still accesses file->f_path in a few places and that
can result in bad things.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [syzbot] [overlayfs?] general protection fault in d_path
  2023-07-05 13:54     ` Amir Goldstein
@ 2023-07-05 14:10       ` Jeff Layton
  0 siblings, 0 replies; 35+ messages in thread
From: Jeff Layton @ 2023-07-05 14:10 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Christian Brauner, syzbot, linux-fsdevel, linux-kernel,
	linux-unionfs, miklos, syzkaller-bugs

On Wed, 2023-07-05 at 16:54 +0300, Amir Goldstein wrote:
> On Wed, Jul 5, 2023 at 4:41 PM Jeff Layton <jlayton@kernel.org> wrote:
> > 
> > On Wed, 2023-07-05 at 15:05 +0200, Christian Brauner wrote:
> > > On Wed, Jul 05, 2023 at 05:00:45AM -0700, syzbot wrote:
> > > > Hello,
> > > > 
> > > > syzbot found the following issue on:
> > > > 
> > > > HEAD commit:    d528014517f2 Revert ".gitignore: ignore *.cover and *.mbx"
> > > > git tree:       upstream
> > > > console output: https://syzkaller.appspot.com/x/log.txt?x=14fad002a80000
> > > > kernel config:  https://syzkaller.appspot.com/x/.config?x=1085b4238c9eb6ba
> > > > dashboard link: https://syzkaller.appspot.com/bug?extid=a67fc5321ffb4b311c98
> > > > compiler:       Debian clang version 15.0.7, GNU ld (GNU Binutils for Debian) 2.35.2
> > > > 
> > > > Unfortunately, I don't have any reproducer for this issue yet.
> > > > 
> > > > Downloadable assets:
> > > > disk image: https://storage.googleapis.com/syzbot-assets/fef94e788067/disk-d5280145.raw.xz
> > > > vmlinux: https://storage.googleapis.com/syzbot-assets/576412ea518b/vmlinux-d5280145.xz
> > > > kernel image: https://storage.googleapis.com/syzbot-assets/685a0e4be06b/bzImage-d5280145.xz
> > > > 
> > > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > > Reported-by: syzbot+a67fc5321ffb4b311c98@syzkaller.appspotmail.com
> > > > 
> > > > general protection fault, probably for non-canonical address 0xdffffc000000000a: 0000 [#1] PREEMPT SMP KASAN
> > > > KASAN: null-ptr-deref in range [0x0000000000000050-0x0000000000000057]
> > > > CPU: 1 PID: 10127 Comm: syz-executor.3 Not tainted 6.4.0-syzkaller-11478-gd528014517f2 #0
> > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023
> > > > RIP: 0010:__lock_acquire+0x10d/0x7f70 kernel/locking/lockdep.c:5012
> > > > Code: 85 75 18 00 00 83 3d 15 c8 2c 0d 00 48 89 9c 24 10 01 00 00 0f 84 f8 0f 00 00 83 3d 5c de b3 0b 00 74 34 48 89 d0 48 c1 e8 03 <42> 80 3c 00 00 74 1a 48 89 d7 e8 b4 51 79 00 48 8b 94 24 80 00 00
> > > > RSP: 0018:ffffc900169be9e0 EFLAGS: 00010006
> > > > RAX: 000000000000000a RBX: 1ffff92002d37d60 RCX: 0000000000000002
> > > > RDX: 0000000000000050 RSI: 0000000000000000 RDI: 0000000000000050
> > > > RBP: ffffc900169beca8 R08: dffffc0000000000 R09: 0000000000000001
> > > > R10: dffffc0000000000 R11: fffffbfff1d2fe76 R12: 0000000000000000
> > > > R13: 0000000000000001 R14: 0000000000000002 R15: ffff88802091d940
> > > > FS:  0000000000000000(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
> > > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > CR2: 00007fa22a3fe000 CR3: 000000004b5e1000 CR4: 00000000003506e0
> > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > Call Trace:
> > > >  <TASK>
> > > >  lock_acquire+0x1e3/0x520 kernel/locking/lockdep.c:5761
> > > >  seqcount_lockdep_reader_access+0x139/0x220 include/linux/seqlock.h:102
> > > >  get_fs_root_rcu fs/d_path.c:244 [inline]
> > > >  d_path+0x2f0/0x6e0 fs/d_path.c:286
> > > >  audit_log_d_path+0xd3/0x310 kernel/audit.c:2139
> > > >  dump_common_audit_data security/lsm_audit.c:224 [inline]
> > > >  common_lsm_audit+0x7cf/0x1a90 security/lsm_audit.c:458
> > > >  smack_log+0x421/0x540 security/smack/smack_access.c:383
> > > >  smk_tskacc+0x2ff/0x360 security/smack/smack_access.c:253
> > > >  smack_inode_getattr+0x203/0x270 security/smack/smack_lsm.c:1202
> > > >  security_inode_getattr+0xd3/0x120 security/security.c:2114
> > > >  vfs_getattr+0x25/0x70 fs/stat.c:167
> > > >  ovl_getattr+0x1b1/0xf70 fs/overlayfs/inode.c:174
> > > >  ima_check_last_writer security/integrity/ima/ima_main.c:171 [inline]
> > > >  ima_file_free+0x26e/0x4b0 security/integrity/ima/ima_main.c:203
> > > 
> > > Ugh, I think the root of this might all be the call back into
> > > vfs_getattr() that happens on overlayfs:
> > > 
> > > __fput()
> > > -> ima_file_free()
> > >    -> mutex_lock()
> > >    -> vfs_getattr_nosec()
> > >       -> i_op->getattr() == ovl_getattr()
> > >          -> vfs_getattr()
> > >           -> security_inode_getattr()
> > >    -> mutex_unlock()
> > > 
> > > So either overlayfs needs to call vfs_getattr_nosec() when the request
> > > comes from vfs_getattr_nosec() or this needs to use
> > > backing_file_real_path() to operate on the real underlying path.
> > > 
> > > Thoughts?
> > > 
> > 
> > When you say "this needs to use backing_file_real_path()", what do you
> > mean by "this"? IMA?
> > 
> > That said, passing some sort of NOSEC flag to vfs_getattr that
> > designates the call as kernel-internal seems like the more correct thing
> > to do here, and might be useful in other weird stacking cases like this.
> > 
> 
> I don't think that NOSEC is the root cause.
> 
> If you ever noticed file_dentry() sprinkled through fs code,
> it is only there because if that code were to call use helpers
> that rely on file_inode() and d_inode(file->f_path.dentry) being
> the same - bad things will happen and NOSEC will not cover
> all those bad things.
> 
> IMA code also has file_dentry() sprinkled.
> But it still accesses file->f_path in a few places and that
> can result in bad things.
> 

Ok, that makes sense, and is a lot less invasive than having to rework
vfs_getattr.
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [syzbot] [overlayfs?] general protection fault in d_path
  2023-07-05 12:00 [syzbot] [overlayfs?] general protection fault in d_path syzbot
  2023-07-05 13:05 ` Christian Brauner
@ 2023-09-12 22:10 ` syzbot
  2023-09-13  7:09   ` Amir Goldstein
  2023-09-18  0:04 ` syzbot
  2 siblings, 1 reply; 35+ messages in thread
From: syzbot @ 2023-09-12 22:10 UTC (permalink / raw)
  To: amir73il, brauner, jlayton, linux-fsdevel, linux-kernel,
	linux-unionfs, miklos, syzkaller-bugs, zohar

syzbot has found a reproducer for the following issue on:

HEAD commit:    a747acc0b752 Merge tag 'linux-kselftest-next-6.6-rc2' of g..
git tree:       upstream
console+strace: https://syzkaller.appspot.com/x/log.txt?x=11c82308680000
kernel config:  https://syzkaller.appspot.com/x/.config?x=df91a3034fe3f122
dashboard link: https://syzkaller.appspot.com/bug?extid=a67fc5321ffb4b311c98
compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1671b694680000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=14ec94d8680000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/b28ecb88c714/disk-a747acc0.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/03dd2cd5356f/vmlinux-a747acc0.xz
kernel image: https://storage.googleapis.com/syzbot-assets/63365d9bf980/bzImage-a747acc0.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+a67fc5321ffb4b311c98@syzkaller.appspotmail.com

general protection fault, probably for non-canonical address 0xdffffc0000000009: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000048-0x000000000000004f]
CPU: 0 PID: 5030 Comm: syz-executor173 Not tainted 6.6.0-rc1-syzkaller-00014-ga747acc0b752 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/04/2023
RIP: 0010:__seqprop_spinlock_sequence include/linux/seqlock.h:275 [inline]
RIP: 0010:get_fs_root_rcu fs/d_path.c:244 [inline]
RIP: 0010:d_path+0x2f0/0x6e0 fs/d_path.c:286
Code: 30 00 74 08 48 89 df e8 be 20 e1 ff 4c 8b 23 4d 8d 6c 24 48 49 81 c4 88 00 00 00 4c 89 eb 48 c1 eb 03 4c 89 ef e8 00 1e 00 00 <42> 0f b6 04 33 84 c0 0f 85 89 00 00 00 45 8b 7d 00 44 89 fe 83 e6
RSP: 0018:ffffc90003a7eee0 EFLAGS: 00010246
RAX: 7e73051ae5315e00 RBX: 0000000000000009 RCX: ffff88807da73b80
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffc90003a7eff0 R08: ffffffff82068d08 R09: 1ffffffff1d34ccd
R10: dffffc0000000000 R11: fffffbfff1d34cce R12: 0000000000000088
R13: 0000000000000048 R14: dffffc0000000000 R15: ffff8880206d8000
FS:  0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f351862ebb8 CR3: 00000000276a7000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 audit_log_d_path+0xd3/0x310 kernel/audit.c:2138
 dump_common_audit_data security/lsm_audit.c:224 [inline]
 common_lsm_audit+0x7cf/0x1a90 security/lsm_audit.c:458
 smack_log+0x421/0x540 security/smack/smack_access.c:383
 smk_tskacc+0x2ff/0x360 security/smack/smack_access.c:253
 smack_inode_getattr+0x203/0x270 security/smack/smack_lsm.c:1271
 security_inode_getattr+0xd3/0x120 security/security.c:2153
 vfs_getattr+0x2a/0x3a0 fs/stat.c:206
 ovl_getattr+0x1b1/0xf70 fs/overlayfs/inode.c:174
 ima_check_last_writer security/integrity/ima/ima_main.c:171 [inline]
 ima_file_free+0x26e/0x4b0 security/integrity/ima/ima_main.c:203
 __fput+0x36a/0x910 fs/file_table.c:378
 task_work_run+0x24a/0x300 kernel/task_work.c:179
 exit_task_work include/linux/task_work.h:38 [inline]
 do_exit+0x68f/0x2290 kernel/exit.c:874
 do_group_exit+0x206/0x2c0 kernel/exit.c:1024
 get_signal+0x175d/0x1840 kernel/signal.c:2892
 arch_do_signal_or_restart+0x96/0x860 arch/x86/kernel/signal.c:309
 exit_to_user_mode_loop+0x6a/0x100 kernel/entry/common.c:168
 exit_to_user_mode_prepare+0xb1/0x140 kernel/entry/common.c:204
 __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
 syscall_exit_to_user_mode+0x64/0x280 kernel/entry/common.c:296
 do_syscall_64+0x4d/0xc0 arch/x86/entry/common.c:86
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f35185d8529
Code: Unable to access opcode bytes at 0x7f35185d84ff.
RSP: 002b:00007f3518599218 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
RAX: 0000000000000001 RBX: 00007f3518662308 RCX: 00007f35185d8529
RDX: 00000000000f4240 RSI: 0000000000000081 RDI: 00007f351866230c
RBP: 00007f3518662300 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f351862f064
R13: 0031656c69662f2e R14: 6e6f3d7865646e69 R15: 0079616c7265766f
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:__seqprop_spinlock_sequence include/linux/seqlock.h:275 [inline]
RIP: 0010:get_fs_root_rcu fs/d_path.c:244 [inline]
RIP: 0010:d_path+0x2f0/0x6e0 fs/d_path.c:286
Code: 30 00 74 08 48 89 df e8 be 20 e1 ff 4c 8b 23 4d 8d 6c 24 48 49 81 c4 88 00 00 00 4c 89 eb 48 c1 eb 03 4c 89 ef e8 00 1e 00 00 <42> 0f b6 04 33 84 c0 0f 85 89 00 00 00 45 8b 7d 00 44 89 fe 83 e6
RSP: 0018:ffffc90003a7eee0 EFLAGS: 00010246
RAX: 7e73051ae5315e00 RBX: 0000000000000009 RCX: ffff88807da73b80
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffc90003a7eff0 R08: ffffffff82068d08 R09: 1ffffffff1d34ccd
R10: dffffc0000000000 R11: fffffbfff1d34cce R12: 0000000000000088
R13: 0000000000000048 R14: dffffc0000000000 R15: ffff8880206d8000
FS:  0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f351862ebb8 CR3: 000000007e769000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
----------------
Code disassembly (best guess):
   0:	30 00                	xor    %al,(%rax)
   2:	74 08                	je     0xc
   4:	48 89 df             	mov    %rbx,%rdi
   7:	e8 be 20 e1 ff       	call   0xffe120ca
   c:	4c 8b 23             	mov    (%rbx),%r12
   f:	4d 8d 6c 24 48       	lea    0x48(%r12),%r13
  14:	49 81 c4 88 00 00 00 	add    $0x88,%r12
  1b:	4c 89 eb             	mov    %r13,%rbx
  1e:	48 c1 eb 03          	shr    $0x3,%rbx
  22:	4c 89 ef             	mov    %r13,%rdi
  25:	e8 00 1e 00 00       	call   0x1e2a
* 2a:	42 0f b6 04 33       	movzbl (%rbx,%r14,1),%eax <-- trapping instruction
  2f:	84 c0                	test   %al,%al
  31:	0f 85 89 00 00 00    	jne    0xc0
  37:	45 8b 7d 00          	mov    0x0(%r13),%r15d
  3b:	44 89 fe             	mov    %r15d,%esi
  3e:	83                   	.byte 0x83
  3f:	e6                   	.byte 0xe6


---
If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [syzbot] [overlayfs?] general protection fault in d_path
  2023-09-12 22:10 ` syzbot
@ 2023-09-13  7:09   ` Amir Goldstein
  2023-09-13  9:06     ` [syzbot] [integrity] [overlayfs] " syzbot
  0 siblings, 1 reply; 35+ messages in thread
From: Amir Goldstein @ 2023-09-13  7:09 UTC (permalink / raw)
  To: syzbot
  Cc: brauner, jlayton, linux-fsdevel, linux-kernel, linux-unionfs,
	miklos, syzkaller-bugs, zohar

On Wed, Sep 13, 2023 at 1:10 AM syzbot
<syzbot+a67fc5321ffb4b311c98@syzkaller.appspotmail.com> wrote:
>
> syzbot has found a reproducer for the following issue on:
>
> HEAD commit:    a747acc0b752 Merge tag 'linux-kselftest-next-6.6-rc2' of g..
> git tree:       upstream
> console+strace: https://syzkaller.appspot.com/x/log.txt?x=11c82308680000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=df91a3034fe3f122
> dashboard link: https://syzkaller.appspot.com/bug?extid=a67fc5321ffb4b311c98
> compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1671b694680000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=14ec94d8680000
>
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/b28ecb88c714/disk-a747acc0.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/03dd2cd5356f/vmlinux-a747acc0.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/63365d9bf980/bzImage-a747acc0.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+a67fc5321ffb4b311c98@syzkaller.appspotmail.com
>
> general protection fault, probably for non-canonical address 0xdffffc0000000009: 0000 [#1] PREEMPT SMP KASAN
> KASAN: null-ptr-deref in range [0x0000000000000048-0x000000000000004f]
> CPU: 0 PID: 5030 Comm: syz-executor173 Not tainted 6.6.0-rc1-syzkaller-00014-ga747acc0b752 #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/04/2023
> RIP: 0010:__seqprop_spinlock_sequence include/linux/seqlock.h:275 [inline]
> RIP: 0010:get_fs_root_rcu fs/d_path.c:244 [inline]
> RIP: 0010:d_path+0x2f0/0x6e0 fs/d_path.c:286
> Code: 30 00 74 08 48 89 df e8 be 20 e1 ff 4c 8b 23 4d 8d 6c 24 48 49 81 c4 88 00 00 00 4c 89 eb 48 c1 eb 03 4c 89 ef e8 00 1e 00 00 <42> 0f b6 04 33 84 c0 0f 85 89 00 00 00 45 8b 7d 00 44 89 fe 83 e6
> RSP: 0018:ffffc90003a7eee0 EFLAGS: 00010246
> RAX: 7e73051ae5315e00 RBX: 0000000000000009 RCX: ffff88807da73b80
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> RBP: ffffc90003a7eff0 R08: ffffffff82068d08 R09: 1ffffffff1d34ccd
> R10: dffffc0000000000 R11: fffffbfff1d34cce R12: 0000000000000088
> R13: 0000000000000048 R14: dffffc0000000000 R15: ffff8880206d8000
> FS:  0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f351862ebb8 CR3: 00000000276a7000 CR4: 00000000003506f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  <TASK>
>  audit_log_d_path+0xd3/0x310 kernel/audit.c:2138
>  dump_common_audit_data security/lsm_audit.c:224 [inline]
>  common_lsm_audit+0x7cf/0x1a90 security/lsm_audit.c:458
>  smack_log+0x421/0x540 security/smack/smack_access.c:383
>  smk_tskacc+0x2ff/0x360 security/smack/smack_access.c:253
>  smack_inode_getattr+0x203/0x270 security/smack/smack_lsm.c:1271
>  security_inode_getattr+0xd3/0x120 security/security.c:2153
>  vfs_getattr+0x2a/0x3a0 fs/stat.c:206
>  ovl_getattr+0x1b1/0xf70 fs/overlayfs/inode.c:174
>  ima_check_last_writer security/integrity/ima/ima_main.c:171 [inline]
>  ima_file_free+0x26e/0x4b0 security/integrity/ima/ima_main.c:203
>  __fput+0x36a/0x910 fs/file_table.c:378
>  task_work_run+0x24a/0x300 kernel/task_work.c:179
>  exit_task_work include/linux/task_work.h:38 [inline]
>  do_exit+0x68f/0x2290 kernel/exit.c:874
>  do_group_exit+0x206/0x2c0 kernel/exit.c:1024
>  get_signal+0x175d/0x1840 kernel/signal.c:2892
>  arch_do_signal_or_restart+0x96/0x860 arch/x86/kernel/signal.c:309
>  exit_to_user_mode_loop+0x6a/0x100 kernel/entry/common.c:168
>  exit_to_user_mode_prepare+0xb1/0x140 kernel/entry/common.c:204
>  __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
>  syscall_exit_to_user_mode+0x64/0x280 kernel/entry/common.c:296
>  do_syscall_64+0x4d/0xc0 arch/x86/entry/common.c:86
>  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> RIP: 0033:0x7f35185d8529
> Code: Unable to access opcode bytes at 0x7f35185d84ff.
> RSP: 002b:00007f3518599218 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
> RAX: 0000000000000001 RBX: 00007f3518662308 RCX: 00007f35185d8529
> RDX: 00000000000f4240 RSI: 0000000000000081 RDI: 00007f351866230c
> RBP: 00007f3518662300 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 00007f351862f064
> R13: 0031656c69662f2e R14: 6e6f3d7865646e69 R15: 0079616c7265766f
>  </TASK>
> Modules linked in:
> ---[ end trace 0000000000000000 ]---
> RIP: 0010:__seqprop_spinlock_sequence include/linux/seqlock.h:275 [inline]
> RIP: 0010:get_fs_root_rcu fs/d_path.c:244 [inline]
> RIP: 0010:d_path+0x2f0/0x6e0 fs/d_path.c:286
> Code: 30 00 74 08 48 89 df e8 be 20 e1 ff 4c 8b 23 4d 8d 6c 24 48 49 81 c4 88 00 00 00 4c 89 eb 48 c1 eb 03 4c 89 ef e8 00 1e 00 00 <42> 0f b6 04 33 84 c0 0f 85 89 00 00 00 45 8b 7d 00 44 89 fe 83 e6
> RSP: 0018:ffffc90003a7eee0 EFLAGS: 00010246
> RAX: 7e73051ae5315e00 RBX: 0000000000000009 RCX: ffff88807da73b80
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> RBP: ffffc90003a7eff0 R08: ffffffff82068d08 R09: 1ffffffff1d34ccd
> R10: dffffc0000000000 R11: fffffbfff1d34cce R12: 0000000000000088
> R13: 0000000000000048 R14: dffffc0000000000 R15: ffff8880206d8000
> FS:  0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f351862ebb8 CR3: 000000007e769000 CR4: 00000000003506f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> ----------------
> Code disassembly (best guess):
>    0:   30 00                   xor    %al,(%rax)
>    2:   74 08                   je     0xc
>    4:   48 89 df                mov    %rbx,%rdi
>    7:   e8 be 20 e1 ff          call   0xffe120ca
>    c:   4c 8b 23                mov    (%rbx),%r12
>    f:   4d 8d 6c 24 48          lea    0x48(%r12),%r13
>   14:   49 81 c4 88 00 00 00    add    $0x88,%r12
>   1b:   4c 89 eb                mov    %r13,%rbx
>   1e:   48 c1 eb 03             shr    $0x3,%rbx
>   22:   4c 89 ef                mov    %r13,%rdi
>   25:   e8 00 1e 00 00          call   0x1e2a
> * 2a:   42 0f b6 04 33          movzbl (%rbx,%r14,1),%eax <-- trapping instruction
>   2f:   84 c0                   test   %al,%al
>   31:   0f 85 89 00 00 00       jne    0xc0
>   37:   45 8b 7d 00             mov    0x0(%r13),%r15d
>   3b:   44 89 fe                mov    %r15d,%esi
>   3e:   83                      .byte 0x83
>   3f:   e6                      .byte 0xe6
>
>
> ---
> If you want syzbot to run the reproducer, reply with:
> #syz test: git://repo/address.git branch-or-commit-hash
> If you attach or paste a git patch, syzbot will apply it before testing.

#syz set subsystems: integrity, overlayfs

#syz test: https://github.com/amir73il/linux ima-ovl-fix

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [syzbot] [integrity] [overlayfs] general protection fault in d_path
  2023-09-13  7:09   ` Amir Goldstein
@ 2023-09-13  9:06     ` syzbot
  2023-09-14 17:54       ` Mimi Zohar
  0 siblings, 1 reply; 35+ messages in thread
From: syzbot @ 2023-09-13  9:06 UTC (permalink / raw)
  To: amir73il, brauner, jlayton, linux-fsdevel, linux-kernel,
	linux-unionfs, miklos, syzkaller-bugs, zohar

Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
general protection fault in d_path

general protection fault, probably for non-canonical address 0xdffffc0000000009: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000048-0x000000000000004f]
CPU: 0 PID: 5465 Comm: syz-executor.0 Not tainted 6.6.0-rc1-syzkaller-00004-g965067e2f71e #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/04/2023
RIP: 0010:__seqprop_spinlock_sequence include/linux/seqlock.h:275 [inline]
RIP: 0010:get_fs_root_rcu fs/d_path.c:244 [inline]
RIP: 0010:d_path+0x2f0/0x6e0 fs/d_path.c:286
Code: 30 00 74 08 48 89 df e8 be 20 e1 ff 4c 8b 23 4d 8d 6c 24 48 49 81 c4 88 00 00 00 4c 89 eb 48 c1 eb 03 4c 89 ef e8 00 1e 00 00 <42> 0f b6 04 33 84 c0 0f 85 89 00 00 00 45 8b 7d 00 44 89 fe 83 e6
RSP: 0018:ffffc90005056ec0 EFLAGS: 00010246
RAX: be27ea831a7ad800 RBX: 0000000000000009 RCX: ffff88801c713b80
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffc90005056fd0 R08: ffffffff82068d08 R09: 1ffffffff1d34ccd
R10: dffffc0000000000 R11: fffffbfff1d34cce R12: 0000000000000088
R13: 0000000000000048 R14: dffffc0000000000 R15: ffff888076dcc000
FS:  0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f4b67d70420 CR3: 0000000016f66000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 audit_log_d_path+0xd3/0x310 kernel/audit.c:2138
 dump_common_audit_data security/lsm_audit.c:224 [inline]
 common_lsm_audit+0x7cf/0x1a90 security/lsm_audit.c:458
 smack_log+0x421/0x540 security/smack/smack_access.c:383
 smk_tskacc+0x2ff/0x360 security/smack/smack_access.c:253
 smack_inode_getattr+0x203/0x270 security/smack/smack_lsm.c:1271
 security_inode_getattr+0xd3/0x120 security/security.c:2153
 vfs_getattr+0x2a/0x3a0 fs/stat.c:206
 ovl_getattr+0x1b1/0xf70 fs/overlayfs/inode.c:174
 ima_check_last_writer security/integrity/ima/ima_main.c:171 [inline]
 ima_file_free+0x2c3/0x560 security/integrity/ima/ima_main.c:203
 __fput+0x36a/0x910 fs/file_table.c:378
 task_work_run+0x24a/0x300 kernel/task_work.c:179
 exit_task_work include/linux/task_work.h:38 [inline]
 do_exit+0x68f/0x2290 kernel/exit.c:874
 do_group_exit+0x206/0x2c0 kernel/exit.c:1024
 get_signal+0x175d/0x1840 kernel/signal.c:2892
 arch_do_signal_or_restart+0x96/0x860 arch/x86/kernel/signal.c:309
 exit_to_user_mode_loop+0x6a/0x100 kernel/entry/common.c:168
 exit_to_user_mode_prepare+0xb1/0x140 kernel/entry/common.c:204
 __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
 syscall_exit_to_user_mode+0x64/0x280 kernel/entry/common.c:296
 do_syscall_64+0x4d/0xc0 arch/x86/entry/common.c:86
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f2a67a7cae9
Code: Unable to access opcode bytes at 0x7f2a67a7cabf.
RSP: 002b:00007f2a6875c178 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
RAX: 0000000000000001 RBX: 00007f2a67b9bf88 RCX: 00007f2a67a7cae9
RDX: 00000000000f4240 RSI: 0000000000000081 RDI: 00007f2a67b9bf8c
RBP: 00007f2a67b9bf80 R08: 00007fffba3690b0 R09: 00007f2a6875c6c0
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f2a67b9bf8c
R13: 000000000000000b R14: 00007fffba21b880 R15: 00007fffba21b968
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:__seqprop_spinlock_sequence include/linux/seqlock.h:275 [inline]
RIP: 0010:get_fs_root_rcu fs/d_path.c:244 [inline]
RIP: 0010:d_path+0x2f0/0x6e0 fs/d_path.c:286
Code: 30 00 74 08 48 89 df e8 be 20 e1 ff 4c 8b 23 4d 8d 6c 24 48 49 81 c4 88 00 00 00 4c 89 eb 48 c1 eb 03 4c 89 ef e8 00 1e 00 00 <42> 0f b6 04 33 84 c0 0f 85 89 00 00 00 45 8b 7d 00 44 89 fe 83 e6
RSP: 0018:ffffc90005056ec0 EFLAGS: 00010246
RAX: be27ea831a7ad800 RBX: 0000000000000009 RCX: ffff88801c713b80
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffc90005056fd0 R08: ffffffff82068d08 R09: 1ffffffff1d34ccd
R10: dffffc0000000000 R11: fffffbfff1d34cce R12: 0000000000000088
R13: 0000000000000048 R14: dffffc0000000000 R15: ffff888076dcc000
FS:  0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f4b67d70420 CR3: 0000000016f66000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
----------------
Code disassembly (best guess):
   0:	30 00                	xor    %al,(%rax)
   2:	74 08                	je     0xc
   4:	48 89 df             	mov    %rbx,%rdi
   7:	e8 be 20 e1 ff       	call   0xffe120ca
   c:	4c 8b 23             	mov    (%rbx),%r12
   f:	4d 8d 6c 24 48       	lea    0x48(%r12),%r13
  14:	49 81 c4 88 00 00 00 	add    $0x88,%r12
  1b:	4c 89 eb             	mov    %r13,%rbx
  1e:	48 c1 eb 03          	shr    $0x3,%rbx
  22:	4c 89 ef             	mov    %r13,%rdi
  25:	e8 00 1e 00 00       	call   0x1e2a
* 2a:	42 0f b6 04 33       	movzbl (%rbx,%r14,1),%eax <-- trapping instruction
  2f:	84 c0                	test   %al,%al
  31:	0f 85 89 00 00 00    	jne    0xc0
  37:	45 8b 7d 00          	mov    0x0(%r13),%r15d
  3b:	44 89 fe             	mov    %r15d,%esi
  3e:	83                   	.byte 0x83
  3f:	e6                   	.byte 0xe6


Tested on:

commit:         965067e2 ima: fix wrong dereferences of file->f_path
git tree:       https://github.com/amir73il/linux ima-ovl-fix
console output: https://syzkaller.appspot.com/x/log.txt?x=109b00e8680000
kernel config:  https://syzkaller.appspot.com/x/.config?x=df91a3034fe3f122
dashboard link: https://syzkaller.appspot.com/bug?extid=a67fc5321ffb4b311c98
compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40

Note: no patches were applied.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [syzbot] [integrity] [overlayfs] general protection fault in d_path
  2023-09-13  9:06     ` [syzbot] [integrity] [overlayfs] " syzbot
@ 2023-09-14 17:54       ` Mimi Zohar
  0 siblings, 0 replies; 35+ messages in thread
From: Mimi Zohar @ 2023-09-14 17:54 UTC (permalink / raw)
  To: syzbot, amir73il, brauner, jlayton, linux-fsdevel, linux-kernel,
	linux-unionfs, miklos, syzkaller-bugs

On Wed, 2023-09-13 at 02:06 -0700, syzbot wrote:
> Hello,
> 
> syzbot has tested the proposed patch but the reproducer is still triggering an issue:
> general protection fault in d_path

Is an IMA policy being loaded?  Please include the boot command line.


> general protection fault, probably for non-canonical address 0xdffffc0000000009: 0000 [#1] PREEMPT SMP KASAN
> KASAN: null-ptr-deref in range [0x0000000000000048-0x000000000000004f]
> CPU: 0 PID: 5465 Comm: syz-executor.0 Not tainted 6.6.0-rc1-syzkaller-00004-g965067e2f71e #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/04/2023
> RIP: 0010:__seqprop_spinlock_sequence include/linux/seqlock.h:275 [inline]
> RIP: 0010:get_fs_root_rcu fs/d_path.c:244 [inline]
> RIP: 0010:d_path+0x2f0/0x6e0 fs/d_path.c:286
> Code: 30 00 74 08 48 89 df e8 be 20 e1 ff 4c 8b 23 4d 8d 6c 24 48 49 81 c4 88 00 00 00 4c 89 eb 48 c1 eb 03 4c 89 ef e8 00 1e 00 00 <42> 0f b6 04 33 84 c0 0f 85 89 00 00 00 45 8b 7d 00 44 89 fe 83 e6
> RSP: 0018:ffffc90005056ec0 EFLAGS: 00010246
> RAX: be27ea831a7ad800 RBX: 0000000000000009 RCX: ffff88801c713b80
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> RBP: ffffc90005056fd0 R08: ffffffff82068d08 R09: 1ffffffff1d34ccd
> R10: dffffc0000000000 R11: fffffbfff1d34cce R12: 0000000000000088
> R13: 0000000000000048 R14: dffffc0000000000 R15: ffff888076dcc000
> FS:  0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f4b67d70420 CR3: 0000000016f66000 CR4: 00000000003506f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  <TASK>
>  audit_log_d_path+0xd3/0x310 kernel/audit.c:2138
>  dump_common_audit_data security/lsm_audit.c:224 [inline]
>  common_lsm_audit+0x7cf/0x1a90 security/lsm_audit.c:458
>  smack_log+0x421/0x540 security/smack/smack_access.c:383
>  smk_tskacc+0x2ff/0x360 security/smack/smack_access.c:253
>  smack_inode_getattr+0x203/0x270 security/smack/smack_lsm.c:1271
>  security_inode_getattr+0xd3/0x120 security/security.c:2153
>  vfs_getattr+0x2a/0x3a0 fs/stat.c:206
>  ovl_getattr+0x1b1/0xf70 fs/overlayfs/inode.c:174
>  ima_check_last_writer security/integrity/ima/ima_main.c:171 [inline]
>  ima_file_free+0x2c3/0x560 security/integrity/ima/ima_main.c:203
>  __fput+0x36a/0x910 fs/file_table.c:378
>  task_work_run+0x24a/0x300 kernel/task_work.c:179
>  exit_task_work include/linux/task_work.h:38 [inline]
>  do_exit+0x68f/0x2290 kernel/exit.c:874
>  do_group_exit+0x206/0x2c0 kernel/exit.c:1024
>  get_signal+0x175d/0x1840 kernel/signal.c:2892
>  arch_do_signal_or_restart+0x96/0x860 arch/x86/kernel/signal.c:309
>  exit_to_user_mode_loop+0x6a/0x100 kernel/entry/common.c:168
>  exit_to_user_mode_prepare+0xb1/0x140 kernel/entry/common.c:204
>  __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
>  syscall_exit_to_user_mode+0x64/0x280 kernel/entry/common.c:296
>  do_syscall_64+0x4d/0xc0 arch/x86/entry/common.c:86
>  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> RIP: 0033:0x7f2a67a7cae9
> Code: Unable to access opcode bytes at 0x7f2a67a7cabf.
> RSP: 002b:00007f2a6875c178 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
> RAX: 0000000000000001 RBX: 00007f2a67b9bf88 RCX: 00007f2a67a7cae9
> RDX: 00000000000f4240 RSI: 0000000000000081 RDI: 00007f2a67b9bf8c
> RBP: 00007f2a67b9bf80 R08: 00007fffba3690b0 R09: 00007f2a6875c6c0
> R10: 0000000000000000 R11: 0000000000000246 R12: 00007f2a67b9bf8c
> R13: 000000000000000b R14: 00007fffba21b880 R15: 00007fffba21b968
>  </TASK>
> Modules linked in:
> ---[ end trace 0000000000000000 ]---
> RIP: 0010:__seqprop_spinlock_sequence include/linux/seqlock.h:275 [inline]
> RIP: 0010:get_fs_root_rcu fs/d_path.c:244 [inline]
> RIP: 0010:d_path+0x2f0/0x6e0 fs/d_path.c:286
> Code: 30 00 74 08 48 89 df e8 be 20 e1 ff 4c 8b 23 4d 8d 6c 24 48 49 81 c4 88 00 00 00 4c 89 eb 48 c1 eb 03 4c 89 ef e8 00 1e 00 00 <42> 0f b6 04 33 84 c0 0f 85 89 00 00 00 45 8b 7d 00 44 89 fe 83 e6
> RSP: 0018:ffffc90005056ec0 EFLAGS: 00010246
> RAX: be27ea831a7ad800 RBX: 0000000000000009 RCX: ffff88801c713b80
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> RBP: ffffc90005056fd0 R08: ffffffff82068d08 R09: 1ffffffff1d34ccd
> R10: dffffc0000000000 R11: fffffbfff1d34cce R12: 0000000000000088
> R13: 0000000000000048 R14: dffffc0000000000 R15: ffff888076dcc000
> FS:  0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f4b67d70420 CR3: 0000000016f66000 CR4: 00000000003506f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> ----------------
> Code disassembly (best guess):
>    0:	30 00                	xor    %al,(%rax)
>    2:	74 08                	je     0xc
>    4:	48 89 df             	mov    %rbx,%rdi
>    7:	e8 be 20 e1 ff       	call   0xffe120ca
>    c:	4c 8b 23             	mov    (%rbx),%r12
>    f:	4d 8d 6c 24 48       	lea    0x48(%r12),%r13
>   14:	49 81 c4 88 00 00 00 	add    $0x88,%r12
>   1b:	4c 89 eb             	mov    %r13,%rbx
>   1e:	48 c1 eb 03          	shr    $0x3,%rbx
>   22:	4c 89 ef             	mov    %r13,%rdi
>   25:	e8 00 1e 00 00       	call   0x1e2a
> * 2a:	42 0f b6 04 33       	movzbl (%rbx,%r14,1),%eax <-- trapping instruction
>   2f:	84 c0                	test   %al,%al
>   31:	0f 85 89 00 00 00    	jne    0xc0
>   37:	45 8b 7d 00          	mov    0x0(%r13),%r15d
>   3b:	44 89 fe             	mov    %r15d,%esi
>   3e:	83                   	.byte 0x83
>   3f:	e6                   	.byte 0xe6
> 
> 
> Tested on:
> 
> commit:         965067e2 ima: fix wrong dereferences of file->f_path
> git tree:       https://github.com/amir73il/linux ima-ovl-fix
> console output: https://syzkaller.appspot.com/x/log.txt?x=109b00e8680000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=df91a3034fe3f122
> dashboard link: https://syzkaller.appspot.com/bug?extid=a67fc5321ffb4b311c98
> compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> 
> Note: no patches were applied.



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [syzbot] [integrity] [overlayfs] general protection fault in d_path
  2023-07-05 12:00 [syzbot] [overlayfs?] general protection fault in d_path syzbot
  2023-07-05 13:05 ` Christian Brauner
  2023-09-12 22:10 ` syzbot
@ 2023-09-18  0:04 ` syzbot
  2023-09-20 17:01   ` Stefan Berger
  2 siblings, 1 reply; 35+ messages in thread
From: syzbot @ 2023-09-18  0:04 UTC (permalink / raw)
  To: amir73il, brauner, jlayton, linux-fsdevel, linux-integrity,
	linux-kernel, linux-unionfs, miklos, stefanb, syzkaller-bugs,
	zohar

syzbot has bisected this issue to:

commit db1d1e8b9867aae5c3e61ad7859abfcc4a6fd6c7
Author: Jeff Layton <jlayton@kernel.org>
Date:   Mon Apr 17 16:55:51 2023 +0000

    IMA: use vfs_getattr_nosec to get the i_version

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=106f7e54680000
start commit:   a747acc0b752 Merge tag 'linux-kselftest-next-6.6-rc2' of g..
git tree:       upstream
final oops:     https://syzkaller.appspot.com/x/report.txt?x=126f7e54680000
console output: https://syzkaller.appspot.com/x/log.txt?x=146f7e54680000
kernel config:  https://syzkaller.appspot.com/x/.config?x=df91a3034fe3f122
dashboard link: https://syzkaller.appspot.com/bug?extid=a67fc5321ffb4b311c98
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1671b694680000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=14ec94d8680000

Reported-by: syzbot+a67fc5321ffb4b311c98@syzkaller.appspotmail.com
Fixes: db1d1e8b9867 ("IMA: use vfs_getattr_nosec to get the i_version")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [syzbot] [integrity] [overlayfs] general protection fault in d_path
  2023-09-18  0:04 ` syzbot
@ 2023-09-20 17:01   ` Stefan Berger
  2023-09-20 20:37     ` Stefan Berger
  0 siblings, 1 reply; 35+ messages in thread
From: Stefan Berger @ 2023-09-20 17:01 UTC (permalink / raw)
  To: syzbot, amir73il, brauner, jlayton, linux-fsdevel,
	linux-integrity, linux-kernel, linux-unionfs, miklos,
	syzkaller-bugs, zohar


On 9/17/23 20:04, syzbot wrote:
> syzbot has bisected this issue to:
>
> commit db1d1e8b9867aae5c3e61ad7859abfcc4a6fd6c7
> Author: Jeff Layton <jlayton@kernel.org>
> Date:   Mon Apr 17 16:55:51 2023 +0000
>
>      IMA: use vfs_getattr_nosec to get the i_version
>
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=106f7e54680000
> start commit:   a747acc0b752 Merge tag 'linux-kselftest-next-6.6-rc2' of g..
> git tree:       upstream
> final oops:     https://syzkaller.appspot.com/x/report.txt?x=126f7e54680000
> console output: https://syzkaller.appspot.com/x/log.txt?x=146f7e54680000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=df91a3034fe3f122
> dashboard link: https://syzkaller.appspot.com/bug?extid=a67fc5321ffb4b311c98
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1671b694680000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=14ec94d8680000
>
> Reported-by: syzbot+a67fc5321ffb4b311c98@syzkaller.appspotmail.com
> Fixes: db1d1e8b9867 ("IMA: use vfs_getattr_nosec to get the i_version")
>
> For information about bisection process see: https://goo.gl/tpsmEJ#bisection

The final oops shows this here:

BUG: kernel NULL pointer dereference, address: 0000000000000058
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP
CPU: 0 PID: 3192 Comm: syz-executor.0 Not tainted 6.4.0-rc2-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
Google 08/04/2023
RIP: 0010:__lock_acquire+0x35/0x490 kernel/locking/lockdep.c:4946
Code: 83 ec 18 65 4c 8b 35 aa 60 f4 7e 83 3d b7 11 e4 02 00 0f 84 05 02 
00 00 4c 89 cb 89 cd 41 89 d5 49 89 ff 83 fe 01 77 0c 89 f0 <49> 8b 44 
c7 08 48 85 c0 75 1b 4c 89 ff 31 d2 45 89 c4 e8 74 f6 ff
RSP: 0018:ffffc90002edb840 EFLAGS: 00010097
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000050
RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: ffff888102ea5340 R15: 0000000000000050
FS:  0000000000000000(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000058 CR3: 0000000003aa8000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
  <TASK>
  lock_acquire+0xd8/0x1f0 kernel/locking/lockdep.c:5691
  seqcount_lockdep_reader_access include/linux/seqlock.h:102 [inline]
  get_fs_root_rcu fs/d_path.c:243 [inline]
  d_path+0xd1/0x1f0 fs/d_path.c:285
  audit_log_d_path+0x65/0x130 kernel/audit.c:2139
  dump_common_audit_data security/lsm_audit.c:224 [inline]
  common_lsm_audit+0x3b3/0x840 security/lsm_audit.c:458
  smack_log+0xad/0x130 security/smack/smack_access.c:383
  smk_tskacc+0xb1/0xd0 security/smack/smack_access.c:253
  smack_inode_getattr+0x8a/0xb0 security/smack/smack_lsm.c:1187
  security_inode_getattr+0x32/0x50 security/security.c:2114
  vfs_getattr+0x1b/0x40 fs/stat.c:167
  ovl_getattr+0xa6/0x3e0 fs/overlayfs/inode.c:173
  ima_check_last_writer security/integrity/ima/ima_main.c:171 [inline]
  ima_file_free+0xbd/0x130 security/integrity/ima/ima_main.c:203
  __fput+0xc7/0x220 fs/file_table.c:315
  task_work_run+0x7d/0xa0 kernel/task_work.c:179
  exit_task_work include/linux/task_work.h:38 [inline]
  do_exit+0x2c7/0xa80 kernel/exit.c:871 <-----------------------
  do_group_exit+0x85/0xa0 kernel/exit.c:1021
  get_signal+0x73c/0x7f0 kernel/signal.c:2874
  arch_do_signal_or_restart+0x89/0x290 arch/x86/kernel/signal.c:306
  exit_to_user_mode_loop+0x61/0xb0 kernel/entry/common.c:168
  exit_to_user_mode_prepare+0x64/0xb0 kernel/entry/common.c:204
  __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
  syscall_exit_to_user_mode+0x2b/0x1d0 kernel/entry/common.c:297
  do_syscall_64+0x4d/0x90 arch/x86/entry/common.c:86
  entry_SYSCALL_64_after_hwframe+0x63/0xcd


do_exit has called exit_fs(tsk) [ 
https://elixir.bootlin.com/linux/v6.4-rc2/source/kernel/exit.c#L867 ]

exit_fs(tsk) has set tsk->fs = NULL [ 
https://elixir.bootlin.com/linux/v6.4-rc2/source/fs/fs_struct.c#L103 ]

I think this then bites in d_path() where it calls:

     get_fs_root_rcu(current->fs, &root);   [ 
https://elixir.bootlin.com/linux/v6.4-rc2/source/fs/d_path.c#L285 ]

current->fs is likely NULL here.

If this was correct it would have nothing to do with the actual patch, 
though, but rather with the fact that smack logs on process termination. 
I am not sure what the solution would be other than testing for 
current->fs == NULL in d_path before using it and returning an error 
that is not normally returned or trying to intercept this case in smack.

    Stefan


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [syzbot] [integrity] [overlayfs] general protection fault in d_path
  2023-09-20 17:01   ` Stefan Berger
@ 2023-09-20 20:37     ` Stefan Berger
  2023-09-20 21:16       ` Jeff Layton
  0 siblings, 1 reply; 35+ messages in thread
From: Stefan Berger @ 2023-09-20 20:37 UTC (permalink / raw)
  To: syzbot, amir73il, brauner, jlayton, linux-fsdevel,
	linux-integrity, linux-kernel, linux-unionfs, miklos,
	syzkaller-bugs, zohar, casey


On 9/20/23 13:01, Stefan Berger wrote:
>
> On 9/17/23 20:04, syzbot wrote:
>> syzbot has bisected this issue to:
>>
>> commit db1d1e8b9867aae5c3e61ad7859abfcc4a6fd6c7
>> Author: Jeff Layton <jlayton@kernel.org>
>> Date:   Mon Apr 17 16:55:51 2023 +0000
>>
>>      IMA: use vfs_getattr_nosec to get the i_version
>>
>> bisection log: 
>> https://syzkaller.appspot.com/x/bisect.txt?x=106f7e54680000
>> start commit:   a747acc0b752 Merge tag 'linux-kselftest-next-6.6-rc2' 
>> of g..
>> git tree:       upstream
>> final oops: https://syzkaller.appspot.com/x/report.txt?x=126f7e54680000
>> console output: https://syzkaller.appspot.com/x/log.txt?x=146f7e54680000
>> kernel config: 
>> https://syzkaller.appspot.com/x/.config?x=df91a3034fe3f122
>> dashboard link: 
>> https://syzkaller.appspot.com/bug?extid=a67fc5321ffb4b311c98
>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1671b694680000
>> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=14ec94d8680000
>>
>> Reported-by: syzbot+a67fc5321ffb4b311c98@syzkaller.appspotmail.com
>> Fixes: db1d1e8b9867 ("IMA: use vfs_getattr_nosec to get the i_version")
>>
>> For information about bisection process see: 
>> https://goo.gl/tpsmEJ#bisection
>
> The final oops shows this here:
>
> BUG: kernel NULL pointer dereference, address: 0000000000000058
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 0 P4D 0
> Oops: 0000 [#1] PREEMPT SMP
> CPU: 0 PID: 3192 Comm: syz-executor.0 Not tainted 6.4.0-rc2-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, 
> BIOS Google 08/04/2023
> RIP: 0010:__lock_acquire+0x35/0x490 kernel/locking/lockdep.c:4946
> Code: 83 ec 18 65 4c 8b 35 aa 60 f4 7e 83 3d b7 11 e4 02 00 0f 84 05 
> 02 00 00 4c 89 cb 89 cd 41 89 d5 49 89 ff 83 fe 01 77 0c 89 f0 <49> 8b 
> 44 c7 08 48 85 c0 75 1b 4c 89 ff 31 d2 45 89 c4 e8 74 f6 ff
> RSP: 0018:ffffc90002edb840 EFLAGS: 00010097
> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000050
> RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> R13: 0000000000000000 R14: ffff888102ea5340 R15: 0000000000000050
> FS:  0000000000000000(0000) GS:ffff88813bc00000(0000) 
> knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000058 CR3: 0000000003aa8000 CR4: 00000000003506f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  <TASK>
>  lock_acquire+0xd8/0x1f0 kernel/locking/lockdep.c:5691
>  seqcount_lockdep_reader_access include/linux/seqlock.h:102 [inline]
>  get_fs_root_rcu fs/d_path.c:243 [inline]
>  d_path+0xd1/0x1f0 fs/d_path.c:285
>  audit_log_d_path+0x65/0x130 kernel/audit.c:2139
>  dump_common_audit_data security/lsm_audit.c:224 [inline]
>  common_lsm_audit+0x3b3/0x840 security/lsm_audit.c:458
>  smack_log+0xad/0x130 security/smack/smack_access.c:383
>  smk_tskacc+0xb1/0xd0 security/smack/smack_access.c:253
>  smack_inode_getattr+0x8a/0xb0 security/smack/smack_lsm.c:1187
>  security_inode_getattr+0x32/0x50 security/security.c:2114
>  vfs_getattr+0x1b/0x40 fs/stat.c:167
>  ovl_getattr+0xa6/0x3e0 fs/overlayfs/inode.c:173
>  ima_check_last_writer security/integrity/ima/ima_main.c:171 [inline]
>  ima_file_free+0xbd/0x130 security/integrity/ima/ima_main.c:203
>  __fput+0xc7/0x220 fs/file_table.c:315
>  task_work_run+0x7d/0xa0 kernel/task_work.c:179
>  exit_task_work include/linux/task_work.h:38 [inline]
>  do_exit+0x2c7/0xa80 kernel/exit.c:871 <-----------------------
>  do_group_exit+0x85/0xa0 kernel/exit.c:1021
>  get_signal+0x73c/0x7f0 kernel/signal.c:2874
>  arch_do_signal_or_restart+0x89/0x290 arch/x86/kernel/signal.c:306
>  exit_to_user_mode_loop+0x61/0xb0 kernel/entry/common.c:168
>  exit_to_user_mode_prepare+0x64/0xb0 kernel/entry/common.c:204
>  __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
>  syscall_exit_to_user_mode+0x2b/0x1d0 kernel/entry/common.c:297
>  do_syscall_64+0x4d/0x90 arch/x86/entry/common.c:86
>  entry_SYSCALL_64_after_hwframe+0x63/0xcd
>
>
> do_exit has called exit_fs(tsk) [ 
> https://elixir.bootlin.com/linux/v6.4-rc2/source/kernel/exit.c#L867 ]
>
> exit_fs(tsk) has set tsk->fs = NULL [ 
> https://elixir.bootlin.com/linux/v6.4-rc2/source/fs/fs_struct.c#L103 ]
>
> I think this then bites in d_path() where it calls:
>
>     get_fs_root_rcu(current->fs, &root);   [ 
> https://elixir.bootlin.com/linux/v6.4-rc2/source/fs/d_path.c#L285 ]
>
> current->fs is likely NULL here.
>
> If this was correct it would have nothing to do with the actual patch, 
> though, but rather with the fact that smack logs on process 
> termination. I am not sure what the solution would be other than 
> testing for current->fs == NULL in d_path before using it and 
> returning an error that is not normally returned or trying to 
> intercept this case in smack.

I have now been able to recreate the syzbot issue with the test program 
and the issue is exactly the one described here, current->fs == NULL.

    Stefan



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [syzbot] [integrity] [overlayfs] general protection fault in d_path
  2023-09-20 20:37     ` Stefan Berger
@ 2023-09-20 21:16       ` Jeff Layton
  2023-09-20 22:09         ` Stefan Berger
  0 siblings, 1 reply; 35+ messages in thread
From: Jeff Layton @ 2023-09-20 21:16 UTC (permalink / raw)
  To: Stefan Berger, syzbot, amir73il, brauner, linux-fsdevel,
	linux-integrity, linux-kernel, linux-unionfs, miklos,
	syzkaller-bugs, zohar, casey

On Wed, 2023-09-20 at 16:37 -0400, Stefan Berger wrote:
> On 9/20/23 13:01, Stefan Berger wrote:
> > 
> > On 9/17/23 20:04, syzbot wrote:
> > > syzbot has bisected this issue to:
> > > 
> > > commit db1d1e8b9867aae5c3e61ad7859abfcc4a6fd6c7
> > > Author: Jeff Layton <jlayton@kernel.org>
> > > Date:   Mon Apr 17 16:55:51 2023 +0000
> > > 
> > >      IMA: use vfs_getattr_nosec to get the i_version
> > > 
> > > bisection log: 
> > > https://syzkaller.appspot.com/x/bisect.txt?x=106f7e54680000
> > > start commit:   a747acc0b752 Merge tag 'linux-kselftest-next-6.6-rc2' 
> > > of g..
> > > git tree:       upstream
> > > final oops: https://syzkaller.appspot.com/x/report.txt?x=126f7e54680000
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=146f7e54680000
> > > kernel config: 
> > > https://syzkaller.appspot.com/x/.config?x=df91a3034fe3f122
> > > dashboard link: 
> > > https://syzkaller.appspot.com/bug?extid=a67fc5321ffb4b311c98
> > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1671b694680000
> > > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=14ec94d8680000
> > > 
> > > Reported-by: syzbot+a67fc5321ffb4b311c98@syzkaller.appspotmail.com
> > > Fixes: db1d1e8b9867 ("IMA: use vfs_getattr_nosec to get the i_version")
> > > 
> > > For information about bisection process see: 
> > > https://goo.gl/tpsmEJ#bisection
> > 
> > The final oops shows this here:
> > 
> > BUG: kernel NULL pointer dereference, address: 0000000000000058
> > #PF: supervisor read access in kernel mode
> > #PF: error_code(0x0000) - not-present page
> > PGD 0 P4D 0
> > Oops: 0000 [#1] PREEMPT SMP
> > CPU: 0 PID: 3192 Comm: syz-executor.0 Not tainted 6.4.0-rc2-syzkaller #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, 
> > BIOS Google 08/04/2023
> > RIP: 0010:__lock_acquire+0x35/0x490 kernel/locking/lockdep.c:4946
> > Code: 83 ec 18 65 4c 8b 35 aa 60 f4 7e 83 3d b7 11 e4 02 00 0f 84 05 
> > 02 00 00 4c 89 cb 89 cd 41 89 d5 49 89 ff 83 fe 01 77 0c 89 f0 <49> 8b 
> > 44 c7 08 48 85 c0 75 1b 4c 89 ff 31 d2 45 89 c4 e8 74 f6 ff
> > RSP: 0018:ffffc90002edb840 EFLAGS: 00010097
> > RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
> > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000050
> > RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> > R13: 0000000000000000 R14: ffff888102ea5340 R15: 0000000000000050
> > FS:  0000000000000000(0000) GS:ffff88813bc00000(0000) 
> > knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000000000000058 CR3: 0000000003aa8000 CR4: 00000000003506f0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > Call Trace:
> >  <TASK>
> >  lock_acquire+0xd8/0x1f0 kernel/locking/lockdep.c:5691
> >  seqcount_lockdep_reader_access include/linux/seqlock.h:102 [inline]
> >  get_fs_root_rcu fs/d_path.c:243 [inline]
> >  d_path+0xd1/0x1f0 fs/d_path.c:285
> >  audit_log_d_path+0x65/0x130 kernel/audit.c:2139
> >  dump_common_audit_data security/lsm_audit.c:224 [inline]
> >  common_lsm_audit+0x3b3/0x840 security/lsm_audit.c:458
> >  smack_log+0xad/0x130 security/smack/smack_access.c:383
> >  smk_tskacc+0xb1/0xd0 security/smack/smack_access.c:253
> >  smack_inode_getattr+0x8a/0xb0 security/smack/smack_lsm.c:1187
> >  security_inode_getattr+0x32/0x50 security/security.c:2114
> >  vfs_getattr+0x1b/0x40 fs/stat.c:167
> >  ovl_getattr+0xa6/0x3e0 fs/overlayfs/inode.c:173
> >  ima_check_last_writer security/integrity/ima/ima_main.c:171 [inline]
> >  ima_file_free+0xbd/0x130 security/integrity/ima/ima_main.c:203
> >  __fput+0xc7/0x220 fs/file_table.c:315
> >  task_work_run+0x7d/0xa0 kernel/task_work.c:179
> >  exit_task_work include/linux/task_work.h:38 [inline]
> >  do_exit+0x2c7/0xa80 kernel/exit.c:871 <-----------------------
> >  do_group_exit+0x85/0xa0 kernel/exit.c:1021
> >  get_signal+0x73c/0x7f0 kernel/signal.c:2874
> >  arch_do_signal_or_restart+0x89/0x290 arch/x86/kernel/signal.c:306
> >  exit_to_user_mode_loop+0x61/0xb0 kernel/entry/common.c:168
> >  exit_to_user_mode_prepare+0x64/0xb0 kernel/entry/common.c:204
> >  __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
> >  syscall_exit_to_user_mode+0x2b/0x1d0 kernel/entry/common.c:297
> >  do_syscall_64+0x4d/0x90 arch/x86/entry/common.c:86
> >  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > 
> > 
> > do_exit has called exit_fs(tsk) [ 
> > https://elixir.bootlin.com/linux/v6.4-rc2/source/kernel/exit.c#L867 ]
> > 
> > exit_fs(tsk) has set tsk->fs = NULL [ 
> > https://elixir.bootlin.com/linux/v6.4-rc2/source/fs/fs_struct.c#L103 ]
> > 
> > I think this then bites in d_path() where it calls:
> > 
> >     get_fs_root_rcu(current->fs, &root);   [ 
> > https://elixir.bootlin.com/linux/v6.4-rc2/source/fs/d_path.c#L285 ]
> > 
> > current->fs is likely NULL here.
> > 
> > If this was correct it would have nothing to do with the actual patch, 
> > though, but rather with the fact that smack logs on process 
> > termination. I am not sure what the solution would be other than 
> > testing for current->fs == NULL in d_path before using it and 
> > returning an error that is not normally returned or trying to 
> > intercept this case in smack.
> 
> I have now been able to recreate the syzbot issue with the test program 
> and the issue is exactly the one described here, current->fs == NULL.
> 

Earlier in this thread, Amir had a diagnosis that IMA is inappropriately
trying to use f_path directly instead of using the helpers that are
friendly for stacking filesystems.

https://lore.kernel.org/linux-fsdevel/CAOQ4uxgjnYyeQL-LbS5kQ7+C0d6sjzKqMDWAtZW8cAkPaed6=Q@mail.gmail.com/

I'm not an IMA hacker so I'm not planning to roll a fix here. Perhaps
someone on the IMA team could try this approach?

Cheers,
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [syzbot] [integrity] [overlayfs] general protection fault in d_path
  2023-09-20 21:16       ` Jeff Layton
@ 2023-09-20 22:09         ` Stefan Berger
  2023-09-21  0:10           ` Stefan Berger
  0 siblings, 1 reply; 35+ messages in thread
From: Stefan Berger @ 2023-09-20 22:09 UTC (permalink / raw)
  To: Jeff Layton, syzbot, amir73il, brauner, linux-fsdevel,
	linux-integrity, linux-kernel, linux-unionfs, miklos,
	syzkaller-bugs, zohar, casey


On 9/20/23 17:16, Jeff Layton wrote:
> On Wed, 2023-09-20 at 16:37 -0400, Stefan Berger wrote:
>> On 9/20/23 13:01, Stefan Berger wrote:
>>> On 9/17/23 20:04, syzbot wrote:
>>>> syzbot has bisected this issue to:
>>>>
>>>> commit db1d1e8b9867aae5c3e61ad7859abfcc4a6fd6c7
>>>> Author: Jeff Layton <jlayton@kernel.org>
>>>> Date:   Mon Apr 17 16:55:51 2023 +0000
>>>>
>>>>       IMA: use vfs_getattr_nosec to get the i_version
>>>>
>>>> bisection log:
>>>> https://syzkaller.appspot.com/x/bisect.txt?x=106f7e54680000
>>>> start commit:   a747acc0b752 Merge tag 'linux-kselftest-next-6.6-rc2'
>>>> of g..
>>>> git tree:       upstream
>>>> final oops: https://syzkaller.appspot.com/x/report.txt?x=126f7e54680000
>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=146f7e54680000
>>>> kernel config:
>>>> https://syzkaller.appspot.com/x/.config?x=df91a3034fe3f122
>>>> dashboard link:
>>>> https://syzkaller.appspot.com/bug?extid=a67fc5321ffb4b311c98
>>>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1671b694680000
>>>> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=14ec94d8680000
>>>>
>>>> Reported-by: syzbot+a67fc5321ffb4b311c98@syzkaller.appspotmail.com
>>>> Fixes: db1d1e8b9867 ("IMA: use vfs_getattr_nosec to get the i_version")
>>>>
>>>> For information about bisection process see:
>>>> https://goo.gl/tpsmEJ#bisection
>>> The final oops shows this here:
>>>
>>> BUG: kernel NULL pointer dereference, address: 0000000000000058
>>> #PF: supervisor read access in kernel mode
>>> #PF: error_code(0x0000) - not-present page
>>> PGD 0 P4D 0
>>> Oops: 0000 [#1] PREEMPT SMP
>>> CPU: 0 PID: 3192 Comm: syz-executor.0 Not tainted 6.4.0-rc2-syzkaller #0
>>> Hardware name: Google Google Compute Engine/Google Compute Engine,
>>> BIOS Google 08/04/2023
>>> RIP: 0010:__lock_acquire+0x35/0x490 kernel/locking/lockdep.c:4946
>>> Code: 83 ec 18 65 4c 8b 35 aa 60 f4 7e 83 3d b7 11 e4 02 00 0f 84 05
>>> 02 00 00 4c 89 cb 89 cd 41 89 d5 49 89 ff 83 fe 01 77 0c 89 f0 <49> 8b
>>> 44 c7 08 48 85 c0 75 1b 4c 89 ff 31 d2 45 89 c4 e8 74 f6 ff
>>> RSP: 0018:ffffc90002edb840 EFLAGS: 00010097
>>> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
>>> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000050
>>> RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000000000000
>>> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
>>> R13: 0000000000000000 R14: ffff888102ea5340 R15: 0000000000000050
>>> FS:  0000000000000000(0000) GS:ffff88813bc00000(0000)
>>> knlGS:0000000000000000
>>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> CR2: 0000000000000058 CR3: 0000000003aa8000 CR4: 00000000003506f0
>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>> Call Trace:
>>>   <TASK>
>>>   lock_acquire+0xd8/0x1f0 kernel/locking/lockdep.c:5691
>>>   seqcount_lockdep_reader_access include/linux/seqlock.h:102 [inline]
>>>   get_fs_root_rcu fs/d_path.c:243 [inline]
>>>   d_path+0xd1/0x1f0 fs/d_path.c:285
>>>   audit_log_d_path+0x65/0x130 kernel/audit.c:2139
>>>   dump_common_audit_data security/lsm_audit.c:224 [inline]
>>>   common_lsm_audit+0x3b3/0x840 security/lsm_audit.c:458
>>>   smack_log+0xad/0x130 security/smack/smack_access.c:383
>>>   smk_tskacc+0xb1/0xd0 security/smack/smack_access.c:253
>>>   smack_inode_getattr+0x8a/0xb0 security/smack/smack_lsm.c:1187
>>>   security_inode_getattr+0x32/0x50 security/security.c:2114
>>>   vfs_getattr+0x1b/0x40 fs/stat.c:167
>>>   ovl_getattr+0xa6/0x3e0 fs/overlayfs/inode.c:173
>>>   ima_check_last_writer security/integrity/ima/ima_main.c:171 [inline]
>>>   ima_file_free+0xbd/0x130 security/integrity/ima/ima_main.c:203
>>>   __fput+0xc7/0x220 fs/file_table.c:315
>>>   task_work_run+0x7d/0xa0 kernel/task_work.c:179
>>>   exit_task_work include/linux/task_work.h:38 [inline]
>>>   do_exit+0x2c7/0xa80 kernel/exit.c:871 <-----------------------
>>>   do_group_exit+0x85/0xa0 kernel/exit.c:1021
>>>   get_signal+0x73c/0x7f0 kernel/signal.c:2874
>>>   arch_do_signal_or_restart+0x89/0x290 arch/x86/kernel/signal.c:306
>>>   exit_to_user_mode_loop+0x61/0xb0 kernel/entry/common.c:168
>>>   exit_to_user_mode_prepare+0x64/0xb0 kernel/entry/common.c:204
>>>   __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
>>>   syscall_exit_to_user_mode+0x2b/0x1d0 kernel/entry/common.c:297
>>>   do_syscall_64+0x4d/0x90 arch/x86/entry/common.c:86
>>>   entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>>
>>>
>>> do_exit has called exit_fs(tsk) [
>>> https://elixir.bootlin.com/linux/v6.4-rc2/source/kernel/exit.c#L867 ]
>>>
>>> exit_fs(tsk) has set tsk->fs = NULL [
>>> https://elixir.bootlin.com/linux/v6.4-rc2/source/fs/fs_struct.c#L103 ]
>>>
>>> I think this then bites in d_path() where it calls:
>>>
>>>      get_fs_root_rcu(current->fs, &root);   [
>>> https://elixir.bootlin.com/linux/v6.4-rc2/source/fs/d_path.c#L285 ]
>>>
>>> current->fs is likely NULL here.
>>>
>>> If this was correct it would have nothing to do with the actual patch,
>>> though, but rather with the fact that smack logs on process
>>> termination. I am not sure what the solution would be other than
>>> testing for current->fs == NULL in d_path before using it and
>>> returning an error that is not normally returned or trying to
>>> intercept this case in smack.
>> I have now been able to recreate the syzbot issue with the test program
>> and the issue is exactly the one described here, current->fs == NULL.
>>
> Earlier in this thread, Amir had a diagnosis that IMA is inappropriately
> trying to use f_path directly instead of using the helpers that are
> friendly for stacking filesystems.
>
> https://lore.kernel.org/linux-fsdevel/CAOQ4uxgjnYyeQL-LbS5kQ7+C0d6sjzKqMDWAtZW8cAkPaed6=Q@mail.gmail.com/
>
> I'm not an IMA hacker so I'm not planning to roll a fix here. Perhaps
> someone on the IMA team could try this approach?


I have applied this patch here from Amir now and it does NOT resolve the 
issue:

https://lore.kernel.org/linux-integrity/296dae962a2a488bde682d3def074db91686e1c3.camel@linux.ibm.com/T/#m4ebdb780bf6952e7f210c55e87950d0cfa1d5eb0

     Stefan



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [syzbot] [integrity] [overlayfs] general protection fault in d_path
  2023-09-20 22:09         ` Stefan Berger
@ 2023-09-21  0:10           ` Stefan Berger
  2023-09-21  0:52             ` Casey Schaufler
  0 siblings, 1 reply; 35+ messages in thread
From: Stefan Berger @ 2023-09-21  0:10 UTC (permalink / raw)
  To: Jeff Layton, syzbot, amir73il, brauner, linux-fsdevel,
	linux-integrity, linux-kernel, linux-unionfs, miklos,
	syzkaller-bugs, zohar, casey


On 9/20/23 18:09, Stefan Berger wrote:
>
> On 9/20/23 17:16, Jeff Layton wrote:
>> On Wed, 2023-09-20 at 16:37 -0400, Stefan Berger wrote:
>>> On 9/20/23 13:01, Stefan Berger wrote:
>>>> On 9/17/23 20:04, syzbot wrote:
>>>>> syzbot has bisected this issue to:
>>>>>
>>>>> commit db1d1e8b9867aae5c3e61ad7859abfcc4a6fd6c7
>>>>> Author: Jeff Layton <jlayton@kernel.org>
>>>>> Date:   Mon Apr 17 16:55:51 2023 +0000
>>>>>
>>>>>       IMA: use vfs_getattr_nosec to get the i_version
>>>>>
>>>>> bisection log:
>>>>> https://syzkaller.appspot.com/x/bisect.txt?x=106f7e54680000
>>>>> start commit:   a747acc0b752 Merge tag 'linux-kselftest-next-6.6-rc2'
>>>>> of g..
>>>>> git tree:       upstream
>>>>> final oops: 
>>>>> https://syzkaller.appspot.com/x/report.txt?x=126f7e54680000
>>>>> console output: 
>>>>> https://syzkaller.appspot.com/x/log.txt?x=146f7e54680000
>>>>> kernel config:
>>>>> https://syzkaller.appspot.com/x/.config?x=df91a3034fe3f122
>>>>> dashboard link:
>>>>> https://syzkaller.appspot.com/bug?extid=a67fc5321ffb4b311c98
>>>>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1671b694680000
>>>>> C reproducer: 
>>>>> https://syzkaller.appspot.com/x/repro.c?x=14ec94d8680000
>>>>>
>>>>> Reported-by: syzbot+a67fc5321ffb4b311c98@syzkaller.appspotmail.com
>>>>> Fixes: db1d1e8b9867 ("IMA: use vfs_getattr_nosec to get the 
>>>>> i_version")
>>>>>
>>>>> For information about bisection process see:
>>>>> https://goo.gl/tpsmEJ#bisection
>>>> The final oops shows this here:
>>>>
>>>> BUG: kernel NULL pointer dereference, address: 0000000000000058
>>>> #PF: supervisor read access in kernel mode
>>>> #PF: error_code(0x0000) - not-present page
>>>> PGD 0 P4D 0
>>>> Oops: 0000 [#1] PREEMPT SMP
>>>> CPU: 0 PID: 3192 Comm: syz-executor.0 Not tainted 
>>>> 6.4.0-rc2-syzkaller #0
>>>> Hardware name: Google Google Compute Engine/Google Compute Engine,
>>>> BIOS Google 08/04/2023
>>>> RIP: 0010:__lock_acquire+0x35/0x490 kernel/locking/lockdep.c:4946
>>>> Code: 83 ec 18 65 4c 8b 35 aa 60 f4 7e 83 3d b7 11 e4 02 00 0f 84 05
>>>> 02 00 00 4c 89 cb 89 cd 41 89 d5 49 89 ff 83 fe 01 77 0c 89 f0 <49> 8b
>>>> 44 c7 08 48 85 c0 75 1b 4c 89 ff 31 d2 45 89 c4 e8 74 f6 ff
>>>> RSP: 0018:ffffc90002edb840 EFLAGS: 00010097
>>>> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
>>>> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000050
>>>> RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000000000000
>>>> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
>>>> R13: 0000000000000000 R14: ffff888102ea5340 R15: 0000000000000050
>>>> FS:  0000000000000000(0000) GS:ffff88813bc00000(0000)
>>>> knlGS:0000000000000000
>>>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> CR2: 0000000000000058 CR3: 0000000003aa8000 CR4: 00000000003506f0
>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>> Call Trace:
>>>>   <TASK>
>>>>   lock_acquire+0xd8/0x1f0 kernel/locking/lockdep.c:5691
>>>>   seqcount_lockdep_reader_access include/linux/seqlock.h:102 [inline]
>>>>   get_fs_root_rcu fs/d_path.c:243 [inline]
>>>>   d_path+0xd1/0x1f0 fs/d_path.c:285
>>>>   audit_log_d_path+0x65/0x130 kernel/audit.c:2139
>>>>   dump_common_audit_data security/lsm_audit.c:224 [inline]
>>>>   common_lsm_audit+0x3b3/0x840 security/lsm_audit.c:458
>>>>   smack_log+0xad/0x130 security/smack/smack_access.c:383
>>>>   smk_tskacc+0xb1/0xd0 security/smack/smack_access.c:253
>>>>   smack_inode_getattr+0x8a/0xb0 security/smack/smack_lsm.c:1187
>>>>   security_inode_getattr+0x32/0x50 security/security.c:2114
>>>>   vfs_getattr+0x1b/0x40 fs/stat.c:167
>>>>   ovl_getattr+0xa6/0x3e0 fs/overlayfs/inode.c:173
>>>>   ima_check_last_writer security/integrity/ima/ima_main.c:171 [inline]
>>>>   ima_file_free+0xbd/0x130 security/integrity/ima/ima_main.c:203
>>>>   __fput+0xc7/0x220 fs/file_table.c:315
>>>>   task_work_run+0x7d/0xa0 kernel/task_work.c:179
>>>>   exit_task_work include/linux/task_work.h:38 [inline]
>>>>   do_exit+0x2c7/0xa80 kernel/exit.c:871 <-----------------------
>>>>   do_group_exit+0x85/0xa0 kernel/exit.c:1021
>>>>   get_signal+0x73c/0x7f0 kernel/signal.c:2874
>>>>   arch_do_signal_or_restart+0x89/0x290 arch/x86/kernel/signal.c:306
>>>>   exit_to_user_mode_loop+0x61/0xb0 kernel/entry/common.c:168
>>>>   exit_to_user_mode_prepare+0x64/0xb0 kernel/entry/common.c:204
>>>>   __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
>>>>   syscall_exit_to_user_mode+0x2b/0x1d0 kernel/entry/common.c:297
>>>>   do_syscall_64+0x4d/0x90 arch/x86/entry/common.c:86
>>>>   entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>>>
>>>>
>>>> do_exit has called exit_fs(tsk) [
>>>> https://elixir.bootlin.com/linux/v6.4-rc2/source/kernel/exit.c#L867 ]
>>>>
>>>> exit_fs(tsk) has set tsk->fs = NULL [
>>>> https://elixir.bootlin.com/linux/v6.4-rc2/source/fs/fs_struct.c#L103 ]
>>>>
>>>> I think this then bites in d_path() where it calls:
>>>>
>>>>      get_fs_root_rcu(current->fs, &root);   [
>>>> https://elixir.bootlin.com/linux/v6.4-rc2/source/fs/d_path.c#L285 ]
>>>>
>>>> current->fs is likely NULL here.
>>>>
>>>> If this was correct it would have nothing to do with the actual patch,
>>>> though, but rather with the fact that smack logs on process
>>>> termination. I am not sure what the solution would be other than
>>>> testing for current->fs == NULL in d_path before using it and
>>>> returning an error that is not normally returned or trying to
>>>> intercept this case in smack.
>>> I have now been able to recreate the syzbot issue with the test program
>>> and the issue is exactly the one described here, current->fs == NULL.
>>>
>> Earlier in this thread, Amir had a diagnosis that IMA is inappropriately
>> trying to use f_path directly instead of using the helpers that are
>> friendly for stacking filesystems.
>>
>> https://lore.kernel.org/linux-fsdevel/CAOQ4uxgjnYyeQL-LbS5kQ7+C0d6sjzKqMDWAtZW8cAkPaed6=Q@mail.gmail.com/ 
>>
>>
>> I'm not an IMA hacker so I'm not planning to roll a fix here. Perhaps
>> someone on the IMA team could try this approach?
>
>
> I have applied this patch here from Amir now and it does NOT resolve 
> the issue:
>
> https://lore.kernel.org/linux-integrity/296dae962a2a488bde682d3def074db91686e1c3.camel@linux.ibm.com/T/#m4ebdb780bf6952e7f210c55e87950d0cfa1d5eb0 
>
>

This seems to resolve the issue:

diff --git a/security/smack/smack_access.c b/security/smack/smack_access.c
index 585e5e35710b..57afcea1e39b 100644
--- a/security/smack/smack_access.c
+++ b/security/smack/smack_access.c
@@ -347,6 +347,9 @@ void smack_log(char *subject_label, char 
*object_label, int request,
         struct smack_audit_data *sad;
         struct common_audit_data *a = &ad->a;

+       if (current->flags & PF_EXITING)
+               return;
+
         /* check if we have to log the current event */
         if (result < 0 && (log_policy & SMACK_AUDIT_DENIED) == 0)
                 return;



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [syzbot] [integrity] [overlayfs] general protection fault in d_path
  2023-09-21  0:10           ` Stefan Berger
@ 2023-09-21  0:52             ` Casey Schaufler
  2023-09-21 10:32               ` Jeff Layton
  0 siblings, 1 reply; 35+ messages in thread
From: Casey Schaufler @ 2023-09-21  0:52 UTC (permalink / raw)
  To: Stefan Berger, Jeff Layton, syzbot, amir73il, brauner,
	linux-fsdevel, linux-integrity, linux-kernel, linux-unionfs,
	miklos, syzkaller-bugs, zohar, Casey Schaufler

On 9/20/2023 5:10 PM, Stefan Berger wrote:
>
> On 9/20/23 18:09, Stefan Berger wrote:
>>
>> On 9/20/23 17:16, Jeff Layton wrote:
>>> On Wed, 2023-09-20 at 16:37 -0400, Stefan Berger wrote:
>>>> On 9/20/23 13:01, Stefan Berger wrote:
>>>>> On 9/17/23 20:04, syzbot wrote:
>>>>>> syzbot has bisected this issue to:
>>>>>>
>>>>>> commit db1d1e8b9867aae5c3e61ad7859abfcc4a6fd6c7
>>>>>> Author: Jeff Layton <jlayton@kernel.org>
>>>>>> Date:   Mon Apr 17 16:55:51 2023 +0000
>>>>>>
>>>>>>       IMA: use vfs_getattr_nosec to get the i_version
>>>>>>
>>>>>> bisection log:
>>>>>> https://syzkaller.appspot.com/x/bisect.txt?x=106f7e54680000
>>>>>> start commit:   a747acc0b752 Merge tag
>>>>>> 'linux-kselftest-next-6.6-rc2'
>>>>>> of g..
>>>>>> git tree:       upstream
>>>>>> final oops:
>>>>>> https://syzkaller.appspot.com/x/report.txt?x=126f7e54680000
>>>>>> console output:
>>>>>> https://syzkaller.appspot.com/x/log.txt?x=146f7e54680000
>>>>>> kernel config:
>>>>>> https://syzkaller.appspot.com/x/.config?x=df91a3034fe3f122
>>>>>> dashboard link:
>>>>>> https://syzkaller.appspot.com/bug?extid=a67fc5321ffb4b311c98
>>>>>> syz repro:
>>>>>> https://syzkaller.appspot.com/x/repro.syz?x=1671b694680000
>>>>>> C reproducer:
>>>>>> https://syzkaller.appspot.com/x/repro.c?x=14ec94d8680000
>>>>>>
>>>>>> Reported-by: syzbot+a67fc5321ffb4b311c98@syzkaller.appspotmail.com
>>>>>> Fixes: db1d1e8b9867 ("IMA: use vfs_getattr_nosec to get the
>>>>>> i_version")
>>>>>>
>>>>>> For information about bisection process see:
>>>>>> https://goo.gl/tpsmEJ#bisection
>>>>> The final oops shows this here:
>>>>>
>>>>> BUG: kernel NULL pointer dereference, address: 0000000000000058
>>>>> #PF: supervisor read access in kernel mode
>>>>> #PF: error_code(0x0000) - not-present page
>>>>> PGD 0 P4D 0
>>>>> Oops: 0000 [#1] PREEMPT SMP
>>>>> CPU: 0 PID: 3192 Comm: syz-executor.0 Not tainted
>>>>> 6.4.0-rc2-syzkaller #0
>>>>> Hardware name: Google Google Compute Engine/Google Compute Engine,
>>>>> BIOS Google 08/04/2023
>>>>> RIP: 0010:__lock_acquire+0x35/0x490 kernel/locking/lockdep.c:4946
>>>>> Code: 83 ec 18 65 4c 8b 35 aa 60 f4 7e 83 3d b7 11 e4 02 00 0f 84 05
>>>>> 02 00 00 4c 89 cb 89 cd 41 89 d5 49 89 ff 83 fe 01 77 0c 89 f0
>>>>> <49> 8b
>>>>> 44 c7 08 48 85 c0 75 1b 4c 89 ff 31 d2 45 89 c4 e8 74 f6 ff
>>>>> RSP: 0018:ffffc90002edb840 EFLAGS: 00010097
>>>>> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
>>>>> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000050
>>>>> RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000000000000
>>>>> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
>>>>> R13: 0000000000000000 R14: ffff888102ea5340 R15: 0000000000000050
>>>>> FS:  0000000000000000(0000) GS:ffff88813bc00000(0000)
>>>>> knlGS:0000000000000000
>>>>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> CR2: 0000000000000058 CR3: 0000000003aa8000 CR4: 00000000003506f0
>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>> Call Trace:
>>>>>   <TASK>
>>>>>   lock_acquire+0xd8/0x1f0 kernel/locking/lockdep.c:5691
>>>>>   seqcount_lockdep_reader_access include/linux/seqlock.h:102 [inline]
>>>>>   get_fs_root_rcu fs/d_path.c:243 [inline]
>>>>>   d_path+0xd1/0x1f0 fs/d_path.c:285
>>>>>   audit_log_d_path+0x65/0x130 kernel/audit.c:2139
>>>>>   dump_common_audit_data security/lsm_audit.c:224 [inline]
>>>>>   common_lsm_audit+0x3b3/0x840 security/lsm_audit.c:458
>>>>>   smack_log+0xad/0x130 security/smack/smack_access.c:383
>>>>>   smk_tskacc+0xb1/0xd0 security/smack/smack_access.c:253
>>>>>   smack_inode_getattr+0x8a/0xb0 security/smack/smack_lsm.c:1187
>>>>>   security_inode_getattr+0x32/0x50 security/security.c:2114
>>>>>   vfs_getattr+0x1b/0x40 fs/stat.c:167
>>>>>   ovl_getattr+0xa6/0x3e0 fs/overlayfs/inode.c:173
>>>>>   ima_check_last_writer security/integrity/ima/ima_main.c:171
>>>>> [inline]
>>>>>   ima_file_free+0xbd/0x130 security/integrity/ima/ima_main.c:203
>>>>>   __fput+0xc7/0x220 fs/file_table.c:315
>>>>>   task_work_run+0x7d/0xa0 kernel/task_work.c:179
>>>>>   exit_task_work include/linux/task_work.h:38 [inline]
>>>>>   do_exit+0x2c7/0xa80 kernel/exit.c:871 <-----------------------
>>>>>   do_group_exit+0x85/0xa0 kernel/exit.c:1021
>>>>>   get_signal+0x73c/0x7f0 kernel/signal.c:2874
>>>>>   arch_do_signal_or_restart+0x89/0x290 arch/x86/kernel/signal.c:306
>>>>>   exit_to_user_mode_loop+0x61/0xb0 kernel/entry/common.c:168
>>>>>   exit_to_user_mode_prepare+0x64/0xb0 kernel/entry/common.c:204
>>>>>   __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
>>>>>   syscall_exit_to_user_mode+0x2b/0x1d0 kernel/entry/common.c:297
>>>>>   do_syscall_64+0x4d/0x90 arch/x86/entry/common.c:86
>>>>>   entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>>>>
>>>>>
>>>>> do_exit has called exit_fs(tsk) [
>>>>> https://elixir.bootlin.com/linux/v6.4-rc2/source/kernel/exit.c#L867 ]
>>>>>
>>>>> exit_fs(tsk) has set tsk->fs = NULL [
>>>>> https://elixir.bootlin.com/linux/v6.4-rc2/source/fs/fs_struct.c#L103
>>>>> ]
>>>>>
>>>>> I think this then bites in d_path() where it calls:
>>>>>
>>>>>      get_fs_root_rcu(current->fs, &root);   [
>>>>> https://elixir.bootlin.com/linux/v6.4-rc2/source/fs/d_path.c#L285 ]
>>>>>
>>>>> current->fs is likely NULL here.
>>>>>
>>>>> If this was correct it would have nothing to do with the actual
>>>>> patch,
>>>>> though, but rather with the fact that smack logs on process
>>>>> termination. I am not sure what the solution would be other than
>>>>> testing for current->fs == NULL in d_path before using it and
>>>>> returning an error that is not normally returned or trying to
>>>>> intercept this case in smack.
>>>> I have now been able to recreate the syzbot issue with the test
>>>> program
>>>> and the issue is exactly the one described here, current->fs == NULL.
>>>>
>>> Earlier in this thread, Amir had a diagnosis that IMA is
>>> inappropriately
>>> trying to use f_path directly instead of using the helpers that are
>>> friendly for stacking filesystems.
>>>
>>> https://lore.kernel.org/linux-fsdevel/CAOQ4uxgjnYyeQL-LbS5kQ7+C0d6sjzKqMDWAtZW8cAkPaed6=Q@mail.gmail.com/
>>>
>>>
>>> I'm not an IMA hacker so I'm not planning to roll a fix here. Perhaps
>>> someone on the IMA team could try this approach?
>>
>>
>> I have applied this patch here from Amir now and it does NOT resolve
>> the issue:
>>
>> https://lore.kernel.org/linux-integrity/296dae962a2a488bde682d3def074db91686e1c3.camel@linux.ibm.com/T/#m4ebdb780bf6952e7f210c55e87950d0cfa1d5eb0
>>
>>
>
> This seems to resolve the issue:
>
> diff --git a/security/smack/smack_access.c
> b/security/smack/smack_access.c
> index 585e5e35710b..57afcea1e39b 100644
> --- a/security/smack/smack_access.c
> +++ b/security/smack/smack_access.c
> @@ -347,6 +347,9 @@ void smack_log(char *subject_label, char
> *object_label, int request,
>         struct smack_audit_data *sad;
>         struct common_audit_data *a = &ad->a;
>
> +       if (current->flags & PF_EXITING)
> +               return;
> +

Based on what I see here I can understand that this prevents the panic,
but it isn't so clear what changed that introduced the problem.

>         /* check if we have to log the current event */
>         if (result < 0 && (log_policy & SMACK_AUDIT_DENIED) == 0)
>                 return;
>
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [syzbot] [integrity] [overlayfs] general protection fault in d_path
  2023-09-21  0:52             ` Casey Schaufler
@ 2023-09-21 10:32               ` Jeff Layton
  2023-09-21 11:24                 ` Mimi Zohar
  2023-09-21 12:03                 ` Amir Goldstein
  0 siblings, 2 replies; 35+ messages in thread
From: Jeff Layton @ 2023-09-21 10:32 UTC (permalink / raw)
  To: Casey Schaufler, Stefan Berger, syzbot, amir73il, brauner,
	linux-fsdevel, linux-integrity, linux-kernel, linux-unionfs,
	miklos, syzkaller-bugs, zohar

On Wed, 2023-09-20 at 17:52 -0700, Casey Schaufler wrote:
> On 9/20/2023 5:10 PM, Stefan Berger wrote:
> > 
> > On 9/20/23 18:09, Stefan Berger wrote:
> > > 
> > > On 9/20/23 17:16, Jeff Layton wrote:
> > > > On Wed, 2023-09-20 at 16:37 -0400, Stefan Berger wrote:
> > > > > On 9/20/23 13:01, Stefan Berger wrote:
> > > > > > On 9/17/23 20:04, syzbot wrote:
> > > > > > > syzbot has bisected this issue to:
> > > > > > > 
> > > > > > > commit db1d1e8b9867aae5c3e61ad7859abfcc4a6fd6c7
> > > > > > > Author: Jeff Layton <jlayton@kernel.org>
> > > > > > > Date:   Mon Apr 17 16:55:51 2023 +0000
> > > > > > > 
> > > > > > >       IMA: use vfs_getattr_nosec to get the i_version
> > > > > > > 
> > > > > > > bisection log:
> > > > > > > https://syzkaller.appspot.com/x/bisect.txt?x=106f7e54680000
> > > > > > > start commit:   a747acc0b752 Merge tag
> > > > > > > 'linux-kselftest-next-6.6-rc2'
> > > > > > > of g..
> > > > > > > git tree:       upstream
> > > > > > > final oops:
> > > > > > > https://syzkaller.appspot.com/x/report.txt?x=126f7e54680000
> > > > > > > console output:
> > > > > > > https://syzkaller.appspot.com/x/log.txt?x=146f7e54680000
> > > > > > > kernel config:
> > > > > > > https://syzkaller.appspot.com/x/.config?x=df91a3034fe3f122
> > > > > > > dashboard link:
> > > > > > > https://syzkaller.appspot.com/bug?extid=a67fc5321ffb4b311c98
> > > > > > > syz repro:
> > > > > > > https://syzkaller.appspot.com/x/repro.syz?x=1671b694680000
> > > > > > > C reproducer:
> > > > > > > https://syzkaller.appspot.com/x/repro.c?x=14ec94d8680000
> > > > > > > 
> > > > > > > Reported-by: syzbot+a67fc5321ffb4b311c98@syzkaller.appspotmail.com
> > > > > > > Fixes: db1d1e8b9867 ("IMA: use vfs_getattr_nosec to get the
> > > > > > > i_version")
> > > > > > > 
> > > > > > > For information about bisection process see:
> > > > > > > https://goo.gl/tpsmEJ#bisection
> > > > > > The final oops shows this here:
> > > > > > 
> > > > > > BUG: kernel NULL pointer dereference, address: 0000000000000058
> > > > > > #PF: supervisor read access in kernel mode
> > > > > > #PF: error_code(0x0000) - not-present page
> > > > > > PGD 0 P4D 0
> > > > > > Oops: 0000 [#1] PREEMPT SMP
> > > > > > CPU: 0 PID: 3192 Comm: syz-executor.0 Not tainted
> > > > > > 6.4.0-rc2-syzkaller #0
> > > > > > Hardware name: Google Google Compute Engine/Google Compute Engine,
> > > > > > BIOS Google 08/04/2023
> > > > > > RIP: 0010:__lock_acquire+0x35/0x490 kernel/locking/lockdep.c:4946
> > > > > > Code: 83 ec 18 65 4c 8b 35 aa 60 f4 7e 83 3d b7 11 e4 02 00 0f 84 05
> > > > > > 02 00 00 4c 89 cb 89 cd 41 89 d5 49 89 ff 83 fe 01 77 0c 89 f0
> > > > > > <49> 8b
> > > > > > 44 c7 08 48 85 c0 75 1b 4c 89 ff 31 d2 45 89 c4 e8 74 f6 ff
> > > > > > RSP: 0018:ffffc90002edb840 EFLAGS: 00010097
> > > > > > RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
> > > > > > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000050
> > > > > > RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000000000000
> > > > > > R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> > > > > > R13: 0000000000000000 R14: ffff888102ea5340 R15: 0000000000000050
> > > > > > FS:  0000000000000000(0000) GS:ffff88813bc00000(0000)
> > > > > > knlGS:0000000000000000
> > > > > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > CR2: 0000000000000058 CR3: 0000000003aa8000 CR4: 00000000003506f0
> > > > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > > > Call Trace:
> > > > > >   <TASK>
> > > > > >   lock_acquire+0xd8/0x1f0 kernel/locking/lockdep.c:5691
> > > > > >   seqcount_lockdep_reader_access include/linux/seqlock.h:102 [inline]
> > > > > >   get_fs_root_rcu fs/d_path.c:243 [inline]
> > > > > >   d_path+0xd1/0x1f0 fs/d_path.c:285
> > > > > >   audit_log_d_path+0x65/0x130 kernel/audit.c:2139
> > > > > >   dump_common_audit_data security/lsm_audit.c:224 [inline]
> > > > > >   common_lsm_audit+0x3b3/0x840 security/lsm_audit.c:458
> > > > > >   smack_log+0xad/0x130 security/smack/smack_access.c:383
> > > > > >   smk_tskacc+0xb1/0xd0 security/smack/smack_access.c:253
> > > > > >   smack_inode_getattr+0x8a/0xb0 security/smack/smack_lsm.c:1187
> > > > > >   security_inode_getattr+0x32/0x50 security/security.c:2114
> > > > > >   vfs_getattr+0x1b/0x40 fs/stat.c:167
> > > > > >   ovl_getattr+0xa6/0x3e0 fs/overlayfs/inode.c:173
> > > > > >   ima_check_last_writer security/integrity/ima/ima_main.c:171
> > > > > > [inline]
> > > > > >   ima_file_free+0xbd/0x130 security/integrity/ima/ima_main.c:203
> > > > > >   __fput+0xc7/0x220 fs/file_table.c:315
> > > > > >   task_work_run+0x7d/0xa0 kernel/task_work.c:179
> > > > > >   exit_task_work include/linux/task_work.h:38 [inline]
> > > > > >   do_exit+0x2c7/0xa80 kernel/exit.c:871 <-----------------------
> > > > > >   do_group_exit+0x85/0xa0 kernel/exit.c:1021
> > > > > >   get_signal+0x73c/0x7f0 kernel/signal.c:2874
> > > > > >   arch_do_signal_or_restart+0x89/0x290 arch/x86/kernel/signal.c:306
> > > > > >   exit_to_user_mode_loop+0x61/0xb0 kernel/entry/common.c:168
> > > > > >   exit_to_user_mode_prepare+0x64/0xb0 kernel/entry/common.c:204
> > > > > >   __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
> > > > > >   syscall_exit_to_user_mode+0x2b/0x1d0 kernel/entry/common.c:297
> > > > > >   do_syscall_64+0x4d/0x90 arch/x86/entry/common.c:86
> > > > > >   entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > > > > > 
> > > > > > 
> > > > > > do_exit has called exit_fs(tsk) [
> > > > > > https://elixir.bootlin.com/linux/v6.4-rc2/source/kernel/exit.c#L867 ]
> > > > > > 
> > > > > > exit_fs(tsk) has set tsk->fs = NULL [
> > > > > > https://elixir.bootlin.com/linux/v6.4-rc2/source/fs/fs_struct.c#L103
> > > > > > ]
> > > > > > 
> > > > > > I think this then bites in d_path() where it calls:
> > > > > > 
> > > > > >      get_fs_root_rcu(current->fs, &root);   [
> > > > > > https://elixir.bootlin.com/linux/v6.4-rc2/source/fs/d_path.c#L285 ]
> > > > > > 
> > > > > > current->fs is likely NULL here.
> > > > > > 
> > > > > > If this was correct it would have nothing to do with the actual
> > > > > > patch,
> > > > > > though, but rather with the fact that smack logs on process
> > > > > > termination. I am not sure what the solution would be other than
> > > > > > testing for current->fs == NULL in d_path before using it and
> > > > > > returning an error that is not normally returned or trying to
> > > > > > intercept this case in smack.
> > > > > I have now been able to recreate the syzbot issue with the test
> > > > > program
> > > > > and the issue is exactly the one described here, current->fs == NULL.
> > > > > 
> > > > Earlier in this thread, Amir had a diagnosis that IMA is
> > > > inappropriately
> > > > trying to use f_path directly instead of using the helpers that are
> > > > friendly for stacking filesystems.
> > > > 
> > > > https://lore.kernel.org/linux-fsdevel/CAOQ4uxgjnYyeQL-LbS5kQ7+C0d6sjzKqMDWAtZW8cAkPaed6=Q@mail.gmail.com/
> > > > 
> > > > 
> > > > I'm not an IMA hacker so I'm not planning to roll a fix here. Perhaps
> > > > someone on the IMA team could try this approach?
> > > 
> > > 
> > > I have applied this patch here from Amir now and it does NOT resolve
> > > the issue:
> > > 
> > > https://lore.kernel.org/linux-integrity/296dae962a2a488bde682d3def074db91686e1c3.camel@linux.ibm.com/T/#m4ebdb780bf6952e7f210c55e87950d0cfa1d5eb0
> > > 
> > > 
> > 
> > This seems to resolve the issue:
> > 
> > diff --git a/security/smack/smack_access.c
> > b/security/smack/smack_access.c
> > index 585e5e35710b..57afcea1e39b 100644
> > --- a/security/smack/smack_access.c
> > +++ b/security/smack/smack_access.c
> > @@ -347,6 +347,9 @@ void smack_log(char *subject_label, char
> > *object_label, int request,
> >         struct smack_audit_data *sad;
> >         struct common_audit_data *a = &ad->a;
> > 
> > +       if (current->flags & PF_EXITING)
> > +               return;
> > +
> 
> Based on what I see here I can understand that this prevents the panic,
> but it isn't so clear what changed that introduced the problem.
> 
> >         /* check if we have to log the current event */
> >         if (result < 0 && (log_policy & SMACK_AUDIT_DENIED) == 0)
> >                 return;
> > 
> > 

Apparently, it's this patch:

    db1d1e8b9867 IMA: use vfs_getattr_nosec to get the i_version

At one time, IMA would reach directly into the inode to get the
i_version and ctime. That was fine for certain filesystems, but with
more recent changes it needs to go through ->getattr instead. Evidently,
it's selecting the wrong inode to query when dealing with overlayfs and
that's causing panics at times.

As to why the above patch helps, I'm not sure, but given that it doesn't
seem to change which inode is being queried via getattr, it seems like
this is probably papering over the real bug. That said, IMA and
overlayfs are not really in my wheelhouse, so I could be very wrong
here.
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [syzbot] [integrity] [overlayfs] general protection fault in d_path
  2023-09-21 10:32               ` Jeff Layton
@ 2023-09-21 11:24                 ` Mimi Zohar
  2023-09-21 11:48                   ` Christian Brauner
  2023-09-21 12:03                 ` Amir Goldstein
  1 sibling, 1 reply; 35+ messages in thread
From: Mimi Zohar @ 2023-09-21 11:24 UTC (permalink / raw)
  To: Jeff Layton, Casey Schaufler, Stefan Berger, syzbot, amir73il,
	brauner, linux-fsdevel, linux-integrity, linux-kernel,
	linux-unionfs, miklos, syzkaller-bugs

On Thu, 2023-09-21 at 06:32 -0400, Jeff Layton wrote:
> On Wed, 2023-09-20 at 17:52 -0700, Casey Schaufler wrote:
> > On 9/20/2023 5:10 PM, Stefan Berger wrote:
> > > 
> > > On 9/20/23 18:09, Stefan Berger wrote:
> > > > 
> > > > On 9/20/23 17:16, Jeff Layton wrote:
> > > > > On Wed, 2023-09-20 at 16:37 -0400, Stefan Berger wrote:
> > > > > > On 9/20/23 13:01, Stefan Berger wrote:
> > > > > > > On 9/17/23 20:04, syzbot wrote:
> > > > > > > > syzbot has bisected this issue to:
> > > > > > > > 
> > > > > > > > commit db1d1e8b9867aae5c3e61ad7859abfcc4a6fd6c7
> > > > > > > > Author: Jeff Layton <jlayton@kernel.org>
> > > > > > > > Date:   Mon Apr 17 16:55:51 2023 +0000
> > > > > > > > 
> > > > > > > >       IMA: use vfs_getattr_nosec to get the i_version
> > > > > > > > 
> > > > > > > > bisection log:
> > > > > > > > https://syzkaller.appspot.com/x/bisect.txt?x=106f7e54680000
> > > > > > > > start commit:   a747acc0b752 Merge tag
> > > > > > > > 'linux-kselftest-next-6.6-rc2'
> > > > > > > > of g..
> > > > > > > > git tree:       upstream
> > > > > > > > final oops:
> > > > > > > > https://syzkaller.appspot.com/x/report.txt?x=126f7e54680000
> > > > > > > > console output:
> > > > > > > > https://syzkaller.appspot.com/x/log.txt?x=146f7e54680000
> > > > > > > > kernel config:
> > > > > > > > https://syzkaller.appspot.com/x/.config?x=df91a3034fe3f122
> > > > > > > > dashboard link:
> > > > > > > > https://syzkaller.appspot.com/bug?extid=a67fc5321ffb4b311c98
> > > > > > > > syz repro:
> > > > > > > > https://syzkaller.appspot.com/x/repro.syz?x=1671b694680000
> > > > > > > > C reproducer:
> > > > > > > > https://syzkaller.appspot.com/x/repro.c?x=14ec94d8680000
> > > > > > > > 
> > > > > > > > Reported-by: syzbot+a67fc5321ffb4b311c98@syzkaller.appspotmail.com
> > > > > > > > Fixes: db1d1e8b9867 ("IMA: use vfs_getattr_nosec to get the
> > > > > > > > i_version")
> > > > > > > > 
> > > > > > > > For information about bisection process see:
> > > > > > > > https://goo.gl/tpsmEJ#bisection
> > > > > > > The final oops shows this here:
> > > > > > > 
> > > > > > > BUG: kernel NULL pointer dereference, address: 0000000000000058
> > > > > > > #PF: supervisor read access in kernel mode
> > > > > > > #PF: error_code(0x0000) - not-present page
> > > > > > > PGD 0 P4D 0
> > > > > > > Oops: 0000 [#1] PREEMPT SMP
> > > > > > > CPU: 0 PID: 3192 Comm: syz-executor.0 Not tainted
> > > > > > > 6.4.0-rc2-syzkaller #0
> > > > > > > Hardware name: Google Google Compute Engine/Google Compute Engine,
> > > > > > > BIOS Google 08/04/2023
> > > > > > > RIP: 0010:__lock_acquire+0x35/0x490 kernel/locking/lockdep.c:4946
> > > > > > > Code: 83 ec 18 65 4c 8b 35 aa 60 f4 7e 83 3d b7 11 e4 02 00 0f 84 05
> > > > > > > 02 00 00 4c 89 cb 89 cd 41 89 d5 49 89 ff 83 fe 01 77 0c 89 f0
> > > > > > > <49> 8b
> > > > > > > 44 c7 08 48 85 c0 75 1b 4c 89 ff 31 d2 45 89 c4 e8 74 f6 ff
> > > > > > > RSP: 0018:ffffc90002edb840 EFLAGS: 00010097
> > > > > > > RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
> > > > > > > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000050
> > > > > > > RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000000000000
> > > > > > > R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> > > > > > > R13: 0000000000000000 R14: ffff888102ea5340 R15: 0000000000000050
> > > > > > > FS:  0000000000000000(0000) GS:ffff88813bc00000(0000)
> > > > > > > knlGS:0000000000000000
> > > > > > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > > CR2: 0000000000000058 CR3: 0000000003aa8000 CR4: 00000000003506f0
> > > > > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > > > > Call Trace:
> > > > > > >   <TASK>
> > > > > > >   lock_acquire+0xd8/0x1f0 kernel/locking/lockdep.c:5691
> > > > > > >   seqcount_lockdep_reader_access include/linux/seqlock.h:102 [inline]
> > > > > > >   get_fs_root_rcu fs/d_path.c:243 [inline]
> > > > > > >   d_path+0xd1/0x1f0 fs/d_path.c:285
> > > > > > >   audit_log_d_path+0x65/0x130 kernel/audit.c:2139
> > > > > > >   dump_common_audit_data security/lsm_audit.c:224 [inline]
> > > > > > >   common_lsm_audit+0x3b3/0x840 security/lsm_audit.c:458
> > > > > > >   smack_log+0xad/0x130 security/smack/smack_access.c:383
> > > > > > >   smk_tskacc+0xb1/0xd0 security/smack/smack_access.c:253
> > > > > > >   smack_inode_getattr+0x8a/0xb0 security/smack/smack_lsm.c:1187
> > > > > > >   security_inode_getattr+0x32/0x50 security/security.c:2114
> > > > > > >   vfs_getattr+0x1b/0x40 fs/stat.c:167
> > > > > > >   ovl_getattr+0xa6/0x3e0 fs/overlayfs/inode.c:173
> > > > > > >   ima_check_last_writer security/integrity/ima/ima_main.c:171
> > > > > > > [inline]
> > > > > > >   ima_file_free+0xbd/0x130 security/integrity/ima/ima_main.c:203
> > > > > > >   __fput+0xc7/0x220 fs/file_table.c:315
> > > > > > >   task_work_run+0x7d/0xa0 kernel/task_work.c:179
> > > > > > >   exit_task_work include/linux/task_work.h:38 [inline]
> > > > > > >   do_exit+0x2c7/0xa80 kernel/exit.c:871 <-----------------------
> > > > > > >   do_group_exit+0x85/0xa0 kernel/exit.c:1021
> > > > > > >   get_signal+0x73c/0x7f0 kernel/signal.c:2874
> > > > > > >   arch_do_signal_or_restart+0x89/0x290 arch/x86/kernel/signal.c:306
> > > > > > >   exit_to_user_mode_loop+0x61/0xb0 kernel/entry/common.c:168
> > > > > > >   exit_to_user_mode_prepare+0x64/0xb0 kernel/entry/common.c:204
> > > > > > >   __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
> > > > > > >   syscall_exit_to_user_mode+0x2b/0x1d0 kernel/entry/common.c:297
> > > > > > >   do_syscall_64+0x4d/0x90 arch/x86/entry/common.c:86
> > > > > > >   entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > > > > > > 
> > > > > > > 
> > > > > > > do_exit has called exit_fs(tsk) [
> > > > > > > https://elixir.bootlin.com/linux/v6.4-rc2/source/kernel/exit.c#L867 ]
> > > > > > > 
> > > > > > > exit_fs(tsk) has set tsk->fs = NULL [
> > > > > > > https://elixir.bootlin.com/linux/v6.4-rc2/source/fs/fs_struct.c#L103
> > > > > > > ]
> > > > > > > 
> > > > > > > I think this then bites in d_path() where it calls:
> > > > > > > 
> > > > > > >      get_fs_root_rcu(current->fs, &root);   [
> > > > > > > https://elixir.bootlin.com/linux/v6.4-rc2/source/fs/d_path.c#L285 ]
> > > > > > > 
> > > > > > > current->fs is likely NULL here.
> > > > > > > 
> > > > > > > If this was correct it would have nothing to do with the actual
> > > > > > > patch,
> > > > > > > though, but rather with the fact that smack logs on process
> > > > > > > termination. I am not sure what the solution would be other than
> > > > > > > testing for current->fs == NULL in d_path before using it and
> > > > > > > returning an error that is not normally returned or trying to
> > > > > > > intercept this case in smack.
> > > > > > I have now been able to recreate the syzbot issue with the test
> > > > > > program
> > > > > > and the issue is exactly the one described here, current->fs == NULL.
> > > > > > 
> > > > > Earlier in this thread, Amir had a diagnosis that IMA is
> > > > > inappropriately
> > > > > trying to use f_path directly instead of using the helpers that are
> > > > > friendly for stacking filesystems.
> > > > > 
> > > > > https://lore.kernel.org/linux-fsdevel/CAOQ4uxgjnYyeQL-LbS5kQ7+C0d6sjzKqMDWAtZW8cAkPaed6=Q@mail.gmail.com/
> > > > > 
> > > > > 
> > > > > I'm not an IMA hacker so I'm not planning to roll a fix here. Perhaps
> > > > > someone on the IMA team could try this approach?
> > > > 
> > > > 
> > > > I have applied this patch here from Amir now and it does NOT resolve
> > > > the issue:
> > > > 
> > > > https://lore.kernel.org/linux-integrity/296dae962a2a488bde682d3def074db91686e1c3.camel@linux.ibm.com/T/#m4ebdb780bf6952e7f210c55e87950d0cfa1d5eb0
> > > > 
> > > > 
> > > 
> > > This seems to resolve the issue:
> > > 
> > > diff --git a/security/smack/smack_access.c
> > > b/security/smack/smack_access.c
> > > index 585e5e35710b..57afcea1e39b 100644
> > > --- a/security/smack/smack_access.c
> > > +++ b/security/smack/smack_access.c
> > > @@ -347,6 +347,9 @@ void smack_log(char *subject_label, char
> > > *object_label, int request,
> > >         struct smack_audit_data *sad;
> > >         struct common_audit_data *a = &ad->a;
> > > 
> > > +       if (current->flags & PF_EXITING)
> > > +               return;
> > > +
> > 
> > Based on what I see here I can understand that this prevents the panic,
> > but it isn't so clear what changed that introduced the problem.
> > 
> > >         /* check if we have to log the current event */
> > >         if (result < 0 && (log_policy & SMACK_AUDIT_DENIED) == 0)
> > >                 return;
> > > 
> > > 
> 
> Apparently, it's this patch:
> 
>     db1d1e8b9867 IMA: use vfs_getattr_nosec to get the i_version

Yes, the syzbot was updated with that info.

> At one time, IMA would reach directly into the inode to get the
> i_version and ctime. That was fine for certain filesystems, but with
> more recent changes it needs to go through ->getattr instead. Evidently,
> it's selecting the wrong inode to query when dealing with overlayfs and
> that's causing panics at times.
> 
> As to why the above patch helps, I'm not sure, but given that it doesn't
> seem to change which inode is being queried via getattr, it seems like
> this is probably papering over the real bug. That said, IMA and
> overlayfs are not really in my wheelhouse, so I could be very wrong
> here.

The call to vfs_getattr_nosec() somehow triggers a call to
security_inode_getattr().  Without the call neither ovl_getattr() nor
smack_inode_getattr() would be called.

Mimi


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [syzbot] [integrity] [overlayfs] general protection fault in d_path
  2023-09-21 11:24                 ` Mimi Zohar
@ 2023-09-21 11:48                   ` Christian Brauner
  2023-09-21 14:29                     ` Stefan Berger
                                       ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Christian Brauner @ 2023-09-21 11:48 UTC (permalink / raw)
  To: Mimi Zohar
  Cc: Jeff Layton, Casey Schaufler, Stefan Berger, syzbot, amir73il,
	linux-fsdevel, linux-integrity, linux-kernel, linux-unionfs,
	miklos, syzkaller-bugs

On Thu, Sep 21, 2023 at 07:24:23AM -0400, Mimi Zohar wrote:
> On Thu, 2023-09-21 at 06:32 -0400, Jeff Layton wrote:
> > On Wed, 2023-09-20 at 17:52 -0700, Casey Schaufler wrote:
> > > On 9/20/2023 5:10 PM, Stefan Berger wrote:
> > > > 
> > > > On 9/20/23 18:09, Stefan Berger wrote:
> > > > > 
> > > > > On 9/20/23 17:16, Jeff Layton wrote:
> > > > > > On Wed, 2023-09-20 at 16:37 -0400, Stefan Berger wrote:
> > > > > > > On 9/20/23 13:01, Stefan Berger wrote:
> > > > > > > > On 9/17/23 20:04, syzbot wrote:
> > > > > > > > > syzbot has bisected this issue to:
> > > > > > > > > 
> > > > > > > > > commit db1d1e8b9867aae5c3e61ad7859abfcc4a6fd6c7
> > > > > > > > > Author: Jeff Layton <jlayton@kernel.org>
> > > > > > > > > Date:   Mon Apr 17 16:55:51 2023 +0000
> > > > > > > > > 
> > > > > > > > >       IMA: use vfs_getattr_nosec to get the i_version
> > > > > > > > > 
> > > > > > > > > bisection log:
> > > > > > > > > https://syzkaller.appspot.com/x/bisect.txt?x=106f7e54680000
> > > > > > > > > start commit:   a747acc0b752 Merge tag
> > > > > > > > > 'linux-kselftest-next-6.6-rc2'
> > > > > > > > > of g..
> > > > > > > > > git tree:       upstream
> > > > > > > > > final oops:
> > > > > > > > > https://syzkaller.appspot.com/x/report.txt?x=126f7e54680000
> > > > > > > > > console output:
> > > > > > > > > https://syzkaller.appspot.com/x/log.txt?x=146f7e54680000
> > > > > > > > > kernel config:
> > > > > > > > > https://syzkaller.appspot.com/x/.config?x=df91a3034fe3f122
> > > > > > > > > dashboard link:
> > > > > > > > > https://syzkaller.appspot.com/bug?extid=a67fc5321ffb4b311c98
> > > > > > > > > syz repro:
> > > > > > > > > https://syzkaller.appspot.com/x/repro.syz?x=1671b694680000
> > > > > > > > > C reproducer:
> > > > > > > > > https://syzkaller.appspot.com/x/repro.c?x=14ec94d8680000
> > > > > > > > > 
> > > > > > > > > Reported-by: syzbot+a67fc5321ffb4b311c98@syzkaller.appspotmail.com
> > > > > > > > > Fixes: db1d1e8b9867 ("IMA: use vfs_getattr_nosec to get the
> > > > > > > > > i_version")
> > > > > > > > > 
> > > > > > > > > For information about bisection process see:
> > > > > > > > > https://goo.gl/tpsmEJ#bisection
> > > > > > > > The final oops shows this here:
> > > > > > > > 
> > > > > > > > BUG: kernel NULL pointer dereference, address: 0000000000000058
> > > > > > > > #PF: supervisor read access in kernel mode
> > > > > > > > #PF: error_code(0x0000) - not-present page
> > > > > > > > PGD 0 P4D 0
> > > > > > > > Oops: 0000 [#1] PREEMPT SMP
> > > > > > > > CPU: 0 PID: 3192 Comm: syz-executor.0 Not tainted
> > > > > > > > 6.4.0-rc2-syzkaller #0
> > > > > > > > Hardware name: Google Google Compute Engine/Google Compute Engine,
> > > > > > > > BIOS Google 08/04/2023
> > > > > > > > RIP: 0010:__lock_acquire+0x35/0x490 kernel/locking/lockdep.c:4946
> > > > > > > > Code: 83 ec 18 65 4c 8b 35 aa 60 f4 7e 83 3d b7 11 e4 02 00 0f 84 05
> > > > > > > > 02 00 00 4c 89 cb 89 cd 41 89 d5 49 89 ff 83 fe 01 77 0c 89 f0
> > > > > > > > <49> 8b
> > > > > > > > 44 c7 08 48 85 c0 75 1b 4c 89 ff 31 d2 45 89 c4 e8 74 f6 ff
> > > > > > > > RSP: 0018:ffffc90002edb840 EFLAGS: 00010097
> > > > > > > > RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
> > > > > > > > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000050
> > > > > > > > RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000000000000
> > > > > > > > R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> > > > > > > > R13: 0000000000000000 R14: ffff888102ea5340 R15: 0000000000000050
> > > > > > > > FS:  0000000000000000(0000) GS:ffff88813bc00000(0000)
> > > > > > > > knlGS:0000000000000000
> > > > > > > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > > > CR2: 0000000000000058 CR3: 0000000003aa8000 CR4: 00000000003506f0
> > > > > > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > > > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > > > > > Call Trace:
> > > > > > > >   <TASK>
> > > > > > > >   lock_acquire+0xd8/0x1f0 kernel/locking/lockdep.c:5691
> > > > > > > >   seqcount_lockdep_reader_access include/linux/seqlock.h:102 [inline]
> > > > > > > >   get_fs_root_rcu fs/d_path.c:243 [inline]
> > > > > > > >   d_path+0xd1/0x1f0 fs/d_path.c:285
> > > > > > > >   audit_log_d_path+0x65/0x130 kernel/audit.c:2139
> > > > > > > >   dump_common_audit_data security/lsm_audit.c:224 [inline]
> > > > > > > >   common_lsm_audit+0x3b3/0x840 security/lsm_audit.c:458
> > > > > > > >   smack_log+0xad/0x130 security/smack/smack_access.c:383
> > > > > > > >   smk_tskacc+0xb1/0xd0 security/smack/smack_access.c:253
> > > > > > > >   smack_inode_getattr+0x8a/0xb0 security/smack/smack_lsm.c:1187
> > > > > > > >   security_inode_getattr+0x32/0x50 security/security.c:2114
> > > > > > > >   vfs_getattr+0x1b/0x40 fs/stat.c:167
> > > > > > > >   ovl_getattr+0xa6/0x3e0 fs/overlayfs/inode.c:173
> > > > > > > >   ima_check_last_writer security/integrity/ima/ima_main.c:171
> > > > > > > > [inline]
> > > > > > > >   ima_file_free+0xbd/0x130 security/integrity/ima/ima_main.c:203
> > > > > > > >   __fput+0xc7/0x220 fs/file_table.c:315
> > > > > > > >   task_work_run+0x7d/0xa0 kernel/task_work.c:179
> > > > > > > >   exit_task_work include/linux/task_work.h:38 [inline]
> > > > > > > >   do_exit+0x2c7/0xa80 kernel/exit.c:871 <-----------------------
> > > > > > > >   do_group_exit+0x85/0xa0 kernel/exit.c:1021
> > > > > > > >   get_signal+0x73c/0x7f0 kernel/signal.c:2874
> > > > > > > >   arch_do_signal_or_restart+0x89/0x290 arch/x86/kernel/signal.c:306
> > > > > > > >   exit_to_user_mode_loop+0x61/0xb0 kernel/entry/common.c:168
> > > > > > > >   exit_to_user_mode_prepare+0x64/0xb0 kernel/entry/common.c:204
> > > > > > > >   __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
> > > > > > > >   syscall_exit_to_user_mode+0x2b/0x1d0 kernel/entry/common.c:297
> > > > > > > >   do_syscall_64+0x4d/0x90 arch/x86/entry/common.c:86
> > > > > > > >   entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > > > > > > > 
> > > > > > > > 
> > > > > > > > do_exit has called exit_fs(tsk) [
> > > > > > > > https://elixir.bootlin.com/linux/v6.4-rc2/source/kernel/exit.c#L867 ]
> > > > > > > > 
> > > > > > > > exit_fs(tsk) has set tsk->fs = NULL [
> > > > > > > > https://elixir.bootlin.com/linux/v6.4-rc2/source/fs/fs_struct.c#L103
> > > > > > > > ]
> > > > > > > > 
> > > > > > > > I think this then bites in d_path() where it calls:
> > > > > > > > 
> > > > > > > >      get_fs_root_rcu(current->fs, &root);   [
> > > > > > > > https://elixir.bootlin.com/linux/v6.4-rc2/source/fs/d_path.c#L285 ]
> > > > > > > > 
> > > > > > > > current->fs is likely NULL here.
> > > > > > > > 
> > > > > > > > If this was correct it would have nothing to do with the actual
> > > > > > > > patch,
> > > > > > > > though, but rather with the fact that smack logs on process
> > > > > > > > termination. I am not sure what the solution would be other than
> > > > > > > > testing for current->fs == NULL in d_path before using it and
> > > > > > > > returning an error that is not normally returned or trying to
> > > > > > > > intercept this case in smack.
> > > > > > > I have now been able to recreate the syzbot issue with the test
> > > > > > > program
> > > > > > > and the issue is exactly the one described here, current->fs == NULL.
> > > > > > > 
> > > > > > Earlier in this thread, Amir had a diagnosis that IMA is
> > > > > > inappropriately
> > > > > > trying to use f_path directly instead of using the helpers that are
> > > > > > friendly for stacking filesystems.
> > > > > > 
> > > > > > https://lore.kernel.org/linux-fsdevel/CAOQ4uxgjnYyeQL-LbS5kQ7+C0d6sjzKqMDWAtZW8cAkPaed6=Q@mail.gmail.com/
> > > > > > 
> > > > > > 
> > > > > > I'm not an IMA hacker so I'm not planning to roll a fix here. Perhaps
> > > > > > someone on the IMA team could try this approach?
> > > > > 
> > > > > 
> > > > > I have applied this patch here from Amir now and it does NOT resolve
> > > > > the issue:
> > > > > 
> > > > > https://lore.kernel.org/linux-integrity/296dae962a2a488bde682d3def074db91686e1c3.camel@linux.ibm.com/T/#m4ebdb780bf6952e7f210c55e87950d0cfa1d5eb0
> > > > > 
> > > > > 
> > > > 
> > > > This seems to resolve the issue:
> > > > 
> > > > diff --git a/security/smack/smack_access.c
> > > > b/security/smack/smack_access.c
> > > > index 585e5e35710b..57afcea1e39b 100644
> > > > --- a/security/smack/smack_access.c
> > > > +++ b/security/smack/smack_access.c
> > > > @@ -347,6 +347,9 @@ void smack_log(char *subject_label, char
> > > > *object_label, int request,
> > > >         struct smack_audit_data *sad;
> > > >         struct common_audit_data *a = &ad->a;
> > > > 
> > > > +       if (current->flags & PF_EXITING)
> > > > +               return;
> > > > +
> > > 
> > > Based on what I see here I can understand that this prevents the panic,
> > > but it isn't so clear what changed that introduced the problem.
> > > 
> > > >         /* check if we have to log the current event */
> > > >         if (result < 0 && (log_policy & SMACK_AUDIT_DENIED) == 0)
> > > >                 return;
> > > > 
> > > > 
> > 
> > Apparently, it's this patch:
> > 
> >     db1d1e8b9867 IMA: use vfs_getattr_nosec to get the i_version
> 
> Yes, the syzbot was updated with that info.
> 
> > At one time, IMA would reach directly into the inode to get the
> > i_version and ctime. That was fine for certain filesystems, but with
> > more recent changes it needs to go through ->getattr instead. Evidently,
> > it's selecting the wrong inode to query when dealing with overlayfs and
> > that's causing panics at times.
> > 
> > As to why the above patch helps, I'm not sure, but given that it doesn't
> > seem to change which inode is being queried via getattr, it seems like
> > this is probably papering over the real bug. That said, IMA and
> > overlayfs are not really in my wheelhouse, so I could be very wrong
> > here.
> 
> The call to vfs_getattr_nosec() somehow triggers a call to
> security_inode_getattr().  Without the call neither ovl_getattr() nor
> smack_inode_getattr() would be called.

ima_file_free()
-> ima_check_last_writer()
   -> vfs_getattr_nosec()
      -> i_op->getattr() == ovl_getattr()
         -> vfs_getattr()
            -> security_inode_getattr()
	    -> real_i_op->getattr()

is the callchain that triggers this.

ima_file_free() is called in a very sensitive location: __fput() that
can be called from task work when the process is already PF_EXITING.

The ideal solution would be for ima to stop calling back into the
filesystems in this location at all but that's probably not going to
happen because I now realize you also set extended attributes from
__fput():


ima_check_last_writer()
-> ima_update_xatt()
   -> ima_fix_xattr()
      -> __vfs_setxattr_noperm()

The __vfs_setxattr_noperm() codepath can itself trigger
security_inode_post_setxattr() and security_inode_setsecurity(). So
those hooks are hopefully safe to be called with PF_EXITING tasks as
well...

Imho, this is all very wild but I'm not judging.

Two solutions imho:
(1) teach stacking filesystems like overlayfs and ecryptfs to use
    vfs_getattr_nosec() in their ->getattr() implementation when they
    are themselves called via vfs_getattr_nosec(). This will fix this by
    not triggering another LSM hook.
(2) make all ->getattr() LSM hooks PF_EXITING safe ideally don't do
    anything

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [syzbot] [integrity] [overlayfs] general protection fault in d_path
  2023-09-21 10:32               ` Jeff Layton
  2023-09-21 11:24                 ` Mimi Zohar
@ 2023-09-21 12:03                 ` Amir Goldstein
  1 sibling, 0 replies; 35+ messages in thread
From: Amir Goldstein @ 2023-09-21 12:03 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Casey Schaufler, Stefan Berger, syzbot, brauner, linux-fsdevel,
	linux-integrity, linux-kernel, linux-unionfs, miklos,
	syzkaller-bugs, zohar

On Thu, Sep 21, 2023 at 1:32 PM Jeff Layton <jlayton@kernel.org> wrote:
>
> On Wed, 2023-09-20 at 17:52 -0700, Casey Schaufler wrote:
> > On 9/20/2023 5:10 PM, Stefan Berger wrote:
> > >
> > > On 9/20/23 18:09, Stefan Berger wrote:
> > > >
> > > > On 9/20/23 17:16, Jeff Layton wrote:
> > > > > On Wed, 2023-09-20 at 16:37 -0400, Stefan Berger wrote:
> > > > > > On 9/20/23 13:01, Stefan Berger wrote:
> > > > > > > On 9/17/23 20:04, syzbot wrote:
> > > > > > > > syzbot has bisected this issue to:
> > > > > > > >
> > > > > > > > commit db1d1e8b9867aae5c3e61ad7859abfcc4a6fd6c7
> > > > > > > > Author: Jeff Layton <jlayton@kernel.org>
> > > > > > > > Date:   Mon Apr 17 16:55:51 2023 +0000
> > > > > > > >
> > > > > > > >       IMA: use vfs_getattr_nosec to get the i_version
> > > > > > > >
> > > > > > > > bisection log:
> > > > > > > > https://syzkaller.appspot.com/x/bisect.txt?x=106f7e54680000
> > > > > > > > start commit:   a747acc0b752 Merge tag
> > > > > > > > 'linux-kselftest-next-6.6-rc2'
> > > > > > > > of g..
> > > > > > > > git tree:       upstream
> > > > > > > > final oops:
> > > > > > > > https://syzkaller.appspot.com/x/report.txt?x=126f7e54680000
> > > > > > > > console output:
> > > > > > > > https://syzkaller.appspot.com/x/log.txt?x=146f7e54680000
> > > > > > > > kernel config:
> > > > > > > > https://syzkaller.appspot.com/x/.config?x=df91a3034fe3f122
> > > > > > > > dashboard link:
> > > > > > > > https://syzkaller.appspot.com/bug?extid=a67fc5321ffb4b311c98
> > > > > > > > syz repro:
> > > > > > > > https://syzkaller.appspot.com/x/repro.syz?x=1671b694680000
> > > > > > > > C reproducer:
> > > > > > > > https://syzkaller.appspot.com/x/repro.c?x=14ec94d8680000
> > > > > > > >
> > > > > > > > Reported-by: syzbot+a67fc5321ffb4b311c98@syzkaller.appspotmail.com
> > > > > > > > Fixes: db1d1e8b9867 ("IMA: use vfs_getattr_nosec to get the
> > > > > > > > i_version")
> > > > > > > >
> > > > > > > > For information about bisection process see:
> > > > > > > > https://goo.gl/tpsmEJ#bisection
> > > > > > > The final oops shows this here:
> > > > > > >
> > > > > > > BUG: kernel NULL pointer dereference, address: 0000000000000058
> > > > > > > #PF: supervisor read access in kernel mode
> > > > > > > #PF: error_code(0x0000) - not-present page
> > > > > > > PGD 0 P4D 0
> > > > > > > Oops: 0000 [#1] PREEMPT SMP
> > > > > > > CPU: 0 PID: 3192 Comm: syz-executor.0 Not tainted
> > > > > > > 6.4.0-rc2-syzkaller #0
> > > > > > > Hardware name: Google Google Compute Engine/Google Compute Engine,
> > > > > > > BIOS Google 08/04/2023
> > > > > > > RIP: 0010:__lock_acquire+0x35/0x490 kernel/locking/lockdep.c:4946
> > > > > > > Code: 83 ec 18 65 4c 8b 35 aa 60 f4 7e 83 3d b7 11 e4 02 00 0f 84 05
> > > > > > > 02 00 00 4c 89 cb 89 cd 41 89 d5 49 89 ff 83 fe 01 77 0c 89 f0
> > > > > > > <49> 8b
> > > > > > > 44 c7 08 48 85 c0 75 1b 4c 89 ff 31 d2 45 89 c4 e8 74 f6 ff
> > > > > > > RSP: 0018:ffffc90002edb840 EFLAGS: 00010097
> > > > > > > RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
> > > > > > > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000050
> > > > > > > RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000000000000
> > > > > > > R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> > > > > > > R13: 0000000000000000 R14: ffff888102ea5340 R15: 0000000000000050
> > > > > > > FS:  0000000000000000(0000) GS:ffff88813bc00000(0000)
> > > > > > > knlGS:0000000000000000
> > > > > > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > > CR2: 0000000000000058 CR3: 0000000003aa8000 CR4: 00000000003506f0
> > > > > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > > > > Call Trace:
> > > > > > >   <TASK>
> > > > > > >   lock_acquire+0xd8/0x1f0 kernel/locking/lockdep.c:5691
> > > > > > >   seqcount_lockdep_reader_access include/linux/seqlock.h:102 [inline]
> > > > > > >   get_fs_root_rcu fs/d_path.c:243 [inline]
> > > > > > >   d_path+0xd1/0x1f0 fs/d_path.c:285
> > > > > > >   audit_log_d_path+0x65/0x130 kernel/audit.c:2139
> > > > > > >   dump_common_audit_data security/lsm_audit.c:224 [inline]
> > > > > > >   common_lsm_audit+0x3b3/0x840 security/lsm_audit.c:458
> > > > > > >   smack_log+0xad/0x130 security/smack/smack_access.c:383
> > > > > > >   smk_tskacc+0xb1/0xd0 security/smack/smack_access.c:253
> > > > > > >   smack_inode_getattr+0x8a/0xb0 security/smack/smack_lsm.c:1187
> > > > > > >   security_inode_getattr+0x32/0x50 security/security.c:2114
> > > > > > >   vfs_getattr+0x1b/0x40 fs/stat.c:167
> > > > > > >   ovl_getattr+0xa6/0x3e0 fs/overlayfs/inode.c:173
> > > > > > >   ima_check_last_writer security/integrity/ima/ima_main.c:171
> > > > > > > [inline]
> > > > > > >   ima_file_free+0xbd/0x130 security/integrity/ima/ima_main.c:203
> > > > > > >   __fput+0xc7/0x220 fs/file_table.c:315
> > > > > > >   task_work_run+0x7d/0xa0 kernel/task_work.c:179
> > > > > > >   exit_task_work include/linux/task_work.h:38 [inline]
> > > > > > >   do_exit+0x2c7/0xa80 kernel/exit.c:871 <-----------------------
> > > > > > >   do_group_exit+0x85/0xa0 kernel/exit.c:1021
> > > > > > >   get_signal+0x73c/0x7f0 kernel/signal.c:2874
> > > > > > >   arch_do_signal_or_restart+0x89/0x290 arch/x86/kernel/signal.c:306
> > > > > > >   exit_to_user_mode_loop+0x61/0xb0 kernel/entry/common.c:168
> > > > > > >   exit_to_user_mode_prepare+0x64/0xb0 kernel/entry/common.c:204
> > > > > > >   __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
> > > > > > >   syscall_exit_to_user_mode+0x2b/0x1d0 kernel/entry/common.c:297
> > > > > > >   do_syscall_64+0x4d/0x90 arch/x86/entry/common.c:86
> > > > > > >   entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > > > > > >
> > > > > > >
> > > > > > > do_exit has called exit_fs(tsk) [
> > > > > > > https://elixir.bootlin.com/linux/v6.4-rc2/source/kernel/exit.c#L867 ]
> > > > > > >
> > > > > > > exit_fs(tsk) has set tsk->fs = NULL [
> > > > > > > https://elixir.bootlin.com/linux/v6.4-rc2/source/fs/fs_struct.c#L103
> > > > > > > ]
> > > > > > >
> > > > > > > I think this then bites in d_path() where it calls:
> > > > > > >
> > > > > > >      get_fs_root_rcu(current->fs, &root);   [
> > > > > > > https://elixir.bootlin.com/linux/v6.4-rc2/source/fs/d_path.c#L285 ]
> > > > > > >
> > > > > > > current->fs is likely NULL here.
> > > > > > >
> > > > > > > If this was correct it would have nothing to do with the actual
> > > > > > > patch,
> > > > > > > though, but rather with the fact that smack logs on process
> > > > > > > termination. I am not sure what the solution would be other than
> > > > > > > testing for current->fs == NULL in d_path before using it and
> > > > > > > returning an error that is not normally returned or trying to
> > > > > > > intercept this case in smack.
> > > > > > I have now been able to recreate the syzbot issue with the test
> > > > > > program
> > > > > > and the issue is exactly the one described here, current->fs == NULL.
> > > > > >
> > > > > Earlier in this thread, Amir had a diagnosis that IMA is
> > > > > inappropriately
> > > > > trying to use f_path directly instead of using the helpers that are
> > > > > friendly for stacking filesystems.
> > > > >
> > > > > https://lore.kernel.org/linux-fsdevel/CAOQ4uxgjnYyeQL-LbS5kQ7+C0d6sjzKqMDWAtZW8cAkPaed6=Q@mail.gmail.com/
> > > > >
> > > > >
> > > > > I'm not an IMA hacker so I'm not planning to roll a fix here. Perhaps
> > > > > someone on the IMA team could try this approach?
> > > >
> > > >
> > > > I have applied this patch here from Amir now and it does NOT resolve
> > > > the issue:
> > > >
> > > > https://lore.kernel.org/linux-integrity/296dae962a2a488bde682d3def074db91686e1c3.camel@linux.ibm.com/T/#m4ebdb780bf6952e7f210c55e87950d0cfa1d5eb0
> > > >
> > > >
> > >
> > > This seems to resolve the issue:
> > >
> > > diff --git a/security/smack/smack_access.c
> > > b/security/smack/smack_access.c
> > > index 585e5e35710b..57afcea1e39b 100644
> > > --- a/security/smack/smack_access.c
> > > +++ b/security/smack/smack_access.c
> > > @@ -347,6 +347,9 @@ void smack_log(char *subject_label, char
> > > *object_label, int request,
> > >         struct smack_audit_data *sad;
> > >         struct common_audit_data *a = &ad->a;
> > >
> > > +       if (current->flags & PF_EXITING)
> > > +               return;
> > > +
> >
> > Based on what I see here I can understand that this prevents the panic,
> > but it isn't so clear what changed that introduced the problem.
> >
> > >         /* check if we have to log the current event */
> > >         if (result < 0 && (log_policy & SMACK_AUDIT_DENIED) == 0)
> > >                 return;
> > >
> > >
>
> Apparently, it's this patch:
>
>     db1d1e8b9867 IMA: use vfs_getattr_nosec to get the i_version
>
> At one time, IMA would reach directly into the inode to get the
> i_version and ctime. That was fine for certain filesystems, but with
> more recent changes it needs to go through ->getattr instead. Evidently,
> it's selecting the wrong inode to query when dealing with overlayfs and
> that's causing panics at times.
>

This is a partial story - one which I told and tried to solve with my patch.
It is true in some cases, but it wasn't the case in this reproducer.

Overlayfs keeps *two* files open, one for the ovl inode and one for
the 'real' inode in underlying fs.

IMA hooks can happen on either of these files.
fput of the ovl file will trigger fput of the real file.
My patch fixes IMA code that triggers on the real file,
which must use file_real_path() to get the path.
My patch really fixes a bug that was also introduced by your
patch, but it is not the bug that syzbot reported.

The syzbot reproducer is triggered on fput of the ovl file.
This is why my patch did not fix it.

> As to why the above patch helps, I'm not sure, but given that it doesn't
> seem to change which inode is being queried via getattr, it seems like
> this is probably papering over the real bug. That said, IMA and
> overlayfs are not really in my wheelhouse, so I could be very wrong
> here.

IMO, the solution is one of the two solutions that Christian proposed -
you had also proposed to propage the nosec context yourself
before I suggested my fix. I never said that it was wrong - just didn't
think it was the root cause for this regression, because there was
another bug under the lamppost.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [syzbot] [integrity] [overlayfs] general protection fault in d_path
  2023-09-21 11:48                   ` Christian Brauner
@ 2023-09-21 14:29                     ` Stefan Berger
  2023-09-21 14:52                     ` Mimi Zohar
  2023-09-28 22:03                     ` Stefan Berger
  2 siblings, 0 replies; 35+ messages in thread
From: Stefan Berger @ 2023-09-21 14:29 UTC (permalink / raw)
  To: Christian Brauner, Mimi Zohar
  Cc: Jeff Layton, Casey Schaufler, syzbot, amir73il, linux-fsdevel,
	linux-integrity, linux-kernel, linux-unionfs, miklos,
	syzkaller-bugs


On 9/21/23 07:48, Christian Brauner wrote:
> On Thu, Sep 21, 2023 at 07:24:23AM -0400, Mimi Zohar wrote:
>> On Thu, 2023-09-21 at 06:32 -0400, Jeff Layton wrote:
>>> On Wed, 2023-09-20 at 17:52 -0700, Casey Schaufler wrote:
>>>> On 9/20/2023 5:10 PM, Stefan Berger wrote:
>>>>> On 9/20/23 18:09, Stefan Berger wrote:
>>>>>> On 9/20/23 17:16, Jeff Layton wrote:
>>>>>>> On Wed, 2023-09-20 at 16:37 -0400, Stefan Berger wrote:
>>>>>>>> On 9/20/23 13:01, Stefan Berger wrote:
>>>>>>>>> On 9/17/23 20:04, syzbot wrote:
>>>>>>>>>> syzbot has bisected this issue to:
>>>>>>>>>>
>>>>>>>>>> commit db1d1e8b9867aae5c3e61ad7859abfcc4a6fd6c7
>>>>>>>>>> Author: Jeff Layton <jlayton@kernel.org>
>>>>>>>>>> Date:   Mon Apr 17 16:55:51 2023 +0000
>>>>>>>>>>
>>>>>>>>>>        IMA: use vfs_getattr_nosec to get the i_version
>>>>>>>>>>
>>>>>>>>>> bisection log:
>>>>>>>>>> https://syzkaller.appspot.com/x/bisect.txt?x=106f7e54680000
>>>>>>>>>> start commit:   a747acc0b752 Merge tag
>>>>>>>>>> 'linux-kselftest-next-6.6-rc2'
>>>>>>>>>> of g..
>>>>>>>>>> git tree:       upstream
>>>>>>>>>> final oops:
>>>>>>>>>> https://syzkaller.appspot.com/x/report.txt?x=126f7e54680000
>>>>>>>>>> console output:
>>>>>>>>>> https://syzkaller.appspot.com/x/log.txt?x=146f7e54680000
>>>>>>>>>> kernel config:
>>>>>>>>>> https://syzkaller.appspot.com/x/.config?x=df91a3034fe3f122
>>>>>>>>>> dashboard link:
>>>>>>>>>> https://syzkaller.appspot.com/bug?extid=a67fc5321ffb4b311c98
>>>>>>>>>> syz repro:
>>>>>>>>>> https://syzkaller.appspot.com/x/repro.syz?x=1671b694680000
>>>>>>>>>> C reproducer:
>>>>>>>>>> https://syzkaller.appspot.com/x/repro.c?x=14ec94d8680000
>>>>>>>>>>
>>>>>>>>>> Reported-by: syzbot+a67fc5321ffb4b311c98@syzkaller.appspotmail.com
>>>>>>>>>> Fixes: db1d1e8b9867 ("IMA: use vfs_getattr_nosec to get the
>>>>>>>>>> i_version")
>>>>>>>>>>
>>>>>>>>>> For information about bisection process see:
>>>>>>>>>> https://goo.gl/tpsmEJ#bisection
>>>>>>>>> The final oops shows this here:
>>>>>>>>>
>>>>>>>>> BUG: kernel NULL pointer dereference, address: 0000000000000058
>>>>>>>>> #PF: supervisor read access in kernel mode
>>>>>>>>> #PF: error_code(0x0000) - not-present page
>>>>>>>>> PGD 0 P4D 0
>>>>>>>>> Oops: 0000 [#1] PREEMPT SMP
>>>>>>>>> CPU: 0 PID: 3192 Comm: syz-executor.0 Not tainted
>>>>>>>>> 6.4.0-rc2-syzkaller #0
>>>>>>>>> Hardware name: Google Google Compute Engine/Google Compute Engine,
>>>>>>>>> BIOS Google 08/04/2023
>>>>>>>>> RIP: 0010:__lock_acquire+0x35/0x490 kernel/locking/lockdep.c:4946
>>>>>>>>> Code: 83 ec 18 65 4c 8b 35 aa 60 f4 7e 83 3d b7 11 e4 02 00 0f 84 05
>>>>>>>>> 02 00 00 4c 89 cb 89 cd 41 89 d5 49 89 ff 83 fe 01 77 0c 89 f0
>>>>>>>>> <49> 8b
>>>>>>>>> 44 c7 08 48 85 c0 75 1b 4c 89 ff 31 d2 45 89 c4 e8 74 f6 ff
>>>>>>>>> RSP: 0018:ffffc90002edb840 EFLAGS: 00010097
>>>>>>>>> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
>>>>>>>>> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000050
>>>>>>>>> RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000000000000
>>>>>>>>> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
>>>>>>>>> R13: 0000000000000000 R14: ffff888102ea5340 R15: 0000000000000050
>>>>>>>>> FS:  0000000000000000(0000) GS:ffff88813bc00000(0000)
>>>>>>>>> knlGS:0000000000000000
>>>>>>>>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>>>> CR2: 0000000000000058 CR3: 0000000003aa8000 CR4: 00000000003506f0
>>>>>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>>>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>>>> Call Trace:
>>>>>>>>>    <TASK>
>>>>>>>>>    lock_acquire+0xd8/0x1f0 kernel/locking/lockdep.c:5691
>>>>>>>>>    seqcount_lockdep_reader_access include/linux/seqlock.h:102 [inline]
>>>>>>>>>    get_fs_root_rcu fs/d_path.c:243 [inline]
>>>>>>>>>    d_path+0xd1/0x1f0 fs/d_path.c:285
>>>>>>>>>    audit_log_d_path+0x65/0x130 kernel/audit.c:2139
>>>>>>>>>    dump_common_audit_data security/lsm_audit.c:224 [inline]
>>>>>>>>>    common_lsm_audit+0x3b3/0x840 security/lsm_audit.c:458
>>>>>>>>>    smack_log+0xad/0x130 security/smack/smack_access.c:383
>>>>>>>>>    smk_tskacc+0xb1/0xd0 security/smack/smack_access.c:253
>>>>>>>>>    smack_inode_getattr+0x8a/0xb0 security/smack/smack_lsm.c:1187
>>>>>>>>>    security_inode_getattr+0x32/0x50 security/security.c:2114
>>>>>>>>>    vfs_getattr+0x1b/0x40 fs/stat.c:167
>>>>>>>>>    ovl_getattr+0xa6/0x3e0 fs/overlayfs/inode.c:173
>>>>>>>>>    ima_check_last_writer security/integrity/ima/ima_main.c:171
>>>>>>>>> [inline]
>>>>>>>>>    ima_file_free+0xbd/0x130 security/integrity/ima/ima_main.c:203
>>>>>>>>>    __fput+0xc7/0x220 fs/file_table.c:315
>>>>>>>>>    task_work_run+0x7d/0xa0 kernel/task_work.c:179
>>>>>>>>>    exit_task_work include/linux/task_work.h:38 [inline]
>>>>>>>>>    do_exit+0x2c7/0xa80 kernel/exit.c:871 <-----------------------
>>>>>>>>>    do_group_exit+0x85/0xa0 kernel/exit.c:1021
>>>>>>>>>    get_signal+0x73c/0x7f0 kernel/signal.c:2874
>>>>>>>>>    arch_do_signal_or_restart+0x89/0x290 arch/x86/kernel/signal.c:306
>>>>>>>>>    exit_to_user_mode_loop+0x61/0xb0 kernel/entry/common.c:168
>>>>>>>>>    exit_to_user_mode_prepare+0x64/0xb0 kernel/entry/common.c:204
>>>>>>>>>    __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
>>>>>>>>>    syscall_exit_to_user_mode+0x2b/0x1d0 kernel/entry/common.c:297
>>>>>>>>>    do_syscall_64+0x4d/0x90 arch/x86/entry/common.c:86
>>>>>>>>>    entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> do_exit has called exit_fs(tsk) [
>>>>>>>>> https://elixir.bootlin.com/linux/v6.4-rc2/source/kernel/exit.c#L867 ]
>>>>>>>>>
>>>>>>>>> exit_fs(tsk) has set tsk->fs = NULL [
>>>>>>>>> https://elixir.bootlin.com/linux/v6.4-rc2/source/fs/fs_struct.c#L103
>>>>>>>>> ]
>>>>>>>>>
>>>>>>>>> I think this then bites in d_path() where it calls:
>>>>>>>>>
>>>>>>>>>       get_fs_root_rcu(current->fs, &root);   [
>>>>>>>>> https://elixir.bootlin.com/linux/v6.4-rc2/source/fs/d_path.c#L285 ]
>>>>>>>>>
>>>>>>>>> current->fs is likely NULL here.
>>>>>>>>>
>>>>>>>>> If this was correct it would have nothing to do with the actual
>>>>>>>>> patch,
>>>>>>>>> though, but rather with the fact that smack logs on process
>>>>>>>>> termination. I am not sure what the solution would be other than
>>>>>>>>> testing for current->fs == NULL in d_path before using it and
>>>>>>>>> returning an error that is not normally returned or trying to
>>>>>>>>> intercept this case in smack.
>>>>>>>> I have now been able to recreate the syzbot issue with the test
>>>>>>>> program
>>>>>>>> and the issue is exactly the one described here, current->fs == NULL.
>>>>>>>>
>>>>>>> Earlier in this thread, Amir had a diagnosis that IMA is
>>>>>>> inappropriately
>>>>>>> trying to use f_path directly instead of using the helpers that are
>>>>>>> friendly for stacking filesystems.
>>>>>>>
>>>>>>> https://lore.kernel.org/linux-fsdevel/CAOQ4uxgjnYyeQL-LbS5kQ7+C0d6sjzKqMDWAtZW8cAkPaed6=Q@mail.gmail.com/
>>>>>>>
>>>>>>>
>>>>>>> I'm not an IMA hacker so I'm not planning to roll a fix here. Perhaps
>>>>>>> someone on the IMA team could try this approach?
>>>>>>
>>>>>> I have applied this patch here from Amir now and it does NOT resolve
>>>>>> the issue:
>>>>>>
>>>>>> https://lore.kernel.org/linux-integrity/296dae962a2a488bde682d3def074db91686e1c3.camel@linux.ibm.com/T/#m4ebdb780bf6952e7f210c55e87950d0cfa1d5eb0
>>>>>>
>>>>>>
>>>>> This seems to resolve the issue:
>>>>>
>>>>> diff --git a/security/smack/smack_access.c
>>>>> b/security/smack/smack_access.c
>>>>> index 585e5e35710b..57afcea1e39b 100644
>>>>> --- a/security/smack/smack_access.c
>>>>> +++ b/security/smack/smack_access.c
>>>>> @@ -347,6 +347,9 @@ void smack_log(char *subject_label, char
>>>>> *object_label, int request,
>>>>>          struct smack_audit_data *sad;
>>>>>          struct common_audit_data *a = &ad->a;
>>>>>
>>>>> +       if (current->flags & PF_EXITING)
>>>>> +               return;
>>>>> +
>>>> Based on what I see here I can understand that this prevents the panic,
>>>> but it isn't so clear what changed that introduced the problem.
>>>>
>>>>>          /* check if we have to log the current event */
>>>>>          if (result < 0 && (log_policy & SMACK_AUDIT_DENIED) == 0)
>>>>>                  return;
>>>>>
>>>>>
>>> Apparently, it's this patch:
>>>
>>>      db1d1e8b9867 IMA: use vfs_getattr_nosec to get the i_version
>> Yes, the syzbot was updated with that info.
>>
>>> At one time, IMA would reach directly into the inode to get the
>>> i_version and ctime. That was fine for certain filesystems, but with
>>> more recent changes it needs to go through ->getattr instead. Evidently,
>>> it's selecting the wrong inode to query when dealing with overlayfs and
>>> that's causing panics at times.
>>>
>>> As to why the above patch helps, I'm not sure, but given that it doesn't
>>> seem to change which inode is being queried via getattr, it seems like
>>> this is probably papering over the real bug. That said, IMA and
>>> overlayfs are not really in my wheelhouse, so I could be very wrong
>>> here.
>> The call to vfs_getattr_nosec() somehow triggers a call to
>> security_inode_getattr().  Without the call neither ovl_getattr() nor
>> smack_inode_getattr() would be called.
> ima_file_free()
> -> ima_check_last_writer()
>     -> vfs_getattr_nosec()
>        -> i_op->getattr() == ovl_getattr()
>           -> vfs_getattr()
>              -> security_inode_getattr()
> 	    -> real_i_op->getattr()
>
> is the callchain that triggers this.
>
> ima_file_free() is called in a very sensitive location: __fput() that
> can be called from task work when the process is already PF_EXITING.
>
> The ideal solution would be for ima to stop calling back into the
> filesystems in this location at all but that's probably not going to
> happen because I now realize you also set extended attributes from
> __fput():
>
>
> ima_check_last_writer()
> -> ima_update_xatt()
>     -> ima_fix_xattr()
>        -> __vfs_setxattr_noperm()
>
> The __vfs_setxattr_noperm() codepath can itself trigger
> security_inode_post_setxattr() and security_inode_setsecurity(). So
> those hooks are hopefully safe to be called with PF_EXITING tasks as
> well...

LSM inode_post_setxattr has two users, Smack and SELinux. Smack does not 
call smack_log/audit in this case. SELinux seems safe as well.

LSM inode_setsecurity has two users, Smack and SELinux. Smack does not 
call smack_log/audit in this case. SELinux seems safe as well.


>
> Imho, this is all very wild but I'm not judging.
>
> Two solutions imho:
> (1) teach stacking filesystems like overlayfs and ecryptfs to use
>      vfs_getattr_nosec() in their ->getattr() implementation when they
>      are themselves called via vfs_getattr_nosec(). This will fix this by
>      not triggering another LSM hook.
> (2) make all ->getattr() LSM hooks PF_EXITING safe ideally don't do
>      anything

Re (2): LSM's getattr should still make policy decision, right? So 
callers should be allowed to go to some depth into these functions but 
avoid calls deeper into smack_log (for example) that trigger the call path
   common_lsm_audit
    -> dump_common_audit_data
     -> audit_log_d_path via a->type LSM_AUDIT_DATA_PATH or 
LSM_AUDIT_DATA_FILE or  LSM_AUDIT_DATA_IOCTL_OP and then possibly run 
into current->fs == NULL in d_path.

To avoid audit_log_d_path being called the a->type should not be either 
one of the 3 mentioned above:

LSM_AUDIT_DATA_PATH: used by Smack & SELinux
LSM_AUDIT_DATA_FILE: used by SELinux
LSM_AUDIT_DATA_IOCTL: used by SELinux


The LSM getattr has users AppArmor, SELinux, Smack and Tomoyo.

Tomoyo: seems safe
SELinux:

selinux_inode_getattr
   -> path_has_perm  [ sets LSM_AUDIT_DATA_FILE !! ]
     -> inode_has_perm
      -> avc_has_perm
       -> avc_audit
         -> slow_avc_audit
           -> common_lsm_audit
            -> dump_common_audit_data
             -> audit_log_d_path
              -> d_path
               -> get_fs_root_rcu with current->fs = NULL

Smack: (the known path per this syzbot issue)

smack_inode_getattr   [ sets LSM_AUDIT_DATA_PATH !! ]
   -> smk_curacc
     -> smk_tskacc
       -> smack_log
        -> common_lsm_audit
          -> dump_common_audit_data
            -> audit_log_d_path
              -> d_path
               -> get_fs_root_rcu with current->fs = NULL

AppArmor:

apparmor_inode_getattr
   -> common_perm_cond
      -> common_perm
        -> aa_path_perm
           -> profile_path_perm
               -> aa_path_perm
                  -> aa_audit_file      [ sets LSM_AUDIT_DATA_TASK ]
                     -> aa_audit
                      -> aa_audit_msg
                        -> common_lsm_audit
                         -> dump_common_audit_data
                           -> DOES NOT call audit_log_d_path but calls 
task_tgid_nr(tsk)


So, SELinux and Smack would be affected. The common code path starts 
with common_lsm_audit and either a check for current->flags & PF_EXITING 
or a more fine-grained one like this here could be a solution for (2)

diff --git a/security/lsm_audit.c b/security/lsm_audit.c
index 849e832719e2..4f3570322851 100644
--- a/security/lsm_audit.c
+++ b/security/lsm_audit.c
@@ -445,6 +445,18 @@ void common_lsm_audit(struct common_audit_data *a,

         if (a == NULL)
                 return;
+
+       if (current->flags & PF_EXITING) {
+               /*
+                * Avoid running into audit_log_d_path -> d_path
+                * -> get_fs_root_rcu with current->fs = NULL
+                */
+               if (a->type == LSM_AUDIT_DATA_PATH ||
+                   a->type == LSM_AUDIT_DATA_FILE ||
+                   a->type == LSM_AUDIT_DATA_IOCTL_OP)
+                       return;
+       }
+
         /* we use GFP_ATOMIC so we won't sleep */
         ab = audit_log_start(audit_context(), GFP_ATOMIC | __GFP_NOWARN,
                              AUDIT_AVC);


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [syzbot] [integrity] [overlayfs] general protection fault in d_path
  2023-09-21 11:48                   ` Christian Brauner
  2023-09-21 14:29                     ` Stefan Berger
@ 2023-09-21 14:52                     ` Mimi Zohar
  2023-09-21 15:10                       ` Jeff Layton
  2023-09-28 22:03                     ` Stefan Berger
  2 siblings, 1 reply; 35+ messages in thread
From: Mimi Zohar @ 2023-09-21 14:52 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Jeff Layton, Casey Schaufler, Stefan Berger, syzbot, amir73il,
	linux-fsdevel, linux-integrity, linux-kernel, linux-unionfs,
	miklos, syzkaller-bugs

On Thu, 2023-09-21 at 13:48 +0200, Christian Brauner wrote:
> On Thu, Sep 21, 2023 at 07:24:23AM -0400, Mimi Zohar wrote:
> > On Thu, 2023-09-21 at 06:32 -0400, Jeff Layton wrote:
> > > On Wed, 2023-09-20 at 17:52 -0700, Casey Schaufler wrote:
> > > > On 9/20/2023 5:10 PM, Stefan Berger wrote:
> > > > > 
> > > > > On 9/20/23 18:09, Stefan Berger wrote:
> > > > > > 
> > > > > > On 9/20/23 17:16, Jeff Layton wrote:
> > > > > > > On Wed, 2023-09-20 at 16:37 -0400, Stefan Berger wrote:
> > > > > > > > On 9/20/23 13:01, Stefan Berger wrote:
> > > > > > > > > On 9/17/23 20:04, syzbot wrote:
> > > > > > > > > > syzbot has bisected this issue to:
> > > > > > > > > > 
> > > > > > > > > > commit db1d1e8b9867aae5c3e61ad7859abfcc4a6fd6c7
> > > > > > > > > > Author: Jeff Layton <jlayton@kernel.org>
> > > > > > > > > > Date:   Mon Apr 17 16:55:51 2023 +0000
> > > > > > > > > > 
> > > > > > > > > >       IMA: use vfs_getattr_nosec to get the i_version
> > > > > > > > > > 
> > > > > > > > > > bisection log:
> > > > > > > > > > https://syzkaller.appspot.com/x/bisect.txt?x=106f7e54680000
> > > > > > > > > > start commit:   a747acc0b752 Merge tag
> > > > > > > > > > 'linux-kselftest-next-6.6-rc2'
> > > > > > > > > > of g..
> > > > > > > > > > git tree:       upstream
> > > > > > > > > > final oops:
> > > > > > > > > > https://syzkaller.appspot.com/x/report.txt?x=126f7e54680000
> > > > > > > > > > console output:
> > > > > > > > > > https://syzkaller.appspot.com/x/log.txt?x=146f7e54680000
> > > > > > > > > > kernel config:
> > > > > > > > > > https://syzkaller.appspot.com/x/.config?x=df91a3034fe3f122
> > > > > > > > > > dashboard link:
> > > > > > > > > > https://syzkaller.appspot.com/bug?extid=a67fc5321ffb4b311c98
> > > > > > > > > > syz repro:
> > > > > > > > > > https://syzkaller.appspot.com/x/repro.syz?x=1671b694680000
> > > > > > > > > > C reproducer:
> > > > > > > > > > https://syzkaller.appspot.com/x/repro.c?x=14ec94d8680000
> > > > > > > > > > 
> > > > > > > > > > Reported-by: syzbot+a67fc5321ffb4b311c98@syzkaller.appspotmail.com
> > > > > > > > > > Fixes: db1d1e8b9867 ("IMA: use vfs_getattr_nosec to get the
> > > > > > > > > > i_version")
> > > > > > > > > > 
> > > > > > > > > > For information about bisection process see:
> > > > > > > > > > https://goo.gl/tpsmEJ#bisection
> > > > > > > > > The final oops shows this here:
> > > > > > > > > 
> > > > > > > > > BUG: kernel NULL pointer dereference, address: 0000000000000058
> > > > > > > > > #PF: supervisor read access in kernel mode
> > > > > > > > > #PF: error_code(0x0000) - not-present page
> > > > > > > > > PGD 0 P4D 0
> > > > > > > > > Oops: 0000 [#1] PREEMPT SMP
> > > > > > > > > CPU: 0 PID: 3192 Comm: syz-executor.0 Not tainted
> > > > > > > > > 6.4.0-rc2-syzkaller #0
> > > > > > > > > Hardware name: Google Google Compute Engine/Google Compute Engine,
> > > > > > > > > BIOS Google 08/04/2023
> > > > > > > > > RIP: 0010:__lock_acquire+0x35/0x490 kernel/locking/lockdep.c:4946
> > > > > > > > > Code: 83 ec 18 65 4c 8b 35 aa 60 f4 7e 83 3d b7 11 e4 02 00 0f 84 05
> > > > > > > > > 02 00 00 4c 89 cb 89 cd 41 89 d5 49 89 ff 83 fe 01 77 0c 89 f0
> > > > > > > > > <49> 8b
> > > > > > > > > 44 c7 08 48 85 c0 75 1b 4c 89 ff 31 d2 45 89 c4 e8 74 f6 ff
> > > > > > > > > RSP: 0018:ffffc90002edb840 EFLAGS: 00010097
> > > > > > > > > RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
> > > > > > > > > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000050
> > > > > > > > > RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000000000000
> > > > > > > > > R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> > > > > > > > > R13: 0000000000000000 R14: ffff888102ea5340 R15: 0000000000000050
> > > > > > > > > FS:  0000000000000000(0000) GS:ffff88813bc00000(0000)
> > > > > > > > > knlGS:0000000000000000
> > > > > > > > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > > > > CR2: 0000000000000058 CR3: 0000000003aa8000 CR4: 00000000003506f0
> > > > > > > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > > > > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > > > > > > Call Trace:
> > > > > > > > >   <TASK>
> > > > > > > > >   lock_acquire+0xd8/0x1f0 kernel/locking/lockdep.c:5691
> > > > > > > > >   seqcount_lockdep_reader_access include/linux/seqlock.h:102 [inline]
> > > > > > > > >   get_fs_root_rcu fs/d_path.c:243 [inline]
> > > > > > > > >   d_path+0xd1/0x1f0 fs/d_path.c:285
> > > > > > > > >   audit_log_d_path+0x65/0x130 kernel/audit.c:2139
> > > > > > > > >   dump_common_audit_data security/lsm_audit.c:224 [inline]
> > > > > > > > >   common_lsm_audit+0x3b3/0x840 security/lsm_audit.c:458
> > > > > > > > >   smack_log+0xad/0x130 security/smack/smack_access.c:383
> > > > > > > > >   smk_tskacc+0xb1/0xd0 security/smack/smack_access.c:253
> > > > > > > > >   smack_inode_getattr+0x8a/0xb0 security/smack/smack_lsm.c:1187
> > > > > > > > >   security_inode_getattr+0x32/0x50 security/security.c:2114
> > > > > > > > >   vfs_getattr+0x1b/0x40 fs/stat.c:167
> > > > > > > > >   ovl_getattr+0xa6/0x3e0 fs/overlayfs/inode.c:173
> > > > > > > > >   ima_check_last_writer security/integrity/ima/ima_main.c:171
> > > > > > > > > [inline]
> > > > > > > > >   ima_file_free+0xbd/0x130 security/integrity/ima/ima_main.c:203
> > > > > > > > >   __fput+0xc7/0x220 fs/file_table.c:315
> > > > > > > > >   task_work_run+0x7d/0xa0 kernel/task_work.c:179
> > > > > > > > >   exit_task_work include/linux/task_work.h:38 [inline]
> > > > > > > > >   do_exit+0x2c7/0xa80 kernel/exit.c:871 <-----------------------
> > > > > > > > >   do_group_exit+0x85/0xa0 kernel/exit.c:1021
> > > > > > > > >   get_signal+0x73c/0x7f0 kernel/signal.c:2874
> > > > > > > > >   arch_do_signal_or_restart+0x89/0x290 arch/x86/kernel/signal.c:306
> > > > > > > > >   exit_to_user_mode_loop+0x61/0xb0 kernel/entry/common.c:168
> > > > > > > > >   exit_to_user_mode_prepare+0x64/0xb0 kernel/entry/common.c:204
> > > > > > > > >   __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
> > > > > > > > >   syscall_exit_to_user_mode+0x2b/0x1d0 kernel/entry/common.c:297
> > > > > > > > >   do_syscall_64+0x4d/0x90 arch/x86/entry/common.c:86
> > > > > > > > >   entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > do_exit has called exit_fs(tsk) [
> > > > > > > > > https://elixir.bootlin.com/linux/v6.4-rc2/source/kernel/exit.c#L867 ]
> > > > > > > > > 
> > > > > > > > > exit_fs(tsk) has set tsk->fs = NULL [
> > > > > > > > > https://elixir.bootlin.com/linux/v6.4-rc2/source/fs/fs_struct.c#L103
> > > > > > > > > ]
> > > > > > > > > 
> > > > > > > > > I think this then bites in d_path() where it calls:
> > > > > > > > > 
> > > > > > > > >      get_fs_root_rcu(current->fs, &root);   [
> > > > > > > > > https://elixir.bootlin.com/linux/v6.4-rc2/source/fs/d_path.c#L285 ]
> > > > > > > > > 
> > > > > > > > > current->fs is likely NULL here.
> > > > > > > > > 
> > > > > > > > > If this was correct it would have nothing to do with the actual
> > > > > > > > > patch,
> > > > > > > > > though, but rather with the fact that smack logs on process
> > > > > > > > > termination. I am not sure what the solution would be other than
> > > > > > > > > testing for current->fs == NULL in d_path before using it and
> > > > > > > > > returning an error that is not normally returned or trying to
> > > > > > > > > intercept this case in smack.
> > > > > > > > I have now been able to recreate the syzbot issue with the test
> > > > > > > > program
> > > > > > > > and the issue is exactly the one described here, current->fs == NULL.
> > > > > > > > 
> > > > > > > Earlier in this thread, Amir had a diagnosis that IMA is
> > > > > > > inappropriately
> > > > > > > trying to use f_path directly instead of using the helpers that are
> > > > > > > friendly for stacking filesystems.
> > > > > > > 
> > > > > > > https://lore.kernel.org/linux-fsdevel/CAOQ4uxgjnYyeQL-LbS5kQ7+C0d6sjzKqMDWAtZW8cAkPaed6=Q@mail.gmail.com/
> > > > > > > 
> > > > > > > 
> > > > > > > I'm not an IMA hacker so I'm not planning to roll a fix here. Perhaps
> > > > > > > someone on the IMA team could try this approach?
> > > > > > 
> > > > > > 
> > > > > > I have applied this patch here from Amir now and it does NOT resolve
> > > > > > the issue:
> > > > > > 
> > > > > > https://lore.kernel.org/linux-integrity/296dae962a2a488bde682d3def074db91686e1c3.camel@linux.ibm.com/T/#m4ebdb780bf6952e7f210c55e87950d0cfa1d5eb0
> > > > > > 
> > > > > > 
> > > > > 
> > > > > This seems to resolve the issue:
> > > > > 
> > > > > diff --git a/security/smack/smack_access.c
> > > > > b/security/smack/smack_access.c
> > > > > index 585e5e35710b..57afcea1e39b 100644
> > > > > --- a/security/smack/smack_access.c
> > > > > +++ b/security/smack/smack_access.c
> > > > > @@ -347,6 +347,9 @@ void smack_log(char *subject_label, char
> > > > > *object_label, int request,
> > > > >         struct smack_audit_data *sad;
> > > > >         struct common_audit_data *a = &ad->a;
> > > > > 
> > > > > +       if (current->flags & PF_EXITING)
> > > > > +               return;
> > > > > +
> > > > 
> > > > Based on what I see here I can understand that this prevents the panic,
> > > > but it isn't so clear what changed that introduced the problem.
> > > > 
> > > > >         /* check if we have to log the current event */
> > > > >         if (result < 0 && (log_policy & SMACK_AUDIT_DENIED) == 0)
> > > > >                 return;
> > > > > 
> > > > > 
> > > 
> > > Apparently, it's this patch:
> > > 
> > >     db1d1e8b9867 IMA: use vfs_getattr_nosec to get the i_version
> > 
> > Yes, the syzbot was updated with that info.
> > 
> > > At one time, IMA would reach directly into the inode to get the
> > > i_version and ctime. That was fine for certain filesystems, but with
> > > more recent changes it needs to go through ->getattr instead. Evidently,
> > > it's selecting the wrong inode to query when dealing with overlayfs and
> > > that's causing panics at times.
> > > 
> > > As to why the above patch helps, I'm not sure, but given that it doesn't
> > > seem to change which inode is being queried via getattr, it seems like
> > > this is probably papering over the real bug. That said, IMA and
> > > overlayfs are not really in my wheelhouse, so I could be very wrong
> > > here.
> > 
> > The call to vfs_getattr_nosec() somehow triggers a call to
> > security_inode_getattr().  Without the call neither ovl_getattr() nor
> > smack_inode_getattr() would be called.
> 
> ima_file_free()
> -> ima_check_last_writer()
>    -> vfs_getattr_nosec()
>       -> i_op->getattr() == ovl_getattr()
>          -> vfs_getattr()
>             -> security_inode_getattr()
> 	    -> real_i_op->getattr()
> 
> is the callchain that triggers this.

Thank you for the explanation as to why ovl_getattr() and subsequently
smack_inode_getattr() is being called.

> 
> ima_file_free() is called in a very sensitive location: __fput() that
> can be called from task work when the process is already PF_EXITING.
> 
> The ideal solution would be for ima to stop calling back into the
> filesystems in this location at all but that's probably not going to
> happen because I now realize you also set extended attributes from
> __fput():
> 
> 
> ima_check_last_writer()
> -> ima_update_xatt()
>    -> ima_fix_xattr()
>       -> __vfs_setxattr_noperm()
> 
> The __vfs_setxattr_noperm() codepath can itself trigger
> security_inode_post_setxattr() and security_inode_setsecurity(). So
> those hooks are hopefully safe to be called with PF_EXITING tasks as
> well...
> 
> Imho, this is all very wild but I'm not judging.

Measuring and verifying immutable files is straight forward. 
Measuring, verifiying, and updating mutable file hashes is a lot more
complicated.  Re-calculating the file hash everytime the file changes
would impact performance.  The file hash is currently updated as the
last writer closes the file (__fput).  One of the reasons for the wq
was for IMA to safely calculate the file hash and and take the i_mutex
to write the xattr.

IMA support for mutable files makes IMA a lot more complicated.  Any
improvement suggestions would be appreciated.

> 
> Two solutions imho:
> (1) teach stacking filesystems like overlayfs and ecryptfs to use
>     vfs_getattr_nosec() in their ->getattr() implementation when they
>     are themselves called via vfs_getattr_nosec(). This will fix this by
>     not triggering another LSM hook.
> (2) make all ->getattr() LSM hooks PF_EXITING safe ideally don't do
>     anything

The original problem was detecting i_version change on overlayfs.

Amir's proposed patch might resolve it without commit db1d1e8b9867
("IMA: use vfs_getattr_nosec to get the i_version").  However, as Amir
said, it does not address the new problem introduced by it.   Assuming
Amir's proposed patch resolves the original problem, an alternative
solution would be to revert commit db1d1e8b9867.

Mimi


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [syzbot] [integrity] [overlayfs] general protection fault in d_path
  2023-09-21 14:52                     ` Mimi Zohar
@ 2023-09-21 15:10                       ` Jeff Layton
  2023-09-21 15:18                         ` Amir Goldstein
  2023-09-21 15:19                         ` Mimi Zohar
  0 siblings, 2 replies; 35+ messages in thread
From: Jeff Layton @ 2023-09-21 15:10 UTC (permalink / raw)
  To: Mimi Zohar, Christian Brauner
  Cc: Casey Schaufler, Stefan Berger, syzbot, amir73il, linux-fsdevel,
	linux-integrity, linux-kernel, linux-unionfs, miklos,
	syzkaller-bugs

On Thu, 2023-09-21 at 10:52 -0400, Mimi Zohar wrote:
> On Thu, 2023-09-21 at 13:48 +0200, Christian Brauner wrote:
> > On Thu, Sep 21, 2023 at 07:24:23AM -0400, Mimi Zohar wrote:
> > > On Thu, 2023-09-21 at 06:32 -0400, Jeff Layton wrote:
> > > > On Wed, 2023-09-20 at 17:52 -0700, Casey Schaufler wrote:
> > > > > On 9/20/2023 5:10 PM, Stefan Berger wrote:
> > > > > > 
> > > > > > On 9/20/23 18:09, Stefan Berger wrote:
> > > > > > > 
> > > > > > > On 9/20/23 17:16, Jeff Layton wrote:
> > > > > > > > On Wed, 2023-09-20 at 16:37 -0400, Stefan Berger wrote:
> > > > > > > > > On 9/20/23 13:01, Stefan Berger wrote:
> > > > > > > > > > On 9/17/23 20:04, syzbot wrote:
> > > > > > > > > > > syzbot has bisected this issue to:
> > > > > > > > > > > 
> > > > > > > > > > > commit db1d1e8b9867aae5c3e61ad7859abfcc4a6fd6c7
> > > > > > > > > > > Author: Jeff Layton <jlayton@kernel.org>
> > > > > > > > > > > Date:   Mon Apr 17 16:55:51 2023 +0000
> > > > > > > > > > > 
> > > > > > > > > > >       IMA: use vfs_getattr_nosec to get the i_version
> > > > > > > > > > > 
> > > > > > > > > > > bisection log:
> > > > > > > > > > > https://syzkaller.appspot.com/x/bisect.txt?x=106f7e54680000
> > > > > > > > > > > start commit:   a747acc0b752 Merge tag
> > > > > > > > > > > 'linux-kselftest-next-6.6-rc2'
> > > > > > > > > > > of g..
> > > > > > > > > > > git tree:       upstream
> > > > > > > > > > > final oops:
> > > > > > > > > > > https://syzkaller.appspot.com/x/report.txt?x=126f7e54680000
> > > > > > > > > > > console output:
> > > > > > > > > > > https://syzkaller.appspot.com/x/log.txt?x=146f7e54680000
> > > > > > > > > > > kernel config:
> > > > > > > > > > > https://syzkaller.appspot.com/x/.config?x=df91a3034fe3f122
> > > > > > > > > > > dashboard link:
> > > > > > > > > > > https://syzkaller.appspot.com/bug?extid=a67fc5321ffb4b311c98
> > > > > > > > > > > syz repro:
> > > > > > > > > > > https://syzkaller.appspot.com/x/repro.syz?x=1671b694680000
> > > > > > > > > > > C reproducer:
> > > > > > > > > > > https://syzkaller.appspot.com/x/repro.c?x=14ec94d8680000
> > > > > > > > > > > 
> > > > > > > > > > > Reported-by: syzbot+a67fc5321ffb4b311c98@syzkaller.appspotmail.com
> > > > > > > > > > > Fixes: db1d1e8b9867 ("IMA: use vfs_getattr_nosec to get the
> > > > > > > > > > > i_version")
> > > > > > > > > > > 
> > > > > > > > > > > For information about bisection process see:
> > > > > > > > > > > https://goo.gl/tpsmEJ#bisection
> > > > > > > > > > The final oops shows this here:
> > > > > > > > > > 
> > > > > > > > > > BUG: kernel NULL pointer dereference, address: 0000000000000058
> > > > > > > > > > #PF: supervisor read access in kernel mode
> > > > > > > > > > #PF: error_code(0x0000) - not-present page
> > > > > > > > > > PGD 0 P4D 0
> > > > > > > > > > Oops: 0000 [#1] PREEMPT SMP
> > > > > > > > > > CPU: 0 PID: 3192 Comm: syz-executor.0 Not tainted
> > > > > > > > > > 6.4.0-rc2-syzkaller #0
> > > > > > > > > > Hardware name: Google Google Compute Engine/Google Compute Engine,
> > > > > > > > > > BIOS Google 08/04/2023
> > > > > > > > > > RIP: 0010:__lock_acquire+0x35/0x490 kernel/locking/lockdep.c:4946
> > > > > > > > > > Code: 83 ec 18 65 4c 8b 35 aa 60 f4 7e 83 3d b7 11 e4 02 00 0f 84 05
> > > > > > > > > > 02 00 00 4c 89 cb 89 cd 41 89 d5 49 89 ff 83 fe 01 77 0c 89 f0
> > > > > > > > > > <49> 8b
> > > > > > > > > > 44 c7 08 48 85 c0 75 1b 4c 89 ff 31 d2 45 89 c4 e8 74 f6 ff
> > > > > > > > > > RSP: 0018:ffffc90002edb840 EFLAGS: 00010097
> > > > > > > > > > RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
> > > > > > > > > > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000050
> > > > > > > > > > RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000000000000
> > > > > > > > > > R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> > > > > > > > > > R13: 0000000000000000 R14: ffff888102ea5340 R15: 0000000000000050
> > > > > > > > > > FS:  0000000000000000(0000) GS:ffff88813bc00000(0000)
> > > > > > > > > > knlGS:0000000000000000
> > > > > > > > > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > > > > > CR2: 0000000000000058 CR3: 0000000003aa8000 CR4: 00000000003506f0
> > > > > > > > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > > > > > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > > > > > > > Call Trace:
> > > > > > > > > >   <TASK>
> > > > > > > > > >   lock_acquire+0xd8/0x1f0 kernel/locking/lockdep.c:5691
> > > > > > > > > >   seqcount_lockdep_reader_access include/linux/seqlock.h:102 [inline]
> > > > > > > > > >   get_fs_root_rcu fs/d_path.c:243 [inline]
> > > > > > > > > >   d_path+0xd1/0x1f0 fs/d_path.c:285
> > > > > > > > > >   audit_log_d_path+0x65/0x130 kernel/audit.c:2139
> > > > > > > > > >   dump_common_audit_data security/lsm_audit.c:224 [inline]
> > > > > > > > > >   common_lsm_audit+0x3b3/0x840 security/lsm_audit.c:458
> > > > > > > > > >   smack_log+0xad/0x130 security/smack/smack_access.c:383
> > > > > > > > > >   smk_tskacc+0xb1/0xd0 security/smack/smack_access.c:253
> > > > > > > > > >   smack_inode_getattr+0x8a/0xb0 security/smack/smack_lsm.c:1187
> > > > > > > > > >   security_inode_getattr+0x32/0x50 security/security.c:2114
> > > > > > > > > >   vfs_getattr+0x1b/0x40 fs/stat.c:167
> > > > > > > > > >   ovl_getattr+0xa6/0x3e0 fs/overlayfs/inode.c:173
> > > > > > > > > >   ima_check_last_writer security/integrity/ima/ima_main.c:171
> > > > > > > > > > [inline]
> > > > > > > > > >   ima_file_free+0xbd/0x130 security/integrity/ima/ima_main.c:203
> > > > > > > > > >   __fput+0xc7/0x220 fs/file_table.c:315
> > > > > > > > > >   task_work_run+0x7d/0xa0 kernel/task_work.c:179
> > > > > > > > > >   exit_task_work include/linux/task_work.h:38 [inline]
> > > > > > > > > >   do_exit+0x2c7/0xa80 kernel/exit.c:871 <-----------------------
> > > > > > > > > >   do_group_exit+0x85/0xa0 kernel/exit.c:1021
> > > > > > > > > >   get_signal+0x73c/0x7f0 kernel/signal.c:2874
> > > > > > > > > >   arch_do_signal_or_restart+0x89/0x290 arch/x86/kernel/signal.c:306
> > > > > > > > > >   exit_to_user_mode_loop+0x61/0xb0 kernel/entry/common.c:168
> > > > > > > > > >   exit_to_user_mode_prepare+0x64/0xb0 kernel/entry/common.c:204
> > > > > > > > > >   __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
> > > > > > > > > >   syscall_exit_to_user_mode+0x2b/0x1d0 kernel/entry/common.c:297
> > > > > > > > > >   do_syscall_64+0x4d/0x90 arch/x86/entry/common.c:86
> > > > > > > > > >   entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > do_exit has called exit_fs(tsk) [
> > > > > > > > > > https://elixir.bootlin.com/linux/v6.4-rc2/source/kernel/exit.c#L867 ]
> > > > > > > > > > 
> > > > > > > > > > exit_fs(tsk) has set tsk->fs = NULL [
> > > > > > > > > > https://elixir.bootlin.com/linux/v6.4-rc2/source/fs/fs_struct.c#L103
> > > > > > > > > > ]
> > > > > > > > > > 
> > > > > > > > > > I think this then bites in d_path() where it calls:
> > > > > > > > > > 
> > > > > > > > > >      get_fs_root_rcu(current->fs, &root);   [
> > > > > > > > > > https://elixir.bootlin.com/linux/v6.4-rc2/source/fs/d_path.c#L285 ]
> > > > > > > > > > 
> > > > > > > > > > current->fs is likely NULL here.
> > > > > > > > > > 
> > > > > > > > > > If this was correct it would have nothing to do with the actual
> > > > > > > > > > patch,
> > > > > > > > > > though, but rather with the fact that smack logs on process
> > > > > > > > > > termination. I am not sure what the solution would be other than
> > > > > > > > > > testing for current->fs == NULL in d_path before using it and
> > > > > > > > > > returning an error that is not normally returned or trying to
> > > > > > > > > > intercept this case in smack.
> > > > > > > > > I have now been able to recreate the syzbot issue with the test
> > > > > > > > > program
> > > > > > > > > and the issue is exactly the one described here, current->fs == NULL.
> > > > > > > > > 
> > > > > > > > Earlier in this thread, Amir had a diagnosis that IMA is
> > > > > > > > inappropriately
> > > > > > > > trying to use f_path directly instead of using the helpers that are
> > > > > > > > friendly for stacking filesystems.
> > > > > > > > 
> > > > > > > > https://lore.kernel.org/linux-fsdevel/CAOQ4uxgjnYyeQL-LbS5kQ7+C0d6sjzKqMDWAtZW8cAkPaed6=Q@mail.gmail.com/
> > > > > > > > 
> > > > > > > > 
> > > > > > > > I'm not an IMA hacker so I'm not planning to roll a fix here. Perhaps
> > > > > > > > someone on the IMA team could try this approach?
> > > > > > > 
> > > > > > > 
> > > > > > > I have applied this patch here from Amir now and it does NOT resolve
> > > > > > > the issue:
> > > > > > > 
> > > > > > > https://lore.kernel.org/linux-integrity/296dae962a2a488bde682d3def074db91686e1c3.camel@linux.ibm.com/T/#m4ebdb780bf6952e7f210c55e87950d0cfa1d5eb0
> > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > > This seems to resolve the issue:
> > > > > > 
> > > > > > diff --git a/security/smack/smack_access.c
> > > > > > b/security/smack/smack_access.c
> > > > > > index 585e5e35710b..57afcea1e39b 100644
> > > > > > --- a/security/smack/smack_access.c
> > > > > > +++ b/security/smack/smack_access.c
> > > > > > @@ -347,6 +347,9 @@ void smack_log(char *subject_label, char
> > > > > > *object_label, int request,
> > > > > >         struct smack_audit_data *sad;
> > > > > >         struct common_audit_data *a = &ad->a;
> > > > > > 
> > > > > > +       if (current->flags & PF_EXITING)
> > > > > > +               return;
> > > > > > +
> > > > > 
> > > > > Based on what I see here I can understand that this prevents the panic,
> > > > > but it isn't so clear what changed that introduced the problem.
> > > > > 
> > > > > >         /* check if we have to log the current event */
> > > > > >         if (result < 0 && (log_policy & SMACK_AUDIT_DENIED) == 0)
> > > > > >                 return;
> > > > > > 
> > > > > > 
> > > > 
> > > > Apparently, it's this patch:
> > > > 
> > > >     db1d1e8b9867 IMA: use vfs_getattr_nosec to get the i_version
> > > 
> > > Yes, the syzbot was updated with that info.
> > > 
> > > > At one time, IMA would reach directly into the inode to get the
> > > > i_version and ctime. That was fine for certain filesystems, but with
> > > > more recent changes it needs to go through ->getattr instead. Evidently,
> > > > it's selecting the wrong inode to query when dealing with overlayfs and
> > > > that's causing panics at times.
> > > > 
> > > > As to why the above patch helps, I'm not sure, but given that it doesn't
> > > > seem to change which inode is being queried via getattr, it seems like
> > > > this is probably papering over the real bug. That said, IMA and
> > > > overlayfs are not really in my wheelhouse, so I could be very wrong
> > > > here.
> > > 
> > > The call to vfs_getattr_nosec() somehow triggers a call to
> > > security_inode_getattr().  Without the call neither ovl_getattr() nor
> > > smack_inode_getattr() would be called.
> > 
> > ima_file_free()
> > -> ima_check_last_writer()
> >    -> vfs_getattr_nosec()
> >       -> i_op->getattr() == ovl_getattr()
> >          -> vfs_getattr()
> >             -> security_inode_getattr()
> > 	    -> real_i_op->getattr()
> > 
> > is the callchain that triggers this.
> 
> Thank you for the explanation as to why ovl_getattr() and subsequently
> smack_inode_getattr() is being called.
> 
> > 
> > ima_file_free() is called in a very sensitive location: __fput() that
> > can be called from task work when the process is already PF_EXITING.
> > 
> > The ideal solution would be for ima to stop calling back into the
> > filesystems in this location at all but that's probably not going to
> > happen because I now realize you also set extended attributes from
> > __fput():
> > 
> > 
> > ima_check_last_writer()
> > -> ima_update_xatt()
> >    -> ima_fix_xattr()
> >       -> __vfs_setxattr_noperm()
> > 
> > The __vfs_setxattr_noperm() codepath can itself trigger
> > security_inode_post_setxattr() and security_inode_setsecurity(). So
> > those hooks are hopefully safe to be called with PF_EXITING tasks as
> > well...
> > 
> > Imho, this is all very wild but I'm not judging.
> 
> Measuring and verifying immutable files is straight forward. 
> Measuring, verifiying, and updating mutable file hashes is a lot more
> complicated.  Re-calculating the file hash everytime the file changes
> would impact performance.  The file hash is currently updated as the
> last writer closes the file (__fput).  One of the reasons for the wq
> was for IMA to safely calculate the file hash and and take the i_mutex
> to write the xattr.
> 
> IMA support for mutable files makes IMA a lot more complicated.  Any
> improvement suggestions would be appreciated.
> 
> > 
> > Two solutions imho:
> > (1) teach stacking filesystems like overlayfs and ecryptfs to use
> >     vfs_getattr_nosec() in their ->getattr() implementation when they
> >     are themselves called via vfs_getattr_nosec(). This will fix this by
> >     not triggering another LSM hook.
> > (2) make all ->getattr() LSM hooks PF_EXITING safe ideally don't do
> >     anything
> 
> The original problem was detecting i_version change on overlayfs.
> 
> Amir's proposed patch might resolve it without commit db1d1e8b9867
> ("IMA: use vfs_getattr_nosec to get the i_version").  However, as Amir
> said, it does not address the new problem introduced by it.   Assuming
> Amir's proposed patch resolves the original problem, an alternative
> solution would be to revert commit db1d1e8b9867.
> 

If you're going to revert that commit, then I'm wondering what you
intend to do instead. Reaching directly into the inode to get this
information is really no bueno.
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [syzbot] [integrity] [overlayfs] general protection fault in d_path
  2023-09-21 15:10                       ` Jeff Layton
@ 2023-09-21 15:18                         ` Amir Goldstein
  2023-09-21 15:19                         ` Mimi Zohar
  1 sibling, 0 replies; 35+ messages in thread
From: Amir Goldstein @ 2023-09-21 15:18 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Mimi Zohar, Christian Brauner, Casey Schaufler, Stefan Berger,
	syzbot, linux-fsdevel, linux-integrity, linux-kernel,
	linux-unionfs, miklos, syzkaller-bugs

On Thu, Sep 21, 2023 at 6:10 PM Jeff Layton <jlayton@kernel.org> wrote:
>
> On Thu, 2023-09-21 at 10:52 -0400, Mimi Zohar wrote:
> > On Thu, 2023-09-21 at 13:48 +0200, Christian Brauner wrote:
> > > On Thu, Sep 21, 2023 at 07:24:23AM -0400, Mimi Zohar wrote:
> > > > On Thu, 2023-09-21 at 06:32 -0400, Jeff Layton wrote:
> > > > > On Wed, 2023-09-20 at 17:52 -0700, Casey Schaufler wrote:
> > > > > > On 9/20/2023 5:10 PM, Stefan Berger wrote:
> > > > > > >
> > > > > > > On 9/20/23 18:09, Stefan Berger wrote:
> > > > > > > >
> > > > > > > > On 9/20/23 17:16, Jeff Layton wrote:
> > > > > > > > > On Wed, 2023-09-20 at 16:37 -0400, Stefan Berger wrote:
> > > > > > > > > > On 9/20/23 13:01, Stefan Berger wrote:
> > > > > > > > > > > On 9/17/23 20:04, syzbot wrote:
> > > > > > > > > > > > syzbot has bisected this issue to:
> > > > > > > > > > > >
> > > > > > > > > > > > commit db1d1e8b9867aae5c3e61ad7859abfcc4a6fd6c7
> > > > > > > > > > > > Author: Jeff Layton <jlayton@kernel.org>
> > > > > > > > > > > > Date:   Mon Apr 17 16:55:51 2023 +0000
> > > > > > > > > > > >
> > > > > > > > > > > >       IMA: use vfs_getattr_nosec to get the i_version
> > > > > > > > > > > >
> > > > > > > > > > > > bisection log:
> > > > > > > > > > > > https://syzkaller.appspot.com/x/bisect.txt?x=106f7e54680000
> > > > > > > > > > > > start commit:   a747acc0b752 Merge tag
> > > > > > > > > > > > 'linux-kselftest-next-6.6-rc2'
> > > > > > > > > > > > of g..
> > > > > > > > > > > > git tree:       upstream
> > > > > > > > > > > > final oops:
> > > > > > > > > > > > https://syzkaller.appspot.com/x/report.txt?x=126f7e54680000
> > > > > > > > > > > > console output:
> > > > > > > > > > > > https://syzkaller.appspot.com/x/log.txt?x=146f7e54680000
> > > > > > > > > > > > kernel config:
> > > > > > > > > > > > https://syzkaller.appspot.com/x/.config?x=df91a3034fe3f122
> > > > > > > > > > > > dashboard link:
> > > > > > > > > > > > https://syzkaller.appspot.com/bug?extid=a67fc5321ffb4b311c98
> > > > > > > > > > > > syz repro:
> > > > > > > > > > > > https://syzkaller.appspot.com/x/repro.syz?x=1671b694680000
> > > > > > > > > > > > C reproducer:
> > > > > > > > > > > > https://syzkaller.appspot.com/x/repro.c?x=14ec94d8680000
> > > > > > > > > > > >
> > > > > > > > > > > > Reported-by: syzbot+a67fc5321ffb4b311c98@syzkaller.appspotmail.com
> > > > > > > > > > > > Fixes: db1d1e8b9867 ("IMA: use vfs_getattr_nosec to get the
> > > > > > > > > > > > i_version")
> > > > > > > > > > > >
> > > > > > > > > > > > For information about bisection process see:
> > > > > > > > > > > > https://goo.gl/tpsmEJ#bisection
> > > > > > > > > > > The final oops shows this here:
> > > > > > > > > > >
> > > > > > > > > > > BUG: kernel NULL pointer dereference, address: 0000000000000058
> > > > > > > > > > > #PF: supervisor read access in kernel mode
> > > > > > > > > > > #PF: error_code(0x0000) - not-present page
> > > > > > > > > > > PGD 0 P4D 0
> > > > > > > > > > > Oops: 0000 [#1] PREEMPT SMP
> > > > > > > > > > > CPU: 0 PID: 3192 Comm: syz-executor.0 Not tainted
> > > > > > > > > > > 6.4.0-rc2-syzkaller #0
> > > > > > > > > > > Hardware name: Google Google Compute Engine/Google Compute Engine,
> > > > > > > > > > > BIOS Google 08/04/2023
> > > > > > > > > > > RIP: 0010:__lock_acquire+0x35/0x490 kernel/locking/lockdep.c:4946
> > > > > > > > > > > Code: 83 ec 18 65 4c 8b 35 aa 60 f4 7e 83 3d b7 11 e4 02 00 0f 84 05
> > > > > > > > > > > 02 00 00 4c 89 cb 89 cd 41 89 d5 49 89 ff 83 fe 01 77 0c 89 f0
> > > > > > > > > > > <49> 8b
> > > > > > > > > > > 44 c7 08 48 85 c0 75 1b 4c 89 ff 31 d2 45 89 c4 e8 74 f6 ff
> > > > > > > > > > > RSP: 0018:ffffc90002edb840 EFLAGS: 00010097
> > > > > > > > > > > RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
> > > > > > > > > > > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000050
> > > > > > > > > > > RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000000000000
> > > > > > > > > > > R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> > > > > > > > > > > R13: 0000000000000000 R14: ffff888102ea5340 R15: 0000000000000050
> > > > > > > > > > > FS:  0000000000000000(0000) GS:ffff88813bc00000(0000)
> > > > > > > > > > > knlGS:0000000000000000
> > > > > > > > > > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > > > > > > CR2: 0000000000000058 CR3: 0000000003aa8000 CR4: 00000000003506f0
> > > > > > > > > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > > > > > > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > > > > > > > > Call Trace:
> > > > > > > > > > >   <TASK>
> > > > > > > > > > >   lock_acquire+0xd8/0x1f0 kernel/locking/lockdep.c:5691
> > > > > > > > > > >   seqcount_lockdep_reader_access include/linux/seqlock.h:102 [inline]
> > > > > > > > > > >   get_fs_root_rcu fs/d_path.c:243 [inline]
> > > > > > > > > > >   d_path+0xd1/0x1f0 fs/d_path.c:285
> > > > > > > > > > >   audit_log_d_path+0x65/0x130 kernel/audit.c:2139
> > > > > > > > > > >   dump_common_audit_data security/lsm_audit.c:224 [inline]
> > > > > > > > > > >   common_lsm_audit+0x3b3/0x840 security/lsm_audit.c:458
> > > > > > > > > > >   smack_log+0xad/0x130 security/smack/smack_access.c:383
> > > > > > > > > > >   smk_tskacc+0xb1/0xd0 security/smack/smack_access.c:253
> > > > > > > > > > >   smack_inode_getattr+0x8a/0xb0 security/smack/smack_lsm.c:1187
> > > > > > > > > > >   security_inode_getattr+0x32/0x50 security/security.c:2114
> > > > > > > > > > >   vfs_getattr+0x1b/0x40 fs/stat.c:167
> > > > > > > > > > >   ovl_getattr+0xa6/0x3e0 fs/overlayfs/inode.c:173
> > > > > > > > > > >   ima_check_last_writer security/integrity/ima/ima_main.c:171
> > > > > > > > > > > [inline]
> > > > > > > > > > >   ima_file_free+0xbd/0x130 security/integrity/ima/ima_main.c:203
> > > > > > > > > > >   __fput+0xc7/0x220 fs/file_table.c:315
> > > > > > > > > > >   task_work_run+0x7d/0xa0 kernel/task_work.c:179
> > > > > > > > > > >   exit_task_work include/linux/task_work.h:38 [inline]
> > > > > > > > > > >   do_exit+0x2c7/0xa80 kernel/exit.c:871 <-----------------------
> > > > > > > > > > >   do_group_exit+0x85/0xa0 kernel/exit.c:1021
> > > > > > > > > > >   get_signal+0x73c/0x7f0 kernel/signal.c:2874
> > > > > > > > > > >   arch_do_signal_or_restart+0x89/0x290 arch/x86/kernel/signal.c:306
> > > > > > > > > > >   exit_to_user_mode_loop+0x61/0xb0 kernel/entry/common.c:168
> > > > > > > > > > >   exit_to_user_mode_prepare+0x64/0xb0 kernel/entry/common.c:204
> > > > > > > > > > >   __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
> > > > > > > > > > >   syscall_exit_to_user_mode+0x2b/0x1d0 kernel/entry/common.c:297
> > > > > > > > > > >   do_syscall_64+0x4d/0x90 arch/x86/entry/common.c:86
> > > > > > > > > > >   entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > do_exit has called exit_fs(tsk) [
> > > > > > > > > > > https://elixir.bootlin.com/linux/v6.4-rc2/source/kernel/exit.c#L867 ]
> > > > > > > > > > >
> > > > > > > > > > > exit_fs(tsk) has set tsk->fs = NULL [
> > > > > > > > > > > https://elixir.bootlin.com/linux/v6.4-rc2/source/fs/fs_struct.c#L103
> > > > > > > > > > > ]
> > > > > > > > > > >
> > > > > > > > > > > I think this then bites in d_path() where it calls:
> > > > > > > > > > >
> > > > > > > > > > >      get_fs_root_rcu(current->fs, &root);   [
> > > > > > > > > > > https://elixir.bootlin.com/linux/v6.4-rc2/source/fs/d_path.c#L285 ]
> > > > > > > > > > >
> > > > > > > > > > > current->fs is likely NULL here.
> > > > > > > > > > >
> > > > > > > > > > > If this was correct it would have nothing to do with the actual
> > > > > > > > > > > patch,
> > > > > > > > > > > though, but rather with the fact that smack logs on process
> > > > > > > > > > > termination. I am not sure what the solution would be other than
> > > > > > > > > > > testing for current->fs == NULL in d_path before using it and
> > > > > > > > > > > returning an error that is not normally returned or trying to
> > > > > > > > > > > intercept this case in smack.
> > > > > > > > > > I have now been able to recreate the syzbot issue with the test
> > > > > > > > > > program
> > > > > > > > > > and the issue is exactly the one described here, current->fs == NULL.
> > > > > > > > > >
> > > > > > > > > Earlier in this thread, Amir had a diagnosis that IMA is
> > > > > > > > > inappropriately
> > > > > > > > > trying to use f_path directly instead of using the helpers that are
> > > > > > > > > friendly for stacking filesystems.
> > > > > > > > >
> > > > > > > > > https://lore.kernel.org/linux-fsdevel/CAOQ4uxgjnYyeQL-LbS5kQ7+C0d6sjzKqMDWAtZW8cAkPaed6=Q@mail.gmail.com/
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > I'm not an IMA hacker so I'm not planning to roll a fix here. Perhaps
> > > > > > > > > someone on the IMA team could try this approach?
> > > > > > > >
> > > > > > > >
> > > > > > > > I have applied this patch here from Amir now and it does NOT resolve
> > > > > > > > the issue:
> > > > > > > >
> > > > > > > > https://lore.kernel.org/linux-integrity/296dae962a2a488bde682d3def074db91686e1c3.camel@linux.ibm.com/T/#m4ebdb780bf6952e7f210c55e87950d0cfa1d5eb0
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > > This seems to resolve the issue:
> > > > > > >
> > > > > > > diff --git a/security/smack/smack_access.c
> > > > > > > b/security/smack/smack_access.c
> > > > > > > index 585e5e35710b..57afcea1e39b 100644
> > > > > > > --- a/security/smack/smack_access.c
> > > > > > > +++ b/security/smack/smack_access.c
> > > > > > > @@ -347,6 +347,9 @@ void smack_log(char *subject_label, char
> > > > > > > *object_label, int request,
> > > > > > >         struct smack_audit_data *sad;
> > > > > > >         struct common_audit_data *a = &ad->a;
> > > > > > >
> > > > > > > +       if (current->flags & PF_EXITING)
> > > > > > > +               return;
> > > > > > > +
> > > > > >
> > > > > > Based on what I see here I can understand that this prevents the panic,
> > > > > > but it isn't so clear what changed that introduced the problem.
> > > > > >
> > > > > > >         /* check if we have to log the current event */
> > > > > > >         if (result < 0 && (log_policy & SMACK_AUDIT_DENIED) == 0)
> > > > > > >                 return;
> > > > > > >
> > > > > > >
> > > > >
> > > > > Apparently, it's this patch:
> > > > >
> > > > >     db1d1e8b9867 IMA: use vfs_getattr_nosec to get the i_version
> > > >
> > > > Yes, the syzbot was updated with that info.
> > > >
> > > > > At one time, IMA would reach directly into the inode to get the
> > > > > i_version and ctime. That was fine for certain filesystems, but with
> > > > > more recent changes it needs to go through ->getattr instead. Evidently,
> > > > > it's selecting the wrong inode to query when dealing with overlayfs and
> > > > > that's causing panics at times.
> > > > >
> > > > > As to why the above patch helps, I'm not sure, but given that it doesn't
> > > > > seem to change which inode is being queried via getattr, it seems like
> > > > > this is probably papering over the real bug. That said, IMA and
> > > > > overlayfs are not really in my wheelhouse, so I could be very wrong
> > > > > here.
> > > >
> > > > The call to vfs_getattr_nosec() somehow triggers a call to
> > > > security_inode_getattr().  Without the call neither ovl_getattr() nor
> > > > smack_inode_getattr() would be called.
> > >
> > > ima_file_free()
> > > -> ima_check_last_writer()
> > >    -> vfs_getattr_nosec()
> > >       -> i_op->getattr() == ovl_getattr()
> > >          -> vfs_getattr()
> > >             -> security_inode_getattr()
> > >         -> real_i_op->getattr()
> > >
> > > is the callchain that triggers this.
> >
> > Thank you for the explanation as to why ovl_getattr() and subsequently
> > smack_inode_getattr() is being called.
> >
> > >
> > > ima_file_free() is called in a very sensitive location: __fput() that
> > > can be called from task work when the process is already PF_EXITING.
> > >
> > > The ideal solution would be for ima to stop calling back into the
> > > filesystems in this location at all but that's probably not going to
> > > happen because I now realize you also set extended attributes from
> > > __fput():
> > >
> > >
> > > ima_check_last_writer()
> > > -> ima_update_xatt()
> > >    -> ima_fix_xattr()
> > >       -> __vfs_setxattr_noperm()
> > >
> > > The __vfs_setxattr_noperm() codepath can itself trigger
> > > security_inode_post_setxattr() and security_inode_setsecurity(). So
> > > those hooks are hopefully safe to be called with PF_EXITING tasks as
> > > well...
> > >
> > > Imho, this is all very wild but I'm not judging.
> >
> > Measuring and verifying immutable files is straight forward.
> > Measuring, verifiying, and updating mutable file hashes is a lot more
> > complicated.  Re-calculating the file hash everytime the file changes
> > would impact performance.  The file hash is currently updated as the
> > last writer closes the file (__fput).  One of the reasons for the wq
> > was for IMA to safely calculate the file hash and and take the i_mutex
> > to write the xattr.
> >
> > IMA support for mutable files makes IMA a lot more complicated.  Any
> > improvement suggestions would be appreciated.
> >
> > >
> > > Two solutions imho:
> > > (1) teach stacking filesystems like overlayfs and ecryptfs to use
> > >     vfs_getattr_nosec() in their ->getattr() implementation when they
> > >     are themselves called via vfs_getattr_nosec(). This will fix this by
> > >     not triggering another LSM hook.
> > > (2) make all ->getattr() LSM hooks PF_EXITING safe ideally don't do
> > >     anything
> >
> > The original problem was detecting i_version change on overlayfs.
> >
> > Amir's proposed patch might resolve it without commit db1d1e8b9867
> > ("IMA: use vfs_getattr_nosec to get the i_version").  However, as Amir
> > said, it does not address the new problem introduced by it.   Assuming
> > Amir's proposed patch resolves the original problem, an alternative
> > solution would be to revert commit db1d1e8b9867.
> >
>
> If you're going to revert that commit, then I'm wondering what you
> intend to do instead. Reaching directly into the inode to get this
> information is really no bueno.

Just for overlayfs or in general?

The way I see it, IMA support on overlayfs is buggy.
There are many open issues in syzbot and not overlayfs developers
and not IMA developers really care enough to fix these bugs.
Is it a setup that is interesting for IMA users at all?

If not, then we should definitely prefer a correctness bug
in IMA+overlayfs over a crash.

If it is possible, I would recommend to opt-out overlayfs
from the default IMA policy to silence syzbot bug reports
this setup.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [syzbot] [integrity] [overlayfs] general protection fault in d_path
  2023-09-21 15:10                       ` Jeff Layton
  2023-09-21 15:18                         ` Amir Goldstein
@ 2023-09-21 15:19                         ` Mimi Zohar
  2023-09-21 15:39                           ` Jeff Layton
  1 sibling, 1 reply; 35+ messages in thread
From: Mimi Zohar @ 2023-09-21 15:19 UTC (permalink / raw)
  To: Jeff Layton, Christian Brauner
  Cc: Casey Schaufler, Stefan Berger, syzbot, amir73il, linux-fsdevel,
	linux-integrity, linux-kernel, linux-unionfs, miklos,
	syzkaller-bugs

On Thu, 2023-09-21 at 11:10 -0400, Jeff Layton wrote:
> On Thu, 2023-09-21 at 10:52 -0400, Mimi Zohar wrote:
> > On Thu, 2023-09-21 at 13:48 +0200, Christian Brauner wrote:
> > > On Thu, Sep 21, 2023 at 07:24:23AM -0400, Mimi Zohar wrote:
> > > > On Thu, 2023-09-21 at 06:32 -0400, Jeff Layton wrote:
> > > > > On Wed, 2023-09-20 at 17:52 -0700, Casey Schaufler wrote:
> > > > > > On 9/20/2023 5:10 PM, Stefan Berger wrote:
> > > > > > > 
> > > > > > > On 9/20/23 18:09, Stefan Berger wrote:
> > > > > > > > 
> > > > > > > > On 9/20/23 17:16, Jeff Layton wrote:
> > > > > > > > > On Wed, 2023-09-20 at 16:37 -0400, Stefan Berger wrote:
> > > > > > > > > > On 9/20/23 13:01, Stefan Berger wrote:
> > > > > > > > > > > On 9/17/23 20:04, syzbot wrote:
> > > > > > > > > > > > syzbot has bisected this issue to:
> > > > > > > > > > > > 
> > > > > > > > > > > > commit db1d1e8b9867aae5c3e61ad7859abfcc4a6fd6c7
> > > > > > > > > > > > Author: Jeff Layton <jlayton@kernel.org>
> > > > > > > > > > > > Date:   Mon Apr 17 16:55:51 2023 +0000
> > > > > > > > > > > > 
> > > > > > > > > > > >       IMA: use vfs_getattr_nosec to get the i_version
> > > > > > > > > > > > 
> > > > > > > > > > > > bisection log:
> > > > > > > > > > > > https://syzkaller.appspot.com/x/bisect.txt?x=106f7e54680000
> > > > > > > > > > > > start commit:   a747acc0b752 Merge tag
> > > > > > > > > > > > 'linux-kselftest-next-6.6-rc2'
> > > > > > > > > > > > of g..
> > > > > > > > > > > > git tree:       upstream
> > > > > > > > > > > > final oops:
> > > > > > > > > > > > https://syzkaller.appspot.com/x/report.txt?x=126f7e54680000
> > > > > > > > > > > > console output:
> > > > > > > > > > > > https://syzkaller.appspot.com/x/log.txt?x=146f7e54680000
> > > > > > > > > > > > kernel config:
> > > > > > > > > > > > https://syzkaller.appspot.com/x/.config?x=df91a3034fe3f122
> > > > > > > > > > > > dashboard link:
> > > > > > > > > > > > https://syzkaller.appspot.com/bug?extid=a67fc5321ffb4b311c98
> > > > > > > > > > > > syz repro:
> > > > > > > > > > > > https://syzkaller.appspot.com/x/repro.syz?x=1671b694680000
> > > > > > > > > > > > C reproducer:
> > > > > > > > > > > > https://syzkaller.appspot.com/x/repro.c?x=14ec94d8680000
> > > > > > > > > > > > 
> > > > > > > > > > > > Reported-by: syzbot+a67fc5321ffb4b311c98@syzkaller.appspotmail.com
> > > > > > > > > > > > Fixes: db1d1e8b9867 ("IMA: use vfs_getattr_nosec to get the
> > > > > > > > > > > > i_version")
> > > > > > > > > > > > 
> > > > > > > > > > > > For information about bisection process see:
> > > > > > > > > > > > https://goo.gl/tpsmEJ#bisection
> > > > > > > > > > > The final oops shows this here:
> > > > > > > > > > > 
> > > > > > > > > > > BUG: kernel NULL pointer dereference, address: 0000000000000058
> > > > > > > > > > > #PF: supervisor read access in kernel mode
> > > > > > > > > > > #PF: error_code(0x0000) - not-present page
> > > > > > > > > > > PGD 0 P4D 0
> > > > > > > > > > > Oops: 0000 [#1] PREEMPT SMP
> > > > > > > > > > > CPU: 0 PID: 3192 Comm: syz-executor.0 Not tainted
> > > > > > > > > > > 6.4.0-rc2-syzkaller #0
> > > > > > > > > > > Hardware name: Google Google Compute Engine/Google Compute Engine,
> > > > > > > > > > > BIOS Google 08/04/2023
> > > > > > > > > > > RIP: 0010:__lock_acquire+0x35/0x490 kernel/locking/lockdep.c:4946
> > > > > > > > > > > Code: 83 ec 18 65 4c 8b 35 aa 60 f4 7e 83 3d b7 11 e4 02 00 0f 84 05
> > > > > > > > > > > 02 00 00 4c 89 cb 89 cd 41 89 d5 49 89 ff 83 fe 01 77 0c 89 f0
> > > > > > > > > > > <49> 8b
> > > > > > > > > > > 44 c7 08 48 85 c0 75 1b 4c 89 ff 31 d2 45 89 c4 e8 74 f6 ff
> > > > > > > > > > > RSP: 0018:ffffc90002edb840 EFLAGS: 00010097
> > > > > > > > > > > RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
> > > > > > > > > > > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000050
> > > > > > > > > > > RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000000000000
> > > > > > > > > > > R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> > > > > > > > > > > R13: 0000000000000000 R14: ffff888102ea5340 R15: 0000000000000050
> > > > > > > > > > > FS:  0000000000000000(0000) GS:ffff88813bc00000(0000)
> > > > > > > > > > > knlGS:0000000000000000
> > > > > > > > > > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > > > > > > CR2: 0000000000000058 CR3: 0000000003aa8000 CR4: 00000000003506f0
> > > > > > > > > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > > > > > > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > > > > > > > > Call Trace:
> > > > > > > > > > >   <TASK>
> > > > > > > > > > >   lock_acquire+0xd8/0x1f0 kernel/locking/lockdep.c:5691
> > > > > > > > > > >   seqcount_lockdep_reader_access include/linux/seqlock.h:102 [inline]
> > > > > > > > > > >   get_fs_root_rcu fs/d_path.c:243 [inline]
> > > > > > > > > > >   d_path+0xd1/0x1f0 fs/d_path.c:285
> > > > > > > > > > >   audit_log_d_path+0x65/0x130 kernel/audit.c:2139
> > > > > > > > > > >   dump_common_audit_data security/lsm_audit.c:224 [inline]
> > > > > > > > > > >   common_lsm_audit+0x3b3/0x840 security/lsm_audit.c:458
> > > > > > > > > > >   smack_log+0xad/0x130 security/smack/smack_access.c:383
> > > > > > > > > > >   smk_tskacc+0xb1/0xd0 security/smack/smack_access.c:253
> > > > > > > > > > >   smack_inode_getattr+0x8a/0xb0 security/smack/smack_lsm.c:1187
> > > > > > > > > > >   security_inode_getattr+0x32/0x50 security/security.c:2114
> > > > > > > > > > >   vfs_getattr+0x1b/0x40 fs/stat.c:167
> > > > > > > > > > >   ovl_getattr+0xa6/0x3e0 fs/overlayfs/inode.c:173
> > > > > > > > > > >   ima_check_last_writer security/integrity/ima/ima_main.c:171
> > > > > > > > > > > [inline]
> > > > > > > > > > >   ima_file_free+0xbd/0x130 security/integrity/ima/ima_main.c:203
> > > > > > > > > > >   __fput+0xc7/0x220 fs/file_table.c:315
> > > > > > > > > > >   task_work_run+0x7d/0xa0 kernel/task_work.c:179
> > > > > > > > > > >   exit_task_work include/linux/task_work.h:38 [inline]
> > > > > > > > > > >   do_exit+0x2c7/0xa80 kernel/exit.c:871 <-----------------------
> > > > > > > > > > >   do_group_exit+0x85/0xa0 kernel/exit.c:1021
> > > > > > > > > > >   get_signal+0x73c/0x7f0 kernel/signal.c:2874
> > > > > > > > > > >   arch_do_signal_or_restart+0x89/0x290 arch/x86/kernel/signal.c:306
> > > > > > > > > > >   exit_to_user_mode_loop+0x61/0xb0 kernel/entry/common.c:168
> > > > > > > > > > >   exit_to_user_mode_prepare+0x64/0xb0 kernel/entry/common.c:204
> > > > > > > > > > >   __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
> > > > > > > > > > >   syscall_exit_to_user_mode+0x2b/0x1d0 kernel/entry/common.c:297
> > > > > > > > > > >   do_syscall_64+0x4d/0x90 arch/x86/entry/common.c:86
> > > > > > > > > > >   entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > do_exit has called exit_fs(tsk) [
> > > > > > > > > > > https://elixir.bootlin.com/linux/v6.4-rc2/source/kernel/exit.c#L867 ]
> > > > > > > > > > > 
> > > > > > > > > > > exit_fs(tsk) has set tsk->fs = NULL [
> > > > > > > > > > > https://elixir.bootlin.com/linux/v6.4-rc2/source/fs/fs_struct.c#L103
> > > > > > > > > > > ]
> > > > > > > > > > > 
> > > > > > > > > > > I think this then bites in d_path() where it calls:
> > > > > > > > > > > 
> > > > > > > > > > >      get_fs_root_rcu(current->fs, &root);   [
> > > > > > > > > > > https://elixir.bootlin.com/linux/v6.4-rc2/source/fs/d_path.c#L285 ]
> > > > > > > > > > > 
> > > > > > > > > > > current->fs is likely NULL here.
> > > > > > > > > > > 
> > > > > > > > > > > If this was correct it would have nothing to do with the actual
> > > > > > > > > > > patch,
> > > > > > > > > > > though, but rather with the fact that smack logs on process
> > > > > > > > > > > termination. I am not sure what the solution would be other than
> > > > > > > > > > > testing for current->fs == NULL in d_path before using it and
> > > > > > > > > > > returning an error that is not normally returned or trying to
> > > > > > > > > > > intercept this case in smack.
> > > > > > > > > > I have now been able to recreate the syzbot issue with the test
> > > > > > > > > > program
> > > > > > > > > > and the issue is exactly the one described here, current->fs == NULL.
> > > > > > > > > > 
> > > > > > > > > Earlier in this thread, Amir had a diagnosis that IMA is
> > > > > > > > > inappropriately
> > > > > > > > > trying to use f_path directly instead of using the helpers that are
> > > > > > > > > friendly for stacking filesystems.
> > > > > > > > > 
> > > > > > > > > https://lore.kernel.org/linux-fsdevel/CAOQ4uxgjnYyeQL-LbS5kQ7+C0d6sjzKqMDWAtZW8cAkPaed6=Q@mail.gmail.com/
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > I'm not an IMA hacker so I'm not planning to roll a fix here. Perhaps
> > > > > > > > > someone on the IMA team could try this approach?
> > > > > > > > 
> > > > > > > > 
> > > > > > > > I have applied this patch here from Amir now and it does NOT resolve
> > > > > > > > the issue:
> > > > > > > > 
> > > > > > > > https://lore.kernel.org/linux-integrity/296dae962a2a488bde682d3def074db91686e1c3.camel@linux.ibm.com/T/#m4ebdb780bf6952e7f210c55e87950d0cfa1d5eb0
> > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > > This seems to resolve the issue:
> > > > > > > 
> > > > > > > diff --git a/security/smack/smack_access.c
> > > > > > > b/security/smack/smack_access.c
> > > > > > > index 585e5e35710b..57afcea1e39b 100644
> > > > > > > --- a/security/smack/smack_access.c
> > > > > > > +++ b/security/smack/smack_access.c
> > > > > > > @@ -347,6 +347,9 @@ void smack_log(char *subject_label, char
> > > > > > > *object_label, int request,
> > > > > > >         struct smack_audit_data *sad;
> > > > > > >         struct common_audit_data *a = &ad->a;
> > > > > > > 
> > > > > > > +       if (current->flags & PF_EXITING)
> > > > > > > +               return;
> > > > > > > +
> > > > > > 
> > > > > > Based on what I see here I can understand that this prevents the panic,
> > > > > > but it isn't so clear what changed that introduced the problem.
> > > > > > 
> > > > > > >         /* check if we have to log the current event */
> > > > > > >         if (result < 0 && (log_policy & SMACK_AUDIT_DENIED) == 0)
> > > > > > >                 return;
> > > > > > > 
> > > > > > > 
> > > > > 
> > > > > Apparently, it's this patch:
> > > > > 
> > > > >     db1d1e8b9867 IMA: use vfs_getattr_nosec to get the i_version
> > > > 
> > > > Yes, the syzbot was updated with that info.
> > > > 
> > > > > At one time, IMA would reach directly into the inode to get the
> > > > > i_version and ctime. That was fine for certain filesystems, but with
> > > > > more recent changes it needs to go through ->getattr instead. Evidently,
> > > > > it's selecting the wrong inode to query when dealing with overlayfs and
> > > > > that's causing panics at times.
> > > > > 
> > > > > As to why the above patch helps, I'm not sure, but given that it doesn't
> > > > > seem to change which inode is being queried via getattr, it seems like
> > > > > this is probably papering over the real bug. That said, IMA and
> > > > > overlayfs are not really in my wheelhouse, so I could be very wrong
> > > > > here.
> > > > 
> > > > The call to vfs_getattr_nosec() somehow triggers a call to
> > > > security_inode_getattr().  Without the call neither ovl_getattr() nor
> > > > smack_inode_getattr() would be called.
> > > 
> > > ima_file_free()
> > > -> ima_check_last_writer()
> > >    -> vfs_getattr_nosec()
> > >       -> i_op->getattr() == ovl_getattr()
> > >          -> vfs_getattr()
> > >             -> security_inode_getattr()
> > > 	    -> real_i_op->getattr()
> > > 
> > > is the callchain that triggers this.
> > 
> > Thank you for the explanation as to why ovl_getattr() and subsequently
> > smack_inode_getattr() is being called.
> > 
> > > 
> > > ima_file_free() is called in a very sensitive location: __fput() that
> > > can be called from task work when the process is already PF_EXITING.
> > > 
> > > The ideal solution would be for ima to stop calling back into the
> > > filesystems in this location at all but that's probably not going to
> > > happen because I now realize you also set extended attributes from
> > > __fput():
> > > 
> > > 
> > > ima_check_last_writer()
> > > -> ima_update_xatt()
> > >    -> ima_fix_xattr()
> > >       -> __vfs_setxattr_noperm()
> > > 
> > > The __vfs_setxattr_noperm() codepath can itself trigger
> > > security_inode_post_setxattr() and security_inode_setsecurity(). So
> > > those hooks are hopefully safe to be called with PF_EXITING tasks as
> > > well...
> > > 
> > > Imho, this is all very wild but I'm not judging.
> > 
> > Measuring and verifying immutable files is straight forward. 
> > Measuring, verifiying, and updating mutable file hashes is a lot more
> > complicated.  Re-calculating the file hash everytime the file changes
> > would impact performance.  The file hash is currently updated as the
> > last writer closes the file (__fput).  One of the reasons for the wq
> > was for IMA to safely calculate the file hash and and take the i_mutex
> > to write the xattr.
> > 
> > IMA support for mutable files makes IMA a lot more complicated.  Any
> > improvement suggestions would be appreciated.
> > 
> > > 
> > > Two solutions imho:
> > > (1) teach stacking filesystems like overlayfs and ecryptfs to use
> > >     vfs_getattr_nosec() in their ->getattr() implementation when they
> > >     are themselves called via vfs_getattr_nosec(). This will fix this by
> > >     not triggering another LSM hook.
> > > (2) make all ->getattr() LSM hooks PF_EXITING safe ideally don't do
> > >     anything
> > 
> > The original problem was detecting i_version change on overlayfs.
> > 
> > Amir's proposed patch might resolve it without commit db1d1e8b9867
> > ("IMA: use vfs_getattr_nosec to get the i_version").  However, as Amir
> > said, it does not address the new problem introduced by it.   Assuming
> > Amir's proposed patch resolves the original problem, an alternative
> > solution would be to revert commit db1d1e8b9867.
> > 
> 
> If you're going to revert that commit, then I'm wondering what you
> intend to do instead. Reaching directly into the inode to get this
> information is really no bueno.

IMA detecting file change based on i_version has been there since IMA
was upstreamed.   Please explain why this is not a good idea.

Mimi


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [syzbot] [integrity] [overlayfs] general protection fault in d_path
  2023-09-21 15:19                         ` Mimi Zohar
@ 2023-09-21 15:39                           ` Jeff Layton
  2023-09-21 16:06                             ` Mimi Zohar
  0 siblings, 1 reply; 35+ messages in thread
From: Jeff Layton @ 2023-09-21 15:39 UTC (permalink / raw)
  To: Mimi Zohar, Christian Brauner
  Cc: Casey Schaufler, Stefan Berger, syzbot, amir73il, linux-fsdevel,
	linux-integrity, linux-kernel, linux-unionfs, miklos,
	syzkaller-bugs

On Thu, 2023-09-21 at 11:19 -0400, Mimi Zohar wrote:
> On Thu, 2023-09-21 at 11:10 -0400, Jeff Layton wrote:
> > On Thu, 2023-09-21 at 10:52 -0400, Mimi Zohar wrote:
> > > On Thu, 2023-09-21 at 13:48 +0200, Christian Brauner wrote:
> > > > On Thu, Sep 21, 2023 at 07:24:23AM -0400, Mimi Zohar wrote:
> > > > > On Thu, 2023-09-21 at 06:32 -0400, Jeff Layton wrote:
> > > > > > On Wed, 2023-09-20 at 17:52 -0700, Casey Schaufler wrote:
> > > > > > > On 9/20/2023 5:10 PM, Stefan Berger wrote:
> > > > > > > > 
> > > > > > > > On 9/20/23 18:09, Stefan Berger wrote:
> > > > > > > > > 
> > > > > > > > > On 9/20/23 17:16, Jeff Layton wrote:
> > > > > > > > > > On Wed, 2023-09-20 at 16:37 -0400, Stefan Berger wrote:
> > > > > > > > > > > On 9/20/23 13:01, Stefan Berger wrote:
> > > > > > > > > > > > On 9/17/23 20:04, syzbot wrote:
> > > > > > > > > > > > > syzbot has bisected this issue to:
> > > > > > > > > > > > > 
> > > > > > > > > > > > > commit db1d1e8b9867aae5c3e61ad7859abfcc4a6fd6c7
> > > > > > > > > > > > > Author: Jeff Layton <jlayton@kernel.org>
> > > > > > > > > > > > > Date:   Mon Apr 17 16:55:51 2023 +0000
> > > > > > > > > > > > > 
> > > > > > > > > > > > >       IMA: use vfs_getattr_nosec to get the i_version
> > > > > > > > > > > > > 
> > > > > > > > > > > > > bisection log:
> > > > > > > > > > > > > https://syzkaller.appspot.com/x/bisect.txt?x=106f7e54680000
> > > > > > > > > > > > > start commit:   a747acc0b752 Merge tag
> > > > > > > > > > > > > 'linux-kselftest-next-6.6-rc2'
> > > > > > > > > > > > > of g..
> > > > > > > > > > > > > git tree:       upstream
> > > > > > > > > > > > > final oops:
> > > > > > > > > > > > > https://syzkaller.appspot.com/x/report.txt?x=126f7e54680000
> > > > > > > > > > > > > console output:
> > > > > > > > > > > > > https://syzkaller.appspot.com/x/log.txt?x=146f7e54680000
> > > > > > > > > > > > > kernel config:
> > > > > > > > > > > > > https://syzkaller.appspot.com/x/.config?x=df91a3034fe3f122
> > > > > > > > > > > > > dashboard link:
> > > > > > > > > > > > > https://syzkaller.appspot.com/bug?extid=a67fc5321ffb4b311c98
> > > > > > > > > > > > > syz repro:
> > > > > > > > > > > > > https://syzkaller.appspot.com/x/repro.syz?x=1671b694680000
> > > > > > > > > > > > > C reproducer:
> > > > > > > > > > > > > https://syzkaller.appspot.com/x/repro.c?x=14ec94d8680000
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Reported-by: syzbot+a67fc5321ffb4b311c98@syzkaller.appspotmail.com
> > > > > > > > > > > > > Fixes: db1d1e8b9867 ("IMA: use vfs_getattr_nosec to get the
> > > > > > > > > > > > > i_version")
> > > > > > > > > > > > > 
> > > > > > > > > > > > > For information about bisection process see:
> > > > > > > > > > > > > https://goo.gl/tpsmEJ#bisection
> > > > > > > > > > > > The final oops shows this here:
> > > > > > > > > > > > 
> > > > > > > > > > > > BUG: kernel NULL pointer dereference, address: 0000000000000058
> > > > > > > > > > > > #PF: supervisor read access in kernel mode
> > > > > > > > > > > > #PF: error_code(0x0000) - not-present page
> > > > > > > > > > > > PGD 0 P4D 0
> > > > > > > > > > > > Oops: 0000 [#1] PREEMPT SMP
> > > > > > > > > > > > CPU: 0 PID: 3192 Comm: syz-executor.0 Not tainted
> > > > > > > > > > > > 6.4.0-rc2-syzkaller #0
> > > > > > > > > > > > Hardware name: Google Google Compute Engine/Google Compute Engine,
> > > > > > > > > > > > BIOS Google 08/04/2023
> > > > > > > > > > > > RIP: 0010:__lock_acquire+0x35/0x490 kernel/locking/lockdep.c:4946
> > > > > > > > > > > > Code: 83 ec 18 65 4c 8b 35 aa 60 f4 7e 83 3d b7 11 e4 02 00 0f 84 05
> > > > > > > > > > > > 02 00 00 4c 89 cb 89 cd 41 89 d5 49 89 ff 83 fe 01 77 0c 89 f0
> > > > > > > > > > > > <49> 8b
> > > > > > > > > > > > 44 c7 08 48 85 c0 75 1b 4c 89 ff 31 d2 45 89 c4 e8 74 f6 ff
> > > > > > > > > > > > RSP: 0018:ffffc90002edb840 EFLAGS: 00010097
> > > > > > > > > > > > RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
> > > > > > > > > > > > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000050
> > > > > > > > > > > > RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000000000000
> > > > > > > > > > > > R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> > > > > > > > > > > > R13: 0000000000000000 R14: ffff888102ea5340 R15: 0000000000000050
> > > > > > > > > > > > FS:  0000000000000000(0000) GS:ffff88813bc00000(0000)
> > > > > > > > > > > > knlGS:0000000000000000
> > > > > > > > > > > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > > > > > > > CR2: 0000000000000058 CR3: 0000000003aa8000 CR4: 00000000003506f0
> > > > > > > > > > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > > > > > > > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > > > > > > > > > Call Trace:
> > > > > > > > > > > >   <TASK>
> > > > > > > > > > > >   lock_acquire+0xd8/0x1f0 kernel/locking/lockdep.c:5691
> > > > > > > > > > > >   seqcount_lockdep_reader_access include/linux/seqlock.h:102 [inline]
> > > > > > > > > > > >   get_fs_root_rcu fs/d_path.c:243 [inline]
> > > > > > > > > > > >   d_path+0xd1/0x1f0 fs/d_path.c:285
> > > > > > > > > > > >   audit_log_d_path+0x65/0x130 kernel/audit.c:2139
> > > > > > > > > > > >   dump_common_audit_data security/lsm_audit.c:224 [inline]
> > > > > > > > > > > >   common_lsm_audit+0x3b3/0x840 security/lsm_audit.c:458
> > > > > > > > > > > >   smack_log+0xad/0x130 security/smack/smack_access.c:383
> > > > > > > > > > > >   smk_tskacc+0xb1/0xd0 security/smack/smack_access.c:253
> > > > > > > > > > > >   smack_inode_getattr+0x8a/0xb0 security/smack/smack_lsm.c:1187
> > > > > > > > > > > >   security_inode_getattr+0x32/0x50 security/security.c:2114
> > > > > > > > > > > >   vfs_getattr+0x1b/0x40 fs/stat.c:167
> > > > > > > > > > > >   ovl_getattr+0xa6/0x3e0 fs/overlayfs/inode.c:173
> > > > > > > > > > > >   ima_check_last_writer security/integrity/ima/ima_main.c:171
> > > > > > > > > > > > [inline]
> > > > > > > > > > > >   ima_file_free+0xbd/0x130 security/integrity/ima/ima_main.c:203
> > > > > > > > > > > >   __fput+0xc7/0x220 fs/file_table.c:315
> > > > > > > > > > > >   task_work_run+0x7d/0xa0 kernel/task_work.c:179
> > > > > > > > > > > >   exit_task_work include/linux/task_work.h:38 [inline]
> > > > > > > > > > > >   do_exit+0x2c7/0xa80 kernel/exit.c:871 <-----------------------
> > > > > > > > > > > >   do_group_exit+0x85/0xa0 kernel/exit.c:1021
> > > > > > > > > > > >   get_signal+0x73c/0x7f0 kernel/signal.c:2874
> > > > > > > > > > > >   arch_do_signal_or_restart+0x89/0x290 arch/x86/kernel/signal.c:306
> > > > > > > > > > > >   exit_to_user_mode_loop+0x61/0xb0 kernel/entry/common.c:168
> > > > > > > > > > > >   exit_to_user_mode_prepare+0x64/0xb0 kernel/entry/common.c:204
> > > > > > > > > > > >   __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
> > > > > > > > > > > >   syscall_exit_to_user_mode+0x2b/0x1d0 kernel/entry/common.c:297
> > > > > > > > > > > >   do_syscall_64+0x4d/0x90 arch/x86/entry/common.c:86
> > > > > > > > > > > >   entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > do_exit has called exit_fs(tsk) [
> > > > > > > > > > > > https://elixir.bootlin.com/linux/v6.4-rc2/source/kernel/exit.c#L867 ]
> > > > > > > > > > > > 
> > > > > > > > > > > > exit_fs(tsk) has set tsk->fs = NULL [
> > > > > > > > > > > > https://elixir.bootlin.com/linux/v6.4-rc2/source/fs/fs_struct.c#L103
> > > > > > > > > > > > ]
> > > > > > > > > > > > 
> > > > > > > > > > > > I think this then bites in d_path() where it calls:
> > > > > > > > > > > > 
> > > > > > > > > > > >      get_fs_root_rcu(current->fs, &root);   [
> > > > > > > > > > > > https://elixir.bootlin.com/linux/v6.4-rc2/source/fs/d_path.c#L285 ]
> > > > > > > > > > > > 
> > > > > > > > > > > > current->fs is likely NULL here.
> > > > > > > > > > > > 
> > > > > > > > > > > > If this was correct it would have nothing to do with the actual
> > > > > > > > > > > > patch,
> > > > > > > > > > > > though, but rather with the fact that smack logs on process
> > > > > > > > > > > > termination. I am not sure what the solution would be other than
> > > > > > > > > > > > testing for current->fs == NULL in d_path before using it and
> > > > > > > > > > > > returning an error that is not normally returned or trying to
> > > > > > > > > > > > intercept this case in smack.
> > > > > > > > > > > I have now been able to recreate the syzbot issue with the test
> > > > > > > > > > > program
> > > > > > > > > > > and the issue is exactly the one described here, current->fs == NULL.
> > > > > > > > > > > 
> > > > > > > > > > Earlier in this thread, Amir had a diagnosis that IMA is
> > > > > > > > > > inappropriately
> > > > > > > > > > trying to use f_path directly instead of using the helpers that are
> > > > > > > > > > friendly for stacking filesystems.
> > > > > > > > > > 
> > > > > > > > > > https://lore.kernel.org/linux-fsdevel/CAOQ4uxgjnYyeQL-LbS5kQ7+C0d6sjzKqMDWAtZW8cAkPaed6=Q@mail.gmail.com/
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > I'm not an IMA hacker so I'm not planning to roll a fix here. Perhaps
> > > > > > > > > > someone on the IMA team could try this approach?
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > I have applied this patch here from Amir now and it does NOT resolve
> > > > > > > > > the issue:
> > > > > > > > > 
> > > > > > > > > https://lore.kernel.org/linux-integrity/296dae962a2a488bde682d3def074db91686e1c3.camel@linux.ibm.com/T/#m4ebdb780bf6952e7f210c55e87950d0cfa1d5eb0
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > This seems to resolve the issue:
> > > > > > > > 
> > > > > > > > diff --git a/security/smack/smack_access.c
> > > > > > > > b/security/smack/smack_access.c
> > > > > > > > index 585e5e35710b..57afcea1e39b 100644
> > > > > > > > --- a/security/smack/smack_access.c
> > > > > > > > +++ b/security/smack/smack_access.c
> > > > > > > > @@ -347,6 +347,9 @@ void smack_log(char *subject_label, char
> > > > > > > > *object_label, int request,
> > > > > > > >         struct smack_audit_data *sad;
> > > > > > > >         struct common_audit_data *a = &ad->a;
> > > > > > > > 
> > > > > > > > +       if (current->flags & PF_EXITING)
> > > > > > > > +               return;
> > > > > > > > +
> > > > > > > 
> > > > > > > Based on what I see here I can understand that this prevents the panic,
> > > > > > > but it isn't so clear what changed that introduced the problem.
> > > > > > > 
> > > > > > > >         /* check if we have to log the current event */
> > > > > > > >         if (result < 0 && (log_policy & SMACK_AUDIT_DENIED) == 0)
> > > > > > > >                 return;
> > > > > > > > 
> > > > > > > > 
> > > > > > 
> > > > > > Apparently, it's this patch:
> > > > > > 
> > > > > >     db1d1e8b9867 IMA: use vfs_getattr_nosec to get the i_version
> > > > > 
> > > > > Yes, the syzbot was updated with that info.
> > > > > 
> > > > > > At one time, IMA would reach directly into the inode to get the
> > > > > > i_version and ctime. That was fine for certain filesystems, but with
> > > > > > more recent changes it needs to go through ->getattr instead. Evidently,
> > > > > > it's selecting the wrong inode to query when dealing with overlayfs and
> > > > > > that's causing panics at times.
> > > > > > 
> > > > > > As to why the above patch helps, I'm not sure, but given that it doesn't
> > > > > > seem to change which inode is being queried via getattr, it seems like
> > > > > > this is probably papering over the real bug. That said, IMA and
> > > > > > overlayfs are not really in my wheelhouse, so I could be very wrong
> > > > > > here.
> > > > > 
> > > > > The call to vfs_getattr_nosec() somehow triggers a call to
> > > > > security_inode_getattr().  Without the call neither ovl_getattr() nor
> > > > > smack_inode_getattr() would be called.
> > > > 
> > > > ima_file_free()
> > > > -> ima_check_last_writer()
> > > >    -> vfs_getattr_nosec()
> > > >       -> i_op->getattr() == ovl_getattr()
> > > >          -> vfs_getattr()
> > > >             -> security_inode_getattr()
> > > > 	    -> real_i_op->getattr()
> > > > 
> > > > is the callchain that triggers this.
> > > 
> > > Thank you for the explanation as to why ovl_getattr() and subsequently
> > > smack_inode_getattr() is being called.
> > > 
> > > > 
> > > > ima_file_free() is called in a very sensitive location: __fput() that
> > > > can be called from task work when the process is already PF_EXITING.
> > > > 
> > > > The ideal solution would be for ima to stop calling back into the
> > > > filesystems in this location at all but that's probably not going to
> > > > happen because I now realize you also set extended attributes from
> > > > __fput():
> > > > 
> > > > 
> > > > ima_check_last_writer()
> > > > -> ima_update_xatt()
> > > >    -> ima_fix_xattr()
> > > >       -> __vfs_setxattr_noperm()
> > > > 
> > > > The __vfs_setxattr_noperm() codepath can itself trigger
> > > > security_inode_post_setxattr() and security_inode_setsecurity(). So
> > > > those hooks are hopefully safe to be called with PF_EXITING tasks as
> > > > well...
> > > > 
> > > > Imho, this is all very wild but I'm not judging.
> > > 
> > > Measuring and verifying immutable files is straight forward. 
> > > Measuring, verifiying, and updating mutable file hashes is a lot more
> > > complicated.  Re-calculating the file hash everytime the file changes
> > > would impact performance.  The file hash is currently updated as the
> > > last writer closes the file (__fput).  One of the reasons for the wq
> > > was for IMA to safely calculate the file hash and and take the i_mutex
> > > to write the xattr.
> > > 
> > > IMA support for mutable files makes IMA a lot more complicated.  Any
> > > improvement suggestions would be appreciated.
> > > 
> > > > 
> > > > Two solutions imho:
> > > > (1) teach stacking filesystems like overlayfs and ecryptfs to use
> > > >     vfs_getattr_nosec() in their ->getattr() implementation when they
> > > >     are themselves called via vfs_getattr_nosec(). This will fix this by
> > > >     not triggering another LSM hook.
> > > > (2) make all ->getattr() LSM hooks PF_EXITING safe ideally don't do
> > > >     anything
> > > 
> > > The original problem was detecting i_version change on overlayfs.
> > > 
> > > Amir's proposed patch might resolve it without commit db1d1e8b9867
> > > ("IMA: use vfs_getattr_nosec to get the i_version").  However, as Amir
> > > said, it does not address the new problem introduced by it.   Assuming
> > > Amir's proposed patch resolves the original problem, an alternative
> > > solution would be to revert commit db1d1e8b9867.
> > > 
> > 
> > If you're going to revert that commit, then I'm wondering what you
> > intend to do instead. Reaching directly into the inode to get this
> > information is really no bueno.
> 
> IMA detecting file change based on i_version has been there since IMA
> was upstreamed.   Please explain why this is not a good idea.
> 

Not all i_version values are managed in the same way. Network
filesystems need to pass through the value from the server, whereas with
a local filesystems the kernel needs to manage the increment.

IMA is mostly interested in local filesystems at the moment. The main
kernel-managed versions in the kernel are in btrfs, ext4 and xfs and
tmpfs. Until recently, only btrfs had one that functioned properly. Both
ext4 and xfs would also increment their values on atime updates. In
practical terms, this means that IMA ends up doing unnecessary
remeasurements after read events in some cases.

ext4 recently had its i_version value fixed to not do this, but the XFS
developers are unable to fix theirs to avoid incrementing on atime
updates. For that, I'm working on the multigrain ctime patches which
should allow XFS to fake up a STATX_CHANGE_COOKIE in its getattr
routine.

IMA has no practical way to tell what the filesystem can do if it's
groveling around inside struct inode, which is why I recommended using
vfs_getattr_nosec to grab this info. If that's problematic then by all
means, back out that patch, but you'll need to come up with some way to
deal with the different nuances of the different i_version counters in
across different filesystems.

Cheers,
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [syzbot] [integrity] [overlayfs] general protection fault in d_path
  2023-09-21 15:39                           ` Jeff Layton
@ 2023-09-21 16:06                             ` Mimi Zohar
  2023-09-21 17:01                               ` Amir Goldstein
  0 siblings, 1 reply; 35+ messages in thread
From: Mimi Zohar @ 2023-09-21 16:06 UTC (permalink / raw)
  To: Jeff Layton, Christian Brauner
  Cc: Casey Schaufler, Stefan Berger, syzbot, amir73il, linux-fsdevel,
	linux-integrity, linux-kernel, linux-unionfs, miklos,
	syzkaller-bugs

On Thu, 2023-09-21 at 11:39 -0400, Jeff Layton wrote:
> On Thu, 2023-09-21 at 11:19 -0400, Mimi Zohar wrote:
> > On Thu, 2023-09-21 at 11:10 -0400, Jeff Layton wrote:
> > > On Thu, 2023-09-21 at 10:52 -0400, Mimi Zohar wrote:
> > > > On Thu, 2023-09-21 at 13:48 +0200, Christian Brauner wrote:
> > > > > On Thu, Sep 21, 2023 at 07:24:23AM -0400, Mimi Zohar wrote:
> > > > > > On Thu, 2023-09-21 at 06:32 -0400, Jeff Layton wrote:
> > > > > > > On Wed, 2023-09-20 at 17:52 -0700, Casey Schaufler wrote:
> > > > > > > > On 9/20/2023 5:10 PM, Stefan Berger wrote:
> > > > > > > > > 
> > > > > > > > > On 9/20/23 18:09, Stefan Berger wrote:
> > > > > > > > > > 
> > > > > > > > > > On 9/20/23 17:16, Jeff Layton wrote:
> > > > > > > > > > > On Wed, 2023-09-20 at 16:37 -0400, Stefan Berger wrote:
> > > > > > > > > > > > On 9/20/23 13:01, Stefan Berger wrote:
> > > > > > > > > > > > > On 9/17/23 20:04, syzbot wrote:
> > > > > > > > > > > > > > syzbot has bisected this issue to:
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > commit db1d1e8b9867aae5c3e61ad7859abfcc4a6fd6c7
> > > > > > > > > > > > > > Author: Jeff Layton <jlayton@kernel.org>
> > > > > > > > > > > > > > Date:   Mon Apr 17 16:55:51 2023 +0000
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > >       IMA: use vfs_getattr_nosec to get the i_version
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > bisection log:
> > > > > > > > > > > > > > https://syzkaller.appspot.com/x/bisect.txt?x=106f7e54680000
> > > > > > > > > > > > > > start commit:   a747acc0b752 Merge tag
> > > > > > > > > > > > > > 'linux-kselftest-next-6.6-rc2'
> > > > > > > > > > > > > > of g..
> > > > > > > > > > > > > > git tree:       upstream
> > > > > > > > > > > > > > final oops:
> > > > > > > > > > > > > > https://syzkaller.appspot.com/x/report.txt?x=126f7e54680000
> > > > > > > > > > > > > > console output:
> > > > > > > > > > > > > > https://syzkaller.appspot.com/x/log.txt?x=146f7e54680000
> > > > > > > > > > > > > > kernel config:
> > > > > > > > > > > > > > https://syzkaller.appspot.com/x/.config?x=df91a3034fe3f122
> > > > > > > > > > > > > > dashboard link:
> > > > > > > > > > > > > > https://syzkaller.appspot.com/bug?extid=a67fc5321ffb4b311c98
> > > > > > > > > > > > > > syz repro:
> > > > > > > > > > > > > > https://syzkaller.appspot.com/x/repro.syz?x=1671b694680000
> > > > > > > > > > > > > > C reproducer:
> > > > > > > > > > > > > > https://syzkaller.appspot.com/x/repro.c?x=14ec94d8680000
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Reported-by: syzbot+a67fc5321ffb4b311c98@syzkaller.appspotmail.com
> > > > > > > > > > > > > > Fixes: db1d1e8b9867 ("IMA: use vfs_getattr_nosec to get the
> > > > > > > > > > > > > > i_version")
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > For information about bisection process see:
> > > > > > > > > > > > > > https://goo.gl/tpsmEJ#bisection
> > > > > > > > > > > > > The final oops shows this here:
> > > > > > > > > > > > > 
> > > > > > > > > > > > > BUG: kernel NULL pointer dereference, address: 0000000000000058
> > > > > > > > > > > > > #PF: supervisor read access in kernel mode
> > > > > > > > > > > > > #PF: error_code(0x0000) - not-present page
> > > > > > > > > > > > > PGD 0 P4D 0
> > > > > > > > > > > > > Oops: 0000 [#1] PREEMPT SMP
> > > > > > > > > > > > > CPU: 0 PID: 3192 Comm: syz-executor.0 Not tainted
> > > > > > > > > > > > > 6.4.0-rc2-syzkaller #0
> > > > > > > > > > > > > Hardware name: Google Google Compute Engine/Google Compute Engine,
> > > > > > > > > > > > > BIOS Google 08/04/2023
> > > > > > > > > > > > > RIP: 0010:__lock_acquire+0x35/0x490 kernel/locking/lockdep.c:4946
> > > > > > > > > > > > > Code: 83 ec 18 65 4c 8b 35 aa 60 f4 7e 83 3d b7 11 e4 02 00 0f 84 05
> > > > > > > > > > > > > 02 00 00 4c 89 cb 89 cd 41 89 d5 49 89 ff 83 fe 01 77 0c 89 f0
> > > > > > > > > > > > > <49> 8b
> > > > > > > > > > > > > 44 c7 08 48 85 c0 75 1b 4c 89 ff 31 d2 45 89 c4 e8 74 f6 ff
> > > > > > > > > > > > > RSP: 0018:ffffc90002edb840 EFLAGS: 00010097
> > > > > > > > > > > > > RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
> > > > > > > > > > > > > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000050
> > > > > > > > > > > > > RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000000000000
> > > > > > > > > > > > > R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> > > > > > > > > > > > > R13: 0000000000000000 R14: ffff888102ea5340 R15: 0000000000000050
> > > > > > > > > > > > > FS:  0000000000000000(0000) GS:ffff88813bc00000(0000)
> > > > > > > > > > > > > knlGS:0000000000000000
> > > > > > > > > > > > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > > > > > > > > CR2: 0000000000000058 CR3: 0000000003aa8000 CR4: 00000000003506f0
> > > > > > > > > > > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > > > > > > > > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > > > > > > > > > > Call Trace:
> > > > > > > > > > > > >   <TASK>
> > > > > > > > > > > > >   lock_acquire+0xd8/0x1f0 kernel/locking/lockdep.c:5691
> > > > > > > > > > > > >   seqcount_lockdep_reader_access include/linux/seqlock.h:102 [inline]
> > > > > > > > > > > > >   get_fs_root_rcu fs/d_path.c:243 [inline]
> > > > > > > > > > > > >   d_path+0xd1/0x1f0 fs/d_path.c:285
> > > > > > > > > > > > >   audit_log_d_path+0x65/0x130 kernel/audit.c:2139
> > > > > > > > > > > > >   dump_common_audit_data security/lsm_audit.c:224 [inline]
> > > > > > > > > > > > >   common_lsm_audit+0x3b3/0x840 security/lsm_audit.c:458
> > > > > > > > > > > > >   smack_log+0xad/0x130 security/smack/smack_access.c:383
> > > > > > > > > > > > >   smk_tskacc+0xb1/0xd0 security/smack/smack_access.c:253
> > > > > > > > > > > > >   smack_inode_getattr+0x8a/0xb0 security/smack/smack_lsm.c:1187
> > > > > > > > > > > > >   security_inode_getattr+0x32/0x50 security/security.c:2114
> > > > > > > > > > > > >   vfs_getattr+0x1b/0x40 fs/stat.c:167
> > > > > > > > > > > > >   ovl_getattr+0xa6/0x3e0 fs/overlayfs/inode.c:173
> > > > > > > > > > > > >   ima_check_last_writer security/integrity/ima/ima_main.c:171
> > > > > > > > > > > > > [inline]
> > > > > > > > > > > > >   ima_file_free+0xbd/0x130 security/integrity/ima/ima_main.c:203
> > > > > > > > > > > > >   __fput+0xc7/0x220 fs/file_table.c:315
> > > > > > > > > > > > >   task_work_run+0x7d/0xa0 kernel/task_work.c:179
> > > > > > > > > > > > >   exit_task_work include/linux/task_work.h:38 [inline]
> > > > > > > > > > > > >   do_exit+0x2c7/0xa80 kernel/exit.c:871 <-----------------------
> > > > > > > > > > > > >   do_group_exit+0x85/0xa0 kernel/exit.c:1021
> > > > > > > > > > > > >   get_signal+0x73c/0x7f0 kernel/signal.c:2874
> > > > > > > > > > > > >   arch_do_signal_or_restart+0x89/0x290 arch/x86/kernel/signal.c:306
> > > > > > > > > > > > >   exit_to_user_mode_loop+0x61/0xb0 kernel/entry/common.c:168
> > > > > > > > > > > > >   exit_to_user_mode_prepare+0x64/0xb0 kernel/entry/common.c:204
> > > > > > > > > > > > >   __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
> > > > > > > > > > > > >   syscall_exit_to_user_mode+0x2b/0x1d0 kernel/entry/common.c:297
> > > > > > > > > > > > >   do_syscall_64+0x4d/0x90 arch/x86/entry/common.c:86
> > > > > > > > > > > > >   entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > do_exit has called exit_fs(tsk) [
> > > > > > > > > > > > > https://elixir.bootlin.com/linux/v6.4-rc2/source/kernel/exit.c#L867 ]
> > > > > > > > > > > > > 
> > > > > > > > > > > > > exit_fs(tsk) has set tsk->fs = NULL [
> > > > > > > > > > > > > https://elixir.bootlin.com/linux/v6.4-rc2/source/fs/fs_struct.c#L103
> > > > > > > > > > > > > ]
> > > > > > > > > > > > > 
> > > > > > > > > > > > > I think this then bites in d_path() where it calls:
> > > > > > > > > > > > > 
> > > > > > > > > > > > >      get_fs_root_rcu(current->fs, &root);   [
> > > > > > > > > > > > > https://elixir.bootlin.com/linux/v6.4-rc2/source/fs/d_path.c#L285 ]
> > > > > > > > > > > > > 
> > > > > > > > > > > > > current->fs is likely NULL here.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > If this was correct it would have nothing to do with the actual
> > > > > > > > > > > > > patch,
> > > > > > > > > > > > > though, but rather with the fact that smack logs on process
> > > > > > > > > > > > > termination. I am not sure what the solution would be other than
> > > > > > > > > > > > > testing for current->fs == NULL in d_path before using it and
> > > > > > > > > > > > > returning an error that is not normally returned or trying to
> > > > > > > > > > > > > intercept this case in smack.
> > > > > > > > > > > > I have now been able to recreate the syzbot issue with the test
> > > > > > > > > > > > program
> > > > > > > > > > > > and the issue is exactly the one described here, current->fs == NULL.
> > > > > > > > > > > > 
> > > > > > > > > > > Earlier in this thread, Amir had a diagnosis that IMA is
> > > > > > > > > > > inappropriately
> > > > > > > > > > > trying to use f_path directly instead of using the helpers that are
> > > > > > > > > > > friendly for stacking filesystems.
> > > > > > > > > > > 
> > > > > > > > > > > https://lore.kernel.org/linux-fsdevel/CAOQ4uxgjnYyeQL-LbS5kQ7+C0d6sjzKqMDWAtZW8cAkPaed6=Q@mail.gmail.com/
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > I'm not an IMA hacker so I'm not planning to roll a fix here. Perhaps
> > > > > > > > > > > someone on the IMA team could try this approach?
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > I have applied this patch here from Amir now and it does NOT resolve
> > > > > > > > > > the issue:
> > > > > > > > > > 
> > > > > > > > > > https://lore.kernel.org/linux-integrity/296dae962a2a488bde682d3def074db91686e1c3.camel@linux.ibm.com/T/#m4ebdb780bf6952e7f210c55e87950d0cfa1d5eb0
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > This seems to resolve the issue:
> > > > > > > > > 
> > > > > > > > > diff --git a/security/smack/smack_access.c
> > > > > > > > > b/security/smack/smack_access.c
> > > > > > > > > index 585e5e35710b..57afcea1e39b 100644
> > > > > > > > > --- a/security/smack/smack_access.c
> > > > > > > > > +++ b/security/smack/smack_access.c
> > > > > > > > > @@ -347,6 +347,9 @@ void smack_log(char *subject_label, char
> > > > > > > > > *object_label, int request,
> > > > > > > > >         struct smack_audit_data *sad;
> > > > > > > > >         struct common_audit_data *a = &ad->a;
> > > > > > > > > 
> > > > > > > > > +       if (current->flags & PF_EXITING)
> > > > > > > > > +               return;
> > > > > > > > > +
> > > > > > > > 
> > > > > > > > Based on what I see here I can understand that this prevents the panic,
> > > > > > > > but it isn't so clear what changed that introduced the problem.
> > > > > > > > 
> > > > > > > > >         /* check if we have to log the current event */
> > > > > > > > >         if (result < 0 && (log_policy & SMACK_AUDIT_DENIED) == 0)
> > > > > > > > >                 return;
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > 
> > > > > > > Apparently, it's this patch:
> > > > > > > 
> > > > > > >     db1d1e8b9867 IMA: use vfs_getattr_nosec to get the i_version
> > > > > > 
> > > > > > Yes, the syzbot was updated with that info.
> > > > > > 
> > > > > > > At one time, IMA would reach directly into the inode to get the
> > > > > > > i_version and ctime. That was fine for certain filesystems, but with
> > > > > > > more recent changes it needs to go through ->getattr instead. Evidently,
> > > > > > > it's selecting the wrong inode to query when dealing with overlayfs and
> > > > > > > that's causing panics at times.
> > > > > > > 
> > > > > > > As to why the above patch helps, I'm not sure, but given that it doesn't
> > > > > > > seem to change which inode is being queried via getattr, it seems like
> > > > > > > this is probably papering over the real bug. That said, IMA and
> > > > > > > overlayfs are not really in my wheelhouse, so I could be very wrong
> > > > > > > here.
> > > > > > 
> > > > > > The call to vfs_getattr_nosec() somehow triggers a call to
> > > > > > security_inode_getattr().  Without the call neither ovl_getattr() nor
> > > > > > smack_inode_getattr() would be called.
> > > > > 
> > > > > ima_file_free()
> > > > > -> ima_check_last_writer()
> > > > >    -> vfs_getattr_nosec()
> > > > >       -> i_op->getattr() == ovl_getattr()
> > > > >          -> vfs_getattr()
> > > > >             -> security_inode_getattr()
> > > > > 	    -> real_i_op->getattr()
> > > > > 
> > > > > is the callchain that triggers this.
> > > > 
> > > > Thank you for the explanation as to why ovl_getattr() and subsequently
> > > > smack_inode_getattr() is being called.
> > > > 
> > > > > 
> > > > > ima_file_free() is called in a very sensitive location: __fput() that
> > > > > can be called from task work when the process is already PF_EXITING.
> > > > > 
> > > > > The ideal solution would be for ima to stop calling back into the
> > > > > filesystems in this location at all but that's probably not going to
> > > > > happen because I now realize you also set extended attributes from
> > > > > __fput():
> > > > > 
> > > > > 
> > > > > ima_check_last_writer()
> > > > > -> ima_update_xatt()
> > > > >    -> ima_fix_xattr()
> > > > >       -> __vfs_setxattr_noperm()
> > > > > 
> > > > > The __vfs_setxattr_noperm() codepath can itself trigger
> > > > > security_inode_post_setxattr() and security_inode_setsecurity(). So
> > > > > those hooks are hopefully safe to be called with PF_EXITING tasks as
> > > > > well...
> > > > > 
> > > > > Imho, this is all very wild but I'm not judging.
> > > > 
> > > > Measuring and verifying immutable files is straight forward. 
> > > > Measuring, verifiying, and updating mutable file hashes is a lot more
> > > > complicated.  Re-calculating the file hash everytime the file changes
> > > > would impact performance.  The file hash is currently updated as the
> > > > last writer closes the file (__fput).  One of the reasons for the wq
> > > > was for IMA to safely calculate the file hash and and take the i_mutex
> > > > to write the xattr.
> > > > 
> > > > IMA support for mutable files makes IMA a lot more complicated.  Any
> > > > improvement suggestions would be appreciated.
> > > > 
> > > > > 
> > > > > Two solutions imho:
> > > > > (1) teach stacking filesystems like overlayfs and ecryptfs to use
> > > > >     vfs_getattr_nosec() in their ->getattr() implementation when they
> > > > >     are themselves called via vfs_getattr_nosec(). This will fix this by
> > > > >     not triggering another LSM hook.
> > > > > (2) make all ->getattr() LSM hooks PF_EXITING safe ideally don't do
> > > > >     anything
> > > > 
> > > > The original problem was detecting i_version change on overlayfs.
> > > > 
> > > > Amir's proposed patch might resolve it without commit db1d1e8b9867
> > > > ("IMA: use vfs_getattr_nosec to get the i_version").  However, as Amir
> > > > said, it does not address the new problem introduced by it.   Assuming
> > > > Amir's proposed patch resolves the original problem, an alternative
> > > > solution would be to revert commit db1d1e8b9867.
> > > > 
> > > 
> > > If you're going to revert that commit, then I'm wondering what you
> > > intend to do instead. Reaching directly into the inode to get this
> > > information is really no bueno.
> > 
> > IMA detecting file change based on i_version has been there since IMA
> > was upstreamed.   Please explain why this is not a good idea.
> > 
> 
> Not all i_version values are managed in the same way. Network
> filesystems need to pass through the value from the server, whereas with
> a local filesystems the kernel needs to manage the increment.
> 
> IMA is mostly interested in local filesystems at the moment. The main
> kernel-managed versions in the kernel are in btrfs, ext4 and xfs and
> tmpfs. Until recently, only btrfs had one that functioned properly. Both
> ext4 and xfs would also increment their values on atime updates. In
> practical terms, this means that IMA ends up doing unnecessary
> remeasurements after read events in some cases.
> 
> ext4 recently had its i_version value fixed to not do this, but the XFS
> developers are unable to fix theirs to avoid incrementing on atime
> updates. For that, I'm working on the multigrain ctime patches which
> should allow XFS to fake up a STATX_CHANGE_COOKIE in its getattr
> routine.
> 
> IMA has no practical way to tell what the filesystem can do if it's
> groveling around inside struct inode, which is why I recommended using
> vfs_getattr_nosec to grab this info. If that's problematic then by all
> means, back out that patch, but you'll need to come up with some way to
> deal with the different nuances of the different i_version counters in
> across different filesystems.

Got it.  This is basically a performance issue, because i_version is
being updated too frequently on some filesystems.  It's not an issue of
missing measurements or not re-evaluting the file's integrity when
needed.

Let's see if Amir's patch actually fixes the original problem before
making any decisions.  (Wishing for a reproducer of the original
problem.)

Mimi


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [syzbot] [integrity] [overlayfs] general protection fault in d_path
  2023-09-21 16:06                             ` Mimi Zohar
@ 2023-09-21 17:01                               ` Amir Goldstein
  2023-09-26 14:40                                 ` Mimi Zohar
  0 siblings, 1 reply; 35+ messages in thread
From: Amir Goldstein @ 2023-09-21 17:01 UTC (permalink / raw)
  To: Mimi Zohar
  Cc: Jeff Layton, Christian Brauner, Casey Schaufler, Stefan Berger,
	syzbot, linux-fsdevel, linux-integrity, linux-kernel,
	linux-unionfs, miklos, syzkaller-bugs

On Thu, Sep 21, 2023 at 7:31 PM Mimi Zohar <zohar@linux.ibm.com> wrote:
>
> On Thu, 2023-09-21 at 11:39 -0400, Jeff Layton wrote:
> > On Thu, 2023-09-21 at 11:19 -0400, Mimi Zohar wrote:
> > > On Thu, 2023-09-21 at 11:10 -0400, Jeff Layton wrote:
> > > > On Thu, 2023-09-21 at 10:52 -0400, Mimi Zohar wrote:
> > > > > On Thu, 2023-09-21 at 13:48 +0200, Christian Brauner wrote:
> > > > > > On Thu, Sep 21, 2023 at 07:24:23AM -0400, Mimi Zohar wrote:
> > > > > > > On Thu, 2023-09-21 at 06:32 -0400, Jeff Layton wrote:
> > > > > > > > On Wed, 2023-09-20 at 17:52 -0700, Casey Schaufler wrote:
> > > > > > > > > On 9/20/2023 5:10 PM, Stefan Berger wrote:
> > > > > > > > > >
> > > > > > > > > > On 9/20/23 18:09, Stefan Berger wrote:
> > > > > > > > > > >
> > > > > > > > > > > On 9/20/23 17:16, Jeff Layton wrote:
> > > > > > > > > > > > On Wed, 2023-09-20 at 16:37 -0400, Stefan Berger wrote:
> > > > > > > > > > > > > On 9/20/23 13:01, Stefan Berger wrote:
> > > > > > > > > > > > > > On 9/17/23 20:04, syzbot wrote:
> > > > > > > > > > > > > > > syzbot has bisected this issue to:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > commit db1d1e8b9867aae5c3e61ad7859abfcc4a6fd6c7
> > > > > > > > > > > > > > > Author: Jeff Layton <jlayton@kernel.org>
> > > > > > > > > > > > > > > Date:   Mon Apr 17 16:55:51 2023 +0000
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >       IMA: use vfs_getattr_nosec to get the i_version
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > bisection log:
> > > > > > > > > > > > > > > https://syzkaller.appspot.com/x/bisect.txt?x=106f7e54680000
> > > > > > > > > > > > > > > start commit:   a747acc0b752 Merge tag
> > > > > > > > > > > > > > > 'linux-kselftest-next-6.6-rc2'
> > > > > > > > > > > > > > > of g..
> > > > > > > > > > > > > > > git tree:       upstream
> > > > > > > > > > > > > > > final oops:
> > > > > > > > > > > > > > > https://syzkaller.appspot.com/x/report.txt?x=126f7e54680000
> > > > > > > > > > > > > > > console output:
> > > > > > > > > > > > > > > https://syzkaller.appspot.com/x/log.txt?x=146f7e54680000
> > > > > > > > > > > > > > > kernel config:
> > > > > > > > > > > > > > > https://syzkaller.appspot.com/x/.config?x=df91a3034fe3f122
> > > > > > > > > > > > > > > dashboard link:
> > > > > > > > > > > > > > > https://syzkaller.appspot.com/bug?extid=a67fc5321ffb4b311c98
> > > > > > > > > > > > > > > syz repro:
> > > > > > > > > > > > > > > https://syzkaller.appspot.com/x/repro.syz?x=1671b694680000
> > > > > > > > > > > > > > > C reproducer:
> > > > > > > > > > > > > > > https://syzkaller.appspot.com/x/repro.c?x=14ec94d8680000
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Reported-by: syzbot+a67fc5321ffb4b311c98@syzkaller.appspotmail.com
> > > > > > > > > > > > > > > Fixes: db1d1e8b9867 ("IMA: use vfs_getattr_nosec to get the
> > > > > > > > > > > > > > > i_version")
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > For information about bisection process see:
> > > > > > > > > > > > > > > https://goo.gl/tpsmEJ#bisection
> > > > > > > > > > > > > > The final oops shows this here:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > BUG: kernel NULL pointer dereference, address: 0000000000000058
> > > > > > > > > > > > > > #PF: supervisor read access in kernel mode
> > > > > > > > > > > > > > #PF: error_code(0x0000) - not-present page
> > > > > > > > > > > > > > PGD 0 P4D 0
> > > > > > > > > > > > > > Oops: 0000 [#1] PREEMPT SMP
> > > > > > > > > > > > > > CPU: 0 PID: 3192 Comm: syz-executor.0 Not tainted
> > > > > > > > > > > > > > 6.4.0-rc2-syzkaller #0
> > > > > > > > > > > > > > Hardware name: Google Google Compute Engine/Google Compute Engine,
> > > > > > > > > > > > > > BIOS Google 08/04/2023
> > > > > > > > > > > > > > RIP: 0010:__lock_acquire+0x35/0x490 kernel/locking/lockdep.c:4946
> > > > > > > > > > > > > > Code: 83 ec 18 65 4c 8b 35 aa 60 f4 7e 83 3d b7 11 e4 02 00 0f 84 05
> > > > > > > > > > > > > > 02 00 00 4c 89 cb 89 cd 41 89 d5 49 89 ff 83 fe 01 77 0c 89 f0
> > > > > > > > > > > > > > <49> 8b
> > > > > > > > > > > > > > 44 c7 08 48 85 c0 75 1b 4c 89 ff 31 d2 45 89 c4 e8 74 f6 ff
> > > > > > > > > > > > > > RSP: 0018:ffffc90002edb840 EFLAGS: 00010097
> > > > > > > > > > > > > > RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
> > > > > > > > > > > > > > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000050
> > > > > > > > > > > > > > RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000000000000
> > > > > > > > > > > > > > R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> > > > > > > > > > > > > > R13: 0000000000000000 R14: ffff888102ea5340 R15: 0000000000000050
> > > > > > > > > > > > > > FS:  0000000000000000(0000) GS:ffff88813bc00000(0000)
> > > > > > > > > > > > > > knlGS:0000000000000000
> > > > > > > > > > > > > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > > > > > > > > > CR2: 0000000000000058 CR3: 0000000003aa8000 CR4: 00000000003506f0
> > > > > > > > > > > > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > > > > > > > > > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > > > > > > > > > > > Call Trace:
> > > > > > > > > > > > > >   <TASK>
> > > > > > > > > > > > > >   lock_acquire+0xd8/0x1f0 kernel/locking/lockdep.c:5691
> > > > > > > > > > > > > >   seqcount_lockdep_reader_access include/linux/seqlock.h:102 [inline]
> > > > > > > > > > > > > >   get_fs_root_rcu fs/d_path.c:243 [inline]
> > > > > > > > > > > > > >   d_path+0xd1/0x1f0 fs/d_path.c:285
> > > > > > > > > > > > > >   audit_log_d_path+0x65/0x130 kernel/audit.c:2139
> > > > > > > > > > > > > >   dump_common_audit_data security/lsm_audit.c:224 [inline]
> > > > > > > > > > > > > >   common_lsm_audit+0x3b3/0x840 security/lsm_audit.c:458
> > > > > > > > > > > > > >   smack_log+0xad/0x130 security/smack/smack_access.c:383
> > > > > > > > > > > > > >   smk_tskacc+0xb1/0xd0 security/smack/smack_access.c:253
> > > > > > > > > > > > > >   smack_inode_getattr+0x8a/0xb0 security/smack/smack_lsm.c:1187
> > > > > > > > > > > > > >   security_inode_getattr+0x32/0x50 security/security.c:2114
> > > > > > > > > > > > > >   vfs_getattr+0x1b/0x40 fs/stat.c:167
> > > > > > > > > > > > > >   ovl_getattr+0xa6/0x3e0 fs/overlayfs/inode.c:173
> > > > > > > > > > > > > >   ima_check_last_writer security/integrity/ima/ima_main.c:171
> > > > > > > > > > > > > > [inline]
> > > > > > > > > > > > > >   ima_file_free+0xbd/0x130 security/integrity/ima/ima_main.c:203
> > > > > > > > > > > > > >   __fput+0xc7/0x220 fs/file_table.c:315
> > > > > > > > > > > > > >   task_work_run+0x7d/0xa0 kernel/task_work.c:179
> > > > > > > > > > > > > >   exit_task_work include/linux/task_work.h:38 [inline]
> > > > > > > > > > > > > >   do_exit+0x2c7/0xa80 kernel/exit.c:871 <-----------------------
> > > > > > > > > > > > > >   do_group_exit+0x85/0xa0 kernel/exit.c:1021
> > > > > > > > > > > > > >   get_signal+0x73c/0x7f0 kernel/signal.c:2874
> > > > > > > > > > > > > >   arch_do_signal_or_restart+0x89/0x290 arch/x86/kernel/signal.c:306
> > > > > > > > > > > > > >   exit_to_user_mode_loop+0x61/0xb0 kernel/entry/common.c:168
> > > > > > > > > > > > > >   exit_to_user_mode_prepare+0x64/0xb0 kernel/entry/common.c:204
> > > > > > > > > > > > > >   __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
> > > > > > > > > > > > > >   syscall_exit_to_user_mode+0x2b/0x1d0 kernel/entry/common.c:297
> > > > > > > > > > > > > >   do_syscall_64+0x4d/0x90 arch/x86/entry/common.c:86
> > > > > > > > > > > > > >   entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > do_exit has called exit_fs(tsk) [
> > > > > > > > > > > > > > https://elixir.bootlin.com/linux/v6.4-rc2/source/kernel/exit.c#L867 ]
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > exit_fs(tsk) has set tsk->fs = NULL [
> > > > > > > > > > > > > > https://elixir.bootlin.com/linux/v6.4-rc2/source/fs/fs_struct.c#L103
> > > > > > > > > > > > > > ]
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I think this then bites in d_path() where it calls:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >      get_fs_root_rcu(current->fs, &root);   [
> > > > > > > > > > > > > > https://elixir.bootlin.com/linux/v6.4-rc2/source/fs/d_path.c#L285 ]
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > current->fs is likely NULL here.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > If this was correct it would have nothing to do with the actual
> > > > > > > > > > > > > > patch,
> > > > > > > > > > > > > > though, but rather with the fact that smack logs on process
> > > > > > > > > > > > > > termination. I am not sure what the solution would be other than
> > > > > > > > > > > > > > testing for current->fs == NULL in d_path before using it and
> > > > > > > > > > > > > > returning an error that is not normally returned or trying to
> > > > > > > > > > > > > > intercept this case in smack.
> > > > > > > > > > > > > I have now been able to recreate the syzbot issue with the test
> > > > > > > > > > > > > program
> > > > > > > > > > > > > and the issue is exactly the one described here, current->fs == NULL.
> > > > > > > > > > > > >
> > > > > > > > > > > > Earlier in this thread, Amir had a diagnosis that IMA is
> > > > > > > > > > > > inappropriately
> > > > > > > > > > > > trying to use f_path directly instead of using the helpers that are
> > > > > > > > > > > > friendly for stacking filesystems.
> > > > > > > > > > > >
> > > > > > > > > > > > https://lore.kernel.org/linux-fsdevel/CAOQ4uxgjnYyeQL-LbS5kQ7+C0d6sjzKqMDWAtZW8cAkPaed6=Q@mail.gmail.com/
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > I'm not an IMA hacker so I'm not planning to roll a fix here. Perhaps
> > > > > > > > > > > > someone on the IMA team could try this approach?
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > I have applied this patch here from Amir now and it does NOT resolve
> > > > > > > > > > > the issue:
> > > > > > > > > > >
> > > > > > > > > > > https://lore.kernel.org/linux-integrity/296dae962a2a488bde682d3def074db91686e1c3.camel@linux.ibm.com/T/#m4ebdb780bf6952e7f210c55e87950d0cfa1d5eb0
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > This seems to resolve the issue:
> > > > > > > > > >
> > > > > > > > > > diff --git a/security/smack/smack_access.c
> > > > > > > > > > b/security/smack/smack_access.c
> > > > > > > > > > index 585e5e35710b..57afcea1e39b 100644
> > > > > > > > > > --- a/security/smack/smack_access.c
> > > > > > > > > > +++ b/security/smack/smack_access.c
> > > > > > > > > > @@ -347,6 +347,9 @@ void smack_log(char *subject_label, char
> > > > > > > > > > *object_label, int request,
> > > > > > > > > >         struct smack_audit_data *sad;
> > > > > > > > > >         struct common_audit_data *a = &ad->a;
> > > > > > > > > >
> > > > > > > > > > +       if (current->flags & PF_EXITING)
> > > > > > > > > > +               return;
> > > > > > > > > > +
> > > > > > > > >
> > > > > > > > > Based on what I see here I can understand that this prevents the panic,
> > > > > > > > > but it isn't so clear what changed that introduced the problem.
> > > > > > > > >
> > > > > > > > > >         /* check if we have to log the current event */
> > > > > > > > > >         if (result < 0 && (log_policy & SMACK_AUDIT_DENIED) == 0)
> > > > > > > > > >                 return;
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > > > > Apparently, it's this patch:
> > > > > > > >
> > > > > > > >     db1d1e8b9867 IMA: use vfs_getattr_nosec to get the i_version
> > > > > > >
> > > > > > > Yes, the syzbot was updated with that info.
> > > > > > >
> > > > > > > > At one time, IMA would reach directly into the inode to get the
> > > > > > > > i_version and ctime. That was fine for certain filesystems, but with
> > > > > > > > more recent changes it needs to go through ->getattr instead. Evidently,
> > > > > > > > it's selecting the wrong inode to query when dealing with overlayfs and
> > > > > > > > that's causing panics at times.
> > > > > > > >
> > > > > > > > As to why the above patch helps, I'm not sure, but given that it doesn't
> > > > > > > > seem to change which inode is being queried via getattr, it seems like
> > > > > > > > this is probably papering over the real bug. That said, IMA and
> > > > > > > > overlayfs are not really in my wheelhouse, so I could be very wrong
> > > > > > > > here.
> > > > > > >
> > > > > > > The call to vfs_getattr_nosec() somehow triggers a call to
> > > > > > > security_inode_getattr().  Without the call neither ovl_getattr() nor
> > > > > > > smack_inode_getattr() would be called.
> > > > > >
> > > > > > ima_file_free()
> > > > > > -> ima_check_last_writer()
> > > > > >    -> vfs_getattr_nosec()
> > > > > >       -> i_op->getattr() == ovl_getattr()
> > > > > >          -> vfs_getattr()
> > > > > >             -> security_inode_getattr()
> > > > > >           -> real_i_op->getattr()
> > > > > >
> > > > > > is the callchain that triggers this.
> > > > >
> > > > > Thank you for the explanation as to why ovl_getattr() and subsequently
> > > > > smack_inode_getattr() is being called.
> > > > >
> > > > > >
> > > > > > ima_file_free() is called in a very sensitive location: __fput() that
> > > > > > can be called from task work when the process is already PF_EXITING.
> > > > > >
> > > > > > The ideal solution would be for ima to stop calling back into the
> > > > > > filesystems in this location at all but that's probably not going to
> > > > > > happen because I now realize you also set extended attributes from
> > > > > > __fput():
> > > > > >
> > > > > >
> > > > > > ima_check_last_writer()
> > > > > > -> ima_update_xatt()
> > > > > >    -> ima_fix_xattr()
> > > > > >       -> __vfs_setxattr_noperm()
> > > > > >
> > > > > > The __vfs_setxattr_noperm() codepath can itself trigger
> > > > > > security_inode_post_setxattr() and security_inode_setsecurity(). So
> > > > > > those hooks are hopefully safe to be called with PF_EXITING tasks as
> > > > > > well...
> > > > > >
> > > > > > Imho, this is all very wild but I'm not judging.
> > > > >
> > > > > Measuring and verifying immutable files is straight forward.
> > > > > Measuring, verifiying, and updating mutable file hashes is a lot more
> > > > > complicated.  Re-calculating the file hash everytime the file changes
> > > > > would impact performance.  The file hash is currently updated as the
> > > > > last writer closes the file (__fput).  One of the reasons for the wq
> > > > > was for IMA to safely calculate the file hash and and take the i_mutex
> > > > > to write the xattr.
> > > > >
> > > > > IMA support for mutable files makes IMA a lot more complicated.  Any
> > > > > improvement suggestions would be appreciated.
> > > > >
> > > > > >
> > > > > > Two solutions imho:
> > > > > > (1) teach stacking filesystems like overlayfs and ecryptfs to use
> > > > > >     vfs_getattr_nosec() in their ->getattr() implementation when they
> > > > > >     are themselves called via vfs_getattr_nosec(). This will fix this by
> > > > > >     not triggering another LSM hook.
> > > > > > (2) make all ->getattr() LSM hooks PF_EXITING safe ideally don't do
> > > > > >     anything
> > > > >
> > > > > The original problem was detecting i_version change on overlayfs.
> > > > >
> > > > > Amir's proposed patch might resolve it without commit db1d1e8b9867
> > > > > ("IMA: use vfs_getattr_nosec to get the i_version").  However, as Amir
> > > > > said, it does not address the new problem introduced by it.   Assuming
> > > > > Amir's proposed patch resolves the original problem, an alternative
> > > > > solution would be to revert commit db1d1e8b9867.
> > > > >
> > > >
> > > > If you're going to revert that commit, then I'm wondering what you
> > > > intend to do instead. Reaching directly into the inode to get this
> > > > information is really no bueno.
> > >
> > > IMA detecting file change based on i_version has been there since IMA
> > > was upstreamed.   Please explain why this is not a good idea.
> > >
> >
> > Not all i_version values are managed in the same way. Network
> > filesystems need to pass through the value from the server, whereas with
> > a local filesystems the kernel needs to manage the increment.
> >
> > IMA is mostly interested in local filesystems at the moment. The main
> > kernel-managed versions in the kernel are in btrfs, ext4 and xfs and
> > tmpfs. Until recently, only btrfs had one that functioned properly. Both
> > ext4 and xfs would also increment their values on atime updates. In
> > practical terms, this means that IMA ends up doing unnecessary
> > remeasurements after read events in some cases.
> >
> > ext4 recently had its i_version value fixed to not do this, but the XFS
> > developers are unable to fix theirs to avoid incrementing on atime
> > updates. For that, I'm working on the multigrain ctime patches which
> > should allow XFS to fake up a STATX_CHANGE_COOKIE in its getattr
> > routine.
> >
> > IMA has no practical way to tell what the filesystem can do if it's
> > groveling around inside struct inode, which is why I recommended using
> > vfs_getattr_nosec to grab this info. If that's problematic then by all
> > means, back out that patch, but you'll need to come up with some way to
> > deal with the different nuances of the different i_version counters in
> > across different filesystems.
>
> Got it.  This is basically a performance issue, because i_version is
> being updated too frequently on some filesystems.  It's not an issue of
> missing measurements or not re-evaluting the file's integrity when
> needed.
>
> Let's see if Amir's patch actually fixes the original problem before
> making any decisions.  (Wishing for a reproducer of the original
> problem.)
>

Confused. What is the "original problem"?
I never claimed that my patch fixes the "original problem".
I claimed [1] that my patch fixes a problem that existed before
db1d1e8b9867, but db1d1e8b9867 added two more instances
of that bug (wrong dereference of file->f_path).
Apparently, db1d1e8b9867 introduced another bug.

It looks like you should revert db1d1e8b9867, but regardless,
I recommend that you apply my patch. My patch conflicts
with the revert but the conflict is trivial - the two hunks that
fix the new vfs_getattr_nosec() calls are irrelevant - the rest are.

Thanks,
Amir.

[1] https://lore.kernel.org/linux-unionfs/20230913073755.3489676-1-amir73il@gmail.com/

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [syzbot] [integrity] [overlayfs] general protection fault in d_path
  2023-09-21 17:01                               ` Amir Goldstein
@ 2023-09-26 14:40                                 ` Mimi Zohar
  2023-09-29  4:30                                   ` Amir Goldstein
  0 siblings, 1 reply; 35+ messages in thread
From: Mimi Zohar @ 2023-09-26 14:40 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jeff Layton, Christian Brauner, Casey Schaufler, Stefan Berger,
	syzbot, linux-fsdevel, linux-integrity, linux-kernel,
	linux-unionfs, miklos, syzkaller-bugs

On Thu, 2023-09-21 at 20:01 +0300, Amir Goldstein wrote:
> On Thu, Sep 21, 2023 at 7:31 PM Mimi Zohar <zohar@linux.ibm.com> wrote:
> >
> > On Thu, 2023-09-21 at 11:39 -0400, Jeff Layton wrote:
> > > On Thu, 2023-09-21 at 11:19 -0400, Mimi Zohar wrote:
> > > > On Thu, 2023-09-21 at 11:10 -0400, Jeff Layton wrote:
> > > > > On Thu, 2023-09-21 at 10:52 -0400, Mimi Zohar wrote:
> > > > > > On Thu, 2023-09-21 at 13:48 +0200, Christian Brauner wrote:
> > > > > > > On Thu, Sep 21, 2023 at 07:24:23AM -0400, Mimi Zohar wrote:
> > > > > > > > On Thu, 2023-09-21 at 06:32 -0400, Jeff Layton wrote:
> > > > > > > > > On Wed, 2023-09-20 at 17:52 -0700, Casey Schaufler wrote:
> > > > > > > > > > On 9/20/2023 5:10 PM, Stefan Berger wrote:
> > > > > > > > > > >
> > > > > > > > > > > On 9/20/23 18:09, Stefan Berger wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On 9/20/23 17:16, Jeff Layton wrote:
> > > > > > > > > > > > > On Wed, 2023-09-20 at 16:37 -0400, Stefan Berger wrote:
> > > > > > > > > > > > > > On 9/20/23 13:01, Stefan Berger wrote:
> > > > > > > > > > > > > > > On 9/17/23 20:04, syzbot wrote:
> > > > > > > > > > > > > > > > syzbot has bisected this issue to:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > commit db1d1e8b9867aae5c3e61ad7859abfcc4a6fd6c7
> > > > > > > > > > > > > > > > Author: Jeff Layton <jlayton@kernel.org>
> > > > > > > > > > > > > > > > Date:   Mon Apr 17 16:55:51 2023 +0000
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >       IMA: use vfs_getattr_nosec to get the i_version
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > bisection log:
> > > > > > > > > > > > > > > > https://syzkaller.appspot.com/x/bisect.txt?x=106f7e54680000
> > > > > > > > > > > > > > > > start commit:   a747acc0b752 Merge tag
> > > > > > > > > > > > > > > > 'linux-kselftest-next-6.6-rc2'
> > > > > > > > > > > > > > > > of g..
> > > > > > > > > > > > > > > > git tree:       upstream
> > > > > > > > > > > > > > > > final oops:
> > > > > > > > > > > > > > > > https://syzkaller.appspot.com/x/report.txt?x=126f7e54680000
> > > > > > > > > > > > > > > > console output:
> > > > > > > > > > > > > > > > https://syzkaller.appspot.com/x/log.txt?x=146f7e54680000
> > > > > > > > > > > > > > > > kernel config:
> > > > > > > > > > > > > > > > https://syzkaller.appspot.com/x/.config?x=df91a3034fe3f122
> > > > > > > > > > > > > > > > dashboard link:
> > > > > > > > > > > > > > > > https://syzkaller.appspot.com/bug?extid=a67fc5321ffb4b311c98
> > > > > > > > > > > > > > > > syz repro:
> > > > > > > > > > > > > > > > https://syzkaller.appspot.com/x/repro.syz?x=1671b694680000
> > > > > > > > > > > > > > > > C reproducer:
> > > > > > > > > > > > > > > > https://syzkaller.appspot.com/x/repro.c?x=14ec94d8680000
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Reported-by: syzbot+a67fc5321ffb4b311c98@syzkaller.appspotmail.com
> > > > > > > > > > > > > > > > Fixes: db1d1e8b9867 ("IMA: use vfs_getattr_nosec to get the
> > > > > > > > > > > > > > > > i_version")
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > For information about bisection process see:
> > > > > > > > > > > > > > > > https://goo.gl/tpsmEJ#bisection
> > > > > > > > > > > > > > > The final oops shows this here:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > BUG: kernel NULL pointer dereference, address: 0000000000000058
> > > > > > > > > > > > > > > #PF: supervisor read access in kernel mode
> > > > > > > > > > > > > > > #PF: error_code(0x0000) - not-present page
> > > > > > > > > > > > > > > PGD 0 P4D 0
> > > > > > > > > > > > > > > Oops: 0000 [#1] PREEMPT SMP
> > > > > > > > > > > > > > > CPU: 0 PID: 3192 Comm: syz-executor.0 Not tainted
> > > > > > > > > > > > > > > 6.4.0-rc2-syzkaller #0
> > > > > > > > > > > > > > > Hardware name: Google Google Compute Engine/Google Compute Engine,
> > > > > > > > > > > > > > > BIOS Google 08/04/2023
> > > > > > > > > > > > > > > RIP: 0010:__lock_acquire+0x35/0x490 kernel/locking/lockdep.c:4946
> > > > > > > > > > > > > > > Code: 83 ec 18 65 4c 8b 35 aa 60 f4 7e 83 3d b7 11 e4 02 00 0f 84 05
> > > > > > > > > > > > > > > 02 00 00 4c 89 cb 89 cd 41 89 d5 49 89 ff 83 fe 01 77 0c 89 f0
> > > > > > > > > > > > > > > <49> 8b
> > > > > > > > > > > > > > > 44 c7 08 48 85 c0 75 1b 4c 89 ff 31 d2 45 89 c4 e8 74 f6 ff
> > > > > > > > > > > > > > > RSP: 0018:ffffc90002edb840 EFLAGS: 00010097
> > > > > > > > > > > > > > > RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
> > > > > > > > > > > > > > > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000050
> > > > > > > > > > > > > > > RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000000000000
> > > > > > > > > > > > > > > R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> > > > > > > > > > > > > > > R13: 0000000000000000 R14: ffff888102ea5340 R15: 0000000000000050
> > > > > > > > > > > > > > > FS:  0000000000000000(0000) GS:ffff88813bc00000(0000)
> > > > > > > > > > > > > > > knlGS:0000000000000000
> > > > > > > > > > > > > > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > > > > > > > > > > CR2: 0000000000000058 CR3: 0000000003aa8000 CR4: 00000000003506f0
> > > > > > > > > > > > > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > > > > > > > > > > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > > > > > > > > > > > > Call Trace:
> > > > > > > > > > > > > > >   <TASK>
> > > > > > > > > > > > > > >   lock_acquire+0xd8/0x1f0 kernel/locking/lockdep.c:5691
> > > > > > > > > > > > > > >   seqcount_lockdep_reader_access include/linux/seqlock.h:102 [inline]
> > > > > > > > > > > > > > >   get_fs_root_rcu fs/d_path.c:243 [inline]
> > > > > > > > > > > > > > >   d_path+0xd1/0x1f0 fs/d_path.c:285
> > > > > > > > > > > > > > >   audit_log_d_path+0x65/0x130 kernel/audit.c:2139
> > > > > > > > > > > > > > >   dump_common_audit_data security/lsm_audit.c:224 [inline]
> > > > > > > > > > > > > > >   common_lsm_audit+0x3b3/0x840 security/lsm_audit.c:458
> > > > > > > > > > > > > > >   smack_log+0xad/0x130 security/smack/smack_access.c:383
> > > > > > > > > > > > > > >   smk_tskacc+0xb1/0xd0 security/smack/smack_access.c:253
> > > > > > > > > > > > > > >   smack_inode_getattr+0x8a/0xb0 security/smack/smack_lsm.c:1187
> > > > > > > > > > > > > > >   security_inode_getattr+0x32/0x50 security/security.c:2114
> > > > > > > > > > > > > > >   vfs_getattr+0x1b/0x40 fs/stat.c:167
> > > > > > > > > > > > > > >   ovl_getattr+0xa6/0x3e0 fs/overlayfs/inode.c:173
> > > > > > > > > > > > > > >   ima_check_last_writer security/integrity/ima/ima_main.c:171
> > > > > > > > > > > > > > > [inline]
> > > > > > > > > > > > > > >   ima_file_free+0xbd/0x130 security/integrity/ima/ima_main.c:203
> > > > > > > > > > > > > > >   __fput+0xc7/0x220 fs/file_table.c:315
> > > > > > > > > > > > > > >   task_work_run+0x7d/0xa0 kernel/task_work.c:179
> > > > > > > > > > > > > > >   exit_task_work include/linux/task_work.h:38 [inline]
> > > > > > > > > > > > > > >   do_exit+0x2c7/0xa80 kernel/exit.c:871 <-----------------------
> > > > > > > > > > > > > > >   do_group_exit+0x85/0xa0 kernel/exit.c:1021
> > > > > > > > > > > > > > >   get_signal+0x73c/0x7f0 kernel/signal.c:2874
> > > > > > > > > > > > > > >   arch_do_signal_or_restart+0x89/0x290 arch/x86/kernel/signal.c:306
> > > > > > > > > > > > > > >   exit_to_user_mode_loop+0x61/0xb0 kernel/entry/common.c:168
> > > > > > > > > > > > > > >   exit_to_user_mode_prepare+0x64/0xb0 kernel/entry/common.c:204
> > > > > > > > > > > > > > >   __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
> > > > > > > > > > > > > > >   syscall_exit_to_user_mode+0x2b/0x1d0 kernel/entry/common.c:297
> > > > > > > > > > > > > > >   do_syscall_64+0x4d/0x90 arch/x86/entry/common.c:86
> > > > > > > > > > > > > > >   entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > do_exit has called exit_fs(tsk) [
> > > > > > > > > > > > > > > https://elixir.bootlin.com/linux/v6.4-rc2/source/kernel/exit.c#L867 ]
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > exit_fs(tsk) has set tsk->fs = NULL [
> > > > > > > > > > > > > > > https://elixir.bootlin.com/linux/v6.4-rc2/source/fs/fs_struct.c#L103
> > > > > > > > > > > > > > > ]
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I think this then bites in d_path() where it calls:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >      get_fs_root_rcu(current->fs, &root);   [
> > > > > > > > > > > > > > > https://elixir.bootlin.com/linux/v6.4-rc2/source/fs/d_path.c#L285 ]
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > current->fs is likely NULL here.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > If this was correct it would have nothing to do with the actual
> > > > > > > > > > > > > > > patch,
> > > > > > > > > > > > > > > though, but rather with the fact that smack logs on process
> > > > > > > > > > > > > > > termination. I am not sure what the solution would be other than
> > > > > > > > > > > > > > > testing for current->fs == NULL in d_path before using it and
> > > > > > > > > > > > > > > returning an error that is not normally returned or trying to
> > > > > > > > > > > > > > > intercept this case in smack.
> > > > > > > > > > > > > > I have now been able to recreate the syzbot issue with the test
> > > > > > > > > > > > > > program
> > > > > > > > > > > > > > and the issue is exactly the one described here, current->fs == NULL.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > Earlier in this thread, Amir had a diagnosis that IMA is
> > > > > > > > > > > > > inappropriately
> > > > > > > > > > > > > trying to use f_path directly instead of using the helpers that are
> > > > > > > > > > > > > friendly for stacking filesystems.
> > > > > > > > > > > > >
> > > > > > > > > > > > > https://lore.kernel.org/linux-fsdevel/CAOQ4uxgjnYyeQL-LbS5kQ7+C0d6sjzKqMDWAtZW8cAkPaed6=Q@mail.gmail.com/
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > I'm not an IMA hacker so I'm not planning to roll a fix here. Perhaps
> > > > > > > > > > > > > someone on the IMA team could try this approach?
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > I have applied this patch here from Amir now and it does NOT resolve
> > > > > > > > > > > > the issue:
> > > > > > > > > > > >
> > > > > > > > > > > > https://lore.kernel.org/linux-integrity/296dae962a2a488bde682d3def074db91686e1c3.camel@linux.ibm.com/T/#m4ebdb780bf6952e7f210c55e87950d0cfa1d5eb0
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > This seems to resolve the issue:
> > > > > > > > > > >
> > > > > > > > > > > diff --git a/security/smack/smack_access.c
> > > > > > > > > > > b/security/smack/smack_access.c
> > > > > > > > > > > index 585e5e35710b..57afcea1e39b 100644
> > > > > > > > > > > --- a/security/smack/smack_access.c
> > > > > > > > > > > +++ b/security/smack/smack_access.c
> > > > > > > > > > > @@ -347,6 +347,9 @@ void smack_log(char *subject_label, char
> > > > > > > > > > > *object_label, int request,
> > > > > > > > > > >         struct smack_audit_data *sad;
> > > > > > > > > > >         struct common_audit_data *a = &ad->a;
> > > > > > > > > > >
> > > > > > > > > > > +       if (current->flags & PF_EXITING)
> > > > > > > > > > > +               return;
> > > > > > > > > > > +
> > > > > > > > > >
> > > > > > > > > > Based on what I see here I can understand that this prevents the panic,
> > > > > > > > > > but it isn't so clear what changed that introduced the problem.
> > > > > > > > > >
> > > > > > > > > > >         /* check if we have to log the current event */
> > > > > > > > > > >         if (result < 0 && (log_policy & SMACK_AUDIT_DENIED) == 0)
> > > > > > > > > > >                 return;
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Apparently, it's this patch:
> > > > > > > > >
> > > > > > > > >     db1d1e8b9867 IMA: use vfs_getattr_nosec to get the i_version
> > > > > > > >
> > > > > > > > Yes, the syzbot was updated with that info.
> > > > > > > >
> > > > > > > > > At one time, IMA would reach directly into the inode to get the
> > > > > > > > > i_version and ctime. That was fine for certain filesystems, but with
> > > > > > > > > more recent changes it needs to go through ->getattr instead. Evidently,
> > > > > > > > > it's selecting the wrong inode to query when dealing with overlayfs and
> > > > > > > > > that's causing panics at times.
> > > > > > > > >
> > > > > > > > > As to why the above patch helps, I'm not sure, but given that it doesn't
> > > > > > > > > seem to change which inode is being queried via getattr, it seems like
> > > > > > > > > this is probably papering over the real bug. That said, IMA and
> > > > > > > > > overlayfs are not really in my wheelhouse, so I could be very wrong
> > > > > > > > > here.
> > > > > > > >
> > > > > > > > The call to vfs_getattr_nosec() somehow triggers a call to
> > > > > > > > security_inode_getattr().  Without the call neither ovl_getattr() nor
> > > > > > > > smack_inode_getattr() would be called.
> > > > > > >
> > > > > > > ima_file_free()
> > > > > > > -> ima_check_last_writer()
> > > > > > >    -> vfs_getattr_nosec()
> > > > > > >       -> i_op->getattr() == ovl_getattr()
> > > > > > >          -> vfs_getattr()
> > > > > > >             -> security_inode_getattr()
> > > > > > >           -> real_i_op->getattr()
> > > > > > >
> > > > > > > is the callchain that triggers this.
> > > > > >
> > > > > > Thank you for the explanation as to why ovl_getattr() and subsequently
> > > > > > smack_inode_getattr() is being called.
> > > > > >
> > > > > > >
> > > > > > > ima_file_free() is called in a very sensitive location: __fput() that
> > > > > > > can be called from task work when the process is already PF_EXITING.
> > > > > > >
> > > > > > > The ideal solution would be for ima to stop calling back into the
> > > > > > > filesystems in this location at all but that's probably not going to
> > > > > > > happen because I now realize you also set extended attributes from
> > > > > > > __fput():
> > > > > > >
> > > > > > >
> > > > > > > ima_check_last_writer()
> > > > > > > -> ima_update_xatt()
> > > > > > >    -> ima_fix_xattr()
> > > > > > >       -> __vfs_setxattr_noperm()
> > > > > > >
> > > > > > > The __vfs_setxattr_noperm() codepath can itself trigger
> > > > > > > security_inode_post_setxattr() and security_inode_setsecurity(). So
> > > > > > > those hooks are hopefully safe to be called with PF_EXITING tasks as
> > > > > > > well...
> > > > > > >
> > > > > > > Imho, this is all very wild but I'm not judging.
> > > > > >
> > > > > > Measuring and verifying immutable files is straight forward.
> > > > > > Measuring, verifiying, and updating mutable file hashes is a lot more
> > > > > > complicated.  Re-calculating the file hash everytime the file changes
> > > > > > would impact performance.  The file hash is currently updated as the
> > > > > > last writer closes the file (__fput).  One of the reasons for the wq
> > > > > > was for IMA to safely calculate the file hash and and take the i_mutex
> > > > > > to write the xattr.
> > > > > >
> > > > > > IMA support for mutable files makes IMA a lot more complicated.  Any
> > > > > > improvement suggestions would be appreciated.
> > > > > >
> > > > > > >
> > > > > > > Two solutions imho:
> > > > > > > (1) teach stacking filesystems like overlayfs and ecryptfs to use
> > > > > > >     vfs_getattr_nosec() in their ->getattr() implementation when they
> > > > > > >     are themselves called via vfs_getattr_nosec(). This will fix this by
> > > > > > >     not triggering another LSM hook.
> > > > > > > (2) make all ->getattr() LSM hooks PF_EXITING safe ideally don't do
> > > > > > >     anything
> > > > > >
> > > > > > The original problem was detecting i_version change on overlayfs.
> > > > > >
> > > > > > Amir's proposed patch might resolve it without commit db1d1e8b9867
> > > > > > ("IMA: use vfs_getattr_nosec to get the i_version").  However, as Amir
> > > > > > said, it does not address the new problem introduced by it.   Assuming
> > > > > > Amir's proposed patch resolves the original problem, an alternative
> > > > > > solution would be to revert commit db1d1e8b9867.
> > > > > >
> > > > >
> > > > > If you're going to revert that commit, then I'm wondering what you
> > > > > intend to do instead. Reaching directly into the inode to get this
> > > > > information is really no bueno.
> > > >
> > > > IMA detecting file change based on i_version has been there since IMA
> > > > was upstreamed.   Please explain why this is not a good idea.
> > > >
> > >
> > > Not all i_version values are managed in the same way. Network
> > > filesystems need to pass through the value from the server, whereas with
> > > a local filesystems the kernel needs to manage the increment.
> > >
> > > IMA is mostly interested in local filesystems at the moment. The main
> > > kernel-managed versions in the kernel are in btrfs, ext4 and xfs and
> > > tmpfs. Until recently, only btrfs had one that functioned properly. Both
> > > ext4 and xfs would also increment their values on atime updates. In
> > > practical terms, this means that IMA ends up doing unnecessary
> > > remeasurements after read events in some cases.
> > >
> > > ext4 recently had its i_version value fixed to not do this, but the XFS
> > > developers are unable to fix theirs to avoid incrementing on atime
> > > updates. For that, I'm working on the multigrain ctime patches which
> > > should allow XFS to fake up a STATX_CHANGE_COOKIE in its getattr
> > > routine.
> > >
> > > IMA has no practical way to tell what the filesystem can do if it's
> > > groveling around inside struct inode, which is why I recommended using
> > > vfs_getattr_nosec to grab this info. If that's problematic then by all
> > > means, back out that patch, but you'll need to come up with some way to
> > > deal with the different nuances of the different i_version counters in
> > > across different filesystems.
> >
> > Got it.  This is basically a performance issue, because i_version is
> > being updated too frequently on some filesystems.  It's not an issue of
> > missing measurements or not re-evaluting the file's integrity when
> > needed.
> >
> > Let's see if Amir's patch actually fixes the original problem before
> > making any decisions.  (Wishing for a reproducer of the original
> > problem.)
> >
> 
> Confused. What is the "original problem"?
> I never claimed that my patch fixes the "original problem".
> I claimed [1] that my patch fixes a problem that existed before
> db1d1e8b9867, but db1d1e8b9867 added two more instances
> of that bug (wrong dereference of file->f_path).
> Apparently, db1d1e8b9867 introduced another bug.
> 
> It looks like you should revert db1d1e8b9867, but regardless,
> I recommend that you apply my patch. My patch conflicts
> with the revert but the conflict is trivial - the two hunks that
> fix the new vfs_getattr_nosec() calls are irrelevant - the rest are.
> 
> Thanks,
> Amir.
> 
> [1] https://lore.kernel.org/linux-unionfs/20230913073755.3489676-1-amir73il@gmail.com/

Here are some of the issues with IMA/Overlay:

1. False positive syzbot IMA/overlay lockdep warnings.
2, Not detecting file change with squashfs + overlay.
3. Changes to the backing file are not detected by overlay (when
backing file is not in policy).

Commit db1d1e8b9867 ("IMA: use vfs_getattr_nosec to get the i_version")
upstreamed to address 2, but has become unnecessary due to other
changes.  According to Stefan, the problem subsequently was resolved
without either commit db1d1e8b9867 or 18b44bc5a672.  (Kernel was not
bi-sected to find bug resolution.)

Commit 18b44bc5a672 ("ovl: Always reevaluate the file signature for
IMA") to address 3.

[PATCH] "ima: fix wrong dereferences of file->f_path" is probably
correct.  Does it address any syzbot reports?

-- 
thanks,

Mimi


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [syzbot] [integrity] [overlayfs] general protection fault in d_path
  2023-09-21 11:48                   ` Christian Brauner
  2023-09-21 14:29                     ` Stefan Berger
  2023-09-21 14:52                     ` Mimi Zohar
@ 2023-09-28 22:03                     ` Stefan Berger
  2023-09-29  4:25                       ` Amir Goldstein
  2 siblings, 1 reply; 35+ messages in thread
From: Stefan Berger @ 2023-09-28 22:03 UTC (permalink / raw)
  To: Christian Brauner, Mimi Zohar
  Cc: Jeff Layton, Casey Schaufler, syzbot, amir73il, linux-fsdevel,
	linux-integrity, linux-kernel, linux-unionfs, miklos,
	syzkaller-bugs


On 9/21/23 07:48, Christian Brauner wrote:
>
> Imho, this is all very wild but I'm not judging.
>
> Two solutions imho:
> (1) teach stacking filesystems like overlayfs and ecryptfs to use
>      vfs_getattr_nosec() in their ->getattr() implementation when they
>      are themselves called via vfs_getattr_nosec(). This will fix this by
>      not triggering another LSM hook.

This somewhat lengthy patch I think would be a solution for (1). I don't 
think the Fixes tag is correct but IMO it should propagate farther back, 
if possible.


 From 01467f6e879c4cd757abb4d79cb18bf11150bed8 Mon Sep 17 00:00:00 2001
From: Stefan Berger <stefanb@linux.ibm.com>
Date: Thu, 28 Sep 2023 14:42:39 -0400
Subject: [PATCH] fs: Enable GETATTR_NOSEC parameter for getattr interface
  function

When vfs_getattr_nosec() calls a filesystem's getattr interface function
then the 'nosec' should propagate into this function so that
vfs_getattr_nosec() can again be called from the filesystem's gettattr
rather than vfs_getattr(). The latter would add unnecessary security
checks that the initial vfs_getattr_nosec() call wanted to avoid.
Therefore, introduce the getattr flag GETATTR_NOSEC and allow to pass
with the new getattr_flags parameter to the getattr interface function.
In overlayfs and ecryptfs use this flag to determine which one of the
two functions to call.

In a recent code change introduced to IMA vfs_getattr_nosec() ended up
calling vfs_getattr() in overlayfs, which in turn called
security_inode_getattr() on an exiting process that did not have
current->fs set anymore, which then caused a kernel NULL pointer
dereference. With this change the call to security_inode_getattr() can
be avoided, thus avoiding the NULL pointer dereference.

Reported-by: syzbot+a67fc5321ffb4b311c98@syzkaller.appspotmail.com
Fixes: db1d1e8b9867 ("IMA: use vfs_getattr_nosec to get the i_version")
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>

---

simple_getattr has been adjusted and all files returned by the following
grep have been adjusted as well.

  grep -rEI "^[[:space:]]+\.getattr" ./ | \
    grep -v simple_getattr  | \
    cut -d ":" -f1 | sort | uniq
---
  fs/9p/vfs_inode.c             |  3 ++-
  fs/9p/vfs_inode_dotl.c        |  3 ++-
  fs/afs/inode.c                |  3 ++-
  fs/afs/internal.h             |  2 +-
  fs/bad_inode.c                |  3 ++-
  fs/btrfs/inode.c              |  3 ++-
  fs/ceph/inode.c               |  9 ++++++---
  fs/ceph/super.h               |  3 ++-
  fs/coda/coda_linux.h          |  2 +-
  fs/coda/inode.c               |  3 ++-
  fs/ecryptfs/inode.c           | 14 ++++++++++----
  fs/erofs/inode.c              |  2 +-
  fs/erofs/internal.h           |  2 +-
  fs/exfat/exfat_fs.h           |  2 +-
  fs/exfat/file.c               |  2 +-
  fs/ext2/ext2.h                |  2 +-
  fs/ext2/inode.c               |  3 ++-
  fs/ext4/ext4.h                |  6 ++++--
  fs/ext4/inode.c               |  9 ++++++---
  fs/ext4/symlink.c             |  6 ++++--
  fs/f2fs/f2fs.h                |  3 ++-
  fs/f2fs/file.c                |  3 ++-
  fs/f2fs/namei.c               |  6 ++++--
  fs/fat/fat.h                  |  3 ++-
  fs/fat/file.c                 |  3 ++-
  fs/fuse/dir.c                 |  3 ++-
  fs/gfs2/inode.c               |  4 +++-
  fs/hfsplus/hfsplus_fs.h       |  2 +-
  fs/hfsplus/inode.c            |  2 +-
  fs/kernfs/inode.c             |  3 ++-
  fs/kernfs/kernfs-internal.h   |  3 ++-
  fs/libfs.c                    |  5 +++--
  fs/minix/inode.c              |  3 ++-
  fs/minix/minix.h              |  3 ++-
  fs/nfs/inode.c                |  3 ++-
  fs/nfs/namespace.c            |  5 +++--
  fs/ntfs3/file.c               |  3 ++-
  fs/ntfs3/ntfs_fs.h            |  3 ++-
  fs/ocfs2/file.c               |  3 ++-
  fs/ocfs2/file.h               |  3 ++-
  fs/orangefs/inode.c           |  3 ++-
  fs/orangefs/orangefs-kernel.h |  3 ++-
  fs/overlayfs/inode.c          |  8 ++++++--
  fs/overlayfs/overlayfs.h      |  3 ++-
  fs/proc/base.c                |  6 ++++--
  fs/proc/fd.c                  |  3 ++-
  fs/proc/generic.c             |  3 ++-
  fs/proc/internal.h            |  3 ++-
  fs/proc/proc_net.c            |  3 ++-
  fs/proc/proc_sysctl.c         |  3 ++-
  fs/proc/root.c                |  3 ++-
  fs/smb/client/cifsfs.h        |  3 ++-
  fs/smb/client/inode.c         |  3 ++-
  fs/stat.c                     |  3 ++-
  fs/sysv/itree.c               |  3 ++-
  fs/sysv/sysv.h                |  2 +-
  fs/ubifs/dir.c                |  3 ++-
  fs/ubifs/file.c               |  6 ++++--
  fs/ubifs/ubifs.h              |  3 ++-
  fs/udf/symlink.c              |  3 ++-
  fs/vboxsf/utils.c             |  3 ++-
  fs/vboxsf/vfsmod.h            |  2 +-
  fs/xfs/xfs_iops.c             |  3 ++-
  include/linux/fs.h            | 10 ++++++++--
  include/linux/nfs_fs.h        |  2 +-
  mm/shmem.c                    |  3 ++-
  66 files changed, 159 insertions(+), 82 deletions(-)

diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 0d28ecf668d0..9c5a7e653bb1 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -1000,7 +1000,8 @@ v9fs_vfs_rename(struct mnt_idmap *idmap, struct 
inode *old_dir,

  static int
  v9fs_vfs_getattr(struct mnt_idmap *idmap, const struct path *path,
-         struct kstat *stat, u32 request_mask, unsigned int flags)
+         struct kstat *stat, u32 request_mask, unsigned int flags,
+         unsigned int getattr_flags)
  {
      struct dentry *dentry = path->dentry;
      struct inode *inode = d_inode(dentry);
diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
index 1312f68965ac..e4238ee243bf 100644
--- a/fs/9p/vfs_inode_dotl.c
+++ b/fs/9p/vfs_inode_dotl.c
@@ -439,7 +439,8 @@ static int v9fs_vfs_mkdir_dotl(struct mnt_idmap *idmap,
  static int
  v9fs_vfs_getattr_dotl(struct mnt_idmap *idmap,
                const struct path *path, struct kstat *stat,
-              u32 request_mask, unsigned int flags)
+              u32 request_mask, unsigned int flags,
+              unsigned int getattr_flags)
  {
      struct dentry *dentry = path->dentry;
      struct v9fs_session_info *v9ses;
diff --git a/fs/afs/inode.c b/fs/afs/inode.c
index 1c794a1896aa..8763e6126a8c 100644
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -750,7 +750,8 @@ int afs_validate(struct afs_vnode *vnode, struct key 
*key)
   * read the attributes of an inode
   */
  int afs_getattr(struct mnt_idmap *idmap, const struct path *path,
-        struct kstat *stat, u32 request_mask, unsigned int query_flags)
+        struct kstat *stat, u32 request_mask, unsigned int query_flags,
+        unsigned int getattr_flags)
  {
      struct inode *inode = d_inode(path->dentry);
      struct afs_vnode *vnode = AFS_FS_I(inode);
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index da73b97e19a9..b8dfb6232086 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -1175,7 +1175,7 @@ extern bool afs_check_validity(struct afs_vnode *);
  extern int afs_validate(struct afs_vnode *, struct key *);
  bool afs_pagecache_valid(struct afs_vnode *);
  extern int afs_getattr(struct mnt_idmap *idmap, const struct path *,
-               struct kstat *, u32, unsigned int);
+               struct kstat *, u32, unsigned int, unsigned int);
  extern int afs_setattr(struct mnt_idmap *idmap, struct dentry *, 
struct iattr *);
  extern void afs_evict_inode(struct inode *);
  extern int afs_drop_inode(struct inode *);
diff --git a/fs/bad_inode.c b/fs/bad_inode.c
index 83f9566c973b..22219161382d 100644
--- a/fs/bad_inode.c
+++ b/fs/bad_inode.c
@@ -97,7 +97,8 @@ static int bad_inode_permission(struct mnt_idmap *idmap,

  static int bad_inode_getattr(struct mnt_idmap *idmap,
                   const struct path *path, struct kstat *stat,
-                 u32 request_mask, unsigned int query_flags)
+                 u32 request_mask, unsigned int query_flags,
+                 unsigned int getattr_flags)
  {
      return -EIO;
  }
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 7814b9d654ce..bc9fbaa42b93 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8624,7 +8624,8 @@ int __init btrfs_init_cachep(void)

  static int btrfs_getattr(struct mnt_idmap *idmap,
               const struct path *path, struct kstat *stat,
-             u32 request_mask, unsigned int flags)
+             u32 request_mask, unsigned int flags,
+             unsigned int getattr_flags)
  {
      u64 delalloc_bytes;
      u64 inode_bytes;
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 800ab7920513..a798f0a7238f 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -2255,11 +2255,13 @@ static const char 
*ceph_encrypted_get_link(struct dentry *dentry,
  static int ceph_encrypted_symlink_getattr(struct mnt_idmap *idmap,
                        const struct path *path,
                        struct kstat *stat, u32 request_mask,
-                      unsigned int query_flags)
+                      unsigned int query_flags,
+                      unsigned int getattr_flags)
  {
      int ret;

-    ret = ceph_getattr(idmap, path, stat, request_mask, query_flags);
+    ret = ceph_getattr(idmap, path, stat, request_mask, query_flags,
+               getattr_flags);
      if (ret)
          return ret;
      return fscrypt_symlink_getattr(path, stat);
@@ -2960,7 +2962,8 @@ static int statx_to_caps(u32 want, umode_t mode)
   * then we can avoid talking to the MDS at all.
   */
  int ceph_getattr(struct mnt_idmap *idmap, const struct path *path,
-         struct kstat *stat, u32 request_mask, unsigned int flags)
+         struct kstat *stat, u32 request_mask, unsigned int flags,
+         unsigned int getattr_flags)
  {
      struct inode *inode = d_inode(path->dentry);
      struct super_block *sb = inode->i_sb;
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 51c7f2b14f6f..f472cefd21bd 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -1100,7 +1100,8 @@ extern int ceph_setattr(struct mnt_idmap *idmap,
              struct dentry *dentry, struct iattr *attr);
  extern int ceph_getattr(struct mnt_idmap *idmap,
              const struct path *path, struct kstat *stat,
-            u32 request_mask, unsigned int flags);
+            u32 request_mask, unsigned int flags,
+            unsigned int getattr_flags);
  void ceph_inode_shutdown(struct inode *inode);

  static inline bool ceph_inode_is_shutdown(struct inode *inode)
diff --git a/fs/coda/coda_linux.h b/fs/coda/coda_linux.h
index dd6277d87afb..ec6e30cbb35f 100644
--- a/fs/coda/coda_linux.h
+++ b/fs/coda/coda_linux.h
@@ -50,7 +50,7 @@ int coda_permission(struct mnt_idmap *idmap, struct 
inode *inode,
              int mask);
  int coda_revalidate_inode(struct inode *);
  int coda_getattr(struct mnt_idmap *, const struct path *, struct kstat *,
-         u32, unsigned int);
+         u32, unsigned int, unsigned int);
  int coda_setattr(struct mnt_idmap *, struct dentry *, struct iattr *);

  /* this file:  helpers */
diff --git a/fs/coda/inode.c b/fs/coda/inode.c
index 0c7c2528791e..52465963c455 100644
--- a/fs/coda/inode.c
+++ b/fs/coda/inode.c
@@ -252,7 +252,8 @@ static void coda_evict_inode(struct inode *inode)
  }

  int coda_getattr(struct mnt_idmap *idmap, const struct path *path,
-         struct kstat *stat, u32 request_mask, unsigned int flags)
+         struct kstat *stat, u32 request_mask, unsigned int flags,
+         unsigned int getattr_flags)
  {
      int err = coda_revalidate_inode(d_inode(path->dentry));
      if (!err)
diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index 992d9c7e64ae..31173a4534d2 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -974,7 +974,8 @@ static int ecryptfs_setattr(struct mnt_idmap *idmap,

  static int ecryptfs_getattr_link(struct mnt_idmap *idmap,
                   const struct path *path, struct kstat *stat,
-                 u32 request_mask, unsigned int flags)
+                 u32 request_mask, unsigned int flags,
+                 unsigned int getattr_flags)
  {
      struct dentry *dentry = path->dentry;
      struct ecryptfs_mount_crypt_stat *mount_crypt_stat;
@@ -1000,14 +1001,19 @@ static int ecryptfs_getattr_link(struct 
mnt_idmap *idmap,

  static int ecryptfs_getattr(struct mnt_idmap *idmap,
                  const struct path *path, struct kstat *stat,
-                u32 request_mask, unsigned int flags)
+                u32 request_mask, unsigned int flags,
+                unsigned int getattr_flags)
  {
      struct dentry *dentry = path->dentry;
      struct kstat lower_stat;
      int rc;

-    rc = vfs_getattr(ecryptfs_dentry_to_lower_path(dentry), &lower_stat,
-             request_mask, flags);
+    if (getattr_flags & GETATTR_NOSEC)
+        rc = vfs_getattr_nosec(ecryptfs_dentry_to_lower_path(dentry),
+                       &lower_stat, request_mask, flags);
+    else
+        rc = vfs_getattr(ecryptfs_dentry_to_lower_path(dentry),
+                 &lower_stat, request_mask, flags);
      if (!rc) {
          fsstack_copy_attr_all(d_inode(dentry),
                        ecryptfs_inode_to_lower(d_inode(dentry)));
diff --git a/fs/erofs/inode.c b/fs/erofs/inode.c
index edc8ec7581b8..f9ac4c001c17 100644
--- a/fs/erofs/inode.c
+++ b/fs/erofs/inode.c
@@ -358,7 +358,7 @@ struct inode *erofs_iget(struct super_block *sb, 
erofs_nid_t nid)

  int erofs_getattr(struct mnt_idmap *idmap, const struct path *path,
            struct kstat *stat, u32 request_mask,
-          unsigned int query_flags)
+          unsigned int query_flags, unsigned int getattr_flags)
  {
      struct inode *const inode = d_inode(path->dentry);

diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 4ff88d0dd980..16fbb39d670f 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -422,7 +422,7 @@ int erofs_map_blocks(struct inode *inode, struct 
erofs_map_blocks *map);
  struct inode *erofs_iget(struct super_block *sb, erofs_nid_t nid);
  int erofs_getattr(struct mnt_idmap *idmap, const struct path *path,
            struct kstat *stat, u32 request_mask,
-          unsigned int query_flags);
+          unsigned int query_flags, unsigned int getattr_flags);
  int erofs_namei(struct inode *dir, const struct qstr *name,
          erofs_nid_t *nid, unsigned int *d_type);

diff --git a/fs/exfat/exfat_fs.h b/fs/exfat/exfat_fs.h
index f55498e5c23d..5e86c62de12f 100644
--- a/fs/exfat/exfat_fs.h
+++ b/fs/exfat/exfat_fs.h
@@ -454,7 +454,7 @@ int exfat_setattr(struct mnt_idmap *idmap, struct 
dentry *dentry,
            struct iattr *attr);
  int exfat_getattr(struct mnt_idmap *idmap, const struct path *path,
            struct kstat *stat, unsigned int request_mask,
-          unsigned int query_flags);
+          unsigned int query_flags, unsigned int getattr_flags);
  int exfat_file_fsync(struct file *file, loff_t start, loff_t end, int 
datasync);
  long exfat_ioctl(struct file *filp, unsigned int cmd, unsigned long arg);
  long exfat_compat_ioctl(struct file *filp, unsigned int cmd,
diff --git a/fs/exfat/file.c b/fs/exfat/file.c
index 32395ef686a2..06cb318bfa0a 100644
--- a/fs/exfat/file.c
+++ b/fs/exfat/file.c
@@ -227,7 +227,7 @@ void exfat_truncate(struct inode *inode)

  int exfat_getattr(struct mnt_idmap *idmap, const struct path *path,
            struct kstat *stat, unsigned int request_mask,
-          unsigned int query_flags)
+          unsigned int query_flags, unsigned int getattr_flags)
  {
      struct inode *inode = d_backing_inode(path->dentry);
      struct exfat_inode_info *ei = EXFAT_I(inode);
diff --git a/fs/ext2/ext2.h b/fs/ext2/ext2.h
index 7fdd685c384d..a2b460786d21 100644
--- a/fs/ext2/ext2.h
+++ b/fs/ext2/ext2.h
@@ -748,7 +748,7 @@ void ext2_write_failed(struct address_space 
*mapping, loff_t to);
  extern int ext2_get_block(struct inode *, sector_t, struct buffer_head 
*, int);
  extern int ext2_setattr (struct mnt_idmap *, struct dentry *, struct 
iattr *);
  extern int ext2_getattr (struct mnt_idmap *, const struct path *,
-             struct kstat *, u32, unsigned int);
+             struct kstat *, u32, unsigned int, unsigned int);
  extern void ext2_set_inode_flags(struct inode *inode);
  extern int ext2_fiemap(struct inode *inode, struct fiemap_extent_info 
*fieinfo,
                 u64 start, u64 len);
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 314b415ee518..153738e254ac 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -1611,7 +1611,8 @@ int ext2_write_inode(struct inode *inode, struct 
writeback_control *wbc)
  }

  int ext2_getattr(struct mnt_idmap *idmap, const struct path *path,
-         struct kstat *stat, u32 request_mask, unsigned int query_flags)
+         struct kstat *stat, u32 request_mask, unsigned int query_flags,
+         unsigned int getattr_flags)
  {
      struct inode *inode = d_inode(path->dentry);
      struct ext2_inode_info *ei = EXT2_I(inode);
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 9418359b1d9d..90b1e08bd89a 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2977,11 +2977,13 @@ extern int  ext4_setattr(struct mnt_idmap *, 
struct dentry *,
               struct iattr *);
  extern u32  ext4_dio_alignment(struct inode *inode);
  extern int  ext4_getattr(struct mnt_idmap *, const struct path *,
-             struct kstat *, u32, unsigned int);
+             struct kstat *, u32, unsigned int,
+             unsigned int);
  extern void ext4_evict_inode(struct inode *);
  extern void ext4_clear_inode(struct inode *);
  extern int  ext4_file_getattr(struct mnt_idmap *, const struct path *,
-                  struct kstat *, u32, unsigned int);
+                  struct kstat *, u32, unsigned int,
+                  unsigned int);
  extern void ext4_dirty_inode(struct inode *, int);
  extern int ext4_change_inode_journal_flag(struct inode *, int);
  extern int ext4_get_inode_loc(struct inode *, struct ext4_iloc *);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 4ce35f1c8b0a..ede71d313519 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5515,7 +5515,8 @@ u32 ext4_dio_alignment(struct inode *inode)
  }

  int ext4_getattr(struct mnt_idmap *idmap, const struct path *path,
-         struct kstat *stat, u32 request_mask, unsigned int query_flags)
+         struct kstat *stat, u32 request_mask, unsigned int query_flags,
+         unsigned int getattr_flags)
  {
      struct inode *inode = d_inode(path->dentry);
      struct ext4_inode *raw_inode;
@@ -5577,12 +5578,14 @@ int ext4_getattr(struct mnt_idmap *idmap, const 
struct path *path,

  int ext4_file_getattr(struct mnt_idmap *idmap,
                const struct path *path, struct kstat *stat,
-              u32 request_mask, unsigned int query_flags)
+              u32 request_mask, unsigned int query_flags,
+              unsigned int getattr_flags)
  {
      struct inode *inode = d_inode(path->dentry);
      u64 delalloc_blocks;

-    ext4_getattr(idmap, path, stat, request_mask, query_flags);
+    ext4_getattr(idmap, path, stat, request_mask, query_flags,
+             getattr_flags);

      /*
       * If there is inline data in the inode, the inode will normally not
diff --git a/fs/ext4/symlink.c b/fs/ext4/symlink.c
index 75bf1f88843c..abc30ebb2be2 100644
--- a/fs/ext4/symlink.c
+++ b/fs/ext4/symlink.c
@@ -58,9 +58,11 @@ static const char *ext4_encrypted_get_link(struct 
dentry *dentry,
  static int ext4_encrypted_symlink_getattr(struct mnt_idmap *idmap,
                        const struct path *path,
                        struct kstat *stat, u32 request_mask,
-                      unsigned int query_flags)
+                      unsigned int query_flags,
+                      unsigned int getattr_flags)
  {
-    ext4_getattr(idmap, path, stat, request_mask, query_flags);
+    ext4_getattr(idmap, path, stat, request_mask, query_flags,
+             getattr_flags);

      return fscrypt_symlink_getattr(path, stat);
  }
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 6d688e42d89c..0a8436cd2f5f 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -3477,7 +3477,8 @@ int f2fs_do_truncate_blocks(struct inode *inode, 
u64 from, bool lock);
  int f2fs_truncate_blocks(struct inode *inode, u64 from, bool lock);
  int f2fs_truncate(struct inode *inode);
  int f2fs_getattr(struct mnt_idmap *idmap, const struct path *path,
-         struct kstat *stat, u32 request_mask, unsigned int flags);
+         struct kstat *stat, u32 request_mask, unsigned int flags,
+         unsigned int getattr_flags);
  int f2fs_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
           struct iattr *attr);
  int f2fs_truncate_hole(struct inode *inode, pgoff_t pg_start, pgoff_t 
pg_end);
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index ca5904129b16..e05bdd318aef 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -832,7 +832,8 @@ static bool f2fs_force_buffered_io(struct inode 
*inode, int rw)
  }

  int f2fs_getattr(struct mnt_idmap *idmap, const struct path *path,
-         struct kstat *stat, u32 request_mask, unsigned int query_flags)
+         struct kstat *stat, u32 request_mask, unsigned int query_flags,
+         unsigned int getattr_flags)
  {
      struct inode *inode = d_inode(path->dentry);
      struct f2fs_inode_info *fi = F2FS_I(inode);
diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
index 193b22a2d6bf..cfdcdeb2c0b0 100644
--- a/fs/f2fs/namei.c
+++ b/fs/f2fs/namei.c
@@ -1356,9 +1356,11 @@ static const char *f2fs_encrypted_get_link(struct 
dentry *dentry,
  static int f2fs_encrypted_symlink_getattr(struct mnt_idmap *idmap,
                        const struct path *path,
                        struct kstat *stat, u32 request_mask,
-                      unsigned int query_flags)
+                      unsigned int query_flags,
+                      unsigned int getattr_flags)
  {
-    f2fs_getattr(idmap, path, stat, request_mask, query_flags);
+    f2fs_getattr(idmap, path, stat, request_mask, query_flags,
+             getattr_flags);

      return fscrypt_symlink_getattr(path, stat);
  }
diff --git a/fs/fat/fat.h b/fs/fat/fat.h
index 66cf4778cf3b..74b57f2e1c36 100644
--- a/fs/fat/fat.h
+++ b/fs/fat/fat.h
@@ -403,7 +403,8 @@ extern int fat_setattr(struct mnt_idmap *idmap, 
struct dentry *dentry,
  extern void fat_truncate_blocks(struct inode *inode, loff_t offset);
  extern int fat_getattr(struct mnt_idmap *idmap,
                 const struct path *path, struct kstat *stat,
-               u32 request_mask, unsigned int flags);
+               u32 request_mask, unsigned int flags,
+               unsigned int getattr_flags);
  extern int fat_file_fsync(struct file *file, loff_t start, loff_t end,
                int datasync);

diff --git a/fs/fat/file.c b/fs/fat/file.c
index e887e9ab7472..2177784f54a2 100644
--- a/fs/fat/file.c
+++ b/fs/fat/file.c
@@ -396,7 +396,8 @@ void fat_truncate_blocks(struct inode *inode, loff_t 
offset)
  }

  int fat_getattr(struct mnt_idmap *idmap, const struct path *path,
-        struct kstat *stat, u32 request_mask, unsigned int flags)
+        struct kstat *stat, u32 request_mask, unsigned int flags,
+        unsigned int getattr_flags)
  {
      struct inode *inode = d_inode(path->dentry);
      struct msdos_sb_info *sbi = MSDOS_SB(inode->i_sb);
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index d707e6987da9..3531d4239635 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -2069,7 +2069,8 @@ static int fuse_setattr(struct mnt_idmap *idmap, 
struct dentry *entry,

  static int fuse_getattr(struct mnt_idmap *idmap,
              const struct path *path, struct kstat *stat,
-            u32 request_mask, unsigned int flags)
+            u32 request_mask, unsigned int flags,
+            unsigned int getattr_flags)
  {
      struct inode *inode = d_inode(path->dentry);
      struct fuse_conn *fc = get_fuse_conn(inode);
diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 0eac04507904..50383ec30573 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -2038,6 +2038,7 @@ static int gfs2_setattr(struct mnt_idmap *idmap,
   * @stat: The inode's stats
   * @request_mask: Mask of STATX_xxx flags indicating the caller's 
interests
   * @flags: AT_STATX_xxx setting
+ * @getattr_flags: GETATTR_xxx
   *
   * This may be called from the VFS directly, or from within GFS2 with the
   * inode locked, so we look to see if the glock is already locked and only
@@ -2050,7 +2051,8 @@ static int gfs2_setattr(struct mnt_idmap *idmap,

  static int gfs2_getattr(struct mnt_idmap *idmap,
              const struct path *path, struct kstat *stat,
-            u32 request_mask, unsigned int flags)
+            u32 request_mask, unsigned int flags,
+            unsigned int getattr_flags)
  {
      struct inode *inode = d_inode(path->dentry);
      struct gfs2_inode *ip = GFS2_I(inode);
diff --git a/fs/hfsplus/hfsplus_fs.h b/fs/hfsplus/hfsplus_fs.h
index 7ededcb720c1..3da3a79c9742 100644
--- a/fs/hfsplus/hfsplus_fs.h
+++ b/fs/hfsplus/hfsplus_fs.h
@@ -483,7 +483,7 @@ int hfsplus_cat_read_inode(struct inode *inode, 
struct hfs_find_data *fd);
  int hfsplus_cat_write_inode(struct inode *inode);
  int hfsplus_getattr(struct mnt_idmap *idmap, const struct path *path,
              struct kstat *stat, u32 request_mask,
-            unsigned int query_flags);
+            unsigned int query_flags, unsigned int getattr_flags);
  int hfsplus_file_fsync(struct file *file, loff_t start, loff_t end,
                 int datasync);
  int hfsplus_fileattr_get(struct dentry *dentry, struct fileattr *fa);
diff --git a/fs/hfsplus/inode.c b/fs/hfsplus/inode.c
index c65c8c4b03dd..afa7e8ee8cb2 100644
--- a/fs/hfsplus/inode.c
+++ b/fs/hfsplus/inode.c
@@ -278,7 +278,7 @@ static int hfsplus_setattr(struct mnt_idmap *idmap,

  int hfsplus_getattr(struct mnt_idmap *idmap, const struct path *path,
              struct kstat *stat, u32 request_mask,
-            unsigned int query_flags)
+            unsigned int query_flags, unsigned int getattr_flags)
  {
      struct inode *inode = d_inode(path->dentry);
      struct hfsplus_inode_info *hip = HFSPLUS_I(inode);
diff --git a/fs/kernfs/inode.c b/fs/kernfs/inode.c
index 922719a343a7..92af4b394274 100644
--- a/fs/kernfs/inode.c
+++ b/fs/kernfs/inode.c
@@ -182,7 +182,8 @@ static void kernfs_refresh_inode(struct kernfs_node 
*kn, struct inode *inode)

  int kernfs_iop_getattr(struct mnt_idmap *idmap,
                 const struct path *path, struct kstat *stat,
-               u32 request_mask, unsigned int query_flags)
+               u32 request_mask, unsigned int query_flags,
+               unsigned int getattr_flags)
  {
      struct inode *inode = d_inode(path->dentry);
      struct kernfs_node *kn = inode->i_private;
diff --git a/fs/kernfs/kernfs-internal.h b/fs/kernfs/kernfs-internal.h
index a9b854cdfdb5..ccf74e08105c 100644
--- a/fs/kernfs/kernfs-internal.h
+++ b/fs/kernfs/kernfs-internal.h
@@ -135,7 +135,8 @@ int kernfs_iop_setattr(struct mnt_idmap *idmap, 
struct dentry *dentry,
                 struct iattr *iattr);
  int kernfs_iop_getattr(struct mnt_idmap *idmap,
                 const struct path *path, struct kstat *stat,
-               u32 request_mask, unsigned int query_flags);
+               u32 request_mask, unsigned int query_flags,
+               unsigned int getattr_flags);
  ssize_t kernfs_iop_listxattr(struct dentry *dentry, char *buf, size_t 
size);
  int __kernfs_setattr(struct kernfs_node *kn, const struct iattr *iattr);

diff --git a/fs/libfs.c b/fs/libfs.c
index 37f2d34ee090..757d24b8f4be 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -30,7 +30,7 @@

  int simple_getattr(struct mnt_idmap *idmap, const struct path *path,
             struct kstat *stat, u32 request_mask,
-           unsigned int query_flags)
+           unsigned int query_flags, unsigned int getattr_flags)
  {
      struct inode *inode = d_inode(path->dentry);
      generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
@@ -1579,7 +1579,8 @@ static struct dentry *empty_dir_lookup(struct 
inode *dir, struct dentry *dentry,

  static int empty_dir_getattr(struct mnt_idmap *idmap,
                   const struct path *path, struct kstat *stat,
-                 u32 request_mask, unsigned int query_flags)
+                 u32 request_mask, unsigned int query_flags,
+                 unsigned int getattr_flags)
  {
      struct inode *inode = d_inode(path->dentry);
      generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
diff --git a/fs/minix/inode.c b/fs/minix/inode.c
index df575473c1cc..74ae2c33f5a5 100644
--- a/fs/minix/inode.c
+++ b/fs/minix/inode.c
@@ -651,7 +651,8 @@ static int minix_write_inode(struct inode *inode, 
struct writeback_control *wbc)
  }

  int minix_getattr(struct mnt_idmap *idmap, const struct path *path,
-          struct kstat *stat, u32 request_mask, unsigned int flags)
+          struct kstat *stat, u32 request_mask, unsigned int flags,
+          unsigned int getattr_flags)
  {
      struct super_block *sb = path->dentry->d_sb;
      struct inode *inode = d_inode(path->dentry);
diff --git a/fs/minix/minix.h b/fs/minix/minix.h
index d493507c064f..6ac55d3b649c 100644
--- a/fs/minix/minix.h
+++ b/fs/minix/minix.h
@@ -52,7 +52,8 @@ extern int minix_new_block(struct inode * inode);
  extern void minix_free_block(struct inode *inode, unsigned long block);
  extern unsigned long minix_count_free_blocks(struct super_block *sb);
  extern int minix_getattr(struct mnt_idmap *, const struct path *,
-             struct kstat *, u32, unsigned int);
+             struct kstat *, u32, unsigned int,
+             unsigned int);
  extern int minix_prepare_chunk(struct page *page, loff_t pos, unsigned 
len);

  extern void V1_minix_truncate(struct inode *);
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index e21c073158e5..c5cad2515b37 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -831,7 +831,8 @@ static u32 nfs_get_valid_attrmask(struct inode *inode)
  }

  int nfs_getattr(struct mnt_idmap *idmap, const struct path *path,
-        struct kstat *stat, u32 request_mask, unsigned int query_flags)
+        struct kstat *stat, u32 request_mask, unsigned int query_flags,
+        unsigned int getattr_flags)
  {
      struct inode *inode = d_inode(path->dentry);
      struct nfs_server *server = NFS_SERVER(inode);
diff --git a/fs/nfs/namespace.c b/fs/nfs/namespace.c
index e7494cdd957e..d4c3622ff4ed 100644
--- a/fs/nfs/namespace.c
+++ b/fs/nfs/namespace.c
@@ -210,11 +210,12 @@ struct vfsmount *nfs_d_automount(struct path *path)
  static int
  nfs_namespace_getattr(struct mnt_idmap *idmap,
                const struct path *path, struct kstat *stat,
-              u32 request_mask, unsigned int query_flags)
+              u32 request_mask, unsigned int query_flags,
+              unsigned int getattr_flags)
  {
      if (NFS_FH(d_inode(path->dentry))->size != 0)
          return nfs_getattr(idmap, path, stat, request_mask,
-                   query_flags);
+                   query_flags, getattr_flags);
      generic_fillattr(&nop_mnt_idmap, request_mask, d_inode(path->dentry),
               stat);
      return 0;
diff --git a/fs/ntfs3/file.c b/fs/ntfs3/file.c
index 962f12ce6c0a..729ca3c09958 100644
--- a/fs/ntfs3/file.c
+++ b/fs/ntfs3/file.c
@@ -72,7 +72,8 @@ static long ntfs_compat_ioctl(struct file *filp, u32 
cmd, unsigned long arg)
   * ntfs_getattr - inode_operations::getattr
   */
  int ntfs_getattr(struct mnt_idmap *idmap, const struct path *path,
-         struct kstat *stat, u32 request_mask, u32 flags)
+         struct kstat *stat, u32 request_mask, u32 flags,
+         unsigned int getattr_flags)
  {
      struct inode *inode = d_inode(path->dentry);
      struct ntfs_inode *ni = ntfs_i(inode);
diff --git a/fs/ntfs3/ntfs_fs.h b/fs/ntfs3/ntfs_fs.h
index 629403ede6e5..3c83352883f2 100644
--- a/fs/ntfs3/ntfs_fs.h
+++ b/fs/ntfs3/ntfs_fs.h
@@ -492,7 +492,8 @@ extern const struct file_operations ntfs_dir_operations;

  /* Globals from file.c */
  int ntfs_getattr(struct mnt_idmap *idmap, const struct path *path,
-         struct kstat *stat, u32 request_mask, u32 flags);
+         struct kstat *stat, u32 request_mask, u32 flags,
+         unsigned int getattr_flags);
  int ntfs3_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
            struct iattr *attr);
  void ntfs_sparse_cluster(struct inode *inode, struct page *page0, CLST 
vcn,
diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index c45596c25c66..7e367cd801d3 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -1300,7 +1300,8 @@ int ocfs2_setattr(struct mnt_idmap *idmap, struct 
dentry *dentry,
  }

  int ocfs2_getattr(struct mnt_idmap *idmap, const struct path *path,
-          struct kstat *stat, u32 request_mask, unsigned int flags)
+          struct kstat *stat, u32 request_mask, unsigned int flags,
+          unsigned int getattr_flags)
  {
      struct inode *inode = d_inode(path->dentry);
      struct super_block *sb = path->dentry->d_sb;
diff --git a/fs/ocfs2/file.h b/fs/ocfs2/file.h
index 8e53e4ac1120..2dbf4edd94da 100644
--- a/fs/ocfs2/file.h
+++ b/fs/ocfs2/file.h
@@ -52,7 +52,8 @@ int ocfs2_zero_extend(struct inode *inode, struct 
buffer_head *di_bh,
  int ocfs2_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
            struct iattr *attr);
  int ocfs2_getattr(struct mnt_idmap *idmap, const struct path *path,
-          struct kstat *stat, u32 request_mask, unsigned int flags);
+          struct kstat *stat, u32 request_mask, unsigned int flags,
+          unsigned int getattr_flags);
  int ocfs2_permission(struct mnt_idmap *idmap,
               struct inode *inode,
               int mask);
diff --git a/fs/orangefs/inode.c b/fs/orangefs/inode.c
index 085912268442..f7c2c318d392 100644
--- a/fs/orangefs/inode.c
+++ b/fs/orangefs/inode.c
@@ -859,7 +859,8 @@ int orangefs_setattr(struct mnt_idmap *idmap, struct 
dentry *dentry,
   * Obtain attributes of an object given a dentry
   */
  int orangefs_getattr(struct mnt_idmap *idmap, const struct path *path,
-             struct kstat *stat, u32 request_mask, unsigned int flags)
+             struct kstat *stat, u32 request_mask, unsigned int flags,
+             unsigned int getattr_flags)
  {
      int ret;
      struct inode *inode = path->dentry->d_inode;
diff --git a/fs/orangefs/orangefs-kernel.h b/fs/orangefs/orangefs-kernel.h
index b711654ca18a..f0682b51e5e0 100644
--- a/fs/orangefs/orangefs-kernel.h
+++ b/fs/orangefs/orangefs-kernel.h
@@ -365,7 +365,8 @@ int __orangefs_setattr_mode(struct dentry *dentry, 
struct iattr *iattr);
  int orangefs_setattr(struct mnt_idmap *, struct dentry *, struct iattr *);

  int orangefs_getattr(struct mnt_idmap *idmap, const struct path *path,
-             struct kstat *stat, u32 request_mask, unsigned int flags);
+             struct kstat *stat, u32 request_mask, unsigned int flags,
+             unsigned int getattr_flags);

  int orangefs_permission(struct mnt_idmap *idmap,
              struct inode *inode, int mask);
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 83ef66644c21..8c0f3e125d09 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -155,7 +155,8 @@ static void ovl_map_dev_ino(struct dentry *dentry, 
struct kstat *stat, int fsid)
  }

  int ovl_getattr(struct mnt_idmap *idmap, const struct path *path,
-        struct kstat *stat, u32 request_mask, unsigned int flags)
+        struct kstat *stat, u32 request_mask, unsigned int flags,
+        unsigned int getattr_flags)
  {
      struct dentry *dentry = path->dentry;
      enum ovl_path_type type;
@@ -171,7 +172,10 @@ int ovl_getattr(struct mnt_idmap *idmap, const 
struct path *path,

      type = ovl_path_real(dentry, &realpath);
      old_cred = ovl_override_creds(dentry->d_sb);
-    err = vfs_getattr(&realpath, stat, request_mask, flags);
+    if (getattr_flags & GETATTR_NOSEC)
+        err = vfs_getattr_nosec(&realpath, stat, request_mask, flags);
+    else
+        err = vfs_getattr(&realpath, stat, request_mask, flags);
      if (err)
          goto out;

diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 9817b2dcb132..6f91bfb1bbbd 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -687,7 +687,8 @@ unsigned int ovl_get_nlink(struct ovl_fs *ofs, 
struct dentry *lowerdentry,
  int ovl_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
          struct iattr *attr);
  int ovl_getattr(struct mnt_idmap *idmap, const struct path *path,
-        struct kstat *stat, u32 request_mask, unsigned int flags);
+        struct kstat *stat, u32 request_mask, unsigned int flags,
+        unsigned int getattr_flags);
  int ovl_permission(struct mnt_idmap *idmap, struct inode *inode,
             int mask);
  int ovl_xattr_set(struct dentry *dentry, struct inode *inode, const 
char *name,
diff --git a/fs/proc/base.c b/fs/proc/base.c
index ffd54617c354..3e733cd17fba 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1960,7 +1960,8 @@ static struct inode 
*proc_pid_make_base_inode(struct super_block *sb,
  }

  int pid_getattr(struct mnt_idmap *idmap, const struct path *path,
-        struct kstat *stat, u32 request_mask, unsigned int query_flags)
+        struct kstat *stat, u32 request_mask, unsigned int query_flags,
+        unsigned int getattr_flags)
  {
      struct inode *inode = d_inode(path->dentry);
      struct proc_fs_info *fs_info = proc_sb_info(inode->i_sb);
@@ -3896,7 +3897,8 @@ static int proc_task_readdir(struct file *file, 
struct dir_context *ctx)

  static int proc_task_getattr(struct mnt_idmap *idmap,
                   const struct path *path, struct kstat *stat,
-                 u32 request_mask, unsigned int query_flags)
+                 u32 request_mask, unsigned int query_flags,
+                 unsigned int getattr_flags)
  {
      struct inode *inode = d_inode(path->dentry);
      struct task_struct *p = get_proc_task(inode);
diff --git a/fs/proc/fd.c b/fs/proc/fd.c
index 6276b3938842..1ea9eea9953d 100644
--- a/fs/proc/fd.c
+++ b/fs/proc/fd.c
@@ -347,7 +347,8 @@ int proc_fd_permission(struct mnt_idmap *idmap,

  static int proc_fd_getattr(struct mnt_idmap *idmap,
              const struct path *path, struct kstat *stat,
-            u32 request_mask, unsigned int query_flags)
+            u32 request_mask, unsigned int query_flags,
+            unsigned int getattr_flags)
  {
      struct inode *inode = d_inode(path->dentry);
      int rv = 0;
diff --git a/fs/proc/generic.c b/fs/proc/generic.c
index 775ce0bcf08c..4acb07536329 100644
--- a/fs/proc/generic.c
+++ b/fs/proc/generic.c
@@ -135,7 +135,8 @@ static int proc_notify_change(struct mnt_idmap *idmap,

  static int proc_getattr(struct mnt_idmap *idmap,
              const struct path *path, struct kstat *stat,
-            u32 request_mask, unsigned int query_flags)
+            u32 request_mask, unsigned int query_flags,
+            unsigned int getattr_flags)
  {
      struct inode *inode = d_inode(path->dentry);
      struct proc_dir_entry *de = PDE(inode);
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 9a8f32f21ff5..c3bc8eee4771 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -163,7 +163,8 @@ extern int proc_pid_statm(struct seq_file *, struct 
pid_namespace *,
   */
  extern const struct dentry_operations pid_dentry_operations;
  extern int pid_getattr(struct mnt_idmap *, const struct path *,
-               struct kstat *, u32, unsigned int);
+               struct kstat *, u32, unsigned int,
+               unsigned int);
  extern int proc_setattr(struct mnt_idmap *, struct dentry *,
              struct iattr *);
  extern void proc_pid_evict_inode(struct proc_inode *);
diff --git a/fs/proc/proc_net.c b/fs/proc/proc_net.c
index 2ba31b6d68c0..e76019d1fe3b 100644
--- a/fs/proc/proc_net.c
+++ b/fs/proc/proc_net.c
@@ -301,7 +301,8 @@ static struct dentry *proc_tgid_net_lookup(struct 
inode *dir,

  static int proc_tgid_net_getattr(struct mnt_idmap *idmap,
                   const struct path *path, struct kstat *stat,
-                 u32 request_mask, unsigned int query_flags)
+                 u32 request_mask, unsigned int query_flags,
+                 unsigned int getattr_flags)
  {
      struct inode *inode = d_inode(path->dentry);
      struct net *net;
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index c88854df0b62..3dc86f02f64e 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -842,7 +842,8 @@ static int proc_sys_setattr(struct mnt_idmap *idmap,

  static int proc_sys_getattr(struct mnt_idmap *idmap,
                  const struct path *path, struct kstat *stat,
-                u32 request_mask, unsigned int query_flags)
+                u32 request_mask, unsigned int query_flags,
+                unsigned int getattr_flags)
  {
      struct inode *inode = d_inode(path->dentry);
      struct ctl_table_header *head = grab_header(inode);
diff --git a/fs/proc/root.c b/fs/proc/root.c
index 9191248f2dac..ab9113fc119d 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -312,7 +312,8 @@ void __init proc_root_init(void)

  static int proc_root_getattr(struct mnt_idmap *idmap,
                   const struct path *path, struct kstat *stat,
-                 u32 request_mask, unsigned int query_flags)
+                 u32 request_mask, unsigned int query_flags,
+                 unsigned int getattr_flags)
  {
      generic_fillattr(&nop_mnt_idmap, request_mask, d_inode(path->dentry),
               stat);
diff --git a/fs/smb/client/cifsfs.h b/fs/smb/client/cifsfs.h
index 41daebd220ff..453a3bd7ba65 100644
--- a/fs/smb/client/cifsfs.h
+++ b/fs/smb/client/cifsfs.h
@@ -73,7 +73,8 @@ extern int cifs_invalidate_mapping(struct inode *inode);
  extern int cifs_revalidate_mapping(struct inode *inode);
  extern int cifs_zap_mapping(struct inode *inode);
  extern int cifs_getattr(struct mnt_idmap *, const struct path *,
-            struct kstat *, u32, unsigned int);
+            struct kstat *, u32, unsigned int,
+            unsigned int);
  extern int cifs_setattr(struct mnt_idmap *, struct dentry *,
              struct iattr *);
  extern int cifs_fiemap(struct inode *, struct fiemap_extent_info *, 
u64 start,
diff --git a/fs/smb/client/inode.c b/fs/smb/client/inode.c
index d7c302442c1e..1aa1bd0c85fd 100644
--- a/fs/smb/client/inode.c
+++ b/fs/smb/client/inode.c
@@ -2570,7 +2570,8 @@ int cifs_revalidate_dentry(struct dentry *dentry)
  }

  int cifs_getattr(struct mnt_idmap *idmap, const struct path *path,
-         struct kstat *stat, u32 request_mask, unsigned int flags)
+         struct kstat *stat, u32 request_mask, unsigned int flags,
+         unsigned int getattr_flags)
  {
      struct dentry *dentry = path->dentry;
      struct cifs_sb_info *cifs_sb = CIFS_SB(dentry->d_sb);
diff --git a/fs/stat.c b/fs/stat.c
index d43a5cc1bfa4..fc4f6c0a91a8 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -133,7 +133,8 @@ int vfs_getattr_nosec(const struct path *path, 
struct kstat *stat,
      idmap = mnt_idmap(path->mnt);
      if (inode->i_op->getattr)
          return inode->i_op->getattr(idmap, path, stat,
-                        request_mask, query_flags);
+                        request_mask, query_flags,
+                        GETATTR_NOSEC);

      generic_fillattr(idmap, request_mask, inode, stat);
      return 0;
diff --git a/fs/sysv/itree.c b/fs/sysv/itree.c
index edb94e55de8e..3ff96577f0db 100644
--- a/fs/sysv/itree.c
+++ b/fs/sysv/itree.c
@@ -446,7 +446,8 @@ static unsigned sysv_nblocks(struct super_block *s, 
loff_t size)
  }

  int sysv_getattr(struct mnt_idmap *idmap, const struct path *path,
-         struct kstat *stat, u32 request_mask, unsigned int flags)
+         struct kstat *stat, u32 request_mask, unsigned int flags,
+         unsigned int getattr_flags)
  {
      struct super_block *s = path->dentry->d_sb;
      generic_fillattr(&nop_mnt_idmap, request_mask, d_inode(path->dentry),
diff --git a/fs/sysv/sysv.h b/fs/sysv/sysv.h
index e3f988b469ee..1e6e1e2faad4 100644
--- a/fs/sysv/sysv.h
+++ b/fs/sysv/sysv.h
@@ -142,7 +142,7 @@ extern int sysv_write_inode(struct inode *, struct 
writeback_control *wbc);
  extern int sysv_sync_inode(struct inode *);
  extern void sysv_set_inode(struct inode *, dev_t);
  extern int sysv_getattr(struct mnt_idmap *, const struct path *,
-            struct kstat *, u32, unsigned int);
+            struct kstat *, u32, unsigned int, unsigned int);
  extern int sysv_init_icache(void);
  extern void sysv_destroy_icache(void);

diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c
index 2f48c58d47cd..30fe9f64292c 100644
--- a/fs/ubifs/dir.c
+++ b/fs/ubifs/dir.c
@@ -1632,7 +1632,8 @@ static int ubifs_rename(struct mnt_idmap *idmap,
  }

  int ubifs_getattr(struct mnt_idmap *idmap, const struct path *path,
-          struct kstat *stat, u32 request_mask, unsigned int flags)
+          struct kstat *stat, u32 request_mask, unsigned int flags,
+          unsigned int getattr_flags)
  {
      loff_t size;
      struct inode *inode = d_inode(path->dentry);
diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index e5382f0b2587..ef0dda82adfc 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -1622,9 +1622,11 @@ static const char *ubifs_get_link(struct dentry 
*dentry,

  static int ubifs_symlink_getattr(struct mnt_idmap *idmap,
                   const struct path *path, struct kstat *stat,
-                 u32 request_mask, unsigned int query_flags)
+                 u32 request_mask, unsigned int query_flags,
+                 unsigned int getattr_flags)
  {
-    ubifs_getattr(idmap, path, stat, request_mask, query_flags);
+    ubifs_getattr(idmap, path, stat, request_mask, query_flags,
+              getattr_flags);

      if (IS_ENCRYPTED(d_inode(path->dentry)))
          return fscrypt_symlink_getattr(path, stat);
diff --git a/fs/ubifs/ubifs.h b/fs/ubifs/ubifs.h
index ebb3ad6b5e7e..bf1429096176 100644
--- a/fs/ubifs/ubifs.h
+++ b/fs/ubifs/ubifs.h
@@ -2033,7 +2033,8 @@ int ubifs_update_time(struct inode *inode, int flags);
  struct inode *ubifs_new_inode(struct ubifs_info *c, struct inode *dir,
                    umode_t mode, bool is_xattr);
  int ubifs_getattr(struct mnt_idmap *idmap, const struct path *path,
-          struct kstat *stat, u32 request_mask, unsigned int flags);
+          struct kstat *stat, u32 request_mask, unsigned int flags,
+          unsigned int getattr_flags);
  int ubifs_check_dir_empty(struct inode *dir);

  /* xattr.c */
diff --git a/fs/udf/symlink.c b/fs/udf/symlink.c
index f7eaf7b14594..743f5f59e94c 100644
--- a/fs/udf/symlink.c
+++ b/fs/udf/symlink.c
@@ -143,7 +143,8 @@ static int udf_symlink_filler(struct file *file, 
struct folio *folio)

  static int udf_symlink_getattr(struct mnt_idmap *idmap,
                     const struct path *path, struct kstat *stat,
-                   u32 request_mask, unsigned int flags)
+                   u32 request_mask, unsigned int flags,
+                   unsigned int getattr_flags)
  {
      struct dentry *dentry = path->dentry;
      struct inode *inode = d_backing_inode(dentry);
diff --git a/fs/vboxsf/utils.c b/fs/vboxsf/utils.c
index 83f20dd15522..8bbc31c99ede 100644
--- a/fs/vboxsf/utils.c
+++ b/fs/vboxsf/utils.c
@@ -232,7 +232,8 @@ int vboxsf_inode_revalidate(struct dentry *dentry)
  }

  int vboxsf_getattr(struct mnt_idmap *idmap, const struct path *path,
-           struct kstat *kstat, u32 request_mask, unsigned int flags)
+           struct kstat *kstat, u32 request_mask, unsigned int flags,
+           unsigned int getattr_flags)
  {
      int err;
      struct dentry *dentry = path->dentry;
diff --git a/fs/vboxsf/vfsmod.h b/fs/vboxsf/vfsmod.h
index 05973eb89d52..8ec1edc0cbe6 100644
--- a/fs/vboxsf/vfsmod.h
+++ b/fs/vboxsf/vfsmod.h
@@ -99,7 +99,7 @@ int vboxsf_stat_dentry(struct dentry *dentry, struct 
shfl_fsobjinfo *info);
  int vboxsf_inode_revalidate(struct dentry *dentry);
  int vboxsf_getattr(struct mnt_idmap *idmap, const struct path *path,
             struct kstat *kstat, u32 request_mask,
-           unsigned int query_flags);
+           unsigned int query_flags, unsigned int getattr_flags);
  int vboxsf_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
             struct iattr *iattr);
  struct shfl_string *vboxsf_path_from_dentry(struct vboxsf_sbi *sbi,
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 1c1e6171209d..d70ab9d791a9 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -552,7 +552,8 @@ xfs_vn_getattr(
      const struct path    *path,
      struct kstat        *stat,
      u32            request_mask,
-    unsigned int        query_flags)
+    unsigned int        query_flags,
+    unsigned int        getattr_flags)
  {
      struct inode        *inode = d_inode(path->dentry);
      struct xfs_inode    *ip = XFS_I(inode);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index b528f063e8ff..f2ec5a48e5dc 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1924,7 +1924,7 @@ struct inode_operations {
              struct inode *, struct dentry *, unsigned int);
      int (*setattr) (struct mnt_idmap *, struct dentry *, struct iattr *);
      int (*getattr) (struct mnt_idmap *, const struct path *,
-            struct kstat *, u32, unsigned int);
+            struct kstat *, u32, unsigned int, unsigned int);
      ssize_t (*listxattr) (struct dentry *, char *, size_t);
      int (*fiemap)(struct inode *, struct fiemap_extent_info *, u64 start,
                u64 len);
@@ -2098,6 +2098,11 @@ static inline bool sb_rdonly(const struct 
super_block *sb) { return sb->s_flags
  #define IS_WHITEOUT(inode)    (S_ISCHR(inode->i_mode) && \
                   (inode)->i_rdev == WHITEOUT_DEV)

+/*
+ * getattr flags
+ */
+#define GETATTR_NOSEC    (1 << 0)
+
  static inline bool HAS_UNMAPPED_ID(struct mnt_idmap *idmap,
                     struct inode *inode)
  {
@@ -3066,7 +3071,8 @@ extern int dcache_readdir(struct file *, struct 
dir_context *);
  extern int simple_setattr(struct mnt_idmap *, struct dentry *,
                struct iattr *);
  extern int simple_getattr(struct mnt_idmap *, const struct path *,
-              struct kstat *, u32, unsigned int);
+              struct kstat *, u32, unsigned int,
+              unsigned int);
  extern int simple_statfs(struct dentry *, struct kstatfs *);
  extern int simple_open(struct inode *inode, struct file *file);
  extern int simple_link(struct dentry *, struct inode *, struct dentry *);
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 279262057a92..f6cecdbe11ca 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -421,7 +421,7 @@ extern int nfs_post_op_update_inode(struct inode 
*inode, struct nfs_fattr *fattr
  extern int nfs_post_op_update_inode_force_wcc(struct inode *inode, 
struct nfs_fattr *fattr);
  extern int nfs_post_op_update_inode_force_wcc_locked(struct inode 
*inode, struct nfs_fattr *fattr);
  extern int nfs_getattr(struct mnt_idmap *, const struct path *,
-               struct kstat *, u32, unsigned int);
+               struct kstat *, u32, unsigned int, unsigned int);
  extern void nfs_access_add_cache(struct inode *, struct 
nfs_access_entry *, const struct cred *);
  extern void nfs_access_set_mask(struct nfs_access_entry *, u32);
  extern int nfs_permission(struct mnt_idmap *, struct inode *, int);
diff --git a/mm/shmem.c b/mm/shmem.c
index 69595d341882..ff43a0d248bb 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1119,7 +1119,8 @@ EXPORT_SYMBOL_GPL(shmem_truncate_range);

  static int shmem_getattr(struct mnt_idmap *idmap,
               const struct path *path, struct kstat *stat,
-             u32 request_mask, unsigned int query_flags)
+             u32 request_mask, unsigned int query_flags,
+             unsigned int getattr_flags)
  {
      struct inode *inode = path->dentry->d_inode;
      struct shmem_inode_info *info = SHMEM_I(inode);
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [syzbot] [integrity] [overlayfs] general protection fault in d_path
  2023-09-28 22:03                     ` Stefan Berger
@ 2023-09-29  4:25                       ` Amir Goldstein
  2023-09-29 12:39                         ` Stefan Berger
  0 siblings, 1 reply; 35+ messages in thread
From: Amir Goldstein @ 2023-09-29  4:25 UTC (permalink / raw)
  To: Stefan Berger
  Cc: Christian Brauner, Mimi Zohar, Jeff Layton, Casey Schaufler,
	syzbot, linux-fsdevel, linux-integrity, linux-kernel,
	linux-unionfs, miklos, syzkaller-bugs

On Fri, Sep 29, 2023 at 3:02 AM Stefan Berger <stefanb@linux.ibm.com> wrote:
>
>
> On 9/21/23 07:48, Christian Brauner wrote:
> >
> > Imho, this is all very wild but I'm not judging.
> >
> > Two solutions imho:
> > (1) teach stacking filesystems like overlayfs and ecryptfs to use
> >      vfs_getattr_nosec() in their ->getattr() implementation when they
> >      are themselves called via vfs_getattr_nosec(). This will fix this by
> >      not triggering another LSM hook.
>
> This somewhat lengthy patch I think would be a solution for (1). I don't
> think the Fixes tag is correct but IMO it should propagate farther back,
> if possible.
>
>
>  From 01467f6e879c4cd757abb4d79cb18bf11150bed8 Mon Sep 17 00:00:00 2001
> From: Stefan Berger <stefanb@linux.ibm.com>
> Date: Thu, 28 Sep 2023 14:42:39 -0400
> Subject: [PATCH] fs: Enable GETATTR_NOSEC parameter for getattr interface
>   function
>
> When vfs_getattr_nosec() calls a filesystem's getattr interface function
> then the 'nosec' should propagate into this function so that
> vfs_getattr_nosec() can again be called from the filesystem's gettattr
> rather than vfs_getattr(). The latter would add unnecessary security
> checks that the initial vfs_getattr_nosec() call wanted to avoid.
> Therefore, introduce the getattr flag GETATTR_NOSEC and allow to pass
> with the new getattr_flags parameter to the getattr interface function.
> In overlayfs and ecryptfs use this flag to determine which one of the
> two functions to call.
>
> In a recent code change introduced to IMA vfs_getattr_nosec() ended up
> calling vfs_getattr() in overlayfs, which in turn called
> security_inode_getattr() on an exiting process that did not have
> current->fs set anymore, which then caused a kernel NULL pointer
> dereference. With this change the call to security_inode_getattr() can
> be avoided, thus avoiding the NULL pointer dereference.
>
> Reported-by: syzbot+a67fc5321ffb4b311c98@syzkaller.appspotmail.com
> Fixes: db1d1e8b9867 ("IMA: use vfs_getattr_nosec to get the i_version")
> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
>
> ---
>
> simple_getattr has been adjusted and all files returned by the following
> grep have been adjusted as well.
>
>   grep -rEI "^[[:space:]]+\.getattr" ./ | \
>     grep -v simple_getattr  | \
>     cut -d ":" -f1 | sort | uniq
> ---
>   fs/9p/vfs_inode.c             |  3 ++-
>   fs/9p/vfs_inode_dotl.c        |  3 ++-
>   fs/afs/inode.c                |  3 ++-
>   fs/afs/internal.h             |  2 +-
>   fs/bad_inode.c                |  3 ++-
>   fs/btrfs/inode.c              |  3 ++-
>   fs/ceph/inode.c               |  9 ++++++---
>   fs/ceph/super.h               |  3 ++-
>   fs/coda/coda_linux.h          |  2 +-
>   fs/coda/inode.c               |  3 ++-
>   fs/ecryptfs/inode.c           | 14 ++++++++++----
>   fs/erofs/inode.c              |  2 +-
>   fs/erofs/internal.h           |  2 +-
>   fs/exfat/exfat_fs.h           |  2 +-
>   fs/exfat/file.c               |  2 +-
>   fs/ext2/ext2.h                |  2 +-
>   fs/ext2/inode.c               |  3 ++-
>   fs/ext4/ext4.h                |  6 ++++--
>   fs/ext4/inode.c               |  9 ++++++---
>   fs/ext4/symlink.c             |  6 ++++--
>   fs/f2fs/f2fs.h                |  3 ++-
>   fs/f2fs/file.c                |  3 ++-
>   fs/f2fs/namei.c               |  6 ++++--
>   fs/fat/fat.h                  |  3 ++-
>   fs/fat/file.c                 |  3 ++-
>   fs/fuse/dir.c                 |  3 ++-
>   fs/gfs2/inode.c               |  4 +++-
>   fs/hfsplus/hfsplus_fs.h       |  2 +-
>   fs/hfsplus/inode.c            |  2 +-
>   fs/kernfs/inode.c             |  3 ++-
>   fs/kernfs/kernfs-internal.h   |  3 ++-
>   fs/libfs.c                    |  5 +++--
>   fs/minix/inode.c              |  3 ++-
>   fs/minix/minix.h              |  3 ++-
>   fs/nfs/inode.c                |  3 ++-
>   fs/nfs/namespace.c            |  5 +++--
>   fs/ntfs3/file.c               |  3 ++-
>   fs/ntfs3/ntfs_fs.h            |  3 ++-
>   fs/ocfs2/file.c               |  3 ++-
>   fs/ocfs2/file.h               |  3 ++-
>   fs/orangefs/inode.c           |  3 ++-
>   fs/orangefs/orangefs-kernel.h |  3 ++-
>   fs/overlayfs/inode.c          |  8 ++++++--
>   fs/overlayfs/overlayfs.h      |  3 ++-
>   fs/proc/base.c                |  6 ++++--
>   fs/proc/fd.c                  |  3 ++-
>   fs/proc/generic.c             |  3 ++-
>   fs/proc/internal.h            |  3 ++-
>   fs/proc/proc_net.c            |  3 ++-
>   fs/proc/proc_sysctl.c         |  3 ++-
>   fs/proc/root.c                |  3 ++-
>   fs/smb/client/cifsfs.h        |  3 ++-
>   fs/smb/client/inode.c         |  3 ++-
>   fs/stat.c                     |  3 ++-
>   fs/sysv/itree.c               |  3 ++-
>   fs/sysv/sysv.h                |  2 +-
>   fs/ubifs/dir.c                |  3 ++-
>   fs/ubifs/file.c               |  6 ++++--
>   fs/ubifs/ubifs.h              |  3 ++-
>   fs/udf/symlink.c              |  3 ++-
>   fs/vboxsf/utils.c             |  3 ++-
>   fs/vboxsf/vfsmod.h            |  2 +-
>   fs/xfs/xfs_iops.c             |  3 ++-
>   include/linux/fs.h            | 10 ++++++++--
>   include/linux/nfs_fs.h        |  2 +-
>   mm/shmem.c                    |  3 ++-
>   66 files changed, 159 insertions(+), 82 deletions(-)
>

You can avoid all this churn.
Just use the existing query_flags arg.
Nothing outside the AT_STATX_SYNC_TYPE query_flags is
passed into filesystems from userspace.

Mast out AT_STATX_SYNC_TYPE in vfs_getattr()
And allow kernel internal request_flags in vfs_getattr_nosec()

The AT_ flag namespace is already a challenge, but mixing user
flags and kernel-only flags in vfs interfaces has been done before.

...

> @@ -171,7 +172,10 @@ int ovl_getattr(struct mnt_idmap *idmap, const
> struct path *path,
>
>       type = ovl_path_real(dentry, &realpath);
>       old_cred = ovl_override_creds(dentry->d_sb);
> -    err = vfs_getattr(&realpath, stat, request_mask, flags);
> +    if (getattr_flags & GETATTR_NOSEC)
> +        err = vfs_getattr_nosec(&realpath, stat, request_mask, flags);
> +    else
> +        err = vfs_getattr(&realpath, stat, request_mask, flags);
>       if (err)
>           goto out;
>

There are two more vfs_getattr() calls in this function that you missed.

Please create ovl_do_getattr() inline wrapper in overlayfs.h next to all
the other ovl_do_ wrappers and use it here.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [syzbot] [integrity] [overlayfs] general protection fault in d_path
  2023-09-26 14:40                                 ` Mimi Zohar
@ 2023-09-29  4:30                                   ` Amir Goldstein
  0 siblings, 0 replies; 35+ messages in thread
From: Amir Goldstein @ 2023-09-29  4:30 UTC (permalink / raw)
  To: Mimi Zohar
  Cc: Jeff Layton, Christian Brauner, Casey Schaufler, Stefan Berger,
	syzbot, linux-fsdevel, linux-integrity, linux-kernel,
	linux-unionfs, miklos, syzkaller-bugs

On Tue, Sep 26, 2023 at 5:40 PM Mimi Zohar <zohar@linux.ibm.com> wrote:
>
> On Thu, 2023-09-21 at 20:01 +0300, Amir Goldstein wrote:
> > On Thu, Sep 21, 2023 at 7:31 PM Mimi Zohar <zohar@linux.ibm.com> wrote:
> > >
...
> > > Let's see if Amir's patch actually fixes the original problem before
> > > making any decisions.  (Wishing for a reproducer of the original
> > > problem.)
> > >
> >
> > Confused. What is the "original problem"?
> > I never claimed that my patch fixes the "original problem".
> > I claimed [1] that my patch fixes a problem that existed before
> > db1d1e8b9867, but db1d1e8b9867 added two more instances
> > of that bug (wrong dereference of file->f_path).
> > Apparently, db1d1e8b9867 introduced another bug.
> >
> > It looks like you should revert db1d1e8b9867, but regardless,
> > I recommend that you apply my patch. My patch conflicts
> > with the revert but the conflict is trivial - the two hunks that
> > fix the new vfs_getattr_nosec() calls are irrelevant - the rest are.
> >
> > Thanks,
> > Amir.
> >
> > [1] https://lore.kernel.org/linux-unionfs/20230913073755.3489676-1-amir73il@gmail.com/
>
> Here are some of the issues with IMA/Overlay:
>
> 1. False positive syzbot IMA/overlay lockdep warnings.
> 2, Not detecting file change with squashfs + overlay.
> 3. Changes to the backing file are not detected by overlay (when
> backing file is not in policy).
>
> Commit db1d1e8b9867 ("IMA: use vfs_getattr_nosec to get the i_version")
> upstreamed to address 2, but has become unnecessary due to other
> changes.  According to Stefan, the problem subsequently was resolved
> without either commit db1d1e8b9867 or 18b44bc5a672.  (Kernel was not
> bi-sected to find bug resolution.)
>
> Commit 18b44bc5a672 ("ovl: Always reevaluate the file signature for
> IMA") to address 3.
>
> [PATCH] "ima: fix wrong dereferences of file->f_path" is probably
> correct.  Does it address any syzbot reports?
>

Not that I know of.

Mimi,

I am going to change my recommendation to -
Please wait with applying my patch unless you know that it
fixes a known bug, because:

1. I don't have a complete picture of ovl+IMA
2. I didn't find any specific test case to prove the bug
3. I have a plan to get rid of the file_real_path() anomaly

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [syzbot] [integrity] [overlayfs] general protection fault in d_path
  2023-09-29  4:25                       ` Amir Goldstein
@ 2023-09-29 12:39                         ` Stefan Berger
  2023-10-01 14:51                           ` Amir Goldstein
  0 siblings, 1 reply; 35+ messages in thread
From: Stefan Berger @ 2023-09-29 12:39 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Christian Brauner, Mimi Zohar, Jeff Layton, Casey Schaufler,
	syzbot, linux-fsdevel, linux-integrity, linux-kernel,
	linux-unionfs, miklos, syzkaller-bugs


On 9/29/23 00:25, Amir Goldstein wrote:
> On Fri, Sep 29, 2023 at 3:02 AM Stefan Berger <stefanb@linux.ibm.com> wrote:
>>
>> On 9/21/23 07:48, Christian Brauner wrote:
>>> Imho, this is all very wild but I'm not judging.
>>>
>>> Two solutions imho:
>>> (1) teach stacking filesystems like overlayfs and ecryptfs to use
>>>       vfs_getattr_nosec() in their ->getattr() implementation when they
>>>       are themselves called via vfs_getattr_nosec(). This will fix this by
>>>       not triggering another LSM hook.
>>
>> You can avoid all this churn.
>> Just use the existing query_flags arg.
>> Nothing outside the AT_STATX_SYNC_TYPE query_flags is
>> passed into filesystems from userspace.
>>
>> Mast out AT_STATX_SYNC_TYPE in vfs_getattr()
>> And allow kernel internal request_flags in vfs_getattr_nosec()
Hm, I thought that vfs_getattr_nosec needs to pass AT_GETATTR_NOSEC into 
->getattr().
>>
>> The AT_ flag namespace is already a challenge, but mixing user
>> flags and kernel-only flags in vfs interfaces has been done before.
>>
>> ...


That's what I wanted to avoid since now all filesystems' getattr() may 
have the AT_GETATTR_NOSEC mixed into the query_flags.

Anyway, here's what I currently have:

diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index 992d9c7e64ae..f7b5b1843dcc 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -998,16 +998,28 @@ static int ecryptfs_getattr_link(struct mnt_idmap 
*idmap,
         return rc;
  }

+static int ecryptfs_do_getattr(bool nosec, const struct path *path,
+                              struct kstat *stat, u32 request_mask,
+                              unsigned int flags)
+{
+       if (nosec)
+               return vfs_getattr_nosec(path, stat, request_mask, flags);
+       return vfs_getattr(path, stat, request_mask, flags);
+}
+
  static int ecryptfs_getattr(struct mnt_idmap *idmap,
                             const struct path *path, struct kstat *stat,
                             u32 request_mask, unsigned int flags)
  {
         struct dentry *dentry = path->dentry;
         struct kstat lower_stat;
+       bool nosec = flags & AT_GETATTR_NOSEC;
         int rc;

-       rc = vfs_getattr(ecryptfs_dentry_to_lower_path(dentry), &lower_stat,
-                        request_mask, flags);
+       flags &= ~AT_INTERNAL_MASK;
+
+       rc = ecryptfs_do_getattr(nosec, 
ecryptfs_dentry_to_lower_path(dentry),
+                                &lower_stat, request_mask, flags);
         if (!rc) {
                 fsstack_copy_attr_all(d_inode(dentry),
ecryptfs_inode_to_lower(d_inode(dentry)));
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 83ef66644c21..ec4ceb5b4ebf 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -166,12 +166,15 @@ int ovl_getattr(struct mnt_idmap *idmap, const 
struct path *path,
         int fsid = 0;
         int err;
         bool metacopy_blocks = false;
+       bool nosec = flags & AT_GETATTR_NOSEC;
+
+       flags &= ~AT_INTERNAL_MASK;

         metacopy_blocks = ovl_is_metacopy_dentry(dentry);

         type = ovl_path_real(dentry, &realpath);
         old_cred = ovl_override_creds(dentry->d_sb);
-       err = vfs_getattr(&realpath, stat, request_mask, flags);
+       err = ovl_do_getattr(nosec, &realpath, stat, request_mask, flags);
         if (err)
                 goto out;

@@ -196,8 +199,8 @@ int ovl_getattr(struct mnt_idmap *idmap, const 
struct path *path,
                                         (!is_dir ? STATX_NLINK : 0);

                         ovl_path_lower(dentry, &realpath);
-                       err = vfs_getattr(&realpath, &lowerstat,
-                                         lowermask, flags);
+                       err = ovl_do_getattr(nosec, &realpath, &lowerstat,
+                                            lowermask, flags);
                         if (err)
                                 goto out;

@@ -249,8 +252,9 @@ int ovl_getattr(struct mnt_idmap *idmap, const 
struct path *path,

                         ovl_path_lowerdata(dentry, &realpath);
                         if (realpath.dentry) {
-                               err = vfs_getattr(&realpath, &lowerdatastat,
-                                                 lowermask, flags);
+                               err = ovl_do_getattr(nosec, &realpath,
+ &lowerdatastat, lowermask,
+                                                    flags);
                                 if (err)
                                         goto out;
                         } else {
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 9817b2dcb132..cbee3ff3bab7 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -397,6 +397,15 @@ static inline bool ovl_open_flags_need_copy_up(int 
flags)
         return ((OPEN_FMODE(flags) & FMODE_WRITE) || (flags & O_TRUNC));
  }

+static inline int ovl_do_getattr(bool nosec, const struct path *path,
+                                struct kstat *stat, u32 request_mask,
+                                unsigned int flags)
+{
+       if (nosec)
+               return vfs_getattr_nosec(path, stat, request_mask, flags);
+       return vfs_getattr(path, stat, request_mask, flags);
+}
+
  /* util.c */
  int ovl_want_write(struct dentry *dentry);
  void ovl_drop_write(struct dentry *dentry);
diff --git a/fs/stat.c b/fs/stat.c
index d43a5cc1bfa4..3250e427e1aa 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -133,7 +133,8 @@ int vfs_getattr_nosec(const struct path *path, 
struct kstat *stat,
         idmap = mnt_idmap(path->mnt);
         if (inode->i_op->getattr)
                 return inode->i_op->getattr(idmap, path, stat,
-                                           request_mask, query_flags);
+                                           request_mask,
+                                           query_flags | AT_GETATTR_NOSEC);

         generic_fillattr(idmap, request_mask, inode, stat);
         return 0;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index b528f063e8ff..9069d6a301f0 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2027,6 +2027,12 @@ struct super_operations {
         void (*shutdown)(struct super_block *sb);
  };

+/*
+ * Internal query flags. See fcntl.h AT_xxx flags for the rest.
+ */
+#define AT_GETATTR_NOSEC               0x80000000
+#define AT_INTERNAL_MASK               0x80000000
+
  /*
   * Inode flags - they have no relation to superblock flags now
   */




^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [syzbot] [integrity] [overlayfs] general protection fault in d_path
  2023-09-29 12:39                         ` Stefan Berger
@ 2023-10-01 14:51                           ` Amir Goldstein
  0 siblings, 0 replies; 35+ messages in thread
From: Amir Goldstein @ 2023-10-01 14:51 UTC (permalink / raw)
  To: Stefan Berger
  Cc: Christian Brauner, Mimi Zohar, Jeff Layton, Casey Schaufler,
	syzbot, linux-fsdevel, linux-integrity, linux-kernel,
	linux-unionfs, miklos, syzkaller-bugs

On Fri, Sep 29, 2023 at 3:39 PM Stefan Berger <stefanb@linux.ibm.com> wrote:
>
>
> On 9/29/23 00:25, Amir Goldstein wrote:
> > On Fri, Sep 29, 2023 at 3:02 AM Stefan Berger <stefanb@linux.ibm.com> wrote:
> >>
> >> On 9/21/23 07:48, Christian Brauner wrote:
> >>> Imho, this is all very wild but I'm not judging.
> >>>
> >>> Two solutions imho:
> >>> (1) teach stacking filesystems like overlayfs and ecryptfs to use
> >>>       vfs_getattr_nosec() in their ->getattr() implementation when they
> >>>       are themselves called via vfs_getattr_nosec(). This will fix this by
> >>>       not triggering another LSM hook.
> >>
> >> You can avoid all this churn.
> >> Just use the existing query_flags arg.
> >> Nothing outside the AT_STATX_SYNC_TYPE query_flags is
> >> passed into filesystems from userspace.
> >>
> >> Mast out AT_STATX_SYNC_TYPE in vfs_getattr()
> >> And allow kernel internal request_flags in vfs_getattr_nosec()
> Hm, I thought that vfs_getattr_nosec needs to pass AT_GETATTR_NOSEC into
> ->getattr().
> >>
> >> The AT_ flag namespace is already a challenge, but mixing user
> >> flags and kernel-only flags in vfs interfaces has been done before.
> >>
> >> ...
>
>
> That's what I wanted to avoid since now all filesystems' getattr() may
> have the AT_GETATTR_NOSEC mixed into the query_flags.
>
> Anyway, here's what I currently have:
>
> diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
> index 992d9c7e64ae..f7b5b1843dcc 100644
> --- a/fs/ecryptfs/inode.c
> +++ b/fs/ecryptfs/inode.c
> @@ -998,16 +998,28 @@ static int ecryptfs_getattr_link(struct mnt_idmap
> *idmap,
>          return rc;
>   }
>
> +static int ecryptfs_do_getattr(bool nosec, const struct path *path,
> +                              struct kstat *stat, u32 request_mask,
> +                              unsigned int flags)
> +{
> +       if (nosec)
> +               return vfs_getattr_nosec(path, stat, request_mask, flags);
> +       return vfs_getattr(path, stat, request_mask, flags);
> +}
> +
>   static int ecryptfs_getattr(struct mnt_idmap *idmap,
>                              const struct path *path, struct kstat *stat,
>                              u32 request_mask, unsigned int flags)
>   {
>          struct dentry *dentry = path->dentry;
>          struct kstat lower_stat;
> +       bool nosec = flags & AT_GETATTR_NOSEC;
>          int rc;
>
> -       rc = vfs_getattr(ecryptfs_dentry_to_lower_path(dentry), &lower_stat,
> -                        request_mask, flags);
> +       flags &= ~AT_INTERNAL_MASK;
> +
> +       rc = ecryptfs_do_getattr(nosec,
> ecryptfs_dentry_to_lower_path(dentry),
> +                                &lower_stat, request_mask, flags);
>          if (!rc) {
>                  fsstack_copy_attr_all(d_inode(dentry),
> ecryptfs_inode_to_lower(d_inode(dentry)));
> diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
> index 83ef66644c21..ec4ceb5b4ebf 100644
> --- a/fs/overlayfs/inode.c
> +++ b/fs/overlayfs/inode.c
> @@ -166,12 +166,15 @@ int ovl_getattr(struct mnt_idmap *idmap, const
> struct path *path,
>          int fsid = 0;
>          int err;
>          bool metacopy_blocks = false;
> +       bool nosec = flags & AT_GETATTR_NOSEC;
> +
> +       flags &= ~AT_INTERNAL_MASK;

I don't understand why you need the nosec helper arg.
What's wrong with:

static int ovl_do_getattr(const struct path *path,
                              struct kstat *stat, u32 request_mask,
                              unsigned int flags)
{
       if (flags & AT_GETATTR_NOSEC)
               return vfs_getattr_nosec(path, stat, request_mask, flags);
       return vfs_getattr(path, stat, request_mask, flags);
}

likewise for ecryptfs.

>
>          metacopy_blocks = ovl_is_metacopy_dentry(dentry);
>
>          type = ovl_path_real(dentry, &realpath);
>          old_cred = ovl_override_creds(dentry->d_sb);
> -       err = vfs_getattr(&realpath, stat, request_mask, flags);
> +       err = ovl_do_getattr(nosec, &realpath, stat, request_mask, flags);
>          if (err)
>                  goto out;
>
> @@ -196,8 +199,8 @@ int ovl_getattr(struct mnt_idmap *idmap, const
> struct path *path,
>                                          (!is_dir ? STATX_NLINK : 0);
>
>                          ovl_path_lower(dentry, &realpath);
> -                       err = vfs_getattr(&realpath, &lowerstat,
> -                                         lowermask, flags);
> +                       err = ovl_do_getattr(nosec, &realpath, &lowerstat,
> +                                            lowermask, flags);
>                          if (err)
>                                  goto out;
>
> @@ -249,8 +252,9 @@ int ovl_getattr(struct mnt_idmap *idmap, const
> struct path *path,
>
>                          ovl_path_lowerdata(dentry, &realpath);
>                          if (realpath.dentry) {
> -                               err = vfs_getattr(&realpath, &lowerdatastat,
> -                                                 lowermask, flags);
> +                               err = ovl_do_getattr(nosec, &realpath,
> + &lowerdatastat, lowermask,
> +                                                    flags);
>                                  if (err)
>                                          goto out;
>                          } else {
> diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> index 9817b2dcb132..cbee3ff3bab7 100644
> --- a/fs/overlayfs/overlayfs.h
> +++ b/fs/overlayfs/overlayfs.h
> @@ -397,6 +397,15 @@ static inline bool ovl_open_flags_need_copy_up(int
> flags)
>          return ((OPEN_FMODE(flags) & FMODE_WRITE) || (flags & O_TRUNC));
>   }
>
> +static inline int ovl_do_getattr(bool nosec, const struct path *path,
> +                                struct kstat *stat, u32 request_mask,
> +                                unsigned int flags)
> +{
> +       if (nosec)
> +               return vfs_getattr_nosec(path, stat, request_mask, flags);
> +       return vfs_getattr(path, stat, request_mask, flags);
> +}
> +
>   /* util.c */
>   int ovl_want_write(struct dentry *dentry);
>   void ovl_drop_write(struct dentry *dentry);
> diff --git a/fs/stat.c b/fs/stat.c
> index d43a5cc1bfa4..3250e427e1aa 100644
> --- a/fs/stat.c
> +++ b/fs/stat.c
> @@ -133,7 +133,8 @@ int vfs_getattr_nosec(const struct path *path,
> struct kstat *stat,
>          idmap = mnt_idmap(path->mnt);)
>          if (inode->i_op->getattr)
>                  return inode->i_op->getattr(idmap, path, stat,
> -                                           request_mask, query_flags);
> +                                           request_mask,
> +                                           query_flags | AT_GETATTR_NOSEC);
>

You also need in vfs_getattr():

if (WARN_ON_ONCE(query_flags & AT_GETATTR_NOSEC)
    return -EPERM;

>          generic_fillattr(idmap, request_mask, inode, stat);
>          return 0;
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index b528f063e8ff..9069d6a301f0 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2027,6 +2027,12 @@ struct super_operations {
>          void (*shutdown)(struct super_block *sb);
>   };
>
> +/*
> + * Internal query flags. See fcntl.h AT_xxx flags for the rest.
> + */
> +#define AT_GETATTR_NOSEC               0x80000000
> +#define AT_INTERNAL_MASK               0x80000000
> +

Yeh, the problem is that people adding flags to fcntl.h
won't be seeing this comment and we don't want to put those
"expose" those flags in uapi header either.

One possible compromise is to put them in fcntl.h under
#ifdef __KERNEL__

Very controversial, yes, I know.
The whole concept of mixing functional flags (i.e. AT_STATX_*)
with lookup AT_* flags is controversial to begin with, not to
mention flag overload for different syscalls (i.e. AT_EACCESS/
AT_REMOVEDIR/AT_HANDLE_FID).

But since we have accepted this necessary evil, I think that at least
we could explicitly partition the AT_ flags namespace and declare:

#ifdef __KERNEL__
AT_LOOKUP_FLAGS_MASK ...
AT_MOUNT_FLAGS_MASK         AT_RECURSIVE
AT_SYSCALL_PRIVATE_MASK   AT_EACCESS
AT_SYNC_TYPE_MASK               AT_STATX_SYNC_TYPE
AT_KERNEL_INTERNAL_MASK  0x80000000
#endif

Sfefan,

I feel that I have to stress the point that this is only *my* opinion and
I accept that others (like some vfs co-maintains..) may passionately
disagree to further pollute the AT_ flags namespace.

The advantage of the AT_KERNEL_INTERNAL_MASK is that it is
in no way exposed to users via ABI, so if we decide to undo this
decision anytime in the future and reclaim the internal AT_ flags,
we could do that.

IMO, this is a decent compromise compared to the very noisy
patch that adds another flags argument to ->getattr() just to fix
this IMA/overlayfs corner case.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2023-10-01 14:52 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-05 12:00 [syzbot] [overlayfs?] general protection fault in d_path syzbot
2023-07-05 13:05 ` Christian Brauner
2023-07-05 13:39   ` Amir Goldstein
2023-07-05 13:41   ` Jeff Layton
2023-07-05 13:54     ` Amir Goldstein
2023-07-05 14:10       ` Jeff Layton
2023-09-12 22:10 ` syzbot
2023-09-13  7:09   ` Amir Goldstein
2023-09-13  9:06     ` [syzbot] [integrity] [overlayfs] " syzbot
2023-09-14 17:54       ` Mimi Zohar
2023-09-18  0:04 ` syzbot
2023-09-20 17:01   ` Stefan Berger
2023-09-20 20:37     ` Stefan Berger
2023-09-20 21:16       ` Jeff Layton
2023-09-20 22:09         ` Stefan Berger
2023-09-21  0:10           ` Stefan Berger
2023-09-21  0:52             ` Casey Schaufler
2023-09-21 10:32               ` Jeff Layton
2023-09-21 11:24                 ` Mimi Zohar
2023-09-21 11:48                   ` Christian Brauner
2023-09-21 14:29                     ` Stefan Berger
2023-09-21 14:52                     ` Mimi Zohar
2023-09-21 15:10                       ` Jeff Layton
2023-09-21 15:18                         ` Amir Goldstein
2023-09-21 15:19                         ` Mimi Zohar
2023-09-21 15:39                           ` Jeff Layton
2023-09-21 16:06                             ` Mimi Zohar
2023-09-21 17:01                               ` Amir Goldstein
2023-09-26 14:40                                 ` Mimi Zohar
2023-09-29  4:30                                   ` Amir Goldstein
2023-09-28 22:03                     ` Stefan Berger
2023-09-29  4:25                       ` Amir Goldstein
2023-09-29 12:39                         ` Stefan Berger
2023-10-01 14:51                           ` Amir Goldstein
2023-09-21 12:03                 ` Amir Goldstein

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.