linuxppc-dev.lists.ozlabs.org archive mirror
* system call hook triggers kernel panic
@ 2019-10-17  2:00 Yi Li
From: Yi Li @ 2019-10-17  2:00 UTC
  To: linuxppc-dev

Hi,

We tried to replace the umount system call with our own code. Below is the simplified test case.
When running umount, the kernel panics (CentOS 7, kernel 4.14.0-115.10.1.el7a.ppc64le) on a P9 OpenPOWER machine.
Could you please suggest how to make the system call hook work properly on powerpc?

"
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/kallsyms.h>
#include <linux/err.h>		/* IS_ERR */
#include <linux/string.h>	/* strndup_user */
#include <asm/unistd.h>		/* __NR_umount2 */

static void** sct;

static asmlinkage long (*orig_umount)(char __user *, int);

static asmlinkage long umount_hook(char __user *name, int flags)
{
	char *dir_name;
	long ret;

	/* strndup_user() returns an ERR_PTR on failure, never NULL. */
	dir_name = strndup_user(name, 512);
	if (!IS_ERR(dir_name)) {
		printk(KERN_NOTICE "umount %s 0x%x\n", dir_name, flags);
		kfree(dir_name);
	}

	ret = orig_umount(name, flags);

	printk("umount2 returned %ld\n", ret);

	return ret;
}

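static int __init poc_init(void)
{
	sct = (void**)kallsyms_lookup_name("sys_call_table");
	if (!sct)
		return -ENOENT;

#ifdef CONFIG_PPC64
	/* The ppc64 table has two slots per syscall (64-bit, then 32-bit compat). */
	orig_umount = sct[__NR_umount2 * 2];
	sct[__NR_umount2 * 2] = umount_hook;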
#else
	/*
	 * On recent x86 kernels we would need to remove the memory protection
	 * before modifying the syscall table; this PoC skips that step.
	 *
	 * The stock kernel for CentOS 7.4 or lower should be just fine.
	 */
	orig_umount = sct[__NR_umount2];
	sct[__NR_umount2] = umount_hook;
#endif

	printk("syscall.__NR_umount2 replaced\n");

	return 0;
}

static void __exit poc_exit(void)
{
#ifdef CONFIG_PPC64
	sct[__NR_umount2 * 2] = orig_umount;
#else
	sct[__NR_umount2] = orig_umount;
#endif

	printk("syscall.__NR_umount2 restored\n");
}

module_init(poc_init);
module_exit(poc_exit);

MODULE_DESCRIPTION("syscall hook poc.  load it, umount something, then dmesg to"
	" check its activities.");
MODULE_VERSION("1.0");
MODULE_LICENSE("GPL v2");
MODULE_AUTHOR("Huang Le");
"

The kernel module is inserted correctly. We then mount a tmpfs and umount it.
The kernel panics during the umount:
"
[  148.569777] umount /home/adam/test 0x0
[  148.608227] umount2 returned 0
[  148.608268] Unable to handle kernel paging request for data at address 0xc00800001625a288
[  148.608320] Faulting instruction address: 0xc00000000001d610
[  148.608387] Oops: Kernel access of bad area, sig: 11 [#1]
[  148.608418] LE SMP NR_CPUS=2048 NUMA PowerNV
[  148.608460] Modules linked in: poc(OE) rpcrdma sunrpc ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core i2c_dev ses ipmi_powernv enclosure scsi_transport_sas sg ipmi_devintf at24 ofpart powernv_flash ipmi_msghandler mtd shpchp uio_pdrv_genirq opal_prd ibmpowernv uio ip_tables ext4 mbcache jbd2 sd_mod ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm tg3 megaraid_sas be2net aacraid ptp pps_core
[  148.608946] CPU: 5 PID: 15540 Comm: umount Tainted: G           OE  ------------   4.14.0-115.10.1.el7a.ppc64le #1
[  148.609075] task: c000003fc4017000 task.stack: c000003fbae9c000
[  148.609159] NIP:  c00000000001d610 LR: c00000000000dd00 CTR: 000000000000004e
[  148.609239] REGS: c000003fbae9fb70 TRAP: 0300   Tainted: G           OE  ------------    (4.14.0-115.10.1.el7a.ppc64le)
[  148.609339] MSR:  9000000002803033 <SF,HV,VEC,VSX,FP,ME,IR,DR,RI,LE>  CR: 22000844  XER: 20040000
[  148.609391] CFAR: c00000000001d5f8 DAR: c00800001625a288 DSISR: 40000000 SOFTE: 1
[  148.609391] GPR00: c00000000000dd00 c000003fbae9fdf0 c0080000161c8400 c000003fbae9fea0
[  148.609391] GPR04: 0000000000040080 0000000000000000 0000000000000001 0000000000000000
[  148.609391] GPR08: c008000016258400 0000000000000002 0000000000000002 0000000000000c00
[  148.609391] GPR12: 0000000000000000 c00000000fa83700 0000000000000000 0000000000000000
[  148.609391] GPR16: 0000000000000000 0000000000000000 0000000000000000 00007fffe86f3234
[  148.609391] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[  148.609391] GPR24: 000000012a7d6468 000000012a7d6590 0000000000000001 0000000163a574f0
[  148.609391] GPR28: 00002000000e1d54 0000000000000000 c000003fbae9fea0 900000000280f033
[  148.610016] NIP [c00000000001d610] restore_math+0x60/0x200
[  148.610079] LR [c00000000000dd00] ret_from_except_lite+0x2c/0x74
[  148.610143] Call Trace:
[  148.610186] [c000003fbae9fdf0] [c000003fbae9fe30] 0xc000003fbae9fe30 (unreliable)
[  148.610287] [c000003fbae9fe30] [c00000000000dd00] ret_from_except_lite+0x2c/0x74
[  148.610378] Instruction dump:
[  148.610414] 7be7e8a4 78e71f87 40820024 e92d0260 89290e78 2f890000 409e0014 e92d0260
[  148.610471] 89290e79 2f890000 419e0074 3d020009 <e9081e88> 7d4000a6 7d494378 60000000
[  148.610568] ---[ end trace 1ec6b39ae7531745 ]---
[  149.593561]
[  150.593628] Kernel panic - not syncing: Fatal exception
"

Thanks,
-Yi Li


* Re: system call hook triggers kernel panic
@ 2019-10-17  4:29 ` Oliver O'Halloran
From: Oliver O'Halloran @ 2019-10-17  4:29 UTC
  To: Yi Li; +Cc: linuxppc-dev

On Thu, Oct 17, 2019 at 1:01 PM Yi Li <adamliyi@msn.com> wrote:
>
> Hi,

*snip*

> The kernel module is inserted correctly. We then mount a tmpfs and umount it.
> The kernel panics during the umount:

The ABI (v1 and v2) uses r2 as a pointer to the "table of contents",
which is used to look up the addresses of global symbols. TOCs are
specific to the current unit of execution: the vmlinux and each
module have their own TOC. From the dump it looks like r2 is pointing
into the vmalloc area where modules are loaded, so odds are the crash
is because the TOC isn't being restored when we return from the
patched function. One of the many reasons why you really shouldn't
hook the syscall table ;)

The vmlinux's TOC is saved somewhere in the PACA (legacy ppc specific
per-cpu thing) so you could restore it with some inline asm before
returning from your hook. Have a look at what we do to load r2 in the
system call entry path.

> Thanks,
> -Yi Li


* Re: system call hook triggers kernel panic
@ 2019-10-17 10:33   ` Yi Li
From: Yi Li @ 2019-10-17 10:33 UTC
  To: Oliver O'Halloran; +Cc: linuxppc-dev



> On Oct 17, 2019, at 12:29 PM, Oliver O'Halloran <oohall@gmail.com> wrote:
> 
> 
> The ABI (v1 and v2) uses r2 as a pointer to the "table of contents",
> which is used to look up the addresses of global symbols. TOCs are
> specific to the current unit of execution: the vmlinux and each
> module have their own TOC. From the dump it looks like r2 is pointing
> into the vmalloc area where modules are loaded, so odds are the crash
> is because the TOC isn't being restored when we return from the
> patched function. One of the many reasons why you really shouldn't
> hook the syscall table ;)
>
> The vmlinux's TOC is saved somewhere in the PACA (legacy ppc specific
> per-cpu thing) so you could restore it with some inline asm before
> returning from your hook. Have a look at what we do to load r2 in the
> system call entry path.
> 

Thanks for the insight!
I tried restoring 'r2' before returning from the system call hook, and the kernel no longer panics:

"
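static asmlinkage long umount_hook(char __user *name, int flags)
{
        char *dir_name;
        long ret;

        /* strndup_user() returns an ERR_PTR on failure, never NULL. */
        dir_name = strndup_user(name, 512);
        if (!IS_ERR(dir_name)) {
                printk(KERN_NOTICE "umount %s 0x%x\n", dir_name, flags);
                kfree(dir_name);
        }

        ret = orig_umount(name, flags);

        printk("umount2 returned %ld\n", ret);

        /*
         * Reload the kernel TOC into r2 from the PACA (r13). In assembly
         * this would be "ld 2,PACATOC(13)", where PACATOC is
         * offsetof(struct paca_struct, kernel_toc). That symbol is not
         * visible from C, so the offset is hard-coded as 16 here.
         */
        asm volatile("ld 2, 16(13)");

        return ret;
}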
"

