kernelnewbies.kernelnewbies.org archive mirror
* Userspace app crash causes system crash on do_exit probe
@ 2020-09-01  7:16 César Augusto Marcelino dos Santos
From: César Augusto Marcelino dos Santos @ 2020-09-01  7:16 UTC (permalink / raw)
  To: kernelnewbies

Dear community,

I have written a kernel module that places probes on the do_execve() and
do_exit() kernel functions (code at the end of this email). It runs on a
custom system based on kernel version 3.18.31.

The goal of this module is to see whether I can capture some information
about any process that is about to start or about to leave
userspace. I have tested the following scenarios:
- app inits
- app finishes its execution gracefully
- app is killed
- app crashes
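
To tell these scenarios apart inside the exit probe, one option is to
decode the `code` argument that do_exit() receives. The sketch below
assumes that value follows the usual wait-status layout (terminating
signal in the low 7 bits, core-dump flag in bit 7, exit status in bits
8-15); that assumption should be checked against the 3.18.31 sources.
The names classify_exit_code, exit_info and exit_kind are hypothetical,
and the snippet is plain userspace C so the decoding can be tried
outside the kernel.

```c
#include <signal.h>
#include <stdbool.h>

/* Hypothetical classification of how a process ended. */
enum exit_kind { EXIT_KIND_NORMAL, EXIT_KIND_KILLED, EXIT_KIND_CRASHED };

struct exit_info {
    enum exit_kind kind;
    int status;        /* exit status, meaningful for EXIT_KIND_NORMAL */
    int signo;         /* terminating signal, 0 for EXIT_KIND_NORMAL */
    bool core_dumped;  /* bit 7 of the code */
};

/* Assumption: `code` uses the wait-status layout described above. */
static struct exit_info classify_exit_code(long code)
{
    struct exit_info info = { EXIT_KIND_NORMAL, 0, 0, false };
    int signo = (int)(code & 0x7f);

    if (signo == 0) {
        /* No signal involved: the process called exit(). */
        info.status = (int)((code >> 8) & 0xff);
        return info;
    }

    info.signo = signo;
    info.core_dumped = (code & 0x80) != 0;

    switch (signo) {
    case SIGSEGV:
    case SIGBUS:
    case SIGILL:
    case SIGFPE:
    case SIGABRT:
        /* Signals that typically indicate a fault in the program. */
        info.kind = EXIT_KIND_CRASHED;
        break;
    default:
        info.kind = EXIT_KIND_KILLED;
        break;
    }
    return info;
}
```

Which signals count as a "crash" is a policy choice; the list in the
switch is only one reasonable selection.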

In the first three cases I can retrieve information from the process,
but in the last case I get an unexpected kernel Oops. More
specifically, retrieving the command-line arguments of the process
fails, seemingly because of a race condition.

To keep things simple, I have reduced the original source code to the
command-line part. The getCommandLine() function is not shown here
because it is a copy of get_cmdline() from mm/util.c
(https://elixir.bootlin.com/linux/latest/source/mm/util.c#L855).

This version of get_cmdline() uses locking (in my case, I have
implemented it with semaphores instead of spinlocks), and that makes
the kernel crash:
    ...
    BUG: scheduling while atomic: mysegfaultapp/6037/0x00000002
    Modules linked in: ...
    CPU: 0 PID: 9313 Comm: mysegfaultapp Tainted: P        W  O   3.18.31 #2
    [<c0014024>] (unwind_backtrace) from [<c00119f0>] (show_stack+0x10/0x14)
    [<c00119f0>] (show_stack) from [<c0039830>] (__schedule_bug+0x44/0x60)
    [<c0039830>] (__schedule_bug) from [<c0838040>] (__schedule+0x68/0x470)
    [<c0838040>] (__schedule) from [<c083a864>] (rwsem_down_read_failed+0x104/0x130)
    [<c083a864>] (rwsem_down_read_failed) from [<bf000918>] (getCommandLine.constprop.0+0x44/0x160 [mymodule])
    [<bf000918>] (getCommandLine.constprop.0 [mymodule]) from [<bf000644>] (doExitHandler+0x1dc/0x25c [mymodule])
    [<bf000644>] (doExitHandler [mymodule]) from [<c0021850>] (SyS_exit_group+0x0/0x10)
    [<c0021850>] (SyS_exit_group) from [<00000009>] (0x9)
    Unable to handle kernel paging request at virtual address fffffffe
    pgd = dbc20000
    [fffffffe] *pgd=9f3f8821, *pte=00000000, *ppte=00000000
    Internal error: Oops: 80000007 [#1] PREEMPT ARM
    ...
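
The "scheduling while atomic" report is consistent with a probe handler
trying to sleep: kprobes handlers run with preemption disabled, and a
read lock taken with down_read() sleeps when the semaphore is
contended. A common way around this is to try the lock and give up
instead of waiting; in the kernel that would presumably be
down_read_trylock() on mm->mmap_sem. The sketch below is only a
userspace analogy of that pattern, built on C11 atomic_flag. The names
guarded_text and copy_text_nonblocking are hypothetical, and nothing
here is the module's actual code.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for data that a sleeping lock protects. */
struct guarded_text {
    atomic_flag busy;   /* set while someone holds the lock */
    char text[64];
};

/*
 * Copy the guarded text without ever waiting for the lock.
 * Returns the number of bytes copied (excluding the terminator),
 * -EINVAL for an empty output buffer, or -EAGAIN when the lock is
 * currently held by someone else.
 */
static int copy_text_nonblocking(struct guarded_text *g, char *out, size_t len)
{
    const char *end;
    size_t n;

    if (len == 0)
        return -EINVAL;

    /* Try-lock: a previous value of "set" means the lock is taken. */
    if (atomic_flag_test_and_set_explicit(&g->busy, memory_order_acquire))
        return -EAGAIN;

    end = memchr(g->text, '\0', sizeof(g->text));
    n = end ? (size_t)(end - g->text) : sizeof(g->text);
    if (n > len - 1)
        n = len - 1;            /* keep room for the terminator */
    memcpy(out, g->text, n);
    out[n] = '\0';

    atomic_flag_clear_explicit(&g->busy, memory_order_release);
    return (int)n;
}
```

A try-lock alone may not be sufficient in the kernel case:
access_process_vm(), which get_cmdline() relies on, can itself sleep,
so deferring the work to a context that is allowed to sleep is another
option worth considering.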

If I instead use the implementation without locking (the one that
matches my kernel version -
https://elixir.bootlin.com/linux/v3.18.31/source/mm/util.c#L355), I am
still not able to report the command line of an app that segfaults and
crashes, but the system keeps running. For reference, the app is a
dummy that causes a segfault on purpose, here called “mysegfaultapp”.

Given this, I have a few questions, and I hope the community can point
me to where I should look further:
1) Is it possible to retrieve the command-line arguments of a
userspace process that has crashed?
2) How can I investigate the cause of the crash in rwsem_down_read_failed?
3) If I go with the v3.18.31 version, which does not use locking
(semaphores or spinlocks), what are the risks?


Please let me know if you need further information, or if you have any
questions.


Thanks in advance,
Cesar.


-------------------------------------------------------------------------------------------------------------------------------------------
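#include <linux/kernel.h>
#include <linux/kprobes.h>
#include <linux/module.h>
#include <linux/sched.h>
#include <linux/string.h>

/* Return probe on do_execve() and jprobe on do_exit(). */
static struct kretprobe initProcess;
static struct jprobe exitProcess;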

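/* jprobe handler: its signature must mirror do_exit(long code). */
static void doExitHandler(long code) {
    char commandLine[200];
    memset(commandLine, 0, sizeof(commandLine));

    /* Leave the last byte untouched so the buffer stays NUL-terminated. */
    if (getCommandLine(current, commandLine, sizeof(commandLine) - 1) <= 0) {
        strcpy(commandLine, "ERROR");
    }

    printk(KERN_INFO "doExitHandler %s\n", commandLine);
    jprobe_return();  /* mandatory: hands control back to do_exit() */
}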

/* kretprobe handler: runs when do_execve() returns. */
static int doExecHandler(struct kretprobe_instance *pMetadata, struct pt_regs *pRegs) {
    char commandLine[200];
    memset(commandLine, 0, sizeof(commandLine));

    /* Leave the last byte untouched so the buffer stays NUL-terminated. */
    if (getCommandLine(current, commandLine, sizeof(commandLine) - 1) <= 0) {
        strcpy(commandLine, "ERROR");
    }

    printk(KERN_INFO "doExecHandler %s\n", commandLine);
    return 0;
}

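static int myInit(void) {
    int retval;

    initProcess.kp.symbol_name = "do_execve";
    initProcess.handler = doExecHandler;
    retval = register_kretprobe(&initProcess);
    if (retval < 0) {
        printk(KERN_ERR "register_kretprobe failed: %d\n", retval);
        return retval;
    }

    exitProcess.kp.symbol_name = "do_exit";
    exitProcess.entry = JPROBE_ENTRY(doExitHandler);
    retval = register_jprobe(&exitProcess);
    if (retval < 0) {
        printk(KERN_ERR "register_jprobe failed: %d\n", retval);
        /* Do not leave the first probe registered if loading fails. */
        unregister_kretprobe(&initProcess);
    }

    return retval;
}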

static void myExit(void) {
    unregister_kretprobe(&initProcess);
    unregister_jprobe(&exitProcess);
}

module_init(myInit);
module_exit(myExit);
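
One thing to keep in mind when printing the result: get_cmdline()
copies the raw argument area, in which the arguments are separated by
NUL bytes, so a "%s" format stops after the first argument. A minimal
sketch of flattening such a buffer into one printable string follows.
cmdline_to_printable() is a hypothetical helper, written as plain C so
it can be tried in userspace before being adapted to the module.

```c
#include <stddef.h>

/*
 * Turn a NUL-separated argument area ("a\0b\0c\0") into a single
 * printable string ("a b c"), in place.
 *
 * buf: buffer holding the raw bytes
 * n:   number of valid bytes in buf (what the copy routine returned)
 * cap: total size of buf
 *
 * Returns the length of the resulting string.
 */
static size_t cmdline_to_printable(char *buf, size_t n, size_t cap)
{
    size_t i;

    if (cap == 0)
        return 0;
    if (n > cap - 1)
        n = cap - 1;              /* keep room for the terminator */

    while (n > 0 && buf[n - 1] == '\0')
        n--;                      /* drop trailing NUL bytes */

    for (i = 0; i < n; i++) {
        if (buf[i] == '\0')
            buf[i] = ' ';         /* separator between two arguments */
    }

    buf[n] = '\0';
    return n;
}
```

With a helper like this, the handlers could pass the positive return
value of getCommandLine() as `n` and sizeof(commandLine) as `cap`
before calling printk().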

_______________________________________________
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
https://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies

* Re: Userspace app crash causes system crash on do_exit probe
@ 2020-09-01  7:33 ` Greg KH
From: Greg KH @ 2020-09-01  7:33 UTC (permalink / raw)
  To: César Augusto Marcelino dos Santos; +Cc: kernelnewbies

On Tue, Sep 01, 2020 at 09:16:22AM +0200, César Augusto Marcelino dos Santos wrote:
> Dear community,
> 
> I have written a kernel module that places probes on the do_execve() and
> do_exit() kernel functions (code at the end of this email). It runs on a
> custom system based on kernel version 3.18.31.

Wow, 3.18.y is from December of 2014, many years ago, and over 467,000
changes ago.  You really need to ask the company that is forcing you to
rely on that old kernel version for stuff like this.  You are paying
them for that support, so take advantage of it; do not rely on the
community to help with such an obsolete system.

That being said:

> The goal of this module is to see whether I can capture some information
> about any process that is about to start or about to leave
> userspace. I have tested the following scenarios:
> - app inits
> - app finishes its execution gracefully
> - app is killed
> - app crashes

Just use the LSM interface instead please, that is what it is there for.
You really really really do not want to attempt to hook system calls,
unless you are a rootkit :)

good luck!

greg k-h
