From: prathamesh naik <prathamesh.naik20@gmail.com>
To: "Valdis Klētnieks" <valdis.kletnieks@vt.edu>
Cc: kernelnewbies@kernelnewbies.org
Subject: Re: Predicting Process crash / Memory utilization using machine learning
Date: Wed, 9 Oct 2019 16:40:45 -0700
Message-ID: <CAGG2BF79aHZOvw+1nG5RW32w7hX3HyGp1eB8t9Rr1G2N1ZR4Vg@mail.gmail.com>
In-Reply-To: <177218.1570656506@turing-police>
Thanks a lot for sharing.
One of the problems I am facing is not having enough real data. I can
generate simulated data, but my model overfits it.
The second problem is that I am not sure which factors (called features
in ML terms) are useful for pattern detection.
Some of the factors I could think of were:
1. memory used
2. CPU usage
3. shared memory
4. vmstat counters (paging, swap activity)
5. message queue sizes
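For context, here is a rough sketch of how some of these could be sampled
on Linux from /proc. The helper names and the choice of fields are my own
illustrative assumptions, not a settled feature set:

```python
# Sketch: sample candidate memory features from /proc/meminfo on Linux.
# Which fields make good ML features is exactly the open question; these
# are guesses, not a recommendation.

def parse_meminfo(text):
    """Parse /proc/meminfo-style 'Key:   value kB' lines into {key: int_kB}."""
    feats = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            feats[key.strip()] = int(parts[0])
    return feats

def sample_memory_features(path="/proc/meminfo"):
    """Read the live file; yields e.g. MemFree, SwapFree, Shmem in kB."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

CPU time and context switches could be pulled from /proc/stat the same
way, and SysV message-queue state from /proc/sysvipc/msg.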
Regards,
Prathamesh
On Wed, Oct 9, 2019 at 2:28 PM Valdis Klētnieks <valdis.kletnieks@vt.edu>
wrote:
> On Wed, 09 Oct 2019 01:23:28 -0700, prathamesh naik said:
> > I want to work on project which can predict kernel process
> > crash or even user space process crash (or memory usage spikes) using
> > machine learning algorithms.
>
> This sounds like it's isomorphic to the Turing Halting Problem, and
> there's plenty of other good reasons to think that predicting a process
> crash is, in general, somewhere between "very difficult" and "impossible".
>
> Even "memory usage spikes" are going to be a challenge.
>
> Consider a program that's doing an in-memory sort. Your machine has
> 16 gig of memory, and 2 gig of swap. It's known that the sort algorithm
> requires 1.5G of memory for each gigabyte of input data.
>
> Does the system start to thrash, or crash entirely, or does the sort
> complete without issues? There's no way to make a prediction without
> knowing the size of the input data. And if you're dealing with
> something like
>
> grep <regexp> file | predictable-memory-sort
>
> where 'file' is a logfile *much* bigger than memory....
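Your sort example makes the point well. A toy calculation with its numbers
(the helper name is mine, purely for illustration) shows how sharp the
cliff is:

```python
# Numbers from the scenario above: 16 GB RAM, 2 GB swap,
# and the sort needs 1.5 GB of memory per GB of input.
RAM_GB, SWAP_GB, PER_INPUT_GB = 16, 2, 1.5

def sort_fits(input_gb):
    """True if the sort's working set fits in RAM + swap (ignoring other load)."""
    return PER_INPUT_GB * input_gb <= RAM_GB + SWAP_GB

# Break-even input size: (16 + 2) / 1.5 = 12 GB. An 11 GB input goes
# through; a 13 GB input exhausts memory. Without knowing the input size
# in advance, no predictor can tell which side of the line it is on.
```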
>
> You can see where this is heading...
>
> Bottom line: I'm pretty convinced that in the general case, you can't
> do much better than current monitoring systems already do: look at free
> space, look at the free-space trendline for the past 5 minutes or
> whatever, and issue an alert if the current trend indicates exhaustion
> in under 15 minutes.
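That trendline heuristic is simple enough to sketch: fit a least-squares
line to recent free-space samples and alert when it predicts exhaustion
within the horizon. The 5- and 15-minute values come from your
description; the function itself is a toy:

```python
# Toy version of the monitoring heuristic: fit a line to recent
# (minute, free_mb) samples; alert if free space would hit zero
# within the horizon.

def exhaustion_alert(samples, horizon_min=15.0):
    """samples: list of (t_minutes, free_mb), oldest first; last sample is 'now'."""
    n = len(samples)
    t_mean = sum(t for t, _ in samples) / n
    f_mean = sum(f for _, f in samples) / n
    den = sum((t - t_mean) ** 2 for t, _ in samples)
    slope = sum((t - t_mean) * (f - f_mean) for t, f in samples) / den  # MB/min
    if slope >= 0:
        return False  # free space flat or growing: no alert
    _, free_now = samples[-1]
    return free_now / -slope <= horizon_min  # minutes until exhaustion
```

The ML question is then whether learned window/horizon values beat the
fixed 5 and 15.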
>
> Now, what *might* be interesting is seeing if machine learning across
> multiple events is able to suggest better values than 5 and 15 minutes,
> to provide the best tradeoff between issuing an alert early enough that
> a sysadmin can take action, and avoiding issuing early alerts that turn
> out to be false alarms.
>
> The problem there is that getting enough data on actual production
> systems will be difficult, because sysadmins usually don't leave
> sub-optimal configuration settings in place so you can gather data.
>
> And data gathered for machine learning on an intentionally
> misconfigured test system won't be applicable to other machines.
>
> Good luck, this problem is a lot harder than it looks....
>
_______________________________________________
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
https://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
Thread overview: 5+ messages
2019-10-09  8:23 prathamesh naik
2019-10-09 21:28 ` Valdis Klētnieks
2019-10-09 23:40 ` prathamesh naik [this message]
2019-10-10  7:17 ` Greg KH
2019-10-10  8:23 ` Ruben Safir