Kernel Newbies archive on lore.kernel.org
From: prathamesh naik <prathamesh.naik20@gmail.com>
To: "Valdis Klētnieks" <valdis.kletnieks@vt.edu>
Cc: kernelnewbies@kernelnewbies.org
Subject: Re: Predicting Process crash / Memory utilization using machine learning
Date: Wed, 9 Oct 2019 16:40:45 -0700
Message-ID: <CAGG2BF79aHZOvw+1nG5RW32w7hX3HyGp1eB8t9Rr1G2N1ZR4Vg@mail.gmail.com>
In-Reply-To: <177218.1570656506@turing-police>

Thanks a lot for sharing.
One of the problems I am facing is not having enough real data. I can
create simulated data, but my model overfits to it.
Second, I am not sure which factors (called features in ML
terms) are useful for detecting patterns.
Some of the factors I could think of are:
1. Memory used
2. CPU usage
3. Shared memory
4. vmstat output
5. Message queue sizes
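
A minimal sketch of how several of the factors listed above could be sampled on Linux, assuming Python 3.9+ and the standard text formats of /proc/meminfo, /proc/vmstat and /proc/loadavg. The function names and the selected keys (MemAvailable, Shmem, SwapFree, pgmajfault) are hypothetical choices for illustration, not a validated feature set; CPU is approximated by the 1-minute load average, and message queue sizes are not covered.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo lines like 'MemFree:   123456 kB' into numbers.

    Most values are in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    values = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def parse_vmstat(text: str) -> dict[str, int]:
    """Parse /proc/vmstat lines like 'pgfault 123456' into counters."""
    values = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:
            values[fields[0]] = int(fields[1])
    return values


def snapshot(proc: Path = Path("/proc")) -> dict[str, float]:
    """Collect a few hypothetical candidate features at one point in time."""
    mem = parse_meminfo((proc / "meminfo").read_text())
    vm = parse_vmstat((proc / "vmstat").read_text())
    load_1min = float((proc / "loadavg").read_text().split()[0])
    return {
        "mem_available_kb": mem.get("MemAvailable", 0),
        "shmem_kb": mem.get("Shmem", 0),
        "swap_free_kb": mem.get("SwapFree", 0),
        "pgmajfault": vm.get("pgmajfault", 0),  # cumulative since boot
        "loadavg_1min": load_1min,
    }
```

Calling `snapshot()` periodically yields one feature row per sample. Counters in /proc/vmstat such as pgmajfault are cumulative since boot, so a model would normally use the difference between consecutive snapshots rather than the raw value.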

Regards,
Prathamesh




On Wed, Oct 9, 2019 at 2:28 PM Valdis Klētnieks <valdis.kletnieks@vt.edu> wrote:

> On Wed, 09 Oct 2019 01:23:28 -0700, prathamesh naik said:
> >             I want to work on project which can predict kernel process
> > crash or even user space process crash (or memory usage spikes) using
> > machine learning algorithms.
>
> This sounds like it's isomorphic to the Turing Halting Problem, and there's
> plenty of other good reasons to think that predicting a process crash is, in
> general, somewhere between "very difficult" and "impossible".
>
> Even "memory usage spikes" are going to be a challenge.
>
> Consider a program that's doing an in-memory sort. Your machine has 16 gig of
> memory, and 2 gig of swap.  It's known that the sort algorithm requires 1.5G of
> memory for each gigabyte of input data.
>
> Does the system start to thrash, or crash entirely, or does the sort complete
> without issues?  There's no way to make a prediction without knowing the size
> of the input data.  And if you're dealing with something like
>
> grep <regexp> file | predictable-memory-sort
>
> where 'file' is a logfile *much* bigger than memory....
>
> You can see where this is heading...
>
> Bottom line:  I'm pretty convinced that in the general case, you can't do much
> better than current monitoring systems already do: Look at free space, look at
> the free space trendline for the past 5 minutes or whatever, and issue an alert
> if the current trend indicates exhaustion in under 15 minutes.
>
> Now, what *might* be interesting is seeing if machine learning across multiple
> events is able to suggest better values than 5 and 15 minutes, to provide a
> best tradeoff between issuing an alert early enough that a sysadmin can take
> action, and avoiding issuing early alerts that turn out to be false alarms.
>
> The problem there is that getting enough data on actual production systems
> will be difficult, because sysadmins usually don't leave sub-optimal configuration
> settings in place so you can gather data.
>
> And data gathered for machine learning on an intentionally misconfigured test
> system won't be applicable to other machines.
>
> Good luck, this problem is a lot harder than it looks....
>
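
The arithmetic behind the in-memory sort example in the quoted message can be made explicit. The sketch below uses hypothetical names and deliberately ignores memory used by the kernel, the page cache and every other process, so real thresholds would be lower. Under that simplification the outcome changes at roughly 10.7 GB of input (16 / 1.5) and again at 12 GB ((16 + 2) / 1.5), which means any prediction hinges on an input size that is unknown until the data actually arrives, for example from a `grep` over a large log file.

```python
# Hypothetical model of the sort example; only input_gb is unknown in advance.
RAM_GB = 16.0          # physical memory from the example
SWAP_GB = 2.0          # swap from the example
GB_PER_INPUT_GB = 1.5  # the sort needs 1.5 GB of memory per GB of input


def classify_sort(input_gb: float,
                  ram_gb: float = RAM_GB,
                  swap_gb: float = SWAP_GB,
                  factor: float = GB_PER_INPUT_GB) -> str:
    """Classify the outcome of the sort for a given input size.

    Simplification: ignores the kernel, page cache and all other processes.
    """
    needed_gb = factor * input_gb
    if needed_gb <= ram_gb:
        return "fits in RAM"
    if needed_gb <= ram_gb + swap_gb:
        return "spills into swap (thrashing possible)"
    return "exceeds RAM + swap (OOM likely)"


# Break-even input sizes under the same simplification.
RAM_LIMIT_GB = RAM_GB / GB_PER_INPUT_GB                # about 10.7 GB of input
TOTAL_LIMIT_GB = (RAM_GB + SWAP_GB) / GB_PER_INPUT_GB  # 12.0 GB of input
```

For instance, an 8 GB input needs 12 GB and fits in RAM, an 11 GB input needs 16.5 GB and spills into swap, and a 13 GB input needs 19.5 GB and exceeds RAM plus swap.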

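The "free space trendline" heuristic described in the quoted message can be sketched as follows. The message does not say how the trendline is computed, so this assumes an ordinary least-squares linear fit over (timestamp in seconds, free bytes) samples; the function names and the sample format are hypothetical. The `window_s` and `horizon_s` parameters correspond to the "5 minutes" and "15 minutes" mentioned there, which are the values the message suggests machine learning might tune.

```python
from typing import List, Optional, Tuple

# (timestamp in seconds, free bytes); this sample format is an assumption.
Sample = Tuple[float, float]


def seconds_to_exhaustion(samples: List[Sample]) -> Optional[float]:
    """Fit free = slope * t + intercept by least squares and return the time
    from the newest sample until the fitted line reaches zero.

    Returns None when there are too few samples or free space is not falling.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_f = sum(f for _, f in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None  # all samples share one timestamp: no slope
    slope = sum((t - mean_t) * (f - mean_f) for t, f in samples) / var_t
    if slope >= 0:
        return None  # flat or growing: no projected exhaustion
    intercept = mean_f - slope * mean_t
    t_zero = -intercept / slope  # time at which the fitted line hits zero
    newest_t = max(t for t, _ in samples)
    return max(0.0, t_zero - newest_t)


def should_alert(samples: List[Sample], now: float,
                 window_s: float = 300.0, horizon_s: float = 900.0) -> bool:
    """Alert if the trend over the last window_s seconds projects exhaustion
    within horizon_s seconds (5 and 15 minutes by default)."""
    recent = [(t, f) for t, f in samples if now - window_s <= t <= now]
    eta = seconds_to_exhaustion(recent)
    return eta is not None and eta < horizon_s
```

Tuning `window_s` and `horizon_s` against labelled past incidents, trading alert lead time against false alarms, is the learning step the message proposes; it is not shown here.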
_______________________________________________
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
https://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies

Thread overview: 5+ messages
2019-10-09  8:23 prathamesh naik
2019-10-09 21:28 ` Valdis Klētnieks
2019-10-09 23:40   ` prathamesh naik [this message]
2019-10-10  7:17     ` Greg KH
2019-10-10  8:23 ` Ruben Safir