Kernel Newbies archive on lore.kernel.org
* Predicting Process crash / Memory utilization using machine learning
@ 2019-10-09  8:23 prathamesh naik
  2019-10-09 21:28 ` Valdis Klētnieks
  2019-10-10  8:23 ` Ruben Safir
  0 siblings, 2 replies; 5+ messages in thread
From: prathamesh naik @ 2019-10-09  8:23 UTC (permalink / raw)
  To: kernelnewbies


Hi all,
I want to work on a project that predicts kernel process crashes, or even
user-space process crashes (or memory usage spikes), using machine learning
algorithms. Can someone point me to what data would be useful for tuning my
algorithm? Is there already a paper on this? (I could not find many articles
on it.)

Thanks,
Prathamesh

[-- Attachment #2: Type: text/plain, Size: 170 bytes --]

_______________________________________________
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
https://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


* Re: Predicting Process crash / Memory utilization using machine learning
  2019-10-09  8:23 Predicting Process crash / Memory utilization using machine learning prathamesh naik
@ 2019-10-09 21:28 ` Valdis Klētnieks
  2019-10-09 23:40   ` prathamesh naik
  2019-10-10  8:23 ` Ruben Safir
  1 sibling, 1 reply; 5+ messages in thread
From: Valdis Klētnieks @ 2019-10-09 21:28 UTC (permalink / raw)
  To: prathamesh naik; +Cc: kernelnewbies


On Wed, 09 Oct 2019 01:23:28 -0700, prathamesh naik said:
> I want to work on a project that predicts kernel process crashes, or even
> user-space process crashes (or memory usage spikes), using machine learning
> algorithms.

This sounds like it's isomorphic to the Turing Halting Problem, and there are
plenty of other good reasons to think that predicting a process crash is, in
general, somewhere between "very difficult" and "impossible".

Even "memory usage spikes" are going to be a challenge.

Consider a program that's doing an in-memory sort. Your machine has 16 GB of
memory and 2 GB of swap.  It's known that the sort algorithm requires 1.5 GB of
memory for each gigabyte of input data.

Does the system start to thrash, or crash entirely, or does the sort complete
without issues?  There's no way to make a prediction without knowing the size
of the input data.  And if you're dealing with something like 

grep <regexp> file | predictable-memory-sort

where 'file' is a logfile *much* bigger than memory....

You can see where this is heading...
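
To put numbers on the sort example, here is a minimal sketch. It assumes the
sort is the only significant memory consumer on the box (a simplification,
since the kernel and other processes also need memory), and the function name
and outcome labels are made up for illustration. The point is that the
outcome is a function of the input size, which an observer of the system
doesn't know in advance.

```python
def sort_outcome(input_gb, ram_gb=16.0, swap_gb=2.0, mem_per_input_gb=1.5):
    """Classify what happens to the hypothetical in-memory sort.

    Defaults mirror the example: 16 GB RAM, 2 GB swap, and 1.5 GB of
    working memory per GB of input. Memory used by anything else on the
    machine is ignored (assumption).
    """
    needed_gb = input_gb * mem_per_input_gb
    if needed_gb <= ram_gb:
        return "fits in RAM"
    if needed_gb <= ram_gb + swap_gb:
        return "spills into swap (likely thrashing)"
    return "exceeds RAM + swap (allocation failure or OOM kill)"


if __name__ == "__main__":
    for size_gb in (4, 11, 12, 13):
        print(f"{size_gb:>3} GB input: {sort_outcome(size_gb)}")
```

Under these assumptions the largest input that still fits in RAM plus swap is
(16 + 2) / 1.5 = 12 GB, and with the grep pipeline the size isn't known until
grep has finished.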

Bottom line:  I'm pretty convinced that in the general case, you can't do much
better than current monitoring systems already do: Look at free space, look at
the free space trendline for the past 5 minutes or whatever, and issue an alert
if the current trend indicates exhaustion in under 15 minutes.
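
As a rough sketch of that kind of check (not any particular monitoring
system's implementation): fit a least-squares line to the free-space samples
from the last few minutes and alert if the line reaches zero within the
horizon. The function names, the (minute, free_mb) sample format, and the
choice of a straight-line fit are all assumptions for illustration.

```python
def minutes_until_exhaustion(samples):
    """Extrapolate a least-squares line through (minute, free_mb) samples.

    Returns the minutes from the last sample until the line reaches zero,
    or None if there are too few samples or free space is not falling.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_f = sum(f for _, f in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None
    slope = sum((t - mean_t) * (f - mean_f) for t, f in samples) / var_t
    if slope >= 0:
        return None  # flat or recovering: no exhaustion predicted
    intercept = mean_f - slope * mean_t
    t_zero = -intercept / slope
    return max(0.0, t_zero - samples[-1][0])


def should_alert(samples, window_min=5, horizon_min=15):
    """Alert if the trend over the last window_min minutes predicts
    exhaustion in under horizon_min minutes."""
    if not samples:
        return False
    latest = samples[-1][0]
    recent = [(t, f) for t, f in samples if t >= latest - window_min]
    eta = minutes_until_exhaustion(recent)
    return eta is not None and eta < horizon_min


if __name__ == "__main__":
    falling = [(m, 1000 - 100 * m) for m in range(6)]  # loses 100 MB/minute
    print(minutes_until_exhaustion(falling), should_alert(falling))
```

The defaults of 5 and 15 are the values from the text; nothing here says
they are good values.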

Now, what *might* be interesting is seeing if machine learning across multiple
events is able to suggest better values than 5 and 15 minutes, to provide a
best tradeoff between issuing an alert early enough that a sysadmin can take
action, and avoiding issuing early alerts that turn out to be false alarms.
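
One hypothetical way to frame that search, with heavy caveats: replay
recorded traces against each candidate (window, horizon) pair and score the
result. Everything below is an assumption for illustration: the two traces
are synthetic, the scoring rule (+1 for an alert at least 5 minutes before
a real exhaustion, -1 for an alert on a trace that never exhausts) is
arbitrary, and a plain grid search stands in for whatever learning method
one would really use.

```python
from itertools import product


def eta_minutes(samples):
    """Least-squares minutes-to-zero from the last sample, or None if not falling."""
    n = len(samples)
    if n < 2:
        return None
    mt = sum(t for t, _ in samples) / n
    mf = sum(f for _, f in samples) / n
    var = sum((t - mt) ** 2 for t, _ in samples)
    if var == 0:
        return None
    slope = sum((t - mt) * (f - mf) for t, f in samples) / var
    if slope >= 0:
        return None
    intercept = mf - slope * mt
    return max(0.0, -intercept / slope - samples[-1][0])


def first_alert(trace, window, horizon):
    """Replay a trace sample by sample; return the minute of the first alert, or None."""
    for i in range(1, len(trace)):
        now = trace[i][0]
        recent = [(t, f) for t, f in trace[: i + 1] if t >= now - window]
        eta = eta_minutes(recent)
        if eta is not None and eta < horizon:
            return now
    return None


def score(traces, window, horizon, min_lead=5):
    """+1 per real exhaustion caught min_lead minutes ahead, -1 per false alarm."""
    total = 0
    for samples, exhausted_at in traces:
        alert = first_alert(samples, window, horizon)
        if exhausted_at is None:
            if alert is not None:
                total -= 1
        elif alert is not None and exhausted_at - alert >= min_lead:
            total += 1
    return total


def best_setting(traces, windows=(2, 5, 10), horizons=(5, 15, 30)):
    """Grid search over candidate (window, horizon) pairs."""
    return max(product(windows, horizons), key=lambda wh: score(traces, *wh))


# Synthetic traces (made up): a steady leak that would hit zero at minute 50,
# and a short dip that recovers on its own and never exhausts.
leak = [(m, 1000 - 20 * m) for m in range(41)]
dip = {11: 900, 12: 800, 13: 700}
burst = [(m, dip.get(m, 1000)) for m in range(21)]
traces = [(leak, 50), (burst, None)]

if __name__ == "__main__":
    print(best_setting(traces))
```

With only two made-up traces this says nothing about real systems; it only
shows the shape of the tradeoff, where a short window reacts to the dip and a
short horizon misses the leak.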

The problem there is that getting enough data on actual production systems
will be difficult, because sysadmins usually don't leave sub-optimal configuration
settings in place so you can gather data.

And data gathered for machine learning on an intentionally misconfigured test
system won't be applicable to other machines.

Good luck, this problem is a lot harder than it looks....


* Re: Predicting Process crash / Memory utilization using machine learning
  2019-10-09 21:28 ` Valdis Klētnieks
@ 2019-10-09 23:40   ` prathamesh naik
  2019-10-10  7:17     ` Greg KH
  0 siblings, 1 reply; 5+ messages in thread
From: prathamesh naik @ 2019-10-09 23:40 UTC (permalink / raw)
  To: Valdis Klētnieks; +Cc: kernelnewbies


Thanks a lot for sharing.
One of the problems I am facing is not having enough real data. I can
create simulated data, but my model overfits to it.
The second problem is that I am not sure which factors (called features in ML
terms) are useful for finding patterns.
Some of the factors I could think of were:
1. Memory used
2. CPU
3. shared memory
4. vmstat
5. message queue sizes
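
A minimal sketch of turning some of these into a feature dictionary, assuming
Linux and the /proc/meminfo and /proc/vmstat text formats. The function
names, the choice of fields, and the sample values are illustrative only.
CPU usage and message queue sizes are not covered here.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style lines ('Name:   12345 kB') into a dict of ints."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def parse_vmstat(text):
    """Parse /proc/vmstat-style lines ('name 12345') into a dict of ints."""
    fields = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            fields[parts[0]] = int(parts[1])
    return fields


def feature_vector(meminfo, vmstat):
    """Build a small feature dict; the selection of fields is an assumption."""
    total = meminfo["MemTotal"]
    available = meminfo.get("MemAvailable", meminfo.get("MemFree", 0))
    return {
        "mem_used_frac": 1 - available / total,
        "shmem_kb": meminfo.get("Shmem", 0),
        "swap_used_kb": meminfo.get("SwapTotal", 0) - meminfo.get("SwapFree", 0),
        # vmstat counters are cumulative since boot; a real collector
        # would feed the model per-interval deltas rather than raw values.
        "pgmajfault": vmstat.get("pgmajfault", 0),
        "oom_kill": vmstat.get("oom_kill", 0),
    }


def snapshot():
    """Read the live files (Linux only)."""
    with open("/proc/meminfo") as m, open("/proc/vmstat") as v:
        return feature_vector(parse_meminfo(m.read()), parse_vmstat(v.read()))


# Made-up sample text in the same format, so the parsers can be tried anywhere.
SAMPLE_MEMINFO = """MemTotal:       16000000 kB
MemFree:         2000000 kB
MemAvailable:    4000000 kB
Shmem:            100000 kB
SwapTotal:       2000000 kB
SwapFree:        1500000 kB"""
SAMPLE_VMSTAT = """pgmajfault 1234
oom_kill 0"""

if __name__ == "__main__":
    print(feature_vector(parse_meminfo(SAMPLE_MEMINFO), parse_vmstat(SAMPLE_VMSTAT)))
```

Calling snapshot() at a fixed interval and logging the result with a
timestamp would give a time series per machine.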

Regards,
Prathamesh






* Re: Predicting Process crash / Memory utilization using machine learning
  2019-10-09 23:40   ` prathamesh naik
@ 2019-10-10  7:17     ` Greg KH
  0 siblings, 0 replies; 5+ messages in thread
From: Greg KH @ 2019-10-10  7:17 UTC (permalink / raw)
  To: prathamesh naik; +Cc: Valdis Klētnieks, kernelnewbies

On Wed, Oct 09, 2019 at 04:40:45PM -0700, prathamesh naik wrote:
> Thanks a lot for sharing.
> One of the problems I am facing is not having enough real data. I can
> create simulated data, but my model overfits to it.
> The second problem is that I am not sure which factors (called features in ML
> terms) are useful for finding patterns.
> Some of the factors I could think of were:
> 1. Memory used
> 2. CPU
> 3. shared memory
> 4. vmstat
> 5. message queue sizes

There are loads and loads of things you can monitor in a system.  See
any of the talks by Brendan Gregg http://www.brendangregg.com/ for lots
of examples.

good luck!

greg k-h


* Re: Predicting Process crash / Memory utilization using machine learning
  2019-10-09  8:23 Predicting Process crash / Memory utilization using machine learning prathamesh naik
  2019-10-09 21:28 ` Valdis Klētnieks
@ 2019-10-10  8:23 ` Ruben Safir
  1 sibling, 0 replies; 5+ messages in thread
From: Ruben Safir @ 2019-10-10  8:23 UTC (permalink / raw)
  To: kernelnewbies

On 10/9/19 4:23 AM, prathamesh naik wrote:
> Hi all,
> I want to work on a project that predicts kernel process crashes, or even
> user-space process crashes (or memory usage spikes), using machine learning
> algorithms. Can someone point me to what data would be useful for tuning my
> algorithm? Is there already a paper on this? (I could not find many articles
> on it.)
> 
> Thanks,
> Prathamesh


is there an echo here?


-- 
So many immigrant groups have swept through our town
that Brooklyn, like Atlantis, reaches mythological
proportions in the mind of the world - RI Safir 1998
http://www.mrbrklyn.com
DRM is THEFT - We are the STAKEHOLDERS - RI Safir 2002

http://www.nylxs.com - Leadership Development in Free Software
http://www.brooklyn-living.com

Being so tracked is for FARM ANIMALS and extermination camps,
but incompatible with living as a free human being. -RI Safir 2013

