* Predicting Process crash / Memory utilization using machine learning
From: prathamesh naik @ 2019-10-09 8:23 UTC
To: kernelnewbies

Hi all,

I want to work on a project that can predict kernel process crashes, or
even user-space process crashes (or memory usage spikes), using machine
learning algorithms. Can someone point me to the data that would be
useful for tuning my algorithm? Is there already a paper on this? (I could
not find many articles on it.)

Thanks,
Prathamesh

_______________________________________________
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
https://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
* Re: Predicting Process crash / Memory utilization using machine learning
From: Valdis Klētnieks @ 2019-10-09 21:28 UTC
To: prathamesh naik; +Cc: kernelnewbies

On Wed, 09 Oct 2019 01:23:28 -0700, prathamesh naik said:
> I want to work on a project that can predict kernel process crashes, or
> even user-space process crashes (or memory usage spikes), using machine
> learning algorithms.

This sounds like it's isomorphic to the Turing Halting Problem, and there
are plenty of other good reasons to think that predicting a process crash
is, in general, somewhere between "very difficult" and "impossible".

Even "memory usage spikes" are going to be a challenge.

Consider a program that's doing an in-memory sort. Your machine has 16 gig
of memory and 2 gig of swap. It's known that the sort algorithm requires
1.5G of memory for each gigabyte of input data.

Does the system start to thrash, or crash entirely, or does the sort
complete without issues? There's no way to make a prediction without
knowing the size of the input data. And if you're dealing with something
like

    grep <regexp> file | predictable-memory-sort

where 'file' is a logfile *much* bigger than memory....

You can see where this is heading...

Bottom line: I'm pretty convinced that in the general case, you can't do
much better than current monitoring systems already do: look at free
space, look at the free space trendline for the past 5 minutes or
whatever, and issue an alert if the current trend indicates exhaustion in
under 15 minutes.
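To put numbers on the in-memory sort example: with 16 GB of RAM, 2 GB of swap, and 1.5 GB needed per GB of input, the sort fits in RAM only up to 16 / 1.5 ≈ 10.7 GB of input, and it exhausts RAM plus swap beyond 18 / 1.5 = 12 GB. The sketch below encodes only that arithmetic. It assumes the sort is the sole consumer of memory (no kernel, page cache, or other processes), and `classify_sort` is a hypothetical helper, not part of any tool. The outcome flips on the input size alone, which is exactly what a monitor watching a `grep ... | sort` pipeline cannot know in advance.

```python
# Figures from the example: 16 GB RAM, 2 GB swap, and a sort that needs
# 1.5 GB of memory per GB of input. Simplifying assumption: nothing else
# (kernel, page cache, other processes) uses memory.
RAM_GB = 16.0
SWAP_GB = 2.0
MEM_PER_INPUT_GB = 1.5


def classify_sort(input_gb: float) -> str:
    """Rough outcome of the in-memory sort for a given input size in GB."""
    needed_gb = MEM_PER_INPUT_GB * input_gb
    if needed_gb <= RAM_GB:
        return "fits in RAM"
    if needed_gb <= RAM_GB + SWAP_GB:
        return "spills into swap (likely thrashing)"
    return "exceeds RAM + swap (allocation failure or OOM kill)"


if __name__ == "__main__":
    for size_gb in (8, 11, 12, 13):
        print(f"{size_gb:>2} GB input -> {classify_sort(size_gb)}")
```

Under these assumptions an 8 GB input fits in RAM, an 11 GB input already spills into swap, and a 13 GB input does not fit at all.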
Now, what *might* be interesting is seeing whether machine learning across
multiple events can suggest better values than 5 and 15 minutes, to find
the best tradeoff between issuing an alert early enough that a sysadmin
can take action and avoiding early alerts that turn out to be false
alarms.

The problem there is that getting enough data on actual production systems
will be difficult, because sysadmins usually don't leave sub-optimal
configuration settings in place so you can gather data.

And data gathered for machine learning on an intentionally misconfigured
test system won't be applicable to other machines.

Good luck. This problem is a lot harder than it looks....
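The "look at the free space trendline" rule can be sketched in a few lines. This is one possible reading, not how any particular monitoring system implements it: it assumes free-memory samples arrive in time order as (Unix time in seconds, free bytes) pairs, fits an ordinary least-squares line to the samples from the last `window_s` seconds, and alerts when that line reaches zero sooner than `horizon_s` seconds after the latest sample. The least-squares fit is a choice made here, since the message does not say how the trendline is computed. The defaults mirror the 5 and 15 minutes from the message, and the function names are hypothetical.

```python
from typing import Sequence, Tuple

# One observation: (Unix time in seconds, free memory in bytes).
Sample = Tuple[float, float]


def time_to_exhaustion(samples: Sequence[Sample], window_s: float = 300.0) -> float:
    """Seconds until free memory reaches zero, extrapolated from a
    least-squares line fitted to the last `window_s` seconds of samples.
    Samples must be sorted by time. Returns infinity when the trend is
    flat or rising, or when there are too few samples to fit a line."""
    if not samples:
        return float("inf")
    latest_t = samples[-1][0]
    recent = [(t, f) for t, f in samples if t >= latest_t - window_s]
    if len(recent) < 2:
        return float("inf")
    n = len(recent)
    mean_t = sum(t for t, _ in recent) / n
    mean_f = sum(f for _, f in recent) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in recent)
    if var_t == 0:
        return float("inf")
    # Ordinary least-squares slope, in bytes per second.
    slope = sum((t - mean_t) * (f - mean_f) for t, f in recent) / var_t
    if slope >= 0:
        return float("inf")
    # Free memory predicted by the fitted line at the latest sample time.
    fitted_now = mean_f + slope * (latest_t - mean_t)
    return max(fitted_now, 0.0) / -slope


def should_alert(samples: Sequence[Sample], window_s: float = 300.0,
                 horizon_s: float = 900.0) -> bool:
    """Alert if the recent trend predicts exhaustion within `horizon_s` seconds."""
    return time_to_exhaustion(samples, window_s) < horizon_s
```

The two keyword arguments are the knobs that the multi-event learning suggested above would tune: a longer `horizon_s` gives more warning at the cost of more false alarms, while a longer `window_s` smooths out short spikes but reacts more slowly to a genuine leak.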
* Re: Predicting Process crash / Memory utilization using machine learning
From: prathamesh naik @ 2019-10-09 23:40 UTC
To: Valdis Klētnieks; +Cc: kernelnewbies

Thanks a lot for sharing.

One of the problems I am facing is not having enough real data. I can
create simulated data, but my algorithm overfits to it.

The second problem is that I am not sure which factors (called features in
ML terms) are useful for pattern creation. Some of the factors I could
think of are:

  1. Memory used
  2. CPU
  3. Shared memory
  4. vmstat
  5. Message queue sizes

Regards,
Prathamesh
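As a starting point for the features listed above, most of them can be sampled from procfs on Linux. The sketch below is illustrative only: it reads `/proc/meminfo` (memory used as `MemTotal - MemAvailable`, plus `Shmem`), the aggregate `cpu` line of `/proc/stat`, two `/proc/vmstat` counters chosen here purely as examples (`pgmajfault`, `pswpout`), and the `cbytes` column of `/proc/sysvipc/msg` for SysV message queues (POSIX message queues are not covered). The feature names in the returned dictionary are made up for this example. The CPU and vmstat values are cumulative counters, so a model should be fed the difference between consecutive samples rather than the raw values.

```python
from typing import Dict, Tuple


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo lines like 'MemTotal:  16316412 kB' into {name: value}."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])  # value in kB for most fields
    return fields


def parse_vmstat(text: str) -> Dict[str, int]:
    """Parse /proc/vmstat ('name value' per line) into cumulative counters."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            counters[parts[0]] = int(parts[1])
    return counters


def parse_cpu_jiffies(text: str) -> Tuple[int, int]:
    """Return cumulative (busy, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # user nice system idle iowait irq softirq steal; guest time is
            # already counted in user/nice, so later columns are left out.
            values = [int(v) for v in fields[1:9]]
            total = sum(values)
            idle = values[3] + (values[4] if len(values) > 4 else 0)  # idle + iowait
            return total - idle, total
    raise ValueError("no aggregate 'cpu' line found")


def parse_sysv_msg_bytes(text: str) -> int:
    """Sum the 'cbytes' column (bytes currently queued) of /proc/sysvipc/msg."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return 0
    col = lines[0].split().index("cbytes")  # locate the column by header name
    return sum(int(line.split()[col]) for line in lines[1:])


def _read(path: str) -> str:
    with open(path) as f:
        return f.read()


def sample_features() -> Dict[str, int]:
    """Take one raw sample on a Linux host; feature names are illustrative."""
    mem = parse_meminfo(_read("/proc/meminfo"))
    vm = parse_vmstat(_read("/proc/vmstat"))
    cpu_busy, cpu_total = parse_cpu_jiffies(_read("/proc/stat"))
    return {
        "mem_used_kb": mem["MemTotal"] - mem["MemAvailable"],
        "shmem_kb": mem.get("Shmem", 0),
        "cpu_busy_jiffies": cpu_busy,  # cumulative: use deltas
        "cpu_total_jiffies": cpu_total,  # cumulative: use deltas
        "pgmajfault": vm.get("pgmajfault", 0),  # cumulative: use deltas
        "pswpout": vm.get("pswpout", 0),  # cumulative: use deltas
        "msgq_bytes": parse_sysv_msg_bytes(_read("/proc/sysvipc/msg")),
    }
```

Calling `sample_features()` at a fixed interval and differencing the cumulative fields gives one row of training data per interval. The parsers take plain strings, so they can also be run on snapshots saved from other machines.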
* Re: Predicting Process crash / Memory utilization using machine learning
From: Greg KH @ 2019-10-10 7:17 UTC
To: prathamesh naik; +Cc: Valdis Klētnieks, kernelnewbies

On Wed, Oct 09, 2019 at 04:40:45PM -0700, prathamesh naik wrote:
> Thanks a lot for sharing.
> One of the problems I am facing is not having enough real data. I can
> create simulated data, but my algorithm overfits to it.
> The second problem is that I am not sure which factors (called features in
> ML terms) are useful for pattern creation.
> Some of the factors I could think of are:
> 1. Memory used
> 2. CPU
> 3. Shared memory
> 4. vmstat
> 5. Message queue sizes

There are loads and loads of things you can monitor in a system. See any
of the talks by Brendan Gregg (http://www.brendangregg.com/) for lots of
examples.

good luck!

greg k-h
* Re: Predicting Process crash / Memory utilization using machine learning
From: Ruben Safir @ 2019-10-10 8:23 UTC
To: kernelnewbies

On 10/9/19 4:23 AM, prathamesh naik wrote:
> Hi all,
> I want to work on a project that can predict kernel process crashes, or
> even user-space process crashes (or memory usage spikes), using machine
> learning algorithms. Can someone point me to the data that would be
> useful for tuning my algorithm? Is there already a paper on this? (I could
> not find many articles on it.)
>
> Thanks,
> Prathamesh

is there an echo here?

--
So many immigrant groups have swept through our town
that Brooklyn, like Atlantis, reaches mythological
proportions in the mind of the world - RI Safir 1998
http://www.mrbrklyn.com

DRM is THEFT - We are the STAKEHOLDERS - RI Safir 2002
http://www.nylxs.com - Leadership Development in Free Software
http://www.brooklyn-living.com

Being so tracked is for FARM ANIMALS and extermination camps,
but incompatible with living as a free human being. -RI Safir 2013
Thread overview: 5+ messages
2019-10-09  8:23 Predicting Process crash / Memory utilization using machine learning prathamesh naik
2019-10-09 21:28 ` Valdis Klētnieks
2019-10-09 23:40   ` prathamesh naik
2019-10-10  7:17     ` Greg KH
2019-10-10  8:23 ` Ruben Safir