* Adding latency tracking to trace-cmd @ 2021-02-24 16:54 Steven Rostedt 2021-02-24 19:22 ` Viktor.Rosendahl 0 siblings, 1 reply; 9+ messages in thread From: Steven Rostedt @ 2021-02-24 16:54 UTC (permalink / raw) To: Viktor Rosendahl; +Cc: Linux Trace Devel Hi Viktor, While doing some tests to trace-cmd, I found that the trace-cmd record was broken for latency tracers, and has been for some time. That's probably because nobody has been using it for such. That is, trace-cmd record is mostly used for "flyrecord" which is the constant recording of tracing. But if the user records one of the latency tracers (preemptirqsoff, wakeup_rt, etc), then it switches to "latency" mode and only takes a snapshot at the end of the recording. This is what I found to be broken, because it reset the tracing before taking the snapshot and lost whatever it was recording. Nobody seemed to complain about it, so I guess nobody cared. The fix is simple, and I was about to do so, but then I thought about your latency-collector tool, and thought that should be exactly what trace-cmd should do for such tracers. That is, instead of just taking a snapshot of the latency at the end of the recording (which it was suppose to do now), it should act like your latency-collector tool, and take snapshots every time there's a new latency. My question to you is, would you like to work on adding that feature to trace-cmd? git://git.kernel.org/pub/scm/utils/trace-cmd/trace-cmd.git (which requires the libtracefs and libtracevent libraries). I'm thinking if someone were to do: trace-cmd record -p wakeup_rt It would then act just like the latency-collector and record every new instance of a latency into the trace.dat file, where: trace-cmd report would then nicely format that data for the user. Thoughts? -- Steve ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Adding latency tracking to trace-cmd 2021-02-24 16:54 Adding latency tracking to trace-cmd Steven Rostedt @ 2021-02-24 19:22 ` Viktor.Rosendahl 2021-02-24 20:10 ` Steven Rostedt 2021-04-06 17:38 ` Steven Rostedt 0 siblings, 2 replies; 9+ messages in thread From: Viktor.Rosendahl @ 2021-02-24 19:22 UTC (permalink / raw) To: rostedt; +Cc: linux-trace-devel Hi Steve, On Wed, 2021-02-24 at 11:54 -0500, Steven Rostedt wrote: > Hi Viktor, > > While doing some tests to trace-cmd, I found that the trace-cmd record was > broken for latency tracers, and has been for some time. That's probably > because nobody has been using it for such. > > That is, trace-cmd record is mostly used for "flyrecord" which is the > constant recording of tracing. But if the user records one of the latency > tracers (preemptirqsoff, wakeup_rt, etc), then it switches to "latency" > mode and only takes a snapshot at the end of the recording. This is what I > found to be broken, because it reset the tracing before taking the > snapshot and lost whatever it was recording. Nobody seemed to complain > about it, so I guess nobody cared. > I have used trace-cmd but I did not know about this feature. Somehow I had assumed that trace-cmd would not deal with those tracers that work in overwrite mode. > The fix is simple, and I was about to do so, but then I thought about your > latency-collector tool, and thought that should be exactly what trace-cmd > should do for such tracers. That is, instead of just taking a snapshot of > the latency at the end of the recording (which it was suppose to do now), > it should act like your latency-collector tool, and take snapshots > every time there's a new latency. > > My question to you is, would you like to work on adding that feature to > trace-cmd? I would be interested to work on the feature but I would probably need permission from my employer, also in case I do it as a spare time project. For this reason, I will need some time come to a decision. I will get back to you when I know the answer. > > git://git.kernel.org/pub/scm/utils/trace-cmd/trace-cmd.git > > (which requires the libtracefs and libtracevent libraries). > > I'm thinking if someone were to do: > > trace-cmd record -p wakeup_rt > > It would then act just like the latency-collector and record every new > instance of a latency into the trace.dat file, where: > > trace-cmd report > > would then nicely format that data for the user. > > Thoughts? It seems like trace-cmd is mostly GPL-2.0. The latency-collector is also GPL- 2.0, so I guess that it would be possible to take snippets of code and inspiration from it. Would you allow the hackish random algorithm to be used in trace-cmd? I mean the "--random" option in latency-collector. best regards, Viktor > > -- Steve ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Adding latency tracking to trace-cmd 2021-02-24 19:22 ` Viktor.Rosendahl @ 2021-02-24 20:10 ` Steven Rostedt 2021-04-06 17:38 ` Steven Rostedt 1 sibling, 0 replies; 9+ messages in thread From: Steven Rostedt @ 2021-02-24 20:10 UTC (permalink / raw) To: Viktor.Rosendahl; +Cc: linux-trace-devel On Wed, 24 Feb 2021 19:22:48 +0000 <Viktor.Rosendahl@bmw.de> wrote: > I would be interested to work on the feature but I would probably need > permission from my employer, also in case I do it as a spare time project. Totally understand. > > For this reason, I will need some time come to a decision. I will get back to > you when I know the answer. No rush. I'll add the "quick fix" (which I Cc'd you on). I'm trying to get people to feel comfortable in contributing to trace-cmd and the libraries associated with it. Having "trace-cmd" in the database of acceptable open source projects to contribute to at companies like BMW will help facilitate that as well. Thanks, -- Steve ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Adding latency tracking to trace-cmd 2021-02-24 19:22 ` Viktor.Rosendahl 2021-02-24 20:10 ` Steven Rostedt @ 2021-04-06 17:38 ` Steven Rostedt 2021-04-06 20:03 ` Viktor.Rosendahl 1 sibling, 1 reply; 9+ messages in thread From: Steven Rostedt @ 2021-04-06 17:38 UTC (permalink / raw) To: Viktor.Rosendahl; +Cc: linux-trace-devel On Wed, 24 Feb 2021 19:22:48 +0000 <Viktor.Rosendahl@bmw.de> wrote: > > > > My question to you is, would you like to work on adding that feature to > > trace-cmd? > > I would be interested to work on the feature but I would probably need > permission from my employer, also in case I do it as a spare time project. > > For this reason, I will need some time come to a decision. I will get back to > you when I know the answer. Hi Viktor, Any word on if your employer would let you work on trace-cmd? > > > > > git://git.kernel.org/pub/scm/utils/trace-cmd/trace-cmd.git > > > > (which requires the libtracefs and libtracevent libraries). > > > > I'm thinking if someone were to do: > > > > trace-cmd record -p wakeup_rt > > > > It would then act just like the latency-collector and record every new > > instance of a latency into the trace.dat file, where: > > > > trace-cmd report > > > > would then nicely format that data for the user. > > > > Thoughts? > > It seems like trace-cmd is mostly GPL-2.0. The latency-collector is also GPL- > 2.0, so I guess that it would be possible to take snippets of code and > inspiration from it. > > Would you allow the hackish random algorithm to be used in trace-cmd? > I mean the "--random" option in latency-collector. > If you feel it is useful, sure. Although a definition of what it exactly does needs to be a bit more explained. I'm not really sure what use the --random option is for the latency tracer. -- Steve ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Adding latency tracking to trace-cmd 2021-04-06 17:38 ` Steven Rostedt @ 2021-04-06 20:03 ` Viktor.Rosendahl 2021-04-06 20:24 ` Viktor.Rosendahl 2021-04-06 20:42 ` Steven Rostedt 0 siblings, 2 replies; 9+ messages in thread From: Viktor.Rosendahl @ 2021-04-06 20:03 UTC (permalink / raw) To: rostedt; +Cc: linux-trace-devel On Tue, 2021-04-06 at 13:38 -0400, Steven Rostedt wrote: > On Wed, 24 Feb 2021 19:22:48 +0000 > <Viktor.Rosendahl@bmw.de> wrote: > > > My question to you is, would you like to work on adding that feature to > > > trace-cmd? > > > > I would be interested to work on the feature but I would probably need > > permission from my employer, also in case I do it as a spare time project. > > > > For this reason, I will need some time come to a decision. I will get back > > to > > you when I know the answer. > > Hi Viktor, > > Any word on if your employer would let you work on trace-cmd? Hi Steve, The management was agreeable to it but I haven't had any spare time to work on it. To be honest, I could have worked on it during my Easter holidays but I just happened to busy myself with other private interests :) > > > > git://git.kernel.org/pub/scm/utils/trace-cmd/trace-cmd.git > > > > > > (which requires the libtracefs and libtracevent libraries). > > > > > > I'm thinking if someone were to do: > > > > > > trace-cmd record -p wakeup_rt > > > > > > It would then act just like the latency-collector and record every new > > > instance of a latency into the trace.dat file, where: > > > > > > trace-cmd report > > > > > > would then nicely format that data for the user. > > > > > > Thoughts? > > > > It seems like trace-cmd is mostly GPL-2.0. The latency-collector is also > > GPL- > > 2.0, so I guess that it would be possible to take snippets of code and > > inspiration from it. > > > > Would you allow the hackish random algorithm to be used in trace-cmd? > > I mean the "--random" option in latency-collector. > > > > If you feel it is useful, sure. Although a definition of what it exactly > does needs to be a bit more explained. I'm not really sure what use the > --random option is for the latency tracer. > The usefulness lies in the case where you have latencies that systematically occur very close to each other. Let's suppose that we have some kernel code like this: static void foo(void) { long flags; spin_lock_irqsave(&mylock, flags); do_something_interesting(); /* This always takes 6 milliseconds */ spin_unlock_irqrestore(&mylock, flags) } static void bar(void) { long flags; spin_lock_irqsave(&mylock, flags); do_something_else(); /* This takes several tens of milliseconds */ spin_unlock_irqrestore(&mylock, flags) } void foobar(void) { foo(); // There could be plenty of code here, as long as it takes less than one // blackout period, we would have problem. This could be hundreds of ms. bar(); } EXPORT_SYMBOL(foobar); Let's assume that I believe that my soft realtime system may miss a deadline if there are latencies in the kernel that are greater than 5ms, so I use a threshold of 5 ms when tracing. Now, I start a long and expensive test campaign with many devices. I use the latency-collector without the --random option because I think it's unnecessary, then I will only ever find the latency caused by foo(), which is 6 ms. Specifically, what will happen is that at the end of foo(), ftrace will generate that inotify() event, which will wake up the latency collector, which opens the trace file. When the trace file is opened, there will be a blackout of a few hundred milliseconds and during this blackout no latencies will be captured. The latency event from bar() would only be generated at the end of the bar() function and since bar() takes so long to execute, it's very likely that the latency-collector has had time to react to the first latency and opened the trace file, which will cause the blackout to happen before we reach te end of bar(). Perhaps, I am able to fix the latency generated by foo(), so I believe that everything is fine, which it is not because I have that second much longer latency in bar() that I have not seen. Without the --random option, I would at this point need to start another test campaign, only then would I start seeing the longer latencies from bar(). On the other hand, if I use the --random option with the latency-collector, then what will happen is that sometimes the latency-collector will open the trace file immediately and other times it will sleep and wait for up to one second before doing so. If it opens the file immediately, we will get the first latency. If based on that random toin coss function decides to sleep, then we will get the second. If a long test camaping is exectuted, and foobar() is called several times during that campaign, then there is a good probability that we will capture both the latency from foo() and the one from bar(). Now --random is a bit more complicated because it actually tosses the coin again if another latency occurs when it is sleeping before opening the file. The probability of that coin toss function are chosen so that if we assume that there is a maximum of N closely occuring latencies, we will get each of them with probability 1/N. If the latency-collector detects a situation were it actually has detected N closely occuring latencies, it will automatically increase N to N + 1 and update the probablities of the coin toss accordingly. So basically the idea is that by acting randomly, we will not systematically lose a particular latency. It will also not make matters worse if we never get latencies that occur close to each other; the only drawback is that we sometimes wait for one second before opening the trace file and that doesn't really matter if there aren't any closely occuring latencies. Admittedly, I have never detected any real world case like the one described above in the kernel but I don't think that it's completely unthinkable. Also, a real world situation is likely to be more complicated but it's enough if we have one latency always occurring before another. Also, they need not be immediately after each other, it's enough if they are within one blackout period from each other. I measured the blackout period at one point with a system that I was working with and IIRC, it was something like 450 ms. The exact lenght could be dependent on things like the depth of the stack trace that needs to unwinded etc. Does this explanation make any sense to you? best regards, Viktor > -- Steve ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Adding latency tracking to trace-cmd 2021-04-06 20:03 ` Viktor.Rosendahl @ 2021-04-06 20:24 ` Viktor.Rosendahl 2021-04-06 20:42 ` Steven Rostedt 1 sibling, 0 replies; 9+ messages in thread From: Viktor.Rosendahl @ 2021-04-06 20:24 UTC (permalink / raw) To: rostedt; +Cc: linux-trace-devel On Tue, 2021-04-06 at 22:03 +0200, Viktor Rosendahl wrote: > > > The usefulness lies in the case where you have latencies that systematically > occur very close to each other. > I should have dropped the "very" from the statement above. > > > Admittedly, I have never detected any real world case like the one described > above in the kernel but I don't think that it's completely unthinkable. > I forgot to mention that the kernel has a synthetic example of this situation in the preemptirq_delay_test module, if you use the burst_size parameter. If you want to see all the functions preemptirqtest_[0-9] in your stack traces, then you probably need to use the --random option. Then again this burst feature was implemented by me in order to test the latency-collector. best regards, Viktor ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Adding latency tracking to trace-cmd 2021-04-06 20:03 ` Viktor.Rosendahl 2021-04-06 20:24 ` Viktor.Rosendahl @ 2021-04-06 20:42 ` Steven Rostedt 2021-04-08 21:01 ` Viktor Rosendahl 1 sibling, 1 reply; 9+ messages in thread From: Steven Rostedt @ 2021-04-06 20:42 UTC (permalink / raw) To: Viktor.Rosendahl; +Cc: linux-trace-devel On Tue, 6 Apr 2021 20:03:46 +0000 <Viktor.Rosendahl@bmw.de> wrote: > Hi Steve, > > The management was agreeable to it but I haven't had any spare time to work on > it. Awesome. > > To be honest, I could have worked on it during my Easter holidays but I just > happened to busy myself with other private interests :) I wouldn't expect you to use your Easter holidays on this anyway ;-) I was just clearing out my INBOX and noticed that it's been a while, and decided to ping you so we don't forget. >> > > > > > Would you allow the hackish random algorithm to be used in trace-cmd? > > > I mean the "--random" option in latency-collector. > > > > > > > If you feel it is useful, sure. Although a definition of what it exactly > > does needs to be a bit more explained. I'm not really sure what use the > > --random option is for the latency tracer. > > > > The usefulness lies in the case where you have latencies that systematically > occur very close to each other. [..] > > Without the --random option, I would at this point need to start another test > campaign, only then would I start seeing the longer latencies from bar(). > > On the other hand, if I use the --random option with the latency-collector, then > what will happen is that sometimes the latency-collector will open the trace > file immediately and other times it will sleep and wait for up to one second > before doing so. If it opens the file immediately, we will get the first > latency. If based on that random toin coss function decides to sleep, then we > will get the second. > > If a long test camaping is exectuted, and foobar() is called several times > during that campaign, then there is a good probability that we will capture both > the latency from foo() and the one from bar(). > Now --random is a bit more complicated because it actually tosses the coin again > if another latency occurs when it is sleeping before opening the file. The > probability of that coin toss function are chosen so that if we assume that > there is a maximum of N closely occuring latencies, we will get each of them > with probability 1/N. If the latency-collector detects a situation were it > actually has detected N closely occuring latencies, it will automatically > increase N to N + 1 and update the probablities of the coin toss accordingly. > > So basically the idea is that by acting randomly, we will not systematically > lose a particular latency. It will also not make matters worse if we never get > latencies that occur close to each other; the only drawback is that we sometimes > wait for one second before opening the trace file and that doesn't really matter > if there aren't any closely occuring latencies. Hmm, sounds more like "--various" would be better than "--random". Just because it appears you wont to try different timings. Having a --random option just doesn't sound like it's what you expect it to be. There's already a "-s" option that takes a sleep interval between wakeups, which sounds similar to what you have. Perhaps we can make "-1" a special value to do the "random" wakeup thing. "0" is already special to make it wake up when it detects data in the buffer. -- Steve ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Adding latency tracking to trace-cmd 2021-04-06 20:42 ` Steven Rostedt @ 2021-04-08 21:01 ` Viktor Rosendahl 2021-04-08 21:17 ` Steven Rostedt 0 siblings, 1 reply; 9+ messages in thread From: Viktor Rosendahl @ 2021-04-08 21:01 UTC (permalink / raw) To: Steven Rostedt; +Cc: linux-trace-devel On Tue, 2021-04-06 at 16:42 -0400, Steven Rostedt wrote: > On Tue, 6 Apr 2021 20:03:46 +0000 > <Viktor.Rosendahl@bmw.de> wrote: > <clip> > > Without the --random option, I would at this point need to start another > > test > > campaign, only then would I start seeing the longer latencies from bar(). > > > > On the other hand, if I use the --random option with the latency-collector, > > then > > what will happen is that sometimes the latency-collector will open the trace > > file immediately and other times it will sleep and wait for up to one second > > before doing so. If it opens the file immediately, we will get the first > > latency. If based on that random toin coss function decides to sleep, then > > we > > will get the second. > > > > If a long test camaping is exectuted, and foobar() is called several times > > during that campaign, then there is a good probability that we will capture > > both > > the latency from foo() and the one from bar(). > > Now --random is a bit more complicated because it actually tosses the coin > > again > > if another latency occurs when it is sleeping before opening the file. The > > probability of that coin toss function are chosen so that if we assume that > > there is a maximum of N closely occuring latencies, we will get each of them > > with probability 1/N. If the latency-collector detects a situation were it > > actually has detected N closely occuring latencies, it will automatically > > increase N to N + 1 and update the probablities of the coin toss > > accordingly. > > > > So basically the idea is that by acting randomly, we will not systematically > > lose a particular latency. It will also not make matters worse if we never > > get > > latencies that occur close to each other; the only drawback is that we > > sometimes > > wait for one second before opening the trace file and that doesn't really > > matter > > if there aren't any closely occuring latencies. > > Hmm, sounds more like "--various" would be better than "--random". Just > because it appears you wont to try different timings. Having a --random > option just doesn't sound like it's what you expect it to be. > I guess that you are right that with --random many people may assume the sleep time to be random. Perhaps --dizzy-sleep or --arbitrary-sleep would be better? For the latency-collector, I used the name --random because the behavior is based on the lrand48_r() call, seeded by /dev/urandom. In my thinking it's the sleeping behavior that is random, not the sleep time. I guess that a mathematical purist may say that the behavior is arbitrary rather than random, because if the value N is different from two, then there are unequal probabilities between the two choices. At first when I developed the latency-collector, I thought that using a random sleep time would be the right approach but I came to the conclusion that it is not a good idea. If we have the case with a burst of two latencies, where the first one is 5 ms, , then if we for example sleep randomly between 0 and 500 ms, we will only have a 1% chance to get the first latency. If we use the random algorithm from the latency-collector, with N=2, then we have 50% chance, which is much better. Even if we would have been paranoid to initialize with N=5, in order to prepare for bursts sizes of up to 5, then we would still have a 20% chance. If we have a random sleep time then we would need to make assumptions on how long the latencies are and how much time there is between them, and there is no way to guess that beforehand. With the random behavior, we only need to make an assumption about how many latencies there are going to be in a burst, which is the value N. This is also impossible to know with certainty but we can make some educated guess about it. Also, as I already mentioned, if the latency-collector ever encounters N latencies in a burst, then it will automatically increment N. > There's already a "-s" option that takes a sleep interval between wakeups, > which sounds similar to what you have. Perhaps we can make "-1" a special > value to do the "random" wakeup thing. "0" is already special to make it > wake up when it detects data in the buffer. > To me "-s -1" feels a bit too cryptic and non-descriptive. Also, I wonder how many would read the description of the -s option carefully enough to notice it. On the other hand, trace-cmd record seems to already use most of the alphabet as short options. Do you think it could be acceptable to add a new long option, such as --dizzy-sleep? best regards, Viktor ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Adding latency tracking to trace-cmd 2021-04-08 21:01 ` Viktor Rosendahl @ 2021-04-08 21:17 ` Steven Rostedt 0 siblings, 0 replies; 9+ messages in thread From: Steven Rostedt @ 2021-04-08 21:17 UTC (permalink / raw) To: Viktor Rosendahl; +Cc: linux-trace-devel On Thu, 8 Apr 2021 23:01:53 +0200 Viktor Rosendahl <viktor.rosendahl@gmail.com> wrote: > > Hmm, sounds more like "--various" would be better than "--random". Just > > because it appears you wont to try different timings. Having a --random > > option just doesn't sound like it's what you expect it to be. > > > > I guess that you are right that with --random many people may assume the > sleep time to be random. Perhaps --dizzy-sleep or --arbitrary-sleep > would be better? I think "--various-sleep" may be better. Note, I'm currently doing my income taxes, and when you sell a bunch of stocks that you bought at different times, for the date, you enter "various". Which is why that word is in my head ;-) > > For the latency-collector, I used the name --random because the behavior > is based on the lrand48_r() call, seeded by /dev/urandom. In my thinking > it's the sleeping behavior that is random, not the sleep time. > > I guess that a mathematical purist may say that the behavior is > arbitrary rather than random, because if the value N is different from > two, then there are unequal probabilities between the two choices. > > At first when I developed the latency-collector, I thought that using a > random sleep time would be the right approach but I came to the > conclusion that it is not a good idea. > > If we have the case with a burst of two latencies, where the first one > is 5 ms, , then if we for example sleep randomly between 0 and 500 ms, > we will only have a 1% chance to get the first latency. If we use the > random algorithm from the latency-collector, with N=2, then we have 50% > chance, which is much better. Even if we would have been paranoid to > initialize with N=5, in order to prepare for bursts sizes of up to 5, > then we would still have a 20% chance. > > If we have a random sleep time then we would need to make assumptions on > how long the latencies are and how much time there is between them, and > there is no way to guess that beforehand. > > With the random behavior, we only need to make an assumption about how > many latencies there are going to be in a burst, which is the value N. > This is also impossible to know with certainty but we can make some > educated guess about it. Also, as I already mentioned, if the > latency-collector ever encounters N latencies in a burst, then it will > automatically increment N. > > > There's already a "-s" option that takes a sleep interval between > wakeups, > > which sounds similar to what you have. Perhaps we can make "-1" a special > > value to do the "random" wakeup thing. "0" is already special to make it > > wake up when it detects data in the buffer. > > > > To me "-s -1" feels a bit too cryptic and non-descriptive. Also, I > wonder how many would read the description of the -s option carefully > enough to notice it. It really comes to how well we make our documentation, and I want to create a tutorial for most common operations. When there's good documentation (and a simple and quick tutorial) cryptic solutions like the above can be useful. > > On the other hand, trace-cmd record seems to already use most of the > alphabet as short options. Yeah, it has suffered that :-/ > > Do you think it could be acceptable to add a new long option, such as > --dizzy-sleep? I'm not really sure what you mean by "dizzy-sleep". To me, that means it is unbalanced and not stable. Where as --various-sleep is pretty much exactly what it is. It sleeps for various amounts of time. -- Steve ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2021-04-08 21:17 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-02-24 16:54 Adding latency tracking to trace-cmd Steven Rostedt 2021-02-24 19:22 ` Viktor.Rosendahl 2021-02-24 20:10 ` Steven Rostedt 2021-04-06 17:38 ` Steven Rostedt 2021-04-06 20:03 ` Viktor.Rosendahl 2021-04-06 20:24 ` Viktor.Rosendahl 2021-04-06 20:42 ` Steven Rostedt 2021-04-08 21:01 ` Viktor Rosendahl 2021-04-08 21:17 ` Steven Rostedt
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).