* Unexpected scheduling with mutexes
From: Martin Christian @ 2019-03-27 10:56 UTC
To: kernelnewbies

Hi,

I've written a Linux kernel module for a USB device. The USB driver
provides 2 read-only character devices, each of which can be opened
exclusively by only one process:
- `/dev/cdev_a`
- `/dev/cdev_b`

The USB device can only handle one request at a time.

The test setup is as follows:
- Process A reads data from the 1st device:
  `dd if=/dev/cdev_a of=/tmp/a bs=X`
- Process B reads data from the 2nd device:
  `dd if=/dev/cdev_b of=/tmp/b bs=X`
- Processes A and B run in parallel
- After 10 seconds both processes are killed and the sizes of both
  output files are compared.

For certain values of `X` there is a significant difference in size
between the two files, which I don't expect.

A read call to the driver does the following:
1. `mutex_lock_interruptible(iolock)`
2. `usb_bulk_msg(dev, pipe, buf, X, timeout)`
3. `mutex_unlock(iolock)`
4. `copy_to_user(buf)`

What I would expect is the following:
1. Proc A: `mutex_lock_interruptible(iolock)`
2. Proc A: `usb_bulk_msg(dev, pipe, buf, X, timeout)`
3. Scheduling: A -> B
4. Proc B: `mutex_lock_interruptible(iolock)` -> blocks
5. Scheduling: B -> A
6. Proc A: `mutex_unlock(iolock)`
7. Proc A: `copy_to_user(buf)`
8. Proc A: `mutex_lock_interruptible(iolock)` -> blocks
9. Scheduling: A -> B
10. Proc B: `usb_bulk_msg(dev, pipe, buf, X, timeout)`

But what I see with ftrace is that in step 8, process A does not block
and simply continues. And it seems that for certain values of X the time
inside the critical region is a multiple of the time slice, so that
process B always gets its time slice while the critical region is
locked.

What would be a best-practice solution for this? I was thinking of
calling `schedule()` each time after copying to user space, or playing
with nice values, or using wait_queues?

--
Dipl.-Inf. Martin Christian
Senior Berater Entwicklung Hardware
secunet Security Networks AG
Tel.: +49 201 5454-3612, Fax +49 201 5454-1323
E-Mail: martin.christian@secunet.com
Ammonstraße 74, 01067 Dresden
www.secunet.com

_______________________________________________
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
https://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
* Re: Unexpected scheduling with mutexes
From: Greg KH @ 2019-03-29 20:01 UTC
To: Martin Christian; +Cc: kernelnewbies

On Wed, Mar 27, 2019 at 11:56:51AM +0100, Martin Christian wrote:
> Hi,
>
> I've written a Linux kernel module for a USB device. The USB driver
> provides 2 read-only character devices, which can be opened only
> exclusively by one process:
> - `/dev/cdev_a`
> - `/dev/cdev_b`

"exclusive" opening really isn't that, unless you go through _HUGE_
gyrations to try to control this. It's not worth it in the end, so just
let userspace deal with it. If it wants to interleave data to a device
node, let it, it can handle the fallout.

As an example of this, serial ports are not "exclusively owned", right?

> The USB device can only handle one request at a time.
>
> The test setup is as follows:
> - Process A reads data from 1st device: `dd if=/dev/cdev_a of=/tmp/a
> bs=X`
> - Process B reads data from 2nd device: `dd if=/dev/cdev_b of=/tmp/b
> bs=X`
> - Process A and B run in parallel
> - After 10 seconds both processes are killed and size of both output
> files is compared.
>
> For certain values of `X` there is a significant difference in size
> between the two files, which I don't expect.
>
> A read call to the driver does the following:
> 1. `mutex_lock_interruptible(iolock)`
> 2. `usb_bulk_msg(dev, pipe, buf, X, timeout)`
> 3. `mutex_unlock(iolock)`
> 4. `copy_to_user(buf)`

What are these values of X that cause differences here?

> What I would expect is the following:
> 1. Proc A: `mutex_lock_interruptible(iolock)`
> 2. Proc A: `usb_bulk_msg(dev, pipe, buf, X, timeout)`
> 3. Scheduling: A -> B
> 4. Proc B: `mutex_lock_interruptible(iolock)` -> blocks
> 5. Scheduling: B -> A
> 6. Proc A: `mutex_unlock(iolock)`
> 7. Proc A: `copy_to_user(buf)`
> 8. Proc A: `mutex_lock_interruptible(iolock)` -> blocks
> 9. Scheduling: A -> B
> 10. Proc B: `usb_bulk_msg(dev, pipe, buf, X, timeout)`
>
> But what I see with ftrace is that in step 8, process A still continues.
> And it seems that for certain values of X the time inside the critical
> region is a multiple of the time slice, so that process B always gets
> the time slice when the critical region is blocked. What would be a best
> practise solution for this? I was thinking of calling `schedule()` each
> time after copying to user space or playing with nice values or using
> wait_queues?

Step back a second and let me ask: what exactly are you trying to solve
here?

If you are just playing around and want to watch mutexes being grabbed
and passed off, that's fine, and fun. You are getting a good education
in how scheduling works and, of course, the hell^Wmess that USB really
is.

But if you are trying to somehow create a real api that you have to
enforce the passing off of writing data from two different character
devices in an interleaved format, you are doing this totally wrong, as
this is not going to work with a simple mutex, as you have found out.

There are so many different variables in play here that you are trying
to somehow sync up with just a single lock. You have, just off the top
of my head:
- scheduling issues of the different userspace programs
- variability in USB transports (usb_bulk_msg() is just about the most
  inefficient way of ever sending data on a USB device, you end up
  wasting loads of time sleeping and waking up and waiting for things
  to happen, and the USB pipeline is almost totally empty for most of
  it.)
- scheduling issues of the USB wakequeue that handles the data to be
  sent
- sleeping/memory fault issues when copying data to/from userspace can
  play a huge factor for some systems.
and I am sure there are more.

A mutex isn't always "fair" in the sense that it instantly gives up and
passes control to someone else who is waiting for it at the same time.
Sometimes it can be grabbed by someone else, like the person who just
dropped it, based on a whole raft of factors that have been worked out
over the years to provide a robust and scalable general purpose
operating system.

Schedulers too are not always "fair"; that depends on a whole raft of
things like what CPU is running when, where your task happens to be
living at the moment, and of course, what else is happening in the
system at the exact same time (I'm sure other things are happening,
right?)

Again, this is all due to the way CPUs work, and how Linux manages tasks
in order to make the best use of all resources at any given moment.

So, I really haven't answered your question here except to say, "it's
complicated" and "you aren't measuring what you think you are
measuring" :)

Try to take USB out of the picture as well as userspace, and try running
two kernel threads trying to grab a mutex and then print out "A" or "B"
to the kernel log and then give it up. Is that output nicely
interleaved, or are there some duplicated messages?[1]

Again, what are you really trying to determine here? Odds are there is a
better way to do it, given that your above sequence of events is highly
variable for a whole raft of reasons.

thanks,

greg k-h

[1] Extra bonus points for those that recognize this task...
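The experiment Greg suggests can be approximated without writing a
kernel module. What follows is only a minimal userspace sketch, not the
kernel-thread version he describes: it assumes POSIX threads and
substitutes `pthread_mutex_t` for the kernel's `struct mutex`, so it
illustrates the general point (a plain mutex makes no promise about who
gets it next) rather than the kernel's exact behaviour. The names
`run_contention`, `worker_fn`, `struct shared` and `struct worker` are
made up for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* State shared by both threads; log and len are protected by lock. */
struct shared {
    pthread_mutex_t lock;
    char *log;      /* one tag per completed critical section */
    size_t len;     /* number of tags written so far */
};

struct worker {
    struct shared *sh;
    char tag;       /* 'A' or 'B' */
    int iters;
};

/* Grab the mutex, record our tag, give the mutex up again. */
static void *worker_fn(void *arg)
{
    struct worker *w = arg;

    for (int i = 0; i < w->iters; i++) {
        pthread_mutex_lock(&w->sh->lock);
        w->sh->log[w->sh->len++] = w->tag;
        pthread_mutex_unlock(&w->sh->lock);
    }
    return NULL;
}

/*
 * Run two contending threads for iters rounds each.
 * Returns how often the same thread won the lock twice in a row
 * (0 would mean perfect A/B interleaving), or -1 on error.
 */
long run_contention(int iters, long *count_a, long *count_b)
{
    struct shared sh;
    struct worker a, b;
    pthread_t ta, tb;
    long repeats = 0;

    if (iters <= 0)
        return -1;
    sh.log = malloc(2 * (size_t)iters);
    if (!sh.log)
        return -1;
    sh.len = 0;
    pthread_mutex_init(&sh.lock, NULL);

    a = (struct worker){ &sh, 'A', iters };
    b = (struct worker){ &sh, 'B', iters };
    if (pthread_create(&ta, NULL, worker_fn, &a) != 0) {
        free(sh.log);
        return -1;
    }
    if (pthread_create(&tb, NULL, worker_fn, &b) != 0) {
        pthread_join(ta, NULL);
        free(sh.log);
        return -1;
    }
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    assert(sh.len == 2 * (size_t)iters);

    *count_a = *count_b = 0;
    for (size_t i = 0; i < sh.len; i++) {
        if (sh.log[i] == 'A')
            (*count_a)++;
        else
            (*count_b)++;
        if (i > 0 && sh.log[i] == sh.log[i - 1])
            repeats++;
    }
    pthread_mutex_destroy(&sh.lock);
    free(sh.log);
    return repeats;
}
```

Calling `run_contention(10000, &a, &b)` from a small `main()` and
printing the return value shows how many times one thread took the lock
twice in a row. The number is expected to vary from run to run and from
machine to machine. Older toolchains may need `-pthread` when linking.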
* Re: Unexpected scheduling with mutexes
From: Valdis Klētnieks @ 2019-03-29 21:45 UTC
To: Greg KH; +Cc: Martin Christian, kernelnewbies

On Fri, 29 Mar 2019 21:01:58 +0100, Greg KH said:

> But if you are trying to somehow create a real api that you have to
> enforce the passing off of writing data from two different character
> devices in an interleaved format, you are doing this totally wrong, as
> this is not going to work with a simple mutex, as you have found out.

There's almost always an even more fundamental issue here - I've seen
plenty of people attempt to do this sort of thing. But invariably, they
have little to no explanation of what semantics they think are correct.

I'm not sure who is crazier - the people who try to do kernel-side
locking for "exclusive" use of a device, or the people who don't
understand why having 3 different programs trying to talk to /dev/ttyS0
at once will only lead to tears and anguish...

(Though recently, I discovered that there are no bad ideas so obvious
that somebody won't try to re-invent them. I caught a software package
that *really* should know better using "does DBus have an entry for this
object?" as a lock.)

> Try to take USB out of the picture as well as userspace, and try running
> two kernel threads trying to grab a mutex and then print out "A" or "B"
> to the kernel log and then give it up. Is that output nicely
> interleaved or is there some duplicated messages.[1]

> [1] Extra bonus points for those that recognize this task...

Been there, done that, got the tire marks to prove it. :)
* Re: Unexpected scheduling with mutexes
From: Ruben Safir @ 2019-03-30 12:25 UTC
To: kernelnewbies

On 3/29/19 4:01 PM, Greg KH wrote:
> As an example of this, serial ports are not "exclusively owned", right?

They are not? What handles the interrupt?

--
So many immigrant groups have swept through our town that Brooklyn,
like Atlantis, reaches mythological proportions in the mind of the
world - RI Safir 1998
http://www.mrbrklyn.com
DRM is THEFT - We are the STAKEHOLDERS - RI Safir 2002
http://www.nylxs.com - Leadership Development in Free Software
http://www.brooklyn-living.com
Being so tracked is for FARM ANIMALS and extermination camps, but
incompatible with living as a free human being. -RI Safir 2013
* Re: Unexpected scheduling with mutexes
From: Greg KH @ 2019-03-30 18:35 UTC
To: Ruben Safir; +Cc: kernelnewbies

On Sat, Mar 30, 2019 at 08:25:57AM -0400, Ruben Safir wrote:
> On 3/29/19 4:01 PM, Greg KH wrote:
> > As an example of this, serial ports are not "exclusively owned", right?
>
>
> they are not? What handles the interrupt?

Context is everything, and you cut out all of it here :(

The kernel handles the interrupt of course, the sentence was referring
to userspace interacting with the kernel, not anything else.

greg k-h
* Re: Unexpected scheduling with mutexes
From: Martin Christian @ 2019-04-03 9:33 UTC
Cc: kernelnewbies

Thanks a lot for the detailed reply!

>> For certain values of `X` there is a significant difference in size
>> between the two files, which I don't expect.
>>
>> A read call to the driver does the following:
>> 1. `mutex_lock_interruptible(iolock)`
>> 2. `usb_bulk_msg(dev, pipe, buf, X, timeout)`
>> 3. `mutex_unlock(iolock)`
>> 4. `copy_to_user(buf)`
>
> What are these values of X that cause differences here?

Starting at around 1K, character device A gets more data, until it turns
over at around 4K. Request sizes from 10K upward yield the expected data
rates.

Character device A is a "real" random source and returns data much
slower than device B, which is a pseudo random source.

> But if you are trying to somehow create a real api that you have to
> enforce the passing off of writing data from two different character
> devices in an interleaved format, you are doing this totally wrong, as
> this is not going to work with a simple mutex, as you have found out.

As mentioned above, the USB device provides two different streams of
random data. But the device can process only one request at a time. Also
I didn't want to have too much dynamic memory allocation, because I
would need to allocate up to 64KB of kernel memory on each open.

That's because the USB device is designed to provide up to 64K of random
data in a single "request". A request has a header and footer
"protecting" the request as a whole from data confusion.

To make things simpler I decided to just allow one user space process at
a time for each source - which is enough for our application. But yes,
that could probably also go to user space.

> So, I really haven't answered your question here except to say, "it's
> complicated" and "you aren't measuring what you think you are measuring" :)

Ok, I see.

Thanks,
Martin Christian
* Re: Unexpected scheduling with mutexes
From: Greg KH @ 2019-04-03 9:48 UTC
To: Martin Christian; +Cc: kernelnewbies

On Wed, Apr 03, 2019 at 11:33:56AM +0200, Martin Christian wrote:
> >> For certain values of `X` there is a significant difference in size
> >> between the two files, which I don't expect.
> >>
> >> A read call to the driver does the following:
> >> 1. `mutex_lock_interruptible(iolock)`
> >> 2. `usb_bulk_msg(dev, pipe, buf, X, timeout)`
> >> 3. `mutex_unlock(iolock)`
> >> 4. `copy_to_user(buf)`
> >
> > What are these values of X that cause differences here?
>
> Starting around 1k character device A gets more data until it turns over
> at around 4K. Request size from 10K yield the expected data rates.

Those are huge USB data stream sizes, what is the size of your USB
endpoints? By doing large transfers like this, you are causing the USB
core to do all the work (which is fine), but while that happens, lots of
other things happen at the same time, which makes trying to measure
things much more difficult.

> Character device A is a "real" random source and returns data much
> slower than device B, which is a pseudo random source.

So those map to different USB device endpoints?

> > But if you are trying to somehow create a real api that you have to
> > enforce the passing off of writing data from two different character
> > devices in an interleaved format, you are doing this totally wrong, as
> > this is not going to work with a simple mutex, as you have found out.
>
> As mentioned above, the USB device provides two different streams of
> random. But the device can process only one request at a time. Also I
> didn't want to have too much dynamic memory allocation, because I would
> need to allocate up to 64KB kernel memory on each open.

So your USB device can not handle data from different endpoints at the
same time? Or is it multiplexing it on the same endpoint? You need to
provide a bit more information about your device for us to be able to
help you out better.

> That's because the USB device is designed to provide up to 64K of random
> in a single "request". A request has a header and footer "protecting"
> the request as a whole from data confusion.

Who are you protecting the request from being confused by? The kernel?
Userspace? Something else?

Why not just tie your device into the kernel's random number system like
other USB devices do that provide good entropy to the system? That way
you don't have to do crazy things with character streams and blocking
requests :)

> To make things simpler I decided to just allow one user space process at
> a time for each source - which is enough for our application. But yes,
> that could probably also got to user space.

Again, why not just use the random services provided by the kernel, and
have your device feed that? That way everyone benefits and you don't
have to do odd things and create a custom user api that no one else can
use.

thanks,

greg k-h
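Greg's question about endpoint size matters because a request larger
than the endpoint's maximum packet size goes over the wire as many
packets. The sketch below is only a rough illustration of that
arithmetic. The thread never states the device's speed or endpoint size,
so the 64-byte and 512-byte values are assumptions, taken from the
largest bulk packet sizes USB 2.0 allows at full speed and high speed.
Reading "1K", "4K", "10K" and "64K" as multiples of 1024 is also an
assumption, and `packets_for_transfer` and `print_packet_table` are
hypothetical helpers, not kernel functions.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Number of bulk packets needed to move len bytes when each packet
 * carries at most maxpacket bytes (ceiling division). Zero-length
 * packets are ignored in this simplified model.
 */
size_t packets_for_transfer(size_t len, size_t maxpacket)
{
    if (maxpacket == 0)
        return 0;   /* no valid endpoint size given */
    return (len + maxpacket - 1) / maxpacket;
}

/* Print packet counts for the request sizes mentioned in the thread. */
void print_packet_table(void)
{
    /* Assumed byte values for "1K", "4K", "10K" and "64K". */
    static const size_t sizes[] = { 1024, 4096, 10240, 65536 };
    /* Assumed endpoint sizes: full-speed and high-speed bulk maxima. */
    static const size_t maxpacket[] = { 64, 512 };

    for (size_t m = 0; m < sizeof(maxpacket) / sizeof(maxpacket[0]); m++) {
        for (size_t s = 0; s < sizeof(sizes) / sizeof(sizes[0]); s++) {
            printf("%zu bytes at %zu bytes/packet: %zu packets\n",
                   sizes[s], maxpacket[m],
                   packets_for_transfer(sizes[s], maxpacket[m]));
        }
    }
}
```

Under these assumptions a single 64K request is 128 packets on a
512-byte endpoint and 1024 packets on a 64-byte one, all of it moved
while the driver's mutex is held.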