kernelnewbies.kernelnewbies.org archive mirror
* Unexpected scheduling with mutexes
@ 2019-03-27 10:56 Martin Christian
  2019-03-29 20:01 ` Greg KH
  0 siblings, 1 reply; 7+ messages in thread
From: Martin Christian @ 2019-03-27 10:56 UTC (permalink / raw)
  To: kernelnewbies



Hi,

I've written a Linux kernel module for a USB device. The USB driver
provides 2 read-only character devices, each of which can be opened by
only one process at a time:
 - `/dev/cdev_a`
 - `/dev/cdev_b`

The USB device can only handle one request at a time.

The test setup is as follows:
 - Process A reads data from 1st device: `dd if=/dev/cdev_a of=/tmp/a bs=X`
 - Process B reads data from 2nd device: `dd if=/dev/cdev_b of=/tmp/b bs=X`
 - Processes A and B run in parallel
 - After 10 seconds both processes are killed and the sizes of the two
   output files are compared.

For certain values of `X` there is a significant difference in size
between the two files, which I don't expect.

A read call to the driver does the following:
 1. `mutex_lock_interruptible(iolock)`
 2. `usb_bulk_msg(dev, pipe, buf, X, timeout)`
 3. `mutex_unlock(iolock)`
 4. `copy_to_user(buf)`

What I would expect is the following:
 1. Proc A: `mutex_lock_interruptible(iolock)`
 2. Proc A: `usb_bulk_msg(dev, pipe, buf, X, timeout)`
 3. Scheduling: A -> B
 4. Proc B: `mutex_lock_interruptible(iolock)` -> blocks
 5. Scheduling: B -> A
 6. Proc A: `mutex_unlock(iolock)`
 7. Proc A: `copy_to_user(buf)`
 8. Proc A: `mutex_lock_interruptible(iolock)` -> blocks
 9. Scheduling: A -> B
 10. Proc B: `usb_bulk_msg(dev, pipe, buf, X, timeout)`

But what I see with ftrace is that in step 8, process A still continues.
And it seems that for certain values of X the time inside the critical
region is a multiple of the time slice, so that process B always gets
the time slice when the critical region is blocked. What would be a
best-practice solution for this? I was thinking of calling `schedule()` each
time after copying to user space or playing with nice values or using
wait_queues?
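
As a rough illustration of the wait-queue idea raised above, here is a
userspace sketch using POSIX threads rather than kernel code. A condition
variable stands in for a kernel wait queue, and a `turn` field forces
readers A and B to alternate strictly. All names (`struct handoff`,
`reader_fn`, `run_handoff`, `TURNS`) are made up for this sketch and are
not taken from the driver; setup failures are simply treated as fatal.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define TURNS 8   /* critical sections per reader (arbitrary for the sketch) */

struct handoff {
    pthread_mutex_t lock;
    pthread_cond_t changed;      /* plays the role of a wait queue */
    int turn;                    /* 0: reader A may go, 1: reader B may go */
    char order[2 * TURNS];       /* which reader got the device, in order */
    size_t len;
};

struct reader {
    struct handoff *h;
    int id;                      /* 0 for A, 1 for B */
};

static void *reader_fn(void *arg)
{
    struct reader *r = arg;
    struct handoff *h = r->h;

    for (int i = 0; i < TURNS; i++) {
        pthread_mutex_lock(&h->lock);
        /* Sleep until it is this reader's turn. */
        while (h->turn != r->id)
            pthread_cond_wait(&h->changed, &h->lock);
        h->order[h->len++] = (char)('A' + r->id);  /* stand-in for the transfer */
        h->turn = 1 - r->id;                       /* hand over explicitly */
        pthread_cond_broadcast(&h->changed);
        pthread_mutex_unlock(&h->lock);
    }
    return NULL;
}

/* Run readers A and B to completion. Failures are fatal in this sketch. */
void run_handoff(struct handoff *h)
{
    struct reader a = { h, 0 }, b = { h, 1 };
    pthread_t ta, tb;
    int rc;

    h->turn = 0;
    h->len = 0;
    rc = pthread_mutex_init(&h->lock, NULL);
    assert(rc == 0);
    rc = pthread_cond_init(&h->changed, NULL);
    assert(rc == 0);
    rc = pthread_create(&ta, NULL, reader_fn, &a);
    assert(rc == 0);
    rc = pthread_create(&tb, NULL, reader_fn, &b);
    assert(rc == 0);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    pthread_cond_destroy(&h->changed);
    pthread_mutex_destroy(&h->lock);
    (void)rc;
}
```

After `run_handoff()` returns, `order` holds A and B in strict
alternation. The cost of this design is that each reader can only go as
fast as the other one, and a reader stalls completely if the other side
stops reading.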

-- 
Dipl.-Inf. Martin Christian
Senior Berater Entwicklung Hardware
secunet Security Networks AG

Tel.: +49 201 5454-3612, Fax +49 201 5454-1323
E-Mail: martin.christian@secunet.com
Ammonstraße 74, 01067 Dresden
www.secunet.com





_______________________________________________
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
https://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


* Re: Unexpected scheduling with mutexes
  2019-03-27 10:56 Unexpected scheduling with mutexes Martin Christian
@ 2019-03-29 20:01 ` Greg KH
  2019-03-29 21:45   ` Valdis Klētnieks
                     ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Greg KH @ 2019-03-29 20:01 UTC (permalink / raw)
  To: Martin Christian; +Cc: kernelnewbies

On Wed, Mar 27, 2019 at 11:56:51AM +0100, Martin Christian wrote:
> Hi,
> 
> I've written a Linux kernel module for a USB device. The USB driver
> provides 2 read-only character devices, each of which can be opened by
> only one process at a time:
>  - `/dev/cdev_a`
>  - `/dev/cdev_b`

"exclusive" opening really isn't that, unless you go through _HUGE_
gyrations to try to control this.  It's not worth it in the end, so just
let userspace deal with it.  If it wants to interleave data to a device
node, let it, it can handle the fallout.

As an example of this, serial ports are not "exclusively owned", right?

> The USB device can only handle one request at a time.
> 
> The test setup is as follows:
>  - Process A reads data from 1st device: `dd if=/dev/cdev_a of=/tmp/a bs=X`
>  - Process B reads data from 2nd device: `dd if=/dev/cdev_b of=/tmp/b bs=X`
>  - Processes A and B run in parallel
>  - After 10 seconds both processes are killed and the sizes of the two
>    output files are compared.
> 
> For certain values of `X` there is a significant difference in size
> between the two files, which I don't expect.
> 
> A read call to the driver does the following:
>  1. `mutex_lock_interruptible(iolock)`
>  2. `usb_bulk_msg(dev, pipe, buf, X, timeout)`
>  3. `mutex_unlock(iolock)`
>  4. `copy_to_user(buf)`

What are these values of X that cause differences here?

> What I would expect is the following:
>  1. Proc A: `mutex_lock_interruptible(iolock)`
>  2. Proc A: `usb_bulk_msg(dev, pipe, buf, X, timeout)`
>  3. Scheduling: A -> B
>  4. Proc B: `mutex_lock_interruptible(iolock)` -> blocks
>  5. Scheduling: B -> A
>  6. Proc A: `mutex_unlock(iolock)`
>  7. Proc A: `copy_to_user(buf)`
>  8. Proc A: `mutex_lock_interruptible(iolock)` -> blocks
>  9. Scheduling: A -> B
>  10. Proc B: `usb_bulk_msg(dev, pipe, buf, X, timeout)`
> 
> But what I see with ftrace is that in step 8, process A still continues.
> And it seems that for certain values of X the time inside the critical
> region is a multiple of the time slice, so that process B always gets
> the time slice when the critical region is blocked. What would be a
> best-practice solution for this? I was thinking of calling `schedule()` each
> time after copying to user space or playing with nice values or using
> wait_queues?

Step back a second and let me ask what exactly you are trying to solve
here?  If you are just playing around and want to watch mutexes being
grabbed and passed off, that's fine, and fun.  You are getting a good
education with how scheduling works and of course, the hell^Wmess that
USB really is.

But if you are trying to somehow create a real api that you have to
enforce the passing off of writing data from two different character
devices in an interleaved format, you are doing this totally wrong, as
this is not going to work with a simple mutex, as you have found out.

There are so many different variables in play here, that you are trying
to somehow sync up with just a single lock.  You have, just off the top
of my head:
	- scheduling issues of the different userspace programs
	- variability in USB transports (usb_bulk_msg() is just about
	  the most inefficient way of ever sending data on a USB device,
	  you end up wasting loads of time sleeping and waking up and
	  waiting for things to happen, and the USB pipeline is almost
	  totally empty for most of it.)
	- scheduling issues of the USB wakequeue that handles the data
	  to be sent
	- sleeping/memory fault issues when copying data to/from
	  userspace can play a huge factor for some systems.
and I am sure there are more.

A mutex isn't always "fair" in the sense that releasing it instantly
passes control to someone else who is waiting for it.  Sometimes
it can be grabbed by someone else, like the person who just dropped it,
based on a whole raft of factors that have been worked out over the
years to provide a robust and scalable general purpose operating system.
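
For contrast, a strictly fair lock can be sketched in a few lines. The
following userspace ticket lock (C11 atomics, not kernel code, and not
how the kernel's `struct mutex` works) serves callers in arrival order,
so a thread that just released the lock cannot re-take it ahead of a
waiter that already holds a ticket. The names `ticket_lock`,
`ticket_lock_acquire` and `ticket_lock_release` are invented for this
example.

```c
#include <sched.h>
#include <stdatomic.h>

struct ticket_lock {
    atomic_uint next_ticket;   /* next ticket to hand out */
    atomic_uint now_serving;   /* ticket currently allowed in */
};

void ticket_lock_init(struct ticket_lock *l)
{
    atomic_init(&l->next_ticket, 0);
    atomic_init(&l->now_serving, 0);
}

/* Take a ticket and wait until it is called; returns the ticket number. */
unsigned int ticket_lock_acquire(struct ticket_lock *l)
{
    unsigned int me = atomic_fetch_add(&l->next_ticket, 1);

    while (atomic_load(&l->now_serving) != me)
        sched_yield();   /* let the current holder make progress */
    return me;
}

/* Call the next ticket; the oldest waiter (if any) gets the lock. */
void ticket_lock_release(struct ticket_lock *l)
{
    atomic_fetch_add(&l->now_serving, 1);
}
```

The price of this fairness is throughput: if the thread whose ticket is
called next is not running at that moment, every other thread has to
wait for it to be scheduled. That is one reason a general-purpose mutex
may let the previous holder take the lock again.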

Schedulers too are not always "fair", that depends on a whole raft of
things like what CPU is running when, where your task happens to be
living at the moment, and of course, what else is happening in the
system at the exact same time (I'm sure other things are happening,
right?)  Again, this is all due to the way CPUs work, and how Linux
manages tasks in order to try to keep all resources used best at the
moment.

So, I really haven't answered your question here except to say, "it's
complicated" and "you aren't measuring what you think you are measuring" :)

Try to take USB out of the picture as well as userspace, and try running
two kernel threads trying to grab a mutex and then print out "A" or "B"
to the kernel log and then give it up.  Is that output nicely
interleaved or are there some duplicated messages?[1]
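
A userspace analogue of this experiment, as a sketch: two POSIX threads
(not kernel threads) take a plain `pthread_mutex_t`, append their letter
to a shared buffer instead of printing to the kernel log, and release
the lock. `count_repeats()` then counts how often the same thread got
the lock twice in a row. All names are invented for this sketch, and a
pthread mutex is not the kernel's `struct mutex`, so the result only
illustrates the effect.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define ROUNDS 1000   /* lock acquisitions per thread (arbitrary) */

struct ab_trace {
    pthread_mutex_t lock;
    char order[2 * ROUNDS];   /* letter of each lock holder, in order */
    size_t len;
};

struct ab_worker {
    struct ab_trace *trace;
    char tag;
};

static void *ab_worker_fn(void *arg)
{
    struct ab_worker *w = arg;

    for (int i = 0; i < ROUNDS; i++) {
        pthread_mutex_lock(&w->trace->lock);
        w->trace->order[w->trace->len++] = w->tag;
        pthread_mutex_unlock(&w->trace->lock);
        /* No yield here: the thread goes straight back for the lock. */
    }
    return NULL;
}

/* Run threads A and B to completion. Failures are fatal in this sketch. */
void run_ab(struct ab_trace *trace)
{
    struct ab_worker a = { trace, 'A' }, b = { trace, 'B' };
    pthread_t ta, tb;
    int rc;

    trace->len = 0;
    rc = pthread_mutex_init(&trace->lock, NULL);
    assert(rc == 0);
    rc = pthread_create(&ta, NULL, ab_worker_fn, &a);
    assert(rc == 0);
    rc = pthread_create(&tb, NULL, ab_worker_fn, &b);
    assert(rc == 0);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    pthread_mutex_destroy(&trace->lock);
    (void)rc;
}

/* Number of times one thread held the lock twice in a row. */
size_t count_repeats(const struct ab_trace *trace)
{
    size_t repeats = 0;

    for (size_t i = 1; i < trace->len; i++)
        if (trace->order[i] == trace->order[i - 1])
            repeats++;
    return repeats;
}
```

A perfectly interleaved run would make `count_repeats()` return 0. On
most systems it will not, because the thread that just unlocked is
already running and often wins the race for the lock.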

Again, what are you really trying to determine here?  Odds are there is
a better way to do it, given that your above sequence of events is
highly variable for a whole raft of reasons.

thanks,

greg k-h

[1] Extra bonus points for those that recognize this task...


* Re: Unexpected scheduling with mutexes
  2019-03-29 20:01 ` Greg KH
@ 2019-03-29 21:45   ` Valdis Klētnieks
  2019-03-30 12:25   ` Ruben Safir
  2019-04-03  9:33   ` Martin Christian
  2 siblings, 0 replies; 7+ messages in thread
From: Valdis Klētnieks @ 2019-03-29 21:45 UTC (permalink / raw)
  To: Greg KH; +Cc: Martin Christian, kernelnewbies

On Fri, 29 Mar 2019 21:01:58 +0100, Greg KH said:

> But if you are trying to somehow create a real api that you have to
> enforce the passing off of writing data from two different character
> devices in an interleaved format, you are doing this totally wrong, as
> this is not going to work with a simple mutex, as you have found out.

There's almost always an even more fundamental issue here - I've seen plenty of
people attempt to do this sort of thing.  But invariably, they have little to
no explanation of what semantics they think are correct. I'm not sure who is
crazier - the people who try to do kernel-side locking for "exclusive" use of a
device, or the people who don't understand why having 3 different programs
trying to talk to /dev/ttyS0 at once will only lead to tears and anguish...

(Though recently, I discovered that there are no bad ideas so obvious that
somebody won't try to re-invent them.  I caught a software package that *really*
should know better using "does DBus have an entry for this object?" as a lock.)

> Try to take USB out of the picture as well as userspace, and try running
> two kernel threads trying to grab a mutex and then print out "A" or "B"
> to the kernel log and then give it up.  Is that output nicely
> interleaved or are there some duplicated messages?[1]

> [1] Extra bonus points for those that recognize this task...

Been there, done that, got the tire marks to prove it. :)


* Re: Unexpected scheduling with mutexes
  2019-03-29 20:01 ` Greg KH
  2019-03-29 21:45   ` Valdis Klētnieks
@ 2019-03-30 12:25   ` Ruben Safir
  2019-03-30 18:35     ` Greg KH
  2019-04-03  9:33   ` Martin Christian
  2 siblings, 1 reply; 7+ messages in thread
From: Ruben Safir @ 2019-03-30 12:25 UTC (permalink / raw)
  To: kernelnewbies

On 3/29/19 4:01 PM, Greg KH wrote:
> As an example of this, serial ports are not "exclusively owned", right?


they are not?  What handles the interrupt?



* Re: Unexpected scheduling with mutexes
  2019-03-30 12:25   ` Ruben Safir
@ 2019-03-30 18:35     ` Greg KH
  0 siblings, 0 replies; 7+ messages in thread
From: Greg KH @ 2019-03-30 18:35 UTC (permalink / raw)
  To: Ruben Safir; +Cc: kernelnewbies

On Sat, Mar 30, 2019 at 08:25:57AM -0400, Ruben Safir wrote:
> On 3/29/19 4:01 PM, Greg KH wrote:
> > As an example of this, serial ports are not "exclusively owned", right?
> 
> 
> they are not?  What handles the interrupt?

Context is everything, and you cut out all of it here :(

The kernel handles the interrupt of course, the sentence was referring
to userspace interacting with the kernel, not anything else.

greg k-h


* Re: Unexpected scheduling with mutexes
  2019-03-29 20:01 ` Greg KH
  2019-03-29 21:45   ` Valdis Klētnieks
  2019-03-30 12:25   ` Ruben Safir
@ 2019-04-03  9:33   ` Martin Christian
  2019-04-03  9:48     ` Greg KH
  2 siblings, 1 reply; 7+ messages in thread
From: Martin Christian @ 2019-04-03  9:33 UTC (permalink / raw)
  Cc: kernelnewbies



Thanks a lot for the detailed reply!

>> For certain values of `X` there is a significant difference in size
>> between the two files, which I don't expect.
>>
>> A read call to the driver does the following:
>>  1. `mutex_lock_interruptible(iolock)`
>>  2. `usb_bulk_msg(dev, pipe, buf, X, timeout)`
>>  3. `mutex_unlock(iolock)`
>>  4. `copy_to_user(buf)`
> 
> What are these values of X that cause differences here?

Starting at around 1K, character device A gets more data, until that
turns over at around 4K. Request sizes from 10K upward yield the expected
data rates.

Character device A is a "real" random source and returns data much
slower than device B, which is a pseudo random source.

> But if you are trying to somehow create a real api that you have to
> enforce the passing off of writing data from two different character
> devices in an interleaved format, you are doing this totally wrong, as
> this is not going to work with a simple mutex, as you have found out.

As mentioned above, the USB device provides two different streams of
random. But the device can process only one request at a time. Also I
didn't want to have too much dynamic memory allocation, because I would
need to allocate up to 64KB kernel memory on each open.

That's because the USB device is designed to provide up to 64K of random
in a single "request". A request has a header and footer "protecting"
the request as a whole from data confusion.

To make things simpler I decided to just allow one user space process at
a time for each source - which is enough for our application. But yes,
that could probably also go to user space.

> So, I really haven't answered your question here except to say, "it's
> complicated" and "you aren't measuring what you think you are measuring" :)

Ok, I see.

Thanks,

Martin Christian


* Re: Unexpected scheduling with mutexes
  2019-04-03  9:33   ` Martin Christian
@ 2019-04-03  9:48     ` Greg KH
  0 siblings, 0 replies; 7+ messages in thread
From: Greg KH @ 2019-04-03  9:48 UTC (permalink / raw)
  To: Martin Christian; +Cc: kernelnewbies

On Wed, Apr 03, 2019 at 11:33:56AM +0200, Martin Christian wrote:
> >> For certain values of `X` there is a significant difference in size
> >> between the two files, which I don't expect.
> >>
> >> A read call to the driver does the following:
> >>  1. `mutex_lock_interruptible(iolock)`
> >>  2. `usb_bulk_msg(dev, pipe, buf, X, timeout)`
> >>  3. `mutex_unlock(iolock)`
> >>  4. `copy_to_user(buf)`
> > 
> > What are these values of X that cause differences here?
> 
> Starting at around 1K, character device A gets more data, until that
> turns over at around 4K. Request sizes from 10K upward yield the expected
> data rates.

Those are huge USB data stream sizes, what is the size of your USB
endpoints?

By doing large transfers like this, you are causing the USB core to do
all the work (which is fine), but while that happens, lots of other
things happen at the same time, making trying to measure things much
more difficult.

> Character device A is a "real" random source and returns data much
> slower than device B, which is a pseudo random source.

So those map to different USB device endpoints?

> > But if you are trying to somehow create a real api that you have to
> > enforce the passing off of writing data from two different character
> > devices in an interleaved format, you are doing this totally wrong, as
> > this is not going to work with a simple mutex, as you have found out.
> 
> As mentioned above, the USB device provides two different streams of
> random. But the device can process only one request at a time. Also I
> didn't want to have too much dynamic memory allocation, because I would
> need to allocate up to 64KB kernel memory on each open.

So your USB device can not handle data from different endpoints at the
same time?  Or is it multiplexing it on the same endpoint?  You need to
provide a bit more information about your device for us to be able to
help you out better.

> That's because the USB device is designed to provide up to 64K of random
> in a single "request". A request has a header and footer "protecting"
> the request as a whole from data confusion.

Who are you protecting the request from being confused by?  The
kernel?  Userspace?  Something else?

Why not just tie your device into the kernel's random number system like
other USB devices do that provide good entropy to the system?  That way
you don't have to do crazy things with character streams and blocking
requests :)

> To make things simpler I decided to just allow one user space process at
> a time for each source - which is enough for our application. But yes,
> that could probably also go to user space.

Again, why not just use the random services provided by the kernel, and
have your device feed that?  That way everyone benefits and you don't
have to do odd things and create a custom user api that no one else can
use.
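
To make the suggestion a bit more concrete: inside the kernel, devices
like this are normally hooked up through the hardware RNG framework
(`struct hwrng` and `hwrng_register()` from `<linux/hw_random.h>`).
Since a driver fragment cannot be run stand-alone, the sketch below
shows a related but different route, the userspace one taken by daemons
such as rngd: hand bytes read from the device to the kernel pool with
the `RNDADDENTROPY` ioctl on `/dev/random`. The helper names
`make_entropy_msg()` and `feed_entropy()` and the entropy estimate
passed in are assumptions of this sketch, and the ioctl needs
`CAP_SYS_ADMIN`.

```c
#include <fcntl.h>
#include <linux/random.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Build the RNDADDENTROPY argument: a header followed by the raw bytes.
 * entropy_bits is the caller's estimate of the entropy in the data.
 * Returns a malloc()ed message, or NULL on allocation failure.
 */
struct rand_pool_info *make_entropy_msg(const void *data, size_t len,
                                        int entropy_bits)
{
    struct rand_pool_info *msg = malloc(sizeof(*msg) + len);

    if (!msg)
        return NULL;
    msg->entropy_count = entropy_bits;   /* in bits */
    msg->buf_size = (int)len;            /* in bytes */
    memcpy(msg->buf, data, len);
    return msg;
}

/* Feed data into the kernel pool; returns 0 on success, -1 on error. */
int feed_entropy(const void *data, size_t len, int entropy_bits)
{
    struct rand_pool_info *msg = make_entropy_msg(data, len, entropy_bits);
    int fd, rc = -1;

    if (!msg)
        return -1;
    fd = open("/dev/random", O_WRONLY);
    if (fd >= 0) {
        rc = ioctl(fd, RNDADDENTROPY, msg) == 0 ? 0 : -1;
        close(fd);
    }
    free(msg);
    return rc;
}
```

Either way, applications then read from the kernel's normal random
interfaces and no custom character device API is needed.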

thanks,

greg k-h


Thread overview: 7+ messages
2019-03-27 10:56 Unexpected scheduling with mutexes Martin Christian
2019-03-29 20:01 ` Greg KH
2019-03-29 21:45   ` Valdis Klētnieks
2019-03-30 12:25   ` Ruben Safir
2019-03-30 18:35     ` Greg KH
2019-04-03  9:33   ` Martin Christian
2019-04-03  9:48     ` Greg KH
