* [PATCH] udev enhancements to use kernel event queue @ 2003-06-12 19:10 Steven Dake 2003-06-12 21:47 ` Patrick Mochel ` (4 more replies) 0 siblings, 5 replies; 73+ messages in thread From: Steven Dake @ 2003-06-12 19:10 UTC (permalink / raw) To: linux-kernel [-- Attachment #1: Type: text/plain, Size: 2004 bytes --] Folks, I have been looking at the udev idea that Greg KH has developed. Userland device enumeration definately is the way to go, however, there are some problems with using /sbin/hotplug to transmit device enumeration events: 1) performance of fork/exec with lots of devices is very poor 2) lots of processes can be forked with no policy to control how many are forked at once potentially leading to OOM 3) /sbin/hotplug events can occur out of order, eg: remove event occurs, /sbin/hotplug sleeps waiting for something, insert event occurs and completes immediately. Then the remove event completes, after the insert, resulting in out-of-order completion and a broken /dev. I have seen this several times with udev. 4) early device enumeration (character devices, etc) are missed because /sbin/hotplug is not available during early device init. To solve these problems, I've developed a driver and some minor modifications to the lib/kobject.c to queue these events and allow a user space program to read the events at its leisure. The driver supports poll/select for effecient use of the processor. The feature is called the System Device Enumeration Event Queue. I've attached a tarball which includes udev-0.1 modified to use the SDEQ, a kernel patch against linux 2.5.70 that implements the SDEQ, and a test/library that tests the SDEQ functionality. To jump right in try these steps: 1) apply the SDEQ patch 2) configure CONFIG_SDEQ on in config/general 3) make in the udev-0.1 directory 4) mkdir /newdevs (this is where the devices are enumerated) 5) reboot with new kernel 6) mount sysfs in /sys 7) mknod /dev/sdeq c 10 222 (this creates the device node that is used to communicate with the SDEQ) 7) run ./udev in udev-0.1 directory Look in /newdevs for enumerated devices. you can use fdisk on a disk device to see how dynamic device creation/deletion works. If it works for you or doesn't or you like the idea or don't, I've love to hear about it Thanks -steve [-- Attachment #2: sdeq.tar.gz --] [-- Type: application/x-gzip, Size: 20163 bytes --] ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-12 19:10 [PATCH] udev enhancements to use kernel event queue Steven Dake @ 2003-06-12 21:47 ` Patrick Mochel 2003-06-13 16:31 ` Steven Dake 2003-06-12 21:47 ` Greg KH ` (3 subsequent siblings) 4 siblings, 1 reply; 73+ messages in thread From: Patrick Mochel @ 2003-06-12 21:47 UTC (permalink / raw) To: Steven Dake; +Cc: linux-kernel *Sigh* You're asking for trouble, aren't you? For the new spectators in the crowd, Steven, Greg and I have been on a private email thread about this. This is the original email that he sent, with all comments made to him completely disregarded.. > I have been looking at the udev idea that Greg KH has developed. > Userland device enumeration definately is the way to go, however, there > are some problems with using /sbin/hotplug to transmit device > enumeration events: > > 1) performance of fork/exec with lots of devices is very poor Please define: - 'lots' - 'poor' The performance of /sbin/hotplug doesn't matter. It's a blip on the radar. If you have lots of devices (e.g. 1000 SCSI disks) the time it takes to fork + exec /sbin/hotplug is nothing compared to what it would take to spin up and probe each of the disks. If you don't have a lot of devices, then you're going to be spending a lot less time running /sbin/hotplug. So little that it's not going to make a difference, I'm willing to wager. If you have accurate time measurements, then I would be interested in seeing them. Though, I request that you test the following: - The default /sbin/hotplug for Redhat 9 systems - The default /sbin/hotplug for Montavista systems. - Greg's mini-hotplug - Your sdeq Please test against an identical kernel, with and without your patch. Please also post links to (or attach) the /sbin/hotplug binaries and scripts that you tested with. And, please post the complete 'tree -d' output of sysfs on the system(s) you tested on. > 2) lots of processes can be forked with no policy to control how many > are forked at once potentially leading to OOM Have you seen this happen? Sure, it's theoretically possible, but I highly doubt it will ever happen on a real system. Typically, if you have an astronomical number of devices, then you'll have an incredible amount of memory, too. Please post a bug report when you see this. > 3) /sbin/hotplug events can occur out of order, eg: remove event occurs, > /sbin/hotplug sleeps waiting for something, insert event occurs and > completes immediately. Then the remove event completes, after the > insert, resulting in out-of-order completion and a broken /dev. I have > seen this several times with udev. This is true, and I'll let Greg handle this one. > 4) early device enumeration (character devices, etc) are missed because > /sbin/hotplug is not available during early device init. initramfs, as well as one of many cold-plugging solutions out there should suffice for this. If they don't, we would welcome and appreciate your effort in helping overcome these deficiencies by evolving the current system rather than attempting a hostile takeover. > To solve these problems, I've developed a driver and some minor > modifications to the lib/kobject.c to queue these events and allow a > user space program to read the events at its leisure. The driver > supports poll/select for effecient use of the processor. Listen, it's not going to fly. Greg is hotplug czar, and I won't take the patches if he doesn't like them. Secondly, you're just not going to replace hotplug. At least not now. /sbin/hotplug works today and has no serious, provable issues, besides events coming in out of order. Incredibly long boot times, OOM situations just have not happened. And, we've agreed that we're not going to implement an overdesigned solution to fix problems that aren't there yet. Show us REAL bugs and we'll work on making what we have better. Thirdly, /sbin/hotplug is an ASCII interface, providing flexibility in the userspace agent implementations. Your character device (which is not appreciated) forces the userspace tools to use select(2), poll(2), or ioctl(2). No simple read(2)/write(2). Binary interfaces == Bad(tm). If you have doubts, please read the threads concerning binary vs. ASCII interfaces over the last two years before replying. I'm not even going to go as far as comment on the code, since conceptually its FITH. Finally, I think you need a serious attitude readjustment. You've exceeded all expectations in the level of jackass that you've acheived. You completley disregarded Greg's comments in a private thread started from the SAME EXACT email. You were pompous and arrogant to the person whose code you're trying to replace. Then, you have the nerve to start over on this list. Did you think we wouldn't notice? If anything, you've guaranteed that I will never take your emails seriously again. I really wish the device driver people at Montavista would pool their collective partial clues and figure WTF the rest of the world is doing. Do you guys listen? Do you read email? Hello? McFly? I'm frankly sick of you people wasting our time with these half-assed, hare-brained hacks that you expect us to love and integrate. You never try to evolve the current system; it's always a case of rewriting something that at least works with a completely new interface. You don't listen to our input, and you disappear after a few emails. Then, you try it all again a few months later. It's frustrating because you're obviously talented and have the time+energy to help, but you never seem to attempt to align yourselves with us or the rest of the kernel. Please, either a) re-read the email threads, the papers, the magazine articles, and/or the text file documentation and attempt to work with us; or b) stop trying to get us to listen. -pat ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-12 21:47 ` Patrick Mochel @ 2003-06-13 16:31 ` Steven Dake 2003-06-13 17:06 ` Patrick Mochel 0 siblings, 1 reply; 73+ messages in thread From: Steven Dake @ 2003-06-13 16:31 UTC (permalink / raw) To: Patrick Mochel; +Cc: linux-kernel Patrick Mochel wrote: >*Sigh* You're asking for trouble, aren't you? > >For the new spectators in the crowd, Steven, Greg and I have been on a >private email thread about this. This is the original email that he sent, >with all comments made to him completely disregarded.. > > > >>I have been looking at the udev idea that Greg KH has developed. >>Userland device enumeration definately is the way to go, however, there >>are some problems with using /sbin/hotplug to transmit device >>enumeration events: >> >>1) performance of fork/exec with lots of devices is very poor >> >> > >Please define: > >- 'lots' > > i tested 12 devices >- 'poor' > 80% slower for enumeration. > > >The performance of /sbin/hotplug doesn't matter. It's a blip on the radar. >If you have lots of devices (e.g. 1000 SCSI disks) the time it takes to >fork + exec /sbin/hotplug is nothing compared to what it would take to >spin up and probe each of the disks. If you don't have a lot of devices, >then you're going to be spending a lot less time running /sbin/hotplug. So >little that it's not going to make a difference, I'm willing to wager. > >If you have accurate time measurements, then I would be interested in >seeing them. Though, I request that you test the following: > >- The default /sbin/hotplug for Redhat 9 systems >- The default /sbin/hotplug for Montavista systems. >- Greg's mini-hotplug >- Your sdeq > >Please test against an identical kernel, with and without your patch. >Please also post links to (or attach) the /sbin/hotplug binaries and >scripts that you tested with. And, please post the complete 'tree -d' >output of sysfs on the system(s) you tested on. > > > >>2) lots of processes can be forked with no policy to control how many >>are forked at once potentially leading to OOM >> >> > >Have you seen this happen? Sure, it's theoretically possible, but I highly >doubt it will ever happen on a real system. Typically, if you have an >astronomical number of devices, then you'll have an incredible amount of >memory, too. Please post a bug report when you see this. > > I have not seen this happen, however, it seems clear that it would happen if /sbin/hotplug were called a sufficient number of times. > > >>3) /sbin/hotplug events can occur out of order, eg: remove event occurs, >>/sbin/hotplug sleeps waiting for something, insert event occurs and >>completes immediately. Then the remove event completes, after the >>insert, resulting in out-of-order completion and a broken /dev. I have >>seen this several times with udev. >> >> > >This is true, and I'll let Greg handle this one. > > > >>4) early device enumeration (character devices, etc) are missed because >>/sbin/hotplug is not available during early device init. >> >> > >initramfs, as well as one of many cold-plugging solutions out there should >suffice for this. If they don't, we would welcome and appreciate your >effort in helping overcome these deficiencies by evolving the current >system rather than attempting a hostile takeover. > > I am not attempting a hostile takeover. The patch provides an ADDITIONAL solution to what is already available which significantly improves performance. > > >>To solve these problems, I've developed a driver and some minor >>modifications to the lib/kobject.c to queue these events and allow a >>user space program to read the events at its leisure. The driver >>supports poll/select for effecient use of the processor. >> >> > >Listen, it's not going to fly. Greg is hotplug czar, and I won't take the >patches if he doesn't like them. > > This is not a hotplug issue, it is strictly a system device enumeration issue. >Secondly, you're just not going to replace hotplug. At least not now. >/sbin/hotplug works today and has no serious, provable issues, besides >events coming in out of order. Incredibly long boot times, OOM situations >just have not happened. And, we've agreed that we're not going to >implement an overdesigned solution to fix problems that aren't there yet. >Show us REAL bugs and we'll work on making what we have better. > > I have no intention of replacing hotplug. The patch is in addition to hotplug, and uses the good infrastructure Greg and yourself have already put in place. >Thirdly, /sbin/hotplug is an ASCII interface, providing flexibility in the >userspace agent implementations. Your character device (which is not >appreciated) forces the userspace tools to use select(2), poll(2), or >ioctl(2). No simple read(2)/write(2). Binary interfaces == Bad(tm). If you >have doubts, please read the threads concerning binary vs. ASCII >interfaces over the last two years before replying. > > I believe it is highly desireable to be able to use select. Imagine the case where the kernel generates an event, and some external agent, that wants to enumerate devices, must also generate an event. This allows a C program to select which fd to use for I/O. I don't understand the problems with binary interfaces, as they are numerous in the kernel. For an example, take a look at the syscall interface! >I'm not even going to go as far as comment on the code, since conceptually >its FITH. > >Finally, I think you need a serious attitude readjustment. You've exceeded >all expectations in the level of jackass that you've acheived. You >completley disregarded Greg's comments in a private thread started from >the SAME EXACT email. You were pompous and arrogant to the person whose >code you're trying to replace. Then, you have the nerve to start over on >this list. Did you think we wouldn't notice? If anything, you've >guaranteed that I will never take your emails seriously again. > > I did not disregard Greg's comments. The bottom line is the performance issues and out-of-order issues have not been addressed. Perhaps the out-of-order issue can be addressed, but performance of /sbin/hotplug will _always_ be slower then the operation of one process on an event queue. You can wait until enough people bitch about the performance of /sbin/hotplug and evolve /sbin/hotplug into the scheme I propose, or you can take this patch as is. I really don't care as I don't make patches available for my benefit; they are for the benefit of the community. Finally, I didn't realize the sandbox was full. If you don't want to play with my toys, you certainly don't have to. Thanks -steve >I really wish the device driver people at Montavista would pool their >collective partial clues and figure WTF the rest of the world is doing. >Do you guys listen? Do you read email? Hello? McFly? > >I'm frankly sick of you people wasting our time with these half-assed, >hare-brained hacks that you expect us to love and integrate. You never try >to evolve the current system; it's always a case of rewriting something >that at least works with a completely new interface. You don't listen to >our input, and you disappear after a few emails. Then, you try it all >again a few months later. > >It's frustrating because you're obviously talented and have the >time+energy to help, but you never seem to attempt to align yourselves >with us or the rest of the kernel. > >Please, either a) re-read the email threads, the papers, the magazine >articles, and/or the text file documentation and attempt to work with us; >or b) stop trying to get us to listen. > > > -pat > > > > > > > ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-13 16:31 ` Steven Dake @ 2003-06-13 17:06 ` Patrick Mochel 0 siblings, 0 replies; 73+ messages in thread From: Patrick Mochel @ 2003-06-13 17:06 UTC (permalink / raw) To: Steven Dake; +Cc: linux-kernel > >Please define: > > > >- 'lots' > > > > > i tested 12 devices > > >- 'poor' > > > 80% slower for enumeration. Prove it. :) > >>2) lots of processes can be forked with no policy to control how many > >>are forked at once potentially leading to OOM > >> > >> > > > >Have you seen this happen? Sure, it's theoretically possible, but I highly > >doubt it will ever happen on a real system. Typically, if you have an > >astronomical number of devices, then you'll have an incredible amount of > >memory, too. Please post a bug report when you see this. > > > > > I have not seen this happen, however, it seems clear that it would > happen if /sbin/hotplug were called a sufficient number of times. You have the same problem, theoretically. If the daemon dies, the event queue will fill and the system will crash. With the current scheme, user processes will be killed and we will lose hotplug events. With your scheme, the entire system will crash eventually. > >Thirdly, /sbin/hotplug is an ASCII interface, providing flexibility in the > >userspace agent implementations. Your character device (which is not > >appreciated) forces the userspace tools to use select(2), poll(2), or > >ioctl(2). No simple read(2)/write(2). Binary interfaces == Bad(tm). If you > >have doubts, please read the threads concerning binary vs. ASCII > >interfaces over the last two years before replying. > > > > > I believe it is highly desireable to be able to use select. Imagine the > case where the kernel generates an event, and some external agent, that > wants to enumerate devices, must also generate an event. This allows a C > program to select which fd to use for I/O. I don't understand the > problems with binary interfaces, as they are numerous in the kernel. For > an example, take a look at the syscall interface! That doesn't mean they're the right thing to do in all cases. Binary interfaces may offer better performance, but they're harder to maintain, harder to debug, and harder to write agents to use them. You may be so smart and arrogant to disregard those things as trite, but they are important, from a long-term development standpoint. Again, please read the archives concerning this topic on linux-kernel before commenting on this subject again. > I did not disregard Greg's comments. The bottom line is the performance > issues and out-of-order issues have not been addressed. Perhaps the > out-of-order issue can be addressed, but performance of /sbin/hotplug > will _always_ be slower then the operation of one process on an event > queue. You can wait until enough people bitch about the performance of > /sbin/hotplug and evolve /sbin/hotplug into the scheme I propose, or you > can take this patch as is. I really don't care as I don't make patches > available for my benefit; they are for the benefit of the community. Ok, I'll wait. I would rather see slow, marked improvements then blindly applying and believing the whims of some random kernel hacker under the farce of altruism and increased availability. > Finally, I didn't realize the sandbox was full. If you don't want to > play with my toys, you certainly don't have to. It isn't full, and I will not touch your toys, thank you very much. But, one of the first social rules that people usually learn in the sandbox is that in order to play with the other kids, you have to play their games with their rules. -pat ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-12 19:10 [PATCH] udev enhancements to use kernel event queue Steven Dake 2003-06-12 21:47 ` Patrick Mochel @ 2003-06-12 21:47 ` Greg KH 2003-06-12 22:03 ` Andrew Morton 2003-06-13 15:51 ` Steven Dake 2003-06-12 22:27 ` Oliver Neukum ` (2 subsequent siblings) 4 siblings, 2 replies; 73+ messages in thread From: Greg KH @ 2003-06-12 21:47 UTC (permalink / raw) To: Steven Dake; +Cc: linux-kernel Nice, you posted the same message and tarball to lkml that we have been talking about privately for a few days. Just to save time, and to bring everyone else up to speed, I'll just include my responses and then yours to this thread and then finally respond to your last message to me about this. Let me know if I get any of the attribution wrong, this is going to be a fun cut and paste... On Thu, Jun 12, 2003 at 12:10:48PM -0700, Steven Dake wrote: > Folks, > > I have been looking at the udev idea that Greg KH has developed. > Userland device enumeration definately is the way to go, however, there > are some problems with using /sbin/hotplug to transmit device > enumeration events: > > 1) performance of fork/exec with lots of devices is very poor You later said your binary of udev was 500k. I said: 500k binary? Who's running anything that big? udev weighs in around 6k right now! You have to add a _lot_ of functionality to move up to 500k. You responded: Hmm perhaps my binary was not built optmized when I looked at it, I'll take another look. > 2) lots of processes can be forked with no policy to control how many > are forked at once potentially leading to OOM I responded: No, this is pure speculation. Have you actually tried this out? I have. I've hotplugged pci drawers connected to huge numbers of SCSI disks and never even seen a blip on system load. It turns out that pretty much everything happens consecutively, with a reasonable amount of time just spent probing the SCSI busses. USB is pretty slow in enumerating their devices too, so I don't see a real world application that has this problem. Do you have any proof of this? You responded: I don't agree. I've timed bootup with 12 fibrechannel disks with 4 partitions each with udev using the SDEF patches. I've then timed bootup and added in the time required to add those 12 fibrechannel disks (via a script) using /sbin/hotplug. The script method (executing /sbin/hotplug 12*4 times with the appropriate data hardcoded) takes rougly 1.25-1.75 seconds longer. Since I'm using a chronograph, its not very accurate, but it is definately slower to enumerate with /sbin/hotplug using this methodlogy. That is only 12 disks. What about the case of 1000 disks? fork and exec are very expensive system calls, to be avoided at all costs... To which I responded: Well, if you don't dynamically link a 500K file, perhaps it would be a lot faster to start up :) Anyway, who really cares? This is _not_ a time critical thing. Who really cares that it takes 1 second longer to see the device node for the newly plugged in device? And if you plug in 5000 devices all at once, you better be willing to wait the extra minute or so for the system to calm down and make all of the devices available to userspace. And again you came back with: System bootup time and time between failures are the two factors used to determine availability. Reducing bootup time and increasing time between failures both improve availability. Even 1 second is considerable when 5 9s is 30seconds of downtime per year. 1 more second is a considerable change in availability when you run the equations. I now respond: Bootup time reduction would be a great thing to have. And I agree for situations where 1 second is a considerable amount of time, it is important. However, 99.99% of the people running Linux out there, do not have those problems. And even then, for those 00.01% of the people who do, they do whatever they possibly can to keep from having to reboot at all. I still say the measurable ammount of time to plug in a new disk and have it's device node created is not measurable, from using /sbin/hotplug and udev, and using your character node. Also, the ultimate solution for these kinds of people is the in-kernel devfs, or the current static /dev. If they use either of them, it's guaranteed to be faster then yours or mine implementation. > 3) /sbin/hotplug events can occur out of order, eg: remove event occurs, > /sbin/hotplug sleeps waiting for something, insert event occurs and > completes immediately. Then the remove event completes, after the > insert, resulting in out-of-order completion and a broken /dev. I have > seen this several times with udev. I responded: Yes this happens. I have a fix for this for udev itself. No kernel changes are needed. I'll show it at OLS in July if you want to see it :) You responded: There is a problem in udev that fork doesn't wait for the parent to exit (mknod) which is solved by using the mknod system call. But there is still another problem that /sbin/hotplug can execute out of order. I'll be at OLS so I'll take a look at your solution then. So we agree to talk about this at OLS, fine. > 4) early device enumeration (character devices, etc) are missed because > /sbin/hotplug is not available during early device init. I responded: Then use early initramfs to catch this. I've done this and seen it work already. If you don't want to do early init, walk the sysfs tree yourself from userspace, it works just as well, and we don't have to do any buffering in kernelspace at all. You responded: Initramfs still misses early events. That leaves walking sysfs. I have considered and rejected using a user space program to walk sysfs for this information. There are several problems 1) kobject device name is not present in sysfs (how do you know if its a character or block, then without some external mapping db, yuck) 2) scan of sysfs is not atomic, resulting in potential lost events and lost device enumerations I responded: You don't know this from hotplug either. You have to figure it out from the kobject name either way. Actually it's quite easy, only block devices show up in /sys/block, everything else is a character device. No, this is quite simple to fix: - set up /sbin/hotplug udev handler after init has started. - start walking sysfs to "cold plug" devices. - any new events that happen are caught by udev and handled there, while you are walking the tree. - no events are lost. You responded: Events could be handled in this situation twice, which is definately not what is desireable. I now respond: Doing a mknod() on a node that is already present doesn't harm anything. And since you have to keep a database around of what devices are present, and what you have called them, removing them after already removing them again, is detectable. So handing the same event twice isn't a problem. > To solve these problems, I've developed a driver and some minor > modifications to the lib/kobject.c to queue these events and allow a > user space program to read the events at its leisure. The driver > supports poll/select for effecient use of the processor. > > The feature is called the System Device Enumeration Event Queue. I've > attached a tarball which includes udev-0.1 modified to use the SDEQ, a > kernel patch against linux 2.5.70 that implements the SDEQ, and a > test/library that tests the SDEQ functionality. I responded: Ok, I looked at your code. To put it kindly, it will never work properly. See the long thread on lkml by Inaky after I announced udev. He tried to create something like what you have done, but got a big better implementation (yours has a number of memory leaks, and permission problems...) See that thread for why doing this in the kernel is not the way to go. Please don't waste your time on this anymore, it's a loosing battle... You responded: It obviously works properly when using the udev application so I am at a loss as to how you can make a claim that "it will never work". Please explain your comments about memory leaks. Data structures are allocated and then freed at appropriate times. Even in the case of a crash during a transaciton (get event, clera events), there are no leaks. Please explain permission problems. Permissions are controlled through policy in the filesystem (/dev/sdeq) which fully controls access to the system device enumeration event queue. I did look at Inaky's code, and frankly it was too complicated to solve the simple problem that needed to be solved. We also have a full-blown event mechanism (for redundant system slot compact pci support), but feel its too heavy-weight to be of general purpose use to the kernel. This particular implementation is lightweight and solves a specific need with real applications that can really use the functionality. I have read the thread and atleast 70% of the comments where "we should not use /sbin/hotplug, but instead events should be queued in the kernel". The thread raised all of the issues that are solved with this kernel patch. Several core kernel maintainers were the authors of the comments, so I'm not alone in this belief. Your obviously opposed to using character devices to queue kobject creation/deletion events for some reason which I don't understand. /sbin/hotplug is an inferior solution to queueing events in the kernel. Your not the maintainer of the code this feature is going into (which is the linux driver model). I'd really like to hear Pat's thoughts on this issue as he is the one that has to live with the changes. If you want to continue using /sbin/hotplug, with all of its numerous problems for your work, your more then welcome to do so. No one is forcing you to use sdeq, however, for those other vendors that may want to make persistent device naming solutions, this is a core feature that can not easily be solved by poorly thought out hacks. I will respond now: Ok, permission problems can be solved by relying on the permission of the device node, you are correct. As for the memory issues, if no one ever reads from the character node, it will constantly fill up with events, right? Shouldn't they be eventually flushed out at some point in time? Also for trying to remove events, why is userspace allowed to remove a multiple of events at once? I would think that a valid read would remove that event, you can have two applications grab the same event, which is not what I think you want to have happen. And finally... Yes, I know I'm not the maintainer of the code that this feature is going into. But my position is that this code is not needed in the main kernel tree, and offer that opinion to the maintainer of this code. I agree, most of the arguments in the previous udev thread talked about out of order events, but I think I've gotten that solved now (I can wave my hands about how it will be done, or I can show you working code at OLS. I think I'll wait until OLS, as code matters, wavy hands don't.) So, because of this, I still think that everything you are trying to do can be successfully done from userspace today (or use devfsd today, it will give you the same info!) And due to that, I do not think this code is necessary to be in the kernel. Thanks for reading this far, and my apologies if I got any of the above quotes out of order or misspoken, I just wanted to cut to the chase, and not have to go through 4 more rounds of email to make it to this point in the conversation again. greg k-h ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-12 21:47 ` Greg KH @ 2003-06-12 22:03 ` Andrew Morton 2003-06-12 22:50 ` Greg KH 2003-06-13 8:40 ` Alan Cox 2003-06-13 15:51 ` Steven Dake 1 sibling, 2 replies; 73+ messages in thread From: Andrew Morton @ 2003-06-12 22:03 UTC (permalink / raw) To: Greg KH; +Cc: sdake, linux-kernel Greg KH <greg@kroah.com> wrote: > > > 3) /sbin/hotplug events can occur out of order, eg: remove event occurs, > > /sbin/hotplug sleeps waiting for something, insert event occurs and > > completes immediately. Then the remove event completes, after the > > insert, resulting in out-of-order completion and a broken /dev. I have > > seen this several times with udev. > > I responded: > Yes this happens. I have a fix for this for udev itself. No > kernel changes are needed. I'll show it at OLS in July if you > want to see it :) This is a significantly crappy aspect of the /sbin/hotplug callout. I'd be very interested in reading an outline of how you propose fixing it, without waiting until OLS, thanks. ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-12 22:03 ` Andrew Morton @ 2003-06-12 22:50 ` Greg KH 2003-06-12 22:51 ` Andrew Morton ` (2 more replies) 2003-06-13 8:40 ` Alan Cox 1 sibling, 3 replies; 73+ messages in thread From: Greg KH @ 2003-06-12 22:50 UTC (permalink / raw) To: Andrew Morton; +Cc: sdake, linux-kernel On Thu, Jun 12, 2003 at 03:03:35PM -0700, Andrew Morton wrote: > Greg KH <greg@kroah.com> wrote: > > > > > 3) /sbin/hotplug events can occur out of order, eg: remove event occurs, > > > /sbin/hotplug sleeps waiting for something, insert event occurs and > > > completes immediately. Then the remove event completes, after the > > > insert, resulting in out-of-order completion and a broken /dev. I have > > > seen this several times with udev. > > > > I responded: > > Yes this happens. I have a fix for this for udev itself. No > > kernel changes are needed. I'll show it at OLS in July if you > > want to see it :) > > This is a significantly crappy aspect of the /sbin/hotplug callout. I'd be > very interested in reading an outline of how you propose fixing it, without > waiting until OLS, thanks. Sure, I knew someone would probably want to :) Anyway, here's what I've come up with, feel free to shoot it full of holes: <handwaving> - serialize the hotplug events in userspace: - udev daemon running listening on named pipe - small event generator kicked off from /sbin/hotplug call to write event to udev pipe This alone solves the major memory issues that people have complained about, and allows us to keep a ram database of present devices and their names, which a lot of people want to have. It also makes the /sbin/hotplug binary even smaller than 6k :) - apply debounce on events: - get event, delay Tbus amount of time. - after time expires, check queue to see if we have any other events for this device. - if not, this is the only one, act on it. - if so, delay Tbus amount of time again. - continue delaying until no new events for this device are present. - count up events for this device, and throw away the odd ones (even vs. odd, i.e. 2 adds and 1 remove really mean add the device.) - if both counts are even, then leave device at its current state (added or removed) but check device attributes to see if we had named it a "special" name. If so, need to make sure "special" name is still correct. If not, fix it. Now the whole trick is coming up with the Tbus time limit :) For all physical busses, it takes a decent amount of time to add or remove a device (in the seconds for PCI and USB). It's pretty hard to get these events out of order in the first place, except on a _very_ heavily loaded system (I've tried.) It's easier to get events out of order for virtual devices (like scsi-debug). That's why a different time value for different busses makes sense. So if Tbus is too small, we do get events out of order, make Tbus too big, and we start delaying too long, and get a real deep queue. So it's better to leave Tbus too big to be safe, more testing of proper values is essential before I even start to claim that this will work for all people, but I do think it is possible. One other thing that I think will work is making it a sliding scale (if we get an event, for example, for add, and the device is already there, increase Tbus and throw it back on the queue (and don't delete the other ones) and sleep again). This too needs a lot of testing. The above sequence has seemed to work pretty well for me so far, but it needs a lot more work and tweaking with real life loads. I've also been sidetracked recently with other work, but should have code to show by OLS... </handwaving> thanks, greg k-h ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-12 22:50 ` Greg KH @ 2003-06-12 22:51 ` Andrew Morton 2003-06-12 23:02 ` Greg KH 2003-06-12 23:05 ` Robert Love 2003-06-19 19:51 ` Werner Almesberger 2 siblings, 1 reply; 73+ messages in thread From: Andrew Morton @ 2003-06-12 22:51 UTC (permalink / raw) To: Greg KH; +Cc: sdake, linux-kernel Greg KH <greg@kroah.com> wrote: > > <handwaving> heh. Thought about adding a sequence number to the /sbin/hotplug argument list? ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-12 22:51 ` Andrew Morton @ 2003-06-12 23:02 ` Greg KH 2003-06-12 23:09 ` Greg KH 0 siblings, 1 reply; 73+ messages in thread From: Greg KH @ 2003-06-12 23:02 UTC (permalink / raw) To: Andrew Morton; +Cc: sdake, linux-kernel On Thu, Jun 12, 2003 at 03:51:48PM -0700, Andrew Morton wrote: > Greg KH <greg@kroah.com> wrote: > > > > <handwaving> > > heh. > > Thought about adding a sequence number to the /sbin/hotplug argument list? /me owes you a lot of beer at ols. That's a world simpler than my proposal, I think I'll go write that patch right now... thanks, greg k-h ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-12 23:02 ` Greg KH @ 2003-06-12 23:09 ` Greg KH 2003-06-12 23:14 ` Patrick Mochel 2003-06-12 23:16 ` Robert Love 0 siblings, 2 replies; 73+ messages in thread From: Greg KH @ 2003-06-12 23:09 UTC (permalink / raw) To: Patrick Mochel; +Cc: Andrew Morton, sdake, linux-kernel On Thu, Jun 12, 2003 at 04:02:47PM -0700, Greg KH wrote: > On Thu, Jun 12, 2003 at 03:51:48PM -0700, Andrew Morton wrote: > > Greg KH <greg@kroah.com> wrote: > > > > > > <handwaving> > > > > heh. > > > > Thought about adding a sequence number to the /sbin/hotplug argument list? > > /me owes you a lot of beer at ols. > > That's a world simpler than my proposal, I think I'll go write that > patch right now... Pat, here's a patch to add a sequence number to kobject hotplug calls to help userspace out a lot. thanks, greg k-h # kobject: add sequence number to hotplug events to help userspace out. diff -Nru a/lib/kobject.c b/lib/kobject.c --- a/lib/kobject.c Thu Jun 12 16:05:06 2003 +++ b/lib/kobject.c Thu Jun 12 16:05:06 2003 @@ -100,6 +100,7 @@ #define BUFFER_SIZE 1024 /* should be enough memory for the env */ #define NUM_ENVP 32 /* number of env pointers */ +static unsigned long sequence_num; static void kset_hotplug(const char *action, struct kset *kset, struct kobject *kobj) { @@ -151,6 +152,10 @@ envp [i++] = scratch; scratch += sprintf(scratch, "ACTION=%s", action) + 1; + + envp [i++] = scratch; + scratch += sprintf(scratch, "SEQNUM=%ld", sequence_num) + 1; + ++sequence_num; kobj_path_length = get_kobj_path_length (kset, kobj); kobj_path = kmalloc (kobj_path_length, GFP_KERNEL); ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-12 23:09 ` Greg KH @ 2003-06-12 23:14 ` Patrick Mochel 2003-06-12 23:16 ` Robert Love 1 sibling, 0 replies; 73+ messages in thread From: Patrick Mochel @ 2003-06-12 23:14 UTC (permalink / raw) To: Greg KH; +Cc: Andrew Morton, sdake, linux-kernel > Pat, here's a patch to add a sequence number to kobject hotplug calls to > help userspace out a lot. Thanks, applied. -pat ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-12 23:09 ` Greg KH 2003-06-12 23:14 ` Patrick Mochel @ 2003-06-12 23:16 ` Robert Love 2003-06-12 23:25 ` Greg KH 2003-06-12 23:25 ` Patrick Mochel 1 sibling, 2 replies; 73+ messages in thread From: Robert Love @ 2003-06-12 23:16 UTC (permalink / raw) To: Greg KH; +Cc: Patrick Mochel, Andrew Morton, sdake, linux-kernel On Thu, 2003-06-12 at 16:09, Greg KH wrote: > + > + envp [i++] = scratch; > + scratch += sprintf(scratch, "SEQNUM=%ld", sequence_num) + 1; > + ++sequence_num; Since I do not see a lock in here, I think you need to use atomics? As is, you can also race and have the same sequence_num value written out twice. Robert Love ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-12 23:16 ` Robert Love @ 2003-06-12 23:25 ` Greg KH 2003-06-13 20:01 ` Oliver Neukum 2003-06-12 23:25 ` Patrick Mochel 1 sibling, 1 reply; 73+ messages in thread From: Greg KH @ 2003-06-12 23:25 UTC (permalink / raw) To: Robert Love; +Cc: Patrick Mochel, Andrew Morton, sdake, linux-kernel On Thu, Jun 12, 2003 at 04:16:03PM -0700, Robert Love wrote: > On Thu, 2003-06-12 at 16:09, Greg KH wrote: > > > + > > + envp [i++] = scratch; > > + scratch += sprintf(scratch, "SEQNUM=%ld", sequence_num) + 1; > > + ++sequence_num; > > Since I do not see a lock in here, I think you need to use atomics? > > As is, you can also race and have the same sequence_num value written > out twice. Doh, I ate too much for lunch today and need to wake up... Pat, here's an updated patch. thanks, greg k-h diff -Nru a/lib/kobject.c b/lib/kobject.c --- a/lib/kobject.c Thu Jun 12 16:21:21 2003 +++ b/lib/kobject.c Thu Jun 12 16:21:21 2003 @@ -100,6 +100,8 @@ #define BUFFER_SIZE 1024 /* should be enough memory for the env */ #define NUM_ENVP 32 /* number of env pointers */ +static unsigned long sequence_num; +static spinlock_t sequence_lock = SPIN_LOCK_UNLOCKED; static void kset_hotplug(const char *action, struct kset *kset, struct kobject *kobj) { @@ -151,6 +153,12 @@ envp [i++] = scratch; scratch += sprintf(scratch, "ACTION=%s", action) + 1; + + spin_lock(&sequence_lock); + envp [i++] = scratch; + scratch += sprintf(scratch, "SEQNUM=%ld", sequence_num) + 1; + ++sequence_num; + spin_unlock(&sequence_lock); kobj_path_length = get_kobj_path_length (kset, kobj); kobj_path = kmalloc (kobj_path_length, GFP_KERNEL); ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-12 23:25 ` Greg KH @ 2003-06-13 20:01 ` Oliver Neukum 2003-06-18 22:59 ` Greg KH 0 siblings, 1 reply; 73+ messages in thread From: Oliver Neukum @ 2003-06-13 20:01 UTC (permalink / raw) To: Greg KH, Robert Love; +Cc: Patrick Mochel, Andrew Morton, sdake, linux-kernel > + > + spin_lock(&sequence_lock); > + envp [i++] = scratch; > + scratch += sprintf(scratch, "SEQNUM=%ld", sequence_num) + 1; > + ++sequence_num; > + spin_unlock(&sequence_lock); > > kobj_path_length = get_kobj_path_length (kset, kobj); > kobj_path = kmalloc (kobj_path_length, GFP_KERNEL); If this kmalloc fails, you'll have a hole in the numbers and user space will be very confused. You need to report dropped events if you do this. Regards Oliver ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-13 20:01 ` Oliver Neukum @ 2003-06-18 22:59 ` Greg KH 2003-06-19 0:12 ` Kevin P. Fleming 0 siblings, 1 reply; 73+ messages in thread From: Greg KH @ 2003-06-18 22:59 UTC (permalink / raw) To: Oliver Neukum Cc: Robert Love, Patrick Mochel, Andrew Morton, sdake, linux-kernel On Fri, Jun 13, 2003 at 10:01:47PM +0200, Oliver Neukum wrote: > > > + > > + spin_lock(&sequence_lock); > > + envp [i++] = scratch; > > + scratch += sprintf(scratch, "SEQNUM=%ld", sequence_num) + 1; > > + ++sequence_num; > > + spin_unlock(&sequence_lock); > > > > kobj_path_length = get_kobj_path_length (kset, kobj); > > kobj_path = kmalloc (kobj_path_length, GFP_KERNEL); > > If this kmalloc fails, you'll have a hole in the numbers and > user space will be very confused. You need to report dropped > events if you do this. Yes, we should add the sequence number last. Good catch. greg k-h ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-18 22:59 ` Greg KH @ 2003-06-19 0:12 ` Kevin P. Fleming 2003-06-19 0:20 ` Greg KH ` (2 more replies) 0 siblings, 3 replies; 73+ messages in thread From: Kevin P. Fleming @ 2003-06-19 0:12 UTC (permalink / raw) To: Greg KH Cc: Oliver Neukum, Robert Love, Patrick Mochel, Andrew Morton, sdake, linux-kernel Greg KH wrote: >>If this kmalloc fails, you'll have a hole in the numbers and >>user space will be very confused. You need to report dropped >>events if you do this. > > > Yes, we should add the sequence number last. > While this is not a bad idea, I don't think you want to make a promise to userspace that there will never be gaps in the sequence numbers. When this sequence number was proposed, in my mind it seemed perfect because then userspace could _order_ multiple events for the same device to ensure they got processed in the correct order. I don't know that any hotplug userspace implementation is going to be large and complex enough to warrant "holding" events until lower-numbered events have been delivered. That just seems like a very difficult task with little potential gain, but I could very well be mistaken :-) ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-19 0:12 ` Kevin P. Fleming @ 2003-06-19 0:20 ` Greg KH 2003-06-19 0:42 ` Kevin P. Fleming 2003-06-19 0:25 ` Andrew Morton 2003-06-19 6:27 ` Oliver Neukum 2 siblings, 1 reply; 73+ messages in thread From: Greg KH @ 2003-06-19 0:20 UTC (permalink / raw) To: Kevin P. Fleming Cc: Oliver Neukum, Robert Love, Patrick Mochel, Andrew Morton, sdake, linux-kernel On Wed, Jun 18, 2003 at 05:12:50PM -0700, Kevin P. Fleming wrote: > Greg KH wrote: > > >>If this kmalloc fails, you'll have a hole in the numbers and > >>user space will be very confused. You need to report dropped > >>events if you do this. > > > > > >Yes, we should add the sequence number last. > > > > While this is not a bad idea, I don't think you want to make a promise > to userspace that there will never be gaps in the sequence numbers. When > this sequence number was proposed, in my mind it seemed perfect because > then userspace could _order_ multiple events for the same device to > ensure they got processed in the correct order. I don't know that any > hotplug userspace implementation is going to be large and complex enough > to warrant "holding" events until lower-numbered events have been > delivered. That just seems like a very difficult task with little > potential gain, but I could very well be mistaken :-) Yes you are :) You will have to handle gaps properly yes. But you also have to "hold" events for a bit of time in order to determine that things are out of order, or we have a gap. So a bit of "complex" logic is in the works, but it's much less complex than if we didn't have that sequence number. thanks, greg k-h ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-19 0:20 ` Greg KH @ 2003-06-19 0:42 ` Kevin P. Fleming 2003-06-19 0:51 ` Greg KH 0 siblings, 1 reply; 73+ messages in thread From: Kevin P. Fleming @ 2003-06-19 0:42 UTC (permalink / raw) To: Greg KH Cc: Oliver Neukum, Robert Love, Patrick Mochel, Andrew Morton, sdake, linux-kernel Greg KH wrote: > But you also have to "hold" events for a bit of time in order to > determine that things are out of order, or we have a gap. So a bit of > "complex" logic is in the works, but it's much less complex than if we > didn't have that sequence number. > OK, so the important point here is that while you probably delay events for a small amount of time (1-3 seconds maybe) to ensure that you aren't processing steps out of order, it's not likely that userspace is going hold event #1321 for an indefinite period of time just because it has never seen event #1320. I can't wait to see this implementation; it's going to be interesting, to say the least. However, without the sequence numbers, it would very likely be impossible. ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-19 0:42 ` Kevin P. Fleming @ 2003-06-19 0:51 ` Greg KH 0 siblings, 0 replies; 73+ messages in thread From: Greg KH @ 2003-06-19 0:51 UTC (permalink / raw) To: Kevin P. Fleming Cc: Oliver Neukum, Robert Love, Patrick Mochel, Andrew Morton, sdake, linux-kernel On Wed, Jun 18, 2003 at 05:42:45PM -0700, Kevin P. Fleming wrote: > Greg KH wrote: > > >But you also have to "hold" events for a bit of time in order to > >determine that things are out of order, or we have a gap. So a bit of > >"complex" logic is in the works, but it's much less complex than if we > >didn't have that sequence number. > > > > OK, so the important point here is that while you probably delay events > for a small amount of time (1-3 seconds maybe) to ensure that you aren't > processing steps out of order, it's not likely that userspace is going > hold event #1321 for an indefinite period of time just because it has > never seen event #1320. Exactly. > I can't wait to see this implementation; it's going to be interesting, > to say the least. However, without the sequence numbers, it would very > likely be impossible. Hey, I still think my originally proposed version would have worked, eventually... :) greg k-h ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-19 0:12 ` Kevin P. Fleming 2003-06-19 0:20 ` Greg KH @ 2003-06-19 0:25 ` Andrew Morton 2003-06-19 6:27 ` Oliver Neukum 2 siblings, 0 replies; 73+ messages in thread From: Andrew Morton @ 2003-06-19 0:25 UTC (permalink / raw) To: Kevin P. Fleming; +Cc: greg, oliver, rml, mochel, sdake, linux-kernel "Kevin P. Fleming" <kpfleming@cox.net> wrote: > > > Yes, we should add the sequence number last. > > > > While this is not a bad idea, I don't think you want to make a promise > to userspace that there will never be gaps in the sequence numbers. When > this sequence number was proposed, in my mind it seemed perfect because > then userspace could _order_ multiple events for the same device to > ensure they got processed in the correct order. I don't know that any > hotplug userspace implementation is going to be large and complex enough > to warrant "holding" events until lower-numbered events have been > delivered. That just seems like a very difficult task with little > potential gain, but I could very well be mistaken :-) The userspace support tools need to be able to handle gaps in any case, because call_usermodehelper() may fail. (In practice it won't fail, because the memory allocator is immortal. But the capability should be there. (Well actually, it could fail because the VM overcommit code might refuse the mmap. (But probably not, because root gets an extra margin.))) ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-19 0:12 ` Kevin P. Fleming 2003-06-19 0:20 ` Greg KH 2003-06-19 0:25 ` Andrew Morton @ 2003-06-19 6:27 ` Oliver Neukum 2 siblings, 0 replies; 73+ messages in thread From: Oliver Neukum @ 2003-06-19 6:27 UTC (permalink / raw) To: Kevin P. Fleming, Greg KH Cc: Robert Love, Patrick Mochel, Andrew Morton, sdake, linux-kernel Am Donnerstag, 19. Juni 2003 02:12 schrieb Kevin P. Fleming: > Greg KH wrote: > >>If this kmalloc fails, you'll have a hole in the numbers and > >>user space will be very confused. You need to report dropped > >>events if you do this. > > > > Yes, we should add the sequence number last. > > While this is not a bad idea, I don't think you want to make a promise > to userspace that there will never be gaps in the sequence numbers. When > this sequence number was proposed, in my mind it seemed perfect because > then userspace could _order_ multiple events for the same device to > ensure they got processed in the correct order. I don't know that any > hotplug userspace implementation is going to be large and complex enough > to warrant "holding" events until lower-numbered events have been > delivered. That just seems like a very difficult task with little > potential gain, but I could very well be mistaken :-) You cannot order events unless you hold such events. One event always arrives first. If it's the lower numbered, the point is moot. If it's the higher numbered, you'll need to hold it or there's no ordering. For the paranoid even that is not enough. A hotplug script may die in user space due to OOM oe EIO. Regards Oliver ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-12 23:16 ` Robert Love 2003-06-12 23:25 ` Greg KH @ 2003-06-12 23:25 ` Patrick Mochel 2003-06-12 23:29 ` Robert Love 1 sibling, 1 reply; 73+ messages in thread From: Patrick Mochel @ 2003-06-12 23:25 UTC (permalink / raw) To: Robert Love; +Cc: Greg KH, Andrew Morton, sdake, linux-kernel On 12 Jun 2003, Robert Love wrote: > On Thu, 2003-06-12 at 16:09, Greg KH wrote: > > > + > > + envp [i++] = scratch; > > + scratch += sprintf(scratch, "SEQNUM=%ld", sequence_num) + 1; > > + ++sequence_num; > > Since I do not see a lock in here, I think you need to use atomics? > > As is, you can also race and have the same sequence_num value written > out twice. Yeah, how about this ammended patch? -pat ===== lib/kobject.c 1.25 vs edited ===== --- 1.25/lib/kobject.c Wed Jun 11 12:06:06 2003 +++ edited/lib/kobject.c Thu Jun 12 16:24:26 2003 @@ -100,6 +100,9 @@ #define BUFFER_SIZE 1024 /* should be enough memory for the env */ #define NUM_ENVP 32 /* number of env pointers */ +static unsigned long sequence_num; +static spinlock_t sequence_lock = SPIN_LOCK_UNLOCKED; + static void kset_hotplug(const char *action, struct kset *kset, struct kobject *kobj) { @@ -112,6 +115,7 @@ int kobj_path_length; char *kobj_path = NULL; char *name = NULL; + unsigned long seq; /* If the kset has a filter operation, call it. If it returns failure, no hotplug event is required. */ @@ -151,6 +155,13 @@ envp [i++] = scratch; scratch += sprintf(scratch, "ACTION=%s", action) + 1; + + spin_lock(&sequence_lock); + seq = sequence_num++; + spin_unlock(&sequence_lock); + + envp [i++] = scratch; + scratch += sprintf(scratch, "SEQNUM=%ld", seq) + 1; kobj_path_length = get_kobj_path_length (kset, kobj); kobj_path = kmalloc (kobj_path_length, GFP_KERNEL); ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-12 23:25 ` Patrick Mochel @ 2003-06-12 23:29 ` Robert Love 2003-06-12 23:32 ` Greg KH 2003-06-12 23:34 ` Patrick Mochel 0 siblings, 2 replies; 73+ messages in thread From: Robert Love @ 2003-06-12 23:29 UTC (permalink / raw) To: Patrick Mochel; +Cc: Greg KH, Andrew Morton, sdake, linux-kernel On Thu, 2003-06-12 at 16:25, Patrick Mochel wrote: > Yeah, how about this ammended patch? Both this and Greg's look fine. I guess this is preferred, since the lock hold time is shorter :) > + spin_lock(&sequence_lock); > + seq = sequence_num++; > + spin_unlock(&sequence_lock); > + > + envp [i++] = scratch; > + scratch += sprintf(scratch, "SEQNUM=%ld", seq) + 1; Nice thinking. It is a shame we need a lock for this, but we don't have an atomic_inc_and_return(). Robert Love ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-12 23:29 ` Robert Love @ 2003-06-12 23:32 ` Greg KH 2003-06-12 23:34 ` Patrick Mochel 1 sibling, 0 replies; 73+ messages in thread From: Greg KH @ 2003-06-12 23:32 UTC (permalink / raw) To: Robert Love; +Cc: Patrick Mochel, Andrew Morton, sdake, linux-kernel On Thu, Jun 12, 2003 at 04:29:25PM -0700, Robert Love wrote: > On Thu, 2003-06-12 at 16:25, Patrick Mochel wrote: > > > Yeah, how about this ammended patch? > > Both this and Greg's look fine. > > I guess this is preferred, since the lock hold time is shorter :) Heh, I don't care either way, It's up to Pat... thanks for pointing this out, greg k-h ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-12 23:29 ` Robert Love 2003-06-12 23:32 ` Greg KH @ 2003-06-12 23:34 ` Patrick Mochel 2003-06-12 23:40 ` Paul Mackerras ` (2 more replies) 1 sibling, 3 replies; 73+ messages in thread From: Patrick Mochel @ 2003-06-12 23:34 UTC (permalink / raw) To: Robert Love; +Cc: Greg KH, Andrew Morton, sdake, linux-kernel > > + spin_lock(&sequence_lock); > > + seq = sequence_num++; > > + spin_unlock(&sequence_lock); > > + > > + envp [i++] = scratch; > > + scratch += sprintf(scratch, "SEQNUM=%ld", seq) + 1; > > Nice thinking. It is a shame we need a lock for this, but we don't have > an atomic_inc_and_return(). Those were my sentiments exactly: 16:21 * mochel searches for atomic_inc_and_read() :) It seems like the following should work. Would someone mind commenting on it? Thanks, -pat ===== include/asm/atomic.h 1.4 vs edited ===== --- 1.4/include/asm-i386/atomic.h Mon Feb 4 23:42:13 2002 +++ edited/include/asm/atomic.h Thu Jun 12 16:32:10 2003 @@ -110,6 +110,23 @@ :"m" (v->counter)); } + +/** + * atomic_inc_and_read - increment atomic variable and return new value + * @v: pointer of type atomic_t + * + * Atomically increments @v by 1. Note that the guaranteed + * useful range of an atomic_t is only 24 bits. + */ +static inline int atomic_inc_and_read(atomic_t *v) +{ + __asm__ __volatile__( + LOCK "incl %0" + :"=m" (v->counter) + :"m" (v->counter)); + return v->counter; +} + /** * atomic_dec - decrement atomic variable * @v: pointer of type atomic_t ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-12 23:34 ` Patrick Mochel @ 2003-06-12 23:40 ` Paul Mackerras 2003-06-12 23:50 ` Robert Love ` (3 more replies) 2003-06-12 23:42 ` Robert Love 2003-06-12 23:45 ` Davide Libenzi 2 siblings, 4 replies; 73+ messages in thread From: Paul Mackerras @ 2003-06-12 23:40 UTC (permalink / raw) To: Patrick Mochel; +Cc: Robert Love, Greg KH, Andrew Morton, sdake, linux-kernel Patrick Mochel writes: > It seems like the following should work. Would someone mind commenting on > it? > +/** > + * atomic_inc_and_read - increment atomic variable and return new value > + * @v: pointer of type atomic_t > + * > + * Atomically increments @v by 1. Note that the guaranteed > + * useful range of an atomic_t is only 24 bits. > + */ > +static inline int atomic_inc_and_read(atomic_t *v) > +{ > + __asm__ __volatile__( > + LOCK "incl %0" > + :"=m" (v->counter) > + :"m" (v->counter)); > + return v->counter; > +} BZZZT. If another CPU is also doing atomic_inc_and_read you could end up with both calls returning the same value. You can't do atomic_inc_and_read on 386. You can on cpus that have cmpxchg (e.g. later x86). You can also on machines with load-locked and store-conditional instructions (alpha, ppc, probably most other RISCs). Paul. ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-12 23:40 ` Paul Mackerras @ 2003-06-12 23:50 ` Robert Love 2003-06-13 12:44 ` Richard B. Johnson 2003-06-12 23:52 ` Patrick Mochel ` (2 subsequent siblings) 3 siblings, 1 reply; 73+ messages in thread From: Robert Love @ 2003-06-12 23:50 UTC (permalink / raw) To: Paul Mackerras Cc: Patrick Mochel, Greg KH, Andrew Morton, sdake, linux-kernel On Thu, 2003-06-12 at 16:40, Paul Mackerras wrote: > BZZZT. If another CPU is also doing atomic_inc_and_read you could end > up with both calls returning the same value. That is what I thought. Damn. > You can't do atomic_inc_and_read on 386. You can on cpus that have > cmpxchg (e.g. later x86). You can also on machines with load-locked > and store-conditional instructions (alpha, ppc, probably most other > RISCs). So this is doable on everything but old i386 chips... hrm. Robert Love ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-12 23:50 ` Robert Love @ 2003-06-13 12:44 ` Richard B. Johnson 2003-06-13 12:54 ` Olivier Galibert 0 siblings, 1 reply; 73+ messages in thread From: Richard B. Johnson @ 2003-06-13 12:44 UTC (permalink / raw) To: Robert Love Cc: Paul Mackerras, Patrick Mochel, Greg KH, Andrew Morton, sdake, linux-kernel On Thu, 12 Jun 2003, Robert Love wrote: > On Thu, 2003-06-12 at 16:40, Paul Mackerras wrote: > > > BZZZT. If another CPU is also doing atomic_inc_and_read you could end > > up with both calls returning the same value. > > That is what I thought. Damn. > > > You can't do atomic_inc_and_read on 386. You can on cpus that have > > cmpxchg (e.g. later x86). You can also on machines with load-locked > > and store-conditional instructions (alpha, ppc, probably most other > > RISCs). > > So this is doable on everything but old i386 chips... hrm. > > Robert Love > No! They do not return the same value. They just don't return the final value, and the final value might not be important if the atomic operations are used correctly. The atomic stuff is to get rid of the read/modify/write problem where you have a read *INTERRUPT* (other read, other modify, other write), modify, write. Now the operand is nothing like expected. The atomic operations guarantee that the memory operand will, in fact, be completely modified as expected. Reading any value after such modification does not necessarily return the final value because somebody could modify it (again atomically), before you get a chance to read it. So, if you have N atomic_inc() and N atomic_dec() operations, the value will be whatever it was before the first operation. In other words, the value will be correct. FYI, all memory modify operations, not using an intermediate register, in 32-bit mode, of a longword or less, on ix86 machines are atomic, even without the lock prefix. like: memory: .long 0 incl (memory) incw (memory) incb (memory) This is because the CPU does not load the operand, modify it, then write it back. It might "physically" do this in hardware, but the physical operation cannot be interrupted until complete. So, in principle, the lock instruction isn't necessary for things like above. If you really need to know what the value of the memory operand was at the instant it was modified, you need use the locked/exchange instructions. Normally, you don't need to know the exact value of some semaphore, etc., only that some conditions were true or false at the instant of a test. Cheers, Dick Johnson Penguin : Linux version 2.4.20 on an i686 machine (797.90 BogoMips). Why is the government concerned about the lunatic fringe? Think about it. ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-13 12:44 ` Richard B. Johnson @ 2003-06-13 12:54 ` Olivier Galibert 0 siblings, 0 replies; 73+ messages in thread From: Olivier Galibert @ 2003-06-13 12:54 UTC (permalink / raw) To: linux-kernel On Fri, Jun 13, 2003 at 08:44:52AM -0400, Richard B. Johnson wrote: > FYI, all memory modify operations, not using an intermediate register, > in 32-bit mode, of a longword or less, on ix86 machines are atomic, > even without the lock prefix. Unless you're on SMP. Of course 80386 SMP is not really what people want to do, but people may compile an "universal" 386 SMP kernel and run it on a later SMP box. OG. ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-12 23:40 ` Paul Mackerras 2003-06-12 23:50 ` Robert Love @ 2003-06-12 23:52 ` Patrick Mochel 2003-06-13 8:41 ` Alan Cox 2003-06-13 11:17 ` David Schwartz 3 siblings, 0 replies; 73+ messages in thread From: Patrick Mochel @ 2003-06-12 23:52 UTC (permalink / raw) To: Paul Mackerras; +Cc: Robert Love, Greg KH, Andrew Morton, sdake, linux-kernel > You can't do atomic_inc_and_read on 386. You can on cpus that have > cmpxchg (e.g. later x86). You can also on machines with load-locked > and store-conditional instructions (alpha, ppc, probably most other > RISCs). Damn, so be it. Ho hum, back to the spinlock I go.. -pat ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-12 23:40 ` Paul Mackerras 2003-06-12 23:50 ` Robert Love 2003-06-12 23:52 ` Patrick Mochel @ 2003-06-13 8:41 ` Alan Cox 2003-06-13 11:00 ` Paul Mackerras 2003-06-13 11:17 ` David Schwartz 3 siblings, 1 reply; 73+ messages in thread From: Alan Cox @ 2003-06-13 8:41 UTC (permalink / raw) To: Paul Mackerras Cc: Patrick Mochel, Robert Love, Greg KH, Andrew Morton, sdake, Linux Kernel Mailing List On Gwe, 2003-06-13 at 00:40, Paul Mackerras wrote: > You can't do atomic_inc_and_read on 386. You can on cpus that have lock xaddl $1, [foo] ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-13 8:41 ` Alan Cox @ 2003-06-13 11:00 ` Paul Mackerras 2003-06-13 11:07 ` Herbert Xu 2003-06-13 13:31 ` Alan Cox 0 siblings, 2 replies; 73+ messages in thread From: Paul Mackerras @ 2003-06-13 11:00 UTC (permalink / raw) To: Alan Cox Cc: Patrick Mochel, Robert Love, Greg KH, Andrew Morton, sdake, Linux Kernel Mailing List Alan Cox writes: > lock xaddl $1, [foo] On 386? I stand corrected. :) Paul. ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-13 11:00 ` Paul Mackerras @ 2003-06-13 11:07 ` Herbert Xu 2003-06-13 13:31 ` Alan Cox 1 sibling, 0 replies; 73+ messages in thread From: Herbert Xu @ 2003-06-13 11:07 UTC (permalink / raw) To: Paul Mackerras, linux-kernel Paul Mackerras <paulus@samba.org> wrote: > Alan Cox writes: > >> lock xaddl $1, [foo] > > On 386? I stand corrected. :) My instruction manual lists it as 486 only. -- Debian GNU/Linux 3.0 is out! ( http://www.debian.org/ ) Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-13 11:00 ` Paul Mackerras 2003-06-13 11:07 ` Herbert Xu @ 2003-06-13 13:31 ` Alan Cox 2003-06-13 19:57 ` Joe Korty 1 sibling, 1 reply; 73+ messages in thread From: Alan Cox @ 2003-06-13 13:31 UTC (permalink / raw) To: Paul Mackerras Cc: Patrick Mochel, Robert Love, Greg KH, Andrew Morton, sdake, Linux Kernel Mailing List On Gwe, 2003-06-13 at 12:00, Paul Mackerras wrote: > Alan Cox writes: > > > lock xaddl $1, [foo] > > On 386? I stand corrected. :) Turns out its 486 and higher, so you win. ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-13 13:31 ` Alan Cox @ 2003-06-13 19:57 ` Joe Korty 2003-06-13 21:10 ` Alan Cox 0 siblings, 1 reply; 73+ messages in thread From: Joe Korty @ 2003-06-13 19:57 UTC (permalink / raw) To: Alan Cox Cc: Paul Mackerras, Patrick Mochel, Robert Love, Greg KH, Andrew Morton, sdake, Linux Kernel Mailing List On Fri, Jun 13, 2003 at 02:31:40PM +0100, Alan Cox wrote: > On Gwe, 2003-06-13 at 12:00, Paul Mackerras wrote: > > Alan Cox writes: > > > > > lock xaddl $1, [foo] > > > > On 386? I stand corrected. :) > > Turns out its 486 and higher, so you win. Perhaps it's time to remove 386 support from 2.5? Users of very old machines can stick with 2.0, 2.2 or 2.4. Joe ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-13 19:57 ` Joe Korty @ 2003-06-13 21:10 ` Alan Cox 0 siblings, 0 replies; 73+ messages in thread From: Alan Cox @ 2003-06-13 21:10 UTC (permalink / raw) To: joe.korty Cc: Paul Mackerras, Patrick Mochel, Robert Love, Greg KH, Andrew Morton, sdake, Linux Kernel Mailing List On Gwe, 2003-06-13 at 20:57, Joe Korty wrote: > > Turns out its 486 and higher, so you win. > > > Perhaps it's time to remove 386 support from 2.5? Users > of very old machines can stick with 2.0, 2.2 or 2.4. You have to solve these problems generally anyway, and lots of other platforms have limited locking functions. What it means is that all sane PC hardware can do it efficiently ^ permalink raw reply [flat|nested] 73+ messages in thread
* RE: [PATCH] udev enhancements to use kernel event queue 2003-06-12 23:40 ` Paul Mackerras ` (2 preceding siblings ...) 2003-06-13 8:41 ` Alan Cox @ 2003-06-13 11:17 ` David Schwartz 2003-06-13 15:44 ` Davide Libenzi 3 siblings, 1 reply; 73+ messages in thread From: David Schwartz @ 2003-06-13 11:17 UTC (permalink / raw) To: Paul Mackerras; +Cc: linux-kernel Pual Mackerras is said to have opined: > Patrick Mochel writes: > > +static inline int atomic_inc_and_read(atomic_t *v) > > +{ > > + __asm__ __volatile__( > > + LOCK "incl %0" > > + :"=m" (v->counter) > > + :"m" (v->counter)); > > + return v->counter; > > +} > BZZZT. If another CPU is also doing atomic_inc_and_read you could end > up with both calls returning the same value. > > You can't do atomic_inc_and_read on 386. You can on cpus that have > cmpxchg (e.g. later x86). You can also on machines with load-locked > and store-conditional instructions (alpha, ppc, probably most other > RISCs). You can also do it with a conditional move instruction, but it's kind of ugly. No help on a '386 though. There are ways to do it that work on a 386, but they are all basically equivalent to (or worse than) acquiring a spinlock, doing the deed, and then releasing it. You could also do (in pseudo-code): top: ret <- v->counter inc ret LOCK incl v->counter cmp v->counter, ret jz end LOCK decl v->counter jmp top: end: return ret This does not strictly guarantee in order return values, but that's meaningless without a lock anyway. DS ^ permalink raw reply [flat|nested] 73+ messages in thread
* RE: [PATCH] udev enhancements to use kernel event queue 2003-06-13 11:17 ` David Schwartz @ 2003-06-13 15:44 ` Davide Libenzi 0 siblings, 0 replies; 73+ messages in thread From: Davide Libenzi @ 2003-06-13 15:44 UTC (permalink / raw) To: David Schwartz; +Cc: Linux Kernel Mailing List On Fri, 13 Jun 2003, David Schwartz wrote: > You could also do (in pseudo-code): > > top: > ret <- v->counter > inc ret > LOCK incl v->counter > cmp v->counter, ret > jz end > LOCK decl v->counter > jmp top: > end: > return ret > > This does not strictly guarantee in order return values, but that's > meaningless without a lock anyway. It is even worse since it can return same values. Suppose counter is 0 : TASK0 TASK1 top: ret <- v->counter ; ret = 0 inc ret ; ret -> 1 LOCK incl v->counter ; counter -> 1 top: ret <- v->counter ; ret = 1 inc ret ; ret -> 2 LOCK incl v->counter ; counter -> 2 cmp v->counter, ret ; match !! got 2 jz end ... cmp v->counter, ret ; ret == 1 , counter == 2 no-match jz end LOCK decl v->counter ; counter -> 1 jmp top Then TASK0 starts again with no interruption and gets 2. You need instrucions that does atomic mod/cmp to do this kind of tricks. - Davide ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-12 23:34 ` Patrick Mochel 2003-06-12 23:40 ` Paul Mackerras @ 2003-06-12 23:42 ` Robert Love 2003-06-12 23:45 ` Davide Libenzi 2 siblings, 0 replies; 73+ messages in thread From: Robert Love @ 2003-06-12 23:42 UTC (permalink / raw) To: Patrick Mochel; +Cc: Greg KH, Andrew Morton, sdake, linux-kernel On Thu, 2003-06-12 at 16:34, Patrick Mochel wrote: > > Nice thinking. It is a shame we need a lock for this, but we don't > > have an atomic_inc_and_return(). > > Those were my sentiments exactly: Heh. > +static inline int atomic_inc_and_read(atomic_t *v) > +{ > + __asm__ __volatile__( > + LOCK "incl %0" > + :"=m" (v->counter) > + :"m" (v->counter)); > + return v->counter; > +} What prevents a race between the increment and the return (i.e. you return v->counter after another person also increments it)? Only the increment is guaranteed atomic. I think you need to copy the result of the increment into a local variable _inside_ of the LOCK and return that. Whether or not that will work sanely on all architectures I dunno. Robert Love ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-12 23:34 ` Patrick Mochel 2003-06-12 23:40 ` Paul Mackerras 2003-06-12 23:42 ` Robert Love @ 2003-06-12 23:45 ` Davide Libenzi 2 siblings, 0 replies; 73+ messages in thread From: Davide Libenzi @ 2003-06-12 23:45 UTC (permalink / raw) To: Patrick Mochel Cc: Robert Love, Greg KH, Andrew Morton, sdake, Linux Kernel Mailing List On Thu, 12 Jun 2003, Patrick Mochel wrote: > > > > + spin_lock(&sequence_lock); > > > + seq = sequence_num++; > > > + spin_unlock(&sequence_lock); > > > + > > > + envp [i++] = scratch; > > > + scratch += sprintf(scratch, "SEQNUM=%ld", seq) + 1; > > > > Nice thinking. It is a shame we need a lock for this, but we don't have > > an atomic_inc_and_return(). > > Those were my sentiments exactly: > > 16:21 * mochel searches for atomic_inc_and_read() :) > > It seems like the following should work. Would someone mind commenting on > it? It does not Pat, look at the generated asm. - Davide ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-12 22:50 ` Greg KH 2003-06-12 22:51 ` Andrew Morton @ 2003-06-12 23:05 ` Robert Love 2003-06-19 19:51 ` Werner Almesberger 2 siblings, 0 replies; 73+ messages in thread From: Robert Love @ 2003-06-12 23:05 UTC (permalink / raw) To: Greg KH; +Cc: Andrew Morton, sdake, linux-kernel On Thu, 2003-06-12 at 15:50, Greg KH wrote: > - serialize the hotplug events in userspace: > - udev daemon running listening on named pipe > - small event generator kicked off from /sbin/hotplug > call to write event to udev pipe What if you just passed a sequence number to /sbin/hotplug ? Robert Love ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-12 22:50 ` Greg KH 2003-06-12 22:51 ` Andrew Morton 2003-06-12 23:05 ` Robert Love @ 2003-06-19 19:51 ` Werner Almesberger 2003-06-26 12:17 ` Tommi Virtanen 2 siblings, 1 reply; 73+ messages in thread From: Werner Almesberger @ 2003-06-19 19:51 UTC (permalink / raw) To: Greg KH; +Cc: Andrew Morton, sdake, linux-kernel Greg KH wrote: > On Thu, Jun 12, 2003 at 03:03:35PM -0700, Andrew Morton wrote: > > This is a significantly crappy aspect of the /sbin/hotplug callout. I'd be > > very interested in reading an outline of how you propose fixing it, without > > waiting until OLS, thanks. > > Sure, I knew someone would probably want to :) An interesting problem, with a lot of potential for wrong solutions ;-) A few comments from an interested bystander: 1) Reordering: First of all, it doesn't seem to be clear whether the ordering mechanism as a whole is supposed to be reliable. E.g. your timeout-based approach would fix reordering with a certain probability, but occasionally, reordering would still occur. (This is similar to IP networks: a little but of reordering is okay, but if you do it a lot, TCP performance will suffer.) Also, you don't try to correct losses, right ? I.e. if we get events A and B, A is lost, then B is handled without any attempt to retry A first. Right ? If you want a reliable reordering detection, /sbin/hotplug needs to know what happens with concurrent events, e.g. if A and B occur in parallel, B cannot be executed unless we know that A has either succeeded or failed. Sequence numbers alone don't help (you could, of course, combine them with timeouts, and end up with a more efficent version of your original timeout approach). You can track concurrent events in a number of ways, e.g. by going back to the kernel, and asking how many "older" instances of /sbin/hotplug are running, or they could communicate directly among each other. E.g. the kernel could open a pipe for each /sbin/hotplug, and give each /sbin/hotplug a duplicate of the reader end of the pipe of each concurrently running /sbin/hotplug. /sbin/hotplug could then poll them, and wail until all those fds are closed. What I don't quite understand is why you won't want to serialize in the kernel. The overall resource footprint of sleeping /sbin/hotplugs, and such, is certainly much larger than a few queued event messages. Furthermore, if you serialize in the kernel, you can easily and reliably indicate how many events have been lost since the last one. 2) Communication mechanism: Given all these complications, I'm not so sure if just sending messages (ASCII, EBCDIC, binary, Haiku, whatever ;-) wouldn't be more convenient in the end. You could dispatch them later to scripts easily enough. But since time doesn't seem to be an issue (more about that below), you could of course also use the same concept we use for interrupts: make /sbin/hotplug a "fast" interface script, which then delegates the real work to some other process. Given that it's a bit of an uphill battle to write user-space programs that absolutely never fail, it may also be good to have some completion signal that can be used to keep track of dropped events, and that can then be used to trigger a recovery procedure. A central dispatcher would also have the advantage of possessing full information on the events being handled. That way, events without interdependencies could still be processed in parallel. 3) We're in no hurry at all: Sorry, I don't buy this. You're of course right that bus systems that need some slow, timeout-based scan won't initialize quickly anyway. But it wouldn't be all that hard to make them faster, and then the hotplug interface would become the bottleneck. Example: put 1000 SCSI drives in a cabinet with a door switch. When the door is open, do things in the usual, slow way. When the door is closed, no drives can be added or removed, so the system could cache bus scan results and synthesize NACKs for absent devices. So I think it makes sense to avoid obvious bottlenecks in this design. Therefore, as long as any kind of serialization is required, and unless the kernel itself knows what can be parallelized, and what not, a dispatcher demon looks like the most light-weight solution. If you're worried about having yet another rarely used demon in the system, just make /sbin/hotplug persistent. If it is idle for too long, it can exit, and when there are new events, the kernel can launch it again. 4) Losses: Actually, I'm not so sure what really ought to happen with losses. If we have serialization requirements elsewhere, proceeding after unrecovered losses would probably mean that they're being violated. So if they can be violated then, maybe there is some leeway in other circumstances too ? On the other hand, if any loss means that major surgery is needed, the interface should probably have a "in loss" state, in which it just shuts down until someone cleans up the mess. Also a partial shutdown may be interesting (e.g. implemented by the dispatcher), where events with no interdependencies with other events would still be processed. The kernel could even provide some hints of what has been lost. (I.e. aggregate any information in the lost events.) That way, the partial shutdown would be even more efficient. But I think I'm overdesining now :-) Anyway, just my 2 centavos :-) - Werner -- _________________________________________________________________________ / Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net / /_http://www.almesberger.net/____________________________________________/ ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-19 19:51 ` Werner Almesberger @ 2003-06-26 12:17 ` Tommi Virtanen 2003-06-26 14:35 ` Werner Almesberger 0 siblings, 1 reply; 73+ messages in thread From: Tommi Virtanen @ 2003-06-26 12:17 UTC (permalink / raw) To: Werner Almesberger; +Cc: Greg KH, Andrew Morton, sdake, linux-kernel On Thu, Jun 19, 2003 at 04:51:35PM -0300, Werner Almesberger wrote: > 4) Losses: > > Actually, I'm not so sure what really ought to happen with > losses. If we have serialization requirements elsewhere, > proceeding after unrecovered losses would probably mean that > they're being violated. So if they can be violated then, > maybe there is some leeway in other circumstances too ? > > On the other hand, if any loss means that major surgery is > needed, the interface should probably have a "in loss" state, > in which it just shuts down until someone cleans up the mess. > Also a partial shutdown may be interesting (e.g. implemented > by the dispatcher), where events with no interdependencies > with other events would still be processed. One thing came to my mind: If you have a sysfs-scanning method for startup, couldn't you just make the sequence-number-checking daemon reset its state and redo the sysfs scan on loss of events? (Or even make it just exec itself and use the exact same code as at startup.) That way the system recovers from event loss (or a reordering that gets the earlier event too late and is believed to be a loss) in a way that needs to work anyway, and isn't a magic special case. -- :(){ :|:&};: ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-26 12:17 ` Tommi Virtanen @ 2003-06-26 14:35 ` Werner Almesberger 0 siblings, 0 replies; 73+ messages in thread From: Werner Almesberger @ 2003-06-26 14:35 UTC (permalink / raw) To: Tommi Virtanen; +Cc: Greg KH, Andrew Morton, sdake, linux-kernel Tommi Virtanen wrote: > If you have a sysfs-scanning method for startup, couldn't you > just make the sequence-number-checking daemon reset its state > and redo the sysfs scan on loss of events? Yes, that's the easier approach if you don't have any detection of error in the kernel itself. If the kernel already does all the work for figuring out that something has gone wrong, it may as well use this to reduce the noise. > That way the system recovers from event loss (or a reordering > that gets the earlier event too late and is believed to be a > loss) in a way that needs to work anyway, and isn't a magic > special case. Let's just hope reordering stays dead :-) There's still a bit of magic in loss recovery, because you need something that triggers loss recovery, after the loss has happened. That can of course just be whatever else happens to come along (or the user getting impatient), or a periodic scan. - Werner -- _________________________________________________________________________ / Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net / /_http://www.almesberger.net/____________________________________________/ ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-12 22:03 ` Andrew Morton 2003-06-12 22:50 ` Greg KH @ 2003-06-13 8:40 ` Alan Cox 2003-06-13 9:14 ` Olivier Galibert ` (2 more replies) 1 sibling, 3 replies; 73+ messages in thread From: Alan Cox @ 2003-06-13 8:40 UTC (permalink / raw) To: Andrew Morton; +Cc: Greg KH, sdake, Linux Kernel Mailing List On Iau, 2003-06-12 at 23:03, Andrew Morton wrote: > This is a significantly crappy aspect of the /sbin/hotplug callout. I'd be > very interested in reading an outline of how you propose fixing it, without > waiting until OLS, thanks. Dave Miller posted a simple patch to allow netlink to be used for kernel->user messages - hotplug/disk error/logging/whatever. I'm suprised therefore that the whole thing is being regurgitated again. ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-13 8:40 ` Alan Cox @ 2003-06-13 9:14 ` Olivier Galibert 2003-06-19 20:53 ` Werner Almesberger 2003-06-13 16:05 ` Steven Dake 2003-06-13 16:32 ` Greg KH 2 siblings, 1 reply; 73+ messages in thread From: Olivier Galibert @ 2003-06-13 9:14 UTC (permalink / raw) To: Linux Kernel Mailing List On Fri, Jun 13, 2003 at 09:40:37AM +0100, Alan Cox wrote: > Dave Miller posted a simple patch to allow netlink to be used for > kernel->user messages - hotplug/disk error/logging/whatever. I'm > suprised therefore that the whole thing is being regurgitated again. "Netlink is not a reliable protocol." sez the manpage. That sounds a little ridiculous for a local kernel<->user message protocol. Having to manage acks on that kind of communication is both painful and quasi-untestable. OG. ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-13 9:14 ` Olivier Galibert @ 2003-06-19 20:53 ` Werner Almesberger 0 siblings, 0 replies; 73+ messages in thread From: Werner Almesberger @ 2003-06-19 20:53 UTC (permalink / raw) To: Olivier Galibert, Linux Kernel Mailing List Olivier Galibert wrote: > "Netlink is not a reliable protocol." sez the manpage. > > That sounds a little ridiculous for a local kernel<->user message > protocol. There are many way of being unreliable. If it means "bad things happen and nobody is told about it", then yes, this would be strange in this context. Howver, "loss due to congestion" is a perfectly reasonable behaviour (*), and netlink indicates when this has occurred. If you get more events than you can queue or compress (e.g. if you have on/off events but are only interested in the current status, you could aggregate any number of such events in just one bit), congestion is possible. > Having to manage acks on that kind of communication is both > painful and quasi-untestable. <ObPlug> Ha ! That's exactly the kind of stuff I'm writing umlsim for :-) http://umlsim.sourceforge.net/ </ObPlug> - Werner -- _________________________________________________________________________ / Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net / /_http://www.almesberger.net/____________________________________________/ ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-13 8:40 ` Alan Cox 2003-06-13 9:14 ` Olivier Galibert @ 2003-06-13 16:05 ` Steven Dake 2003-06-13 16:32 ` Greg KH 2 siblings, 0 replies; 73+ messages in thread From: Steven Dake @ 2003-06-13 16:05 UTC (permalink / raw) To: Alan Cox; +Cc: Andrew Morton, Greg KH, Linux Kernel Mailing List Alan Thanks didn't see that patch come by. I'll have a look -steve Alan Cox wrote: >On Iau, 2003-06-12 at 23:03, Andrew Morton wrote: > > >>This is a significantly crappy aspect of the /sbin/hotplug callout. I'd be >>very interested in reading an outline of how you propose fixing it, without >>waiting until OLS, thanks. >> >> > >Dave Miller posted a simple patch to allow netlink to be used for >kernel->user messages - hotplug/disk error/logging/whatever. I'm >suprised therefore that the whole thing is being regurgitated again. > > > > > > ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-13 8:40 ` Alan Cox 2003-06-13 9:14 ` Olivier Galibert 2003-06-13 16:05 ` Steven Dake @ 2003-06-13 16:32 ` Greg KH 2 siblings, 0 replies; 73+ messages in thread From: Greg KH @ 2003-06-13 16:32 UTC (permalink / raw) To: Alan Cox; +Cc: Andrew Morton, sdake, Linux Kernel Mailing List On Fri, Jun 13, 2003 at 09:40:37AM +0100, Alan Cox wrote: > On Iau, 2003-06-12 at 23:03, Andrew Morton wrote: > > This is a significantly crappy aspect of the /sbin/hotplug callout. I'd be > > very interested in reading an outline of how you propose fixing it, without > > waiting until OLS, thanks. > > Dave Miller posted a simple patch to allow netlink to be used for > kernel->user messages - hotplug/disk error/logging/whatever. I'm > suprised therefore that the whole thing is being regurgitated again. For error logging stuff I think the netlink interface is fine. And I'm pretty sure some of the IBM RAS people are looking into useing it. But as a hotplug interface, no, I do not want to change to using it. The bash /sbin/hotplug plugin writers would hate me... :) thanks, greg k-h ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-12 21:47 ` Greg KH 2003-06-12 22:03 ` Andrew Morton @ 2003-06-13 15:51 ` Steven Dake 2003-06-13 16:41 ` Patrick Mochel 2003-06-13 16:42 ` Greg KH 1 sibling, 2 replies; 73+ messages in thread From: Steven Dake @ 2003-06-13 15:51 UTC (permalink / raw) To: Greg KH; +Cc: linux-kernel Greg KH wrote: >Nice, you posted the same message and tarball to lkml that we have been >talking about privately for a few days. > >Just to save time, and to bring everyone else up to speed, I'll just >include my responses and then yours to this thread and then finally >respond to your last message to me about this. > >Let me know if I get any of the attribution wrong, this is going to be a >fun cut and paste... > > Thanks for brining everyone up to speed. Everything looks correct from a quote point. > >On Thu, Jun 12, 2003 at 12:10:48PM -0700, Steven Dake wrote: > > >>Folks, >> >>I have been looking at the udev idea that Greg KH has developed. >>Userland device enumeration definately is the way to go, however, there >>are some problems with using /sbin/hotplug to transmit device >>enumeration events: >> >>1) performance of fork/exec with lots of devices is very poor >> >> > >You later said your binary of udev was 500k. I said: > 500k binary? Who's running anything that big? udev weighs in > around 6k right now! You have to add a _lot_ of functionality > to move up to 500k. > >You responded: > Hmm perhaps my binary was not built optmized when I looked at > it, I'll take another look. > > > >>2) lots of processes can be forked with no policy to control how many >>are forked at once potentially leading to OOM >> >> > >I responded: > No, this is pure speculation. Have you actually tried this out? > I have. I've hotplugged pci drawers connected to huge numbers > of SCSI disks and never even seen a blip on system load. It > turns out that pretty much everything happens consecutively, with > a reasonable amount of time just spent probing the SCSI busses. > USB is pretty slow in enumerating their devices too, so I don't > see a real world application that has this problem. Do you have > any proof of this? > >You responded: > I don't agree. I've timed bootup with 12 fibrechannel disks > with 4 partitions each with udev using the SDEF patches. I've > then timed bootup and added in the time required to add those 12 > fibrechannel disks (via a script) using /sbin/hotplug. The > script method (executing /sbin/hotplug 12*4 times with the > appropriate data hardcoded) takes rougly 1.25-1.75 seconds > longer. Since I'm using a chronograph, its not very accurate, > but it is definately slower to enumerate with /sbin/hotplug > using this methodlogy. That is only 12 disks. What about the > case of 1000 disks? > > fork and exec are very expensive system calls, to be avoided at > all costs... > >To which I responded: > Well, if you don't dynamically link a 500K file, perhaps it > would be a lot faster to start up :) > > Anyway, who really cares? This is _not_ a time critical thing. > Who really cares that it takes 1 second longer to see the device > node for the newly plugged in device? And if you plug in 5000 > devices all at once, you better be willing to wait the extra > minute or so for the system to calm down and make all of the > devices available to userspace. > >And again you came back with: > System bootup time and time between failures are the two factors > used to determine availability. Reducing bootup time and > increasing time between failures both improve availability. Even > 1 second is considerable when 5 9s is 30seconds of downtime per > year. 1 more second is a considerable change in availability > when you run the equations. > >I now respond: >Bootup time reduction would be a great thing to have. And I agree for >situations where 1 second is a considerable amount of time, it is >important. However, 99.99% of the people running Linux out there, do >not have those problems. And even then, for those 00.01% of the people >who do, they do whatever they possibly can to keep from having to reboot >at all. I still say the measurable ammount of time to plug in a new >disk and have it's device node created is not measurable, from using >/sbin/hotplug and udev, and using your character node. Also, the >ultimate solution for these kinds of people is the in-kernel devfs, or >the current static /dev. If they use either of them, it's guaranteed to >be faster then yours or mine implementation. > > I suppose it is possible that devfs could be faster, however, there are significant amounts of policies that can never be done in devfs which must be done in user space. For these types of applications, it makes sense to provide the fastest mechanism available, even when it may only improve boot performance by 1 second. I understand what you mean by saying that 99.99% of users don't care about availability. While those particular users may spend significant amounts of time improving Linux, it is the remaining folks that care about availability that are interested in investing money into r&d to improve availability while also improving feature set. It is this set of folks, (the people that do care about availability) that this patch is targeted towards. > > >>3) /sbin/hotplug events can occur out of order, eg: remove event occurs, >>/sbin/hotplug sleeps waiting for something, insert event occurs and >>completes immediately. Then the remove event completes, after the >>insert, resulting in out-of-order completion and a broken /dev. I have >>seen this several times with udev. >> >> > >I responded: > Yes this happens. I have a fix for this for udev itself. No > kernel changes are needed. I'll show it at OLS in July if you > want to see it :) > >You responded: > There is a problem in udev that fork doesn't wait for the parent > to exit (mknod) which is solved by using the mknod system call. > But there is still another problem that /sbin/hotplug can > execute out of order. I'll be at OLS so I'll take a look at > your solution then. > >So we agree to talk about this at OLS, fine. > > > >>4) early device enumeration (character devices, etc) are missed because >>/sbin/hotplug is not available during early device init. >> >> > >I responded: > Then use early initramfs to catch this. I've done this and seen > it work already. If you don't want to do early init, walk the > sysfs tree yourself from userspace, it works just as well, and > we don't have to do any buffering in kernelspace at all. > >You responded: > Initramfs still misses early events. That leaves walking sysfs. > > I have considered and rejected using a user space program to > walk sysfs for this information. There are several problems > 1) kobject device name is not present in sysfs (how do you know > if its a character or block, then without some external mapping > db, yuck) > 2) scan of sysfs is not atomic, resulting in potential lost > events and lost device enumerations > >I responded: > You don't know this from hotplug either. You have to figure it > out from the kobject name either way. Actually it's quite easy, > only block devices show up in /sys/block, everything else is a > character device. > > No, this is quite simple to fix: > - set up /sbin/hotplug udev handler after init has > started. > - start walking sysfs to "cold plug" devices. > - any new events that happen are caught by udev and > handled there, while you are walking the tree. > - no events are lost. > >You responded: > Events could be handled in this situation twice, which is > definately not what is desireable. > >I now respond: >Doing a mknod() on a node that is already present doesn't harm anything. >And since you have to keep a database around of what devices are >present, and what you have called them, removing them after already >removing them again, is detectable. So handing the same event twice >isn't a problem. > > I suppose you could handle multiple events twice, but the ability to detect a replacement without the OS removing the initial kobject is now removed. This is a real case which should be handled. > > >>To solve these problems, I've developed a driver and some minor >>modifications to the lib/kobject.c to queue these events and allow a >>user space program to read the events at its leisure. The driver >>supports poll/select for effecient use of the processor. >> >>The feature is called the System Device Enumeration Event Queue. I've >>attached a tarball which includes udev-0.1 modified to use the SDEQ, a >>kernel patch against linux 2.5.70 that implements the SDEQ, and a >>test/library that tests the SDEQ functionality. >> >> > >I responded: > Ok, I looked at your code. To put it kindly, it will never work > properly. See the long thread on lkml by Inaky after I > announced udev. He tried to create something like what you have > done, but got a big better implementation (yours has a number of > memory leaks, and permission problems...) > > See that thread for why doing this in the kernel is not the way > to go. Please don't waste your time on this anymore, it's a > loosing battle... > >You responded: > It obviously works properly when using the udev application so I > am at a loss as to how you can make a claim that "it will never > work". Please explain your comments about memory leaks. Data > structures are allocated and then freed at appropriate times. > Even in the case of a crash during a transaciton (get event, > clera events), there are no leaks. Please explain permission > problems. Permissions are controlled through policy in the > filesystem (/dev/sdeq) which fully controls access to the system > device enumeration event queue. > > I did look at Inaky's code, and frankly it was too complicated > to solve the simple problem that needed to be solved. We also > have a full-blown event mechanism (for redundant system slot > compact pci support), but feel its too heavy-weight to be of > general purpose use to the kernel. This particular > implementation is lightweight and solves a specific need with > real applications that can really use the functionality. > > I have read the thread and atleast 70% of the comments where "we > should not use /sbin/hotplug, but instead events should be > queued in the kernel". The thread raised all of the issues that > are solved with this kernel patch. Several core kernel > maintainers were the authors of the comments, so I'm not alone > in this belief. > > Your obviously opposed to using character devices to queue > kobject creation/deletion events for some reason which I don't > understand. /sbin/hotplug is an inferior solution to queueing > events in the kernel. Your not the maintainer of the code this > feature is going into (which is the linux driver model). I'd > really like to hear Pat's thoughts on this issue as he is the > one that has to live with the changes. If you want to continue > using /sbin/hotplug, with all of its numerous problems for your > work, your more then welcome to do so. No one is forcing you to > use sdeq, however, for those other vendors that may want to make > persistent device naming solutions, this is a core feature that > can not easily be solved by poorly thought out hacks. > >I will respond now: > >Ok, permission problems can be solved by relying on the permission of >the device node, you are correct. > >As for the memory issues, if no one ever reads from the character node, >it will constantly fill up with events, right? Shouldn't they be >eventually flushed out at some point in time? > > This is a problem... I wasn't quite sure how to handle this. The two choices are to include timeouts in events (after a certain amount of time, events are timed out and freed) or to allow only a certain number of events, after which events at the front of the queue are flushed. The reality though, is that the user will be running the daemon to clear out the events. If they don't, then they get what they deserve :) >Also for trying to remove events, why is userspace allowed to remove a >multiple of events at once? I would think that a valid read would >remove that event, you can have two applications grab the same event, >which is not what I think you want to have happen. > I probably should have explained why this decision was made. I believe it is desireable to be able to handle multiple events at once. The process is: user reads all events on event queue, user processes events, user deletes all events atomically. This solves the problem of what happens if the daemon crashes during deletion of events, or during processing of events. The daemon will be able to totally recover and process events properly. > >And finally... > >Yes, I know I'm not the maintainer of the code that this feature is >going into. But my position is that this code is not needed in the main >kernel tree, and offer that opinion to the maintainer of this code. > >I agree, most of the arguments in the previous udev thread talked about >out of order events, but I think I've gotten that solved now (I can wave >my hands about how it will be done, or I can show you working code at >OLS. I think I'll wait until OLS, as code matters, wavy hands don't.) > >So, because of this, I still think that everything you are trying to do >can be successfully done from userspace today (or use devfsd today, it >will give you the same info!) And due to that, I do not think this code >is necessary to be in the kernel. > > Everything could possibly be done in userspace, but performance is poor, and I am still concerned about the out-of-order events issue. >Thanks for reading this far, and my apologies if I got any of the above >quotes out of order or misspoken, I just wanted to cut to the chase, and >not have to go through 4 more rounds of email to make it to this point >in the conversation again. > > Thanks greg -steve >greg k-h > > > > > ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-13 15:51 ` Steven Dake @ 2003-06-13 16:41 ` Patrick Mochel 2003-06-13 16:42 ` Greg KH 1 sibling, 0 replies; 73+ messages in thread From: Patrick Mochel @ 2003-06-13 16:41 UTC (permalink / raw) To: Steven Dake; +Cc: Greg KH, linux-kernel > I suppose it is possible that devfs could be faster, however, there are > significant amounts of policies that can never be done in devfs which > must be done in user space. For these types of applications, it makes > sense to provide the fastest mechanism available, even when it may only > improve boot performance by 1 second. Eh? Why must you completely re-engineer a solution because you see the current one as deficient? Not only is it completely over-engineered by enforcing your fanatical ideas about requiring a new system daemon, but it's total pre-mature optimization. On top of that, you don't have any accurate numbers to back up your claims. Please perform and post the timings I suggested yesterday, and then we'll talk. > I understand what you mean by saying that 99.99% of users don't care > about availability. While those particular users may spend significant > amounts of time improving Linux, it is the remaining folks that care > about availability that are interested in investing money into r&d to > improve availability while also improving feature set. It is this set > of folks, (the people that do care about availability) that this patch > is targeted towards. Then it is your responsibility to merge the continuing efforts and design goals of the kernel with the requirements of your high-paying customers in the smoothest possible way. Serving one while ignoring the other is a good way to waste a lot of time and money. I care about availability. But, I am not willing to integrate or support features that come from some random person just because they claim to improve availability, especially when a) I don't like the numbers and b) there are no numbers to back it up. > >As for the memory issues, if no one ever reads from the character node, > >it will constantly fill up with events, right? Shouldn't they be > >eventually flushed out at some point in time? > > > > > This is a problem... I wasn't quite sure how to handle this. The two > choices are to include timeouts in events (after a certain amount of > time, events are timed out and freed) or to allow only a certain number > of events, after which events at the front of the queue are flushed. > > The reality though, is that the user will be running the daemon to clear > out the events. If they don't, then they get what they deserve :) And this improves availability how? -pat ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-13 15:51 ` Steven Dake 2003-06-13 16:41 ` Patrick Mochel @ 2003-06-13 16:42 ` Greg KH 2003-06-13 17:17 ` Chris Friesen 1 sibling, 1 reply; 73+ messages in thread From: Greg KH @ 2003-06-13 16:42 UTC (permalink / raw) To: Steven Dake; +Cc: linux-kernel On Fri, Jun 13, 2003 at 08:51:01AM -0700, Steven Dake wrote: > >I now respond: > >Bootup time reduction would be a great thing to have. And I agree for > >situations where 1 second is a considerable amount of time, it is > >important. However, 99.99% of the people running Linux out there, do > >not have those problems. And even then, for those 00.01% of the people > >who do, they do whatever they possibly can to keep from having to reboot > >at all. I still say the measurable ammount of time to plug in a new > >disk and have it's device node created is not measurable, from using > >/sbin/hotplug and udev, and using your character node. Also, the > >ultimate solution for these kinds of people is the in-kernel devfs, or > >the current static /dev. If they use either of them, it's guaranteed to > >be faster then yours or mine implementation. > > > > > I suppose it is possible that devfs could be faster, however, there are > significant amounts of policies that can never be done in devfs which > must be done in user space. For these types of applications, it makes > sense to provide the fastest mechanism available, even when it may only > improve boot performance by 1 second. That's what devfsd can be used for. It provides you with a way to get those events, in a binary fashion, with no out-of-order issues, in a "quick" way. People have implemented persistant device naming schemes by using this interface in the past. Don't write something new when what you are looking for is already there. > I understand what you mean by saying that 99.99% of users don't care > about availability. While those particular users may spend significant > amounts of time improving Linux, it is the remaining folks that care > about availability that are interested in investing money into r&d to > improve availability while also improving feature set. It is this set > of folks, (the people that do care about availability) that this patch > is targeted towards. Like Pat said, until you prove the speed and uptime differences between the following, I don't see how you can say this with a straight face: - The default /sbin/hotplug for Redhat 9 systems - The default /sbin/hotplug for Montavista systems. - my mini-hotplug - Your sdeq - devfsd After running those numbers, please let us know what you find out. > >I now respond: > >Doing a mknod() on a node that is already present doesn't harm anything. > >And since you have to keep a database around of what devices are > >present, and what you have called them, removing them after already > >removing them again, is detectable. So handing the same event twice > >isn't a problem. > > > > > I suppose you could handle multiple events twice, but the ability to > detect a replacement without the OS removing the initial kobject is now > removed. This is a real case which should be handled. Um, huh? How can the OS not remove a kobject, and not tell userspace? And if a device is unplugged, and then plugged right back in again, in the same place, with the same attributes (which is the only way you would not be able to detect this easily), what is the big problem? You would act apon that device in the same way, again, which would be just fine. > >As for the memory issues, if no one ever reads from the character node, > >it will constantly fill up with events, right? Shouldn't they be > >eventually flushed out at some point in time? > > > This is a problem... I wasn't quite sure how to handle this. The two > choices are to include timeouts in events (after a certain amount of > time, events are timed out and freed) or to allow only a certain number > of events, after which events at the front of the queue are flushed. > > The reality though, is that the user will be running the daemon to clear > out the events. If they don't, then they get what they deserve :) Heh, then your kernel which _has_ to stay up (due to your previous guarantees of uptime) keels over. I don't think that by requiring that a kernel _has_ to have a specific userspace program running in order for it to stay healthy would be anying any "carrier grade" user would ever agree to. :) > >Also for trying to remove events, why is userspace allowed to remove a > >multiple of events at once? I would think that a valid read would > >remove that event, you can have two applications grab the same event, > >which is not what I think you want to have happen. > > > I probably should have explained why this decision was made. I believe > it is desireable to be able to handle multiple events at once. The > process is: user reads all events on event queue, user processes events, > user deletes all events atomically. This solves the problem of what > happens if the daemon crashes during deletion of events, or during > processing of events. The daemon will be able to totally recover and > process events properly. But your implementation does not do this, so you added features and complexity before they were needed :) Again, you are relying on a userspace program to keep the kernel safe, not a wise choice. > >So, because of this, I still think that everything you are trying to do > >can be successfully done from userspace today (or use devfsd today, it > >will give you the same info!) And due to that, I do not think this code > >is necessary to be in the kernel. > > > > > Everything could possibly be done in userspace, but performance is poor, > and I am still concerned about the out-of-order events issue. We just solved the out-of-order events issue. I don't see any proof of your performance claims. Hence, I don't think this patch should be applied. thanks, greg k-h ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-13 16:42 ` Greg KH @ 2003-06-13 17:17 ` Chris Friesen 0 siblings, 0 replies; 73+ messages in thread From: Chris Friesen @ 2003-06-13 17:17 UTC (permalink / raw) To: Greg KH; +Cc: Steven Dake, linux-kernel Greg KH wrote: > On Fri, Jun 13, 2003 at 08:51:01AM -0700, Steven Dake wrote: >>The reality though, is that the user will be running the daemon to clear >>out the events. If they don't, then they get what they deserve :) >> > > Heh, then your kernel which _has_ to stay up (due to your previous > guarantees of uptime) keels over. I don't think that by requiring that > a kernel _has_ to have a specific userspace program running in order for > it to stay healthy would be anying any "carrier grade" user would ever > agree to. > > :) I don't think I totally buy that argument. Often a "carrier-grade" box has a whole bunch of things that must be running for the box to stay healthy. If one of them dies, the system logs the problem along with diagnostic information and tries to restart it. If it can't be restarted, then we failover to the warm standby and go for a reboot to try and clean things up. I agree that there are other issues, but needing to have a particular binary running isn't really a problem. (init, for example...) Chris -- Chris Friesen | MailStop: 043/33/F10 Nortel Networks | work: (613) 765-0557 3500 Carling Avenue | fax: (613) 765-2986 Nepean, ON K2H 8E9 Canada | email: cfriesen@nortelnetworks.com ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-12 19:10 [PATCH] udev enhancements to use kernel event queue Steven Dake 2003-06-12 21:47 ` Patrick Mochel 2003-06-12 21:47 ` Greg KH @ 2003-06-12 22:27 ` Oliver Neukum 2003-06-13 16:03 ` Steven Dake 2003-06-13 7:17 ` Christoph Hellwig 2003-06-13 18:06 ` Linus Torvalds 4 siblings, 1 reply; 73+ messages in thread From: Oliver Neukum @ 2003-06-12 22:27 UTC (permalink / raw) To: Steven Dake; +Cc: linux-kernel > If it works for you or doesn't or you like the idea or don't, I've love > to hear about it + default: + result = -EINVAL; + break; + } + return (result); Must return ENOTTY. +static int sdeq_open (struct inode *inode, struct file *file) +{ + MOD_INC_USE_COUNT; + + return 0; +} + +static int sdeq_release (struct inode *inode, struct file *file) +{ + MOD_DEC_USE_COUNT; + + return (0); +} Wrong. release does not map to close() Aside from that, what exactly are you trying to do? You are not solving the fundamental device node reuse race, yet you are making necessary a further demon. You are not addressing queue limits. The current hotplug scheme does so, admittedly crudely by failing to spawn a task, but considering the small numbers of events in question here, for the time being we can live with that. You can just as well add load control and error detection to the current scheme. You fail to do so in your scheme. You cannot queue events forever in unlimited numbers. As for ordering, this is a real problem, but not fundamental. You can make user space locking work. IMHO it will not be pretty if done with shell scripts, but it can work. There _is_ a basic problem with the kernel 'overtaking' user space in its view of the device tree, but you cannot solve that _at_ _all_ in user space. In short, if you feel that the hotplug scheme is inadequate for your needs, then write industry strength devfs2. Regards Oliver ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-12 22:27 ` Oliver Neukum @ 2003-06-13 16:03 ` Steven Dake 2003-06-13 16:50 ` Patrick Mochel ` (3 more replies) 0 siblings, 4 replies; 73+ messages in thread From: Steven Dake @ 2003-06-13 16:03 UTC (permalink / raw) To: Oliver Neukum; +Cc: linux-kernel Oliver Neukum wrote: >>If it works for you or doesn't or you like the idea or don't, I've love >>to hear about it >> >> > >+ default: >+ result = -EINVAL; >+ break; >+ } >+ return (result); > >Must return ENOTTY. > >+static int sdeq_open (struct inode *inode, struct file *file) >+{ >+ MOD_INC_USE_COUNT; >+ >+ return 0; >+} >+ >+static int sdeq_release (struct inode *inode, struct file *file) >+{ >+ MOD_DEC_USE_COUNT; >+ >+ return (0); >+} > >Wrong. release does not map to close() > > > hmm not sure where I got that from i'll fix thanks. >Aside from that, what exactly are you trying to do? >You are not solving the fundamental device node reuse race, >yet you are making necessary a further demon. > > For device enumeration, I see a daemon as necessary. The main goal of this work is to solve the out-of-order execution of sbin/hotplug and improve performance of the system during device enumeration with significant (200 disks, 4 partitions each) amounts of devices. Boot time with this scheme appears, in my rudimentary tests, to be faster on the order of 1-2 seconds for bootup for the case of just 12 disks. I would imagine 200 disks (which I don't have a good way to test, as I don't have 200 disks:) would provide better speed gains during bootup. This compares greg's original udev to this patched udev binary. >You are not addressing queue limits. The current hotplug >scheme does so, admittedly crudely by failing to spawn >a task, but considering the small numbers of events in >question here, for the time being we can live with that. > >You can just as well add load control and error detection >to the current scheme. You fail to do so in your scheme. >You cannot queue events forever in unlimited numbers. > > I agree there should be some way of limiting events. I'll add this set of code. >As for ordering, this is a real problem, but not fundamental. >You can make user space locking work. IMHO it will not be >pretty if done with shell scripts, but it can work. >There _is_ a basic problem with the kernel 'overtaking' >user space in its view of the device tree, but you cannot solve >that _at_ _all_ in user space. > >In short, if you feel that the hotplug scheme is inadequate >for your needs, then write industry strength devfs2. > > devfs is not appropriate as it does not allow for complex policy with external attributes that the kernel is unaware of. For an example, lets take the situation where a policy must access a cluster-wide manager to determine some information before it can make a policy decision. For that to occur, there must be sockets, and hopefully libc, which puts the entire thing in user space. Who would want to write policies in the kernel? uck. > Regards > Oliver > > > > Thanks -steve ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-13 16:03 ` Steven Dake @ 2003-06-13 16:50 ` Patrick Mochel 2003-06-13 17:10 ` Steven Dake 2003-06-13 17:32 ` Oliver Neukum ` (2 subsequent siblings) 3 siblings, 1 reply; 73+ messages in thread From: Patrick Mochel @ 2003-06-13 16:50 UTC (permalink / raw) To: Steven Dake; +Cc: Oliver Neukum, linux-kernel > >+static int sdeq_open (struct inode *inode, struct file *file) > >+{ > >+ MOD_INC_USE_COUNT; > >+ > >+ return 0; > >+} > >+ > >+static int sdeq_release (struct inode *inode, struct file *file) > >+{ > >+ MOD_DEC_USE_COUNT; > >+ > >+ return (0); > >+} > > > >Wrong. release does not map to close() > > > > > > > hmm not sure where I got that from i'll fix thanks. You don't even need to modify the module refcount, since sdeq cannot even be built as a module. Also, return is a statement, not a function; you could at least be consistent. > For device enumeration, I see a daemon as necessary. The main goal of > this work is to solve the out-of-order execution of sbin/hotplug and > improve performance of the system during device enumeration with > significant (200 disks, 4 partitions each) amounts of devices. Boot > time with this scheme appears, in my rudimentary tests, to be faster on > the order of 1-2 seconds for bootup for the case of just 12 disks. I > would imagine 200 disks (which I don't have a good way to test, as I > don't have 200 disks:) would provide better speed gains during bootup. > This compares greg's original udev to this patched udev binary. This part I really don't get about your argument. For one, please back up your claim with the method that you used to test and post some real numbers, showing real improvement. If you would like more hardware to test on, we have an entire lab full of large systems specifically for purposes like this. I don't know if we have a 200 disk library, but you can see what you can get at: http://osdl.org/stp/ For another, you're preaching about availability, from I presume the carrier-grade camp, since Montavista has a CG distro, and you're involved in our own CGL working group. But, I have never heard of a carrier-grade system that used 12 disks, let alone 200. I fail to see how you're backing up your argument about the 1 second that you saving with a realistic example. But, I also don't think that the speed hit would increase linearly. Besides, even if the system had access to 200+ disks, it's probably in a separate library, accessible to an/the entire cluster. It's likely also a commercial storage library, and not a linux system. Please, educate me and prove me wrong; I'm not opposed to new ways of thinking. > devfs is not appropriate as it does not allow for complex policy with > external attributes that the kernel is unaware of. For an example, lets > take the situation where a policy must access a cluster-wide manager to > determine some information before it can make a policy decision. For > that to occur, there must be sockets, and hopefully libc, which puts the > entire thing in user space. Who would want to write policies in the > kernel? uck. Cluster-wide manager of policy? Fine, you can do that with /sbin/hotplug, too. But, I don't think network latency is going to help your boot time.. -pat ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-13 16:50 ` Patrick Mochel @ 2003-06-13 17:10 ` Steven Dake 2003-06-13 18:26 ` Greg KH 2003-06-13 19:02 ` Patrick Mansfield 0 siblings, 2 replies; 73+ messages in thread From: Steven Dake @ 2003-06-13 17:10 UTC (permalink / raw) To: Patrick Mochel; +Cc: Oliver Neukum, linux-kernel Patrick Mochel wrote: >>>+static int sdeq_open (struct inode *inode, struct file *file) >>>+{ >>>+ MOD_INC_USE_COUNT; >>>+ >>>+ return 0; >>>+} >>>+ >>>+static int sdeq_release (struct inode *inode, struct file *file) >>>+{ >>>+ MOD_DEC_USE_COUNT; >>>+ >>>+ return (0); >>>+} >>> >>>Wrong. release does not map to close() >>> >>> >>> >>> >>> >>hmm not sure where I got that from i'll fix thanks. >> >> > >You don't even need to modify the module refcount, since sdeq cannot even >be built as a module. Also, return is a statement, not a function; you >could at least be consistent. > > Yes this area needs some work. I copied from a previous driver, which was obviously also not consistent:) > > >>For device enumeration, I see a daemon as necessary. The main goal of >>this work is to solve the out-of-order execution of sbin/hotplug and >>improve performance of the system during device enumeration with >>significant (200 disks, 4 partitions each) amounts of devices. Boot >>time with this scheme appears, in my rudimentary tests, to be faster on >>the order of 1-2 seconds for bootup for the case of just 12 disks. I >>would imagine 200 disks (which I don't have a good way to test, as I >>don't have 200 disks:) would provide better speed gains during bootup. >>This compares greg's original udev to this patched udev binary. >> >> > >This part I really don't get about your argument. > >For one, please back up your claim with the method that you used to test >and post some real numbers, showing real improvement. If you would like >more hardware to test on, we have an entire lab full of large systems >specifically for purposes like this. I don't know if we have a 200 disk >library, but you can see what you can get at: > > http://osdl.org/stp/ > > Brian Jackson (open gfs project) has offered to run a test case. I'll work with him to determine a test methodology and test results. > >For another, you're preaching about availability, from I presume the >carrier-grade camp, since Montavista has a CG distro, and you're involved >in our own CGL working group. But, I have never heard of a carrier-grade >system that used 12 disks, let alone 200. I fail to see how you're backing >up your argument about the 1 second that you saving with a realistic >example. > > There are two major types of carrier grade applications. Data plane and Control plane. Data plane applications generally don't use disks at all and this is probably what your most familiar with. Control plane applications (such as a home location register) or billing systems etc often use multitudes of disks for databases. Currently Linux is well suited for control plane applications on x86 with storage, and data plane applications on ppc without storage. >But, I also don't think that the speed hit would increase linearly. > >Besides, even if the system had access to 200+ disks, it's probably in a >separate library, accessible to an/the entire cluster. It's likely also a >commercial storage library, and not a linux system. > >Please, educate me and prove me wrong; I'm not opposed to new ways of >thinking. > > Future designs (such as ATCA), of which I am mostly concerned with, include storage in the chassis itself. A chassis may have 12 slots, of which 10 could be disks or cpu slots. The remaining two slots are fibrechannel hubs or ethernet switches. The fibrechannel hubs can (and often are) tied together through front-panel connectors, to form a rack or multiple racks in a cluster. Fibrechannel allows for 126 devices in a FCAL. There are two fibrechannel hubs per chassis, which allows for each cpu to see a total of 252 devices. now some of these devices could certainly be CPUs, but the ratio is atleast 1cpu:2disks since each cpu blade should use RAID 1 to protect against disk failure and provide easy hotswap. Then there are of course partitions, of which there could be many, and of which each one is a seperate device seperately enumerated in the system. So a realistic system, currently shipping today for the telecom market, could require the enumeration of 16(partitions)*168(disks) More realistically, each system may have 3 partitions, which is 3*168 disks, or even more realistically, each system may only have 1/4 as many disks, which is 3*42, or 120 disks (times two channels, 240 device enumerations). Brian Jackson has said there are 50 disks in the OSDL cluster, of which he could put 3 partitions on, which would test 150 device enumerations. I propose the following test: Create 3 partitions on all 50 disks. Create system to boot kernel.org 2.5.70 with initramdisk. Try greg's original udev without kernel aptch, try modified udev with kernel patch, time each system's startup time. Do you agree with the test methodology? If so, then we can proceed with gathering real data. Thanks -steve > > >>devfs is not appropriate as it does not allow for complex policy with >>external attributes that the kernel is unaware of. For an example, lets >>take the situation where a policy must access a cluster-wide manager to >>determine some information before it can make a policy decision. For >>that to occur, there must be sockets, and hopefully libc, which puts the >>entire thing in user space. Who would want to write policies in the >>kernel? uck. >> >> > >Cluster-wide manager of policy? Fine, you can do that with /sbin/hotplug, >too. But, I don't think network latency is going to help your boot time.. > > > yes this can be done with /sbin/hotplug, but not with devfs in the kernel. This was just an example, as there may be other policies which can only be done in the C language and not accessible from the kernel. /sbin/hotplug, of course, could have the same access as any other policy. > -pat > > > > > ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-13 17:10 ` Steven Dake @ 2003-06-13 18:26 ` Greg KH 2003-06-13 19:02 ` Patrick Mansfield 1 sibling, 0 replies; 73+ messages in thread From: Greg KH @ 2003-06-13 18:26 UTC (permalink / raw) To: Steven Dake; +Cc: Patrick Mochel, Oliver Neukum, linux-kernel On Fri, Jun 13, 2003 at 10:10:15AM -0700, Steven Dake wrote: > Brian Jackson has said there are 50 disks in the OSDL cluster, of which > he could put 3 partitions on, which would test 150 device enumerations. You can also create 2000+ virtual disks today on your desktop using scsi-debug to eliminate any hardware spin up times :) thanks, greg k-h ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-13 17:10 ` Steven Dake 2003-06-13 18:26 ` Greg KH @ 2003-06-13 19:02 ` Patrick Mansfield 1 sibling, 0 replies; 73+ messages in thread From: Patrick Mansfield @ 2003-06-13 19:02 UTC (permalink / raw) To: Steven Dake; +Cc: Patrick Mochel, Oliver Neukum, linux-kernel On Fri, Jun 13, 2003 at 10:10:15AM -0700, Steven Dake wrote: > I propose the following test: > Create 3 partitions on all 50 disks. Create system to boot kernel.org > 2.5.70 with initramdisk. Try greg's original udev without kernel aptch, > try modified udev with kernel patch, time each system's startup time. > > Do you agree with the test methodology? If so, then we can proceed with > gathering real data. I've seen a large variance on detect/probe/scan for fibre disks depending on whether the feral or qlogic drive is used. This is with switch attached fibre storage. The qlogic takes more time to probe, but has shorter timeouts when trying to detect disabled or disconnected ports. On my system (where I have a few unconnected fibre cards), the insmod of the qlogic and feral seem to be about equal (I haven't timed the actual differences, I normally don't build a kernel or modules with both feral and qlogic available, they are both available only outside of the mainline kernel). If I removed or connected the unconnected cards, the feral driver would be faster. You could probably improve boot time (or modprobe time) more by speeding up the qlogic driver or using the feral driver rather than trying to speed up the hotplug/whatever. Note that scsi_debug driver has a default IO response time of 1 tick (scsi_debug_delay), you can set it to zero, but then it has to wakeup the softirqd to complete the IO. So you you either rely on a timeout to complete an IO, or a wakeup of the ksoftirqd. I was trying to measure IO latencies at one point using scsi_debug, but I seemed to end up measuring context switch times instead. -- Patrick Mansfield ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-13 16:03 ` Steven Dake 2003-06-13 16:50 ` Patrick Mochel @ 2003-06-13 17:32 ` Oliver Neukum 2003-06-13 18:23 ` Greg KH 2003-06-13 18:24 ` Greg KH 3 siblings, 0 replies; 73+ messages in thread From: Oliver Neukum @ 2003-06-13 17:32 UTC (permalink / raw) To: Steven Dake; +Cc: linux-kernel > >Aside from that, what exactly are you trying to do? > >You are not solving the fundamental device node reuse race, > >yet you are making necessary a further demon. > > For device enumeration, I see a daemon as necessary. The main goal of > this work is to solve the out-of-order execution of sbin/hotplug and > improve performance of the system during device enumeration with > significant (200 disks, 4 partitions each) amounts of devices. Boot > time with this scheme appears, in my rudimentary tests, to be faster on > the order of 1-2 seconds for bootup for the case of just 12 disks. I > would imagine 200 disks (which I don't have a good way to test, as I > don't have 200 disks:) would provide better speed gains during bootup. > This compares greg's original udev to this patched udev binary. You will not beat a pure devfs solution in terms of speed. Enumeration is not a real problem. The real target of the hotplug system is configuration, not enumeration. > >You are not addressing queue limits. The current hotplug > >scheme does so, admittedly crudely by failing to spawn > >a task, but considering the small numbers of events in > >question here, for the time being we can live with that. > > > >You can just as well add load control and error detection > >to the current scheme. You fail to do so in your scheme. > >You cannot queue events forever in unlimited numbers. > > I agree there should be some way of limiting events. I'll add this set > of code. And you need to report loss of events. Doing this all by ioctls is very questionable. > >As for ordering, this is a real problem, but not fundamental. > >You can make user space locking work. IMHO it will not be > >pretty if done with shell scripts, but it can work. > >There _is_ a basic problem with the kernel 'overtaking' > >user space in its view of the device tree, but you cannot solve > >that _at_ _all_ in user space. > > > >In short, if you feel that the hotplug scheme is inadequate > >for your needs, then write industry strength devfs2. > > devfs is not appropriate as it does not allow for complex policy with > external attributes that the kernel is unaware of. For an example, lets > take the situation where a policy must access a cluster-wide manager to > determine some information before it can make a policy decision. For > that to occur, there must be sockets, and hopefully libc, which puts the > entire thing in user space. Who would want to write policies in the > kernel? uck. Then the method of invocation of your configurator will not matter with regards to speed. Out of order issues matter, but can be solved easier. It seems to me that you should take a close look at devfs and devfsd. Regards Oliver ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-13 16:03 ` Steven Dake 2003-06-13 16:50 ` Patrick Mochel 2003-06-13 17:32 ` Oliver Neukum @ 2003-06-13 18:23 ` Greg KH 2003-06-13 18:24 ` Greg KH 3 siblings, 0 replies; 73+ messages in thread From: Greg KH @ 2003-06-13 18:23 UTC (permalink / raw) To: Steven Dake; +Cc: Oliver Neukum, linux-kernel On Fri, Jun 13, 2003 at 09:03:19AM -0700, Steven Dake wrote: > >Aside from that, what exactly are you trying to do? > >You are not solving the fundamental device node reuse race, > >yet you are making necessary a further demon. > > > > > For device enumeration, I see a daemon as necessary. The main goal of > this work is to solve the out-of-order execution of sbin/hotplug and > improve performance of the system during device enumeration with > significant (200 disks, 4 partitions each) amounts of devices. Boot > time with this scheme appears, in my rudimentary tests, to be faster on > the order of 1-2 seconds for bootup for the case of just 12 disks. I > would imagine 200 disks (which I don't have a good way to test, as I > don't have 200 disks:) would provide better speed gains during bootup. > This compares greg's original udev to this patched udev binary. You're also using a 500k dynamically linked binary. Please don't do that for testing/real life. Use the 6k udev binary for you tests... thanks, greg k-h ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-13 16:03 ` Steven Dake ` (2 preceding siblings ...) 2003-06-13 18:23 ` Greg KH @ 2003-06-13 18:24 ` Greg KH 3 siblings, 0 replies; 73+ messages in thread From: Greg KH @ 2003-06-13 18:24 UTC (permalink / raw) To: Steven Dake; +Cc: Oliver Neukum, linux-kernel On Fri, Jun 13, 2003 at 09:03:19AM -0700, Steven Dake wrote: > devfs is not appropriate as it does not allow for complex policy with > external attributes that the kernel is unaware of. For an example, lets > take the situation where a policy must access a cluster-wide manager to > determine some information before it can make a policy decision. For > that to occur, there must be sockets, and hopefully libc, which puts the > entire thing in user space. Who would want to write policies in the > kernel? uck. Look at devfsd, it is in userspace. greg k-h ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-12 19:10 [PATCH] udev enhancements to use kernel event queue Steven Dake ` (2 preceding siblings ...) 2003-06-12 22:27 ` Oliver Neukum @ 2003-06-13 7:17 ` Christoph Hellwig 2003-06-13 18:06 ` Linus Torvalds 4 siblings, 0 replies; 73+ messages in thread From: Christoph Hellwig @ 2003-06-13 7:17 UTC (permalink / raw) To: Steven Dake; +Cc: linux-kernel On Thu, Jun 12, 2003 at 12:10:48PM -0700, Steven Dake wrote: > Folks, > > I have been looking at the udev idea that Greg KH has developed. > Userland device enumeration definately is the way to go, however, there > are some problems with using /sbin/hotplug to transmit device > enumeration events: Given the comments already on the list I won't repeat commetning on your attitued, but if you really think the maintainers and everyone else is totally clueless you should probably start the mail with that so every is aware of it.. So if you can prove the fork+exec really is a problem I'd suggest you go for an existing, standard interface instead. That would be netlink and not some chardev ioctl hack.. ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-12 19:10 [PATCH] udev enhancements to use kernel event queue Steven Dake ` (3 preceding siblings ...) 2003-06-13 7:17 ` Christoph Hellwig @ 2003-06-13 18:06 ` Linus Torvalds 2003-06-19 23:32 ` Werner Almesberger 4 siblings, 1 reply; 73+ messages in thread From: Linus Torvalds @ 2003-06-13 18:06 UTC (permalink / raw) To: linux-kernel In article <3EE8D038.7090600@mvista.com>, Steven Dake <sdake@mvista.com> wrote: > >I have been looking at the udev idea that Greg KH has developed. >Userland device enumeration definately is the way to go, however, there >are some problems with using /sbin/hotplug to transmit device >enumeration events: No. WE ARE NOT GOING TO A EVENT DEAMON! Centralized event deamons are crap, unmaintainable, and unreadable. They are a maintenance nightmare, both from a development standpoint and from a MIS standpoint. They encourage doing everything in one program, keeping state in private memory, depending on ordering, and just generally do bad things. /sbin/hotplug, on the other hand, makes events clearly independent things, and encourages writing simple scripts that can be combined and localized. In other words, it's the UNIX way. I've seen the madness of event deamons, and it's called "cardmgr" and "acpid". We don't want it. Linus ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-13 18:06 ` Linus Torvalds @ 2003-06-19 23:32 ` Werner Almesberger 2003-06-19 23:42 ` Steven Dake 0 siblings, 1 reply; 73+ messages in thread From: Werner Almesberger @ 2003-06-19 23:32 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-kernel Linus Torvalds wrote: > [ Event demons ] encourage doing everything in one program, > keeping state in private memory, depending on ordering, and just > generally do bad things. Well, the ordering bit is the hairy part. As long as it doesn't matter if an event gets lost every once in a while, and in which order they get processed, things are fine as they are. But then it scares me to see people start to try to design some general serialization mechanism on top of /sbin/hotplug - Werner -- _________________________________________________________________________ / Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net / /_http://www.almesberger.net/____________________________________________/ ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-19 23:32 ` Werner Almesberger @ 2003-06-19 23:42 ` Steven Dake 2003-06-20 0:05 ` Linus Torvalds 0 siblings, 1 reply; 73+ messages in thread From: Steven Dake @ 2003-06-19 23:42 UTC (permalink / raw) To: Werner Almesberger; +Cc: Linus Torvalds, linux-kernel Werner Almesberger wrote: >Linus Torvalds wrote: > > >>[ Event demons ] encourage doing everything in one program, >>keeping state in private memory, depending on ordering, and just >>generally do bad things. >> >> > >Well, the ordering bit is the hairy part. As long as it doesn't >matter if an event gets lost every once in a while, and in which >order they get processed, things are fine as they are. > >But then it scares me to see people start to try to design some >general serialization mechanism on top of /sbin/hotplug > > A serialization methodology can be built on /sbin/hotplug, but it has all of the problems that Linus previously talked about for a kernel event queue. The difference is that the problem is moved to userland. >- Werner > > -steve ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-19 23:42 ` Steven Dake @ 2003-06-20 0:05 ` Linus Torvalds 2003-06-20 0:10 ` Steven Dake 0 siblings, 1 reply; 73+ messages in thread From: Linus Torvalds @ 2003-06-20 0:05 UTC (permalink / raw) To: Steven Dake; +Cc: Werner Almesberger, linux-kernel On Thu, 19 Jun 2003, Steven Dake wrote: > > A serialization methodology can be built on /sbin/hotplug, but it has > all of the problems that Linus previously talked about for a kernel > event queue. The difference is that the problem is moved to userland. Having event ordering is a trivial matter, and I'm not against adding a sequence number to /sbin/hotplug as part of the environment, for example. What I worry about is the situation where one big deamon handles everything, which makes it impossible to "hook in" to the thing without understanding the one big thing. The thing that makes /sbin/hotplug so wonderful is that it's stateless, and if you want to hook into it, it's absolutely _trivial_. Look at the default script there in redhat-9 for example, and it's obvious how to hook up to certain events etc. And why do people care about serialization anyway? Really? The whole notion is ludicrous. /sbin/hotplug _shouldn't_ be serialized. Serialization is bad. You should look at whatever problems you have with it now, and ask yourself whether maybe it should be done some other way entirely. Linus ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-20 0:05 ` Linus Torvalds @ 2003-06-20 0:10 ` Steven Dake 2003-06-20 0:43 ` Linus Torvalds 0 siblings, 1 reply; 73+ messages in thread From: Steven Dake @ 2003-06-20 0:10 UTC (permalink / raw) To: Linus Torvalds; +Cc: Werner Almesberger, linux-kernel Linus Torvalds wrote: >On Thu, 19 Jun 2003, Steven Dake wrote: > > >>A serialization methodology can be built on /sbin/hotplug, but it has >>all of the problems that Linus previously talked about for a kernel >>event queue. The difference is that the problem is moved to userland. >> >> > >Having event ordering is a trivial matter, and I'm not against adding a >sequence number to /sbin/hotplug as part of the environment, for example. > >What I worry about is the situation where one big deamon handles >everything, which makes it impossible to "hook in" to the thing without >understanding the one big thing. > >The thing that makes /sbin/hotplug so wonderful is that it's stateless, >and if you want to hook into it, it's absolutely _trivial_. Look at the >default script there in redhat-9 for example, and it's obvious how to hook >up to certain events etc. > >And why do people care about serialization anyway? Really? The whole >notion is ludicrous. /sbin/hotplug _shouldn't_ be serialized. >Serialization is bad. You should look at whatever problems you have with >it now, and ask yourself whether maybe it should be done some other way >entirely. > > Serialization is important for the case of a device state change occuring rapidly. An example is a device add and then remove, which results in (if out of order) the add occuring after the remove, leaving a dead device node in the filesystem. This is tolerable. What isn't as tolerable is a remove and then an add, where the add occurs out of order first. In this case, a device node should exist on the filesystem, but instead doesn't because the remove has come along and removed it. This occurance does happen much more often then you might think. When changing disk partitions, for example, items are removed and then added several times before the devices are finally instantiated. Definately to be avoided. Thanks -steve > Linus > > > > > ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-20 0:10 ` Steven Dake @ 2003-06-20 0:43 ` Linus Torvalds 2003-06-20 2:26 ` Werner Almesberger 2003-06-20 17:03 ` Steven Dake 0 siblings, 2 replies; 73+ messages in thread From: Linus Torvalds @ 2003-06-20 0:43 UTC (permalink / raw) To: Steven Dake; +Cc: Werner Almesberger, linux-kernel On Thu, 19 Jun 2003, Steven Dake wrote: > > Serialization is important for the case of a device state change > occuring rapidly. An example is a device add and then remove, which > results in (if out of order) the add occuring after the remove, leaving > a dead device node in the filesystem. This is tolerable. What isn't as > tolerable is a remove and then an add, where the add occurs out of order > first. In this case, a device node should exist on the filesystem, but > instead doesn't because the remove has come along and removed it. This still isn't a global serialization thing, though. For example, there's no reason _another_ device should be serialized with the disk discovery. And there are lots and lots of reasons why they should not EVER be serialized, especially since full serialization (like through a shared event deamon) results is _serious_ problems if one of the events end up being a slow thing that needs to query and/or spin up hardware. In other words, once you start talking about device nodes, you really want to handle the serialization on a per-node basis. Having some kind of kernel support for that is quite possible a good idea, but care should be taken that non-dependent events shouldn't be serialized. An easy way to do partial serialization is something like this: - each "call_usermodehelper()" gets associated with two numbers: the "class number" and the "sequence number within a class". We keep track of outstanding usermodehelpers. Add the class and sequence number to the environment of the hotplug execution so that the user-mode hotplug code can query them. - we add a simple interface to allow a user mode helper to ask the kernel to "please wait until class X has no pending sequence numbers smaller than Y" See what I'm aiming at? The class numbers shouldn't have any particular meaning, they're just a random number (it might be the bus number of the device that hotplugged, for example). But this allows a hotplug thing to say "if there are other outstanding hotplug things in my class that were started before me, wait here". Linus ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-20 0:43 ` Linus Torvalds @ 2003-06-20 2:26 ` Werner Almesberger 2003-06-20 17:03 ` Steven Dake 1 sibling, 0 replies; 73+ messages in thread From: Werner Almesberger @ 2003-06-20 2:26 UTC (permalink / raw) To: Linus Torvalds; +Cc: Steven Dake, linux-kernel Linus Torvalds wrote: > This still isn't a global serialization thing, though. For example, > there's no reason _another_ device should be serialized with the disk > discovery. Yes, that sounds a lot better. Let's hope nobody discovers such a reason ... > An easy way to do partial serialization is something like this: That would work. It's in fact a per-class message queue, in which sleeping processes play the role of messages ;-) I thought a bit about what else one could do with this interface: - if you need loss/error handling, and are willing to keep one bit of information for each such class in the kernel, the it could register whether processing the previous event has succeeded, i.e. whether the kernel was able to launch a helper, and whether that helper has returned with exit status 0. This bit could be returned by the synchronization call. - in principle it would now be easy, and in many ways cleaner and more efficient, to turn this into a per-class event demon, with a "real" event queue. The problem is that you'd probably still want classes that are not serialized, and the kernel wouldn't know whether a given class needs serialization before /sbin/hotplug has run for a while. Neiter configuring class behaviour nor adding an "asychronous class" look particularly attractive to me. Too bad. A synchronous /sbin/hotplug as follows would certainly be nice, e.g. assuming that in each class only one event type can occur: #!/bin/sh agent=$1 read event set $event <... normal /sbin/hotplug code ...> or, the high-performance variant: #!/bin/sh exec /etc/hotplug/$1.agent #!/bin/sh <... do really expensive initialization ...> while read event; do set $event <... normal /etc/hotplug/*.agent code ...> done - Werner -- _________________________________________________________________________ / Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net / /_http://www.almesberger.net/____________________________________________/ ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-20 0:43 ` Linus Torvalds 2003-06-20 2:26 ` Werner Almesberger @ 2003-06-20 17:03 ` Steven Dake 2003-06-20 17:18 ` Patrick Mochel 1 sibling, 1 reply; 73+ messages in thread From: Steven Dake @ 2003-06-20 17:03 UTC (permalink / raw) To: Linus Torvalds; +Cc: Werner Almesberger, linux-kernel Linus Torvalds wrote: >On Thu, 19 Jun 2003, Steven Dake wrote: > > >>Serialization is important for the case of a device state change >>occuring rapidly. An example is a device add and then remove, which >>results in (if out of order) the add occuring after the remove, leaving >>a dead device node in the filesystem. This is tolerable. What isn't as >>tolerable is a remove and then an add, where the add occurs out of order >>first. In this case, a device node should exist on the filesystem, but >>instead doesn't because the remove has come along and removed it. >> >> > >This still isn't a global serialization thing, though. For example, >there's no reason _another_ device should be serialized with the disk >discovery. > >And there are lots and lots of reasons why they should not EVER be >serialized, especially since full serialization (like through a shared >event deamon) results is _serious_ problems if one of the events end up >being a slow thing that needs to query and/or spin up hardware. > >In other words, once you start talking about device nodes, you really want >to handle the serialization on a per-node basis. Having some kind of >kernel support for that is quite possible a good idea, but care should be >taken that non-dependent events shouldn't be serialized. > >An easy way to do partial serialization is something like this: > > - each "call_usermodehelper()" gets associated with two numbers: the > "class number" and the "sequence number within a class". We keep track > of outstanding usermodehelpers. > > Add the class and sequence number to the environment of the hotplug > execution so that the user-mode hotplug code can query them. > > - we add a simple interface to allow a user mode helper to ask the kernel > to "please wait until class X has no pending sequence numbers smaller > than Y" > >See what I'm aiming at? The class numbers shouldn't have any particular >meaning, they're just a random number (it might be the bus number of the >device that hotplugged, for example). But this allows a hotplug thing to >say "if there are other outstanding hotplug things in my class that were >started before me, wait here". > > > A class number is an excellent idea, and could have potential performance gains. I thought of making such a patch, but expected it would be rejected since it is necessarily required and would require changes to significant sections of the kernel (to add the class to each object type). Have any better ideas for an implementation that doesn't touch so many sections of the kernel? > Linus > > > > > ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue 2003-06-20 17:03 ` Steven Dake @ 2003-06-20 17:18 ` Patrick Mochel 0 siblings, 0 replies; 73+ messages in thread From: Patrick Mochel @ 2003-06-20 17:18 UTC (permalink / raw) To: Steven Dake; +Cc: Linus Torvalds, Werner Almesberger, linux-kernel > A class number is an excellent idea, and could have potential > performance gains. I thought of making such a patch, but expected it > would be rejected since it is necessarily required and would require > changes to significant sections of the kernel (to add the class to each > object type). > > Have any better ideas for an implementation that doesn't touch so many > sections of the kernel? It should be trivial - we can use the subsystem name that the kobject belongs to for the class ID. And, struct subsystem can gain a sequence number, instead of having a global one in lib/kobject.c. -pat ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH] udev enhancements to use kernel event queue @ 2003-06-14 6:36 John Bradford 0 siblings, 0 replies; 73+ messages in thread From: John Bradford @ 2003-06-14 6:36 UTC (permalink / raw) To: alan, joe.korty; +Cc: akpm, greg, linux-kernel, mochel, paulus, rml, sdake > > > Turns out its 486 and higher, so you win. > > Perhaps it's time to remove 386 support from 2.5? Users > > of very old machines can stick with 2.0, 2.2 or 2.4. I strongly disagree, 386 support is important for the embedded market. > You have to solve these problems generally anyway, and lots of > other platforms have limited locking functions. Including future Linux platforms. (possibly). John. ^ permalink raw reply [flat|nested] 73+ messages in thread
end of thread, other threads:[~2003-06-26 14:21 UTC | newest] Thread overview: 73+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2003-06-12 19:10 [PATCH] udev enhancements to use kernel event queue Steven Dake 2003-06-12 21:47 ` Patrick Mochel 2003-06-13 16:31 ` Steven Dake 2003-06-13 17:06 ` Patrick Mochel 2003-06-12 21:47 ` Greg KH 2003-06-12 22:03 ` Andrew Morton 2003-06-12 22:50 ` Greg KH 2003-06-12 22:51 ` Andrew Morton 2003-06-12 23:02 ` Greg KH 2003-06-12 23:09 ` Greg KH 2003-06-12 23:14 ` Patrick Mochel 2003-06-12 23:16 ` Robert Love 2003-06-12 23:25 ` Greg KH 2003-06-13 20:01 ` Oliver Neukum 2003-06-18 22:59 ` Greg KH 2003-06-19 0:12 ` Kevin P. Fleming 2003-06-19 0:20 ` Greg KH 2003-06-19 0:42 ` Kevin P. Fleming 2003-06-19 0:51 ` Greg KH 2003-06-19 0:25 ` Andrew Morton 2003-06-19 6:27 ` Oliver Neukum 2003-06-12 23:25 ` Patrick Mochel 2003-06-12 23:29 ` Robert Love 2003-06-12 23:32 ` Greg KH 2003-06-12 23:34 ` Patrick Mochel 2003-06-12 23:40 ` Paul Mackerras 2003-06-12 23:50 ` Robert Love 2003-06-13 12:44 ` Richard B. Johnson 2003-06-13 12:54 ` Olivier Galibert 2003-06-12 23:52 ` Patrick Mochel 2003-06-13 8:41 ` Alan Cox 2003-06-13 11:00 ` Paul Mackerras 2003-06-13 11:07 ` Herbert Xu 2003-06-13 13:31 ` Alan Cox 2003-06-13 19:57 ` Joe Korty 2003-06-13 21:10 ` Alan Cox 2003-06-13 11:17 ` David Schwartz 2003-06-13 15:44 ` Davide Libenzi 2003-06-12 23:42 ` Robert Love 2003-06-12 23:45 ` Davide Libenzi 2003-06-12 23:05 ` Robert Love 2003-06-19 19:51 ` Werner Almesberger 2003-06-26 12:17 ` Tommi Virtanen 2003-06-26 14:35 ` Werner Almesberger 2003-06-13 8:40 ` Alan Cox 2003-06-13 9:14 ` Olivier Galibert 2003-06-19 20:53 ` Werner Almesberger 2003-06-13 16:05 ` Steven Dake 2003-06-13 16:32 ` Greg KH 2003-06-13 15:51 ` Steven Dake 2003-06-13 16:41 ` Patrick Mochel 2003-06-13 16:42 ` Greg KH 2003-06-13 17:17 ` Chris Friesen 2003-06-12 22:27 ` Oliver Neukum 2003-06-13 16:03 ` Steven Dake 2003-06-13 16:50 ` Patrick Mochel 2003-06-13 17:10 ` Steven Dake 2003-06-13 18:26 ` Greg KH 2003-06-13 19:02 ` Patrick Mansfield 2003-06-13 17:32 ` Oliver Neukum 2003-06-13 18:23 ` Greg KH 2003-06-13 18:24 ` Greg KH 2003-06-13 7:17 ` Christoph Hellwig 2003-06-13 18:06 ` Linus Torvalds 2003-06-19 23:32 ` Werner Almesberger 2003-06-19 23:42 ` Steven Dake 2003-06-20 0:05 ` Linus Torvalds 2003-06-20 0:10 ` Steven Dake 2003-06-20 0:43 ` Linus Torvalds 2003-06-20 2:26 ` Werner Almesberger 2003-06-20 17:03 ` Steven Dake 2003-06-20 17:18 ` Patrick Mochel 2003-06-14 6:36 John Bradford
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).