* Re: [GIT PULL] kdbus for 4.1-rc1
@ 2015-04-15 18:18 Linus Torvalds
2015-04-15 18:28 ` Linus Torvalds
` (3 more replies)
0 siblings, 4 replies; 333+ messages in thread
From: Linus Torvalds @ 2015-04-15 18:18 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Steven Rostedt, One Thousand Gnomes, Jiri Kosina, Al Viro,
Borislav Petkov, Andy Lutomirski, Andrew Morton, Arnd Bergmann,
Eric W. Biederman, Tom Gundersen, linux-kernel, Daniel Mack,
David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 11:11 AM, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> On Wed, Apr 15, 2015 at 01:33:27PM -0400, Steven Rostedt wrote:
>>
>> I'll argue that you can't fix the latter one. One thing that I've observed over
>> the years of having faster computers is, as soon as you make it faster, people
>> will write slower software.
>>
>> Currently the issue is that we have thousands of dbus queries, you make dbus
>> 10x faster, I guarantee that people will write software with 10 thousand dbus
>> queries and we are no better off than we are today.
>
> Then they get to buy a faster machine :)

Is there actually a performance issue?

I've seen this claimed, but I have never seen any actual numbers. What
speeds up? By how much? is it actually measurable?

Maybe they've marched past me in this thread-from-hell. But I can't
recall having seen any (not now, not before).

That said, I think the more serious issue is that if Luto complains
about the capability-capturing code being completely broken, then
people need to take that *seriously*.

Linus
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-15 18:18 [GIT PULL] kdbus for 4.1-rc1 Linus Torvalds
@ 2015-04-15 18:28 ` Linus Torvalds
2015-04-15 18:37 ` Greg Kroah-Hartman
2015-04-15 22:16 ` One Thousand Gnomes
2015-04-15 18:37 ` Greg Kroah-Hartman
` (2 subsequent siblings)
3 siblings, 2 replies; 333+ messages in thread
From: Linus Torvalds @ 2015-04-15 18:28 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Steven Rostedt, One Thousand Gnomes, Jiri Kosina, Al Viro,
Borislav Petkov, Andy Lutomirski, Andrew Morton, Arnd Bergmann,
Eric W. Biederman, Tom Gundersen, linux-kernel, Daniel Mack,
David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 11:18 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I've seen this claimed, but I have never seen any actual numbers. What
> speeds up? By how much? is it actually measurable?

And just to clarify: by "what speeds up, and by how much", I do _not_
mean "sending a dbus message speeds up by 10x and avoids context
switches". I've seen _those_ numbers. But does it actually matter?

It was more of a "there are thousands of dbus messages during boot,
but can you actually measure the speedup?" question. That's the kind
of numbers I've not seen.

Linus

^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-15 18:28 ` Linus Torvalds
@ 2015-04-15 18:37 ` Greg Kroah-Hartman
2015-04-15 22:16 ` One Thousand Gnomes
1 sibling, 0 replies; 333+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-15 18:37 UTC (permalink / raw)
To: Linus Torvalds
Cc: Steven Rostedt, One Thousand Gnomes, Jiri Kosina, Al Viro,
Borislav Petkov, Andy Lutomirski, Andrew Morton, Arnd Bergmann,
Eric W. Biederman, Tom Gundersen, linux-kernel, Daniel Mack,
David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 11:28:58AM -0700, Linus Torvalds wrote:
> On Wed, Apr 15, 2015 at 11:18 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > I've seen this claimed, but I have never seen any actual numbers. What
> > speeds up? By how much? is it actually measurable?
>
> And just to clarify: by "what speeds up, and by how much", I do _not_
> mean "sending a dbus message speeds up by 10x and avoids context
> switches". I've seen _those_ numbers. But does it actually matter?
>
> It was more of a "there are thousands of dbus messages during boot,
> but can you actually measure the speedup?" question. That's the kind
> of numbers I've not seen.

Someone from BMW did the testing on one of their car systems a while ago
and posted some numbers, it was a factor of 10 faster. I'll try to dig
it up, but it was buried in a powerpoint presentation, so it might be
hard to find, give me a day.

thanks,

greg k-h

^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-15 18:28 ` Linus Torvalds
2015-04-15 18:37 ` Greg Kroah-Hartman
@ 2015-04-15 22:16 ` One Thousand Gnomes
1 sibling, 0 replies; 333+ messages in thread
From: One Thousand Gnomes @ 2015-04-15 22:16 UTC (permalink / raw)
To: Linus Torvalds
Cc: Greg Kroah-Hartman, Steven Rostedt, Jiri Kosina, Al Viro,
Borislav Petkov, Andy Lutomirski, Andrew Morton, Arnd Bergmann,
Eric W. Biederman, Tom Gundersen, linux-kernel, Daniel Mack,
David Herrmann, Djalal Harouni

On Wed, 15 Apr 2015 11:28:58 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Wed, Apr 15, 2015 at 11:18 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > I've seen this claimed, but I have never seen any actual numbers. What
> > speeds up? By how much? is it actually measurable?
>
> And just to clarify: by "what speeds up, and by how much", I do _not_
> mean "sending a dbus message speeds up by 10x and avoids context
> switches". I've seen _those_ numbers. But does it actually matter?

In the desktop case some of the desktop folks manage to get themselves to
the point they send so many messages that it does. That's rather a
reflection on people programming performance critical code armed with
tools that are too easy to use combined with the fact that messaging is
hard to understand and model latency-wise. There is a better way to fix
those.

For MPI and some of the 'we used to run on an RT nano-kernel' people then
kdbus as proposed won't help - but they do have problems where
(particularly on very slow processors) Linux is naturally enough not that
comparable with a minimally memory protecting rtos doing atomic swaps on
pointers in a shared memory.

Alan

^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-15 18:18 [GIT PULL] kdbus for 4.1-rc1 Linus Torvalds
2015-04-15 18:28 ` Linus Torvalds
@ 2015-04-15 18:37 ` Greg Kroah-Hartman
2015-04-15 22:26 ` Andy Lutomirski
2015-04-16 18:20 ` David Herrmann
3 siblings, 0 replies; 333+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-15 18:37 UTC (permalink / raw)
To: Linus Torvalds
Cc: Steven Rostedt, One Thousand Gnomes, Jiri Kosina, Al Viro,
Borislav Petkov, Andy Lutomirski, Andrew Morton, Arnd Bergmann,
Eric W. Biederman, Tom Gundersen, linux-kernel, Daniel Mack,
David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 11:18:45AM -0700, Linus Torvalds wrote:
> That said, I think the more serious issue is that if Luto complains
> about the capability-capturing code being completely broken, then
> people need to take that *seriously*.

I am taking that seriously, it's been a long day, will respond to it
tomorrow.

thanks,

greg k-h

^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-15 18:18 [GIT PULL] kdbus for 4.1-rc1 Linus Torvalds
2015-04-15 18:28 ` Linus Torvalds
2015-04-15 18:37 ` Greg Kroah-Hartman
@ 2015-04-15 22:26 ` Andy Lutomirski
2015-04-16 18:20 ` David Herrmann
3 siblings, 0 replies; 333+ messages in thread
From: Andy Lutomirski @ 2015-04-15 22:26 UTC (permalink / raw)
To: Linus Torvalds
Cc: Greg Kroah-Hartman, Steven Rostedt, One Thousand Gnomes,
Jiri Kosina, Al Viro, Borislav Petkov, Andrew Morton, Arnd Bergmann,
Eric W. Biederman, Tom Gundersen, linux-kernel, Daniel Mack,
David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 11:18 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Wed, Apr 15, 2015 at 11:11 AM, Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
>> On Wed, Apr 15, 2015 at 01:33:27PM -0400, Steven Rostedt wrote:
>>>
>>> I'll argue that you can't fix the latter one. One thing that I've observed over
>>> the years of having faster computers is, as soon as you make it faster, people
>>> will write slower software.
>>>
>>> Currently the issue is that we have thousands of dbus queries, you make dbus
>>> 10x faster, I guarantee that people will write software with 10 thousand dbus
>>> queries and we are no better off than we are today.
>>
>> Then they get to buy a faster machine :)
>
> Is there actually a performance issue?
>
> I've seen this claimed, but I have never seen any actual numbers. What
> speeds up? By how much? is it actually measurable?
>
> Maybe they've marched past me in this thread-from-hell. But I can't
> recall having seen any (not now, not before).
>
> That said, I think the more serious issue is that if Luto complains
> about the capability-capturing code being completely broken, then
> people need to take that *seriously*.

To be fair: the userspace version in systemd is completely broken, and
v1 of kdbus's was completely broken. v2's is, as far as I know, just
conceptually wrong and highly unlikely to be useful in any legitimate
fashion, but it's no longer obvious to me that it's exploitable.

(That being said, Eric doesn't like it, and I haven't re-read it
recently. So it could still be completely broken.)

--Andy

^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-15 18:18 [GIT PULL] kdbus for 4.1-rc1 Linus Torvalds
` (2 preceding siblings ...)
2015-04-15 22:26 ` Andy Lutomirski
@ 2015-04-16 18:20 ` David Herrmann
2015-04-20 20:43 ` Richard Weinberger
3 siblings, 1 reply; 333+ messages in thread
From: David Herrmann @ 2015-04-16 18:20 UTC (permalink / raw)
To: Linus Torvalds
Cc: Greg Kroah-Hartman, Steven Rostedt, One Thousand Gnomes,
Jiri Kosina, Al Viro, Borislav Petkov, Andy Lutomirski,
Andrew Morton, Arnd Bergmann, Eric W. Biederman, Tom Gundersen,
linux-kernel, Daniel Mack, Djalal Harouni

Hi

On Wed, Apr 15, 2015 at 8:18 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Wed, Apr 15, 2015 at 11:11 AM, Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
>> On Wed, Apr 15, 2015 at 01:33:27PM -0400, Steven Rostedt wrote:
>>>
>>> I'll argue that you can't fix the latter one. One thing that I've observed over
>>> the years of having faster computers is, as soon as you make it faster, people
>>> will write slower software.
>>>
>>> Currently the issue is that we have thousands of dbus queries, you make dbus
>>> 10x faster, I guarantee that people will write software with 10 thousand dbus
>>> queries and we are no better off than we are today.
>>
>> Then they get to buy a faster machine :)
>
> Is there actually a performance issue?
>
> I've seen this claimed, but I have never seen any actual numbers. What
> speeds up? By how much? is it actually measurable?

For us, boot speed-up has not been the primary concern. The boot-time
speedup that kdbus provides is unlikely to be significant on a generic
linux distro today, given that nowadays the slow parts during bootup
are firmware and hw initialization. The number of dbus messages sent
during bootup of a general purpose Linux distro is relatively small.
OTOH, during start-up of desktop environments, the dbus traffic is
substantial, but until the porting of Qt and glib to kdbus has been
completed and merged the real-world effect of this is minimal.

Our interest in improved raw performance is mainly motivated by making
dbus a viable protocol in situations where its semantics are
appropriate. But for performance reasons one had to use custom
protocols instead so far. Greg gave some examples in the cover letter,
multi-media being the most obvious area where the ten-fold decrease in
latency and the ability to efficiently copy large chunks of memory
from one process to another is relevant.

> Maybe they've marched past me in this thread-from-hell. But I can't
> recall having seen any (not now, not before).
>
> That said, I think the more serious issue is that if Luto complains
> about the capability-capturing code being completely broken, then
> people need to take that *seriously*.

Absolutely. In v3 of the patch set we addressed Andy's concerns, and
now he seems to be convinced that the kdbus code is not vulnerable
[1].

Thanks
David

[1] http://lists.freedesktop.org/archives/systemd-devel/2015-April/030776.html

^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-16 18:20 ` David Herrmann
@ 2015-04-20 20:43 ` Richard Weinberger
2015-04-20 20:56 ` Greg Kroah-Hartman
0 siblings, 1 reply; 333+ messages in thread
From: Richard Weinberger @ 2015-04-20 20:43 UTC (permalink / raw)
To: David Herrmann
Cc: Linus Torvalds, Greg Kroah-Hartman, Steven Rostedt,
One Thousand Gnomes, Jiri Kosina, Al Viro, Borislav Petkov,
Andy Lutomirski, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
Tom Gundersen, linux-kernel, Daniel Mack, Djalal Harouni

David,

On Thu, Apr 16, 2015 at 8:20 PM, David Herrmann <dh.herrmann@gmail.com> wrote:
> Hi
>
> On Wed, Apr 15, 2015 at 8:18 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>> On Wed, Apr 15, 2015 at 11:11 AM, Greg Kroah-Hartman
>> <gregkh@linuxfoundation.org> wrote:
>>> On Wed, Apr 15, 2015 at 01:33:27PM -0400, Steven Rostedt wrote:
>>>>
>>>> I'll argue that you can't fix the latter one. One thing that I've observed over
>>>> the years of having faster computers is, as soon as you make it faster, people
>>>> will write slower software.
>>>>
>>>> Currently the issue is that we have thousands of dbus queries, you make dbus
>>>> 10x faster, I guarantee that people will write software with 10 thousand dbus
>>>> queries and we are no better off than we are today.
>>>
>>> Then they get to buy a faster machine :)
>>
>> Is there actually a performance issue?
>>
>> I've seen this claimed, but I have never seen any actual numbers. What
>> speeds up? By how much? is it actually measurable?
>
> For us, boot speed-up has not been the primary concern. The boot-time
> speedup that kdbus provides is unlikely to be significant on a generic
> linux distro today, given that nowadays the slow parts during bootup
> are firmware and hw initialization. The number of dbus messages sent
> during bootup of a general purpose Linux distro is relatively small.
> OTOH, during start-up of desktop environments, the dbus traffic is
> substantial, but until the porting of Qt and glib to kdbus has been
> completed and merged the real-world effect of this is minimal.
>
> Our interest in improved raw performance is mainly motivated by making
> dbus a viable protocol in situations where its semantics are
> appropriate. But for performance reasons one had to use custom
> protocols instead so far. Greg gave some examples in the cover letter,
> multi-media being the most obvious area where the ten-fold decrease in
> latency and the ability to efficiently copy large chunks of memory
> from one process to another is relevant.

In which situation on a common Linux system is the current dbus too
slow today? I've never seen an issue like "Oh my system is slow because
dbus is eating too much CPU cycles".

dbus may have issues which are worth to fix. But moving dbus more or
less in the kernel seems overkill.

So, what exactly are these issues and why can't we add new IPC
primitives to Linux which allow a decent userland dbus?

To me kdbus seems much like an ad-hoc solution which is very dbus
centric.

IIRC Alan asked the same question.

--
Thanks,
//richard

^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-20 20:43 ` Richard Weinberger
@ 2015-04-20 20:56 ` Greg Kroah-Hartman
2015-04-20 21:16 ` Richard Weinberger
0 siblings, 1 reply; 333+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-20 20:56 UTC (permalink / raw)
To: Richard Weinberger
Cc: David Herrmann, Linus Torvalds, Steven Rostedt,
One Thousand Gnomes, Jiri Kosina, Al Viro, Borislav Petkov,
Andy Lutomirski, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
Tom Gundersen, linux-kernel, Daniel Mack, Djalal Harouni

On Mon, Apr 20, 2015 at 10:43:19PM +0200, Richard Weinberger wrote:
> David,
>
> On Thu, Apr 16, 2015 at 8:20 PM, David Herrmann <dh.herrmann@gmail.com> wrote:
> > Hi
> >
> > On Wed, Apr 15, 2015 at 8:18 PM, Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> >> On Wed, Apr 15, 2015 at 11:11 AM, Greg Kroah-Hartman
> >> <gregkh@linuxfoundation.org> wrote:
> >>> On Wed, Apr 15, 2015 at 01:33:27PM -0400, Steven Rostedt wrote:
> >>>>
> >>>> I'll argue that you can't fix the latter one. One thing that I've observed over
> >>>> the years of having faster computers is, as soon as you make it faster, people
> >>>> will write slower software.
> >>>>
> >>>> Currently the issue is that we have thousands of dbus queries, you make dbus
> >>>> 10x faster, I guarantee that people will write software with 10 thousand dbus
> >>>> queries and we are no better off than we are today.
> >>>
> >>> Then they get to buy a faster machine :)
> >>
> >> Is there actually a performance issue?
> >>
> >> I've seen this claimed, but I have never seen any actual numbers. What
> >> speeds up? By how much? is it actually measurable?
> >
> > For us, boot speed-up has not been the primary concern. The boot-time
> > speedup that kdbus provides is unlikely to be significant on a generic
> > linux distro today, given that nowadays the slow parts during bootup
> > are firmware and hw initialization. The number of dbus messages sent
> > during bootup of a general purpose Linux distro is relatively small.
> > OTOH, during start-up of desktop environments, the dbus traffic is
> > substantial, but until the porting of Qt and glib to kdbus has been
> > completed and merged the real-world effect of this is minimal.
> >
> > Our interest in improved raw performance is mainly motivated by making
> > dbus a viable protocol in situations where its semantics are
> > appropriate. But for performance reasons one had to use custom
> > protocols instead so far. Greg gave some examples in the cover letter,
> > multi-media being the most obvious area where the ten-fold decrease in
> > latency and the ability to efficiently copy large chunks of memory
> > from one process to another is relevant.
>
> In which situation on a common Linux system is the current dbus too slow today?
> I've never seen an issue like "Oh my system is slow because dbus is
> eating too much CPU cycles".

See the original email which explained all of the things we can not do
with D-Bus, some of which are due to speed, that can now be done with the
kdbus code.

> dbus may have issues which are worth to fix. But moving dbus more or
> less in the kernel
> seems overkill.

Why do you think so? How is this code "overkill"?

> So, what exactly are these issues and why can't we add new IPC
> primitives to Linux which
> allow a decent userland dbus?

That's exactly what this patchset does.

> To me kdbus seems much like an ad-hoc solution which is very dbus centric.

Yes it is, but the "dbus centric" thing is a valid model that is quite
useful and in use by a lot of programs as it solves a real problem.

> IIRC Alan asked the same question.

Yes, you can build everything off of tiny socket calls, but when you do
that, you end up with the D-Bus userspace implementation we have today,
with the issues that it has. By moving portions of that model into the
kernel, as is done here, it solves a number of these issues, and allows
for a lot more flexibility and things to be done that are impossible
with the current model of trying to build on top of tiny ipc functions.

The existing code is much stripped down from what you think of as a
D-Bus daemon today, only the exact needed pieces are implemented here.

Do you see anything wrong with the code as is submitted (aside from the
issues that Al has pointed out that are being resolved already?)

thanks,

greg k-h

^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-20 20:56 ` Greg Kroah-Hartman
@ 2015-04-20 21:16 ` Richard Weinberger
2015-04-20 21:46 ` Greg Kroah-Hartman
2015-04-21 9:07 ` Johannes Stezenbach
0 siblings, 2 replies; 333+ messages in thread
From: Richard Weinberger @ 2015-04-20 21:16 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: David Herrmann, Linus Torvalds, Steven Rostedt,
One Thousand Gnomes, Jiri Kosina, Al Viro, Borislav Petkov,
Andy Lutomirski, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
Tom Gundersen, linux-kernel, Daniel Mack, Djalal Harouni

Greg,

On 20.04.2015 at 22:56, Greg Kroah-Hartman wrote:
>> In which situation on a common Linux system is the current dbus too slow today?
>> I've never seen an issue like "Oh my system is slow because dbus is
>> eating too much CPU cycles".
>
> See the original email which explained all of the things we can not do
> with D-Bus, some of which are due to speed, that can now be done with the
> kdbus code.

okay, let's do it together.

1. Performance
You write:
"DBus is not used for performance sensitive applications because DBus is slow.
We want to make it fast so we can finally use it for low-latency,
high-throughput applications."

Which applications exactly?
This reads to me like a solution for a non-existing problem.

2. Security
I don't think that you need a 13k piece of code in the kernel to solve
that issue.

3. Semantics for apps with heavy data payloads

Again, sounds like a solution for a non-existing problem.

4. "Being in the kernel closes a lot of races which can't be fixed with
the current userspace solutions."

You really need an in-kernel dbus with 13k to solve that?

5. Eavesdropping on the kernel level

Same here.

6. dbus-daemon is not available during early-boot or shutdown.

Why do you need dbus in that stage?

Can you now please start answering questions instead of pointing to random mails?

>> dbus may have issues which are worth to fix. But moving dbus more or
>> less in the kernel
>> seems overkill.
>
> Why do you think so? How is this code "overkill"?

Did you try to review it? ;-)
No really, it is by no means an IPC primitive; it is a super high level
in-kernel IPC and not trivial.

>> So, what exactly are these issues and why can't we add new IPC
>> primitives to Linux which
>> allow a decent userland dbus?
>
> That's exactly what this patchset does.

No, it moves dbus into the kernel.

>> To me kdbus seems much like an ad-hoc solution which is very dbus centric.
>
> Yes it is, but the "dbus centric" thing is a valid model that is quite
> useful and in use by a lot of programs as it solves a real problem.

What programs?

>> IIRC Alan asked the same question.
>
> Yes, you can build everything off of tiny socket calls, but when you do
> that, you end up with the D-Bus userspace implementation we have today,
> with the issues that it has. By moving portions of that model into the
> kernel, as is done here, it solves a number of these issues, and allows
> for a lot more flexibility and things to be done that are impossible
> with the current model of trying to build on top of tiny ipc functions.
>
> The existing code is much stripped down from what you think of as a
> D-Bus daemon today, only the exact needed pieces are implemented here.
>
> Do you see anything wrong with the code as is submitted (aside from the
> issues that Al has pointed out that are being resolved already?)

The code is fine, the concepts are fishy as pointed out many times in
this thread. My point is that we should try hard to fix dbus instead
of moving it into the kernel.

Thanks,
//richard

^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-20 21:16 ` Richard Weinberger
@ 2015-04-20 21:46 ` Greg Kroah-Hartman
2015-04-20 22:06 ` Andy Lutomirski
2015-04-21 8:18 ` Richard Cochran
2015-04-21 9:07 ` Johannes Stezenbach
1 sibling, 2 replies; 333+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-20 21:46 UTC (permalink / raw)
To: Richard Weinberger
Cc: David Herrmann, Linus Torvalds, Steven Rostedt,
One Thousand Gnomes, Jiri Kosina, Al Viro, Borislav Petkov,
Andy Lutomirski, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
Tom Gundersen, linux-kernel, Daniel Mack, Djalal Harouni

On Mon, Apr 20, 2015 at 11:16:49PM +0200, Richard Weinberger wrote:
> Greg,
>
> On 20.04.2015 at 22:56, Greg Kroah-Hartman wrote:
> >> In which situation on a common Linux system is the current dbus too slow today?
> >> I've never seen an issue like "Oh my system is slow because dbus is
> >> eating too much CPU cycles".
> >
> > See the original email which explained all of the things we can not do
> > with D-Bus, some of which are due to speed, that can now be done with the
> > kdbus code.
>
> okay, let's do it together.
>
> 1. Performance
> You write:
> "DBus is not used for performance sensitive applications because DBus is slow.
> We want to make it fast so we can finally use it for low-latency,
> high-throughput applications."
>
> Which applications exactly?
> This reads to me like a solution for a non-existing problem.

Anything that uses UDS for large buffers today can switch to using kdbus
for its data stream as it is faster. I know the Pulse Audio people
have discussed this, and there are other people as well (Enlightenment
library developers, glib, wayland, etc.) Without the code being in the
kernel, no project is going to spend the time to convert their codebase
to a feature that isn't accepted.

> 2. Security
> I don't think that you need a 13k piece of code in the kernel to solve
> that issue.

Wait, what? How can you blow by that requirement by just saying that
this proposal isn't acceptable? You can't do that, sorry. Please show
how what we have proposed does not provide the security requirements as
is documented.

> 3. Semantics for apps with heavy data payloads
>
> Again, sounds like a solution for a non-existing problem.

No, media apps need to share their data somehow, and kdbus provides a
way to do that. GNOME portals are one such proposed codebase that is
looking to use kdbus for this, and again, so is Pulse Audio and the
other groups listed above.

> 4. "Being in the kernel closes a lot of races which can't be fixed with
> the current userspace solutions."
>
> You really need an in-kernel dbus with 13k to solve that?

Do you know of a smaller amount of code to solve this problem? If so,
wonderful, please show us, but we aren't playing code golf here. We are
proposing something that is well documented and easy to maintain, while
still being fast and correct. If you think this can be done in a
smaller amount of code, please show us where we are doing needless
things in the patches.

> 5. Eavesdropping on the kernel level
>
> Same here.

Same here for what? Again, please point out the wasted code we have,
and we will be glad to remove it. Or change it to be smaller. Odds are
we have done something wrong and it can be reduced, but I don't see it.
Hints are greatly appreciated.

> 6. dbus-daemon is not available during early-boot or shutdown.
>
> Why do you need dbus in that stage?

You need a way to transfer system state from the initrd to the root
disk, and using D-Bus is a great way to do that. systemd goes through a
lot of special gyrations to achieve this that can be greatly simplified
by using kdbus. If the kernel provides the service, you can also ensure
that you have a working D-Bus early on and very late, so that
applications / daemons can properly rely on it for startup/shutdown,
which can not be done today.

> Can you now please start answering questions instead of pointing to random mails?

I've done nothing _but_ answer questions in this email-thread-of-doom.

> >> dbus may have issues which are worth to fix. But moving dbus more or
> >> less in the kernel
> >> seems overkill.
> >
> > Why do you think so? How is this code "overkill"?
>
> Did you try to review it? ;-)
> No really, it is by no means an IPC primitive; it is a super high level
> in-kernel IPC and not trivial.

Yes, it's not "trivial" as it solves a "non-trivial" problem. I agree.
But to claim that somehow a much smaller solution can be found without
pointing out what we did wrong with our proposal is just rude.

> >> So, what exactly are these issues and why can't we add new IPC
> >> primitives to Linux which
> >> allow a decent userland dbus?
> >
> > That's exactly what this patchset does.
>
> No, it moves dbus into the kernel.

No, it moves part of the dbus model into the kernel. There are still
other things that are outside of the kernel, as they do not belong in
the kernel. You need a library in userspace to properly implement the
D-Bus protocol and interaction that userspace programs are needing.

If we had implemented the "whole thing" in the kernel, then that library
would not be needed.

> >> To me kdbus seems much like an ad-hoc solution which is very dbus centric.
> >
> > Yes it is, but the "dbus centric" thing is a valid model that is quite
> > useful and in use by a lot of programs as it solves a real problem.
>
> What programs?

Really? Come on, look at all of the user stack that relies on D-Bus
today (GNOME, KDE), and all of the programs that take advantage of the
libraries that provide D-Bus bindings (Qt, glib, Go, python, perl, etc.)
Yes, it's possible to run a Linux without using D-Bus, but almost no
distro these days does that anymore. Just look at what is on your
desktop and server today that all major Linux companies support.

As was pointed out, even the tiny IoT devices running Linux are now
using D-Bus, it's everywhere :)

> >> IIRC Alan asked the same question.
> >
> > Yes, you can build everything off of tiny socket calls, but when you do
> > that, you end up with the D-Bus userspace implementation we have today,
> > with the issues that it has. By moving portions of that model into the
> > kernel, as is done here, it solves a number of these issues, and allows
> > for a lot more flexibility and things to be done that are impossible
> > with the current model of trying to build on top of tiny ipc functions.
> >
> > The existing code is much stripped down from what you think of as a
> > D-Bus daemon today, only the exact needed pieces are implemented here.
> >
> > Do you see anything wrong with the code as is submitted (aside from the
> > issues that Al has pointed out that are being resolved already?)
>
> The code is fine, the concepts are fishy as pointed out many times in
> this thread. My point is that we should try hard to fix dbus instead
> of moving it into the kernel.

What needs to be "fixed" in D-Bus that this patchset provides a solution
for?

And no, the concepts are not "fishy" at all. They solve a real problem,
and need, that programs today rely on. The concepts have been used for
many decades, and this specific implementation has lasted for over a
decade as it seems to be the best model that has evolved over time. And
there is no known proposed model out there to succeed it that I know of,
do you know of one?

Because of that, and the thread where the proposed security problems
were agreed not to be a security problem, I don't see a reason anymore
why this code should not be merged.

With the exception of Al's code review, which is being addressed. But
that's a minor thing, not a major design flaw at all.

thanks,

greg k-h

^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-20 21:46 ` Greg Kroah-Hartman @ 2015-04-20 22:06 ` Andy Lutomirski 2015-04-21 7:38 ` Johannes Stezenbach ` (2 more replies) 2015-04-21 8:18 ` Richard Cochran 1 sibling, 3 replies; 333+ messages in thread From: Andy Lutomirski @ 2015-04-20 22:06 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Richard Weinberger, David Herrmann, Linus Torvalds, Steven Rostedt, One Thousand Gnomes, Jiri Kosina, Al Viro, Borislav Petkov, Andrew Morton, Arnd Bergmann, Eric W. Biederman, Tom Gundersen, linux-kernel, Daniel Mack, Djalal Harouni On Mon, Apr 20, 2015 at 2:46 PM, Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > On Mon, Apr 20, 2015 at 11:16:49PM +0200, Richard Weinberger wrote: >> Greg, >> >> Am 20.04.2015 um 22:56 schrieb Greg Kroah-Hartman: >> >> In which situation on a common Linux system is the current dbus too slow today? >> >> I've never seen a issue like "Oh my system is slow because dbus is >> >> eating too much CPU cycles". >> > >> > See the original email which explained all of the things we can not do >> > with D-Bus, some of which are due to speed, that can now be done with the >> > kdbus code. >> >> okay, let's do it together. >> >> 1. Performance >> You write: >> "DBus is not used for performance sensitive applications because DBus is slow. >> We want to make it fast so we can finally use it for low-latency, >> high-throughput applications." >> >> Which applications exactly? >> This reads to me like a solution for a non-existing problem. > > Anything that uses UDS for large buffers today can switch to using kdbus > for it's data stream as it is faster. I know the Pulse Audio people > have discussed this, and there are other people as well (Enlightenment > library developers, glib, wayland, etc.) Without the code being in the > kernel, no project is going to spend the time to convert their codebase > to a feature that isn't accepted. 
Anything that uses UDS for large buffers today can switch to using memfd over SCM_RIGHTS right now. If SCM_RIGHTS is too slow, then we can fix it along the lines that Al proposed. > >> 2. Security >> I don't think that you need a 13k piece of code in the kernel to solve >> that issue. > > Wait, what? How can you blow by that requirement by just saying that > this proposal isn't acceptable? You can't do that, sorry. Please show > how what we have proposed does not provide the security requirements as > is documented. This is backwards. The way this discussion is going is: kdbus promoters: here's some code someone else: the code does such and such in a way that's wrong for xyz reason kdbus promoters: show us the implementation bug in such and such This is not how this discussion should work. Richard didn't say there was a bug in your code; he said that your code was too large. > >> 3. Semantics for apps with heavy data payloads >> >> Again, sounds like a solution for a non-existing problem. > > No, media apps need to share their data somehow, and kdbus provides a > way to do that. GNOME portals are one such proposed codebase that is > looking to use kdbus for this, and again, so is Pulse Audio and the > other groups listed above. AFAICT you're talking about passing data into and out of a sandbox for processing or UI purposes. We have two excellent ways to do that right now: memfd and splice, depending on exactly what you're doing. > >> 4. "Being in the kernel closes a lot of races which can't be fixed with >> the current userspace solutions." >> >> You really need a in-kernel dbus with 13k to solve that? > > Do you know of a smaller amount of code to solve this problem? If so, > wonderful, please show us, but we aren't playing code golf here. We are > proposing something that is well documented and easy to maintain, while > still being fast and correct. 
If you think this can be done in a > smaller amount of code, please show us where we are doing needless > things in the patches. I do. Implement something like my old SCM_IDENTITY proposal, which is kind of like kdbus metadata, opt-in, over UNIX sockets. Except that I never proposed most of the absurd metadata items that kdbus is proposing, and I also suggested doing it over plain old UNIX sockets. If that ends up at more than 500 LOC, then something's wrong. Also, everyone gets the benefit, not just kdbus. [snip] > Because of that, and the thread where the proposed security problems > were agreed not to be a security problem, I don't see a reason anymore > why this code should not be merged. > > With the exception of Al's code review, which is being addressed. But > that's a minor thing, not a major design flaw at all. My NACK stands. A security problem was fixed, but the metadata system has multiple problems, each of which is independently sufficient to earn my nack. On top of that, the policy mechanism is iffy and is probably worthy of my nack. On top of that, I think that someone into resource management needs to seriously consider whether having a broadcast send do get_user_pages or the equivalent on pages supplied by untrusted recipients (plural!) is a good idea. On top of *that*, I have serious doubts that the whole design makes sense. That doesn't earn my nack specifically, but it sure seems like a lot of people share my doubts that the design makes sense, and I don't hear a whole lot of people saying that they think the design is a good thing to put in the kernel. Also, the current thread-of-lesser-doom on the systemd list greatly decreases my confidence that the issues that have earned my nack will get resolved. The kdbus designers seem to be unwilling to accept that code should not be merged into the kernel merely because I (me, personally) don't see a straightforward security exploit that the code enables. 
--Andy ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-20 22:06 ` Andy Lutomirski @ 2015-04-21 7:38 ` Johannes Stezenbach 2015-04-21 9:35 ` One Thousand Gnomes 2015-04-21 10:31 ` Greg Kroah-Hartman 2 siblings, 0 replies; 333+ messages in thread From: Johannes Stezenbach @ 2015-04-21 7:38 UTC (permalink / raw) To: Andy Lutomirski Cc: Greg Kroah-Hartman, Richard Weinberger, David Herrmann, Linus Torvalds, Steven Rostedt, One Thousand Gnomes, Jiri Kosina, Al Viro, Borislav Petkov, Andrew Morton, Arnd Bergmann, Eric W. Biederman, Tom Gundersen, linux-kernel, Daniel Mack, Djalal Harouni On Mon, Apr 20, 2015 at 03:06:09PM -0700, Andy Lutomirski wrote: > > I do. Implement something like my old SCM_IDENTITY proposal, which is > kind of like kdbus metadata, opt-in, over UNIX sockets. Except that I > never proposed most of the absurd metadata items that kdbus is > proposing, and I also suggesting doing it over plain old UNIX sockets. I'd like to point out that the DBus spec describes a fairly standard per-connection authentication mechanism. It seems that userspace DBus doesn't need per-message metadata. Johannes ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-20 22:06 ` Andy Lutomirski 2015-04-21 7:38 ` Johannes Stezenbach @ 2015-04-21 9:35 ` One Thousand Gnomes 2015-04-21 10:17 ` David Herrmann 2015-04-21 10:51 ` Greg Kroah-Hartman 2015-04-21 10:31 ` Greg Kroah-Hartman 2 siblings, 2 replies; 333+ messages in thread From: One Thousand Gnomes @ 2015-04-21 9:35 UTC (permalink / raw) To: Andy Lutomirski Cc: Greg Kroah-Hartman, Richard Weinberger, David Herrmann, Linus Torvalds, Steven Rostedt, Jiri Kosina, Al Viro, Borislav Petkov, Andrew Morton, Arnd Bergmann, Eric W. Biederman, Tom Gundersen, linux-kernel, Daniel Mack, Djalal Harouni > On top of that, I think that someone into resource management needs to > seriously consider whether having a broadcast send do get_user_pages > or the equivalent on pages supplied by untrusted recipients (plural!) > is a good idea. Oh but it's so much fun if you pass pages belonging to a device driver, or pass bits of a GEM object thereby keeping entire graphics textures referenced 8) The get_user_pages stuff looks like a simple case of premature optimisation (or I suspect pessimisation). If bulk data goes via memfd/splice/socket point to point as it should (or via shared memory objects) then you don't need all the page pinning and that begins to simplify the code a lot. Copies of small messages are cheap if not free on most processors. It's a memcpy into a simple refcounted buffer which gets deleted when the refcount hits zero. (I'd argue for also looking at stuff like GEM and the dma buffers for some of the big stuff. As we move towards more and more accelerators, being able to pass handles around between accelerators is again becoming more and more important) We do need something for the multicast messaging. Whether that's supporting AF_LOCAL, SOCK_RDP with multicast or something else (POSIX message queue extensions ?). 
There's no real IP layer reliable ordered multicast delivery system that is low latency and lightweight because once it hits real networks it changes from a hard problem into a seriously hard problem because of multicast implosions and the like. Alan ^ permalink raw reply [flat|nested] 333+ messages in thread
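The "memcpy into a simple refcounted buffer" approach described in the message above can be sketched in a few lines. This is only a userspace illustration under stated assumptions, not kernel code and not anything proposed in the thread: the class name `BroadcastMessage` and its methods `read` and `put` are hypothetical, and a plain integer stands in for an atomic reference count.

```python
class BroadcastMessage:
    """One snapshot of a small payload shared by several receivers (hypothetical)."""

    def __init__(self, payload, receivers):
        if receivers < 1:
            raise ValueError("need at least one receiver")
        # Take one snapshot of the payload up front; every receiver reads
        # this same object instead of getting a private copy.
        self._data = bytes(payload)
        self._refs = receivers

    def read(self):
        """Return the shared payload while at least one reference is held."""
        if self._data is None:
            raise ValueError("buffer already released")
        return self._data

    def put(self):
        """Drop one reference and free the payload when the count hits zero."""
        if self._refs <= 0:
            raise ValueError("reference count underflow")
        self._refs -= 1
        if self._refs == 0:
            self._data = None  # last reader is done, release the buffer
        return self._refs
```

Each receiver calls `read()` as often as it likes and `put()` exactly once when finished, so no pages ever need to be pinned on behalf of a receiver.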
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-21 9:35 ` One Thousand Gnomes @ 2015-04-21 10:17 ` David Herrmann 2015-04-21 12:20 ` Michal Hocko 2015-04-21 10:51 ` Greg Kroah-Hartman 1 sibling, 1 reply; 333+ messages in thread From: David Herrmann @ 2015-04-21 10:17 UTC (permalink / raw) To: One Thousand Gnomes Cc: Andy Lutomirski, Greg Kroah-Hartman, Richard Weinberger, Linus Torvalds, Steven Rostedt, Jiri Kosina, Al Viro, Borislav Petkov, Andrew Morton, Arnd Bergmann, Eric W. Biederman, Tom Gundersen, linux-kernel, Daniel Mack, Djalal Harouni Hi On Tue, Apr 21, 2015 at 11:35 AM, One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> wrote: >> On top of that, I think that someone into resource management needs to >> seriously consider whether having a broadcast send do get_user_pages >> or the equivalent on pages supplied by untrusted recipients (plural!) >> is a good idea. > > Oh but its so much fun if you pass pages belonging to a device driver, or > pass bits of a GEM object thereby keeping entire graphics textures > referenced 8) We do not use GUP, nor do we pass around pinned pages. All we use is __vfs_read() / __vfs_write() on shmem. Whether generic_file_write() / copy_from_user() internally relies on GUP or not, is an orthogonal issue that does not belong here. Thanks David ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-21 10:17 ` David Herrmann @ 2015-04-21 12:20 ` Michal Hocko 2015-04-21 14:01 ` David Herrmann 0 siblings, 1 reply; 333+ messages in thread From: Michal Hocko @ 2015-04-21 12:20 UTC (permalink / raw) To: David Herrmann Cc: One Thousand Gnomes, Andy Lutomirski, Greg Kroah-Hartman, Richard Weinberger, Linus Torvalds, Steven Rostedt, Jiri Kosina, Al Viro, Borislav Petkov, Andrew Morton, Arnd Bergmann, Eric W. Biederman, Tom Gundersen, linux-kernel, Daniel Mack, Djalal Harouni On Tue 21-04-15 12:17:49, David Herrmann wrote: > Hi > > On Tue, Apr 21, 2015 at 11:35 AM, One Thousand Gnomes > <gnomes@lxorguk.ukuu.org.uk> wrote: > >> On top of that, I think that someone into resource management needs to > >> seriously consider whether having a broadcast send do get_user_pages > >> or the equivalent on pages supplied by untrusted recipients (plural!) > >> is a good idea. > > > > Oh but its so much fun if you pass pages belonging to a device driver, or > > pass bits of a GEM object thereby keeping entire graphics textures > > referenced 8) > > We do not use GUP, nor do we pass around pinned pages. All we use is > __vfs_read() / __vfs_write() on shmem. Whether generic_file_write() / > copy_from_user() internally relies on GUP or not, is an orthogonal > issue that does not belong here. It kind of does AFAIU. If for nothing else then the memcg reasons mentioned in other email (http://marc.info/?l=linux-kernel&m=142953380508188). If an untrusted user is allowed to hand over a shmem backed buffer which hasn't been charged yet (read faulted in) and then kdbus forced to fault it in a different user's context then you basically allow to hide memory allocations from the memcg. That is a clear show stopper. Or have I misunderstood the way how shmem buffers are used here? -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-21 12:20 ` Michal Hocko @ 2015-04-21 14:01 ` David Herrmann 2015-04-21 14:27 ` Michal Hocko 0 siblings, 1 reply; 333+ messages in thread From: David Herrmann @ 2015-04-21 14:01 UTC (permalink / raw) To: Michal Hocko Cc: One Thousand Gnomes, Andy Lutomirski, Greg Kroah-Hartman, Richard Weinberger, Linus Torvalds, Steven Rostedt, Jiri Kosina, Al Viro, Borislav Petkov, Andrew Morton, Arnd Bergmann, Eric W. Biederman, Tom Gundersen, linux-kernel, Daniel Mack, Djalal Harouni Hi On Tue, Apr 21, 2015 at 2:20 PM, Michal Hocko <mhocko@suse.cz> wrote: > On Tue 21-04-15 12:17:49, David Herrmann wrote: >> Hi >> >> On Tue, Apr 21, 2015 at 11:35 AM, One Thousand Gnomes >> <gnomes@lxorguk.ukuu.org.uk> wrote: >> >> On top of that, I think that someone into resource management needs to >> >> seriously consider whether having a broadcast send do get_user_pages >> >> or the equivalent on pages supplied by untrusted recipients (plural!) >> >> is a good idea. >> > >> > Oh but its so much fun if you pass pages belonging to a device driver, or >> > pass bits of a GEM object thereby keeping entire graphics textures >> > referenced 8) >> >> We do not use GUP, nor do we pass around pinned pages. All we use is >> __vfs_read() / __vfs_write() on shmem. Whether generic_file_write() / >> copy_from_user() internally relies on GUP or not, is an orthogonal >> issue that does not belong here. > > It kind of does AFAIU. No, it is not. The issue with GUP is that you elevate the page ref-count and thus prevent lru isolation, sealing, whatsoever. I cannot see how it is related to kdbus. However, ... > If for nothing else then the memcg reasons mentioned in > other email (http://marc.info/?l=linux-kernel&m=142953380508188). 
If an > untrusted user is allowed to hand over a shmem backed buffer which hasn't > been charged yet (read faulted in) and then kdbus forced to fault it in > a different user's context then you basically allow to hide memory > allocations from the memcg. That is a clear show stopper. > > Or have I misunderstood the way how shmem buffers are used here? ..as you mentioned memcg, lets figure that out here. shmem buffers are used as receive-buffers by kdbus peers. They are read-only to user-space. All allocations are done by the kernel on message passing. There is no buffering on the sender's side, only on the receiver's. There're 3 possible ways to charge for memory that backs a message. We can charge the sender, the receiver or the kernel. Right now we charge the sender, as DBus-method-calls imply a trust relationship on the target. We could as well charge the receiver, which might be conceptually superior. Anyhow, both are imo better than the dbus-daemon model, where we basically charge the root-memcg (more precisely, the memcg of dbus-daemon). Note that a message is always charged on either the sender or the receiver. I don't see how memory is hidden from the memcg. If you pass a memfd to another peer, you also need to be aware that this memory was charged on you and stays so until the remote peer drops its reference. But all messages you receive via kdbus are always read-only. Btw., binder avoids this issue by not charging anyone, which I also don't think is a viable solution. Could you elaborate on the exact issue you're seeing here? Thanks David ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-21 14:01 ` David Herrmann @ 2015-04-21 14:27 ` Michal Hocko 2015-04-21 14:47 ` David Herrmann 2015-04-21 18:11 ` Andy Lutomirski 0 siblings, 2 replies; 333+ messages in thread From: Michal Hocko @ 2015-04-21 14:27 UTC (permalink / raw) To: David Herrmann Cc: One Thousand Gnomes, Andy Lutomirski, Greg Kroah-Hartman, Richard Weinberger, Linus Torvalds, Steven Rostedt, Jiri Kosina, Al Viro, Borislav Petkov, Andrew Morton, Arnd Bergmann, Eric W. Biederman, Tom Gundersen, linux-kernel, Daniel Mack, Djalal Harouni On Tue 21-04-15 16:01:01, David Herrmann wrote: > Hi > > On Tue, Apr 21, 2015 at 2:20 PM, Michal Hocko <mhocko@suse.cz> wrote: > > On Tue 21-04-15 12:17:49, David Herrmann wrote: > >> Hi > >> > >> On Tue, Apr 21, 2015 at 11:35 AM, One Thousand Gnomes > >> <gnomes@lxorguk.ukuu.org.uk> wrote: > >> >> On top of that, I think that someone into resource management needs to > >> >> seriously consider whether having a broadcast send do get_user_pages > >> >> or the equivalent on pages supplied by untrusted recipients (plural!) > >> >> is a good idea. > >> > > >> > Oh but its so much fun if you pass pages belonging to a device driver, or > >> > pass bits of a GEM object thereby keeping entire graphics textures > >> > referenced 8) > >> > >> We do not use GUP, nor do we pass around pinned pages. All we use is > >> __vfs_read() / __vfs_write() on shmem. Whether generic_file_write() / > >> copy_from_user() internally relies on GUP or not, is an orthogonal > >> issue that does not belong here. > > > > It kind of does AFAIU. > > No, it is not. The issue with GUP is that you elevate the page > ref-count and thus prevent lru isolation, sealing, whatsoever. The point was that such a memory might be not present yet and need a page fault with all the side effects - memory reclaim, memcg charge... > I cannot see how it is related to kdbus. However, ... 
> > > If for nothing else then the memcg reasons mentioned in > > other email (http://marc.info/?l=linux-kernel&m=142953380508188). If an > > untrusted user is allowed to hand over a shmem backed buffer which hasn't > > been charged yet (read faulted in) and then kdbus forced to fault it in > > a different user's context then you basically allow to hide memory > > allocations from the memcg. That is a clear show stopper. > > > > Or have I misunderstood the way how shmem buffers are used here? > > ..as you mentioned memcg, lets figure that out here. shmem buffers are > used as receive-buffers by kdbus peers. They are read-only to > user-space. All allocations are done by the kernel on message passing. OK, so the shmem buffer is allocated on the kernels behalf and under its control and no userspace can hand over one to kdbus. Do I get it right? If yes then the memcg escape I was describing above is not possible of course. This wasn't clear to me from the previous discussion. Thanks for the clarification! -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-21 14:27 ` Michal Hocko @ 2015-04-21 14:47 ` David Herrmann 2015-04-21 18:11 ` Andy Lutomirski 1 sibling, 0 replies; 333+ messages in thread From: David Herrmann @ 2015-04-21 14:47 UTC (permalink / raw) To: Michal Hocko Cc: One Thousand Gnomes, Andy Lutomirski, Greg Kroah-Hartman, Richard Weinberger, Linus Torvalds, Steven Rostedt, Jiri Kosina, Al Viro, Borislav Petkov, Andrew Morton, Arnd Bergmann, Eric W. Biederman, Tom Gundersen, linux-kernel, Daniel Mack, Djalal Harouni Hi On Tue, Apr 21, 2015 at 4:27 PM, Michal Hocko <mhocko@suse.cz> wrote: > On Tue 21-04-15 16:01:01, David Herrmann wrote: >> On Tue, Apr 21, 2015 at 2:20 PM, Michal Hocko <mhocko@suse.cz> wrote: >> > If for nothing else then the memcg reasons mentioned in >> > other email (http://marc.info/?l=linux-kernel&m=142953380508188). If an >> > untrusted user is allowed to hand over a shmem backed buffer which hasn't >> > been charged yet (read faulted in) and then kdbus forced to fault it in >> > a different user's context then you basically allow to hide memory >> > allocations from the memcg. That is a clear show stopper. >> > >> > Or have I misunderstood the way how shmem buffers are used here? >> >> ..as you mentioned memcg, lets figure that out here. shmem buffers are >> used as receive-buffers by kdbus peers. They are read-only to >> user-space. All allocations are done by the kernel on message passing. > > OK, so the shmem buffer is allocated on the kernels behalf and under > its control and no userspace can hand over one to kdbus. Do I get > it right? If yes then the memcg escape I was describing above is > not possible of course. This wasn't clear to me from the previous > discussion. Thanks for the clarification! Exactly. Much appreciated! Thanks David ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-21 14:27 ` Michal Hocko 2015-04-21 14:47 ` David Herrmann @ 2015-04-21 18:11 ` Andy Lutomirski 2015-04-22 14:57 ` Michal Hocko 1 sibling, 1 reply; 333+ messages in thread From: Andy Lutomirski @ 2015-04-21 18:11 UTC (permalink / raw) To: Michal Hocko Cc: David Herrmann, One Thousand Gnomes, Greg Kroah-Hartman, Richard Weinberger, Linus Torvalds, Steven Rostedt, Jiri Kosina, Al Viro, Borislav Petkov, Andrew Morton, Arnd Bergmann, Eric W. Biederman, Tom Gundersen, linux-kernel, Daniel Mack, Djalal Harouni On Tue, Apr 21, 2015 at 7:27 AM, Michal Hocko <mhocko@suse.cz> wrote: > On Tue 21-04-15 16:01:01, David Herrmann wrote: >> Hi >> >> On Tue, Apr 21, 2015 at 2:20 PM, Michal Hocko <mhocko@suse.cz> wrote: >> > On Tue 21-04-15 12:17:49, David Herrmann wrote: >> >> Hi >> >> >> >> On Tue, Apr 21, 2015 at 11:35 AM, One Thousand Gnomes >> >> <gnomes@lxorguk.ukuu.org.uk> wrote: >> >> >> On top of that, I think that someone into resource management needs to >> >> >> seriously consider whether having a broadcast send do get_user_pages >> >> >> or the equivalent on pages supplied by untrusted recipients (plural!) >> >> >> is a good idea. >> >> > >> >> > Oh but its so much fun if you pass pages belonging to a device driver, or >> >> > pass bits of a GEM object thereby keeping entire graphics textures >> >> > referenced 8) >> >> >> >> We do not use GUP, nor do we pass around pinned pages. All we use is >> >> __vfs_read() / __vfs_write() on shmem. Whether generic_file_write() / >> >> copy_from_user() internally relies on GUP or not, is an orthogonal >> >> issue that does not belong here. >> > >> > It kind of does AFAIU. >> >> No, it is not. The issue with GUP is that you elevate the page >> ref-count and thus prevent lru isolation, sealing, whatsoever. > > The point was that such a memory might be not present yet and need a > page fault with all the side effects - memory reclaim, memcg charge... 
> >> I cannot see how it is related to kdbus. However, ... >> >> > If for nothing else then the memcg reasons mentioned in >> > other email (http://marc.info/?l=linux-kernel&m=142953380508188). If an >> > untrusted user is allowed to hand over a shmem backed buffer which hasn't >> > been charged yet (read faulted in) and then kdbus forced to fault it in >> > a different user's context then you basically allow to hide memory >> > allocations from the memcg. That is a clear show stopper. >> > >> > Or have I misunderstood the way how shmem buffers are used here? >> >> ..as you mentioned memcg, lets figure that out here. shmem buffers are >> used as receive-buffers by kdbus peers. They are read-only to >> user-space. All allocations are done by the kernel on message passing. > > OK, so the shmem buffer is allocated on the kernels behalf and under > its control and no userspace can hand over one to kdbus. Do I get > it right? If yes then the memcg escape I was describing above is > not possible of course. This wasn't clear to me from the previous > discussion. Thanks for the clarification! I'm still missing something here, I think. At the time of pool creation, the kernel calls shmem_file_setup in the context of the untrusted user. Then, when a privileged daemon broadcasts, the kernel calls vfs_iter_write or similar, thus allocating the page, right? I don't see why the page would be allocated early or why vfs_iter_write and the associated shmem code would care what memcg created the shmem file -- all of that code seems to use current's memcg on brief inspection. Bear in mind that the bad guy gets to use madvise, etc to mess around with the page cache state. --Andy ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-21 18:11 ` Andy Lutomirski @ 2015-04-22 14:57 ` Michal Hocko 2015-04-22 19:36 ` Andy Lutomirski 0 siblings, 1 reply; 333+ messages in thread From: Michal Hocko @ 2015-04-22 14:57 UTC (permalink / raw) To: Andy Lutomirski Cc: David Herrmann, One Thousand Gnomes, Greg Kroah-Hartman, Richard Weinberger, Linus Torvalds, Steven Rostedt, Jiri Kosina, Al Viro, Borislav Petkov, Andrew Morton, Arnd Bergmann, Eric W. Biederman, Tom Gundersen, linux-kernel, Daniel Mack, Djalal Harouni On Tue 21-04-15 11:11:35, Andy Lutomirski wrote: > On Tue, Apr 21, 2015 at 7:27 AM, Michal Hocko <mhocko@suse.cz> wrote: > > On Tue 21-04-15 16:01:01, David Herrmann wrote: > >> Hi > >> > >> On Tue, Apr 21, 2015 at 2:20 PM, Michal Hocko <mhocko@suse.cz> wrote: > >> > On Tue 21-04-15 12:17:49, David Herrmann wrote: > >> >> Hi > >> >> > >> >> On Tue, Apr 21, 2015 at 11:35 AM, One Thousand Gnomes > >> >> <gnomes@lxorguk.ukuu.org.uk> wrote: > >> >> >> On top of that, I think that someone into resource management needs to > >> >> >> seriously consider whether having a broadcast send do get_user_pages > >> >> >> or the equivalent on pages supplied by untrusted recipients (plural!) > >> >> >> is a good idea. > >> >> > > >> >> > Oh but its so much fun if you pass pages belonging to a device driver, or > >> >> > pass bits of a GEM object thereby keeping entire graphics textures > >> >> > referenced 8) > >> >> > >> >> We do not use GUP, nor do we pass around pinned pages. All we use is > >> >> __vfs_read() / __vfs_write() on shmem. Whether generic_file_write() / > >> >> copy_from_user() internally relies on GUP or not, is an orthogonal > >> >> issue that does not belong here. > >> > > >> > It kind of does AFAIU. > >> > >> No, it is not. The issue with GUP is that you elevate the page > >> ref-count and thus prevent lru isolation, sealing, whatsoever. 
> > > > The point was that such a memory might be not present yet and need a > > page fault with all the side effects - memory reclaim, memcg charge... > > > >> I cannot see how it is related to kdbus. However, ... > >> > >> > If for nothing else then the memcg reasons mentioned in > >> > other email (http://marc.info/?l=linux-kernel&m=142953380508188). If an > >> > untrusted user is allowed to hand over a shmem backed buffer which hasn't > >> > been charged yet (read faulted in) and then kdbus forced to fault it in > >> > a different user's context then you basically allow to hide memory > >> > allocations from the memcg. That is a clear show stopper. > >> > > >> > Or have I misunderstood the way how shmem buffers are used here? > >> > >> ..as you mentioned memcg, lets figure that out here. shmem buffers are > >> used as receive-buffers by kdbus peers. They are read-only to > >> user-space. All allocations are done by the kernel on message passing. > > > > OK, so the shmem buffer is allocated on the kernels behalf and under > > its control and no userspace can hand over one to kdbus. Do I get > > it right? If yes then the memcg escape I was describing above is > > not possible of course. This wasn't clear to me from the previous > > discussion. Thanks for the clarification! > > I'm still missing something here, I think. At the time of pool > creation, the kernel calls shmem_file_setup in the context of the > untrusted user. Then, when a privileged daemon broadcasts, the kernel > calls vfs_iter_write or similar, thus allocating the page, right? I > don't see why the page would be allocated early or why vfs_iter_write > and the associated shmem code would care what memcg created the shmem > file -- all of that code seems to use current's memcg on brief > inspection. Yes it is the current task on the first charge or the original memcg on the swap in. 
But my understanding from the above, and I haven't read the code yet, is that the untrusted userspace is only a reader of the buffer and isn't allowed to modify the buffer. > Bear in mind that the bad guy gets to use madvise, etc to mess around > with the page cache state. How can an untrusted user play with shmem when it is read-only? shmem_file_setup should create an unlinked file so no process can access it via tmpfs AFAIU and potentially fault the memory before the producer fills it up (thus faulting it in in the trusted context). I have no idea how the receiver gets to the buffer though. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-22 14:57 ` Michal Hocko @ 2015-04-22 19:36 ` Andy Lutomirski 2015-04-27 12:46 ` Michal Hocko 0 siblings, 1 reply; 333+ messages in thread From: Andy Lutomirski @ 2015-04-22 19:36 UTC (permalink / raw) To: Michal Hocko Cc: Arnd Bergmann, linux-kernel, Jiri Kosina, Andrew Morton, Al Viro, Daniel Mack, Borislav Petkov, One Thousand Gnomes, Linus Torvalds, Richard Weinberger, Tom Gundersen, Steven Rostedt, Greg Kroah-Hartman, Eric W. Biederman, David Herrmann, Djalal Harouni On Apr 22, 2015 7:57 AM, "Michal Hocko" <mhocko@suse.cz> wrote: > > On Tue 21-04-15 11:11:35, Andy Lutomirski wrote: > > On Tue, Apr 21, 2015 at 7:27 AM, Michal Hocko <mhocko@suse.cz> wrote: > > > On Tue 21-04-15 16:01:01, David Herrmann wrote: > > >> Hi > > >> > > >> On Tue, Apr 21, 2015 at 2:20 PM, Michal Hocko <mhocko@suse.cz> wrote: > > >> > On Tue 21-04-15 12:17:49, David Herrmann wrote: > > >> >> Hi > > >> >> > > >> >> On Tue, Apr 21, 2015 at 11:35 AM, One Thousand Gnomes > > >> >> <gnomes@lxorguk.ukuu.org.uk> wrote: > > >> >> >> On top of that, I think that someone into resource management needs to > > >> >> >> seriously consider whether having a broadcast send do get_user_pages > > >> >> >> or the equivalent on pages supplied by untrusted recipients (plural!) > > >> >> >> is a good idea. > > >> >> > > > >> >> > Oh but its so much fun if you pass pages belonging to a device driver, or > > >> >> > pass bits of a GEM object thereby keeping entire graphics textures > > >> >> > referenced 8) > > >> >> > > >> >> We do not use GUP, nor do we pass around pinned pages. All we use is > > >> >> __vfs_read() / __vfs_write() on shmem. Whether generic_file_write() / > > >> >> copy_from_user() internally relies on GUP or not, is an orthogonal > > >> >> issue that does not belong here. > > >> > > > >> > It kind of does AFAIU. > > >> > > >> No, it is not. 
The issue with GUP is that you elevate the page > > >> ref-count and thus prevent lru isolation, sealing, whatsoever. > > > > > > The point was that such a memory might be not present yet and need a > > > page fault with all the side effects - memory reclaim, memcg charge... > > > > > >> I cannot see how it is related to kdbus. However, ... > > >> > > >> > If for nothing else then the memcg reasons mentioned in > > >> > other email (http://marc.info/?l=linux-kernel&m=142953380508188). If an > > >> > untrusted user is allowed to hand over a shmem backed buffer which hasn't > > >> > been charged yet (read faulted in) and then kdbus forced to fault it in > > >> > a different user's context then you basically allow to hide memory > > >> > allocations from the memcg. That is a clear show stopper. > > >> > > > >> > Or have I misunderstood the way how shmem buffers are used here? > > >> > > >> ..as you mentioned memcg, lets figure that out here. shmem buffers are > > >> used as receive-buffers by kdbus peers. They are read-only to > > >> user-space. All allocations are done by the kernel on message passing. > > > > > > OK, so the shmem buffer is allocated on the kernels behalf and under > > > its control and no userspace can hand over one to kdbus. Do I get > > > it right? If yes then the memcg escape I was describing above is > > > not possible of course. This wasn't clear to me from the previous > > > discussion. Thanks for the clarification! > > > > I'm still missing something here, I think. At the time of pool > > creation, the kernel calls shmem_file_setup in the context of the > > untrusted user. Then, when a privileged daemon broadcasts, the kernel > > calls vfs_iter_write or similar, thus allocating the page, right? I > > don't see why the page would be allocated early or why vfs_iter_write > > and the associated shmem code would care what memcg created the shmem > > file -- all of that code seems to use current's memcg on brief > > inspection. 
> > Yes it is the current task on the first charge or the original memcg on > the swap in. But my understanding from the above, and I haven't read the > code yet, is that the untrusted userspace is only reader from the buffer > and isn't allowed to modify the buffer. > > > Bear in mind that the bad guy gets to use madvise, etc to mess around > > with the page cache state. > > How can an untrusted user play with shmem when it is read-only? > shmem_file_setup shouls create an unlinked file so no process can access > it via tmpfs AFAIU and potentially fault the memory before the producent > will fill it up (thus fault in in the trusted context). I have no idea > how the receiver gets to the buffer though. > The receiver gets to mmap the buffer. I'm not sure what protection they get. The thing I'm worried about is that the receiver might deliberately avoid faulting in a bunch of pages and instead wait for the producer to touch them, causing pages that logically belong to the receiver to be charged to the producer instead. --Andy > -- > Michal Hocko > SUSE Labs ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-22 19:36 ` Andy Lutomirski @ 2015-04-27 12:46 ` Michal Hocko 2015-04-27 20:11 ` Andy Lutomirski 0 siblings, 1 reply; 333+ messages in thread From: Michal Hocko @ 2015-04-27 12:46 UTC (permalink / raw) To: Andy Lutomirski Cc: Arnd Bergmann, linux-kernel, Jiri Kosina, Andrew Morton, Al Viro, Daniel Mack, Borislav Petkov, One Thousand Gnomes, Linus Torvalds, Richard Weinberger, Tom Gundersen, Steven Rostedt, Greg Kroah-Hartman, Eric W. Biederman, David Herrmann, Djalal Harouni On Wed 22-04-15 12:36:12, Andy Lutomirski wrote: > On Apr 22, 2015 7:57 AM, "Michal Hocko" <mhocko@suse.cz> wrote: > > > > On Tue 21-04-15 11:11:35, Andy Lutomirski wrote: > > > On Tue, Apr 21, 2015 at 7:27 AM, Michal Hocko <mhocko@suse.cz> wrote: > > > > On Tue 21-04-15 16:01:01, David Herrmann wrote: > > > >> Hi > > > >> > > > >> On Tue, Apr 21, 2015 at 2:20 PM, Michal Hocko <mhocko@suse.cz> wrote: > > > >> > On Tue 21-04-15 12:17:49, David Herrmann wrote: > > > >> >> Hi > > > >> >> > > > >> >> On Tue, Apr 21, 2015 at 11:35 AM, One Thousand Gnomes > > > >> >> <gnomes@lxorguk.ukuu.org.uk> wrote: > > > >> >> >> On top of that, I think that someone into resource management needs to > > > >> >> >> seriously consider whether having a broadcast send do get_user_pages > > > >> >> >> or the equivalent on pages supplied by untrusted recipients (plural!) > > > >> >> >> is a good idea. > > > >> >> > > > > >> >> > Oh but its so much fun if you pass pages belonging to a device driver, or > > > >> >> > pass bits of a GEM object thereby keeping entire graphics textures > > > >> >> > referenced 8) > > > >> >> > > > >> >> We do not use GUP, nor do we pass around pinned pages. All we use is > > > >> >> __vfs_read() / __vfs_write() on shmem. Whether generic_file_write() / > > > >> >> copy_from_user() internally relies on GUP or not, is an orthogonal > > > >> >> issue that does not belong here. > > > >> > > > > >> > It kind of does AFAIU. > > > >> > > > >> No, it is not. 
The issue with GUP is that you elevate the page > > > >> ref-count and thus prevent lru isolation, sealing, whatsoever. > > > > > > > > The point was that such a memory might be not present yet and need a > > > > page fault with all the side effects - memory reclaim, memcg charge... > > > > > > > >> I cannot see how it is related to kdbus. However, ... > > > >> > > > >> > If for nothing else then the memcg reasons mentioned in > > > >> > other email (http://marc.info/?l=linux-kernel&m=142953380508188). If an > > > >> > untrusted user is allowed to hand over a shmem backed buffer which hasn't > > > >> > been charged yet (read faulted in) and then kdbus forced to fault it in > > > >> > a different user's context then you basically allow to hide memory > > > >> > allocations from the memcg. That is a clear show stopper. > > > >> > > > > >> > Or have I misunderstood the way how shmem buffers are used here? > > > >> > > > >> ..as you mentioned memcg, lets figure that out here. shmem buffers are > > > >> used as receive-buffers by kdbus peers. They are read-only to > > > >> user-space. All allocations are done by the kernel on message passing. > > > > > > > > OK, so the shmem buffer is allocated on the kernels behalf and under > > > > its control and no userspace can hand over one to kdbus. Do I get > > > > it right? If yes then the memcg escape I was describing above is > > > > not possible of course. This wasn't clear to me from the previous > > > > discussion. Thanks for the clarification! > > > > > > I'm still missing something here, I think. At the time of pool > > > creation, the kernel calls shmem_file_setup in the context of the > > > untrusted user. Then, when a privileged daemon broadcasts, the kernel > > > calls vfs_iter_write or similar, thus allocating the page, right? 
I > > > don't see why the page would be allocated early or why vfs_iter_write > > > and the associated shmem code would care what memcg created the shmem > > > file -- all of that code seems to use current's memcg on brief > > > inspection. > > > > Yes it is the current task on the first charge or the original memcg on > > the swap in. But my understanding from the above, and I haven't read the > > code yet, is that the untrusted userspace is only reader from the buffer > > and isn't allowed to modify the buffer. > > > > > Bear in mind that the bad guy gets to use madvise, etc to mess around > > > with the page cache state. > > > > How can an untrusted user play with shmem when it is read-only? > > shmem_file_setup shouls create an unlinked file so no process can access > > it via tmpfs AFAIU and potentially fault the memory before the producent > > will fill it up (thus fault in in the trusted context). I have no idea > > how the receiver gets to the buffer though. > > > > The receiver gets to mmap the buffer. I'm not sure what protection they get. OK, so I've checked the code. kdbus_pool_new sets up a shmem file (unlinked) so not visible externally. The consumer will get it via mmap on the endpoint file by kdbus_pool_mmap and it refuses VM_WRITE and clears VM_MAYWRITE. The receiver even doesn't have access to the shmem file directly. It is ugly that kdbus_pool_mmap replaces the original vm_file and make it point to the shmem file. I am not sure whether this is safe all the time and it would deserve a big fat comment. On the other hand, it seems some drivers are doing this already (e.g. dma_buf_mmap). > The thing I'm worried about is that the receiver might deliberately > avoid faulting in a bunch of pages and instead wait for the producer > to touch them, causing pages that logically belong to the receiver to > be charged to the producer instead. Hmm, now that I am looking into the code it seems you are right. E.g. 
kdbus_cmd_send runs in the context of the sender AFAIU. This gets down to kdbus_pool_slice_copy_iovec which does vfs_iter_write and this is where we get to charge the memory. AFAIU the terminology all the receivers will share the same shmem file when mmaping the endpoint. This, however, doesn't seem to be exploitable to hide memory charges because the receiver cannot make the buffer writable. A nasty process with a small memcg limit could still pre-fault the memory before any writer sends a message and slow the whole endpoint traffic. But that wouldn't be a completely new thing because processes might hammer on memory even without memcg... It is just that this would be kind of easier with memcg. If that is the concern then the buffer should be pre-charged at the time when it is created. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-27 12:46 ` Michal Hocko @ 2015-04-27 20:11 ` Andy Lutomirski 2015-04-29 17:24 ` Michal Hocko 0 siblings, 1 reply; 333+ messages in thread From: Andy Lutomirski @ 2015-04-27 20:11 UTC (permalink / raw) To: Michal Hocko Cc: Arnd Bergmann, linux-kernel, Jiri Kosina, Andrew Morton, Al Viro, Daniel Mack, Borislav Petkov, One Thousand Gnomes, Linus Torvalds, Richard Weinberger, Tom Gundersen, Steven Rostedt, Greg Kroah-Hartman, Eric W. Biederman, David Herrmann, Djalal Harouni [resent without HTML] On Apr 27, 2015 5:46 AM, "Michal Hocko" <mhocko@suse.cz> wrote: > > On Wed 22-04-15 12:36:12, Andy Lutomirski wrote: > > On Apr 22, 2015 7:57 AM, "Michal Hocko" <mhocko@suse.cz> wrote: > > > > > > On Tue 21-04-15 11:11:35, Andy Lutomirski wrote: > > > > On Tue, Apr 21, 2015 at 7:27 AM, Michal Hocko <mhocko@suse.cz> wrote: > > > > > On Tue 21-04-15 16:01:01, David Herrmann wrote: > > > > >> Hi > > > > >> > > > > >> On Tue, Apr 21, 2015 at 2:20 PM, Michal Hocko <mhocko@suse.cz> wrote: > > > > >> > On Tue 21-04-15 12:17:49, David Herrmann wrote: > > > > >> >> Hi > > > > >> >> > > > > >> >> On Tue, Apr 21, 2015 at 11:35 AM, One Thousand Gnomes > > > > >> >> <gnomes@lxorguk.ukuu.org.uk> wrote: > > > > >> >> >> On top of that, I think that someone into resource management needs to > > > > >> >> >> seriously consider whether having a broadcast send do get_user_pages > > > > >> >> >> or the equivalent on pages supplied by untrusted recipients (plural!) > > > > >> >> >> is a good idea. > > > > >> >> > > > > > >> >> > Oh but its so much fun if you pass pages belonging to a device driver, or > > > > >> >> > pass bits of a GEM object thereby keeping entire graphics textures > > > > >> >> > referenced 8) > > > > >> >> > > > > >> >> We do not use GUP, nor do we pass around pinned pages. All we use is > > > > >> >> __vfs_read() / __vfs_write() on shmem. 
Whether generic_file_write() / > > > > >> >> copy_from_user() internally relies on GUP or not, is an orthogonal > > > > >> >> issue that does not belong here. > > > > >> > > > > > >> > It kind of does AFAIU. > > > > >> > > > > >> No, it is not. The issue with GUP is that you elevate the page > > > > >> ref-count and thus prevent lru isolation, sealing, whatsoever. > > > > > > > > > > The point was that such a memory might be not present yet and need a > > > > > page fault with all the side effects - memory reclaim, memcg charge... > > > > > > > > > >> I cannot see how it is related to kdbus. However, ... > > > > >> > > > > >> > If for nothing else then the memcg reasons mentioned in > > > > >> > other email (http://marc.info/?l=linux-kernel&m=142953380508188). If an > > > > >> > untrusted user is allowed to hand over a shmem backed buffer which hasn't > > > > >> > been charged yet (read faulted in) and then kdbus forced to fault it in > > > > >> > a different user's context then you basically allow to hide memory > > > > >> > allocations from the memcg. That is a clear show stopper. > > > > >> > > > > > >> > Or have I misunderstood the way how shmem buffers are used here? > > > > >> > > > > >> ..as you mentioned memcg, lets figure that out here. shmem buffers are > > > > >> used as receive-buffers by kdbus peers. They are read-only to > > > > >> user-space. All allocations are done by the kernel on message passing. > > > > > > > > > > OK, so the shmem buffer is allocated on the kernels behalf and under > > > > > its control and no userspace can hand over one to kdbus. Do I get > > > > > it right? If yes then the memcg escape I was describing above is > > > > > not possible of course. This wasn't clear to me from the previous > > > > > discussion. Thanks for the clarification! > > > > > > > > I'm still missing something here, I think. At the time of pool > > > > creation, the kernel calls shmem_file_setup in the context of the > > > > untrusted user. 
Then, when a privileged daemon broadcasts, the kernel > > > > calls vfs_iter_write or similar, thus allocating the page, right? I > > > > don't see why the page would be allocated early or why vfs_iter_write > > > > and the associated shmem code would care what memcg created the shmem > > > > file -- all of that code seems to use current's memcg on brief > > > > inspection. > > > > > > Yes it is the current task on the first charge or the original memcg on > > > the swap in. But my understanding from the above, and I haven't read the > > > code yet, is that the untrusted userspace is only reader from the buffer > > > and isn't allowed to modify the buffer. > > > > > > > Bear in mind that the bad guy gets to use madvise, etc to mess around > > > > with the page cache state. > > > > > > How can an untrusted user play with shmem when it is read-only? > > > shmem_file_setup shouls create an unlinked file so no process can access > > > it via tmpfs AFAIU and potentially fault the memory before the producent > > > will fill it up (thus fault in in the trusted context). I have no idea > > > how the receiver gets to the buffer though. > > > > > > > The receiver gets to mmap the buffer. I'm not sure what protection they get. > > OK, so I've checked the code. kdbus_pool_new sets up a shmem file > (unlinked) so not visible externally. The consumer will get it via mmap > on the endpoint file by kdbus_pool_mmap and it refuses VM_WRITE and > clears VM_MAYWRITE. The receiver even doesn't have access to the shmem > file directly. > > It is ugly that kdbus_pool_mmap replaces the original vm_file and make > it point to the shmem file. I am not sure whether this is safe all the > time and it would deserve a big fat comment. On the other hand, it seems > some drivers are doing this already (e.g. dma_buf_mmap). What happens to map_files in proc? It seems unlikely that CRIU would ever work on dma_buf, but this could be a problem for CRIU with kdbus. 
> > > The thing I'm worried about is that the receiver might deliberately > > avoid faulting in a bunch of pages and instead wait for the producer > > to touch them, causing pages that logically belong to the receiver to > > be charged to the producer instead. > > Hmm, now that I am looking into the code it seems you are right. E.g. > kdbus_cmd_send runs in the context of the sender AFAIU. This gets down > to kdbus_pool_slice_copy_iovec which does vfs_iter_write and this > is where we get to charge the memory. AFAIU the terminology all the > receivers will share the same shmem file when mmaping the endpoint. > > This, however, doesn't seem to be exploitable to hide memory charges > because the receiver cannot make the buffer writable. A nasty process > with a small memcg limit could still pre-fault the memory before any > writer sends a message and slow the whole endpoint traffic. But > that wouldn't be a completely new thing because processes might hammer > on memory even without memcg... It is just that this would be kind of > easier with memcg. > If that is the concern then the buffer should be pre-charged at the time > when it is created. The attack I had in mind was that the nasty process with a small memcg creates one or many of these and doesn't pre-fault it. Then a sender (systemd?) sends messages and they get charged, possibly once for each copy sent, to the root memcg. So kdbus should probably pre-charge the creator of the pool. ^ permalink raw reply [flat|nested] 333+ messages in thread
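The charging concern in the message above can be sketched with a toy model. This is only an illustration under stated assumptions, not kernel code: the `Pool` class, the `simulate` helper, and the cgroup names `small-memcg` and `root-memcg` are all hypothetical, and the only behavior modeled is the one described in the thread, namely that a page is charged once, to whichever context first touches it.

```python
from collections import Counter


class Pool:
    """Toy model of a shmem-backed receive pool (hypothetical, not kernel code)."""

    def __init__(self, creator, num_pages, charged, precharge=False):
        self.creator = creator
        self.owner = [None] * num_pages  # which cgroup each page is charged to
        self.charged = charged           # shared Counter: cgroup name -> pages
        if precharge:
            # The fix proposed in the thread: charge the creator up front.
            for page in range(num_pages):
                self._charge(page, creator)

    def _charge(self, page, cgroup):
        # First touch wins: an already-charged page is never charged again.
        if self.owner[page] is None:
            self.owner[page] = cgroup
            self.charged[cgroup] += 1

    def write(self, writer, first_page, count):
        """Model a send: the writer touches the pages in its own context."""
        for page in range(first_page, first_page + count):
            self._charge(page, writer)


def simulate(precharge):
    charged = Counter()
    # An untrusted receiver creates several pools and never touches them.
    pools = [Pool("small-memcg", 4, charged, precharge) for _ in range(3)]
    # A privileged sender broadcasts a one-page message into every pool.
    for pool in pools:
        pool.write("root-memcg", 0, 1)
    return dict(charged)


print(simulate(precharge=False))  # sender pays for the pages it touched
print(simulate(precharge=True))   # creator pays for the whole pool up front
```

Without pre-charging, the model charges the three touched pages to the sender; with pre-charging, all twelve pool pages are charged to the pool's creator and the sender is charged nothing. Real memcg accounting has more cases (swap-in, reclaim) that this sketch ignores.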
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-27 20:11 ` Andy Lutomirski
@ 2015-04-29 17:24 ` Michal Hocko
0 siblings, 0 replies; 333+ messages in thread
From: Michal Hocko @ 2015-04-29 17:24 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Arnd Bergmann, linux-kernel, Jiri Kosina, Andrew Morton, Al Viro,
Daniel Mack, Borislav Petkov, One Thousand Gnomes, Linus Torvalds,
Richard Weinberger, Tom Gundersen, Steven Rostedt, Greg Kroah-Hartman,
Eric W. Biederman, David Herrmann, Djalal Harouni
On Mon 27-04-15 13:11:03, Andy Lutomirski wrote:
> [resent without HTML]
>
> On Apr 27, 2015 5:46 AM, "Michal Hocko" <mhocko@suse.cz> wrote:
> >
> > On Wed 22-04-15 12:36:12, Andy Lutomirski wrote:
[...]
> > > The receiver gets to mmap the buffer. I'm not sure what protection they get.
> >
> > OK, so I've checked the code. kdbus_pool_new sets up a shmem file
> > (unlinked) so not visible externally. The consumer will get it via mmap
> > on the endpoint file by kdbus_pool_mmap and it refuses VM_WRITE and
> > clears VM_MAYWRITE. The receiver even doesn't have access to the shmem
> > file directly.
> >
> > It is ugly that kdbus_pool_mmap replaces the original vm_file and make
> > it point to the shmem file. I am not sure whether this is safe all the
> > time and it would deserve a big fat comment. On the other hand, it seems
> > some drivers are doing this already (e.g. dma_buf_mmap).
>
> What happens to map_files in proc? It seems unlikely that CRIU would
> ever work on dma_buf, but this could be a problem for CRIU with kdbus.
I am not familiar with CRIU, nor with the map_files directory. I've
actually heard about it for the first time. [looking...]
So proc_map_files_readdir will iterate all VMAs including the one
backed by the buffer and it will see its vm_file which will point to
the shmem file AFAICS. So it doesn't seem like CRIU would care because
all parties on the buffer would see the same inode. Whether that is
really enough, I dunno.
> > > The thing I'm worried about is that the receiver might deliberately
> > > avoid faulting in a bunch of pages and instead wait for the producer
> > > to touch them, causing pages that logically belong to the receiver to
> > > be charged to the producer instead.
> >
> > Hmm, now that I am looking into the code it seems you are right. E.g.
> > kdbus_cmd_send runs in the context of the sender AFAIU. This gets down
> > to kdbus_pool_slice_copy_iovec which does vfs_iter_write and this
> > is where we get to charge the memory. AFAIU the terminology all the
> > receivers will share the same shmem file when mmaping the endpoint.
> >
> > This, however, doesn't seem to be exploitable to hide memory charges
> > because the receiver cannot make the buffer writable. A nasty process
> > with a small memcg limit could still pre-fault the memory before any
> > writer sends a message and slow the whole endpoint traffic. But
> > that wouldn't be a completely new thing because processes might hammer
> > on memory even without memcg... It is just that this would be kind of
> > easier with memcg.
> > If that is the concern then the buffer should be pre-charged at the time
> > when it is created.
>
> The attack I had in mind was that the nasty process with a small memcg
> creates one or many of these and doesn't pre-fault it. Then a sender
> (systemd?) sends messages and they get charged, possibly once for each
> copy sent, to the root memcg.
Dunno but I suspect that systemd will not talk to random endpoints. Or
can those endpoints be registered on a systembus by an untrusted task?
Bus vs. endpoint relation is still not entirely clear to me.
But even if that was possible I fail to see how the small memcg plays
any role when the task doesn't control the shmem buffer directly but it
only has a read only mapping of it.
> So kdbus should probably pre-charge the creator of the pool.
Yes, as I've said above.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-21 9:35 ` One Thousand Gnomes
2015-04-21 10:17 ` David Herrmann
@ 2015-04-21 10:51 ` Greg Kroah-Hartman
2015-04-21 11:03 ` Jiri Kosina
1 sibling, 1 reply; 333+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-21 10:51 UTC (permalink / raw)
To: One Thousand Gnomes
Cc: Andy Lutomirski, Richard Weinberger, David Herrmann, Linus Torvalds,
Steven Rostedt, Jiri Kosina, Al Viro, Borislav Petkov, Andrew Morton,
Arnd Bergmann, Eric W. Biederman, Tom Gundersen, linux-kernel,
Daniel Mack, Djalal Harouni
On Tue, Apr 21, 2015 at 10:35:19AM +0100, One Thousand Gnomes wrote:
>
> We do need something for the multicast messaging. Whether that's
> supporting AF_LOCAL, SOCK_RDP with multicast or something else (POSIX
> message queue extensions ?). There's no real IP layer reliable ordered
> multicast delivery system that is low latency and lightweight because
> once it hits real networks it changes from a hard problem into a
> seriously hard problem because of multicast implosions and the like.
This was attempted in the past with AF_DBUS, but the networking
maintainers rightfully pointed out that the model there did not work.
thanks,
greg k-h
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-21 10:51 ` Greg Kroah-Hartman
@ 2015-04-21 11:03 ` Jiri Kosina
2015-04-21 12:56 ` Greg Kroah-Hartman
0 siblings, 1 reply; 333+ messages in thread
From: Jiri Kosina @ 2015-04-21 11:03 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: One Thousand Gnomes, Andy Lutomirski, Richard Weinberger,
David Herrmann, Linus Torvalds, Steven Rostedt, Al Viro,
Borislav Petkov, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
Tom Gundersen, linux-kernel, Daniel Mack, Djalal Harouni
On Tue, 21 Apr 2015, Greg Kroah-Hartman wrote:
> > We do need something for the multicast messaging. Whether that's
> > supporting AF_LOCAL, SOCK_RDP with multicast or something else (POSIX
> > message queue extensions ?). There's no real IP layer reliable ordered
> > multicast delivery system that is low latency and lightweight because
> > once it hits real networks it changes from a hard problem into a
> > seriously hard problem because of multicast implosions and the like.
>
> This was attempted in the past with AF_DBUS, but the networking
> maintainers rightfully pointed out that the model there did not work.
BTW, I don't think this has been brought up in this discussion yet ...
please correct me if I am wrong, my memory is very faint here (*), but
wasn't the main objection to AF_BUS that defining what happens when one
of the subscribed receivers disconnects is a policy matter, and as such
belongs to userspace (which wasn't the case with the submitted AF_BUS
implementation)?
Was that considered unfixable and AF_BUS consequently given up because
of this?
I personally think that AF_BUS makes quite a lot of sense -- it builds
on what we already have (AF_UNIX credential passing, memfd sealing,
etc), it basically "just implements a missing socket semantics" (wrt.
reliability and multicasting).
(*) and I would really like to avoid digging out and rereading a thread
similar to this one, about AF_BUS
Thanks,
--
Jiri Kosina
SUSE Labs
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-21 11:03 ` Jiri Kosina
@ 2015-04-21 12:56 ` Greg Kroah-Hartman
0 siblings, 0 replies; 333+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-21 12:56 UTC (permalink / raw)
To: Jiri Kosina
Cc: One Thousand Gnomes, Andy Lutomirski, Richard Weinberger,
David Herrmann, Linus Torvalds, Steven Rostedt, Al Viro,
Borislav Petkov, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
Tom Gundersen, linux-kernel, Daniel Mack, Djalal Harouni
On Tue, Apr 21, 2015 at 01:03:59PM +0200, Jiri Kosina wrote:
> On Tue, 21 Apr 2015, Greg Kroah-Hartman wrote:
>
> > > We do need something for the multicast messaging. Whether that's
> > > supporting AF_LOCAL, SOCK_RDP with multicast or something else (POSIX
> > > message queue extensions ?). There's no real IP layer reliable ordered
> > > multicast delivery system that is low latency and lightweight because
> > > once it hits real networks it changes from a hard problem into a
> > > seriously hard problem because of multicast implosions and the like.
> >
> > This was attempted in the past with AF_DBUS, but the networking
> > maintainers rightfully pointed out that the model there did not work.
>
> BTW, I don't think this has been brought up in this discussion yet ...
> please correct me if I am wrong, my memory is very faint here (*), but
> wasn't the main objection to AF_BUS that defining what happens when one of
> the subscribed receivers disconnects is a policy matter, and as such
> belongs to userspace (which wasn't the case with the submitted AF_BUS
> implementation)?
>
> Was that considered unfixable and AF_BUS consequently given up because of
> this?
I think it was one of the reasons, I seem to remember many more. At
that time, I had lunch with David Miller and he told me a few specific
reasons along those lines, and that it just wasn't going to work as a
network protocol at all, and to not try that method anymore, but
instead, do it as a specific IPC interface, as has been done here :)
thanks,
greg k-h
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-20 22:06 ` Andy Lutomirski
2015-04-21 7:38 ` Johannes Stezenbach
2015-04-21 9:35 ` One Thousand Gnomes
@ 2015-04-21 10:31 ` Greg Kroah-Hartman
2015-04-21 10:53 ` Borislav Petkov
` (2 more replies)
2 siblings, 3 replies; 333+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-21 10:31 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Richard Weinberger, David Herrmann, Linus Torvalds, Steven Rostedt,
One Thousand Gnomes, Jiri Kosina, Al Viro, Borislav Petkov,
Andrew Morton, Arnd Bergmann, Eric W. Biederman, Tom Gundersen,
linux-kernel, Daniel Mack, Djalal Harouni
On Mon, Apr 20, 2015 at 03:06:09PM -0700, Andy Lutomirski wrote:
> On Mon, Apr 20, 2015 at 2:46 PM, Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> > On Mon, Apr 20, 2015 at 11:16:49PM +0200, Richard Weinberger wrote:
> >> Greg,
> >>
> >> On 20.04.2015 at 22:56, Greg Kroah-Hartman wrote:
> >> >> In which situation on a common Linux system is the current dbus too slow today?
> >> >> I've never seen an issue like "Oh my system is slow because dbus is
> >> >> eating too much CPU cycles".
> >> >
> >> > See the original email which explained all of the things we can not do
> >> > with D-Bus, some of which are due to speed, that can now be done with the
> >> > kdbus code.
> >>
> >> okay, let's do it together.
> >>
> >> 1. Performance
> >> You write:
> >> "DBus is not used for performance sensitive applications because DBus is slow.
> >> We want to make it fast so we can finally use it for low-latency,
> >> high-throughput applications."
> >>
> >> Which applications exactly?
> >> This reads to me like a solution for a non-existing problem.
> >
> > Anything that uses UDS for large buffers today can switch to using kdbus
> > for its data stream as it is faster. I know the Pulse Audio people
> > have discussed this, and there are other people as well (Enlightenment
> > library developers, glib, wayland, etc.) Without the code being in the
> > kernel, no project is going to spend the time to convert their codebase
> > to a feature that isn't accepted.
>
> Anything that uses UDS for large buffers today can switch to using
> memfd over SCM_RIGHTS right now. If SCM_RIGHTS is too slow, then we
> can fix it along the lines that Al proposed.
But that doesn't solve the latency issues. As has been said many times
in this thread, when using UDS to build a better IPC for apps, you will
probably end up with today's D-Bus userspace implementation, and not
have any of the other things that we keep talking about kdbus having.
Bringing up SCM_RIGHTS means that this is not going to be a bus system
at all. One principal design goal is to _not_ have peer-to-peer
connections between all communicating parties, but rather one
connection to a central component. If that component is not in the
kernel, it has to be a userspace daemon, which in turn has all of the
issues that dbus-daemon currently has.
> >> 2. Security
> >> I don't think that you need a 13k piece of code in the kernel to solve
> >> that issue.
> >
> > Wait, what? How can you blow by that requirement by just saying that
> > this proposal isn't acceptable? You can't do that, sorry. Please show
> > how what we have proposed does not provide the security requirements as
> > is documented.
>
> This is backwards. The way this discussion is going is:
>
> kdbus promoters: here's some code
>
> someone else: the code does such and such in a way that's wrong for xyz reason
>
> kdbus promoters: show us the implementation bug in such and such
>
> This is not how this discussion should work. Richard didn't say there
> was a bug in your code; he said that your code was too large.
"Your code is too large" does not provide any value to this discussion
at all, sorry. Richard is being a jerk here, please don't perpetuate
that line of discussion, it's not helpful at all.
> >> 3. Semantics for apps with heavy data payloads
> >>
> >> Again, sounds like a solution for a non-existing problem.
> >
> > No, media apps need to share their data somehow, and kdbus provides a
> > way to do that. GNOME portals are one such proposed codebase that is
> > looking to use kdbus for this, and again, so is Pulse Audio and the
> > other groups listed above.
>
> AFAICT you're talking about passing data into and out of a sandbox for
> processing or UI purposes. We have two excellent ways to do that
> right now: memfd and splice, depending on exactly what you're doing.
That does not solve the latency issues, which is crucial for sound and
graphics.
> >> 4. "Being in the kernel closes a lot of races which can't be fixed with
> >> the current userspace solutions."
> >>
> >> You really need an in-kernel dbus with 13k to solve that?
> >
> > Do you know of a smaller amount of code to solve this problem? If so,
> > wonderful, please show us, but we aren't playing code golf here. We are
> > proposing something that is well documented and easy to maintain, while
> > still being fast and correct. If you think this can be done in a
> > smaller amount of code, please show us where we are doing needless
> > things in the patches.
>
> I do. Implement something like my old SCM_IDENTITY proposal, which is
> kind of like kdbus metadata, opt-in, over UNIX sockets. Except that I
> never proposed most of the absurd metadata items that kdbus is
> proposing, and I also suggested doing it over plain old UNIX sockets.
We _want_ this metadata. You don't, that's fine. Calling our position
"absurd" does not contribute to the discussion. We are simply
exporting data that is already accessible via /proc and other
locations, and do so in a race-free manner, something the kernel has
never been able to provide in the past. We do not, in any way, export
any additional internal kernel state, again, we are merely closing a
race gap that has been there.
> > Because of that, and the thread where the proposed security problems
> > were agreed not to be a security problem, I don't see a reason anymore
> > why this code should not be merged.
> >
> > With the exception of Al's code review, which is being addressed. But
> > that's a minor thing, not a major design flaw at all.
>
> My NACK stands. A security problem was fixed,
Please note that this issue was addressed in v2, which was posted many
months ago. It is not present in this submission at all.
> but the metadata system
> has multiple problems, each of which is independently sufficient to
> earn my nack.
If you still see a problem, please explain what it is. At least give a
general outline so that we can try to understand where you are coming
from here. On the systemd mailing list you said that your only issue
was that you are not convinced that this is a useful feature. But now
you are saying you have "multiple concerns". What are they?
>
> On top of that, the policy mechanism is iffy and is probably worthy of my nack.
This is a well-established concept that has worked great for many
years. Why should we break with that?
> On top of that, I think that someone into resource management needs to
> seriously consider whether having a broadcast send do get_user_pages
> or the equivalent on pages supplied by untrusted recipients (plural!)
> is a good idea.
Recipients need TALK access to the sender to receive broadcasts.
Furthermore, even on AF_UNIX you need sender-side buffering, which
might trigger reclaim. But sender-side buffering does not make sense
for broadcasts (there is no POLLOUT for broadcasts), which is why we
implemented the kdbus pools. I really doubt that the netlink-way of
making all buffers kernel-owned is the way to go. But it would be
trivial to change pool-memory on the root-memcg or even lock it, which
would be almost equivalent to kernel-owned buffers, if you think that
would solve a problem.
>
> On top of *that*, I have serious doubts that the whole design makes
> sense. That doesn't earn my nack specifically, but it sure seems like
> a lot of people share my doubts that the design makes sense, and I
> don't hear a whole lot of people saying that they think the design is
> a good thing to put in the kernel.
>
> Also, the current thread-of-lesser-doom on the systemd list greatly
> decreases my confidence that the issues that have earned my nack will
> get resolved. The kdbus designers seem to be unwilling to accept that
> code should be merged into the kernel merely because I (me,
> personally) don't see a straightforward security exploit that the code
> enables.
You have claimed that there is a security issue with no arguments
backing it up. No one is expecting an actual exploit, but at least an
explanation of what you have in mind and how it applies to the kdbus
design would be appreciated. Otherwise this is an argument that no one
can ever refute, and isn't fair.
We are trying to find proper solutions for the problems we see, and
that people tell us about. If there is a security issue in any of
this, please let us know, and if these are unfixable we are very open
to change the design. After all, that's why we are discussing it here
in the open.
Your review comments, and Al's, have been invaluable in helping make
this code better, and I greatly appreciate them. The code is much
better today than the v1 submission, and it shows, your insights here
have been wonderful. But now, by saying that somehow the existing
design details that we have picked are dangerous, without providing any
details as to _why_ they are dangerous, is leaving us with nothing to
actually be able to change.
So please, specifics please, otherwise there's no way that we can
provide a solution for this problem area.
thanks,
greg k-h
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-21 10:31 ` Greg Kroah-Hartman
@ 2015-04-21 10:53 ` Borislav Petkov
2015-04-21 11:09 ` Greg Kroah-Hartman
2015-04-21 13:18 ` Olivier Galibert
2015-04-21 18:18 ` Andy Lutomirski
2 siblings, 1 reply; 333+ messages in thread
From: Borislav Petkov @ 2015-04-21 10:53 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Andy Lutomirski, Richard Weinberger, David Herrmann, Linus Torvalds,
Steven Rostedt, One Thousand Gnomes, Jiri Kosina, Al Viro,
Andrew Morton, Arnd Bergmann, Eric W. Biederman, Tom Gundersen,
linux-kernel, Daniel Mack, Djalal Harouni
On Tue, Apr 21, 2015 at 12:31:28PM +0200, Greg Kroah-Hartman wrote:
> "Your code is too large" does not provide any value to this discussion
> at all, sorry. Richard is being a jerk here, please don't perpetuate
> that line of discussion, it's not helpful at all.
We're becoming offensive slowly, aren't we?
I'm sorry but Richard's right. A lot of people told you already that
this is a *lot* of code and no other kernel person has given their
Reviewed-by: for this.
Which tells me that no one has reviewed it, maybe because it is a *lot*
of code. Or maybe because people are busy with other stuff and don't
have time to review 13KLOC and a spec on top for something which is
supposed to speed up some userspace pile which reportedly hasn't been
written yet or for some use cases which by no means justify the addition
of 13KLOC accelerator code to the kernel.
Yeah, right.
--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-21 10:53 ` Borislav Petkov
@ 2015-04-21 11:09 ` Greg Kroah-Hartman
2015-04-21 11:39 ` Borislav Petkov
0 siblings, 1 reply; 333+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-21 11:09 UTC (permalink / raw)
To: Borislav Petkov
Cc: Andy Lutomirski, Richard Weinberger, David Herrmann, Linus Torvalds,
Steven Rostedt, One Thousand Gnomes, Jiri Kosina, Al Viro,
Andrew Morton, Arnd Bergmann, Eric W. Biederman, Tom Gundersen,
linux-kernel, Daniel Mack, Djalal Harouni
On Tue, Apr 21, 2015 at 12:53:20PM +0200, Borislav Petkov wrote:
> On Tue, Apr 21, 2015 at 12:31:28PM +0200, Greg Kroah-Hartman wrote:
> > "Your code is too large" does not provide any value to this discussion
> > at all, sorry. Richard is being a jerk here, please don't perpetuate
> > that line of discussion, it's not helpful at all.
>
> We're becoming offensive slowly, aren't we?

Vs. being offensive quickly like you started out with? :)

> I'm sorry but Richard's right. A lot of people told you already that
> this is a *lot* of code and no other kernel person has given its
> Reviewed-by: for this.

And I've kept asking for review, and the people who have reviewed it,
their issues have been addressed.

But to somehow say "it's insecure because it is a lot of code" is
unhelpful and flat out mean.

> Which tells me that no one has reviewed it, maybe because it is a *lot*
> of code. Or maybe because people are busy with other stuff and don't
> have time to review 13KLOC and a spec on top for something which is
> supposed to speed up some userspace pile which reportedly hasn't been
> written yet or for some use cases which by no means justify the addition
> of 13KLOC accelerator code to the kernel.

We add chunks of code larger than this to the kernel all the time. For
core infrastructure pieces. Asking for people to review it, and discuss
it is normal, nothing is happening different here.

In the end, not everyone reviews all of the code, that's normal. I see
chunks get merged all the time and I go "what? That's crazy? Who would
want a virtual machine as their networking packet parser?" But given
that the maintainer of the subsystem acked it, and has agreed to
maintain it, I trust them to do the right thing. That's how kernel
development works, on trust.

And that's the core issue here, trust.

You are the only one to bring this up in the thread, but I feel it's the
underlying theme. You act as if you don't trust us, the developers, to
be doing the right thing here. If so, great, please, let's talk about
it.

Trust is about not always getting everything right, but trust that the
people involved will be around to fix it if something is wrong. And
that the people involved are actually working toward something they see
is valuable and needed in an honest way.

If you don't trust me, great, say it. If you don't trust David, Daniel,
and Djalal, great, say so, and we can work on addressing that issue.

If you don't trust the code, wonderful, please let me know that and show
what is untrustworthy about it, again, as Al and Andy have done so. Let
us address it that way, as has been done so thanks to Al and Andy's
review.

But to do drive-by potshots in an email thread, just because you somehow
don't like the color of the bikeshed, without having ever looked inside
the bikeshed to see what it is doing there, is completely unfair and
unreasonable and totally unproductive.

greg "read the code luke" k-h
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-21 11:09 ` Greg Kroah-Hartman
@ 2015-04-21 11:39 ` Borislav Petkov
0 siblings, 0 replies; 333+ messages in thread
From: Borislav Petkov @ 2015-04-21 11:39 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Andy Lutomirski, Richard Weinberger, David Herrmann, Linus Torvalds,
Steven Rostedt, One Thousand Gnomes, Jiri Kosina, Al Viro,
Andrew Morton, Arnd Bergmann, Eric W. Biederman, Tom Gundersen,
linux-kernel, Daniel Mack, Djalal Harouni
On Tue, Apr 21, 2015 at 01:09:42PM +0200, Greg Kroah-Hartman wrote:
> Vs. being offensive quickly like you started out with? :)

Of course you'll come back with an attack. Polemical tactics or what is
this called?

> And I've kept asking for review, and the people who have reviewed it,
> their issues have been addressed.
>
> But to somehow say "it's insecure because it is a lot of code" is
> unhelpful and flat out mean.

Well, so other people are reviewing it now. And it clearly needs more
review, people suggested even sitting down and talking face to face. So
we can put the mind-boggling upstreaming rush on hold now, right? Until
all the doubts have been cleared and there's general agreement.

And if you tell me that this current state of affairs is general
agreement, then I can surely understand the NAKs.

> We add chunks of code larger than this to the kernel all the time. For
> core infrastructure pieces. Asking for people to review it, and discuss
> it is normal, nothing is happening different here.

13KLOC in one go without a single Reviewed-by? I don't think so.

> In the end, not everyone reviews all of the code, that's normal. I see
> chunks get merged all the time and I go "what? That's crazy? Who would
> want a virtual machine as their networking packet parser?" But given
> that the maintainer of the subsystem acked it, and has agreed to
> maintain it, I trust them to do the right thing. That's how kernel
> development works, on trust.

You got an argument about that already: we're talking core code here,
not some driver or a packet parser which is not used by *everything*.

> And that's the core issue here, trust.
>
> You are the only one to bring this up in the thread, but I feel it's the
> underlying theme. You act as if you don't trust us, the developers, to
> be doing the right thing here. If so, great, please, let's talk about
> it.

Well, maybe I'm the only one to say it... A lot of people are simply
tired of the talk-people-into-submission tactics and can you blame
them?!

> Trust is about not always getting everything right, but trust that the
> people involved will be around to fix it if something is wrong. And
> that the people involved are actually working toward something they see
> is valuable and needed in an honest way.

Yeah, like the time I trusted to open a bugzilla to fix the "debug"
parsing on the command line and systemd hijacking it. After that
debacle, the trust container here is empty. That ship has sailed, sorry.

> If you don't trust me, great, say it. If you don't trust David, Daniel,
> and Djalal, great, say so, and we can work on addressing that issue.
>
> If you don't trust the code, wonderful, please let me know that and show
> what is untrustworthy about it, again, as Al and Andy have done so. Let
> us address it that way, as has been done so thanks to Al and Andy's
> review.
>
> But to do drive-by potshots in an email thread, just because you somehow
> don't like the color of the bikeshed, without having ever looked inside
> the bikeshed to see what it is doing there, is completely unfair and
> unreasonable and totally unproductive.

Yeah right, like you don't do that. There's another one of your attack
the opponent tactics. Oh well, I *trust* you to do *that*! :-D

Let me give you a similar comeback: "But I'm on CC!" :-P

> greg "read the code luke" k-h

Oh but I'm looking. And I don't really like what I'm seeing.

But to get back to Richard's statement: it is a *lot* of code so I, and
probably everyone else too, can't just drop everything and jump on
13KLOC code. It needs time.

So let's stop the polemics and agree that it is not time to upstream
this thing yet. We should instead get down to business and give it a
good long look until Reviewed-by's start happening.

Agreed?

Boris "Darth Maul".

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-21 10:31 ` Greg Kroah-Hartman
2015-04-21 10:53 ` Borislav Petkov
@ 2015-04-21 13:18 ` Olivier Galibert
2015-04-21 13:48 ` Greg Kroah-Hartman
2015-04-21 18:18 ` Andy Lutomirski
2 siblings, 1 reply; 333+ messages in thread
From: Olivier Galibert @ 2015-04-21 13:18 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Andy Lutomirski, Richard Weinberger, David Herrmann, Linus Torvalds,
Steven Rostedt, One Thousand Gnomes, Jiri Kosina, Al Viro,
Borislav Petkov, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
Tom Gundersen, linux-kernel, Daniel Mack, Djalal Harouni
On Tue, Apr 21, 2015 at 12:31 PM, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> Bringing up SCM_RIGHTS means that this is not going to be a bus system
> at all. One principal design goal is to _not_ have peer-to-peer
> connections between all communicating parties, but rather one connection
> to a central component. If that component is not in the kernel, it has
> to be a userspace daemon, which in turn has all of the issues that
> dbus-daemon currently has.

You're not making sense there. If there is no daemon, then you're
peer-to-peer, because there's no central component. If you consider
the kernel the central component, then peer-to-peer is almost
impossible by definition.

It seems that almost everybody here thinks that the plumbing (e.g.
transmitting messages in-order with multicasting) should be separated
from the policy (who communicates with who), possibly leveraging the
packet filtering infrastructure to implement the decided policy. What
is it you reject about that point of view, which seems relatively
normal when you think about building a collection of useful tools?

OG.
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-21 13:18 ` Olivier Galibert
@ 2015-04-21 13:48 ` Greg Kroah-Hartman
2015-04-21 15:53 ` One Thousand Gnomes
0 siblings, 1 reply; 333+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-21 13:48 UTC (permalink / raw)
To: Olivier Galibert
Cc: Andy Lutomirski, Richard Weinberger, David Herrmann, Linus Torvalds,
Steven Rostedt, One Thousand Gnomes, Jiri Kosina, Al Viro,
Borislav Petkov, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
Tom Gundersen, linux-kernel, Daniel Mack, Djalal Harouni
On Tue, Apr 21, 2015 at 03:18:35PM +0200, Olivier Galibert wrote:
> On Tue, Apr 21, 2015 at 12:31 PM, Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> > Bringing up SCM_RIGHTS means that this is not going to be a bus system
> > at all. One principal design goal is to _not_ have peer-to-peer
> > connections between all communicating parties, but rather one connection
> > to a central component. If that component is not in the kernel, it has
> > to be a userspace daemon, which in turn has all of the issues that
> > dbus-daemon currently has.
>
> You're not making sense there. If there is no daemon, then you're
> peer-to-peer, because there's no central component.

The kernel is the central component, as implemented in the patches.

> If you consider the kernel the central component, then peer-to-peer is
> almost impossible by definition.

Um, no, they go through the kernel for that model as well, same
interface, it just depends on the type of message that you are sending
as to who the recipients are (single or more than one.)

> It seems that almost everybody here thinks that the plumbing (e.g.
> transmitting messages in-order with multicasting) should be separated
> from the policy (who communicates with who), possibly leveraging the
> packet filtering infrastructure to implement the decided policy. What
> is it you reject about that point of view, which seems relatively
> normal when you think about building a collection of useful tools?

The plumbing is "separated" from the policy in that they are different
data structures, but you have to have the policy in order to know who to
connect with whom, otherwise it just doesn't work.

How would packet filtering work here for this type of decision making?
That's a much more complex interface than what we have implemented,
don't you agree?

thanks,

greg k-h
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-21 13:48 ` Greg Kroah-Hartman
@ 2015-04-21 15:53 ` One Thousand Gnomes
0 siblings, 0 replies; 333+ messages in thread
From: One Thousand Gnomes @ 2015-04-21 15:53 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Olivier Galibert, Andy Lutomirski, Richard Weinberger, David Herrmann,
Linus Torvalds, Steven Rostedt, Jiri Kosina, Al Viro,
Borislav Petkov, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
Tom Gundersen, linux-kernel, Daniel Mack, Djalal Harouni
> Um, no, they go through the kernel for that model as well, same
> interface, it just depends on the type of message that you are sending
> as to who the recipients are (single or more than one.)

In other words it's bog standard classic network layer multicasting. You
don't need much policy for that.

> How would packet filtering work here for this type of decision making?
> That's a much more complex interface than what we have implemented,
> don't you agree?

No - it's about 10 lines of code to invoke EBPF. The socket layer
supports it today. What it doesn't have is the multicast transport bits.
It's a classic networking layer item. I know DaveM didn't like the
original because all the policy was mixed up in it and the blocking
problem was undefined but every time I look at this I reach the same
conclusion

- It's a socket layer problem
- The sk_buff structures are the memory allocator needed
- The socket layer does the resource management
- The socket layer has SCM_RIGHTS already
- The socket layer has EBPF already
- The socket layer has a fantastic debugging environment
- It's the same components needed for fast MQ services and (almost) for
  HPC uses [HPC wants scatter/gather too]

It just needs

- RDP multicast AF_UNIX type sockets. Not in itself a huge problem,
  although it does need some nice refcounting so that you queue each
  message once and the readers share it.
- A clear fairly policy-free description of how you deal with multicast
  to some clients who are "full"

And I think that aspect of things needs to go back via the networking
maintainers to figure out if this is "DaveM doesn't like dbus" or there
are some actually insoluble problems I'm missing here. Designing
alternate turds to go around DaveM may not be the right approach even if
he's stubborn 8)

Alan
^ permalink raw reply [flat|nested] 333+ messages in thread
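[Editorial sketch, not part of the original mail.] The two missing pieces
named in the message above (queue each message once so that readers share
it, and a defined behaviour for multicast to clients that are "full") can
be illustrated with a small userspace toy. This is not kernel code and not
a proposal from the thread: the names Reader and multicast are
hypothetical, and "drop and count" is only one assumed policy for full
receivers among several possible ones (blocking the sender or
disconnecting the reader are others).

```python
from collections import deque


class Reader:
    """One bus client with a bounded receive queue (hypothetical name)."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.queue = deque()
        self.dropped = 0


def multicast(readers, message):
    """Queue one shared message object to every reader that has room.

    Returns the number of readers that accepted the message.
    """
    delivered = 0
    for reader in readers:
        if len(reader.queue) >= reader.capacity:
            # Assumed policy for "full" clients: drop and count the loss.
            reader.dropped += 1
            continue
        # Append a reference, not a copy: all readers share one object,
        # which is the userspace analogue of refcounting a queued buffer.
        reader.queue.append(message)
        delivered += 1
    return delivered


fast = Reader("fast", capacity=4)
slow = Reader("slow", capacity=1)

results = [
    multicast([fast, slow], bytearray(b"first")),
    multicast([fast, slow], bytearray(b"second")),
]
```

In this toy the second multicast reaches only the reader that still has
room, and the first message exists once in memory even though it sits in
both queues. Whatever policy is chosen for full receivers, the sketch
shows why it has to be stated explicitly: the sender otherwise cannot
tell a delivered message from a dropped one.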
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-21 10:31 ` Greg Kroah-Hartman
2015-04-21 10:53 ` Borislav Petkov
2015-04-21 13:18 ` Olivier Galibert
@ 2015-04-21 18:18 ` Andy Lutomirski
2 siblings, 0 replies; 333+ messages in thread
From: Andy Lutomirski @ 2015-04-21 18:18 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Richard Weinberger, David Herrmann, Linus Torvalds, Steven Rostedt,
One Thousand Gnomes, Jiri Kosina, Al Viro, Borislav Petkov,
Andrew Morton, Arnd Bergmann, Eric W. Biederman, Tom Gundersen,
linux-kernel, Daniel Mack, Djalal Harouni
On Tue, Apr 21, 2015 at 3:31 AM, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> On Mon, Apr 20, 2015 at 03:06:09PM -0700, Andy Lutomirski wrote:
>> On Mon, Apr 20, 2015 at 2:46 PM, Greg Kroah-Hartman
>> <gregkh@linuxfoundation.org> wrote:
>> > On Mon, Apr 20, 2015 at 11:16:49PM +0200, Richard Weinberger wrote:
>> >> Greg,
>> >>
>> >> On 20.04.2015 at 22:56, Greg Kroah-Hartman wrote:
>> >> >> In which situation on a common Linux system is the current dbus too slow today?
>> >> >> I've never seen an issue like "Oh my system is slow because dbus is
>> >> >> eating too much CPU cycles".
>> >> >
>> >> > See the original email which explained all of the things we can not do
>> >> > with D-Bus, some of which are due to speed, that can now be done with the
>> >> > kdbus code.
>> >>
>> >> okay, let's do it together.
>> >>
>> >> 1. Performance
>> >> You write:
>> >> "DBus is not used for performance sensitive applications because DBus is slow.
>> >> We want to make it fast so we can finally use it for low-latency,
>> >> high-throughput applications."
>> >>
>> >> Which applications exactly?
>> >> This reads to me like a solution for a non-existing problem.
>> >
>> > Anything that uses UDS for large buffers today can switch to using kdbus
>> > for its data stream as it is faster. I know the Pulse Audio people
>> > have discussed this, and there are other people as well (Enlightenment
>> > library developers, glib, wayland, etc.) Without the code being in the
>> > kernel, no project is going to spend the time to convert their codebase
>> > to a feature that isn't accepted.
>>
>> Anything that uses UDS for large buffers today can switch to using
>> memfd over SCM_RIGHTS right now. If SCM_RIGHTS is too slow, then we
>> can fix it along the lines that Al proposed.
>
> But that doesn't solve the latency issues.

I said memfd, not memfd bounced off a userspace daemon. AFAICT
AF_UNIX peer-to-peer is considerably faster than kdbus, and I don't
see why memfd would change this.

>
> As has been said many times in this thread, when using UDS to build a
> better IPC for apps, you will probably end up with today's D-Bus
> userspace implementation, and not have any of the other things that we
> keep talking about kdbus having.
>
> Bringing up SCM_RIGHTS means that this is not going to be a bus system
> at all. One principal design goal is to _not_ have peer-to-peer
> connections between all communicating parties, but rather one connection
> to a central component. If that component is not in the kernel, it has
> to be a userspace daemon, which in turn has all of the issues that
> dbus-daemon currently has.
>

AFAICT userspace dbus-daemon has two major problems:

1. SCM_RIGHTS sucks. That's why I proposed fixing it.

2. Performance.

But using an in-kernel bus is far from the only solution. I much
prefer adding something simple and flexible in the kernel so that a
userspace daemon can easily and efficiently introduce two bus users to
each other.

>> >> 3. Semantics for apps with heavy data payloads
>> >>
>> >> Again, sounds like a solution for a non-existing problem.
>> >
>> > No, media apps need to share their data somehow, and kdbus provides a
>> > way to do that. GNOME portals are one such proposed codebase that is
>> > looking to use kdbus for this, and again, so is Pulse Audio and the
>> > other groups listed above.
>>
>> AFAICT you're talking about passing data into and out of a sandbox for
>> processing or UI purposes. We have two excellent ways to do that
>> right now: memfd and splice, depending on exactly what you're doing.
>
> That does not solve the latency issues, which is crucial for sound and
> graphics.

As above, there's only a latency issue right now if you want sound and
graphics to use a *bus*, and even that could be fixed without moving
the bus into the kernel.

>
>> >> 4. "Being in the kernel closes a lot of races which can't be fixed with
>> >> the current userspace solutions."
>> >>
>> >> You really need an in-kernel dbus with 13k to solve that?
>> >
>> > Do you know of a smaller amount of code to solve this problem? If so,
>> > wonderful, please show us, but we aren't playing code golf here. We are
>> > proposing something that is well documented and easy to maintain, while
>> > still being fast and correct. If you think this can be done in a
>> > smaller amount of code, please show us where we are doing needless
>> > things in the patches.
>>
>> I do. Implement something like my old SCM_IDENTITY proposal, which is
>> kind of like kdbus metadata, opt-in, over UNIX sockets. Except that I
>> never proposed most of the absurd metadata items that kdbus is
>> proposing, and I also suggested doing it over plain old UNIX sockets.
>
> We _want_ this metadata. You don't, that's fine. Calling our position
> "absurd" does not contribute to the discussion. We are simply exporting
> data that is already accessible via /proc and other locations, and do so
> in a race-free manner, something the kernel has never been able to
> provide in the past.
>
> We do not, in any way, export any additional internal kernel state,
> again, we are merely closing a race gap that has been there.

This has been covered ad nauseam on the systemd thread, so I'm not
going to respond here.

>
>> > Because of that, and the thread where the proposed security problems
>> > were agreed not to be a security problem, I don't see a reason anymore
>> > why this code should not be merged.
>> >
>> > With the exception of Al's code review, which is being addressed. But
>> > that's a minor thing, not a major design flaw at all.
>>
>> My NACK stands. A security problem was fixed,
>
> Please note that this issue was addressed in v2, which was posted many
> months ago. It is not present in this submission at all.

That's why I said "was fixed".

>
>> but the metadata system
>> has multiple problems, each of which is independently sufficient to
>> earn my nack.
>
> If you still see a problem, please explain what it is. At least give a
> general outline so that we can try to understand where you are coming
> from here. On the systemd mailing list you said that your only issue
> was that you are not convinced that this is a useful feature. But now
> you are saying you have "multiple concerns". What are they?
>

We've only discussed creds on the systemd list. There's still cmdline
and starttime (at least). I've actually *submitted patches* to fix
starttime, but no one seems to care. I'll resubmit them anyway for
4.2, since I think they're more generally useful.

[snip]

--Andy
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-20 21:46 ` Greg Kroah-Hartman
2015-04-20 22:06 ` Andy Lutomirski
@ 2015-04-21 8:18 ` Richard Cochran
1 sibling, 0 replies; 333+ messages in thread
From: Richard Cochran @ 2015-04-21 8:18 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Richard Weinberger, David Herrmann, Linus Torvalds, Steven Rostedt,
One Thousand Gnomes, Jiri Kosina, Al Viro, Borislav Petkov,
Andy Lutomirski, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
Tom Gundersen, linux-kernel, Daniel Mack, Djalal Harouni
On Mon, Apr 20, 2015 at 11:46:51PM +0200, Greg Kroah-Hartman wrote:
> As was pointed out, even the tiny IoT devices running Linux are now
> using D-Bus, it's everywhere :)

I would like to take issue with that assertion. Some people are
putting dbus and systemd into embedded devices, but not into "tiny"
ones but rather "beefy" ones.

Furthermore, just because some people use systemd in embedded doesn't
mean that that is the optimal solution or that everyone does it that
way.

Thanks,
Richard
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-20 21:16 ` Richard Weinberger
2015-04-20 21:46 ` Greg Kroah-Hartman
@ 2015-04-21 9:07 ` Johannes Stezenbach
2015-04-21 13:37 ` Havoc Pennington
1 sibling, 1 reply; 333+ messages in thread
From: Johannes Stezenbach @ 2015-04-21 9:07 UTC (permalink / raw)
To: Richard Weinberger
Cc: Greg Kroah-Hartman, David Herrmann, Linus Torvalds, Steven Rostedt,
One Thousand Gnomes, Jiri Kosina, Al Viro, Borislav Petkov,
Andy Lutomirski, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
Tom Gundersen, linux-kernel, Daniel Mack, Djalal Harouni
On Mon, Apr 20, 2015 at 11:16:49PM +0200, Richard Weinberger wrote:
> On 20.04.2015 at 22:56, Greg Kroah-Hartman wrote:
> >> In which situation on a common Linux system is the current dbus too slow today?
> >> I've never seen an issue like "Oh my system is slow because dbus is
> >> eating too much CPU cycles".
> >
> > See the original email which explained all of the things we can not do
> > with D-Bus, some of which are due to speed, that can now be done with the
> > kdbus code.
>
> okay, let's do it together.
>
> 1. Performance
> You write:
> "DBus is not used for performance sensitive applications because DBus is slow.
> We want to make it fast so we can finally use it for low-latency,
> high-throughput applications."
>
> Which applications exactly?
> This reads to me like a solution for a non-existing problem.

This is the line of thinking I was aiming at during a previous
review cycle. Basically, as Havoc outlined in his mail explaining
the design decisions, he traded speed for simplicity and chose
the slowest possible messaging topology (everything goes through
a central broker). That makes sense because, to quote from his mail:

http://article.gmane.org/gmane.linux.kernel/1931720
> Message passing or IPC isn't really the most important part of dbus.
> Process lifecycle tracking and discovery are more important.

I asked for performance numbers and got this reply from David Herrmann:
http://article.gmane.org/gmane.linux.kernel.api/7636

My line of thinking had been to amend DBus with optional direct
client/server communication for the performance critical
cases, since I believe those cases are RPC calls and not other
types of messaging (see also the "Performance" section in the
cover letter of this thread). (My other line of thinking had
been: if you need performance, don't use DBus e.g. in the
case of the tiny ARM systems sending hundreds of thousands of
messages during boot, quoted by Greg.)

Now, after reading Havoc's description of the DBus design trade-offs,
I have doubts that modifying the DBus architecture in userspace
to speed it up is a good thing. OTOH I am still convinced kdbus is the
wrong solution, for the sole reason called gut feeling. There is
nothing about the kdbus API that makes me go "oh nice, elegant,
I want to use it". And the kdbus authors seem to agree and tell
you you should only ever use it via a library like sd-bus and
try to justify it by comparing it to ALSA :-(

The ideas discussed among Alan, Andy and Al, even if ad-hoc and
yet immature, immediately seemed to be much more appealing to me.
I hope something real comes out of that.

Johannes
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-21 9:07 ` Johannes Stezenbach
@ 2015-04-21 13:37 ` Havoc Pennington
2015-04-22 1:51 ` Bernd Petrovitsch
2015-04-22 13:09 ` Johannes Stezenbach
0 siblings, 2 replies; 333+ messages in thread
From: Havoc Pennington @ 2015-04-21 13:37 UTC (permalink / raw)
To: Johannes Stezenbach
Cc: Richard Weinberger, Greg Kroah-Hartman, David Herrmann, Linus Torvalds,
Steven Rostedt, One Thousand Gnomes, Jiri Kosina, Al Viro,
Borislav Petkov, Andy Lutomirski, Andrew Morton, Arnd Bergmann,
Eric W. Biederman, Tom Gundersen, linux-kernel, Daniel Mack,
Djalal Harouni
Hi,

On Tue, Apr 21, 2015 at 5:07 AM, Johannes Stezenbach <js@sig21.net> wrote:
> My line of thinking had been to amend DBus with optional direct
> client/server communication for the performance critical
> cases, since I believe those cases are RPC calls and not other
> types of messaging (see also the "Performance" section in the
> cover letter of this thread). (My other line of thinking had
> been: if you need performance, don't use DBus e.g. in the
> case of the tiny ARM systems sending hundreds of thousands of
> messages during boot, quoted by Greg.)
>

This has long been sort of the 'party line' and I've told many people
this on the dbus mailing list over the years (almost exactly what you
just said - that for performance-critical cases they should open a
direct socket or use something else or whatever). Usually this makes
app developers a little cranky because something that was going to be
easy in their mind just got harder.

I think the pressure to use dbus happens for several reasons, if you
use a side channel some example complaints people have are:

* you have to reinvent any dbus solutions for security policy,
  containerization, debugging, introspection, etc.
* you're now writing custom socket code instead of using the
  high-level dbus API
* the side channel loses message ordering with respect to dbus messages
* your app code is kind of "infected" structurally by a performance
  optimization concern
* you have to decide in advance which messages are "too big" or "too
  numerous" - which may not be obvious, think of a cut-and-paste API,
  where usually it's a paragraph of text but it could in theory be a
  giant image
* you can't do big/numerous multicast, side channel only solves the unicast

There's no doubt that it's possible to use a side channel - just as it
was possible to construct an ad hoc IPC system prior to dbus - but the
overall OS (counting both kernel and userspace) perhaps becomes more
complex as a result, compared to having one model that supports more
cases.

One way to frame it: the low performance makes dbus into a relatively
leaky abstraction where there's this surprise lurking for app
developers that they might have to roll their own IPC on the side or
special-case some of their messages.

It's not the end of the world, it's just that it would have a certain
amount of overall simplicity (counting userspace+kernel together) if
one solution covered almost all use-cases in this "process-to-process
comms on local system" scenario, instead of 90% of use-cases but too
slow for the last 10%. The simplicity here isn't only for app
developers, it's also for anyone doing debugging or administration or
system integration, where they can deal with one system _or_ one
system plus various ad-hoc side channels.

Havoc
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-21 13:37 ` Havoc Pennington
@ 2015-04-22 1:51 ` Bernd Petrovitsch
2015-04-22 3:11 ` Havoc Pennington
2015-04-22 13:09 ` Johannes Stezenbach
1 sibling, 1 reply; 333+ messages in thread
From: Bernd Petrovitsch @ 2015-04-22 1:51 UTC (permalink / raw)
To: Havoc Pennington
Cc: Johannes Stezenbach, Richard Weinberger, Greg Kroah-Hartman,
David Herrmann, Linus Torvalds, Steven Rostedt, One Thousand Gnomes,
Jiri Kosina, Al Viro, Borislav Petkov, Andy Lutomirski,
Andrew Morton, Arnd Bergmann, Eric W. Biederman, Tom Gundersen,
linux-kernel, Daniel Mack, Djalal Harouni
Hi all!

On Tue, 2015-04-21 at 09:37 -0400, Havoc Pennington wrote:
[...]
> This has long been sort of the 'party line' and I've told many people
> this on the dbus mailing list over the years (almost exactly what you
> just said - that for performance-critical cases they should open a
> direct socket or use something else or whatever). Usually this makes
> app developers a little cranky because something that was going to be
> easy in their mind just got harder.

Perhaps these developers should rethink the design and protocols of
their apps - or pay the price for a stupid design which relies on heavy
IPC traffic (and usually - sooner or later - heavy network traffic).
Or - at least - deliver a (technical!) proof why this isn't feasible.

The case of "patching the kernel to lie about the kernel's command line
just because some ill-designed user-space daemon misused it" was bad
enough, and the above smells quite similar.

Kind regards,
Bernd
--
"I dislike type abstraction if it has no real reason. And saving
on typing is not a good reason - if your typing speed is the main
issue when you're coding, you're doing something seriously wrong."
- Linus Torvalds
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-22 1:51 ` Bernd Petrovitsch @ 2015-04-22 3:11 ` Havoc Pennington 0 siblings, 0 replies; 333+ messages in thread From: Havoc Pennington @ 2015-04-22 3:11 UTC (permalink / raw) To: Bernd Petrovitsch Cc: Johannes Stezenbach, Richard Weinberger, Greg Kroah-Hartman, David Herrmann, Linus Torvalds, Steven Rostedt, One Thousand Gnomes, Jiri Kosina, Al Viro, Borislav Petkov, Andy Lutomirski, Andrew Morton, Arnd Bergmann, Eric W. Biederman, Tom Gundersen, linux-kernel, Daniel Mack, Djalal Harouni On Tue, Apr 21, 2015 at 9:51 PM, Bernd Petrovitsch <bernd@petrovitsch.priv.at> wrote: > Hi all! > > On Die, 2015-04-21 at 09:37 -0400, Havoc Pennington wrote: > [...] >> This has long been sort of the 'party line' and I've told many people >> this on the dbus mailing list over the years (almost exactly what you >> just said - that for performance-critical cases they should open a >> direct socket or use something else or whatever). Usually this makes >> app developers a little cranky because something that was going to be >> easy in their mind just got harder. > > Perhaps these developers should rethink the design and protocols of > their apps - or pay the price for a stupid design which relies on heavy > IPC traffic (and usually - sooner or later - heavy network traffic). > Or - at least - deliver a (technical!) proof why this isn't feasible. > > The case of "patching the kernel to lie about the kernel's command line" > just because some ill-designed user-space daemon misused it" was bad > enough and the above smells quite similarly. > I don't think it's ridiculous that app developers try the clean, simple solution first (use one IPC for everything) and only optimize once they discover they need to. If dbus were faster, many of these designs might not be stupid and might not be a mis-use. And the app might delete a lot of code, which is a plus for anyone using that app. More code = more bugs after all. 
I grant you that some apps have bad code, but I don't think it's my
job to punish them by making things slow on purpose. I only made
things slow because I didn't know a way to make them fast without
sacrificing a more important goal, but it was never ideal. The kdbus
developers have proposed a way out of the tradeoff.

Havoc
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-21 13:37 ` Havoc Pennington 2015-04-22 1:51 ` Bernd Petrovitsch @ 2015-04-22 13:09 ` Johannes Stezenbach 1 sibling, 0 replies; 333+ messages in thread From: Johannes Stezenbach @ 2015-04-22 13:09 UTC (permalink / raw) To: Havoc Pennington Cc: Richard Weinberger, Greg Kroah-Hartman, David Herrmann, Linus Torvalds, Steven Rostedt, One Thousand Gnomes, Jiri Kosina, Al Viro, Borislav Petkov, Andy Lutomirski, Andrew Morton, Arnd Bergmann, Eric W. Biederman, Tom Gundersen, linux-kernel, Daniel Mack, Djalal Harouni On Tue, Apr 21, 2015 at 09:37:44AM -0400, Havoc Pennington wrote: > > I think the pressure to use dbus happens for several reasons, if you > use a side channel some example complaints people have are: > > * you have to reinvent any dbus solutions for security policy, > containerization, debugging, introspection, etc. > * you're now writing custom socket code instead of using the > high-level dbus API > * the side channel loses message ordering with respect to dbus messages > * your app code is kind of "infected" structurally by a performance > optimization concern > * you have to decide in advance which messages are "too big" or "too > numerous" - which may not be obvious, think of a cut-and-paste API, > where usually it's a paragraph of text but it could in theory be a > giant image > * you can't do big/numerous multicast, side channel only solves the unicast > > There's no doubt that it's possible to use a side channel - just as it > was possible to construct an ad hoc IPC system prior to dbus - but the > overall OS (counting both kernel and userspace) perhaps becomes more > complex as a result, compared to having one model that supports more > cases. > > One way to frame it: the low performance makes dbus into a relatively > leaky abstraction where there's this surprise lurking for app > developers that they might have to roll their own IPC on the side or > special-case some of their messages. 
>
> it's not the end of the world, it's just that it would have a certain
> amount of overall simplicity (counting userspace+kernel together) if
> one solution covered almost all use-cases in this "process-to-process
> comms on local system" scenario, instead of 90% of use-cases but too
> slow for the last 10%. The simplicity here isn't only for app
> developers, it's also for anyone doing debugging or administration or
> system integration, where they can deal with one system _or_ one
> system plus various ad-hoc side channels.

Clearly it is not useful to put the burden on the app developers.
However, I do not (yet?) understand why direct links couldn't
be added to the DBus daemon and library and be used fairly
transparently by apps:

- allow peers to announce "I allow direct connect"
  (we don't want too many sockets/connections, just e.g.
  gconf, polkit, ... where it matters for performance)
- when clients do an RPC call, check if the server allows
  direct connect and then do it (via DBus daemon as helper)
- obviously the clients would maintain the connection to
  the DBus daemon for the remaining purposes

Of course, that means the DBus daemon cannot enforce the policy
anymore; you could use the same database, but the code which
uses it would have to be moved into the dbus library.

I must admit that I do not understand the importance of message
ordering between some RPC call and other messages via the DBus
daemon, since the app can do the RPC call at any time.

Wrt big/numerous multicast, you are right that this wouldn't
solve it, but that doesn't seem to be the problem we need to address?
At least I've not seen any performance measurements which would
indicate it.

That all said, I'm not opposed at all to adding kernel infrastructure
for the benefit of DBus. However, I am quite disappointed both
by the monolithic, single-purpose design of the kdbus API, and
especially by the way it is presented to the kernel community.
What I mean by the latter is that we get an amount of kernel code
which you cannot understand unless you also understand the
userspace DBus *and* the actual usage of DBus in desktop systems,
and this is accompanied with statements along the lines of
"many smart people worked on this for two years and everyone agreed".
I.e., we only get the solution but not the background knowledge
to understand and judge the solution for ourselves.

What I would have appreciated instead:

- performance measurement results which demonstrate the problem
  and the actual DBus use in practice for various message
  types / use cases
- an account of the attempts that have been made to fix it
  and the reasons why they failed, so we can understand how
  the current design has evolved

The latter may be asking a lot, but IPC is a core OS feature
which comes right after CPU and memory resource management
and basic I/O. The basic IPC APIs are fairly simple,
the socket API is already quite complex, and kdbus goes
to another level of complexity and cruftiness, and with
all the words which have been written in this thread there
is still not an adequate justification for it.

For example, I do understand the policy database has to be in
the kernel as it is checked for every message, but I don't
see why the name service needs to be in the kernel.
I suspect (lacking performance figures) that name ownership
changes are relatively rare, and lookups, too (ISTR you mentioned
clients cache the result).

For the base messaging and policy filtering I don't see why this
has to be one monolithic API and not split into a fairly
simple, general purpose messaging API, and a completely
separate API for configuring the filters and attaching them
to the bus.

Johannes
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
@ 2015-04-20 20:26 George Spelvin
2015-04-21 12:08 ` Austin S Hemmelgarn
0 siblings, 1 reply; 333+ messages in thread
From: George Spelvin @ 2015-04-20 20:26 UTC (permalink / raw)
To: gregkh, richard.weinberger
Cc: bp, jkosina, linux-kernel, luto, martin, richard, umgwanakikbuti, viro
> It's used everywhere, on servers,
> embedded systems, desktops, you name it. All languages have bindings
> for it, and it's the underpinning of a modern Linux stack.

Since when? D-bus is some GUI dependency. On my console-only servers, it's
not needed, and not installed:

# dpkg-query -s libdbus-1-3 dbus
dpkg-query: package 'libdbus-1-3' is not installed and no information is available
dpkg-query: package 'dbus' is not installed and no information is available
# dpkg-query -l \*dbus\*
dpkg-query: no packages found matching *dbus*

It's also not needed on a basic GUI system. Firefox complains about saving
preferences if it's not running, but runs just fine:

$ pgrep dbus
$ ps 6570 19644 25779 29487 29492
PID TTY STAT TIME COMMAND
6570 ? Sl 1:48 iceweasel
19644 ? Sl 0:29 /usr/bin/vlc -I qt4
25779 ? Sl 0:16 rhythmbox
29487 ? Sl 0:00 /usr/bin/gnumeric
29492 ? Sl 0:01 /usr/bin/gimp
$

Richard Weinberger wrote:
> kdbus will be a major hard-dependency for every non-trivial userland.
> Like cgroups...
and
> We're all forced to use cgroups, systemd, udev unless we want to have busybox
> as userland. That's a fact.

My daily desktop also has
# CONFIG_CGROUPS is not set

And no systemd. Udev actually does something useful, so I have it on my
desktop, but I have machines with a static /dev instead.
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-20 20:26 George Spelvin @ 2015-04-21 12:08 ` Austin S Hemmelgarn 0 siblings, 0 replies; 333+ messages in thread From: Austin S Hemmelgarn @ 2015-04-21 12:08 UTC (permalink / raw) To: George Spelvin, gregkh, richard.weinberger Cc: bp, jkosina, linux-kernel, luto, martin, richard, umgwanakikbuti, viro [-- Attachment #1: Type: text/plain, Size: 1030 bytes --] On 2015-04-20 16:26, George Spelvin wrote: >> It's used everywhere, on servers, >> embedded systems, desktops, you name it. All languages have bindings >> for it, and it's the underpinning of a modern Linux stack. > > Since when? D-bus is some GUI depoendency. On my console-only servers, it's > not needed, and not installed: > > # dpkg-query -s libdbus-1-3 dbus > dpkg-query: package 'libdbus-1-3' is not installed and no information is available > dpkg-query: package 'dbus' is not installed and no information is available > # dpkg-query -l \*dbus\* > dpkg-query: no packages found matching *dbus* Same here, but I use Gentoo, so it's easy to avoid stuff you don't want ;) [...] > Richard Weinberger wrote: > And no systemd. Udev actually does something useful, so I have it on my > desktop, but I have machines with a static /dev instead. Likewise, I get by just fine with OpenRC, eudev, and Monit, the combination of which provides all the functionality of SystemD that I actually care about. [-- Attachment #2: S/MIME Cryptographic Signature --] [-- Type: application/pkcs7-signature, Size: 2967 bytes --] ^ permalink raw reply [flat|nested] 333+ messages in thread
* [GIT PULL] kdbus for 4.1-rc1
@ 2015-04-13 19:03 Greg Kroah-Hartman
2015-04-13 19:29 ` Eric W. Biederman
` (2 more replies)
0 siblings, 3 replies; 333+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-13 19:03 UTC (permalink / raw)
To: Linus Torvalds, Andrew Morton
Cc: Arnd Bergmann, ebiederm, gnomes, teg, jkosina, luto, linux-kernel,
daniel, dh.herrmann, tixxdz
The following changes since commit 9eccca0843205f87c00404b663188b88eb248051:

Linux 4.0-rc3 (2015-03-08 16:09:09 -0700)

are available in the git repository at:

git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git/ tags/kdbus-4.1-rc1

for you to fetch changes up to 9fb9cd0f4434a23487b6ef3237e733afae90e336:

kdbus: avoid the use of struct timespec (2015-04-10 14:34:53 +0200)

----------------------------------------------------------------
kdbus for 4.1-rc1

Here's the kdbus pull request for 4.1-rc1.

It's been under development for many years now, and been in linux-next
for many months, and has undergone loads of testing and review and even a few
good arguments. It comes with full documentation and tests.

There have been a few complaints about the code, notably from people who
don't like the use of metadata in the bus messages. That is actually
one of the main features here, as we can get this data in a secure and
reliable way, and it's something that userspace requires today. So
while it does look "odd" to people who are not familiar with dbus, this
is something that finally fixes a number of almost unfixable races in
the current dbus implementations.

The rest of this pull request message comes from the kdbus patch posting
messages as sent to lkml previously:

Reasons kdbus should be in the kernel, instead of userspace as it is
currently done today, include the following:

* Performance: Fewer process context switches, fewer copies, fewer
  syscalls, larger memory chunks via memfd.
  This is really important for a whole class of userspace programs
  that are ported from other operating systems that are run on tiny
  ARM systems that rely on hundreds of thousands of messages passed
  at boot time, and at "critical" times in their user interaction
  loops. DBus is not used for performance-sensitive applications
  because DBus is slow. We want to make it fast so we can finally use
  it for low-latency, high-throughput applications. A simple DBus
  method-call+reply takes 200us on an up-to-date test machine; with
  kdbus it takes 8us (with UDS about 2us). If the packet size is
  increased from 8k to 128k, kdbus even beats UDS due to single-copy
  transfers.
* Security: The peers which communicate do not have to trust each
  other, as the only trustworthy component in the game is the kernel
  which adds metadata and ensures that all data passed as payload is
  either copied or sealed, so that the receiver can parse the data
  without having to protect against changing memory while parsing
  buffers. Also, all the data transfer is controlled by the kernel,
  so that LSMs can track and control what is going on, without
  involving userspace. Because of the LSM issue, security people are
  much happier with this model than the current scheme of having to
  hook into dbus to mediate things.
* More types of metadata can be attached to messages than in
  userspace
* Semantics for apps with heavy data payloads (media apps, for
  instance) with optional priority message dequeuing, and global
  message ordering. Some "crazy" people are playing with using kdbus
  for audio data in the system. I'm not saying that this is the best
  model for this, but until now, there wasn't any other way to do
  this without having to create custom "buses", one for each
  application library.
* Being in the kernel closes a lot of races which can't be fixed with
  the current userspace solutions.
  For example, with kdbus, there is a way a client can disconnect
  from a bus, but do so only if no further messages are present in
  its queue, which is crucial for implementing race-free
  "exit-on-idle" services.
* Eavesdropping on the kernel level, so privileged users can hook
  into the message stream without hacking support for that into their
  userspace processes
* A number of smaller benefits: for example kdbus learned a way to
  peek full messages without dequeuing them, which is really useful
  for logging metadata when handling bus-activation requests.
* dbus-daemon is not available during early-boot or shutdown.

DBus marshaling is the de-facto standard in all major(!) Linux desktop
systems. It is well established and accepted by many DEs. It also
solves many other problems, including: policy, authentication /
authorization, well-known name registry, efficient broadcasts /
multicasts, peer discovery, bus discovery, metadata transmission, and
more.

It is a shame that we cannot use this well-established protocol for
low-latency applications. We, effectively, have to duplicate all this
code on custom UDS and other transports just because DBus is too slow.
kdbus tries to unify those efforts, so that we don't need multiple
policy implementations, name registries and peer discovery mechanisms.

Furthermore, kdbus implements comprehensive, yet optional, metadata
transmission that allows peers to be identified and authenticated in a
race-free manner (which is *not* possible with UDS).

Also, kdbus provides a single transport bus with sequential message
numbering. If you use multiple channels, you cannot give any ordering
guarantees across peers (for instance, regarding parallel
name-registry changes).

Of course, some of the bits above could be implemented in userspace
alone, for example with more sophisticated memory management APIs, but
this is usually done by losing out on the other details.
For example, for many of the memory management APIs, it's hard to not require the communicating peers to fully trust each other. And we _really_ don't want peers to have to trust each other. Another benefit of having this in the kernel, rather than as a userspace daemon, is that you can now easily use the bus from the initrd, or up to the very end when the system shuts down. On current userspace D-Bus, this is not really possible, as this requires passing the bus instance around between initrd and the "real" system. Such a transition of all fds also requires keeping full state of what has already been read from the connection fds. kdbus makes this much simpler, as we can change the ownership of the bus, just by passing one fd over from one part to the other. Given the theoretical advantages above, here are some real-world examples: * The Tizen developers have been complaining about the high latency of DBus for polkit'ish policy queries. That's why their authentication framework uses custom UDS sockets (called 'Cynara'). If a UI-interaction needs multiple authentication-queries, you don't want it to take multiple milliseconds, given that you usually want to render the result in the same frame. * PulseAudio doesn't use DBus for data transmission. They had to implement their own marshaling code, transport layer and so on, just because DBus1-latency is horrible. With kdbus, we can basically drop this code-duplication and unify the IPC layer. Same is true for Wayland, btw. * By moving broadcast-transmission into the kernel, we can use the time-slices of the sender to perform heavy operations. This is also true for policy decisions, etc. With a userspace daemon, we cannot perform operations in a time-slice of the caller. This makes DoS attacks much harder. * With priority-inheritance, we can do synchronous calls into trusted peers and let them optionally use our time-slice to perform the action. This allows syscall-like/binder-like method-calls into other processes. 
  Without priority-inheritance, this is not possible in a secure
  manner (see 'priority-inheritance').
* Logging-daemons often want to attach metadata to log-messages so
  debugging/filtering gets easier. If short-lived programs send
  log-messages, the destination peer might not be able to read such
  metadata from /proc, as the process might no longer be available at
  that time. Same is true for policy-decisions like polkit does. You
  cannot send off method-calls and exit. You have to wait for a
  reply, even though you might not even care for it. If you don't
  wait, the other side might not be able to verify your identity and
  as such reject the request.
* Even though the dbus traffic on idle-systems might be low, this
  doesn't mean it's not significant at boot-times or under high-load.
  If you run a dbus-monitor of your choice, you will see there is a
  significant number of messages exchanged during VT-switches,
  startup, shutdown, suspend, wakeup, hotplugging and similar
  situations where lots of control-messages are exchanged. We don't
  want to spend hundreds of ms just to transmit those messages.
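As a rough illustration of what the per-call figures quoted above
(200us via dbus-daemon, 8us with kdbus, about 2us over UDS) would mean
for a burst of control messages, here is a back-of-the-envelope sketch.
The message counts are hypothetical (the thread only speaks of
"thousands"), and it assumes strictly sequential round trips at
constant latency, which real boot traffic is not. It is arithmetic on
the quoted numbers, not a measurement.

```python
# Per-call round-trip latencies quoted in the pull request, in microseconds.
LATENCY_US = {"dbus-daemon": 200, "kdbus": 8, "uds": 2}


def total_ms(calls, latency_us):
    """Total time in ms for `calls` sequential method-call+reply round trips."""
    return calls * latency_us / 1000.0


# Hypothetical message counts, chosen only to show how the totals scale.
for calls in (1_000, 10_000, 100_000):
    row = ", ".join(
        f"{name}: {total_ms(calls, us):.0f} ms" for name, us in LATENCY_US.items()
    )
    print(f"{calls:>7} calls -> {row}")
```

Under those assumptions, 10,000 round trips come to about 2000 ms
through dbus-daemon, 80 ms with kdbus and 20 ms over raw UDS. Whether
any of that shows up in wall-clock boot time is a separate question
that only a real measurement can answer.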
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> ---------------------------------------------------------------- Arnd Bergmann (1): kdbus: avoid the use of struct timespec Daniel Mack (18): kdbus: add documentation kdbus: add uapi header file kdbus: add driver skeleton, ioctl entry points and utility functions kdbus: add connection pool implementation kdbus: add connection, queue handling and message validation code kdbus: add node and filesystem implementation kdbus: add code to gather metadata kdbus: add code for notifications and matches kdbus: add code for buses, domains and endpoints kdbus: add name registry implementation kdbus: add policy database implementation kdbus: add Makefile, Kconfig and MAINTAINERS entry kdbus: add walk-through user space example kdbus: add selftests Documentation: kdbus: fix location for generated files kdbus: connection: fix handling of failed fget() kdbus: Fix CONFIG_KDBUS help text samples: kdbus: build kdbus-workers conditionally David Herrmann (5): kdbus: samples/kdbus: add -lrt samples/kdbus: drop wrong include Documentation/kdbus: fix out-of-tree builds Documentation/kdbus: support quiet builds selftests/kdbus: fix gitignore Lucas De Marchi (1): kdbus: fix header guard name Lukasz Skalski (1): Documentation/kdbus: replace 'reply_cookie' with 'cookie_reply' Nicolas Iooss (1): kdbus: fix minor typo in the walk-through example Sergei Zviagintsev (5): kdbus: uapi: Fix kernel-doc for enum kdbus_send_flags Documentation: kdbus: Fix list of KDBUS_CMD_ENDPOINT_UPDATE errors Documentation: kdbus: Update list of ioctls which cause writing to receiver's pool Documentation: kdbus: Fix description of KDBUS_SEND_SYNC_REPLY flag Documentation: kdbus: Fix typos Tyler Baker (1): selftest/kdbus: enable cross compilation Documentation/Makefile | 2 +- Documentation/ioctl/ioctl-number.txt | 1 + Documentation/kdbus/.gitignore | 2 + Documentation/kdbus/Makefile | 40 + Documentation/kdbus/kdbus.bus.xml | 359 ++++ 
Documentation/kdbus/kdbus.connection.xml | 1250 ++++++++++++ Documentation/kdbus/kdbus.endpoint.xml | 429 ++++ Documentation/kdbus/kdbus.fs.xml | 124 ++ Documentation/kdbus/kdbus.item.xml | 839 ++++++++ Documentation/kdbus/kdbus.match.xml | 555 ++++++ Documentation/kdbus/kdbus.message.xml | 1276 ++++++++++++ Documentation/kdbus/kdbus.name.xml | 711 +++++++ Documentation/kdbus/kdbus.policy.xml | 406 ++++ Documentation/kdbus/kdbus.pool.xml | 326 +++ Documentation/kdbus/kdbus.xml | 1012 ++++++++++ Documentation/kdbus/stylesheet.xsl | 16 + MAINTAINERS | 13 + Makefile | 1 + include/uapi/linux/Kbuild | 1 + include/uapi/linux/kdbus.h | 979 +++++++++ include/uapi/linux/magic.h | 2 + init/Kconfig | 13 + ipc/Makefile | 2 +- ipc/kdbus/Makefile | 22 + ipc/kdbus/bus.c | 560 ++++++ ipc/kdbus/bus.h | 101 + ipc/kdbus/connection.c | 2214 +++++++++++++++++++++ ipc/kdbus/connection.h | 257 +++ ipc/kdbus/domain.c | 296 +++ ipc/kdbus/domain.h | 77 + ipc/kdbus/endpoint.c | 275 +++ ipc/kdbus/endpoint.h | 67 + ipc/kdbus/fs.c | 510 +++++ ipc/kdbus/fs.h | 28 + ipc/kdbus/handle.c | 617 ++++++ ipc/kdbus/handle.h | 85 + ipc/kdbus/item.c | 339 ++++ ipc/kdbus/item.h | 64 + ipc/kdbus/limits.h | 64 + ipc/kdbus/main.c | 125 ++ ipc/kdbus/match.c | 559 ++++++ ipc/kdbus/match.h | 35 + ipc/kdbus/message.c | 616 ++++++ ipc/kdbus/message.h | 133 ++ ipc/kdbus/metadata.c | 1159 +++++++++++ ipc/kdbus/metadata.h | 57 + ipc/kdbus/names.c | 772 +++++++ ipc/kdbus/names.h | 74 + ipc/kdbus/node.c | 910 +++++++++ ipc/kdbus/node.h | 84 + ipc/kdbus/notify.c | 248 +++ ipc/kdbus/notify.h | 30 + ipc/kdbus/policy.c | 489 +++++ ipc/kdbus/policy.h | 51 + ipc/kdbus/pool.c | 728 +++++++ ipc/kdbus/pool.h | 46 + ipc/kdbus/queue.c | 678 +++++++ ipc/kdbus/queue.h | 92 + ipc/kdbus/reply.c | 257 +++ ipc/kdbus/reply.h | 68 + ipc/kdbus/util.c | 201 ++ ipc/kdbus/util.h | 74 + samples/Kconfig | 7 + samples/Makefile | 3 +- samples/kdbus/.gitignore | 1 + samples/kdbus/Makefile | 9 + samples/kdbus/kdbus-api.h | 114 ++ 
samples/kdbus/kdbus-workers.c | 1326 ++++++++++++ tools/testing/selftests/Makefile | 1 + tools/testing/selftests/kdbus/.gitignore | 1 + tools/testing/selftests/kdbus/Makefile | 48 + tools/testing/selftests/kdbus/kdbus-enum.c | 94 + tools/testing/selftests/kdbus/kdbus-enum.h | 14 + tools/testing/selftests/kdbus/kdbus-test.c | 923 +++++++++ tools/testing/selftests/kdbus/kdbus-test.h | 85 + tools/testing/selftests/kdbus/kdbus-util.c | 1615 +++++++++++++++ tools/testing/selftests/kdbus/kdbus-util.h | 222 +++ tools/testing/selftests/kdbus/test-activator.c | 318 +++ tools/testing/selftests/kdbus/test-attach-flags.c | 750 +++++++ tools/testing/selftests/kdbus/test-benchmark.c | 451 +++++ tools/testing/selftests/kdbus/test-bus.c | 175 ++ tools/testing/selftests/kdbus/test-chat.c | 122 ++ tools/testing/selftests/kdbus/test-connection.c | 616 ++++++ tools/testing/selftests/kdbus/test-daemon.c | 65 + tools/testing/selftests/kdbus/test-endpoint.c | 341 ++++ tools/testing/selftests/kdbus/test-fd.c | 789 ++++++++ tools/testing/selftests/kdbus/test-free.c | 64 + tools/testing/selftests/kdbus/test-match.c | 441 ++++ tools/testing/selftests/kdbus/test-message.c | 731 +++++++ tools/testing/selftests/kdbus/test-metadata-ns.c | 506 +++++ tools/testing/selftests/kdbus/test-monitor.c | 176 ++ tools/testing/selftests/kdbus/test-names.c | 194 ++ tools/testing/selftests/kdbus/test-policy-ns.c | 632 ++++++ tools/testing/selftests/kdbus/test-policy-priv.c | 1269 ++++++++++++ tools/testing/selftests/kdbus/test-policy.c | 80 + tools/testing/selftests/kdbus/test-sync.c | 369 ++++ tools/testing/selftests/kdbus/test-timeout.c | 99 + 97 files changed, 34069 insertions(+), 3 deletions(-) create mode 100644 Documentation/kdbus/.gitignore create mode 100644 Documentation/kdbus/Makefile create mode 100644 Documentation/kdbus/kdbus.bus.xml create mode 100644 Documentation/kdbus/kdbus.connection.xml create mode 100644 Documentation/kdbus/kdbus.endpoint.xml create mode 100644 
Documentation/kdbus/kdbus.fs.xml create mode 100644 Documentation/kdbus/kdbus.item.xml create mode 100644 Documentation/kdbus/kdbus.match.xml create mode 100644 Documentation/kdbus/kdbus.message.xml create mode 100644 Documentation/kdbus/kdbus.name.xml create mode 100644 Documentation/kdbus/kdbus.policy.xml create mode 100644 Documentation/kdbus/kdbus.pool.xml create mode 100644 Documentation/kdbus/kdbus.xml create mode 100644 Documentation/kdbus/stylesheet.xsl create mode 100644 include/uapi/linux/kdbus.h create mode 100644 ipc/kdbus/Makefile create mode 100644 ipc/kdbus/bus.c create mode 100644 ipc/kdbus/bus.h create mode 100644 ipc/kdbus/connection.c create mode 100644 ipc/kdbus/connection.h create mode 100644 ipc/kdbus/domain.c create mode 100644 ipc/kdbus/domain.h create mode 100644 ipc/kdbus/endpoint.c create mode 100644 ipc/kdbus/endpoint.h create mode 100644 ipc/kdbus/fs.c create mode 100644 ipc/kdbus/fs.h create mode 100644 ipc/kdbus/handle.c create mode 100644 ipc/kdbus/handle.h create mode 100644 ipc/kdbus/item.c create mode 100644 ipc/kdbus/item.h create mode 100644 ipc/kdbus/limits.h create mode 100644 ipc/kdbus/main.c create mode 100644 ipc/kdbus/match.c create mode 100644 ipc/kdbus/match.h create mode 100644 ipc/kdbus/message.c create mode 100644 ipc/kdbus/message.h create mode 100644 ipc/kdbus/metadata.c create mode 100644 ipc/kdbus/metadata.h create mode 100644 ipc/kdbus/names.c create mode 100644 ipc/kdbus/names.h create mode 100644 ipc/kdbus/node.c create mode 100644 ipc/kdbus/node.h create mode 100644 ipc/kdbus/notify.c create mode 100644 ipc/kdbus/notify.h create mode 100644 ipc/kdbus/policy.c create mode 100644 ipc/kdbus/policy.h create mode 100644 ipc/kdbus/pool.c create mode 100644 ipc/kdbus/pool.h create mode 100644 ipc/kdbus/queue.c create mode 100644 ipc/kdbus/queue.h create mode 100644 ipc/kdbus/reply.c create mode 100644 ipc/kdbus/reply.h create mode 100644 ipc/kdbus/util.c create mode 100644 ipc/kdbus/util.h create mode 100644 
samples/kdbus/.gitignore create mode 100644 samples/kdbus/Makefile create mode 100644 samples/kdbus/kdbus-api.h create mode 100644 samples/kdbus/kdbus-workers.c create mode 100644 tools/testing/selftests/kdbus/.gitignore create mode 100644 tools/testing/selftests/kdbus/Makefile create mode 100644 tools/testing/selftests/kdbus/kdbus-enum.c create mode 100644 tools/testing/selftests/kdbus/kdbus-enum.h create mode 100644 tools/testing/selftests/kdbus/kdbus-test.c create mode 100644 tools/testing/selftests/kdbus/kdbus-test.h create mode 100644 tools/testing/selftests/kdbus/kdbus-util.c create mode 100644 tools/testing/selftests/kdbus/kdbus-util.h create mode 100644 tools/testing/selftests/kdbus/test-activator.c create mode 100644 tools/testing/selftests/kdbus/test-attach-flags.c create mode 100644 tools/testing/selftests/kdbus/test-benchmark.c create mode 100644 tools/testing/selftests/kdbus/test-bus.c create mode 100644 tools/testing/selftests/kdbus/test-chat.c create mode 100644 tools/testing/selftests/kdbus/test-connection.c create mode 100644 tools/testing/selftests/kdbus/test-daemon.c create mode 100644 tools/testing/selftests/kdbus/test-endpoint.c create mode 100644 tools/testing/selftests/kdbus/test-fd.c create mode 100644 tools/testing/selftests/kdbus/test-free.c create mode 100644 tools/testing/selftests/kdbus/test-match.c create mode 100644 tools/testing/selftests/kdbus/test-message.c create mode 100644 tools/testing/selftests/kdbus/test-metadata-ns.c create mode 100644 tools/testing/selftests/kdbus/test-monitor.c create mode 100644 tools/testing/selftests/kdbus/test-names.c create mode 100644 tools/testing/selftests/kdbus/test-policy-ns.c create mode 100644 tools/testing/selftests/kdbus/test-policy-priv.c create mode 100644 tools/testing/selftests/kdbus/test-policy.c create mode 100644 tools/testing/selftests/kdbus/test-sync.c create mode 100644 tools/testing/selftests/kdbus/test-timeout.c ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-13 19:03 Greg Kroah-Hartman @ 2015-04-13 19:29 ` Eric W. Biederman 2015-04-13 19:42 ` Greg Kroah-Hartman ` (2 more replies) 2015-04-13 20:13 ` Andy Lutomirski 2015-04-23 13:05 ` Greg Kroah-Hartman 2 siblings, 3 replies; 333+ messages in thread From: Eric W. Biederman @ 2015-04-13 19:29 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Linus Torvalds, Andrew Morton, Arnd Bergmann, gnomes, teg, jkosina, luto, linux-kernel, daniel, dh.herrmann, tixxdz Greg Kroah-Hartman <gregkh@linuxfoundation.org> writes: > The following changes since commit 9eccca0843205f87c00404b663188b88eb248051: > > Linux 4.0-rc3 (2015-03-08 16:09:09 -0700) > > are available in the git repository at: > > git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git/ tags/kdbus-4.1-rc1 > > for you to fetch changes up to 9fb9cd0f4434a23487b6ef3237e733afae90e336: > > kdbus: avoid the use of struct timespec (2015-04-10 14:34:53 +0200) > > ---------------------------------------------------------------- > kdbus for 4.1-rc1 > > Here's the kdbus pull request for 4.1-rc1. > > It's been under development for many years now, and been in linux-next > for many months, and has undergone loads of testing a review and even a few > good arguments. It comes with full documentation and tests. > There has been a few complaints about the code, notably from people who > don't like the use of metadata in the bus messages. That is actually > one of the main features here, as we can get this data in a secure and > reliable way, and it's something that userspace requires today. So > while it does look "odd" to people who are not familiar with dbus, this > is something that finally fixes a number of almost unfixable races in > the current dbus implementations. And the code that transfers the meta-data is wrong. It is generally not something that userspace requires today, certainly userspace is not using it. 
You are exporting a weird set of information in a unique way that makes
it race free enough to make ``security'' decisions upon but the data
in general is not appropriate to make those decisions.

I remain opposed to this half thought out trash of an ABI for the
meta-data.

Just because something happens to be exported in a DEBUG api today does
not make it appropriate for userspace to run around making security
decisions with that information.

Nacked-by: "Eric W. Biederman" <ebiederm@xmission.com>

I think it is premature to be merging kdbus. You have fundamental
issues that can not be fixed once the ABI is frozen. The semantics of
the meta-data you export are extremely poorly defined.

> The rest of this pull request message comes from the kdbus patch posting
> messages as sent to lkml previously:
>
> Reasons kdbus should be in the kernel, instead of userspace as it is
> currently done today includes the following:
>
> * Performance: Fewer process context switches, fewer copies, fewer
> syscalls, larger memory chunks via memfd. This is really important
> for a whole class of userspace programs that are ported from other
> operating systems that are run on tiny ARM systems that rely on
> hundreds of thousands of messages passed at boot time, and at
> "critical" times in their user interaction loops. DBus is not used
> for performance sensitive applications because DBus is slow.
> We want to make it fast so we can finally use it for low-latency,
> high-throughput applications. A simple DBus method-call+reply takes
> 200us on an up-to-date test machine, with kdbus it takes 8us (with
> UDS about 2us). If the packet size is increased from 8k to 128k,
> kdbus even beats UDS due to single-copy transfers.

And with a good design kdbus could be faster.
> * Security: The peers which communicate do not have to trust each
> other, as the only trustworthy component in the game is the kernel
> which adds metadata and ensures that all data passed as payload is
> either copied or sealed, so that the receiver can parse the data
> without having to protect against changing memory while parsing
> buffers. Also, all the data transfer is controlled by the kernel,
> so that LSMs can track and control what is going on, without
> involving userspace. Because of the LSM issue, security people are
> much happier with this model than the current scheme of having to
> hook into dbus to mediate things.
> * More types of metadata can be attached to messages than in
> userspace

The meta-data is poorly thought out and much of it is not appropriate for
making security decisions anywhere except in the kernel.

All I have seen with the meta-data discussion is sticking heads in the
sand and resubmitting and hoping your reviewers go away.

If you won't do a good responsible job on this before the code is merged
how can we possibly expect you to do a good job later? Or is this
going to be another API where userspace will be broken at arbitrary
moments by arbitrary users?

How are you going to fix the security issues your poor API comes with
when they are eventually spelled out clearly and to fix them means
breaking everyone's desktop environment?

Eric
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-13 19:29 ` Eric W. Biederman @ 2015-04-13 19:42 ` Greg Kroah-Hartman 2015-04-13 19:49 ` Richard Weinberger 2015-04-13 20:22 ` Al Viro 2015-04-14 0:19 ` Eric W. Biederman 2015-04-22 8:58 ` Borislav Petkov 2 siblings, 2 replies; 333+ messages in thread From: Greg Kroah-Hartman @ 2015-04-13 19:42 UTC (permalink / raw) To: Eric W. Biederman Cc: Linus Torvalds, Andrew Morton, Arnd Bergmann, gnomes, teg, jkosina, luto, linux-kernel, daniel, dh.herrmann, tixxdz On Mon, Apr 13, 2015 at 02:29:35PM -0500, Eric W. Biederman wrote: > Greg Kroah-Hartman <gregkh@linuxfoundation.org> writes: > > > The following changes since commit 9eccca0843205f87c00404b663188b88eb248051: > > > > Linux 4.0-rc3 (2015-03-08 16:09:09 -0700) > > > > are available in the git repository at: > > > > git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git/ tags/kdbus-4.1-rc1 > > > > for you to fetch changes up to 9fb9cd0f4434a23487b6ef3237e733afae90e336: > > > > kdbus: avoid the use of struct timespec (2015-04-10 14:34:53 +0200) > > > > ---------------------------------------------------------------- > > kdbus for 4.1-rc1 > > > > Here's the kdbus pull request for 4.1-rc1. > > > > It's been under development for many years now, and been in linux-next > > for many months, and has undergone loads of testing a review and even a few > > good arguments. It comes with full documentation and tests. > > > There has been a few complaints about the code, notably from people who > > don't like the use of metadata in the bus messages. That is actually > > one of the main features here, as we can get this data in a secure and > > reliable way, and it's something that userspace requires today. So > > while it does look "odd" to people who are not familiar with dbus, this > > is something that finally fixes a number of almost unfixable races in > > the current dbus implementations. > > And the code that transfers the meta-data is wrong. 
> > It is generally not something that userspace requires today, certainly > userspace is not using it. > > You are exporting a weird set of information in a unique way that makes > it race free enough to make ``security'' decisions upon but the data > in general is not appropriate to make those decisions. I asked this before but you didn't answer as to why you thought these decisions were not valid. It's what userspace does today already. > I remain opposed to this half thought out trash of an ABI for the > meta-data. You don't have to enable the metadata if you don't want to use it, it's an option :) > Just because something happens to be exported in a DEBUG api today does > not make it appropriate for userspace to run around making security > decisions with that information. What is exported in a debug api today that is being used here? I asked this before but never saw a response. > > * Performance: Fewer process context switches, fewer copies, fewer > > syscalls, larger memory chunks via memfd. This is really important > > for a whole class of userspace programs that are ported from other > > operating systems that are run on tiny ARM systems that rely on > > hundreds of thousands of messages passed at boot time, and at > > "critical" times in their user interaction loops. DBus is not used > > for performance sensitive applications because DBus is slow. > > We want to make it fast so we can finally use it for low-latency, > > high-throughput applications. A simple DBus method-call+reply takes > > 200us on an up-to-date test machine, with kdbus it takes 8us (with > > UDS about 2us). If the packet size is increased from 8k to 128k, > > kdbus even beats UDS due to single-copy transfers. > > And with a good design kdbus could be faster. Faster than today, sure, we've already found some areas that can be optimized, but that's all internal changes, to be done later, nothing affecting the userspace api at all. Even then, today it's very fast. 
> > * Security: The peers which communicate do not have to trust each > > other, as the only trustworthy component in the game is the kernel > > which adds metadata and ensures that all data passed as payload is > > either copied or sealed, so that the receiver can parse the data > > without having to protect against changing memory while parsing > > buffers. Also, all the data transfer is controlled by the kernel, > > so that LSMs can track and control what is going on, without > > involving userspace. Because of the LSM issue, security people are > > much happier with this model than the current scheme of having to > > hook into dbus to mediate things. > > * More types of metadata can be attached to messages than in > > userspace > > The meta-data is poorly thought and and much of it is not appropriate > for making security decisions anywhere except in the kernel. > > All I have seen with the meta-data discussion is sticking heads in the > sand and resubmitting and hoping your reviewers go away. No, we have asked for specifics but have gotten none, other than random complaints like this. Please be specific as to what is being used incorrectly. > If you won't do a good responsible job on this before the code is merged > how can we possibly expect you to do a good job later. Or is this going > to be another API where userspace will be broken at arbitrary moments by > arbitrary users? > > How are you going to fix the security issues your poor API comes with it > when then are eventually spelled out clearly and to fix them means > breaking everyones desktop environment? What security issues? There are none that I know of, please be specific and not just make vague accusations please. thanks, greg k-h ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-13 19:42 ` Greg Kroah-Hartman @ 2015-04-13 19:49 ` Richard Weinberger 2015-04-13 19:54 ` Greg Kroah-Hartman 2015-04-13 20:22 ` Al Viro 1 sibling, 1 reply; 333+ messages in thread From: Richard Weinberger @ 2015-04-13 19:49 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, Andy Lutomirski, LKML, daniel, David Herrmann, Djalal Harouni On Mon, Apr 13, 2015 at 9:42 PM, Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: >> I remain opposed to this half thought out trash of an ABI for the >> meta-data. > > You don't have to enable the metadata if you don't want to use it, it's > an option :) Wasn't this also an argument for CONFIG_CGROUPS? Now we're forced to enable it by default to boot a recent distro and CONFIG_CGROUPS is still not fixed. -- Thanks, //richard ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-13 19:49 ` Richard Weinberger @ 2015-04-13 19:54 ` Greg Kroah-Hartman 2015-04-13 19:57 ` Richard Weinberger 0 siblings, 1 reply; 333+ messages in thread From: Greg Kroah-Hartman @ 2015-04-13 19:54 UTC (permalink / raw) To: Richard Weinberger Cc: Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, Andy Lutomirski, LKML, daniel, David Herrmann, Djalal Harouni On Mon, Apr 13, 2015 at 09:49:27PM +0200, Richard Weinberger wrote: > On Mon, Apr 13, 2015 at 9:42 PM, Greg Kroah-Hartman > <gregkh@linuxfoundation.org> wrote: > >> I remain opposed to this half thought out trash of an ABI for the > >> meta-data. > > > > You don't have to enable the metadata if you don't want to use it, it's > > an option :) > > Wasn't this also an argument for CONFIG_CGROUPS? > Now we're forced to enable it by default to boot a recent distro > and CONFIG_CGROUPS is still not fixed. CONFIG_CGROUPS is "not fixed"? I think Tejun would like to have some words with you :) Anyway, yes, it's an option, but given that people are using this metadata today in userspace just fine, I fail to see how having the kernel be a transport for this same data is an issue. When the kernel is the transport, it can do so in a race-free way, and you can properly do security tests/logic based on it. thanks, greg k-h ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-13 19:54 ` Greg Kroah-Hartman
@ 2015-04-13 19:57 ` Richard Weinberger
2015-04-13 20:03 ` Greg Kroah-Hartman
0 siblings, 1 reply; 333+ messages in thread
From: Richard Weinberger @ 2015-04-13 19:57 UTC (permalink / raw)
To: Greg Kroah-Hartman, Richard Weinberger
Cc: Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann,
One Thousand Gnomes, Tom Gundersen, Jiri Kosina, Andy Lutomirski,
LKML, daniel, David Herrmann, Djalal Harouni
On 13.04.2015 at 21:54, Greg Kroah-Hartman wrote:
> On Mon, Apr 13, 2015 at 09:49:27PM +0200, Richard Weinberger wrote:
>> On Mon, Apr 13, 2015 at 9:42 PM, Greg Kroah-Hartman
>> <gregkh@linuxfoundation.org> wrote:
>>>> I remain opposed to this half thought out trash of an ABI for the
>>>> meta-data.
>>>
>>> You don't have to enable the metadata if you don't want to use it, it's
>>> an option :)
>>
>> Wasn't this also an argument for CONFIG_CGROUPS?
>> Now we're forced to enable it by default to boot a recent distro
>> and CONFIG_CGROUPS is still not fixed.
>
> CONFIG_CGROUPS is "not fixed"? I think Tejun would like to have some
> words with you :)
Tejun is working on it and does a *very* good job. But as long as the
unified hierarchy is not complete/stable we're facing issues.
Ever tried to run systemd in a Linux container? ;)
Thanks,
//richard
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-13 19:57 ` Richard Weinberger @ 2015-04-13 20:03 ` Greg Kroah-Hartman 2015-04-13 20:08 ` Richard Weinberger 0 siblings, 1 reply; 333+ messages in thread From: Greg Kroah-Hartman @ 2015-04-13 20:03 UTC (permalink / raw) To: Richard Weinberger Cc: Richard Weinberger, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, Andy Lutomirski, LKML, daniel, David Herrmann, Djalal Harouni On Mon, Apr 13, 2015 at 09:57:24PM +0200, Richard Weinberger wrote: > > Am 13.04.2015 um 21:54 schrieb Greg Kroah-Hartman: > > On Mon, Apr 13, 2015 at 09:49:27PM +0200, Richard Weinberger wrote: > >> On Mon, Apr 13, 2015 at 9:42 PM, Greg Kroah-Hartman > >> <gregkh@linuxfoundation.org> wrote: > >>>> I remain opposed to this half thought out trash of an ABI for the > >>>> meta-data. > >>> > >>> You don't have to enable the metadata if you don't want to use it, it's > >>> an option :) > >> > >> Wasn't this also an argument for CONFIG_CGROUPS? > >> Now we're forced to enable it by default to boot a recent distro > >> and CONFIG_CGROUPS is still not fixed. > > > > CONFIG_CGROUPS is "not fixed"? I think Tejun would like to have some > > words with you :) > > Tejun is working on it and does a *very* good job. But as long the unified > hirarchy is not complete/stable we're facing issues. > Ever tried to run systemd a linux container? ;) Works just fine for me, I do it daily. Here's how I spin up a debian image on my local filesystem, running systemd within it just swimmingly: sudo systemd-nspawn -D debian/ /sbin/init Also works just fine with gentoo and arch images, both of which I use on a weekly basis in this manner. Perhaps you are doing something odd that prevents this from working for you? thanks, greg k-h ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-13 20:03 ` Greg Kroah-Hartman @ 2015-04-13 20:08 ` Richard Weinberger 0 siblings, 0 replies; 333+ messages in thread From: Richard Weinberger @ 2015-04-13 20:08 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Richard Weinberger, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, Andy Lutomirski, LKML, daniel, David Herrmann, Djalal Harouni Am 13.04.2015 um 22:03 schrieb Greg Kroah-Hartman: > On Mon, Apr 13, 2015 at 09:57:24PM +0200, Richard Weinberger wrote: >> >> Am 13.04.2015 um 21:54 schrieb Greg Kroah-Hartman: >>> On Mon, Apr 13, 2015 at 09:49:27PM +0200, Richard Weinberger wrote: >>>> On Mon, Apr 13, 2015 at 9:42 PM, Greg Kroah-Hartman >>>> <gregkh@linuxfoundation.org> wrote: >>>>>> I remain opposed to this half thought out trash of an ABI for the >>>>>> meta-data. >>>>> >>>>> You don't have to enable the metadata if you don't want to use it, it's >>>>> an option :) >>>> >>>> Wasn't this also an argument for CONFIG_CGROUPS? >>>> Now we're forced to enable it by default to boot a recent distro >>>> and CONFIG_CGROUPS is still not fixed. >>> >>> CONFIG_CGROUPS is "not fixed"? I think Tejun would like to have some >>> words with you :) >> >> Tejun is working on it and does a *very* good job. But as long the unified >> hirarchy is not complete/stable we're facing issues. >> Ever tried to run systemd a linux container? ;) > > Works just fine for me, I do it daily. Here's how I spin up a debian > image on my local filesystem, running systemd within it just swimmingly: > sudo systemd-nspawn -D debian/ /sbin/init > > Also works just fine with gentoo and arch images, both of which I use on > a weekly basis in this manner. > > Perhaps you are doing something odd that prevents this from working for > you? systemd-nspawn does not support user namespaces. But the real issue is that cgroup notification does not work within namespaces. I.e. 
systemd within the namespaces does not get a notification when all
processes within a cgroup are gone.
You'll notice that by running a container for a long time: systemd will
get slower and slower as a lot of sessions (mostly crond) will stay.
It is known by the systemd folks and I have been told that they need the
new unified cgroup hierarchy to deal with that.
I consult a lot in the Linux container hosting area and had a lot of
"fun" with issues like that...
Thanks,
//richard
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-13 19:42 ` Greg Kroah-Hartman 2015-04-13 19:49 ` Richard Weinberger @ 2015-04-13 20:22 ` Al Viro 2015-04-13 20:37 ` Greg Kroah-Hartman 2015-04-15 1:36 ` Andy Lutomirski 1 sibling, 2 replies; 333+ messages in thread From: Al Viro @ 2015-04-13 20:22 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, gnomes, teg, jkosina, luto, linux-kernel, daniel, dh.herrmann, tixxdz On Mon, Apr 13, 2015 at 09:42:17PM +0200, Greg Kroah-Hartman wrote: > > I remain opposed to this half thought out trash of an ABI for the > > meta-data. > > You don't have to enable the metadata if you don't want to use it, it's > an option :) OK, _that_ argument needs to be stomped out. It had been used before, and it was a deliberate scam. There is no such thing as optional kernel interface, especially when udev/dbus/systemd crowd is nearby. We'd been through that excuse before; remember how devtmpfs was pushed in as "optional"? This is a huge red flag. On the level of "I need your account information to transfer $200M you might have inherited from my deceased client". Just to recap how it went the last time around: Kay kept pushing his piece of code into the tree, claiming that it was optional, that nobody who doesn't like it has to enable it, so what's the problem? OK, in it went. And pretty soon udev (maintained by the same... meticulously honorable person) had stopped working on the kernels that didn't have that enabled. We had been there before. To paraphrase another... meticulously honorable person, "if you didn't want something relied upon, why have you put it into the kernel?" Said person is on the record as having no problem whatsoever with adding dependencies to the bottom of userland stack. IMO either it's OK without "if you don't like it, don't enable it", or it should not be merged at all. ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-13 20:22 ` Al Viro @ 2015-04-13 20:37 ` Greg Kroah-Hartman 2015-04-15 1:36 ` Andy Lutomirski 1 sibling, 0 replies; 333+ messages in thread From: Greg Kroah-Hartman @ 2015-04-13 20:37 UTC (permalink / raw) To: Al Viro Cc: Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, gnomes, teg, jkosina, luto, linux-kernel, daniel, dh.herrmann, tixxdz On Mon, Apr 13, 2015 at 09:22:33PM +0100, Al Viro wrote: > On Mon, Apr 13, 2015 at 09:42:17PM +0200, Greg Kroah-Hartman wrote: > > > I remain opposed to this half thought out trash of an ABI for the > > > meta-data. > > > > You don't have to enable the metadata if you don't want to use it, it's > > an option :) > > OK, _that_ argument needs to be stomped out. It had been used before, > and it was a deliberate scam. There is no such thing as optional kernel > interface, especially when udev/dbus/systemd crowd is nearby. We'd been > through that excuse before; remember how devtmpfs was pushed in as "optional"? > > This is a huge red flag. On the level of "I need your account information > to transfer $200M you might have inherited from my deceased client". > > Just to recap how it went the last time around: Kay kept pushing his piece of > code into the tree, claiming that it was optional, that nobody who doesn't > like it has to enable it, so what's the problem? OK, in it went. And pretty > soon udev (maintained by the same... meticulously honorable person) had > stopped working on the kernels that didn't have that enabled. > > We had been there before. To paraphrase another... meticulously honorable > person, "if you didn't want something relied upon, why have you put it into the > kernel?" Said person is on the record as having no problem whatsoever with > adding dependencies to the bottom of userland stack. > > IMO either it's OK without "if you don't like it, don't enable it", or it > should not be merged at all. We want it. I want it. 
Andy asked for the option to be disabled as he didn't want it, so it
was made that way. I'll gladly put that back in, as I don't know of any
problems with it, other than Eric's vague rants about the issue.
thanks,
greg k-h
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-13 20:22 ` Al Viro 2015-04-13 20:37 ` Greg Kroah-Hartman @ 2015-04-15 1:36 ` Andy Lutomirski 2015-04-15 6:54 ` Richard Weinberger ` (2 more replies) 1 sibling, 3 replies; 333+ messages in thread From: Andy Lutomirski @ 2015-04-15 1:36 UTC (permalink / raw) To: Al Viro Cc: Greg Kroah-Hartman, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Mon, Apr 13, 2015 at 1:22 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > On Mon, Apr 13, 2015 at 09:42:17PM +0200, Greg Kroah-Hartman wrote: >> > I remain opposed to this half thought out trash of an ABI for the >> > meta-data. >> >> You don't have to enable the metadata if you don't want to use it, it's >> an option :) > > OK, _that_ argument needs to be stomped out. It had been used before, > and it was a deliberate scam. There is no such thing as optional kernel > interface, especially when udev/dbus/systemd crowd is nearby. We'd been > through that excuse before; remember how devtmpfs was pushed in as "optional"? > > This is a huge red flag. On the level of "I need your account information > to transfer $200M you might have inherited from my deceased client". > > Just to recap how it went the last time around: Kay kept pushing his piece of > code into the tree, claiming that it was optional, that nobody who doesn't > like it has to enable it, so what's the problem? OK, in it went. And pretty > soon udev (maintained by the same... meticulously honorable person) had > stopped working on the kernels that didn't have that enabled. > > We had been there before. To paraphrase another... meticulously honorable > person, "if you didn't want something relied upon, why have you put it into the > kernel?" Said person is on the record as having no problem whatsoever with > adding dependencies to the bottom of userland stack. 
It appears that, if kdbus is merged, upstream udev may end up requiring it:
http://lists.freedesktop.org/archives/systemd-devel/2014-May/019657.html
Grumble.
--Andy
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 1:36 ` Andy Lutomirski @ 2015-04-15 6:54 ` Richard Weinberger 2015-04-15 7:31 ` Mike Galbraith 2015-04-15 8:48 ` Greg Kroah-Hartman 2015-04-15 8:18 ` Martin Steigerwald 2015-04-15 8:29 ` Greg Kroah-Hartman 2 siblings, 2 replies; 333+ messages in thread From: Richard Weinberger @ 2015-04-15 6:54 UTC (permalink / raw) To: Andy Lutomirski Cc: Al Viro, Greg Kroah-Hartman, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, Apr 15, 2015 at 3:36 AM, Andy Lutomirski <luto@amacapital.net> wrote: >> We had been there before. To paraphrase another... meticulously honorable >> person, "if you didn't want something relied upon, why have you put it into the >> kernel?" Said person is on the record as having no problem whatsoever with >> adding dependencies to the bottom of userland stack. > > It appears that, if kdbus is merged, upstream udev may end up requiring it: > > http://lists.freedesktop.org/archives/systemd-devel/2014-May/019657.html Why so surprised? kdbus will be a major hard-dependency for every non-trivial userland. Like cgroups... -- Thanks, //richard ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 6:54 ` Richard Weinberger @ 2015-04-15 7:31 ` Mike Galbraith 2015-04-15 14:48 ` Michal Schmidt 2015-04-15 8:48 ` Greg Kroah-Hartman 1 sibling, 1 reply; 333+ messages in thread From: Mike Galbraith @ 2015-04-15 7:31 UTC (permalink / raw) To: Richard Weinberger Cc: Andy Lutomirski, Al Viro, Greg Kroah-Hartman, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, 2015-04-15 at 08:54 +0200, Richard Weinberger wrote: > On Wed, Apr 15, 2015 at 3:36 AM, Andy Lutomirski <luto@amacapital.net > > wrote: > > > We had been there before. To paraphrase another... meticulously > > > honorable > > > person, "if you didn't want something relied upon, why have you > > > put it into the > > > kernel?" Said person is on the record as having no problem > > > whatsoever with > > > adding dependencies to the bottom of userland stack. > > > > It appears that, if kdbus is merged, upstream udev may end up > > requiring it: > > > > http://lists.freedesktop.org/archives/systemd-devel/2014-May/019657.html > > Why so surprised? > kdbus will be a major hard-dependency for every non-trivial userland. > Like cgroups... Heh, makes one wonder how we ever survived. My openSUSE box is thoroughly infested with latest system-disease, and it seems the thing has now mandated group scheduling. Whether you need/want it and its size large overhead or not is immaterial. I'm not seeing an on/off switch anyway. (shrug, axe should work as substitute, say "byebye tentacle"). -Mike ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 7:31 ` Mike Galbraith @ 2015-04-15 14:48 ` Michal Schmidt 2015-04-15 15:34 ` Mike Galbraith ` (2 more replies) 0 siblings, 3 replies; 333+ messages in thread From: Michal Schmidt @ 2015-04-15 14:48 UTC (permalink / raw) To: Mike Galbraith Cc: Richard Weinberger, Andy Lutomirski, Al Viro, Greg Kroah-Hartman, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On 04/15/2015 09:31 AM, Mike Galbraith wrote: > it seems [systemd] has now mandated group scheduling. What makes you think so? Was it the fact that by default you have a populated /sys/fs/cgroup/cpu/ hierarchy? This is either because some unit requests the use of the cpu controller using one of the CPU*= directives from systemd.resource-control(5), or (perhaps more likely) because there is a privileged unit with Delegate=yes. The most likely candidate is user@0.service, and so you could try preventing it from starting: systemctl mask user@0.service Note that systemd still works without group scheduling or any cgroup subsystems enabled in the kernel: $ grep GROUP .config CONFIG_CGROUPS=y # CONFIG_CGROUP_DEBUG is not set # CONFIG_CGROUP_FREEZER is not set # CONFIG_CGROUP_DEVICE is not set # CONFIG_CGROUP_CPUACCT is not set # CONFIG_CGROUP_HUGETLB is not set # CONFIG_CGROUP_PERF is not set # CONFIG_CGROUP_SCHED is not set # CONFIG_BLK_CGROUP is not set # CONFIG_SCHED_AUTOGROUP is not set # CONFIG_NETFILTER_XT_MATCH_CGROUP is not set # CONFIG_NETFILTER_XT_MATCH_DEVGROUP is not set # CONFIG_NET_CLS_CGROUP is not set # CONFIG_CGROUP_NET_PRIO is not set # CONFIG_CGROUP_NET_CLASSID is not set Michal ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 14:48 ` Michal Schmidt @ 2015-04-15 15:34 ` Mike Galbraith 2015-04-15 16:42 ` Mike Galbraith 2015-04-17 16:53 ` Mike Galbraith 2 siblings, 0 replies; 333+ messages in thread From: Mike Galbraith @ 2015-04-15 15:34 UTC (permalink / raw) To: Michal Schmidt Cc: Richard Weinberger, Andy Lutomirski, Al Viro, Greg Kroah-Hartman, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, 2015-04-15 at 16:48 +0200, Michal Schmidt wrote: > On 04/15/2015 09:31 AM, Mike Galbraith wrote: > > it seems [systemd] has now mandated group scheduling. > > What makes you think so? If group sched is available, systemd decides on its own to use it, thus making the decision to eat that overhead for me should I happen to boot say an enterprise kernel to do some performance measurements. Perhaps there is a way to beg it to please not do that, but if so, I didn't find it in time. The service that started group scheduling was explicitly disabled by me, but systemd started it at boot despite that. Perhaps I didn't express my wishes clearly enough, or I need to burn a virgin or something to become worthy of its attention, dunno. Applying my axe to its tentacles fixed the communication issue. -Mike ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 14:48 ` Michal Schmidt 2015-04-15 15:34 ` Mike Galbraith @ 2015-04-15 16:42 ` Mike Galbraith 2015-04-17 16:53 ` Mike Galbraith 2 siblings, 0 replies; 333+ messages in thread From: Mike Galbraith @ 2015-04-15 16:42 UTC (permalink / raw) To: Michal Schmidt Cc: Richard Weinberger, Andy Lutomirski, Al Viro, Greg Kroah-Hartman, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, 2015-04-15 at 16:48 +0200, Michal Schmidt wrote: > systemctl mask user@0.service That off switch may work better, I'll try it when I have time to squabble with the thing again, thanks. user@0 was disabled in yast (suse admin tool) by me, yet found to be in state disabled+active upon every boot. Just as yast did, systemctl status reported it as being both disabled and active, which led me to the conclusion that someone other than me controls this service. -Mike ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-15 14:48 ` Michal Schmidt
2015-04-15 15:34 ` Mike Galbraith
2015-04-15 16:42 ` Mike Galbraith
@ 2015-04-17 16:53 ` Mike Galbraith
2 siblings, 0 replies; 333+ messages in thread
From: Mike Galbraith @ 2015-04-17 16:53 UTC (permalink / raw)
To: Michal Schmidt
Cc: Richard Weinberger, Andy Lutomirski, Al Viro, Greg Kroah-Hartman,
Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann,
One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
Daniel Mack, David Herrmann, Djalal Harouni
On Wed, 2015-04-15 at 16:48 +0200, Michal Schmidt wrote:
> On 04/15/2015 09:31 AM, Mike Galbraith wrote:
> > it seems [systemd] has now mandated group scheduling.
>
> What makes you think so? Was it the fact that by default you have a
> populated /sys/fs/cgroup/cpu/ hierarchy? This is either because some
> unit requests the use of the cpu controller using one of the CPU*=
> directives from systemd.resource-control(5), or (perhaps more likely)
> because there is a privileged unit with Delegate=yes. The most likely
> candidate is user@0.service, and so you could try preventing it from
> starting:
> systemctl mask user@0.service
BTW, asking it to symlink its disabled service to /dev/null did
indeed convince it to stop running said disabled service.
> Note that systemd still works without group scheduling or any cgroup
> subsystems enabled in the kernel:
>
> $ grep GROUP .config
> CONFIG_CGROUPS=y
Yup. CONFIG_CGROUPS=y all by itself isn't useless either, as that
allows the user to use his box for something other than a doorstop.
Hohum, 'nuff of that ;-) Thanks for the hint, it seems a tad
dainbramaged, but it works.
-Mike
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 6:54 ` Richard Weinberger 2015-04-15 7:31 ` Mike Galbraith @ 2015-04-15 8:48 ` Greg Kroah-Hartman 2015-04-15 9:00 ` Richard Weinberger 2015-04-15 11:25 ` One Thousand Gnomes 1 sibling, 2 replies; 333+ messages in thread From: Greg Kroah-Hartman @ 2015-04-15 8:48 UTC (permalink / raw) To: Richard Weinberger Cc: Andy Lutomirski, Al Viro, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, Apr 15, 2015 at 08:54:07AM +0200, Richard Weinberger wrote: > On Wed, Apr 15, 2015 at 3:36 AM, Andy Lutomirski <luto@amacapital.net> wrote: > >> We had been there before. To paraphrase another... meticulously honorable > >> person, "if you didn't want something relied upon, why have you put it into the > >> kernel?" Said person is on the record as having no problem whatsoever with > >> adding dependencies to the bottom of userland stack. > > > > It appears that, if kdbus is merged, upstream udev may end up requiring it: > > > > http://lists.freedesktop.org/archives/systemd-devel/2014-May/019657.html > > Why so surprised? > kdbus will be a major hard-dependency for every non-trivial userland. > Like cgroups... Maybe because things like cgroups, and kdbus in the future, solves a need that the developers in that area have to solve problems and provide functionality that their users require? Look, us kernel developers only work on one huge, multithreaded, global state binary. Our experience in multi-application interactions with shared state and permission requirements is usually quite limited. If you don't trust the developers of those programs outside the kernel, don't use them, there are still distros out there that don't require them. But if you do trust them, then don't make snide comments about how they don't know what they are doing, because that's just flat out rude. 
greg k-h
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 8:48 ` Greg Kroah-Hartman @ 2015-04-15 9:00 ` Richard Weinberger 2015-04-15 9:20 ` Greg Kroah-Hartman 2015-04-15 11:25 ` One Thousand Gnomes 1 sibling, 1 reply; 333+ messages in thread From: Richard Weinberger @ 2015-04-15 9:00 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Andy Lutomirski, Al Viro, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni Am 15.04.2015 um 10:48 schrieb Greg Kroah-Hartman: > On Wed, Apr 15, 2015 at 08:54:07AM +0200, Richard Weinberger wrote: >> On Wed, Apr 15, 2015 at 3:36 AM, Andy Lutomirski <luto@amacapital.net> wrote: >>>> We had been there before. To paraphrase another... meticulously honorable >>>> person, "if you didn't want something relied upon, why have you put it into the >>>> kernel?" Said person is on the record as having no problem whatsoever with >>>> adding dependencies to the bottom of userland stack. >>> >>> It appears that, if kdbus is merged, upstream udev may end up requiring it: >>> >>> http://lists.freedesktop.org/archives/systemd-devel/2014-May/019657.html >> >> Why so surprised? >> kdbus will be a major hard-dependency for every non-trivial userland. >> Like cgroups... > > Maybe because things like cgroups, and kdbus in the future, solves a > need that the developers in that area have to solve problems and > provide functionality that their users require? I agree that a high level bus is needed and dbus is not perfect. But this does not mean that we need a in-kernel dbus in any case. > Look, us kernel developers only work on one huge, multithreaded, global > state binary. Our experience in multi-application interactions with > shared state and permission requirements is usually quite limited. If > you don't trust the developers of those programs outside the kernel, > don't use them, there are still distros out there that don't require > them. 
We're all forced to use cgroups, systemd, udev unless we want to have busybox as userland. That's a fact. systemd and its dependencies are not a bad thing per se. But we have to be very sure that new hard-dependencies are in good shape before we push them into the kernel. IMHO this is also Andy's and Eric's point. Thanks, //richard ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 9:00 ` Richard Weinberger @ 2015-04-15 9:20 ` Greg Kroah-Hartman 2015-04-15 9:21 ` Borislav Petkov 2015-04-15 9:28 ` Richard Weinberger 0 siblings, 2 replies; 333+ messages in thread From: Greg Kroah-Hartman @ 2015-04-15 9:20 UTC (permalink / raw) To: Richard Weinberger Cc: Andy Lutomirski, Al Viro, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, Apr 15, 2015 at 11:00:50AM +0200, Richard Weinberger wrote: > On 15.04.2015 at 10:48, Greg Kroah-Hartman wrote: > > On Wed, Apr 15, 2015 at 08:54:07AM +0200, Richard Weinberger wrote: > >> On Wed, Apr 15, 2015 at 3:36 AM, Andy Lutomirski <luto@amacapital.net> wrote: > >>>> We had been there before. To paraphrase another... meticulously honorable > >>>> person, "if you didn't want something relied upon, why have you put it into the > >>>> kernel?" Said person is on the record as having no problem whatsoever with > >>>> adding dependencies to the bottom of userland stack. > >>> > >>> It appears that, if kdbus is merged, upstream udev may end up requiring it: > >>> > >>> http://lists.freedesktop.org/archives/systemd-devel/2014-May/019657.html > >> > >> Why so surprised? > >> kdbus will be a major hard-dependency for every non-trivial userland. > >> Like cgroups... > > > > Maybe because things like cgroups, and kdbus in the future, solves a > > need that the developers in that area have to solve problems and > > provide functionality that their users require? > > I agree that a high level bus is needed and dbus is not perfect. > But this does not mean that we need an in-kernel dbus in any case. So what do you propose to solve the issues presented in my original email about the use cases that this code addresses? > > Look, us kernel developers only work on one huge, multithreaded, global > > state binary. 
Our experience in multi-application interactions with > > shared state and permission requirements is usually quite limited. If > > you don't trust the developers of those programs outside the kernel, > > don't use them, there are still distros out there that don't require > > them. > > We're all forced to use cgroups, systemd, udev unless we want to have busybox > as userland. That's a fact. Is that a problem? > systemd and its dependencies are not a bad thing per se. > But we have to be very sure that new hard-dependencies are > in good shape before we push them into the kernel. That's fine, and normal, and I expect it. But please provide technical reasons why the proposal is not acceptable, like Andy has done in this thread. thanks, greg k-h ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 9:20 ` Greg Kroah-Hartman @ 2015-04-15 9:21 ` Borislav Petkov 2015-04-15 9:27 ` Greg Kroah-Hartman 2015-04-15 9:28 ` Richard Weinberger 1 sibling, 1 reply; 333+ messages in thread From: Borislav Petkov @ 2015-04-15 9:21 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Richard Weinberger, Andy Lutomirski, Al Viro, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, Apr 15, 2015 at 11:20:34AM +0200, Greg Kroah-Hartman wrote: > > We're all forced to use cgroups, systemd, udev unless we want to have busybox > > as userland. That's a fact. > > Is that a problem? I'm amazed that you're really actually asking that question :-( -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. -- ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 9:21 ` Borislav Petkov @ 2015-04-15 9:27 ` Greg Kroah-Hartman 2015-04-15 9:30 ` Richard Weinberger 2015-04-15 9:44 ` Borislav Petkov 0 siblings, 2 replies; 333+ messages in thread From: Greg Kroah-Hartman @ 2015-04-15 9:27 UTC (permalink / raw) To: Borislav Petkov Cc: Richard Weinberger, Andy Lutomirski, Al Viro, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, Apr 15, 2015 at 11:21:49AM +0200, Borislav Petkov wrote: > On Wed, Apr 15, 2015 at 11:20:34AM +0200, Greg Kroah-Hartman wrote: > > > We're all forced to use cgroups, systemd, udev unless we want to have busybox > > > as userland. That's a fact. > > > > Is that a problem? > > I'm amazed that you're really actually asking that question :-( Really? Why can't userspace rely on the features that the kernel provides them? If not, why would the feature be created and supported by us kernel developers in the first place? That makes no sense at all, please explain. greg k-h ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 9:27 ` Greg Kroah-Hartman @ 2015-04-15 9:30 ` Richard Weinberger 2015-04-15 9:49 ` Greg Kroah-Hartman 2015-04-15 9:44 ` Borislav Petkov 1 sibling, 1 reply; 333+ messages in thread From: Richard Weinberger @ 2015-04-15 9:30 UTC (permalink / raw) To: Greg Kroah-Hartman, Borislav Petkov Cc: Andy Lutomirski, Al Viro, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On 15.04.2015 at 11:27, Greg Kroah-Hartman wrote: > On Wed, Apr 15, 2015 at 11:21:49AM +0200, Borislav Petkov wrote: >> On Wed, Apr 15, 2015 at 11:20:34AM +0200, Greg Kroah-Hartman wrote: >>>> We're all forced to use cgroups, systemd, udev unless we want to have busybox >>>> as userland. That's a fact. >>> >>> Is that a problem? >> >> I'm amazed that you're really actually asking that question :-( > > Really? Why can't userspace rely on the features that the kernel > provides them? If not, why would the feature be created and supported > by us kernel developers in the first place? This is IMHO not the problem. But if we add a new component to the kernel which *will* be used by almost every userland out there (systemd won the "init wars") we have to make sure that we're all fine with it. Andy and Eric have some very valid concerns. Thanks, //richard ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 9:30 ` Richard Weinberger @ 2015-04-15 9:49 ` Greg Kroah-Hartman 2015-04-15 9:53 ` Richard Weinberger 0 siblings, 1 reply; 333+ messages in thread From: Greg Kroah-Hartman @ 2015-04-15 9:49 UTC (permalink / raw) To: Richard Weinberger Cc: Borislav Petkov, Andy Lutomirski, Al Viro, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, Apr 15, 2015 at 11:30:52AM +0200, Richard Weinberger wrote: > On 15.04.2015 at 11:27, Greg Kroah-Hartman wrote: > > On Wed, Apr 15, 2015 at 11:21:49AM +0200, Borislav Petkov wrote: > >> On Wed, Apr 15, 2015 at 11:20:34AM +0200, Greg Kroah-Hartman wrote: > >>>> We're all forced to use cgroups, systemd, udev unless we want to have busybox > >>>> as userland. That's a fact. > >>> > >>> Is that a problem? > >> > >> I'm amazed that you're really actually asking that question :-( > > > > Really? Why can't userspace rely on the features that the kernel > > provides them? If not, why would the feature be created and supported > > by us kernel developers in the first place? > > This is IMHO not the problem. > But if we add a new component to the kernel which *will* be used > by almost every userland out there (systemd won the "init wars") > we have to make sure that we're all fine with it. Sure, but why would this be different from any other kernel feature that we add? We have to be sure we are fine with everything we merge, as we are saying we are going to maintain this stuff for forever. > Andy and Eric have some very valid concerns. I've tried to address Andy's concerns, Eric is not being very specific, so there's nothing I can do there :) thanks, greg k-h ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 9:49 ` Greg Kroah-Hartman @ 2015-04-15 9:53 ` Richard Weinberger 0 siblings, 0 replies; 333+ messages in thread From: Richard Weinberger @ 2015-04-15 9:53 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Borislav Petkov, Andy Lutomirski, Al Viro, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On 15.04.2015 at 11:49, Greg Kroah-Hartman wrote: > On Wed, Apr 15, 2015 at 11:30:52AM +0200, Richard Weinberger wrote: >> On 15.04.2015 at 11:27, Greg Kroah-Hartman wrote: >>> On Wed, Apr 15, 2015 at 11:21:49AM +0200, Borislav Petkov wrote: >>>> On Wed, Apr 15, 2015 at 11:20:34AM +0200, Greg Kroah-Hartman wrote: >>>>>> We're all forced to use cgroups, systemd, udev unless we want to have busybox >>>>>> as userland. That's a fact. >>>>> >>>>> Is that a problem? >>>> >>>> I'm amazed that you're really actually asking that question :-( >>> >>> Really? Why can't userspace rely on the features that the kernel >>> provides them? If not, why would the feature be created and supported >>> by us kernel developers in the first place? >> >> This is IMHO not the problem. >> But if we add a new component to the kernel which *will* be used >> by almost every userland out there (systemd won the "init wars") >> we have to make sure that we're all fine with it. > > Sure, but why would this be different from any other kernel feature that > we add? We have to be sure we are fine with everything we merge, as we > are saying we are going to maintain this stuff for forever. There is nothing different. The series currently has two NACKs, 0 ACKs and 0 Reviews. I don't think that any other series would get merged in such a state. >> Andy and Eric have some very valid concerns. 
> > I've tried to address Andy's concerns, Eric is not being very specific, > so there's nothing I can do there :) What about Steven's proposal to talk at Plumbers? I fear the discussion is at a dead end and needs a face to face resolution. Thanks, //richard ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 9:27 ` Greg Kroah-Hartman 2015-04-15 9:30 ` Richard Weinberger @ 2015-04-15 9:44 ` Borislav Petkov 2015-04-15 11:40 ` Greg Kroah-Hartman 1 sibling, 1 reply; 333+ messages in thread From: Borislav Petkov @ 2015-04-15 9:44 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Richard Weinberger, Andy Lutomirski, Al Viro, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, Apr 15, 2015 at 11:27:13AM +0200, Greg Kroah-Hartman wrote: > On Wed, Apr 15, 2015 at 11:21:49AM +0200, Borislav Petkov wrote: > > On Wed, Apr 15, 2015 at 11:20:34AM +0200, Greg Kroah-Hartman wrote: > > > > We're all forced to use cgroups, systemd, udev unless we want to have busybox > > > > as userland. That's a fact. > > > > > > Is that a problem? > > > > I'm amazed that you're really actually asking that question :-( > > Really? Why can't userspace rely on the features that the kernel > provides them? Userspace can do whatever it wants. As long as I'm not being *forced* to do what userspace thinks is the right thing. It seems to me that since that whole systemd* debacle started, we're forgetting the choice aspect. And dammit, I want my choice. I want to be able to choose what I'm running. Not run what someone else thought what would be good for me to run. If I wanted that, I'd long switched to windoze or äbble. -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. -- ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 9:44 ` Borislav Petkov @ 2015-04-15 11:40 ` Greg Kroah-Hartman 2015-04-15 13:03 ` Borislav Petkov ` (2 more replies) 0 siblings, 3 replies; 333+ messages in thread From: Greg Kroah-Hartman @ 2015-04-15 11:40 UTC (permalink / raw) To: Borislav Petkov Cc: Richard Weinberger, Andy Lutomirski, Al Viro, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, Apr 15, 2015 at 11:44:11AM +0200, Borislav Petkov wrote: > On Wed, Apr 15, 2015 at 11:27:13AM +0200, Greg Kroah-Hartman wrote: > > On Wed, Apr 15, 2015 at 11:21:49AM +0200, Borislav Petkov wrote: > > > On Wed, Apr 15, 2015 at 11:20:34AM +0200, Greg Kroah-Hartman wrote: > > > > > We're all forced to use cgroups, systemd, udev unless we want to have busybox > > > > > as userland. That's a fact. > > > > > > > > Is that a problem? > > > > > > I'm amazed that you're really actually asking that question :-( > > > > Really? Why can't userspace rely on the features that the kernel > > provides them? > > Userspace can do whatever it wants. As long as I'm not being *forced* to > do what userspace thinks is the right thing. > > It seems to me that since that whole systemd* debacle started, we're > forgetting the choice aspect. What "choice" aspect? Surely you aren't going to make the "Linux is about choice" argument are you? > And dammit, I want my choice. I want to be able to choose what I'm > running. Not run what someone else thought what would be good for me to > run. If I wanted that, I'd long switched to windoze or äbble. Oh crap, you went there :) Take a look at http://www.islinuxaboutchoice.com/ please. And yes, you can take Linux (the kernel) and do whatever you want with it (look at Android for an example of no existing userspace code, just the kernel and everything else new for a "choice".) 
You have to trust someone to help make your system work together in a unified way. If you can't trust your distro's engineers, then either start your own distro, or only run busybox on top of a kernel. You really don't have much other "choice" than that :) So stop making this discussion be about "oh those horrid systemd developers, I don't want their code as my init system" as that's not what any of this is about at all. It's about the patches being proposed, and the API involved in it. Please stick to that. greg k-h ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 11:40 ` Greg Kroah-Hartman @ 2015-04-15 13:03 ` Borislav Petkov 2015-04-15 15:41 ` Steven Rostedt 2015-04-15 19:04 ` Martin Steigerwald 2 siblings, 0 replies; 333+ messages in thread From: Borislav Petkov @ 2015-04-15 13:03 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Richard Weinberger, Andy Lutomirski, Al Viro, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, Apr 15, 2015 at 01:40:36PM +0200, Greg Kroah-Hartman wrote: > So stop making this discussion be about "oh those horrid systemd > developers, I don't want their code as my init system" as that's not > what any of this is about at all. It's about the patches being > proposed, and the API involved in it. Please stick to that. Well, you went there by saying that I should simply accept systemd and whatever other crap people are producing just because Linux is not about choice. And I'm still amazed that you really and seriously think that - you must've been drinking the systemd Kool-Aid for too long. So to get back to kdbust: the design of this thing is flawed, it clearly needs a lot more discussing and changes and it *absolutely* has no place upstream in its current form as *no* *one* has reviewed that pile except Andy and Eric to a certain degree. Oh and I haven't seen them lift their NAKs yet... You and I know that's not how stuff is upstreamed. -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. -- ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 11:40 ` Greg Kroah-Hartman 2015-04-15 13:03 ` Borislav Petkov @ 2015-04-15 15:41 ` Steven Rostedt 2015-04-15 16:40 ` Greg Kroah-Hartman 2015-04-15 19:04 ` Martin Steigerwald 2 siblings, 1 reply; 333+ messages in thread From: Steven Rostedt @ 2015-04-15 15:41 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Borislav Petkov, Richard Weinberger, Andy Lutomirski, Al Viro, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, Apr 15, 2015 at 01:40:36PM +0200, Greg Kroah-Hartman wrote: > > You have to trust someone to help make your system work together in a > unified way. If you can't trust your distro's engineers, then either > start your own distro, or only run busybox on top of a kernel. You > really don't have much other "choice" than that :) And obviously there is a lack of trust. And once kdbus is in, we must use it, or support our own distro where we just do not have the time. Personally, I'm fine with getting something in that will help userspace tools work better. The issue I see, mostly from the side lines as I haven't totally submerged myself into the dbus protocol (I think I should spend some time to do just that), this is going too fast. Once it is in the kernel, whatever ABI we expose is locked in stone. There's no changing it. We need to make sure that this is well thought out. People seem to be of the impression that the current dbus design has flaws, but because everything relies on it we must still push it into the kernel because it mimics what is out there in user space. I disagree. As others have said. We do not need to follow the dbus design. If we can supply a better transport layer than what the kernel supplies today, then tools will eventually merge to it away from dbus. 
Perhaps the kernel can supply just enough to have dbus improve its speed, but not with the entire complex solution that kdbus is presenting today. This isn't a case of Republicans vs Democrats pushing a health care system within a window that was rushed. Now the US has a health care system that somewhat works but due to politics it's not being fixed (the ABI is solidified). I don't want to have the same thing with kdbus. We are technical people here, let's solve it with a technical solution, and not rush into things. dbus works today; what's the rush to put something into the kernel that must be supported forever? Let's make sure we do it right. I'm serious about my Linux Plumbers proposal. If you can make it, and get the dbus authors there too, and hopefully Andy, Al and Eric can make it too, we should really sit down and talk about it. Any other kernel developer that wants to participate should, as a prerequisite, sit down and write a dbus interface, such that they have an idea of how it works. I plan to. And I hope that I can learn more about the interface and productively join in this discussion. I'm willing to moderate the kdbus microconference. I think I'll add it now. Thoughts? -- Steve ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 15:41 ` Steven Rostedt @ 2015-04-15 16:40 ` Greg Kroah-Hartman 2015-04-15 16:48 ` Jiri Kosina 2015-04-15 17:20 ` Steven Rostedt 0 siblings, 2 replies; 333+ messages in thread From: Greg Kroah-Hartman @ 2015-04-15 16:40 UTC (permalink / raw) To: Steven Rostedt Cc: Borislav Petkov, Richard Weinberger, Andy Lutomirski, Al Viro, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, Apr 15, 2015 at 11:41:53AM -0400, Steven Rostedt wrote: > > And obviously there is a lack of trust. And once kdbus is in, we must use > it, or support our own distro where we just do not have the time. Just like cgroups, and ftrace :) > Personally, I'm fine with getting something in that will help userspace > tools work better. The issue I see, mostly from the side lines as I haven't > totally submerged myself into the dbus protocol (I think I should spend > some time to do just that), this is going too fast. Once it is in the kernel, > whatever ABI we expose is locked in stone. There's no changing it. We need > to make sure that this is well thought out. People seem to be of the impression > that the current dbus design has flaws, but because everything relies on it > we must still push it into the kernel because it mimics what is out there > in user space. I disagree. "fast"? Are you kidding me? This stuff has been under active, public, development for over two years. We have been posting public patches, asking for review and comments for _months_ now. Given that there were no more specific review comments on the patch set, and its success in linux-next for almost the entire 4.0 development cycle, I asked it to be merged. I don't know too many other kernel features/drivers that have taken this long, or done this "slowly", do you? > As others have said. We do not need to follow the dbus design. 
If we can supply > a better transport layer than what the kernel supplies today, then tools will > eventually merge to it away from dbus. Perhaps the kernel can supply just enough > to have dbus improve its speed, but not with the entire complex solution that > kdbus is presenting today. I originally thought this would work too. 8 months of work later, I was proven wrong, that will not work. Or it imposes too much additional work on userspace that really makes no sense at all. The in-kernel code isn't a lot (again, 13k lines, smaller than almost all of the drivers you are using today on an individual basis) It's also really fast, but with benchmarks, David and Andy have found some minor bottlenecks that can make things faster. Yes it seems complex, but read the documentation to get an idea of what is happening here. I think you will get a better appreciation of what is going on. thanks, greg k-h ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 16:40 ` Greg Kroah-Hartman @ 2015-04-15 16:48 ` Jiri Kosina 2015-04-15 17:33 ` Greg Kroah-Hartman 2015-04-15 17:20 ` Steven Rostedt 1 sibling, 1 reply; 333+ messages in thread From: Jiri Kosina @ 2015-04-15 16:48 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Steven Rostedt, Borislav Petkov, Richard Weinberger, Andy Lutomirski, Al Viro, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, 15 Apr 2015, Greg Kroah-Hartman wrote: > The in-kernel code isn't a lot (again, 13k lines, smaller than almost > all of the drivers you are using today on an individual basis) It's I originally didn't want to comment on this, but now that you are making this argument for the 3rd or 4th time, I can't really resist. What exactly are you trying to "prove" by the 13k-lines argument? mm/vmscan.c is less than 4k lines. Does that sole fact mean that the whole memory reclaim is trivial to review? -- Jiri Kosina SUSE Labs ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 16:48 ` Jiri Kosina @ 2015-04-15 17:33 ` Greg Kroah-Hartman 2015-04-15 18:06 ` Steven Rostedt 2015-04-16 8:43 ` Jiri Kosina 0 siblings, 2 replies; 333+ messages in thread From: Greg Kroah-Hartman @ 2015-04-15 17:33 UTC (permalink / raw) To: Jiri Kosina Cc: Steven Rostedt, Borislav Petkov, Richard Weinberger, Andy Lutomirski, Al Viro, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, Apr 15, 2015 at 06:48:46PM +0200, Jiri Kosina wrote: > On Wed, 15 Apr 2015, Greg Kroah-Hartman wrote: > > > The in-kernel code isn't a lot (again, 13k lines, smaller than almost > > all of the drivers you are using today on an individual basis) It's > > I originally didn't want to comment on this, but now that you are making > this argument for the 3rd or 4th time, I can't really resist. What exactly are > you trying to "prove" by the 13k-lines argument? > > mm/vmscan.c is less than 4k lines. Does that sole fact mean that the whole > memory reclaim is trivial to review? I'm trying to say that it's not a ton of code. Lines of code are of course not a valid way to judge complexity, and I'm not trying to say that. I am trying to point out that it isn't "huge" by comparing it to other chunks of code that we all know and love. We merge subsystems with new userspace apis that are larger than this all the time. I'm trying to say this isn't something "unusual" at all. thanks, greg k-h ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 17:33 ` Greg Kroah-Hartman @ 2015-04-15 18:06 ` Steven Rostedt 2015-04-16 8:43 ` Jiri Kosina 1 sibling, 0 replies; 333+ messages in thread From: Steven Rostedt @ 2015-04-15 18:06 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Jiri Kosina, Borislav Petkov, Richard Weinberger, Andy Lutomirski, Al Viro, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, 15 Apr 2015 19:33:57 +0200 Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > We merge subsystems with new userspace apis that are larger than this all > the time. I'm trying to say this isn't something "unusual" at all. I believe the difference is that those subsystems are not part of the core system infrastructure. If they are, can you please tell me which ones? People don't use perf and tracing to run their desktops. People will be using kdbus though. -- Steve ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 17:33 ` Greg Kroah-Hartman 2015-04-15 18:06 ` Steven Rostedt @ 2015-04-16 8:43 ` Jiri Kosina 1 sibling, 0 replies; 333+ messages in thread From: Jiri Kosina @ 2015-04-16 8:43 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Steven Rostedt, Borislav Petkov, Richard Weinberger, Andy Lutomirski, Al Viro, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, 15 Apr 2015, Greg Kroah-Hartman wrote: > > I originally didn't want to comment on this, but now that you are > > making this argument for the 3rd or 4th time, I can't really resist. What > > exactly are you trying to "prove" by the 13k-lines argument? > > > > mm/vmscan.c is less than 4k lines. Does that sole fact mean that the whole > > memory reclaim is trivial to review? > > I'm trying to say that it's not a ton of code. Lines of code are of > course not a valid way to judge complexity, and I'm not trying to say > that. I am trying to point out that it isn't "huge" by comparing it to > other chunks of code that we all know and love. > > We merge subsystems with new userspace apis that are larger than this all > the time. I'm trying to say this isn't something "unusual" at all. I agree with you on that point. Merging 13k lines isn't a big deal, we do that all the time. But I don't think anyone in this (or previous) thread brought up the number of lines of kdbus as an ultimate argument for questioning or even NACKing it. So I completely fail to see why this is so relevant that you keep repeating it. Thanks, -- Jiri Kosina SUSE Labs ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 16:40 ` Greg Kroah-Hartman 2015-04-15 16:48 ` Jiri Kosina @ 2015-04-15 17:20 ` Steven Rostedt 2015-04-15 17:41 ` Havoc Pennington ` (2 more replies) 1 sibling, 3 replies; 333+ messages in thread From: Steven Rostedt @ 2015-04-15 17:20 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Borislav Petkov, Richard Weinberger, Andy Lutomirski, Al Viro, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, 15 Apr 2015 18:40:33 +0200 Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > On Wed, Apr 15, 2015 at 11:41:53AM -0400, Steven Rostedt wrote: > > > > And obviously there is a lack of trust. And once kdbus is in, we must use > > it, or support our own distro where we just do not have the time. > > Just like cgroups, and ftrace :) Exactly. > > > Personally, I'm fine with getting something in that will help userspace > > tools work better. The issue I see, mostly from the side lines as I haven't > > totally submerged myself into the dbus protocol (I think I should spend > > some time to do just that), this is going too fast. Once it is in the kernel, > > whatever ABI we expose is locked in stone. There's no changing it. We need > > to make sure that this is well thought out. People seem to be of the impression > > that the current dbus design has flaws, but because everything relies on it > > we must still push it into the kernel because it mimics what is out there > > in user space. I disagree. > > "fast"? Are you kidding me? This stuff has been under active, public, > development for over two years. We have been posting public patches, > asking for review and comments for _months_ now. Given that there were > no more specific review comments on the patch set, and its success in > linux-next for almost the entire 4.0 development cycle, I asked it to be > merged. 
> > I don't know too many other kernel features/drivers that have taken this > long, or done this "slowly", do you? What other features/drivers that you know introduce a major new IPC user space interface that will be a core component of the system? > > > As others have said. We do not need to follow the dbus design. If we can supply > > a better transport layer than what the kernel supplies today, then tools will > > eventually merge to it away from dbus. Perhaps the kernel can supply just enough > > to have dbus improve its speed, but not with the entire complex solution that > > kdbus is presenting today. > > I originally thought this would work too. 8 months of work later, I was > proven wrong, that will not work. Or it imposes too much additional > work on userspace that really makes no sense at all. The in-kernel code > isn't a lot (again, 13k lines, smaller than almost all of the drivers > you are using today on an individual basis) It's also really fast, but > with benchmarks, David and Andy have found some minor bottlenecks that > can make things faster. > > Yes it seems complex, but read the documentation to get an idea of what > is happening here. I think you will get a better appreciation of what > is going on. I read a bit of the documentation, but not enough. I really need to sit down and play with code. That's the way I learn and understand. -- Steve ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 17:20 ` Steven Rostedt @ 2015-04-15 17:41 ` Havoc Pennington 2015-04-15 17:55 ` Greg Kroah-Hartman 2015-04-15 18:12 ` Greg Kroah-Hartman 2 siblings, 0 replies; 333+ messages in thread From: Havoc Pennington @ 2015-04-15 17:41 UTC (permalink / raw) To: Steven Rostedt Cc: Greg Kroah-Hartman, Borislav Petkov, Richard Weinberger, Andy Lutomirski, Al Viro, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, Apr 15, 2015 at 1:20 PM, Steven Rostedt <rostedt@goodmis.org> wrote: > I read a bit of the documentation, but not enough. I really need to sit > down and play with code. That's the way I learn and understand. > It might be useful for some of the current devs to post about the best APIs to play with these days - my old libdbus is pretty painful, compared to some of the newer stuff. gdbus nicely shows a callback-based way to handle owning a service, using a function like g_bus_own_name: https://developer.gnome.org/gio/stable/gio-Owning-Bus-Names.html#g-bus-own-name The callback-based approach means the library can handle reconnection/restart on behalf of the app. The flip side (the way you use rather than provide a service) looks similar: https://developer.gnome.org/gio/stable/gio-Watching-Bus-Names.html#g-bus-watch-name Here the library can deal with complexities of a service being restarted, the app only has to write the callbacks so they can be called more than once (with alternating appeared/vanished handlers). You can see in those API docs more of the ordering guarantees, in this case on callback invocation - less for apps to screw up. Havoc ^ permalink raw reply [flat|nested] 333+ messages in thread
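The callback pattern described in the message above, where the application supplies appeared/vanished handlers and the library may invoke them more than once as a service restarts, can be illustrated with a toy model. This is only a minimal Python sketch under stated assumptions: NameWatcher, owner_changed and simulate_restart are hypothetical names invented for illustration, they are not the gdbus API, and nothing here talks to a real bus. The sketch also assumes that a direct owner replacement is reported as vanished followed by appeared, which is a choice of this toy model rather than documented library behaviour.

```python
from typing import Callable, List, Optional, Tuple


class NameWatcher:
    """Toy model of a bus-name watcher. Hypothetical; not the gdbus API."""

    def __init__(self, name: str,
                 on_appeared: Callable[[str, str], None],
                 on_vanished: Callable[[str], None]) -> None:
        self.name = name
        self._on_appeared = on_appeared
        self._on_vanished = on_vanished
        self._owner: Optional[str] = None
        self._reported = False  # has any handler fired yet?

    def owner_changed(self, new_owner: Optional[str]) -> None:
        """Feed the current owner (None means nobody owns the name)."""
        first = not self._reported
        if not first and new_owner == self._owner:
            return  # no state change, so handlers stay strictly alternating
        had_owner = self._owner is not None
        self._owner = new_owner
        self._reported = True
        # Report "vanished" when an owner went away, or once at start-up
        # if the name has no owner at all.
        if had_owner or (first and new_owner is None):
            self._on_vanished(self.name)
        if new_owner is not None:
            self._on_appeared(self.name, new_owner)


def simulate_restart() -> List[Tuple[str, Optional[str]]]:
    """Replay a service start, a duplicate notification, and a restart."""
    events: List[Tuple[str, Optional[str]]] = []
    watcher = NameWatcher(
        "org.example.Service",  # made-up bus name
        on_appeared=lambda name, owner: events.append(("appeared", owner)),
        on_vanished=lambda name: events.append(("vanished", None)),
    )
    watcher.owner_changed(None)      # service not running yet
    watcher.owner_changed(":1.42")   # service starts
    watcher.owner_changed(":1.42")   # duplicate notification, ignored
    watcher.owner_changed(None)      # service exits
    watcher.owner_changed(":1.57")   # service comes back under a new owner
    return events


if __name__ == "__main__":
    for event in simulate_restart():
        print(event)
```

The point of the design is that the application only writes handlers that tolerate being called repeatedly; tracking the owner and keeping the callbacks in alternating order is left to the watcher.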
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 17:20 ` Steven Rostedt 2015-04-15 17:41 ` Havoc Pennington @ 2015-04-15 17:55 ` Greg Kroah-Hartman 2015-04-15 21:55 ` One Thousand Gnomes 2015-04-15 18:12 ` Greg Kroah-Hartman 2 siblings, 1 reply; 333+ messages in thread From: Greg Kroah-Hartman @ 2015-04-15 17:55 UTC (permalink / raw) To: Steven Rostedt Cc: Borislav Petkov, Richard Weinberger, Andy Lutomirski, Al Viro, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, Apr 15, 2015 at 01:20:37PM -0400, Steven Rostedt wrote: > > I don't know too many other kernel features/drivers that have taken this > > long, or done this "slowly", do you? > > What other features/drivers that you know introduce a major new IPC > user space interface that will be a core component of the system? We've been merging these about one every other kernel release for a while now. Look at the drivers/misc/mic/ for one such example, there are many others like this that are dealing with distributed systems and having the kernel communicate between them through some custom userspace api. Usually ioctls :) We merge a lot of stuff, and unfortunately it's hard to get a view of everything that happens all the time. I suggest reading at least the shortlog summary of every commit if people are curious, I know I do. thanks, greg k-h ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 17:55 ` Greg Kroah-Hartman @ 2015-04-15 21:55 ` One Thousand Gnomes 0 siblings, 0 replies; 333+ messages in thread From: One Thousand Gnomes @ 2015-04-15 21:55 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Steven Rostedt, Borislav Petkov, Richard Weinberger, Andy Lutomirski, Al Viro, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, 15 Apr 2015 19:55:15 +0200 Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > On Wed, Apr 15, 2015 at 01:20:37PM -0400, Steven Rostedt wrote: > > > I don't know too many other kernel features/drivers that have taken this > > > long, or done this "slowly", do you? > > > > What other features/drivers that you know introduce a major new IPC > > user space interface that will be a core component of the system? > > We've been merging these about one every other kernel release for a > while now. Look at the drivers/misc/mic/ For a single specific piece of hardware, not a general API that by your own admission will effectively be mandatory. Alan ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 17:20 ` Steven Rostedt 2015-04-15 17:41 ` Havoc Pennington 2015-04-15 17:55 ` Greg Kroah-Hartman @ 2015-04-15 18:12 ` Greg Kroah-Hartman 2 siblings, 0 replies; 333+ messages in thread From: Greg Kroah-Hartman @ 2015-04-15 18:12 UTC (permalink / raw) To: Steven Rostedt Cc: Borislav Petkov, Richard Weinberger, Andy Lutomirski, Al Viro, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, Apr 15, 2015 at 01:20:37PM -0400, Steven Rostedt wrote: > > Yes it seems complex, but read the documentation to get an idea of what > > is happening here. I think you will get a better appreciation of what > > is going on. > > I read a bit of the documentation, but not enough. I really need to sit > down and play with code. That's the way I learn and understand. Here's a good mapping for C developers that Lennart wrote last year: https://lwn.net/Articles/619250/ that should give you a good starting point. greg k-h ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-15 11:40 ` Greg Kroah-Hartman
2015-04-15 13:03 ` Borislav Petkov
2015-04-15 15:41 ` Steven Rostedt
@ 2015-04-15 19:04 ` Martin Steigerwald
2 siblings, 0 replies; 333+ messages in thread
From: Martin Steigerwald @ 2015-04-15 19:04 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Borislav Petkov, Richard Weinberger, Andy Lutomirski, Al Viro,
Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann,
One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
Daniel Mack, David Herrmann, Djalal Harouni
On Wednesday, 15 April 2015, 13:40:36, Greg Kroah-Hartman wrote:
> On Wed, Apr 15, 2015 at 11:44:11AM +0200, Borislav Petkov wrote:
> > On Wed, Apr 15, 2015 at 11:27:13AM +0200, Greg Kroah-Hartman wrote:
> > > On Wed, Apr 15, 2015 at 11:21:49AM +0200, Borislav Petkov wrote:
> > > > On Wed, Apr 15, 2015 at 11:20:34AM +0200, Greg Kroah-Hartman wrote:
> > > > > > We're all forced to use cgroups, systemd, udev unless we want
> > > > > > to have busybox as userland. That's a fact.
> > > > >
> > > > > Is that a problem?
> > > >
> > > > I'm amazed that you're really actually asking that question :-(
> > >
> > > Really? Why can't userspace rely on the features that the kernel
> > > provides them?
> >
> > Userspace can do whatever it wants. As long as I'm not being *forced*
> > to do what userspace thinks is the right thing.
> >
> > It seems to me that since that whole systemd* debacle started, we're
> > forgetting the choice aspect.
>
> What "choice" aspect? Surely you aren't going to make the "Linux is
> about choice" argument are you?
>
> > And dammit, I want my choice. I want to be able to choose what I'm
> > running. Not run what someone else thought what would be good for me
> > to run. If I wanted that, I'd long switched to windoze or äbble.
>
> Oh crap, you went there :)
>
> Take a look at http://www.islinuxaboutchoice.com/ please.
Just one question: In what way is the post of a single kernel developer
authoritative for the whole community?
Even if I would make a poster of 200x100 meters or so and stick it onto a
building, it wouldn't be.
Ciao,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-15 9:20 ` Greg Kroah-Hartman
2015-04-15 9:21 ` Borislav Petkov
@ 2015-04-15 9:28 ` Richard Weinberger
1 sibling, 0 replies; 333+ messages in thread
From: Richard Weinberger @ 2015-04-15 9:28 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Andy Lutomirski, Al Viro, Eric W. Biederman, Linus Torvalds,
Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen,
Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann,
Djalal Harouni
On 15.04.2015 at 11:20, Greg Kroah-Hartman wrote:
> On Wed, Apr 15, 2015 at 11:00:50AM +0200, Richard Weinberger wrote:
>> On 15.04.2015 at 10:48, Greg Kroah-Hartman wrote:
>>> On Wed, Apr 15, 2015 at 08:54:07AM +0200, Richard Weinberger wrote:
>>>> On Wed, Apr 15, 2015 at 3:36 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>>>>>> We had been there before. To paraphrase another... meticulously honorable
>>>>>> person, "if you didn't want something relied upon, why have you put it into the
>>>>>> kernel?" Said person is on the record as having no problem whatsoever with
>>>>>> adding dependencies to the bottom of userland stack.
>>>>>
>>>>> It appears that, if kdbus is merged, upstream udev may end up requiring it:
>>>>>
>>>>> http://lists.freedesktop.org/archives/systemd-devel/2014-May/019657.html
>>>>
>>>> Why so surprised?
>>>> kdbus will be a major hard-dependency for every non-trivial userland.
>>>> Like cgroups...
>>>
>>> Maybe because things like cgroups, and kdbus in the future, solve a
>>> need that the developers in that area have to solve problems and
>>> provide functionality that their users require?
>>
>> I agree that a high level bus is needed and dbus is not perfect.
>> But this does not mean that we need an in-kernel dbus in any case.
>
> So what do you propose to solve the issues presented in my original
> email about the usecases that this code addresses?
>
>>> Look, us kernel developers only work on one huge, multithreaded, global
>>> state binary. Our experience in multi-application interactions with
>>> shared state and permission requirements is usually quite limited. If
>>> you don't trust the developers of those programs outside the kernel,
>>> don't use them, there are still distros out there that don't require
>>> them.
>>
>> We're all forced to use cgroups, systemd, udev unless we want to have busybox
>> as userland. That's a fact.
>
> Is that a problem?
>
>> systemd and its dependencies are not a bad thing per se.
>> But we have to be very sure that new hard-dependencies are
>> in good shape before we push them into the kernel.
>
> That's fine, and normal, and I expect it. But please provide technical
> reasons why the proposal is not acceptable, like Andy has done in this
> thread.
I did not state that the proposal is not acceptable.
My statement was that we have to be well aware of the fact that
we will be forced to use kdbus in future as it will become a dependency.
Some developers on IRC said they don't care about kdbus at all as long as
they can disable it. This is wrong, we have to use it. And that is fine.
But we all have to be aware of the implications. kdbus will be ABI.
Thanks,
//richard
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-15 8:48 ` Greg Kroah-Hartman
2015-04-15 9:00 ` Richard Weinberger
@ 2015-04-15 11:25 ` One Thousand Gnomes
2015-04-15 13:20 ` Borislav Petkov
2015-04-15 15:45 ` Steven Rostedt
1 sibling, 2 replies; 333+ messages in thread
From: One Thousand Gnomes @ 2015-04-15 11:25 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Richard Weinberger, Andy Lutomirski, Al Viro, Eric W. Biederman,
Linus Torvalds, Andrew Morton, Arnd Bergmann, Tom Gundersen,
Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann,
Djalal Harouni
> Look, us kernel developers only work on one huge, multithreaded, global
> state binary. Our experience in multi-application interactions with
> shared state and permission requirements is usually quite limited. If
> you don't trust the developers of those programs outside the kernel,
> don't use them, there are still distros out there that don't require
> them.
Speak for yourself. There are a lot of us here who work and have worked
on low level messaging, on networking, on clusters and on things like
distributed shared memory, infiniband etc. I've worked on networks,
including broken stateful protocols, I've maintained and developed
internet and ISDN router code, I've worked with message passing realtime
systems.
Equally the folks who wrote dbus generally also know sweet fa about
writing a kernel and maintaining it for 25 years. Gtk is on its 3rd
completely incompatible instance (and has incompatibilities even within
major versions), Gnome is on its third major incompatible release -
closer would be to say at least the "second project with the same name",
and neither are as old as the kernel.
dbus is not an appropriate design for a kernel messaging layer for a
variety of reasons. That's not to say dbus shouldn't be able to use a
fast kernel messaging layer, or that one shouldn't exist.
dbus is basically a very large very specialized and somewhat flawed
policy engine on top of what should be simple messaging. The two need
splitting apart.
Abstract low level messaging layers are not a new concept. V7 unix had
one experimentally. It's about getting the separation right.
IMHO that probably involves getting the right people in the right place
together - dbus designers, MPI and realtime people, kernel folks and
possibly also some of the hardware messaging folk.
In filesystem terms
- stop writing a dbus only file system
- figure out what a messaging "vfs" looks like
- figure out what a clean low level kernel model looks like
- figure out what has to be where to put the policy in userspace
What might also be worth review is how much dbus traffic actually ought to
be an object store implemented say with tmpfs and inotify type
functionality (or extensions of that) so that you can
set/read/enumerate/get change notifications on properties.
Alan
^ permalink raw reply [flat|nested] 333+ messages in thread
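A rough sketch of the object-store idea from the message above, assuming Linux and Python 3. Each property is a file in a directory (a tmpfs mount in the suggestion; any directory will do for the sketch), and inotify reports changes. The helper names (`set_property`, `get_property`, `list_properties`, `watch`, `read_changes`) are hypothetical and are not part of dbus, kdbus, or any kernel interface. inotify is reached through ctypes here because the Python standard library has no wrapper for it.

```python
import ctypes
import os
import struct
import tempfile

IN_MOVED_TO = 0x00000080  # inotify mask bit: a file was renamed into the directory
_EVENT_HEADER = struct.Struct("iIII")  # struct inotify_event: wd, mask, cookie, len

# Assumption: running on Linux, where libc exports the inotify calls.
_libc = ctypes.CDLL(None, use_errno=True)


def set_property(store, name, value):
    """Set a property atomically: write a temp file, then rename it into place."""
    tmp = os.path.join(store, "." + name + ".tmp")
    with open(tmp, "w") as f:
        f.write(value)
    os.rename(tmp, os.path.join(store, name))


def get_property(store, name):
    """Read the current value of a property."""
    with open(os.path.join(store, name)) as f:
        return f.read()


def list_properties(store):
    """Enumerate property names, skipping in-flight temp files."""
    return sorted(n for n in os.listdir(store) if not n.startswith("."))


def watch(store):
    """Return an inotify fd that reports properties renamed into `store`."""
    fd = _libc.inotify_init()
    if fd < 0:
        raise OSError(ctypes.get_errno(), "inotify_init failed")
    if _libc.inotify_add_watch(fd, store.encode(), IN_MOVED_TO) < 0:
        err = ctypes.get_errno()
        os.close(fd)
        raise OSError(err, "inotify_add_watch failed")
    return fd


def read_changes(fd):
    """Block until at least one change arrives; return the changed names."""
    buf = os.read(fd, 4096)
    names = []
    offset = 0
    while offset < len(buf):
        _wd, _mask, _cookie, length = _EVENT_HEADER.unpack_from(buf, offset)
        offset += _EVENT_HEADER.size
        # The name field is NUL-padded to `length` bytes.
        names.append(buf[offset:offset + length].rstrip(b"\0").decode())
        offset += length
    return names


if __name__ == "__main__":
    demo_store = tempfile.mkdtemp()
    demo_fd = watch(demo_store)
    set_property(demo_store, "volume", "11")
    print(read_changes(demo_fd), get_property(demo_store, "volume"))
    os.close(demo_fd)
```

Writing through a temp file and `rename` means a watcher never observes a half-written value, and watching only `IN_MOVED_TO` means the temp file itself generates no notification. Policy (who may set which property) would ride on ordinary file permissions in this model, which is one reading of "put the policy in userspace", not something the message spells out.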
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 11:25 ` One Thousand Gnomes @ 2015-04-15 13:20 ` Borislav Petkov 2015-04-15 15:45 ` Steven Rostedt 1 sibling, 0 replies; 333+ messages in thread From: Borislav Petkov @ 2015-04-15 13:20 UTC (permalink / raw) To: One Thousand Gnomes Cc: Greg Kroah-Hartman, Richard Weinberger, Andy Lutomirski, Al Viro, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, Apr 15, 2015 at 12:25:55PM +0100, One Thousand Gnomes wrote: > dbus is not an appropriate design for a kernel messaging layer for a > variety of reasons. That's not to say dbus shouldn't be able to use a > fast kernel messaging layer, or that one shouldn't exist. > > dbus is basically a very large very specialized and somewhat flawed > policy engine on top of what should be simple messaging. The two need > splitting apart. > > Abstract low level messaging layers are not a new concept. V7 unix had > one experimentally. It's about getting the separation right. > > IMHO that probably involves getting the right people in the right place > together - dbus designers, MPI and realtime people, kernel folks and > possibly also some of the hardware messaging folk. > > In filesystem terms > > - stop writing a dbus only file system > - figure out what a messaging "vfs" looks like > - figure out what an clean low level kernel model looks like > - figure out what has to be where to put the policy in userspace > > What might also be worth review is how much dbus traffic actually ought to > be an object store implemented say with tmpfs and inotify type > functionality (or extensions of that) so that you can > set/read/enumerate/get change notifications on properties. FWIW, this sounds really sane and makes a lot of sense to me. I'd be willing to give it some review cycles, as far as I can, when done this way. -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. 
-- ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-15 11:25 ` One Thousand Gnomes
2015-04-15 13:20 ` Borislav Petkov
@ 2015-04-15 15:45 ` Steven Rostedt
2015-04-15 15:46 ` Andy Lutomirski
2015-04-15 16:35 ` Greg Kroah-Hartman
1 sibling, 2 replies; 333+ messages in thread
From: Steven Rostedt @ 2015-04-15 15:45 UTC (permalink / raw)
To: One Thousand Gnomes
Cc: Greg Kroah-Hartman, Richard Weinberger, Andy Lutomirski, Al Viro,
Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann,
Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack,
David Herrmann, Djalal Harouni
On Wed, Apr 15, 2015 at 12:25:55PM +0100, One Thousand Gnomes wrote:
>
> IMHO that probably involves getting the right people in the right place
> together - dbus designers, MPI and realtime people, kernel folks and
> possibly also some of the hardware messaging folk.
/me continues on as a broken record
I suggest that we can do this at Linux Plumbers, and then follow up at
Kernel Summit, for those that can't (or won't) attend plumbers.
-- Steve
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 15:45 ` Steven Rostedt @ 2015-04-15 15:46 ` Andy Lutomirski 2015-04-15 16:35 ` Greg Kroah-Hartman 1 sibling, 0 replies; 333+ messages in thread From: Andy Lutomirski @ 2015-04-15 15:46 UTC (permalink / raw) To: Steven Rostedt Cc: One Thousand Gnomes, Greg Kroah-Hartman, Richard Weinberger, Al Viro, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, Apr 15, 2015 at 8:45 AM, Steven Rostedt <rostedt@goodmis.org> wrote: > On Wed, Apr 15, 2015 at 12:25:55PM +0100, One Thousand Gnomes wrote: >> >> IMHO that probably involves getting the right people in the right place >> together - dbus designers, MPI and realtime people, kernel folks and >> possibly also some of the hardware messaging folk. > > /me continues on as a broken record > > I suggest that we can do this at Linux Plumbers, and then follow up at > Kernel Summit, for those that can (or wont) attend plumbers. I'm definitely available for KS. I'm not sure about Plumbers. --Andy > > -- Steve > -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 15:45 ` Steven Rostedt 2015-04-15 15:46 ` Andy Lutomirski @ 2015-04-15 16:35 ` Greg Kroah-Hartman 2015-04-15 17:06 ` Steven Rostedt 1 sibling, 1 reply; 333+ messages in thread From: Greg Kroah-Hartman @ 2015-04-15 16:35 UTC (permalink / raw) To: Steven Rostedt Cc: One Thousand Gnomes, Richard Weinberger, Andy Lutomirski, Al Viro, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, Apr 15, 2015 at 11:45:52AM -0400, Steven Rostedt wrote: > On Wed, Apr 15, 2015 at 12:25:55PM +0100, One Thousand Gnomes wrote: > > > > IMHO that probably involves getting the right people in the right place > > together - dbus designers, MPI and realtime people, kernel folks and > > possibly also some of the hardware messaging folk. > > /me continues on as a broken record > > I suggest that we can do this at Linux Plumbers, and then follow up at > Kernel Summit, for those that can (or wont) attend plumbers. I really doubt this will work for Plumbers, sorry. And technical things don't work well, if at all, at Kernel Summit. We have had meetings about this at the past two Plumbers conferences, where none of these things came up (i.e. dislike of the D-Bus model). I'll be glad to discuss this at both places, but let's try to work through the technical things through email, as really, that's the best place for it. Al just proved this by pointing out some issues to be resolved (RW lock only used as a W lock, odd atomic values and locking without documenting the lifecycles, etc.) And that's the way this is supposed to work, nothing new/different here that I can see. thanks, greg k-h ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 16:35 ` Greg Kroah-Hartman @ 2015-04-15 17:06 ` Steven Rostedt 2015-04-15 17:31 ` Greg Kroah-Hartman 0 siblings, 1 reply; 333+ messages in thread From: Steven Rostedt @ 2015-04-15 17:06 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: One Thousand Gnomes, Richard Weinberger, Andy Lutomirski, Al Viro, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, 15 Apr 2015 18:35:20 +0200 Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > > I suggest that we can do this at Linux Plumbers, and then follow up at > > Kernel Summit, for those that can (or wont) attend plumbers. > > I really doubt this will work for Plumbers, sorry. And technical things > don't work well, if at all, at Kernel Summit. > > We have had meetings about this at the past two Plumbers conferences, > where none of these things came up (i.e. dislike of the D-Bus model). But were the people that are not liking it at those conference sessions? > > I'll be glad to discuss this at both places, but let's try to work > through the technical things through email, as really, that's the best > place for it. > > Al just proved this by pointing out some issues to be resolved (RW lock > only used as a W lock, odd atomic values and locking without documenting > the lifecycles, etc.) And that's the way this is supposed to work, > nothing new/different here that I can see. But you are missing one of the complaints that I'm reading from people. The proposed ABI is too complex. Do we really want to jump into having to support another tty layer? One thing that I think may be really worth doing is that everyone on this thread that has not yet done so, write a simple dbus application to try to understand its design. Break it down to the requirements that are needed, and discuss that. Is there a reason that this patch must go in this merge window? 
Having something this controversial take place during the merge window
suggests it's a bit premature to push in now. Especially since it
creates a new user space interface. I think we need to really think
hard and long before we add something that can not be modified at a
later date.
I personally think face to face may help, even if it's just hallway
tracks. But at a minimum, I think more kernel developers need to play
with dbus to understand this more. And then be able to give better
feedback.
I'm also thinking that the bare minimum for a transport layer should go
in. Find out the exact requirements (as Alan suggested) and implement
that, instead of just implementing the full layer that is happening in
userspace today.
-- Steve
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-15 17:06 ` Steven Rostedt
@ 2015-04-15 17:31 ` Greg Kroah-Hartman
2015-04-15 18:04 ` Steven Rostedt
2015-04-15 21:56 ` One Thousand Gnomes
0 siblings, 2 replies; 333+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-15 17:31 UTC (permalink / raw)
To: Steven Rostedt
Cc: One Thousand Gnomes, Richard Weinberger, Andy Lutomirski, Al Viro,
Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann,
Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack,
David Herrmann, Djalal Harouni
On Wed, Apr 15, 2015 at 01:06:49PM -0400, Steven Rostedt wrote:
> On Wed, 15 Apr 2015 18:35:20 +0200
> Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
> >
> > > I suggest that we can do this at Linux Plumbers, and then follow up at
> > > Kernel Summit, for those that can't (or won't) attend plumbers.
> >
> > I really doubt this will work for Plumbers, sorry. And technical things
> > don't work well, if at all, at Kernel Summit.
> >
> > We have had meetings about this at the past two Plumbers conferences,
> > where none of these things came up (i.e. dislike of the D-Bus model).
>
> But were the people that are not liking it at those conference sessions?
People who don't like a topic usually don't go to a session about it,
why would they? :)
> > I'll be glad to discuss this at both places, but let's try to work
> > through the technical things through email, as really, that's the best
> > place for it.
> >
> > Al just proved this by pointing out some issues to be resolved (RW lock
> > only used as a W lock, odd atomic values and locking without documenting
> > the lifecycles, etc.) And that's the way this is supposed to work,
> > nothing new/different here that I can see.
>
> But you are missing one of the complaints that I'm reading from
> people. The proposed ABI is too complex. Do we really want to jump into
> having to support another tty layer?
Don't make idle comments, the tty layer is far more complex and larger than the kdbus code, with much nastier issues and problems. And we handle that just fine :) As far as the "support" issue, we have 4 people who are all experienced, senior kernel developers who are signed up to maintain this. There's more experience here for this one MAINTAINERS entry per line of code than I have seen in quite some time. Are people somehow worried that all 4 of us are going to run away? Do people not trust the 4 of us to stick around and maintain this and deal with any issues found for the next few decades? If so, please let us know, as it seems like people feel we are dumping this code on them to maintain, which is anything but true. > One thing that I think may be really worth doing is that everyone on > this thread that has not yet done so, write a simple dbus application > to try to understand its design. Break it down to the requirements that > are needed, and discuss that. I've done that, it's hard, use the gdbus interface instead, it makes your life much easier. I'll again refer to ALSA here, no one writes a "raw" ALSA program, they all use the library to interact with the kernel. Do that here, there are wonderful dbus libraries out there, for all languages. Use them instead. > Is there a reason that this patch must go in this merge window? What makes this merge window any different from any other? Again, I explained why I asked it to be merged at this point in time. If people have technical issues with it, I'll be more than glad to work them out and merge it later, there's no "hard and fast deadline" anyone is asking for here. > Having something this controversial take place during the merge window > suggests its a bit premature to push in now. "take place"? Have you been ignoring these patches posted numerous times for many months? This is the point in time to ask for code to be merged, just like any other code, nothing is special here. 
thanks, greg k-h ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-15 17:31 ` Greg Kroah-Hartman
@ 2015-04-15 18:04 ` Steven Rostedt
2015-04-15 21:56 ` One Thousand Gnomes
1 sibling, 0 replies; 333+ messages in thread
From: Steven Rostedt @ 2015-04-15 18:04 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: One Thousand Gnomes, Richard Weinberger, Andy Lutomirski, Al Viro,
Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann,
Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack,
David Herrmann, Djalal Harouni
On Wed, 15 Apr 2015 19:31:45 +0200
Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
> > But were the people that are not liking it at those conference sessions?
>
> People who don't like a topic usually don't go to a session about it,
> why would they? :)
Exactly, but if you invite those people, and say "hey, here's your
chance to set us straight" maybe they'll come. I would. But give them a
few weeks' notice, so that they can study what's out there.
> > But you are missing one of the complaints that I'm reading from
> > people. The proposed ABI is too complex. Do we really want to jump into
> > having to support another tty layer?
>
> Don't make idle comments, the tty layer is far more complex and larger
We are all making our own little exaggerated metaphors. ;-)
> than the kdbus code, with much nastier issues and problems. And we
> handle that just fine :)
>
> As far as the "support" issue, we have 4 people who are all experienced,
> senior kernel developers who are signed up to maintain this. There's
> more experience here for this one MAINTAINERS entry per line of code
> than I have seen in quite some time.
No, but people seem to be worried about the complexity. If everyone
understands that there's no other choice but to have it complex (like
RCU is), then everyone will be fine with it. But right now, people are
questioning why it needs to be complex. But we need more people to
spend time on it to make sure it does.
> > One thing that I think may be really worth doing is that everyone on > > this thread that has not yet done so, write a simple dbus application > > to try to understand its design. Break it down to the requirements that > > are needed, and discuss that. > > I've done that, it's hard, use the gdbus interface instead, it makes > your life much easier. I still need to play with the code and see exactly what it does. What goes into the kernel needs to be the raw interface only. Everything else should be in a library that takes care of the details. Is that what is here? > > I'll again refer to ALSA here, no one writes a "raw" ALSA program, they > all use the library to interact with the kernel. Do that here, there > are wonderful dbus libraries out there, for all languages. Use them > instead. Is this what is being proposed (again, I need to go back and read the original change log. I did it once, but mostly forgot what was in it). > > > Is there a reason that this patch must go in this merge window? > > What makes this merge window any different from any other? Again, I > explained why I asked it to be merged at this point in time. If people > have technical issues with it, I'll be more than glad to work them out > and merge it later, there's no "hard and fast deadline" anyone is asking > for here. Well, there's been a few minor things that have been pointed out (the locking), and having something as small as that take place during a merge window, to me, would be cause to wait another merge window. > > > Having something this controversial take place during the merge window > > suggests its a bit premature to push in now. > > "take place"? Have you been ignoring these patches posted numerous > times for many months? This is the point in time to ask for code to be > merged, just like any other code, nothing is special here. But there are still complaints about it. Perhaps people are just noticing. 
We are all busy, and nobody (but perhaps Andrew Morton and Jon Corbet) reads every LKML message. It's now getting more eyes. That's a good thing. I'd like more time to play with it so that I can understand why exactly it needs to go in as you say it does. -- Steve ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 17:31 ` Greg Kroah-Hartman 2015-04-15 18:04 ` Steven Rostedt @ 2015-04-15 21:56 ` One Thousand Gnomes 2015-04-15 22:11 ` Andy Lutomirski 1 sibling, 1 reply; 333+ messages in thread From: One Thousand Gnomes @ 2015-04-15 21:56 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Steven Rostedt, Richard Weinberger, Andy Lutomirski, Al Viro, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, 15 Apr 2015 19:31:45 +0200 > Don't make idle comments, the tty layer is far more complex and larger > than the kdbus code, with much nastier issues and problems. And we > handle that just fine :) The tty layer is the way it is because of design decisions dating back 20 years that were (with hindsight) wrong coupled with the fact that POSIX took a lot of the behavioural guarantees from an armwaving claim about what Unix(tm) implemented without thinking about how to implement them (as far as I can tell - given many of the guarantees are broken in Unix!) > I'll again refer to ALSA here, no one writes a "raw" ALSA program, they > all use the library to interact with the kernel. Do that here, there > are wonderful dbus libraries out there, for all languages. Use them > instead. Agreed entirely - I don't disagree that we need a fast messaging layer. The question is what bits belong in kernel. Go wants one, JMS wants one, porting from stuff like QNX wants one (although they use the POSIX API on QNX), MPI wants one (but with some useful and subtly different semantics), various embedded things from tiny uKernels want one. The question is what the kernel bit should actually look like, and how many we need. 
My guess is that we actually have three of the big use cases covered
- futexes and shared memory cover the tiny uKernel emulation bits (and on
a lawnmower engine sized ARM that's probably the only way to get the
speed approaching that of a tiny rtos)
- posix queues cover things like QNX porting
- publish/subscribe - via tmpfs
but we don't cover
- multicasting
- some types of credential and authority passing
- scatter/gather without excessive userspace wakes
> > Is there a reason that this patch must go in this merge window?
>
> What makes this merge window any different from any other? Again, I
> explained why I asked it to be merged at this point in time. If people
> have technical issues with it, I'll be more than glad to work them out
> and merge it later, there's no "hard and fast deadline" anyone is asking
> for here.
The problem I have is that every time someone points out a fundamental
design issue you simply say "Why haven't you reviewed 13,000 lines of
code".
I haven't given it an in-depth review for the same reason as if someone
posted 13,000 lines of "I've got an awesome new file system which uses a
FAT and 8.3 file names". There are some more pressing concerns to sort
first.
The fact it's complex and hard to follow also doesn't encourage review.
And the fact Al tried to read it and is asking for help really worries
me 8)
> > Having something this controversial take place during the merge window
> > suggests it's a bit premature to push in now.
>
> "take place"? Have you been ignoring these patches posted numerous
> times for many months? This is the point in time to ask for code to be
> merged, just like any other code, nothing is special here.
Well - you've asked. I see two NACKs from people with great taste. So I
think the next step is to defer trying to submit it and work through the
fact that Al can't follow the locking, and other people don't believe the
security model is maintainable.
Alan
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 21:56 ` One Thousand Gnomes @ 2015-04-15 22:11 ` Andy Lutomirski 2015-04-15 22:18 ` Al Viro 2015-04-16 10:31 ` Daniel Mack 0 siblings, 2 replies; 333+ messages in thread From: Andy Lutomirski @ 2015-04-15 22:11 UTC (permalink / raw) To: One Thousand Gnomes Cc: Greg Kroah-Hartman, Steven Rostedt, Richard Weinberger, Al Viro, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, Apr 15, 2015 at 2:56 PM, One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> wrote: > On Wed, 15 Apr 2015 19:31:45 +0200 >> Don't make idle comments, the tty layer is far more complex and larger >> than the kdbus code, with much nastier issues and problems. And we >> handle that just fine :) > > The tty layer is the way it is because of design decisions dating back 20 > years that were (with hindsight) wrong coupled with the fact that POSIX > took a lot of the behavioural guarantees from an armwaving claim about > what Unix(tm) implemented without thinking about how to implement them > (as far as I can tell - given many of the guarantees are broken in Unix!) > >> I'll again refer to ALSA here, no one writes a "raw" ALSA program, they >> all use the library to interact with the kernel. Do that here, there >> are wonderful dbus libraries out there, for all languages. Use them >> instead. > > Agreed entirely - I don't disagree that we need a fast messaging layer. > The question is what bits belong in kernel. Go wants one, JMS wants one, > porting from stuff like QNX wants one (although they use the POSIX API > on QNX), MPI wants one (but with some useful and subtly different > semantics), various embedded things from tiny uKernels want one. > > The question is what the kernel bit should actually look like, and how > many we need. 
> > My guess is that we actually have three of the big use cases covered > > - futexes and shared memory cover the tiny uKernel emulation bits (and on > a lawnmower engine sized ARM thats probably the only way to get the > speed approaching that of a tiny rtos) > - posix queues cover things like QNX porting > - publish/subscribe - via tmpfs > > but we don't cover > > - multicasting > - some types of credential and authority passing > - scatter/gather without excessive userspace wakes I would really like to see a very lightweight capability-based messaging system. By "capability-based" I don't mean Linux capabilities. I mean that a user program could give some very lightweight token to a peer authorizing that peer to use some service (by reference to the same token), and the peer could pass it on to other peers as an introduction mechanism. (Search for "capability-based security".) This is functionally identical to passing AF_UNIX socket fds over SCM_RIGHTS, but I want something much lighter weight. Also, getting the really high performance stuff right would be nice. Binder has one thing going for it (IIRC -- I've talked about it to some of the authors, but I've never so much as glanced at the code): it has a primitive to send and wait for a reply. This reduces the load on scheduler. I wish kdbus were blazingly fast, but I don't think it is :( I think the bar should be either similar performance to (peer-to-peer) AF_UNIX or something possibly more complex but considerably faster. --Andy ^ permalink raw reply [flat|nested] 333+ messages in thread
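[Editorial sketch, not part of the original message.] For readers who have not used the mechanism Andy compares against, here is a minimal sketch of passing one descriptor over an AF_UNIX socket with SCM_RIGHTS. The helper names send_fd(), recv_fd() and demo_roundtrip() are made up for illustration; only the socket calls and CMSG_* macros are the standard ones. It assumes Linux and a connected AF_UNIX socket.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send descriptor `fd` over the connected AF_UNIX socket `sock`; 0 on success. */
int send_fd(int sock, int fd)
{
	char byte = 'F';	/* one byte of ordinary payload to carry the cmsg */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {			/* the union keeps the buffer aligned for cmsghdr */
		struct cmsghdr align;
		char buf[CMSG_SPACE(sizeof(int))];
	} ctl;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&ctl, 0, sizeof(ctl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor from `sock`; returns the new fd, or -1 on failure. */
int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		struct cmsghdr align;
		char buf[CMSG_SPACE(sizeof(int))];
	} ctl;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&ctl, 0, sizeof(ctl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}

/*
 * Pass the read end of a pipe across a socketpair inside one process and
 * read through the received copy. Returns 0 on success. Error paths leak
 * descriptors; this is a sketch, not production code.
 */
int demo_roundtrip(void)
{
	int sv[2], p[2], got;
	char c = 0;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0 || pipe(p) < 0)
		return -1;
	if (send_fd(sv[0], p[0]) < 0)	/* the fd is now "in flight" */
		return -1;
	got = recv_fd(sv[1]);		/* installed in our table only here */
	if (got < 0)
		return -1;
	if (write(p[1], "x", 1) != 1 || read(got, &c, 1) != 1)
		return -1;

	close(got);
	close(p[0]);
	close(p[1]);
	close(sv[0]);
	close(sv[1]);
	return c == 'x' ? 0 : -1;
}
```

Between send_fd() and recv_fd() the descriptor sits queued on the receiving socket, owned by neither side's descriptor table. That is the in-flight state the replies further down the thread are concerned with.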
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 22:11 ` Andy Lutomirski @ 2015-04-15 22:18 ` Al Viro 2015-04-15 22:28 ` Andy Lutomirski 2015-04-16 10:31 ` Daniel Mack 1 sibling, 1 reply; 333+ messages in thread From: Al Viro @ 2015-04-15 22:18 UTC (permalink / raw) To: Andy Lutomirski Cc: One Thousand Gnomes, Greg Kroah-Hartman, Steven Rostedt, Richard Weinberger, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, Apr 15, 2015 at 03:11:17PM -0700, Andy Lutomirski wrote: > This is functionally identical to passing AF_UNIX socket fds over > SCM_RIGHTS, but I want something much lighter weight. Most of the weight in SCM_RIGHTS comes from the fact that you can pass AF_UNIX sockets over it, which requires a garbage collector. Exclude that and suddenly it becomes very cheap... ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 22:18 ` Al Viro @ 2015-04-15 22:28 ` Andy Lutomirski 2015-04-15 22:48 ` Al Viro 0 siblings, 1 reply; 333+ messages in thread From: Andy Lutomirski @ 2015-04-15 22:28 UTC (permalink / raw) To: Al Viro Cc: One Thousand Gnomes, Greg Kroah-Hartman, Steven Rostedt, Richard Weinberger, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, Apr 15, 2015 at 3:18 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > On Wed, Apr 15, 2015 at 03:11:17PM -0700, Andy Lutomirski wrote: > >> This is functionally identical to passing AF_UNIX socket fds over >> SCM_RIGHTS, but I want something much lighter weight. > > Most of the weight in SCM_RIGHTS comes from the fact that you can > pass AF_UNIX sockets over it, which requires a garbage collector. > Exclude that and suddenly it becomes very cheap... I should have been more specific. I don't mean the performance of SCM_RIGHTS itself; I mean the memory overhead of keeping tons of fds around, each with their socket data structures and buffers. I think that dbus could be quite efficiently implemented with a userspace daemon that just introduces peers to each other, but the fd explosion could be rather bad for some use cases. I'll be the first to admit that I don't have a clean API in mind. There was a lightweight fd proposal way back when, but it never went anywhere, and it might not be suitable anyway. --Andy ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-15 22:28 ` Andy Lutomirski
@ 2015-04-15 22:48 ` Al Viro
2015-04-15 22:54 ` Andy Lutomirski
2015-04-15 22:56 ` Eric Dumazet
0 siblings, 2 replies; 333+ messages in thread
From: Al Viro @ 2015-04-15 22:48 UTC (permalink / raw)
To: Andy Lutomirski
Cc: One Thousand Gnomes, Greg Kroah-Hartman, Steven Rostedt,
Richard Weinberger, Eric W. Biederman, Linus Torvalds, Andrew Morton,
Arnd Bergmann, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack,
David Herrmann, Djalal Harouni
On Wed, Apr 15, 2015 at 03:28:58PM -0700, Andy Lutomirski wrote:
> On Wed, Apr 15, 2015 at 3:18 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > On Wed, Apr 15, 2015 at 03:11:17PM -0700, Andy Lutomirski wrote:
> >
> >> This is functionally identical to passing AF_UNIX socket fds over
> >> SCM_RIGHTS, but I want something much lighter weight.
> >
> > Most of the weight in SCM_RIGHTS comes from the fact that you can
> > pass AF_UNIX sockets over it, which requires a garbage collector.
> > Exclude that and suddenly it becomes very cheap...
>
> I should have been more specific. I don't mean the performance of
> SCM_RIGHTS itself; I mean the memory overhead of keeping tons of fds
> around, each with their socket data structures and buffers.
>
> I think that dbus could be quite efficiently implemented with a
> userspace daemon that just introduces peers to each other, but the fd
> explosion could be rather bad for some use cases.
>
> I'll be the first to admit that I don't have a clean API in mind.
> There was a lightweight fd proposal way back when, but it never went
> anywhere, and it might not be suitable anyway.

Wait, are you talking about the overhead of descriptors used for capability
tokens (essentially zero - one system-wide struct file per capability +
one pointer in descriptor table of anyone who holds it + two bits in
bitmaps in the same descriptor tables) or about the overhead of descriptors
used to send/receive those over? The latter don't have to be sockets
at all - they could bloody well be files on some ipcfs, or character device,
or FIFOs, etc.
^ permalink raw reply	[flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 22:48 ` Al Viro @ 2015-04-15 22:54 ` Andy Lutomirski 2015-04-15 23:27 ` Al Viro 2015-04-15 22:56 ` Eric Dumazet 1 sibling, 1 reply; 333+ messages in thread From: Andy Lutomirski @ 2015-04-15 22:54 UTC (permalink / raw) To: Al Viro Cc: One Thousand Gnomes, Greg Kroah-Hartman, Steven Rostedt, Richard Weinberger, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, Apr 15, 2015 at 3:48 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > On Wed, Apr 15, 2015 at 03:28:58PM -0700, Andy Lutomirski wrote: >> On Wed, Apr 15, 2015 at 3:18 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: >> > On Wed, Apr 15, 2015 at 03:11:17PM -0700, Andy Lutomirski wrote: >> > >> >> This is functionally identical to passing AF_UNIX socket fds over >> >> SCM_RIGHTS, but I want something much lighter weight. >> > >> > Most of the weight in SCM_RIGHTS comes from the fact that you can >> > pass AF_UNIX sockets over it, which requires a garbage collector. >> > Exclude that and suddenly it becomes very cheap... >> >> I should have been more specific. I don't mean the performance of >> SCM_RIGHTS itself; I mean the memory overhead of keeping tons of fds >> around, each with their socket data structures and buffers. >> >> I think that dbus could be quite efficiently implemented with a >> userspace daemon that just introduces peers to each other, but the fd >> explosion could be rather bad for some use cases. >> >> I'll be the first to admit that I don't have a clean API in mind. >> There was a lightweight fd proposal way back when, but it never went >> anywhere, and it might not be suitable anyway. 
> > Wait, are you talking about the overhead of descriptors used for capability > tokens (essentially zero - one system-wide struct file per capability + > one pointer in descriptor table of anyone who holds it + two bits in > bitmaps in the sam descriptor tables) or about the overhead of descriptors > used to send/receive those over? The latter don't have to be sockets > at all - they could bloody well be files on some ipcfs, or character device, > or FIFOs, etc. Huh, interesting. I was imagining that each of a server's peers (capability holders) would have a fresh struct file, but maybe this wouldn't be needed at all. You'd still need a way to get replies to your request, but the API could just as easily be: int send_to_capability(int dest, int source, const void *data, size_t len, ...); where dest would be the destination's fd and source would be whatever receive queue I expect the response on. So maybe this is feasible. It doesn't solve broadcasts, but dbus unicast could easily layer over a facility like this and the context switch problem would go away for unicast. Heck, I'd use it for my own proprietary stuff, too. It would be way easier than the absurd tangle of socketpairs I currently use. --Andy ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-15 22:54 ` Andy Lutomirski
@ 2015-04-15 23:27 ` Al Viro
2015-04-16 0:47 ` Andy Lutomirski
0 siblings, 1 reply; 333+ messages in thread
From: Al Viro @ 2015-04-15 23:27 UTC (permalink / raw)
To: Andy Lutomirski
Cc: One Thousand Gnomes, Greg Kroah-Hartman, Steven Rostedt,
Richard Weinberger, Eric W. Biederman, Linus Torvalds, Andrew Morton,
Arnd Bergmann, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack,
David Herrmann, Djalal Harouni
On Wed, Apr 15, 2015 at 03:54:10PM -0700, Andy Lutomirski wrote:
> Huh, interesting.
>
> I was imagining that each of a server's peers (capability holders)
> would have a fresh struct file, but maybe this wouldn't be needed at
> all. You'd still need a way to get replies to your request, but the
> API could just as easily be:
>
> int send_to_capability(int dest, int source, const void *data, size_t len, ...);
>
> where dest would be the destination's fd and source would be whatever
> receive queue I expect the response on.
>
> So maybe this is feasible. It doesn't solve broadcasts, but dbus
> unicast could easily layer over a facility like this and the context
> switch problem would go away for unicast.
>
> Heck, I'd use it for my own proprietary stuff, too. It would be way
> easier than the absurd tangle of socketpairs I currently use.

BTW, the main issue with AF_UNIX passing is that the recipient isn't asleep
waiting for descriptors - they are thrown by the sender at whoever's receiving
and sit there until somebody gets around to picking them up.

_IF_ we had
	client: I want a descriptor <goes to sleep, interruptibly>
	kernel: assign it a sequence number
	server: sees request (including sequence number)
	server: give this fd to originator of request #N
	kernel: check if originator is still there, insert the damn thing into their
		descriptor table if they still are and return the obtained number
or
	server: tell the originator of request #N to fuck off
	kernel: check if originator is still there and gleefully pass the "fuck off" if
		they still are

we wouldn't have the in-flight state at all, and there goes the garbage
collection shite. With some elaboration, it could even carry the
authentication traffic - "fuck off" might be "answer this challenge", with
the next "I want a descriptor" carrying reply...
^ permalink raw reply	[flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 23:27 ` Al Viro @ 2015-04-16 0:47 ` Andy Lutomirski 2015-04-16 1:04 ` Al Viro 0 siblings, 1 reply; 333+ messages in thread From: Andy Lutomirski @ 2015-04-16 0:47 UTC (permalink / raw) To: Al Viro Cc: One Thousand Gnomes, Greg Kroah-Hartman, Steven Rostedt, Richard Weinberger, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, Apr 15, 2015 at 4:27 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > On Wed, Apr 15, 2015 at 03:54:10PM -0700, Andy Lutomirski wrote: >> Huh, interesting. >> >> I was imagining that each of a server's peers (capability holders) >> would have a fresh struct file, but maybe this wouldn't be needed at >> all. You'd still need a way to get replies to your request, but the >> API could just as easily be: >> >> int send_to_capability(int dest, int source, const void *data, size_t len, ...); >> >> where dest would be the destination's fd and source would be whatever >> receive queue I expect the response on. >> >> So maybe this is feasible. It doesn't solve broadcasts, but dbus >> unicast could easily layer over a facility like this and the context >> switch problem would go away for unicast. >> >> Heck, I'd use it for my own proprietary stuff, too. It would be way >> easier than the absurd tangle of socketpairs I currently use. > > BTW, the main issue with AF_UNIX passing is that recepient isn't asleep > awaiting for descriptors - they are thrown by sender at whoever's receiving > and sit there until somebody gets around to picking them. 
> > _IF_ we had > client: I want a desciptor <goes to sleep, interruptibly> > kernel: assign it a sequence number > server: sees request (including sequence number) > server: give this fd to originator of request #N > kernel: check if originator is still there, insert the damn thing into their > descriptor table if they still are and return the obtained number > or > server: tell the originator of request #N to fuck off > kernel: check if originator is still there and gleefully pass the "fuck off" if > they still are > > we wouldn't have the in-flight state at all, and there goes the garbage > collection shite. With some elaboration, it could even carry the > authentication traffic - "fuck off" might be "answer this challenge", with > the next "I want a descriptor" carrying reply... I wonder if we could get away with having the receiver pre-allocate some placeholder fds and then have the kernel replace a placeholder with a passed fd immediately when the fd is sent and enqueue *that* in the cmsg data. If you send an fd to someone who hasn't assigned any placeholders to the receiving socket, then you get an error. To keep the accounting sane, a placeholder would be a bona fide fd, presumably a reference to a global placeholder anon_inode. --Andy ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-16 0:47 ` Andy Lutomirski
@ 2015-04-16 1:04 ` Al Viro
2015-04-16 5:53 ` Andy Lutomirski
0 siblings, 1 reply; 333+ messages in thread
From: Al Viro @ 2015-04-16 1:04 UTC (permalink / raw)
To: Andy Lutomirski
Cc: One Thousand Gnomes, Greg Kroah-Hartman, Steven Rostedt,
Richard Weinberger, Eric W. Biederman, Linus Torvalds, Andrew Morton,
Arnd Bergmann, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack,
David Herrmann, Djalal Harouni
On Wed, Apr 15, 2015 at 05:47:18PM -0700, Andy Lutomirski wrote:

> I wonder if we could get away with having the receiver pre-allocate
> some placeholder fds and then have the kernel replace a placeholder
> with a passed fd immediately when the fd is sent and enqueue *that* in
> the cmsg data. If you send an fd to someone who hasn't assigned any
> placeholders to the receiving socket, then you get an error.

*UGH*

It's a really bad idea. The thing is, descriptor table that isn't shared
is assumed to be unchanged. So when fdget() looks a file up, it doesn't
have to bump its refcount - the reference in descriptor table itself will
stay. Conversely, fdput() doesn't have to drop it in such case (we encode
whether we need to drop into struct fd returned by fdget() and passed to
fdput()).

That relies on no third-party modifications of descriptor table and yes,
the effect _is_ noticeable - playing with struct file refcounts does result
in considerable overhead.

If recipient sits in "gimme a descriptor", we are fine - if descriptor table
was shared, the other users would be doing full refcount song and dance and
if it wasn't, recipient is the sole user _and_ it isn't between fdget() and
fdput() at the moment. With your "replace the dummies when sending" trick
we break all of that - we don't know what the recipient is doing at the moment
and for all we know they might be in the middle of something like e.g.
fstat() on your placeholder. With rather unpleasant effects...
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-16 1:04 ` Al Viro @ 2015-04-16 5:53 ` Andy Lutomirski 0 siblings, 0 replies; 333+ messages in thread From: Andy Lutomirski @ 2015-04-16 5:53 UTC (permalink / raw) To: Al Viro Cc: Arnd Bergmann, linux-kernel, Jiri Kosina, Andrew Morton, Daniel Mack, One Thousand Gnomes, Linus Torvalds, Tom Gundersen, Richard Weinberger, Steven Rostedt, Greg Kroah-Hartman, David Herrmann, Eric W. Biederman, Djalal Harouni On Apr 15, 2015 6:04 PM, "Al Viro" <viro@zeniv.linux.org.uk> wrote: > > On Wed, Apr 15, 2015 at 05:47:18PM -0700, Andy Lutomirski wrote: > > > I wonder if we could get away with having the receiver pre-allocate > > some placeholder fds and then have the kernel replace a placeholder > > with a passed fd immediately when the fd is sent and enqueue *that* in > > the cmsg data. If you send an fd to someone who hasn't assigned any > > placeholders to the receiving socket, then you get an error. > > *UGH* > > It's a really bad idea. The thing is, descriptor table that isn't shared > is assumed to be unchanged. So when fdget() looks a file up, it doesn't > have to bump its refcount - the reference in descriptor table itself will > stay. Conversely, fdput() doesn't have to drop it in such case (we encode > whether we need to drop into struct fd returned by fdget() and passed to > fdput()). > > That relies on no third-party modifications of descriptor table and yes, > the effect _is_ noticable - playing with struct file refcounts does result > in considerable overhead. > > If recepient sits in "gimme a descriptor", we are fine - if descriptor table > was shared, the other users would be doing full refcount song and dance and > if it wasn't, recepient is the sole user _and_ it isn't betwee fdget() and > fdput() at the moment. With your "replace the dummies when sending" trick > we break all of that - we don't know what the recepient is doing at the moment > and for all we know they might be in the middle of something like e.g. 
> fstat() on your placeholder. With rather unpleasant effects...

Hmm. I don't love the special blocking call either -- it breaks polling
loops.

We could have the existence of a placeholderfd count as an extra
reference to the descriptor table, with the associated performance hit.
Or we could allow each placeholderfd to collect one received fd but not
actually switch over. The latter is ugly and still has minor DoS
issues -- we'd have to prevent placeholderfds from being passed through
this mechanism or SCM_RIGHTS.

But wait... what about an evil trick? What if all placeholderfds are
the *same* struct file and that struct file is never deleted? Then
fdget on a placeholderfd is safe, since it's implicitly pinned.

--Andy
^ permalink raw reply	[flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 22:48 ` Al Viro 2015-04-15 22:54 ` Andy Lutomirski @ 2015-04-15 22:56 ` Eric Dumazet 1 sibling, 0 replies; 333+ messages in thread From: Eric Dumazet @ 2015-04-15 22:56 UTC (permalink / raw) To: Al Viro Cc: Andy Lutomirski, One Thousand Gnomes, Greg Kroah-Hartman, Steven Rostedt, Richard Weinberger, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, 2015-04-15 at 23:48 +0100, Al Viro wrote: > On Wed, Apr 15, 2015 at 03:28:58PM -0700, Andy Lutomirski wrote: > > On Wed, Apr 15, 2015 at 3:18 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > > On Wed, Apr 15, 2015 at 03:11:17PM -0700, Andy Lutomirski wrote: > > > > > >> This is functionally identical to passing AF_UNIX socket fds over > > >> SCM_RIGHTS, but I want something much lighter weight. > > > > > > Most of the weight in SCM_RIGHTS comes from the fact that you can > > > pass AF_UNIX sockets over it, which requires a garbage collector. > > > Exclude that and suddenly it becomes very cheap... > > > > I should have been more specific. I don't mean the performance of > > SCM_RIGHTS itself; I mean the memory overhead of keeping tons of fds > > around, each with their socket data structures and buffers. > > > > I think that dbus could be quite efficiently implemented with a > > userspace daemon that just introduces peers to each other, but the fd > > explosion could be rather bad for some use cases. > > > > I'll be the first to admit that I don't have a clean API in mind. > > There was a lightweight fd proposal way back when, but it never went > > anywhere, and it might not be suitable anyway. 
>
> Wait, are you talking about the overhead of descriptors used for capability
> tokens (essentially zero - one system-wide struct file per capability +
> one pointer in descriptor table of anyone who holds it + two bits in
> bitmaps in the same descriptor tables) or about the overhead of descriptors
> used to send/receive those over? The latter don't have to be sockets
> at all - they could bloody well be files on some ipcfs, or character device,
> or FIFOs, etc.

This kind of reminds me of futex: from an apparently simple idea we got
to the point of having more than 3000 lines of code in kernel/futex.c.

It is sad that af_unix was chosen to support fd passing in the first
place. This is a serious DoS vector.
^ permalink raw reply	[flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 22:11 ` Andy Lutomirski 2015-04-15 22:18 ` Al Viro @ 2015-04-16 10:31 ` Daniel Mack 2015-04-16 12:02 ` Tom Gundersen 1 sibling, 1 reply; 333+ messages in thread From: Daniel Mack @ 2015-04-16 10:31 UTC (permalink / raw) To: Andy Lutomirski, One Thousand Gnomes Cc: Greg Kroah-Hartman, Steven Rostedt, Richard Weinberger, Al Viro, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, Tom Gundersen, Jiri Kosina, linux-kernel, David Herrmann, Djalal Harouni On 04/16/2015 12:11 AM, Andy Lutomirski wrote: > Also, getting the really high performance stuff right would be nice. > Binder has one thing going for it (IIRC -- I've talked about it to > some of the authors, but I've never so much as glanced at the code): > it has a primitive to send and wait for a reply. This reduces the > load on scheduler. kdbus has the same thing, we call it a synchronous reply. That concept is actually comprehensively explained in kdbus.message(7): By default, all calls to kdbus are considered asynchronous, non-blocking. However, as there are many use cases that need to wait for a remote peer to answer a method call, there's a way to send a message and wait for a reply in a synchronous fashion. This is what the KDBUS_SEND_SYNC_REPLY controls. The KDBUS_CMD_SEND ioctl will block until the reply has arrived, the timeout limit is reached, in case the remote connection was shut down, or if interrupted by a signal before any reply; see signal(7). The offset of the reply message in the sender's pool is stored in in offset_reply when the ioctl has returned without error. Hence, there is no need for another KDBUS_CMD_RECV ioctl or anything else to receive the reply. Thanks, Daniel ^ permalink raw reply [flat|nested] 333+ messages in thread
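[Editorial sketch, not part of the original message.] The "send and wait for a reply" semantics quoted from kdbus.message(7) can be approximated in plain userspace, which makes the cost being discussed easier to see. The sketch below is not the kdbus API. call_sync() is a hypothetical helper over an ordinary connected socket, and it needs three system calls (send, poll, recv) where the quoted text describes a single blocking KDBUS_CMD_SEND ioctl.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: send a request on `fd`, then block until a reply
 * arrives or `timeout_ms` elapses. Returns the reply length, or -1 with
 * errno set (ETIMEDOUT on timeout, EINTR if a signal interrupted the wait).
 */
ssize_t call_sync(int fd, const void *req, size_t req_len,
		  void *reply, size_t reply_cap, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int n;

	/* MSG_NOSIGNAL: report a closed peer as EPIPE instead of SIGPIPE */
	if (send(fd, req, req_len, MSG_NOSIGNAL) != (ssize_t)req_len)
		return -1;

	n = poll(&pfd, 1, timeout_ms);
	if (n == 0) {
		errno = ETIMEDOUT;
		return -1;
	}
	if (n < 0)
		return -1;

	return recv(fd, reply, reply_cap, 0);
}
```

With a SOCK_SEQPACKET socketpair, a caller would use it as call_sync(fd, "ping", 4, buf, sizeof(buf), 1000). Unlike the mechanism described above, nothing here ties a reply to a particular request, so a real implementation would also need some form of request matching.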
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-16 10:31 ` Daniel Mack @ 2015-04-16 12:02 ` Tom Gundersen 2015-04-16 12:15 ` Olaf Hering 2015-04-21 16:36 ` Eric W. Biederman 0 siblings, 2 replies; 333+ messages in thread From: Tom Gundersen @ 2015-04-16 12:02 UTC (permalink / raw) To: Jiri Kosina Cc: Greg Kroah-Hartman, Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, Apr 15, 2015 at 2:09 PM, Jiri Kosina <jkosina@suse.cz> wrote: > On Wed, 15 Apr 2015, Greg Kroah-Hartman wrote: > >> 'systemctl reboot' calls a bunch of other things to determine if you >> have local access to the machine, or permissions to reboot the machine >> (i.e. CAP_SYS_BOOT), and other things that polkit might allow you to do, >> and then, it decides to reboot or not. That happens today, right? I >> don't understand the argument here. > > And what exactly is the argument that this is the way it should be > implemnted? > > Why can't it just rely on the kernel to provide final answer to "to reboot > or not to reboot, that is the question"? > > At the end of the day, it's the kernel that decides whether it will really > ultimately ask the platform to reboot. > > If, for whatever reason (which might be completely invisible to userspace) > kernel decides not to do so, userspace has to be able to recover from such > failure in any case. This is not how shutting down a general purpose operating system works. If a system is shut down, all user sessions are terminated, all services are stopped in the right order, all remaining processes killed, all file systems are unmounted, all storage devices disassembled, and so on. All this is implemented entirely in userspace and involves a number of complex transitions from the normal init system, to a shutdown PID 1 process and finally a transition back to the initial ramdisk so that we can unmount the root file system even. 
After all that is done, in the right order, following dependencies,
while enforcing timeouts, then the very last step is actually the
reboot() system call that then brings the kernel to a halt, and possibly
turns off power.

Thus I don't see how your suggestion can be applied in any way to how
system shutdown works: the shutdown procedure includes these non-trivial
preparation steps described above, and it is essential that this
preparation is not begun unless the client requesting it actually has
sufficient rights to do so.

Or to put this another way: if the system went all the way down, so that
everything is killed, unmounted, disassembled, to the point even that we
transitioned away from the root file system, then the reboot() system
call is really just the tiniest bit of it. And you should not be able to
get there if you originally didn't even possess the capability to
execute that last step...

Moreover, the daemon performing the shutdown tasks is necessarily
always privileged enough to do so, so calling into the kernel and seeing
what happens is completely the wrong thing to do (it would simply
succeed). What matters is if the client calling the daemon is
sufficiently privileged. If the client has the capabilities necessary
to call the reboot syscall directly, it makes no sense to disallow
them from doing a clean reboot. It would be like giving someone access
to pull the power plug, but not allow them to shut down the machine
cleanly.

To conclude, the kernel makes the decision for allowing reboot() to
succeed based on CAP_SYS_BOOT, so when we decide whether or not to
perform the preparation steps, we really must also use CAP_SYS_BOOT.
If we are more restrictive, it does not gain us anything as people
with CAP_SYS_BOOT can just circumvent our logic and "pull the plug" by
calling reboot() directly. If we are less restrictive and for instance
check for uid==0 it would essentially mean that we have added a way to
circumvent the dropping of CAP_SYS_BOOT.
Cheers, Tom ^ permalink raw reply [flat|nested] 333+ messages in thread
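[Editorial sketch, not part of the original message.] To make the check under discussion concrete, this is roughly how a process's effective capability set can be inspected from userspace for CAP_SYS_BOOT, using the raw capget(2) system call. The function name has_cap_sys_boot() is made up for illustration, and this is not claimed to be what systemd or kdbus actually do. Querying another process by pid this way is inherently racy (the pid can be reused and the capabilities can change after the check), which is part of what the capability-capturing debate elsewhere in this thread is about.

```c
#define _GNU_SOURCE
#include <linux/capability.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if `pid` currently has CAP_SYS_BOOT in
 * its effective set, 0 if it does not, -1 on error. pid 0 means the
 * calling process.
 */
int has_cap_sys_boot(pid_t pid)
{
	struct __user_cap_header_struct hdr = {
		.version = _LINUX_CAPABILITY_VERSION_3,
		.pid = pid,
	};
	struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3];
	unsigned int idx = CAP_SYS_BOOT / 32;		/* which 32-bit word */
	unsigned int bit = 1u << (CAP_SYS_BOOT % 32);	/* bit inside that word */

	if (syscall(SYS_capget, &hdr, data) < 0)
		return -1;

	return (data[idx].effective & bit) ? 1 : 0;
}
```

A daemon following Tom's argument would call has_cap_sys_boot(client_pid) before starting the shutdown sequence. Eric's reply below argues that treating the bit this way gives CAP_SYS_BOOT a meaning it was not designed to carry.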
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-16 12:02 ` Tom Gundersen @ 2015-04-16 12:15 ` Olaf Hering 2015-04-16 12:43 ` Harald Hoyer 2015-04-21 16:36 ` Eric W. Biederman 1 sibling, 1 reply; 333+ messages in thread From: Olaf Hering @ 2015-04-16 12:15 UTC (permalink / raw) To: Tom Gundersen Cc: Jiri Kosina, Greg Kroah-Hartman, Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Thu, Apr 16, Tom Gundersen wrote: > to a shutdown PID 1 process and finally a transition back to > the initial ramdisk so that we can unmount the root file system even. Is that wishful thinking or actually implemented somewhere? Olaf ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-16 12:15 ` Olaf Hering @ 2015-04-16 12:43 ` Harald Hoyer 0 siblings, 0 replies; 333+ messages in thread From: Harald Hoyer @ 2015-04-16 12:43 UTC (permalink / raw) To: Olaf Hering, Tom Gundersen Cc: Jiri Kosina, Greg Kroah-Hartman, Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni Am 16.04.2015 um 14:15 schrieb Olaf Hering: > On Thu, Apr 16, Tom Gundersen wrote: > >> to a shutdown PID 1 process and finally a transition back to >> the initial ramdisk so that we can unmount the root file system even. > > Is that wishful thinking or actually implemented somewhere? This is done on any system, which uses dracut and systemd for a long time now. As SUSE switched to dracut recently, it should be the same as on RHEL-7/Fedora now. If /run/initramfs/shutdown exists and is executable, /usr/lib/systemd/systemd-shutdown switches root to /run/initramfs/ and executes shutdown. The shutdown script umounts the old real root (after umounting /oldroot/{proc,sys,run,dev}), then if the old real root was living on an assembled device, like mdraid, the device is disassembled and waited for the device to be clean. See http://git.kernel.org/cgit/boot/dracut/dracut.git/tree/modules.d/99shutdown/shutdown.sh and for example for mdraid: http://git.kernel.org/cgit/boot/dracut/dracut.git/tree/modules.d/90mdraid/md-shutdown.sh This solved quite a lot of problems for unsynced raids. ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-16 12:02 ` Tom Gundersen 2015-04-16 12:15 ` Olaf Hering @ 2015-04-21 16:36 ` Eric W. Biederman 2015-04-21 19:38 ` Matthew Garrett 1 sibling, 1 reply; 333+ messages in thread From: Eric W. Biederman @ 2015-04-21 16:36 UTC (permalink / raw) To: Tom Gundersen Cc: Jiri Kosina, Greg Kroah-Hartman, Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann, One Thousand Gnomes, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni Tom Gundersen <teg@jklm.no> writes: > Moreover, the daemon performing the shutdown tasks is necessarily > always privileged enough to do so, so calling into the kernel and see > what happens is completely the wrong thing to do (it would simply > succeed). What matters is if the client calling the daemon is > sufficiently privileged. If the client has the capabilites necessary > to call the reboot syscall directly, it makes no sense to disallow > them from doing a clean reboot. It would be like giving someone access > to pull the power plug, but not allow them to shutdown the machine > cleanly. > > To conclude, the kernel makes the decision for allowing reboot() to > succeed based on CAP_SYS_BOOT, so when we decide whether or not to > perform the preparation steps, we really must also use CAP_SYS_BOOT. > If we are more restrictive, it does not gain us anything as people > with CAP_SYS_BOOT can just circumvent our logic and "pull the plug" by > calling reboot() directly. If we are less restrictive and for instance > check for uid==0 it would essentially mean that we have added a way to > circumvent the dropping of CAP_SYS_BOOT. *Blink* Privilege escalation via CAP_SYS_BOOT *Blink* *Puts on black hat* HeHeHe. You mean all I need to do to get around all of the logging servers is capture CAP_SYS_BOOT? Say like just capture this crazy watchdog program that doesn't run as root so that it can only reboot the system? 
HeHeHe So I can just trigger a clean reboot, wait for journald, auditd, and syslog all to shut down and then do evil things to the machine without having to worry about erasing forensic evidence? Bahahaha! This looks like fun; I should play with this. *Takes black hat off* Seriously, it does not make sense to reuse these bits for a purpose for which they were not designed. A reboot preceded by a clean shutdown is something different from a reboot that skips all of those steps. I can understand the concerns about not wanting to allow circumventing dropping CAP_SYS_BOOT but even with that concern in place I think it is silly. That isn't what CAP_SYS_BOOT means. Over the long term userspace doing weird things like this will mean that we will have to change the kernel to add CAP_SYS_BOOT_THIS_TIME_I_MEAN_IT. And have that control the reboot system call and have the existing CAP_SYS_BOOT be some kind of token for userspace. Instead of going down that rat hole it would be much better for userspace to figure out a token of their own. Perhaps a file descriptor certain privileged processes can pass, perhaps something else. The bottom line is that I tend to suck at figuring out how to exploit systems and I saw an exploit possibility with the extended privileges you granted to CAP_SYS_BOOT nearly instantly. I can't imagine how kernel capabilities are the right tool for this kind of job. Eric ^ permalink raw reply [flat|nested] 333+ messages in thread
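[Editor's illustration, not part of the thread] To make the disputed policy concrete, here is a hedged Python sketch of the check Tom describes: allow a client to request a clean reboot if its effective capability set contains CAP_SYS_BOOT. The bit number 22 is the value of CAP_SYS_BOOT in <linux/capability.h>, and "CapEff:" is the field name used in /proc/<pid>/status. Everything else (the function names, and the idea of feeding the daemon status text) is an assumption made for the example; it is not how systemd or kdbus obtain credentials, and reading another process's /proc entry after the fact is generally racy.

```python
CAP_SYS_BOOT = 22  # bit number of CAP_SYS_BOOT in <linux/capability.h>


def effective_caps(status_text):
    """Extract the effective capability mask from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            # The value is a hexadecimal bitmask without a 0x prefix.
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def has_cap(mask, cap):
    """Return True if capability bit `cap` is set in `mask`."""
    return bool((mask >> cap) & 1)


def may_request_clean_reboot(status_text):
    """Hypothetical policy: the client needs CAP_SYS_BOOT, nothing else."""
    return has_cap(effective_caps(status_text), CAP_SYS_BOOT)


if __name__ == "__main__":
    # A non-root watchdog that holds only CAP_SYS_BOOT (1 << 22 == 0x400000).
    watchdog = "Name:\twatchdog\nCapEff:\t0000000000400000\n"
    print(may_request_clean_reboot(watchdog))
```

Eric's objection is visible in the last function: the same bit that the kernel uses to gate reboot() is here given a second, userspace-defined meaning (permission to trigger the orderly shutdown of the logging daemons), which is a different privilege from the one the capability was designed for.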
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-21 16:36 ` Eric W. Biederman @ 2015-04-21 19:38 ` Matthew Garrett 2015-04-21 19:55 ` Austin S Hemmelgarn 0 siblings, 1 reply; 333+ messages in thread From: Matthew Garrett @ 2015-04-21 19:38 UTC (permalink / raw) To: Eric W. Biederman Cc: Tom Gundersen, Jiri Kosina, Greg Kroah-Hartman, Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann, One Thousand Gnomes, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Tue, Apr 21, 2015 at 11:36:54AM -0500, Eric W. Biederman wrote: > > HeHeHe. You mean all I need to do to get around all of the logging servers is > capture CAP_SYS_BOOT? Say like just capture this crazy watchdog program > that doesn't run as root so that it can only reboot the system? HeHeHe > So I can just trigger a clean reboot wait for journald, auditd, and > syslog all to shut down and then do evil things to the machine without > having to worry about erasing forensic evidence? CAP_SYS_BOOT gives you kexec, and kexec with init=/bin/sh lets you do anything. You added that in dc009d92435f99498cbc579ce76bf28e837e2c14 and now the horse is long gone. Don't give CAP_SYS_BOOT to anything you don't trust with full privileges. -- Matthew Garrett | mjg59@srcf.ucam.org ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-21 19:38 ` Matthew Garrett @ 2015-04-21 19:55 ` Austin S Hemmelgarn 0 siblings, 0 replies; 333+ messages in thread From: Austin S Hemmelgarn @ 2015-04-21 19:55 UTC (permalink / raw) To: Matthew Garrett, Eric W. Biederman Cc: Tom Gundersen, Jiri Kosina, Greg Kroah-Hartman, Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann, One Thousand Gnomes, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni [-- Attachment #1: Type: text/plain, Size: 1051 bytes --] On 2015-04-21 15:38, Matthew Garrett wrote: > On Tue, Apr 21, 2015 at 11:36:54AM -0500, Eric W. Biederman wrote: >> >> HeHeHe. You mean all I need to do to get around all of the logging servers is >> capture CAP_SYS_BOOT? Say like just capture this crazy watchdog program >> that doesn't run as root so that it can only reboot the system? HeHeHe >> So I can just trigger a clean reboot wait for journald, auditd, and >> syslog all to shut down and then do evil things to the machine without >> having to worry about erasing forensic evidence? > > CAP_SYS_BOOT gives you kexec, and kexec with init=/bin/sh lets you do > anything. You added that in dc009d92435f99498cbc579ce76bf28e837e2c14 and > now the horse is long gone. Don't give CAP_SYS_BOOT to anything you > don't trust with full privileges. > The point is that Eric's suggestion works even on kernels without kexec(), which is significant because a significant number of security minded people (myself included) explicitly disable kexec in their kernel configuration. [-- Attachment #2: S/MIME Cryptographic Signature --] [-- Type: application/pkcs7-signature, Size: 2967 bytes --] ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 1:36 ` Andy Lutomirski 2015-04-15 6:54 ` Richard Weinberger @ 2015-04-15 8:18 ` Martin Steigerwald 2015-04-15 8:32 ` Greg Kroah-Hartman 2015-04-15 8:29 ` Greg Kroah-Hartman 2 siblings, 1 reply; 333+ messages in thread From: Martin Steigerwald @ 2015-04-15 8:18 UTC (permalink / raw) To: Andy Lutomirski Cc: Al Viro, Greg Kroah-Hartman, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni Am Dienstag, 14. April 2015, 18:36:28 schrieb Andy Lutomirski: > On Mon, Apr 13, 2015 at 1:22 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > On Mon, Apr 13, 2015 at 09:42:17PM +0200, Greg Kroah-Hartman wrote: > >> > I remain opposed to this half thought out trash of an ABI for the > >> > meta-data. > >> > >> You don't have to enable the metadata if you don't want to use it, > >> it's > >> an option :) > > > > OK, _that_ argument needs to be stomped out. It had been used before, > > and it was a deliberate scam. There is no such thing as optional > > kernel interface, especially when udev/dbus/systemd crowd is nearby. > > We'd been through that excuse before; remember how devtmpfs was > > pushed in as "optional"? > > > > This is a huge red flag. On the level of "I need your account > > information to transfer $200M you might have inherited from my > > deceased client". > > > > Just to recap how it went the last time around: Kay kept pushing his > > piece of code into the tree, claiming that it was optional, that > > nobody who doesn't like it has to enable it, so what's the problem? > > OK, in it went. And pretty soon udev (maintained by the same... > > meticulously honorable person) had stopped working on the kernels > > that didn't have that enabled. > > > > We had been there before. To paraphrase another... 
meticulously > > honorable person, "if you didn't want something relied upon, why have > > you put it into the kernel?" Said person is on the record as having > > no problem whatsoever with adding dependencies to the bottom of > > userland stack. > > It appears that, if kdbus is merged, upstream udev may end up requiring > it: > > http://lists.freedesktop.org/archives/systemd-devel/2014-May/019657.html > > Grumble. Honestly, I think that tightly coupling systemd and udev to certain kernel versions in lock step is crap. That you require some minimum version after some reasonable time, sure. But in lockstep? Seriously. I certainly do not want a broken system just cause I have to load an older kernel version for some reason. And yes, I think its good not to force just about any userspace idea into the kernel. -- Martin 'Helios' Steigerwald - http://www.Lichtvoll.de GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7 ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 8:18 ` Martin Steigerwald @ 2015-04-15 8:32 ` Greg Kroah-Hartman 2015-04-15 8:52 ` Martin Steigerwald 0 siblings, 1 reply; 333+ messages in thread From: Greg Kroah-Hartman @ 2015-04-15 8:32 UTC (permalink / raw) To: Martin Steigerwald Cc: Andy Lutomirski, Al Viro, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, Apr 15, 2015 at 10:18:46AM +0200, Martin Steigerwald wrote: > Am Dienstag, 14. April 2015, 18:36:28 schrieb Andy Lutomirski: > > On Mon, Apr 13, 2015 at 1:22 PM, Al Viro <viro@zeniv.linux.org.uk> > wrote: > > > On Mon, Apr 13, 2015 at 09:42:17PM +0200, Greg Kroah-Hartman wrote: > > >> > I remain opposed to this half thought out trash of an ABI for the > > >> > meta-data. > > >> > > >> You don't have to enable the metadata if you don't want to use it, > > >> it's > > >> an option :) > > > > > > OK, _that_ argument needs to be stomped out. It had been used before, > > > and it was a deliberate scam. There is no such thing as optional > > > kernel interface, especially when udev/dbus/systemd crowd is nearby. > > > We'd been through that excuse before; remember how devtmpfs was > > > pushed in as "optional"? > > > > > > This is a huge red flag. On the level of "I need your account > > > information to transfer $200M you might have inherited from my > > > deceased client". > > > > > > Just to recap how it went the last time around: Kay kept pushing his > > > piece of code into the tree, claiming that it was optional, that > > > nobody who doesn't like it has to enable it, so what's the problem? > > > OK, in it went. And pretty soon udev (maintained by the same... > > > meticulously honorable person) had stopped working on the kernels > > > that didn't have that enabled. > > > > > > We had been there before. To paraphrase another... 
meticulously > > > honorable person, "if you didn't want something relied upon, why have > > > you put it into the kernel?" Said person is on the record as having > > > no problem whatsoever with adding dependencies to the bottom of > > > userland stack. > > > > It appears that, if kdbus is merged, upstream udev may end up requiring > > it: > > > > http://lists.freedesktop.org/archives/systemd-devel/2014-May/019657.html > > > > Grumble. > > Honestly, I think that tightly coupling systemd and udev to certain kernel > versions in lock step is crap. Where do you see that happening? > That you require some minimum version after some reasonable time, sure. > But in lockstep? Seriously. Has that happened in the past? Look at the minimum requirements of systemd/udev today, something like the 3.7 kernel release, many years old. > I certainly do not want a broken system just cause I have to load an older > kernel version for some reason. No one does. But, work with your distribution if you end up with something like this. Remember, the goal is that you can always run newer kernels on older userspace, as that is something that we kernel developers can enforce. Userspace programs have other requirements / communities, it's up to them to decide what their oldest kernel version they wish to support. Hint, even glibc makes these kinds of requirements, it's nothing new at all here, so why is this even an issue? > And yes, I think its good not to force just about any userspace idea into > the kernel. Do you have any technical objections to the patch as proposed? thanks, greg k-h ^ permalink raw reply [flat|nested] 333+ messages in thread
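[Editor's illustration, not part of the thread] The "minimum kernel version" requirement Greg contrasts with lockstep upgrades amounts to a simple comparison at startup. The following Python sketch shows one way a userspace program could express it; the helper names are invented for this example, and the (3, 7) threshold is just the approximate figure quoted in the message, not a value taken from systemd's sources.

```python
import re


def parse_release(release):
    """Turn a release string such as '3.7.0-rc1' into (major, minor)."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return int(match.group(1)), int(match.group(2))


def kernel_at_least(release, minimum):
    """Return True if `release` is at or above the (major, minor) minimum."""
    return parse_release(release) >= minimum


if __name__ == "__main__":
    import platform

    # platform.release() reports the running kernel's release on Linux.
    if kernel_at_least(platform.release(), (3, 7)):
        print("kernel is new enough")
    else:
        print("kernel is older than the assumed minimum")
```

A floor of this kind still lets users boot any older kernel above the threshold, which is the difference from the lockstep requirement Martin objects to: a hard dependency on a feature that only exists in the newest kernel raises the floor to that kernel immediately.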
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 8:32 ` Greg Kroah-Hartman @ 2015-04-15 8:52 ` Martin Steigerwald 2015-04-15 9:02 ` Greg Kroah-Hartman 0 siblings, 1 reply; 333+ messages in thread From: Martin Steigerwald @ 2015-04-15 8:52 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Andy Lutomirski, Al Viro, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni Am Mittwoch, 15. April 2015, 10:32:19 schrieb Greg Kroah-Hartman: > On Wed, Apr 15, 2015 at 10:18:46AM +0200, Martin Steigerwald wrote: > > Am Dienstag, 14. April 2015, 18:36:28 schrieb Andy Lutomirski: > > > On Mon, Apr 13, 2015 at 1:22 PM, Al Viro <viro@zeniv.linux.org.uk> > > > > wrote: > > > > On Mon, Apr 13, 2015 at 09:42:17PM +0200, Greg Kroah-Hartman wrote: > > > >> > I remain opposed to this half thought out trash of an ABI for > > > >> > the > > > >> > meta-data. > > > >> > > > >> You don't have to enable the metadata if you don't want to use > > > >> it, > > > >> it's > > > >> an option :) > > > > > > > > OK, _that_ argument needs to be stomped out. It had been used > > > > before, > > > > and it was a deliberate scam. There is no such thing as optional > > > > kernel interface, especially when udev/dbus/systemd crowd is > > > > nearby. > > > > We'd been through that excuse before; remember how devtmpfs was > > > > pushed in as "optional"? > > > > > > > > This is a huge red flag. On the level of "I need your account > > > > information to transfer $200M you might have inherited from my > > > > deceased client". > > > > > > > > Just to recap how it went the last time around: Kay kept pushing > > > > his > > > > piece of code into the tree, claiming that it was optional, that > > > > nobody who doesn't like it has to enable it, so what's the > > > > problem? > > > > OK, in it went. And pretty soon udev (maintained by the same... 
> > > > meticulously honorable person) had stopped working on the kernels > > > > that didn't have that enabled. > > > > > > > > We had been there before. To paraphrase another... meticulously > > > > honorable person, "if you didn't want something relied upon, why > > > > have > > > > you put it into the kernel?" Said person is on the record as > > > > having > > > > no problem whatsoever with adding dependencies to the bottom of > > > > userland stack. > > > > > > It appears that, if kdbus is merged, upstream udev may end up > > > requiring > > > it: > > > > > > http://lists.freedesktop.org/archives/systemd-devel/2014-May/019657. > > > html > > > > > > Grumble. > > > > Honestly, I think that tightly coupling systemd and udev to certain > > kernel versions in lock step is crap. > > Where do you see that happening? > > > That you require some minimum version after some reasonable time, > > sure. > > But in lockstep? Seriously. > > Has that happened in the past? Look at the minimum requirements of > systemd/udev today, something like the 3.7 kernel release, many years > old. I refer to the linked mailing list post from Lennart as I quote here: > To make this clear, we expect that systemd and kernels are updated in > lockstep. We explicitly do not support really old kernels with really > (which means 3.4 right now), but even that should be taken with a grain > of salt, as we already made clear that soon after kdbus is merged into > the kernel we'll probably make a hard requirement on it from the systemd > side. Thats plenty clear, isn´t it? As soond as kdbus is merged into kernel, systemd will depend on it, and then… if I need to go back to older kernel, I have to downgrade systemd as well? > > I certainly do not want a broken system just cause I have to load an > > older kernel version for some reason. > > No one does. But, work with your distribution if you end up with > something like this. 
Remember, the goal is that you can always run > newer kernels on older userspace, as that is something that we kernel > developers can enforce. Userspace programs have other requirements / > communities, it's up to them to decide what their oldest kernel version > they wish to support. Hint, even glibc makes these kinds of > requirements, it's nothing new at all here, so why is this even an > issue? Its no issue for me that systemd required kernel 3.7. But… what Lennart announces above regarding kdbus reads quite differently. > > And yes, I think its good not to force just about any userspace idea > > into the kernel. > > Do you have any technical objections to the patch as proposed? If I had, I would have written it. I explained already that I see that kernel developers have strong technical objections with kdbus. And that I think it is important to acknowledge it, instead of telling them, that the API is required from userspace, userspace people know what they do, and they should just go away with their concerns. Thats at least how I received quite some of your responses. Well and I raised an eyebrow on the busname matching rules and the capability stuff. Yet, I didn´t comment on it, cause I didn´t look at it in-depth. I just ask you to take those seriously who did. -- Martin 'Helios' Steigerwald - http://www.Lichtvoll.de GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7 ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 8:52 ` Martin Steigerwald @ 2015-04-15 9:02 ` Greg Kroah-Hartman 2015-04-15 9:28 ` Martin Steigerwald 0 siblings, 1 reply; 333+ messages in thread From: Greg Kroah-Hartman @ 2015-04-15 9:02 UTC (permalink / raw) To: Martin Steigerwald Cc: Andy Lutomirski, Al Viro, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, Apr 15, 2015 at 10:52:37AM +0200, Martin Steigerwald wrote: > Am Mittwoch, 15. April 2015, 10:32:19 schrieb Greg Kroah-Hartman: > > On Wed, Apr 15, 2015 at 10:18:46AM +0200, Martin Steigerwald wrote: > > > Am Dienstag, 14. April 2015, 18:36:28 schrieb Andy Lutomirski: > > > > On Mon, Apr 13, 2015 at 1:22 PM, Al Viro <viro@zeniv.linux.org.uk> > > > > > > wrote: > > > > > On Mon, Apr 13, 2015 at 09:42:17PM +0200, Greg Kroah-Hartman > wrote: > > > > >> > I remain opposed to this half thought out trash of an ABI for > > > > >> > the > > > > >> > meta-data. > > > > >> > > > > >> You don't have to enable the metadata if you don't want to use > > > > >> it, > > > > >> it's > > > > >> an option :) > > > > > > > > > > OK, _that_ argument needs to be stomped out. It had been used > > > > > before, > > > > > and it was a deliberate scam. There is no such thing as optional > > > > > kernel interface, especially when udev/dbus/systemd crowd is > > > > > nearby. > > > > > We'd been through that excuse before; remember how devtmpfs was > > > > > pushed in as "optional"? > > > > > > > > > > This is a huge red flag. On the level of "I need your account > > > > > information to transfer $200M you might have inherited from my > > > > > deceased client". 
> > > > > > > > > > Just to recap how it went the last time around: Kay kept pushing > > > > > his > > > > > piece of code into the tree, claiming that it was optional, that > > > > > nobody who doesn't like it has to enable it, so what's the > > > > > problem? > > > > > OK, in it went. And pretty soon udev (maintained by the same... > > > > > meticulously honorable person) had stopped working on the kernels > > > > > that didn't have that enabled. > > > > > > > > > > We had been there before. To paraphrase another... meticulously > > > > > honorable person, "if you didn't want something relied upon, why > > > > > have > > > > > you put it into the kernel?" Said person is on the record as > > > > > having > > > > > no problem whatsoever with adding dependencies to the bottom of > > > > > userland stack. > > > > > > > > It appears that, if kdbus is merged, upstream udev may end up > > > > requiring > > > > it: > > > > > > > > http://lists.freedesktop.org/archives/systemd-devel/2014-May/019657. > > > > html > > > > > > > > Grumble. > > > > > > Honestly, I think that tightly coupling systemd and udev to certain > > > kernel versions in lock step is crap. > > > > Where do you see that happening? > > > > > That you require some minimum version after some reasonable time, > > > sure. > > > But in lockstep? Seriously. > > > > Has that happened in the past? Look at the minimum requirements of > > systemd/udev today, something like the 3.7 kernel release, many years > > old. > > I refer to the linked mailing list post from Lennart as I quote here: > > > To make this clear, we expect that systemd and kernels are updated in > > lockstep. We explicitly do not support really old kernels with really > > (which means 3.4 right now), but even that should be taken with a grain > > of salt, as we already made clear that soon after kdbus is merged into > > the kernel we'll probably make a hard requirement on it from the systemd > > side. > > Thats plenty clear, isn´t it? 
As soond as kdbus is merged into kernel, > systemd will depend on it, and then… if I need to go back to older kernel, > I have to downgrade systemd as well? > > > > I certainly do not want a broken system just cause I have to load an > > > older kernel version for some reason. > > > > No one does. But, work with your distribution if you end up with > > something like this. Remember, the goal is that you can always run > > newer kernels on older userspace, as that is something that we kernel > > developers can enforce. Userspace programs have other requirements / > > communities, it's up to them to decide what their oldest kernel version > > they wish to support. Hint, even glibc makes these kinds of > > requirements, it's nothing new at all here, so why is this even an > > issue? > > Its no issue for me that systemd required kernel 3.7. But… what Lennart > announces above regarding kdbus reads quite differently. Adding features to the systemd repo, and then having those releases make it out to your distro is a multi-year timeframe normally, and multi-month at the least. If a distro made such a decision to not support old kernels by accepting such a userspace requirement, take it up with them. And there are forks of systemd that keep around older kernel support, and distros use them for this very reason. Because they want to use old kernel versions, and that's great. It's the same for any kernel feature, programs are free to use them if they want to. If glibc were to make the requirement tomorrow that they are going to use memfd for their internal use and require that everyone update their kernels for their new release, we would all laugh that that is pretty funny and their user base would suffer. But again, that's nothing that the kernel has any control over, take it up with that project if you object to that. Personally, I want people to use the new code/features I provide them in the kernel, and get upset when people don't. 
Otherwise, why would I have spent so much time creating them and supporting them in the first place? > > > And yes, I think its good not to force just about any userspace idea > > > into the kernel. > > > > Do you have any technical objections to the patch as proposed? > > If I had, I would have written it. I explained already that I see that > kernel developers have strong technical objections with kdbus. And that I > think it is important to acknowledge it, instead of telling them, that the > API is required from userspace, userspace people know what they do, and > they should just go away with their concerns. > > Thats at least how I received quite some of your responses. > > Well and I raised an eyebrow on the busname matching rules and the > capability stuff. Yet, I didn´t comment on it, cause I didn´t look at it > in-depth. I just ask you to take those seriously who did. I take technical comments very seriously, where have I not? If you have technical reasons why the current implementation has problems, please let me know, and I will be glad to address them. thanks, greg k-h ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 9:02 ` Greg Kroah-Hartman @ 2015-04-15 9:28 ` Martin Steigerwald 2015-04-15 11:52 ` Greg Kroah-Hartman 0 siblings, 1 reply; 333+ messages in thread From: Martin Steigerwald @ 2015-04-15 9:28 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Andy Lutomirski, Al Viro, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wednesday, 15 April 2015, 11:02:12, Greg Kroah-Hartman wrote: > > > > And yes, I think its good not to force just about any userspace > > > > idea > > > > into the kernel. > > > > > > > > > > > > Do you have any technical objections to the patch as proposed? > > > > > > > > If I had, I would have written it. I explained already that I see > > that > > kernel developers have strong technical objections with kdbus. And > > that I think it is important to acknowledge it, instead of telling > > them, that the API is required from userspace, userspace people know > > what they do, and they should just go away with their concerns. > > > > > > > > Thats at least how I received quite some of your responses. > > > > > > > > Well and I raised an eyebrow on the busname matching rules and the > > capability stuff. Yet, I didn´t comment on it, cause I didn´t look at > > it in-depth. I just ask you to take those seriously who did. > > I take technical comments very seriously, where have I not? If you have > technical reasons why the current implementation has problems, please > let me know, and I will be glad to address them. From what I read you basically answered all technical comments like in: The dbus API is like it is for a very good reason, everyone is using it and everyone agrees. Capabilities are used in userspace for good reason and so on. But I see, here, not everyone does. Most of your answers didn't seem to address the concerns raised of having this in the *kernel*.
Especially the security concerns. That's what I meant with "And yes, I think it's good not to force just about any userspace idea into the kernel". I think arguing with a "this is how userspace does it" pattern, even if it truly is for a very good reason, is not sufficient as an argument for having it in the kernel. I am just looking at the argumentative pattern here. If other kernel developers complain about how hard it is to review and wrap their minds around the kdbus patches… I am scared at just trying to understand the patches. So no technical complaints from me. I did not nack it, nor do I see myself in a position to nack it. So feel free to do with my argument what you like. I just tried to understand why the communication in here goes in circles as it does, and I think it will continue to do so as long as there is only the "userspace does it that way" argument or the "this is optional" argument. For the discussion to go anywhere, it's important to acknowledge each other. -- Martin 'Helios' Steigerwald - http://www.Lichtvoll.de GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7 ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 9:28 ` Martin Steigerwald @ 2015-04-15 11:52 ` Greg Kroah-Hartman 0 siblings, 0 replies; 333+ messages in thread From: Greg Kroah-Hartman @ 2015-04-15 11:52 UTC (permalink / raw) To: Martin Steigerwald Cc: Andy Lutomirski, Al Viro, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, Apr 15, 2015 at 11:28:36AM +0200, Martin Steigerwald wrote: > Am Mittwoch, 15. April 2015, 11:02:12 schrieb Greg Kroah-Hartman: > > > > > And yes, I think its good not to force just about any userspace > > > > > idea > > > > > into the kernel. > > > > > > > > > > > > > > > > Do you have any technical objections to the patch as proposed? > > > > > > > > > > > > If I had, I would have written it. I explained already that I see > > > that > > > kernel developers have strong technical objections with kdbus. And > > > that I think it is important to acknowledge it, instead of telling > > > them, that the API is required from userspace, userspace people know > > > what they do, and they should just go away with their concerns. > > > > > > > > > > > > Thats at least how I received quite some of your responses. > > > > > > > > > > > > Well and I raised an eyebrow on the busname matching rules and the > > > capability stuff. Yet, I didn´t comment on it, cause I didn´t look at > > > it in-depth. I just ask you to take those seriously who did. > > > > I take technical comments very seriously, where have I not? If you have > > technical reasons why the current implementation has problems, please > > let me know, and I will be glad to address them. > > >From what I read you basically answered all technical comments like in: > > The dbus API is like it is for a very good reason, everyone is using it > and everyone agrees. Capabilities are used in userspace for good reason > and so on. > > But I see, here, not everyone does. 
> > Most of your answers didn´t seem to address the concerns raised of having > this in the *kernel*. Especially the security concerns. I have responded to the security concerns, please don't say that I did not. > Thats what I meant with "And yes, I think its good not to force just about > any userspace into the kernel". I think arguing with this is how userspace > does it pattern, even if it truly is for a very good reason, is not > sufficient as argument for having it in the kernel. > > I am just looking at the argumentative pattern here. If other kernel > developers complain about how hard it is to review and wrap their mind > around the kdbus patches… I am scared at just trying to understand the > patches. So no technical complaints from me. I did not nack it nor do I > see myself in the position to nack it. Please take the time to read it, 13k lines isn't much. To not read the code and yet complain about the code is total nonsense. greg k-h ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 1:36 ` Andy Lutomirski 2015-04-15 6:54 ` Richard Weinberger 2015-04-15 8:18 ` Martin Steigerwald @ 2015-04-15 8:29 ` Greg Kroah-Hartman 2 siblings, 0 replies; 333+ messages in thread From: Greg Kroah-Hartman @ 2015-04-15 8:29 UTC (permalink / raw) To: Andy Lutomirski Cc: Al Viro, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Tue, Apr 14, 2015 at 06:36:28PM -0700, Andy Lutomirski wrote: > On Mon, Apr 13, 2015 at 1:22 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > On Mon, Apr 13, 2015 at 09:42:17PM +0200, Greg Kroah-Hartman wrote: > >> > I remain opposed to this half thought out trash of an ABI for the > >> > meta-data. > >> > >> You don't have to enable the metadata if you don't want to use it, it's > >> an option :) > > > > OK, _that_ argument needs to be stomped out. It had been used before, > > and it was a deliberate scam. There is no such thing as optional kernel > > interface, especially when udev/dbus/systemd crowd is nearby. We'd been > > through that excuse before; remember how devtmpfs was pushed in as "optional"? > > > > This is a huge red flag. On the level of "I need your account information > > to transfer $200M you might have inherited from my deceased client". > > > > Just to recap how it went the last time around: Kay kept pushing his piece of > > code into the tree, claiming that it was optional, that nobody who doesn't > > like it has to enable it, so what's the problem? OK, in it went. And pretty > > soon udev (maintained by the same... meticulously honorable person) had > > stopped working on the kernels that didn't have that enabled. > > > > We had been there before. To paraphrase another... meticulously honorable > > person, "if you didn't want something relied upon, why have you put it into the > > kernel?" 
Said person is on the record as having no problem whatsoever with > > adding dependencies to the bottom of userland stack. > > It appears that, if kdbus is merged, upstream udev may end up requiring it: > > http://lists.freedesktop.org/archives/systemd-devel/2014-May/019657.html Why would anyone propose a kernel api if they didn't actually plan to use it? Look at the first email in this thread, it shows the people/projects that want to use this. This is a crazy argument to try to make people, "stop using the feature that the kernel provides you!" greg k-h ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-13 19:29 ` Eric W. Biederman 2015-04-13 19:42 ` Greg Kroah-Hartman @ 2015-04-14 0:19 ` Eric W. Biederman 2015-04-14 0:34 ` Andy Lutomirski 2015-04-14 17:55 ` Greg Kroah-Hartman 2015-04-22 8:58 ` Borislav Petkov 2 siblings, 2 replies; 333+ messages in thread From: Eric W. Biederman @ 2015-04-14 0:19 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Linus Torvalds, Andrew Morton, Arnd Bergmann, gnomes, teg, jkosina, luto, linux-kernel, daniel, dh.herrmann, tixxdz ebiederm@xmission.com (Eric W. Biederman) writes: > Greg Kroah-Hartman <gregkh@linuxfoundation.org> writes: > >> The following changes since commit 9eccca0843205f87c00404b663188b88eb248051: >> >> Linux 4.0-rc3 (2015-03-08 16:09:09 -0700) >> >> are available in the git repository at: >> >> git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git/ tags/kdbus-4.1-rc1 >> >> for you to fetch changes up to 9fb9cd0f4434a23487b6ef3237e733afae90e336: >> >> kdbus: avoid the use of struct timespec (2015-04-10 14:34:53 +0200) >> >> ---------------------------------------------------------------- >> kdbus for 4.1-rc1 >> >> Here's the kdbus pull request for 4.1-rc1. >> >> It's been under development for many years now, and been in linux-next >> for many months, and has undergone loads of testing a review and even a few >> good arguments. It comes with full documentation and tests. > >> There has been a few complaints about the code, notably from people who >> don't like the use of metadata in the bus messages. That is actually >> one of the main features here, as we can get this data in a secure and >> reliable way, and it's something that userspace requires today. So >> while it does look "odd" to people who are not familiar with dbus, this >> is something that finally fixes a number of almost unfixable races in >> the current dbus implementations. > > And the code that transfers the meta-data is wrong. In fact it is worse than I thought. 
With a userspace application able to give meaning to any of the bits of
meta-data that are passed (capabilities, cgroup, security labels, etc.),
in the fullness of time dropping any of them will grant you more
permissions somewhere.

Which means that it becomes impossible to change anything. Impossible
to jail anything. It in fact becomes impossible to do anything right.

Which means the ultimate result of the direction kdbus is going is a
world where nothing can be done without introducing a security issue or
breaking userspace.

So as far as I can tell kdbus has a fundamental design flaw.

My apologies for being the bearer of bad news.

Eric

^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-14 0:19 ` Eric W. Biederman @ 2015-04-14 0:34 ` Andy Lutomirski 2015-04-14 17:55 ` Greg Kroah-Hartman 1 sibling, 0 replies; 333+ messages in thread From: Andy Lutomirski @ 2015-04-14 0:34 UTC (permalink / raw) To: Eric W. Biederman Cc: Greg Kroah-Hartman, Linus Torvalds, Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Mon, Apr 13, 2015 at 5:19 PM, Eric W. Biederman <ebiederm@xmission.com> wrote: > ebiederm@xmission.com (Eric W. Biederman) writes: > >> Greg Kroah-Hartman <gregkh@linuxfoundation.org> writes: >> >>> The following changes since commit 9eccca0843205f87c00404b663188b88eb248051: >>> >>> Linux 4.0-rc3 (2015-03-08 16:09:09 -0700) >>> >>> are available in the git repository at: >>> >>> git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git/ tags/kdbus-4.1-rc1 >>> >>> for you to fetch changes up to 9fb9cd0f4434a23487b6ef3237e733afae90e336: >>> >>> kdbus: avoid the use of struct timespec (2015-04-10 14:34:53 +0200) >>> >>> ---------------------------------------------------------------- >>> kdbus for 4.1-rc1 >>> >>> Here's the kdbus pull request for 4.1-rc1. >>> >>> It's been under development for many years now, and been in linux-next >>> for many months, and has undergone loads of testing a review and even a few >>> good arguments. It comes with full documentation and tests. >> >>> There has been a few complaints about the code, notably from people who >>> don't like the use of metadata in the bus messages. That is actually >>> one of the main features here, as we can get this data in a secure and >>> reliable way, and it's something that userspace requires today. So >>> while it does look "odd" to people who are not familiar with dbus, this >>> is something that finally fixes a number of almost unfixable races in >>> the current dbus implementations. >> >> And the code that transfers the meta-data is wrong. 
> > In fact it is worse than I thought. > > With an userspace application able to give meaning to any of the bits of > meta-data that are passed (capabilities, cgroup, security labels, etc) > that in the fullness of time dropping in them will grant you more > permissions somewhere. > > Which means that it becomes impossible to change anything. Impossible > to jail anything. It in fact becomes impossible to do anything right. > > Which means the ultimate result of the direction kdbus is going is a > world where nothing can be done without introducing a security issue or > breaking userspace. > > So as far as I can tell kdbus has a fundamental design flaw. > > My apologies for being the bearer of bad news. > I agree here. I cannot overstate the degree to which passing caps around through metadata is a bad idea. LSM labels are probably nearly as bad. Having LSM hooks in kdbus is one thing, but passing the *raw labels* around and letting userspace muck with them will cause the policy situation to be incomprehensible. User code should get simple yes/no answers from LSM policy, not raw data. --Andy ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-14 0:19 ` Eric W. Biederman 2015-04-14 0:34 ` Andy Lutomirski @ 2015-04-14 17:55 ` Greg Kroah-Hartman 1 sibling, 0 replies; 333+ messages in thread From: Greg Kroah-Hartman @ 2015-04-14 17:55 UTC (permalink / raw) To: Eric W. Biederman Cc: Linus Torvalds, Andrew Morton, Arnd Bergmann, gnomes, teg, jkosina, luto, linux-kernel, daniel, dh.herrmann, tixxdz On Mon, Apr 13, 2015 at 07:19:49PM -0500, Eric W. Biederman wrote: > ebiederm@xmission.com (Eric W. Biederman) writes: > > > Greg Kroah-Hartman <gregkh@linuxfoundation.org> writes: > > > >> The following changes since commit 9eccca0843205f87c00404b663188b88eb248051: > >> > >> Linux 4.0-rc3 (2015-03-08 16:09:09 -0700) > >> > >> are available in the git repository at: > >> > >> git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git/ tags/kdbus-4.1-rc1 > >> > >> for you to fetch changes up to 9fb9cd0f4434a23487b6ef3237e733afae90e336: > >> > >> kdbus: avoid the use of struct timespec (2015-04-10 14:34:53 +0200) > >> > >> ---------------------------------------------------------------- > >> kdbus for 4.1-rc1 > >> > >> Here's the kdbus pull request for 4.1-rc1. > >> > >> It's been under development for many years now, and been in linux-next > >> for many months, and has undergone loads of testing a review and even a few > >> good arguments. It comes with full documentation and tests. > > > >> There has been a few complaints about the code, notably from people who > >> don't like the use of metadata in the bus messages. That is actually > >> one of the main features here, as we can get this data in a secure and > >> reliable way, and it's something that userspace requires today. So > >> while it does look "odd" to people who are not familiar with dbus, this > >> is something that finally fixes a number of almost unfixable races in > >> the current dbus implementations. > > > > And the code that transfers the meta-data is wrong. > > In fact it is worse than I thought. 
Please see the email response I just wrote to Andy about this; it should
address these misconceptions.

thanks,

greg k-h

^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-13 19:29 ` Eric W. Biederman
2015-04-13 19:42 ` Greg Kroah-Hartman
2015-04-14 0:19 ` Eric W. Biederman
@ 2015-04-22 8:58 ` Borislav Petkov
2015-04-23 19:14 ` Greg Kroah-Hartman
2 siblings, 1 reply; 333+ messages in thread
From: Borislav Petkov @ 2015-04-22 8:58 UTC (permalink / raw)
To: Eric W. Biederman, Greg Kroah-Hartman
Cc: Linus Torvalds, Andrew Morton, Arnd Bergmann, gnomes, teg, jkosina,
luto, linux-kernel, daniel, dh.herrmann, tixxdz

On Mon, Apr 13, 2015 at 02:29:35PM -0500, Eric W. Biederman wrote:
> And the code that transfers the meta-data is wrong.
>
> It is generally not something that userspace requires today, certainly
> userspace is not using it.
>
> You are exporting a weird set of information in a unique way that makes
> it race free enough to make ``security'' decisions upon but the data
> in general is not appropriate to make those decisions.
>
> I remain opposed to this half thought out trash of an ABI for the
> meta-data.
>
> Just because something happens to be exported in a DEBUG api today does
> not make it appropriate for userspace to run around making security
> decisions with that information.
>
> Nacked-by: "Eric W. Biederman" <ebiederm@xmission.com>
>
> I think it is premature to be merging kdbus. You have fundamental
> issues that can not be fixed once the ABI is frozen.
>
> The semantics of the meta-data you export are extremely poorly defined.

Not only that - it looks like a serious amount of work on each sent
packet. So I did some staring, correct me if I missed something:

kdbus_cmd_send - KDBUS_CMD_SEND, ioctl cmd, copy stuff from userspace
 |-> kdbus_kmsg_new_from_cmd(), kmalloc+memset + prepare a *lot* of stuff like:
     |-> m->proc_meta = kdbus_meta_proc_new();
         m->conn_meta = kdbus_meta_conn_new();
         ...
 |-> kdbus_bus_broadcast(conn->ep->bus, conn, kmsg);    let's look at the broadcast mode
     |-> hash_for_each(bus->conn_hash, i, conn_dst, hentry) {    iterate over hash buckets, O(256)
         |-> kdbus_meta_proc_collect(kmsg->proc_meta, attach_flags);    collect a *lot* of stuff from current etc
         |-> kdbus_meta_conn_collect(kmsg->conn_meta, kmsg, conn_src, attach_flags);    collect more stuff

and this happens on *every* send. A *lot* of work.

Now multiply that by the amount of messages this thing is going to send
per second. It piles up. So you have the overhead right then and there
in the design without even being able to fix it. Or at least pretty damn
hard to fix.

So unless I'm missing something, this right there is a design problem.

Why can't this messaging be done with a nifty O(1) scheme like sending
parties issuing auth tokens and whatever and the kernel doing the
arbitration and distribution of those tokens?

That gets you sandboxing, dropping privileges and whatever else fancy
containers people wanna do for free. Token recipient has the token -
that's all that counts.

Again, this is from a short staring only, I might just as well be
missing something but you'll tell me :-)

Thanks.

--
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-22 8:58 ` Borislav Petkov @ 2015-04-23 19:14 ` Greg Kroah-Hartman 2015-04-23 20:56 ` Borislav Petkov 0 siblings, 1 reply; 333+ messages in thread From: Greg Kroah-Hartman @ 2015-04-23 19:14 UTC (permalink / raw) To: Borislav Petkov Cc: Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, gnomes, teg, jkosina, luto, linux-kernel, daniel, dh.herrmann, tixxdz On Wed, Apr 22, 2015 at 10:58:28AM +0200, Borislav Petkov wrote: > On Mon, Apr 13, 2015 at 02:29:35PM -0500, Eric W. Biederman wrote: > > And the code that transfers the meta-data is wrong. > > > > It is generally not something that userspace requires today, certainly > > userspace is not using it. > > > > You are exporting a weird set of information in a unique way that makes > > it race free enough to make ``security'' decisions upon but the data > > in general is not appropriate to make those decisions. > > > > I remain opposed to this half thought out trash of an ABI for the > > meta-data. > > > > Just because something happens to be exported in a DEBUG api today does > > not make it appropriate for userspace to run around making security > > decisions with that information. > > > > Nacked-by: "Eric W. Biederman" <ebiederm@xmission.com> > > > > I think it is premature to be merging kdbus. You have fuddamental > > issues that can not be fixed once the ABI is frozen. > > > > The semantics of the meta-data you export are extremely poorly defined. > > Not only that - it looks like a serious amount of work on each sent > packet. So I did some staring, correct me if I missed something: > > kdbus_cmd_send - KDBUS_CMD_SEND, ioctl cmd, copy stuff from userspace > |-> kdbus_kmsg_new_from_cmd(), kmalloc+memset + prepare a *lot* of stuff like: > |-> m->proc_meta = kdbus_meta_proc_new(); > m->conn_meta = kdbus_meta_conn_new(); > ... 
>  |-> kdbus_bus_broadcast(conn->ep->bus, conn, kmsg);    let's look at the broadcast mode
>      |-> hash_for_each(bus->conn_hash, i, conn_dst, hentry) {    iterate over hash buckets, O(256)

I don't know what O(256) means here, O notation usually is used to
show the complexity of a function, so this really is almost always the
same amount of time, based on using the hash function.

I've never seen a number in O() before, but I went to school a long time
ago, and probably forgot something... Or am I misunderstanding your note
here?

>          |-> kdbus_meta_proc_collect(kmsg->proc_meta, attach_flags);    collect a *lot* of stuff from current etc
>          |-> kdbus_meta_conn_collect(kmsg->conn_meta, kmsg, conn_src, attach_flags);    collect more stuff
>
> and this happens on *every* send. A *lot* of work.

Yes, this looks like a lot of stuff but it's still really fast.

And we need it.

> Now multiply that by the amount of messages this thing is going to send
> per second. It piles up. So you have the overhead right then and there
> in the design without even being able to fix it. Or at least pretty damn
> hard to fix.

It's way faster than what we have today, and David has found a few areas
that can go faster, so I don't really understand the objection. If you
can come up with a faster way to do this, that would be great and most
appreciated.

> So unless I'm missing something, this right there is a design problem.
>
> Why can't this messaging be done with a nifty O(1) scheme like sending
> parties issuing auth tokens and whatever and the kernel doing the
> arbitration and distribution of those tokens?

Hm, this seems to me to be O(1), pretty constant, we do the same amount
of work all the time. Then we send the message to the people listening
to it (so that is O(n) depending on the number of listeners, really the
best that I think you can get). Or am I misunderstanding what you are
asking for here?
> That gets you sandboxing, dropping privileges and whatever else fancy > containers people wanna do for free. Token recipient has the token - > that's all that counts. I don't understand what a token provides that is different from what is happening here, please explain. How can that be faster than what we do today? confused, greg k-h ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-23 19:14 ` Greg Kroah-Hartman @ 2015-04-23 20:56 ` Borislav Petkov 2015-04-23 21:22 ` David Herrmann 2015-04-24 6:36 ` Greg Kroah-Hartman 0 siblings, 2 replies; 333+ messages in thread From: Borislav Petkov @ 2015-04-23 20:56 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, gnomes, teg, jkosina, luto, linux-kernel, daniel, dh.herrmann, tixxdz On Thu, Apr 23, 2015 at 09:14:33PM +0200, Greg Kroah-Hartman wrote: > I don't know what O(256) means here, O notation usually is used to > show the complexity of a function, so this really is almost always the > same amount of time, based on using the hash function. This is iterating over 256 hash buckets. So O(n) complexity. Better? > Yes, these looks like a lot of stuff but it's still really fast. "really fast" - that's the right way to quantify things, right? Let me reply in your terms: "no, it is dumb and slow". > And we need it. *Of* *course* you need it, what else. Lemme guess: there's no other way to do this than the way it was done now, right? And we should stop asking such stupid questions and accept it... Yeah, of course. > Hm, this seems to be to be O(1), pretty constant, we do the same amount > of work all the time. The same *pile* of unnecessary and needless work. You go and collect *all* that data on *every* packet send?! How many packets per second are we talking here? 100, 1000, 10000...? Let's say you're "really fast" because you've bought a "bigger machine" and do that information collection per packet for, say 10 microseconds (I'm probably too generous here but whatever). So at peak rates of 10000 packets per second, and 10µs preparation time per packet, you're wasting 100000 µs == 100 msec, i.e. 1/10th of a second you're busy only with sending packets. Hmm, but then the receiving side needs CPU time too... Oh yeah, and then those pesky userspace processes need some CPU time too... 
Are you really serious or is this some tactic of deliberately asking
dumb questions? Let me know now so that I can stop wasting my time.

> I don't understand what a token provides that is different from what is
> happening here, please explain. How can that be faster than what we do
> today?

A token-based scheme would give you significantly less traffic;
distributing those in sandboxing, containers, etc for free and you can
throw the metadata collecting in the garbage can:

Example:

* A daemon issues a token, say, a capability to reboot.

* It gives that token (with the kernel as intermediary) to a recipient
  which should be allowed to reboot.

* recipient can drop privileges, run in a sandbox, whatever, it still
  has that token.

That's exactly one packet sent *without* any information collection.
Recipient has to authenticate itself to the kernel when requesting the
packet. Clean and simple.

--
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply [flat|nested] 333+ messages in thread
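The back-of-the-envelope estimate in the message above can be written out
as a small C helper. This is only an illustrative sketch: the function
names are made up here and are not kdbus code, and the 10 microseconds
per packet is the guess from the mail, not a measurement.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not kdbus code: CPU time spent on per-packet
 * preparation during one second of wall-clock time, in microseconds.
 */
static uint64_t send_overhead_us(uint64_t packets_per_sec, uint64_t us_per_packet)
{
	return packets_per_sec * us_per_packet;
}

/* The same figure as a percentage of one CPU (1 s == 1000000 us). */
static unsigned int send_overhead_percent(uint64_t packets_per_sec, uint64_t us_per_packet)
{
	return (unsigned int)(send_overhead_us(packets_per_sec, us_per_packet) * 100 / 1000000);
}
```

With the numbers from the mail, send_overhead_us(10000, 10) is 100000
microseconds, i.e. 100 ms, and send_overhead_percent(10000, 10) is 10.
Whether 10 microseconds is anywhere near the real per-packet cost is
exactly the open question in the thread.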
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-23 20:56 ` Borislav Petkov @ 2015-04-23 21:22 ` David Herrmann 2015-04-23 21:33 ` Richard Weinberger 2015-04-23 21:41 ` Borislav Petkov 2015-04-24 6:36 ` Greg Kroah-Hartman 1 sibling, 2 replies; 333+ messages in thread From: David Herrmann @ 2015-04-23 21:22 UTC (permalink / raw) To: Borislav Petkov Cc: Greg Kroah-Hartman, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, Andy Lutomirski, linux-kernel, Daniel Mack, Djalal Harouni Hi On Thu, Apr 23, 2015 at 10:56 PM, Borislav Petkov <bp@alien8.de> wrote: > On Thu, Apr 23, 2015 at 09:14:33PM +0200, Greg Kroah-Hartman wrote: >> I don't know what O(256) means here, O notation usually is used to >> show the complexity of a function, so this really is almost always the >> same amount of time, based on using the hash function. > > This is iterating over 256 hash buckets. So O(n) complexity. Better? No it's not. O(256) equals O(1). Thanks David ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-23 21:22 ` David Herrmann @ 2015-04-23 21:33 ` Richard Weinberger 2015-04-24 14:02 ` Steven Rostedt 2015-04-23 21:41 ` Borislav Petkov 1 sibling, 1 reply; 333+ messages in thread From: Richard Weinberger @ 2015-04-23 21:33 UTC (permalink / raw) To: David Herrmann Cc: Borislav Petkov, Greg Kroah-Hartman, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, Andy Lutomirski, linux-kernel, Daniel Mack, Djalal Harouni On Thu, Apr 23, 2015 at 11:22 PM, David Herrmann <dh.herrmann@gmail.com> wrote: > Hi > > On Thu, Apr 23, 2015 at 10:56 PM, Borislav Petkov <bp@alien8.de> wrote: >> On Thu, Apr 23, 2015 at 09:14:33PM +0200, Greg Kroah-Hartman wrote: >>> I don't know what O(256) means here, O notation usually is used to >>> show the complexity of a function, so this really is almost always the >>> same amount of time, based on using the hash function. >> >> This is iterating over 256 hash buckets. So O(n) complexity. Better? > > No it's not. O(256) equals O(1). Yeah, that's absolutely correct. I think Boris wanted to say that iterating over all hash buckets can be costly. -- Thanks, //richard ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-23 21:33 ` Richard Weinberger
@ 2015-04-24 14:02 ` Steven Rostedt
0 siblings, 0 replies; 333+ messages in thread
From: Steven Rostedt @ 2015-04-24 14:02 UTC (permalink / raw)
To: Richard Weinberger
Cc: David Herrmann, Borislav Petkov, Greg Kroah-Hartman,
Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann,
One Thousand Gnomes, Tom Gundersen, Jiri Kosina, Andy Lutomirski,
linux-kernel, Daniel Mack, Djalal Harouni

On Thu, Apr 23, 2015 at 11:33:19PM +0200, Richard Weinberger wrote:
> > No it's not. O(256) equals O(1).
>
> Yeah, that's absolutely correct.
> I think Boris wanted to say that iterating over all hash buckets
> can be costly.

You are thinking of 'k' (the constant), where you usually have k*O(1),
where k does matter when comparing two algorithms with the same Big O
value. And sometimes even different O() values if the 'n' is small
enough.

100*O(1) vs 1*O(n), the latter is better if n < 100.

Something that runs at O(n) but takes 1ms per n is a much worse
algorithm than something that runs at O(n) and takes 1us per n. Both
have the same O() notation, but which algorithm you use is obvious.

But Greg is right, your O notation isn't applicable here.

-- Steve

^ permalink raw reply [flat|nested] 333+ messages in thread
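The point about constant factors can be sketched as a tiny cost model in
C. The names below are hypothetical and the "cost units" are abstract;
this only restates the 100*O(1) vs 1*O(n) comparison from the mail, it
does not measure anything.

```c
#include <stdbool.h>
#include <stdint.h>

/* A fixed-cost algorithm pays k_const units regardless of n. */
static uint64_t cost_constant(uint64_t k_const)
{
	return k_const;
}

/* A linear algorithm pays k_lin units for each of the n elements. */
static uint64_t cost_linear(uint64_t k_lin, uint64_t n)
{
	return k_lin * n;
}

/* True when the O(n) variant is strictly cheaper than the O(1) variant. */
static bool linear_is_cheaper(uint64_t k_const, uint64_t k_lin, uint64_t n)
{
	return cost_linear(k_lin, n) < cost_constant(k_const);
}
```

For k_const = 100 and k_lin = 1, linear_is_cheaper() is true for n = 99
and false for n = 100, matching "the latter is better if n < 100". The
1ms-vs-1us example is the same model with two different k_lin values.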
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-23 21:22 ` David Herrmann 2015-04-23 21:33 ` Richard Weinberger @ 2015-04-23 21:41 ` Borislav Petkov 2015-04-24 5:02 ` Steven Noonan 1 sibling, 1 reply; 333+ messages in thread From: Borislav Petkov @ 2015-04-23 21:41 UTC (permalink / raw) To: David Herrmann Cc: Greg Kroah-Hartman, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, Andy Lutomirski, linux-kernel, Daniel Mack, Djalal Harouni On Thu, Apr 23, 2015 at 11:22:39PM +0200, David Herrmann wrote: > No it's not. O(256) equals O(1). Ok, you're right. Maybe O() was not the right thing to use when trying to point out that iterating over 256 hash buckets and then following the chain in each bucket per packet broadcast looks like a lot. -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. -- ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-23 21:41 ` Borislav Petkov @ 2015-04-24 5:02 ` Steven Noonan 2015-04-24 9:04 ` Borislav Petkov 0 siblings, 1 reply; 333+ messages in thread From: Steven Noonan @ 2015-04-24 5:02 UTC (permalink / raw) To: Borislav Petkov Cc: David Herrmann, Greg Kroah-Hartman, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, Andy Lutomirski, linux-kernel, Daniel Mack, Djalal Harouni On Thu, Apr 23, 2015 at 2:41 PM, Borislav Petkov <bp@alien8.de> wrote: > On Thu, Apr 23, 2015 at 11:22:39PM +0200, David Herrmann wrote: >> No it's not. O(256) equals O(1). > > Ok, you're right. Maybe O() was not the right thing to use when trying > to point out that iterating over 256 hash buckets and then following the > chain in each bucket per packet broadcast looks like a lot. > Heh. I guess you could call it an "expensive O(1)". While big-O notation is useful for describing algorithm scalability with respect to input size, it falls flat on its face when trying to articulate impact in measurable units. ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-24 5:02 ` Steven Noonan @ 2015-04-24 9:04 ` Borislav Petkov 2015-04-24 10:28 ` Daniel Mack 0 siblings, 1 reply; 333+ messages in thread From: Borislav Petkov @ 2015-04-24 9:04 UTC (permalink / raw) To: Steven Noonan Cc: David Herrmann, Greg Kroah-Hartman, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, Andy Lutomirski, linux-kernel, Daniel Mack, Djalal Harouni On Thu, Apr 23, 2015 at 10:02:52PM -0700, Steven Noonan wrote: > On Thu, Apr 23, 2015 at 2:41 PM, Borislav Petkov <bp@alien8.de> wrote: > > On Thu, Apr 23, 2015 at 11:22:39PM +0200, David Herrmann wrote: > >> No it's not. O(256) equals O(1). > > > > Ok, you're right. Maybe O() was not the right thing to use when trying > > to point out that iterating over 256 hash buckets and then following the > > chain in each bucket per packet broadcast looks like a lot. > > > > Heh. I guess you could call it an "expensive O(1)". While big-O > notation is useful for describing algorithm scalability with respect > to input size, it falls flat on its face when trying to articulate > impact in measurable units. Right, so in thinking about this more today, on a fresh head, it still is O(n) because we do broadcast the packet to n recipients - the hash_for_each() thing iterates over 256 hash buckets and also follows the linked list chain in each bucket. Its length is depending on how many connections are in the bucket, i.e. recipients. And I'd guess that number changes dynamically so probably linear. And then there's the collection of, let's call it metadata of questionable use, *per* packet which is pretty expensive in my book. It becomes even more expensive if it is completely useless as in, the receiving side doesn't need it all. Now, one might argue that you have to do O(n) work when broadcasting to n recipients anyway and you can't get that cheaper but maybe the design is not optimal. 
Maybe it could be made to not broadcast at all, or broadcast to a
subset of recipients, only those which are actually interested in the
broadcast.

That's why I was looking at some simple token-based schemes. And that's
why I think Andy has some very cool ideas which we should definitely pay
attention to:

https://lkml.kernel.org/r/CALCETrXXUiYKAhsXsdqH2uZMddDhK5hX6V9%2BrZcHwa1X5WC%2B1g@mail.gmail.com

before we go and commit this thing and cast it in stone. Because if it
goes in, there's no changing it because we'll then be breaking userspace
and that's a no-no.

--
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply [flat|nested] 333+ messages in thread
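The cost being argued about above can be made concrete with a small
userspace model in C. This is a sketch only: struct conn, struct bus and
the helper names are invented for illustration and are not the kdbus
data structures; the only thing taken from the thread is the table of
256 buckets with a chain of connections in each. A single-ID lookup is
included for contrast with the full walk.

```c
#include <stddef.h>

#define NBUCKETS 256	/* the 256 hash buckets mentioned in the thread */

/* Hypothetical stand-in for a connection; not the kdbus structure. */
struct conn {
	unsigned long id;
	struct conn *next;	/* chain within one bucket */
};

struct bus {
	struct conn *buckets[NBUCKETS];
};

/* Insert a connection at the head of its bucket chain. */
static void bus_add(struct bus *bus, struct conn *c)
{
	struct conn **head = &bus->buckets[c->id % NBUCKETS];

	c->next = *head;
	*head = c;
}

/* Lookup by ID: only one bucket chain is searched. */
static struct conn *bus_find(struct bus *bus, unsigned long id, size_t *steps)
{
	struct conn *c;

	for (c = bus->buckets[id % NBUCKETS]; c; c = c->next) {
		(*steps)++;
		if (c->id == id)
			return c;
	}
	return NULL;
}

/*
 * Full walk: every bucket is inspected and every chain is followed, so
 * the number of steps is NBUCKETS + n for n connections.
 */
static size_t bus_walk_all(const struct bus *bus, size_t *visited)
{
	size_t steps = 0;
	size_t i;

	*visited = 0;
	for (i = 0; i < NBUCKETS; i++) {
		const struct conn *c;

		steps++;			/* look at the bucket head */
		for (c = bus->buckets[i]; c; c = c->next) {
			steps++;		/* follow the chain */
			(*visited)++;
		}
	}
	return steps;
}
```

In this model the full walk costs a fixed 256 bucket checks plus one
step per connection, which is the "O(n) with a constant on top" reading
of the discussion. It says nothing about how expensive each step is in
the real code.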
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-24 9:04 ` Borislav Petkov @ 2015-04-24 10:28 ` Daniel Mack 2015-04-24 10:50 ` Borislav Petkov 0 siblings, 1 reply; 333+ messages in thread From: Daniel Mack @ 2015-04-24 10:28 UTC (permalink / raw) To: Borislav Petkov, Steven Noonan Cc: David Herrmann, Greg Kroah-Hartman, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, Andy Lutomirski, linux-kernel, Djalal Harouni Hi, On 04/24/2015 11:04 AM, Borislav Petkov wrote: > On Thu, Apr 23, 2015 at 10:02:52PM -0700, Steven Noonan wrote: >> On Thu, Apr 23, 2015 at 2:41 PM, Borislav Petkov <bp@alien8.de> wrote: >>> On Thu, Apr 23, 2015 at 11:22:39PM +0200, David Herrmann wrote: >>>> No it's not. O(256) equals O(1). >>> >>> Ok, you're right. Maybe O() was not the right thing to use when trying >>> to point out that iterating over 256 hash buckets and then following the >>> chain in each bucket per packet broadcast looks like a lot. >>> >> >> Heh. I guess you could call it an "expensive O(1)". While big-O >> notation is useful for describing algorithm scalability with respect >> to input size, it falls flat on its face when trying to articulate >> impact in measurable units. > > Right, so in thinking about this more today, on a fresh head, it still > is O(n) because we do broadcast the packet to n recipients - the > hash_for_each() thing iterates over 256 hash buckets and also follows > the linked list chain in each bucket. Its length is depending on how > many connections are in the bucket, i.e. recipients. And I'd guess that > number changes dynamically so probably linear. Sure, for broadcasts, we have to walk the list of peers connected to the bus and see which one is interested in a particular message. We do that by looking at the match rules of each of them, which are based on well-known names, IDs, notification types or bloom filters. 
The policy logic limits this further, as receivers of a broadcast must have TALK access to the sender. If these rules let a message pass, all the metadata that the receiving peer asked for (by setting a flag at connect time) is collected, unless it has been collected already for some other peer for the same message. In other words, in worst case, we collect all the metadata items exactly once per message. If none of the connections with permissive match/policy rules for a message is interested in any metadata items, nothing will be collected at all. The reason why the peers are organized in a hash table is that we have to look them up by ID for unicast messages. > And then there's the collection of, let's call it metadata of > questionable use, *per* packet which is pretty expensive in my book. > It becomes even more expensive if it is completely useless as in, the > receiving side doesn't need it all. If the receiving side doesn't need it, it shouldn't opt-in for that piece of information. The metadata logic is really only there so receiving peers are directly supplied with information that they would otherwise look up themselves from /proc or something. Also, we collect metadata at send time and for every message intentionally, so that it reflects the state of the sender at the time of sending. This way, the information is not subject to races of asynchronous lookups. > Now, one might argue that you have to do O(n) work when broadcasting > to n recipients anyway and you can't get that cheaper but maybe the > design is not optimal. Maybe it could be made to not broadcast at all, > or broadcast to a subset of recipients, only those which are actually > interested in the broadcast. That's exactly what happens :) There are some more details on this in kdbus.match(7). Thanks, Daniel ^ permalink raw reply [flat|nested] 333+ messages in thread
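The "collected at most once per message, and only if somebody asked for
it" behaviour described above can be sketched in C as follows. The flag
names, struct msg_meta and both functions are hypothetical and exist only
for this illustration; the work counter stands in for the real lookups,
which are not modelled here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical metadata item flags; the names are illustrative only. */
#define META_CREDS	(1u << 0)
#define META_CGROUP	(1u << 1)
#define META_CAPS	(1u << 2)
#define META_ALL	(META_CREDS | META_CGROUP | META_CAPS)

struct msg_meta {
	uint32_t collected;	/* items already gathered for this message */
	unsigned int work;	/* how many items were actually gathered */
};

/* Gather only those items in 'what' that are still missing. */
static void meta_collect(struct msg_meta *m, uint32_t what)
{
	uint32_t bit;

	what &= META_ALL & ~m->collected;
	if (!what)
		return;		/* nothing new requested, no work done */

	for (bit = 1; bit <= META_CAPS; bit <<= 1) {
		if (what & bit)
			m->work++;	/* stands in for the real lookup */
	}
	m->collected |= what;
}

/* Deliver one message to n receivers, each with its own opt-in mask. */
static void broadcast(struct msg_meta *m, const uint32_t *wanted, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		meta_collect(m, wanted[i]);
}
```

With four receivers asking for CREDS, CREDS|CAPS, nothing, and CAPS, the
counter ends at 2: each requested item is gathered once, and CGROUP is
never touched. If no receiver opts in to anything, the counter stays 0.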
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-24 10:28 ` Daniel Mack @ 2015-04-24 10:50 ` Borislav Petkov 2015-04-24 11:26 ` Daniel Mack 0 siblings, 1 reply; 333+ messages in thread From: Borislav Petkov @ 2015-04-24 10:50 UTC (permalink / raw) To: Daniel Mack Cc: Steven Noonan, David Herrmann, Greg Kroah-Hartman, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, Andy Lutomirski, linux-kernel, Djalal Harouni Hi, On Fri, Apr 24, 2015 at 12:28:54PM +0200, Daniel Mack wrote: > Sure, for broadcasts, we have to walk the list of peers connected to the > bus and see which one is interested in a particular message. We do that And this "... we have to walk the list ..." right there raises the alarm. Can this walking of elements where you know they wouldn't match be avoided? > by looking at the match rules of each of them, which are based on > well-known names, IDs, notification types or bloom filters. The policy > logic limits this further, as receivers of a broadcast must have TALK > access to the sender. So it sounds to me like there are characteristics which can already prepare lists of recipients interested in some sort of message. So would it be possible for recipients to "register" for such messages and the sending side would simply iterate a list of solely interested recipients? This will definitely save you the iteration over all n connections and would make the metadata collection probably not needed (or at least a subset of it) because recipients will have to establish eligibility for receiving a certain message at register time and once they're on the list, you implicitly know why they're there. I don't know whether that fits all use cases but it definitely does only the *necessary* work for message transfer and not more. 
> If these rules let a message pass, all the metadata that the receiving > peer asked for (by setting a flag at connect time) is collected, unless > it has been collected already for some other peer for the same message. > In other words, in worst case, we collect all the metadata items exactly > once per message. Right. > If none of the connections with permissive match/policy rules for a > message is interested in any metadata items, nothing will be collected > at all. But we still iterate through there and look at the arg @what and ->collected. And this is useless work which can be avoided IMHO. > If the receiving side doesn't need it, it shouldn't opt-in for that > piece of information. > > The metadata logic is really only there so receiving peers are directly > supplied with information that they would otherwise look up themselves > from /proc or something. Also, we collect metadata at send time and for > every message intentionally, so that it reflects the state of the sender > at the time of sending. This way, the information is not subject to > races of asynchronous lookups. Ok. > > Now, one might argue that you have to do O(n) work when broadcasting > > to n recipients anyway and you can't get that cheaper but maybe the > > design is not optimal. Maybe it could be made to not broadcast at all, > > or broadcast to a subset of recipients, only those which are actually > > interested in the broadcast. > > That's exactly what happens :) There are some more details on this in > kdbus.match(7). But this is not for KDBUS_DST_ID_BROADCAST types, right? Because there you have to iterate over *all* recipients in the connection hash. -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. -- ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-24 10:50 ` Borislav Petkov @ 2015-04-24 11:26 ` Daniel Mack 0 siblings, 0 replies; 333+ messages in thread From: Daniel Mack @ 2015-04-24 11:26 UTC (permalink / raw) To: Borislav Petkov Cc: Steven Noonan, David Herrmann, Greg Kroah-Hartman, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, Andy Lutomirski, linux-kernel, Djalal Harouni Hi, On 04/24/2015 12:50 PM, Borislav Petkov wrote: > On Fri, Apr 24, 2015 at 12:28:54PM +0200, Daniel Mack wrote: >> Sure, for broadcasts, we have to walk the list of peers connected to the >> bus and see which one is interested in a particular message. We do that > > And this "... we have to walk the list ..." right there raises the > alarm. Can this walking of elements where you know they wouldn't match > be avoided? Yes, see below. >> by looking at the match rules of each of them, which are based on >> well-known names, IDs, notification types or bloom filters. The policy >> logic limits this further, as receivers of a broadcast must have TALK >> access to the sender. > > So it sounds to me like there are characteristics which can already > prepare lists of recipients interested in some sort of message. So > would it be possible for recipients to "register" for such messages > and the sending side would simply iterate a list of solely interested > recipients? > > This will definitely save you the iteration over all n connections and > would make the metadata collection probably not needed (or at least a > subset of it) because recipients will have to establish eligibility for > receiving a certain message at register time and once they're on the > list, you implicitly know why they're there. David is working on patches that store hashes of the matches in trees so we can look them up more efficiently. We'd still need to check the bloom filter for all remaining candidates though. 
These are, however, implementation details which potentially make the
code harder to read. We are well aware of certain spots that can be made
more efficient, but we were hoping for more reviews by keeping the
implementation simple for now.

>> If none of the connections with permissive match/policy rules for a
>> message is interested in any metadata items, nothing will be collected
>> at all.
>
> But we still iterate through there and look at the arg @what and
> ->collected. And this is useless work which can be avoided IMHO.

Not sure if it really matters, but we can probably add an early bail
there, yes. Something like

	what &= ~mp->collected;
	if (!what)
		return;

Noted down, thanks!

>>> Now, one might argue that you have to do O(n) work when broadcasting
>>> to n recipients anyway and you can't get that cheaper but maybe the
>>> design is not optimal. Maybe it could be made to not broadcast at all,
>>> or broadcast to a subset of recipients, only those which are actually
>>> interested in the broadcast.
>>
>> That's exactly what happens :) There are some more details on this in
>> kdbus.match(7).
>
> But this is not for KDBUS_DST_ID_BROADCAST types, right?

Yes it is - all broadcast messages are subject to opt-in filters
installed by the receiving peer.


Thanks,
Daniel
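The early bail suggested in the message above, together with the "collect each item at most once per message" behaviour described earlier in the thread, can be sketched as a small self-contained C model. Only the mask-and-return idiom comes from the thread; `struct meta`, the `ATTACH_*` flags, the `gathered` counter and `meta_collect()` are hypothetical stand-ins, not the kdbus definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attach flags; the real kdbus flag names and values differ. */
#define ATTACH_CREDS   (UINT64_C(1) << 0)
#define ATTACH_CMDLINE (UINT64_C(1) << 1)
#define ATTACH_CAPS    (UINT64_C(1) << 2)

struct meta {
	uint64_t collected;	/* items already gathered for this message */
	unsigned int gathered;	/* counts the (expensive) gather steps done */
};

/* Gather the items in `what` that have not been gathered yet. */
static void meta_collect(struct meta *mp, uint64_t what)
{
	assert(mp != NULL);

	/* Drop everything an earlier receiver already made us gather. */
	what &= ~mp->collected;
	if (!what)
		return;		/* early bail: nothing new was requested */

	/* Stand-ins for the real per-item collection work. */
	if (what & ATTACH_CREDS)
		mp->gathered++;
	if (what & ATTACH_CMDLINE)
		mp->gathered++;
	if (what & ATTACH_CAPS)
		mp->gathered++;

	mp->collected |= what;
}
```

Calling `meta_collect()` once per receiver with that receiver's requested flags gathers each item at most once for the message: with three receivers asking for creds and cmdline, cmdline and caps, and creds alone, the counter ends at 3 and the third call returns at the early bail.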
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-23 20:56 ` Borislav Petkov 2015-04-23 21:22 ` David Herrmann @ 2015-04-24 6:36 ` Greg Kroah-Hartman 2015-04-24 6:45 ` Greg Kroah-Hartman 1 sibling, 1 reply; 333+ messages in thread From: Greg Kroah-Hartman @ 2015-04-24 6:36 UTC (permalink / raw) To: Borislav Petkov Cc: Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, gnomes, teg, jkosina, luto, linux-kernel, daniel, dh.herrmann, tixxdz On Thu, Apr 23, 2015 at 10:56:40PM +0200, Borislav Petkov wrote: > > Hm, this seems to be to be O(1), pretty constant, we do the same amount > > of work all the time. > > The same *pile* of unnecessary and needless work. You go and collect > *all* that data on *every* packet send?! No, not at all, the metadata is cached, we only collect that for the first message sent, if we didn't know it already, or we do it on the "open" of the connection, depending on what we are gathering metadata for. The mc->collected test right before collecting the specific metadata is that "cached or not" test. greg k-h ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-24 6:36 ` Greg Kroah-Hartman @ 2015-04-24 6:45 ` Greg Kroah-Hartman 2015-04-24 7:27 ` Martin Steigerwald 2015-04-24 8:35 ` Greg Kroah-Hartman 0 siblings, 2 replies; 333+ messages in thread From: Greg Kroah-Hartman @ 2015-04-24 6:45 UTC (permalink / raw) To: Borislav Petkov Cc: Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, gnomes, teg, jkosina, luto, linux-kernel, daniel, dh.herrmann, tixxdz On Fri, Apr 24, 2015 at 08:36:03AM +0200, Greg Kroah-Hartman wrote: > On Thu, Apr 23, 2015 at 10:56:40PM +0200, Borislav Petkov wrote: > > > Hm, this seems to be to be O(1), pretty constant, we do the same amount > > > of work all the time. > > > > The same *pile* of unnecessary and needless work. You go and collect > > *all* that data on *every* packet send?! > > No, not at all, the metadata is cached, we only collect that for the > first message sent, if we didn't know it already, or we do it on the > "open" of the connection, depending on what we are gathering metadata > for. > > The mc->collected test right before collecting the specific metadata is > that "cached or not" test. Oh wait, no, there are some send-time metadata that is collected for every message, see Linus's email for more details about that. Maybe this can be changed to cache things even more than we currently do. it's early, shouldn't write emails before coffee... David had some flamegraphs floating around that showed where all the time on transmit / receive was being spent, and I don't think that the metadata area was all that relevant, but I can't find them anymore to say for sure. There are other areas that can be sped up on the send path, but perf data is the best way to verify this. greg k-h ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-24 6:45 ` Greg Kroah-Hartman @ 2015-04-24 7:27 ` Martin Steigerwald 2015-04-24 8:35 ` Greg Kroah-Hartman 1 sibling, 0 replies; 333+ messages in thread From: Martin Steigerwald @ 2015-04-24 7:27 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Borislav Petkov, Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, gnomes, teg, jkosina, luto, linux-kernel, daniel, dh.herrmann, tixxdz Am Freitag, 24. April 2015, 08:45:15 schrieb Greg Kroah-Hartman: > On Fri, Apr 24, 2015 at 08:36:03AM +0200, Greg Kroah-Hartman wrote: > > On Thu, Apr 23, 2015 at 10:56:40PM +0200, Borislav Petkov wrote: > > > > Hm, this seems to be to be O(1), pretty constant, we do the same > > > > amount > > > > of work all the time. > > > > > > The same *pile* of unnecessary and needless work. You go and collect > > > *all* that data on *every* packet send?! > > > > No, not at all, the metadata is cached, we only collect that for the > > first message sent, if we didn't know it already, or we do it on the > > "open" of the connection, depending on what we are gathering metadata > > for. > > > > The mc->collected test right before collecting the specific metadata > > is > > that "cached or not" test. > > Oh wait, no, there are some send-time metadata that is collected for > every message, see Linus's email for more details about that. Maybe > this can be changed to cache things even more than we currently do. > > it's early, shouldn't write emails before coffee... > > David had some flamegraphs floating around that showed where all the > time on transmit / receive was being spent, and I don't think that the > metadata area was all that relevant, but I can't find them anymore to > say for sure. There are other areas that can be sped up on the send > path, but perf data is the best way to verify this. I think thats exactly the data that others have asked for several times, so I think it would be good to find it again. 
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-24 6:45 ` Greg Kroah-Hartman 2015-04-24 7:27 ` Martin Steigerwald @ 2015-04-24 8:35 ` Greg Kroah-Hartman 1 sibling, 0 replies; 333+ messages in thread From: Greg Kroah-Hartman @ 2015-04-24 8:35 UTC (permalink / raw) To: Borislav Petkov Cc: Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann, gnomes, teg, jkosina, luto, linux-kernel, daniel, dh.herrmann, tixxdz On Fri, Apr 24, 2015 at 08:45:15AM +0200, Greg Kroah-Hartman wrote: > On Fri, Apr 24, 2015 at 08:36:03AM +0200, Greg Kroah-Hartman wrote: > > On Thu, Apr 23, 2015 at 10:56:40PM +0200, Borislav Petkov wrote: > > > > Hm, this seems to be to be O(1), pretty constant, we do the same amount > > > > of work all the time. > > > > > > The same *pile* of unnecessary and needless work. You go and collect > > > *all* that data on *every* packet send?! > > > > No, not at all, the metadata is cached, we only collect that for the > > first message sent, if we didn't know it already, or we do it on the > > "open" of the connection, depending on what we are gathering metadata > > for. > > > > The mc->collected test right before collecting the specific metadata is > > that "cached or not" test. > > Oh wait, no, there are some send-time metadata that is collected for > every message, see Linus's email for more details about that. Maybe > this can be changed to cache things even more than we currently do. > > it's early, shouldn't write emails before coffee... > > David had some flamegraphs floating around that showed where all the > time on transmit / receive was being spent, and I don't think that the > metadata area was all that relevant, but I can't find them anymore to > say for sure. There are other areas that can be sped up on the send > path, but perf data is the best way to verify this. 
Here are the graphs that he posted during the last code review cycle
that are relevant here:
	http://lkml.iu.edu/hypermail/linux/kernel/1503.2/02624.html

greg k-h
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-13 19:03 Greg Kroah-Hartman 2015-04-13 19:29 ` Eric W. Biederman @ 2015-04-13 20:13 ` Andy Lutomirski 2015-04-13 20:45 ` Greg Kroah-Hartman 2015-04-23 13:05 ` Greg Kroah-Hartman 2 siblings, 1 reply; 333+ messages in thread From: Andy Lutomirski @ 2015-04-13 20:13 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Mon, Apr 13, 2015 at 12:03 PM, Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > The following changes since commit 9eccca0843205f87c00404b663188b88eb248051: > > Linux 4.0-rc3 (2015-03-08 16:09:09 -0700) > > are available in the git repository at: > > git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git/ tags/kdbus-4.1-rc1 > > for you to fetch changes up to 9fb9cd0f4434a23487b6ef3237e733afae90e336: > > kdbus: avoid the use of struct timespec (2015-04-10 14:34:53 +0200) > > ---------------------------------------------------------------- > kdbus for 4.1-rc1 > > Here's the kdbus pull request for 4.1-rc1. > > It's been under development for many years now, and been in linux-next > for many months, and has undergone loads of testing a review and even a few > good arguments. It comes with full documentation and tests. > > There has been a few complaints about the code, notably from people who > don't like the use of metadata in the bus messages. That is actually > one of the main features here, as we can get this data in a secure and > reliable way, and it's something that userspace requires today. So > while it does look "odd" to people who are not familiar with dbus, this > is something that finally fixes a number of almost unfixable races in > the current dbus implementations. 
While I generally like the concept of having a better in-kernel IPC
mechanism, after some consideration I don't think this belongs in the
kernel in its current form. Here's why.

First, the naming is counterintuitive. There are "endpoints", but you
don't send messages to endpoints. In fact, a basic kdbus setup will
have exactly one endpoint AFAICT. Wtf? This makes talking about it
awkward.

A lot of the design seems to violate the concept of "mechanism, not
policy". Kdbus is very much a port of userspace dbus to the kernel,
and it appears to be a port designed to preserve some questionable
design decisions instead of learning from them.

For example, kdbus sticks a whole policy database in the kernel, but
that policy database (AFAICT -- holy crap it's overcomplicated) is
*not* a simple set of rules like "if A then allow B". Instead it has
really weird dependencies not on what name you're sending to but on
what *other* names the thing you're sending to has. Sorry, but this
way lies (a) the inability for a large set of developers to understand
what's going on and (b) security bugs. Also, the result probably
can't be reused as part of a non-legacy-filled sensible design.

Kdbus claims to be very fast. Unfortunately, requests for a broad set
of benchmarks have mostly been ignored, my attempts to benchmark it
(admittedly I didn't try that hard) produced results several times
worse than the published figures, and, most tellingly, *no one* has
claimed that kdbus is faster than AF_UNIX. In fact, everyone seems to
acknowledge that kdbus is several times slower than AF_UNIX.

The metadata thing is problematic. It seems to be intended to serve
two purposes: data gathering for logging and authentication.
Unfortunately, it has issues. There are no fewer than *three*
metadata capture points: creation of a bus, connection to a bus, and
sending of a message. The kdbus authors like to point out that these
are all optional, but IMO that's bunk. Someone will write a userspace
library that rejects messages from people who don't enable all of
them, and then we're screwed.

Why are we screwed? Because any kdbus client *won't know which
metadata matters*. That means that we automatically have the worst of
all worlds, not the best. Also, the bus creation metadata is
completely worthless for anything other than logging, but someone will
use it for something other than logging, at which point it's
vulnerable to a DoS. No one has explained to my satisfaction why this
isn't a problem.

Also, the metadata code captures things that are, in my book,
completely unacceptable, such as cmdline and (!) capabilities. I bet
that the cmdline capture is extra special fscked up when cgroups and
such are in play because *it reads from the sender's VM*. IOW it's
insecure and pointless. (OK, it has a point: logging. But I really
don't think that belongs in the kernel.)

In summary, the general idea is good, but the implementation isn't
general enough, the policy stuff is too specialized and enshrines bad
design, the performance isn't good enough to justify it, and the
metadata is nasty.

So, for what it's worth, NACK in its present form. Sorry.

--Andy
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-13 20:13 ` Andy Lutomirski @ 2015-04-13 20:45 ` Greg Kroah-Hartman 2015-04-13 21:01 ` Andy Lutomirski 0 siblings, 1 reply; 333+ messages in thread From: Greg Kroah-Hartman @ 2015-04-13 20:45 UTC (permalink / raw) To: Andy Lutomirski Cc: Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Mon, Apr 13, 2015 at 01:13:26PM -0700, Andy Lutomirski wrote: > On Mon, Apr 13, 2015 at 12:03 PM, Greg Kroah-Hartman > <gregkh@linuxfoundation.org> wrote: > > The following changes since commit 9eccca0843205f87c00404b663188b88eb248051: > > > > Linux 4.0-rc3 (2015-03-08 16:09:09 -0700) > > > > are available in the git repository at: > > > > git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git/ tags/kdbus-4.1-rc1 > > > > for you to fetch changes up to 9fb9cd0f4434a23487b6ef3237e733afae90e336: > > > > kdbus: avoid the use of struct timespec (2015-04-10 14:34:53 +0200) > > > > ---------------------------------------------------------------- > > kdbus for 4.1-rc1 > > > > Here's the kdbus pull request for 4.1-rc1. > > > > It's been under development for many years now, and been in linux-next > > for many months, and has undergone loads of testing a review and even a few > > good arguments. It comes with full documentation and tests. > > > > There has been a few complaints about the code, notably from people who > > don't like the use of metadata in the bus messages. That is actually > > one of the main features here, as we can get this data in a secure and > > reliable way, and it's something that userspace requires today. So > > while it does look "odd" to people who are not familiar with dbus, this > > is something that finally fixes a number of almost unfixable races in > > the current dbus implementations. 
> > While I generally like the concept of having a better in-kernel IPC > mechanism, after some consideration I don't think this belongs in the > kernel in its current form. Here's why. > > First, the naming is counterintuitive. There are "endpoints", but you > don't send messages to endpoints. In fact, an basic kdbus setup will > have exactly one endpoint AFAICT. Wtf? This makes talking about it > awkward. Did you read the documentation? We've been over this before, and it should all be addressed in the documentation based on this coming up. > A lot of the design seems to be to violate the concept of "mechanism, > not policy". Kdbus is very much a port of userspace dbus to the > kernel, and it appears to be a port designed to preserve some > questionable design decisions instead of learning from them. > > For example, kdbus sticks a whole policy database in the kernel, but > that policy database (AFAICT -- holy crap it's overcomplicated) is > *not* a simple set of rules like "if A then allow B". Instead it has > really weird dependencies not on what name you're sending to but on > what *other* names the thing you're sending to has. Sorry, but this > way lies (a) the inability for a large set of developers to understand > what's going on and (b) security bugs. Also, the result probably > can't be reused as part of a non-legacy-filled sensible design What policy database? Matching messages to subscribers? That's the same type of "database" that other ipc subsystems need/want, there's nothing radical here. And lots of things has changed from userspace, based on a decade of knowledge of how dbus works, and how dbus itself was implemented. The design, and code, has been reviewed by those developers. Where issues were raised, they were fixed. 
Yes, dbus is "odd", but it serves a real need, and does so quite well, and now kdbus is the next evolution of that system, fixing and addressing the issues learned from implementing and designing dbus and previous versions of this type of ipc (corba, dcom, com, etc.) > Kdbus claims to be very fast. Unfortunately, requests for a broad set > of benchmarks have mostly been ignored, my attempts to benchmark it > (admittedly I didn't try that hard) were several times worse than > published figures, and, most tellingly, *no one* has claimed that > kdbus is faster than AF_UNIX. In fact, everyone seems to acknowledge > that kdbus is several times slower than AF_UNIX. It does more than AF_UNIX, so of course it's going to be slower. But you can't do all the things you need to do with dbus with just AF_UNIX, it's a different model. Again, the documentation should explain this. And the benchmarks and source were posted by David previously, with full details, this is the first time I've heard you could not reproduce them using that code. > The metadata thing is problematic. It seems to be intended to serve > two purposes: data gathering for logging and authentication. > Unfortunately, it has issues. There are no fewer than *three* > metadata capture points: creation of a bus, connection to a bus, and > sending of a message. The kdbus authors like to point out that these > are all optional, but IMO that's bunk. Someone will write a userspace > library that rejects messages from people who don't enable all of > them, then then we're screwed. Remember, you asked for it to be optional, it wasn't in the beginning :) So let's make it not optional, great. And the capture points are in different places as it is different data and entry points. > Why are we screwed? Because any kdbus client *won't know which > metadata matters*. That means that we automatically have the worst of > all worlds, not the best. 
Also, the bus creation metadata is > completely worthless for anything other than logging, but someone will > use it for something other than logging, at which point it's > vulnerable to a DoS. No one has explained to my satisfaction why this > isn't a problem. I don't think the creation data is worthless, I'm pretty sure the SELinux people are using it to validate things, but I could be wrong. Others on the cc: know more about that than I do and can provide details. > Also, the metadata code captures things that are, in my book > completely unacceptable, such as cmdline and (!) capabilities. I bet > that the cmdline capture is extra special fscked up when cgroups and > such are in play because *it reads from the sender's VM*. IOW it's > insecure and pointless. (OK, it has a point: logging. But I really > don't think that belongs in the kernel.) The sender's vm is what is wanted here. And cmdline is something that userspace gets today, and does things with, as does SELinux, and auditing. Same for capabilities, it's not insecure and pointless, it's the same thing that is provided to userspace, and userspace makes decisions on today, independent of kdbus/dbus. thanks, greg k-h ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-13 20:45 ` Greg Kroah-Hartman @ 2015-04-13 21:01 ` Andy Lutomirski 2015-04-14 17:50 ` Greg Kroah-Hartman 0 siblings, 1 reply; 333+ messages in thread From: Andy Lutomirski @ 2015-04-13 21:01 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Mon, Apr 13, 2015 at 1:45 PM, Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > On Mon, Apr 13, 2015 at 01:13:26PM -0700, Andy Lutomirski wrote: >> On Mon, Apr 13, 2015 at 12:03 PM, Greg Kroah-Hartman >> <gregkh@linuxfoundation.org> wrote: >> > The following changes since commit 9eccca0843205f87c00404b663188b88eb248051: >> > >> > Linux 4.0-rc3 (2015-03-08 16:09:09 -0700) >> > >> > are available in the git repository at: >> > >> > git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git/ tags/kdbus-4.1-rc1 >> > >> > for you to fetch changes up to 9fb9cd0f4434a23487b6ef3237e733afae90e336: >> > >> > kdbus: avoid the use of struct timespec (2015-04-10 14:34:53 +0200) >> > >> > ---------------------------------------------------------------- >> > kdbus for 4.1-rc1 >> > >> > Here's the kdbus pull request for 4.1-rc1. >> > >> > It's been under development for many years now, and been in linux-next >> > for many months, and has undergone loads of testing a review and even a few >> > good arguments. It comes with full documentation and tests. >> > >> > There has been a few complaints about the code, notably from people who >> > don't like the use of metadata in the bus messages. That is actually >> > one of the main features here, as we can get this data in a secure and >> > reliable way, and it's something that userspace requires today. 
So >> > while it does look "odd" to people who are not familiar with dbus, this >> > is something that finally fixes a number of almost unfixable races in >> > the current dbus implementations. >> >> While I generally like the concept of having a better in-kernel IPC >> mechanism, after some consideration I don't think this belongs in the >> kernel in its current form. Here's why. >> >> First, the naming is counterintuitive. There are "endpoints", but you >> don't send messages to endpoints. In fact, an basic kdbus setup will >> have exactly one endpoint AFAICT. Wtf? This makes talking about it >> awkward. > > Did you read the documentation? We've been over this before, and it > should all be addressed in the documentation based on this coming up. > >> A lot of the design seems to be to violate the concept of "mechanism, >> not policy". Kdbus is very much a port of userspace dbus to the >> kernel, and it appears to be a port designed to preserve some >> questionable design decisions instead of learning from them. >> >> For example, kdbus sticks a whole policy database in the kernel, but >> that policy database (AFAICT -- holy crap it's overcomplicated) is >> *not* a simple set of rules like "if A then allow B". Instead it has >> really weird dependencies not on what name you're sending to but on >> what *other* names the thing you're sending to has. Sorry, but this >> way lies (a) the inability for a large set of developers to understand >> what's going on and (b) security bugs. Also, the result probably >> can't be reused as part of a non-legacy-filled sensible design > > What policy database? Matching messages to subscribers? That's the > same type of "database" that other ipc subsystems need/want, there's > nothing radical here. Let me quote from the latest version of the kdbus docs: Note that TALK access is checked against all names of a connection. 
For example, if a connection owns both <constant>'org.foo.bar'</constant> and <constant>'org.blah.baz'</constant>, and the policy database allows <constant>'org.blah.baz'</constant> to be talked to by WORLD, then this permission is also granted to <constant>'org.foo.bar'</constant>. That might sound illogical, but after all, we allow messages to be directed to either the ID or a well-known name, and policy is applied to the connection, not the name. In other words, the effective TALK policy for a connection is the most permissive of all names the connection owns. In my humble opinion, this paragraph speaks for itself. The design is bad, full stop. [...] > And the benchmarks and source were posted by David previously, with full > details, this is the first time I've heard you could not reproduce them > using that code. No it's not. But I got bored and didn't try again. > >> The metadata thing is problematic. It seems to be intended to serve >> two purposes: data gathering for logging and authentication. >> Unfortunately, it has issues. There are no fewer than *three* >> metadata capture points: creation of a bus, connection to a bus, and >> sending of a message. The kdbus authors like to point out that these >> are all optional, but IMO that's bunk. Someone will write a userspace >> library that rejects messages from people who don't enable all of >> them, then then we're screwed. > > Remember, you asked for it to be optional, it wasn't in the beginning :) > > So let's make it not optional, great. And the capture points are in > different places as it is different data and entry points. Then I'll have to find a way to embolden my NACK further. My point is that capturing garbage like cmdline and capabilities (again, that latter part is completely unacceptable under any circumstances whatsoever) on behalf of *all* senders is a disaster. If it's optional, then I can at least hope that userspace will honor the optionality and let everything turn it off. 
If it's mandatory, then kdbus is just unsafe to use to send messages to untrusted parties. > >> Why are we screwed? Because any kdbus client *won't know which >> metadata matters*. That means that we automatically have the worst of >> all worlds, not the best. Also, the bus creation metadata is >> completely worthless for anything other than logging, but someone will >> use it for something other than logging, at which point it's >> vulnerable to a DoS. No one has explained to my satisfaction why this >> isn't a problem. > > I don't think the creation data is worthless, I'm pretty sure the > SELinux people are using it to validate things, but I could be wrong. > Others on the cc: know more about that than I do and can provide > details. Does that code even exist in public form yet? > >> Also, the metadata code captures things that are, in my book >> completely unacceptable, such as cmdline and (!) capabilities. I bet >> that the cmdline capture is extra special fscked up when cgroups and >> such are in play because *it reads from the sender's VM*. IOW it's >> insecure and pointless. (OK, it has a point: logging. But I really >> don't think that belongs in the kernel.) > > The sender's vm is what is wanted here. And cmdline is something that > userspace gets today, and does things with, as does SELinux, and > auditing. Same for capabilities, it's not insecure and pointless, it's > the same thing that is provided to userspace, and userspace makes > decisions on today, independent of kdbus/dbus. Is there anything that userspace makes decisions on based on capabilities? If so, please tell me and I'll entertain myself by writing exploits for them. The fact that some existing userspace does awful things does *not* justify adding new kernel mechanisms with which to repeat those mistakes. --Andy ^ permalink raw reply [flat|nested] 333+ messages in thread
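The TALK rule quoted from the kdbus docs in the message above (the effective TALK policy of a connection is the most permissive of all names it owns) can be made concrete with a small C model. Everything in it is a hypothetical simplification for illustration: `struct policy_entry`, `struct connection` and both functions are invented names, the only access level modelled is "WORLD may TALK", and a name without a policy entry is assumed to deny access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical policy database entry: one rule per well-known name. */
struct policy_entry {
	const char *name;
	bool world_may_talk;
};

/* Hypothetical connection: just the list of well-known names it owns. */
struct connection {
	const char *const *names;
	size_t n_names;
};

/* Policy for a single name; names without an entry are denied here. */
static bool name_allows_world(const struct policy_entry *db, size_t n_db,
			      const char *name)
{
	size_t i;

	for (i = 0; i < n_db; i++)
		if (strcmp(db[i].name, name) == 0)
			return db[i].world_may_talk;
	return false;
}

/*
 * Effective policy for the connection: access is granted as soon as ANY
 * owned name allows it, regardless of which name the sender addressed.
 */
static bool conn_allows_world_talk(const struct policy_entry *db, size_t n_db,
				   const struct connection *conn)
{
	size_t i;

	assert(db != NULL && conn != NULL);
	for (i = 0; i < conn->n_names; i++)
		if (name_allows_world(db, n_db, conn->names[i]))
			return true;
	return false;
}
```

With a database that allows WORLD to talk to 'org.blah.baz' but not to 'org.foo.bar', a connection owning both names is reachable by WORLD, while a connection owning only 'org.foo.bar' is not. That asymmetry is the behaviour being objected to in the message above.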
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-13 21:01 ` Andy Lutomirski @ 2015-04-14 17:50 ` Greg Kroah-Hartman 2015-04-14 18:57 ` Andy Lutomirski 2015-04-14 22:33 ` Jiri Kosina 0 siblings, 2 replies; 333+ messages in thread From: Greg Kroah-Hartman @ 2015-04-14 17:50 UTC (permalink / raw) To: Andy Lutomirski Cc: Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Mon, Apr 13, 2015 at 02:01:21PM -0700, Andy Lutomirski wrote: > On Mon, Apr 13, 2015 at 1:45 PM, Greg Kroah-Hartman > <gregkh@linuxfoundation.org> wrote: > > On Mon, Apr 13, 2015 at 01:13:26PM -0700, Andy Lutomirski wrote: > >> On Mon, Apr 13, 2015 at 12:03 PM, Greg Kroah-Hartman > >> <gregkh@linuxfoundation.org> wrote: > >> > The following changes since commit 9eccca0843205f87c00404b663188b88eb248051: > >> > > >> > Linux 4.0-rc3 (2015-03-08 16:09:09 -0700) > >> > > >> > are available in the git repository at: > >> > > >> > git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git/ tags/kdbus-4.1-rc1 > >> > > >> > for you to fetch changes up to 9fb9cd0f4434a23487b6ef3237e733afae90e336: > >> > > >> > kdbus: avoid the use of struct timespec (2015-04-10 14:34:53 +0200) > >> > > >> > ---------------------------------------------------------------- > >> > kdbus for 4.1-rc1 > >> > > >> > Here's the kdbus pull request for 4.1-rc1. > >> > > >> > It's been under development for many years now, and been in linux-next > >> > for many months, and has undergone loads of testing a review and even a few > >> > good arguments. It comes with full documentation and tests. > >> > > >> > There has been a few complaints about the code, notably from people who > >> > don't like the use of metadata in the bus messages. That is actually > >> > one of the main features here, as we can get this data in a secure and > >> > reliable way, and it's something that userspace requires today. 
So > >> > while it does look "odd" to people who are not familiar with dbus, this > >> > is something that finally fixes a number of almost unfixable races in > >> > the current dbus implementations. > >> > >> While I generally like the concept of having a better in-kernel IPC > >> mechanism, after some consideration I don't think this belongs in the > >> kernel in its current form. Here's why. > >> > >> First, the naming is counterintuitive. There are "endpoints", but you > >> don't send messages to endpoints. In fact, an basic kdbus setup will > >> have exactly one endpoint AFAICT. Wtf? This makes talking about it > >> awkward. > > > > Did you read the documentation? We've been over this before, and it > > should all be addressed in the documentation based on this coming up. > > > >> A lot of the design seems to be to violate the concept of "mechanism, > >> not policy". Kdbus is very much a port of userspace dbus to the > >> kernel, and it appears to be a port designed to preserve some > >> questionable design decisions instead of learning from them. > >> > >> For example, kdbus sticks a whole policy database in the kernel, but > >> that policy database (AFAICT -- holy crap it's overcomplicated) is > >> *not* a simple set of rules like "if A then allow B". Instead it has > >> really weird dependencies not on what name you're sending to but on > >> what *other* names the thing you're sending to has. Sorry, but this > >> way lies (a) the inability for a large set of developers to understand > >> what's going on and (b) security bugs. Also, the result probably > >> can't be reused as part of a non-legacy-filled sensible design > > > > What policy database? Matching messages to subscribers? That's the > > same type of "database" that other ipc subsystems need/want, there's > > nothing radical here. > > Let me quote from the latest version of the kdbus docs: > > Note that TALK access is checked against all names of a connection. 
For > example, if a connection owns both <constant>'org.foo.bar'</constant> and > <constant>'org.blah.baz'</constant>, and the policy database allows > <constant>'org.blah.baz'</constant> to be talked to by WORLD, then this > permission is also granted to <constant>'org.foo.bar'</constant>. That > might sound illogical, but after all, we allow messages to be directed to > either the ID or a well-known name, and policy is applied to the > connection, not the name. In other words, the effective TALK policy for a > connection is the most permissive of all names the connection owns. > > In my humble opinion, this paragraph speaks for itself. The design is > bad, full stop. First off, thanks for reading the docs, I appreciate that. But realize also, that this is straight from the D-Bus spec. We aren't doing anything "radical" here, this is what your desktop uses that you are typing your email from. Yes, it's an unfortunate design, but one that we are all stuck with (think of it as having to implement code for horrid hardware that you have to get to work properly.) There are many applications out there which don't address messages to their well-known name destination but to the ID which they looked up earlier and cached. In fact, that behavior is the default in the gdbus library implementation. If a connection owns two names, and one is more permissive than the other one, an attacker could as well choose the more openly configured name to get a message delivered. That's nothing we can protect from really. So ideally you never do that, just like you shouldn't do that in an network configuration with DNS, if you want to manage access properly. The logic here is comparable to IP vs. DNS - A host may have multiple DNS names assigned, just like a service may be the owner of multiple well-known names - Clients can talk to a service using its unique ID (uint64_t) or its well known name. 
- Clients can as well look up the ID of a well-known name and address messages to it directly - Hence, we cannot make decisions based on the well-known name that has been used to send the message - Instead, we have to fall back to the logic described in the docs - Firewall rules are applied to IPs, _not_ DNS names! D-Bus is a specification that has been out there for over a decade, and we are not designing anything new here, but rather implementing it as designed. We have to be compatible to the existing users of the DBus system, and don't have the luxury of being able to change core things like this and expect the world to be able to change just because the design is not as clean as it should/could be. Again, just like getting horrid hardware to work properly, sometimes we have to write odd code. Or having to implement a network protocol that doesn't seem to be designed "perfectly", yet is used by a few hundred million systems so we have to remain compatible. This is all that we are doing here for stuff like this. Remember, this is called kDBUS, not kGENERICIPC, no matter how much we would have liked that to happen from a kernel standpoint. :) > > And the benchmarks and source were posted by David previously, with full > > details, this is the first time I've heard you could not reproduce them > > using that code. > > No it's not. But I got bored and didn't try again. Sorry, I was not aware of that. > >> The metadata thing is problematic. It seems to be intended to serve > >> two purposes: data gathering for logging and authentication. You forgot about introspection, more on that below. > >> Unfortunately, it has issues. There are no fewer than *three* > >> metadata capture points: creation of a bus, connection to a bus, and > >> sending of a message. The kdbus authors like to point out that these > >> are all optional, but IMO that's bunk. 
Someone will write a userspace > >> library that rejects messages from people who don't enable all of > >> them, then then we're screwed. > > > > Remember, you asked for it to be optional, it wasn't in the beginning :) > > > > So let's make it not optional, great. And the capture points are in > > different places as it is different data and entry points. > > Then I'll have to find a way to embolden my NACK further. My point is > that capturing garbage like cmdline and capabilities (again, that > latter part is completely unacceptable under any circumstances > whatsoever) on behalf of *all* senders is a disaster. If it's > optional, then I can at least hope that userspace will honor the > optionality and let everything turn it off. If it's mandatory, then > kdbus is just unsafe to use to send messages to untrusted parties. It's opted in by the receiving peer if the task implementing a service wants to access these pieces of information. It is optional, and the documentation clearly states that userspace should cope with this, and also, when they are available we make sure to provide the correct race-free information. As said many times before, an application can do so already today with information from other API file systems, so why is this suddenly a problem when kdbus optionally offers the exact same information along with each transmitted message? Yes, we all "hate" capabilities, but userspace uses them, and gets access to them all the time through the POSIX apis (capget(), cap_get_pid(), capgetp(), etc.) and through /proc/pid/status. They are something that we have to support and handle properly. In the very first submission of kdbus, we stated that we want to allow userspace methods to access these same bits to be able to make decisions about permissions. And to do so in a race-free manner, which is very hard, if not almost impossible, to do so from userspace alone. 
For instance, if a task has CAP_NET_ADMIN set, we can use that
information in order to allow or disallow certain actions to be taken by
a privileged process. Or, if a client that has the capability to call
reboot (i.e. has CAP_SYS_BOOT) makes the D-Bus call to reboot the
system, the system daemon listening for that message knows that yes, at
the time that the client made that call, it really did have that
capability so it is ok to actually reboot the system.

Instead of trying to use SCM_CREDENTIALS to get the pid and another
round of cap_get_pid() and the like, all of which are susceptible to
races and all sorts of other insecure horrors, we can provide this
information in an atomic and secure way.

The kernel today, and userspace, rely on capabilities all the time
(i.e. almost every syscall), so how are they somehow not valid to use
and support?

And of course, as Eric will point out, capabilities are not
translatable across user namespaces, which is a problem. Because of
this, we dispose of that piece of metadata information when a message
crosses a user namespace boundary. This is the right thing to do, which
is not the case for almost all other kernel APIs, which report bogus
capabilities when user namespaces are crossed.

So we implemented this correctly, and somehow that is a feature so bad
that both you and Eric think the whole baby should be thrown out? How
else should this be implemented?

As for the command line information, yes, it is "unsafe", and we clearly
state that in the documentation. However, it is still a very valid
piece of information. For example, when a service is activated by a
method, getting to know which binary caused that to happen is very
useful when debugging. It's also very useful when debugging multi-call
binaries because the command line actually tells you argv[0] correctly.
Because of this, that's why lots of userspace tools use the command line information today, again, providing that information is a help to them, why wouldn't we provide them that help when we have access to it? Metadata attachment has always been optional, based on the setting of the receiving peer, but we have added, at your request, the ability to globally limit what kdbus is able to transport for that metadata, regardless of the settings on both sides. It sounds like this option isn't liked, and I'll be glad to revert it as I do think the metadata is useful and wanted. > >> Why are we screwed? Because any kdbus client *won't know which > >> metadata matters*. That means that we automatically have the worst of > >> all worlds, not the best. Also, the bus creation metadata is > >> completely worthless for anything other than logging, but someone will > >> use it for something other than logging, at which point it's > >> vulnerable to a DoS. No one has explained to my satisfaction why this > >> isn't a problem. Metadata is gathered for logging, authentication and introspection. Bus creator metadata is not used for logging or authentication, but for introspection only. It could be really useful for a service that has a bus handle to actually know which bus it is connected to, but it's not supposed to be used as authentication measure. So I was wrong to think that the SELinux people use it, sorry about that. Remember there are three different places that metadata is collected, for three different things. Yeah, we call them all "metadata", which is probably why the confusion here, but these all are different "things" entirely, and the documentation does describe this really well. If not, please let me know and I will work on it to make it more clear. The important point here is that you cannot look at this concept without keeping the dbus spec in scope. Nobody is supposed to write native kdbus clients directly. 
You can, of course, but the entire concept of how services are
implemented follows a higher-level logic which is supposed to be
implemented by high-level libraries. Yes, this isn't the best argument
for why you might feel more comfortable about merging this code, as we
kernel developers are used to stand-alone APIs that we can use without
library helpers, but it is common and needed. But really, when was the
last time you wrote an ALSA library from scratch? :)

Again, remember the compatibility requirements for your userspace D-Bus
clients today; we have to ensure this, or this code is pointless.

A word about introspection. In talking about this with Daniel on IRC
today, he came up with this good example to explain it better to me, as
I didn't quite understand it well. I'll paraphrase it here, keeping
with the "bus" metaphor that D-Bus requires:

Imagine we're all taking a little tour, out into nature, to a lake or
something. We're taking a bus to get there. The bus can accommodate a
large number of people, and we don't know yet who will join. Everybody
who enters the bus has to show their passport to the conductor (refrain
from calling it a driver, because hell no, it clearly isn't a driver!!
;)). The conductor makes a copy of each of the people entering the bus,
because it wants to know who's on the bus.

One property of that strange bunch of programmers on the bus is that
they don't necessarily respond to anything, but whenever anyone in the
bus talks to another person, they show their passport in order to
identify themselves, because you know you can't trust anyone.

Next, the police stop the bus and want to know who's on it. As the
programmers usually don't respond when being spoken to, especially if it
is the police, the bus conductor hands out a list of all the passport
copies he gathered. That is called introspection that is not backed by
cooperative bus members.
The conductor makes a copy of each OF THE PASSPORTS of the people
entering the bus, to help the police (i.e. debuggers) determine who is
on the bus.

It is a property of the bus itself which describes which personal data
you have to give to the conductor in order to be allowed in. If you're
not willing to give out all the bits the bus requires you to, you have
to stay out. That's not a problem of the system, but rather something
to discuss with the owner of the bus.

This way, it is totally possible to have a bus that does not require
anything from its passengers, and passengers that do not allow any
personal information to be revealed, but then the police can't do much
of course when they stop the bus in order to introspect it.

Then there is a set of global laws in that world in which all the buses
live. These laws define which data is allowed to be passed around at
all in general. When a bus requires its passengers to reveal their hair
color, for instance, but passing that information around is forbidden by
global law, this requirement is ignored when buses are created or anyone
enters any of those buses.

And to complete the story and outline the difference between the
passport that was copied and the one that is used during communication,
we'd have to add a story about people changing their hair color
constantly in the washroom at the back of the bus, out of sight of the
conductor, but this metaphor is getting quite long enough already...

Does that help explain introspection and the need for it here?

> >> Also, the metadata code captures things that are, in my book
> >> completely unacceptable, such as cmdline and (!) capabilities. I bet
> >> that the cmdline capture is extra special fscked up when cgroups and
> >> such are in play because *it reads from the sender's VM*. IOW it's
> >> insecure and pointless. (OK, it has a point: logging. But I really
> >> don't think that belongs in the kernel.)
> >
> > The sender's vm is what is wanted here.
And cmdline is something that > > userspace gets today, and does things with, as does SELinux, and > > auditing. Same for capabilities, it's not insecure and pointless, it's > > the same thing that is provided to userspace, and userspace makes > > decisions on today, independent of kdbus/dbus. > > Is there anything that userspace makes decisions on based on > capabilities? If so, please tell me and I'll entertain myself by > writing exploits for them. > > The fact that some existing userspace does awful things does *not* > justify adding new kernel mechanisms with which to repeat those > mistakes. polkit used to do something like this, but the obvious race conditions that you know about prevented it from working properly, so other odd work-arounds had to be created. However, if we can provide this in a race free manner, those work-arounds are no longer needed. As documented in the original email on this thread, Tizen wants to use this, as it solves a real need that they have. Their workarounds involve using custom UDS sockets, but the latency involved is horrid and unacceptable. Using a kdbus message solves this issue for them, allowing UI rendering to work properly/quickly. Again, capabilities are something we all require and rely on today, passing the current capability on to a recipient isn't a way to raise privileges at all, but rather, properly determine if they are present at sending time, if wanted. How does that create an insecure system? What am I missing that is so bad here with the design we have? thanks, greg k-h ^ permalink raw reply [flat|nested] 333+ messages in thread
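The metadata rules described in the message above (the receiver opts in, a global limit caps what may be transported, and capabilities are dropped when a message crosses a user namespace boundary) can be sketched as a small mask computation. This is only an illustration under stated assumptions: the flag names and the effective_metadata() helper are hypothetical and are not the real kdbus attach flags or any kdbus API.

```python
# Hypothetical metadata flags for illustration; these are NOT the real
# kdbus attach-flag names or values.
META_CREDS = 1 << 0
META_CMDLINE = 1 << 1
META_CAPS = 1 << 2


def effective_metadata(receiver_wants, global_allowed, crosses_userns):
    """Return the metadata bits that would travel with one message."""
    # The receiver opts in, and the global limit caps what may be sent
    # regardless of what the receiver asked for.
    attached = receiver_wants & global_allowed
    # Capabilities do not translate across user namespaces, so that item
    # is dropped when the message crosses a namespace boundary.
    if crosses_userns:
        attached &= ~META_CAPS
    return attached


wants = META_CREDS | META_CAPS
limit = META_CREDS | META_CMDLINE | META_CAPS
same_ns = effective_metadata(wants, limit, crosses_userns=False)
cross_ns = effective_metadata(wants, limit, crosses_userns=True)
```

In this sketch same_ns keeps both requested items, while cross_ns keeps only the credentials bit because the capabilities item is discarded at the namespace boundary.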
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-14 17:50 ` Greg Kroah-Hartman @ 2015-04-14 18:57 ` Andy Lutomirski 2015-04-14 19:23 ` Greg Kroah-Hartman 2015-04-15 12:00 ` Greg Kroah-Hartman 2015-04-14 22:33 ` Jiri Kosina 1 sibling, 2 replies; 333+ messages in thread From: Andy Lutomirski @ 2015-04-14 18:57 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Tue, Apr 14, 2015 at 10:50 AM, Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > On Mon, Apr 13, 2015 at 02:01:21PM -0700, Andy Lutomirski wrote: >> On Mon, Apr 13, 2015 at 1:45 PM, Greg Kroah-Hartman >> <gregkh@linuxfoundation.org> wrote: >> > On Mon, Apr 13, 2015 at 01:13:26PM -0700, Andy Lutomirski wrote: >> >> On Mon, Apr 13, 2015 at 12:03 PM, Greg Kroah-Hartman >> >> <gregkh@linuxfoundation.org> wrote: >> >> > The following changes since commit 9eccca0843205f87c00404b663188b88eb248051: >> >> > >> >> > Linux 4.0-rc3 (2015-03-08 16:09:09 -0700) >> >> > >> >> > are available in the git repository at: >> >> > >> >> > git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git/ tags/kdbus-4.1-rc1 >> >> > >> >> > for you to fetch changes up to 9fb9cd0f4434a23487b6ef3237e733afae90e336: >> >> > >> >> > kdbus: avoid the use of struct timespec (2015-04-10 14:34:53 +0200) >> >> > >> >> > ---------------------------------------------------------------- >> >> > kdbus for 4.1-rc1 >> >> > >> >> > Here's the kdbus pull request for 4.1-rc1. >> >> > >> >> > It's been under development for many years now, and been in linux-next >> >> > for many months, and has undergone loads of testing a review and even a few >> >> > good arguments. It comes with full documentation and tests. >> >> > >> >> > There has been a few complaints about the code, notably from people who >> >> > don't like the use of metadata in the bus messages. 
That is actually >> >> > one of the main features here, as we can get this data in a secure and >> >> > reliable way, and it's something that userspace requires today. So >> >> > while it does look "odd" to people who are not familiar with dbus, this >> >> > is something that finally fixes a number of almost unfixable races in >> >> > the current dbus implementations. >> >> >> >> While I generally like the concept of having a better in-kernel IPC >> >> mechanism, after some consideration I don't think this belongs in the >> >> kernel in its current form. Here's why. >> >> >> >> First, the naming is counterintuitive. There are "endpoints", but you >> >> don't send messages to endpoints. In fact, an basic kdbus setup will >> >> have exactly one endpoint AFAICT. Wtf? This makes talking about it >> >> awkward. >> > >> > Did you read the documentation? We've been over this before, and it >> > should all be addressed in the documentation based on this coming up. >> > >> >> A lot of the design seems to be to violate the concept of "mechanism, >> >> not policy". Kdbus is very much a port of userspace dbus to the >> >> kernel, and it appears to be a port designed to preserve some >> >> questionable design decisions instead of learning from them. >> >> >> >> For example, kdbus sticks a whole policy database in the kernel, but >> >> that policy database (AFAICT -- holy crap it's overcomplicated) is >> >> *not* a simple set of rules like "if A then allow B". Instead it has >> >> really weird dependencies not on what name you're sending to but on >> >> what *other* names the thing you're sending to has. Sorry, but this >> >> way lies (a) the inability for a large set of developers to understand >> >> what's going on and (b) security bugs. Also, the result probably >> >> can't be reused as part of a non-legacy-filled sensible design >> > >> > What policy database? Matching messages to subscribers? 
That's the >> > same type of "database" that other ipc subsystems need/want, there's >> > nothing radical here. >> >> Let me quote from the latest version of the kdbus docs: >> >> Note that TALK access is checked against all names of a connection. For >> example, if a connection owns both <constant>'org.foo.bar'</constant> and >> <constant>'org.blah.baz'</constant>, and the policy database allows >> <constant>'org.blah.baz'</constant> to be talked to by WORLD, then this >> permission is also granted to <constant>'org.foo.bar'</constant>. That >> might sound illogical, but after all, we allow messages to be directed to >> either the ID or a well-known name, and policy is applied to the >> connection, not the name. In other words, the effective TALK policy for a >> connection is the most permissive of all names the connection owns. >> >> In my humble opinion, this paragraph speaks for itself. The design is >> bad, full stop. > > First off, thanks for reading the docs, I appreciate that. But realize > also, that this is straight from the D-Bus spec. We aren't doing > anything "radical" here, this is what your desktop uses that you are > typing your email from. > > Yes, it's an unfortunate design, but one that we are all stuck with > (think of it as having to implement code for horrid hardware that you > have to get to work properly.) I agree. You've sent a pull request for an unfortunate design. I don't think that unfortunate design belongs in the kernel. If it stays in userspace, then user programmers could potentially fix it some day. > There are many applications out there > which don't address messages to their well-known name destination but > to the ID which they looked up earlier and cached. In fact, that > behavior is the default in the gdbus library implementation. > > If a connection owns two names, and one is more permissive than the > other one, an attacker could as well choose the more openly configured > name to get a message delivered.
That's nothing we can protect from > really. So ideally you never do that, just like you shouldn't do that > in an network configuration with DNS, if you want to manage access > properly. > > The logic here is comparable to IP vs. DNS [snip some] It's comparable to someone trying to write a firewall that filters on DNS names. There's a good reason that people don't do that. [snip] >> >> Then I'll have to find a way to embolden my NACK further. My point is >> that capturing garbage like cmdline and capabilities (again, that >> latter part is completely unacceptable under any circumstances >> whatsoever) on behalf of *all* senders is a disaster. If it's >> optional, then I can at least hope that userspace will honor the >> optionality and let everything turn it off. If it's mandatory, then >> kdbus is just unsafe to use to send messages to untrusted parties. > > It's opted in by the receiving peer if the task implementing a service > wants to access these pieces of information. It is optional, and the > documentation clearly states that userspace should cope with this, and > also, when they are available we make sure to provide the correct > race-free information. > > As said many times before, an application can do so already today with > information from other API file systems, so why is this suddenly a > problem when kdbus optionally offers the exact same information along > with each transmitted message? Yes, we all "hate" capabilities, but > userspace uses them, and gets access to them all the time through the > POSIX apis (capget(), cap_get_pid(), capgetp(), etc.) and through > /proc/pid/status. They are something that we have to support and handle > properly. > > In the very first submission of kdbus, we stated that we want to allow > userspace methods to access these same bits to be able to make decisions > about permissions. And to do so in a race-free manner, which is very > hard, if not almost impossible, to do so from userspace alone. 
> > For instance, if a task has CAP_NET_ADMIN set, we can use that > information in order to allow or disallow certain actions to be taken by > a privileged process. Or, if a client that has the capability to call > reboot (i.e. have CAP_SYS_REBOOT) makes the D-Bus call to reboot the > system, the system daemon listening for that message knows that yes, at > the time that the client made that call, it really did have that > capability so it is ok to actually reboot the system. > > Instead of trying to use SCM_CREDENTIALS to get the pid and another > round of cap_get_pid() and the like, all of which are susceptable to > racing and all sorts of other horrors, that are insecure, we can provide > this information in an atomic, and secure way. /me suppresses a long string of expletives. Please point me at the code that does this with caps. It's WRONG in userspace and it's WRONG in the kernel. I want to know what code that runs on my system does this so I can send the appropriate bug reports and get it fixed. I think the RHEL crowd at least will take it seriously when I tell them that this is a security hole. > > The kernel today, and userspace, relies on capabilities all the time > (i.e. almost every syscall), how are they something that is somehow not > valid to use and support? No. The *kernel* relies on caps. Userspace should not. > > > And of course, as Eric will point out, capabailities are not > translatable across user namespaces, which is a problem. Because of > this, we dispose of that piece of metadata information when a message > crosses a user namespace boundry. This is the right thing to do, which > is not the case for almost all other kernel apis which report bogus > capabilies when user namespaces are crossed. The right thing to do is to not use capabilities for userspace stuff. > > So we implemented this correctly, and somehow that is a feature so bad > that both you and Eric think the whole baby should be thrown out? 
How > else should this be implemented? It shouldn't be implemented. > > As documented in the original email on this thread, Tizen wants to use > this, as it solves a real need that they have. Their workarounds > involve using custom UDS sockets, but the latency involved is horrid and > unacceptable. Using a kdbus message solves this issue for them, > allowing UI rendering to work properly/quickly. > > Again, capabilities are something we all require and rely on today, > passing the current capability on to a recipient isn't a way to raise > privileges at all, but rather, properly determine if they are present > at sending time, if wanted. How does that create an insecure system? > What am I missing that is so bad here with the design we have? That, even if the implementation could be made to be useful and correct, capabilities refer to privileges wrt the kernel, not userspace. They're not the right bit of policy to look at here. For example, the thing that should make it possible to run 'systemctl reboot' or whatever is not CAP_SYS_BOOT, because CAP_SYS_BOOT is the permission to hard reboot the system immediately, and that's not what 'systemctl reboot' is for. I find myself comparing kdbus to win32k, and that's not a good sign... --Andy ^ permalink raw reply [flat|nested] 333+ messages in thread
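The race that the quoted text attributes to the "SCM_CREDENTIALS, then look the pid up again" approach can be shown with a toy simulation. It is a sketch under assumptions: process_table, Message, send() and caps_by_late_lookup() are hypothetical names, and nothing here calls a real kernel or D-Bus interface. It only shows why a snapshot taken at send time and a later pid lookup can disagree; it does not settle whether capabilities are the right thing for a userspace service to check, which is the point disputed above.

```python
from dataclasses import dataclass

# Toy stand-in for "which capabilities does this pid hold right now".
# Hypothetical data; pid 1234 starts out as an unprivileged sender.
process_table = {1234: frozenset()}


@dataclass(frozen=True)
class Message:
    sender_pid: int
    body: str
    caps_at_send: frozenset  # snapshot taken when the message is sent


def send(pid, body):
    # Capture the sender's capabilities atomically with the send.
    return Message(pid, body, process_table[pid])


def caps_by_late_lookup(msg):
    # What a receiver sees if it resolves the pid only after delivery.
    return process_table.get(msg.sender_pid, frozenset())


msg = send(1234, "reboot")

# Before the receiver gets around to checking, the sender exits and the
# pid is reused by a privileged process (or the sender execs a setuid
# binary); either way the pid now maps to different capabilities.
process_table[1234] = frozenset({"CAP_SYS_BOOT"})

late = caps_by_late_lookup(msg)
snapshot = msg.caps_at_send
```

Here the late lookup reports CAP_SYS_BOOT for a request that was sent without it, while the send-time snapshot still reflects the sender as it was when it made the call.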
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-14 18:57 ` Andy Lutomirski @ 2015-04-14 19:23 ` Greg Kroah-Hartman 2015-04-14 19:24 ` Borislav Petkov ` (2 more replies) 2015-04-15 12:00 ` Greg Kroah-Hartman 1 sibling, 3 replies; 333+ messages in thread From: Greg Kroah-Hartman @ 2015-04-14 19:23 UTC (permalink / raw) To: Andy Lutomirski Cc: Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Tue, Apr 14, 2015 at 11:57:22AM -0700, Andy Lutomirski wrote: > On Tue, Apr 14, 2015 at 10:50 AM, Greg Kroah-Hartman > <gregkh@linuxfoundation.org> wrote: > > On Mon, Apr 13, 2015 at 02:01:21PM -0700, Andy Lutomirski wrote: > >> On Mon, Apr 13, 2015 at 1:45 PM, Greg Kroah-Hartman > >> <gregkh@linuxfoundation.org> wrote: > >> > On Mon, Apr 13, 2015 at 01:13:26PM -0700, Andy Lutomirski wrote: > >> >> On Mon, Apr 13, 2015 at 12:03 PM, Greg Kroah-Hartman > >> >> <gregkh@linuxfoundation.org> wrote: > >> >> > The following changes since commit 9eccca0843205f87c00404b663188b88eb248051: > >> >> > > >> >> > Linux 4.0-rc3 (2015-03-08 16:09:09 -0700) > >> >> > > >> >> > are available in the git repository at: > >> >> > > >> >> > git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git/ tags/kdbus-4.1-rc1 > >> >> > > >> >> > for you to fetch changes up to 9fb9cd0f4434a23487b6ef3237e733afae90e336: > >> >> > > >> >> > kdbus: avoid the use of struct timespec (2015-04-10 14:34:53 +0200) > >> >> > > >> >> > ---------------------------------------------------------------- > >> >> > kdbus for 4.1-rc1 > >> >> > > >> >> > Here's the kdbus pull request for 4.1-rc1. > >> >> > > >> >> > It's been under development for many years now, and been in linux-next > >> >> > for many months, and has undergone loads of testing a review and even a few > >> >> > good arguments. It comes with full documentation and tests. 
> >> >> > > >> >> > There has been a few complaints about the code, notably from people who > >> >> > don't like the use of metadata in the bus messages. That is actually > >> >> > one of the main features here, as we can get this data in a secure and > >> >> > reliable way, and it's something that userspace requires today. So > >> >> > while it does look "odd" to people who are not familiar with dbus, this > >> >> > is something that finally fixes a number of almost unfixable races in > >> >> > the current dbus implementations. > >> >> > >> >> While I generally like the concept of having a better in-kernel IPC > >> >> mechanism, after some consideration I don't think this belongs in the > >> >> kernel in its current form. Here's why. > >> >> > >> >> First, the naming is counterintuitive. There are "endpoints", but you > >> >> don't send messages to endpoints. In fact, an basic kdbus setup will > >> >> have exactly one endpoint AFAICT. Wtf? This makes talking about it > >> >> awkward. > >> > > >> > Did you read the documentation? We've been over this before, and it > >> > should all be addressed in the documentation based on this coming up. > >> > > >> >> A lot of the design seems to be to violate the concept of "mechanism, > >> >> not policy". Kdbus is very much a port of userspace dbus to the > >> >> kernel, and it appears to be a port designed to preserve some > >> >> questionable design decisions instead of learning from them. > >> >> > >> >> For example, kdbus sticks a whole policy database in the kernel, but > >> >> that policy database (AFAICT -- holy crap it's overcomplicated) is > >> >> *not* a simple set of rules like "if A then allow B". Instead it has > >> >> really weird dependencies not on what name you're sending to but on > >> >> what *other* names the thing you're sending to has. Sorry, but this > >> >> way lies (a) the inability for a large set of developers to understand > >> >> what's going on and (b) security bugs. 
Also, the result probably > >> >> can't be reused as part of a non-legacy-filled sensible design > >> > > >> > What policy database? Matching messages to subscribers? That's the > >> > same type of "database" that other ipc subsystems need/want, there's > >> > nothing radical here. > >> > >> Let me quote from the latest version of the kdbus docs: > >> > >> Note that TALK access is checked against all names of a connection. For > >> example, if a connection owns both <constant>'org.foo.bar'</constant> and > >> <constant>'org.blah.baz'</constant>, and the policy database allows > >> <constant>'org.blah.baz'</constant> to be talked to by WORLD, then this > >> permission is also granted to <constant>'org.foo.bar'</constant>. That > >> might sound illogical, but after all, we allow messages to be directed to > >> either the ID or a well-known name, and policy is applied to the > >> connection, not the name. In other words, the effective TALK policy for a > >> connection is the most permissive of all names the connection owns. > >> > >> In my humble opinion, this paragraph speaks for itself. The design is > >> bad, full stop. > > > > First off, thanks for reading the docs, I appreciate that. But realize > > also, that this is straight from the D-Bus spec. We aren't doing > > anything "radical" here, this is what your desktop uses that you are > > typing your email from. > > > > Yes, it's an unfortunate design, but one that we are all stuck with > > (think of it as having to implement code for horrid hardware that you > > have to get to work properly.) > > I agree. You've sent a pull request for an unfortunate design. I > don't think that unfortunate design belongs in the kernel. If it says > in userspace, then user programmers could potentially fix it some day. You might not like the design, but it is a valid design. Again, we don't refuse to support hardware that is designed badly. 
Or support protocols we don't necessarily like, that's not the job of a kernel or operating system. And here's Havoc's response as to why actually, this is a good design: http://lists.freedesktop.org/archives/dbus/2015-April/016651.html so while we might not think it's nice, maybe we are just not that knowledgeable in this design space, and need to trust those that are. I know I do. I'll respond to the rest after I get some dinner... thanks, greg k-h ^ permalink raw reply [flat|nested] 333+ messages in thread
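The TALK rule from the kdbus documentation quoted in this exchange ("the effective TALK policy for a connection is the most permissive of all names the connection owns") can be written down in a few lines. This is a minimal sketch, assuming a made-up data model: the talk_policy dictionary, the "uid:..." sender strings and may_talk() are hypothetical and do not reflect how the kdbus policy database is actually stored or queried.

```python
def may_talk(sender, owned_names, talk_policy):
    """True if `sender` may talk to a connection owning `owned_names`."""
    # Policy is applied to the connection, not to the name the sender
    # used, so one permissive name is enough to allow the message.
    for name in owned_names:
        allowed = talk_policy.get(name, set())
        if "WORLD" in allowed or sender in allowed:
            return True
    return False


# Hypothetical policy: org.foo.bar is restricted, org.blah.baz is open.
talk_policy = {
    "org.foo.bar": {"uid:0"},
    "org.blah.baz": {"WORLD"},
}

both = may_talk("uid:1000", ["org.foo.bar", "org.blah.baz"], talk_policy)
only_foo = may_talk("uid:1000", ["org.foo.bar"], talk_policy)
```

With both names owned by one connection, an ordinary user is allowed through because of org.blah.baz, even though org.foo.bar alone would have refused the same sender. That is the behavior the two sides of this thread are arguing about.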
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-14 19:23 ` Greg Kroah-Hartman @ 2015-04-14 19:24 ` Borislav Petkov 2015-04-14 19:32 ` Greg Kroah-Hartman 2015-04-14 19:35 ` Al Viro 2015-04-14 20:14 ` John Stoffel 2 siblings, 1 reply; 333+ messages in thread From: Borislav Petkov @ 2015-04-14 19:24 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Tue, Apr 14, 2015 at 09:23:57PM +0200, Greg Kroah-Hartman wrote: > You might not like the design, but it is a valid design. Again, we > don't refuse to support hardware that is designed badly. Yeah except the small difference that unlike this, we can't change hardware. -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. -- ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-14 19:24 ` Borislav Petkov @ 2015-04-14 19:32 ` Greg Kroah-Hartman 2015-04-14 19:40 ` Al Viro 0 siblings, 1 reply; 333+ messages in thread From: Greg Kroah-Hartman @ 2015-04-14 19:32 UTC (permalink / raw) To: Borislav Petkov Cc: Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Tue, Apr 14, 2015 at 09:24:29PM +0200, Borislav Petkov wrote: > On Tue, Apr 14, 2015 at 09:23:57PM +0200, Greg Kroah-Hartman wrote: > > You might not like the design, but it is a valid design. Again, we > > don't refuse to support hardware that is designed badly. > > Yeah except the small difference that unlike this, we can't change > hardware. And we can't change the design/implementation of many things, again, it's not the kernel's job to prevent something, just because we don't like the RFC, from being accepted. Go read Havoc's email about why the design is the way it is that I just posted. Maybe we are the ones that really don't know the issues involved enough to say that the current design is somehow "wrong". thanks, greg k-h ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-14 19:32 ` Greg Kroah-Hartman @ 2015-04-14 19:40 ` Al Viro 2015-04-14 19:48 ` Greg Kroah-Hartman 0 siblings, 1 reply; 333+ messages in thread From: Al Viro @ 2015-04-14 19:40 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Borislav Petkov, Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Tue, Apr 14, 2015 at 09:32:29PM +0200, Greg Kroah-Hartman wrote: > On Tue, Apr 14, 2015 at 09:24:29PM +0200, Borislav Petkov wrote: > > On Tue, Apr 14, 2015 at 09:23:57PM +0200, Greg Kroah-Hartman wrote: > > > You might not like the design, but it is a valid design. Again, we > > > don't refuse to support hardware that is designed badly. > > > > Yeah except the small difference that unlike this, we can't change > > hardware. > > And we can't change the design/implementation of many things, again, > it's not the kernel's job to prevent something, just because we don't > like the RFC, from being accepted. Translate, please. What exactly will be prevented by NAK on your Fine Piece Of Software? Not dbus working as it does, surely? ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-14 19:40 ` Al Viro @ 2015-04-14 19:48 ` Greg Kroah-Hartman 2015-04-14 19:53 ` Borislav Petkov ` (2 more replies) 0 siblings, 3 replies; 333+ messages in thread From: Greg Kroah-Hartman @ 2015-04-14 19:48 UTC (permalink / raw) To: Al Viro Cc: Borislav Petkov, Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Tue, Apr 14, 2015 at 08:40:04PM +0100, Al Viro wrote: > On Tue, Apr 14, 2015 at 09:32:29PM +0200, Greg Kroah-Hartman wrote: > > On Tue, Apr 14, 2015 at 09:24:29PM +0200, Borislav Petkov wrote: > > > On Tue, Apr 14, 2015 at 09:23:57PM +0200, Greg Kroah-Hartman wrote: > > > > You might not like the design, but it is a valid design. Again, we > > > > don't refuse to support hardware that is designed badly. > > > > > > Yeah except the small difference that unlike this, we can't change > > > hardware. > > > > And we can't change the design/implementation of many things, again, > > it's not the kernel's job to prevent something, just because we don't > > like the RFC, from being accepted. > > Translate, please. What exactly will be prevented by NAK on your Fine > Piece Of Software? Not dbus working as it does, surely? I don't understand. You can not like the D-Bus model (and accordingly the X11 model), but to prevent users from wanting to use it in a more secure, and faster way by implementing it like we have seems very odd to me. It's not going to stop anything from working, it's just going to stop some programs from being able to do things they really want to do (see the first email for examples.) Yes, we could make this live outside the kernel tree, but that's not the way we work anymore. We merge things that are useful, that match our security and coding requirements, and are going to be maintained by people we trust. 
To have the only major objection be "we don't like the way the protocol is designed because we know better, sorry", isn't ok at all. greg k-h ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-14 19:48 ` Greg Kroah-Hartman @ 2015-04-14 19:53 ` Borislav Petkov 2015-04-15 8:44 ` Greg Kroah-Hartman 2015-04-14 20:11 ` Martin Steigerwald 2015-04-14 22:39 ` Jiri Kosina 2 siblings, 1 reply; 333+ messages in thread From: Borislav Petkov @ 2015-04-14 19:53 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Al Viro, Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Tue, Apr 14, 2015 at 09:48:04PM +0200, Greg Kroah-Hartman wrote: > It's not going to stop anything from working, it's just going to stop > some programs from being able to do things they really want to do (see > the first email for examples.) Until it is made "mandatory" as Al said earlier. > Yes, we could make this live outside the kernel tree, but that's not the > way we work anymore. > We merge things that are useful, that match our > security and coding requirements, and are going to be maintained by > people we trust. We trust? I'm not going to even comment on that. And frankly, merging a useful piece of code sounds completely different to me than this serious backlash I'm reading from the sidelines. -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. -- ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-14 19:53 ` Borislav Petkov @ 2015-04-15 8:44 ` Greg Kroah-Hartman 2015-04-15 8:54 ` Jiri Kosina 2015-04-15 9:35 ` Borislav Petkov 0 siblings, 2 replies; 333+ messages in thread From: Greg Kroah-Hartman @ 2015-04-15 8:44 UTC (permalink / raw) To: Borislav Petkov Cc: Al Viro, Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Tue, Apr 14, 2015 at 09:53:36PM +0200, Borislav Petkov wrote: > On Tue, Apr 14, 2015 at 09:48:04PM +0200, Greg Kroah-Hartman wrote: > > It's not going to stop anything from working, it's just going to stop > > some programs from being able to do things they really want to do (see > > the first email for examples.) > > Until it is made "mandatory" as Al said earlier. If you really don't like userspace using features the kernel provides you, well, there's nothing I can say that will change that odd feeling, sorry. If we don't want to make the metadata thing optional because everyone will end up always using it, great, we will go make that change, that's not an issue at all. It will then end up looking like the first proposal that was made many months ago :) > > Yes, we could make this live outside the kernel tree, but that's not the > > way we work anymore. > > > We merge things that are useful, that match our > > security and coding requirements, and are going to be maintained by > > people we trust. > > We trust? I'm not going to even comment on that. Really? Who in that MAINTAINERS file entry do you not trust? Seriously, if that's the issue here, please let me know. Do you not trust me? Daniel? David? Djalal? All of us have been long-time kernel developers and maintainers of other portions of the kernel stack that you rely on every day. If you have objections to any of us maintaining this code, let me know. Otherwise, stop making foolish statements. 
> And frankly, merging a useful piece of code sounds completely different > to me than this serious backlash I'm reading from the sidelines. I don't understand what this means. If you have a technical reason for why this code shouldn't be merged, great, please let me know and we can work to address that. Andy and Al have spent time reviewing and giving us comments, and that's wonderful and valuable and is why I treat their comments seriously. If you are interested in the code, please review it, otherwise I don't see what this adds to the conversation at all, do you? thanks, greg k-h ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-15 8:44 ` Greg Kroah-Hartman
@ 2015-04-15 8:54 ` Jiri Kosina
2015-04-15 9:09 ` Greg Kroah-Hartman
2015-04-15 9:35 ` Borislav Petkov
1 sibling, 1 reply; 333+ messages in thread
From: Jiri Kosina @ 2015-04-15 8:54 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Borislav Petkov, Al Viro, Andy Lutomirski, Linus Torvalds,
Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes,
Tom Gundersen, linux-kernel, Daniel Mack, David Herrmann,
Djalal Harouni
On Wed, 15 Apr 2015, Greg Kroah-Hartman wrote:
> If you have a technical reason for why this code shouldn't be merged,
> great, please let me know and we can work to address that. Andy and Al
> have spent time reviewing and giving us comments, and that's wonderful
> and valuable and is why I treat their comments seriously. If you are
> interested in the code, please review it, otherwise I don't see what
> this adds to the conversation at all, do you?

You've actually touched another issue I see here, and that is -- the code
is complex like crazy.

I've spent a big part of the past two days trying to get my head around it,
but I am still far from getting even the 1000-mile overview of how exactly
the message passing is designed.

I understand that the primary reason for this complexity is probably the
dbus protocol specification itself.

But the problem really is that I don't think you've received even a single
Reviewed-by: from someone who hasn't been directly involved in developing
the code, right?

For something that's potentially such a core mechanism as a completely
new, massively-adopted IPC, this does send a warning signal.

Thanks,

--
Jiri Kosina
SUSE Labs
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 8:54 ` Jiri Kosina @ 2015-04-15 9:09 ` Greg Kroah-Hartman 2015-04-15 12:36 ` Al Viro 2015-04-15 16:47 ` Steven Rostedt 0 siblings, 2 replies; 333+ messages in thread From: Greg Kroah-Hartman @ 2015-04-15 9:09 UTC (permalink / raw) To: Jiri Kosina Cc: Borislav Petkov, Al Viro, Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, Apr 15, 2015 at 10:54:41AM +0200, Jiri Kosina wrote: > On Wed, 15 Apr 2015, Greg Kroah-Hartman wrote: > > > If you have a technical reason for why this code shouldn't be merged, > > great, please let me know and we can work to address that. Andy and Al > > have spent time reviewing and giving us comments, and that's wonderful > > and valuable and is why I treat their comments seriously. If you are > > interested in the code, please review it, otherwise I don't see what > > this adds to the conversation at all, do you? > > You've actually touched another issue I see here, and that is -- the code > is complex like crazy. > > I've spent big part of past two days trying to get my head around it, but > I am still far away from getting at least the 1000 miles overview of how > exactly the message passing is designed. > > I understand that the primary reason for this complexity is probably the > dbus protocol specification itself. Yes it is. > But the problem really is that I don't think you've received even a single > Reviewed-by: from someone who hasn't been directly involved in developing > the code, right? I've asked for it, but finding people to review code is hard, as you know. It's only 13k lines long, smaller than a serial port driver (my unit of code review), so it's not all that big. 
It's smaller than the USB3 host controller driver as well, and very few people ever reviewed that beast :) > For something that's potentially such a core mechanism as a completely > new, massively-adopted IPC, this does send a warning singal. If you know of a way to force others to review code, please let me know. thanks, greg k-h ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-15 9:09 ` Greg Kroah-Hartman
@ 2015-04-15 12:36 ` Al Viro
2015-04-15 13:13 ` Greg Kroah-Hartman
2015-04-15 16:47 ` Steven Rostedt
1 sibling, 1 reply; 333+ messages in thread
From: Al Viro @ 2015-04-15 12:36 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Jiri Kosina, Borislav Petkov, Andy Lutomirski, Linus Torvalds,
Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes,
Tom Gundersen, linux-kernel, Daniel Mack, David Herrmann,
Djalal Harouni
On Wed, Apr 15, 2015 at 11:09:48AM +0200, Greg Kroah-Hartman wrote:
> I've asked for it, but finding people to review code is hard, as you
> know. It's only 13k lines long, smaller than a serial port driver (my
> unit of code review), so it's not all that big.
>
> It's smaller than the USB3 host controller driver as well, and very few
> people ever reviewed that beast :)
>
> > For something that's potentially such a core mechanism as a completely
> > new, massively-adopted IPC, this does send a warning signal.
>
> If you know of a way to force others to review code, please let me know.

Have it in a less nasty state, perhaps?

Random question:
al@duke:~/linux/trees/vfs$ git grep -n -w kdbus_node_idr_lock
ipc/kdbus/node.c:237:static DECLARE_RWSEM(kdbus_node_idr_lock);
ipc/kdbus/node.c:340: down_write(&kdbus_node_idr_lock);
ipc/kdbus/node.c:344: up_write(&kdbus_node_idr_lock);
ipc/kdbus/node.c:444: down_write(&kdbus_node_idr_lock);
ipc/kdbus/node.c:452: up_write(&kdbus_node_idr_lock);

Do you see anything wrong with that? Or with things like that:
	mutex_lock(&pos->lock);
	v_pre = atomic_read(&pos->active);
	if (v_pre >= 0)
		atomic_add_return(KDBUS_NODE_BIAS, &pos->active);
	else if (v_pre == KDBUS_NODE_NEW)
		atomic_set(&pos->active, KDBUS_NODE_RELEASE_DIRECT);
	mutex_unlock(&pos->lock);
What are the locking rules for ->active/->waitq/->lock? Are those the
outermost thing in the hierarchy? Or is that dependent on the node location?
It sure as hell is outside of (at least) ->mmap_sem (by way of
kdbus_conn_connect() establishing that ->active/->waitq is outside of
->conn_rwlock, which due to kdbus_bus_broadcast() nests outside of anything
taken by kdbus_meta_proc_collect(), which includes ->mmap_sem) and that alone
brings in a lot...

Document your goddamn locking, would you? It *IS* new code, and you, as you
say, had very few people working on it, so you don't have the excuses for
the mess existing in older parts of the tree.

Locking complexity in there is easily as bad as that of VFS sans the RCU fun;
sure, I can spend a week and (hopefully) document it for you, but I would
really prefer if you guys had done that. And I *do* appreciate the comments
in node.c, but they are nowhere near enough.

Tracking the call chains in there and trying to derive the locking ordering
from those is quite a bit of work; _verifying_ that it matches the claimed
one would be expected from reviewers, but as it is you are asking to spend
a lot of effort to close the gaps in your documentation. Sheesh...
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 12:36 ` Al Viro @ 2015-04-15 13:13 ` Greg Kroah-Hartman 0 siblings, 0 replies; 333+ messages in thread From: Greg Kroah-Hartman @ 2015-04-15 13:13 UTC (permalink / raw) To: Al Viro Cc: Jiri Kosina, Borislav Petkov, Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, Apr 15, 2015 at 01:36:33PM +0100, Al Viro wrote: > al@duke:~/linux/trees/vfs$ git grep -n -w kdbus_node_idr_lock > ipc/kdbus/node.c:237:static DECLARE_RWSEM(kdbus_node_idr_lock); > ipc/kdbus/node.c:340: down_write(&kdbus_node_idr_lock); > ipc/kdbus/node.c:344: up_write(&kdbus_node_idr_lock); > ipc/kdbus/node.c:444: down_write(&kdbus_node_idr_lock); > ipc/kdbus/node.c:452: up_write(&kdbus_node_idr_lock); Heh, that's a leftover from an older version, I'll go fix that up to be a simple mutex, which is all that this is doing here anyway. > Do you see anything wrong with that? Or with things like that: > mutex_lock(&pos->lock); > v_pre = atomic_read(&pos->active); > if (v_pre >= 0) > atomic_add_return(KDBUS_NODE_BIAS, &pos->active); > else if (v_pre == KDBUS_NODE_NEW) > atomic_set(&pos->active, KDBUS_NODE_RELEASE_DIRECT); > mutex_unlock(&pos->lock); > What are the locking rules for ->active/->waitq/->lock? Are those the > outermost thing in the hierarchy? Or is that dependent on the node location? > It sure as hell is outside of (at least) ->mmap_sem (by way of > kdbus_conn_connect() establishing that ->active/->waitq is outside of > ->conn_rwlock, which due to kdbus_bus_broadcast() nests outside of anything > taken by kdbus_meta_proc_collect(), which includes ->mmap_sem) and that alone > brings in a lot... > > Document your goddamn locking, would you? It *IS* new code, and you, as you > say, had very few people working on it, so you don't have the excuses for > the mess existing in older parts of the tree. 
Fair enough, documenting the locking is a good thing, that will make reviewing this easier, I'll go work on that. > Locking complexity in there is easily as bad as that of VFS sans the RCU fun; > sure, I can spend a week and (hopefully) document it for you, but I would > really prefer if you guys had done that. And I *do* appreciate the comments > in node.c, but they are nowhere near enough. Thanks, it's hard to balance the comment/code level at times. And yes, it is complex and should be explained better, will work on that. greg k-h ^ permalink raw reply [flat|nested] 333+ messages in thread
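The first point in this exchange can be shown with a small sketch. In the grep output quoted above, kdbus_node_idr_lock is declared with DECLARE_RWSEM() but only ever taken with down_write(), so it never admits concurrent readers and behaves like a plain mutex. The following userspace C analogue uses a pthread mutex; the table, its size, and the function names are hypothetical and are not taken from ipc/kdbus/node.c.

```c
#include <pthread.h>
#include <stdbool.h>

#define MAX_NODES 64	/* arbitrary size for this sketch */

/*
 * Every access below is exclusive, so a mutex is sufficient.
 * A reader/writer lock would only help if some path took it shared.
 */
static pthread_mutex_t node_table_lock = PTHREAD_MUTEX_INITIALIZER;
static bool node_in_use[MAX_NODES];

/* Allocate the lowest free ID, or return -1 if the table is full. */
int node_id_alloc(void)
{
	int id = -1;
	int i;

	pthread_mutex_lock(&node_table_lock);
	for (i = 0; i < MAX_NODES; i++) {
		if (!node_in_use[i]) {
			node_in_use[i] = true;
			id = i;
			break;
		}
	}
	pthread_mutex_unlock(&node_table_lock);
	return id;
}

/* Release an ID; out-of-range values are ignored. */
void node_id_free(int id)
{
	if (id < 0 || id >= MAX_NODES)
		return;
	pthread_mutex_lock(&node_table_lock);
	node_in_use[id] = false;
	pthread_mutex_unlock(&node_table_lock);
}
```

The comment above the lock is also a small instance of what the review asks for: it states what the lock protects and why this lock type was chosen, so a reviewer can verify the claim instead of deriving it from call chains.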
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-15 9:09 ` Greg Kroah-Hartman
2015-04-15 12:36 ` Al Viro
@ 2015-04-15 16:47 ` Steven Rostedt
1 sibling, 0 replies; 333+ messages in thread
From: Steven Rostedt @ 2015-04-15 16:47 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Jiri Kosina, Borislav Petkov, Al Viro, Andy Lutomirski,
Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
One Thousand Gnomes, Tom Gundersen, linux-kernel, Daniel Mack,
David Herrmann, Djalal Harouni
On Wed, Apr 15, 2015 at 11:09:48AM +0200, Greg Kroah-Hartman wrote:
>
> > But the problem really is that I don't think you've received even a single
> > Reviewed-by: from someone who hasn't been directly involved in developing
> > the code, right?
>
> I've asked for it, but finding people to review code is hard, as you

Perhaps try harder. You know more kernel developers than I do. You don't
have anyone you can say "hey, I need this code reviewed, can you spend some
time to review it for me"? I have a few developers that are willing to do
that for me, and I won't push some code (if it is complex) until they give
their Reviewed-by for it. I did that with the latest TRACE_DEFINE_ENUM()
code, as well as my ftrace trampoline code and the multi buffer code. None
of that went in until I had their Reviewed-by tags.

> know. It's only 13k lines long, smaller than a serial port driver (my
> unit of code review), so it's not all that big.

Length of code does not determine the complexity of it.

>
> It's smaller than the USB3 host controller driver as well, and very few
> people ever reviewed that beast :)
>
> > For something that's potentially such a core mechanism as a completely
> > new, massively-adopted IPC, this does send a warning signal.
>
> If you know of a way to force others to review code, please let me know.

Keep asking, that's the best way. That's what I do.

Also, I really like Alan's approach to this. Let me requote it here:

- stop writing a dbus only file system
- figure out what a messaging "vfs" looks like
- figure out what a clean low level kernel model looks like
- figure out what has to be where to put the policy in userspace

-- Steve
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 8:44 ` Greg Kroah-Hartman 2015-04-15 8:54 ` Jiri Kosina @ 2015-04-15 9:35 ` Borislav Petkov 2015-04-15 11:45 ` Greg Kroah-Hartman 1 sibling, 1 reply; 333+ messages in thread From: Borislav Petkov @ 2015-04-15 9:35 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Al Viro, Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, Apr 15, 2015 at 10:44:40AM +0200, Greg Kroah-Hartman wrote: > If you really don't like userspace using features the kernel provides > you, well, there's nothing I can say that will change that odd feeling, > sorry. Are you even reading what people are saying? I don't like the mandatory(!) aspect of this, which it will eventually become. There is this thing called "choice", remember? > Really? Who in that MAINTAINERS file entry do you not trust? The fact that you're still pushing for this current design *in the face* of people pointing out serious design flaws with this makes me not really trust you. > I don't understand what this means. If you have a technical reason > for why this code shouldn't be merged, great, please let me know and > we can work to address that. Andy and Al have spent time reviewing > and giving us comments, and that's wonderful and valuable and is > why I treat their comments seriously. If you are interested in the > code, please review it, Yeah, I took a brief look at the code. It is overcomplicated. If I were to review it properly, I'd ask you to split it in small patchsets. Hell, I'm pretty sure you would do the same for code you don't know if you were in my shoes. Also, considering the complexity of this patchset, it doesn't have a single Reviewed-by by an external party. If this were any other submission, it would've been kicked to the curb a long time ago. -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. 
-- ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 9:35 ` Borislav Petkov @ 2015-04-15 11:45 ` Greg Kroah-Hartman 0 siblings, 0 replies; 333+ messages in thread From: Greg Kroah-Hartman @ 2015-04-15 11:45 UTC (permalink / raw) To: Borislav Petkov Cc: Al Viro, Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, Apr 15, 2015 at 11:35:07AM +0200, Borislav Petkov wrote: > On Wed, Apr 15, 2015 at 10:44:40AM +0200, Greg Kroah-Hartman wrote: > > If you really don't like userspace using features the kernel provides > > you, well, there's nothing I can say that will change that odd feeling, > > sorry. > > Are you even reading what people are saying? You aren't reading the patches :) > I don't like the mandatory(!) aspect of this, which it will eventually > become. There is this thing called "choice", remember? See my other response about that. > > Really? Who in that MAINTAINERS file entry do you not trust? > > The fact that you're still pushing for this current design *in the face* > of people pointing out serious design flaws with this makes me not > really trust you. Please discuss these "serious design flaws". I have responded to all of the ones that I have seen so far in this thread. And in all of the other threads since this patch series was first posted months ago. I would love to discuss the code, so please, let's do that. > > I don't understand what this means. If you have a technical reason > > for why this code shouldn't be merged, great, please let me know and > > we can work to address that. Andy and Al have spent time reviewing > > and giving us comments, and that's wonderful and valuable and is > > why I treat their comments seriously. If you are interested in the > > code, please review it, > > Yeah, I took a brief look at the code. It is overcomplicated. 
> > If I were to review it properly, I'd ask you to split it in small > patchsets. Hell, I'm pretty sure you would do the same for code you > don't know if you were in my shoes. It has been split into small patchsets, see the original postings. And really, 13k lines of code is not all that big. We review driver submissions larger than that all the time. Remember, your USB host controller driver is bigger than that. > Also, considering the complexity of this patchset, it doesn't have > a single Reviewed-by by an external party. If this were any other > submission, it would've been kicked to the curb a long time ago. Please, review it, I would love for others to do so, and have been asking for that since the beginning of this whole process months ago. And I'd like to thank Andy and others for doing that. Based on their review comments we have changed the api, redone the infrastructure, and modified lots of different things. The code has massively changed for the better because of this process. I'm not asking for it to stop, I'm asking for it to be merged now as everyone seems to have not had any more comments on the code anymore, other than Andy's specific comments, and everyone else's vague rants. I'm addressing Andy's comments, and I would love to address yours, if you actually made any technical ones here. thanks, greg k-h ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-14 19:48 ` Greg Kroah-Hartman
2015-04-14 19:53 ` Borislav Petkov
@ 2015-04-14 20:11 ` Martin Steigerwald
2015-04-14 22:39 ` Jiri Kosina
2 siblings, 0 replies; 333+ messages in thread
From: Martin Steigerwald @ 2015-04-14 20:11 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Al Viro, Borislav Petkov, Andy Lutomirski, Linus Torvalds,
Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes,
Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack,
David Herrmann, Djalal Harouni
On Tuesday, 14 April 2015, 21:48:04, Greg Kroah-Hartman wrote:
> On Tue, Apr 14, 2015 at 08:40:04PM +0100, Al Viro wrote:
> > On Tue, Apr 14, 2015 at 09:32:29PM +0200, Greg Kroah-Hartman wrote:
> > > On Tue, Apr 14, 2015 at 09:24:29PM +0200, Borislav Petkov wrote:
> > > > On Tue, Apr 14, 2015 at 09:23:57PM +0200, Greg Kroah-Hartman wrote:
> > > > > You might not like the design, but it is a valid design. Again, we
> > > > > don't refuse to support hardware that is designed badly.
> > > >
> > > > Yeah except the small difference that unlike this, we can't change
> > > > hardware.
> > >
> > > And we can't change the design/implementation of many things, again,
> > > it's not the kernel's job to prevent something, just because we don't
> > > like the RFC, from being accepted.
> >
> > Translate, please. What exactly will be prevented by NAK on your Fine
> > Piece Of Software? Not dbus working as it does, surely?
>
> I don't understand. You can not like the D-Bus model (and accordingly
> the X11 model), but to prevent users from wanting to use it in a more
> secure, and faster way by implementing it like we have seems very odd to
> me.
>
> It's not going to stop anything from working, it's just going to stop
> some programs from being able to do things they really want to do (see
> the first email for examples.)
>
> Yes, we could make this live outside the kernel tree, but that's not the
> way we work anymore. We merge things that are useful, that match our
> security and coding requirements, and are going to be maintained by
> people we trust. To have the only major objection be "we don't like
> the way the protocol is designed because we know better, sorry", isn't
> ok at all.

Greg, I think I understood Al here.

dbus as it is used in KDE, GNOME, network-manager, systemd, you name it,
does work. Not merging kdbus will not break it.

So the ones who want to see kdbus in the kernel want to do something better
than, or differently from, how it is currently done in dbus. And yes, I have
seen the presentations about the benefits of having dbus in the kernel.

But if that's the case, what I think Al asks of a *new* kernel component is
a sound design that does not repeat any flaws from the original design,
since the original design is not hardware that can no longer be changed
after production.

And as to whether the design of kdbus is sound, there seem to be strongly
differing opinions. I think it is important to accept that and go from
there.

On the other hand, if you do things differently enough from the way
userspace dbus is doing it in order to have such a sound design, it may be
necessary to adapt all applications to it. But since kdbus is not yet in
the kernel officially, this would not violate the "we never ever break
userspace" rule, because the kernel obviously doesn't guarantee the
stability of the current userspace dbus API, as it doesn't yet have such
an API at all. But if kdbus goes in, it does have one, and then it needs to
guarantee it until this "never break userspace" rule is changed, *if* ever.

And also: Even if the kernel API is different in order to be sound, it may
be possible to adapt userspace dbus to use it to improve upon some of its
current flaws so that applications using it do not need to be changed at
all.

--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-14 19:48 ` Greg Kroah-Hartman
2015-04-14 19:53 ` Borislav Petkov
2015-04-14 20:11 ` Martin Steigerwald
@ 2015-04-14 22:39 ` Jiri Kosina
2015-04-15 8:38 ` Greg Kroah-Hartman
2015-04-15 10:37 ` One Thousand Gnomes
2 siblings, 2 replies; 333+ messages in thread
From: Jiri Kosina @ 2015-04-14 22:39 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Al Viro, Borislav Petkov, Andy Lutomirski, Linus Torvalds,
Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes,
Tom Gundersen, linux-kernel, Daniel Mack, David Herrmann,
Djalal Harouni
On Tue, 14 Apr 2015, Greg Kroah-Hartman wrote:
> I don't understand. You can not like the D-Bus model (and accordingly
> the X11 model),

I thought that the general hatred level of the X11 "model" and the
protocol led to all the efforts to reimplement this properly ... in
userspace (for example Wayland, right?).

I don't think anyone was ever seriously suggesting "X11 model is broken,
so let's push it to kernel" ... ?

Thanks,

--
Jiri Kosina
SUSE Labs
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-14 22:39 ` Jiri Kosina
@ 2015-04-15 8:38 ` Greg Kroah-Hartman
2015-04-15 10:37 ` One Thousand Gnomes
1 sibling, 0 replies; 333+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-15 8:38 UTC (permalink / raw)
To: Jiri Kosina
Cc: Al Viro, Borislav Petkov, Andy Lutomirski, Linus Torvalds,
Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes,
Tom Gundersen, linux-kernel, Daniel Mack, David Herrmann,
Djalal Harouni
On Wed, Apr 15, 2015 at 12:39:22AM +0200, Jiri Kosina wrote:
> On Tue, 14 Apr 2015, Greg Kroah-Hartman wrote:
>
> > I don't understand. You can not like the D-Bus model (and accordingly
> > the X11 model),
>
> I thought that the general hatred level of the X11 "model" and the
> protocol led to all the efforts to reimplement this properly ... in
> userspace (for example Wayland, right?).
>
> I don't think anyone was ever seriously suggesting "X11 model is broken,
> so let's push it to kernel" ... ?

Ok, fine, it's a broken metaphor, see Havoc's email for why I brought
that up here.

It's the issue that a stateful bus is required for applications that is
the main point I'm trying to get across.

thanks,

greg k-h
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-14 22:39 ` Jiri Kosina 2015-04-15 8:38 ` Greg Kroah-Hartman @ 2015-04-15 10:37 ` One Thousand Gnomes 2015-04-15 11:49 ` Greg Kroah-Hartman 1 sibling, 1 reply; 333+ messages in thread From: One Thousand Gnomes @ 2015-04-15 10:37 UTC (permalink / raw) To: Jiri Kosina Cc: Greg Kroah-Hartman, Al Viro, Borislav Petkov, Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, Tom Gundersen, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, 15 Apr 2015 00:39:22 +0200 (CEST) Jiri Kosina <jkosina@suse.cz> wrote: > On Tue, 14 Apr 2015, Greg Kroah-Hartman wrote: > > > I don't understand. You can not like the D-Bus model (and accordingly > > the X11 model), > > I thought that the general hatred level of the X11 "model" and the > protocol lead to al the efforts to reimplement this properly ... in > userspace (for example Wayland, right?). > > I don't think anyone was ever seriously suggesting "X11 model is broken, > so let's push it to kernel" ... ? The X11 model is *nothing* to do with the dbus/kdbus model. X11 does properties by attaching them to windows. Those properties can be monitored for changes and they can be queried. Setting them is asynchronous, querying them is sync or with the newer event based libraries can be async. X11 properties are network safe, handled through the same X11 authority as everything else. Two apps can happily run on different systems sharing a display over the network and sharing and responding to changes in X11 properties - and it just works. The Gnome people tried to re-invent X11 properties and embedding badly with CORBA, then with dbus, despite the fact the Andrew system could already do it really fast and cleanly even before Gnome was thought of. There is no comparison between the elegance of X11 property setting and a chunk of proposed kernel code that is half the size of a tiny X server! 
The dbus model is also flawed in a load of other ways in user space because message handling in the hands of people with no concept of systemic performance analysis just leads to disaster. One of the big reasons dbus is so "slow" isn't that dbus is "slow", it's that the crapware on top of it makes *thousands* of dbus queries. If you must do it in kernel why not use the Android binder - it's awful, broken, and dubiously secure, but at least we'd still only have one awful, broken dubiously secure rpc/property layer in kernel. "It's the issue that a stateful bus is required for applications that is the main point I'm trying to get across." That would be the "if dbus crashes I have to reboot" design flaw of Gnome and friends. The only state you need is beyond the endpoints. It's a message passing system. If you think message passing needs state then I'd take a look at the internet. State belongs in the end points. It's telling that I can lose and recover my internet connection without rebooting but not my desktops internal messaging. Alan ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 10:37 ` One Thousand Gnomes @ 2015-04-15 11:49 ` Greg Kroah-Hartman 2015-04-15 12:03 ` One Thousand Gnomes ` (2 more replies) 0 siblings, 3 replies; 333+ messages in thread From: Greg Kroah-Hartman @ 2015-04-15 11:49 UTC (permalink / raw) To: One Thousand Gnomes Cc: Jiri Kosina, Al Viro, Borislav Petkov, Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, Tom Gundersen, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, Apr 15, 2015 at 11:37:27AM +0100, One Thousand Gnomes wrote: > On Wed, 15 Apr 2015 00:39:22 +0200 (CEST) > Jiri Kosina <jkosina@suse.cz> wrote: > > > On Tue, 14 Apr 2015, Greg Kroah-Hartman wrote: > > > > > I don't understand. You can not like the D-Bus model (and accordingly > > > the X11 model), > > > > I thought that the general hatred level of the X11 "model" and the > > protocol lead to al the efforts to reimplement this properly ... in > > userspace (for example Wayland, right?). > > > > I don't think anyone was ever seriously suggesting "X11 model is broken, > > so let's push it to kernel" ... ? > > The X11 model is *nothing* to do with the dbus/kdbus model. X11 does > properties by attaching them to windows. Those properties can be > monitored for changes and they can be queried. Setting them is > asynchronous, querying them is sync or with the newer event based > libraries can be async. X11 properties are network safe, handled through > the same X11 authority as everything else. Two apps can happily run on > different systems sharing a display over the network and sharing and > responding to changes in X11 properties - and it just works. > > The Gnome people tried to re-invent X11 properties and embedding badly > with CORBA, then with dbus, despite the fact the Andrew system could > already do it really fast and cleanly even before Gnome was thought of. 
> > There is no comparison between the elegance of X11 property setting and a > chunk of proposed kernel code that is half the size of a tiny X server! Hey, take that up with Havoc, he made the comparison :) > The dbus model is also flawed in a load of other ways in user space > because message handling in the hands of people with no concept of > systemic performance analysis just leads to disaster. One of the big > reasons dbus is so "slow" isn't that dbus is "slow", it's that the > crapware on top of it makes *thousands* of dbus queries. There's the issue of thousands of dbus queries, and then there's the issue that making those queries takes a measurable amount of time. We can fix the later one, the first one, well, not so much, but we can provide the resources for them to make a faster system if they want to. > If you must do it in kernel why not use the Android binder - it's awful, > broken, and dubiously secure, but at least we'd still only have one awful, > broken dubiously secure rpc/property layer in kernel. Binder does not match up to the dbus model at all, I've written about this in the past, and can dig it up again if you want. And, there is active research in moving the binder userspace library onto the kdbus code base, allowing the binder kernel driver to be removed one day. That would be a good thing to have happen, but I'm not holding my breath for it. Using it the other way around isn't going to work. > "It's the issue that a stateful bus is required for > applications that is the main point I'm trying to get across." > > That would be the "if dbus crashes I have to reboot" design flaw of > Gnome and friends. The only state you need is beyond the endpoints. It's a > message passing system. If you think message passing needs state then I'd > take a look at the internet. State belongs in the end points. The internet model with state in the endpoints doesn't always transfer properly to local applications, see Havoc's email for the details about that. 
> It's telling that I can lose and recover my internet connection without > rebooting but not my desktops internal messaging. Yes, as those are totally different things, let's not mix the issue up here please. thanks, greg k-h ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 11:49 ` Greg Kroah-Hartman @ 2015-04-15 12:03 ` One Thousand Gnomes 2015-04-15 12:41 ` Greg Kroah-Hartman 2015-04-15 12:55 ` Al Viro 2015-04-15 17:33 ` Steven Rostedt 2 siblings, 1 reply; 333+ messages in thread From: One Thousand Gnomes @ 2015-04-15 12:03 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Jiri Kosina, Al Viro, Borislav Petkov, Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, Tom Gundersen, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni > > There is no comparison between the elegance of X11 property setting and a > > chunk of proposed kernel code that is half the size of a tiny X server! > > Hey, take that up with Havoc, he made the comparison :) And it concerns me you blindly repeat it without realising its wrong. > > The dbus model is also flawed in a load of other ways in user space > > because message handling in the hands of people with no concept of > > systemic performance analysis just leads to disaster. One of the big > > reasons dbus is so "slow" isn't that dbus is "slow", it's that the > > crapware on top of it makes *thousands* of dbus queries. > > There's the issue of thousands of dbus queries, and then there's the > issue that making those queries takes a measurable amount of time. We > can fix the later one, the first one, well, not so much, but we can > provide the resources for them to make a faster system if they want to. If you fix the thousands of queries problem do you need kernel help at all. > The internet model with state in the endpoints doesn't always transfer > properly to local applications, see Havoc's email for the details about > that. URL ? (note how beautifully btw the stateless network and the URL string will become a reference to state) > > It's telling that I can lose and recover my internet connection without > > rebooting but not my desktops internal messaging. 
> > Yes, as those are totally different things, let's not mix the issue up > here please. They are *NOT* different things. They are fundamental properties of the underlying architecture. I worked on stateful networks and still have the scars. It is a fundamental property of stateful network that every time any key component goes castors up you lose the lot. It is a fairly fundamental property of stateless networks that equipment going castors up has no material impact on the network The internet is built upon three fundamental breakthroughs in technology - That stateless networks scale and can be reliable while stateful ones cannot scale and cannot be fixed to do so - That flow control is possible over a stateless network - That efficient data routing is possible over a stateless network Those are absolutely critical parts of any network or messaging implementation. Alan ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-15 12:03 ` One Thousand Gnomes
@ 2015-04-15 12:41 ` Greg Kroah-Hartman
2015-04-15 14:06 ` One Thousand Gnomes
0 siblings, 1 reply; 333+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-15 12:41 UTC (permalink / raw)
To: One Thousand Gnomes
Cc: Jiri Kosina, Al Viro, Borislav Petkov, Andy Lutomirski,
Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
Tom Gundersen, linux-kernel, Daniel Mack, David Herrmann,
Djalal Harouni

On Wed, Apr 15, 2015 at 01:03:54PM +0100, One Thousand Gnomes wrote:
> > > There is no comparison between the elegance of X11 property setting and a
> > > chunk of proposed kernel code that is half the size of a tiny X server!
> >
> > Hey, take that up with Havoc, he made the comparison :)
>
> And it concerns me you blindly repeat it without realising it's wrong.

It's a metaphor that makes sense to me given my limited knowledge of the
x11 protocol. If it's wrong, ok, I'm willing to learn, but I think it's
still relevant here.

> > > The dbus model is also flawed in a load of other ways in user space
> > > because message handling in the hands of people with no concept of
> > > systemic performance analysis just leads to disaster. One of the big
> > > reasons dbus is so "slow" isn't that dbus is "slow", it's that the
> > > crapware on top of it makes *thousands* of dbus queries.
> >
> > There's the issue of thousands of dbus queries, and then there's the
> > issue that making those queries takes a measurable amount of time. We
> > can fix the latter one, the first one, well, not so much, but we can
> > provide the resources for them to make a faster system if they want to.
>
> If you fix the thousands of queries problem do you need kernel help at
> all.

I've worked with developers of such systems, and no, they can't fix that
problem. They are using "legacy" applications that they have to run on
some type of operating system, and really don't want to use legacy
operating systems anymore. Those "legacy" oses provide a system bus
that allows them to send thousands of queries just fine, but when moving
to Linux, we don't have anything other than D-Bus, so their library is
ported to use it, and they have to handle their old applications that
need/want the zillions of messages.

Then they throw the thing on a very underpowered ARM processor and
complain about boot time being so slow, but that's a different issue...

> > The internet model with state in the endpoints doesn't always transfer
> > properly to local applications, see Havoc's email for the details about
> > that.
>
> URL ?
>
> (note how beautifully btw the stateless network and the URL string will
> become a reference to state)

Heh, yes, but there's very little state here:
http://lists.freedesktop.org/archives/dbus/2015-April/016651.html

There's also a follow-on message from the current D-Bus maintainer:
http://lists.freedesktop.org/archives/dbus/2015-April/016653.html

> > > It's telling that I can lose and recover my internet connection without
> > > rebooting but not my desktops internal messaging.
> >
> > Yes, as those are totally different things, let's not mix the issue up
> > here please.
>
> They are *NOT* different things. They are fundamental properties of the
> underlying architecture. I worked on stateful networks and still have
> the scars. It is a fundamental property of stateful network that every
> time any key component goes castors up you lose the lot. It is a fairly
> fundamental property of stateless networks that equipment going castors
> up has no material impact on the network
>
> The internet is built upon three fundamental breakthroughs in technology
>
> - That stateless networks scale and can be reliable while stateful ones
>   cannot scale and cannot be fixed to do so
>
> - That flow control is possible over a stateless network
>
> - That efficient data routing is possible over a stateless network
>
> Those are absolutely critical parts of any network or messaging
> implementation.

People take those stateless models and build stateful ones on top of
them, yes, it's great. But you still need a stateful model somewhere in
order to be able to achieve many things (think a shopping cart
application).

Anyway, this is getting off-topic, there is very little "state" in the
kdbus kernel code here, other than a naming database that Havoc and
Simon explain the need for, and the normal lifecycle of kdbus "nodes"
(new, linked, active, inactive, drained, freed).

thanks,

greg k-h
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 12:41 ` Greg Kroah-Hartman @ 2015-04-15 14:06 ` One Thousand Gnomes 2015-04-15 16:27 ` Havoc Pennington 0 siblings, 1 reply; 333+ messages in thread From: One Thousand Gnomes @ 2015-04-15 14:06 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Jiri Kosina, Al Viro, Borislav Petkov, Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, Tom Gundersen, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni > operating systems anymore. Those "legacy" oses provide a system bus > that allows them to send thousands of queries just fine, but when moving > to Linux, we don't have anything other than D-Bus, so their library is > ported to use it, and they have to handle their old applications that > need/want the zillions of messages. And if you look at those systems btw many of them have a very compact, very clean very simple message passing interface,often in the hundreds not tens of thousands of lines of code. > People take those stateless models and build stateful ones on top of > them, yes, it's great. But you still need a stateful model somewhere in > order to be able to achieve many things (think a shopping cart > application). We put the IP stack in the kernel not the shopping cart. A good shopping cart of course only has state on the client. > Anyway, this is getting off-topic, there is very little "state" in the > kdbus kernel code here, other than a naming database that Havoc and > Simon explain the need for, and the normal lifecycle of kdbus "nodes" > (new, linked, active, inactive, drained, freed). I'm not convinced the naming data belongs in kernel beyond the simplest of "node 147". 
I'd offer a sort of proof by armwaving of this that if you have /dev/dbus/014 /dev/dbus/027 etc you can add a symlink to /dev/dbus/014 of /dev/dbus-by-name/gnome-wombat-grooming-daemon or whatever and we do that today for every other naming database and static allocation we've spent the past 15 years evicting from the kernel. That state isn't then held in a daemon that can crash nor is it invisible to debuggers, user tools and admins. Alan ^ permalink raw reply [flat|nested] 333+ messages in thread
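The by-name symlink scheme Alan describes above can be sketched in a few lines. This is only an illustration of the idea, not anything from kdbus or from the proposal itself: the helper names `publish_name` and `resolve_name` are hypothetical, ordinary files stand in for the numbered device nodes, and everything lives under a scratch directory instead of /dev. It assumes a POSIX system where symlinks can be created without special privileges.

```python
import os
import tempfile


def publish_name(root, node, name):
    """Make <root>/dbus-by-name/<name> a symlink to <root>/dbus/<node>.

    The link is created under a temporary name and renamed into place,
    so a reader sees either the old target or the new one, never a gap.
    """
    by_name = os.path.join(root, "dbus-by-name")
    os.makedirs(by_name, exist_ok=True)
    target = os.path.join("..", "dbus", node)  # relative, like udev's by-* links
    tmp = os.path.join(by_name, "." + name + ".tmp")
    if os.path.lexists(tmp):
        os.unlink(tmp)
    os.symlink(target, tmp)
    os.replace(tmp, os.path.join(by_name, name))  # renames the link itself


def resolve_name(root, name):
    """Return the numbered node that a published name currently points at."""
    link = os.path.join(root, "dbus-by-name", name)
    return os.path.basename(os.readlink(link))


def demo():
    root = tempfile.mkdtemp()
    os.makedirs(os.path.join(root, "dbus"))
    for node in ("014", "027"):
        # Plain files standing in for the numbered bus nodes.
        open(os.path.join(root, "dbus", node), "w").close()
    publish_name(root, "014", "gnome-wombat-grooming-daemon")
    print(resolve_name(root, "gnome-wombat-grooming-daemon"))
    publish_name(root, "027", "gnome-wombat-grooming-daemon")
    print(resolve_name(root, "gnome-wombat-grooming-daemon"))


if __name__ == "__main__":
    demo()
```

The point of the sketch is that the name-to-node mapping is ordinary filesystem state: `ls -l` shows it, any tool can read it, and no daemon has to stay alive to keep it. It says nothing about who is allowed to claim a name, which a real design would still have to settle.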
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-15 14:06 ` One Thousand Gnomes
@ 2015-04-15 16:27 ` Havoc Pennington
0 siblings, 0 replies; 333+ messages in thread
From: Havoc Pennington @ 2015-04-15 16:27 UTC (permalink / raw)
To: One Thousand Gnomes
Cc: Greg Kroah-Hartman, Jiri Kosina, Al Viro, Borislav Petkov,
Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann,
Eric W. Biederman, Tom Gundersen, linux-kernel, Daniel Mack,
David Herrmann, Djalal Harouni

Hi,

I'm temporarily joining the list if anyone has questions about why
dbus was originally the way it is. If you would like answers about its
latest usage, systemd, or the kernel implementation, those are best
answered by others.

I "led" the original design but I was hardly the only person involved.
I was sort of synthesizing previous efforts, lots of ideas from other
people, and mediating the politics of the time.

What I'd like to see in this conversation is: understanding what
exists, and why it exists. If people understand that then I think they
can make good decisions, using whatever process or timeline you like;
I don't pretend to know much about kdbus, but I see a lot of confusion
here about the use-case and design of dbus itself.

No one should take the design on faith. To improve and maintain
something it must be understood.

Why should you bother to understand dbus as it exists? It's pretty
successful, and I think for a reason. Hundreds of programs are using
dbus, it's become (over a decade) foundational to the most-used Linux
userspaces, there are many different implementations of it, and it's
been quite a stable design over that time without any major changes.

I don't think that's because it's perfect; I do think it's because
some things are right, in ways that previous designs were not. The
Linux userspace community went through a lot of alternatives before
dbus, and dbus was the one that lasted.

The worst-case scenario in my mind would be for the kernel to merge
something dbus-like, but with ill-informed changes that render it
worse. Then you would have a new ABI that nobody wants to use.

We have a design in the wild that's been very successful. People using
it for its intended use-case seem to like it. Step 1 is to try to
understand why that is. I will try to give my take on some of the
reasons.

I can't emphasize enough that the success of dbus was *because of*
many "obvious" criticisms people may have. Why? Tradeoffs. Given
infinite time and resources, many of those tradeoffs can be mitigated
or avoided - and I see kdbus as part of an effort to do so.

The first and most important tradeoff: the central daemon (the hub in
the wheel).

A central daemon has several disadvantages. The success of dbus
happened because those disadvantages, in this context, are not as
important as the advantages.

The advantages include:

 * ability to send a broadcast message to all interested processes
 * tracking/discovering well-known and unique names
 * crossing security domains (system-daemon-to-per-user-UIs, in
   particular) in an orderly fashion
 * reducing the number of file descriptors needed for N apps to all
   talk to each other
 * relatively simple model for application developers to get right

The disadvantages include:

 * performance (extra context switches, copies, and validations)
 * it's difficult to handle killing/restarting the central daemon;
   dbus actually gives clients all the tools to do this, but in
   practice if you restart the daemon you are gambling that a hundred
   clients connected to it have implemented bug-free restart handling.
 * not a distributed cluster (it's a single bottleneck and point of
   failure running on a single machine - the daemon is a source of
   truth, which is also its virtue of course)

For dbus to be as useful as it has been, these disadvantages, while
not desirable, were acceptable tradeoffs. So it would be a mistake to
solve any of these disadvantages by breaking the advantages.

Message passing or IPC isn't really the most important part of dbus.
Process lifecycle tracking and discovery are more important. However,
by integrating the IPC system with the lifecycle tracking you can
simplify the overall system and avoid race conditions. For example,
you can have processes that auto-launch race-free when you send them a
message, or more generally you can have an ordering between lifecycle
events and other messages. For example if I send out a broadcast
message and then disconnect, other clients will see first the
broadcast and then the disconnect and won't have to handle the
out-of-order case.

dbus has a lot of semantic guarantees, such as message ordering, that
reduce application complexity and therefore reduce code and reduce
bugs.

When implementing a Linux workstation userspace, ideally you have lots
of little processes that do one thing each; but the tradeoff is that
multi-process adds complexity. If your model for a multi-process
program is that it has to solve a lot of hard distributed system
problems, then it adds a LOT of complexity. But when everyone's on a
single machine, it is not necessary to solve (all of) those problems,
and in fact trying to solve non-problems creates bugs by adding
tricky, rarely-touched codepaths. It is overengineering to treat "tray
icon talking to NetworkManager" the same way you would treat IPC and
shared state within a distributed cluster.

Multi-process is valuable though; an alternative userspace design
could be like Eclipse or Emacs, i.e. one enormous process with
plugins, which would be a mess.

There was some debate over my X11 analogy. One of the "thought
experiments" while figuring out dbus was "why does CORBA seem to be at
the root of endless bug reports, while X11 isn't?"

Here are some things I think dbus has in common with X11:

 * it's a hub-and-spoke design (a central server that all apps connect
   to) rather than a design where every process talks directly to
   every other process
 * dbus names are directly modeled on X selections (see ICCCM)
 * designed to allow race-free asynchronous usage and minimize the
   need for round trips (though apps can certainly design bad APIs,
   see http://dbus.freedesktop.org/doc/dbus-api-design.html for advice
   on avoiding that)
 * binary protocol rather than text
 * generally assumes a reliable network - assumes all messages will
   arrive, as long as the connection is live
 * similar model for discovering and authenticating to the server
 * allows clients to track each other's lifecycle
 * it is stateful; clients connect, fetch the current state, then
   track changes to the state via events.

Some differences from X11 of course:

 * X11 is a domain-specific server (about sharing the graphics and
   input hardware among multiple clients), while with dbus the
   domain-specific API will be in some client and the bus is only an
   intermediary.
 * X11 therefore has a bunch more server state than dbus; dbus only
   has to track clients, not track the state of the window system.
 * IPC on X11 is sort of bolted on in an ugly way (client messages)
   while dbus cleanly maps to the OO model people are used to in the
   rest of their code.

Havoc
^ permalink raw reply [flat|nested] 333+ messages in thread
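The ordering guarantee Havoc describes (a broadcast followed by a disconnect is seen in that order by everyone else) falls out almost for free once every message goes through one hub. The toy model below tries to show just that. It is an in-process sketch, not the D-Bus wire protocol or any real binding: the `Bus` class, its methods, and the event tuples are all made up for illustration, and only the ":1.N" look of the connection ids is borrowed from D-Bus unique names.

```python
from collections import deque
from itertools import count


class Bus:
    """Toy hub: every message and lifecycle event passes through one place,
    so all clients observe them in the same order."""

    def __init__(self):
        self._ids = count(1)
        self.inboxes = {}  # unique id -> deque of events delivered to it
        self.names = {}    # well-known name -> unique id of the owner

    def connect(self):
        uid = ":1.%d" % next(self._ids)
        self.inboxes[uid] = deque()
        return uid

    def request_name(self, uid, name):
        # First come, first served; the hub is the single source of truth.
        if name in self.names:
            return False
        self.names[name] = uid
        return True

    def _fanout(self, sender, event):
        for uid, inbox in self.inboxes.items():
            if uid != sender:
                inbox.append(event)

    def broadcast(self, sender, payload):
        self._fanout(sender, ("signal", sender, payload))

    def disconnect(self, uid):
        # Drop the client and its names, then tell everyone who is left.
        del self.inboxes[uid]
        for name in [n for n, owner in self.names.items() if owner == uid]:
            del self.names[name]
        self._fanout(uid, ("disconnected", uid))


if __name__ == "__main__":
    bus = Bus()
    daemon = bus.connect()
    watcher = bus.connect()
    bus.request_name(daemon, "org.example.Daemon")
    bus.broadcast(daemon, "going away")
    bus.disconnect(daemon)
    for event in bus.inboxes[watcher]:
        print(event)
```

Because the hub appends to each inbox in the order it handled things, the watcher can never see the disconnect before the broadcast. Getting the same property when every process talks directly to every other process is the harder distributed-systems problem the email argues a single-machine desktop does not need to solve.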
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 11:49 ` Greg Kroah-Hartman 2015-04-15 12:03 ` One Thousand Gnomes @ 2015-04-15 12:55 ` Al Viro 2015-04-15 17:33 ` Steven Rostedt 2 siblings, 0 replies; 333+ messages in thread From: Al Viro @ 2015-04-15 12:55 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: One Thousand Gnomes, Jiri Kosina, Borislav Petkov, Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, Tom Gundersen, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, Apr 15, 2015 at 01:49:36PM +0200, Greg Kroah-Hartman wrote: > > There is no comparison between the elegance of X11 property setting and a > > chunk of proposed kernel code that is half the size of a tiny X server! > > Hey, take that up with Havoc, he made the comparison :) Let me get it straight - you swing the reference to his posting as damn nearly the main argument, and yet you make _this_ reply when it gets questioned? Seriously? There's nothing wrong with "go read $PAPER, a lot of your questions are addressed there", but only if you are ready to answer the questions and objections from those who have read it. "Hey, take that up with $AUTHOR" doesn't cut it; try anything even remotely similar with e.g. reviewers of academic paper and see where it ends up. Havoc isn't submitting that thing; you are. If you are not qualified to defend your design and he is, try to talk him into doing that. Frankly, the longer it goes, the less I like the picture. It will be up to Linus, of course, but IMO the whole situation seriously stinks. ;-/ ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-15 11:49 ` Greg Kroah-Hartman
2015-04-15 12:03 ` One Thousand Gnomes
2015-04-15 12:55 ` Al Viro
@ 2015-04-15 17:33 ` Steven Rostedt
2015-04-15 18:11 ` Greg Kroah-Hartman
2 siblings, 1 reply; 333+ messages in thread
From: Steven Rostedt @ 2015-04-15 17:33 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: One Thousand Gnomes, Jiri Kosina, Al Viro, Borislav Petkov,
Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann,
Eric W. Biederman, Tom Gundersen, linux-kernel, Daniel Mack,
David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 01:49:36PM +0200, Greg Kroah-Hartman wrote:
>
> There's the issue of thousands of dbus queries, and then there's the
> issue that making those queries takes a measurable amount of time. We
> can fix the latter one, the first one, well, not so much, but we can
> provide the resources for them to make a faster system if they want to.

I'll argue that you can't fix the latter one. One thing that I've observed over
the years of having faster computers is, as soon as you make it faster, people
will write slower software.

Currently the issue is that we have thousands of dbus queries, you make dbus
10x faster, I guarantee that people will write software with 10 thousand dbus
queries and we are no better off than we are today.

-- Steve
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 17:33 ` Steven Rostedt @ 2015-04-15 18:11 ` Greg Kroah-Hartman 0 siblings, 0 replies; 333+ messages in thread From: Greg Kroah-Hartman @ 2015-04-15 18:11 UTC (permalink / raw) To: Steven Rostedt Cc: One Thousand Gnomes, Jiri Kosina, Al Viro, Borislav Petkov, Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, Tom Gundersen, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, Apr 15, 2015 at 01:33:27PM -0400, Steven Rostedt wrote: > On Wed, Apr 15, 2015 at 01:49:36PM +0200, Greg Kroah-Hartman wrote: > > > > There's the issue of thousands of dbus queries, and then there's the > > issue that making those queries takes a measurable amount of time. We > > can fix the later one, the first one, well, not so much, but we can > > provide the resources for them to make a faster system if they want to. > > I'll argue that you can't fix the later one. One thing that I've observed over > the years of having faster computers is, as soon as you make it faster, people > will write slower software. > > Currently the issue is that we have thousands of dbus queries, you make dbus > 10x faster, I guarantee that people will write software with 10 thousand dbus > queries and we are no better off than we are today. Then they get to buy a faster machine :) ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-14 19:23 ` Greg Kroah-Hartman
2015-04-14 19:24 ` Borislav Petkov
@ 2015-04-14 19:35 ` Al Viro
2015-04-14 19:43 ` Greg Kroah-Hartman
2015-04-14 20:14 ` John Stoffel
2 siblings, 1 reply; 333+ messages in thread
From: Al Viro @ 2015-04-14 19:35 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann,
Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina,
linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni

On Tue, Apr 14, 2015 at 09:23:57PM +0200, Greg Kroah-Hartman wrote:

> > I agree. You've sent a pull request for an unfortunate design. I
> > don't think that unfortunate design belongs in the kernel. If it stays
> > in userspace, then user programmers could potentially fix it some day.
>
> You might not like the design, but it is a valid design. Again, we
> don't refuse to support hardware that is designed badly. Or support
> protocols we don't necessarily like, that's not the job of a kernel or
> operating system.

Bullshit. The problem you seem to deliberately ignore is that once it's
in the kernel, it's impossible to eradicate. It's not just a crap design,
it's a crap design you are taking in as-is.

And no, "the sole consumer of that API knows better, so bend over" is not
a good idea. We have shitloads of examples when single-consumer APIs
turned into screaming horrors; taking that in over the objections to API
design, merely on "they do it that way, who the hell we are to say they
are wrong?" is insane.
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-14 19:35 ` Al Viro @ 2015-04-14 19:43 ` Greg Kroah-Hartman 2015-04-15 17:59 ` Austin S Hemmelgarn 0 siblings, 1 reply; 333+ messages in thread From: Greg Kroah-Hartman @ 2015-04-14 19:43 UTC (permalink / raw) To: Al Viro Cc: Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Tue, Apr 14, 2015 at 08:35:33PM +0100, Al Viro wrote: > On Tue, Apr 14, 2015 at 09:23:57PM +0200, Greg Kroah-Hartman wrote: > > > > I agree. You've sent a pull request for an unfortunate design. I > > > don't think that unfortunate design belongs in the kernel. If it says > > > in userspace, then user programmers could potentially fix it some day. > > > > You might not like the design, but it is a valid design. Again, we > > don't refuse to support hardware that is designed badly. Or support > > protocols we don't necessarily like, that's not the job of a kernel or > > operating system. > > Bullshit. The problem you seem to deliberately ignore is that once it's > in the kernel, it's impossible to eradicate. It's not just a crap design, > it's a crap design you are taking in as-is. It is not a crap design. Go read the link I provided. Havoc points out exactly why the design is the way it is, for very valid reasons. It's actually much like X11 is as well, but not like "normal" IP connections at all. > And no, "the sole consumer of that API knows better, so bend over" is not > a good idea. We have shitloads of examples when single-consumer APIs > turned into screaming horrors; taking that in over the objections to API > design, merely on "they do it that way, who the hell we are to say they > are wrong?" is insane. Again, in this domain, the design is sound. So much so that everyone who works in that area moved toward it (KDE, Qt, Go, etc.) 
We might not think it makes sense, and it did take me a while to wrap my head around it, but to call it "crap" is unfair, sorry. greg k-h ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-14 19:43 ` Greg Kroah-Hartman @ 2015-04-15 17:59 ` Austin S Hemmelgarn 2015-04-15 18:04 ` Rik van Riel ` (2 more replies) 0 siblings, 3 replies; 333+ messages in thread From: Austin S Hemmelgarn @ 2015-04-15 17:59 UTC (permalink / raw) To: Greg Kroah-Hartman, Al Viro Cc: Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni [-- Attachment #1: Type: text/plain, Size: 1499 bytes --] On 2015-04-14 15:43, Greg Kroah-Hartman wrote: > On Tue, Apr 14, 2015 at 08:35:33PM +0100, Al Viro wrote: >> On Tue, Apr 14, 2015 at 09:23:57PM +0200, Greg Kroah-Hartman wrote: >> >>>> I agree. You've sent a pull request for an unfortunate design. I >>>> don't think that unfortunate design belongs in the kernel. If it says >>>> in userspace, then user programmers could potentially fix it some day. >>> >>> You might not like the design, but it is a valid design. Again, we >>> don't refuse to support hardware that is designed badly. Or support >>> protocols we don't necessarily like, that's not the job of a kernel or >>> operating system. >> >> And no, "the sole consumer of that API knows better, so bend over" is not >> a good idea. We have shitloads of examples when single-consumer APIs >> turned into screaming horrors; taking that in over the objections to API >> design, merely on "they do it that way, who the hell we are to say they >> are wrong?" is insane. > > Again, in this domain, the design is sound. So much so that everyone > who works in that area moved toward it (KDE, Qt, Go, etc.) We might not > think it makes sense, and it did take me a while to wrap my head around > it, but to call it "crap" is unfair, sorry. 
>

The reason that 'everyone who works in this area' adopted it is not so much
that the design is sound (I'm not arguing whether it is or isn't in this
case) as it is that none of them could come up with anything better.

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 2967 bytes --]
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 17:59 ` Austin S Hemmelgarn @ 2015-04-15 18:04 ` Rik van Riel 2015-04-15 22:22 ` One Thousand Gnomes 2015-04-21 16:54 ` Diego Viola 2 siblings, 0 replies; 333+ messages in thread From: Rik van Riel @ 2015-04-15 18:04 UTC (permalink / raw) To: Austin S Hemmelgarn, Greg Kroah-Hartman, Al Viro Cc: Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On 04/15/2015 01:59 PM, Austin S Hemmelgarn wrote: > On 2015-04-14 15:43, Greg Kroah-Hartman wrote: >> On Tue, Apr 14, 2015 at 08:35:33PM +0100, Al Viro wrote: >>> On Tue, Apr 14, 2015 at 09:23:57PM +0200, Greg Kroah-Hartman wrote: >>> >>>>> I agree. You've sent a pull request for an unfortunate design. I >>>>> don't think that unfortunate design belongs in the kernel. If it says >>>>> in userspace, then user programmers could potentially fix it some day. >>>> >>>> You might not like the design, but it is a valid design. Again, we >>>> don't refuse to support hardware that is designed badly. Or support >>>> protocols we don't necessarily like, that's not the job of a kernel or >>>> operating system. >>> >>> And no, "the sole consumer of that API knows better, so bend over" is >>> not >>> a good idea. We have shitloads of examples when single-consumer APIs >>> turned into screaming horrors; taking that in over the objections to API >>> design, merely on "they do it that way, who the hell we are to say they >>> are wrong?" is insane. >> >> Again, in this domain, the design is sound. So much so that everyone >> who works in that area moved toward it (KDE, Qt, Go, etc.) We might not >> think it makes sense, and it did take me a while to wrap my head around >> it, but to call it "crap" is unfair, sorry. 
>> > > The reason that 'everyone who works in this area' adopted is not as much > that the design is sound (I'm not arguing whether it is or isn't in this > case) as it is that none of them could come up with anything better. They are smart people, and I would not underestimate the usefulness of the user space API (above the dbus library) that they came up with. That does not mean the actual in-kernel implementation needs to follow the same design criteria. It may make sense to have part of the implementation in kernel space, part in user space, and allow the userspace part to be switched out to accommodate other protocols over the same in-kernel bus... Moving some of the policy bits into a user space daemon may make sense. Storing messages that cannot be delivered right now in user space could make sense, too. ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-15 17:59 ` Austin S Hemmelgarn
2015-04-15 18:04 ` Rik van Riel
@ 2015-04-15 22:22 ` One Thousand Gnomes
2015-04-16 16:02 ` Havoc Pennington
2015-04-16 16:37 ` Robert Schwebel
2015-04-21 16:54 ` Diego Viola
2 siblings, 2 replies; 333+ messages in thread
From: One Thousand Gnomes @ 2015-04-15 22:22 UTC (permalink / raw)
To: Austin S Hemmelgarn
Cc: Greg Kroah-Hartman, Al Viro, Andy Lutomirski, Linus Torvalds,
Andrew Morton, Arnd Bergmann, Eric W. Biederman, Tom Gundersen,
Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni
> The reason that 'everyone who works in this area' adopted is not as much
> that the design is sound (I'm not arguing whether it is or isn't in this
> case) as it is that none of them could come up with anything better.

Actually most message passing code uses things like JMS and the various
MQ libraries. Most IoT uses things other than dbus, small deep embedded
never uses dbus.

In the desktop space dbus wins because it's very very easy to use and by
network effects. Everything else related already talks via dbus, so you
are going to have to talk dbus anyway to get anything done.

Alan
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 22:22 ` One Thousand Gnomes @ 2015-04-16 16:02 ` Havoc Pennington 2015-04-16 17:31 ` David Herrmann 2015-04-16 16:37 ` Robert Schwebel 1 sibling, 1 reply; 333+ messages in thread From: Havoc Pennington @ 2015-04-16 16:02 UTC (permalink / raw) To: One Thousand Gnomes Cc: Austin S Hemmelgarn, Greg Kroah-Hartman, Al Viro, Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, Apr 15, 2015 at 6:22 PM, One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> wrote: > Actually most message passing code uses things like JMS and the various > MQ libraries. Most IoT uses things other than dbus, small deep embedded > never uses dbus. fwiw, to me it's a mistake to think of dbus as "the same space" as something like JMS, or even small deep embedded uses. The use cases and appropriate tradeoffs are different enough that it's hard for me to think about them as one thing. If different uses can share some common kernel mechanisms then great, but one does have to be careful about one-size-fits-all-actually-fits-nobody. > In the desktop space dbus wins because its very very easy to use and by > network effects. Everything else related already talks via dbus, so you > are going to have to talk dbus anyway to get anything done. You may agree with me, but to me "easy to use" is necessary to dbus's utility - it's not a cosmetic feature. I was on the receiving end of the Linux desktop bug firehose both pre-dbus and post-dbus, and having IPC that's easy to use *correctly* means there are fewer bugs in that firehose. At least, fewer bugs caused by IPC. Of course, the thing that needs to be easy is the library API; it's OK if an underlying kernel API is hard, as long as it gives the library developers what they need to implement the easier API. 
It is OK to push complexity onto userspace, but it's a mistake to push it onto apps (as opposed to libraries that can be gotten right once for all apps). If you push complexity onto apps you get buggier apps, because application developers are experts in their app domain but aren't experts in every underlying platform feature. Why is dbus relatively easy to use? Some important pieces: - the semantic guarantees such as ordering that we've already mentioned - completeness - solves locating and tracking other processes, solves both unicast and broadcast, etc. - defines a mapping to objects-with-methods OO model Can it be even better - for sure. Havoc ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-16 16:02 ` Havoc Pennington @ 2015-04-16 17:31 ` David Herrmann 2015-04-16 20:55 ` Al Viro 0 siblings, 1 reply; 333+ messages in thread From: David Herrmann @ 2015-04-16 17:31 UTC (permalink / raw) To: Al Viro Cc: Greg Kroah-Hartman, Jiri Kosina, Borislav Petkov, Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, linux-kernel, Daniel Mack, Djalal Harouni Hi On Wed, Apr 15, 2015 at 2:36 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > On Wed, Apr 15, 2015 at 11:09:48AM +0200, Greg Kroah-Hartman wrote: > >> I've asked for it, but finding people to review code is hard, as you >> know. It's only 13k lines long, smaller than a serial port driver (my >> unit of code review), so it's not all that big. >> >> It's smaller than the USB3 host controller driver as well, and very few >> people ever reviewed that beast :) >> >> > For something that's potentially such a core mechanism as a completely >> > new, massively-adopted IPC, this does send a warning singal. >> >> If you know of a way to force others to review code, please let me know. > > Have it in a less nasty state, perhaps? Random question: > > al@duke:~/linux/trees/vfs$ git grep -n -w kdbus_node_idr_lock > ipc/kdbus/node.c:237:static DECLARE_RWSEM(kdbus_node_idr_lock); > ipc/kdbus/node.c:340: down_write(&kdbus_node_idr_lock); > ipc/kdbus/node.c:344: up_write(&kdbus_node_idr_lock); > ipc/kdbus/node.c:444: down_write(&kdbus_node_idr_lock); > ipc/kdbus/node.c:452: up_write(&kdbus_node_idr_lock); As Greg said, this is a leftover from times we actually needed a lookup here. Nice catch, I have a local patch to convert the whole IDR into an IDA and drop the lock entirely (like kernfs does right now, for kernfs_node->ino). > Do you see anything wrong with that? 
Or with things like that: > mutex_lock(&pos->lock); > v_pre = atomic_read(&pos->active); > if (v_pre >= 0) > atomic_add_return(KDBUS_NODE_BIAS, &pos->active); > else if (v_pre == KDBUS_NODE_NEW) > atomic_set(&pos->active, KDBUS_NODE_RELEASE_DIRECT); > mutex_unlock(&pos->lock); > What are the locking rules for ->active/->waitq/->lock? Are those the > outermost thing in the hierarchy? Or is that dependent on the node location? > It sure as hell is outside of (at least) ->mmap_sem (by way of > kdbus_conn_connect() establishing that ->active/->waitq is outside of > ->conn_rwlock, which due to kdbus_bus_broadcast() nests outside of anything > taken by kdbus_meta_proc_collect(), which includes ->mmap_sem) and that alone > brings in a lot... I'm working on patches to add more comments similar to how we did in node.c. For now, please see my explanations below: node->lock is the _innermost_ lock. node->active implements revoke support for nodes. It follows what kernfs->active does and isn't a lock in particular. We kinda treat it as rwsem, where down_write() is the outer-most lock in kdbus and _only_ called without any other lock held (kdbus_node_deactivate()). Read-side, we never ever block on the "lock", but only use try-lock. If it fails, the node is dead/revoked. Therefore, the read-side of 'active' nests almost arbitrarily. We hold 'active'-references almost everywhere, to make sure a node is not destroyed while we use it. However, we never sleep for an indefinite time while holding it. Given that the write-side is the outer-most lock in kdbus, it doesn't dead-lock against the try-lock readers. > Document your goddamn locking, would you? It *IS* new code, and you, as you > say, had very few people working on it, so you don't have the excuses for > the mess existing in older parts of the tree. 
Locking order (outer-most to inner-most):
1) domain->lock
2) names->rwlock
3) endpoint->lock
4) bus->conn_rwlock
5) policy->entries_rwlock
6) connection->lock
7) metadata->lock

mmap_sem nests below metadata->lock. With the rcu-protected exe_file
patches by Davidlohr Bueso, we can even drop that dependency. They
have kinda stalled, though.

Then we have a bunch of data structure protection, which can be called
from any context:
* bus->notify_lock
* pool->lock
* match->mdb_rwlock
* node->lock

Lastly, there're 2 locks which nest around everything and must not be
taken with any lock held:
* handle->rwlock (taken in ioctl-entry)
* bus->notify_flush_lock (taken in work-queue)

General object stacking is:
domain -> bus -> endpoint -> policy -> connection -> {metadata,pool,match,node}

The conn_rwlock protection of the conn-list locks on kdbus_bus is the
only lock that doesn't follow this ordering.

Thanks
David
^ permalink raw reply [flat|nested] 333+ messages in thread
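The node->active scheme described in the message above (readers only ever
try-lock and fail once the node is revoked; the write side adds a bias and
waits for readers to drain) can be modelled in userspace roughly as below.
This is a minimal Python sketch, not the kdbus code: the class and method
names are made up for illustration, the BIAS value is an assumption standing
in for KDBUS_NODE_BIAS, and a single condition variable plays the role of
both ->lock and ->waitq. The KDBUS_NODE_NEW / KDBUS_NODE_RELEASE_DIRECT
states from the quoted snippet are left out.

```python
import threading

BIAS = -(2 ** 31)  # assumed value; stands in for KDBUS_NODE_BIAS


class Node:
    """Toy userspace model of a revocable node (hypothetical, not kdbus)."""

    def __init__(self):
        self._active = 0                    # >= 0: live, < 0: revoked
        self._cond = threading.Condition()  # models ->lock plus ->waitq

    def try_acquire(self):
        """Read side: never waits for the writer; fails once revoked."""
        with self._cond:
            if self._active < 0:
                return False
            self._active += 1
            return True

    def release(self):
        """Drop a reference previously taken with try_acquire()."""
        with self._cond:
            self._active -= 1
            if self._active == BIAS:
                self._cond.notify_all()     # last reader gone, wake the writer

    def deactivate(self):
        """Write side: revoke the node, then wait for readers to drain."""
        with self._cond:
            if self._active >= 0:
                self._active += BIAS        # later try_acquire() calls fail
            while self._active != BIAS:
                self._cond.wait()
```

Because deactivate() sleeps until every outstanding reference is dropped, it
must not be called while the caller itself holds a reference on the same
node, which is the "deadlock solid right there" case Al mentions later in
the thread.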
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-16 17:31 ` David Herrmann @ 2015-04-16 20:55 ` Al Viro 2015-04-18 11:44 ` David Herrmann 0 siblings, 1 reply; 333+ messages in thread From: Al Viro @ 2015-04-16 20:55 UTC (permalink / raw) To: David Herrmann Cc: Greg Kroah-Hartman, Jiri Kosina, Borislav Petkov, Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, linux-kernel, Daniel Mack, Djalal Harouni On Thu, Apr 16, 2015 at 07:31:22PM +0200, David Herrmann wrote: > I'm working on patches to add more comments similar to how we did in > node.c. For now, please see my explanations below: > > node->lock is the _innermost_ lock. > node->active implements revoke > support for nodes. It follows what kernfs->active does and isn't a > lock in particular. We kinda treat it as rwsem, where down_write() is > the outer-most lock in kdbus and _only_ called without any other lock > held (kdbus_node_deactivate()). Read-side, we never ever block on the > "lock", but only use try-lock. If it fails, the node is dead/revoked. > Therefore, the read-side of 'active' nests almost arbitrarily. We hold > 'active'-references almost everywhere, to make sure a node is not > destroyed while we use it. However, we never sleep for an indefinite > time while holding it. Umm... Theoretically, but ->mmap_sem being under it means that it might involve something like an NFS server timing out, so the latency might suck very badly. > Given that the write-side is the outer-most lock in kdbus, it doesn't > dead-lock against the try-lock readers. Huh? I see at least this call chain: kdbus_handle_ioctl_control() kdbus_node_acquire() kdbus_cmd_bus_make() kdbus_node_deactivate() Granted, it won't be the _same_ node (otherwise you'd deadlock solid right there and then), but it means that your locking order is sensitive to something about nodes; it's not entirely determined by the lock type. 
> Locking order (outer-most to inner-most):
> 1) domain->lock
> 2) names->rwlock
> 3) endpoint->lock
> 4) bus->conn_rwlock
> 5) policy->entries_rwlock
> 6) connection->lock
> 7) metadata->lock
>
> mmap_sem nests below metadata->lock. With the rcu-protected exe_file
> patches by Davidlohr Bueso, we can even drop that dependency. They
> have kinda stalled, though.
>
> Then we have a bunch of data structure protection, which can be called
> from any context:
> * bus->notify_lock
> * pool->lock
> * match->mdb_rwlock
> * node->lock
>
> Lastly, there're 2 locks which nest around everything and must not be
> taken with any lock held:
> * handle->rwlock (taken in ioctl-entry)

as well as in ->poll(), for completeness sake. The latter, BTW, isn't
nice - kdbus is far from being the only thing that does it, but having
->poll() block can be somewhat surprising...

> * bus->notify_flush_lock (taken in work-queue)

Hmm... That needs some care - it means that it nests inside anything held
by callers of cancel_delayed_work_sync() on the corresponding work. AFAICS,
there's at least one call chain leading to that from kdbus_node_deactivate()
(via ->release_cb == kdbus_ep_release -> kdbus_conn_disconnect ->
cancel_delayed_work_sync(&conn->work)) wait for kdbus_reply_list_scan_work
-> kdbus_notify_flush grabs ->notify_flush_lock). Tracking back further is
harder - not all call sites of kdbus_node_deactivate() can lead to that...

BTW, it's not only done in wq callbacks - there's a direct chain from
kdbus_conn_disconnect() as well (both through kdbus_name_release_all ->
kdbus_notify_flush and directly through kdbus_notify_flush()). And from
ioctl(), by many paths, while we are at it, but that only means that it
nests inside handle->rwlock, and _that_ is really the outermost.

What nests inside that one? It is definitely a part of the hierarchy - it
can't be excluded from deadlock analysis as effectively outermost. As for
the stuff under it... registry->rwlock is obvious, what else?

> General object stacking is:
> domain -> bus -> endpoint -> policy -> connection -> {metadata,pool,match,node}
> The conn_rwlock protection of the conn-list locks on kdbus_bus is the
> only lock that doesn't follow this ordering.
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-16 20:55 ` Al Viro @ 2015-04-18 11:44 ` David Herrmann 0 siblings, 0 replies; 333+ messages in thread From: David Herrmann @ 2015-04-18 11:44 UTC (permalink / raw) To: Al Viro Cc: Greg Kroah-Hartman, Jiri Kosina, Borislav Petkov, Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, linux-kernel, Daniel Mack, Djalal Harouni Hi On Thu, Apr 16, 2015 at 10:55 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > On Thu, Apr 16, 2015 at 07:31:22PM +0200, David Herrmann wrote: > >> I'm working on patches to add more comments similar to how we did in >> node.c. For now, please see my explanations below: >> >> node->lock is the _innermost_ lock. >> node->active implements revoke >> support for nodes. It follows what kernfs->active does and isn't a >> lock in particular. We kinda treat it as rwsem, where down_write() is >> the outer-most lock in kdbus and _only_ called without any other lock >> held (kdbus_node_deactivate()). Read-side, we never ever block on the >> "lock", but only use try-lock. If it fails, the node is dead/revoked. >> Therefore, the read-side of 'active' nests almost arbitrarily. We hold >> 'active'-references almost everywhere, to make sure a node is not >> destroyed while we use it. However, we never sleep for an indefinite >> time while holding it. > > Umm... Theoretically, but ->mmap_sem being under it means that it might > involve something like an NFS server timing out, so the latency might > suck very badly. Fixed! [1] Linus just pulled akpm#3, which includes the rcu-protection for exe-file. No more direct mmap_sem access in kdbus, anymore. >> Given that the write-side is the outer-most lock in kdbus, it doesn't >> dead-lock against the try-lock readers. > > Huh? 
I see at least this call chain: > kdbus_handle_ioctl_control() > kdbus_node_acquire() > kdbus_cmd_bus_make() > kdbus_node_deactivate() > Granted, it won't be the _same_ node (otherwise you'd deadlock solid > right there and then), but it means that your locking order is sensitive > to something about nodes; it's not entirely determined by the lock type. Indeed. We do allow pinning parent objects when deactivating its children. I updated my doc-drafts accordingly. >> Locking order (outer-most to inner-most): >> 1) domain->lock >> 2) names->rwlock >> 3) endpoint->lock >> 4) bus->conn_rwlock >> 5) policy->entries_rwlock >> 6) connection->lock >> 7) metadata->lock >> >> mmap_sem nests below metadata->lock. With the rcu-protected exe_file >> patches by Davidlohr Bueso, we can even drop that dependency. They >> have kinda stalled, though. >> >> Then we have a bunch of data structure protection, which can be called >> from any context: >> * bus->notify_lock >> * pool->lock >> * match->mdb_rwlock >> * node->lock >> >> Lastly, there're 2 locks which nest around everything and must not be >> taken with any lock held: >> * handle->rwlock (taken in ioctl-entry) > > as well as in ->poll(), for completeness sake. The latter, BTW, isn't > nice - kdbus is far from being the only thing that does it, but having > ->poll() block can be somewhat surprising... I have a patch to fix this [2]. But it's more complex than the rwsem, and requires some more review. However, it reduces the handle-locking to a minimum, such that we only lock it during setup and can reduce it to a mutex. >> * bus->notify_flush_lock (taken in work-queue) > > Hmm... That needs some care - it means that it nests inside anything held > by callers of cancel_delayed_work_sync() on the corresponding work. 
AFAICS, > there's at least one call chain leading to that from kdbus_node_deactivate() > (via ->release_cb == kdbus_ep_release -> kdbus_conn_disconnect -> > cancel_delayed_work_sync(&conn->work)) wait for kdbus_reply_list_scan_work > -> kdbus_notify_flush grabs ->notify_flush_lock). Tracking back further is > harder - not all call sites of kdbus_node_deactivate() can lead to that... > > BTW, it's not only done in wq callbacks - there's a direct chain from > kdbus_conn_disconnect() as well (both through kdbus_name_release_all -> > kdbus_notify_flush and directly through kdbus_notify_flush()). And from > ioctl(), by many paths, while we are at it, but that only means that it > nests inside handle->rwlock, and _that_ is really the outermost. Sorry, this was a mistake on my side. We do call kdbus_notify_flush() directly quite often. And it nests underneath the handle, correct. I noted this down. I did have patches to actually move the kdbus_notify_flush() call to the end of kdbus_handle_ioctl() and friends. Such so we flush all collected notifications on return to user-space, which would make the locking more obvious. However, it didn't make it much simpler, imo, so it was never applied. > What nests inside that one? It definitely a part of hierarchy - it can't > be excluded from deadlock analysis as effectively outermost. As for the > stuff under it... registry->rwlock is obvious, what else? (Updated) Data-structure locks: * bus->notify_lock * pool->lock * match->mdb_rwlock * node->lock Updated locking order: 1) handle->rwlock 2) bus->notify_flush_lock 3) domain->lock 4) names->rwlock 5) endpoint->lock 6) bus->conn_rwlock 7) policy->entries_rwlock 8) connection->lock 9) metadata->lock * node->active read-side locks arbitrarily underneath handle->rwlock. * node->active write-side nests underneath handle->rwlock, and underneath read-side of any parent-node->active. Thanks! Much appreciated! 
David

[1] http://cgit.freedesktop.org/~dvdhrm/linux/commit/?h=kdbus&id=f396c12ecfda1717e5f76d6b4ab11e4db232e60d
[2] http://cgit.freedesktop.org/~dvdhrm/linux/commit/?h=kdbus&id=61875e1abd38a965c9f7dfca28068dd0a871961c
^ permalink raw reply [flat|nested] 333+ messages in thread
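The "Updated locking order" in the message above is easy to misread in
flattened form, so here it is as data, together with a tiny checker that
flags an acquisition sequence which inverts it. This is only an
illustrative Python sketch: the function name and the idea of checking a
flat list of lock names are assumptions made for the example, it is not
lockdep, and it is not part of kdbus. The four data-structure locks
(bus->notify_lock, pool->lock, match->mdb_rwlock, node->lock) and the
node->active rules are deliberately left out, since the message does not
rank them.

```python
# Ranks follow the "Updated locking order" list, outer-most first.
LOCK_ORDER = [
    "handle->rwlock",
    "bus->notify_flush_lock",
    "domain->lock",
    "names->rwlock",
    "endpoint->lock",
    "bus->conn_rwlock",
    "policy->entries_rwlock",
    "connection->lock",
    "metadata->lock",
]
RANK = {name: i for i, name in enumerate(LOCK_ORDER)}


def first_violation(acquisitions):
    """Return the first (held, wanted) pair that inverts the order, or None.

    `acquisitions` lists the lock names a code path takes, in order,
    without releasing any of them in between. Names that are not in
    LOCK_ORDER raise KeyError.
    """
    held = []
    for wanted in acquisitions:
        for h in held:
            # A lock may only be taken while holding strictly outer locks.
            if RANK[h] >= RANK[wanted]:
                return (h, wanted)
        held.append(wanted)
    return None
```

For example, a path that takes connection->lock and then bus->conn_rwlock
is reported as an inversion, while handle->rwlock followed by domain->lock
and connection->lock passes.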
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 22:22 ` One Thousand Gnomes 2015-04-16 16:02 ` Havoc Pennington @ 2015-04-16 16:37 ` Robert Schwebel 2015-04-17 13:45 ` Greg Kroah-Hartman 1 sibling, 1 reply; 333+ messages in thread From: Robert Schwebel @ 2015-04-16 16:37 UTC (permalink / raw) To: One Thousand Gnomes Cc: Austin S Hemmelgarn, Greg Kroah-Hartman, Al Viro, Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, Apr 15, 2015 at 11:22:18PM +0100, One Thousand Gnomes wrote: > > The reason that 'everyone who works in this area' adopted is not as much > > that the design is sound (I'm not arguing whether it is or isn't in this > > case) as it is that none of them could come up with anything better. > > Actually most message passing code uses things like JMS and the various > MQ libraries. Most IoT uses things other than dbus, small deep embedded > never uses dbus. For what it's worth: we more and more use dbus for small deep embedded systems, IoT, loosely coupled industrial control applications etc. rsc -- Pengutronix e.K. | | Industrial Linux Solutions | http://www.pengutronix.de/ | Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 | Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 | ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-16 16:37 ` Robert Schwebel
@ 2015-04-17 13:45 ` Greg Kroah-Hartman
0 siblings, 0 replies; 333+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-17 13:45 UTC (permalink / raw)
To: Robert Schwebel
Cc: One Thousand Gnomes, Austin S Hemmelgarn, Al Viro, Andy Lutomirski,
Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann,
Djalal Harouni
On Thu, Apr 16, 2015 at 06:37:45PM +0200, Robert Schwebel wrote:
> On Wed, Apr 15, 2015 at 11:22:18PM +0100, One Thousand Gnomes wrote:
> > > The reason that 'everyone who works in this area' adopted is not as much
> > > that the design is sound (I'm not arguing whether it is or isn't in this
> > > case) as it is that none of them could come up with anything better.
> >
> > Actually most message passing code uses things like JMS and the various
> > MQ libraries. Most IoT uses things other than dbus, small deep embedded
> > never uses dbus.
>
> For what it's worth: we more and more use dbus for small deep embedded
> systems, IoT, loosely coupled industrial control applications etc.

Thanks for confirming this, I thought I had seen it used in IoT devices
already.

greg k-h
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 17:59 ` Austin S Hemmelgarn 2015-04-15 18:04 ` Rik van Riel 2015-04-15 22:22 ` One Thousand Gnomes @ 2015-04-21 16:54 ` Diego Viola 2015-04-21 17:06 ` Greg Kroah-Hartman 2 siblings, 1 reply; 333+ messages in thread From: Diego Viola @ 2015-04-21 16:54 UTC (permalink / raw) To: Austin S Hemmelgarn Cc: Greg Kroah-Hartman, Al Viro, Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni I'd like to see D-Bus in the kernel (kdbus), if that's going to make D-Bus faster. See this application taking 15 seconds to start just because D-Bus is too slow. https://bugs.kde.org/show_bug.cgi?id=342682 Hopefully kdbus solves problems such as this one. Diego On Wed, Apr 15, 2015 at 2:59 PM, Austin S Hemmelgarn <ahferroin7@gmail.com> wrote: > On 2015-04-14 15:43, Greg Kroah-Hartman wrote: >> >> On Tue, Apr 14, 2015 at 08:35:33PM +0100, Al Viro wrote: >>> >>> On Tue, Apr 14, 2015 at 09:23:57PM +0200, Greg Kroah-Hartman wrote: >>> >>>>> I agree. You've sent a pull request for an unfortunate design. I >>>>> don't think that unfortunate design belongs in the kernel. If it says >>>>> in userspace, then user programmers could potentially fix it some day. >>>> >>>> >>>> You might not like the design, but it is a valid design. Again, we >>>> don't refuse to support hardware that is designed badly. Or support >>>> protocols we don't necessarily like, that's not the job of a kernel or >>>> operating system. >>> >>> >>> And no, "the sole consumer of that API knows better, so bend over" is not >>> a good idea. We have shitloads of examples when single-consumer APIs >>> turned into screaming horrors; taking that in over the objections to API >>> design, merely on "they do it that way, who the hell we are to say they >>> are wrong?" is insane. >> >> >> Again, in this domain, the design is sound. 
So much so that everyone >> who works in that area moved toward it (KDE, Qt, Go, etc.) We might not >> think it makes sense, and it did take me a while to wrap my head around >> it, but to call it "crap" is unfair, sorry. >> > > The reason that 'everyone who works in this area' adopted is not as much > that the design is sound (I'm not arguing whether it is or isn't in this > case) as it is that none of them could come up with anything better. > ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-21 16:54 ` Diego Viola @ 2015-04-21 17:06 ` Greg Kroah-Hartman 2015-04-21 17:25 ` Diego Viola 0 siblings, 1 reply; 333+ messages in thread From: Greg Kroah-Hartman @ 2015-04-21 17:06 UTC (permalink / raw) To: Diego Viola Cc: Austin S Hemmelgarn, Al Viro, Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Tue, Apr 21, 2015 at 01:54:54PM -0300, Diego Viola wrote: > I'd like to see D-Bus in the kernel (kdbus), if that's going to make > D-Bus faster. > > See this application taking 15 seconds to start just because D-Bus is too slow. > > https://bugs.kde.org/show_bug.cgi?id=342682 > > Hopefully kdbus solves problems such as this one. That bug really doesn't look like it would be solved by kdbus, I don't see a ton of messages being sent as the issue, do you? It seems like something is timing out and then continuing on with the application startup. But, you can try it out, grab the kernel patch, enable it in systemd, and try it for yourself and let us know! thanks, greg k-h ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-21 17:06 ` Greg Kroah-Hartman @ 2015-04-21 17:25 ` Diego Viola 0 siblings, 0 replies; 333+ messages in thread From: Diego Viola @ 2015-04-21 17:25 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Austin S Hemmelgarn, Al Viro, Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni I'm not exactly sure what the problem is. It might not even be a problem with D-bus, and it's probably a timeout issue as you said. I'll give kdbus a try anyway and report back. Thanks, Diego On Tue, Apr 21, 2015 at 2:06 PM, Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > On Tue, Apr 21, 2015 at 01:54:54PM -0300, Diego Viola wrote: >> I'd like to see D-Bus in the kernel (kdbus), if that's going to make >> D-Bus faster. >> >> See this application taking 15 seconds to start just because D-Bus is too slow. >> >> https://bugs.kde.org/show_bug.cgi?id=342682 >> >> Hopefully kdbus solves problems such as this one. > > That bug really doesn't look like it would be solved by kdbus, I don't > see a ton of messages being sent as the issue, do you? It seems like > something is timing out and then continuing on with the application > startup. > > But, you can try it out, grab the kernel patch, enable it in systemd, > and try it for yourself and let us know! > > thanks, > > greg k-h ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-14 19:23 ` Greg Kroah-Hartman
2015-04-14 19:24 ` Borislav Petkov
2015-04-14 19:35 ` Al Viro
@ 2015-04-14 20:14 ` John Stoffel
2015-04-14 21:51 ` Steven Rostedt
2015-04-15 8:35 ` Greg Kroah-Hartman
2 siblings, 2 replies; 333+ messages in thread
From: John Stoffel @ 2015-04-14 20:14 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann,
Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina,
linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni

>>>>> "Greg" == Greg Kroah-Hartman <gregkh@linuxfoundation.org> writes:

Greg> On Tue, Apr 14, 2015 at 11:57:22AM -0700, Andy Lutomirski wrote:
>> On Tue, Apr 14, 2015 at 10:50 AM, Greg Kroah-Hartman
>> <gregkh@linuxfoundation.org> wrote:
>> > On Mon, Apr 13, 2015 at 02:01:21PM -0700, Andy Lutomirski wrote:
>> >> On Mon, Apr 13, 2015 at 1:45 PM, Greg Kroah-Hartman
>> >> <gregkh@linuxfoundation.org> wrote:
>> >> > On Mon, Apr 13, 2015 at 01:13:26PM -0700, Andy Lutomirski wrote:
>> >> >> On Mon, Apr 13, 2015 at 12:03 PM, Greg Kroah-Hartman
>> >> >> <gregkh@linuxfoundation.org> wrote:
>> >> >> > The following changes since commit 9eccca0843205f87c00404b663188b88eb248051:
>> >> >> >
>> >> >> > Linux 4.0-rc3 (2015-03-08 16:09:09 -0700)
>> >> >> >
>> >> >> > are available in the git repository at:
>> >> >> >
>> >> >> > git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git/ tags/kdbus-4.1-rc1
>> >> >> >
>> >> >> > for you to fetch changes up to 9fb9cd0f4434a23487b6ef3237e733afae90e336:
>> >> >> >
>> >> >> > kdbus: avoid the use of struct timespec (2015-04-10 14:34:53 +0200)
>> >> >> >
>> >> >> > ----------------------------------------------------------------
>> >> >> > kdbus for 4.1-rc1
>> >> >> >
>> >> >> > Here's the kdbus pull request for 4.1-rc1.
>> >> >> >
>> >> >> > It's been under development for many years now, and been in linux-next
>> >> >> > for many months, and has undergone loads of testing and review and even a few
>> >> >> > good arguments. It comes with full documentation and tests.
>> >> >> >
>> >> >> > There have been a few complaints about the code, notably from people who
>> >> >> > don't like the use of metadata in the bus messages. That is actually
>> >> >> > one of the main features here, as we can get this data in a secure and
>> >> >> > reliable way, and it's something that userspace requires today. So
>> >> >> > while it does look "odd" to people who are not familiar with dbus, this
>> >> >> > is something that finally fixes a number of almost unfixable races in
>> >> >> > the current dbus implementations.
>> >> >>
>> >> >> While I generally like the concept of having a better in-kernel IPC
>> >> >> mechanism, after some consideration I don't think this belongs in the
>> >> >> kernel in its current form. Here's why.
>> >> >>
>> >> >> First, the naming is counterintuitive. There are "endpoints", but you
>> >> >> don't send messages to endpoints. In fact, a basic kdbus setup will
>> >> >> have exactly one endpoint AFAICT. Wtf? This makes talking about it
>> >> >> awkward.
>> >> >
>> >> > Did you read the documentation? We've been over this before, and it
>> >> > should all be addressed in the documentation based on this coming up.
>> >> >
>> >> >> A lot of the design seems to be to violate the concept of "mechanism,
>> >> >> not policy". Kdbus is very much a port of userspace dbus to the
>> >> >> kernel, and it appears to be a port designed to preserve some
>> >> >> questionable design decisions instead of learning from them.
>> >> >>
>> >> >> For example, kdbus sticks a whole policy database in the kernel, but
>> >> >> that policy database (AFAICT -- holy crap it's overcomplicated) is
>> >> >> *not* a simple set of rules like "if A then allow B". Instead it has
>> >> >> really weird dependencies not on what name you're sending to but on
>> >> >> what *other* names the thing you're sending to has. Sorry, but this
>> >> >> way lies (a) the inability for a large set of developers to understand
>> >> >> what's going on and (b) security bugs. Also, the result probably
>> >> >> can't be reused as part of a non-legacy-filled sensible design
>> >> >
>> >> > What policy database? Matching messages to subscribers? That's the
>> >> > same type of "database" that other ipc subsystems need/want, there's
>> >> > nothing radical here.
>> >>
>> >> Let me quote from the latest version of the kdbus docs:
>> >>
>> >> Note that TALK access is checked against all names of a connection. For
>> >> example, if a connection owns both <constant>'org.foo.bar'</constant> and
>> >> <constant>'org.blah.baz'</constant>, and the policy database allows
>> >> <constant>'org.blah.baz'</constant> to be talked to by WORLD, then this
>> >> permission is also granted to <constant>'org.foo.bar'</constant>. That
>> >> might sound illogical, but after all, we allow messages to be directed to
>> >> either the ID or a well-known name, and policy is applied to the
>> >> connection, not the name. In other words, the effective TALK policy for a
>> >> connection is the most permissive of all names the connection owns.
>> >>
>> >> In my humble opinion, this paragraph speaks for itself. The design is
>> >> bad, full stop.
>> >
>> > First off, thanks for reading the docs, I appreciate that. But realize
>> > also, that this is straight from the D-Bus spec. We aren't doing
>> > anything "radical" here, this is what your desktop uses that you are
>> > typing your email from.
>> >
>> > Yes, it's an unfortunate design, but one that we are all stuck with
>> > (think of it as having to implement code for horrid hardware that you
>> > have to get to work properly.)
>>
>> I agree. You've sent a pull request for an unfortunate design. I
>> don't think that unfortunate design belongs in the kernel. If it stays
>> in userspace, then user programmers could potentially fix it some day.

Greg> You might not like the design, but it is a valid design. Again, we
Greg> don't refuse to support hardware that is designed badly. Or support
Greg> protocols we don't necessarily like, that's not the job of a kernel or
Greg> operating system.

Greg> And here's Havoc's response as to why actually, this is a good design:
Greg> http://lists.freedesktop.org/archives/dbus/2015-April/016651.html

This is an interesting discussion, and one thing that sticks out to me
is the comments in the URL above talking about how clients are
supposed to use a generic name to bind to a resource, but actually do
a lookup to get the specific name, and then bind to THAT.

So the security concerns raised by Andy do seem to make sense, in that
either security needs to be the same across all names of a service, so
that you don't have problems with varying levels once people have
connected.

In terms of the X11 analogy, if I have someone connect, and then I do
'xhost -' it removes all access. It's not dependent on whether I'm
bound to a specific or general service.

So the security aspect really needs to be that the most restrictive
takes precedence, not the other way around.

And after having read a bunch of the docs, looked at the FAQ, etc;
it's still no clearer to me what DBUS and KDBUS provides that's all so
important or critical. Sure, it might be nice to have, but that's ok.

So I think that's the steps people need to take, give concrete example
of how DBUS is better than anything else out there and won't cause
more problems down the line.

John
^ permalink raw reply [flat|nested] 333+ messages in thread
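The TALK rule quoted from the kdbus docs in the message above, and the "most restrictive takes precedence" rule John argues for, can be contrasted in a few lines. This is only a toy model, not kdbus code: the function names, the policy dictionary layout and the "uid:1000" sender label are assumptions made for illustration.

```python
# Toy model of the two TALK semantics discussed above.
# Everything here is hypothetical; it is not the kdbus policy code.

WORLD = "WORLD"  # stands in for the "anyone may talk" grant from the docs


def name_allows(policy, name, sender):
    """True if the policy entry for one well-known name admits the sender."""
    allowed = policy.get(name, set())
    return WORLD in allowed or sender in allowed


def talk_most_permissive(owned_names, policy, sender):
    """Rule as quoted from the kdbus docs: any one owned name suffices."""
    return any(name_allows(policy, n, sender) for n in owned_names)


def talk_most_restrictive(owned_names, policy, sender):
    """Rule John proposes: every owned name must admit the sender."""
    return bool(owned_names) and all(
        name_allows(policy, n, sender) for n in owned_names
    )


# The example from the quoted documentation: one connection, two names.
policy = {
    "org.foo.bar": {"uid:0"},   # restricted name
    "org.blah.baz": {WORLD},    # world-accessible name
}
owned = ["org.foo.bar", "org.blah.baz"]

print(talk_most_permissive(owned, policy, "uid:1000"))
print(talk_most_restrictive(owned, policy, "uid:1000"))
```

Under the documented rule the unprivileged sender reaches the connection because 'org.blah.baz' is open to WORLD; under the restrictive rule the same sender is refused because 'org.foo.bar' does not admit it.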
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-14 20:14 ` John Stoffel
@ 2015-04-14 21:51 ` Steven Rostedt
2015-04-14 22:05 ` Jiri Kosina
2015-04-15 8:35 ` Greg Kroah-Hartman
1 sibling, 1 reply; 333+ messages in thread
From: Steven Rostedt @ 2015-04-14 21:51 UTC (permalink / raw)
To: John Stoffel
Cc: Greg Kroah-Hartman, Andy Lutomirski, Linus Torvalds, Andrew Morton,
Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen,
Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann,
Djalal Harouni, Paul E. McKenney, James Bottomley

On Tue, Apr 14, 2015 at 04:14:34PM -0400, John Stoffel wrote:
>
> So I think that's the steps people need to take, give concrete example
> of how DBUS is better than anything else out there and won't cause
> more problems down the line.

I believe that Linux Plumbers is still accepting MicroConferences. I
wonder if this would be a good one to have. Try to get everyone face to
face and talk about how exactly kdbus should be implemented in the
kernel.

This doesn't look to me like it is going to be solved via electronic
communication. Looks like the old free beer at a convention where
everyone can give their drunken arguments may be quite productive.

Greg, you told me you'll be there. What about everyone else? Want to
write up a MicroConf:

http://wiki.linuxplumbersconf.org/2015:topics

It's not that far off. Kdbus has waited this long, I'm sure it can wait
till August as well.

I'd really love to see this happen. I'll even supply the popcorn ;-)

-- Steve
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-14 21:51 ` Steven Rostedt
@ 2015-04-14 22:05 ` Jiri Kosina
2015-04-15 6:56 ` Borislav Petkov
2015-04-15 8:37 ` Greg Kroah-Hartman
0 siblings, 2 replies; 333+ messages in thread
From: Jiri Kosina @ 2015-04-14 22:05 UTC (permalink / raw)
To: Steven Rostedt
Cc: John Stoffel, Greg Kroah-Hartman, Andy Lutomirski, Linus Torvalds,
Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes,
Tom Gundersen, linux-kernel, Daniel Mack, David Herrmann,
Djalal Harouni, Paul E. McKenney, James Bottomley

On Tue, 14 Apr 2015, Steven Rostedt wrote:
> I believe that Linux Plumbers is still accepting MicroConferences. I
> wonder if this would be a good one to have. Try to get everyone face to
> face and talk about how exactly kdbus should be implemented in the
> kernel.

I personally would even put more emphasis on a session that would first
focus on "why", before we look at "how".

I have already asked about this during the earlier RFC submissions, but
the only "take-home message" I took from that discussion was "because it's
faster than what we currently have". I don't find that a sufficient
justification by itself for something so complex (with potential
implications all over the place for the whole Linux ecosystem), especially
given the fact we already have sealed memfds zerocopy etc (and I am not
even talking about the "infinite set-in-stone userspace API" implications
this has).

So definitely +1 from me for this discussion to happen, being it either
LPC (which I will unfortunately probably have to miss due to personal
reasons this year) or KS. It might help people like me, who have trouble
understanding why we need it, and LKML discussions don't provide enough
answers for them.

Thanks,

--
Jiri Kosina
SUSE Labs
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-14 22:05 ` Jiri Kosina
@ 2015-04-15 6:56 ` Borislav Petkov
2015-04-15 8:37 ` Greg Kroah-Hartman
1 sibling, 0 replies; 333+ messages in thread
From: Borislav Petkov @ 2015-04-15 6:56 UTC (permalink / raw)
To: Jiri Kosina
Cc: Steven Rostedt, John Stoffel, Greg Kroah-Hartman, Andy Lutomirski,
Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
One Thousand Gnomes, Tom Gundersen, linux-kernel, Daniel Mack,
David Herrmann, Djalal Harouni, Paul E. McKenney, James Bottomley

On Wed, Apr 15, 2015 at 12:05:01AM +0200, Jiri Kosina wrote:
> So definitely +1 from me for this discussion to happen, being it
> either LPC (which I will unfortunately probably have to miss due to
> personal reasons this year) or KS. It might help people like me, who
> have trouble understanding why we need it, and LKML discussions don't
> provide enough answers for them.

Oh, and then please do a writeup so that people like me can read about
it and find out the answer to that same question.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-14 22:05 ` Jiri Kosina
2015-04-15 6:56 ` Borislav Petkov
@ 2015-04-15 8:37 ` Greg Kroah-Hartman
2015-04-15 18:12 ` James Bottomley
1 sibling, 1 reply; 333+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-15 8:37 UTC (permalink / raw)
To: Jiri Kosina
Cc: Steven Rostedt, John Stoffel, Andy Lutomirski, Linus Torvalds,
Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes,
Tom Gundersen, linux-kernel, Daniel Mack, David Herrmann,
Djalal Harouni, Paul E. McKenney, James Bottomley

On Wed, Apr 15, 2015 at 12:05:01AM +0200, Jiri Kosina wrote:
> On Tue, 14 Apr 2015, Steven Rostedt wrote:
>
> > I believe that Linux Plumbers is still accepting MicroConferences. I
> > wonder if this would be a good one to have. Try to get everyone face to
> > face and talk about how exactly kdbus should be implemented in the
> > kernel.
>
> I personally would even put more emphasis on a session that would first
> focus on "why", before we look at "how".
>
> I have already asked about this during the earlier RFC submissions, but
> the only "take-home message" I took from that discussion was "because it's
> faster than what we currently have". I don't find that a sufficient
> justification by itself for something so complex (with potential
> implications all over the place for the whole Linux ecosystem), especially
> given the fact we already have sealed memfds zerocopy etc (and I am not
> even talking about the "infinite set-in-stone userspace API" implications
> this has).

I wrote many many lines of "why" in the patch submissions, and in the
first email in this thread. Are any of those specific solutions and
"why" reasons not correct in your opinion? If so, great, please let me
know.

But to say that no one is focusing on "why" is a slight to those of us
who have been providing just that.

thanks,

greg k-h
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-15 8:37 ` Greg Kroah-Hartman
@ 2015-04-15 18:12 ` James Bottomley
2015-04-16 12:13 ` David Herrmann
0 siblings, 1 reply; 333+ messages in thread
From: James Bottomley @ 2015-04-15 18:12 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Jiri Kosina, Steven Rostedt, John Stoffel, Andy Lutomirski,
Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
One Thousand Gnomes, Tom Gundersen, linux-kernel, Daniel Mack,
David Herrmann, Djalal Harouni, Paul E. McKenney

On Wed, 2015-04-15 at 10:37 +0200, Greg Kroah-Hartman wrote:
> On Wed, Apr 15, 2015 at 12:05:01AM +0200, Jiri Kosina wrote:
> > On Tue, 14 Apr 2015, Steven Rostedt wrote:
> >
> > > I believe that Linux Plumbers is still accepting MicroConferences. I
> > > wonder if this would be a good one to have. Try to get everyone face to
> > > face and talk about how exactly kdbus should be implemented in the
> > > kernel.
> >
> > I personally would even put more emphasis on a session that would first
> > focus on "why", before we look at "how".
> >
> > I have already asked about this during the earlier RFC submissions, but
> > the only "take-home message" I took from that discussion was "because it's
> > faster than what we currently have". I don't find that a sufficient
> > justification by itself for something so complex (with potential
> > implications all over the place for the whole Linux ecosystem), especially
> > given the fact we already have sealed memfds zerocopy etc (and I am not
> > even talking about the "infinite set-in-stone userspace API" implications
> > this has).
>
> I wrote many many lines of "why" in the patch submissions, and in the
> first email in this thread. Are any of those specific solutions and
> "why" reasons not correct in your opinion? If so, great, please let me
> know.
>
> But to say that no one is focusing on "why" is a slight to those of us
> who have been providing just that.

Please stop. A debate that degenerates into a disagreement about
whether specific questions have or have not been answered is no debate
at all: it's an ideological show case. If both sides are going to do
the same at plumbers (or elsewhere) it will be a waste of time (well,
except as a spectator sport). To make this work, you need (as the
plumbers MC templates tell you) a list of key attendees from all sides
of the debate who'll commit to coming (mostly what I've heard so far is
people committing not to coming) and a list of guiding topics which
people will commit to discussing honestly.

For me the biggest issue is the container problem: it's really hard to
containerise kdbus because of the stateful nature of the protocol and
the fact that it has a well known system bus. Separation into domains
works for OS containers, but application containers need more fluidity.
It's not unlike the same problem on windows: Windows application
containers are very difficult to do because the global registry means
that OLE handlers all have to run inside your container as well
(effectively making it an OS container). I'm sure, since we already
have a lot of containers people going to plumbers, that we can get them
to turn up for the discussion.

James
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-15 18:12 ` James Bottomley
@ 2015-04-16 12:13 ` David Herrmann
2015-04-17 19:27 ` James Bottomley
0 siblings, 1 reply; 333+ messages in thread
From: David Herrmann @ 2015-04-16 12:13 UTC (permalink / raw)
To: James Bottomley
Cc: Greg Kroah-Hartman, Jiri Kosina, Steven Rostedt, John Stoffel,
Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann,
Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, linux-kernel,
Daniel Mack, Djalal Harouni, Paul E. McKenney

Hi

On Wed, Apr 15, 2015 at 8:12 PM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
> For me the biggest issue is the container problem: it's really hard to
> containerise kdbus because of the stateful nature of the protocol and
> the fact that it has a well known system bus. Separation into domains
> works for OS containers, but application containers need more fluidity.
> It's not unlike the same problem on windows: Windows application
> containers are very difficult to do because the global registry means
> that OLE handlers all have to run inside your container as well
> (effectively making it an OS container). I'm sure, since we already
> have a lot of containers people going to plumbers, that we can get them
> to turn up for the discussion.

kdbus actually works very well in OS containers that mount a new
kdbusfs inside the container. This new instance of kdbus will be
entirely separated from any other on the system. We've designed it
that way especially with OS containers in mind. This is explained in
kdbus.fs(7). It's very similar to devpts' container support, where you
mount a new instance of devpts into each container instance you run.

For Docker-style (i.e. app-focused) containers, it's a more complex
story. kdbus will not solve this for you, but at least one thing
deserves being mentioned: for this kind of sandboxing kdbus certainly
makes things *easier*, compared to dbus1. Why? because the kernel
gains a notion of individual messages and method call transactions,
something that is completely unavailable if you stick to dbus1 where
all the kernel sees is a raw stream of AF_UNIX/SOCK_STREAM bytes. In
fact, kdbus as it is right now even contains minimal but explicit
support for sandboxing, by allowing creation of multiple bus endpoints
to the same bus that carry additional, more restrictive policy.

Thanks
David
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-16 12:13 ` David Herrmann
@ 2015-04-17 19:27 ` James Bottomley
2015-04-17 20:27 ` Havoc Pennington
0 siblings, 1 reply; 333+ messages in thread
From: James Bottomley @ 2015-04-17 19:27 UTC (permalink / raw)
To: David Herrmann
Cc: Greg Kroah-Hartman, Jiri Kosina, Steven Rostedt, John Stoffel,
Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann,
Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, linux-kernel,
Daniel Mack, Djalal Harouni, Paul E. McKenney

On Thu, 2015-04-16 at 14:13 +0200, David Herrmann wrote:
> Hi
>
> On Wed, Apr 15, 2015 at 8:12 PM, James Bottomley
> <James.Bottomley@hansenpartnership.com> wrote:
> > For me the biggest issue is the container problem: it's really hard to
> > containerise kdbus because of the stateful nature of the protocol and
> > the fact that it has a well known system bus. Separation into domains
> > works for OS containers, but application containers need more fluidity.
> > It's not unlike the same problem on windows: Windows application
> > containers are very difficult to do because the global registry means
> > that OLE handlers all have to run inside your container as well
> > (effectively making it an OS container). I'm sure, since we already
> > have a lot of containers people going to plumbers, that we can get them
> > to turn up for the discussion.
>
> kdbus actually works very well in OS containers that mount a new
> kdbusfs inside the container. This new instance of kdbus will be
> entirely separated from any other on the system. We've designed it
> that way especially with OS containers in mind. This is explained in
> kdbus.fs(7). It's very similar to devpts' container support, where you
> mount a new instance of devpts into each container instance you run.
>
> For Docker-style (i.e. app-focused) containers, it's a more complex
> story.

Well, no, docker-style is just one flavour of application containers.
I'm actually much more interested in something very different:
applications that use container features (like docker, rocket and
systemd). Facilitating them is an interesting exercise. Also,
applications inside containers were around long before docker in the
PaaS space at least.

> kdbus will not solve this for you, but at least one thing
> deserves being mentioned: for this kind of sandboxing kdbus certainly
> makes things *easier*, compared to dbus1.

So slightly better than really difficult isn't terribly useful.

> Why? because the kernel
> gains a notion of individual messages and method call transactions,
> something that is completely unavailable if you stick to dbus1 where
> all the kernel sees is a raw stream of AF_UNIX/SOCK_STREAM bytes. In
> fact, kdbus as it is right now even contains minimal but explicit
> support for sandboxing, by allowing creation of multiple bus endpoints
> to the same bus that carry additional, more restrictive policy.

Sandboxing is a minor (albeit very useful) use of containers.

You nicely ignored the actual problem I listed, which is the system bus.
And the specific example of what happens. Let me try again.

Just to provide the context, Virtuozzo has long supported containers on
both Windows and Linux. We have been doing application containers on
Linux for a long time, but we've been having issues doing the same thing
on windows (in spite of the fact that our windows container system is
very similar to the Linux one).

In windows, OLE + the global registry is dbus on steroids. The idea
seems simple and elegant: remote system elements are provided to you via
an IPC interaction instead of being directly dynamically linked into
your virtual address space. It allows windows applications to deal with
arbitrary objects of unknown type because the type handlers are provided
by the system via OLE. It's really elegant in a single user desktop
environment because the system's job is to serve and protect only that
user. In a multi user environment (as MS found with VDI) it's a lot
more problematic because now either the type handlers are global
(meaning local users can't modify them unlike in the single user case)
or they're all local, meaning we're back to OS containers again. If you
think abstractly of containers as a way to bring multi-user features to
single user environments (essentially that's what OS virtualization is)
you can see immediately why we're having such issues with non-os
containers on Windows because the single bus/global namespace idea
doesn't play well with multi-user.

This is why I think kdbus is a bad idea: it solidifies as a linux kernel
API something which runs counter to granular OS virtualization (and
something which caused Windows to fall behind Linux in the container
space). Splitting out the acceleration problem and leaving the rest to
user space currently looks fine because the ideas Al and Andy are
kicking around don't cause problems with OS virtualization.

James
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-17 19:27 ` James Bottomley
@ 2015-04-17 20:27 ` Havoc Pennington
2015-04-17 21:45 ` Alex Elsayed
2015-04-20 18:01 ` James Bottomley
0 siblings, 2 replies; 333+ messages in thread
From: Havoc Pennington @ 2015-04-17 20:27 UTC (permalink / raw)
To: James Bottomley
Cc: David Herrmann, Greg Kroah-Hartman, Jiri Kosina, Steven Rostedt,
John Stoffel, Andy Lutomirski, Linus Torvalds, Andrew Morton,
Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen,
linux-kernel, Daniel Mack, Djalal Harouni, Paul E. McKenney

Hi,

On Fri, Apr 17, 2015 at 3:27 PM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
>
> This is why I think kdbus is a bad idea: it solidifies as a linux kernel
> API something which runs counter to granular OS virtualization (and
> something which caused Windows to fall behind Linux in the container
> space). Splitting out the acceleration problem and leaving the rest to
> user space currently looks fine because the ideas Al and Andy are
> kicking around don't cause problems with OS virtualization.
>

I'm interested in understanding this problem (if only for my own
curiosity) but I'm not confident I understand what you're saying
correctly.

Can I try to explain back / ask questions and see what I have right?

I think you are saying that if an application relies on a system
service (= any other process that runs on the system bus) then to
virtualize that app by itself in a dedicated container, the system bus
and the system service need to also be in the container. So the
container ends up with a bunch of stuff in it beyond only the
application. Right / wrong / confused?

I also think you're saying that userspace dbus has the same issue
(this isn't a userspace vs. kernel thing per se), the objection to
kdbus is that it makes this issue more solidified / harder to fix?

Do you have ideas on how to go about fixing it, whether in userspace
or kernel dbus?

Havoc
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-17 20:27 ` Havoc Pennington
@ 2015-04-17 21:45 ` Alex Elsayed
2015-04-20 18:01 ` James Bottomley
1 sibling, 0 replies; 333+ messages in thread
From: Alex Elsayed @ 2015-04-17 21:45 UTC (permalink / raw)
To: linux-kernel

Havoc Pennington wrote:
> Hi,
>
> On Fri, Apr 17, 2015 at 3:27 PM, James Bottomley
> <James.Bottomley@hansenpartnership.com> wrote:
>>
>> This is why I think kdbus is a bad idea: it solidifies as a linux kernel
>> API something which runs counter to granular OS virtualization (and
>> something which caused Windows to fall behind Linux in the container
>> space). Splitting out the acceleration problem and leaving the rest to
>> user space currently looks fine because the ideas Al and Andy are
>> kicking around don't cause problems with OS virtualization.
>>
>
> I'm interested in understanding this problem (if only for my own
> curiosity) but I'm not confident I understand what you're saying
> correctly.
>
> Can I try to explain back / ask questions and see what I have right?
>
> I think you are saying that if an application relies on a system
> service (= any other process that runs on the system bus) then to
> virtualize that app by itself in a dedicated container, the system bus
> and the system service need to also be in the container. So the
> container ends up with a bunch of stuff in it beyond only the
> application. Right / wrong / confused?
>
> I also think you're saying that userspace dbus has the same issue
> (this isn't a userspace vs. kernel thing per se), the objection to
> kdbus is that it makes this issue more solidified / harder to fix?
>
> Do you have ideas on how to go about fixing it, whether in userspace
> or kernel dbus?
>
> Havoc

So far as I understand (and this may be wrong), this is the use case of
kdbus "endpoints" - you'd create a (constrained) kdbus endpoint on the
host, and then expose it to the application, such that the application
uses it as if it were the system bus.
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-17 20:27 ` Havoc Pennington
2015-04-17 21:45 ` Alex Elsayed
@ 2015-04-20 18:01 ` James Bottomley
2015-04-21 8:09 ` Daniel Mack
1 sibling, 1 reply; 333+ messages in thread
From: James Bottomley @ 2015-04-20 18:01 UTC (permalink / raw)
To: Havoc Pennington
Cc: David Herrmann, Greg Kroah-Hartman, Jiri Kosina, Steven Rostedt,
John Stoffel, Andy Lutomirski, Linus Torvalds, Andrew Morton,
Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen,
linux-kernel, Daniel Mack, Djalal Harouni, Paul E. McKenney

On Fri, 2015-04-17 at 16:27 -0400, Havoc Pennington wrote:
> Hi,
>
> On Fri, Apr 17, 2015 at 3:27 PM, James Bottomley
> <James.Bottomley@hansenpartnership.com> wrote:
> >
> > This is why I think kdbus is a bad idea: it solidifies as a linux kernel
> > API something which runs counter to granular OS virtualization (and
> > something which caused Windows to fall behind Linux in the container
> > space). Splitting out the acceleration problem and leaving the rest to
> > user space currently looks fine because the ideas Al and Andy are
> > kicking around don't cause problems with OS virtualization.
> >
>
> I'm interested in understanding this problem (if only for my own
> curiosity) but I'm not confident I understand what you're saying
> correctly.
>
> Can I try to explain back / ask questions and see what I have right?
>
> I think you are saying that if an application relies on a system
> service (= any other process that runs on the system bus) then to
> virtualize that app by itself in a dedicated container, the system bus
> and the system service need to also be in the container. So the
> container ends up with a bunch of stuff in it beyond only the
> application. Right / wrong / confused?

Right. Consider named as the unix equivalent. In most application
containers, it's provided from outside. However, any container that
wants it provided inside simply intercepts and overrides the well known
socket. We can do this in UNIX because there's no global bus handling
these queries, it's simply a matter of knowing where the socket is.

In windows you can't pick and choose the services you consume from
outside. Either you pull the whole OLE namespace into the container,
and thus have to provide everything from within, or try to run with none
of it provided by the container. It's this everything or nothing that's
the problem. Container virtualisation is about being granular and a
system bus (or global OLE namespace) is about being monolithic.

> I also think you're saying that userspace dbus has the same issue
> (this isn't a userspace vs. kernel thing per se), the objection to
> kdbus is that it makes this issue more solidified / harder to fix?

Yes, it does. We have problems containerising Linux desktops as well.
However, most of our server stuff is daemon and socket based, so that
containerises nicely. In windows, OLE has been absorbed even into the
server model which is why they have a bigger problem.

> Do you have ideas on how to go about fixing it, whether in userspace
> or kernel dbus?

Well, I've always suspected the solution would be for dbus to have a
hierarchical namespace of its own with the default policy be pass
message to parent namespace. This would allow a container to determine
which services were serviced outside and which inside the container (if
you attach as a provider to the system bus in the container, that
attachment supersedes the parent).

However, this doesn't solve the security problem: just because a
container hasn't attached an interior provider doesn't mean it should be
allowed complete access to all services provided from outside. This is
the nasty problem because it involves some type of filter on busses
which pass through containers.

James
^ permalink raw reply [flat|nested] 333+ messages in thread
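The hierarchical namespace James describes (pass to the parent by default, with an interior attachment superseding the parent's) can be modelled roughly as below. This is a hedged sketch of the idea only: nothing like it exists in dbus or kdbus as far as this thread says, and the class, method and service names are invented for illustration.

```python
# Hypothetical model of a hierarchical bus namespace: a lookup that is not
# satisfied locally is passed up to the parent namespace.

class BusNamespace:
    def __init__(self, parent=None):
        self.parent = parent    # None for the host (root) namespace
        self.providers = {}     # well-known name -> provider attached here

    def attach(self, name, provider):
        """Attach a provider inside this namespace; it shadows any parent."""
        self.providers[name] = provider

    def resolve(self, name):
        """Walk from this namespace towards the root; first match wins."""
        ns = self
        while ns is not None:
            if name in ns.providers:
                return ns.providers[name]
            ns = ns.parent
        return None


host = BusNamespace()
host.attach("org.example.Resolver", "host-resolver")
host.attach("org.example.Logger", "host-logger")

container = BusNamespace(parent=host)
container.attach("org.example.Resolver", "container-resolver")

print(container.resolve("org.example.Resolver"))  # served inside
print(container.resolve("org.example.Logger"))    # passed to the parent
```

The sketch deliberately leaves out the part James calls the nasty problem: as written, the container can reach every service the host provides, so a real design would still need some filter on what may pass through to the parent.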
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-20 18:01 ` James Bottomley
@ 2015-04-21 8:09 ` Daniel Mack
2015-04-21 18:25 ` Andy Lutomirski
0 siblings, 1 reply; 333+ messages in thread
From: Daniel Mack @ 2015-04-21 8:09 UTC (permalink / raw)
To: James Bottomley, Havoc Pennington
Cc: David Herrmann, Greg Kroah-Hartman, Jiri Kosina, Steven Rostedt,
John Stoffel, Andy Lutomirski, Linus Torvalds, Andrew Morton,
Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen,
linux-kernel, Djalal Harouni, Paul E. McKenney

Hi,

On 04/20/2015 08:01 PM, James Bottomley wrote:
> On Fri, 2015-04-17 at 16:27 -0400, Havoc Pennington wrote:
>> Do you have ideas on how to go about fixing it, whether in userspace
>> or kernel dbus?
>
> Well, I've always suspected the solution would be for dbus to have a
> hierarchical namespace of its own with the default policy be pass
> message to parent namespace. This would allow a container to determine
> which services were serviced outside and which inside the container (if
> you attach as a provider to the system bus in the container, that
> attachment supersedes the parent).
>
> However, this doesn't solve the security problem: just because a
> container hasn't attached an interior provider doesn't mean it should be
> allowed complete access to all services provided from outside. This is
> the nasty problem because it involves some type of filter on busses
> which pass through containers.

Fair point, we've been thinking about that as well. What we implemented
for that is something we call 'custom endpoints', which is described in
kdbus.endpoint(7).

In short, an endpoint is an entry point to the bus. Each bus provides a
default endpoint node that enforces the bus-wide policy rules that
define which well-known names a peer may own, see, or talk to. Custom
endpoints can be added to carry additional policy rules for peers
connected through it, and redirecting a task or container to the custom
endpoint instead of the default one is as easy as bind-mounting the
node. systemd units have actually had support for that for a while,
which is how we tested this feature.

This implementation doesn't even add much code to kdbus, because we do
have the policy code around anyway, so that's just a matter of which
policy database to look at during runtime.

That said, it would actually even be easy to implement a way to allow
overriding names on custom endpoints too, so that services inside a
container can replace those that already exist on the bus. It's just
that so far, we haven't yet seen a use case for this.

Thanks,
Daniel
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-21 8:09 ` Daniel Mack
@ 2015-04-21 18:25 ` Andy Lutomirski
0 siblings, 0 replies; 333+ messages in thread
From: Andy Lutomirski @ 2015-04-21 18:25 UTC (permalink / raw)
To: Daniel Mack
Cc: James Bottomley, Havoc Pennington, David Herrmann,
Greg Kroah-Hartman, Jiri Kosina, Steven Rostedt, John Stoffel,
Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
One Thousand Gnomes, Tom Gundersen, linux-kernel, Djalal Harouni,
Paul E. McKenney
On Tue, Apr 21, 2015 at 1:09 AM, Daniel Mack <daniel@zonque.org> wrote:
> Hi,
>
> On 04/20/2015 08:01 PM, James Bottomley wrote:
>> On Fri, 2015-04-17 at 16:27 -0400, Havoc Pennington wrote:
>
>>> Do you have ideas on how to go about fixing it, whether in userspace
>>> or kernel dbus?
>>
>> Well, I've always suspected the solution would be for dbus to have a
>> hierarchical namespace of its own with the default policy be pass
>> message to parent namespace. This would allow a container to determine
>> which services were serviced outside and which inside the container (if
>> you attach as a provider to the system bus in the container, that
>> attachment supersedes the parent).
>>
>> However, this doesn't solve the security problem: just because a
>> container hasn't attached an interior provider doesn't mean it should be
>> allowed complete access to all services provided from outside. This is
>> the nasty problem because it involves some type of filter on busses
>> which pass through containers.
>
> Fair point, we've been thinking about that as well. What we implemented
> for that is something we call 'custom endpoints', which is described in
> kdbus.endpoint(7).
>
> In short, an endpoint is an entry point to the bus. Each bus provides a
> default endpoint node that enforces the bus-wide policy rules that
> define which well-known names a peer may own, see, or talk to. Custom
> endpoints can be added to carry additional policy rules for peers
> connected through them, and redirecting a task or container to the
> custom endpoint instead of the default one is as easy as bind-mounting
> the node. systemd units have actually had support for that for a while,
> which is how we tested this feature. This implementation doesn't even
> add much code to kdbus, because we do have the policy code around
> anyway, so that's just a matter of which policy database to look at
> during runtime.
>
> That said, it would actually even be easy to implement a way to allow
> overriding names on custom endpoints too, so that services inside a
> container can replace those that already exist on the bus. It's just
> that so far, we haven't yet seen a use case for this.

This is part of why I think that kdbus is the wrong design. All of
this is great, but this is the kind of policy that IMO belongs in
userspace. If nothing else, it means that you can add things like
this in the future without any kernel changes.

dbus-daemon can do all of this (in principle, anyway) already -- just
stick another dbus-daemon-like program in the container that proxies
things as appropriate. I think that a good kernel-accelerated design
could do the same thing without having to put any of this type of
policy in the kernel. (As an example, capability-based IPC gets all
of this for free.)

--Andy
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-14 20:14 ` John Stoffel
2015-04-14 21:51 ` Steven Rostedt
@ 2015-04-15 8:35 ` Greg Kroah-Hartman
1 sibling, 0 replies; 333+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-15 8:35 UTC (permalink / raw)
To: John Stoffel
Cc: Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann,
Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina,
linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni
On Tue, Apr 14, 2015 at 04:14:34PM -0400, John Stoffel wrote:
> >>>>> "Greg" == Greg Kroah-Hartman <gregkh@linuxfoundation.org> writes:
>
> Greg> On Tue, Apr 14, 2015 at 11:57:22AM -0700, Andy Lutomirski wrote:
> >> On Tue, Apr 14, 2015 at 10:50 AM, Greg Kroah-Hartman
> >> <gregkh@linuxfoundation.org> wrote:
> >> > On Mon, Apr 13, 2015 at 02:01:21PM -0700, Andy Lutomirski wrote:
> >> >> On Mon, Apr 13, 2015 at 1:45 PM, Greg Kroah-Hartman
> >> >> <gregkh@linuxfoundation.org> wrote:
> >> >> > On Mon, Apr 13, 2015 at 01:13:26PM -0700, Andy Lutomirski wrote:
> >> >> >> On Mon, Apr 13, 2015 at 12:03 PM, Greg Kroah-Hartman
> >> >> >> <gregkh@linuxfoundation.org> wrote:
> >> >> >> > The following changes since commit 9eccca0843205f87c00404b663188b88eb248051:
> >> >> >> >
> >> >> >> > Linux 4.0-rc3 (2015-03-08 16:09:09 -0700)
> >> >> >> >
> >> >> >> > are available in the git repository at:
> >> >> >> >
> >> >> >> > git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git/ tags/kdbus-4.1-rc1
> >> >> >> >
> >> >> >> > for you to fetch changes up to 9fb9cd0f4434a23487b6ef3237e733afae90e336:
> >> >> >> >
> >> >> >> > kdbus: avoid the use of struct timespec (2015-04-10 14:34:53 +0200)
> >> >> >> >
> >> >> >> > ----------------------------------------------------------------
> >> >> >> > kdbus for 4.1-rc1
> >> >> >> >
> >> >> >> > Here's the kdbus pull request for 4.1-rc1.
> >> >> >> >
> >> >> >> > It's been under development for many years now, and been in linux-next
> >> >> >> > for many months, and has undergone loads of testing and review and even a few
> >> >> >> > good arguments. It comes with full documentation and tests.
> >> >> >> >
> >> >> >> > There have been a few complaints about the code, notably from people who
> >> >> >> > don't like the use of metadata in the bus messages. That is actually
> >> >> >> > one of the main features here, as we can get this data in a secure and
> >> >> >> > reliable way, and it's something that userspace requires today. So
> >> >> >> > while it does look "odd" to people who are not familiar with dbus, this
> >> >> >> > is something that finally fixes a number of almost unfixable races in
> >> >> >> > the current dbus implementations.
> >> >> >>
> >> >> >> While I generally like the concept of having a better in-kernel IPC
> >> >> >> mechanism, after some consideration I don't think this belongs in the
> >> >> >> kernel in its current form. Here's why.
> >> >> >>
> >> >> >> First, the naming is counterintuitive. There are "endpoints", but you
> >> >> >> don't send messages to endpoints. In fact, a basic kdbus setup will
> >> >> >> have exactly one endpoint AFAICT. Wtf? This makes talking about it
> >> >> >> awkward.
> >> >> >
> >> >> > Did you read the documentation? We've been over this before, and it
> >> >> > should all be addressed in the documentation based on this coming up.
> >> >> >
> >> >> >> A lot of the design seems to be to violate the concept of "mechanism,
> >> >> >> not policy". Kdbus is very much a port of userspace dbus to the
> >> >> >> kernel, and it appears to be a port designed to preserve some
> >> >> >> questionable design decisions instead of learning from them.
> >> >> >>
> >> >> >> For example, kdbus sticks a whole policy database in the kernel, but
> >> >> >> that policy database (AFAICT -- holy crap it's overcomplicated) is
> >> >> >> *not* a simple set of rules like "if A then allow B". Instead it has
> >> >> >> really weird dependencies not on what name you're sending to but on
> >> >> >> what *other* names the thing you're sending to has. Sorry, but this
> >> >> >> way lies (a) the inability for a large set of developers to understand
> >> >> >> what's going on and (b) security bugs. Also, the result probably
> >> >> >> can't be reused as part of a non-legacy-filled sensible design
> >> >> >
> >> >> > What policy database? Matching messages to subscribers? That's the
> >> >> > same type of "database" that other ipc subsystems need/want, there's
> >> >> > nothing radical here.
> >> >>
> >> >> Let me quote from the latest version of the kdbus docs:
> >> >>
> >> >> Note that TALK access is checked against all names of a connection. For
> >> >> example, if a connection owns both <constant>'org.foo.bar'</constant> and
> >> >> <constant>'org.blah.baz'</constant>, and the policy database allows
> >> >> <constant>'org.blah.baz'</constant> to be talked to by WORLD, then this
> >> >> permission is also granted to <constant>'org.foo.bar'</constant>. That
> >> >> might sound illogical, but after all, we allow messages to be directed to
> >> >> either the ID or a well-known name, and policy is applied to the
> >> >> connection, not the name. In other words, the effective TALK policy for a
> >> >> connection is the most permissive of all names the connection owns.
> >> >>
> >> >> In my humble opinion, this paragraph speaks for itself. The design is
> >> >> bad, full stop.
> >> >
> >> > First off, thanks for reading the docs, I appreciate that. But realize
> >> > also, that this is straight from the D-Bus spec. We aren't doing
> >> > anything "radical" here, this is what your desktop uses that you are
> >> > typing your email from.
> >> >
> >> > Yes, it's an unfortunate design, but one that we are all stuck with
> >> > (think of it as having to implement code for horrid hardware that you
> >> > have to get to work properly.)
> >>
> >> I agree. You've sent a pull request for an unfortunate design. I
> >> don't think that unfortunate design belongs in the kernel. If it stays
> >> in userspace, then user programmers could potentially fix it some day.
>
> Greg> You might not like the design, but it is a valid design. Again, we
> Greg> don't refuse to support hardware that is designed badly. Or support
> Greg> protocols we don't necessarily like, that's not the job of a kernel or
> Greg> operating system.
>
> Greg> And here's Havoc's response as to why actually, this is a good design:
> Greg> http://lists.freedesktop.org/archives/dbus/2015-April/016651.html
>
> This is an interesting discussion, and one thing that sticks out to me
> is the comments in the URL above talking about how clients are
> supposed to use a generic name to bind to a resource, but actually do
> a lookup to get the specific name, and then bind to THAT.
>
> So the security concerns raised by Andy do seem to make sense, in that
> either security needs to be the same across all names of a service, so
> that you don't have problems with varying levels once people have
> connected. In terms of the X11 analogy, if I have someone connect,
> and then I do 'xhost -' it removes all access. It's not dependent on
> whether I'm bound to a specific or general service.
>
> So the security aspect really needs to be that the most restrictive
> takes precedence, not the other way around.

But look at how dbus handles this, isn't this done in the correct way?

> And after having read a bunch of the docs, looked at the FAQ, etc;
> it's still no clearer to me what DBUS and KDBUS provides that's all so
> important or critical. Sure, it might be nice to have, but that's ok.

The first email I wrote here explains all of this, are those not valid
uses for such a service that the kernel can provide?

> So I think that's the steps people need to take, give concrete example
> of how DBUS is better than anything else out there and won't cause
> more problems down the line.

D-Bus has been around for over 10 years now, and was the result of many
failed attempts to do something much like this (COM, DCOM, CORBA, and a
few others). The developers involved had lots of experience in this
area, and created a solution that ended up working very well for the
problem domain. So well that all other competing technologies in that
area were obsoleted and abandoned and everyone has moved to D-Bus as it
solves the problems they have in a correct manner.

The reason nothing else has come along might just be because nothing
else _needs_ to come along, D-Bus solves the need.

So unless you see a technical reason why the proposed code is somehow
not correct, I don't understand your complaint.

thanks,

greg k-h
^ permalink raw reply [flat|nested] 333+ messages in thread
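The TALK rule quoted from the kdbus docs in the message above, and the stricter alternative John Stoffel argues for, can be contrasted in a few lines. This is a sketch of the two rules as described in the thread, not kdbus code; the policy table layout, the `WORLD` marker, and the `uid:` principals are assumptions made for illustration.

```python
from typing import Dict, Iterable, Set

WORLD = "*"  # assumed marker for "anyone may talk to this name"


def _admits(sender: str, name: str, policy: Dict[str, Set[str]]) -> bool:
    """True if the policy entry for one well-known name admits the sender."""
    allowed = policy.get(name, set())
    return WORLD in allowed or sender in allowed


def may_talk_permissive(sender: str, owned_names: Iterable[str],
                        policy: Dict[str, Set[str]]) -> bool:
    # Rule from the quoted docs: the most permissive owned name wins.
    return any(_admits(sender, name, policy) for name in owned_names)


def may_talk_restrictive(sender: str, owned_names: Iterable[str],
                         policy: Dict[str, Set[str]]) -> bool:
    # Alternative suggested in the thread: every owned name must admit
    # the sender, so the most restrictive name wins.
    names = list(owned_names)
    return bool(names) and all(_admits(sender, name, policy) for name in names)


# One connection owns both names from the docs example.
policy = {"org.foo.bar": {"uid:0"}, "org.blah.baz": {WORLD}}
owned = ["org.foo.bar", "org.blah.baz"]
```

Under the permissive rule an unprivileged sender reaches the connection because `org.blah.baz` is open to WORLD, even though `org.foo.bar` alone would refuse it; under the restrictive rule the same sender is refused.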
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-14 18:57 ` Andy Lutomirski
2015-04-14 19:23 ` Greg Kroah-Hartman
@ 2015-04-15 12:00 ` Greg Kroah-Hartman
2015-04-15 12:09 ` Jiri Kosina
1 sibling, 1 reply; 333+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-15 12:00 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
Daniel Mack, David Herrmann, Djalal Harouni
[Back to the capability discussion]

On Tue, Apr 14, 2015 at 11:57:22AM -0700, Andy Lutomirski wrote:
> >> Then I'll have to find a way to embolden my NACK further. My point is
> >> that capturing garbage like cmdline and capabilities (again, that
> >> latter part is completely unacceptable under any circumstances
> >> whatsoever) on behalf of *all* senders is a disaster. If it's
> >> optional, then I can at least hope that userspace will honor the
> >> optionality and let everything turn it off. If it's mandatory, then
> >> kdbus is just unsafe to use to send messages to untrusted parties.
> >
> > It's opted in by the receiving peer if the task implementing a service
> > wants to access these pieces of information. It is optional, and the
> > documentation clearly states that userspace should cope with this, and
> > also, when they are available we make sure to provide the correct
> > race-free information.
> >
> > As said many times before, an application can do so already today with
> > information from other API file systems, so why is this suddenly a
> > problem when kdbus optionally offers the exact same information along
> > with each transmitted message? Yes, we all "hate" capabilities, but
> > userspace uses them, and gets access to them all the time through the
> > POSIX apis (capget(), cap_get_pid(), capgetp(), etc.) and through
> > /proc/pid/status. They are something that we have to support and handle
> > properly.
> >
> > In the very first submission of kdbus, we stated that we want to allow
> > userspace methods to access these same bits to be able to make decisions
> > about permissions. And to do so in a race-free manner, which is very
> > hard, if not almost impossible, to do so from userspace alone.
> >
> > For instance, if a task has CAP_NET_ADMIN set, we can use that
> > information in order to allow or disallow certain actions to be taken by
> > a privileged process. Or, if a client that has the capability to call
> > reboot (i.e. have CAP_SYS_BOOT) makes the D-Bus call to reboot the
> > system, the system daemon listening for that message knows that yes, at
> > the time that the client made that call, it really did have that
> > capability so it is ok to actually reboot the system.
> >
> > Instead of trying to use SCM_CREDENTIALS to get the pid and another
> > round of cap_get_pid() and the like, all of which are susceptible to
> > racing and all sorts of other horrors, that are insecure, we can provide
> > this information in an atomic, and secure way.
>
> /me suppresses a long string of expletives.
>
> Please point me at the code that does this with caps. It's WRONG in
> userspace and it's WRONG in the kernel. I want to know what code that
> runs on my system does this so I can send the appropriate bug reports
> and get it fixed. I think the RHEL crowd at least will take it
> seriously when I tell them that this is a security hole.

Look at how polkit and login manager work. Or anything that uses
SCM_CREDENTIALS. Also I think PAM does odd things with credentials, but
it's been a long time since I looked at any PAM code, I could be wrong.

Also look at users of SO_PEERCRED, as those are used in places as well,
but you know all about those.

Also look at programs that make those capability calls, they are
obviously using them for some reason, right? Nothing we can do about
them, and it's not the main issue here at all, sorry for the
side-discussion.

> > The kernel today, and userspace, relies on capabilities all the time
> > (i.e. almost every syscall), how are they something that is somehow not
> > valid to use and support?
>
> No. The *kernel* relies on caps. Userspace should not.

Userspace uses caps to have the kernel do things. Or not do things. If
not, why do we have things like SCM_CREDENTIALS in the first place?

> > And of course, as Eric will point out, capabilities are not
> > translatable across user namespaces, which is a problem. Because of
> > this, we dispose of that piece of metadata information when a message
> > crosses a user namespace boundary. This is the right thing to do, which
> > is not the case for almost all other kernel apis which report bogus
> > capabilities when user namespaces are crossed.
>
> The right thing to do is to not use capabilities for userspace stuff.

Again, userspace needs them in order to have the kernel do things for
userspace as needed. Look at the Tizen example in the first email,
where they had to use SCM_CREDENTIALS, and all of the speed/latency
issues that this resulted in.

> > So we implemented this correctly, and somehow that is a feature so bad
> > that both you and Eric think the whole baby should be thrown out? How
> > else should this be implemented?
>
> It shouldn't be implemented.

Great, so can we also drop those POSIX functions and the /proc/
information as well? I didn't think so :)

> > As documented in the original email on this thread, Tizen wants to use
> > this, as it solves a real need that they have. Their workarounds
> > involve using custom UDS sockets, but the latency involved is horrid and
> > unacceptable. Using a kdbus message solves this issue for them,
> > allowing UI rendering to work properly/quickly.
> >
> > Again, capabilities are something we all require and rely on today,
> > passing the current capability on to a recipient isn't a way to raise
> > privileges at all, but rather, properly determine if they are present
> > at sending time, if wanted. How does that create an insecure system?
> > What am I missing that is so bad here with the design we have?
>
> That, even if the implementation could be made to be useful and
> correct, capabilities refer to privileges wrt the kernel, not
> userspace. They're not the right bit of policy to look at here.

So what is the right bit of policy to look at then?

> For example, the thing that should make it possible to run 'systemctl
> reboot' or whatever is not CAP_SYS_BOOT, because CAP_SYS_BOOT is the
> permission to hard reboot the system immediately, and that's not what
> 'systemctl reboot' is for.

'systemctl reboot' calls a bunch of other things to determine if you
have local access to the machine, or permissions to reboot the machine
(i.e. CAP_SYS_BOOT), and other things that polkit might allow you to do,
and then, it decides to reboot or not. That happens today, right? I
don't understand the argument here.

confused,

greg k-h
^ permalink raw reply [flat|nested] 333+ messages in thread
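The two-step userspace lookup that the quoted text calls racy (fetch the peer's pid from socket credentials, then query that pid's capabilities separately) looks roughly like the sketch below. It is Linux-only and uses `SO_PEERCRED` on an `AF_UNIX` socket plus the `CapEff` line of `/proc/<pid>/status`; the helper names are hypothetical, the value 22 for `CAP_SYS_BOOT` is taken from the Linux capability list, and nothing here is polkit, systemd, or kdbus code.

```python
import os
import socket
import struct

CAP_SYS_BOOT = 22  # capability number from <linux/capability.h>


def peer_credentials(sock):
    """Return (pid, uid, gid) of the peer of a connected AF_UNIX socket."""
    fmt = "iII"  # struct ucred: pid_t pid, uid_t uid, gid_t gid
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED,
                          struct.calcsize(fmt))
    return struct.unpack(fmt, raw)


def effective_caps(pid):
    """Read the CapEff bitmask of pid as it is at the time of this call."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("CapEff:"):
                return int(line.split()[1], 16)
    raise LookupError(f"no CapEff line for pid {pid}")


def peer_has_capability(sock, cap_number):
    pid, _uid, _gid = peer_credentials(sock)
    # Race window: between the call above and the read below the peer may
    # have dropped the capability, or exited and had its pid reused.
    return bool((effective_caps(pid) >> cap_number) & 1)


# On a socketpair both ends belong to the creating process.
left, right = socket.socketpair()
own_pid = os.getpid()
pid, uid, gid = peer_credentials(left)
```

The comment in `peer_has_capability` marks the gap that attaching metadata at send time is meant to close; whether userspace should consult capabilities at all is the separate question Andy raises.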
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-15 12:00 ` Greg Kroah-Hartman
@ 2015-04-15 12:09 ` Jiri Kosina
2015-04-15 12:18 ` One Thousand Gnomes
2015-04-15 12:27 ` Greg Kroah-Hartman
0 siblings, 2 replies; 333+ messages in thread
From: Jiri Kosina @ 2015-04-15 12:09 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann,
Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, linux-kernel,
Daniel Mack, David Herrmann, Djalal Harouni
On Wed, 15 Apr 2015, Greg Kroah-Hartman wrote:

> 'systemctl reboot' calls a bunch of other things to determine if you
> have local access to the machine, or permissions to reboot the machine
> (i.e. CAP_SYS_BOOT), and other things that polkit might allow you to do,
> and then, it decides to reboot or not. That happens today, right? I
> don't understand the argument here.

And what exactly is the argument that this is the way it should be
implemented?

Why can't it just rely on the kernel to provide final answer to "to reboot
or not to reboot, that is the question"?

At the end of the day, it's the kernel that decides whether it will really
ultimately ask the platform to reboot. If, for whatever reason (which
might be completely invisible to userspace) kernel decides not to do so,
userspace has to be able to recover from such failure in any case.

--
Jiri Kosina
SUSE Labs
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-15 12:09 ` Jiri Kosina
@ 2015-04-15 12:18 ` One Thousand Gnomes
2015-04-15 12:30 ` Greg Kroah-Hartman
2015-04-15 12:27 ` Greg Kroah-Hartman
1 sibling, 1 reply; 333+ messages in thread
From: One Thousand Gnomes @ 2015-04-15 12:18 UTC (permalink / raw)
To: Jiri Kosina
Cc: Greg Kroah-Hartman, Andy Lutomirski, Linus Torvalds, Andrew Morton,
Arnd Bergmann, Eric W. Biederman, Tom Gundersen, linux-kernel,
Daniel Mack, David Herrmann, Djalal Harouni
On Wed, 15 Apr 2015 14:09:24 +0200 (CEST)
Jiri Kosina <jkosina@suse.cz> wrote:

> On Wed, 15 Apr 2015, Greg Kroah-Hartman wrote:
>
> > 'systemctl reboot' calls a bunch of other things to determine if you
> > have local access to the machine, or permissions to reboot the machine
> > (i.e. CAP_SYS_BOOT), and other things that polkit might allow you to do,
> > and then, it decides to reboot or not. That happens today, right? I
> > don't understand the argument here.

The first problem with that is that if you run the capability model in
the kernel combined with our distributions through any kind of formal
analysis it'll come out with more holes than a roll of wire netting.

There are lots of capability handling bugs that allow you to get one
capability from another where it should not be possible. Linux
capabilities were a little ad-hoc and a "neat idea" in their day.

It's not how anyone would do them now. At best they are ok for little
things like network raw access in ping/traceroute.

That's an implementation detail. If we were to adopt something like
capsicum the stuff you pass would look way different and the model would
potentially work.

> And what exactly is the argument that this is the way it should be
> implemented?

For me the fact that capabilities are known legacy and broken, and the
model will change. Better would be to just pass some "cookie" that can be
used to ask "is the sender allowed to X" via the LSM modules.

That futureproofs the portability I think - and is also actually more
powerful anyway.

> Why can't it just rely on the kernel to provide final answer to "to reboot
> or not to reboot, that is the question"?

It can, however you may want userspace to assert privileges and reboot
even though the user doesn't have the right powers directly (think about
mundane things like ctrl-alt-del or the reboot button on a desktop).

Alan
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-15 12:18 ` One Thousand Gnomes
@ 2015-04-15 12:30 ` Greg Kroah-Hartman
0 siblings, 0 replies; 333+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-15 12:30 UTC (permalink / raw)
To: One Thousand Gnomes
Cc: Jiri Kosina, Andy Lutomirski, Linus Torvalds, Andrew Morton,
Arnd Bergmann, Eric W. Biederman, Tom Gundersen, linux-kernel,
Daniel Mack, David Herrmann, Djalal Harouni
On Wed, Apr 15, 2015 at 01:18:28PM +0100, One Thousand Gnomes wrote:
> On Wed, 15 Apr 2015 14:09:24 +0200 (CEST)
> Jiri Kosina <jkosina@suse.cz> wrote:
>
> > On Wed, 15 Apr 2015, Greg Kroah-Hartman wrote:
> >
> > > 'systemctl reboot' calls a bunch of other things to determine if you
> > > have local access to the machine, or permissions to reboot the machine
> > > (i.e. CAP_SYS_BOOT), and other things that polkit might allow you to do,
> > > and then, it decides to reboot or not. That happens today, right? I
> > > don't understand the argument here.
>
> The first problem with that is that if you run the capability model in
> the kernel combined with our distributions through any kind of formal
> analysis it'll come out with more holes than a roll of wire netting.
>
> There are lots of capability handling bugs that allow you to get one
> capability from another where it should not be possible. Linux
> capabilities were a little ad-hoc and a "neat idea" in their day.

"formal analysis"? Heh, yeah, I know all about that, and really, that's
not anything we can do anything about here.

> It's not how anyone would do them now. At best they are ok for little
> things like network raw access in ping/traceroute.
>
> That's an implementation detail. If we were to adopt something like
> capsicum the stuff you pass would look way different and the model would
> potentially work.

True, the capsicum developers seem to have gone quiet on us :(

> > And what exactly is the argument that this is the way it should be
> > implemented?
>
> For me the fact that capabilities are known legacy and broken, and the
> model will change. Better would be to just pass some "cookie" that can be
> used to ask "is the sender allowed to X" via the LSM modules.
>
> That futureproofs the portability I think - and is also actually more
> powerful anyway.

Yes, that would work, but that kind of sounds like the same thing we
have today, just with a different name :)

thanks,

greg k-h
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-15 12:09 ` Jiri Kosina
2015-04-15 12:18 ` One Thousand Gnomes
@ 2015-04-15 12:27 ` Greg Kroah-Hartman
1 sibling, 0 replies; 333+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-15 12:27 UTC (permalink / raw)
To: Jiri Kosina
Cc: Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann,
Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, linux-kernel,
Daniel Mack, David Herrmann, Djalal Harouni
On Wed, Apr 15, 2015 at 02:09:24PM +0200, Jiri Kosina wrote:
> On Wed, 15 Apr 2015, Greg Kroah-Hartman wrote:
>
> > 'systemctl reboot' calls a bunch of other things to determine if you
> > have local access to the machine, or permissions to reboot the machine
> > (i.e. CAP_SYS_BOOT), and other things that polkit might allow you to do,
> > and then, it decides to reboot or not. That happens today, right? I
> > don't understand the argument here.
>
> And what exactly is the argument that this is the way it should be
> implemented?

I can't answer that, discuss it with the developers of that userspace
code please.

> Why can't it just rely on the kernel to provide final answer to "to reboot
> or not to reboot, that is the question"?

Usually you want to do a few things before telling the kernel to reboot,
like unmount all filesystems and the like :)

Anyway, we are getting away from the code at hand, please, let's discuss
that.

greg k-h
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-14 17:50 ` Greg Kroah-Hartman
2015-04-14 18:57 ` Andy Lutomirski
@ 2015-04-14 22:33 ` Jiri Kosina
2015-04-15 8:56 ` Greg Kroah-Hartman
1 sibling, 1 reply; 333+ messages in thread
From: Jiri Kosina @ 2015-04-14 22:33 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann,
Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, linux-kernel,
Daniel Mack, David Herrmann, Djalal Harouni
On Tue, 14 Apr 2015, Greg Kroah-Hartman wrote:

> Yes, it's an unfortunate design, but one that we are all stuck with
> (think of it as having to implement code for horrid hardware that you
> have to get to work properly.)

Greg, I personally consider this a rather defunct analogy. Broken hardware
comes from "outer space" we just have to live with somehow, and
eventually try to gradually improve by working with vendors (and you
yourself have of course made huge improvements in this very area).

Linux userspace is coming, well, from Linux developers. The sole fact that
someone wrote a daemon that runs on Linux seems like a very poor
justification for sucking the daemon into kernel "because we have to live
with it".
Userspace has to live with it somehow (and eventually fix itself if
necessary), yes. Why should kernel just contribute to this "unfortunate
design" if it really isn't, in any way, obliged or forced to do so?

Thanks,

--
Jiri Kosina
SUSE Labs
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-14 22:33 ` Jiri Kosina
@ 2015-04-15 8:56 ` Greg Kroah-Hartman
2015-04-15 11:06 ` One Thousand Gnomes
0 siblings, 1 reply; 333+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-15 8:56 UTC (permalink / raw)
To: Jiri Kosina
Cc: Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann,
Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, linux-kernel,
Daniel Mack, David Herrmann, Djalal Harouni
On Wed, Apr 15, 2015 at 12:33:30AM +0200, Jiri Kosina wrote:
> On Tue, 14 Apr 2015, Greg Kroah-Hartman wrote:
>
> > Yes, it's an unfortunate design, but one that we are all stuck with
> > (think of it as having to implement code for horrid hardware that you
> > have to get to work properly.)
>
> Greg, I personally consider this a rather defunct analogy. Broken hardware
> comes from "outer space" we just have to live with somehow, and
> eventually try to gradually improve by working with vendors (and you
> yourself have of course made huge improvements in this very area).
>
> Linux userspace is coming, well, from Linux developers. The sole fact that
> someone wrote a daemon that runs on Linux seems like a very poor
> justification for sucking the daemon into kernel "because we have to live
> with it".
> Userspace has to live with it somehow (and eventually fix itself if
> necessary), yes. Why should kernel just contribute to this "unfortunate
> design" if it really isn't, in any way, obliged or forced to do so?

I retract my "unfortunate design" statement, as Havoc pointed out
exactly why that design is the way it is, and it makes sense to me.

To quote the email that he wrote:
	The reason is that dbus views the world in a stateful way
	assuming that connections, and name ownership, can be tracked
	reliably. This is different from say http, and it's one reason
	that people used to Internet-oriented protocols find dbus
	strange.

I'm one of those "people used to internet-oriented protocols", and I bet
that almost all of us kernel developers also fall into that category, as
the kernel for the most part, is one big tool to help implement those
Internet-oriented protocols :)

The very history of D-Bus, where it came from, who is now using it, what
happened to all of the other proposed solutions in this area, is worth
examining if you are interested in it. This type of protocol solves a
real problem in this area, one that everyone has congregated on as the
best-known solution for that issue. It's used everywhere, on servers,
embedded systems, desktops, you name it. All languages have bindings
for it, and it's the underpinning of a modern Linux stack.

For us to somehow say that it's a "horrible protocol" is terribly
unfair, and unkind, to all of the people who have worked to make it the
best possible solution for this problem space. And honestly, I don't
have a better proposal. And I seriously doubt that anyone here does
either.

In the many years I've spent working on this, dbus has seemed to be odd,
and strange, to the way that the kernel has normally worked, because it
is. And that's not a bad thing, it's just different, and for us to
support real needs and requirements of our users, is the requirement of
the Linux kernel.

Now if there are technical problems or insecurities in the proposed code
submission, wonderful, please let me know and I'll be glad to work to
address them. But let's just drop the whole "oooh, look, D-Bus is
horrible looking, we can't support that!" argument; it is not a valid
justification.

And I'll defer back to the old AF_DBUS proposal, which was looked at
from a technical point of view of the network developers who said that
they didn't think that putting the D-Bus model into a network stack made
any sense from a technical point of view, and outlined their
objections. And they were right, hence this different proposal many
years later based on their insight and suggestions. If you have
objections like that, great, please let me know.

thanks,

greg k-h
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 8:56 ` Greg Kroah-Hartman @ 2015-04-15 11:06 ` One Thousand Gnomes 2015-04-15 16:00 ` Rik van Riel 0 siblings, 1 reply; 333+ messages in thread From: One Thousand Gnomes @ 2015-04-15 11:06 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Jiri Kosina, Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, Tom Gundersen, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni > To quote the email that he wrote: > The reason is that dbus views the world in a stateful way > assuming that connections, and name ownership, can be tracked > reliably. This is different from say http, and it's one reason > that people used to Internet-oriented protocols find dbus > strange. > > I'm one of those "people used to internet-oriented protocols", and I bet > that almost all of us kernel developers also fall into that category, as > the kernel for the most part, is one big tool to help implement those > Internet-oriented protocols :) I worked on protocols with state. I suffered X.25, X.29, coloured book, ISDN. It's a completely *crap* model. It has unfixable reliability problems. It has unfixable flow control problems. The only thing it buys you is the ability to have more traffic in flight between end points than you have transient memory for at the endpoints. You don't need a grand unified state to track service locations and access (ie names), which is fortunate or we'd be rebooting the internet and all attached computers all the time. > The very history of D-Bus, where it came from, who is now using it, what > happened to all of the other proposed solutions in this area, is worth > examining if you are interested in it. This type of protocol solves a History is why you got where you did. The history of Windows 98 explains how they got there. It doesn't mean that continuing the same mistake is a good idea. > embedded systems, desktops, you name it. 
All languages have bindings
> for it, and it's the underpinning of a modern Linux stack. For us to

Everything used to have just a choice of COBOL or FORTRAN bindings. That
was not a good reason to continue to program the world in either of them.

> that anyone here does either. In the many years I've spent working on
> this, dbus has seemed to be odd, and strange, to the way that the kernel
> has normally worked, because it is. And that's not a bad thing, it's
> just different, and for us to support real needs and requirements of our
> users, is the requirement of the Linux kernel.

There are I think a set of intertwined problems here

- An efficient delivery system for multicast messages delivered locally
  (be that MPI, dbus whatever - it's not "dbus or nothing")

- A kernel side dynamic namespace to describe what goes where

- A kernel side security model to describe who may receive what, and
  which additional information/tags/cred info

- Something that provides state to stuff that needs it (and probably
  belongs in userspace - dbus name service etc)

- Something that maps dbus and other models onto the kernel security
  model (and we have tools like EBPF which are very powerful)

- Something that maps the kernel layer onto models like MPI-3

> Now if there are technical problems or insecurities in the proposed code
> submission, wonderful, please let me know and I'll be glad to work to
> address them. But let's just drop the whole "oooh, look, D-Bus is
> horrible looking, we can't support that!", is not a valid justification.

We can however leave it in userspace until we understand the right small
clean way to support it and other needs.

At the moment, for example, cluster people can't really use this stuff
because it's not network aware, and HPC people can't use it because it's
got dbus hardwired into it, so it can't speak MPI-3 and the like, even
though MPI-3 has similar concepts around DPM, as well as having proper
models for parallelism and collective operations that are lacking in
dbus.

If the userspace folks choose to continue to implement dbust over it but
the kernel layer is clean and generic then all is good, because someone
can replace dbust with something better. If it's got dbust hard wired
into it then it's a complete mess.

Alan
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 11:06 ` One Thousand Gnomes @ 2015-04-15 16:00 ` Rik van Riel 2015-04-15 16:44 ` Havoc Pennington 0 siblings, 1 reply; 333+ messages in thread From: Rik van Riel @ 2015-04-15 16:00 UTC (permalink / raw) To: One Thousand Gnomes, Greg Kroah-Hartman Cc: Jiri Kosina, Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, Tom Gundersen, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On 04/15/2015 07:06 AM, One Thousand Gnomes wrote: >> that anyone here does either. In the many years I've spent working on >> this, dbus has seemed to be odd, and strange, to the way that the kernel >> has normally worked, because it is. And that's not a bad thing, it's >> just different, and for us to support real needs and requirements of our >> users, is the requirement of the Linux kernel. > > There are I think a set of intertwined problems here > > - An efficient delivery system for multicast messages delivered locally > (be that MPI, dbus whatever - it's not "dbus or nothing") > > - A kernel side dynamic namespace to describe what goes where > > - A kernel side security model to describe who may receive what, and > which additional information/tags/cred info > > - Something that provides state to stuff that needs it (and probably > belongs in userspace - dbus name service etc) > > - Something that maps dbus and other models onto the kernel security > model (and we have tools like EBPF which are very powerful) > > - Something that maps the kernel layer onto models like MPI-3 It is not clear to me why user space applications would have to change if the kernel bus used for dbus behaves differently from the userspace dbus daemon. Can't libdbus take care of the differences, and remove some of the problems highlighted by Alan (eg. the possibility of the protocol requiring the kernel to keep more messages in flight than we have memory for) ? ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 16:00 ` Rik van Riel @ 2015-04-15 16:44 ` Havoc Pennington 2015-04-15 18:16 ` Steven Rostedt ` (2 more replies) 0 siblings, 3 replies; 333+ messages in thread From: Havoc Pennington @ 2015-04-15 16:44 UTC (permalink / raw) To: Rik van Riel Cc: One Thousand Gnomes, Greg Kroah-Hartman, Jiri Kosina, Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, Tom Gundersen, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, Apr 15, 2015 at 12:00 PM, Rik van Riel <riel@redhat.com> wrote: > On 04/15/2015 07:06 AM, One Thousand Gnomes wrote: > >>> that anyone here does either. In the many years I've spent working on >>> this, dbus has seemed to be odd, and strange, to the way that the kernel >>> has normally worked, because it is. And that's not a bad thing, it's >>> just different, and for us to support real needs and requirements of our >>> users, is the requirement of the Linux kernel. >> >> There are I think a set of intertwined problems here >> >> - An efficient delivery system for multicast messages delivered locally >> (be that MPI, dbus whatever - it's not "dbus or nothing") >> >> - A kernel side dynamic namespace to describe what goes where >> >> - A kernel side security model to describe who may receive what, and >> which additional information/tags/cred info >> >> - Something that provides state to stuff that needs it (and probably >> belongs in userspace - dbus name service etc) >> >> - Something that maps dbus and other models onto the kernel security >> model (and we have tools like EBPF which are very powerful) >> >> - Something that maps the kernel layer onto models like MPI-3 When trying to split apart problems, for dbus it's important to keep ordering guarantees. 
That is, with dbus if I send a broadcast message, then send a unicast request to another client, then drop the connection causing the bus to broadcast that I've dropped; then the other client will see those things in that order - the broadcast, then the request, and then that I've dropped the connection. If you have separate facilities for these things, it could get hard to keep them in order. dbus uses the simple model that they stay in order because the bus conceptually has a single dispatch queue. By pushing everything through one queue, dbus is trying to reduce the number of codepaths in applications. Apps have a lot of new problems to solve if messages get their order scrambled. (dbus does NOT guarantee order across multiple clients, of course - there's no guarantee that all clients get the broadcast, before anyone gets the next message - each client has its own buffer on both read and write. The ordering is only with respect to each client's message stream.) Ordering is vital for tracking state, because if you're sending out events to describe changes in state, the order of those changes is important. Of course there are more complex ways to handle this over in distributed-systems-world. Havoc ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 16:44 ` Havoc Pennington @ 2015-04-15 18:16 ` Steven Rostedt 2015-04-15 18:40 ` Havoc Pennington 2015-04-15 20:22 ` Andy Lutomirski 2015-04-15 22:08 ` One Thousand Gnomes 2 siblings, 1 reply; 333+ messages in thread From: Steven Rostedt @ 2015-04-15 18:16 UTC (permalink / raw) To: Havoc Pennington Cc: Rik van Riel, One Thousand Gnomes, Greg Kroah-Hartman, Jiri Kosina, Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, Tom Gundersen, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, Apr 15, 2015 at 12:44:44PM -0400, Havoc Pennington wrote: > > By pushing everything through one queue, dbus is trying to reduce the > number of codepaths in applications. Apps have a lot of new problems > to solve if messages get their order scrambled. But can't a dbus library handle this for the apps? Like implementing TCP on top of UDP. I really doubt the entire dbus protocol needs to be pushed into the kernel. I'm going to try to spend some time reading about dbus and playing with the code (thanks for the links BTW!). Then I can see if I can come up with something too. Or at least be able to ask the right questions. -- Steve ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-15 18:16 ` Steven Rostedt
@ 2015-04-15 18:40 ` Havoc Pennington
0 siblings, 0 replies; 333+ messages in thread
From: Havoc Pennington @ 2015-04-15 18:40 UTC (permalink / raw)
To: Steven Rostedt
Cc: Rik van Riel, One Thousand Gnomes, Greg Kroah-Hartman, Jiri Kosina,
Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann,
Eric W. Biederman, Tom Gundersen, linux-kernel, Daniel Mack,
David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 2:16 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> But can't a dbus library handle this for the apps? Like implementing TCP on
> top of UDP. I really doubt the entire dbus protocol needs to be pushed into
> the kernel.

You could probably do something like assign sequence numbers,
temporarily relax ordering, and then reconstruct the order where
needed, but somebody still has to assign the sequence numbers in
order, and the bus has to process requests in order (it can't flip a
subscribe and an unsubscribe, for example). So I don't know whether
you could get anywhere with it or not.

The current model (userspace dbus daemon, don't know about kdbus) is
like this:

 - you have a pool of ordered incoming queues from each client, where
   each incoming queue conceptually ends with EOF of course
 - the main bus loop does:
    - pick the head message or EOF from any nonempty incoming queue
      for dispatch
    - route it according to destination address or subscribers
    - if the destination includes the bus itself (e.g. someone wanting
      to subscribe or own a name or whatever) then process the
      request... note that this will potentially affect how the next
      message gets routed
    - for each destination client, write the message to the ordered
      outgoing queue for that client
    - if the incoming queue has EOF then send out notifications about
      that to interested clients, clean up bus names, etc.

Conceptually, filling and draining the queues could easily be in
separate threads, though the userspace daemon doesn't do that.

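The main loop described above can be illustrated with a small toy model.
This is only a sketch under stated assumptions, not code from dbus-daemon
or kdbus: the class ToyBus, its method names, the "bus" destination and
the "subscribe"/"unsubscribe" request strings are all hypothetical and
exist purely to show why a single dispatch point keeps each client's
view ordered.

```python
from collections import deque

EOF = object()  # sentinel: a client's incoming queue conceptually ends with EOF


class ToyBus:
    """Hypothetical toy model of a single-dispatch bus (not real dbus code)."""

    def __init__(self):
        self.incoming = {}        # client name -> ordered queue of sent messages
        self.outgoing = {}        # client name -> ordered list of delivered messages
        self.subscribers = set()  # clients that asked to receive broadcasts

    def connect(self, name):
        self.incoming[name] = deque()
        self.outgoing[name] = []

    def send(self, sender, dest, body):
        # dest: a client name, "bus" for a request to the bus itself,
        # or None for a broadcast.
        self.incoming[sender].append((sender, dest, body))

    def disconnect(self, name):
        self.incoming[name].append(EOF)

    def dispatch_one(self):
        """Take the head of any nonempty incoming queue and route it.

        Returns False when every incoming queue is empty.
        """
        for name, queue in self.incoming.items():
            if queue:
                break
        else:
            return False

        msg = queue.popleft()
        if msg is EOF:
            # Clean up, then notify interested clients about the drop.
            del self.incoming[name]
            self.subscribers.discard(name)
            for other in sorted(self.subscribers):
                self.outgoing[other].append(("bus", other, name + " disconnected"))
            return True

        sender, dest, body = msg
        if dest == "bus":
            # Requests to the bus change how later messages are routed,
            # which is why they must be processed in order.
            if body == "subscribe":
                self.subscribers.add(sender)
            elif body == "unsubscribe":
                self.subscribers.discard(sender)
        elif dest is None:
            for sub in sorted(self.subscribers):
                if sub != sender:
                    self.outgoing[sub].append(msg)
        elif dest in self.incoming:
            self.outgoing[dest].append(msg)
        return True
```

With this model, a client that broadcasts, then sends a unicast request,
then disconnects is seen by a subscribed peer in exactly that order,
because all three events pass through the same dispatch step. Nothing is
promised about ordering across different receivers.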
> I'm going to try to spend some time reading about dbus and playing with the > code (thanks for the links BTW!). Then I can see if I can come up with > something too. Or at least be able to ask the right questions. > Greg may be right to point people to Lennart's C binding which has a lot less "baggage" than the GLib stuff, which assumes knowledge of the "glib way" Havoc ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 16:44 ` Havoc Pennington 2015-04-15 18:16 ` Steven Rostedt @ 2015-04-15 20:22 ` Andy Lutomirski 2015-04-15 20:41 ` Al Viro ` (3 more replies) 2015-04-15 22:08 ` One Thousand Gnomes 2 siblings, 4 replies; 333+ messages in thread From: Andy Lutomirski @ 2015-04-15 20:22 UTC (permalink / raw) To: Havoc Pennington Cc: Rik van Riel, One Thousand Gnomes, Greg Kroah-Hartman, Jiri Kosina, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, Tom Gundersen, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, Apr 15, 2015 at 9:44 AM, Havoc Pennington <hp@pobox.com> wrote: > On Wed, Apr 15, 2015 at 12:00 PM, Rik van Riel <riel@redhat.com> wrote: >> On 04/15/2015 07:06 AM, One Thousand Gnomes wrote: >> >>>> that anyone here does either. In the many years I've spent working on >>>> this, dbus has seemed to be odd, and strange, to the way that the kernel >>>> has normally worked, because it is. And that's not a bad thing, it's >>>> just different, and for us to support real needs and requirements of our >>>> users, is the requirement of the Linux kernel. >>> >>> There are I think a set of intertwined problems here >>> >>> - An efficient delivery system for multicast messages delivered locally >>> (be that MPI, dbus whatever - it's not "dbus or nothing") >>> >>> - A kernel side dynamic namespace to describe what goes where >>> >>> - A kernel side security model to describe who may receive what, and >>> which additional information/tags/cred info >>> >>> - Something that provides state to stuff that needs it (and probably >>> belongs in userspace - dbus name service etc) >>> >>> - Something that maps dbus and other models onto the kernel security >>> model (and we have tools like EBPF which are very powerful) >>> >>> - Something that maps the kernel layer onto models like MPI-3 > > When trying to split apart problems, for dbus it's important to keep > ordering guarantees. 
> > That is, with dbus if I send a broadcast message, then send a unicast > request to another client, then drop the connection causing the bus to > broadcast that I've dropped; then the other client will see those > things in that order - the broadcast, then the request, and then that > I've dropped the connection. This leads me to a potentially interesting question: where's the buffering? If there's a bus with lots of untrusted clients and one of them broadcasts data faster than all receivers can process it, where does it go? At least with a userspace solution, it's clear what the OOM killer should kill when this happens. Unless it's PID 1. Sigh. --Andy ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-15 20:22 ` Andy Lutomirski
@ 2015-04-15 20:41 ` Al Viro
2015-04-15 21:07 ` Rik van Riel
` (2 subsequent siblings)
3 siblings, 0 replies; 333+ messages in thread
From: Al Viro @ 2015-04-15 20:41 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Havoc Pennington, Rik van Riel, One Thousand Gnomes,
Greg Kroah-Hartman, Jiri Kosina, Linus Torvalds, Andrew Morton,
Arnd Bergmann, Eric W. Biederman, Tom Gundersen, linux-kernel,
Daniel Mack, David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 01:22:12PM -0700, Andy Lutomirski wrote:
> This leads me to a potentially interesting question: where's the
> buffering? If there's a bus with lots of untrusted clients and one of
> them broadcasts data faster than all receivers can process it, where
> does it go?
>
> At least with a userspace solution, it's clear what the OOM killer
> should kill when this happens. Unless it's PID 1. Sigh.

... and there is a PID 1 specimen that really likes to spew over dbus.
A lot. I had never been able to find out _why_ does systemd feel like
broadcasting all kinds of stuff from PID 1 - maybe somebody in this
thread can answer that.

For example, what's the point of broadcasting mount table updates, when
	* it can't hope to catch all individual changes - they _can_ get
lumped together, no matter what it tries.
	* any process can just as easily keep track of that data on its
own as it could by watching those broadcasts; parsing
/proc/self/mountinfo isn't harder than parsing notifications.
	* you need to start with obtaining the original state somehow,
or what would you apply those updates to?
	* if one insists on having a daemon doing such broadcasts, what
the hell is the point of having PID 1 do that? Exact same logics would
do just fine. Moreover, you could have one running in a namespace of
your session, which is something PID 1 won't see.

Sure, I understand why it wants to be aware of what's mounted and where
it's mounted.
Just as it wants to know what time it is. Should it broadcast a dbus message every second, just to tell everyone what had it found about the time? I'm somewhat tempted to propose AF_TWITTER - would match the style... ;-/ And frankly, this really looks like a social media braindamage - complete with status update broadcast every time a plane flies by... ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 20:22 ` Andy Lutomirski 2015-04-15 20:41 ` Al Viro @ 2015-04-15 21:07 ` Rik van Riel 2015-04-16 18:03 ` Djalal Harouni 2015-04-15 21:58 ` Havoc Pennington 2015-04-16 13:13 ` Tom Gundersen 3 siblings, 1 reply; 333+ messages in thread From: Rik van Riel @ 2015-04-15 21:07 UTC (permalink / raw) To: Andy Lutomirski, Havoc Pennington Cc: One Thousand Gnomes, Greg Kroah-Hartman, Jiri Kosina, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, Tom Gundersen, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On 04/15/2015 04:22 PM, Andy Lutomirski wrote: > On Wed, Apr 15, 2015 at 9:44 AM, Havoc Pennington <hp@pobox.com> wrote: >> On Wed, Apr 15, 2015 at 12:00 PM, Rik van Riel <riel@redhat.com> wrote: >>> On 04/15/2015 07:06 AM, One Thousand Gnomes wrote: >>> >>>>> that anyone here does either. In the many years I've spent working on >>>>> this, dbus has seemed to be odd, and strange, to the way that the kernel >>>>> has normally worked, because it is. And that's not a bad thing, it's >>>>> just different, and for us to support real needs and requirements of our >>>>> users, is the requirement of the Linux kernel. 
>>>> >>>> There are I think a set of intertwined problems here >>>> >>>> - An efficient delivery system for multicast messages delivered locally >>>> (be that MPI, dbus whatever - it's not "dbus or nothing") >>>> >>>> - A kernel side dynamic namespace to describe what goes where >>>> >>>> - A kernel side security model to describe who may receive what, and >>>> which additional information/tags/cred info >>>> >>>> - Something that provides state to stuff that needs it (and probably >>>> belongs in userspace - dbus name service etc) >>>> >>>> - Something that maps dbus and other models onto the kernel security >>>> model (and we have tools like EBPF which are very powerful) >>>> >>>> - Something that maps the kernel layer onto models like MPI-3 >> >> When trying to split apart problems, for dbus it's important to keep >> ordering guarantees. >> >> That is, with dbus if I send a broadcast message, then send a unicast >> request to another client, then drop the connection causing the bus to >> broadcast that I've dropped; then the other client will see those >> things in that order - the broadcast, then the request, and then that >> I've dropped the connection. > > This leads me to a potentially interesting question: where's the > buffering? If there's a bus with lots of untrusted clients and one of > them broadcasts data faster than all receivers can process it, where > does it go? > > At least with a userspace solution, it's clear what the OOM killer > should kill when this happens. Unless it's PID 1. Sigh. It may be useful to do the buffering (and general interception of any message that cannot be delivered) in a userspace program. Not only to get the buffers out of the kernel and into swappable memory, but also so people could re-use the same infrastructure for things like cluster communication (or communication between different containers) - the userspace daemons could take care of routing messages to and from the outside. 
They could also be useful to keep some of the policy stuff outside of the kernel, if only to ensure that the kernel side policy is not set in stone, and people can do things differently in the future if they want to. ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 21:07 ` Rik van Riel @ 2015-04-16 18:03 ` Djalal Harouni 0 siblings, 0 replies; 333+ messages in thread From: Djalal Harouni @ 2015-04-16 18:03 UTC (permalink / raw) To: Rik van Riel Cc: Andy Lutomirski, Havoc Pennington, One Thousand Gnomes, Greg Kroah-Hartman, Jiri Kosina, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, Tom Gundersen, linux-kernel, Daniel Mack, David Herrmann Hi, On Wed, Apr 15, 2015 at 05:07:28PM -0400, Rik van Riel wrote: [...] > > This leads me to a potentially interesting question: where's the > > buffering? If there's a bus with lots of untrusted clients and one of > > them broadcasts data faster than all receivers can process it, where > > does it go? > > > > At least with a userspace solution, it's clear what the OOM killer > > should kill when this happens. Unless it's PID 1. Sigh > > It may be useful to do the buffering (and general interception > of any message that cannot be delivered) in a userspace program. > > Not only to get the buffers out of the kernel and into swappable > memory, but also so people could re-use the same infrastructure > for things like cluster communication (or communication between > different containers) - the userspace daemons could take care of > routing messages to and from the outside. > > They could also be useful to keep some of the policy stuff > outside of the kernel, if only to ensure that the kernel side > policy is not set in stone, and people can do things differently > in the future if they want to. > kdbus connections have memory pools, please check kdbus.pool(7). The pool has its own quota accounting to prevent bad scenarios, and the memory is attributed to the connection. Messages that can't be delivered are not stored in the pool, but senders will get an appropriate error code. For further details on how this works, please see kdbus.message(7). If you are aware of any corner-cases we overlooked, please let us know. 
Regarding the policy, the implementation is hardly more complex than
traditional UNIX file permissions. Bus names may have multiple
permissions assigned, each of which consists of a bit-mask to denote
OWN, TALK and SEE flags which are applied to UIDs, GIDs or "world".
This policy has to be enforced by the kernel, therefore the information
it acts upon also needs to be stored there. For further details, please
see kdbus.policy(7).

The concept of a name policy originates from dbus1 [1], however we
simplified it substantially, removing features which we believe rather
belong in userspace.

[1] http://dbus.freedesktop.org/doc/dbus-daemon.1.html

--
Djalal Harouni
http://opendz.org
^ permalink raw reply [flat|nested] 333+ messages in thread
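As a rough illustration of the name policy described in the message
above, the following sketch models permission entries as a bit-mask of
OWN, TALK and SEE applied to a UID, a GID or "world". It is a minimal
model under assumptions, not the kdbus data structures or API: the names
Access, Rule and allowed() are hypothetical, and the flags are treated
as independent bits here, which may differ from how kdbus actually
relates the three access levels (see kdbus.policy(7) for the real
semantics).

```python
from dataclasses import dataclass
from enum import IntFlag


class Access(IntFlag):
    """Hypothetical flag values; only the three names come from the thread."""
    SEE = 1
    TALK = 2
    OWN = 4


@dataclass(frozen=True)
class Rule:
    kind: str       # "uid", "gid" or "world"
    ident: int      # UID or GID; ignored when kind is "world"
    access: Access  # bit-mask granted to the matching subject


def allowed(rules, uid, gids, wanted):
    """Return True if the rules for one bus name grant every bit in `wanted`."""
    granted = Access(0)
    for rule in rules:
        matches = (
            rule.kind == "world"
            or (rule.kind == "uid" and rule.ident == uid)
            or (rule.kind == "gid" and rule.ident in gids)
        )
        if matches:
            granted |= rule.access
    return (granted & wanted) == wanted
```

For example, a name whose rules give UID 1000 all three bits and give
"world" only SEE lets any other user see the name but neither talk to it
nor own it, which is the file-permission-like behaviour the message
compares it to.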
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 20:22 ` Andy Lutomirski 2015-04-15 20:41 ` Al Viro 2015-04-15 21:07 ` Rik van Riel @ 2015-04-15 21:58 ` Havoc Pennington 2015-04-16 13:13 ` Tom Gundersen 3 siblings, 0 replies; 333+ messages in thread From: Havoc Pennington @ 2015-04-15 21:58 UTC (permalink / raw) To: Andy Lutomirski Cc: Rik van Riel, One Thousand Gnomes, Greg Kroah-Hartman, Jiri Kosina, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, Tom Gundersen, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, Apr 15, 2015 at 4:22 PM, Andy Lutomirski <luto@amacapital.net> wrote: > > This leads me to a potentially interesting question: where's the > buffering? If there's a bus with lots of untrusted clients and one of > them broadcasts data faster than all receivers can process it, where > does it go? > > At least with a userspace solution, it's clear what the OOM killer > should kill when this happens. Unless it's PID 1. Sigh. > There's the history and there's the probably-should-happen. I'm sure this can be improved. What I think should probably happen is: - if a client is trying to send a message and the bus's incoming buffer from that client is full, the bus should stop reading (forcing the client to do its own buffering). - if a client is not consuming messages fast enough and the bus's outgoing buffer to that client fills up, the client should be disconnected. This would essentially copy how the X server works (again). The original userspace implementation has configurable buffer size limits and also limits on resources (such as number of connections and match rules) used by a single user, but I don't think it does the right things when limits are reached. When the incoming queue is full for a client, I'm not sure whether it stops reading from that client or sends the client errors, I don't remember. 
When the outgoing-from-the-daemon queue is full (a client isn't reading messages fast enough), if I remember right messages to that client are dropped with an error reply to the sender - this error probably gets ignored much of the time in practice, but in theory the sender could retry. A full outgoing queue for one client doesn't affect other clients, who are still able to receive messages. For broadcast messages, a full queue means a client will miss those broadcasts. Disconnecting might be better than this drop-the-message behavior, because clients could then assume that *either* they got all messages that were broadcast, *or* they got disconnected - they won't ever silently miss broadcasts and end up in a weird confused state. Xserver does this - if I'm reading the code correctly just now (xserver/os/io.c, FlushClient()), it buffers outgoing messages until realloc fails, and then it disconnects the client. If X didn't do this, then clients could miss events and become confused about the state of the server. The same will often apply in dbus scenarios. In practice right now APIs are designed and limits are configured to try to avoid ever hitting the limits (unless something is malicious or badly broken), because if you hit them things go to hell - much like running out of memory, or hitting file descriptor limits. Disconnecting slow-reading clients would probably improve this; the full buffer would be instantly freed, and the client could reconnect and re-establish all state it cares about, if it wants to. So it might gracefully recover sometimes, if the problem was transient. Havoc ^ permalink raw reply [flat|nested] 333+ messages in thread
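The "disconnect instead of drop" behaviour proposed in the message above
can be sketched as follows. This is a toy model under assumptions, not
the dbus-daemon or X server implementation: Client, deliver() and
broadcast() are hypothetical names, and the buffer limit is counted in
messages rather than bytes to keep the example short.

```python
class Client:
    """Hypothetical receiver with a bounded outgoing (bus-to-client) queue."""

    def __init__(self, name, limit):
        self.name = name
        self.limit = limit      # maximum number of queued messages
        self.outgoing = []
        self.connected = True


def deliver(client, message):
    """Queue a message for one client.

    If the client's queue is already full, the client is disconnected and
    its buffer is freed at once, so it can never silently miss a message
    while staying connected.
    """
    if not client.connected:
        return False
    if len(client.outgoing) >= client.limit:
        client.outgoing.clear()
        client.connected = False
        return False
    client.outgoing.append(message)
    return True


def broadcast(clients, message):
    """Deliver to every client; return the names that received the message."""
    return [c.name for c in clients if deliver(c, message)]
```

The design point is the invariant a receiver can rely on: either it saw
every broadcast, or it was disconnected and knows it must reconnect and
re-establish its state. A slow reader only affects itself; other clients
keep receiving.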
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 20:22 ` Andy Lutomirski ` (2 preceding siblings ...) 2015-04-15 21:58 ` Havoc Pennington @ 2015-04-16 13:13 ` Tom Gundersen 2015-04-16 14:34 ` Andy Lutomirski 2015-04-16 19:01 ` Havoc Pennington 3 siblings, 2 replies; 333+ messages in thread From: Tom Gundersen @ 2015-04-16 13:13 UTC (permalink / raw) To: Andy Lutomirski Cc: Havoc Pennington, Rik van Riel, One Thousand Gnomes, Greg Kroah-Hartman, Jiri Kosina, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On 04/15/2015 10:22 PM, Andy Lutomirski wrote: > On Wed, Apr 15, 2015 at 9:44 AM, Havoc Pennington <hp@pobox.com> wrote: >> That is, with dbus if I send a broadcast message, then send a unicast >> request to another client, then drop the connection causing the bus to >> broadcast that I've dropped; then the other client will see those >> things in that order - the broadcast, then the request, and then that >> I've dropped the connection. > > This leads me to a potentially interesting question: where's the > buffering? If there's a bus with lots of untrusted clients and one of > them broadcasts data faster than all receivers can process it, where > does it go? The concepts implemented in kdbus are actually quite different from dbus1: Every connection to the bus has a memory pool assigned to store incoming messages and variably sized runtime data returned by kdbus. The pool memory is swappable, backed by a shmem file which is associated with the bus connection. Also, broadcasts are opt-in, so you only receive them if you subscribed for the specific signal. It is either sent by another userspace task, or by the kernel itself for things like name owner changes. In order to receive those, a connection must install a match. By default, no-one will receive any broadcasts. 
All types of messages (unicast and broadcast) are directly stored into a pool slice of the receiving connection, and this slice is not reused by the kernel until userspace is finished with it and frees it. Hence, a client which doesn't process its incoming messages will, at some point, run out of pool space. If that happens for unicast messages, the sender will get an EXFULL error. If it happens for a multicast message, all we can do is drop the message, and tell the receiver how many messages have been lost when it issues KDBUS_CMD_RECV the next time. There's more on that in kdbus.message(7). Also note that there is a quota logic in kdbus which protects against a single connection conducting a DOS against another one. Together with the policy code, this logic prevents one peer from flooding the pool of another peer. Communication with a 3rd party is not affected by this, due to the fair allocation scheme of the pool logic. All this is explained in detail in kdbus.pool(7), but please let us know if anything there is unclear. > At least with a userspace solution, it's clear what the OOM killer > should kill when this happens. Unless it's PID 1. Sigh. No, if the buffering was done in the sender, the OOM killer would catch the sending peer, which is of course the wrong thing to do, because one connection could blow up a task simply by not responding to the messages it sends. This is the reason why the pool concept was a design principle in kdbus from the very beginning. Cheers, Tom ^ permalink raw reply [flat|nested] 333+ messages in thread
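The receiver-side pool behaviour described in the message above (a full
pool gives the sender EXFULL for unicast, while a broadcast is dropped
and counted for the receiver) can be modelled roughly like this. It is a
sketch under assumptions, not kdbus code: the Pool class and its methods
are hypothetical, capacity is a plain byte count, and recv() frees the
slice immediately, whereas the message says a real slice stays allocated
until userspace frees it.

```python
import errno

# EXFULL is a Linux errno; fall back to its Linux value (54) elsewhere.
EXFULL = getattr(errno, "EXFULL", 54)


class Pool:
    """Hypothetical per-connection receive pool with a fixed capacity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slices = []   # stored message payloads, in arrival order
        self.dropped = 0   # broadcasts lost since the last recv()

    def used(self):
        return sum(len(p) for p in self.slices)

    def store(self, payload, broadcast):
        """Store a message in the receiver's pool, or report that it is full."""
        if self.used() + len(payload) > self.capacity:
            if broadcast:
                # Nobody to report the failure to: drop and count.
                self.dropped += 1
                return
            # Unicast: the sender gets an error instead of blocking.
            raise OSError(EXFULL, "receiver pool is full")
        self.slices.append(payload)

    def recv(self):
        """Return (payload, dropped); payload is None if nothing is queued."""
        dropped, self.dropped = self.dropped, 0
        payload = self.slices.pop(0) if self.slices else None
        return payload, dropped
```

Because the memory belongs to the receiver, a client that stops reading
fills only its own pool; the sender's cost is an error return, not an
ever-growing buffer.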
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-16 13:13 ` Tom Gundersen @ 2015-04-16 14:34 ` Andy Lutomirski 2015-04-16 15:01 ` David Herrmann 2015-04-16 19:01 ` Havoc Pennington 1 sibling, 1 reply; 333+ messages in thread From: Andy Lutomirski @ 2015-04-16 14:34 UTC (permalink / raw) To: Tom Gundersen Cc: Havoc Pennington, Rik van Riel, One Thousand Gnomes, Greg Kroah-Hartman, Jiri Kosina, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Thu, Apr 16, 2015 at 6:13 AM, Tom Gundersen <teg@jklm.no> wrote: > On 04/15/2015 10:22 PM, Andy Lutomirski wrote: >> On Wed, Apr 15, 2015 at 9:44 AM, Havoc Pennington <hp@pobox.com> wrote: >>> That is, with dbus if I send a broadcast message, then send a unicast >>> request to another client, then drop the connection causing the bus to >>> broadcast that I've dropped; then the other client will see those >>> things in that order - the broadcast, then the request, and then that >>> I've dropped the connection. >> >> This leads me to a potentially interesting question: where's the >> buffering? If there's a bus with lots of untrusted clients and one of >> them broadcasts data faster than all receivers can process it, where >> does it go? > > The concepts implemented in kdbus are actually quite different from dbus1: > > Every connection to the bus has a memory pool assigned to store > incoming messages and variably sized runtime data returned by kdbus. > The pool memory is swappable, backed by a shmem file which is > associated with the bus connection. > > Also, broadcasts are opt-in, so you only receive them if you > subscribed for the specific signal. It is either sent by another > userspace task, or by the kernel itself for things like name owner > changes. In order to receive those, a connection must install a match. > By default, no-one will receive any broadcasts. 
> > All types of messages (unicast and broadcast) are directly stored into > a pool slice of the receiving connection, and this slice is not reused > by the kernel until userspace is finished with it and frees it. Hence, > a client which doesn't process its incoming messages will, at some > point, run out of pool space. If that happens for unicast messages, > the sender will get an EXFULL error. If it happens for a multicast > message, all we can do is drop the message, and tell the receiver how > many messages have been lost when it issues KDBUS_CMD_RECV the next > time. There's more on that in kdbus.message(7). > > Also note that there is a quota logic in kdbus which protects against > a single connection conducting a DOS against another one. Together > with the policy code, this logic prevents one peer from flooding the > pool of another peer. Communication with a 3rd party is not affected > by this, due to the fair allocation scheme of the pool logic. > > All this is explained in detail in kdbus.pool(7), but please let us > know if anything there is unclear. > This is neat, but it sounds like it will potentially add large amounts of latency under even mild memory pressure. Whose memcg does the pool use? If it's the receiver's, and if the receiver can configure a memcg, then it seems that even a single receiver could probably cause the sender to block for an unlimited amount of time. (And yes, I really hope that some day the cgroupns issues get resolved and some programs really will be able to create their own cgroups, even on systemd-using systems using the systemd-blessed configuration.) --Andy ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-16 14:34 ` Andy Lutomirski @ 2015-04-16 15:01 ` David Herrmann 2015-04-16 17:04 ` Andy Lutomirski 0 siblings, 1 reply; 333+ messages in thread From: David Herrmann @ 2015-04-16 15:01 UTC (permalink / raw) To: Andy Lutomirski Cc: Tom Gundersen, Havoc Pennington, Rik van Riel, One Thousand Gnomes, Greg Kroah-Hartman, Jiri Kosina, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, linux-kernel, Daniel Mack, Djalal Harouni Hi On Thu, Apr 16, 2015 at 4:34 PM, Andy Lutomirski <luto@amacapital.net> wrote: > Whose memcg does the pool use? The pool-owner's (i.e., the receiver's). > If it's the receiver's, and if the > receiver can configure a memcg, then it seems that even a single > receiver could probably cause the sender to block for an unlimited > amount of time. How? Which of those calls can block? I don't see how that can happen. Thanks David ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-16 15:01 ` David Herrmann @ 2015-04-16 17:04 ` Andy Lutomirski 2015-04-17 9:19 ` Michal Hocko 0 siblings, 1 reply; 333+ messages in thread From: Andy Lutomirski @ 2015-04-16 17:04 UTC (permalink / raw) To: David Herrmann Cc: Tom Gundersen, Havoc Pennington, Rik van Riel, One Thousand Gnomes, Greg Kroah-Hartman, Jiri Kosina, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, linux-kernel, Daniel Mack, Djalal Harouni On Thu, Apr 16, 2015 at 8:01 AM, David Herrmann <dh.herrmann@gmail.com> wrote: > Hi > > On Thu, Apr 16, 2015 at 4:34 PM, Andy Lutomirski <luto@amacapital.net> wrote: >> Whose memcg does the pool use? > > The pool-owner's (i.e., the receiver's). > >> If it's the receiver's, and if the >> receiver can configure a memcg, then it seems that even a single >> receiver could probably cause the sender to block for an unlimited >> amount of time. > > How? Which of those calls can block? I don't see how that can happen. I admit I don't fully understand memcg, but vfs_iter_write is presumably going to need to get write access to the target pool page, and that, in turn, will need that page to exist in memory and to be writable, which may need to page it in and/or allocate a page. If that uses the receiver's memcg (as it should), then the receiver can make it block. Even if it doesn't use the receiver's memcg, it can trigger direct reclaim, I think. --Andy ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-16 17:04 ` Andy Lutomirski @ 2015-04-17 9:19 ` Michal Hocko 2015-04-17 18:54 ` Andy Lutomirski 0 siblings, 1 reply; 333+ messages in thread From: Michal Hocko @ 2015-04-17 9:19 UTC (permalink / raw) To: Andy Lutomirski Cc: David Herrmann, Tom Gundersen, Havoc Pennington, Rik van Riel, One Thousand Gnomes, Greg Kroah-Hartman, Jiri Kosina, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, linux-kernel, Daniel Mack, Djalal Harouni On Thu 16-04-15 10:04:17, Andy Lutomirski wrote: > On Thu, Apr 16, 2015 at 8:01 AM, David Herrmann <dh.herrmann@gmail.com> wrote: > > Hi > > > > On Thu, Apr 16, 2015 at 4:34 PM, Andy Lutomirski <luto@amacapital.net> wrote: > >> Whose memcg does the pool use? > > > > The pool-owner's (i.e., the receiver's). > > > >> If it's the receiver's, and if the > >> receiver can configure a memcg, then it seems that even a single > >> receiver could probably cause the sender to block for an unlimited > >> amount of time. > > > > How? Which of those calls can block? I don't see how that can happen. > > I admit I don't fully understand memcg, but vfs_iter_write is > presumably going to need to get write access to the target pool page, > and that, in turn, will need that page to exist in memory and to be > writable, which may need to page it in and/or allocate a page. If > that uses the receiver's memcg (as it should), then the receiver can > make it block. Even if it doesn't use the receiver's memcg, it can > trigger direct reclaim, I think. Yes, memcg direct reclaim might trigger but we are no longer waiting for the OOM victim from non page fault paths so the time is bounded. It still might take quite some time, though, depending on the amount of work done in the direct reclaim. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-17 9:19 ` Michal Hocko @ 2015-04-17 18:54 ` Andy Lutomirski 2015-04-20 12:43 ` Michal Hocko 0 siblings, 1 reply; 333+ messages in thread From: Andy Lutomirski @ 2015-04-17 18:54 UTC (permalink / raw) To: Michal Hocko Cc: David Herrmann, Tom Gundersen, Havoc Pennington, Rik van Riel, One Thousand Gnomes, Greg Kroah-Hartman, Jiri Kosina, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, linux-kernel, Daniel Mack, Djalal Harouni On Fri, Apr 17, 2015 at 2:19 AM, Michal Hocko <mhocko@suse.cz> wrote: > On Thu 16-04-15 10:04:17, Andy Lutomirski wrote: >> On Thu, Apr 16, 2015 at 8:01 AM, David Herrmann <dh.herrmann@gmail.com> wrote: >> > Hi >> > >> > On Thu, Apr 16, 2015 at 4:34 PM, Andy Lutomirski <luto@amacapital.net> wrote: >> >> Whose memcg does the pool use? >> > >> > The pool-owner's (i.e., the receiver's). >> > >> >> If it's the receiver's, and if the >> >> receiver can configure a memcg, then it seems that even a single >> >> receiver could probably cause the sender to block for an unlimited >> >> amount of time. >> > >> > How? Which of those calls can block? I don't see how that can happen. >> >> I admit I don't fully understand memcg, but vfs_iter_write is >> presumably going to need to get write access to the target pool page, >> and that, in turn, will need that page to exist in memory and to be >> writable, which may need to page it in and/or allocate a page. If >> that uses the receiver's memcg (as it should), then the receiver can >> make it block. Even if it doesn't use the receiver's memcg, it can >> trigger direct reclaim, I think. > > Yes, memcg direct reclaim might trigger but we are no longer waiting for > the OOM victim from non page fault paths so the time is bounded. It > still might a quite some time, though, depending on the amount of work > done in the direct reclaim. Is that still true if OOM notifiers are involved? I've lost track of what changed there. 
In any event, I'm not entirely convinced that having a broadcast send cause, say, PID 1 to block until an unbounded number of pages in a potentially unbounded number of memcgs are reclaimed is a good idea. In the kdbus model's favor, I think that allowing pages of data in the receive queue to be swapped out is potentially quite nice, but I'm less convinced about non-full pages in the receive queue. There's a resource management tradeoff here, and one nice thing about AF_UNIX is that sends are genuinely non-blocking. --Andy ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-17 18:54 ` Andy Lutomirski @ 2015-04-20 12:43 ` Michal Hocko 2015-04-20 20:03 ` Andy Lutomirski 0 siblings, 1 reply; 333+ messages in thread From: Michal Hocko @ 2015-04-20 12:43 UTC (permalink / raw) To: Andy Lutomirski Cc: David Herrmann, Tom Gundersen, Havoc Pennington, Rik van Riel, One Thousand Gnomes, Greg Kroah-Hartman, Jiri Kosina, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, linux-kernel, Daniel Mack, Djalal Harouni On Fri 17-04-15 11:54:42, Andy Lutomirski wrote: > On Fri, Apr 17, 2015 at 2:19 AM, Michal Hocko <mhocko@suse.cz> wrote: > > On Thu 16-04-15 10:04:17, Andy Lutomirski wrote: > >> On Thu, Apr 16, 2015 at 8:01 AM, David Herrmann <dh.herrmann@gmail.com> wrote: > >> > Hi > >> > > >> > On Thu, Apr 16, 2015 at 4:34 PM, Andy Lutomirski <luto@amacapital.net> wrote: > >> >> Whose memcg does the pool use? > >> > > >> > The pool-owner's (i.e., the receiver's). > >> > > >> >> If it's the receiver's, and if the > >> >> receiver can configure a memcg, then it seems that even a single > >> >> receiver could probably cause the sender to block for an unlimited > >> >> amount of time. > >> > > >> > How? Which of those calls can block? I don't see how that can happen. > >> > >> I admit I don't fully understand memcg, but vfs_iter_write is > >> presumably going to need to get write access to the target pool page, > >> and that, in turn, will need that page to exist in memory and to be > >> writable, which may need to page it in and/or allocate a page. If > >> that uses the receiver's memcg (as it should), then the receiver can > >> make it block. Even if it doesn't use the receiver's memcg, it can > >> trigger direct reclaim, I think. > > > > Yes, memcg direct reclaim might trigger but we are no longer waiting for > > the OOM victim from non page fault paths so the time is bounded. It > > still might a quite some time, though, depending on the amount of work > > done in the direct reclaim. 
> > Is that still true if OOM notifiers are involved? I've lost track of > what changed there. memcg OOM is not triggered from get_user_pages. See 519e52473ebe (mm: memcg: enable memcg OOM killer only for user faults) > Any any event, I'm not entirely convinced that having a broadcast send > cause, say, PID 1 to block until an unbounded number of pages in a > potentially unbounded number of memcgs are reclaimed is a good idea. This deserves a clarification I guess. It is the memcg of the current task which gets charged during the page fault normally. So if PID1 tries to fault the memory in it will be its (most probably root) memcg which gets charged. If the memory was already charged to a different task's memcg and then it got swapped out, though, the PID1 would indeed wait for the reclaim in the target memcg to swap the page back in. In either case this sounds like a potential problem, because tasks could hide their memory charges from the limit or PID1 context could be blocked. But maybe I just misunderstood the and an uncharged memory cannot be used for the buffer. > In the kdbus model's favor, I think that allowing pages of data in the > receive queue to be swapped out is potentially quite nice, but I'm > less convinced about non-full pages in the receive queue. There's a > resource management tradeoff here, and one nice thing about AF_UNIX is > that sends are genuinely non-blocking. > > --Andy -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-20 12:43 ` Michal Hocko @ 2015-04-20 20:03 ` Andy Lutomirski 0 siblings, 0 replies; 333+ messages in thread From: Andy Lutomirski @ 2015-04-20 20:03 UTC (permalink / raw) To: Michal Hocko Cc: David Herrmann, Tom Gundersen, Havoc Pennington, Rik van Riel, One Thousand Gnomes, Greg Kroah-Hartman, Jiri Kosina, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, linux-kernel, Daniel Mack, Djalal Harouni On Mon, Apr 20, 2015 at 5:43 AM, Michal Hocko <mhocko@suse.cz> wrote: > On Fri 17-04-15 11:54:42, Andy Lutomirski wrote: >> On Fri, Apr 17, 2015 at 2:19 AM, Michal Hocko <mhocko@suse.cz> wrote: >> > On Thu 16-04-15 10:04:17, Andy Lutomirski wrote: >> >> On Thu, Apr 16, 2015 at 8:01 AM, David Herrmann <dh.herrmann@gmail.com> wrote: >> >> > Hi >> >> > >> >> > On Thu, Apr 16, 2015 at 4:34 PM, Andy Lutomirski <luto@amacapital.net> wrote: >> >> >> Whose memcg does the pool use? >> >> > >> >> > The pool-owner's (i.e., the receiver's). >> >> > >> >> >> If it's the receiver's, and if the >> >> >> receiver can configure a memcg, then it seems that even a single >> >> >> receiver could probably cause the sender to block for an unlimited >> >> >> amount of time. >> >> > >> >> > How? Which of those calls can block? I don't see how that can happen. >> >> >> >> I admit I don't fully understand memcg, but vfs_iter_write is >> >> presumably going to need to get write access to the target pool page, >> >> and that, in turn, will need that page to exist in memory and to be >> >> writable, which may need to page it in and/or allocate a page. If >> >> that uses the receiver's memcg (as it should), then the receiver can >> >> make it block. Even if it doesn't use the receiver's memcg, it can >> >> trigger direct reclaim, I think. >> > >> > Yes, memcg direct reclaim might trigger but we are no longer waiting for >> > the OOM victim from non page fault paths so the time is bounded. 
It >> > still might a quite some time, though, depending on the amount of work >> > done in the direct reclaim. >> >> Is that still true if OOM notifiers are involved? I've lost track of >> what changed there. > > memcg OOM is not triggered from get_user_pages. See 519e52473ebe (mm: > memcg: enable memcg OOM killer only for user faults) > >> Any any event, I'm not entirely convinced that having a broadcast send >> cause, say, PID 1 to block until an unbounded number of pages in a >> potentially unbounded number of memcgs are reclaimed is a good idea. > > This deserves a clarification I guess. It is the memcg of the current > task which gets charged during the page fault normally. So if PID1 tries > to fault the memory in it will be its (most probably root) memcg which > gets charged. If the memory was already charged to a different task's > memcg and then it got swapped out, though, the PID1 would indeed wait > for the reclaim in the target memcg to swap the page back in. > > In either case this sounds like a potential problem, because tasks > could hide their memory charges from the limit or PID1 context could > be blocked. But maybe I just misunderstood the and an uncharged memory > cannot be used for the buffer. > Hmm. One of the explicit design goals of kdbus is for sandboxing, i.e. creating a restricted view ("endpoint") and letting sandboxed things talk to non-sandboxed things outside through that restricted view. Given that, the ability for a broadcast receiver to cause a sender (PID 1?) to allocate root-memcg pages seems like it could be a problem. --Andy ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-16 13:13 ` Tom Gundersen 2015-04-16 14:34 ` Andy Lutomirski @ 2015-04-16 19:01 ` Havoc Pennington 2015-04-17 13:23 ` Daniel Mack 1 sibling, 1 reply; 333+ messages in thread From: Havoc Pennington @ 2015-04-16 19:01 UTC (permalink / raw) To: Tom Gundersen Cc: Andy Lutomirski, Rik van Riel, One Thousand Gnomes, Greg Kroah-Hartman, Jiri Kosina, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Thu, Apr 16, 2015 at 9:13 AM, Tom Gundersen <teg@jklm.no> wrote: > All types of messages (unicast and broadcast) are directly stored into > a pool slice of the receiving connection, and this slice is not reused > by the kernel until userspace is finished with it and frees it. Hence, > a client which doesn't process its incoming messages will, at some > point, run out of pool space. If that happens for unicast messages, > the sender will get an EXFULL error. If it happens for a multicast > message, all we can do is drop the message, and tell the receiver how > many messages have been lost when it issues KDBUS_CMD_RECV the next > time. There's more on that in kdbus.message(7). > Have you guys already grappled with what libraries/apps should do with this information? To handle the knowledge that "N messages have been lost," it seems like the client must answer "are there any messages that, if lost, would put any code using this connection into a confused state" and then the client has to recover from said confused state. A library probably can't do this - it doesn't know what state matters or how to recover it - so each app would have to... and are connections ever shared between modules of an app? (for example: could a library such as GTK+ or pulseaudio be using the connection, and then application code is also using the connection, so none of those code modules has the whole picture... at that point, none of the modules knows what to do about lost messages... 
to try to handle lost messages in a module, you'd need a private connection(?)... which might be fine as long as each app having a number of connections isn't too bloated.) How to handle a send error depends a lot on what's being sent... but if I were writing a general-purpose library wrapper, I'd be very tempted to hide EXFULL behind an unbounded (or very-high-bounded) userspace send buffer, which of course is what you were trying to avoid, but I am skeptical that the average app will handle this error sensibly. The traditional userspace bus isn't any better than what you've described here, of course - it's even worse - and it works well enough. The limits are simply set high enough that they won't be hit unless someone's broken or evil. Which is also the traditional approach to say file descriptor limits or swap space: set the limit high and hope you won't reach it. For the case of the X server, the limit on message buffers appears to be "until malloc fails," so they have the limit quite high, higher than userspace dbus does. "set high limits and don't hit them" is a tried-and-true approach. With either the existing userspace bus or kdbus, I bet you could come up with ways to use limit exhaustion to get various services and apps into confused states as they miss messages they were relying on, simply because this is too hard for apps to reliably get right. The lower the limits, the easier it would be to cause trouble by forcing them to be hit. In a perfect world we could figure out which client is "at fault" for filling a buffer - the slow receiver or the overzealous sender - so we could throttle or disconnect the guilty party instead of throwing errors that won't be handled well ... but not sure that's practical. Havoc ^ permalink raw reply [flat|nested] 333+ messages in thread
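The library wrapper Havoc is tempted by, one that hides EXFULL behind a userspace send buffer, could look roughly like the following. It is a sketch only: `BufferedSender`, `raw_send` and `max_backlog` are hypothetical names, and `raw_send` stands in for whatever call actually hands a message to the bus and is assumed to return 0 or -EXFULL.

```python
import errno
from collections import deque

# EXFULL exists on Linux; fall back to its Linux value elsewhere.
EXFULL = getattr(errno, "EXFULL", 54)


class BufferedSender:
    """Hide -EXFULL from callers by queueing unsent messages in userspace."""

    def __init__(self, raw_send, max_backlog=None):
        self.raw_send = raw_send        # hypothetical: returns 0 or -EXFULL
        self.max_backlog = max_backlog  # None means unbounded
        self.backlog = deque()

    def send(self, msg):
        """Return True if the message was sent or queued, False if refused."""
        # Keep ordering: never overtake messages that are already waiting.
        if not self.backlog and self.raw_send(msg) == 0:
            return True
        if self.max_backlog is not None and len(self.backlog) >= self.max_backlog:
            return False                # even a high bound can be reached
        self.backlog.append(msg)
        return True

    def flush(self):
        """Retry queued messages in order; return how many are still queued."""
        while self.backlog:
            if self.raw_send(self.backlog[0]) != 0:
                break
            self.backlog.popleft()
        return len(self.backlog)
```

With `max_backlog=None` the buffering has simply moved back into the sender, which is the situation the pool design set out to avoid; with a bound, `send()` can still return False, so the caller has not escaped the error, only postponed it.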
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-16 19:01 ` Havoc Pennington @ 2015-04-17 13:23 ` Daniel Mack 2015-04-17 14:54 ` Havoc Pennington 0 siblings, 1 reply; 333+ messages in thread From: Daniel Mack @ 2015-04-17 13:23 UTC (permalink / raw) To: Havoc Pennington, Tom Gundersen Cc: Andy Lutomirski, Rik van Riel, One Thousand Gnomes, Greg Kroah-Hartman, Jiri Kosina, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, linux-kernel, David Herrmann, Djalal Harouni Hi Havoc, On 04/16/2015 09:01 PM, Havoc Pennington wrote: > On Thu, Apr 16, 2015 at 9:13 AM, Tom Gundersen <teg@jklm.no> wrote: >> All types of messages (unicast and broadcast) are directly stored into >> a pool slice of the receiving connection, and this slice is not reused >> by the kernel until userspace is finished with it and frees it. Hence, >> a client which doesn't process its incoming messages will, at some >> point, run out of pool space. If that happens for unicast messages, >> the sender will get an EXFULL error. If it happens for a multicast >> message, all we can do is drop the message, and tell the receiver how >> many messages have been lost when it issues KDBUS_CMD_RECV the next >> time. There's more on that in kdbus.message(7). > > Have you guys already grappled with what libraries/apps should do with > this information? > > To handle the knowledge that "N messages have been lost," it seems > like the client must answer "are there any messages that, if lost, > would put any code using this connection into a confused state" and > then the client has to recover from said confused state. This can only happen with user-originated DBus signal messages. For unicast messages such as method calls, the sender will actually see -EXFULL, and no part of the message is transmitted, leaving neither side in a confused state. 
But yes, for broadcast signal messages, we can't reject the sender because one single peer is out of buffer space, and we can't allow boundless allocations on the receiver either, so informing the other side is the best we can do. Note that dbus-daemon just drops such signals silently. So with this counter we simply add a debug mechanism for now. There hasn't been a consensus on how to react to such errors on the application level. The easiest way is obviously to re-sync all your state with the peer (which could be as easy as calling ObjectManager.GetManagedObjects() or Properties.GetAll()). > A library probably can't do this - it doesn't know what state matters > or how to recover it - so each app would have to... and are > connections ever shared between modules of an app? (for example: could > a library such as GTK+ or pulseaudio be using the connection, and then > application code is also using the connection, so none of those code > modules has the whole picture... at that point, none of the modules > knows what to do about lost messages... to try to handle lost messages > in a module, you'd need a private connection(?)... which might be fine > as long as each app having a number of connections isn't too bloated.) > > How to handle a send error depends a lot on what's being sent... but > if I were writing a general-purpose library wrapper, I'd be very > tempted to hide EXFULL behind an unbounded (or very-high-bounded) > userspace send buffer, which of course is what you were trying to > avoid, but I am skeptical that the average app will handle this error > sensibly. Actually, we see no real difference between constrained outgoing or incoming buffers. Even with a very-high-bounded send-buffer, you still need to deal with it running full. > The traditional userspace bus isn't any better than what you've > described here, of course - it's even worse - and it works well > enough. 
The limits are simply set high enough that they won't be hit > unless someone's broken or evil. Which is also the traditional > approach to say file descriptor limits or swap space: set the limit > high and hope you won't reach it. For the case of the X server, the > limit on message buffers appears to be "until malloc fails," so they > have the limit quite high, higher than userspace dbus does. "set high > limits and don't hit them" is a tried-and-true approach. > > With either the existing userspace bus or kdbus, I bet you could come > up with ways to use limit exhaustion to get various services and apps > into confused states as they miss messages they were relying on, > simply because this is too hard for apps to reliably get right. The > lower the limits, the easier it would be to cause trouble by forcing > them to be hit. > > In a perfect world we could figure out which client is "at fault" for > filling a buffer - the slow receiver or the overzealous sender - so we > could throttle or disconnect the guilty party instead of throwing > errors that won't be handled well ... but not sure that's practical. Exactly, you need heuristics for that. It's non-trivial to figure out whether the receiver or sender is to blame. We've thought about how to address that for a while and came up with a quota logic that is similar to what dbus-daemon implements in order to prevent single connections from overflowing the pool of a receiver. The limits that apply to that are currently hard-coded, and they work well on our systems. In the future, they can easily be made a bus-wide property that can be configured at bus creation time. Thanks, Daniel ^ permalink raw reply [flat|nested] 333+ messages in thread
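The "re-sync all your state" reaction Daniel mentions can be sketched as a small cache that falls back to a full fetch whenever a receive reports lost broadcasts. Everything named here is an assumption made for illustration: `PropertyCache` is invented, `get_all` is a hypothetical stand-in for a call such as Properties.GetAll(), and `lost` is whatever drop count the receive path reports.

```python
class PropertyCache:
    """Mirror of a peer's properties, kept current by change signals."""

    def __init__(self, get_all):
        self.get_all = get_all          # hypothetical callable returning a dict
        self.props = dict(get_all())
        self.resyncs = 0

    def on_recv(self, signal, lost):
        """Apply one (name, value) signal; `lost` is the reported drop count."""
        if lost > 0:
            # Some signals never arrived, so incremental updates cannot be
            # trusted; fetch the full state instead of applying this signal.
            self.props = dict(self.get_all())
            self.resyncs += 1
            return
        if signal is not None:
            name, value = signal
            self.props[name] = value
```

This only works for state that can be re-read in full; a signal that carries a one-off event rather than a property change has nothing to re-sync from, which is part of why the thread calls the general problem hard.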
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-17 13:23 ` Daniel Mack @ 2015-04-17 14:54 ` Havoc Pennington 0 siblings, 0 replies; 333+ messages in thread From: Havoc Pennington @ 2015-04-17 14:54 UTC (permalink / raw) To: Daniel Mack Cc: Tom Gundersen, Andy Lutomirski, Rik van Riel, One Thousand Gnomes, Greg Kroah-Hartman, Jiri Kosina, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, linux-kernel, David Herrmann, Djalal Harouni On Fri, Apr 17, 2015 at 9:23 AM, Daniel Mack <daniel@zonque.org> wrote: > > This can only happen with user-originated DBus signal messages. For > unicast messages such as method calls, the sender will actually see > -EXFULL, and no part of the message is transmitted, leaving neither side > in a confused state. Well - big asterisk, * no confused state IF the sender handles EXFULL in a reasonable way. Which it probably doesn't most of the time :-) but as you say it's no worse than it ever was. > But yes, for broadcast signal messages, we can't > reject the sender because one single peer is out of buffer space, and we > can't allow boundless allocations on the receiver either, so informing > the other side is the best we can do. If this was ever going to happen (if the limits weren't high), I do think it would be better to disconnect or throttle/backpressure somehow, instead of breaking semantics. But the trouble is figuring out how to do that... I don't know how. So the alternative is to set high limits. I think you're fine, it obviously works OK with the current userspace daemon that punts in a similar way, and unix has a long tradition of limits like this plus applications sucking at handling the "limit reached" errors. It'll all work out... > Note that dbus-daemon just drops such signals silently. So with this > counter we simply add a debug mechanism for now. There hasn't been a > consensus on how to react to such errors on the application level. 
The > easiest way is obviously to re-sync all your state with the peer (which > could be as easy as calling ObjectManager.GetManagedObjects() or > Properties.GetAll()). It's not realistic to expect the bulk of apps to handle this thing. Special system services such as pid 1, you probably have the expertise and time to try to carefully restore all state. Regular old apps will get confused in practice if limits are hit in practice, but people will configure the limits such that they're only hit if there's some pathology going on. >> How to handle a send error depends a lot on what's being sent... but >> if I were writing a general-purpose library wrapper, I'd be very >> tempted to hide EXFULL behind an unbounded (or very-high-bounded) >> userspace send buffer, which of course is what you were trying to >> avoid, but I am skeptical that the average app will handle this error >> sensibly. > > Actually, we see no real difference between constrained outgoing or > incoming buffers. Even with a very-high-bounded send-buffer, you still > need to deal with it running full. What I'm saying is that there's a practical difference between limits low enough to be hit in normal operation, and limits high enough that somebody has to be evil/broken before you hit them. With the "throw an error" setup, if you set the limits low enough to be hit in practice, then userspace will be buggy and break - that's my prediction at least. It's not different from say the file descriptor limits. If you crank down your allowed open descriptors such that a user session actually hits the limit, pretty much the session isn't usable. That's all I'm saying. If you wanted to be able to configure the limits low, where they'd be hit in practice, then I think you'd want to look at some solution other than tossing these errors that people will fail to handle correctly, even if that solution were complex and/or heuristic. If you set the limits high, it doesn't really matter so you can KISS. 
It's sort of an academic point ... tons of kernel features already have this issue. So carry on, you're good. :-) Havoc ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 16:44 ` Havoc Pennington 2015-04-15 18:16 ` Steven Rostedt 2015-04-15 20:22 ` Andy Lutomirski @ 2015-04-15 22:08 ` One Thousand Gnomes 2015-04-16 13:14 ` Daniel Mack 2 siblings, 1 reply; 333+ messages in thread From: One Thousand Gnomes @ 2015-04-15 22:08 UTC (permalink / raw) To: Havoc Pennington Cc: Rik van Riel, Greg Kroah-Hartman, Jiri Kosina, Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, Tom Gundersen, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni > When trying to split apart problems, for dbus it's important to keep > ordering guarantees. Yes I assumed that - minus disconnection/reconnect and running out of queue space. Some users also want priority queueing (with or without the guarantee for the same priority). Many of the other systems that can use a fast multicast messaging system have priority queues - which is one reason the existing POSIX messaging has priority. > That is, with dbus if I send a broadcast message, then send a unicast > request to another client, then drop the connection causing the bus to > broadcast that I've dropped; then the other client will see those > things in that order - the broadcast, then the request, and then that > I've dropped the connection. That's a simple matter of refcounting the buffers 8). I'm not really concerned about the low level queue side of things. The proposed implementation looks horribly convoluted for what the sk_buff layer can already do standing on one leg. We know how to implement that part cleanly, and its probably not hard to nail onto AF_UNIX or to expand posix message queues to provide that service (and maybe then even convince POSIX about it) If it was just "here's a general purpose multicast message service" in a small clean chunk of code I'd be cheering it into the tree. 
Even if you need complicated filter rules because we can use EBPF to allow the client library to do really sophisticated filtering and avoid wakeups for noise. It's the complexity, the attachment to a lot of state in kernel and the fact it doesn't appear to solve the general purpose problems that bothers me. > By pushing everything through one queue, dbus is trying to reduce the > number of codepaths in applications. Apps have a lot of new problems > to solve if messages get their order scrambled. And I assume any user space solution for that purpose would end up re-ordering messages if they could get shuffled so its > (dbus does NOT guarantee order across multiple clients, of course - > there's no guarantee that all clients get the broadcast, before anyone > gets the next message - each client has its own buffer on both read > and write. The ordering is only with respect to each client's message > stream.) > > Ordering is vital for tracking state, because if you're sending out > events to describe changes in state, the order of those changes is > important. Most of the time IMHO you don't want to listen to changes in state, you want to notice that the state wasn't the value it was before and adapt. > Of course there are more complex ways to handle this over in > distributed-systems-world. And publish/subscribe models - which for certain uses scale better, are easier to make reliable and avoid a lot of the mess. Alan ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-15 22:08 ` One Thousand Gnomes @ 2015-04-16 13:14 ` Daniel Mack 2015-04-16 17:15 ` One Thousand Gnomes 0 siblings, 1 reply; 333+ messages in thread From: Daniel Mack @ 2015-04-16 13:14 UTC (permalink / raw) To: One Thousand Gnomes, Havoc Pennington Cc: Rik van Riel, Greg Kroah-Hartman, Jiri Kosina, Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, Tom Gundersen, linux-kernel, David Herrmann, Djalal Harouni On 04/16/2015 12:08 AM, One Thousand Gnomes wrote: >> When trying to split apart problems, for dbus it's important to keep >> ordering guarantees. > > Yes I assumed that - minus disconnection/reconnect and running out of > queue space. Some users also want priority queueing (with or without the > guarantee for the same priority). Many of the other systems that can use > a fast multicast messaging system have priority queues - which is one > reason the existing POSIX messaging has priority. And so does kdbus. By default, strict ordering is enforced when messages are received, but optionally, that action may be constrained to messages of a minimal priority. This allows for use cases where timing critical data is interleaved with control data on the same connection. That's described in kdbus.message(7), and is also covered by test cases. Thanks, Daniel ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-16 13:14 ` Daniel Mack @ 2015-04-16 17:15 ` One Thousand Gnomes 0 siblings, 0 replies; 333+ messages in thread From: One Thousand Gnomes @ 2015-04-16 17:15 UTC (permalink / raw) To: Daniel Mack Cc: Havoc Pennington, Rik van Riel, Greg Kroah-Hartman, Jiri Kosina, Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, Tom Gundersen, linux-kernel, David Herrmann, Djalal Harouni > And so does kdbus. By default, strict ordering is enforced when messages > are received, but optionally, that action may be constrained to messages > of a minimal priority. This allows for use cases where timing critical > data is interleaved with control data on the same connection. That's > described in kdbus.message(7), and is also covered by test cases. More to the point "and so do POSIX message queues". They are also a standard, a cross OS feature and relatively cleanly implemented in kernel, ditto some classes of socket behaviour are similar and SYS5 IPC (of which we shall not speak further I hope 8) ). I'm not saying that they solve the problem but they might avoid some of the complexities. Filtering is generalizable in Linux with a few lines of code, so rather than hardcoding dbus semantics EBPF can express pretty much any uni/multi/broadcast filtering policy rule for dbus or anything else. I agree entirely with Havoc that the ease of use wants to be preserved and semantics at the top of the dbus library shouldn't change. Dbus does have the problem of being too easy to use badly, but that's hard to fix technically 8) Alan ^ permalink raw reply [flat|nested] 333+ messages in thread
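[Editorial sketch, not part of the thread.] The priority behaviour both mails refer to can be modelled in a few lines. This is a pure-Python model of the receive order that POSIX message queues specify (highest priority first, oldest first within one priority). It is not a binding to mq_send()/mq_receive(), and the class name `PriorityQueueModel` is hypothetical.

```python
import heapq
import itertools


class PriorityQueueModel:
    """Model of POSIX-mq-style ordering: highest priority first, FIFO within a priority."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # arrival counter, breaks ties in FIFO order

    def send(self, payload, priority=0):
        # heapq is a min-heap, so the priority is negated to pop the largest first.
        heapq.heappush(self._heap, (-priority, next(self._seq), payload))

    def receive(self):
        neg_priority, _, payload = heapq.heappop(self._heap)
        return payload, -neg_priority


queue = PriorityQueueModel()
queue.send("control-1", priority=0)
queue.send("timing-critical", priority=5)
queue.send("control-2", priority=0)

received = [queue.receive() for _ in range(3)]
```

Here the timing-critical message overtakes the control message queued before it, while the two control messages keep their relative order. That is the interleaving of timing-critical and control data on one connection that the quoted kdbus text describes.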
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-13 19:03 Greg Kroah-Hartman 2015-04-13 19:29 ` Eric W. Biederman 2015-04-13 20:13 ` Andy Lutomirski @ 2015-04-23 13:05 ` Greg Kroah-Hartman 2015-04-23 14:17 ` One Thousand Gnomes ` (2 more replies) 2 siblings, 3 replies; 333+ messages in thread From: Greg Kroah-Hartman @ 2015-04-23 13:05 UTC (permalink / raw) To: Linus Torvalds, Andrew Morton Cc: Arnd Bergmann, ebiederm, gnomes, teg, jkosina, luto, linux-kernel, daniel, dh.herrmann, tixxdz On Mon, Apr 13, 2015 at 09:03:50PM +0200, Greg Kroah-Hartman wrote: > The following changes since commit 9eccca0843205f87c00404b663188b88eb248051: > > Linux 4.0-rc3 (2015-03-08 16:09:09 -0700) > > are available in the git repository at: > > git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git/ tags/kdbus-4.1-rc1 > > for you to fetch changes up to 9fb9cd0f4434a23487b6ef3237e733afae90e336: > > kdbus: avoid the use of struct timespec (2015-04-10 14:34:53 +0200) > > ---------------------------------------------------------------- Given this has been a crazy email thread, let's try to figure out what the status is here. Al Viro pointed out some odd locking (r/w lock only used in write mode), and asked for some more documentation / description of the object model used here. David provided that, and will send a minor fix for the rw lock, so I think that issue is now resolved. David has created a few other minor changes based on Al's review that I will forward on later. Andy's concerns about the capability stuff has been hashed out in multiple threads here. The kernel code isn't buggy as-designed or implemented from what we can all tell, it's just that the new functionality isn't liked by everyone, which is totally fair, but not a reason to declare that the function isn't useful. Alan, and others, want a tiny, generic, multi-cast IPC method that also works across networks. 
They feel that this is something that D-Bus might be able to use in the future in userspace to build on top of. Lots of people have said they want something like this for years, but that doesn't address the issue here with kdbus, which is a very specific solution for a very common and wide-spread usage model that Linux userspace relies on today. I too would love to see such an IPC be created, and two years ago thought it would be possible to achieve here. But over time, and in working with the D-Bus model and requirements, it just didn't happen here. Given that no one has ever been able to accomplish such a thing in the past means that it's either impossible to do, or that no one really wants such a thing bad enough to actually do the work :) Did I miss anything else here? Are there any technical reasons I'm forgetting about for why this can't be pulled in as-is for this merge window? As for merging this, due to some changes in the vfs tree, specifically due to 5d5d56897530 ("make new_sync_{read,write}() static"), after the kdbus code is merged with your latest tree, it can cause problems, as reported by Sergei Zviagintsev. I didn't want to rebase anything, and solving the issue against 3.19 would require us to export __vfs_read(), as Al already did in your tree, so you can just merge it, and then apply the patch I'll send in response to this message for it, which resolves the issue. thanks, greg k-h ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-23 13:05 ` Greg Kroah-Hartman @ 2015-04-23 14:17 ` One Thousand Gnomes 2015-04-23 16:36 ` Greg Kroah-Hartman 2015-04-23 18:33 ` Richard Weinberger 2 siblings, 0 replies; 333+ messages in thread From: One Thousand Gnomes @ 2015-04-23 14:17 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Linus Torvalds, Andrew Morton, Arnd Bergmann, ebiederm, teg, jkosina, luto, linux-kernel, daniel, dh.herrmann, tixxdz > Alan, and others, want a tiny, generic, multi-cast IPC method that also > works across networks. They feel that this is something that D-Bus I never said - across networks. And locally it has been done, even microcontrollers have done it. > Lots of people have said they want something like this for years, but > that doesn't address the issue here with kdbus, which is a very specific > solution for a very common and wide-spread usage model that Linux You've missed off a variety of important points that have been raised - whether it's a dumb model performancewise compared with using it to set up a memfd or similar - cgroup interactions - the heavyweight nature of going via get_user_pages and __vfs_read rather than just assuming message sizes are sensibly constrained and could far better just be allocated and copied to a refcounted kernel buffer - exposure of capabilities and how you futureproof it > userspace relies on today. I too would love to see such an IPC be > created, and two years ago thought it would be possible to achieve > here. But over time, and in working with the D-Bus model and > requirements, it just didn't happen here. Given that no one has ever > been able to accomplish such a thing in the past means that it's either > impossible to do, or that no one really wants such a thing bad enough to > actually do the work :) > > Did I miss anything else here? Are there any technical reasons I'm > forgetting about for why this can't be pulled in as-is for this merge > window? Like the outstanding NACKS ? 
Greg - you are sounding like you have some kind of special entitlement to ignore the way this works for everyone else. If you are feeling frustrated, annoyed and led up several avenues at once then welcome to the world of every other submitter who doesn't think they have some kind of magic stage door pass to get their crap in the kernel when there are core maintainers asking hard and unanswered questions and who have nacked it. There's no huge hurry. There are a bunch of things like the interactions with cgroups, and the privilege and capability model which need careful examination. Slipping it one release to get that right isn't a big deal - it's not even as if you can't use hardware without it as with a driver missing a merge - this is just a performance tweak. Alan ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-23 13:05 ` Greg Kroah-Hartman 2015-04-23 14:17 ` One Thousand Gnomes @ 2015-04-23 16:36 ` Greg Kroah-Hartman 2015-04-23 16:46 ` Andy Lutomirski 2015-04-23 18:33 ` Richard Weinberger 2 siblings, 1 reply; 333+ messages in thread From: Greg Kroah-Hartman @ 2015-04-23 16:36 UTC (permalink / raw) To: Linus Torvalds, luto, Andrew Morton Cc: Arnd Bergmann, ebiederm, gnomes, teg, jkosina, linux-kernel, daniel, dh.herrmann, tixxdz On Thu, Apr 23, 2015 at 03:05:48PM +0200, Greg Kroah-Hartman wrote: > > Andy's concerns about the capability stuff has been hashed out in > multiple threads here. The kernel code isn't buggy as-designed or > implemented from what we can all tell, it's just that the new > functionality isn't liked by everyone, which is totally fair, but not a > reason to declare that the function isn't useful. Andy, did I capture your existing position correctly? If we drop the caps metadata, I'm guessing that you are ok with the code as you have reviewed it and tested it out. So should I just add a small patch that removes this for now? After that, we can discuss the addition of capabilities to the metadata as an add-on feature with a future patch and not hold up this larger merge request? thanks, greg k-h ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-23 16:36 ` Greg Kroah-Hartman @ 2015-04-23 16:46 ` Andy Lutomirski 2015-04-23 17:16 ` Greg Kroah-Hartman 0 siblings, 1 reply; 333+ messages in thread From: Andy Lutomirski @ 2015-04-23 16:46 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Thu, Apr 23, 2015 at 9:36 AM, Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > On Thu, Apr 23, 2015 at 03:05:48PM +0200, Greg Kroah-Hartman wrote: >> >> Andy's concerns about the capability stuff has been hashed out in >> multiple threads here. The kernel code isn't buggy as-designed or >> implemented from what we can all tell, it's just that the new >> functionality isn't liked by everyone, which is totally fair, but not a >> reason to declare that the function isn't useful. > > Andy, did I capture your existing position correctly? If we drop the > caps metadata, I'm guessing that you are ok with the code as you have > reviewed it and tested it out. So should I just add a small patch that > removes this for now? After that, we can discuss the addition of > capabilities to the metadata as an add-on feature with a future patch > and not hold up this larger merge request? No. I can fish out lists I've posted of what I personally dislike. To repeat from my not-yet-awake memory, briefly: - starttime, cmdline, and possibly other pieces of metadata are also problematic. I think starttime is especially bad because it both breaks CRIU and is IMO completely unnecessary -- I sent out draft "highpid" patches a while ago to give a much better alternative that isn't racy and won't break CRIU. But cmdline is also IMO ridiculous. - There's still an open performance question. Namely: is kdbus performant? - The policy system still sucks. 
Now, if we give up on the idea of anyone ever using it for anything other than dbus as it currently works, maybe this isn't a real problem. - Someone should probably convince someone who understands memory accounting that the pool mechanism accounts memory acceptably. I don't know much about mm stuff, but I think it's subject to all kinds of nasty latency and accounting abuses, some of which might even be exploited by accident. I haven't reviewed most of it. I've reviewed the metadata code (and not recently) and the pool *docs*. Shouldn't the bulk of this code have actual review before it gets merged? I've only reviewed some of it, and I didn't like what I found in that small fraction, hence my objections to caps. --Andy > > thanks, > > greg k-h -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-23 16:46 ` Andy Lutomirski @ 2015-04-23 17:16 ` Greg Kroah-Hartman 2015-04-23 17:34 ` Andy Lutomirski ` (3 more replies) 0 siblings, 4 replies; 333+ messages in thread From: Greg Kroah-Hartman @ 2015-04-23 17:16 UTC (permalink / raw) To: Andy Lutomirski Cc: Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Thu, Apr 23, 2015 at 09:46:22AM -0700, Andy Lutomirski wrote: > On Thu, Apr 23, 2015 at 9:36 AM, Greg Kroah-Hartman > <gregkh@linuxfoundation.org> wrote: > > On Thu, Apr 23, 2015 at 03:05:48PM +0200, Greg Kroah-Hartman wrote: > >> > >> Andy's concerns about the capability stuff has been hashed out in > >> multiple threads here. The kernel code isn't buggy as-designed or > >> implemented from what we can all tell, it's just that the new > >> functionality isn't liked by everyone, which is totally fair, but not a > >> reason to declare that the function isn't useful. > > > > Andy, did I capture your existing position correctly? If we drop the > > caps metadata, I'm guessing that you are ok with the code as you have > > reviewed it and tested it out. So should I just add a small patch that > > removes this for now? After that, we can discuss the addition of > > capabilities to the metadata as an add-on feature with a future patch > > and not hold up this larger merge request? > > No. I can fish out lists I've posted of what I personally dislike. > To repeat from my not-yet-awake memory, briefly: > > - starttime, cmdline, and possibly other pieces of metadata are also > problematic. I think starttime is especially bad because it both > breaks CRIU and is IMO completely unnecessary -- I sent out draft > "highpid" patches a while ago to give a much better alternative that > isn't racy and won't break CRIU. But cmdline is also IMO ridiculous. 
starttime was removed a while ago, are you sure you are looking at the latest code? cmdline has been discussed and it really helps with debugging. Decisions aren't being made based on it. > - There's still an open performance question. Namely: is kdbus performant? Yes, I thought that was already answered. Tizen posted some numbers with a much older version of the code, before David fixed a bunch of issues that he and you found, and that averaged between 25-50% faster. Details are in this presentation: http://download.tizen.org/misc/media/conference2014/slides/tdc2014-kdbus-in-tizen3.pdf The Tizen and GENIVI developers are off running numbers with the latest code, or so they told me through emails, but I don't know when/if that will ever happen, so I can't promise more than what is already here. > - The policy system still sucks. Now, if we give up on the idea of > anyone ever using it for anything other than dbus as it currently > works, maybe this isn't a real problem. As designed, it's for D-Bus, so there's not much I can suggest here, this isn't a "generic IPC" :) The binder developers at Samsung have stated that the implementation we have here works for their model as well, so I guess that is some kind of verification it's not entirely tied to D-Bus. They have plans on dropping the existing binder kernel code and using the kdbus code instead when it is merged. > - Someone should probably convince someone who understands memory > accounting that the pool mechanism accounts memory acceptably. I > don't know much about mm stuff, but I think it's subject to all kinds > of nasty latency and accounting abuses, some of which might even be > exploited by accident. Michal and David agree that this all works properly. I don't know of anyone else to ask about it, do you? > I haven't reviewed most of it. I've reviewed the metadata code (and > not recently) and the pool *docs*. > > Shouldn't the bulk of this code have actual review before it gets > merged? 
I've only reviewed some of it, and I didn't like what I found > in that small fraction, hence my objections to caps. I'd love more review, and we have been asking for it since last October. You provided a lot of it a while ago, and that helped immensely. I can't force anyone to read the code, I can only go on what people offer to do. We have 3 signed-off-bys on the main kdbus patches, and numerous other different developers have provided fixes / tweaks that are in this tree, so it's not like this is unread/unposted code here at all. thanks, greg k-h ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-23 17:16 ` Greg Kroah-Hartman @ 2015-04-23 17:34 ` Andy Lutomirski 2015-04-23 17:42 ` Stephen Smalley ` (2 subsequent siblings) 3 siblings, 0 replies; 333+ messages in thread From: Andy Lutomirski @ 2015-04-23 17:34 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Thu, Apr 23, 2015 at 10:16 AM, Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > On Thu, Apr 23, 2015 at 09:46:22AM -0700, Andy Lutomirski wrote: >> On Thu, Apr 23, 2015 at 9:36 AM, Greg Kroah-Hartman >> <gregkh@linuxfoundation.org> wrote: >> > On Thu, Apr 23, 2015 at 03:05:48PM +0200, Greg Kroah-Hartman wrote: >> >> >> >> Andy's concerns about the capability stuff has been hashed out in >> >> multiple threads here. The kernel code isn't buggy as-designed or >> >> implemented from what we can all tell, it's just that the new >> >> functionality isn't liked by everyone, which is totally fair, but not a >> >> reason to declare that the function isn't useful. >> > >> > Andy, did I capture your existing position correctly? If we drop the >> > caps metadata, I'm guessing that you are ok with the code as you have >> > reviewed it and tested it out. So should I just add a small patch that >> > removes this for now? After that, we can discuss the addition of >> > capabilities to the metadata as an add-on feature with a future patch >> > and not hold up this larger merge request? >> >> No. I can fish out lists I've posted of what I personally dislike. >> To repeat from my not-yet-awake memory, briefly: >> >> - starttime, cmdline, and possibly other pieces of metadata are also >> problematic. 
I think starttime is especially bad because it both >> breaks CRIU and is IMO completely unnecessary -- I sent out draft >> "highpid" patches a while ago to give a much better alternative that >> isn't racy and won't break CRIU. But cmdline is also IMO ridiculous. > > starttime was removed a while ago, are you sure you are looking at the > latest code? No, I'm sure I haven't. I looked at the latest code just long enough to see that caps were still there. So the latest code is unreviewed by me or, as far as I can tell, by anyone else who should review it. > > cmdline has been discussed and it really helps with debugging. > Decisions aren't being made based on it. This might be addressed by the module parameter. Haven't checked recent versions. None of this addresses the fact that metadata is captured both at send and connect time. I still think that this is asking for tons of security problems down the line. > >> - There's still an open performance question. Namely: is kdbus performant? > > Yes, I thought that was already answered. Tizen posted some numbers > with a much older version of the code, before David fixed a bunch of > issues that he and you found, and that averaged between 25-50% faster. > Details are in this presentation: > http://download.tizen.org/misc/media/conference2014/slides/tdc2014-kdbus-in-tizen3.pdf > AFAICS no one has ever even tried to address whether the kdbus design (shmem pools, send-time metadata, plus optional memfd) gives as good performance as plain ol' sockets. A lot of the complexity of kdbus is due to its novel buffering scheme, and that scheme AFAICS has only been seriously benchmarked against userspace dbus, which is a poor reference. I neither see any compelling a priori reason to think that the buffering scheme is a performance win, nor do I see good numbers. Instead, I've seen numbers suggesting that it's much slower than AF_UNIX peer to peer. 
I realize that it looks like I'm comparing apples (peer to peer) to oranges (bus), but that's just because AF_UNIX really is the best comparison in the absence of a serious attempt at a socket-like bus with benchmarks. > The Tizen and GENIVI developers are off running numbers with the latest > code, or so they told me through emails, but I don't know when/if that > will ever happen, so I can't promise more than what is already here. > >> - The policy system still sucks. Now, if we give up on the idea of >> anyone ever using it for anything other than dbus as it currently >> works, maybe this isn't a real problem. > > As designed, it's for D-Bus, so there's not much I can suggest here, > this isn't a "generic IPC" :) Move it to userspace with a daemon that answers policy questions and makes introductions? > >> - Someone should probably convince someone who understands memory >> accounting that the pool mechanism accounts memory acceptably. I >> don't know much about mm stuff, but I think it's subject to all kinds >> of nasty latency and accounting abuses, some of which might even be >> exploited by accident. > > Michal and David agree that this all works properly. I don't know of > anyone else to ask about it, do you? I thought Michal was a little less convinced. I really don't see why pages allocated due to sends would be charged to the receiver, nor do I see why, even if that were fixed, it wouldn't be a serious performance problem with memcgs and memory pressure in play. I'm really surprised that GENIVI is okay with this. The latency seems like it will be highly unpredictable. > >> I haven't reviewed most of it. I've reviewed the metadata code (and >> not recently) and the pool *docs*. >> >> Shouldn't the bulk of this code have actual review before it gets >> merged? I've only reviewed some of it, and I didn't like what I found >> in that small fraction, hence my objections to caps. > > I'd love more review, and we have been asking for it since last October. 
> You provided a lot of it a while ago, and that helped immensely. > > I can't force anyone to read the code, I can only go on what people > offer to do. We have 3 signed-off-bys on the main kdbus patches, and > numerous other different developers have provided fixes / tweaks that > are in this tree, so it's not like this is unread/unposted code here at > all. I think it doesn't help that reviewing the code can be a painful exercise when threads about a single review point drag on for hundreds of posts. Also, it's discouraging that, after a single review point results in hundreds of posts, reviewers get asked whether everything's okay now. --Andy ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-23 17:16 ` Greg Kroah-Hartman 2015-04-23 17:34 ` Andy Lutomirski @ 2015-04-23 17:42 ` Stephen Smalley 2015-04-23 19:30 ` Greg Kroah-Hartman 2015-04-23 17:57 ` Linus Torvalds 2015-04-24 13:50 ` Lukasz Skalski 3 siblings, 1 reply; 333+ messages in thread From: Stephen Smalley @ 2015-04-23 17:42 UTC (permalink / raw) To: Greg Kroah-Hartman, Andy Lutomirski Cc: Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On 04/23/2015 01:16 PM, Greg Kroah-Hartman wrote: > The binder developers at Samsung have stated that the implementation we > have here works for their model as well, so I guess that is some kind of > verification it's not entirely tied to D-Bus. They have plans on > dropping the existing binder kernel code and using the kdbus code > instead when it is merged. Where do things stand wrt LSM hooks for kdbus? I don't see any security hook calls in the kdbus tree except for the purpose of metadata collection of process security labels. But nothing for enforcing MAC over kdbus IPC. binder has a set of security hooks for that purpose, so it would be a regression wrt MAC enforcement to switch from binder to kdbus without equivalent checking there. ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-23 17:42 ` Stephen Smalley @ 2015-04-23 19:30 ` Greg Kroah-Hartman 2015-04-24 2:08 ` Karol Lewandowski 0 siblings, 1 reply; 333+ messages in thread From: Greg Kroah-Hartman @ 2015-04-23 19:30 UTC (permalink / raw) To: Stephen Smalley, Karol Lewandowski Cc: Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Thu, Apr 23, 2015 at 01:42:25PM -0400, Stephen Smalley wrote: > On 04/23/2015 01:16 PM, Greg Kroah-Hartman wrote: > > The binder developers at Samsung have stated that the implementation we > > have here works for their model as well, so I guess that is some kind of > > verification it's not entirely tied to D-Bus. They have plans on > > dropping the existing binder kernel code and using the kdbus code > > instead when it is merged. > > Where do things stand wrt LSM hooks for kdbus? I don't see any security > hook calls in the kdbus tree except for the purpose of metadata > collection of process security labels. But nothing for enforcing MAC > over kdbus IPC. binder has a set of security hooks for that purpose, so > it would be a regression wrt MAC enforcement to switch from binder to > kdbus without equivalent checking there. There was a set of LSM hooks proposed for kdbus posted by Karol Lewandowsk last October, and it also included SELinux and Smack patches. They were going to be refreshed based on the latest code changes, but I haven't seen them posted, or I can't seem to find them in my limited email archive. Karol, what's the status of them? thanks, greg k-h ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-23 19:30 ` Greg Kroah-Hartman @ 2015-04-24 2:08 ` Karol Lewandowski 2015-04-29 21:16 ` Paul Moore 0 siblings, 1 reply; 333+ messages in thread From: Karol Lewandowski @ 2015-04-24 2:08 UTC (permalink / raw) To: Greg Kroah-Hartman, Paul Osmialowski Cc: Stephen Smalley, Karol Lewandowski, Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni, k.lewandowsk On Thu, Apr 23, 2015 at 09:30:13PM +0200, Greg Kroah-Hartman wrote: > On Thu, Apr 23, 2015 at 01:42:25PM -0400, Stephen Smalley wrote: > > On 04/23/2015 01:16 PM, Greg Kroah-Hartman wrote: > > > The binder developers at Samsung have stated that the implementation we > > > have here works for their model as well, so I guess that is some kind of > > > verification it's not entirely tied to D-Bus. They have plans on > > > dropping the existing binder kernel code and using the kdbus code > > > instead when it is merged. > > > > Where do things stand wrt LSM hooks for kdbus? I don't see any security > > hook calls in the kdbus tree except for the purpose of metadata > > collection of process security labels. But nothing for enforcing MAC > > over kdbus IPC. binder has a set of security hooks for that purpose, so > > it would be a regression wrt MAC enforcement to switch from binder to > > kdbus without equivalent checking there. > > There was a set of LSM hooks proposed for kdbus posted by Karol > Lewandowsk last October, and it also included SELinux and Smack patches. > They were going to be refreshed based on the latest code changes, but I > haven't seen them posted, or I can't seem to find them in my limited > email archive. We have been waiting for right moment with these. :-) > Karol, what's the status of them? I have handed patchset over to Paul Osmialowski who started rework it for v4 relatively recently. 
I think it shouldn't be that hard to post updated version... Paul? ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-24 2:08 ` Karol Lewandowski @ 2015-04-29 21:16 ` Paul Moore 0 siblings, 0 replies; 333+ messages in thread From: Paul Moore @ 2015-04-29 21:16 UTC (permalink / raw) To: Karol Lewandowski Cc: Greg Kroah-Hartman, Paul Osmialowski, Stephen Smalley, Karol Lewandowski, Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni, k.lewandowsk On Fri, Apr 24, 2015 at 4:08 AM, Karol Lewandowski <karol.k.lewandowski@gmail.com> wrote: > On Thu, Apr 23, 2015 at 09:30:13PM +0200, Greg Kroah-Hartman wrote: >> On Thu, Apr 23, 2015 at 01:42:25PM -0400, Stephen Smalley wrote: >> > On 04/23/2015 01:16 PM, Greg Kroah-Hartman wrote: >> > > The binder developers at Samsung have stated that the implementation we >> > > have here works for their model as well, so I guess that is some kind of >> > > verification it's not entirely tied to D-Bus. They have plans on >> > > dropping the existing binder kernel code and using the kdbus code >> > > instead when it is merged. >> > >> > Where do things stand wrt LSM hooks for kdbus? I don't see any security >> > hook calls in the kdbus tree except for the purpose of metadata >> > collection of process security labels. But nothing for enforcing MAC >> > over kdbus IPC. binder has a set of security hooks for that purpose, so >> > it would be a regression wrt MAC enforcement to switch from binder to >> > kdbus without equivalent checking there. >> >> There was a set of LSM hooks proposed for kdbus posted by Karol >> Lewandowsk last October, and it also included SELinux and Smack patches. >> They were going to be refreshed based on the latest code changes, but I >> haven't seen them posted, or I can't seem to find them in my limited >> email archive. > > We have been waiting for right moment with these. :-) > >> Karol, what's the status of them? 
> > I have handed patchset over to Paul Osmialowski who started rework it for v4 > relatively recently. I think it shouldn't be that hard to post updated version... > > Paul? Different Paul here, but very interested in the LSM and SELinux hooks for obvious reasons; at a bare minimum please CC the LSM list on the kdbus hooks, and preferably the SELinux list as well. The initial SELinux hooks I threw together were just a rough first pass, we (the LSM and SELinux folks) need to have a better discussion about how to provide the necessary access controls for kdbus ... preferably before it finds its way into a released kernel. -- paul moore www.paul-moore.com ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-23 17:16 ` Greg Kroah-Hartman 2015-04-23 17:34 ` Andy Lutomirski 2015-04-23 17:42 ` Stephen Smalley @ 2015-04-23 17:57 ` Linus Torvalds 2015-04-23 18:04 ` Linus Torvalds 2015-04-23 18:48 ` Linus Torvalds 2015-04-24 13:50 ` Lukasz Skalski 3 siblings, 2 replies; 333+ messages in thread From: Linus Torvalds @ 2015-04-23 17:57 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Andy Lutomirski, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Thu, Apr 23, 2015 at 10:16 AM, Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: >> >> - starttime, cmdline, and possibly other pieces of metadata are also >> problematic. I think starttime is especially bad because it both >> breaks CRIU and is IMO completely unnecessary -- I sent out draft >> "highpid" patches a while ago to give a much better alternative that >> isn't racy and won't break CRIU. But cmdline is also IMO ridiculous. > > starttime was removed a while ago, are you sure you are looking at the > latest code? > > cmdline has been discussed and it really helps with debugging. > Decisions aren't being made based on it. Quite frankly, I personally find cmdline/comm etc *much* worse than sending the capabilities. The whole notion of knowing "the other end is root" (or more specifically some capability like "the other end can access raw hardware") I think is a thing that absolutely makes sense in any communication channel. I really don't even see why it would be conditional. I mean, it's not exactly a secret anyway, and it just makes *sense* for any protocol that may end up doing operations _for_ the recipient. Same goes for uid etc - if you are implementing a service daemon, the uid of the requester sure as hell makes a ton of difference in what you might want to expose. Things like "does this user have access rights to the printer?" are very natural questions to ask. 
So I really don't understand why that part is even controversial. kdbus wasn't meant to be some generic IPC mechanism. It is meant as a way to talk to system daemons. So the whole "capabilities and user information" is really to me a non-issue. It's clearly required information, and if you don't want to expose it, you damn well have absolutely *zero* business talking to system daemons. Really, it's that simple. But things like "comm" and the cmdline? That makes me nervous. There are real privacy issues there. Sure, maybe you think it's useful for debugging, but the very fact that you think it's useful for debugging makes me suspect you might be logging it (for future debugging). And quite frankly, I don't think you should be logging things like that. Yes, yes, if you're a system admin, you can find those things out, but they should *not* be something that you just end up logging by mistake or because "it's easy and all the information is right there". If somebody is printing something, it shouldn't matter if it's "lpr" or "firefox http://horses.and.trannyporn.my.little.pony.com/" that does the printing. And you can go "but we don't log it" all you want. It's still a bad idea. Sane people should refuse to allow a system service to see those kinds of things by default, for a very simple reason: it's none of their business. So I'd suggest just getting rid of "tid_comm/pid_comm/cmdline". There is no possible valid excuse for them. They aren't trustworthy anyway (ie a real attacker can obfuscate them easily), and they *are* potentially sensitive. [ Side note: the tid_comm/pid_comm ones depend on TASK_COMM_LEN anyway, which might change. a 16-byte command name used to be insanely long in the traditional unix environment, but these days it's actually regularly a truncated name due to programs called things like "gnome-shell-extension-prefs" or "abrt-action-generate-core-backtrace". ] Linus ^ permalink raw reply [flat|nested] 333+ messages in thread
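The side note about TASK_COMM_LEN can be made concrete with a small userspace sketch. It assumes the 16-byte limit (15 characters plus the terminating NUL) mentioned above; COMM_LEN and comm_truncate() are illustrative names, not kernel or kdbus symbols.

```c
#include <stddef.h>
#include <string.h>

#define COMM_LEN 16 /* stand-in for the kernel's TASK_COMM_LEN, NUL included */

/*
 * Copy a program name into a comm-sized buffer the way a fixed 16-byte
 * field would hold it: at most 15 characters, always NUL-terminated.
 */
static void comm_truncate(const char *name, char out[COMM_LEN])
{
    size_t n = strlen(name);

    if (n > COMM_LEN - 1)
        n = COMM_LEN - 1;
    memcpy(out, name, n);
    out[n] = '\0';
}
```

With this, "gnome-shell-extension-prefs" is stored as "gnome-shell-ext", and any other program whose name shares those first 15 characters ends up with the same comm value, which is one more reason the field is a poor identifier.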
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-23 17:57 ` Linus Torvalds @ 2015-04-23 18:04 ` Linus Torvalds 2015-04-23 18:56 ` Greg Kroah-Hartman 2015-04-23 18:48 ` Linus Torvalds 1 sibling, 1 reply; 333+ messages in thread From: Linus Torvalds @ 2015-04-23 18:04 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Andy Lutomirski, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Thu, Apr 23, 2015 at 10:57 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > If somebody is printing something, it shouldn't matter if it's "lpr" > or "firefox http://horses.and.trannyporn.my.little.pony.com/" that > does the printing. And btw, it's not just "this is information that shouldn't be logged". It's literally "information that should not *ever* be used". I can easily see some phone manufacturer deciding to do "value add" by adding a special case where a special vendor system manager program gets a back door to some service, because it needs to access the camera for user identification at login time, so there's some magic if (!strcmp(client->pid_comm, "vendor-login-pr")) return ACCESS_OK; because "it was the simplest way to do this", and the programmer knew it was a hack, but he needed to get it working because he had a deadline yesterday. And then somebody figures this out, and makes an app that takes pictures on your phone surreptitiously. No, we can't protect against vendors doing stupid things, but we very much also shouldn't make the kernel have interfaces that basically encourage people to do stupid things because they make irrelevant and wrongheaded data available. Linus ^ permalink raw reply [flat|nested] 333+ messages in thread
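To see why a check like the hypothetical strcmp() above is no access control at all, here is a minimal Linux userspace sketch. It uses prctl(PR_SET_NAME) and prctl(PR_GET_NAME), which let any unprivileged thread choose its own comm; comm_check_allows() and spoof_comm() are made-up names for illustration and are not taken from kdbus or from any vendor code.

```c
#include <string.h>
#include <sys/prctl.h>

/* Hypothetical service-side check, modelled on the hack quoted above. */
static int comm_check_allows(const char *pid_comm)
{
    return strcmp(pid_comm, "vendor-login-pr") == 0;
}

/*
 * Rename the calling thread, then read back the name the kernel stored.
 * No privilege is needed for either call. Returns 0 on success.
 */
static int spoof_comm(const char *name, char out[16])
{
    if (prctl(PR_SET_NAME, (unsigned long)name, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_GET_NAME, (unsigned long)out, 0, 0, 0);
}
```

A process that calls spoof_comm("vendor-login-pr", buf) and then hands buf to comm_check_allows() passes the check, regardless of what binary it really is.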
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-23 18:04 ` Linus Torvalds @ 2015-04-23 18:56 ` Greg Kroah-Hartman 2015-04-23 19:22 ` Andy Lutomirski 2015-04-23 20:51 ` Linus Torvalds 0 siblings, 2 replies; 333+ messages in thread From: Greg Kroah-Hartman @ 2015-04-23 18:56 UTC (permalink / raw) To: Linus Torvalds Cc: Andy Lutomirski, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Thu, Apr 23, 2015 at 11:04:36AM -0700, Linus Torvalds wrote: > On Thu, Apr 23, 2015 at 10:57 AM, Linus Torvalds > <torvalds@linux-foundation.org> wrote: > > > > If somebody is printing something, it shouldn't matter if it's "lpr" > > or "firefox http://horses.and.trannyporn.my.little.pony.com/" that > > does the printing. > > And btw, it's not just "this is information that shouldn't be logged". > > It's literally "information that should not *ever* be used". I can > easily see some phone manufacturer deciding to do "value add" by > adding a special case where a special vendor system manager program > gets a back door to some service, because it needs to access the > camera for user identification at login time, so there's some magic > > if (!strcmp(client->pid_comm, "vendor-login-pr")) > return ACCESS_OK; > > because "it was the simplest way to do this", and the programmer knew > it was a hack, but he needed to get it working because he had a > deadline yesterday. > > And then somebody figures this out, and makes an app that takes > pictures on your phone surreptitiously. > > No, we can't protect against vendors doing stupid things, but we very > much also shouldn't make the kernel have interfaces that basically > encourage people to do stupid things because they make irrelevant and > wrongheaded data available. Doing access control based on comm and cmdline is horrid, I totally agree. 
But right now, any process in the system can read any other process's comm and cmdline value out of /proc today. So removing it from the metadata is fine for kdbus, I can live with that, but it really isn't "preventing" anything that's not already visible to everyone, so if someone wanting to be "bad" could always still log it or do anything else they wanted with it. Doesn't syslog uses it today all over the place for logging stuff that happens in the system? Or am I missing something here? thanks, greg k-h ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-23 18:56 ` Greg Kroah-Hartman @ 2015-04-23 19:22 ` Andy Lutomirski 2015-04-23 19:33 ` Greg KH 2015-04-23 20:51 ` Linus Torvalds 1 sibling, 1 reply; 333+ messages in thread From: Andy Lutomirski @ 2015-04-23 19:22 UTC (permalink / raw) To: Greg KH Cc: One Thousand Gnomes, Arnd Bergmann, Linus Torvalds, Tom Gundersen, linux-kernel, Jiri Kosina, David Herrmann, Eric W. Biederman, Andrew Morton, Djalal Harouni, Daniel Mack On Apr 23, 2015 11:56 AM, "Greg Kroah-Hartman" <gregkh@linuxfoundation.org> wrote: > > On Thu, Apr 23, 2015 at 11:04:36AM -0700, Linus Torvalds wrote: > > On Thu, Apr 23, 2015 at 10:57 AM, Linus Torvalds > > <torvalds@linux-foundation.org> wrote: > > > > > > If somebody is printing something, it shouldn't matter if it's "lpr" > > > or "firefox http://horses.and.trannyporn.my.little.pony.com/" that > > > does the printing. > > > > And btw, it's not just "this is information that shouldn't be logged". > > > > It's literally "information that should not *ever* be used". I can > > easily see some phone manufacturer deciding to do "value add" by > > adding a special case where a special vendor system manager program > > gets a back door to some service, because it needs to access the > > camera for user identification at login time, so there's some magic > > > > if (!strcmp(client->pid_comm, "vendor-login-pr")) > > return ACCESS_OK; > > > > because "it was the simplest way to do this", and the programmer knew > > it was a hack, but he needed to get it working because he had a > > deadline yesterday. > > > > And then somebody figures this out, and makes an app that takes > > pictures on your phone surreptitiously. > > > > No, we can't protect against vendors doing stupid things, but we very > > much also shouldn't make the kernel have interfaces that basically > > encourage people to do stupid things because they make irrelevant and > > wrongheaded data available. 
> > Doing access control based on comm and cmdline is horrid, I totally > agree. But right now, any process in the system can read any other > process's comm and cmdline value out of /proc today. So removing it > from the metadata is fine for kdbus, I can live with that, but it really > isn't "preventing" anything that's not already visible to everyone, so > if someone wanting to be "bad" could always still log it or do anything > else they wanted with it. I feel like a broken record. This isn't true in general. Selinux can and, I believe, often does prevent this. --Andy ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-23 19:22 ` Andy Lutomirski @ 2015-04-23 19:33 ` Greg KH 2015-04-23 20:53 ` Linus Torvalds 0 siblings, 1 reply; 333+ messages in thread From: Greg KH @ 2015-04-23 19:33 UTC (permalink / raw) To: Andy Lutomirski Cc: One Thousand Gnomes, Arnd Bergmann, Linus Torvalds, Tom Gundersen, linux-kernel, Jiri Kosina, David Herrmann, Eric W. Biederman, Andrew Morton, Djalal Harouni, Daniel Mack On Thu, Apr 23, 2015 at 12:22:10PM -0700, Andy Lutomirski wrote: > On Apr 23, 2015 11:56 AM, "Greg Kroah-Hartman" > <gregkh@linuxfoundation.org> wrote: > > > > On Thu, Apr 23, 2015 at 11:04:36AM -0700, Linus Torvalds wrote: > > > On Thu, Apr 23, 2015 at 10:57 AM, Linus Torvalds > > > <torvalds@linux-foundation.org> wrote: > > > > > > > > If somebody is printing something, it shouldn't matter if it's "lpr" > > > > or "firefox http://horses.and.trannyporn.my.little.pony.com/" that > > > > does the printing. > > > > > > And btw, it's not just "this is information that shouldn't be logged". > > > > > > It's literally "information that should not *ever* be used". I can > > > easily see some phone manufacturer deciding to do "value add" by > > > adding a special case where a special vendor system manager program > > > gets a back door to some service, because it needs to access the > > > camera for user identification at login time, so there's some magic > > > > > > if (!strcmp(client->pid_comm, "vendor-login-pr")) > > > return ACCESS_OK; > > > > > > because "it was the simplest way to do this", and the programmer knew > > > it was a hack, but he needed to get it working because he had a > > > deadline yesterday. > > > > > > And then somebody figures this out, and makes an app that takes > > > pictures on your phone surreptitiously. 
> > > > > > No, we can't protect against vendors doing stupid things, but we very > > > much also shouldn't make the kernel have interfaces that basically > > > encourage people to do stupid things because they make irrelevant and > > > wrongheaded data available. > > > > Doing access control based on comm and cmdline is horrid, I totally > > agree. But right now, any process in the system can read any other > > process's comm and cmdline value out of /proc today. So removing it > > from the metadata is fine for kdbus, I can live with that, but it really > > isn't "preventing" anything that's not already visible to everyone, so > > if someone wanting to be "bad" could always still log it or do anything > > else they wanted with it. > > I feel like a broken record. This isn't true in general. Works on my box :) > Selinux can and, I believe, often does prevent this. Ok, then the LSM patches for kdbus should be able to also mediate this as well if needed. I haven't looked at the LSM kdbus patches in a long time, so I don't remember exactly what they were looking at. Again, I don't object to dropping this in kdbus, just confused as this seemed to me to be something that is always available to all processes anyway, we weren't adding something previously "hidden". thanks, greg k-h ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-23 19:33 ` Greg KH
@ 2015-04-23 20:53 ` Linus Torvalds
0 siblings, 0 replies; 333+ messages in thread
From: Linus Torvalds @ 2015-04-23 20:53 UTC (permalink / raw)
To: Greg KH
Cc: Andy Lutomirski, One Thousand Gnomes, Arnd Bergmann, Tom Gundersen,
linux-kernel, Jiri Kosina, David Herrmann, Eric W. Biederman,
Andrew Morton, Djalal Harouni, Daniel Mack
On Thu, Apr 23, 2015 at 12:33 PM, Greg KH <gregkh@linuxfoundation.org> wrote:
> On Thu, Apr 23, 2015 at 12:22:10PM -0700, Andy Lutomirski wrote:
>
>> Selinux can and, I believe, often does prevent this.
>
> Ok, then the LSM patches for kdbus should be able to also mediate this
> as well if needed.
No Greg.
Just remove the shit. Really. Take out the command line and the task
name. You already admitted that there is no actual valid use for it.
We don't add crap that then has to be disabled with security rules
just because it was a bad interface. Just make the interface not do it
in the first place.
It's that simple.
Linus
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-23 18:56 ` Greg Kroah-Hartman
2015-04-23 19:22 ` Andy Lutomirski
@ 2015-04-23 20:51 ` Linus Torvalds
1 sibling, 0 replies; 333+ messages in thread
From: Linus Torvalds @ 2015-04-23 20:51 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Andy Lutomirski, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
Daniel Mack, David Herrmann, Djalal Harouni
On Thu, Apr 23, 2015 at 11:56 AM, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> Doing access control based on comm and cmdline is horrid, I totally
> agree. But right now, any process in the system can read any other
> process's comm and cmdline value out of /proc today.
You have to work extra hard for it, and it's preventable anyway (ie selinux).
In contrast, with the information in the kdbus message, it's almost
certain that any random "enable debugging for dbus" patch will start
logging it, because "it's just there".
That's a big difference. Most bugs and security issues come because
people make trivial mistakes, not because people explicitly go out of
their way to make them.
> Doesn't syslog uses it today all over the place for logging stuff that
> happens in the system?
Hell no. Sure, if an application explicitly says "log this message",
then we save the application name. But not for random system
interactions.
The example Andy gave about doing things like name lookup is a good
one. Doesn't systemd already do a dns cache module? Doing a name
lookup is some *seriously* different thing than using "syslog()" to
explicitly log messages. And if kdbus people can't see that
difference, I don't see what we can discuss here. Do you really not
see the privacy implications?
It turns privacy violations from "you have to actually work at it" to
"they happen pretty much by mistake".
Linus
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-23 17:57 ` Linus Torvalds 2015-04-23 18:04 ` Linus Torvalds @ 2015-04-23 18:48 ` Linus Torvalds 1 sibling, 0 replies; 333+ messages in thread From: Linus Torvalds @ 2015-04-23 18:48 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Andy Lutomirski, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Thu, Apr 23, 2015 at 10:57 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > Same goes for uid etc - if you are implementing a service daemon, the > uid of the requester sure as hell makes a ton of difference in what > you might want to expose. Things like "does this user have access > rights to the printer?" are very natural questions to ask. Hmm. Looking at the code, it strikes me that not only does kdbus_meta_proc_collect() collect too much, but some of what it collects it just seems to do *wrong*. So I agree with collecting user and credential information (obviously unlike some people ;), but I think the code that does it is just wrong. The way to collect user and credential information is very simple: you look at "file->f_cred". That's _it_. Nothing more. Maybe you do "get_cred(file->f_cred):" if you have lifetimes of this after the "struct file" is gone. But you don't copy the fields individually or willy-nilly. That "struct cred" reference gets you all you need. It gets you the supplementary groups. It gets you the capabilities. It gets you the user and group id's. And equally importantly, it gets you the namespace so that you can do conversions to random target namespaces later, when you actually *use* the information. There might be some question about whether you should use "current->cred" or "file->f_cred", but the latter is almost always the right thing to use when you are doing file operations. The unix filesystem security model is about permissions at open time, not at use time. 
Linus ^ permalink raw reply [flat|nested] 333+ messages in thread
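The "take one reference at open time" idea can be sketched as a toy userspace model. Everything below (toy_cred, toy_file, toy_task and the toy_* helpers) is a hypothetical stand-in written for illustration; it only imitates the shape of struct cred, file->f_cred and get_cred() as described in the message above, and is not kernel or kdbus code.

```c
#include <stdlib.h>

/* Toy stand-ins for struct cred, struct task_struct and struct file. */
struct toy_cred { int refcount; unsigned int uid; unsigned int gid; };
struct toy_task { struct toy_cred *cred; };
struct toy_file { struct toy_cred *f_cred; };

static struct toy_cred *toy_cred_new(unsigned int uid, unsigned int gid)
{
    struct toy_cred *c = malloc(sizeof(*c));

    if (c) {
        c->refcount = 1;
        c->uid = uid;
        c->gid = gid;
    }
    return c;
}

/* Take an extra reference, in the spirit of get_cred(). */
static struct toy_cred *toy_get_cred(struct toy_cred *c)
{
    c->refcount++;
    return c;
}

static void toy_put_cred(struct toy_cred *c)
{
    if (--c->refcount == 0)
        free(c);
}

/* "Open": pin the opener's whole credential object, copy no fields. */
static void toy_open(struct toy_file *f, const struct toy_task *t)
{
    f->f_cred = toy_get_cred(t->cred);
}

/* Credential change: the task gets a new object, the old one is untouched. */
static int toy_setuid(struct toy_task *t, unsigned int uid)
{
    struct toy_cred *n = toy_cred_new(uid, t->cred->gid);

    if (!n)
        return -1;
    toy_put_cred(t->cred);
    t->cred = n;
    return 0;
}
```

In this model a file opened by uid 1000 keeps reporting uid 1000 through f_cred even after the task later switches to another uid, which is the "permissions at open time, not at use time" behaviour the message argues for.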
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-23 17:16 ` Greg Kroah-Hartman ` (2 preceding siblings ...) 2015-04-23 17:57 ` Linus Torvalds @ 2015-04-24 13:50 ` Lukasz Skalski 2015-04-24 14:19 ` Havoc Pennington 2015-04-27 21:32 ` Linus Torvalds 3 siblings, 2 replies; 333+ messages in thread From: Lukasz Skalski @ 2015-04-24 13:50 UTC (permalink / raw) To: Greg Kroah-Hartman, Andy Lutomirski Cc: Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni, linux-kernel Hi All, On 04/23/2015 07:16 PM, Greg Kroah-Hartman wrote: > On Thu, Apr 23, 2015 at 09:46:22AM -0700, Andy Lutomirski wrote: >> - There's still an open performance question. Namely: is kdbus performant? > > Yes, I thought that was already answered. Tizen posted some numbers > with a much older version of the code, before David fixed a bunch of > issues that he and you found, and that averaged between 25-50% faster. > Details are in this presentation: > http://download.tizen.org/misc/media/conference2014/slides/tdc2014-kdbus-in-tizen3.pdf > > The Tizen and GENIVI developers are off running numbers with the latest > code, or so they told me through emails, but I don't know when/if that > will ever happen, so I can't promise more than what is already here. > I'm working on kdbus support for GLib ([1],[2]). I saw some questions about kdbus performance, so I've prepared simple benchmark. Because David already has posted some comparison results between kdbus and UDS, I've decided to use my GLib port with native kdbus support (it should be noted, that this port is not finished yet and there are still some places for improvements, thus please do not treat these test results as final). 
To perform tests I've created two simple apps:

- server: http://fpaste.org/215157/
- client: http://fpaste.org/215156/

The first one (server) registers itself on the bus under well-known name
("com.test.app") and waits for calls to its objects and methods. The
second one (client) makes calls and records periods of time between
moment of preparing of a call to the moment of receiving an answer. The
measurement is made by performing 20000 of calls and computing a sum of
duration of every call (for two different sizes of message payload: 1000
and 10000 bytes). The client program returns total time of performed
calls after successful execution. All tests have been run on VirtualBox
with ArchLinux and latest version of systemd and kdbus.

The test results are following:

+--------------+--------------------+--------------------+
|              |    Elapsed time    |    Elapsed time    |
| Message size |  GLIB WITH NATIVE  | GLIB + DBUS-DAEMON |
|   [bytes]    |   KDBUS SUPPORT*   |                    |
+--------------+--------------------+--------------------+
|              |   1) 2.874264 s    |   1) 4.624631 s    |
|     1000     |   2) 2.932835 s    |   2) 4.669730 s    |
|              |   3) 2.899634 s    |   3) 4.747275 s    |
|              |   4) 2.970106 s    |   4) 4.725723 s    |
+--------------+--------------------+--------------------+
|              |   3) 3.182379 s    |   3) 5.469663 s    |
|    10000     |   3) 3.334170 s    |   3) 5.520757 s    |
|              |   3) 3.353305 s    |   3) 5.556374 s    |
|              |   3) 3.367732 s    |   3) 5.597758 s    |
+--------------+--------------------+--------------------+
*all tests performed without using memfd mechanism.

I hope it will be useful for someone :)

[1] https://github.com/lukasz-skalski/glib
[2] https://bugzilla.gnome.org/show_bug.cgi?id=721861

Cheers,
--
Lukasz Skalski
Samsung R&D Institute Poland
Samsung Electronics
l.skalski@samsung.com

^ permalink raw reply [flat|nested] 333+ messages in thread
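For readers who want the table above as per-call numbers, here is a small sketch that averages the four 1000-byte runs and derives a per-round-trip latency and a speedup ratio. The input values are copied from the table; the helper names (mean, per_call_us, speedup) are illustrative only and are not taken from the benchmark client.

```c
#include <stddef.h>

/* Total elapsed seconds for 20000 synchronous calls, 1000-byte payload. */
static const double kdbus_1000[4]  = { 2.874264, 2.932835, 2.899634, 2.970106 };
static const double daemon_1000[4] = { 4.624631, 4.669730, 4.747275, 4.725723 };

/* Arithmetic mean of n samples; 0.0 for an empty set. */
static double mean(const double *v, size_t n)
{
    double sum = 0.0;

    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return n ? sum / (double)n : 0.0;
}

/* Average cost of one round trip, in microseconds. */
static double per_call_us(const double *totals, size_t n, unsigned int calls)
{
    return mean(totals, n) / (double)calls * 1e6;
}

/* How many times faster the "fast" runs are than the "slow" runs. */
static double speedup(const double *slow, const double *fast, size_t n)
{
    return mean(slow, n) / mean(fast, n);
}
```

Applied to the 1000-byte rows, this works out to roughly 146 us per round trip with native kdbus against roughly 235 us through dbus-daemon, a ratio of about 1.6. As the message itself cautions, the port was unfinished and the runs were made inside VirtualBox, so these are indicative figures only.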
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-24 13:50 ` Lukasz Skalski @ 2015-04-24 14:19 ` Havoc Pennington 2015-04-24 14:34 ` Lukasz Skalski 2015-04-27 21:32 ` Linus Torvalds 1 sibling, 1 reply; 333+ messages in thread From: Havoc Pennington @ 2015-04-24 14:19 UTC (permalink / raw) To: Lukasz Skalski Cc: Greg Kroah-Hartman, Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Fri, Apr 24, 2015 at 9:50 AM, Lukasz Skalski <l.skalski@samsung.com> wrote: > - client: http://fpaste.org/215156/ > Cool - it might also be interesting to try this without blocking round trips, i.e. send requests as quickly as you can, and collect replies asynchronously. That's how people ideally use dbus. It should certainly reduce the total benchmark time, but just wondering if this usage increases or decreases the delta between userspace daemon and kdbus. Havoc ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-24 14:19 ` Havoc Pennington @ 2015-04-24 14:34 ` Lukasz Skalski 2015-04-24 19:25 ` Greg Kroah-Hartman 0 siblings, 1 reply; 333+ messages in thread From: Lukasz Skalski @ 2015-04-24 14:34 UTC (permalink / raw) To: Havoc Pennington Cc: Greg Kroah-Hartman, Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On 04/24/2015 04:19 PM, Havoc Pennington wrote: > On Fri, Apr 24, 2015 at 9:50 AM, Lukasz Skalski <l.skalski@samsung.com> wrote: >> - client: http://fpaste.org/215156/ >> > > Cool - it might also be interesting to try this without blocking round > trips, i.e. send requests as quickly as you can, and collect replies > asynchronously. That's how people ideally use dbus. It should > certainly reduce the total benchmark time, but just wondering if this > usage increases or decreases the delta between userspace daemon and > kdbus. No problem - I'll prepare also asynchronous version. > > Havoc > BR, -- Lukasz Skalski Samsung R&D Institute Poland Samsung Electronics l.skalski@samsung.com ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-24 14:34 ` Lukasz Skalski @ 2015-04-24 19:25 ` Greg Kroah-Hartman 2015-04-27 8:57 ` Lukasz Skalski 0 siblings, 1 reply; 333+ messages in thread From: Greg Kroah-Hartman @ 2015-04-24 19:25 UTC (permalink / raw) To: Lukasz Skalski Cc: Havoc Pennington, Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Fri, Apr 24, 2015 at 04:34:34PM +0200, Lukasz Skalski wrote: > On 04/24/2015 04:19 PM, Havoc Pennington wrote: > > On Fri, Apr 24, 2015 at 9:50 AM, Lukasz Skalski <l.skalski@samsung.com> wrote: > >> - client: http://fpaste.org/215156/ > >> > > > > Cool - it might also be interesting to try this without blocking round > > trips, i.e. send requests as quickly as you can, and collect replies > > asynchronously. That's how people ideally use dbus. It should > > certainly reduce the total benchmark time, but just wondering if this > > usage increases or decreases the delta between userspace daemon and > > kdbus. > > No problem - I'll prepare also asynchronous version. That would be great to see as well. Many thanks for doing this work. greg k-h ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-24 19:25 ` Greg Kroah-Hartman @ 2015-04-27 8:57 ` Lukasz Skalski 2015-04-27 17:18 ` Greg Kroah-Hartman 2015-04-27 22:29 ` David Lang 0 siblings, 2 replies; 333+ messages in thread From: Lukasz Skalski @ 2015-04-27 8:57 UTC (permalink / raw) To: Greg Kroah-Hartman, Havoc Pennington Cc: Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On 04/24/2015 09:25 PM, Greg Kroah-Hartman wrote: > On Fri, Apr 24, 2015 at 04:34:34PM +0200, Lukasz Skalski wrote: >> On 04/24/2015 04:19 PM, Havoc Pennington wrote: >>> On Fri, Apr 24, 2015 at 9:50 AM, Lukasz Skalski <l.skalski@samsung.com> wrote: >>>> - client: http://fpaste.org/215156/ >>>> >>> >>> Cool - it might also be interesting to try this without blocking round >>> trips, i.e. send requests as quickly as you can, and collect replies >>> asynchronously. That's how people ideally use dbus. It should >>> certainly reduce the total benchmark time, but just wondering if this >>> usage increases or decreases the delta between userspace daemon and >>> kdbus. >> >> No problem - I'll prepare also asynchronous version. > > That would be great to see as well. Many thanks for doing this work. As it was proposed by Havoc and Greg I've created simple benchmark for asynchronous calls: - server: http://fpaste.org/215157/ (the same as in the previous test) - client: http://fpaste.org/215724/ (asynchronous version) For asynchronous version of client I had to decrease number of calls to 128 (for synchronous version it was x20000 calls), otherwise we can exceed the maximum number of pending replies per connection. 
The test results are following:

+--------------+--------------------+--------------------+
|              |    Elapsed time    |    Elapsed time    |
| Message size |  GLIB WITH NATIVE  | GLIB + DBUS-DAEMON |
|   [bytes]    |   KDBUS SUPPORT*   |                    |
+--------------+--------------------+--------------------+
|              |   1) 0.018639 s    |   1) 0.029947 s    |
|     1000     |   2) 0.017045 s    |   2) 0.032812 s    |
|              |   3) 0.017490 s    |   3) 0.029971 s    |
|              |   4) 0.018001 s    |   4) 0.026485 s    |
+--------------+--------------------+--------------------+
|              |   3) 0.019898 s    |   3) 0.040914 s    |
|    10000     |   3) 0.022187 s    |   3) 0.033604 s    |
|              |   3) 0.020854 s    |   3) 0.037616 s    |
|              |   3) 0.020020 s    |   3) 0.033772 s    |
+--------------+--------------------+--------------------+
*all tests performed without using memfd mechanism.

And as I wrote in my previous mail, kdbus transport for GLib is not
finished yet and there are still some places for improvements, so please
do not treat these test results as final).

>
> greg k-h
>

Cheers,
--
Lukasz Skalski
Samsung R&D Institute Poland
Samsung Electronics
l.skalski@samsung.com

^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-27 8:57 ` Lukasz Skalski @ 2015-04-27 17:18 ` Greg Kroah-Hartman 2015-04-27 22:29 ` David Lang 1 sibling, 0 replies; 333+ messages in thread From: Greg Kroah-Hartman @ 2015-04-27 17:18 UTC (permalink / raw) To: Lukasz Skalski Cc: Havoc Pennington, Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Mon, Apr 27, 2015 at 10:57:45AM +0200, Lukasz Skalski wrote: > On 04/24/2015 09:25 PM, Greg Kroah-Hartman wrote: > > On Fri, Apr 24, 2015 at 04:34:34PM +0200, Lukasz Skalski wrote: > >> On 04/24/2015 04:19 PM, Havoc Pennington wrote: > >>> On Fri, Apr 24, 2015 at 9:50 AM, Lukasz Skalski <l.skalski@samsung.com> wrote: > >>>> - client: http://fpaste.org/215156/ > >>>> > >>> > >>> Cool - it might also be interesting to try this without blocking round > >>> trips, i.e. send requests as quickly as you can, and collect replies > >>> asynchronously. That's how people ideally use dbus. It should > >>> certainly reduce the total benchmark time, but just wondering if this > >>> usage increases or decreases the delta between userspace daemon and > >>> kdbus. > >> > >> No problem - I'll prepare also asynchronous version. > > > > That would be great to see as well. Many thanks for doing this work. > > As it was proposed by Havoc and Greg I've created simple benchmark for > asynchronous calls: > > - server: http://fpaste.org/215157/ (the same as in the previous test) > - client: http://fpaste.org/215724/ (asynchronous version) > > For asynchronous version of client I had to decrease number of calls to > 128 (for synchronous version it was x20000 calls), otherwise we can > exceed the maximum number of pending replies per connection. 
> > The test results are following: > > +--------------+--------------------+--------------------+ > | | Elapsed time | Elapsed time | > | Message size | GLIB WITH NATIVE | GLIB + DBUS-DAEMON | > | [bytes] | KDBUS SUPPORT* | | > +--------------+--------------------+--------------------+ > | | 1) 0.018639 s | 1) 0.029947 s | > | 1000 | 2) 0.017045 s | 2) 0.032812 s | > | | 3) 0.017490 s | 3) 0.029971 s | > | | 4) 0.018001 s | 4) 0.026485 s | > +--------------+--------------------+--------------------+ > | | 3) 0.019898 s | 3) 0.040914 s | > | 10000 | 3) 0.022187 s | 3) 0.033604 s | > | | 3) 0.020854 s | 3) 0.037616 s | > | | 3) 0.020020 s | 3) 0.033772 s | > +--------------+--------------------+--------------------+ > *all tests performed without using memfd mechanism. > > And as I wrote in my previous mail, kdbus transport for GLib is not > finished yet and there are still some places for improvements, so please > do not treat these test results as final). Very nice, thanks. Any chance you can bump those message sizes up to over 512k? I think that will show a huge difference. Even just under 512k should be faster, as you have shown, but I have been told that for messages larger than 512k, the D-Bus daemon has "issues", which has kept people from wanting to use messages that large before now. thanks again, greg k-h ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-27 8:57 ` Lukasz Skalski 2015-04-27 17:18 ` Greg Kroah-Hartman @ 2015-04-27 22:29 ` David Lang 2015-04-28 10:53 ` Lukasz Skalski 1 sibling, 1 reply; 333+ messages in thread From: David Lang @ 2015-04-27 22:29 UTC (permalink / raw) To: Lukasz Skalski Cc: Greg Kroah-Hartman, Havoc Pennington, Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Mon, 27 Apr 2015, Lukasz Skalski wrote: > Subject: Re: [GIT PULL] kdbus for 4.1-rc1 > > On 04/24/2015 09:25 PM, Greg Kroah-Hartman wrote: >> On Fri, Apr 24, 2015 at 04:34:34PM +0200, Lukasz Skalski wrote: >>> On 04/24/2015 04:19 PM, Havoc Pennington wrote: >>>> On Fri, Apr 24, 2015 at 9:50 AM, Lukasz Skalski <l.skalski@samsung.com> wrote: >>>>> - client: http://fpaste.org/215156/ >>>>> >>>> >>>> Cool - it might also be interesting to try this without blocking round >>>> trips, i.e. send requests as quickly as you can, and collect replies >>>> asynchronously. That's how people ideally use dbus. It should >>>> certainly reduce the total benchmark time, but just wondering if this >>>> usage increases or decreases the delta between userspace daemon and >>>> kdbus. >>> >>> No problem - I'll prepare also asynchronous version. >> >> That would be great to see as well. Many thanks for doing this work. > > As it was proposed by Havoc and Greg I've created simple benchmark for > asynchronous calls: > > - server: http://fpaste.org/215157/ (the same as in the previous test) > - client: http://fpaste.org/215724/ (asynchronous version) > > For asynchronous version of client I had to decrease number of calls to > 128 (for synchronous version it was x20000 calls), otherwise we can > exceed the maximum number of pending replies per connection. 
aren't we being told that part of the reason for needing kdbus is that thousands, or tens of thousands of messages are being spewed out? how does limiting it to 128 messages represent real-life if this is the case? David Lang > The test results are following: > > +--------------+--------------------+--------------------+ > | | Elapsed time | Elapsed time | > | Message size | GLIB WITH NATIVE | GLIB + DBUS-DAEMON | > | [bytes] | KDBUS SUPPORT* | | > +--------------+--------------------+--------------------+ > | | 1) 0.018639 s | 1) 0.029947 s | > | 1000 | 2) 0.017045 s | 2) 0.032812 s | > | | 3) 0.017490 s | 3) 0.029971 s | > | | 4) 0.018001 s | 4) 0.026485 s | > +--------------+--------------------+--------------------+ > | | 3) 0.019898 s | 3) 0.040914 s | > | 10000 | 3) 0.022187 s | 3) 0.033604 s | > | | 3) 0.020854 s | 3) 0.037616 s | > | | 3) 0.020020 s | 3) 0.033772 s | > +--------------+--------------------+--------------------+ > *all tests performed without using memfd mechanism. > > And as I wrote in my previous mail, kdbus transport for GLib is not > finished yet and there are still some places for improvements, so please > do not treat these test results as final). > >> >> greg k-h >> > > Cheers, > ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-27 22:29 ` David Lang @ 2015-04-28 10:53 ` Lukasz Skalski 0 siblings, 0 replies; 333+ messages in thread From: Lukasz Skalski @ 2015-04-28 10:53 UTC (permalink / raw) To: David Lang Cc: Greg Kroah-Hartman, Havoc Pennington, Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On 04/28/2015 12:29 AM, David Lang wrote: > On Mon, 27 Apr 2015, Lukasz Skalski wrote: > > aren't we being told that part of the reason for needing kdbus is that > thousands, or tens of thousands of messages are being spewed out? how > does limiting it to 128 messages represent real-life if this is the case? > AFAIK, at this moment some limits (like for example maximum number of queued requests waiting for a reply or ) for both - DBus daemon and kdbus, are the same (or at least quite similar). > David Lang > > -- Lukasz Skalski Samsung R&D Institute Poland Samsung Electronics l.skalski@samsung.com ^ permalink raw reply [flat|nested] 333+ messages in thread
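To put one number on the asynchronous results quoted in this sub-thread, the per-configuration means and their ratio can be computed directly from the table. The following is only a sketch in C: the sample values are copied from the quoted table, while the names mean() and slowdown() are made up for illustration and do not come from the benchmark sources.

```c
#include <stddef.h>

/* Elapsed times in seconds, copied from the asynchronous benchmark table
 * quoted in this thread (128 calls per run, four runs per configuration). */
static const double kdbus_1000[]   = {0.018639, 0.017045, 0.017490, 0.018001};
static const double daemon_1000[]  = {0.029947, 0.032812, 0.029971, 0.026485};
static const double kdbus_10000[]  = {0.019898, 0.022187, 0.020854, 0.020020};
static const double daemon_10000[] = {0.040914, 0.033604, 0.037616, 0.033772};

/* Arithmetic mean of n samples. */
static double mean(const double *samples, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += samples[i];
    return sum / (double)n;
}

/* Hypothetical helper: mean dbus-daemon time divided by mean kdbus time. */
static double slowdown(const double *daemon, const double *kdbus, size_t n)
{
    return mean(daemon, n) / mean(kdbus, n);
}
```

With these figures, slowdown(daemon_1000, kdbus_1000, 4) comes out at roughly 1.67 and slowdown(daemon_10000, kdbus_10000, 4) at roughly 1.76. Four runs of 128 calls each are a small sample, so these ratios are indicative at best.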
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-24 13:50 ` Lukasz Skalski
2015-04-24 14:19 ` Havoc Pennington
@ 2015-04-27 21:32 ` Linus Torvalds
2015-04-27 21:40 ` Andy Lutomirski
2015-04-28 10:39 ` Lukasz Skalski
1 sibling, 2 replies; 333+ messages in thread
From: Linus Torvalds @ 2015-04-27 21:32 UTC (permalink / raw)
To: Lukasz Skalski
Cc: Greg Kroah-Hartman, Andy Lutomirski, Andrew Morton, Arnd Bergmann,
Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina,
linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni
On Fri, Apr 24, 2015 at 6:50 AM, Lukasz Skalski <l.skalski@samsung.com> wrote:
>
> To perform tests I've created two simple apps:
>
> - server: http://fpaste.org/215157/
> - client: http://fpaste.org/215156/
So since Andy reported that dbus seems to be a few orders of magnitude
too slow, I tried to build these apps to see what it even does.
They don't build on F21. You seem to be using features that are too
new to exist even in fairly modern distros:
server.c:47:24: error: ‘G_BUS_TYPE_USER’ undeclared
so I can't even see what dbus does *now*.
That said, either you're running your test on a potato, or dbus is
seriously screwed up. No way should it take 4+ seconds to send a 1000b
message back and forth 20k times. But as mentioned, I can't even
see what it's doing right now.
Linus
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-27 21:32 ` Linus Torvalds
@ 2015-04-27 21:40 ` Andy Lutomirski
2015-04-27 22:00 ` Linus Torvalds
2015-04-28 10:39 ` Lukasz Skalski
1 sibling, 1 reply; 333+ messages in thread
From: Andy Lutomirski @ 2015-04-27 21:40 UTC (permalink / raw)
To: Linus Torvalds
Cc: Lukasz Skalski, Greg Kroah-Hartman, Andrew Morton, Arnd Bergmann,
Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina,
linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni
On Mon, Apr 27, 2015 at 2:32 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Fri, Apr 24, 2015 at 6:50 AM, Lukasz Skalski <l.skalski@samsung.com> wrote:
>>
>> To perform tests I've created two simple apps:
>>
>> - server: http://fpaste.org/215157/
>> - client: http://fpaste.org/215156/
>
> So since Andy reported that dbus seems to be a few orders of magnitude
> too slow, I tried to build these apps to see what it even does.
>
> They don't build on F21. You seem to be using features that are too
> new to exist even in fairly modern distros:
>
> server.c:47:24: error: ‘G_BUS_TYPE_USER’ undeclared
>
> so I can't even see what dbus does *now*.
Change "USER" to "SESSION". Build with:
gcc -Wall -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -o client client.c -lglib-2.0 -ldbus-glib-1 -ldbus-1 -lgobject-2.0 -lglib-2.0 -ldbus-1 -lgio-2.0
Then again with s/client/server/
For all I know, the USER vs SESSION distinction matters, but I can't
imagine why.
>
> That said, either you're running your test on a potato, or dbus is
> seriously screwed up. No way should it take 4+ seconds to send a 1000b
> message back and forth 20k times. But as mentioned, I can't even
> see what it's doing right now.
Whee! I'm typing this email on a potato!
--Andy
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-27 21:40 ` Andy Lutomirski
@ 2015-04-27 22:00 ` Linus Torvalds
2015-04-27 22:14 ` Linus Torvalds
2015-04-28 12:49 ` Havoc Pennington
0 siblings, 2 replies; 333+ messages in thread
From: Linus Torvalds @ 2015-04-27 22:00 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Lukasz Skalski, Greg Kroah-Hartman, Andrew Morton, Arnd Bergmann,
Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina,
linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni
On Mon, Apr 27, 2015 at 2:40 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>
> Change "USER" to "SESSION".
That works.
> Build with:
Hell no. I used
gcc client.c -o client $(pkg-config --cflags --libs gtk+-2.0)
instead. That worked.
>> That said, either you're running your test on a potato, or dbus is
>> seriously screwed up. No way should it take 4+ seconds to send a 1000b
>> message to back and forth 20k times. But as mentioned, I can't even
>> see what it's doing right now.
>
> Whee! I'm typing this email on a potato!
No, I think you're right, there's the other non-potato choice: "dbus
is seriously screwed up".
That thing has almost no kernel footprint. It's spending all its time
in user space overhead. Quite frankly, the whole "kdbus is important
for performance" seems to be *totally* invalidated by even a minimal
look at profiles for that thing.
Here's the top-15 offender list:
 2.62%  gdbus   libc-2.20.so             [.] _int_malloc
 2.43%  gdbus   libc-2.20.so             [.] free
 2.31%  server  libc-2.20.so             [.] free
 2.12%  gdbus   libc-2.20.so             [.] malloc
 1.77%  gdbus   libglib-2.0.so.0.4200.2  [.] g_utf8_validate
 1.43%  gdbus   libglib-2.0.so.0.4200.2  [.] g_slice_alloc
 1.41%  gdbus   libglib-2.0.so.0.4200.2  [.] g_hash_table_lookup
 1.28%  server  libc-2.20.so             [.] _int_malloc
 1.27%  gdbus   libglib-2.0.so.0.4200.2  [.] g_mutex_lock
 1.22%  gdbus   libglib-2.0.so.0.4200.2  [.] g_variant_unref
 1.16%  server  libc-2.20.so             [.] malloc
 1.14%  gdbus   libglib-2.0.so.0.4200.2  [.] g_bit_lock
 1.07%  gdbus   libglib-2.0.so.0.4200.2  [.] g_slice_free1
 1.05%  gdbus   libglib-2.0.so.0.4200.2  [.] g_bit_unlock
 1.01%  gdbus   libglib-2.0.so.0.4200.2  [.] g_mutex_unlock
there's not a kernel function in sight in the top-15, and it's all
just overhead. The above is from the server side, but the client looks
similar.
If somebody wants to speed up dbus, they should likely look at the
user-space code, not the kernel side.
My guess is that pretty much the entirety of the quoted kdbus "speedup"
isn't because it speeds up any kernel side thing, it's because it
avoids the user-space crap in the dbus server.
IOW, all the people who say that it's about avoiding context switches
are probably just full of shit. It's not about context switches, it's
about bad user-level code.
Linus
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-27 22:00 ` Linus Torvalds @ 2015-04-27 22:14 ` Linus Torvalds 2015-04-28 13:44 ` Havoc Pennington ` (3 more replies) 2015-04-28 12:49 ` Havoc Pennington 1 sibling, 4 replies; 333+ messages in thread From: Linus Torvalds @ 2015-04-27 22:14 UTC (permalink / raw) To: Andy Lutomirski Cc: Lukasz Skalski, Greg Kroah-Hartman, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Mon, Apr 27, 2015 at 3:00 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > IOW, all the people who say that it's about avoiding context switches > are probably just full of shit. It's not about context switches, it's > about bad user-level code. Just to make sure, I did a system-wide profile (so that you can actually see the overhead of context switching better), and that didn't change the picture. The scheduler overhead *might* be 1% or so. So really. The people who talk about how kdbus improves performance are just full of sh*t. Yes, it improves things, but the improvement seems to be 100% "incidental", in that it avoids a few trips down the user-space problems. The real problems seem to be in dbus memory management (suggestion: keep a small per-thread cache of those message allocations) and to a smaller degree in the crazy utf8 validation (why the f*ck does it do that anyway?), with some locking problems thrown in for good measure. Linus ^ permalink raw reply [flat|nested] 333+ messages in thread
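The suggestion above to "keep a small per-thread cache of those message allocations" could look roughly like the following. This is a minimal sketch in C11, not code from GLib, libdbus or any other D-Bus implementation: the names msg_buf_get() and msg_buf_put(), the 1024-byte buffer size and the cache depth of 16 are all assumptions chosen for illustration.

```c
#include <stdlib.h>

#define MSG_BUF_SIZE    1024  /* assumed fixed buffer size for small messages */
#define MSG_CACHE_SLOTS 16    /* assumed per-thread cache depth */

/* Per-thread stack of recycled buffers. No lock is needed because each
 * thread only ever touches its own copy of these variables. */
static _Thread_local void *msg_cache[MSG_CACHE_SLOTS];
static _Thread_local int msg_cache_count;

/* Return a buffer of MSG_BUF_SIZE bytes, reusing a cached one if possible.
 * Returns NULL only if the cache is empty and malloc() fails. */
void *msg_buf_get(void)
{
    if (msg_cache_count > 0)
        return msg_cache[--msg_cache_count];
    return malloc(MSG_BUF_SIZE);
}

/* Hand a buffer back; keep it for reuse unless the cache is already full. */
void msg_buf_put(void *buf)
{
    if (buf == NULL)
        return;
    if (msg_cache_count < MSG_CACHE_SLOTS)
        msg_cache[msg_cache_count++] = buf;
    else
        free(buf);
}
```

In the common case a get/put pair then costs a couple of loads and stores instead of a trip through malloc() and free(). The sketch leaves out two things a real implementation would need: releasing the cached buffers when a thread exits, and a separate path for messages larger than MSG_BUF_SIZE.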
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-27 22:14 ` Linus Torvalds @ 2015-04-28 13:44 ` Havoc Pennington 2015-04-28 14:48 ` Havoc Pennington 2015-06-22 17:33 ` Jindrich Makovicka ` (2 subsequent siblings) 3 siblings, 1 reply; 333+ messages in thread From: Havoc Pennington @ 2015-04-28 13:44 UTC (permalink / raw) To: Linus Torvalds Cc: Andy Lutomirski, Lukasz Skalski, Greg Kroah-Hartman, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Mon, Apr 27, 2015 at 6:14 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > The real problems seem to be in dbus memory management (suggestion: > keep a small per-thread cache of those message allocations) and to a > smaller degree in the crazy utf8 validation (why the f*ck does it do > that anyway?), with some locking problems thrown in for good measure. > I would say there are two distinct performance topics here. A. is the fixed overhead of various bindings (which may well vary a lot by binding). This is parsing, validation, allocation, locking, whatever. It tends to be "per message operation" (read/parse or marshal/write of a message). B. is how many of these "message operations" (read/parse, marshal/write) are happening. To make A*B smaller, one can reduce either A or B. The kdbus idea seems to be mostly about B, eliminating the bus daemon's read/parse and marshal/write, and reducing it to only one marshal/write by the sender and one read/parse by the recipient without the daemon in between. People have worked on A for clients, by doing the systemd binding for example, but perhaps they have been reluctant to work on the bus daemon itself to improve A for the bus because they felt solving B would involve eliminating the bus daemon anyway. If you are planning to solve B via kdbus, then optimizing the bus daemon itself would be a waste of time (A only matters for clients, not the bus, in kdbus world). 
That email I linked earlier (http://lists.freedesktop.org/archives/dbus/2012-March/015024.html ) has many suggestions on A for the bus daemon itself, but of course taking the bus daemon out of the equation would be more effective than any amount of optimizing it. A. is kind of a realm of many choices - there are tons of bindings, and people can decide if they want the convenient-but-malloc-happy glib ones, or the more traditional C style of systemd, or Python or Java or JavaScript or whatever ... this is an area where people can make the tradeoff they want. But everyone is "stuck" with the bus daemon (or kdbus) since it has to be shared among clients, of course. Havoc ^ permalink raw reply [flat|nested] 333+ messages in thread
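The A*B framing above can be written down as a toy cost model. The sketch below is in C and rests on a loud simplifying assumption that is not in the original mail: every message operation costs the same A, whereas the mail itself notes that A varies a lot by binding. The operation counts follow the description above (four operations per message through the bus daemon, two with kdbus); the names are hypothetical.

```c
/* Message operations per delivered message, following the description above. */
enum {
    OPS_WITH_DAEMON = 4, /* sender write, daemon read, daemon write, recipient read */
    OPS_WITH_KDBUS  = 2  /* sender write, recipient read */
};

/* Total cost = A (cost per message operation) * B (number of operations),
 * where B = messages * operations per message. Units are arbitrary. */
static double total_cost(double cost_per_op, unsigned messages,
                         unsigned ops_per_message)
{
    return cost_per_op * (double)messages * (double)ops_per_message;
}
```

Under that assumption, total_cost(1.0, 20000, OPS_WITH_DAEMON) is exactly twice total_cost(1.0, 20000, OPS_WITH_KDBUS), and halving A while keeping the daemon, as in total_cost(0.5, 20000, OPS_WITH_DAEMON), gives the same total as removing the daemon. That is the sense in which either A or B can be reduced to make A*B smaller.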
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-28 13:44 ` Havoc Pennington @ 2015-04-28 14:48 ` Havoc Pennington 2015-04-28 17:18 ` Theodore Ts'o 2015-04-28 17:19 ` David Lang 0 siblings, 2 replies; 333+ messages in thread From: Havoc Pennington @ 2015-04-28 14:48 UTC (permalink / raw) To: Linus Torvalds Cc: Andy Lutomirski, Lukasz Skalski, Greg Kroah-Hartman, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni btw if I can make a suggestion, it's quite confusing to talk about "dbus" unqualified when we are talking about implementation issues, since it muddles bus daemon vs. clients, and also since there are lots of implementations of the client bindings: http://www.freedesktop.org/wiki/Software/DBusBindings/ For the bus daemon, the only two implementations I know of are the original one (which uses libdbus as its binding) and kdbus, though. I would expect there's no question the bus daemon can be faster, maybe say 1.5x raw sockets instead of 2.5x, or whatever - something on that order. Should probably simply stipulate this for discussion purposes: "someone could optimize the crap out of the bus daemon". The kdbus question is about whether to eliminate this daemon entirely. Havoc ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-28 14:48 ` Havoc Pennington
@ 2015-04-28 17:18 ` Theodore Ts'o
2015-04-28 20:25 ` Havoc Pennington
2015-04-28 17:19 ` David Lang
1 sibling, 1 reply; 333+ messages in thread
From: Theodore Ts'o @ 2015-04-28 17:18 UTC (permalink / raw)
To: Havoc Pennington
Cc: Linus Torvalds, Andy Lutomirski, Lukasz Skalski, Greg Kroah-Hartman,
Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes,
Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann,
Djalal Harouni
On Tue, Apr 28, 2015 at 10:48:10AM -0400, Havoc Pennington wrote:
> btw if I can make a suggestion, it's quite confusing to talk about
> "dbus" unqualified when we are talking about implementation issues,
> since it muddles bus daemon vs. clients, and also since there are lots
> of implementations of the client bindings:
>
> http://www.freedesktop.org/wiki/Software/DBusBindings/
>
> For the bus daemon, the only two implementations I know of are the
> original one (which uses libdbus as its binding) and kdbus, though.
>
> I would expect there's no question the bus daemon can be faster, maybe
> say 1.5x raw sockets instead of 2.5x, or whatever - something on that
> order. Should probably simply stipulate this for discussion purposes:
> "someone could optimize the crap out of the bus daemon". The kdbus
> question is about whether to eliminate this daemon entirely.
So the question is if one of the justifications for moving the daemon
into kernel space is that its performance is crap, then I think it is
useful to determine whether a fully optimized userspace daemon would
be good enough.
After all, we can go down the Novell Netware path and push arbitrary
web servers, ldap servers, etc. all into the kernel on the excuse of
"the performance would be faster". But that begs the question of how
much performance improvement can be made purely in userspace, and
ignores all of the security and stability costs of moving more and
more code into the kernel.
So the question I have is why in the world do we want to be able to
support 1.5x raw sockets for a bus speed? What's the use case where
that kind of performance is required for a bus based system, and is
that a world we really want to live in? I find dbus to be extremely
hard to debug when my desktop starts doing things I don't want it to
do. The fact that it might be flinging around hundreds of thousands
of messages, and that this is something we want to encourage, doesn't
make me feel any more kindly inclined towards dbus or kdbus....
- Ted
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-28 17:18 ` Theodore Ts'o @ 2015-04-28 20:25 ` Havoc Pennington 2015-04-28 23:12 ` John Stoffel 0 siblings, 1 reply; 333+ messages in thread From: Havoc Pennington @ 2015-04-28 20:25 UTC (permalink / raw) To: Theodore Ts'o, Havoc Pennington, Linus Torvalds, Andy Lutomirski, Lukasz Skalski, Greg Kroah-Hartman, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Tue, Apr 28, 2015 at 1:18 PM, Theodore Ts'o <tytso@mit.edu> wrote: > So the question is if one of the justifications for moving the daemon > into kernel space is that it's performance is crap, then I think it is > useful to determine whether a fully optimized userspace daemon would > be good enough. > Yeah. I don't know how you answer that, because the answer is probably "it would be good enough for some things and not for other things." It depends on whether an app is sending enough data to be too slow, and it depends on the hardware, right. What I think we might know: the userspace:kernel time-to-send ratio should always be around 2:1, if both of them are similarly-implemented, because the userspace version has about 2x the work to do. The actual wall-clock time of course depends on the hardware and what's being sent. If there was a deviation from 2:1 in a benchmark, it might be because of implementation issues - so for example libdbus+dbus-daemon might be 3:1 or 5:1 to sd-dbus+kdbus, because sd-dbus isn't as bloated as libdbus, say. That isn't telling you anything about kernel vs. userspace architecture, the extra ratio above 2:1 is only telling you about userspace implementation quality. For purposes of deciding what to put in kernel - the differences between dbus client implementations (sd-dbus, libdbus, gdbus, etc.) seem like irrelevant noise to me. 
Re: the slippery slope to LDAP in the kernel - my questions would be things like 1) what are non-performance reasons to have dbus in the kernel, such as early boot or security considerations; 2) does LDAP in kernel give these kind of 2:1 gains; 3) is there a simpler way to get the 2:1 gain for dbus... Others can answer those better than I can. I _would_ say that dbus is more "generic" than something like LDAP; dbus is specific to the use-case of coordinating processes on a single machine, but it isn't specific to any particular application, and it's been used for lots of different applications. On my laptop, which is a pretty normal fedora 21 as far as I know: $ rpm -q --whatrequires 'libdbus-1.so.3()(64bit)' | wc -l 113 this omits anyone using a different binding, it's only libdbus users. > I find dbus to be extremely hard to debug when my desktop starts doing > things I don't want it to do. The fact that it might be flinging around hundreds > of thousands of messages, and that this is something we want to encourage, This particular argument doesn't resonate with me ... if dbus is hard to debug, it's not as if "ad hoc application-specific sidechannel somebody cooked up" is going to be easier. People aren't usually making up data to send around just because they can. If they need to send an audio stream, and dbus is too slow, they'll send it another ad hoc way, but it ultimately has to get sent. Same for most data, it is the size it is and it needs to go where it needs to go, for some what-the-user-wants-to-do kind of reason. If apps have to, they say "I'm sorry Dave I can't do that - you can't software-decode 4K video on your 300mhz ARM" - of course. Havoc ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-28 20:25 ` Havoc Pennington @ 2015-04-28 23:12 ` John Stoffel 2015-04-29 0:45 ` Havoc Pennington ` (2 more replies) 0 siblings, 3 replies; 333+ messages in thread From: John Stoffel @ 2015-04-28 23:12 UTC (permalink / raw) To: Havoc Pennington Cc: Theodore Ts'o, Linus Torvalds, Andy Lutomirski, Lukasz Skalski, Greg Kroah-Hartman, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni >>>>> "Havoc" == Havoc Pennington <hp@pobox.com> writes: Havoc> On Tue, Apr 28, 2015 at 1:18 PM, Theodore Ts'o <tytso@mit.edu> wrote: >> So the question is if one of the justifications for moving the daemon >> into kernel space is that it's performance is crap, then I think it is >> useful to determine whether a fully optimized userspace daemon would >> be good enough. >> Havoc> Yeah. I don't know how you answer that, because the answer is Havoc> probably "it would be good enough for some things and not for Havoc> other things." It depends on whether an app is sending enough Havoc> data to be too slow, and it depends on the hardware, right. So what happens if we put kdbus into the kernel and it's still too slow? What then? Havoc> What I think we might know: the userspace:kernel time-to-send Havoc> ratio should always be around 2:1, if both of them are Havoc> similarly-implemented, because the userspace version has about Havoc> 2x the work to do. I'm not sure I agree with this statement, just putting something into the kernel doesn't magically make the work go away, and the overhead people are talking about won't change if applications and libraries keep opening/closing the connection to the bus all the time. Havoc> The actual wall-clock time of course depends on the hardware Havoc> and what's being sent. 
Havoc> If there was a deviation from 2:1 in a benchmark, it might be Havoc> because of implementation issues - so for example Havoc> libdbus+dbus-daemon might be 3:1 or 5:1 to sd-dbus+kdbus, Havoc> because sd-dbus isn't as bloated as libdbus, say. That isn't Havoc> telling you anything about kernel vs. userspace architecture, Havoc> the extra ratio above 2:1 is only telling you about userspace Havoc> implementation quality. Which is also telling you that maybe userspace could be improved more, before it needs to even think about going into the kernel? Havoc> For purposes of deciding what to put in kernel - the Havoc> differences between dbus client implementations (sd-dbus, Havoc> libdbus, gdbus, etc.) seem like irrelevant noise to me. Havoc> Re: the slippery slope to LDAP in the kernel - my questions Havoc> would be things like 1) what are non-performance reasons to Havoc> have dbus in the kernel, such as early boot or security Havoc> considerations; 2) does LDAP in kernel give these kind of 2:1 Havoc> gains; 3) is there a simpler way to get the 2:1 gain for Havoc> dbus... Havoc> Others can answer those better than I can. Havoc> I _would_ say that dbus is more "generic" than something like Havoc> LDAP; dbus is specific to the use-case of coordinating Havoc> processes on a single machine, but it isn't specific to any Havoc> particular application, and it's been used for lots of Havoc> different applications. On my laptop, which is a pretty normal Havoc> fedora 21 as far as I know: LDAP is pretty damn generic, in that you can put pretty large objects into it, and pretty large OUs, etc. So why would it be a candidate for going into the kernel? And why is kdbus so important in the kernel as well? People have talked about it needing to be there for bootup, but isn't that why we ripped out RAID detection and such from the kernel and built initramfs, so that there's LESS in the kernel, and more in an early userspace? Same idea with dbus in my opinion. 
Havoc> $ rpm -q --whatrequires 'libdbus-1.so.3()(64bit)' | wc -l Havoc> 113 Havoc> this omits anyone using a different binding, it's only libdbus users. >> I find dbus to be extremely hard to debug when my desktop starts doing >> things I don't want it to do. The fact that it might be flinging around hundreds >> of thousands of messages, and that this is something we want to encourage, Havoc> This particular argument doesn't resonate with me ... if dbus Havoc> is hard to debug, it's not as if "ad hoc application-specific Havoc> sidechannel somebody cooked up" is going to be easier. When Ted is saying it's hard to debug... then maybe it's a bit crappy in design or implementation? Havoc> People aren't usually making up data to send around just because they Havoc> can. If they need to send an audio stream, and dbus is too slow, Havoc> they'll send it another ad hoc way, but it ultimately has to get sent. Havoc> Same for most data, it is the size it is and it needs to go where it Havoc> needs to go, for some what-the-user-wants-to-do kind of reason. Havoc> If apps have to, they say "I'm sorry Dave I can't do that - you Havoc> can't software-decode 4K video on your 300mhz ARM" - of course. So why DOES audio need to go via DBUS? What about video? Why shouldn't that go via dbus as well? If one userspace implementation is so crappy, why can't that implementation be tossed and a better one done? Or why can't they just optimize/tune it in userspace instead? John ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-28 23:12 ` John Stoffel @ 2015-04-29 0:45 ` Havoc Pennington 2015-04-29 11:33 ` Harald Hoyer 2015-04-29 12:47 ` Harald Hoyer 2 siblings, 0 replies; 333+ messages in thread From: Havoc Pennington @ 2015-04-29 0:45 UTC (permalink / raw) To: John Stoffel Cc: Theodore Ts'o, Linus Torvalds, Andy Lutomirski, Lukasz Skalski, Greg Kroah-Hartman, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Tue, Apr 28, 2015 at 7:12 PM, John Stoffel <john@stoffel.org> wrote: > Havoc> Yeah. I don't know how you answer that, because the answer is > Havoc> probably "it would be good enough for some things and not for > Havoc> other things." It depends on whether an app is sending enough > Havoc> data to be too slow, and it depends on the hardware, right. > > So what happens if we put kdbus into the kernel and it's still too > slow? What then? What my above paragraph was intended to mean is: I don't understand what it means to ask about a "too slow" fixed line here. Every time you make it substantively faster, it works for more apps or on slower hardware, presumably. You dial the speed, and you include or exclude certain app ideas accordingly. I think dbus works for lots of purposes now, despite being slow. Lots of people are using it. In many uses, super-slow-dbus might be 1% of the profile of whatever the user-visible functionality is, and nobody cares how fast dbus is. In other uses, they might. Some people are saying they would use it in more ways if it were faster and/or available in early boot and/or whatever else. I'm not those people, because right now I'm not working on dbus or anything using dbus. They would have to say what's 'fast enough' for them. "What happens if unix sockets are too slow? what then?" - it's not a coherent question. It's always relative to what you're trying to do, surely. 
> I'm not sure I agree with this statement, just putting something into > the kernel doesn't magically make the work go away The kdbus guys should really explain this. I have my understanding of it but theirs will be more accurate. > Which is also telling you that maybe userspace could be improved more, > before it needs to even think about going into the kernel? I imagine people have already improved the part of userspace they are thinking of keeping (sd-dbus, replacing libdbus) and they don't want to rewrite dbus-daemon only to immediately discard it. (The part of the "dbus" overall system which hasn't been rewritten and optimized is the daemon, which could be dropped completely in kdbus-world.) It's not especially mysterious what's slow about the existing daemon implementation, in my opinion; it's been the same for 10 years. The rough outline of speeding it up would be to replace libdbus with something sd-dbus-like, and then do a round of profiling and tuning. The 2012 email I linked to earlier had some other ideas. But this is a lot of work, it isn't "just" port to sd-dbus, the daemon is strongly entangled with libdbus right now. I don't blame people for being unmotivated on this if they believe it's a dead end. In that same 2012 email you'll notice I advised doing exactly what Linus suggests; do the userspace tuning rather than quote "arguing with kernel developers": http://lists.freedesktop.org/archives/dbus/2012-March/015024.html But I do admire that people felt kdbus was the right answer so have gone for it anyway, and I do think Linux as a complete OS (kernel+userspace) deserves a great answer in this problem space. > LDAP is pretty damn generic, in that you can put pretty large objects > into it, and pretty large OUs, etc. So why would it be a candidate > for going into the kernel? And why is kdbus so important in the > kernel as well? 
People have talked about it needing to be there for > bootup, but isn't that why we ripped out RAID detection and such from > the kernel and built initramfs, so that there's LESS in the kernel, > and more in an early userspace? Same idea with dbus in my opinion. > I don't have a well-developed philosophy on what should be in the kernel or not. That is something the kernel maintainers have to answer. My main concern here is that people understand what dbus is about historically, so they don't do silly stuff - whether cargo cult keeping a 'feature' that was always a bad idea, or speeding it up by breaking intentional and important semantics, or whatever. When I see people saying they don't understand what dbus is because they have no idea how a Linux workstation userspace is put together, that's something I can help with. When I see people saying maybe it isn't worth the complexity to put this in the kernel if it's only an N% speedup, I can see that, I'm not going to say that's wrong or right. It depends to me on what apps are enabled by the N%, or whether early boot and other factors are important. > When Ted is saying it's hard to debug... then maybe it's a bit crappy > in design or implementation? Or maybe he just doesn't know how to debug it, honestly. I find the kernel hard to debug because I know very little about it. I find the desktop simple to debug, at least as simple as debugging millions of lines of code can be. The difference is that I have never done kernel debugging and I'm already familiar with how the desktop works. dbus has tools that log every message and let you explore and introspect everything on it, etc. - it works for me. > So why DOES audio need to go via DBUS? What about video? Why > shouldn't that go via dbus as well? > > If one userspace implementation is so crappy, why can't that > implementation be tossed and a better one done? Or why can't they > just optimize/tune it in userspace instead? 
In this email I listed what I could remember app developers bringing up when told to use a sidechannel instead of dbus: http://article.gmane.org/gmane.linux.kernel/1935002 I can't speak to what makes sense for audio or video, but I'm sure people who work on those things could. Re: why can't it be done in userspace, the only thing I'd repeat again here is that when people mention ways to speed up the bus daemon in userspace, they often sound like they would abandon one or more of the semantic guarantees of dbus (usually ordering, sometimes things like the guaranteed-correct sender information or whatever). And _maybe_ some of those guarantees are worth abandoning, but I'd be very careful with it. Havoc ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-28 23:12 ` John Stoffel 2015-04-29 0:45 ` Havoc Pennington @ 2015-04-29 11:33 ` Harald Hoyer 2015-04-29 12:47 ` Harald Hoyer 2 siblings, 0 replies; 333+ messages in thread From: Harald Hoyer @ 2015-04-29 11:33 UTC (permalink / raw) To: John Stoffel, Havoc Pennington Cc: Theodore Ts'o, Linus Torvalds, Andy Lutomirski, Lukasz Skalski, Greg Kroah-Hartman, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On 29.04.2015 01:12, John Stoffel wrote: >>>>>> "Havoc" == Havoc Pennington <hp@pobox.com> writes: > > Havoc> On Tue, Apr 28, 2015 at 1:18 PM, Theodore Ts'o <tytso@mit.edu> wrote: >>> I find dbus to be extremely hard to debug when my desktop starts doing >>> things I don't want it to do. The fact that it might be flinging around hundreds >>> of thousands of messages, and that this is something we want to encourage, > > Havoc> This particular argument doesn't resonate with me ... if dbus > Havoc> is hard to debug, it's not as if "ad hoc application-specific > Havoc> sidechannel somebody cooked up" is going to be easier. > > When Ted is saying it's hard to debug... then maybe it's a bit crappy > in design or implementation? There is a very nice tool to debug the traffic for kdbus. http://lists.freedesktop.org/archives/dbus/2014-March/016178.html Also the patched wireshark makes it as easy as analyzing network traffic. ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-28 23:12 ` John Stoffel 2015-04-29 0:45 ` Havoc Pennington 2015-04-29 11:33 ` Harald Hoyer @ 2015-04-29 12:47 ` Harald Hoyer 2015-04-29 13:33 ` Richard Weinberger ` (5 more replies) 2 siblings, 6 replies; 333+ messages in thread From: Harald Hoyer @ 2015-04-29 12:47 UTC (permalink / raw) To: John Stoffel, Havoc Pennington Cc: Theodore Ts'o, Linus Torvalds, Andy Lutomirski, Lukasz Skalski, Greg Kroah-Hartman, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256 On 29.04.2015 01:12, John Stoffel wrote: > LDAP is pretty damn generic, in that you can put pretty large objects into > it, and pretty large OUs, etc. So why would it be a candidate for going > into the kernel? And why is kdbus so important in the kernel as well? > People have talked about it needing to be there for bootup, but isn't that > why we ripped out RAID detection and such from the kernel and built > initramfs, so that there's LESS in the kernel, and more in an early > userspace? Same idea with dbus in my opinion. Let me elaborate on the initramfs/shutdown situation a little bit more, because I have to deal with that every day. Because of the "let's move everything to userspace" sentiment we nowadays have the situation, that we need a lot of tools to setup the root device. Be it LVM on IMSM or iSCSI multipath, the initramfs has to setup the network (with bridging, bonding, etc.), the iSCSI connection, assemble the raid, the LVM, open crypto devices, etc... And if something goes wrong, you want to have a shell, see all the logs and debug things. Now over the time we moved away from simple shell scripts (without any logging) and static compiled special versions for the initramfs to a mini distribution in the initramfs, which simplifies maintenance and improves reliability. 
Basically you want to use the same tools in the initramfs (and shutdown) which you already have and use in your real root, with the same configuration files and the same interfaces and the same code paths. Therefore systemd is started in dracut created initramfs, which starts journald for logging. The same basic systemd targets exist in the initramfs as on the real root, so normally you don't have to cope with specialized versions for the initramfs. The target here is to have the same IPC mechanism from the very beginning to the very end. No crappy fallback mechanisms in case a daemon is not running or has crashed, no creepy transition from initramfs root to real root to shutdown root. We already have such transitions like: systemd, journald, mdmon [1], etc. systemd has to serialize itself, journald's file descriptors are transitioned over, mdmon jumps through hoops. Remember you want to get rid of open files and executables and have to reexec everything, if you transition from the initramfs root to the real root, and also from the real root to the shutdown root. We really don't want the IPC mechanism to be in a flux state. All tools have to fallback to a non-standard mechanism in that case. If I have to pull in a dbus daemon in the initramfs, we still have the chicken and egg problem for PID 1 talking to the logging daemon and starting dbus. systemd cannot talk to journald via dbus unless dbus-daemon is started, dbus cannot log anything on startup, if journald is not running, etc... dbus-daemon would have to transition to the real root, and from the real root to the shutdown root, without losing state. Of course this can all be done, but it would involve fallback mechanisms, which we want to get rid off. Hopefully, you don't suggest to merge dbus with PID 1. 
Also with a daemon, you will lose the points mentioned in the cover mail : * Security: The peers which communicate do not have to trust each other, as the only trustworthy component in the game is the kernel which adds metadata and ensures that all data passed as payload is either copied or sealed, so that the receiver can parse the data without having to protect against changing memory while parsing buffers. Also, all the data transfer is controlled by the kernel, so that LSMs can track and control what is going on, without involving userspace. Because of the LSM issue, security people are much happier with this model than the current scheme of having to hook into dbus to mediate things. * Being in the kernel closes a lot of races which can't be fixed with the current userspace solutions. For example, with kdbus, there is a way a client can disconnect from a bus, but do so only if no further messages present in its queue, which is crucial for implementing race-free "exit-on-idle" services * Eavesdropping on the kernel level, so privileged users can hook into the message stream without hacking support for that into their userspace processes * A number of smaller benefits: for example kdbus learned a way to peek full messages without dequeing them, which is really useful for logging metadata when handling bus-activation requests. I don't care, if the kdbus speedup is only marginal. In my ideal world, there is a standard IPC mechanism from the beginning to the end, which does not rely on any process running (except the kernel) and which is used by _all_ tools, be it a system daemon providing information and interfaces about device assembly or network setup tools or end user desktop processes. dbus _is_ such an easy, flexible standard IPC mechanism. Of course, you can invent the wheel again (NIH, "we know better") and wait and see, if that works out. 
Until then the whole common IPC problem is unresolved and Linux distributions are just a collection of random software with no common interoperability and home grown interfaces. [1] transitioning mdmon is one of the critical parts for an IMSM raid array. Also running an executable from a disk, which the executable is monitoring, and which stops functioning, if the executable is not responding is insane. Thanks for reading... -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAEBCAAGBQJVQNL1AAoJEANOs3ABTfJw0BUQAJgj2RNKR8L7xVPwH2GovmST nioOl6sg9u2m8NgYM6TJUJI3yHbOOiRVCTXHb9fmkTk/hBDxsT+X4lFevh0mDLJu Y5bk1RwGn8Ail3GLR6il9RhlNMKEqN2Ik4Ey26IdxQkOhIIAy9IfrNBdsdoNpJ7I P7qhP8J1DKfmIlgryrXy/mTZ1Nl1m6UlpMZHDSqlnPWuT/iJn0wORbs319fgAQx/ kkPvgSqTGkDetHGNzYmghgRzimNBR5ZftH0HS3Chq6rXPiSbdct/dE8VkQRiEWYo k6tE83qJr9KbSdBFqnbznVaOpTCQatdanVPBzzz4DTkuSKBlAxIbdXRaFsJCSnKp 7r+h8q+AgdALJXEyx5AyBeh8/dK1a/PsMzOtYZg6FXAz211geTxHeY8bTdOrzys9 kJGwUbbq4rIyvseEl53+Ugh2qZQptDKCj6F46H3iuhsOyUbPXzg1E7K8w2gApwSY L/eLEcQw+TApULyEhDrQqXlFBPz4vFP38mHNQ6T1Yt3sJuVoU12dOQNN6836htpe h4ijpaTbUkFV8b/7xgGqOlSBio4iSppybXfiBtHT7NBa4da1L+WG0xT+nR8RSMxd Gblt9ECZmbay6SIMYQBhntZD5Hs76iSJl0j2i9zg8E1pBw8O5w0jvlA02fOz2pkp wQsPrxNdlkBxFHVtf/3V =Dc23 -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-29 12:47 ` Harald Hoyer @ 2015-04-29 13:33 ` Richard Weinberger 2015-04-29 13:38 ` Harald Hoyer 2015-04-29 13:35 ` Stephen Smalley ` (4 subsequent siblings) 5 siblings, 1 reply; 333+ messages in thread From: Richard Weinberger @ 2015-04-29 13:33 UTC (permalink / raw) To: Harald Hoyer Cc: John Stoffel, Havoc Pennington, Theodore Ts'o, Linus Torvalds, Andy Lutomirski, Lukasz Skalski, Greg Kroah-Hartman, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Wed, Apr 29, 2015 at 2:47 PM, Harald Hoyer <harald@redhat.com> wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA256 > > On 29.04.2015 01:12, John Stoffel wrote: >> LDAP is pretty damn generic, in that you can put pretty large objects into >> it, and pretty large OUs, etc. So why would it be a candidate for going >> into the kernel? And why is kdbus so important in the kernel as well? >> People have talked about it needing to be there for bootup, but isn't that >> why we ripped out RAID detection and such from the kernel and built >> initramfs, so that there's LESS in the kernel, and more in an early >> userspace? Same idea with dbus in my opinion. > > Let me elaborate on the initramfs/shutdown situation a little bit more, > because I have to deal with that every day. > > Because of the "let's move everything to userspace" sentiment we nowadays > have the situation, that we need a lot of tools to setup the root device. > > Be it LVM on IMSM or iSCSI multipath, the initramfs has to setup the network > (with bridging, bonding, etc.), the iSCSI connection, assemble the raid, the > LVM, open crypto devices, etc... > And if something goes wrong, you want to have a shell, see all the logs and > debug things. None of these tools depend on dbus (_desktop_ bus). 
> Now over the time we moved away from simple shell scripts (without any > logging) and static compiled special versions for the initramfs to a mini > distribution in the initramfs, which simplifies maintenance and improves > reliability. > > Basically you want to use the same tools in the initramfs (and shutdown) > which you already have and use in your real root, with the same configuration > files and the same interfaces and the same code paths. > > Therefore systemd is started in dracut created initramfs, which starts > journald for logging. The same basic systemd targets exist in the initramfs > as on the real root, so normally you don't have to cope with specialized > versions for the initramfs. > > The target here is to have the same IPC mechanism from the very beginning to > the very end. No crappy fallback mechanisms in case a daemon is not running > or has crashed, no creepy transition from initramfs root to real root to > shutdown root. > > We already have such transitions like: systemd, journald, mdmon [1], etc. > systemd has to serialize itself, journald's file descriptors are transitioned > over, mdmon jumps through hoops. Remember you want to get rid of open files > and executables and have to reexec everything, if you transition from the > initramfs root to the real root, and also from the real root to the shutdown > root. > > We really don't want the IPC mechanism to be in a flux state. All tools have > to fallback to a non-standard mechanism in that case. > > If I have to pull in a dbus daemon in the initramfs, we still have the > chicken and egg problem for PID 1 talking to the logging daemon and starting > dbus. > systemd cannot talk to journald via dbus unless dbus-daemon is started, dbus > cannot log anything on startup, if journald is not running, etc... The only reason why you need dbus in your initramfs is because of systemd and its tools, isn't it? 
> In my ideal world, there is a standard IPC mechanism from the beginning to > the end, which does not rely on any process running (except the kernel) and > which is used by _all_ tools, be it a system daemon providing information and > interfaces about device assembly or network setup tools or end user desktop > processes. It depends how you define "beginning". To me an initramfs is a *very* minimal tool to prepare the rootfs and nothing more (no udev, no systemd, no "mini distro"). If the initramfs fails to do its job it can print to the console like the kernel does if it fails at a very early stage. -- Thanks, //richard ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-29 13:33 ` Richard Weinberger
@ 2015-04-29 13:38 ` Harald Hoyer
2015-04-29 13:46 ` Richard Weinberger
2015-04-29 16:26 ` John Stoffel
0 siblings, 2 replies; 333+ messages in thread
From: Harald Hoyer @ 2015-04-29 13:38 UTC (permalink / raw)
To: Richard Weinberger; +Cc: linux-kernel
On 29.04.2015 15:33, Richard Weinberger wrote:
> It depends how you define "beginning". To me an initramfs is a *very* minimal
> tool to prepare the rootfs and nothing more (no udev, no systemd, no
> "mini distro").
> If the initramfs fails to do its job it can print to the console like
> the kernel does if it fails
> at a very early stage.
>
Your solution might work for your small personal needs, but not for our customers.
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-29 13:38 ` Harald Hoyer
@ 2015-04-29 13:46 ` Richard Weinberger
2015-04-29 14:01 ` Harald Hoyer
2015-04-29 16:26 ` John Stoffel
1 sibling, 1 reply; 333+ messages in thread
From: Richard Weinberger @ 2015-04-29 13:46 UTC (permalink / raw)
To: Harald Hoyer; +Cc: linux-kernel
Am 29.04.2015 um 15:38 schrieb Harald Hoyer:
> On 29.04.2015 15:33, Richard Weinberger wrote:
>> It depends how you define "beginning". To me an initramfs is a *very* minimal
>> tool to prepare the rootfs and nothing more (no udev, no systemd, no
>> "mini distro").
>> If the initramfs fails to do its job it can print to the console like
>> the kernel does if it fails
>> at a very early stage.
>>
>
> Your solution might work for your small personal needs, but not for our customers.
Correct, I don't know your customers, all I know are my customers. :-)
What feature do your customers need?
I mean, I fully agree with you that an initramfs must not fail silently
but how does dbus help there? If it fails to mount the rootfs there is not
much it can do.
Thanks,
//richard
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-29 13:46 ` Richard Weinberger @ 2015-04-29 14:01 ` Harald Hoyer 2015-04-29 14:04 ` Richard Weinberger 0 siblings, 1 reply; 333+ messages in thread From: Harald Hoyer @ 2015-04-29 14:01 UTC (permalink / raw) To: Richard Weinberger; +Cc: linux-kernel On 29.04.2015 15:46, Richard Weinberger wrote: > Am 29.04.2015 um 15:38 schrieb Harald Hoyer: >> On 29.04.2015 15:33, Richard Weinberger wrote: >>> It depends how you define "beginning". To me an initramfs is a *very* minimal >>> tool to prepare the rootfs and nothing more (no udev, no systemd, no >>> "mini distro"). >>> If the initramfs fails to do its job it can print to the console like >>> the kernel does if it fails >>> at a very early stage. >>> >> >> Your solution might work for your small personal needs, but not for our customers. > > Correct, I don't know your customers, all I know are my customers. :-) > > What feature do your customers need? > I mean, I fully agree with you that an initramfs must not fail silently > but how does dbus help there? If it fails to mount the rootfs there is not > much it can do. > > Thanks, > //richard > We don't handcraft the initramfs script for every our customers, therefore we have to generically support hotplug, persistent device names, persistent interface names, network connectivity in the initramfs, user input handling for passwords, fonts, keyboard layouts, fips, fsck, repair tools for file systems, raid assembly, LVM assembly, multipath, crypto devices, live images, iSCSI, FCoE, all kinds of filesystems with their quirks, IBM z-series support, resume from hibernation, […] And no, this is not a simple minimal tool. ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-29 14:01 ` Harald Hoyer @ 2015-04-29 14:04 ` Richard Weinberger 2015-04-29 14:11 ` Harald Hoyer 0 siblings, 1 reply; 333+ messages in thread From: Richard Weinberger @ 2015-04-29 14:04 UTC (permalink / raw) To: Harald Hoyer; +Cc: linux-kernel Am 29.04.2015 um 16:01 schrieb Harald Hoyer: > On 29.04.2015 15:46, Richard Weinberger wrote: >> Am 29.04.2015 um 15:38 schrieb Harald Hoyer: >>> On 29.04.2015 15:33, Richard Weinberger wrote: >>>> It depends how you define "beginning". To me an initramfs is a *very* minimal >>>> tool to prepare the rootfs and nothing more (no udev, no systemd, no >>>> "mini distro"). >>>> If the initramfs fails to do its job it can print to the console like >>>> the kernel does if it fails >>>> at a very early stage. >>>> >>> >>> Your solution might work for your small personal needs, but not for our customers. >> >> Correct, I don't know your customers, all I know are my customers. :-) >> >> What feature do your customers need? >> I mean, I fully agree with you that an initramfs must not fail silently >> but how does dbus help there? If it fails to mount the rootfs there is not >> much it can do. >> >> Thanks, >> //richard >> > > We don't handcraft the initramfs script for every our customers, therefore we > have to generically support hotplug, persistent device names, persistent > interface names, network connectivity in the initramfs, user input handling for > passwords, fonts, keyboard layouts, fips, fsck, repair tools for file systems, > raid assembly, LVM assembly, multipath, crypto devices, live images, iSCSI, > FCoE, all kinds of filesystems with their quirks, IBM z-series support, resume > from hibernation, […] This is correct. But which of these tools/features depend on dbus? Thanks, //richard P.s: Please don't drop the CC list. ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-29 14:04 ` Richard Weinberger @ 2015-04-29 14:11 ` Harald Hoyer 2015-04-29 14:18 ` Richard Weinberger 2015-04-29 14:46 ` Austin S Hemmelgarn 0 siblings, 2 replies; 333+ messages in thread From: Harald Hoyer @ 2015-04-29 14:11 UTC (permalink / raw) To: Richard Weinberger; +Cc: linux-kernel On 29.04.2015 16:04, Richard Weinberger wrote: > Am 29.04.2015 um 16:01 schrieb Harald Hoyer: >> On 29.04.2015 15:46, Richard Weinberger wrote: >>> Am 29.04.2015 um 15:38 schrieb Harald Hoyer: >>>> On 29.04.2015 15:33, Richard Weinberger wrote: >>>>> It depends how you define "beginning". To me an initramfs is a *very* minimal >>>>> tool to prepare the rootfs and nothing more (no udev, no systemd, no >>>>> "mini distro"). >>>>> If the initramfs fails to do its job it can print to the console like >>>>> the kernel does if it fails >>>>> at a very early stage. >>>>> >>>> >>>> Your solution might work for your small personal needs, but not for our customers. >>> >>> Correct, I don't know your customers, all I know are my customers. :-) >>> >>> What feature do your customers need? >>> I mean, I fully agree with you that an initramfs must not fail silently >>> but how does dbus help there? If it fails to mount the rootfs there is not >>> much it can do. >>> >>> Thanks, >>> //richard >>> >> >> We don't handcraft the initramfs script for every our customers, therefore we >> have to generically support hotplug, persistent device names, persistent >> interface names, network connectivity in the initramfs, user input handling for >> passwords, fonts, keyboard layouts, fips, fsck, repair tools for file systems, >> raid assembly, LVM assembly, multipath, crypto devices, live images, iSCSI, >> FCoE, all kinds of filesystems with their quirks, IBM z-series support, resume >> from hibernation, […] > > This is correct. But which of these tools/features depend on dbus? 
I would love to add dbus support to all of them and use it, so I can connect
them all more easily. No need for them to invent their own version of IPC,
which can only be used by their own tool set.
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-29 14:11 ` Harald Hoyer
@ 2015-04-29 14:18 ` Richard Weinberger
2015-04-29 14:53 ` Harald Hoyer
2015-04-29 14:46 ` Austin S Hemmelgarn
1 sibling, 1 reply; 333+ messages in thread
From: Richard Weinberger @ 2015-04-29 14:18 UTC (permalink / raw)
To: Harald Hoyer; +Cc: linux-kernel
Am 29.04.2015 um 16:11 schrieb Harald Hoyer:
>>> We don't handcraft the initramfs script for every our customers, therefore we
>>> have to generically support hotplug, persistent device names, persistent
>>> interface names, network connectivity in the initramfs, user input handling for
>>> passwords, fonts, keyboard layouts, fips, fsck, repair tools for file systems,
>>> raid assembly, LVM assembly, multipath, crypto devices, live images, iSCSI,
>>> FCoE, all kinds of filesystems with their quirks, IBM z-series support, resume
>>> from hibernation, […]
>>
>> This is correct. But which of these tools/features depend on dbus?
>
> I would love to add dbus support to all of them and use it, so I can connect
> them all more easily. No need for them to invent their own version of IPC,
> which can only be used by their own tool set.
Why/how do you need to connect them?
Sorry for being persistent but as I use most of these tools too (also in initramfs)
I'm very curious.
Many of us grumpy kernel devs simply don't know all the use cases you have to cover.
So, please explain. :-)
Thanks,
//richard
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-29 14:18 ` Richard Weinberger @ 2015-04-29 14:53 ` Harald Hoyer 2015-04-29 14:58 ` Richard Weinberger 2015-04-29 15:03 ` Theodore Ts'o 0 siblings, 2 replies; 333+ messages in thread From: Harald Hoyer @ 2015-04-29 14:53 UTC (permalink / raw) To: Richard Weinberger; +Cc: linux-kernel On 29.04.2015 16:18, Richard Weinberger wrote: > Am 29.04.2015 um 16:11 schrieb Harald Hoyer: >>>> We don't handcraft the initramfs script for every our customers, therefore we >>>> have to generically support hotplug, persistent device names, persistent >>>> interface names, network connectivity in the initramfs, user input handling for >>>> passwords, fonts, keyboard layouts, fips, fsck, repair tools for file systems, >>>> raid assembly, LVM assembly, multipath, crypto devices, live images, iSCSI, >>>> FCoE, all kinds of filesystems with their quirks, IBM z-series support, resume >>>> from hibernation, […] >>> >>> This is correct. But which of these tools/features depend on dbus? >> >> I would love to add dbus support to all of them and use it, so I can connect >> them all more easily. No need for them to invent their own version of IPC, >> which can only be used by their own tool set. > > Why/how do you need to connect them? > Sorry for being persistent but as I use most of these tools too (also in initramfs) > I'm very curious. > > Many of us grumpy kernel devs simply don't know all the use case of you have to cover. > So, please explain. :-) > Well, using shell scripts I connected all of these tools in the earlier versions of dracut [1]. Been there, done that. When using bash to wait for an interface to come up [2] or doing dhcp [3], the (at least my) pain threshold is reached, and you want something more sophisticated. So, one starts eyeing NetworkManager or systemd-networkd. Both of them have CLI tools and helpers and these tools and helpers talk to each other with (guess what?) 
an IPC mechanism, which happens to be DBUS (because it's the IPC of choice, if you don't want to reinvent the wheel). But let's not pinpoint that to network alone. Parsing output of tools with shell scripts is horrible, slow, fragile, error prone. Sure, I can write one binary to rule them all, pull out all the code from all tools I need, but for me an IPC mechanism sounds a lot better. And it should be _one_ common IPC mechanism and not a plethora of them. It should feel like an operating system and not like a bunch of thrown together software, which is glued together with some magic shell scripts. [1] https://dracut.wiki.kernel.org/index.php/Main_Page [2] http://git.kernel.org/cgit/boot/dracut/dracut.git/tree/modules.d/40network/net-lib.sh#n483 [3] http://pkgs.fedoraproject.org/cgit/dhcp.git/tree/dhclient-script ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-29 14:53 ` Harald Hoyer @ 2015-04-29 14:58 ` Richard Weinberger 2015-04-29 15:03 ` Theodore Ts'o 1 sibling, 0 replies; 333+ messages in thread From: Richard Weinberger @ 2015-04-29 14:58 UTC (permalink / raw) To: Harald Hoyer; +Cc: linux-kernel Am 29.04.2015 um 16:53 schrieb Harald Hoyer: > On 29.04.2015 16:18, Richard Weinberger wrote: >> Am 29.04.2015 um 16:11 schrieb Harald Hoyer: >>>>> We don't handcraft the initramfs script for every our customers, therefore we >>>>> have to generically support hotplug, persistent device names, persistent >>>>> interface names, network connectivity in the initramfs, user input handling for >>>>> passwords, fonts, keyboard layouts, fips, fsck, repair tools for file systems, >>>>> raid assembly, LVM assembly, multipath, crypto devices, live images, iSCSI, >>>>> FCoE, all kinds of filesystems with their quirks, IBM z-series support, resume >>>>> from hibernation, […] >>>> >>>> This is correct. But which of these tools/features depend on dbus? >>> >>> I would love to add dbus support to all of them and use it, so I can connect >>> them all more easily. No need for them to invent their own version of IPC, >>> which can only be used by their own tool set. >> >> Why/how do you need to connect them? >> Sorry for being persistent but as I use most of these tools too (also in initramfs) >> I'm very curious. >> >> Many of us grumpy kernel devs simply don't know all the use case of you have to cover. >> So, please explain. :-) >> > > Well, using shell scripts I connected all of these tools in the earlier > versions of dracut [1]. Been there, done that. > > When using bash to wait for an interface to come up [2] or doing dhcp [3], the > (at least my) pain threshold is reached, and you want something more sophisticated. > > So, one starts eyeing NetworkManager or systemd-networkd. Both of them have CLI > tools and helpers and these tools and helpers talk to each other with (guess > what?) 
an IPC mechanism, which happens to be DBUS (because it's the IPC of > choice, if you don't want to reinvent the wheel). > > But let's not pinpoint that to network alone. Parsing output of tools with > shell scripts is horrible, slow, fragile, error prone. So, you want to replace bash by dbus? I'll stop now with arguing. Let's agree to disagree. > Sure, I can write one binary to rule them all, pull out all the code from all > tools I need, but for me an IPC mechanism sounds a lot better. And it should be > _one_ common IPC mechanism and not a plethora of them. It should feel like an > operating system and not like a bunch of thrown together software, which is > glued together with some magic shell scripts. This is how UNIX works. ;) Thanks, //richard ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-29 14:53 ` Harald Hoyer 2015-04-29 14:58 ` Richard Weinberger @ 2015-04-29 15:03 ` Theodore Ts'o 2015-04-29 15:21 ` Austin S Hemmelgarn 2015-04-29 16:25 ` Martin Steigerwald 1 sibling, 2 replies; 333+ messages in thread From: Theodore Ts'o @ 2015-04-29 15:03 UTC (permalink / raw) To: Harald Hoyer; +Cc: Richard Weinberger, linux-kernel On Wed, Apr 29, 2015 at 04:53:53PM +0200, Harald Hoyer wrote: > Sure, I can write one binary to rule them all, pull out all the code from all > tools I need, but for me an IPC mechanism sounds a lot better. And it should be > _one_ common IPC mechanism and not a plethora of them. It should feel like an > operating system and not like a bunch of thrown together software, which is > glued together with some magic shell scripts. And so requiring wireshark (and X?) in initramfs to debug problems once dbus is introduced is better? I would think shell scripts are *easier* to debug when things go wrong, especially in a minimal environment such as an initial ram disk. Having had to debug problems in a distro initramfs when trying to help a customer bring up a FC boot disk long ago in another life, I'm certain I would rather debug problems while on site at a classified machine room[1] using shell scripts, and trying to debug dbus is something that would be infinitely worse. - Ted [1] So no laptop, no google, no access to sources to figure out random dbus messages, etc. ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-29 15:03 ` Theodore Ts'o
@ 2015-04-29 15:21 ` Austin S Hemmelgarn
2015-04-30 9:05 ` Łukasz Stelmach
2015-04-29 16:25 ` Martin Steigerwald
1 sibling, 1 reply; 333+ messages in thread
From: Austin S Hemmelgarn @ 2015-04-29 15:21 UTC (permalink / raw)
To: Theodore Ts'o, Harald Hoyer, Richard Weinberger, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 1487 bytes --]
On 2015-04-29 11:03, Theodore Ts'o wrote:
> On Wed, Apr 29, 2015 at 04:53:53PM +0200, Harald Hoyer wrote:
>> Sure, I can write one binary to rule them all, pull out all the code from all
>> tools I need, but for me an IPC mechanism sounds a lot better. And it should be
>> _one_ common IPC mechanism and not a plethora of them. It should feel like an
>> operating system and not like a bunch of thrown together software, which is
>> glued together with some magic shell scripts.
>
> And so requiring wireshark (and X?) in initramfs to debug problems
> once dbus is introduced is better?
>
> I would think shell scripts are *easier* to debug when things go
> wrong, especially in a minimal environment such as an initial ram
> disk. Having had to debug problems in a distro initramfs when trying
> to help a customer bring up a FC boot disk long ago in another life,
> I'm certain I would rather debug problems while on site at a
> classified machine room[1] using shell scripts, and trying to debug
> dbus is something that would be infinitely worse.
>
> - Ted
>
> [1] So no laptop, no google, no access to sources to figure out random
> dbus messages, etc.
Likewise.
I keep hearing from people that shell scripting is hard. It really
isn't, compared to a number of other scripting languages; you just need
to actually learn to do it right (which is getting more and more
difficult these days because fewer and fewer CS schools are teaching
Unix).
[-- Attachment #2: S/MIME Cryptographic Signature --] [-- Type: application/pkcs7-signature, Size: 2967 bytes --] ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-29 15:21 ` Austin S Hemmelgarn
@ 2015-04-30 9:05 ` Łukasz Stelmach
2015-04-30 9:12 ` Richard Weinberger
0 siblings, 1 reply; 333+ messages in thread
From: Łukasz Stelmach @ 2015-04-30 9:05 UTC (permalink / raw)
To: Austin S Hemmelgarn
Cc: Theodore Ts'o, Harald Hoyer, Richard Weinberger, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2266 bytes --]

It was <2015-04-29 Wed 17:21>, when Austin S Hemmelgarn wrote:
> On 2015-04-29 11:03, Theodore Ts'o wrote:
>> On Wed, Apr 29, 2015 at 04:53:53PM +0200, Harald Hoyer wrote:
>>> Sure, I can write one binary to rule them all, pull out all the code from all
>>> tools I need, but for me an IPC mechanism sounds a lot better. And it should be
>>> _one_ common IPC mechanism and not a plethora of them. It should feel like an
>>> operating system and not like a bunch of thrown together software, which is
>>> glued together with some magic shell scripts.
>>
>> And so requiring wireshark (and X?) in initramfs to debug problems
>> once dbus is introduced is better?
>>
>> I would think shell scripts are *easier* to debug when things go
>> wrong, [...]
> I keep hearing from people that shell scripting is hard, it really
> isn't compared to a number of other scripting languages, you just need
> to actually learn to do it right (which is getting more and more
> difficult these days cause fewer and fewer CS schools are teaching
> Unix).

My 2/100 of a currency of your choice.

As much as I like(d) shell scripts as a boot-up tool and disliked the
obscure boot-up procedures of some operating systems, I can't help but
notice that GNU/Linux distributions have become very
sophisticated/complicated (cross out if not applicable). Personally I
feel that this degree of complexity can't be supported by shell scripts
piping data around. It does not scale. I am not 100% sure a new IPC is
the answer, simply because I do not have the experience to be sure. It
definitely can be, and the problem, as I see it, is real.

(The alternative answer is PowerShell's capability to pipe objects. I
don't like it and I think it's not a full answer.)

Regardless of initrd issues, I feel there is a need for a local IPC that
is more capable than UDS. Linus Torvalds is probably right that
dbus-daemon is anything but efficient. I disagree, however, that it
can be optimised and therefore solve *all* issues kdbus is trying to
address. dbus-daemon, by design, can't do some things. It can't transmit
large payloads without copying them. It can't be made race-free.

Kind regards,
--
Łukasz Stelmach
Samsung R&D Institute Poland
Samsung Electronics

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 472 bytes --]
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-30 9:05 ` Łukasz Stelmach
@ 2015-04-30 9:12 ` Richard Weinberger
2015-04-30 10:19 ` Łukasz Stelmach
0 siblings, 1 reply; 333+ messages in thread
From: Richard Weinberger @ 2015-04-30 9:12 UTC (permalink / raw)
To: Łukasz Stelmach, Austin S Hemmelgarn
Cc: Theodore Ts'o, Harald Hoyer, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 817 bytes --]

On 30.04.2015 at 11:05, Łukasz Stelmach wrote:
> Regardless of initrd issues, I feel there is a need for a local IPC that
> is more capable than UDS. Linus Torvalds is probably right that
> dbus-daemon is anything but efficient. I disagree, however, that it
> can be optimised and therefore solve *all* issues kdbus is trying to
> address. dbus-daemon, by design, can't do some things. It can't transmit
> large payloads without copying them. It can't be made race-free.

This is true.
But as long as dbus-daemon is not optimized as much as possible, there is
no reason to force push kdbus.
Once dbus-daemon exploits all kernel interfaces as much as it can and
still needs work (be it performance or other stuff), we can think of new
kernel features which can help dbus-daemon.

Thanks,
//richard

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-30 9:12 ` Richard Weinberger
@ 2015-04-30 10:19 ` Łukasz Stelmach
2015-04-30 10:40 ` Richard Weinberger
0 siblings, 1 reply; 333+ messages in thread
From: Łukasz Stelmach @ 2015-04-30 10:19 UTC (permalink / raw)
To: Richard Weinberger
Cc: Austin S Hemmelgarn, Theodore Ts'o, Harald Hoyer, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1537 bytes --]

It was <2015-04-30 Thu 11:12>, when Richard Weinberger wrote:
> On 30.04.2015 at 11:05, Łukasz Stelmach wrote:
>> Regardless of initrd issues, I feel there is a need for a local IPC
>> that is more capable than UDS. Linus Torvalds is probably right that
>> dbus-daemon is anything but efficient. I disagree, however, that
>> it can be optimised and therefore solve *all* issues kdbus is trying
>> to address. dbus-daemon, by design, can't do some things. It can't
>> transmit large payloads without copying them. It can't be made
>> race-free.
>
> This is true.
> But as long as dbus-daemon is not optimized as much as possible, there is
> no reason to force push kdbus.
> Once dbus-daemon exploits all kernel interfaces as much as it can and
> still needs work (be it performance or other stuff), we can think
> of new kernel features which can help dbus-daemon.

I may not be well informed about kernel interfaces, but there are some
use cases no dbus-daemon optimisation can make work properly because of
race conditions introduced by the user-space based message router.

For example, a service can't acquire credentials of a client process that
actually sent a request (it can, but it can't trust them). The service
can't be protected by LSM on a bus that is driven by dbus-daemon. Yes,
dbus-daemon can check the client's and service's labels and enforce a
policy, but it is going to be the daemon and not the LSM code in the
kernel.

--
Łukasz Stelmach
Samsung R&D Institute Poland
Samsung Electronics

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 472 bytes --]
^ permalink raw reply [flat|nested] 333+ messages in thread
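The credentials point in the message above can be made concrete. On a direct AF_UNIX connection, a service can ask the kernel who is on the other end of the socket with the SO_PEERCRED socket option. The following is a minimal sketch, assuming Linux with glibc; `peer_credentials` is a hypothetical helper name used only for illustration and does not come from the thread or from any D-Bus implementation.

```c
#define _GNU_SOURCE             /* struct ucred is a GNU extension in glibc */
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: fill *cred with the pid, uid and gid the kernel
 * recorded for the peer of the connected AF_UNIX socket fd.
 * Returns 0 on success and -1 on error.
 */
static int peer_credentials(int fd, struct ucred *cred)
{
    socklen_t len = sizeof(*cred);

    if (getsockopt(fd, SOL_SOCKET, SO_PEERCRED, cred, &len) < 0)
        return -1;

    /* Reject a short result rather than trusting partially filled data. */
    return len == sizeof(*cred) ? 0 : -1;
}
```

On a bus routed by dbus-daemon, the service's socket is connected to the daemon, so the same call would describe the daemon rather than the original sender; the sender's identity then has to be looked up and relayed in user space, which is the trust problem raised above.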
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-30 10:19 ` Łukasz Stelmach
@ 2015-04-30 10:40 ` Richard Weinberger
2015-04-30 12:16 ` Łukasz Stelmach
0 siblings, 1 reply; 333+ messages in thread
From: Richard Weinberger @ 2015-04-30 10:40 UTC (permalink / raw)
To: Łukasz Stelmach
Cc: Austin S Hemmelgarn, Theodore Ts'o, Harald Hoyer, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1840 bytes --]

On 30.04.2015 at 12:19, Łukasz Stelmach wrote:
> It was <2015-04-30 Thu 11:12>, when Richard Weinberger wrote:
>> On 30.04.2015 at 11:05, Łukasz Stelmach wrote:
>>> Regardless of initrd issues, I feel there is a need for a local IPC
>>> that is more capable than UDS. Linus Torvalds is probably right that
>>> dbus-daemon is anything but efficient. I disagree, however, that
>>> it can be optimised and therefore solve *all* issues kdbus is trying
>>> to address. dbus-daemon, by design, can't do some things. It can't
>>> transmit large payloads without copying them. It can't be made
>>> race-free.
>>
>> This is true.
>> But as long as dbus-daemon is not optimized as much as possible, there is
>> no reason to force push kdbus.
>> Once dbus-daemon exploits all kernel interfaces as much as it can and
>> still needs work (be it performance or other stuff), we can think
>> of new kernel features which can help dbus-daemon.
>
> I may not be well informed about kernel interfaces, but there are some
> use cases no dbus-daemon optimisation can make work properly because of
> race conditions introduced by the user-space based message router.
>
> For example, a service can't acquire credentials of a client process that
> actually sent a request (it can, but it can't trust them). The service
> can't be protected by LSM on a bus that is driven by dbus-daemon. Yes,
> dbus-daemon can check the client's and service's labels and enforce a
> policy, but it is going to be the daemon and not the LSM code in the
> kernel.

That's why I said we can think of new kernel features if they are needed.
But the current sink-or-swim approach of the kdbus folks is also not the
solution. As I said, if dbus-daemon utilizes the kernel interface as much
as possible we can think of new features.

Thanks,
//richard

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-30 10:40 ` Richard Weinberger
@ 2015-04-30 12:16 ` Łukasz Stelmach
2015-04-30 12:23 ` Richard Weinberger
0 siblings, 1 reply; 333+ messages in thread
From: Łukasz Stelmach @ 2015-04-30 12:16 UTC (permalink / raw)
To: Richard Weinberger
Cc: Austin S Hemmelgarn, Theodore Ts'o, Harald Hoyer, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2220 bytes --]

It was <2015-04-30 Thu 12:40>, when Richard Weinberger wrote:
> On 30.04.2015 at 12:19, Łukasz Stelmach wrote:
>> It was <2015-04-30 Thu 11:12>, when Richard Weinberger wrote:
>>> On 30.04.2015 at 11:05, Łukasz Stelmach wrote:
>>>> Regardless of initrd issues, I feel there is a need for a local IPC
>>>> that is more capable than UDS. Linus Torvalds is probably right that
>>>> dbus-daemon is anything but efficient. I disagree, however, that
>>>> it can be optimised and therefore solve *all* issues kdbus is trying
>>>> to address. dbus-daemon, by design, can't do some things. It can't
>>>> transmit large payloads without copying them. It can't be made
>>>> race-free.
>>>
>>> This is true.
>>> But as long as dbus-daemon is not optimized as much as possible, there is
>>> no reason to force push kdbus.
>>> Once dbus-daemon exploits all kernel interfaces as much as it can and
>>> still needs work (be it performance or other stuff), we can think
>>> of new kernel features which can help dbus-daemon.
>>
>> I may not be well informed about kernel interfaces, but there are
>> some use cases no dbus-daemon optimisation can make work properly
>> because of race conditions introduced by the user-space based message
>> router.
>>
>> For example, a service can't acquire credentials of a client process that
>> actually sent a request (it can, but it can't trust them). The service
>> can't be protected by LSM on a bus that is driven by dbus-daemon. Yes,
>> dbus-daemon can check the client's and service's labels and enforce a
>> policy, but it is going to be the daemon and not the LSM code in the
>> kernel.
>
> That's why I said we can think of new kernel features if they are
> needed. But the current sink-or-swim approach of the kdbus folks is also
> not the solution. As I said, if dbus-daemon utilizes the kernel
> interface as much as possible we can think of new features.

What kernel interfaces do you suggest using to solve the issues
I mentioned in the second paragraph: race conditions, LSM support (for
example)?

BTW, does anyone know how microkernel-based OSes implement sockets?

--
Łukasz Stelmach
Samsung R&D Institute Poland
Samsung Electronics

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 472 bytes --]
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-30 12:16 ` Łukasz Stelmach
@ 2015-04-30 12:23 ` Richard Weinberger
2015-04-30 12:40 ` Łukasz Stelmach
0 siblings, 1 reply; 333+ messages in thread
From: Richard Weinberger @ 2015-04-30 12:23 UTC (permalink / raw)
To: Łukasz Stelmach
Cc: Austin S Hemmelgarn, Theodore Ts'o, Harald Hoyer, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2337 bytes --]

On 30.04.2015 at 14:16, Łukasz Stelmach wrote:
> It was <2015-04-30 Thu 12:40>, when Richard Weinberger wrote:
>> On 30.04.2015 at 12:19, Łukasz Stelmach wrote:
>>> It was <2015-04-30 Thu 11:12>, when Richard Weinberger wrote:
>>>> On 30.04.2015 at 11:05, Łukasz Stelmach wrote:
>>>>> Regardless of initrd issues, I feel there is a need for a local IPC
>>>>> that is more capable than UDS. Linus Torvalds is probably right that
>>>>> dbus-daemon is anything but efficient. I disagree, however, that
>>>>> it can be optimised and therefore solve *all* issues kdbus is trying
>>>>> to address. dbus-daemon, by design, can't do some things. It can't
>>>>> transmit large payloads without copying them. It can't be made
>>>>> race-free.
>>>>
>>>> This is true.
>>>> But as long as dbus-daemon is not optimized as much as possible, there is
>>>> no reason to force push kdbus.
>>>> Once dbus-daemon exploits all kernel interfaces as much as it can and
>>>> still needs work (be it performance or other stuff), we can think
>>>> of new kernel features which can help dbus-daemon.
>>>
>>> I may not be well informed about kernel interfaces, but there are
>>> some use cases no dbus-daemon optimisation can make work properly
>>> because of race conditions introduced by the user-space based message
>>> router.
>>>
>>> For example, a service can't acquire credentials of a client process that
>>> actually sent a request (it can, but it can't trust them). The service
>>> can't be protected by LSM on a bus that is driven by dbus-daemon. Yes,
>>> dbus-daemon can check the client's and service's labels and enforce a
>>> policy, but it is going to be the daemon and not the LSM code in the
>>> kernel.
>>
>> That's why I said we can think of new kernel features if they are
>> needed. But the current sink-or-swim approach of the kdbus folks is also
>> not the solution. As I said, if dbus-daemon utilizes the kernel
>> interface as much as possible we can think of new features.
>
> What kernel interfaces do you suggest using to solve the issues
> I mentioned in the second paragraph: race conditions, LSM support (for
> example)?

The question is whether it makes sense to collect this kind of metadata.
I really like Andy and Alan's idea to improve AF_UNIX or revive AF_BUS.

Thanks,
//richard

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-30 12:23 ` Richard Weinberger
@ 2015-04-30 12:40 ` Łukasz Stelmach
2015-04-30 12:45 ` Richard Weinberger
0 siblings, 1 reply; 333+ messages in thread
From: Łukasz Stelmach @ 2015-04-30 12:40 UTC (permalink / raw)
To: Richard Weinberger
Cc: Austin S Hemmelgarn, Theodore Ts'o, Harald Hoyer, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1794 bytes --]

It was <2015-04-30 Thu 14:23>, when Richard Weinberger wrote:
> On 30.04.2015 at 14:16, Łukasz Stelmach wrote:
>> It was <2015-04-30 Thu 12:40>, when Richard Weinberger wrote:
>>> On 30.04.2015 at 12:19, Łukasz Stelmach wrote:
>>>> It was <2015-04-30 Thu 11:12>, when Richard Weinberger wrote:
>>>>> On 30.04.2015 at 11:05, Łukasz Stelmach wrote:
>>>>>> Regardless of initrd issues, I feel there is a need for a local IPC
>>>>>> that is more capable than UDS.
[...]
>>>> For example, a service can't acquire credentials of a client process that
>>>> actually sent a request (it can, but it can't trust them). The service
>>>> can't be protected by LSM on a bus that is driven by dbus-daemon. Yes,
>>>> dbus-daemon can check the client's and service's labels and enforce a
>>>> policy, but it is going to be the daemon and not the LSM code in the
>>>> kernel.
>>>
>>> That's why I said we can think of new kernel features if they are
>>> needed. But the current sink-or-swim approach of the kdbus folks is also
>>> not the solution. As I said, if dbus-daemon utilizes the kernel
>>> interface as much as possible we can think of new features.
>>
>> What kernel interfaces do you suggest using to solve the issues
>> I mentioned in the second paragraph: race conditions, LSM support (for
>> example)?
>
> The question is whether it makes sense to collect this kind of metadata.
> I really like Andy and Alan's idea to improve AF_UNIX or revive AF_BUS.

Race conditions have nothing to do with metadata. Neither does LSM
support.

AF_UNIX with multicast support wouldn't be AF_UNIX anymore.

AF_BUS? I didn't follow the discussion back then. Why do you think it
is better than kdbus?

--
Łukasz Stelmach
Samsung R&D Institute Poland
Samsung Electronics

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 472 bytes --]
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-30 12:40 ` Łukasz Stelmach
@ 2015-04-30 12:45 ` Richard Weinberger
2015-04-30 14:52 ` Łukasz Stelmach
0 siblings, 1 reply; 333+ messages in thread
From: Richard Weinberger @ 2015-04-30 12:45 UTC (permalink / raw)
To: Łukasz Stelmach
Cc: Austin S Hemmelgarn, Theodore Ts'o, Harald Hoyer, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1962 bytes --]

On 30.04.2015 at 14:40, Łukasz Stelmach wrote:
> It was <2015-04-30 Thu 14:23>, when Richard Weinberger wrote:
>> On 30.04.2015 at 14:16, Łukasz Stelmach wrote:
>>> It was <2015-04-30 Thu 12:40>, when Richard Weinberger wrote:
>>>> On 30.04.2015 at 12:19, Łukasz Stelmach wrote:
>>>>> It was <2015-04-30 Thu 11:12>, when Richard Weinberger wrote:
>>>>>> On 30.04.2015 at 11:05, Łukasz Stelmach wrote:
>>>>>>> Regardless of initrd issues, I feel there is a need for a local IPC
>>>>>>> that is more capable than UDS.
> [...]
>>>>> For example, a service can't acquire credentials of a client process that
>>>>> actually sent a request (it can, but it can't trust them). The service
>>>>> can't be protected by LSM on a bus that is driven by dbus-daemon. Yes,
>>>>> dbus-daemon can check the client's and service's labels and enforce a
>>>>> policy, but it is going to be the daemon and not the LSM code in the
>>>>> kernel.
>>>>
>>>> That's why I said we can think of new kernel features if they are
>>>> needed. But the current sink-or-swim approach of the kdbus folks is also
>>>> not the solution. As I said, if dbus-daemon utilizes the kernel
>>>> interface as much as possible we can think of new features.
>>>
>>> What kernel interfaces do you suggest using to solve the issues
>>> I mentioned in the second paragraph: race conditions, LSM support (for
>>> example)?
>>
>> The question is whether it makes sense to collect this kind of metadata.
>> I really like Andy and Alan's idea to improve AF_UNIX or revive AF_BUS.
>
> Race conditions have nothing to do with metadata. Neither does LSM
> support.

Sorry, I thought you meant the races while collecting metadata in userspace...

> AF_UNIX with multicast support wouldn't be AF_UNIX anymore.
>
> AF_BUS? I didn't follow the discussion back then. Why do you think it
> is better than kdbus?

Please see https://lwn.net/Articles/641278/

Thanks,
//richard

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-30 12:45 ` Richard Weinberger
@ 2015-04-30 14:52 ` Łukasz Stelmach
2015-04-30 15:05 ` Richard Weinberger
2015-07-03 9:13 ` cee1
0 siblings, 2 replies; 333+ messages in thread
From: Łukasz Stelmach @ 2015-04-30 14:52 UTC (permalink / raw)
To: Richard Weinberger
Cc: Austin S Hemmelgarn, Theodore Ts'o, Harald Hoyer, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2826 bytes --]

It was <2015-04-30 Thu 14:45>, when Richard Weinberger wrote:
> On 30.04.2015 at 14:40, Łukasz Stelmach wrote:
>> It was <2015-04-30 Thu 14:23>, when Richard Weinberger wrote:
>>> On 30.04.2015 at 14:16, Łukasz Stelmach wrote:
>>>> It was <2015-04-30 Thu 12:40>, when Richard Weinberger wrote:
>>>>> On 30.04.2015 at 12:19, Łukasz Stelmach wrote:
>>>>>> It was <2015-04-30 Thu 11:12>, when Richard Weinberger wrote:
>>>>>>> On 30.04.2015 at 11:05, Łukasz Stelmach wrote:
>>>>>>>> Regardless of initrd issues, I feel there is a need for a local IPC
>>>>>>>> that is more capable than UDS.
>> [...]
>>>>>> For example, a service can't acquire credentials of a client process that
>>>>>> actually sent a request (it can, but it can't trust them). The service
>>>>>> can't be protected by LSM on a bus that is driven by dbus-daemon. Yes,
>>>>>> dbus-daemon can check the client's and service's labels and enforce a
>>>>>> policy, but it is going to be the daemon and not the LSM code in the
>>>>>> kernel.
>>>>>
>>>>> That's why I said we can think of new kernel features if they are
>>>>> needed. But the current sink-or-swim approach of the kdbus folks is also
>>>>> not the solution. As I said, if dbus-daemon utilizes the kernel
>>>>> interface as much as possible we can think of new features.
>>>>
>>>> What kernel interfaces do you suggest using to solve the issues
>>>> I mentioned in the second paragraph: race conditions, LSM support (for
>>>> example)?
>>>
>>> The question is whether it makes sense to collect this kind of metadata.
>>> I really like Andy and Alan's idea to improve AF_UNIX or revive AF_BUS.
>>
>> Race conditions have nothing to do with metadata. Neither does LSM
>> support.
>
> Sorry, I thought you meant the races while collecting metadata in userspace...

My bad, some race conditions *are* associated with collecting metadata
but not all. It is impossible (correct me if I am wrong) to implement
reliable die-on-idle with dbus-daemon.

>> AF_UNIX with multicast support wouldn't be AF_UNIX anymore.
>>
>> AF_BUS? I didn't follow the discussion back then. Why do you think it
>> is better than kdbus?
>
> Please see https://lwn.net/Articles/641278/

Thanks. If I understand correctly, the author suggests using eBPF on the
receiving socket side for receiving multicast messages. This is nice if
you care about introducing (or not) (too?) much new code. However,
AFAICT it may be more computationally complex than Bloom filters because
you need to run eBPF on every receiving socket instead of getting a list
of a few of them to copy data to. Of course, for a small number of
receivers the "constant" cost of running the Bloom filter may be higher.

Kind regards,
--
Łukasz Stelmach
Samsung R&D Institute Poland
Samsung Electronics

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 472 bytes --]
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-30 14:52 ` Łukasz Stelmach
@ 2015-04-30 15:05 ` Richard Weinberger
2015-07-03 9:13 ` cee1
1 sibling, 0 replies; 333+ messages in thread
From: Richard Weinberger @ 2015-04-30 15:05 UTC (permalink / raw)
To: Łukasz Stelmach
Cc: Austin S Hemmelgarn, Theodore Ts'o, Harald Hoyer, linux-kernel,
One Thousand Gnomes, Andy Lutomirski

[-- Attachment #1: Type: text/plain, Size: 1457 bytes --]

On 30.04.2015 at 16:52, Łukasz Stelmach wrote:
>> Sorry, I thought you meant the races while collecting metadata in userspace...
>
> My bad, some race conditions *are* associated with collecting metadata
> but not all. It is impossible (correct me if I am wrong) to implement
> reliable die-on-idle with dbus-daemon.

IIRC Andy gave some ideas on how to deal with that,
e.g. https://lkml.org/lkml/2015/4/29/622

>>> AF_UNIX with multicast support wouldn't be AF_UNIX anymore.
>>>
>>> AF_BUS? I didn't follow the discussion back then. Why do you think it
>>> is better than kdbus?
>>
>> Please see https://lwn.net/Articles/641278/
>
> Thanks. If I understand correctly, the author suggests using eBPF on the
> receiving socket side for receiving multicast messages. This is nice if
> you care about introducing (or not) (too?) much new code. However,
> AFAICT it may be more computationally complex than Bloom filters because
> you need to run eBPF on every receiving socket instead of getting a list
> of a few of them to copy data to. Of course, for a small number of
> receivers the "constant" cost of running the Bloom filter may be higher.

To make a long story short, the kdbus *concept* needs much more thought.
There are many ideas out there on how to deal with dbus issues without
introducing an ad-hoc solution. AF_BUS is just one of them.
IMHO AF_BUS would be nice but the decision is not up to me.

Thanks,
//richard

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-30 14:52 ` Łukasz Stelmach
2015-04-30 15:05 ` Richard Weinberger
@ 2015-07-03 9:13 ` cee1
1 sibling, 0 replies; 333+ messages in thread
From: cee1 @ 2015-07-03 9:13 UTC (permalink / raw)
To: Łukasz Stelmach
Cc: Richard Weinberger, Austin S Hemmelgarn, Theodore Ts'o,
Harald Hoyer, linux-kernel, Greg KH

2015-04-30 22:52 GMT+08:00 Łukasz Stelmach <l.stelmach@samsung.com>:
> It was <2015-04-30 Thu 14:45>, when Richard Weinberger wrote:
>> On 30.04.2015 at 14:40, Łukasz Stelmach wrote:
>>> It was <2015-04-30 Thu 14:23>, when Richard Weinberger wrote:
>>>> On 30.04.2015 at 14:16, Łukasz Stelmach wrote:
>>>>> It was <2015-04-30 Thu 12:40>, when Richard Weinberger wrote:
>>>>>> On 30.04.2015 at 12:19, Łukasz Stelmach wrote:
>>>>>>> It was <2015-04-30 Thu 11:12>, when Richard Weinberger wrote:
>>>>>>>> On 30.04.2015 at 11:05, Łukasz Stelmach wrote:
>>>>>>>>> Regardless of initrd issues, I feel there is a need for a local IPC
>>>>>>>>> that is more capable than UDS.
>>> [...]
>>>>>>> For example, a service can't acquire credentials of a client process that
>>>>>>> actually sent a request (it can, but it can't trust them). The service
>>>>>>> can't be protected by LSM on a bus that is driven by dbus-daemon. Yes,
>>>>>>> dbus-daemon can check the client's and service's labels and enforce a
>>>>>>> policy, but it is going to be the daemon and not the LSM code in the
>>>>>>> kernel.
>>>>>>
>>>>>> That's why I said we can think of new kernel features if they are
>>>>>> needed. But the current sink-or-swim approach of the kdbus folks is also
>>>>>> not the solution. As I said, if dbus-daemon utilizes the kernel
>>>>>> interface as much as possible we can think of new features.
>>>>>
>>>>> What kernel interfaces do you suggest using to solve the issues
>>>>> I mentioned in the second paragraph: race conditions, LSM support (for
>>>>> example)?
>>>>
>>>> The question is whether it makes sense to collect this kind of metadata.
>>>> I really like Andy and Alan's idea to improve AF_UNIX or revive AF_BUS.
>>>
>>> Race conditions have nothing to do with metadata. Neither does LSM
>>> support.
>>
>> Sorry, I thought you meant the races while collecting metadata in userspace...
>
> My bad, some race conditions *are* associated with collecting metadata
> but not all. It is impossible (correct me if I am wrong) to implement
> reliable die-on-idle with dbus-daemon.
>
>>> AF_UNIX with multicast support wouldn't be AF_UNIX anymore.
>>>
>>> AF_BUS? I didn't follow the discussion back then. Why do you think it
>>> is better than kdbus?
>>
>> Please see https://lwn.net/Articles/641278/
>
> Thanks. If I understand correctly, the author suggests using eBPF on the
> receiving socket side for receiving multicast messages. This is nice if
> you care about introducing (or not) (too?) much new code. However,
> AFAICT it may be more computationally complex than Bloom filters because
> you need to run eBPF on every receiving socket instead of getting a list
> of a few of them to copy data to. Of course, for a small number of
> receivers the "constant" cost of running the Bloom filter may be higher.

I am still thinking about the idea of implementing kdbus in the form of
a socket.

What about using a __multicast group__ instead of eBPF to send and
receive multicast messages? (Could it implement the Bloom filter as
follows?)

E.g.
Sender:
send to multi_address

Receivers:
if ((multi_address & joined_address) == joined_address) {
/* a message for us */
}

Then we can further apply eBPF to remove the "false positive" cases,
which would otherwise wake up user-space code and leave it to discard
them.

--
Regards,

- cee1
^ permalink raw reply [flat|nested] 333+ messages in thread
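The mask test in cee1's message can be sketched in C. This is a minimal illustration under stated assumptions, not kdbus or AF_BUS code: `key_to_mask` (an FNV-1a hash that sets up to three bits of a 64-bit mask) and `group_matches` are hypothetical names, and the 64-bit width and the three-bit choice are arbitrary. The sender ORs together the masks of everything a message is about; a receiver that joined with one key's mask always matches a message carrying that key, but it may also match unrelated messages whose combined bits happen to cover its mask.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: map a match key (for example an interface name)
 * to a sparse 64-bit mask by hashing it with FNV-1a and setting up to
 * three bits selected by the hash value.
 */
static uint64_t key_to_mask(const char *key)
{
    uint64_t h = 0xcbf29ce484222325ULL;     /* FNV-1a 64-bit offset basis */
    uint64_t mask = 0;

    for (const char *p = key; *p != '\0'; p++) {
        h ^= (unsigned char)*p;
        h *= 0x100000001b3ULL;              /* FNV-1a 64-bit prime */
    }

    for (int i = 0; i < 3; i++) {
        mask |= UINT64_C(1) << (h & 63);    /* low 6 bits pick a bit index */
        h >>= 6;
    }
    return mask;
}

/*
 * The receiver-side test from the message above: the message is "for us"
 * if every bit of the joined mask is also set in the message's mask.
 */
static bool group_matches(uint64_t multi_address, uint64_t joined_address)
{
    return (multi_address & joined_address) == joined_address;
}
```

A message built as `key_to_mask("a") | key_to_mask("b")` matches receivers joined to either key, so there are no false negatives. An all-ones mask matches every receiver, which is the degenerate false-positive case that the message proposes to filter with eBPF before waking user space.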
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-29 15:03 ` Theodore Ts'o
2015-04-29 15:21 ` Austin S Hemmelgarn
@ 2015-04-29 16:25 ` Martin Steigerwald
1 sibling, 0 replies; 333+ messages in thread
From: Martin Steigerwald @ 2015-04-29 16:25 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: Harald Hoyer, Richard Weinberger, linux-kernel

On Wednesday, 29 April 2015, 11:03:41, Theodore Ts'o wrote:
> On Wed, Apr 29, 2015 at 04:53:53PM +0200, Harald Hoyer wrote:
> > Sure, I can write one binary to rule them all, pull out all the code
> > from all tools I need, but for me an IPC mechanism sounds a lot
> > better. And it should be _one_ common IPC mechanism and not a
> > plethora of them. It should feel like an operating system and not
> > like a bunch of thrown together software, which is glued together
> > with some magic shell scripts.
>
> And so requiring wireshark (and X?) in initramfs to debug problems
> once dbus is introduced is better?
>
> I would think shell scripts are *easier* to debug when things go
> wrong, especially in a minimal environment such as an initial ram
> disk. Having had to debug problems in a distro initramfs when trying
> to help a customer bring up a FC boot disk long ago in another life,
> I'm certain I would rather debug problems while on site at a
> classified machine room[1] using shell scripts, and trying to debug
> dbus is something that would be infinitely worse.

Later in the boot process I have seen some debugging issues with a
systemd-governed userspace:

Bug 1213778 - drops into emergency mode without any error message if it
cannot find a filesystem in /etc/fstab
https://bugzilla.redhat.com/show_bug.cgi?id=1213778

Bug 1213781 - does not start ssh service if a filesystem in /etc/fstab
cannot be mounted
https://bugzilla.redhat.com/show_bug.cgi?id=1213781

With shell scripts I had error messages for these; here I had to browse
the output of journalctl -xb to find out.

And then what do I do if dbus communication is not working for some
reason? Will I then be able to use journalctl at all?

Not that proper error reporting can't be added in systemd and the
dependency handling can't be fixed towards some more sanity, like for
example "no /boot mount, no worries, I still start that ssh service for
you", but currently for me this is a clear and heavy regression when I
compare this with sysvinit boot behavior.

I do fix things when being dropped to an initramfs shell. I added some
script snippet for supporting BTRFS RAID 1 while it wasn't yet supported
in Debian (dunno if it is for booting in Jessie, but as my bug reports
have not been closed yet, maybe not). I still want to be able to do
these kinds of things.

Ciao,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-29 14:11 ` Harald Hoyer 2015-04-29 14:18 ` Richard Weinberger @ 2015-04-29 14:46 ` Austin S Hemmelgarn 2015-04-29 14:51 ` Richard Weinberger 2015-04-29 15:07 ` Harald Hoyer 1 sibling, 2 replies; 333+ messages in thread From: Austin S Hemmelgarn @ 2015-04-29 14:46 UTC (permalink / raw) To: Harald Hoyer, Richard Weinberger; +Cc: linux-kernel [-- Attachment #1: Type: text/plain, Size: 2594 bytes --] On 2015-04-29 10:11, Harald Hoyer wrote: > On 29.04.2015 16:04, Richard Weinberger wrote: >> Am 29.04.2015 um 16:01 schrieb Harald Hoyer: >>> On 29.04.2015 15:46, Richard Weinberger wrote: >>>> Am 29.04.2015 um 15:38 schrieb Harald Hoyer: >>>>> On 29.04.2015 15:33, Richard Weinberger wrote: >>>>>> It depends how you define "beginning". To me an initramfs is a *very* minimal >>>>>> tool to prepare the rootfs and nothing more (no udev, no systemd, no >>>>>> "mini distro"). >>>>>> If the initramfs fails to do its job it can print to the console like >>>>>> the kernel does if it fails >>>>>> at a very early stage. >>>>>> >>>>> >>>>> Your solution might work for your small personal needs, but not for our customers. >>>> >>>> Correct, I don't know your customers, all I know are my customers. :-) >>>> >>>> What feature do your customers need? >>>> I mean, I fully agree with you that an initramfs must not fail silently >>>> but how does dbus help there? If it fails to mount the rootfs there is not >>>> much it can do. 
>>>> >>>> Thanks, >>>> //richard >>>> >>> >>> We don't handcraft the initramfs script for every our customers, therefore we >>> have to generically support hotplug, persistent device names, persistent >>> interface names, network connectivity in the initramfs, user input handling for >>> passwords, fonts, keyboard layouts, fips, fsck, repair tools for file systems, >>> raid assembly, LVM assembly, multipath, crypto devices, live images, iSCSI, >>> FCoE, all kinds of filesystems with their quirks, IBM z-series support, resume >>> from hibernation, […] >> >> This is correct. But which of these tools/features depend on dbus? > > I would love to add dbus support to all of them and use it, so I can connect > them all more easily. No need for them to invent their own version of IPC, > which can only be used by their own tool set. > Resume is built into the kernel, so no need for IPC there. Keymaps, fonts, and fsck need no IPC either. FIPS related stuff should need no IPC. Anything to do with the Device Mapper and hotplug should just need uevents. While I can kind of see you wanting to have lvmetad in the initramfs for use with LVM, I've seen all kinds of reports of issues caused by that. I can also kind of understand wanting some kind of unified IPC for the netboot related stuff, DBus is still serious overkill for any of that IMHO. As things stand currently, the few things in that list that I know actually use IPC for anything get by just fine (and much faster) using just UDS. [-- Attachment #2: S/MIME Cryptographic Signature --] [-- Type: application/pkcs7-signature, Size: 2967 bytes --] ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-29 14:46 ` Austin S Hemmelgarn @ 2015-04-29 14:51 ` Richard Weinberger 2015-04-29 15:07 ` Harald Hoyer 1 sibling, 0 replies; 333+ messages in thread From: Richard Weinberger @ 2015-04-29 14:51 UTC (permalink / raw) To: Austin S Hemmelgarn, Harald Hoyer; +Cc: linux-kernel Am 29.04.2015 um 16:46 schrieb Austin S Hemmelgarn: > On 2015-04-29 10:11, Harald Hoyer wrote: >> On 29.04.2015 16:04, Richard Weinberger wrote: >>> Am 29.04.2015 um 16:01 schrieb Harald Hoyer: >>>> On 29.04.2015 15:46, Richard Weinberger wrote: >>>>> Am 29.04.2015 um 15:38 schrieb Harald Hoyer: >>>>>> On 29.04.2015 15:33, Richard Weinberger wrote: >>>>>>> It depends how you define "beginning". To me an initramfs is a *very* minimal >>>>>>> tool to prepare the rootfs and nothing more (no udev, no systemd, no >>>>>>> "mini distro"). >>>>>>> If the initramfs fails to do its job it can print to the console like >>>>>>> the kernel does if it fails >>>>>>> at a very early stage. >>>>>>> >>>>>> >>>>>> Your solution might work for your small personal needs, but not for our customers. >>>>> >>>>> Correct, I don't know your customers, all I know are my customers. :-) >>>>> >>>>> What feature do your customers need? >>>>> I mean, I fully agree with you that an initramfs must not fail silently >>>>> but how does dbus help there? If it fails to mount the rootfs there is not >>>>> much it can do. 
>>>>> >>>>> Thanks, >>>>> //richard >>>>> >>>> >>>> We don't handcraft the initramfs script for every our customers, therefore we >>>> have to generically support hotplug, persistent device names, persistent >>>> interface names, network connectivity in the initramfs, user input handling for >>>> passwords, fonts, keyboard layouts, fips, fsck, repair tools for file systems, >>>> raid assembly, LVM assembly, multipath, crypto devices, live images, iSCSI, >>>> FCoE, all kinds of filesystems with their quirks, IBM z-series support, resume >>>> from hibernation, […] >>> >>> This is correct. But which of these tools/features depend on dbus? >> >> I would love to add dbus support to all of them and use it, so I can connect >> them all more easily. No need for them to invent their own version of IPC, >> which can only be used by their own tool set. >> > Resume is built into the kernel, so no need for IPC there. Keymaps, fonts, and fsck need no IPC either. FIPS related stuff should need no IPC. Anything to do with the Device > Mapper and hotplug should just need uevents. While I can kind of see you wanting to have lvmetad in the initramfs for use with LVM, I've seen all kinds of reports of issues caused > by that. I can also kind of understand wanting some kind of unified IPC for the netboot related stuff, DBus is still serious overkill for any of that IMHO. As things stand > currently, the few things in that list that I know actually use IPC for anything get by just fine (and much faster) using just UDS. Even if dbus is really needed you can embed it into the initramfs. What you need is a good bootstrap procedure between dbus/systemd within the initramfs and the real rootfs. Thanks, //richard ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-29 14:46 ` Austin S Hemmelgarn 2015-04-29 14:51 ` Richard Weinberger @ 2015-04-29 15:07 ` Harald Hoyer 2015-04-29 15:17 ` Austin S Hemmelgarn 1 sibling, 1 reply; 333+ messages in thread From: Harald Hoyer @ 2015-04-29 15:07 UTC (permalink / raw) To: Austin S Hemmelgarn, Richard Weinberger; +Cc: linux-kernel On 29.04.2015 16:46, Austin S Hemmelgarn wrote: > On 2015-04-29 10:11, Harald Hoyer wrote: >> On 29.04.2015 16:04, Richard Weinberger wrote: >>> Am 29.04.2015 um 16:01 schrieb Harald Hoyer: >>>> On 29.04.2015 15:46, Richard Weinberger wrote: >>>>> Am 29.04.2015 um 15:38 schrieb Harald Hoyer: >>>>>> On 29.04.2015 15:33, Richard Weinberger wrote: >>>>>>> It depends how you define "beginning". To me an initramfs is a *very* >>>>>>> minimal >>>>>>> tool to prepare the rootfs and nothing more (no udev, no systemd, no >>>>>>> "mini distro"). >>>>>>> If the initramfs fails to do its job it can print to the console like >>>>>>> the kernel does if it fails >>>>>>> at a very early stage. >>>>>>> >>>>>> >>>>>> Your solution might work for your small personal needs, but not for our >>>>>> customers. >>>>> >>>>> Correct, I don't know your customers, all I know are my customers. :-) >>>>> >>>>> What feature do your customers need? >>>>> I mean, I fully agree with you that an initramfs must not fail silently >>>>> but how does dbus help there? If it fails to mount the rootfs there is not >>>>> much it can do. 
>>>>> >>>>> Thanks, >>>>> //richard >>>>> >>>> >>>> We don't handcraft the initramfs script for every our customers, therefore we >>>> have to generically support hotplug, persistent device names, persistent >>>> interface names, network connectivity in the initramfs, user input handling >>>> for >>>> passwords, fonts, keyboard layouts, fips, fsck, repair tools for file systems, >>>> raid assembly, LVM assembly, multipath, crypto devices, live images, iSCSI, >>>> FCoE, all kinds of filesystems with their quirks, IBM z-series support, resume >>>> from hibernation, […] >>> >>> This is correct. But which of these tools/features depend on dbus? >> >> I would love to add dbus support to all of them and use it, so I can connect >> them all more easily. No need for them to invent their own version of IPC, >> which can only be used by their own tool set. >> > Resume is built into the kernel, so no need for IPC there. Keymaps, fonts, and > fsck need no IPC either. FIPS related stuff should need no IPC. Anything to > do with the Device Mapper and hotplug should just need uevents. While I can > kind of see you wanting to have lvmetad in the initramfs for use with LVM, I've > seen all kinds of reports of issues caused by that. I can also kind of > understand wanting some kind of unified IPC for the netboot related stuff, DBus > is still serious overkill for any of that IMHO. As things stand currently, the > few things in that list that I know actually use IPC for anything get by just > fine (and much faster) using just UDS. > > He asked what customers need, because he does not need udev, systemd, "mini distro". Most of the stuff does not work without udev and something like systemd. And all of the stuff I mentioned together forms a "mini distro" for me. ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-29 15:07 ` Harald Hoyer @ 2015-04-29 15:17 ` Austin S Hemmelgarn 2015-04-29 15:22 ` Harald Hoyer 0 siblings, 1 reply; 333+ messages in thread From: Austin S Hemmelgarn @ 2015-04-29 15:17 UTC (permalink / raw) To: Harald Hoyer, Richard Weinberger; +Cc: linux-kernel [-- Attachment #1: Type: text/plain, Size: 604 bytes --] On 2015-04-29 11:07, Harald Hoyer wrote: > Most of the stuff does not work without udev and something like systemd. > That's funny, apparently the initramfs images I've been using for multiple months now on server systems at work which don't have systemd, udev, or dbus, and do LVM/RAID assembly, network configuration, crypto devices, multipath, many different filesystems, and a number of other oddball configurations due to the insanity that is the software I have to deal with from our company, don't work. I wonder how my systems are booting successfully 100% of the time then? [-- Attachment #2: S/MIME Cryptographic Signature --] [-- Type: application/pkcs7-signature, Size: 2967 bytes --] ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-29 15:17 ` Austin S Hemmelgarn @ 2015-04-29 15:22 ` Harald Hoyer 2015-04-29 15:41 ` Austin S Hemmelgarn 2015-04-29 18:28 ` Martin Steigerwald 0 siblings, 2 replies; 333+ messages in thread From: Harald Hoyer @ 2015-04-29 15:22 UTC (permalink / raw) To: Austin S Hemmelgarn, Richard Weinberger; +Cc: linux-kernel On 29.04.2015 17:17, Austin S Hemmelgarn wrote: > On 2015-04-29 11:07, Harald Hoyer wrote: >> Most of the stuff does not work without udev and something like systemd. >> > That's funny, apparently the initramfs images I've been using for multiple > months now on server systems at work which don't have systemd, udev, or dbus, > and do LVM/RAID assembly, network configuration, crypto devices, multipath, > many different filesystems, and a number of other oddball configurations due to > the insanity that is the software I have to deal with from our company, don't > work. I wonder how my systems are booting successfully 100% of the time then? > > Then you should probably open source your initramfs, so we can all benefit from it and use it for all distributions. ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-29 15:22 ` Harald Hoyer @ 2015-04-29 15:41 ` Austin S Hemmelgarn 2015-04-29 18:28 ` Martin Steigerwald 1 sibling, 0 replies; 333+ messages in thread From: Austin S Hemmelgarn @ 2015-04-29 15:41 UTC (permalink / raw) To: Harald Hoyer, Richard Weinberger; +Cc: linux-kernel [-- Attachment #1: Type: text/plain, Size: 1249 bytes --] On 2015-04-29 11:22, Harald Hoyer wrote: > On 29.04.2015 17:17, Austin S Hemmelgarn wrote: >> On 2015-04-29 11:07, Harald Hoyer wrote: >>> Most of the stuff does not work without udev and something like systemd. >>> >> That's funny, apparently the initramfs images I've been using for multiple >> months now on server systems at work which don't have systemd, udev, or dbus, >> and do LVM/RAID assembly, network configuration, crypto devices, multipath, >> many different filesystems, and a number of other oddball configurations due to >> the insanity that is the software I have to deal with from our company, don't >> work. I wonder how my systems are booting successfully 100% of the time then? >> >> > > Then you should probably open source your initramfs, so we can all benefit from > it and use it for all distributions. > It's (mostly, aside from a couple of overlays to deal with the hoops I have to jump through to get some of our software working) just the standard one generated by Gentoo's 'genkernel' program (specifically the version from the genkernel-ng package), although now that I actually look at it, it might have udev in it, although I'm certain the ones that I have don't have systemd or dbus. [-- Attachment #2: S/MIME Cryptographic Signature --] [-- Type: application/pkcs7-signature, Size: 2967 bytes --] ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-29 15:22 ` Harald Hoyer
2015-04-29 15:41 ` Austin S Hemmelgarn
@ 2015-04-29 18:28 ` Martin Steigerwald
1 sibling, 0 replies; 333+ messages in thread
From: Martin Steigerwald @ 2015-04-29 18:28 UTC (permalink / raw)
To: Harald Hoyer; +Cc: Austin S Hemmelgarn, Richard Weinberger, linux-kernel

On Wednesday, 29 April 2015, 17:22:08, Harald Hoyer wrote:
> On 29.04.2015 17:17, Austin S Hemmelgarn wrote:
> > On 2015-04-29 11:07, Harald Hoyer wrote:
> >> Most of the stuff does not work without udev and something like
> >> systemd.
> >
> > That's funny, apparently the initramfs images I've been using for
> > multiple months now on server systems at work which don't have
> > systemd, udev, or dbus, and do LVM/RAID assembly, network
> > configuration, crypto devices, multipath, many different filesystems,
> > and a number of other oddball configurations due to the insanity that
> > is the software I have to deal with from our company, don't work. I
> > wonder how my systems are booting successfully 100% of the time then?
>
> Then you should probably open source your initramfs, so we can all
> benefit from it and use it for all distributions.

Do you really think that the tooling will make that much of a difference?

I think there will always be cases where an initramfs will not work until it is adapted. And then it's nice to be able to do things like this:

merkaba:~> cat /etc/initramfs-tools/scripts/local-top/btrfs
#!/bin/sh

PREREQ="lvm"

prereqs()
{
	echo "$PREREQ"
}

case $1 in
prereqs)
	prereqs
	exit 0
	;;
esac

. /scripts/functions

log_begin_msg "Initializing BTRFS RAID-1."

modprobe btrfs
vgchange -ay
btrfs device scan

log_end_msg

How would I add support, on my own, for some configuration that a systemd or purely dracut + udev based initramfs does not support *yet*?

Yes, one can argue "why doesn't Debian support it already?", but heck, I can do it myself and report a bug about it, without having to fire up a C compiler in order to fix things. I may be able to do this myself, but at a much higher cost in time. The above has worked for so long already that I often even forget about it.

That said, if I still get a chance to execute a script at some point, a dracut based initramfs may be totally fine for me, but I want this possibility and a shell to fix things up myself if they go wrong. And while I do not get the need for having systemd in the initramfs at all, I might be fine with it, if I can fix things up myself in case of problems.

Ciao,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7

^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-29 13:38 ` Harald Hoyer
2015-04-29 13:46 ` Richard Weinberger
@ 2015-04-29 16:26 ` John Stoffel
2015-04-29 17:39 ` Steven Rostedt
2015-04-29 22:49 ` Theodore Ts'o
1 sibling, 2 replies; 333+ messages in thread
From: John Stoffel @ 2015-04-29 16:26 UTC (permalink / raw)
To: Harald Hoyer; +Cc: Richard Weinberger, linux-kernel

>>>>> "Harald" == Harald Hoyer <harald@redhat.com> writes:

Harald> On 29.04.2015 15:33, Richard Weinberger wrote:
>> It depends how you define "beginning". To me an initramfs is a *very* minimal
>> tool to prepare the rootfs and nothing more (no udev, no systemd, no
>> "mini distro").
>> If the initramfs fails to do its job it can print to the console like
>> the kernel does if it fails
>> at a very early stage.
>>

Harald> Your solution might work for your small personal needs, but
Harald> not for our customers.

Arguing that your needs outweigh mine because you have customers ain't gonna fly... I don't care about your customers, and why should I? I'm not getting any money from them. Nor do I make any money from the Linux kernel, though as an IT person I support Linux all day long. Do my requirements get listened to as well?

If your customers want this feature, you're more than welcome to fork the kernel and support it yourself. Oh wait... Red Hat does that already. So what's the problem? Just put it into RHEL (which I use, I admit, along with Debian/Mint) and be done with it.

^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-29 16:26 ` John Stoffel @ 2015-04-29 17:39 ` Steven Rostedt 2015-04-29 19:10 ` Martin Steigerwald 2015-04-29 19:28 ` John Stoffel 2015-04-29 22:49 ` Theodore Ts'o 1 sibling, 2 replies; 333+ messages in thread From: Steven Rostedt @ 2015-04-29 17:39 UTC (permalink / raw) To: John Stoffel; +Cc: Harald Hoyer, Richard Weinberger, linux-kernel On Wed, Apr 29, 2015 at 12:26:59PM -0400, John Stoffel wrote: > > If your customers wnat this feature, you're more than welcome to fork > the kernel and support it yourself. Oh wait... Redhat does that > already. So what's the problem? Just put it into RHEL (which I use > I admit, along with Debian/Mint) and be done with it. Red Hat tries very hard to push things upstream. It's policy is to not keep things for themselves, but always work with the community. That way, everyone benefits. Ideally, we should come up with a solution that works for all. -- Steve ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-29 17:39 ` Steven Rostedt
@ 2015-04-29 19:10 ` Martin Steigerwald
2015-04-29 19:28 ` John Stoffel
1 sibling, 0 replies; 333+ messages in thread
From: Martin Steigerwald @ 2015-04-29 19:10 UTC (permalink / raw)
To: Steven Rostedt
Cc: John Stoffel, Harald Hoyer, Richard Weinberger, linux-kernel

On Wednesday, 29 April 2015, 13:39:42, Steven Rostedt wrote:
> On Wed, Apr 29, 2015 at 12:26:59PM -0400, John Stoffel wrote:
> > If your customers wnat this feature, you're more than welcome to fork
> > the kernel and support it yourself. Oh wait... Redhat does that
> > already. So what's the problem? Just put it into RHEL (which I use
> > I admit, along with Debian/Mint) and be done with it.
>
> Red Hat tries very hard to push things upstream. It's policy is to not
> keep things for themselves, but always work with the community. That
> way, everyone benefits. Ideally, we should come up with a solution that
> works for all.

I think working with the community is a two-way process. Two-way as in actually *listening* to feedback instead of trying to push things as much as possible, believing oneself to be *right* about things.

I honestly dislike the "I know it better than you, go away" kind of attitude I have seen again and again: here, on systemd-devel (where I unsubscribed again as I saw no use in continuing the discussion there), and even on Debian mailing lists and in bug reports.

Wherever I look, what I call the "systemd" approach triggers intense polarity and resistance. With that I do not say there is something wrong with it, yet I ask myself: why is that? The best answer I have come up with so far comes back to how proponents of the new, different, not necessarily better or worse, way treat feedback.

There I found, to some extent, that feedback was taken into account and actually addressed, especially when it fitted into the new way of doing things.

Yet I also found:

- "I know it better than you, go away."

- "Please only stick to pure technical reasons", as in "What's wrong with the code?", disregarding any concerns about the *concept* and differing opinions about whether the kdbus code actually belongs in the kernel

- Ignoring it

So I still say the issues are not purely technical. A purely technical approach, as in "what's wrong with the code?", will not address the core of this discussion and the strong resistance against merging kdbus into the kernel.

It sometimes appears to me like children arguing about whether to paint their favorite toy red or green.

I think a healthy approach might be to agree to disagree and work from there. That would at least break the "I am right", "No, I am right" cycle.

Ciao,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7

^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-29 17:39 ` Steven Rostedt
2015-04-29 19:10 ` Martin Steigerwald
@ 2015-04-29 19:28 ` John Stoffel
1 sibling, 0 replies; 333+ messages in thread
From: John Stoffel @ 2015-04-29 19:28 UTC (permalink / raw)
To: Steven Rostedt
Cc: John Stoffel, Harald Hoyer, Richard Weinberger, linux-kernel

>>>>> "Steven" == Steven Rostedt <rostedt@goodmis.org> writes:

Steven> On Wed, Apr 29, 2015 at 12:26:59PM -0400, John Stoffel wrote:
>>
>> If your customers wnat this feature, you're more than welcome to fork
>> the kernel and support it yourself. Oh wait... Redhat does that
>> already. So what's the problem? Just put it into RHEL (which I use
>> I admit, along with Debian/Mint) and be done with it.

Steven> Red Hat tries very hard to push things upstream. It's policy
Steven> is to not keep things for themselves, but always work with the
Steven> community. That way, everyone benefits. Ideally, we should
Steven> come up with a solution that works for all.

Yeah, I agree they have been good. I'm just reacting to the off-the-cuff comment of "my customers need it", which isn't a justification for this feature, especially when it hasn't been shown to be needed in the kernel.

We went through a lot of this with tux, the in-kernel httpd server, and with pushing other stuff out to user-space over the years. Why this needs to come in isn't clear. Nor is it clear why only a small part couldn't come in, with the rest staying in userspace.

John

^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-29 16:26 ` John Stoffel 2015-04-29 17:39 ` Steven Rostedt @ 2015-04-29 22:49 ` Theodore Ts'o 2015-04-30 0:05 ` David Lang 1 sibling, 1 reply; 333+ messages in thread From: Theodore Ts'o @ 2015-04-29 22:49 UTC (permalink / raw) To: John Stoffel; +Cc: Harald Hoyer, Richard Weinberger, linux-kernel On Wed, Apr 29, 2015 at 12:26:59PM -0400, John Stoffel wrote: > If your customers wnat this feature, you're more than welcome to fork > the kernel and support it yourself. Oh wait... Redhat does that > already. So what's the problem? Just put it into RHEL (which I use > I admit, along with Debian/Mint) and be done with it. Harald, If you make the RHEL initramfs harder to debug in the field, I will await the time when some Red Hat field engineers will need to do the same sort of thing I have had to do in the field, and be amused when they want to shake you very warmly by the throat. :-) Seriously, keep things as simple as possible in the initramfs; don't use complicated bus protocols; that way lies madness. Enterprise systems aren't constantly booting (or they shouldn't be, if your kernels are sufficiently reliable :-), so trying to optimize for an extra 2 or 3 seconds worth of boot time really, REALLY isn't worth it. - Ted ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-29 22:49 ` Theodore Ts'o @ 2015-04-30 0:05 ` David Lang 2015-04-30 0:15 ` Dave Airlie 0 siblings, 1 reply; 333+ messages in thread From: David Lang @ 2015-04-30 0:05 UTC (permalink / raw) To: Theodore Ts'o Cc: John Stoffel, Harald Hoyer, Richard Weinberger, linux-kernel On Wed, 29 Apr 2015, Theodore Ts'o wrote: > On Wed, Apr 29, 2015 at 12:26:59PM -0400, John Stoffel wrote: >> If your customers wnat this feature, you're more than welcome to fork >> the kernel and support it yourself. Oh wait... Redhat does that >> already. So what's the problem? Just put it into RHEL (which I use >> I admit, along with Debian/Mint) and be done with it. > > Harald, > > If you make the RHEL initramfs harder to debug in the field, I will > await the time when some Red Hat field engineers will need to do the > same sort of thing I have had to do in the field, and be amused when > they want to shake you very warmly by the throat. :-) > > Seriously, keep things as simple as possible in the initramfs; don't > use complicated bus protocols; that way lies madness. Enterprise > systems aren't constantly booting (or they shouldn't be, if your > kernels are sufficiently reliable :-), so trying to optimize for an > extra 2 or 3 seconds worth of boot time really, REALLY isn't worth it. I've had Enterprise systems where I could hit power on two boxes, and finish the OS install on one before the other has even finished POST and look for the boot media. I did this 5 years ago, before the "let's speed up boot" push started. Admittedly, this wasn't a stock distro boot/install, it was my own optimized one, but it also wasn't as optimized and automated as it could have been (several points where the installer needed to pick items from a menu and enter values) David Lang ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-30 0:05 ` David Lang @ 2015-04-30 0:15 ` Dave Airlie 2015-04-30 0:18 ` David Lang 0 siblings, 1 reply; 333+ messages in thread From: Dave Airlie @ 2015-04-30 0:15 UTC (permalink / raw) To: David Lang Cc: Theodore Ts'o, John Stoffel, Harald Hoyer, Richard Weinberger, linux-kernel On 30 April 2015 at 10:05, David Lang <david@lang.hm> wrote: > On Wed, 29 Apr 2015, Theodore Ts'o wrote: > >> On Wed, Apr 29, 2015 at 12:26:59PM -0400, John Stoffel wrote: >>> >>> If your customers wnat this feature, you're more than welcome to fork >>> the kernel and support it yourself. Oh wait... Redhat does that >>> already. So what's the problem? Just put it into RHEL (which I use >>> I admit, along with Debian/Mint) and be done with it. >> >> >> Harald, >> >> If you make the RHEL initramfs harder to debug in the field, I will >> await the time when some Red Hat field engineers will need to do the >> same sort of thing I have had to do in the field, and be amused when >> they want to shake you very warmly by the throat. :-) >> >> Seriously, keep things as simple as possible in the initramfs; don't >> use complicated bus protocols; that way lies madness. Enterprise >> systems aren't constantly booting (or they shouldn't be, if your >> kernels are sufficiently reliable :-), so trying to optimize for an >> extra 2 or 3 seconds worth of boot time really, REALLY isn't worth it. > > > I've had Enterprise systems where I could hit power on two boxes, and finish > the OS install on one before the other has even finished POST and look for > the boot media. I did this 5 years ago, before the "let's speed up boot" > push started. 
> > Admittedly, this wasn't a stock distro boot/install, it was my own optimized > one, but it also wasn't as optimized and automated as it could have been > (several points where the installer needed to pick items from a menu and > enter values) > You guys might have missed this new industry trend, I think they call it virtualisation, I hear it's going to be big, you might want to look into it. Dave. ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-30 0:15 ` Dave Airlie @ 2015-04-30 0:18 ` David Lang 2015-04-30 1:20 ` Dave Airlie 0 siblings, 1 reply; 333+ messages in thread From: David Lang @ 2015-04-30 0:18 UTC (permalink / raw) To: Dave Airlie Cc: Theodore Ts'o, John Stoffel, Harald Hoyer, Richard Weinberger, linux-kernel On Thu, 30 Apr 2015, Dave Airlie wrote: > On 30 April 2015 at 10:05, David Lang <david@lang.hm> wrote: >> On Wed, 29 Apr 2015, Theodore Ts'o wrote: >> >>> On Wed, Apr 29, 2015 at 12:26:59PM -0400, John Stoffel wrote: >>>> >>>> If your customers wnat this feature, you're more than welcome to fork >>>> the kernel and support it yourself. Oh wait... Redhat does that >>>> already. So what's the problem? Just put it into RHEL (which I use >>>> I admit, along with Debian/Mint) and be done with it. >>> >>> >>> Harald, >>> >>> If you make the RHEL initramfs harder to debug in the field, I will >>> await the time when some Red Hat field engineers will need to do the >>> same sort of thing I have had to do in the field, and be amused when >>> they want to shake you very warmly by the throat. :-) >>> >>> Seriously, keep things as simple as possible in the initramfs; don't >>> use complicated bus protocols; that way lies madness. Enterprise >>> systems aren't constantly booting (or they shouldn't be, if your >>> kernels are sufficiently reliable :-), so trying to optimize for an >>> extra 2 or 3 seconds worth of boot time really, REALLY isn't worth it. >> >> >> I've had Enterprise systems where I could hit power on two boxes, and finish >> the OS install on one before the other has even finished POST and look for >> the boot media. I did this 5 years ago, before the "let's speed up boot" >> push started. 
>> >> Admittedly, this wasn't a stock distro boot/install, it was my own optimized >> one, but it also wasn't as optimized and automated as it could have been >> (several points where the installer needed to pick items from a menu and >> enter values) >> > > You guys might have missed this new industry trend, I think they call > it virtualisation, > > I hear it's going to be big, you might want to look into it. So what do you run your virtual machines on? you still have to put an OS on the hardware to support your VMs. Virtualization doesn't eliminate servers (as much as some cloud advocates like to claim it does) And virtualization has overhead, sometimes very significant overhead, so it's not always the right answer. David Lang ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-30 0:18 ` David Lang
@ 2015-04-30 1:20 ` Dave Airlie
0 siblings, 0 replies; 333+ messages in thread
From: Dave Airlie @ 2015-04-30 1:20 UTC (permalink / raw)
To: David Lang
Cc: Theodore Ts'o, John Stoffel, Harald Hoyer, Richard Weinberger, linux-kernel

>>> I've had Enterprise systems where I could hit power on two boxes, and
>>> finish the OS install on one before the other has even finished POST
>>> and look for the boot media. I did this 5 years ago, before the
>>> "let's speed up boot" push started.
>>>
>>> Admittedly, this wasn't a stock distro boot/install, it was my own
>>> optimized one, but it also wasn't as optimized and automated as it
>>> could have been (several points where the installer needed to pick
>>> items from a menu and enter values)
>>>
>>
>> You guys might have missed this new industry trend, I think they call
>> it virtualisation,
>>
>> I hear it's going to be big, you might want to look into it.
>
> So what do you run your virtual machines on? you still have to put an OS on
> the hardware to support your VMs. Virtualization doesn't eliminate servers
> (as much as some cloud advocates like to claim it does)
>
> And virtualization has overhead, sometimes very significant overhead, so
> it's not always the right answer.
>

Thanks for proving my point. RHEL as a distro runs in both scenarios, and optimising one of them is important at the moment; the fact that it might also speed up boot on servers which take 15 mins to POST is a side effect.

For some reason people seem to think their one use case is all that matters.

Dave.

^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-29 12:47 ` Harald Hoyer 2015-04-29 13:33 ` Richard Weinberger @ 2015-04-29 13:35 ` Stephen Smalley 2015-04-29 15:18 ` Simon McVittie 2015-04-29 15:27 ` Martin Steigerwald ` (3 subsequent siblings) 5 siblings, 1 reply; 333+ messages in thread From: Stephen Smalley @ 2015-04-29 13:35 UTC (permalink / raw) To: Harald Hoyer, John Stoffel, Havoc Pennington Cc: Theodore Ts'o, Linus Torvalds, Andy Lutomirski, Lukasz Skalski, Greg Kroah-Hartman, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni, Paul Moore, James Morris, LSM On 04/29/2015 08:47 AM, Harald Hoyer wrote: > On 29.04.2015 01:12, John Stoffel wrote: >> LDAP is pretty damn generic, in that you can put pretty large objects into >> it, and pretty large OUs, etc. So why would it be a candidate for going >> into the kernel? And why is kdbus so important in the kernel as well? >> People have talked about it needing to be there for bootup, but isn't that >> why we ripped out RAID detection and such from the kernel and built >> initramfs, so that there's LESS in the kernel, and more in an early >> userspace? Same idea with dbus in my opinion. > > Let me elaborate on the initramfs/shutdown situation a little bit more, > because I have to deal with that every day. > > Because of the "let's move everything to userspace" sentiment we nowadays > have the situation, that we need a lot of tools to setup the root device. > > Be it LVM on IMSM or iSCSI multipath, the initramfs has to setup the network > (with bridging, bonding, etc.), the iSCSI connection, assemble the raid, the > LVM, open crypto devices, etc... > And if something goes wrong, you want to have a shell, see all the logs and > debug things. 
> > Now over the time we moved away from simple shell scripts (without any > logging) and static compiled special versions for the initramfs to a mini > distribution in the initramfs, which simplifies maintenance and improves > reliability. > > Basically you want to use the same tools in the initramfs (and shutdown) > which you already have and use in your real root, with the same configuration > files and the same interfaces and the same code paths. > > Therefore systemd is started in dracut created initramfs, which starts > journald for logging. The same basic systemd targets exist in the initramfs > as on the real root, so normally you don't have to cope with specialized > versions for the initramfs. > > The target here is to have the same IPC mechanism from the very beginning to > the very end. No crappy fallback mechanisms in case a daemon is not running > or has crashed, no creepy transition from initramfs root to real root to > shutdown root. > > We already have such transitions like: systemd, journald, mdmon [1], etc. > systemd has to serialize itself, journald's file descriptors are transitioned > over, mdmon jumps through hoops. Remember you want to get rid of open files > and executables and have to reexec everything, if you transition from the > initramfs root to the real root, and also from the real root to the shutdown > root. > > We really don't want the IPC mechanism to be in a flux state. All tools have > to fallback to a non-standard mechanism in that case. > > If I have to pull in a dbus daemon in the initramfs, we still have the > chicken and egg problem for PID 1 talking to the logging daemon and starting > dbus. > systemd cannot talk to journald via dbus unless dbus-daemon is started, dbus > cannot log anything on startup, if journald is not running, etc... > > dbus-daemon would have to transition to the real root, and from the real root > to the shutdown root, without losing state. 
> > Of course this can all be done, but it would involve fallback mechanisms, > which we want to get rid off. Hopefully, you don't suggest to merge dbus with > PID 1. Also with a daemon, you will lose the points mentioned in the cover mail > : > > * Security: The peers which communicate do not have to trust each > other, as the only trustworthy component in the game is the kernel > which adds metadata and ensures that all data passed as payload is > either copied or sealed, so that the receiver can parse the data > without having to protect against changing memory while parsing > buffers. Also, all the data transfer is controlled by the kernel, > so that LSMs can track and control what is going on, without > involving userspace. Because of the LSM issue, security people are > much happier with this model than the current scheme of having to > hook into dbus to mediate things. I just want to caution that this justification for kdbus is not fully realized in the current implementation. As it currently stands, there are no LSM hook calls in the kdbus tree beyond metadata collection of security labels. I know there have been experimental proof-of-concept patches floating around for LSM hooks for kdbus but they aren't merged as of yet, nor are there any real implementations of the hooks for the security modules. If kdbus is merged, we need to make sure that the LSM support is integrated before it gets enabled in any distros or we'll have a completely unmediated IPC channel, i.e. a bypass of any security policy restrictions on IPC. It is also interesting that kdbus allows impersonation of any credential, including security label, by "privileged" clients, where privileged simply means it either has CAP_IPC_OWNER or owns (euid matches uid) the bus. 
That seems wrong at least for security labels; either we should not support impersonation of security labels at all or at least it should be controlled by the security module based on its own logic and attributes, not based on CAP_IPC_OWNER or a uid match. Has anyone even reviewed the privilege and UID-based model of kdbus for its implications with respect to discretionary access control? > * Being in the kernel closes a lot of races which can't be fixed with > the current userspace solutions. For example, with kdbus, there is a > way a client can disconnect from a bus, but do so only if no further > messages present in its queue, which is crucial for implementing > race-free "exit-on-idle" services > > * Eavesdropping on the kernel level, so privileged users can hook into > the message stream without hacking support for that into their > userspace processes This one worried me a bit, particularly the statement that such eavesdropping is unobservable by any other participant on the bus. Seems a bit prone to abuse, particularly since it can be done by any privileged client, not merely the process that originally created the bus? ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-29 13:35 ` Stephen Smalley @ 2015-04-29 15:18 ` Simon McVittie 2015-04-29 17:48 ` Stephen Smalley 0 siblings, 1 reply; 333+ messages in thread From: Simon McVittie @ 2015-04-29 15:18 UTC (permalink / raw) To: Stephen Smalley, Harald Hoyer, John Stoffel, Havoc Pennington Cc: Theodore Ts'o, Linus Torvalds, Andy Lutomirski, Lukasz Skalski, Greg Kroah-Hartman, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni, Paul Moore, James Morris, LSM On 29/04/15 14:35, Stephen Smalley wrote: > As it currently stands, there > are no LSM hook calls in the kdbus tree beyond metadata collection of > security labels. SELinux and AppArmor are the two particularly interesting LSMs here: those are the ones that have support for user-space mediation in dbus-daemon, and hence the ones for which replacing dbus-daemon with kdbus, without LSM hooks, would be a regression. > It is also interesting that kdbus allows impersonation of any > credential, including security label, by "privileged" clients, where > privileged simply means it either has CAP_IPC_OWNER or owns (euid > matches uid) the bus. FWIW, this particular feature is *not* one of those that are necessary for feature parity with dbus-daemon. There's no API for making dbus-daemon fake its clients' credentials; if you can ptrace it, then you can of course subvert it arbitrarily, but nothing less hackish than that is currently offered. > On 04/29/2015 08:47 AM, Harald Hoyer wrote: >> * Eavesdropping on the kernel level, so privileged users can hook into >> the message stream without hacking support for that into their >> userspace processes > > This one worried me a bit, particularly the statement that such > eavesdropping is unobservable by any other participant on the bus. 
> Seems a bit prone to abuse, particularly since it can be done by any > privileged client, not merely the process that originally created the bus? For feature parity with dbus-daemon, the fact that eavesdropping/monitoring *exists* is necessary (it's a widely used developer/sysadmin feature) but the precise mechanics of how you get it are not necessarily set in stone. In particular, if you think kdbus' definition of "are you privileged?" may be too broad, that seems a valid question to be asking. In traditional D-Bus, individual users can normally eavesdrop/monitor on their own session buses (which are not a security boundary, unless specially reconfigured), and this is a useful property; on non-LSM systems without special configuration, each user should ideally be able to monitor their own kdbus user bus, too. The system bus *is* a security boundary, and administrative privileges should be required to eavesdrop on it. At a high level, someone with "full root privileges" should be able to eavesdrop, and ordinary users should not; there are various possible criteria for distinguishing between those two extremes, and I have no opinion on whether CAP_IPC_OWNER is the most appropriate cutoff point. In dbus-daemon, LSMs with integration code in dbus-daemon have the opportunity to mediate eavesdropping specially. SELinux does not currently do this (as far as I can see), but AppArmor does, so AppArmor-confined processes are not normally allowed to eavesdrop on the session bus (even though the same user's unconfined processes may). That seems like one of the obvious places for an LSM hook in kdbus. Having eavesdropping be unobservable means that applications cannot change their behaviour while they are being watched, either maliciously (to hide from investigation) or accidentally (bugs that only happen when not being debugged are the hardest to fix). 
dbus-daemon's traditional implementation of eavesdropping has had side-effects in the past, which is undesirable, and is addressed by the new monitoring interface in version 1.9. kdbus' version of eavesdropping is quite similar to the new monitoring interface. -- Simon McVittie Collabora Ltd. <http://www.collabora.com/> For context, I am a D-Bus maintainer, but neither the original designer of D-Bus nor a kdbus developer. ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-29 15:18 ` Simon McVittie @ 2015-04-29 17:48 ` Stephen Smalley 0 siblings, 0 replies; 333+ messages in thread From: Stephen Smalley @ 2015-04-29 17:48 UTC (permalink / raw) To: Simon McVittie, Harald Hoyer, John Stoffel, Havoc Pennington Cc: Theodore Ts'o, Linus Torvalds, Andy Lutomirski, Lukasz Skalski, Greg Kroah-Hartman, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni, Paul Moore, James Morris, LSM On 04/29/2015 11:18 AM, Simon McVittie wrote: > On 29/04/15 14:35, Stephen Smalley wrote: >> It is also interesting that kdbus allows impersonation of any >> credential, including security label, by "privileged" clients, where >> privileged simply means it either has CAP_IPC_OWNER or owns (euid >> matches uid) the bus. > > FWIW, this particular feature is *not* one of those that are necessary > for feature parity with dbus-daemon. There's no API for making > dbus-daemon fake its clients' credentials; if you can ptrace it, then > you can of course subvert it arbitrarily, but nothing less hackish than > that is currently offered. Then I'd be inclined to drop it from kdbus unless some compelling use case exists, and even then, I don't believe that CAP_IPC_OWNER or bus-owner uid match is sufficient even for forging credentials other than the security label. For socket credentials passing, for example, the kernel checks CAP_SYS_ADMIN for pid forging, CAP_SETUID for uid forging, and CAP_SETGID for gid forging. And I don't believe we support any form of forging of the security label on socket credentials. > For feature parity with dbus-daemon, the fact that > eavesdropping/monitoring *exists* is necessary (it's a widely used > developer/sysadmin feature) but the precise mechanics of how you get it > are not necessarily set in stone. In particular, if you think kdbus' > definition of "are you privileged?" 
may be too broad, that seems a valid > question to be asking. > > In traditional D-Bus, individual users can normally eavesdrop/monitor on > their own session buses (which are not a security boundary, unless > specially reconfigured), and this is a useful property; on non-LSM > systems without special configuration, each user should ideally be able > to monitor their own kdbus user bus, too. > > The system bus *is* a security boundary, and administrative privileges > should be required to eavesdrop on it. At a high level, someone with > "full root privileges" should be able to eavesdrop, and ordinary users > should not; there are various possible criteria for distinguishing > between those two extremes, and I have no opinion on whether > CAP_IPC_OWNER is the most appropriate cutoff point. > > In dbus-daemon, LSMs with integration code in dbus-daemon have the > opportunity to mediate eavesdropping specially. SELinux does not > currently do this (as far as I can see), but AppArmor does, so > AppArmor-confined processes are not normally allowed to eavesdrop on the > session bus (even though the same user's unconfined processes may). That > seems like one of the obvious places for an LSM hook in kdbus. Yes, we would want to control this in SELinux; I suspect that either the eavesdropping functionality did not exist in dbus-daemon at the time of the original dbus-daemon SELinux integration or it was an oversight. > Having eavesdropping be unobservable means that applications cannot > change their behaviour while they are being watched, either maliciously > (to hide from investigation) or accidentally (bugs that only happen when > not being debugged are the hardest to fix). dbus-daemon's traditional > implementation of eavesdropping has had side-effects in the past, which > is undesirable, and is addressed by the new monitoring interface in > version 1.9. kdbus' version of eavesdropping is quite similar to the new > monitoring interface. 
^ permalink raw reply [flat|nested] 333+ messages in thread
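For reference, the socket credential passing that the message above compares against can be exercised from userspace. The following is a minimal sketch, assuming Linux and Python's `socket` module; the helper names (`send_creds`, `recv_creds`, `demo`) are made up for illustration and have nothing to do with the kdbus code. It sends the caller's real pid/uid/gid over a Unix socketpair and then attempts to claim a different uid, which the kernel is expected to refuse unless the sender holds CAP_SETUID.

```python
import os
import socket
import struct

# struct ucred on Linux: pid_t pid; uid_t uid; gid_t gid;
UCRED = struct.Struct("iII")


def send_creds(sock, pid, uid, gid, payload=b"x"):
    """Send payload with an explicit SCM_CREDENTIALS control message."""
    anc = [(socket.SOL_SOCKET, socket.SCM_CREDENTIALS, UCRED.pack(pid, uid, gid))]
    sock.sendmsg([payload], anc)


def recv_creds(sock):
    """Receive one message and return the (pid, uid, gid) attached to it."""
    _, ancdata, _, _ = sock.recvmsg(16, socket.CMSG_SPACE(UCRED.size))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_CREDENTIALS:
            return UCRED.unpack(data[:UCRED.size])
    return None


def demo():
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    with a, b:
        # The receiver has to opt in to credential delivery.
        b.setsockopt(socket.SOL_SOCKET, socket.SO_PASSCRED, 1)

        # Honest credentials: always accepted.
        send_creds(a, os.getpid(), os.getuid(), os.getgid())
        honest = recv_creds(b)

        # Claiming another uid: the kernel validates this at send time.
        try:
            send_creds(a, os.getpid(), os.getuid() + 1, os.getgid())
            forged = recv_creds(b)  # only reached with sufficient privilege
        except OSError:
            forged = None           # typically EPERM for unprivileged senders
    return honest, forged
```

Run unprivileged, `demo()` should return the caller's own credentials and `None` for the forged attempt; run as root the second send may succeed, which is exactly the per-field capability check described above.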
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-29 12:47 ` Harald Hoyer
2015-04-29 13:33 ` Richard Weinberger
2015-04-29 13:35 ` Stephen Smalley
@ 2015-04-29 15:27 ` Martin Steigerwald
2015-04-29 16:36 ` David Lang
2015-04-29 18:54 ` Andy Lutomirski
` (2 subsequent siblings)
5 siblings, 1 reply; 333+ messages in thread
From: Martin Steigerwald @ 2015-04-29 15:27 UTC (permalink / raw)
To: Harald Hoyer
Cc: John Stoffel, Havoc Pennington, Theodore Ts'o, Linus Torvalds,
Andy Lutomirski, Lukasz Skalski, Greg Kroah-Hartman, Andrew Morton,
Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen,
Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni
On Wednesday, 29 April 2015, 14:47:53, Harald Hoyer wrote:
> We really don't want the IPC mechanism to be in a flux state. All tools
> have to fallback to a non-standard mechanism in that case.
>
> If I have to pull in a dbus daemon in the initramfs, we still have the
> chicken and egg problem for PID 1 talking to the logging daemon and
> starting dbus.
> systemd cannot talk to journald via dbus unless dbus-daemon is started,
> dbus cannot log anything on startup, if journald is not running, etc...
Do I get this right that it is basically a userspace *design* decision
that you use as a reason to have kdbus inside the kernel?
Is it really necessary to use DBUS for talking to journald? And does it
really matter that much if any messages from before dbus starts up do not
appear in the log? /proc/kmsg is a ring buffer, it can still be copied over
later.
I remember this kind of reason for not having cgroup management in a
separate process, but these are both in userspace.
"We have done it this way in userspace, thus this needs to be in kernel"
doesn't sound quite convincing to me as an argument for having dbus inside
the kernel.
Userspace uses the API the kernel and glibc provide, yes, it makes sense
to look at what userspace needs, but designing some things in userspace and
then requiring support for these design decisions in the kernel just
doesn't sound quite right to me.
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-29 15:27 ` Martin Steigerwald
@ 2015-04-29 16:36 ` David Lang
0 siblings, 0 replies; 333+ messages in thread
From: David Lang @ 2015-04-29 16:36 UTC (permalink / raw)
To: Martin Steigerwald
Cc: Harald Hoyer, John Stoffel, Havoc Pennington, Theodore Ts'o,
Linus Torvalds, Andy Lutomirski, Lukasz Skalski, Greg Kroah-Hartman,
Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes,
Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann,
Djalal Harouni
On Wed, 29 Apr 2015, Martin Steigerwald wrote:
> On Wednesday, 29 April 2015, 14:47:53, Harald Hoyer wrote:
>> We really don't want the IPC mechanism to be in a flux state. All tools
>> have to fallback to a non-standard mechanism in that case.
>>
>> If I have to pull in a dbus daemon in the initramfs, we still have the
>> chicken and egg problem for PID 1 talking to the logging daemon and
>> starting dbus.
>> systemd cannot talk to journald via dbus unless dbus-daemon is started,
>> dbus cannot log anything on startup, if journald is not running, etc...
>
> Do I get this right that it is basically a userspace *design* decision
> that you use as a reason to have kdbus inside the kernel?
>
> Is it really necessary to use DBUS for talking to journald? And does it
> really matter that much if any message before starting up dbus do not
> appear in the log? /proc/kmsg is a ring buffer, it can still be copied over
> later.
I've been getting the early boot messages in my logs for decades (assuming
the system doesn't fail before the syslog daemon is started). It sometimes
has required setting a larger than default ringbuffer in the kernel, but
that's easy enough to do.
David Lang
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-29 12:47 ` Harald Hoyer ` (2 preceding siblings ...) 2015-04-29 15:27 ` Martin Steigerwald @ 2015-04-29 18:54 ` Andy Lutomirski 2015-04-29 19:30 ` Austin S Hemmelgarn 2015-04-30 20:14 ` Eric W. Biederman 2015-05-01 15:49 ` Austin S Hemmelgarn 5 siblings, 1 reply; 333+ messages in thread From: Andy Lutomirski @ 2015-04-29 18:54 UTC (permalink / raw) To: Harald Hoyer Cc: Arnd Bergmann, Havoc Pennington, linux-kernel, Jiri Kosina, Andrew Morton, Daniel Mack, One Thousand Gnomes, Linus Torvalds, Lukasz Skalski, Theodore Ts'o, Tom Gundersen, Greg Kroah-Hartman, David Herrmann, Eric W. Biederman, John Stoffel, Djalal Harouni On Apr 29, 2015 5:48 AM, "Harald Hoyer" <harald@redhat.com> wrote: > Of course this can all be done, but it would involve fallback mechanisms, > which we want to get rid off. Hopefully, you don't suggest to merge dbus with > PID 1. Also with a daemon, you will lose the points mentioned in the cover mail > : > > * Security: The peers which communicate do not have to trust each > other, as the only trustworthy component in the game is the kernel > which adds metadata and ensures that all data passed as payload is > either copied or sealed, so that the receiver can parse the data > without having to protect against changing memory while parsing > buffers. Also, all the data transfer is controlled by the kernel, > so that LSMs can track and control what is going on, without > involving userspace. Because of the LSM issue, security people are > much happier with this model than the current scheme of having to > hook into dbus to mediate things. Other security people prefer code to stay out of the kernel when possible. Also, this metadata argument is still invalid. Sockets can be improved for this purpose. > > * Being in the kernel closes a lot of races which can't be fixed with > the current userspace solutions. 
For example, with kdbus, there is a > way a client can disconnect from a bus, but do so only if no further > messages present in its queue, which is crucial for implementing > race-free "exit-on-idle" services This can be implemented in userspace. Client to dbus daemon: may I exit now? Dbus daemon to client: yes (and no more messages) or no > > * Eavesdropping on the kernel level, so privileged users can hook into > the message stream without hacking support for that into their > userspace processes > Why would it be a hack in userspace but not a hack in the kernel? > * A number of smaller benefits: for example kdbus learned a way to peek > full messages without dequeing them, which is really useful for > logging metadata when handling bus-activation requests. MSG_PEEK? > > I don't care, if the kdbus speedup is only marginal. > > In my ideal world, there is a standard IPC mechanism from the beginning to > the end, which does not rely on any process running (except the kernel) and > which is used by _all_ tools, be it a system daemon providing information and > interfaces about device assembly or network setup tools or end user desktop > processes. > > dbus _is_ such an easy, flexible standard IPC mechanism. Of course, you can > invent the wheel again (NIH, "we know better") and wait and see, if that > works out. Until then the whole common IPC problem is unresolved and Linux > distributions are just a collection of random software with no common > interoperability and home grown interfaces. I don't think anyone is suggesting throwing out dbus. > > [1] transitioning mdmon is one of the critical parts for an IMSM raid array. > Also running an executable from a disk, which the executable is monitoring, > and which stops functioning, if the executable is not responding is insane. > That sounds like an excellent reason to just keep running the initramfs copy, and I don't see why IPC has anything to do with where the executable comes from. 
--Andy ^ permalink raw reply [flat|nested] 333+ messages in thread
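The one-word "MSG_PEEK?" reply above refers to the existing socket flag that reads a queued message without removing it. A minimal sketch of that behaviour, assuming a Unix datagram socketpair on Linux (the function name `peek_then_read` is only illustrative, and this shows nothing about the metadata kdbus attaches):

```python
import socket


def peek_then_read(payload: bytes):
    """Return (peeked, dequeued) for one datagram sent over a socketpair."""
    tx, rx = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    with tx, rx:
        tx.send(payload)
        # MSG_PEEK copies the datagram out but leaves it in the queue.
        peeked = rx.recv(4096, socket.MSG_PEEK)
        # A normal recv() then returns the same datagram and dequeues it.
        dequeued = rx.recv(4096)
    return peeked, dequeued
```

Both return values carry the same bytes, since the peek does not consume the message. Whether this covers the bus-activation logging case from the cover mail is the open question in the thread, not something this sketch settles.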
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-29 18:54 ` Andy Lutomirski
@ 2015-04-29 19:30 ` Austin S Hemmelgarn
2015-04-29 19:42 ` Andy Lutomirski
2015-04-29 22:34 ` John Stoffel
0 siblings, 2 replies; 333+ messages in thread
From: Austin S Hemmelgarn @ 2015-04-29 19:30 UTC (permalink / raw)
To: Andy Lutomirski, Harald Hoyer
Cc: Arnd Bergmann, Havoc Pennington, linux-kernel, Jiri Kosina,
Andrew Morton, Daniel Mack, One Thousand Gnomes, Linus Torvalds,
Lukasz Skalski, Theodore Ts'o, Tom Gundersen, Greg Kroah-Hartman,
David Herrmann, Eric W. Biederman, John Stoffel, Djalal Harouni
[-- Attachment #1: Type: text/plain, Size: 847 bytes --]
On 2015-04-29 14:54, Andy Lutomirski wrote:
> On Apr 29, 2015 5:48 AM, "Harald Hoyer" <harald@redhat.com> wrote:
>>
>> * Being in the kernel closes a lot of races which can't be fixed with
>> the current userspace solutions. For example, with kdbus, there is a
>> way a client can disconnect from a bus, but do so only if no further
>> messages present in its queue, which is crucial for implementing
>> race-free "exit-on-idle" services
>
> This can be implemented in userspace.
>
> Client to dbus daemon: may I exit now?
> Dbus daemon to client: yes (and no more messages) or no
>
Depending on how this is implemented, there would be a potential issue
if a message arrived for the client after the daemon told it it could
exit, but before it finished shutdown, in which case the message might
get lost.
[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 2967 bytes --]
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-29 19:30 ` Austin S Hemmelgarn
@ 2015-04-29 19:42 ` Andy Lutomirski
2015-04-29 20:15 ` David Lang
2015-04-29 22:34 ` John Stoffel
1 sibling, 1 reply; 333+ messages in thread
From: Andy Lutomirski @ 2015-04-29 19:42 UTC (permalink / raw)
To: Austin S Hemmelgarn
Cc: Harald Hoyer, Arnd Bergmann, Havoc Pennington, linux-kernel,
Jiri Kosina, Andrew Morton, Daniel Mack, One Thousand Gnomes,
Linus Torvalds, Lukasz Skalski, Theodore Ts'o, Tom Gundersen,
Greg Kroah-Hartman, David Herrmann, Eric W. Biederman, John Stoffel,
Djalal Harouni
On Wed, Apr 29, 2015 at 12:30 PM, Austin S Hemmelgarn
<ahferroin7@gmail.com> wrote:
> On 2015-04-29 14:54, Andy Lutomirski wrote:
>>
>> On Apr 29, 2015 5:48 AM, "Harald Hoyer" <harald@redhat.com> wrote:
>>>
>>>
>>> * Being in the kernel closes a lot of races which can't be fixed with
>>> the current userspace solutions. For example, with kdbus, there is a
>>> way a client can disconnect from a bus, but do so only if no further
>>> messages present in its queue, which is crucial for implementing
>>> race-free "exit-on-idle" services
>>
>>
>> This can be implemented in userspace.
>>
>> Client to dbus daemon: may I exit now?
>> Dbus daemon to client: yes (and no more messages) or no
>>
> Depending on how this is implemented, there would be a potential issue if a
> message arrived for the client after the daemon told it it could exit, but
> before it finished shutdown, in which case the message might get lost.
>
Then implement it the right way? The client sends some kind of
sequence number with its request.
--Andy
^ permalink raw reply [flat|nested] 333+ messages in thread
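The message above does not spell the protocol out, so the following is only one possible reading of "the client sends some kind of sequence number with its request", written as an in-memory toy. The `Daemon` and `Client` classes and their methods are hypothetical and do not correspond to any dbus-daemon API. The daemon numbers each message it forwards; the client reports the last number it processed when asking to exit; the daemon says yes only if nothing newer has been forwarded, and from then on holds further messages instead of delivering them.

```python
class Daemon:
    """Toy broker that numbers every message it forwards to one client."""

    def __init__(self):
        self.sent = 0          # sequence number of the last forwarded message
        self.accepting = True  # False once the client was allowed to exit
        self.held = []         # messages kept for the next activation

    def forward(self, client, msg):
        if not self.accepting:
            self.held.append(msg)  # client is gone: hold, do not lose
            return
        self.sent += 1
        client.inbox.append((self.sent, msg))

    def may_exit(self, last_seen):
        """Grant exit only if the client has seen everything forwarded."""
        if last_seen == self.sent:
            self.accepting = False
            return True
        return False


class Client:
    def __init__(self):
        self.inbox = []
        self.last_seen = 0

    def process_all(self):
        while self.inbox:
            self.last_seen, _ = self.inbox.pop(0)


def demo():
    d, c = Daemon(), Client()
    d.forward(c, "a")
    c.process_all()
    d.forward(c, "b")                 # arrives while the client looks idle
    first = d.may_exit(c.last_seen)   # refused: message 2 not yet seen
    c.process_all()
    second = d.may_exit(c.last_seen)  # granted: client is caught up
    d.forward(c, "c")                 # held for the next activation
    return first, second, d.held
```

In this reading the late message "b" causes the first request to be refused rather than lost, which is the case raised in the quoted objection. It does not address the separate concern about a peer keeping a service alive by sending it messages.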
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-29 19:42 ` Andy Lutomirski @ 2015-04-29 20:15 ` David Lang 2015-04-29 20:24 ` Andy Lutomirski 0 siblings, 1 reply; 333+ messages in thread From: David Lang @ 2015-04-29 20:15 UTC (permalink / raw) To: Andy Lutomirski Cc: Austin S Hemmelgarn, Harald Hoyer, Arnd Bergmann, Havoc Pennington, linux-kernel, Jiri Kosina, Andrew Morton, Daniel Mack, One Thousand Gnomes, Linus Torvalds, Lukasz Skalski, Theodore Ts'o, Tom Gundersen, Greg Kroah-Hartman, David Herrmann, Eric W. Biederman, John Stoffel, Djalal Harouni On Wed, 29 Apr 2015, Andy Lutomirski wrote: > On Wed, Apr 29, 2015 at 12:30 PM, Austin S Hemmelgarn > <ahferroin7@gmail.com> wrote: >> On 2015-04-29 14:54, Andy Lutomirski wrote: >>> >>> On Apr 29, 2015 5:48 AM, "Harald Hoyer" <harald@redhat.com> wrote: >>>> >>>> >>>> * Being in the kernel closes a lot of races which can't be fixed with >>>> the current userspace solutions. For example, with kdbus, there is a >>>> way a client can disconnect from a bus, but do so only if no further >>>> messages present in its queue, which is crucial for implementing >>>> race-free "exit-on-idle" services >>> >>> >>> This can be implemented in userspace. >>> >>> Client to dbus daemon: may I exit now? >>> Dbus daemon to client: yes (and no more messages) or no >>> >> Depending on how this is implemented, there would be a potential issue if a >> message arrived for the client after the daemon told it it could exit, but >> before it finished shutdown, in which case the message might get lost. >> > > Then implement it the right way? The client sends some kind of > sequence number with its request. so any app in the system can prevent any other app from exiting/restarting by just sending it the equivalent of a ping over dbus? preventing an app from exiting because there are unhandled messages doesn't mean that those messages are going to be handled, just that they will get read and dropped on the floor by an app trying to exit. 
Sometimes you will just end up with a hung app that can't process messages
and needs to be restarted, but can't be restarted because there are pending
messages.
The problem with "guaranteed delivery" messages is that things _will_ go
wrong that will cause the messages to not be received and processed. At
that point you have the choice of losing some messages or freezing your
entire system (you can buffer them for some time, but eventually you will
run out of buffer space)
We see this all the time in the logging world, people configure their
systems for reliable delivery of log messages to a remote machine, then
when that remote machine goes down and can't receive messages (or a network
issue blocks the traffic), the sending machine blocks and causes an outage.
Being too strict about guaranteeing delivery just doesn't work. You must
have a mechanism to abort and throw away unprocessed messages. If this
means disconnecting the receiver so that there are no missing messages to
the receiver, that's a valid choice. But preventing a receiver from exiting
because it hasn't processed a message is not a valid choice.
David Lang
^ permalink raw reply [flat|nested] 333+ messages in thread
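The trade-off described above (lose some messages or block the sender once the buffer is full) can be shown with a bounded queue that discards its oldest entry instead of blocking. This is a generic sketch; the `DroppingQueue` class is made up for illustration and is not taken from any logging daemon or from kdbus.

```python
from collections import deque


class DroppingQueue:
    """Bounded buffer that drops the oldest message rather than blocking."""

    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)
        self.dropped = 0  # how many messages were thrown away so far

    def put(self, msg):
        # A full deque with maxlen silently evicts its oldest element on
        # append, so count the loss before it happens.
        if len(self.buf) == self.buf.maxlen:
            self.dropped += 1
        self.buf.append(msg)

    def drain(self):
        out = list(self.buf)
        self.buf.clear()
        return out
```

With a capacity of 3 and five messages queued while the receiver is away, the sender never stalls; the receiver later sees only the last three and `dropped` records that two were lost.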
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-29 20:15 ` David Lang @ 2015-04-29 20:24 ` Andy Lutomirski 2015-04-29 20:43 ` David Lang 0 siblings, 1 reply; 333+ messages in thread From: Andy Lutomirski @ 2015-04-29 20:24 UTC (permalink / raw) To: David Lang Cc: Austin S Hemmelgarn, Harald Hoyer, Arnd Bergmann, Havoc Pennington, linux-kernel, Jiri Kosina, Andrew Morton, Daniel Mack, One Thousand Gnomes, Linus Torvalds, Lukasz Skalski, Theodore Ts'o, Tom Gundersen, Greg Kroah-Hartman, David Herrmann, Eric W. Biederman, John Stoffel, Djalal Harouni On Wed, Apr 29, 2015 at 1:15 PM, David Lang <david@lang.hm> wrote: > On Wed, 29 Apr 2015, Andy Lutomirski wrote: > >> On Wed, Apr 29, 2015 at 12:30 PM, Austin S Hemmelgarn >> <ahferroin7@gmail.com> wrote: >>> >>> On 2015-04-29 14:54, Andy Lutomirski wrote: >>>> >>>> >>>> On Apr 29, 2015 5:48 AM, "Harald Hoyer" <harald@redhat.com> wrote: >>>>> >>>>> >>>>> >>>>> * Being in the kernel closes a lot of races which can't be fixed with >>>>> the current userspace solutions. For example, with kdbus, there is >>>>> a >>>>> way a client can disconnect from a bus, but do so only if no >>>>> further >>>>> messages present in its queue, which is crucial for implementing >>>>> race-free "exit-on-idle" services >>>> >>>> >>>> >>>> This can be implemented in userspace. >>>> >>>> Client to dbus daemon: may I exit now? >>>> Dbus daemon to client: yes (and no more messages) or no >>>> >>> Depending on how this is implemented, there would be a potential issue if >>> a >>> message arrived for the client after the daemon told it it could exit, >>> but >>> before it finished shutdown, in which case the message might get lost. >>> >> >> Then implement it the right way? The client sends some kind of >> sequence number with its request. > > > so any app in the system can prevent any other app from exiting/restarting > by just sending it the equivalent of a ping over dbus? 
> > preventing an app from exiting because there are unhandled messages doesn't > mean that those messages are going to be handled, just that they will get > read and dropped on the floor by an app trying to exit. Sometimes you will > just end up with a hung app that can't process messages and needs to be > restarted, but can't be restarted because there are pending messages. I think this consideration is more or less the same whether it's handled in the kernel or in userspace, though. --Andy ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-29 20:24 ` Andy Lutomirski @ 2015-04-29 20:43 ` David Lang 2015-04-29 20:51 ` David Herrmann 0 siblings, 1 reply; 333+ messages in thread From: David Lang @ 2015-04-29 20:43 UTC (permalink / raw) To: Andy Lutomirski Cc: Austin S Hemmelgarn, Harald Hoyer, Arnd Bergmann, Havoc Pennington, linux-kernel, Jiri Kosina, Andrew Morton, Daniel Mack, One Thousand Gnomes, Linus Torvalds, Lukasz Skalski, Theodore Ts'o, Tom Gundersen, Greg Kroah-Hartman, David Herrmann, Eric W. Biederman, John Stoffel, Djalal Harouni On Wed, 29 Apr 2015, Andy Lutomirski wrote: > On Wed, Apr 29, 2015 at 1:15 PM, David Lang <david@lang.hm> wrote: >> On Wed, 29 Apr 2015, Andy Lutomirski wrote: >> >>> On Wed, Apr 29, 2015 at 12:30 PM, Austin S Hemmelgarn >>> <ahferroin7@gmail.com> wrote: >>>> >>>> On 2015-04-29 14:54, Andy Lutomirski wrote: >>>>> >>>>> >>>>> On Apr 29, 2015 5:48 AM, "Harald Hoyer" <harald@redhat.com> wrote: >>>>>> >>>>>> >>>>>> >>>>>> * Being in the kernel closes a lot of races which can't be fixed with >>>>>> the current userspace solutions. For example, with kdbus, there is >>>>>> a >>>>>> way a client can disconnect from a bus, but do so only if no >>>>>> further >>>>>> messages present in its queue, which is crucial for implementing >>>>>> race-free "exit-on-idle" services >>>>> >>>>> >>>>> >>>>> This can be implemented in userspace. >>>>> >>>>> Client to dbus daemon: may I exit now? >>>>> Dbus daemon to client: yes (and no more messages) or no >>>>> >>>> Depending on how this is implemented, there would be a potential issue if >>>> a >>>> message arrived for the client after the daemon told it it could exit, >>>> but >>>> before it finished shutdown, in which case the message might get lost. >>>> >>> >>> Then implement it the right way? The client sends some kind of >>> sequence number with its request. 
>> >> >> so any app in the system can prevent any other app from exiting/restarting >> by just sending it the equivalent of a ping over dbus? >> >> preventing an app from exiting because there are unhandled messages doesn't >> mean that those messages are going to be handled, just that they will get >> read and dropped on the floor by an app trying to exit. Sometimes you will >> just end up with a hung app that can't process messages and needs to be >> restarted, but can't be restarted because there are pending messages. > > I think this consideration is more or less the same whether it's > handled in the kernel or in userspace, though. If the justification for why this needs to be in the kernel is that you can't reliably prevent apps from exiting if there are pending messages, then the answer of "preventing apps from exiting if there are pending messages isn't a sane thing to try and do" is a direct counter to that justification for including it in the kernel. David Lang ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-29 20:43 ` David Lang @ 2015-04-29 20:51 ` David Herrmann 2015-04-30 1:42 ` John Stoffel 0 siblings, 1 reply; 333+ messages in thread From: David Herrmann @ 2015-04-29 20:51 UTC (permalink / raw) To: David Lang Cc: Andy Lutomirski, Austin S Hemmelgarn, Harald Hoyer, Arnd Bergmann, Havoc Pennington, linux-kernel, Jiri Kosina, Andrew Morton, Daniel Mack, One Thousand Gnomes, Linus Torvalds, Lukasz Skalski, Theodore Ts'o, Tom Gundersen, Greg Kroah-Hartman, Eric W. Biederman, John Stoffel, Djalal Harouni Hi On Wed, Apr 29, 2015 at 10:43 PM, David Lang <david@lang.hm> wrote: > If the justification for why this needs to be in the kernel is that you > can't reliably prevent apps from exiting if there are pending messages, [...] It's not. > the answer of "preventing apps from exiting if there are pending messages > isn't a sane thing to try and do" is a direct counter to that justification > for including it in the kernel. It's optionally used for reliable exit-on-idle. Thanks David ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-29 20:51 ` David Herrmann @ 2015-04-30 1:42 ` John Stoffel 0 siblings, 0 replies; 333+ messages in thread From: John Stoffel @ 2015-04-30 1:42 UTC (permalink / raw) To: David Herrmann Cc: David Lang, Andy Lutomirski, Austin S Hemmelgarn, Harald Hoyer, Arnd Bergmann, Havoc Pennington, linux-kernel, Jiri Kosina, Andrew Morton, Daniel Mack, One Thousand Gnomes, Linus Torvalds, Lukasz Skalski, Theodore Ts'o, Tom Gundersen, Greg Kroah-Hartman, Eric W. Biederman, John Stoffel, Djalal Harouni >>>>> "David" == David Herrmann <dh.herrmann@gmail.com> writes: David> Hi David> On Wed, Apr 29, 2015 at 10:43 PM, David Lang <david@lang.hm> wrote: >> If the justification for why this needs to be in the kernel is that you >> can't reliably prevent apps from exiting if there are pending messages, [...] David> It's not. >> the answer of "preventing apps from exiting if there are pending messages >> isn't a sane thing to try and do" is a direct counter to that justification >> for including it in the kernel. David> It's optionally used for reliable exit-on-idle. Then why is there a critical race that must be solved in the kernel if it's optional? And can you please describe in more detail what this 'exit-on-idle' thing is and how it works and why you would use it? ^ permalink raw reply [flat|nested] 333+ messages in thread
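For readers following the exit-on-idle subthread: the userspace handshake Andy Lutomirski outlines (the client asks to exit and passes a sequence number, the daemon answers yes or no) could look roughly like the toy model below. This is only a sketch, not dbus-daemon or kdbus code. The names (`Daemon`, `request_exit`, `send`) are made up for illustration, and it assumes the daemon numbers the messages it queues for each client and handles one request at a time.

```python
# Toy model of a race-free "may I exit now?" handshake.
# Assumption: the daemon assigns a per-client sequence number to every
# message it queues. All names here are hypothetical.

class Daemon:
    def __init__(self):
        self.queued = {}        # client name -> number of messages queued so far
        self.connected = set()

    def connect(self, client):
        self.connected.add(client)
        self.queued[client] = 0

    def send(self, client):
        """Queue a message for client. Returns its sequence number, or None
        if the client is gone (the sender could then start a new instance)."""
        if client not in self.connected:
            return None
        self.queued[client] += 1
        return self.queued[client]

    def request_exit(self, client, last_seen_seq):
        """Grant exit only if the client has seen everything queued for it.
        The check and the disconnect happen in one step, so no message can
        slip in between the 'yes' and the disconnect."""
        if self.queued[client] != last_seen_seq:
            return False        # something arrived meanwhile: keep running
        self.connected.discard(client)
        return True


daemon = Daemon()
daemon.connect("svc")
seq = daemon.send("svc")                    # message 1 is queued
stale = daemon.request_exit("svc", 0)       # client has not seen message 1 yet
granted = daemon.request_exit("svc", seq)   # client has caught up
late = daemon.send("svc")                   # sender learns the client is gone
print(stale, granted, late)
```

In a single-threaded daemon the atomicity comes from doing the check and the disconnect in the same event-loop iteration. Whether that answers David Lang's objection about hung apps is a separate question that this sketch does not address.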
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-29 19:30 ` Austin S Hemmelgarn 2015-04-29 19:42 ` Andy Lutomirski @ 2015-04-29 22:34 ` John Stoffel 1 sibling, 0 replies; 333+ messages in thread From: John Stoffel @ 2015-04-29 22:34 UTC (permalink / raw) To: Austin S Hemmelgarn Cc: Andy Lutomirski, Harald Hoyer, Arnd Bergmann, Havoc Pennington, linux-kernel, Jiri Kosina, Andrew Morton, Daniel Mack, One Thousand Gnomes, Linus Torvalds, Lukasz Skalski, Theodore Ts'o, Tom Gundersen, Greg Kroah-Hartman, David Herrmann, Eric W. Biederman, John Stoffel, Djalal Harouni >>>>> "Austin" == Austin S Hemmelgarn <ahferroin7@gmail.com> writes: Austin> On 2015-04-29 14:54, Andy Lutomirski wrote: >> On Apr 29, 2015 5:48 AM, "Harald Hoyer" <harald@redhat.com> wrote: >>> >>> * Being in the kernel closes a lot of races which can't be fixed with >>> the current userspace solutions. For example, with kdbus, there is a >>> way a client can disconnect from a bus, but do so only if no further >>> messages present in its queue, which is crucial for implementing >>> race-free "exit-on-idle" services >> >> This can be implemented in userspace. >> >> Client to dbus daemon: may I exit now? >> Dbus daemon to client: yes (and no more messages) or no >> Austin> Depending on how this is implemented, there would be a Austin> potential issue if a message arrived for the client after the Austin> daemon told it it could exit, but before it finished shutdown, Austin> in which case the message might get lost. What makes anyone think they can guarantee that a message is even received? I could see the daemon sending the message and the client getting a segfault and dumping core. What then? How would kdbus solve this type of "race" anyway? Can anyone give a concrete example of one of the races that are closed here? That's been one of the missing examples. And remember, there's no perfection. Even in the kernel we just had a discussion about missed/missing IPIs and lost processor interrupts, etc.
Expecting perfection is just asking for trouble. That's why there are timeouts, retries and just giving up and throwing an exception. John ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-29 12:47 ` Harald Hoyer ` (3 preceding siblings ...) 2015-04-29 18:54 ` Andy Lutomirski @ 2015-04-30 20:14 ` Eric W. Biederman 2015-05-01 15:49 ` Austin S Hemmelgarn 5 siblings, 0 replies; 333+ messages in thread From: Eric W. Biederman @ 2015-04-30 20:14 UTC (permalink / raw) To: Harald Hoyer, John Stoffel, Havoc Pennington Cc: Theodore Ts'o, Linus Torvalds, Andy Lutomirski, Lukasz Skalski, Greg Kroah-Hartman, Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On April 29, 2015 7:47:53 AM CDT, Harald Hoyer <harald@redhat.com> wrote: >-----BEGIN PGP SIGNED MESSAGE----- >Hash: SHA256 > >On 29.04.2015 01:12, John Stoffel wrote: >> LDAP is pretty damn generic, in that you can put pretty large objects >into >> it, and pretty large OUs, etc. So why would it be a candidate for >going >> into the kernel? And why is kdbus so important in the kernel as >well? >> People have talked about it needing to be there for bootup, but isn't >that >> why we ripped out RAID detection and such from the kernel and built >> initramfs, so that there's LESS in the kernel, and more in an early >> userspace? Same idea with dbus in my opinion. > >Let me elaborate on the initramfs/shutdown situation a little bit more, >because I have to deal with that every day. > >Because of the "let's move everything to userspace" sentiment we >nowadays >have the situation, that we need a lot of tools to setup the root >device. > >Be it LVM on IMSM or iSCSI multipath, the initramfs has to setup the >network >(with bridging, bonding, etc.), the iSCSI connection, assemble the >raid, the >LVM, open crypto devices, etc... >And if something goes wrong, you want to have a shell, see all the logs >and >debug things. 
> >Now over the time we moved away from simple shell scripts (without any >logging) and static compiled special versions for the initramfs to a >mini >distribution in the initramfs, which simplifies maintenance and >improves >reliability. > >Basically you want to use the same tools in the initramfs (and >shutdown) >which you already have and use in your real root, with the same >configuration >files and the same interfaces and the same code paths. > >Therefore systemd is started in dracut created initramfs, which starts >journald for logging. The same basic systemd targets exist in the >initramfs >as on the real root, so normally you don't have to cope with >specialized >versions for the initramfs. > >The target here is to have the same IPC mechanism from the very >beginning to >the very end. No crappy fallback mechanisms in case a daemon is not >running >or has crashed, no creepy transition from initramfs root to real root >to >shutdown root. > >We already have such transitions like: systemd, journald, mdmon [1], >etc. >systemd has to serialize itself, journald's file descriptors are >transitioned >over, mdmon jumps through hoops. Remember you want to get rid of open >files >and executables and have to reexec everything, if you transition from >the >initramfs root to the real root, and also from the real root to the >shutdown >root. > >We really don't want the IPC mechanism to be in a flux state. All tools >have >to fallback to a non-standard mechanism in that case. > >If I have to pull in a dbus daemon in the initramfs, we still have the >chicken and egg problem for PID 1 talking to the logging daemon and >starting >dbus. >systemd cannot talk to journald via dbus unless dbus-daemon is started, >dbus >cannot log anything on startup, if journald is not running, etc... > >dbus-daemon would have to transition to the real root, and from the >real root >to the shutdown root, without losing state. Which does not sound fundamentally hard. 
Unify the roots, and make /run or wherever the dbus socket lives always available. As long as your initramfs has the latest versions of software there is no need for any tricky transitions except to upgrade software on a running system. >Of course this can all be done, but it would involve fallback >mechanisms, >which we want to get rid off. Only if you design things poorly. >Hopefully, you don't suggest to merge >dbus with >PID 1. Also with a daemon, you will lose the points mentioned in the >cover mail I don't see how something that is inappropriate to be in PID 1 is better in PID 0. >I don't care, if the kdbus speedup is only marginal. > >In my ideal world, there is a standard IPC mechanism from the beginning >to >the end, which does not rely on any process running (except the kernel) >and >which is used by _all_ tools, be it a system daemon providing >information and >interfaces about device assembly or network setup tools or end user >desktop >processes. And that is a beautiful dream and an absolutely rubbish way to get there. If the performance is not top notch, not everything can use your beautiful IPC mechanism. Which means your dream fails. Good performance is a hard requirement to get where you want to be. >dbus _is_ such an easy, flexible standard IPC mechanism. Of course, you >can >invent the wheel again (NIH, "we know better") and wait and see, if >that >works out. Until then the whole common IPC problem is unresolved and >Linux >distributions are just a collection of random software with no common >interoperability and home grown interfaces. kdbus seems to be the NIH "we know better better" approach. Many of its design decisions are ones we have chosen to make differently elsewhere in the kernel because they have caused problems. When these issues have been pointed out in review, people have blown them off, leading to the current mess.
Furthermore I don't know that I have seen people arguing for transporting something other than dbus messages but rather I have seen people pointing out there have been many excellent IPC mechanisms that are simpler and faster for the same kind of task and suggesting mapping dbus to better kernel primitives might be productive. But seriously if you want to have one IPC mechanism to rule them all you won't succeed in convincing everyone with the currently sloppily designed kdbus code. Performance matters, simplicity matters, being able to explain design decisions matters. Eric ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-29 12:47 ` Harald Hoyer ` (4 preceding siblings ...) 2015-04-30 20:14 ` Eric W. Biederman @ 2015-05-01 15:49 ` Austin S Hemmelgarn 5 siblings, 0 replies; 333+ messages in thread From: Austin S Hemmelgarn @ 2015-05-01 15:49 UTC (permalink / raw) To: Harald Hoyer, John Stoffel, Havoc Pennington Cc: Theodore Ts'o, Linus Torvalds, Andy Lutomirski, Lukasz Skalski, Greg Kroah-Hartman, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni [-- Attachment #1: Type: text/plain, Size: 1361 bytes --] On 2015-04-29 08:47, Harald Hoyer wrote: > Until then the whole common IPC problem is unresolved and Linux > distributions are just a collection of random software with no common > interoperability and home grown interfaces. I don't know how I managed to not notice this comment before, but I find it particularly hilarious. The part about 'no common interoperability' is just plain BS with the exception of some of the insanity being touted by systemd advocates and the insanity that is accessibility software on linux, you can easily string together pretty much arbitrary strings of commands using fifo's to achieve almost anything; the actual interoperability issues (WRT to the command line at least, which is where all the stuff you are complaining about works) come up only with stuff (like journald for example) that just refuses to use text interfaces on the command-line. Also, the 'home grown interfaces' you are complaining about are used on every operating system (not just Linux or other Unix progeny) every day, and no amount of better IPC is going to stop that; furthermore, almost every current 'standard' protocol or interface used on the internet started out as a 'home grown interface' (TCP/IP immediately comes to mind, followed shortly by NFS, SMTP, WebDAV, SSH, XML, JSON, and a whole slew of others). 
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-28 14:48 ` Havoc Pennington 2015-04-28 17:18 ` Theodore Ts'o @ 2015-04-28 17:19 ` David Lang 2015-04-28 19:19 ` Havoc Pennington 1 sibling, 1 reply; 333+ messages in thread From: David Lang @ 2015-04-28 17:19 UTC (permalink / raw) To: Havoc Pennington Cc: Linus Torvalds, Andy Lutomirski, Lukasz Skalski, Greg Kroah-Hartman, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Tue, 28 Apr 2015, Havoc Pennington wrote: > btw if I can make a suggestion, it's quite confusing to talk about > "dbus" unqualified when we are talking about implementation issues, > since it muddles bus daemon vs. clients, and also since there are lots > of implementations of the client bindings: > > http://www.freedesktop.org/wiki/Software/DBusBindings/ > > For the bus daemon, the only two implementations I know of are the > original one (which uses libdbus as its binding) and kdbus, though. > > I would expect there's no question the bus daemon can be faster, maybe > say 1.5x raw sockets instead of 2.5x, or whatever - something on that > order. Should probably simply stipulate this for discussion purposes: > "someone could optimize the crap out of the bus daemon". The kdbus > question is about whether to eliminate this daemon entirely. As I'm seeing things, we aren't talking about 1.5x vs 2.5x, we're talking about 1000x If the examples that are being used to show the performance advantage of kdbus vs normal dbus are doing the wrong thing, then we need to get some other examples available to people who don't live and breathe dbus that 'do things right' so that the kernel developers can see what you think is the real problem and how kdbus addresses it. So far, this 'wrong' example is the only thing that's been posted to show the performance advantage of kdbus. David Lang ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-28 17:19 ` David Lang @ 2015-04-28 19:19 ` Havoc Pennington 2015-04-28 20:34 ` David Lang 2015-04-28 20:43 ` Linus Torvalds 0 siblings, 2 replies; 333+ messages in thread From: Havoc Pennington @ 2015-04-28 19:19 UTC (permalink / raw) To: David Lang Cc: Linus Torvalds, Andy Lutomirski, Lukasz Skalski, Greg Kroah-Hartman, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Tue, Apr 28, 2015 at 1:19 PM, David Lang <david@lang.hm> wrote: > If the examples that are being used to show the performance advantage of > kdbus vs normal dbus are doing the wrong thing, then we need to get some > other examples available to people who don't live and breath dbus that 'so > things right' so that the kernel developers can see what you think is the > real problem and how kdbus addresses it. > > So far, this 'wrong' example is the only thing that's been posted to show > the performance advantage of kdbus. I'm hopeful someone will do that. fwiw, I would be suspicious of a broken benchmark if it didn't show: * the bus daemon means an extra read/parse and marshal/write per message, so 4 vs. 2 * the existence of the bus daemon therefore makes a message send/receive take roughly twice as long https://lwn.net/Articles/580194/ has a bit more elaboration about number of copies, validations, and context switches in each case. >From what I can tell, the core performance claim for kdbus is that for a userspace daemon to be a routing intermediary, it has to receive and re-send messages. If the baseline performance of IPC is the cost to send once and receive once, adding the daemon means there's twice as much to do (1 more receive, 1 more send). However fast you make send/receive, the daemon always means there are twice as many send/receives as there would be with no daemon. If that isn't what a benchmark shows, then there's a mystery to explain... 
(one disruption to the ratio of course could be if the clients use a much faster or slower dbus lib than the daemon) As noted many times, of course this 2x penalty for the daemon was a conscious tradeoff - kdbus is trying to escape the tradeoff in order to extend usage of dbus to more use cases. Given the tradeoff, _existing_ uses of dbus seem to prefer the performance hit to the loss of useful semantics, but potential new users would like to or need to have both. That LWN article lists some other non-performance rationales for kdbus too, of course. Aside: earlier I referred to the systemd dbus client binding without a link, the link appears to be: http://cgit.freedesktop.org/systemd/systemd/tree/src/libsystemd/sd-bus Havoc ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-28 19:19 ` Havoc Pennington @ 2015-04-28 20:34 ` David Lang 2015-04-28 20:42 ` Andy Lutomirski 2015-04-28 20:43 ` Linus Torvalds 1 sibling, 1 reply; 333+ messages in thread From: David Lang @ 2015-04-28 20:34 UTC (permalink / raw) To: Havoc Pennington Cc: Linus Torvalds, Andy Lutomirski, Lukasz Skalski, Greg Kroah-Hartman, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Tue, 28 Apr 2015, Havoc Pennington wrote: > On Tue, Apr 28, 2015 at 1:19 PM, David Lang <david@lang.hm> wrote: >> If the examples that are being used to show the performance advantage of >> kdbus vs normal dbus are doing the wrong thing, then we need to get some >> other examples available to people who don't live and breath dbus that 'so >> things right' so that the kernel developers can see what you think is the >> real problem and how kdbus addresses it. >> >> So far, this 'wrong' example is the only thing that's been posted to show >> the performance advantage of kdbus. > > I'm hopeful someone will do that. > > fwiw, I would be suspicious of a broken benchmark if it didn't show: > > * the bus daemon means an extra read/parse and marshal/write per > message, so 4 vs. 2 > * the existence of the bus daemon therefore makes a message > send/receive take roughly twice as long > > https://lwn.net/Articles/580194/ has a bit more elaboration about > number of copies, validations, and context switches in each case. > > From what I can tell, the core performance claim for kdbus is that for > a userspace daemon to be a routing intermediary, it has to receive and > re-send messages. If the baseline performance of IPC is the cost to > send once and receive once, adding the daemon means there's twice as > much to do (1 more receive, 1 more send). 
However fast you make > send/receive, the daemon always means there are twice as many > send/receives as there would be with no daemon. there are twice as many context switches, nobody disputes that, the question is if it matters. It doesn't matter if the message router is in kernel space or user space, it still needs to read/parse, marshal/write the data, so you aren't saving that time due to it being in the kernel. > If that isn't what a benchmark shows, then there's a mystery to > explain... (one disruption to the ratio of course could be if the > clients use a much faster or slower dbus lib than the daemon) > > As noted many times, of course this 2x penalty for the daemon was a > conscious tradeoff - kdbus is trying to escape the tradeoff in order > to extend usage of dbus to more use cases. Given the tradeoff, > _existing_ uses of dbus seem to prefer the performance hit to the loss > of useful semantics, but potential new users would like to or need to > have both. If there is a 2x performance improvement for being in the kernel, but a 100x performance improvement from fixing the userspace code, the effort should be spent on the userspace code, not on moving things to kernel space. Remember the Tux in-kernel webserver? it showed performance improvements from putting the http daemon in the kernel, and a lot of the arguments about it sound very similar (reduced context switches, etc) David Lang ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-28 20:34 ` David Lang @ 2015-04-28 20:42 ` Andy Lutomirski 0 siblings, 0 replies; 333+ messages in thread From: Andy Lutomirski @ 2015-04-28 20:42 UTC (permalink / raw) To: David Lang Cc: Havoc Pennington, Linus Torvalds, Lukasz Skalski, Greg Kroah-Hartman, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Tue, Apr 28, 2015 at 1:34 PM, David Lang <david@lang.hm> wrote: > On Tue, 28 Apr 2015, Havoc Pennington wrote: > >> On Tue, Apr 28, 2015 at 1:19 PM, David Lang <david@lang.hm> wrote: >>> >>> If the examples that are being used to show the performance advantage of >>> kdbus vs normal dbus are doing the wrong thing, then we need to get some >>> other examples available to people who don't live and breath dbus that >>> 'so >>> things right' so that the kernel developers can see what you think is the >>> real problem and how kdbus addresses it. >>> >>> So far, this 'wrong' example is the only thing that's been posted to show >>> the performance advantage of kdbus. >> >> >> I'm hopeful someone will do that. >> >> fwiw, I would be suspicious of a broken benchmark if it didn't show: >> >> * the bus daemon means an extra read/parse and marshal/write per >> message, so 4 vs. 2 >> * the existence of the bus daemon therefore makes a message >> send/receive take roughly twice as long >> >> https://lwn.net/Articles/580194/ has a bit more elaboration about >> number of copies, validations, and context switches in each case. >> >> From what I can tell, the core performance claim for kdbus is that for >> a userspace daemon to be a routing intermediary, it has to receive and >> re-send messages. If the baseline performance of IPC is the cost to >> send once and receive once, adding the daemon means there's twice as >> much to do (1 more receive, 1 more send). 
However fast you make >> send/receive, the daemon always means there are twice as many >> send/receives as there would be with no daemon. > > > there are twice as many context switches, nobody disputes that, the question > is if it matters. > > It doesn't matter if the message router is in kernel space or user space, it > still needs to read/parse, marshal/write the data, so you aren't saving that > time due to it being in the kernel. > >> If that isn't what a benchmark shows, then there's a mystery to >> explain... (one disruption to the ratio of course could be if the >> clients use a much faster or slower dbus lib than the daemon) >> >> As noted many times, of course this 2x penalty for the daemon was a >> conscious tradeoff - kdbus is trying to escape the tradeoff in order >> to extend usage of dbus to more use cases. Given the tradeoff, >> _existing_ uses of dbus seem to prefer the performance hit to the loss >> of useful semantics, but potential new users would like to or need to >> have both. > > > If there is a 2x performance improvement for being in the kernel, but a 100x > performance improvement from fixing the userspace code, the effort should be > spent on the userspace code, not on moving things to kernel space. I would guess that, if we compared a highly optimized userspace implementation to a kernel implementation, we'd see less than 2x difference. After all, a userspace daemon doesn't really need to unmarshal and re-marshal anything except headers. For large messages, we could use splice and avoid a couple of copies, too. If the scheduler became a bottleneck, it could be interesting to add something like a send-and-poll primitive. I suspect that some workloads currently do unnecessary context switches with only standard POSIX primitives. If A sends a message to B, then there's a brief window in which both A and B are runnable. Ideally we wouldn't context switch until A calls poll or epoll_wait, but I don't know how well that works in practice. 
There's more room for generic improvements than just that. At LSF/MM we were talking about more scalable epoll variants that would allow a multithreaded daemon to be woken up on the core that received incoming data. That would allow an efficient multi-queue dbus with fewer migrations and IPIs. At some point, I'd like to implement PCID on x86 (if no one beats me to it, and this is a low priority for me), which will allow us to skip expensive TLB flushes while context switching. I have no idea whether ARM can do something similar. --Andy ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-28 19:19 ` Havoc Pennington 2015-04-28 20:34 ` David Lang @ 2015-04-28 20:43 ` Linus Torvalds 1 sibling, 0 replies; 333+ messages in thread From: Linus Torvalds @ 2015-04-28 20:43 UTC (permalink / raw) To: Havoc Pennington Cc: David Lang, Andy Lutomirski, Lukasz Skalski, Greg Kroah-Hartman, Andrew Morton, Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni On Tue, Apr 28, 2015 at 12:19 PM, Havoc Pennington <hp@pobox.com> wrote: > > From what I can tell, the core performance claim for kdbus is that for > a userspace daemon to be a routing intermediary, it has to receive and > re-send messages. If the baseline performance of IPC is the cost to > send once and receive once, adding the daemon means there's twice as > much to do (1 more receive, 1 more send). However fast you make > send/receive, the daemon always means there are twice as many > send/receives as there would be with no daemon. HOWEVER. That's only a good optimization strategy if the code is optimized to begin with. If the code spends 10x as much time in user space in "overhead" as it actually spends in the kernel, the proper place to optimize is to get rid of the 10x. That will make things much faster. Once user space is lean and mean, at that point do I believe that "ok, let's add kernel code for the last bit of performance". But as it is right now, anybody who works on kdbus and claims that _performance_ is the reason for their work is just looking at the wrong piece of the puzzle. Now, there may be *other* reasons why kdbus is a good idea. But quite frankly, every time somebody asks "why", performance seems to be one of the main answers. And quite frankly, that *stinks*. Do proper optimizations of the actual real costs before starting to work on kernel stuff. It's *stupid* to add a kernel driver to get 2x improvement, when there's a 10x bloat in user space.
Is that really so hard to see? I don't think it is at *all* appropriate to say "we're a f*cking bloated pig, but we're too lazy to fix the bloat and the primary performance problems, so we'll add a kernel interface to partially hide the issue". That is particularly true because if you fix the user-level performance problems, you may notice that there was something stupid in the interfaces, and some of the kernel interface design was wrong. Linus ^ permalink raw reply [flat|nested] 333+ messages in thread
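The "2x in the kernel versus 10x in user space" argument above is an Amdahl's-law style point, and a few lines of arithmetic make it concrete. The sketch below is one possible reading of those figures. The unit costs are assumptions picked to match the numbers in the mail, not measurements, and `total_time` is just a hypothetical helper.

```python
# Back-of-the-envelope model of the argument above. The unit costs are
# assumptions chosen to match the "2x" and "10x" figures in the mail,
# not measurements.

def total_time(kernel, user):
    """Per-message cost: kernel time plus userspace overhead."""
    return kernel + user

kernel_cost = 1.0                 # kernel time per message (arbitrary unit)
user_cost = 10.0 * kernel_cost    # "10x as much time in user space"

baseline = total_time(kernel_cost, user_cost)

# Option A: halve the kernel-side cost, leave userspace alone.
kernel_fix = baseline / total_time(kernel_cost / 2, user_cost)

# Option B: leave the kernel alone, trim userspace overhead down to
# parity with the kernel cost.
user_fix = baseline / total_time(kernel_cost, kernel_cost)

print(f"kernel-side 2x fix: {kernel_fix:.2f}x overall")
print(f"userspace cleanup:  {user_fix:.2f}x overall")
```

Under these assumed costs, halving the kernel share barely moves the total, while trimming the userspace share dominates. Real ratios would have to come from profiles like the ones discussed elsewhere in the thread.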
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-04-27 22:14 ` Linus Torvalds 2015-04-28 13:44 ` Havoc Pennington @ 2015-06-22 17:33 ` Jindrich Makovicka 2015-06-22 20:23 ` Jiri Kosina 2015-06-22 21:24 ` Jindřich Makovička 2015-07-07 21:40 ` Johannes Stezenbach 3 siblings, 1 reply; 333+ messages in thread From: Jindrich Makovicka @ 2015-06-22 17:33 UTC (permalink / raw) To: linux-kernel On Mon, 27 Apr 2015 15:14:49 -0700, Linus Torvalds wrote: > On Mon, Apr 27, 2015 at 3:00 PM, Linus Torvalds > <torvalds@linux-foundation.org> wrote: >> >> IOW, all the people who say that it's about avoiding context switches >> are probably just full of shit. It's not about context switches, it's >> about bad user-level code. > > Just to make sure, I did a system-wide profile (so that you can actually > see the overhead of context switching better), and that didn't change > the picture. > > The scheduler overhead *might* be 1% or so. > > So really. The people who talk about how kdbus improves performance are > just full of sh*t. Yes, it improves things, but the improvement seems to > be 100% "incidental", in that it avoids a few trips down the user-space > problems. > > The real problems seem to be in dbus memory management (suggestion: keep > a small per-thread cache of those message allocations) and to a smaller > degree in the crazy utf8 validation (why the f*ck does it do that > anyway?), with some locking problems thrown in for good measure. In case someone actually still reads this, I guess the global rw_lock in gobject/gtype.c is one of the culprits. Every GType instance allocation/ deallocation is serialized using this lock, which pretty much disqualifies GObject from being used for anything scalable to multiple threads. GStreamer used to have serious performance issues due to that, which AFAIK have been solved by removing GType from GStreamer core in the 1.0 release. 
Regards, -- Jindrich Makovicka ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1 2015-06-22 17:33 ` Jindrich Makovicka @ 2015-06-22 20:23 ` Jiri Kosina 0 siblings, 0 replies; 333+ messages in thread From: Jiri Kosina @ 2015-06-22 20:23 UTC (permalink / raw) To: Jindrich Makovicka; +Cc: linux-kernel On Mon, 22 Jun 2015, Jindrich Makovicka wrote: > >> IOW, all the people who say that it's about avoiding context switches > >> are probably just full of shit. It's not about context switches, it's > >> about bad user-level code. > > > > Just to make sure, I did a system-wide profile (so that you can actually > > see the overhead of context switching better), and that didn't change > > the picture. > > > > The scheduler overhead *might* be 1% or so. > > > > So really. The people who talk about how kdbus improves performance are > > just full of sh*t. Yes, it improves things, but the improvement seems to > > be 100% "incidental", in that it avoids a few trips down the user-space > > problems. > > > > The real problems seem to be in dbus memory management (suggestion: keep > > a small per-thread cache of those message allocations) and to a smaller > > degree in the crazy utf8 validation (why the f*ck does it do that > > anyway?), with some locking problems thrown in for good measure. > > In case someone actually still reads this, I guess the global rw_lock in > gobject/gtype.c is one of the culprits. Every GType instance allocation/ > deallocation is serialized using this lock, which pretty much > disqualifies GObject from being used for anything scalable to multiple > threads. > > GStreamer used to have serious performance issues due to that, which > AFAIK have been solved by removing GType from GStreamer core in the 1.0 > release. This is an interesting piece of information, but you unfortunately dropped everybody from CC, so it's very likely that it's going to be lost in the noise. Could you please resend with CC list restored?
Thanks, -- Jiri Kosina SUSE Labs -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-27 22:14 ` Linus Torvalds
2015-04-28 13:44 ` Havoc Pennington
2015-06-22 17:33 ` Jindrich Makovicka
@ 2015-06-22 21:24 ` Jindřich Makovička
2015-07-07 21:40 ` Johannes Stezenbach
3 siblings, 0 replies; 333+ messages in thread
From: Jindřich Makovička @ 2015-06-22 21:24 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andy Lutomirski, Lukasz Skalski, Greg Kroah-Hartman, Andrew Morton,
Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen,
Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni

On Mon, 27 Apr 2015 15:14:49 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Mon, Apr 27, 2015 at 3:00 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > IOW, all the people who say that it's about avoiding context
> > switches are probably just full of shit. It's not about context
> > switches, it's about bad user-level code.
>
> Just to make sure, I did a system-wide profile (so that you can
> actually see the overhead of context switching better), and that
> didn't change the picture.
>
> The scheduler overhead *might* be 1% or so.
>
> So really. The people who talk about how kdbus improves performance
> are just full of sh*t. Yes, it improves things, but the improvement
> seems to be 100% "incidental", in that it avoids a few trips down the
> user-space problems.
>
> The real problems seem to be in dbus memory management (suggestion:
> keep a small per-thread cache of those message allocations) and to a
> smaller degree in the crazy utf8 validation (why the f*ck does it do
> that anyway?), with some locking problems thrown in for good measure.

In case someone actually still reads this, I guess the global rw_lock in
gobject/gtype.c, used by the GDbus binding, is one of the culprits.
Every GType instance allocation/deallocation is serialized using this
lock, which pretty much disqualifies GObject from being used for
anything scalable to multiple threads.

GStreamer used to have serious performance issues due to that, which
AFAIK have been solved by removing GType from GStreamer core in the 1.0
release.

Regards,
--
Jindrich Makovicka
^ permalink raw reply [flat|nested] 333+ messages in thread
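For readers following the allocation discussion above, here is a minimal sketch of the "small per-thread cache of message allocations" idea quoted from Linus. It is not taken from dbus, GLib or GStreamer: the names msg_buf, msg_alloc, msg_free, MSG_SIZE and CACHE_MAX are all hypothetical, and it assumes fixed-size buffers and a C11 compiler for _Thread_local. The point is only that a free list kept per thread needs no lock at all on the fast path, in contrast to allocation serialized behind one global lock.

```c
#include <assert.h>
#include <stdlib.h>

#define MSG_SIZE  256	/* hypothetical fixed message buffer size */
#define CACHE_MAX 16	/* hypothetical cap on cached buffers per thread */

struct msg_buf {
	struct msg_buf *next;	/* link used only while the buffer is cached */
	char payload[MSG_SIZE];
};

/* One free list per thread, so no locking is needed to touch it. */
static _Thread_local struct msg_buf *cache_head;
static _Thread_local unsigned int cache_len;

/* Take a buffer from this thread's cache, falling back to malloc(). */
struct msg_buf *msg_alloc(void)
{
	struct msg_buf *m = cache_head;

	if (m) {
		cache_head = m->next;
		cache_len--;
		return m;
	}
	return malloc(sizeof(*m));
}

/* Return a buffer to this thread's cache, or free() it if the cache is full. */
void msg_free(struct msg_buf *m)
{
	if (!m)
		return;
	assert(cache_len <= CACHE_MAX);
	if (cache_len < CACHE_MAX) {
		m->next = cache_head;
		cache_head = m;
		cache_len++;
		return;
	}
	free(m);
}
```

A buffer freed by a thread is handed back on that thread's next msg_alloc() call without touching the heap. A real implementation would also have to release the cached buffers when a thread exits, which this sketch leaves out.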
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-27 22:14 ` Linus Torvalds
` (2 preceding siblings ...)
2015-06-22 21:24 ` Jindřich Makovička
@ 2015-07-07 21:40 ` Johannes Stezenbach
3 siblings, 0 replies; 333+ messages in thread
From: Johannes Stezenbach @ 2015-07-07 21:40 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andy Lutomirski, Lukasz Skalski, Greg Kroah-Hartman, Andrew Morton,
Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen,
Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni

[-- Attachment #1: Type: text/plain, Size: 2135 bytes --]

On Mon, Apr 27, 2015 at 03:14:49PM -0700, Linus Torvalds wrote:
> On Mon, Apr 27, 2015 at 3:00 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > IOW, all the people who say that it's about avoiding context switches
> > are probably just full of shit. It's not about context switches, it's
> > about bad user-level code.
>
> Just to make sure, I did a system-wide profile (so that you can
> actually see the overhead of context switching better), and that
> didn't change the picture.
>
> The scheduler overhead *might* be 1% or so.
>
> So really. The people who talk about how kdbus improves performance
> are just full of sh*t. Yes, it improves things, but the improvement
> seems to be 100% "incidental", in that it avoids a few trips down the
> user-space problems.

I was interested how plain UDS performs compared to the
dbus-client/dbus-server benchmark when doing a similar
transaction (RPC call from client1 to client2 via a server,
i.e. 4 send() and 4 recv() syscalls per RPC msg).
Since I had worked on socket code for some project anyway,
I decided to write a stupid little benchmark.

On my machine, dbus-client/dbus-server needs ~200us per call (1024 byte msg),
UDS "dbus call" needs ~23us. Of course, someone who cares about
performance wouldn't use sync RPC via a message broker, so
I added single-client and async mode to the benchmark for comparison.
Async mode not only decreases scheduling overhead, it also
can use two CPU cores, so it's more than twice as fast.

./server dbus (you need to run two clients, the timing loop
               starts when the second client connects)
  ./client sync 4096 1000000
    22.757250 s, 43942 msg/s, 22.8 us/msg, 171.638 MB/s
  ./client async 4096 1000000
    8.197482 s, 121989 msg/s, 8.2 us/msg, 476.488 MB/s

./server single (only a single client talks to the server)
  ./client sync 4096 1000000
    10.980143 s, 91073 msg/s, 11.0 us/msg, 355.733 MB/s
  ./client async 4096 1000000
    3.041953 s, 328736 msg/s, 3.0 us/msg, 1284.044 MB/s

In all cases 1 msg means "send request + receive response".

Johannes

[-- Attachment #2: server.c --]
[-- Type: text/x-csrc, Size: 2339 bytes --]

/* UDS server */
#include <errno.h>
#include <poll.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/timerfd.h>
#include <sys/un.h>

// use abstract address
#define SOCKET "\0udsbench.socket"

static int die(const char *msg)
{
	if (errno)
		fprintf(stderr, "%s: error %d %m\n", msg, errno);
	else
		fprintf(stderr, "%s\n", msg);
	if (errno != ECONNRESET)
		exit(EXIT_FAILURE);
	return 0;
}

int main(int argc, char *argv[])
{
	struct sockaddr_un addr = {
		.sun_family = AF_UNIX,
		.sun_path = SOCKET,
	};
	int sock, client1, client2 = -1, rc, len;
	struct pollfd pfd[2];
	char buf[65536];
	unsigned long cnt = 0;
	bool single = false;

	if (argc != 2)
		die("usage: server {single|dbus}");
	if (!strcmp(argv[1], "single"))
		single = true;
	printf("running in %s mode\n", single ? "single" : "dbus");

	sock = socket(AF_UNIX, SOCK_SEQPACKET, 0);
	if (sock < 0)
		die("can't create socket");
	if (bind(sock, (struct sockaddr *) &addr, sizeof(addr)) < 0)
		die("can't bind address");
	if (listen(sock, 5) < 0)
		die("can't listen");

	printf("waiting for client 1\n");
	client1 = accept(sock, NULL, NULL);
	if (client1 < 0)
		die("accept");
	if (!single) {
		printf("waiting for client 2\n");
		client2 = accept(sock, NULL, NULL);
		if (client2 < 0)
			die("accept");
		write(client2, "\01", 1);
	}
	write(client1, "\0", 1);

	printf("enter event loop\n");
	pfd[0].fd = client1;
	pfd[1].fd = client2;
	pfd[0].events = pfd[1].events = POLLIN;
	for (;;) {
		rc = poll(pfd, single ? 1 : 2, -1);
		if (rc < 0)
			die("poll");
		if (pfd[0].revents & POLLIN) {
			len = read(client1, buf, sizeof(buf));
			if (len < 0) {
				die("read from client 1");
				break;
			}
			if (len == 0) {
				printf("client 1 EOF\n");
				break;
			}
			rc = write(single ? client1 : client2, buf, len);
			if (len != rc) {
				die("write to client 2");
				break;
			}
			cnt++;
		}
		if (pfd[1].revents & POLLIN) {
			len = read(client2, buf, sizeof(buf));
			if (len < 0) {
				die("read from client 2");
				break;
			}
			if (len == 0) {
				printf("client 2 EOF\n");
				break;
			}
			rc = write(client1, buf, len);
			if (len != rc) {
				die("write to client 1");
				break;
			}
			cnt++;
		}
	}
	printf("passed %lu messages\n", cnt);
	return EXIT_SUCCESS;
}

[-- Attachment #3: client.c --]
[-- Type: text/x-csrc, Size: 2662 bytes --]

/* UDS client */
#include <alloca.h>
#include <errno.h>
#include <poll.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/timerfd.h>
#include <sys/un.h>

// use abstract address
#define SOCKET "\0udsbench.socket"

static int die(const char *msg)
{
	if (errno)
		fprintf(stderr, "%s: error %d %m\n", msg, errno);
	else
		fprintf(stderr, "%s\n", msg);
	if (errno != EPIPE)
		exit(EXIT_FAILURE);
	return 0;
}

int main(int argc, char *argv[])
{
	struct sockaddr_un addr = {
		.sun_family = AF_UNIX,
		.sun_path = SOCKET,
	};
	int sock, rc, client = 1, i;
	char *buf;
	struct timespec start, end;
	double duration;
	bool async = false;
	struct pollfd pfd;
	typeof(&read) f1, f2;
	long msglen, loops;

	if (argc != 4)
		die("usage: client {sync|async} msglen loops");
	if (!strcmp(argv[1], "async"))
		async = true;
	msglen = strtoul(argv[2], NULL, 0);
	loops = strtoul(argv[3], NULL, 0);
	printf("running in %s mode, msg size %lu, %lu loops\n",
	       async ? "async" : "sync", msglen, loops);
	buf = alloca(msglen);

	sock = socket(AF_UNIX, SOCK_SEQPACKET, 0);
	if (sock < 0)
		die("can't create socket");
	if (connect(sock, (struct sockaddr *) &addr, sizeof(addr)) < 0)
		die("can't connect");

	printf("waiting for other client\n");
	rc = read(sock, buf, 1);
	if (rc != 1) {
		die("read");
		exit(EXIT_FAILURE);
	}
	if (buf[0] != '\0')
		client = 2;
	printf("this is client %d\n", client);

	clock_gettime(CLOCK_MONOTONIC, &start);
	if (client == 1)
		f1 = (typeof(&read))write, f2 = read;
	else
		f1 = read, f2 = (typeof(&read))write;

	if (async && client == 1) {
		pfd.fd = sock;
		pfd.events = POLLIN | POLLOUT;
		for (i = 0; i < loops; ) {
			rc = poll(&pfd, 1, -1);
			if (rc == -1)
				die("poll");
			if (pfd.revents & POLLOUT) {
				rc = write(sock, buf, msglen);
				if (rc != msglen) {
					die("write");
					break;
				}
			}
			if (pfd.revents & POLLIN) {
				rc = read(sock, buf, msglen);
				if (rc != msglen) {
					die("read");
					break;
				}
				i++;
			}
		}
	} else {
		for (i = 0; i < loops; i++) {
			rc = f1(sock, buf, msglen);
			if (rc != msglen) {
				die(f1 == read ? "read" : "write");
				break;
			}
			rc = f2(sock, buf, msglen);
			if (rc != msglen) {
				die(f2 == read ? "read" : "write");
				break;
			}
		}
	}
	clock_gettime(CLOCK_MONOTONIC, &end);
	duration = end.tv_sec - start.tv_sec + (end.tv_nsec - start.tv_nsec) * 1e-9;
	printf("%f s, %.0f msg/s, %.1f us/msg, %.3f MB/s\n",
	       duration, loops / duration, duration * 1e6 / loops,
	       (loops * msglen >> 20) / duration);
	return EXIT_SUCCESS;
}

[-- Attachment #4: Makefile --]
[-- Type: text/plain, Size: 79 bytes --]

CFLAGS := -O3 -Wall
CC := gcc

all: client server

clean:
	rm -f client server
^ permalink raw reply [flat|nested] 333+ messages in thread
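As a reading aid for the result lines in the message above, the following sketch recomputes the three derived figures from the raw duration, loop count and message size, using the same arithmetic as the final printf in client.c. The struct and function names (run_stats, compute_run_stats) are illustrative and not part of the posted benchmark. Two details follow from that arithmetic: the "MB/s" figure is really MiB/s (a shift by 20), computed after truncating the byte total to whole MiB, and it counts each payload once per request/response pair, not once per direction.

```c
#include <assert.h>

/* Figures derived from one benchmark run; field names are illustrative. */
struct run_stats {
	double msgs_per_sec;	/* request/response pairs per second */
	double usecs_per_msg;	/* microseconds per request/response pair */
	double mib_per_sec;	/* payload MiB per second, counted once per pair */
};

/* Recompute the client's summary line from duration, loop count and size. */
struct run_stats compute_run_stats(double duration, unsigned long long loops,
				   unsigned long long msglen)
{
	struct run_stats s;

	assert(duration > 0.0 && loops > 0);
	s.msgs_per_sec = (double)loops / duration;
	s.usecs_per_msg = duration * 1e6 / (double)loops;
	/* As in client.c, the byte total is truncated to whole MiB first. */
	s.mib_per_sec = (double)((loops * msglen) >> 20) / duration;
	return s;
}
```

Calling compute_run_stats(22.757250, 1000000, 4096) gives values that round to the first reported line (43942 msg/s, 22.8 us/msg, 171.638 MB/s).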
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-27 22:00 ` Linus Torvalds
2015-04-27 22:14 ` Linus Torvalds
@ 2015-04-28 12:49 ` Havoc Pennington
1 sibling, 0 replies; 333+ messages in thread
From: Havoc Pennington @ 2015-04-28 12:49 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andy Lutomirski, Lukasz Skalski, Greg Kroah-Hartman, Andrew Morton,
Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes, Tom Gundersen,
Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni

On Mon, Apr 27, 2015 at 6:00 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> If somebody wants to speed up dbus, they should likely look at the
> user-space code, not the kernel side.

To be more precise, your profile seems to show a lot of the gdbus
(glib bindings) user space code. (And the blocking version of this
benchmark *is* doing something ridiculous, by blocking for every round
trip, which is the one performance mistake the docs say over and over
not to make.)

There are at least two other C bindings (a plain-C one in systemd and
the original libdbus). If someone wanted to get the noise out of the
picture I imagine the plain-C bindings in systemd might have a lot
less in the way of allocations and locks than gdbus, though I haven't
looked at them. Those systemd bindings are also the ones people asking
for more performance are probably using (because they are talking
about early boot, system services, etc.)

> My guess is that pretty much the entirety of the quoted kdbus
> "speedup" isn't because it speeds up any kernel side thing, it's
> because it avoids the user-space crap in the dbus server.

The dbus bus daemon doesn't link to any g_ functions, fwiw, when
interpreting these profiles. Though nobody would claim the bus daemon
is fast, it is on the order of a few times slower than a raw socket
last I checked (which was a long time ago) ... here are some old
threads:
http://lists.freedesktop.org/pipermail/dbus/2004-November/001779.html
http://lists.freedesktop.org/archives/dbus/2012-March/015024.html

In 2004, the libdbus parsing/validation/malloc/etc. overhead was 2.5x
a raw socket without the bus daemon, and about twice that with the bus
daemon (since the daemon adds another read and another write per
message). I'm not aware of any reason this would have changed
dramatically, though it doesn't mean there isn't one.

Havoc
^ permalink raw reply [flat|nested] 333+ messages in thread
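To illustrate the "blocking for every round trip" mistake mentioned above, here is a toy sketch of the alternative: queue several requests first and only then wait for the replies. It is an assumption-laden, single-process stand-in, not D-Bus code: it uses a plain socketpair() with SOCK_SEQPACKET, a trivial echo_n() helper playing the service, and the hypothetical names pipelined_calls and REQ_LEN. Because everything runs in one thread, the batch must be small enough to fit in the socket buffer.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define REQ_LEN 64	/* hypothetical request/reply size in bytes */

/* Stand-in for a service: echo `n` queued messages back on `fd`. */
static int echo_n(int fd, int n)
{
	char buf[REQ_LEN];

	for (int i = 0; i < n; i++) {
		ssize_t len = read(fd, buf, sizeof(buf));

		if (len <= 0 || write(fd, buf, (size_t)len) != len)
			return -1;
	}
	return 0;
}

/*
 * Queue `n` requests without waiting, then collect the `n` replies.
 * Returns the number of replies read, or -1 on any error.
 */
int pipelined_calls(int n)
{
	char req[REQ_LEN], rep[REQ_LEN];
	int sv[2], got = 0;

	assert(n > 0);
	if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) < 0)
		return -1;
	memset(req, 'x', sizeof(req));

	for (int i = 0; i < n; i++)	/* 1: send every request up front */
		if (write(sv[0], req, sizeof(req)) != (ssize_t)sizeof(req))
			goto out;
	if (echo_n(sv[1], n) < 0)	/* 2: the peer answers the batch */
		goto out;
	while (got < n) {		/* 3: only now wait for replies */
		if (read(sv[0], rep, sizeof(rep)) != (ssize_t)sizeof(rep))
			break;
		got++;
	}
out:
	close(sv[0]);
	close(sv[1]);
	return got == n ? got : -1;
}
```

A blocking client would instead alternate one write with one read, so the caller sleeps once per message; in the batched form the caller's waits are decoupled from the number of messages in flight.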
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-27 21:32 ` Linus Torvalds
2015-04-27 21:40 ` Andy Lutomirski
@ 2015-04-28 10:39 ` Lukasz Skalski
1 sibling, 0 replies; 333+ messages in thread
From: Lukasz Skalski @ 2015-04-28 10:39 UTC (permalink / raw)
To: Linus Torvalds
Cc: Greg Kroah-Hartman, Andy Lutomirski, Andrew Morton, Arnd Bergmann,
Eric W. Biederman, One Thousand Gnomes, Tom Gundersen, Jiri Kosina,
linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni

On 04/27/2015 11:32 PM, Linus Torvalds wrote:
> On Fri, Apr 24, 2015 at 6:50 AM, Lukasz Skalski <l.skalski@samsung.com> wrote:
>>
>> To perform tests I've created two simple apps:
>>
>> - server: http://fpaste.org/215157/
>> - client: http://fpaste.org/215156/
>
> So since Andy reported that dbus seems to be a few orders of magnitude
> too slow, I tried to build these apps to see what it even does.
>
> They don't build on F21. You seem to be using features that are too
> new to exist even in fairly modern distros:
>
> server.c:47:24: error: ‘G_BUS_TYPE_USER’ undeclared
>
> so I can't even see what dbus does *now*.
>

I've just explained it in my mail to Andy. As was discussed some time
ago with the GLib developers, we introduced two new bus types called
"user" (G_BUS_TYPE_USER) and "machine" (G_BUS_TYPE_MACHINE). At the
moment these are only available on the GLib devel branch, so I should
have replaced G_BUS_TYPE_USER with G_BUS_TYPE_SESSION before posting my
benchmark apps - sorry for that.

> That said, either you're running your test on a potato, or dbus is
> seriously screwed up. No way should it take 4+ seconds to send a 1000b
> message back and forth 20k times. But as mentioned, I can't even
> see what it's doing right now.
>
> Linus
>

--
Lukasz Skalski
Samsung R&D Institute Poland
Samsung Electronics
l.skalski@samsung.com
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-23 13:05 ` Greg Kroah-Hartman
2015-04-23 14:17 ` One Thousand Gnomes
2015-04-23 16:36 ` Greg Kroah-Hartman
@ 2015-04-23 18:33 ` Richard Weinberger
2015-04-23 19:01 ` Greg Kroah-Hartman
2 siblings, 1 reply; 333+ messages in thread
From: Richard Weinberger @ 2015-04-23 18:33 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
One Thousand Gnomes, Tom Gundersen, Jiri Kosina, Andy Lutomirski, LKML,
Daniel Mack, David Herrmann, Djalal Harouni, Borislav Petkov,
Steven Rostedt

On Thu, Apr 23, 2015 at 3:05 PM, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> Did I miss anything else here? Are there any technical reasons I'm
> forgetting about for why this can't be pulled in as-is for this merge
> window?

Maybe I get again accused of ``being a jerk'' but I still dare to ask about
Boris' unanswered question:
http://marc.info/?l=linux-kernel&m=142969313220781

In fact Boris and I are currently reviewing the code but it is a slow
process as we both have day jobs...
AFAICT Steven is also looking at it.

--
Thanks,
//richard
^ permalink raw reply [flat|nested] 333+ messages in thread
* Re: [GIT PULL] kdbus for 4.1-rc1
2015-04-23 18:33 ` Richard Weinberger
@ 2015-04-23 19:01 ` Greg Kroah-Hartman
0 siblings, 0 replies; 333+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-23 19:01 UTC (permalink / raw)
To: Richard Weinberger
Cc: Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
One Thousand Gnomes, Tom Gundersen, Jiri Kosina, Andy Lutomirski, LKML,
Daniel Mack, David Herrmann, Djalal Harouni, Borislav Petkov,
Steven Rostedt

On Thu, Apr 23, 2015 at 08:33:47PM +0200, Richard Weinberger wrote:
> On Thu, Apr 23, 2015 at 3:05 PM, Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> > Did I miss anything else here? Are there any technical reasons I'm
> > forgetting about for why this can't be pulled in as-is for this merge
> > window?
>
> Maybe I get again accused of ``being a jerk'' but I still dare to ask about
> Boris' unanswered question:
> http://marc.info/?l=linux-kernel&m=142969313220781

No, I'm not going to say you are being a jerk for asking for a response,
only when you say things like "the code is too big" :)

I'll go reply now. I didn't really understand it the first time around, and
still don't the second time either...

> In fact Boris and I are currently reviewing the code but it is a slow
> process as we both have day jobs...

That's great, thanks for doing it.

greg k-h
^ permalink raw reply [flat|nested] 333+ messages in thread