* [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
From: Andy Lutomirski @ 2017-08-04 1:16 UTC
To: ksummit-discuss

[Note: I'm not entirely sure I can make it to the kernel summit this year, due to having a tiny person and tons of travel]

This may be highly controversial, but: there seems to be a weakness in the kernel development model in the way that new ABI features become stable. The current model is, roughly:

1. Someone writes the code. Maybe they cc linux-api, maybe they don't.
2. People hopefully review the code.
3. A subsystem maintainer merges the code. They hope the ABI is right.
4. Linus gets a pull request. Linus probably doesn't review the ABI for sanity, style, blatant bugs, etc. If Linus did, then he'd never get anything else done.
5. The new ABI lands in -rc1.
6. If someone finds a problem or objects, it had better get fixed before the next real release.

There are a few problems here. One is that the people who would really review the ABI might not even notice until step 5 or 6 or so. Another is that it takes some time for userspace to get experience with a new ABI.

I'm wondering if there are other models that could work. I think it would be nice for us to be able to land a feature in Linus' tree and still wait a while before stabilizing it. Rust, for example, has a strict policy for this that seems to work quite well.

Maybe we could pull something off where big new features hide behind a named feature gate for a while. That feature gate can only be enabled under some circumstances that make it very hard to mistake it for true stability. (For example, maybe you *can't* enable feature gates on a final kernel unless you manually patch something.)
Here are a few examples that come to mind for where this would have helped:

- Whatever that new RDMA socket type was that was deemed totally broken, but only just after it hit a real release.
- O_TMPFILE. I discovered that it corrupted filesystems in -rc6 or -rc7. That got fixed, but the API is still a steaming pile of crap.
- Some cgroup+bpf stuff that got cleaned up in -rc7 or so a few releases ago.

I'm sure there are tons more.

Is this too crazy, or is it worth discussing?
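The gating rule Andy proposes can be sketched as a small model. This is a hypothetical illustration, not kernel code: the gate names, the `BuildKind` split between -rc and final kernels, and the `allow_unstable_on_final` switch (standing in for "manually patch something") are all assumptions made for the sketch. Rust is used to match the Rust code quoted later in the thread.

```rust
/// Whether a named ABI feature has been declared stable yet.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Stability {
    Unstable,
    Stable,
}

/// Hypothetical distinction between -rc/-next kernels and final releases.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BuildKind {
    ReleaseCandidate,
    Final,
}

struct FeatureGate {
    name: &'static str,
    stability: Stability,
}

/// Hypothetical gate table; the names are placeholders, not real kernel features.
const GATES: &[FeatureGate] = &[
    FeatureGate { name: "example_new_socket_type", stability: Stability::Unstable },
    FeatureGate { name: "example_old_syscall", stability: Stability::Stable },
];

#[derive(Debug, PartialEq, Eq)]
enum GateError {
    UnknownFeature,
    NotOptedIn,
    UnstableOnFinalKernel,
}

/// Decide whether a caller may use the feature `name`.
/// `opted_in` models an explicit, by-name request for unstable features;
/// `allow_unstable_on_final` models the deliberate manual patch.
fn check_gate(
    name: &str,
    build: BuildKind,
    opted_in: &[&str],
    allow_unstable_on_final: bool,
) -> Result<(), GateError> {
    let gate = GATES
        .iter()
        .find(|g| g.name == name)
        .ok_or(GateError::UnknownFeature)?;
    if gate.stability == Stability::Stable {
        return Ok(()); // stable ABI is always available
    }
    if !opted_in.iter().any(|&f| f == name) {
        return Err(GateError::NotOptedIn);
    }
    if build == BuildKind::Final && !allow_unstable_on_final {
        return Err(GateError::UnstableOnFinalKernel);
    }
    Ok(())
}
```

In this shape, an -rc kernel lets testers opt in by name, while a final kernel refuses unless someone deliberately flips the switch, which is the "very hard to mistake it for true stability" property described above.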
* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
From: Greg KH @ 2017-08-04 1:30 UTC
To: Andy Lutomirski; +Cc: ksummit-discuss

On Thu, Aug 03, 2017 at 06:16:44PM -0700, Andy Lutomirski wrote:
> [...]
> I'm wondering if there are other models that could work. I think it would be nice for us to be able to land a feature in Linus' tree and still wait a while before stabilizing it. Rust, for example, has a strict policy for this that seems to work quite well.

What does Rust do here?

> Maybe we could pull something off where big new features hide behind a named feature gate for a while. That feature gate can only be enabled under some circumstances that make it very hard to mistake it for true stability. (For example, maybe you *can't* enable feature gates on a final kernel unless you manually patch something.)
>
> [...]
>
> Is this too crazy, or is it worth discussing?

I think it is, it keeps coming up over and over and it's not getting any easier. We are long past the time when we only had to duplicate what other operating systems do; adding new features is much different.

I like the "manually patch" thing as a good idea for how to maybe do this, but who is going to do that patching for testing? What's the rule for how much time has to pass before it can be enabled?

thanks,

greg k-h
* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
From: Andy Lutomirski @ 2017-08-04 4:15 UTC
To: Greg KH; +Cc: ksummit-discuss, Andy Lutomirski

On Thu, Aug 3, 2017 at 6:30 PM, Greg KH <greg@kroah.com> wrote:
> On Thu, Aug 03, 2017 at 06:16:44PM -0700, Andy Lutomirski wrote:
>> I'm wondering if there are other models that could work. I think it would be nice for us to be able to land a feature in Linus' tree and still wait a while before stabilizing it. Rust, for example, has a strict policy for this that seems to work quite well.
>
> What does Rust do here?

Rust has named unstable features. In order to use them, you need to declare, with a special annotation, your desire to use the named feature. You also need to be running a nightly build of Rust -- stable versions will refuse to compile code that requests unstable features.
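The Rust rule Andy describes has two parts: a by-name opt-in (in real Rust, a crate-level `#![feature(name)]` attribute) and a nightly-only toolchain check. The model below is a simplified sketch of that rule, not rustc's actual implementation; the `Channel` enum and `check_crate` function are invented for illustration, and `sip_hash_13` is simply the unstable feature name that appears in the Rust source quoted in the next message.

```rust
/// Rust's release channels.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Channel {
    Stable,
    Beta,
    Nightly,
}

/// `declared`: feature names listed in opt-in annotations.
/// `used_unstable`: unstable features the code actually touches.
fn check_crate(channel: Channel, declared: &[&str], used_unstable: &[&str]) -> Result<(), String> {
    // Stable and beta toolchains refuse any opt-in at all.
    if !declared.is_empty() && channel != Channel::Nightly {
        return Err(format!("{:?} toolchain rejects opt-ins: {:?}", channel, declared));
    }
    // Every unstable feature that is used must be requested by name.
    if let Some(f) = used_unstable
        .iter()
        .find(|&&f| !declared.iter().any(|&d| d == f))
    {
        return Err(format!("unstable feature `{}` used without opting in", f));
    }
    Ok(())
}
```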
* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
From: Sergey Senozhatsky @ 2017-08-04 5:08 UTC
To: Greg KH; +Cc: ksummit-discuss, Andy Lutomirski

On (08/03/17 18:30), Greg KH wrote:
[..]
> > I'm wondering if there are other models that could work. I think it would be nice for us to be able to land a feature in Linus' tree and still wait a while before stabilizing it. Rust, for example, has a strict policy for this that seems to work quite well.
>
> What does Rust do here?

I think Andy meant how Rust tags all of its features:

...
    #[stable(feature = "rust1", since = "1.0.0")]
    fn finish(&self) -> u64;

    /// Writes some data into this `Hasher`.
    ///
    /// # Examples
    ///
    /// ```
    /// use std::collections::hash_map::DefaultHasher;
    /// use std::hash::Hasher;
    ///
    /// let mut hasher = DefaultHasher::new();
    /// let data = [0x01, 0x23, 0x45, 0x67, 0x89, 0xab, 0xcd, 0xef];
    ///
    /// hasher.write(&data);
    ///
    /// println!("Hash is {:x}!", hasher.finish());
    /// ```
    #[stable(feature = "rust1", since = "1.0.0")]
    fn write(&mut self, bytes: &[u8]);

    /// Writes a single `u8` into this hasher.
    #[inline]
    #[stable(feature = "hasher_write", since = "1.3.0")]
    fn write_u8(&mut self, i: u8) {
        self.write(&[i])
    }

    /// Writes a single `u16` into this hasher.
    #[inline]
    #[stable(feature = "hasher_write", since = "1.3.0")]
    fn write_u16(&mut self, i: u16) {
        self.write(&unsafe { mem::transmute::<_, [u8; 2]>(i) })
    }

#[unstable(feature = "sip_hash_13", issue = "34767")]
#[allow(deprecated)]
pub use self::sip::{SipHasher13, SipHasher24};
...
-ss
* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
From: Daniel Vetter @ 2017-08-04 8:23 UTC
To: Greg KH; +Cc: ksummit-discuss, Andy Lutomirski

On Fri, Aug 4, 2017 at 3:30 AM, Greg KH <greg@kroah.com> wrote:
> On Thu, Aug 03, 2017 at 06:16:44PM -0700, Andy Lutomirski wrote:
>> [...]
>> Is this too crazy, or is it worth discussing?
>
> I think it is, it keeps coming up over and over and it's not getting any easier. We are long past the time when we only had to duplicate what other operating systems do; adding new features is much different.
>
> I like the "manually patch" thing as a good idea for how to maybe do this, but who is going to do that patching for testing? What's the rule for how much time has to pass before it can be enabled?

Imo the real fix is to crank up requirements. What we have right now for gpu drivers is:

- Kernel patch, fully reviewed.
- Full docs for the internals for the same.
- Testcases for the same.
- The entire userspace stack using that feature. Not some tech demo, prototype or testcase, but the real deal. Also no vendor forks. In extreme cases this means patches to 3+ projects (other than the kernel) to wire the entire thing through the desktop stack.
- Testcases for all those userspace bits.
- Full review on all those userspace bits. This generally means kernel hackers review the userspace and userspace people review the kernel bits too.
- Not quite there yet, but some starts on real docs for the uapi.
- Also not quite where it needs to be yet, but full CI over everything (at least for the i915 driver, which is still the team that pushes for most of the generic drm uapi additions).

Some drivers chicken out a bit on some of these pieces (less docs/review), but for the big ones and all the core bits, that's what we do.

Those are some steep requirements, and they make adding new uapi a fairly big pain, but as a result we generally ship it as soon as it hits linux-next, with very few regrets in hindsight. Ship as in, it e.g. hits a stable mesa release, which distros often ship in their stable release way before the much slower kernel release process has gone through its paces.

And yes, each of the rules we have for new uapi is because there's at least one case where we did regret not doing that :-)
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
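Daniel's requirements amount to a merge checklist. A minimal sketch, assuming a hypothetical `UapiSubmission` record with one flag per item; the two items he describes as not yet fully in place (uapi docs and full CI) are reported separately instead of being treated as blockers, which is an interpretation of the mail, not written drm policy.

```rust
/// Hypothetical record of what accompanies a new uapi proposal.
#[derive(Default)]
struct UapiSubmission {
    kernel_patch_reviewed: bool,
    internal_docs: bool,
    kernel_tests: bool,
    real_userspace: bool,     // the real stack, not a demo, prototype or vendor fork
    userspace_tests: bool,
    userspace_reviewed: bool, // includes cross-review between kernel and userspace people
    uapi_docs: bool,          // still aspirational per the mail
    full_ci: bool,            // still aspirational per the mail
}

/// Names of the unmet items in `items`.
fn missing(items: &[(bool, &'static str)]) -> Vec<&'static str> {
    items.iter().filter(|&&(ok, _)| !ok).map(|&(_, name)| name).collect()
}

/// Returns (missing hard requirements, missing aspirational items).
fn review(s: &UapiSubmission) -> (Vec<&'static str>, Vec<&'static str>) {
    let hard = [
        (s.kernel_patch_reviewed, "reviewed kernel patch"),
        (s.internal_docs, "internal docs"),
        (s.kernel_tests, "kernel testcases"),
        (s.real_userspace, "real userspace stack"),
        (s.userspace_tests, "userspace testcases"),
        (s.userspace_reviewed, "userspace review"),
    ];
    let soft = [(s.uapi_docs, "uapi docs"), (s.full_ci, "full CI")];
    (missing(&hard), missing(&soft))
}
```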
* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
From: Theodore Ts'o @ 2017-08-04 2:26 UTC
To: Andy Lutomirski; +Cc: ksummit-discuss

On Thu, Aug 03, 2017 at 06:16:44PM -0700, Andy Lutomirski wrote:
> Maybe we could pull something off where big new features hide behind a named feature gate for a while. That feature gate can only be enabled under some circumstances that make it very hard to mistake it for true stability. (For example, maybe you *can't* enable feature gates on a final kernel unless you manually patch something.)

I think the problem you've pointed out is a real and vexing one, and it's good that we talk about possible solutions.

The challenge with using a feature gate is that the examples you gave weren't the simple, obvious cases of a new syscall, but flags to a system call (O_TMPFILE) or a new socket type (e.g., for RDMA). So that implies that adding a feature gate is going to require making code changes to enforce the ability to turn off the feature. Even if it's done using CONFIG_* #ifdefs and Kconfig options, it's going to require more effort than cc'ing the patch to linux-api@vger.kernel.org. And if the problem is that people aren't bothering to remember to cc linux-api@, I'm not sure it's realistic to think they will implement a feature gate.

Furthermore, problems with an API design or implementation tend to get noticed either (a) when someone else does a code audit and tries to define exactly what the semantics will be for the new flag or syscall or socket type, or (b) when someone tries using it and discovers problems (usually not in the common path, since presumably the developer tested that bit of it, but in the error handling).
The feature gate won't necessarily help with (a), except that it gives people a bit more time to notice that the new feature went in, and it probably actively makes (b) worse, since if it is under a feature gate, fewer people are likely to experiment with the new feature.

One way that we could try to make things better is by having some kind of semi-automated system which monitors changes in include/uapi/*.h in linux-next. Unfortunately there will be a lot of false positives, so it's going to require a human to figure out which of the changes represent new/changed APIs, and which are just cleanups / rearrangements. (We could try to see if we could train a Machine Learning model --- but even if we can make some neural nets play Go, I'm personally dubious this is something that ML would be successful at. I might be pleasantly surprised, though, if someone wanted to give it a try. :-)

- Ted
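Ted's point is that gating a new flag on an existing syscall means adding an explicit check to the code path. A minimal sketch of that check, modeled on the common kernel pattern of rejecting unknown flag bits with -EINVAL; the flag names, bit values, and the `gate_enabled` switch are hypothetical (in the kernel it might come from a Kconfig option or a runtime knob), and 22 is EINVAL's numeric value on Linux.

```rust
/// Linux's numeric value for EINVAL.
const EINVAL: i32 = 22;

/// Hypothetical flags for an existing syscall.
const FLAG_APPEND: u32 = 1 << 0;   // long-standing, stable
const FLAG_NEW_MODE: u32 = 1 << 1; // new and still behind a feature gate
const STABLE_FLAGS: u32 = FLAG_APPEND;

/// Accept stable flag bits, plus gated bits only when the gate is on.
/// Returns the flags on success or a negative errno, kernel style.
fn validate_flags(flags: u32, gate_enabled: bool) -> Result<u32, i32> {
    let mut allowed = STABLE_FLAGS;
    if gate_enabled {
        allowed |= FLAG_NEW_MODE;
    }
    if flags & !allowed != 0 {
        return Err(-EINVAL); // unknown bit, or a gated bit with the gate off
    }
    Ok(flags)
}
```

This only works if the syscall already rejects unknown bits. A call that silently ignores unknown flags (open() is the usual example) cannot be gated this cleanly, because older kernels accept the new bit without acting on it.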
* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
From: Stephen Rothwell @ 2017-08-04 3:27 UTC
To: Theodore Ts'o; +Cc: ksummit-discuss, Andy Lutomirski

Hi Ted,

On Thu, 3 Aug 2017 22:26:39 -0400 Theodore Ts'o <tytso@mit.edu> wrote:
>
> One way that we could try to make things better is by having some kind of semi-automated system which monitors changes in include/uapi/*.h in linux-next. Unfortunately there will be a lot of false positives, so it's going to require a human to figure out which of the changes represent new/changed APIs, and which are just cleanups / rearrangements. (We could try to see if we could train a Machine Learning model --- but even if we can make some neural nets play Go, I'm personally dubious this is something that ML would be successful at. I might be pleasantly surprised, though, if someone wanted to give it a try. :-)

OK, so this is what we have so far in linux-next:

$ git log --no-merges --pretty="%h %s" stable..next-20170803 include/uapi
9aa302aaede8 membarrier: Expedited private command
b3a88f222ae1 drm/msm: Add a parameter query for the number of ringbuffers
7f1779fd071b drm/msm: Implement per-submitqueue fences
20ab08455d3c drm/msm: Add per-instance submit queues
80fe969b5d50 uapi: fix linux/sysctl.h userspace compilation errors
4edc19eeeee3 mm: userfaultfd: add feature to request for a signal delivery
0d33bdf5ea6c mm: shm: use new hugetlb size encoding definitions
b217edddf5c2 mm: arch: consolidate mmap hugetlb size encodings
21d75dd8e83d mm: hugetlb: define system call hugetlb size encodings in single file
cdbc78ba7026 drm/msm: Remove __user from __u64 data types
db1689aa61bd drm: Create a format/modifier blob
e6fc3b68558e drm: Plumb modifiers through plane init
bb7c19f96012 tcp: add related fields into SCM_TIMESTAMPING_OPT_STATS
3282e65558b3 tcp: remove unused mib counters
615095752100 netfilter: nf_tables: Allow object names of up to 255 chars
387454901bd6 netfilter: nf_tables: Allow set names of up to 255 chars
b7263e071aba netfilter: nf_tables: Allow chain name of up to 255 chars
e46abbcc05aa netfilter: nf_tables: Allow table names of up to 255 chars
e62e484df049 net sched actions: add time filter for action dumping
90825b23a887 net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
64c83d837329 net netlink: Add new type NLA_BITFIELD32
1a5f3da20bd9 net: ethtool: add support for forward error correction modes
ca1136c99b66 blktrace: export cgroup info in trace
f30994622b2b drm/vc4: Add an ioctl for labeling GEM BOs for summary stats
67cbe3532c2c RDMA/qedr: notify user application of supported WIDs
ad84dad2160d RDMA/qedr: notify user application if DPM is supported
cc731525f26a signal: Remove kernel interal si_code magic
d08477aa975e fcntl: Don't use ambiguous SIG_POLL si_codes
3078f5f1bd8b IB/mlx4: Add support for RSS QP
b8d46ca03506 IB/mlx4: Add support for WQ indirection table related verbs
400b1ebcfe31 IB/mlx4: Add support for WQ related verbs
ea30b966f7dd IB/mlx4: Add inline-receive support
2dee0e545894 IB/uverbs: Enable QP creation with a given source QP number
784b4e612d42 netfilter: nf_tables: Attach process info to NFT_MSG_NEWGEN notifications
ddc6c70f07bb rxrpc: Move the packet.h include file into net/rxrpc/
727f8914477e rxrpc: Expose UAPI definitions to userspace
eb0baf8a0d92 perf/core: Define the common branch type classification
6b2bbb08747a media: cec: rework the cec event handling
6303d97873d3 media: linux/cec.h: add pin monitoring API support
fc60a8b675bd tty: serial: owl: Implement console driver
97f91a7cf04f bpf: add bpf_redirect_map helper routine
546ac1ffb70d bpf: add devmap, a map for storing net device references
814abfabef3c xdp: add bpf_redirect helper function
66bf97967726 annotate RWF_... flags
6545135a5ed2 drm/qxl: fix __user annotations

Some of these are just fixing bugs or moving things around. I wonder where we can go from here? Does every change to a uapi file need a corresponding documentation and man pages update? Do we dare try doing (shock! horror!) design before implementation? ;-)

Then, of course, there is the whole sysfs and tracing mess :-(

--
Cheers,
Stephen Rothwell
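The human triage step Ted mentions could be given a crude first pass over a log like Stephen's. The sketch below is a keyword heuristic over commit subjects; the keyword list is an assumption, and it will misclassify entries (for example, "membarrier: Expedited private command" adds a new command yet contains none of the keywords), which is exactly why a human stays in the loop.

```rust
/// Hypothetical words suggesting that a commit adds or extends an API.
const ADDITION_HINTS: &[&str] = &[
    "add", "implement", "expose", "allow", "enable", "new", "create", "define",
];

/// Crude first-pass triage of a commit subject (the `%s` part of the log):
/// looks only at the text after the last "subsystem: " prefix.
fn likely_new_api(subject: &str) -> bool {
    let lower = subject.to_lowercase();
    // rsplit always yields at least one piece, so the fallback is never used.
    let summary = lower.rsplit(": ").next().unwrap_or("");
    summary
        .split(|c: char| !c.is_alphanumeric())
        .any(|word| ADDITION_HINTS.iter().any(|&hint| hint == word))
}
```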
* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
From: Julia Lawall @ 2017-08-04 5:13 UTC
To: Stephen Rothwell; +Cc: Andy Lutomirski, ksummit-discuss

On Fri, 4 Aug 2017, Stephen Rothwell wrote:

> Hi Ted,
>
> On Thu, 3 Aug 2017 22:26:39 -0400 Theodore Ts'o <tytso@mit.edu> wrote:
> >
> > One way that we could try to make things better is by having some kind of semi-automated system which monitors changes in include/uapi/*.h in linux-next. Unfortunately there will be a lot of false positives, so it's going to require a human to figure out which of the changes represent new/changed APIs, and which are just cleanups / rearrangements. (We could try to see if we could train a Machine Learning model --- but even if we can make some neural nets play Go, I'm personally dubious this is something that ML would be successful at. I might be pleasantly surprised, though, if someone wanted to give it a try. :-)
>
> OK, so this is what we have so far in linux-next:

I did some work on a semantic patch for collecting the error codes returned by all of the system calls. Things were going fairly well until I discovered that it is fairly common near the user level to return error codes in reference parameters rather than by direct returns, and that meant that I was going to have to duplicate my entire rule set. I also observed that the documentation is not always that precise. It will say that a call typically returns -E1, -E2, -E3, and may return other stuff, so in that case there is less to check.

I was thinking of getting an intern to address the first problem, and then to check the results to the extent that they are checkable. One could also run the rules over time, to see if there have been changes, and to see, even in the "other stuff" case, if something should be added.
julia

> [...]
> _______________________________________________
> Ksummit-discuss mailing list
> Ksummit-discuss@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/ksummit-discuss
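Julia's difficulty with error codes passed through reference parameters shows up even in a toy setting. The sketch below is a crude line-based text scan, not a Coccinelle semantic patch, and the `*err = -E...;` form it recognizes is an assumed example of the out-parameter style. It shows how rules that only look at `return` statements silently miss codes delivered the other way.

```rust
use std::collections::BTreeSet;

/// Collect identifiers like "EINVAL" that appear as "-EINVAL" in `line`.
fn errno_names(line: &str) -> Vec<String> {
    let mut found = Vec::new();
    let mut rest = line;
    while let Some(pos) = rest.find("-E") {
        let tail = &rest[pos + 1..]; // slice starting at the 'E'
        let name: String = tail
            .chars()
            .take_while(|c| c.is_ascii_uppercase() || c.is_ascii_digit())
            .collect();
        let len = name.len(); // at least 1, because tail starts with 'E'
        if len > 1 {
            found.push(name);
        }
        rest = &tail[len..];
    }
    found
}

/// Gather error codes from C-like source. With `include_out_params` false,
/// only `return -E...;` lines count; with it true, assignments through a
/// pointer such as `*err = -E...;` count as well.
fn collect_error_codes(source: &str, include_out_params: bool) -> BTreeSet<String> {
    let mut codes = BTreeSet::new();
    for line in source.lines() {
        let t = line.trim();
        let direct = t.starts_with("return");
        let out_param = t.starts_with('*') && t.contains("= -E");
        if direct || (include_out_params && out_param) {
            codes.extend(errno_names(t));
        }
    }
    codes
}
```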
* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
From: Theodore Ts'o @ 2017-08-04 14:20 UTC
To: Julia Lawall; +Cc: Andy Lutomirski, ksummit-discuss

On Fri, Aug 04, 2017 at 07:13:10AM +0200, Julia Lawall wrote:
> I did some work on a semantic patch for collecting the error codes returned by all of the system calls. Things were going fairly well until I discovered that it is fairly common near the user level to return error codes in reference parameters rather than by direct returns, and that meant that I was going to have to duplicate my entire rule set. I also observed that the documentation is not always that precise. It will say that a call typically returns -E1, -E2, -E3, and may return other stuff, so in that case there is less to check.

Yeah, including potential new error returns as "changes to the ABI/API" is probably simply not practical. Adding a return for, say, ENOMEM instead of causing a kernel oops is not something that needs to be debated on the linux-api mailing list!

I recall, many years ago, an executive being indignant because Linux returned, for some syscall operation involving a network file system, a network-related errno that was not explicitly listed in POSIX for that file system related syscall, and he demanded that we fix the problem. I had to gently point out to said gentleman (since I was working for the Linux Foundation at the time and he worked for a platinum sponsor :-) that POSIX, as a blanket statement, allows conforming implementations' system calls to return additional error codes as necessary.

I think people are much more concerned when there is a new system call, or a new flag added to a core syscall (e.g., O_TMPFILE).
I suspect that if we required all new device ioctls and new flags to device ioctls to get the linux-api@ treatment, we would get mass resistance and the workload would not be practical. And this list doesn't even consider new sysfs files, new tracepoints, etc., etc.

Although technically speaking these are all "APIs", I think we need to pick our battles and start with a tractable subset of the problem...

- Ted
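Ted's POSIX point has a practical consequence for callers: userspace should treat any documented error list as incomplete and keep a generic fallback. A minimal sketch, assuming a hypothetical caller of an open()-like call; the numeric values are Linux's for ENOENT (2), ENOMEM (12) and EACCES (13), and the chosen reactions are illustrative only.

```rust
/// Linux errno values used below.
const ENOENT: i32 = 2;
const ENOMEM: i32 = 12;
const EACCES: i32 = 13;

#[derive(Debug, PartialEq, Eq)]
enum Reaction {
    CreateFile,
    ReportPermissionProblem,
    RetryLater,
    /// Anything else, including codes the documentation never listed.
    Fail(i32),
}

/// Map an errno from a hypothetical open()-like call to a reaction.
fn react(errno: i32) -> Reaction {
    match errno {
        ENOENT => Reaction::CreateFile,
        EACCES => Reaction::ReportPermissionProblem,
        ENOMEM => Reaction::RetryLater,
        other => Reaction::Fail(other), // never assume the list is exhaustive
    }
}
```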
* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
From: Julia Lawall @ 2017-08-04 15:47 UTC
To: Theodore Ts'o; +Cc: Andy Lutomirski, ksummit-discuss

On Fri, 4 Aug 2017, Theodore Ts'o wrote:

> On Fri, Aug 04, 2017 at 07:13:10AM +0200, Julia Lawall wrote:
> > [...]
>
> [...]
>
> I think people are much more concerned when there is a new system call, or a new flag added to a core syscall (e.g., O_TMPFILE). I suspect that if we required all new device ioctls and new flags to device ioctls to get the linux-api@ treatment, we would get mass resistance and the workload would not be practical. And this list doesn't even consider new sysfs files, new tracepoints, etc., etc.

I guess that new system calls would be easy to recognize, if they all start with SYSCALL_DEFINE1, etc.? New flags could be #defines that are added to a uapi .h file and that are used in some similar way to other flags mentioned in the documentation? So if the code already contains e.g. x & O_APPEND, and x & O_TMPFILE now appears, and the documentation mentions O_APPEND, then it should now mention O_TMPFILE too?

julia
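Julia's suggested check for new flags can be approximated at the text level. A minimal sketch, assuming the old and new versions of a uapi header and the relevant documentation are available as strings; it lists names newly `#define`d in the header that the documentation never mentions. It does not attempt her finer idea of matching usage patterns such as `x & O_APPEND`, and the header and documentation strings in the test are illustrative samples.

```rust
/// Names introduced by `#define NAME ...` lines in a header.
fn defined_names(header: &str) -> Vec<&str> {
    header
        .lines()
        .filter_map(|line| {
            let mut words = line.split_whitespace();
            if words.next() == Some("#define") {
                words.next()
            } else {
                None
            }
        })
        .collect()
}

/// True if `docs` mentions `name` as a whole identifier.
fn mentions(docs: &str, name: &str) -> bool {
    docs.split(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
        .any(|word| word == name)
}

/// Names defined in `new_header` but not in `old_header`
/// that `docs` does not mention.
fn undocumented_new_defines<'a>(old_header: &str, new_header: &'a str, docs: &str) -> Vec<&'a str> {
    let old = defined_names(old_header);
    defined_names(new_header)
        .into_iter()
        .filter(|name| !old.iter().any(|o| o == name) && !mentions(docs, name))
        .collect()
}
```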
* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
From: Jiri Kosina @ 2017-08-04 8:42 UTC
To: Theodore Ts'o; +Cc: ksummit-discuss, Andy Lutomirski

On Thu, 3 Aug 2017, Theodore Ts'o wrote:

> One way that we could try to make things better is by having some kind of semi-automated system which monitors changes in include/uapi/*.h in linux-next.

It's unfortunately just uapi though, and for sysfs it's a bit more difficult to define a pathname pattern to watch for.

--
Jiri Kosina
SUSE Labs
* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
From: Hannes Reinecke @ 2017-08-04 8:53 UTC
To: ksummit-discuss

On 08/04/2017 10:42 AM, Jiri Kosina wrote:
> On Thu, 3 Aug 2017, Theodore Ts'o wrote:
>
>> One way that we could try to make things better is by having some kind
>> of semi-automated system which monitors changes in include/uapi/*.h in
>> linux-next.
>
> It's unfortunately just uapi though, and for sysfs it's a bit more
> difficult to define a pathname pattern to watch for.
>
Yeah; that has been my main headache with the kABI stuff.
Nowadays sysfs is considered part of the kABI, but we have no way of
tracking it; we basically rely on people filling out some off-side
documentation, and hope they're not missing anything.
And we don't mess up when generating patches :-)

That, and the infamous 'internal symbol' discussion.
(Meaning that we can only declare symbols as exported, even though they
really should only be visible to that driver, not anything else.)
Which leads to tons of false positives, and discussions about why this
really is meant to be an internal symbol.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   zSeries & Storage
hare@suse.com			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
From: Greg KH @ 2017-08-04 16:04 UTC
To: Hannes Reinecke; +Cc: ksummit-discuss

On Fri, Aug 04, 2017 at 10:53:01AM +0200, Hannes Reinecke wrote:
> On 08/04/2017 10:42 AM, Jiri Kosina wrote:
> > On Thu, 3 Aug 2017, Theodore Ts'o wrote:
> >
> >> One way that we could try to make things better is by having some kind
> >> of semi-automated system which monitors changes in include/uapi/*.h in
> >> linux-next.
> >
> > It's unfortunately just uapi though, and for sysfs it's a bit more
> > difficult to define a pathname pattern to watch for.
> >
> Yeah; that has been my main headache with the kABI stuff.
> Nowadays sysfs is considered part of the kABI, but we have no way of
> tracking it; we basically rely on people filling out some off-side
> documentation, and hope they're not missing anything.
> And we don't mess up when generating patches :-)

We could start searching linux-next for new additions of sysfs files
(search for the ATTR macros), and complain that there are no matching
Documentation/ABI/ updates at the same time. I try to do that when
reviewing patches that come through my trees, but yes, this is hard to
keep up to date with.

Sounds like a good GSoC project though, setting up the infrastructure to
do this in a semi-automated fashion.

greg k-h

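Greg's check (a new ATTR macro in a diff, but no Documentation/ABI/ file touched) can be approximated with plain string matching. A sketch under stated assumptions: the input is unified-diff text, any added line containing "_ATTR" and "(" is treated as a new sysfs attribute, and sysfs_attr_missing_abi_doc() is a hypothetical name, not an existing checkpatch rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical check: does a unified diff add a sysfs attribute
 * declaration (DEVICE_ATTR(), DEVICE_ATTR_RW(), __ATTR(), ...) without
 * touching any file under Documentation/ABI/? Matching "_ATTR" plus "("
 * on added lines is a crude heuristic and will have false positives. */
bool sysfs_attr_missing_abi_doc(const char *diff)
{
    static const char abi_prefix[] = "+++ b/Documentation/ABI/";
    bool adds_attr = false, touches_abi_doc = false;
    const char *line = diff;

    assert(diff != NULL);
    while (*line) {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);

        if (strncmp(line, abi_prefix, sizeof(abi_prefix) - 1) == 0) {
            touches_abi_doc = true;
        } else if (line[0] == '+' && strncmp(line, "+++", 3) != 0) {
            /* Copy the added line so the searches stay within it. */
            char buf[256];
            size_t n = len < sizeof(buf) - 1 ? len : sizeof(buf) - 1;

            memcpy(buf, line, n);
            buf[n] = '\0';
            if (strstr(buf, "_ATTR") && strchr(buf, '('))
                adds_attr = true;
        }
        if (!end)
            break;
        line = end + 1;
    }
    return adds_attr && !touches_abi_doc;
}
```

Feeding it a whole series concatenated, rather than one patch at a time, avoids flagging attributes whose documentation arrives in a later patch of the same series.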
* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
From: Theodore Ts'o @ 2017-08-04 17:14 UTC
To: Greg KH; +Cc: ksummit-discuss

On Fri, Aug 04, 2017 at 09:04:54AM -0700, Greg KH wrote:
>
> We could start searching linux-next for new additions of sysfs files
> (search for the ATTR macros), and complain that there are no matching
> Documentation/ABI/ updates at the same time. I try to do that when
> reviewing patches that come through my trees, but yes, this is hard to
> keep up to date with.
>
> Sounds like a good GSoC project though, setting up the infrastructure to
> do this in a semi-automated fashion.

This sounds like an obvious thing to add to checkpatch?

					- Ted

* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
From: Greg KH @ 2017-08-04 17:53 UTC
To: Theodore Ts'o; +Cc: ksummit-discuss

On Fri, Aug 04, 2017 at 01:14:44PM -0400, Theodore Ts'o wrote:
> On Fri, Aug 04, 2017 at 09:04:54AM -0700, Greg KH wrote:
> >
> > We could start searching linux-next for new additions of sysfs files
> > (search for the ATTR macros), and complain that there are no matching
> > Documentation/ABI/ updates at the same time. I try to do that when
> > reviewing patches that come through my trees, but yes, this is hard to
> > keep up to date with.
> >
> > Sounds like a good GSoC project though, setting up the infrastructure to
> > do this in a semi-automated fashion.
>
> This sounds like an obvious thing to add to checkpatch?

Probably, but lots of times this would be a false positive, as
documentation shows up in a later patch in the series to make things
easier to review.

But if you want to try to add it to checkpatch, be my guest; last time I
tried, I gave up, as the regexes there brought me to tears...

good luck!

greg k-h

* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
From: Joe Perches @ 2017-08-04 22:52 UTC
To: Greg KH, Theodore Ts'o; +Cc: ksummit-discuss

On Fri, 2017-08-04 at 10:53 -0700, Greg KH wrote:
> On Fri, Aug 04, 2017 at 01:14:44PM -0400, Theodore Ts'o wrote:
> > On Fri, Aug 04, 2017 at 09:04:54AM -0700, Greg KH wrote:
> > >
> > > We could start searching linux-next for new additions of sysfs files
> > > (search for the ATTR macros), and complain that there are no matching
> > > Documentation/ABI/ updates at the same time. I try to do that when
> > > reviewing patches that come through my trees, but yes, this is hard to
> > > keep up to date with.
> > >
> > > Sounds like a good GSoC project though, setting up the infrastructure to
> > > do this in a semi-automated fashion.
> >
> > This sounds like an obvious thing to add to checkpatch?
>
> Probably, but lots of times this would be a false positive, as
> documentation shows up in a later patch in the series to make things
> easier to review.
>
> But if you want to try to add it to checkpatch, be my guest; last time I
> tried, I gave up, as the regexes there brought me to tears...

Piker.

Even Linus has adapted to perl regexes...

cheers, Joe

* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
From: Geert Uytterhoeven @ 2017-08-09 20:06 UTC
To: Greg KH; +Cc: ksummit-discuss

On Fri, Aug 4, 2017 at 7:53 PM, Greg KH <greg@kroah.com> wrote:
> On Fri, Aug 04, 2017 at 01:14:44PM -0400, Theodore Ts'o wrote:
>> On Fri, Aug 04, 2017 at 09:04:54AM -0700, Greg KH wrote:
>> > We could start searching linux-next for new additions of sysfs files
>> > (search for the ATTR macros), and complain that there are no matching
>> > Documentation/ABI/ updates at the same time. I try to do that when
>> > reviewing patches that come through my trees, but yes, this is hard to
>> > keep up to date with.
>> >
>> > Sounds like a good GSoC project though, setting up the infrastructure to
>> > do this in a semi-automated fashion.
>>
>> This sounds like an obvious thing to add to checkpatch?
>
> Probably, but lots of times this would be a false positive, as
> documentation shows up in a later patch in the series to make things
> easier to review.

On the sender side, checkpatch will work fine, as it works against your
current tree, which usually contains all patches from the series you've
just created.

On the receiver side, a different order will indeed cause false positives.
But usually you can catch people not having run checkpatch before sending
their patches by the presence of other checkpatch issues, so those can be
used as a canary to switch to "more thorough review mode".

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
From: Steven Rostedt @ 2017-08-14 19:49 UTC
To: Greg KH; +Cc: ksummit-discuss

On Fri, 4 Aug 2017 09:04:54 -0700
Greg KH <greg@kroah.com> wrote:

> We could start searching linux-next for new additions of sysfs files
> (search for the ATTR macros), and complain that there are no matching
> Documentation/ABI/ updates at the same time. I try to do that when
> reviewing patches that come through my trees, but yes, this is hard to
> keep up to date with.
>
> Sounds like a good GSoC project though, setting up the infrastructure to
> do this in a semi-automated fashion.

And perhaps do the same for new tracepoints. I'm wondering if we should
start documenting all tracepoints, and have them not be added unless
there's documentation with them.

-- Steve

* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
From: Linus Torvalds @ 2017-08-14 19:51 UTC
To: Steven Rostedt; +Cc: ksummit

On Mon, Aug 14, 2017 at 12:49 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Fri, 4 Aug 2017 09:04:54 -0700
> Greg KH <greg@kroah.com> wrote:
>
>> We could start searching linux-next for new additions of sysfs files
>> (search for the ATTR macros), and complain that there are no matching
>> Documentation/ABI/ updates at the same time. I try to do that when
>> reviewing patches that come through my trees, but yes, this is hard to
>> keep up to date with.
>>
>> Sounds like a good GSoC project though, setting up the infrastructure to
>> do this in a semi-automated fashion.
>
> And perhaps do the same for new tracepoints.

Honestly, the *real* issue has traditionally been new ioctls, not so
much sysfs files or tracepoints.

People add random device-specific crud that then has issues with
alignment or word size.

               Linus

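Linus's alignment and word-size point can be made concrete with two versions of a made-up ioctl argument struct. The struct names are hypothetical; the layout facts are the standard i386 and x86-64 ABI rules (inside structs, 64-bit integers are only 4-byte aligned on i386, and long is 4 bytes there). Because _IOW() and friends encode sizeof() of the argument type into the command number, a struct whose size differs between 32-bit and 64-bit userspace also yields different command numbers, which is why such structs typically end up needing compat handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Problematic layout for a made-up ioctl argument. On x86-64, 'offset'
 * starts at byte 8 and the struct is 24 bytes; on i386, uint64_t is only
 * 4-byte aligned inside structs and long is 4 bytes, so 'offset' starts
 * at byte 4 and the struct is 16 bytes. Without a compat handler, a
 * 64-bit kernel would misread what 32-bit userspace passes in. */
struct bad_ioctl_args {
    uint32_t flags;
    uint64_t offset;
    long     count;
};

/* Layout that is the same on common 32- and 64-bit ABIs: fixed-width
 * types only, explicit padding, and every 64-bit member at an offset
 * that is a multiple of 8. Real uapi headers spell these __u32/__u64. */
struct good_ioctl_args {
    uint32_t flags;
    uint32_t pad;     /* reserved; callers set it to zero */
    uint64_t offset;
    uint64_t count;
};

static_assert(sizeof(struct good_ioctl_args) == 24, "ABI size must not vary");
static_assert(offsetof(struct good_ioctl_args, offset) == 8, "ABI layout must not vary");
```

Compile-time size checks like the two static_asserts are cheap insurance for anything shared across the user/kernel boundary.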
* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
From: Julia Lawall @ 2017-08-15 7:13 UTC
To: Linus Torvalds; +Cc: ksummit

On Mon, 14 Aug 2017, Linus Torvalds wrote:

> On Mon, Aug 14, 2017 at 12:49 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> > On Fri, 4 Aug 2017 09:04:54 -0700
> > Greg KH <greg@kroah.com> wrote:
> >
> >> We could start searching linux-next for new additions of sysfs files
> >> (search for the ATTR macros), and complain that there are no matching
> >> Documentation/ABI/ updates at the same time. I try to do that when
> >> reviewing patches that come through my trees, but yes, this is hard to
> >> keep up to date with.
> >>
> >> Sounds like a good GSoC project though, setting up the infrastructure to
> >> do this in a semi-automated fashion.
> >
> > And perhaps do the same for new tracepoints.
>
> Honestly, the *real* issue has traditionally been new ioctls, not so
> much sysfs files or tracepoints.
>
> People add random device-specific crud that then has issues with
> alignment or word size.

In terms of documentation, there are around 2500 names defined using
_IOW, _IOR, or _IOWR, and around 500 of them are mentioned somewhere in
Documentation.

julia

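A figure like Julia's can be approximated by scanning uapi headers for ioctl definitions and then searching Documentation/ for each captured name. The sketch below covers only the first half, using POSIX regcomp(3)/regexec(3); the regular expression and ioctl_name() are illustrative assumptions, not how Julia obtained her numbers, and they will miss definitions split across lines or built from wrapper macros.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <string.h>

/* Matches uapi lines such as
 *   #define FS_IOC_GETFLAGS _IOR('f', 1, long)
 * and captures the command name. Plain _IO() (no argument) is left out,
 * mirroring Julia's _IOW/_IOR/_IOWR count. */
static const char ioctl_re[] =
    "^#[[:space:]]*define[[:space:]]+([A-Za-z_][A-Za-z0-9_]*)[[:space:]]+_IO(R|W|WR)\\(";

/* Copies the defined name into 'out' and returns true on a match.
 * Recompiling the regex per call keeps the sketch short; a real scanner
 * would compile it once. */
bool ioctl_name(const char *line, char *out, size_t outlen)
{
    regex_t re;
    regmatch_t m[2];
    bool found = false;

    assert(outlen > 0);
    if (regcomp(&re, ioctl_re, REG_EXTENDED) != 0)
        return false;
    if (regexec(&re, line, 2, m, 0) == 0) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);

        if (len >= outlen)
            len = outlen - 1;
        memcpy(out, line + m[1].rm_so, len);
        out[len] = '\0';
        found = true;
    }
    regfree(&re);
    return found;
}
```

With Julia's numbers, about 500 out of roughly 2500, only around 20% of these ioctl names appear anywhere in Documentation/.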
* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
From: Julia Lawall @ 2017-08-04 8:57 UTC
To: Jiri Kosina; +Cc: Andy Lutomirski, ksummit-discuss

On Fri, 4 Aug 2017, Jiri Kosina wrote:

> On Thu, 3 Aug 2017, Theodore Ts'o wrote:
>
> > One way that we could try to make things better is by having some kind
> > of semi-automated system which monitors changes in include/uapi/*.h in
> > linux-next.
>
> It's unfortunately just uapi though, and for sysfs it's a bit more
> difficult to define a pathname pattern to watch for.

I think that Michael Kerrisk has a big list of regexps in this direction.

julia

* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
From: Michael Kerrisk (man-pages) @ 2017-08-04 11:27 UTC
To: Julia Lawall; +Cc: ksummit-discuss, Andy Lutomirski

On 4 August 2017 at 10:57, Julia Lawall <julia.lawall@lip6.fr> wrote:
>
> On Fri, 4 Aug 2017, Jiri Kosina wrote:
>
>> On Thu, 3 Aug 2017, Theodore Ts'o wrote:
>>
>> > One way that we could try to make things better is by having some kind
>> > of semi-automated system which monitors changes in include/uapi/*.h in
>> > linux-next.
>>
>> It's unfortunately just uapi though, and for sysfs it's a bit more
>> difficult to define a pathname pattern to watch for.
>
> I think that Michael Kerrisk has a big list of regexps in this direction.

The list I have is not very big, and is rather unsophisticated (and
manually maintained). But it helps me spot some new features that I
would otherwise miss and that should be documented in the man pages.

Cheers,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
From: NeilBrown @ 2017-08-09 0:00 UTC
To: Andy Lutomirski, ksummit-discuss

On Thu, Aug 03 2017, Andy Lutomirski wrote:

> [Note: I'm not entirely sure I can make it to the kernel summit this
> year, due to having a tiny person and tons of travel]
>
> This may be highly controversial, but: there seems to be a weakness in
> the kernel development model in the way that new ABI features become
> stable. The current model is, roughly:
>
> 1. Someone writes the code. Maybe they cc linux-abi, maybe they don't.
> 2. People hopefully review the code.
> 3. A subsystem maintainer merges the code. They hope the ABI is right.
> 4. Linus gets a pull request. Linus probably doesn't review the ABI
> for sanity, style, blatant bugs, etc. If Linus did, then he'd never
> get anything else done.
> 5. The new ABI lands in -rc1.
> 6. If someone finds a problem or objects, it had better get fixed
> before the next real release.
>
> There's a few problems here. One is that the people who would really
> review the ABI might not even notice until step 5 or 6 or so. Another
> is that it takes some time for userspace to get experience with a new
> ABI.
>
> I'm wondering if there are other models that could work. I think it
> would be nice for us to be able to land a kernel in Linus tree and
> still wait a while before stabilizing it. Rust, for example, has a
> strict policy for this that seems to work quite well.
>
> Maybe we could pull something off where big new features hide behind a
> named feature gate for a while. That feature gate can only be enabled
> under some circumstances that make it very hard to mistake it for true
> stability. (For example, maybe you *can't* enable feature gates on a
> final kernel unless you manually patch something.)
>
> Here are a few examples that come to mind for where this would have helped:
>
> - Whatever that new RDMA socket type was that was deemed totally
> broken but only just after it hit a real release.
> - O_TMPFILE. I discovered that it corrupted filesystems in -rc6 or
> -rc7. That got fixed, but the API is still a steaming pile of crap.
> - Some cgroup+bpf stuff that got cleaned up in a -rc7 or so a few releases ago.
>
> I'm sure there are tons more.
>
> Is this too crazy, or is it worth discussing?

I think this is a real issue and it would be good to see improvements.

I think this is primarily a social/communication issue. We need to know
what is expected and what can be trusted. We need clear rules that
everyone knows and that work for everyone. Currently we have (fairly)
clear rules that work fairly well in many cases, but can be problematic.

The rules, as you outline, are that users should not experience
regressions from one released kernel to a subsequent released kernel.
So people working on -rc kernels can expect to experience regressions.
Also kernel devs are free to create theoretical regressions as long as
no-one experiences them.

My strawman is to suggest that we relax this. We change the promise "if
it works on a released kernel, it will work on all future released
kernels", to "if it works on N consecutive released kernels, it will
work on all future released kernels", and then bikeshed the value of N,
but probably settle on N=2.
This should give important new freedom to kernel developers, and impose
a (hopefully) small burden on application developers. They should be
testing their code anyway (we all should); now they have to test it
twice.

To make that burden smaller, we could aim to apply all "new API fixes"
to the -stable kernels promptly.
If a new API appears in Linux N it might behave differently in N+1, but
in that case the first N.M stable kernel released after N+1 will also
have the new behaviour.
So developing against that N.M should always be safe. Any APIs it has
are declared to be stable.

My other strawman is to declare that if an API is not documented, then
it isn't stable. People are welcome to use undocumented APIs, but when
their app breaks, they get to keep both parts. Of course, if the
documentation is wrong, that puts us in an awkward place - especially if
the documented behaviour is impossible to implement. We can then
schedule the release of the documentation at whatever time seems
appropriate given the complexity and utility of the particular API.

My main point here is that I think the only real solution here is to
revise the current social contract. Trying to use technology to detect
API changes - as has been suggested in this thread - is not a bad idea,
but is unlikely to catch the really important problems.

Thanks,
NeilBrown

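Neil's first strawman replaces "frozen once released" with "frozen once released in N consecutive kernels". A tiny sketch of that rule, with releases numbered by a plain counter (an assumption for illustration, e.g. the minor number within the 4.x series) and a hypothetical api_is_frozen() helper; it models the proposal only, not current kernel policy.

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of NeilBrown's strawman, not current kernel policy: an API
 * first shipped in release 'introduced' becomes frozen once it has
 * appeared in n consecutive releases, i.e. once release
 * introduced + n - 1 is out. Releases are plain counters here. */
bool api_is_frozen(int introduced, int latest_release, int n)
{
    assert(n >= 1);
    if (latest_release < introduced)
        return false;                  /* not released yet */
    return latest_release - introduced + 1 >= n;
}
```

With n = 1 the check collapses to the existing rule that anything shipped in one released kernel is frozen; n = 2 gives the one-release grace period Neil suggests.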
* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
From: Laurent Pinchart @ 2017-08-09 11:54 UTC
To: ksummit-discuss; +Cc: Andy Lutomirski

Hi Neil,

On Wednesday 09 Aug 2017 10:00:51 NeilBrown wrote:
> On Thu, Aug 03 2017, Andy Lutomirski wrote:
> > [Note: I'm not entirely sure I can make it to the kernel summit this
> > year, due to having a tiny person and tons of travel]
> >
> > This may be highly controversial, but: there seems to be a weakness in
> > the kernel development model in the way that new ABI features become
> > stable. The current model is, roughly:
> >
> > 1. Someone writes the code. Maybe they cc linux-abi, maybe they don't.
> > 2. People hopefully review the code.
> > 3. A subsystem maintainer merges the code. They hope the ABI is right.
> > 4. Linus gets a pull request. Linus probably doesn't review the ABI
> > for sanity, style, blatant bugs, etc. If Linus did, then he'd never
> > get anything else done.
> > 5. The new ABI lands in -rc1.
> > 6. If someone finds a problem or objects, it had better get fixed
> > before the next real release.
> >
> > There's a few problems here. One is that the people who would really
> > review the ABI might not even notice until step 5 or 6 or so. Another
> > is that it takes some time for userspace to get experience with a new
> > ABI.
> >
> > I'm wondering if there are other models that could work. I think it
> > would be nice for us to be able to land a kernel in Linus tree and
> > still wait a while before stabilizing it. Rust, for example, has a
> > strict policy for this that seems to work quite well.
> >
> > Maybe we could pull something off where big new features hide behind a
> > named feature gate for a while. That feature gate can only be enabled
> > under some circumstances that make it very hard to mistake it for true
> > stability. (For example, maybe you *can't* enable feature gates on a
> > final kernel unless you manually patch something.)
> >
> > Here are a few examples that come to mind for where this would have
> > helped:
> > - Whatever that new RDMA socket type was that was deemed totally
> > broken but only just after it hit a real release.
> > - O_TMPFILE. I discovered that it corrupted filesystems in -rc6 or
> > -rc7. That got fixed, but the API is still a steaming pile of crap.
> > - Some cgroup+bpf stuff that got cleaned up in a -rc7 or so a few
> > releases ago.
> >
> > I'm sure there are tons more.
> >
> > Is this too crazy, or is it worth discussing?
>
> I think this is a real issue and it would be good to see improvements.
>
> I think this is primarily a social/communication issue. We need to know
> what is expected and what can be trusted. We need clear rules that
> everyone knows and that work for everyone. Currently we have (fairly)
> clear rules that work fairly well in many cases, but can be problematic.
>
> The rules, as you outline, are that users should not experience
> regressions from one released kernel to a subsequent released kernel.
> So people working on -rc kernels can expect to experience regressions.
> Also kernel devs are free to create theoretical regressions as long as
> no-one experiences them.
>
> My strawman is to suggest that we relax this. We change the promise "if
> it works on a released kernel, it will work on all future released
> kernels", to "if it works on N consecutive released kernels, it will
> work on all future released kernels", and then bikeshed the value of N,
> but probably settle on N=2.
> This should give important new freedom to kernel developers, and impose
> a (hopefully) small burden on application developers. They should be
> testing their code anyway (we all should); now they have to test it
> twice.
>
> To make that burden smaller, we could aim to apply all "new API fixes"
> to the -stable kernels promptly.
> If a new API appears in Linux N it might behave differently in N+1, but
> in that case the first N.M stable kernel released after N+1 will also
> have the new behaviour.
> So developing against that N.M should always be safe. Any APIs it has
> are declared to be stable.

I fear this will lead us to a situation where new APIs will receive less
scrutiny because developers will rely on the ability to change the API
for the next kernel. Of course they will then be sidetracked by
something else, and the next kernel will be released without any API
change.

I might be overly pessimistic here, but I don't think we will be able to
tackle what is largely a human problem (not paying enough attention to
new APIs) with a small process adjustment. Let's face it, as long as we
don't educate developers about APIs, we won't get this right, exactly
the same way that developers need to be educated about security or race
conditions.

Education is a slow process but gives the best results. What we should first
aim for, in my opinion, isn't to turn everybody into an API expert, but to
have enough reviewers who can spot API changes and wave a red flag if the
change hasn't gone through a proper review process. Part of this could possibly be
automated as discussed in this mail thread, but at the end of the day it's
really about a culture change to make sure APIs are treated with enough care.

Now, assuming we can fix this first problem and get all new APIs properly
reviewed and tested, the next question is what a proper review and test
process should be. The DRM/KMS subsystem has put a process in place (as
explained by Daniel Vetter in this mail thread) where every new API has to be
implemented in real userspace components (and thus not just in test tools) and
approved by the appropriate maintainers. The bar is pretty high, and possibly
too high, but it is in my opinion better than the other way around.

Yes, this will slow down patch acceptance, but I don't think that's a problem,
quite the contrary. I'd rather slow down merging new APIs upstream than having
to live with lots of crappy APIs, as long as the development process at the
subsystem level is not slowed down. That's where process and infrastructure
could help, to ensure that userspace components consuming new APIs can easily
find the kernel code they need to test. I don't think named feature gates, as
proposed by Andy, are needed (we had that a while ago, it was called
CONFIG_EXPERIMENTAL, and proved to be useless), but I'm open to discussion in
that area.

> My other strawman is to declare that if an API is not documented, then
> it isn't stable. People are welcome to use undocumented APIs, but when
> their app breaks, they get to keep both parts. Of course, if the
> documentation is wrong, that puts us in an awkward place - especially if
> the documented behaviour is impossible to implement. We can then
> schedule the release of the documentation at whatever time seems
> appropriate given the complexity and utility of the particular API.

I'd go one step further and say that every API has to be documented. There
will always be undocumented features in every API as no documentation is
perfect, and corner cases that nobody thought about can result in interesting
undocumented behaviour that userspace starts relying on, but documentation is
a must, and should not be written after the code stabilizes. Writing
documentation is actually a good way to realize that an API is broken.

> My main point here is that I think the only real solution here is to
> revise the current social contract. Trying to use technology to detect
> API changes - as has been suggested in this thread - is not a bad idea,
> but is unlikely to catch the really important problems.

-- 
Regards,

Laurent Pinchart

* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
From: Steven Rostedt @ 2017-08-14 20:07 UTC
To: Laurent Pinchart; +Cc: Andy Lutomirski, ksummit-discuss

On Wed, 09 Aug 2017 14:54:10 +0300
Laurent Pinchart <laurent.pinchart@ideasonboard.com> wrote:

> Hi Neil,
>
> > > I'm wondering if there are other models that could work. I think it
> > > would be nice for us to be able to land a kernel in Linus tree and
> > > still wait a while before stabilizing it. Rust, for example, has a
> > > strict policy for this that seems to work quite well.

I like the model of having to add a patch to implement a new ABI,
because then technically it never broke if some user space app depended
on it, as the API never existed without modifying the kernel.

But I also wonder if we could have a linux-api tree similar to
linux-next, where linux-api is used to test APIs until they are ready.
It should follow Linus's tree, similar to linux-next, where new
features get merged in nightly. But the difference from linux-next is
that an API doesn't have to go into Linus's tree at the next merge
window. It would be a place where APIs could be tested and changed,
and not get into Linus's tree until they are ready.

> Education is a slow process but gives the best results. What we should first
> aim for, in my opinion, isn't to turn everybody into an API expert, but to
> have enough reviewers who can spot API changes and wave a red flag if the
> change hasn't gone through a proper review process. Part of this could possibly be
> automated as discussed in this mail thread, but at the end of the day it's
> really about a culture change to make sure APIs are treated with enough care.

I would recommend a static analyzer that looks at linux-next for new
APIs and flags anything that it finds, so that others can audit them.
Perhaps make a rule that no new API is added without documentation, or,
if we have the above linux-api tree, without going through that too.

This will only work if it is automated. Linus could see the list of new
APIs added and determine if it should be pulled or not. It would be too
much work for him to search the code for new APIs, but a tool that gives
him a simple list that he can check off as he agrees with each entry may
be scalable.

>
> Now, assuming we can fix this first problem and get all new APIs properly
> reviewed and tested, the next question is what a proper review and test
> process should be. The DRM/KMS subsystem has put a process in place (as
> explained by Daniel Vetter in this mail thread) where every new API has to be
> implemented in real userspace components (and thus not just in test tools) and
> approved by the appropriate maintainers. The bar is pretty high, and possibly
> too high, but it is in my opinion better than the other way around.

As Linux becomes more advanced and used in more critical systems, I want
that bar to rise. I'm trying to police myself with new features, and
make sure they are all documented before adding them as well.

>
> Yes, this will slow down patch acceptance, but I don't think that's a problem,
> quite the contrary. I'd rather slow down merging new APIs upstream than having
> to live with lots of crappy APIs, as long as the development process at the
> subsystem level is not slowed down. That's where process and infrastructure
> could help, to ensure that userspace components consuming new APIs can easily
> find the kernel code they need to test. I don't think named feature gates, as
> proposed by Andy, are needed (we had that a while ago, it was called
> CONFIG_EXPERIMENTAL, and proved to be useless), but I'm open to discussion in
> that area.

We could add a new label CONFIG_TEST_ABI which acts like CONFIG_BROKEN
and doesn't compile that code. One would have to manually remove the
config (hence patch the kernel) to have it compile.

	depends on CONFIG_TEST_ABI

would need to be manually changed to

	depends on CONFIG_RUN_ABI

(OK, I suck with names) and then it would be compiled in.

This would still be in line with Linus's rule (don't break existing
kernels), as the code he ships will never actually execute without
modification. And if you modify it to run an app, then it's your fault
if the app breaks because it changes.

> I'd go one step further and say that every API has to be documented. There
> will always be undocumented features in every API as no documentation is
> perfect, and corner cases that nobody thought about can result in interesting
> undocumented behaviour that userspace starts relying on, but documentation is
> a must, and should not be written after the code stabilizes. Writing
> documentation is actually a good way to realize that an API is broken.

+1

>
> > My main point here is that I think the only real solution here is to
> > revise the current social contract. Trying to use technology to detect
> > API changes - as has been suggested in this thread - is not a bad idea,
> > but is unlikely to catch the really important problems.
>

But it will definitely help. We can't implement any of this without
tools to track down API changes. If a new API is added, at the very
minimum, there must be some documentation with it. Linus will have the
final say, but it would go a long way if he was able to run some tool on
a series of pull requests to see what new APIs have been added, and then
decide whether he should revert them if they look likely to become
unmaintainable in the future.

-- Steve

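On the C side, Steven's CONFIG_TEST_ABI gate could look like the sketch below: the experimental flag is only accepted when the gate symbol is defined, and otherwise it is rejected like any other unknown bit. FOO_FLAG_*, foo_check_flags() and CONFIG_TEST_ABI itself are hypothetical names modeled on the proposal, not existing kernel symbols.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical flags for a made-up syscall. CONFIG_TEST_ABI is the gate
 * symbol from Steven's proposal, not an existing Kconfig option. */
#define FOO_FLAG_STABLE 0x1u
#define FOO_FLAG_NEW    0x2u    /* under evaluation, not yet ABI */

static_assert((FOO_FLAG_STABLE & FOO_FLAG_NEW) == 0, "flag bits must not overlap");

/* Reject every bit the running kernel does not support. Unless the gate
 * is patched on, FOO_FLAG_NEW is just another unknown bit, so no kernel
 * built from an unmodified tree accepts it. */
int foo_check_flags(unsigned int flags)
{
    unsigned int supported = FOO_FLAG_STABLE;

#ifdef CONFIG_TEST_ABI
    supported |= FOO_FLAG_NEW;
#endif
    if (flags & ~supported)
        return -EINVAL;
    return 0;
}
```

Rejecting unknown bits with -EINVAL is what makes such a gate meaningful: open() historically ignored unknown flags, which is why O_TMPFILE is defined to include O_DIRECTORY, a deliberate kludge so that it fails on older kernels instead of being silently ignored.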
* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
From: Linus Torvalds @ 2017-08-09 20:21 UTC
To: NeilBrown; +Cc: ksummit-discuss, Andy Lutomirski

On Tue, Aug 8, 2017 at 5:00 PM, NeilBrown <neilb@suse.com> wrote:
> I think this is primarily a social/communication issue. We need to know what is expected and what can be trusted. We need clear rules that everyone knows and that work for everyone. Currently we have (fairly) clear rules that work fairly well in many cases, but can be problematic.
>
> The rules, as you outline, are that users should not experience regressions from one released kernel to a subsequent released kernel. So people working on -rc kernels can expect to experience regressions. Also kernel devs are free to create theoretical regressions as long as no-one experiences them.
>
> My strawman is to suggest that we relax this.

No.

The whole "no regressions" is a hard rule, and it will remain so. It's pretty much the only really hard rule we have, and I will continue to insist on it.

There are no loopholes. No "but it's been only one release". No, no, no. The whole point is that users are supposed to be able to *trust* the kernel. If we do something, we keep on doing it.

And if it makes it harder to add new user-visible interfaces, then that's a *good* thing.

Linus
* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
From: NeilBrown @ 2017-08-11 6:21 UTC
To: Linus Torvalds; +Cc: ksummit-discuss, Andy Lutomirski

On Wed, Aug 09 2017, Linus Torvalds wrote:
> On Tue, Aug 8, 2017 at 5:00 PM, NeilBrown <neilb@suse.com> wrote:
>> I think this is primarily a social/communication issue. We need to know what is expected and what can be trusted. We need clear rules that everyone knows and that work for everyone. Currently we have (fairly) clear rules that work fairly well in many cases, but can be problematic.
>>
>> The rules, as you outline, are that users should not experience regressions from one released kernel to a subsequent released kernel. So people working on -rc kernels can expect to experience regressions. Also kernel devs are free to create theoretical regressions as long as no-one experiences them.
>>
>> My strawman is to suggest that we relax this.
>
> No.
>
> The whole "no regressions" is a hard rule, and it will remain so. It's pretty much the only really hard rule we have, and I will continue to insist on it.
>
> There are no loopholes. No "but it's been only one release". No, no, no. The whole point is that users are supposed to be able to *trust* the kernel. If we do something, we keep on doing it.

I completely agree with the "trust" issue. I don't think my proposal violates it. It just changes the names of the things that can be trusted. You could easily change them back.

e.g.
- When you are ready to release 4.13, call it 4.13-pre1 instead.
- You then open the merge window and pull changes onto this base, working towards 4.14-rc1. Just as you currently do, you accept changes to the API for interfaces that have not appeared in a released kernel (and 4.13 hasn't been released at this point).
- Greg takes over 4.13-pre and applies patches tagged for "stable" exactly as he currently does, except that he calls the first few releases "4.13-pre2", "4.13-pre3", etc. These "stable" patches might include changes to APIs that were introduced since 4.12, changes that you have already included in 4.14-rcX.
- After you release 4.14-pre1, the next kernel that Greg releases in the 4.13-preX series gets called "4.13", and subsequent kernels in that series are "4.13.1" etc. as normal.

With this pattern, people can still trust an X.Y kernel, possibly more than they currently do (how many people wait for X.Y.3 before they will move?).

With this pattern, we still get an X.Y every 2 months or so (except for one 4-month gap at the change-over). The main difference is that we are a bit more honest about how long it takes to bake a kernel before it is ready. We also get more time to document and fix broken APIs.

"pre" is probably weird, and "rc" doesn't really mean "release candidate" these days, except for rc7. Maybe you could call your "rc" kernels dev1, dev2, etc. Then Greg could use "rcX" for the real release candidates. But naming is hard.

> And if it makes it harder to add new user-visible interfaces, then that's a *good* thing.

I think the point is that it is currently too easy to add user-visible changes (extra flags, etc.), and none of the proposals actually make it harder. They just try to make them more visible. The proposal makes it harder in that it forces an extra 2-month delay.

Thanks,
NeilBrown
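To make NeilBrown's strawman concrete, here is a small sketch of the names one release series would receive under it. The function `neil_series_names` and its parameters are hypothetical; it only encodes the renaming rules in his list (Linus's release becomes `-pre1`, Greg's updates continue the `-preN` count until the next series' `-pre1` is out, after which Greg ships the plain `X.Y` and then `X.Y.1`, `X.Y.2`, and so on).

```python
def neil_series_names(base, stable_before_next_pre1, stable_after_next_pre1):
    """Names one release series would get under NeilBrown's strawman.

    base: the version Linus would release today, e.g. "4.13".
    stable_before_next_pre1: stable updates Greg ships before Linus
        releases the next series' -pre1.
    stable_after_next_pre1: stable updates Greg ships after that point.
    """
    # Linus's release is named "<base>-pre1" instead of "<base>".
    names = [f"{base}-pre1"]
    # Greg's stable updates continue the -preN numbering...
    names += [f"{base}-pre{n}" for n in range(2, stable_before_next_pre1 + 2)]
    # ...until the next -pre1 is out; then he ships "<base>", "<base>.1", ...
    for k in range(stable_after_next_pre1):
        names.append(base if k == 0 else f"{base}.{k}")
    return names
```

For example, `neil_series_names("4.13", 2, 3)` returns `["4.13-pre1", "4.13-pre2", "4.13-pre3", "4.13", "4.13.1", "4.13.2"]`: the first kernel called plain "4.13" is one that has already had a full development cycle of stable fixes.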
* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
From: Linus Torvalds @ 2017-08-11 6:39 UTC
To: NeilBrown; +Cc: ksummit-discuss, Andy Lutomirski

On Thu, Aug 10, 2017 at 11:21 PM, NeilBrown <neilb@suse.com> wrote:
> With this pattern, people can still trust an X.Y kernel,

I do *NOT* want people to trust an X.Y kernel.

Quite the opposite.

I want people to realize that the version doesn't matter, and that they should feel safe in upgrading. The X and the Y don't matter, and they *MUST*NOT*MATTER*.

If they do, the process is completely and utterly broken.

So what people should be able to trust is that they can always upgrade.

Not the shit that I see *ALL* the time, where you upgrade something, and it breaks.

And no, the excuse "but the API was new in X.Y, so it could change in X.Y+1" does *not* hold water.

It very much violates that basic principle of trust and makes people go "I don't want to upgrade, because it might break something I do".

Linus
* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
From: NeilBrown @ 2017-08-11 8:02 UTC
To: Linus Torvalds; +Cc: ksummit-discuss, Andy Lutomirski

On Thu, Aug 10 2017, Linus Torvalds wrote:
> On Thu, Aug 10, 2017 at 11:21 PM, NeilBrown <neilb@suse.com> wrote:
>> With this pattern, people can still trust an X.Y kernel,
>
> I do *NOT* want people to trust an X.Y kernel.
>
> Quite the opposite.
>
> I want people to realize that the version doesn't matter, and that they should feel safe in upgrading. The X and the Y don't matter, and they *MUST*NOT*MATTER*.
>
> If they do, the process is completely and utterly broken.
>
> So what people should be able to trust is that they can always upgrade.

What do you mean by "upgrade"?
Can I upgrade from 3.15 to 3.16-rc1? If not, why not?

NeilBrown

> Not the shit that I see *ALL* the time, where you upgrade something, and it breaks.
>
> And no, the excuse "but the API was new in X.Y, so it could change in X.Y+1" does *not* hold water.
>
> It very much violates that basic principle of trust and makes people go "I don't want to upgrade, because it might break something I do".
>
> Linus
* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
From: Linus Torvalds @ 2017-08-11 23:10 UTC
To: NeilBrown; +Cc: ksummit-discuss, Andy Lutomirski

On Fri, Aug 11, 2017 at 1:02 AM, NeilBrown <neilb@suse.com> wrote:
> What do you mean by "upgrade"?
> Can I upgrade from 3.15 to 3.16-rc1? If not, why not?

Yes..

Of course, bugs happen, and then they get fixed.

But yes, even including things like -rc1 (or just "random untagged kernel of the day") you should

 (a) feel safe in always upgrading to any higher version (I *hope* you can always also downgrade to a lower kernel version, but obviously at some point user space may start depending on newer features that simply don't exist in older kernels).

 (b) also feel that if something breaks, it's a bug, and people will take it seriously and not dismiss it with some crazy "N+1 version" excuse.

There are some cases where we may not be able to avoid breakage: the main two are "security issues" and "insanely old hardware". And even for security issues, we try really really hard to avoid breakage. And the key word in "insanely old hardware" is that "insanely" part. At some point it just gets too hard to test (and sometimes the hardware is too broken, like the original i386 non-working supervisor page fault workarounds).

Now, it can get really interesting if somebody notices an ABI change so late that others have started to depend on that ABI change. At that point, it's a "damned if you do, damned if you don't". We've actually been able to handle even that occasionally (by just adjusting behavior automatically based on some pattern), but at some point it obviously is impossible to fix both cases.

And then I say "if it took you three years to upgrade and notice a behavioral change that nobody else noticed, it's no longer _our_ fault".

So there is _some_ onus on people actually testing and reporting these things, but I can't off-hand actually remember any case of this really being a major issue. So it's largely a theoretical thing.

Linus
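Linus's answer treats "upgrade" as moving to any higher version, an -rc included, so 3.15 to 3.16-rc1 counts. A minimal sketch of that ordering for tagged versions follows; `version_key` and `is_upgrade` are illustrative names, and untagged snapshots (such as `git describe` output) and stable-review tags like `4.13.1-rc1` are deliberately left out of scope.

```python
import re

# Tagged mainline/stable versions, with or without a leading "v" and with an
# optional ".0" sublevel as printed by `uname -r` (e.g. "4.14.0-rc3").
_VERSION_RE = re.compile(r"^v?(\d+)\.(\d+)(?:\.(\d+))?(?:-rc(\d+))?$")


def version_key(tag):
    """Sort key placing X.Y-rcN before X.Y, and X.Y before X.Y.1."""
    m = _VERSION_RE.match(tag)
    if m is None:
        raise ValueError(f"unrecognised version: {tag!r}")
    major, minor, sub, rc = m.groups()
    sub = int(sub or 0)
    if rc is not None and sub != 0:
        raise ValueError(f"stable-review -rc tags are out of scope: {tag!r}")
    # A final release ranks after every -rc of the same X.Y.
    rc_rank = int(rc) if rc is not None else float("inf")
    return (int(major), int(minor), rc_rank, sub)


def is_upgrade(old, new):
    """True if moving from `old` to `new` goes to a higher version."""
    return version_key(new) > version_key(old)
```

With this ordering, `is_upgrade("3.15", "3.16-rc1")` is true, and sorting `["4.13.1", "4.13", "4.14-rc1", "4.13-rc7"]` by `version_key` gives `["4.13-rc7", "4.13", "4.13.1", "4.14-rc1"]`.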
* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
From: NeilBrown @ 2017-08-14 4:19 UTC
To: Linus Torvalds; +Cc: ksummit-discuss, Andy Lutomirski

On Fri, Aug 11 2017, Linus Torvalds wrote:
> On Fri, Aug 11, 2017 at 1:02 AM, NeilBrown <neilb@suse.com> wrote:
>> What do you mean by "upgrade"?
>> Can I upgrade from 3.15 to 3.16-rc1? If not, why not?
>
> Yes..
>
> Of course, bugs happen, and then they get fixed.
>
> But yes, even including things like -rc1 (or just "random untagged kernel of the day") you should
>
>  (a) feel safe in always upgrading to any higher version (I *hope* you can always also downgrade to a lower kernel version, but obviously at some point user space may start depending on newer features that simply don't exist in older kernels).
>
>  (b) also feel that if something breaks, it's a bug, and people will take it seriously and not dismiss it with some crazy "N+1 version" excuse.

I think the issue is that some of us would like a clearer statement on what values of "some point" we will honor, and which values are crazy-talk.

This relates slightly to your comment:

> And then I say "if it took you three years to upgrade and notice a behavioral change that nobody else noticed, it's no longer _our_ fault".

Can we also say "if you started depending on an API that has only been in the kernel for 3 weeks and we had to revise it, then it's not _our_ fault if you depend on it already"??

In the original post in this thread, Andy seemed to think that as long as it gets "fixed before the next real release", we are safe. Your comments could be read to mean that you don't agree and that there is no clear line at which we are safe.

You mentioned trust earlier:

> The whole point is that users are supposed to be able to *trust* the kernel.

I agree, but I think trust works best if it works both ways. Can we trust application developers not to depend on an API which hasn't reached maturity? I think we can if we tell them what we expect - what constitutes "maturity" - and make it a reasonable expectation. Do we have to declare "maturity" the moment an API hits mainline (or hits -next, or hits mailing lists), or can we negotiate a formal grace period?

Yes, "no regressions" is an important rule that should remain, but there has always been a loophole. The loophole is "no harm, no foul". If we can negotiate an understanding that results in "no harm" from early revisions to an API, then those revisions will not cause actual regressions. "No foul".

But maybe I'm wrong and all the people talking about automatic tooling to discover and highlight API changes in linux-next are on the right track... or would be if everything actually went through linux-next.

Thanks,
NeilBrown
* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
From: Linus Torvalds @ 2017-08-14 18:34 UTC
To: NeilBrown; +Cc: ksummit-discuss, Andy Lutomirski

On Sun, Aug 13, 2017 at 9:19 PM, NeilBrown <neilb@suse.com> wrote:
> I think the issue is that some of us would like a clearer statement on what values of "some point" we will honor, and which values are crazy-talk.

What?

I thought I was *VERY* clear.

It has *ALWAYS* been very clear.

There is absolutely *no* cut-off. If a feature has been in a released kernel, we support it. End of story.

Stop fishing for "some point". It doesn't exist. It never has. And it never will.

If you worry about how good and stable your ABI is, and aren't willing to support that ABI forever, don't send the patch. Seriously. Just don't.

This whole discussion is pointless.

Linus
* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
From: Linus Torvalds @ 2017-08-14 18:40 UTC
To: NeilBrown; +Cc: ksummit-discuss, Andy Lutomirski

On Mon, Aug 14, 2017 at 11:34 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> If you worry about how good and stable your ABI is, and aren't willing to support that ABI forever, don't send the patch. Seriously. Just don't.
>
> This whole discussion is pointless.

To clarify, and to strengthen the point: the regression has always been about actual breakage. You can change semantics all you want, if nobody ever notices.

But if somebody does notice, and something breaks, it gets fixed.

That's the rule. No exceptions. If you aren't willing to fix the bugs you introduce, you shouldn't be working on the kernel.

It's that simple. Find some other project to mess up - there are tons of sh*t projects out there that think that changing ABI's is a good idea and should be done regularly.

But the kernel cares about regressions.

Christ, this is not a new rule.

Linus
* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
From: Andy Lutomirski @ 2017-08-14 23:23 UTC
To: Linus Torvalds; +Cc: ksummit-discuss, Andy Lutomirski

On Mon, Aug 14, 2017 at 11:40 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Mon, Aug 14, 2017 at 11:34 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>> If you worry about how good and stable your ABI is, and aren't willing to support that ABI forever, don't send the patch. Seriously. Just don't.
>>
>> This whole discussion is pointless.
>
> To clarify, and to strengthen the point: the regression has always been about actual breakage. You can change semantics all you want, if nobody ever notices.
>
> But if somebody does notice, and something breaks, it gets fixed.
>
> That's the rule. No exceptions. If you aren't willing to fix the bugs you introduce, you shouldn't be working on the kernel.

What I was trying to get at with this thread was: is there a way that we can enable a new feature for testing in a way that it *can't* get used by real programs that expect stability?

There are certainly nasty solutions. For example, cgroup v2 cpu controller support has been available as an out-of-tree patch for many cycles. It's finally being hashed out in a way that's incompatible with programs that target that patch, but no one is going to say "screw you, 4.14 broke my setup" because that setup didn't work on any earlier kernel either.

I'm wondering if we can maybe make this more systematic and less nasty. For example, what if we could have a way to enable features such that they work in -rc kernels (with big warnings!) but do *not* work in released kernels? Then people who want to develop against them would have to explicitly run -rc kernels, which would make it obvious that nothing's stable and might even get more -rc testers to boot.

--Andy
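Andy's idea of gates that only work on -rc kernels can be modelled in a few lines. This is a user-space Python sketch of the policy, not kernel code: `feature_gate_enabled` is a hypothetical name, and the set of requested gates stands in for some opt-in mechanism (say, a hypothetical boot parameter) that the thread never specifies. The check keys off the release string as `uname -r` prints it, so a final release such as `4.14.0` refuses every gate, while `4.14.0-rc3` (or an untagged snapshot built on it) allows requested gates and warns loudly.

```python
import re
import warnings

# Release strings as printed by `uname -r`, e.g. "4.14.0-rc3",
# "4.14.0-rc3-00042-gabcdef1" (untagged snapshot) or "4.14.0-rc3+" (dirty tree).
_RC_RELEASE = re.compile(r"^\d+\.\d+(?:\.\d+)?-rc\d+(?:[-+]|$)")


def feature_gate_enabled(release, gate, requested_gates):
    """Return True if the experimental ABI `gate` may be used on `release`.

    Hypothetical policy: a gate must be explicitly requested, and it only
    ever takes effect on -rc kernels; final releases refuse all gates.
    """
    if gate not in requested_gates:
        return False
    if not _RC_RELEASE.match(release):
        return False
    warnings.warn(
        f"experimental ABI {gate!r} enabled on {release}: "
        "not stable, will not work on the final release",
        stacklevel=2,
    )
    return True
```

Deriving the decision from the release string means nobody has to remember to flip a config option at release time; whether enough people would develop against -rc kernels for this to matter is exactly what Linus disputes in his reply.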
* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
From: Linus Torvalds @ 2017-08-15 0:54 UTC
To: Andy Lutomirski; +Cc: ksummit-discuss

On Mon, Aug 14, 2017 at 4:23 PM, Andy Lutomirski <luto@kernel.org> wrote:
> What I was trying to get at with this thread was: is there a way that we can enable a new feature for testing in a way that it *can't* get used by real programs that expect stability?

Honestly, I can't think of a case where that would actually have been an issue.

Make a config option out of it, and mark it expert, and maybe that would do it.

But realistically, that just doesn't make any sense in reality - because in reality, user programs get written not on top of the development kernel, but on vendor kernels.

So the scenario you describe simply never happens.

The _reverse_ scenario does happen: vendors who do their own kernel patches that introduce something their customers need, and people start depending on those semantics.

Android may be the case where that happens today, but it's not the only case. We've merged code that was in use by various Linux distro people and where there already was an active user base of the new ABI.

So I think your issue is pretty much theoretical, and _would_ be easy to fix with some kind of "this option is only enabled for rc kernels, and gets disabled on release", but such an option just doesn't make sense because that's not how development actually happens.

Linus
* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
From: Andy Lutomirski @ 2017-08-15 16:11 UTC
To: Linus Torvalds; +Cc: ksummit-discuss, Andy Lutomirski

On Mon, Aug 14, 2017 at 5:54 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Mon, Aug 14, 2017 at 4:23 PM, Andy Lutomirski <luto@kernel.org> wrote:
>> What I was trying to get at with this thread was: is there a way that we can enable a new feature for testing in a way that it *can't* get used by real programs that expect stability?
>
> Honestly, I can't think of a case where that would actually have been an issue.
>
> Make a config option out of it, and mark it expert, and maybe that would do it.

That seems optimistic to me.

> But realistically, that just doesn't make any sense in reality - because in reality, user programs get written not on top of the development kernel, but on vendor kernels.

Plenty of user programs get written against development kernels. iproute2 is a prime example. But IIRC the reason that RDMA disaster didn't get reverted is that people thought user programs using it existed something like one week after the stable kernel containing the feature showed up.

> So the scenario you describe simply never happens.
>
> The _reverse_ scenario does happen: vendors who do their own kernel patches that introduce something their customers need, and people start depending on those semantics.
>
> Android may be the case where that happens today, but it's not the only case. We've merged code that was in use by various Linux distro people and where there already was an active user base of the new ABI.
>
> So I think your issue is pretty much theoretical, and _would_ be easy to fix with some kind of "this option is only enabled for rc kernels, and gets disabled on release", but such an option just doesn't make sense because that's not how development actually happens.

But maybe it would be a good thing if more development happened that way. If nothing else, we'd get lots more testing :)

> Linus
* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
From: Michael Kerrisk (man-pages) @ 2017-08-15 18:26 UTC
To: NeilBrown, Andy Lutomirski, ksummit-discuss

On 08/09/2017 02:00 AM, NeilBrown wrote:
> On Thu, Aug 03 2017, Andy Lutomirski wrote:
> [...]
> My other strawman is to declare that if an API is not documented, then it isn't stable. People are welcome to use undocumented APIs, but when their app breaks, they get to keep both parts. Of course, if the documentation is wrong, that puts us in an awkward place - especially if the documented behaviour is impossible to implement. We can then schedule the release of the documentation at whatever time seems appropriate given the complexity and utility of the particular API.

Given that features sometimes exist for years (and in rare cases decades) before they are documented, and many longstanding features remain incompletely documented, the notion that "if an API is not documented, then it isn't stable" makes no sense, really.

> My main point here is that I think the only real solution here is to revise the current social contract. Trying to use technology to detect API changes - as has been suggested in this thread - is not a bad idea, but is unlikely to catch the really important problems.

Agreed. There are existing techniques (thorough documentation, more tests) that, if adhered to more strictly, could certainly alleviate many of the API design mess-ups. That sort of stuff requires people to really think about what they are doing, and gives other people greater insight into what they are doing, and in the process uncovers implementation and design bugs. (I can't count the number of times I discovered implementation bugs while I wrote manual pages.)

Some technology to discover API changes would certainly be helpful, but it doesn't solve the deeper problem. It needs human beings to look at this stuff.

We seem to have learned the lesson that ungoverned API design leads to chaos in the case of cgroups v1. And the approach in cgroups v2 made a critical change: there are actually individuals with overall design responsibility for the interface. I think that solution could be applied more generally. Have a *paid* user-space ABI maintainer whose job is to track ABI changes, make sure that someone has documented them sufficiently well, and that thorough tests have been written, before the interface can be released in the mainline kernel. Ideally, documented real-world use cases, along with real-world applications using the new interface would also be part of the package required for acceptance of a new interface.[*]

Cheers,

Michael

[*] Note, by the way, that I'm not proposing that the ABI maintainer should write the docs or tests (although they might on occasion), just that they act as the gatekeeper to make sure that someone has done those tasks to a sufficient standard.

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
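Michael's ABI gatekeeper role boils down to a checklist that must be satisfied before an interface reaches mainline. The sketch below models it in Python; `InterfaceProposal`, its fields and `missing_requirements` are illustrative names, and splitting documentation and tests (required) from use cases and real applications (asked for "ideally", checked only with `strict=True`) is this sketch's reading of his message, not a rule he stated.

```python
from dataclasses import dataclass, field


@dataclass
class InterfaceProposal:
    """A new user-space interface waiting for the gatekeeper's sign-off."""
    name: str
    documented: bool = False  # e.g. a man page written to a sufficient standard
    tested: bool = False      # thorough tests exist
    use_cases: list = field(default_factory=list)     # documented real-world use cases
    applications: list = field(default_factory=list)  # real applications using it


def missing_requirements(proposal, strict=False):
    """List what still blocks release in mainline; an empty list means ready."""
    missing = []
    if not proposal.documented:
        missing.append("documentation")
    if not proposal.tested:
        missing.append("tests")
    if strict:
        if not proposal.use_cases:
            missing.append("real-world use cases")
        if not proposal.applications:
            missing.append("real-world application")
    return missing
```

As his footnote stresses, the gatekeeper only checks that someone has done these tasks; nothing here says who writes the documentation or the tests.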
Thread overview: 37+ messages
2017-08-04 1:16 [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates? Andy Lutomirski
2017-08-04 1:30 Greg KH
2017-08-04 4:15 Andy Lutomirski
2017-08-04 5:08 Sergey Senozhatsky
2017-08-04 8:23 Daniel Vetter
2017-08-04 2:26 Theodore Ts'o
2017-08-04 3:27 Stephen Rothwell
2017-08-04 5:13 Julia Lawall
2017-08-04 14:20 Theodore Ts'o
2017-08-04 15:47 Julia Lawall
2017-08-04 8:42 Jiri Kosina
2017-08-04 8:53 Hannes Reinecke
2017-08-04 16:04 Greg KH
2017-08-04 17:14 Theodore Ts'o
2017-08-04 17:53 Greg KH
2017-08-04 22:52 Joe Perches
2017-08-09 20:06 Geert Uytterhoeven
2017-08-14 19:49 Steven Rostedt
2017-08-14 19:51 Linus Torvalds
2017-08-15 7:13 Julia Lawall
2017-08-04 8:57 Julia Lawall
2017-08-04 11:27 Michael Kerrisk (man-pages)
2017-08-09 0:00 NeilBrown
2017-08-09 11:54 Laurent Pinchart
2017-08-14 20:07 Steven Rostedt
2017-08-09 20:21 Linus Torvalds
2017-08-11 6:21 NeilBrown
2017-08-11 6:39 Linus Torvalds
2017-08-11 8:02 NeilBrown
2017-08-11 23:10 Linus Torvalds
2017-08-14 4:19 NeilBrown
2017-08-14 18:34 Linus Torvalds
2017-08-14 18:40 Linus Torvalds
2017-08-14 23:23 Andy Lutomirski
2017-08-15 0:54 Linus Torvalds
2017-08-15 16:11 Andy Lutomirski
2017-08-15 18:26 Michael Kerrisk (man-pages)