* [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
@ 2017-08-04  1:16 Andy Lutomirski
  2017-08-04  1:30 ` Greg KH
                   ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread
From: Andy Lutomirski @ 2017-08-04  1:16 UTC (permalink / raw)
  To: ksummit-discuss

[Note: I'm not entirely sure I can make it to the kernel summit this
year, due to having a tiny person and tons of travel]

This may be highly controversial, but: there seems to be a weakness in
the kernel development model in the way that new ABI features become
stable.  The current model is, roughly:

1. Someone writes the code.  Maybe they cc linux-api, maybe they don't.
2. People hopefully review the code.
3. A subsystem maintainer merges the code.  They hope the ABI is right.
4. Linus gets a pull request.  Linus probably doesn't review the ABI
for sanity, style, blatant bugs, etc.  If Linus did, then he'd never
get anything else done.
5. The new ABI lands in -rc1.
6. If someone finds a problem or objects, it had better get fixed
before the next real release.

There's a few problems here.  One is that the people who would really
review the ABI might not even notice until step 5 or 6 or so.  Another
is that it takes some time for userspace to get experience with a new
ABI.

I'm wondering if there are other models that could work.  I think it
would be nice for us to be able to land a kernel in Linus tree and
still wait a while before stabilizing it.  Rust, for example, has a
strict policy for this that seems to work quite well.

Maybe we could pull something off where big new features hide behind a
named feature gate for a while.  That feature gate can only be enabled
under some circumstances that make it very hard to mistake it for true
stability.  (For example, maybe you *can't* enable feature gates on a
final kernel unless you manually patch something.)
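
To make the idea concrete, here is a toy model of the policy in Python.
It is only a sketch, not kernel code: the gate names, the rule that any
version without an -rcN suffix counts as final, and the locally_patched
flag are all assumptions made up for illustration.

```python
import re

# Hypothetical gate names; nothing like this exists in the kernel today.
UNSTABLE_GATES = {"new_socket_type", "tmpfile_v2"}


def is_final_release(version: str) -> bool:
    """Assumption: a version without an -rcN suffix is a final release."""
    return re.search(r"-rc\d+$", version) is None


def gate_enabled(gate: str, version: str, requested: bool,
                 locally_patched: bool = False) -> bool:
    """Decide whether an unstable feature gate may be switched on."""
    if gate not in UNSTABLE_GATES:
        raise KeyError(f"unknown feature gate: {gate}")
    if not requested:
        return False
    # On a final kernel the gate stays off unless the source was patched.
    if is_final_release(version):
        return locally_patched
    return True
```

Under these assumptions a gate requested on 4.13-rc3 is honoured, while
the same request on 4.13 is refused unless the tree was patched by hand.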

Here are a few examples that come to mind for where this would have helped:

 - Whatever that new RDMA socket type was that was deemed totally
broken but only just after it hit a real release.
 - O_TMPFILE.  I discovered that it corrupted filesystems in -rc6 or
-rc7.  That got fixed, but the API is still a steaming pile of crap.
 - Some cgroup+bpf stuff that got cleaned up in a -rc7 or so a few releases ago.

I'm sure there are tons more.

Is this too crazy, or is it worth discussing?


* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
  2017-08-04  1:16 [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates? Andy Lutomirski
@ 2017-08-04  1:30 ` Greg KH
  2017-08-04  4:15   ` Andy Lutomirski
                     ` (2 more replies)
  2017-08-04  2:26 ` Theodore Ts'o
  2017-08-09  0:00 ` NeilBrown
  2 siblings, 3 replies; 37+ messages in thread
From: Greg KH @ 2017-08-04  1:30 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: ksummit-discuss

On Thu, Aug 03, 2017 at 06:16:44PM -0700, Andy Lutomirski wrote:
> [Note: I'm not entirely sure I can make it to the kernel summit this
> year, due to having a tiny person and tons of travel]
> 
> This may be highly controversial, but: there seems to be a weakness in
> the kernel development model in the way that new ABI features become
> stable.  The current model is, roughly:
> 
> 1. Someone writes the code.  Maybe they cc linux-api, maybe they don't.
> 2. People hopefully review the code.
> 3. A subsystem maintainer merges the code.  They hope the ABI is right.
> 4. Linus gets a pull request.  Linus probably doesn't review the ABI
> for sanity, style, blatant bugs, etc.  If Linus did, then he'd never
> get anything else done.
> 5. The new ABI lands in -rc1.
> 6. If someone finds a problem or objects, it had better get fixed
> before the next real release.
> 
> There's a few problems here.  One is that the people who would really
> review the ABI might not even notice until step 5 or 6 or so.  Another
> is that it takes some time for userspace to get experience with a new
> ABI.
> 
> I'm wondering if there are other models that could work.  I think it
> would be nice for us to be able to land a kernel in Linus tree and
> still wait a while before stabilizing it.  Rust, for example, has a
> strict policy for this that seems to work quite well.

What does Rust do here?

> Maybe we could pull something off where big new features hide behind a
> named feature gate for a while.  That feature gate can only be enabled
> under some circumstances that make it very hard to mistake it for true
> stability.  (For example, maybe you *can't* enable feature gates on a
> final kernel unless you manually patch something.)
> 
> Here are a few examples that come to mind for where this would have helped:
> 
>  - Whatever that new RDMA socket type was that was deemed totally
> broken but only just after it hit a real release.
>  - O_TMPFILE.  I discovered that it corrupted filesystems in -rc6 or
> -rc7.  That got fixed, but the API is still a steaming pile of crap.
>  - Some cgroup+bpf stuff that got cleaned up in a -rc7 or so a few releases ago.
> 
> I'm sure there are tons more.
> 
> Is this too crazy, or is it worth discussing?

I think it is, it keeps coming up over and over and it's not getting any
easier.  We are long past the time when we only had to duplicate what
other operating systems do, adding new features is much different.

I like the "manually patch" thing as a good idea for how to maybe do
this, but who is going to do that patching for testing?  What's the rule
for how much time has to pass before it can be enabled?

thanks,

greg k-h


* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
  2017-08-04  1:16 [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates? Andy Lutomirski
  2017-08-04  1:30 ` Greg KH
@ 2017-08-04  2:26 ` Theodore Ts'o
  2017-08-04  3:27   ` Stephen Rothwell
  2017-08-04  8:42   ` Jiri Kosina
  2017-08-09  0:00 ` NeilBrown
  2 siblings, 2 replies; 37+ messages in thread
From: Theodore Ts'o @ 2017-08-04  2:26 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: ksummit-discuss

On Thu, Aug 03, 2017 at 06:16:44PM -0700, Andy Lutomirski wrote:
> Maybe we could pull something off where big new features hide behind a
> named feature gate for a while.  That feature gate can only be enabled
> under some circumstances that make it very hard to mistake it for true
> stability.  (For example, maybe you *can't* enable feature gates on a
> final kernel unless you manually patch something.)

I think the problem you've pointed out is a real and vexing one, and
it's good we talk about possible solutions.

The challenge with using a feature gate is that the examples you gave
weren't the simple, obvious cases of a new syscall, but flags to a
system call (O_TMPFILE) or a new socket type (e.g., for RDMA).  So
that implies that adding a feature gate is going to require making code
changes to enforce the ability to turn off the feature.  Even if it's
done using CONFIG_* #ifdefs and Kconfig options, it's going to require
more effort than cc'ing the patch to linux-api@vger.kernel.org.

And if the problem is that people aren't bothering to remember to cc
linux-api@, I'm not sure it's realistic to think they will implement a
feature gate.

Furthermore, problems with an API design or implementation tend to get
noticed either (a) when someone else does a code audit and tries to
define exactly what the semantics will be for the new flag or syscall
or socket type, or (b) when someone tries using it and discovers
problems (usually not in the common path, since presumably the
developer tested that bit of it, but in the error handling).  The
feature gate won't necessarily help with (a), except that it gives
people a bit more time to notice that the new feature went in, and it
probably actively makes (b) worse, since if it is under a feature
gate, fewer people are likely to experiment with the new feature.

One way that we could try to make things better is by having some kind
of semi-automated system which monitors changes in include/uapi/*.h in
linux-next.  Unfortunately there will be a lot of false positives, so
it's going to require a human to figure out which of the changes
represent new/changed APIs, and which are just cleanups /
rearrangements.  (We could try to see if we could train a Machine
Learning model --- but even if we can make some neural nets play Go,
I'm personally dubious this is something that ML would be successful
at.  I might be pleasantly surprised, though, if someone wanted to
give it a try.  :-)
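
A minimal sketch of the mechanical half of such a monitor, in Python: it
takes unified diff text (for example from "git diff" between two
linux-next tags) and lists the macros added to headers under
include/uapi/.  The function name and the #define-only heuristic are
assumptions for illustration; a macro that was merely moved would still
be reported, which is exactly where the human comes in.

```python
import re
from typing import Dict, List

# Matches an added "#define NAME" line in a unified diff (assumed heuristic).
ADDED_DEFINE = re.compile(r"^\+\s*#\s*define\s+([A-Za-z_]\w*)")


def added_uapi_defines(diff_text: str) -> Dict[str, List[str]]:
    """Map each include/uapi header in a unified diff to the macros it adds."""
    result: Dict[str, List[str]] = {}
    current = None
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            # New file section: remember the path only if it is a uapi header.
            path = line[4:].strip()
            if path.startswith("b/"):
                path = path[2:]
            current = path if path.startswith("include/uapi/") else None
        elif current:
            match = ADDED_DEFINE.match(line)
            if match:
                result.setdefault(current, []).append(match.group(1))
    return result
```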

				- Ted


* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
  2017-08-04  2:26 ` Theodore Ts'o
@ 2017-08-04  3:27   ` Stephen Rothwell
  2017-08-04  5:13     ` Julia Lawall
  2017-08-04  8:42   ` Jiri Kosina
  1 sibling, 1 reply; 37+ messages in thread
From: Stephen Rothwell @ 2017-08-04  3:27 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: ksummit-discuss, Andy Lutomirski

Hi Ted,

On Thu, 3 Aug 2017 22:26:39 -0400 Theodore Ts'o <tytso@mit.edu> wrote:
>
> One way that we could try to make things better is by having some kind
> of semi-automated system which monitors changes in include/uapi/*.h in
> linux-next.  Unfortunately there will be a lot of false positives, so
> it's going to require a human to figure out which of the changes
> represent new/changed APIs, and which are just cleanups /
> rearrangements.  (We could try to see if we could train a Machine
> Learning model --- but even if we can make some neural nets play Go,
> I'm personally dubious this is something that ML would be successful
> at.  I might be pleasantly surprised, though, if someone wanted to
> give it a try.  :-)

OK, so this is what we have so far in linux-next:

$ git log --no-merges --pretty="%h %s" stable..next-20170803 include/uapi
9aa302aaede8 membarrier: Expedited private command
b3a88f222ae1 drm/msm: Add a parameter query for the number of ringbuffers
7f1779fd071b drm/msm: Implement per-submitqueue fences
20ab08455d3c drm/msm: Add per-instance submit queues
80fe969b5d50 uapi: fix linux/sysctl.h userspace compilation errors
4edc19eeeee3 mm: userfaultfd: add feature to request for a signal delivery
0d33bdf5ea6c mm: shm: use new hugetlb size encoding definitions
b217edddf5c2 mm: arch: consolidate mmap hugetlb size encodings
21d75dd8e83d mm: hugetlb: define system call hugetlb size encodings in single file
cdbc78ba7026 drm/msm: Remove __user from __u64 data types
db1689aa61bd drm: Create a format/modifier blob
e6fc3b68558e drm: Plumb modifiers through plane init
bb7c19f96012 tcp: add related fields into SCM_TIMESTAMPING_OPT_STATS
3282e65558b3 tcp: remove unused mib counters
615095752100 netfilter: nf_tables: Allow object names of up to 255 chars
387454901bd6 netfilter: nf_tables: Allow set names of up to 255 chars
b7263e071aba netfilter: nf_tables: Allow chain name of up to 255 chars
e46abbcc05aa netfilter: nf_tables: Allow table names of up to 255 chars
e62e484df049 net sched actions: add time filter for action dumping
90825b23a887 net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
64c83d837329 net netlink: Add new type NLA_BITFIELD32
1a5f3da20bd9 net: ethtool: add support for forward error correction modes
ca1136c99b66 blktrace: export cgroup info in trace
f30994622b2b drm/vc4: Add an ioctl for labeling GEM BOs for summary stats
67cbe3532c2c RDMA/qedr: notify user application of supported WIDs
ad84dad2160d RDMA/qedr: notify user application if DPM is supported
cc731525f26a signal: Remove kernel interal si_code magic
d08477aa975e fcntl: Don't use ambiguous SIG_POLL si_codes
3078f5f1bd8b IB/mlx4: Add support for RSS QP
b8d46ca03506 IB/mlx4: Add support for WQ indirection table related verbs
400b1ebcfe31 IB/mlx4: Add support for WQ related verbs
ea30b966f7dd IB/mlx4: Add inline-receive support
2dee0e545894 IB/uverbs: Enable QP creation with a given source QP number
784b4e612d42 netfilter: nf_tables: Attach process info to NFT_MSG_NEWGEN notifications
ddc6c70f07bb rxrpc: Move the packet.h include file into net/rxrpc/
727f8914477e rxrpc: Expose UAPI definitions to userspace
eb0baf8a0d92 perf/core: Define the common branch type classification
6b2bbb08747a media: cec: rework the cec event handling
6303d97873d3 media: linux/cec.h: add pin monitoring API support
fc60a8b675bd tty: serial: owl: Implement console driver
97f91a7cf04f bpf: add bpf_redirect_map helper routine
546ac1ffb70d bpf: add devmap, a map for storing net device references
814abfabef3c xdp: add bpf_redirect helper function
66bf97967726 annotate RWF_... flags
6545135a5ed2 drm/qxl: fix __user annotations

Some of these are just fixing bugs or moving things around.  I wonder
where we can go from here?  Does every change to a uapi file need a
corresponding documentation and man pages update?  Do we dare try doing
(shock! horror!) design before implementation?  ;-)
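
As a rough first pass over a list like the one above, the subject lines
can be sorted by keyword.  This Python sketch is only a heuristic: the
keyword sets and the three labels are assumptions, and a "cleanup?" can
still change the ABI (removing exported counters, for instance), so
every line needs a human look in the end.

```python
import re
from typing import Tuple

# Assumed keyword lists; tune them against real linux-next history.
LIKELY_NEW = re.compile(
    r"\b(add|implement|allow|expose|create|enable|support)\b", re.I)
LIKELY_CLEANUP = re.compile(
    r"\b(fix|remove|move|annotate|consolidate|rework)\b", re.I)


def triage(log_line: str) -> Tuple[str, str, str]:
    """Split a '<hash> <subject>' line and attach a coarse label."""
    commit, _, subject = log_line.strip().partition(" ")
    if LIKELY_CLEANUP.search(subject):
        label = "cleanup?"
    elif LIKELY_NEW.search(subject):
        label = "new-abi?"
    else:
        label = "unclear"
    return commit, label, subject
```

Subjects that match neither set, such as "membarrier: Expedited private
command", come out as "unclear" and go straight to a reviewer.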

Then, of course, there is the whole sysfs and tracing mess :-(
-- 
Cheers,
Stephen Rothwell


* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
  2017-08-04  1:30 ` Greg KH
@ 2017-08-04  4:15   ` Andy Lutomirski
  2017-08-04  5:08   ` Sergey Senozhatsky
  2017-08-04  8:23   ` Daniel Vetter
  2 siblings, 0 replies; 37+ messages in thread
From: Andy Lutomirski @ 2017-08-04  4:15 UTC (permalink / raw)
  To: Greg KH; +Cc: ksummit-discuss, Andy Lutomirski

On Thu, Aug 3, 2017 at 6:30 PM, Greg KH <greg@kroah.com> wrote:
> On Thu, Aug 03, 2017 at 06:16:44PM -0700, Andy Lutomirski wrote:
>> I'm wondering if there are other models that could work.  I think it
>> would be nice for us to be able to land a kernel in Linus tree and
>> still wait a while before stabilizing it.  Rust, for example, has a
>> strict policy for this that seems to work quite well.
>
> What does Rust do here?
>

Rust has named unstable features.  In order to use them, you need to
declare, with a special annotation, your desire to use the named
feature.  You also need to be running a nightly build of Rust --
stable versions will refuse to compile code that requests unstable
features.


* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
  2017-08-04  1:30 ` Greg KH
  2017-08-04  4:15   ` Andy Lutomirski
@ 2017-08-04  5:08   ` Sergey Senozhatsky
  2017-08-04  8:23   ` Daniel Vetter
  2 siblings, 0 replies; 37+ messages in thread
From: Sergey Senozhatsky @ 2017-08-04  5:08 UTC (permalink / raw)
  To: Greg KH; +Cc: ksummit-discuss, Andy Lutomirski

On (08/03/17 18:30), Greg KH wrote:
[..]
> > There's a few problems here.  One is that the people who would really
> > review the ABI might not even notice until step 5 or 6 or so.  Another
> > is that it takes some time for userspace to get experience with a new
> > ABI.
> > 
> > I'm wondering if there are other models that could work.  I think it
> > would be nice for us to be able to land a kernel in Linus tree and
> > still wait a while before stabilizing it.  Rust, for example, has a
> > strict policy for this that seems to work quite well.
> 
> What does Rust do here?

I think Andy meant how Rust tags all of its features:

...

    #[stable(feature = "rust1", since = "1.0.0")]
    fn finish(&self) -> u64;

    /// Writes some data into this `Hasher`.
    ///
    /// # Examples
    ///
    /// ```
    /// use std::collections::hash_map::DefaultHasher;
    /// use std::hash::Hasher;
    ///
    /// let mut hasher = DefaultHasher::new();
    /// let data = [0x01, 0x23, 0x45, 0x67, 0x89, 0xab, 0xcd, 0xef];
    ///
    /// hasher.write(&data);
    ///
    /// println!("Hash is {:x}!", hasher.finish());
    /// ```
    #[stable(feature = "rust1", since = "1.0.0")]
    fn write(&mut self, bytes: &[u8]);

    /// Writes a single `u8` into this hasher.
    #[inline]
    #[stable(feature = "hasher_write", since = "1.3.0")]
    fn write_u8(&mut self, i: u8) {
        self.write(&[i])
    }
    /// Writes a single `u16` into this hasher.
    #[inline]
    #[stable(feature = "hasher_write", since = "1.3.0")]
    fn write_u16(&mut self, i: u16) {
        self.write(&unsafe { mem::transmute::<_, [u8; 2]>(i) })
    }


   #[unstable(feature = "sip_hash_13", issue = "34767")]
   #[allow(deprecated)]
   pub use self::sip::{SipHasher13, SipHasher24};

...

	-ss


* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
  2017-08-04  3:27   ` Stephen Rothwell
@ 2017-08-04  5:13     ` Julia Lawall
  2017-08-04 14:20       ` Theodore Ts'o
  0 siblings, 1 reply; 37+ messages in thread
From: Julia Lawall @ 2017-08-04  5:13 UTC (permalink / raw)
  To: Stephen Rothwell; +Cc: Andy Lutomirski, ksummit-discuss



On Fri, 4 Aug 2017, Stephen Rothwell wrote:

> Hi Ted,
>
> On Thu, 3 Aug 2017 22:26:39 -0400 Theodore Ts'o <tytso@mit.edu> wrote:
> >
> > One way that we could try to make things better is by having some kind
> > of semi-automated system which monitors changes in include/uapi/*.h in
> > linux-next.  Unfortunately there will be a lot of false positives, so
> > it's going to require a human to figure out which of the changes
> > represent new/changed APIs, and which are just cleanups /
> > rearrangements.  (We could try to see if we could train a Machine
> > Learning model --- but even if we can make some neural nets play Go,
> > I'm personally dubious this is something that ML would be successful
> > at.  I might be pleasantly surprised, though, if someone wanted to
> > give it a try.  :-)
>
> OK, so this is what we have so far in linux-next:

I did some work on a semantic patch for collecting the error codes
returned by all of the system calls.  Things were going fairly well until
I discovered that it is fairly common near the user level to return error
codes in reference parameters rather than by direct returns, and that
meant that I was going to have to duplicate my entire rule set.  I also
observed that the documentation is not always that precise.  It will say
typically returns -E1, -E2, -E3, and may return other stuff, so in that
case there is less to check.
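
To illustrate why the two cases need separate rules, here is a small
Python sketch using plain regular expressions.  It is not a semantic
patch and is far cruder than Coccinelle; the two patterns and the
function name are assumptions for illustration only.

```python
import re

# Direct returns such as "return -EINVAL;".
DIRECT_RETURN = re.compile(r"\breturn\s+-\s*(E[A-Z0-9]+)\s*;")
# Stores through a pointer parameter such as "*err = -EFAULT;".
REFERENCE_STORE = re.compile(r"\*\s*\w+\s*=\s*-\s*(E[A-Z0-9]+)\s*;")


def collect_error_codes(c_source: str):
    """Return (codes returned directly, codes stored through a pointer)."""
    direct = set(DIRECT_RETURN.findall(c_source))
    by_reference = set(REFERENCE_STORE.findall(c_source))
    return direct, by_reference
```

A checker that only knows the first pattern would silently miss every
code in the second set.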

I was thinking of getting an intern to address the first problem, and then
to check the results to the extent that they are checkable.  One could
also run the rules over time, to see if there have been changes, to see
even in the other stuff case if something should be added.

julia

>
> $ git log --no-merges --pretty="%h %s" stable..next-20170803 include/uapi
> 9aa302aaede8 membarrier: Expedited private command
> b3a88f222ae1 drm/msm: Add a parameter query for the number of ringbuffers
> 7f1779fd071b drm/msm: Implement per-submitqueue fences
> 20ab08455d3c drm/msm: Add per-instance submit queues
> 80fe969b5d50 uapi: fix linux/sysctl.h userspace compilation errors
> 4edc19eeeee3 mm: userfaultfd: add feature to request for a signal delivery
> 0d33bdf5ea6c mm: shm: use new hugetlb size encoding definitions
> b217edddf5c2 mm: arch: consolidate mmap hugetlb size encodings
> 21d75dd8e83d mm: hugetlb: define system call hugetlb size encodings in single file
> cdbc78ba7026 drm/msm: Remove __user from __u64 data types
> db1689aa61bd drm: Create a format/modifier blob
> e6fc3b68558e drm: Plumb modifiers through plane init
> bb7c19f96012 tcp: add related fields into SCM_TIMESTAMPING_OPT_STATS
> 3282e65558b3 tcp: remove unused mib counters
> 615095752100 netfilter: nf_tables: Allow object names of up to 255 chars
> 387454901bd6 netfilter: nf_tables: Allow set names of up to 255 chars
> b7263e071aba netfilter: nf_tables: Allow chain name of up to 255 chars
> e46abbcc05aa netfilter: nf_tables: Allow table names of up to 255 chars
> e62e484df049 net sched actions: add time filter for action dumping
> 90825b23a887 net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
> 64c83d837329 net netlink: Add new type NLA_BITFIELD32
> 1a5f3da20bd9 net: ethtool: add support for forward error correction modes
> ca1136c99b66 blktrace: export cgroup info in trace
> f30994622b2b drm/vc4: Add an ioctl for labeling GEM BOs for summary stats
> 67cbe3532c2c RDMA/qedr: notify user application of supported WIDs
> ad84dad2160d RDMA/qedr: notify user application if DPM is supported
> cc731525f26a signal: Remove kernel interal si_code magic
> d08477aa975e fcntl: Don't use ambiguous SIG_POLL si_codes
> 3078f5f1bd8b IB/mlx4: Add support for RSS QP
> b8d46ca03506 IB/mlx4: Add support for WQ indirection table related verbs
> 400b1ebcfe31 IB/mlx4: Add support for WQ related verbs
> ea30b966f7dd IB/mlx4: Add inline-receive support
> 2dee0e545894 IB/uverbs: Enable QP creation with a given source QP number
> 784b4e612d42 netfilter: nf_tables: Attach process info to NFT_MSG_NEWGEN notifications
> ddc6c70f07bb rxrpc: Move the packet.h include file into net/rxrpc/
> 727f8914477e rxrpc: Expose UAPI definitions to userspace
> eb0baf8a0d92 perf/core: Define the common branch type classification
> 6b2bbb08747a media: cec: rework the cec event handling
> 6303d97873d3 media: linux/cec.h: add pin monitoring API support
> fc60a8b675bd tty: serial: owl: Implement console driver
> 97f91a7cf04f bpf: add bpf_redirect_map helper routine
> 546ac1ffb70d bpf: add devmap, a map for storing net device references
> 814abfabef3c xdp: add bpf_redirect helper function
> 66bf97967726 annotate RWF_... flags
> 6545135a5ed2 drm/qxl: fix __user annotations
>
> Some of these are just fixing bugs or moving things around.  I wonder
> where we can go from here?  Does every change to a uapi file need a
> corresponding documentation and man pages update?  Do we dare try doing
> (shock! horror!) design before implementation?  ;-)
>
> Then, of course, there is the whole sysfs and tracing mess :-(
> --
> Cheers,
> Stephen Rothwell


* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
  2017-08-04  1:30 ` Greg KH
  2017-08-04  4:15   ` Andy Lutomirski
  2017-08-04  5:08   ` Sergey Senozhatsky
@ 2017-08-04  8:23   ` Daniel Vetter
  2 siblings, 0 replies; 37+ messages in thread
From: Daniel Vetter @ 2017-08-04  8:23 UTC (permalink / raw)
  To: Greg KH; +Cc: ksummit-discuss, Andy Lutomirski

On Fri, Aug 4, 2017 at 3:30 AM, Greg KH <greg@kroah.com> wrote:
> On Thu, Aug 03, 2017 at 06:16:44PM -0700, Andy Lutomirski wrote:
>> [Note: I'm not entirely sure I can make it to the kernel summit this
>> year, due to having a tiny person and tons of travel]
>>
>> This may be highly controversial, but: there seems to be a weakness in
>> the kernel development model in the way that new ABI features become
>> stable.  The current model is, roughly:
>>
>> 1. Someone writes the code.  Maybe they cc linux-api, maybe they don't.
>> 2. People hopefully review the code.
>> 3. A subsystem maintainer merges the code.  They hope the ABI is right.
>> 4. Linus gets a pull request.  Linus probably doesn't review the ABI
>> for sanity, style, blatant bugs, etc.  If Linus did, then he'd never
>> get anything else done.
>> 5. The new ABI lands in -rc1.
>> 6. If someone finds a problem or objects, it had better get fixed
>> before the next real release.
>>
>> There's a few problems here.  One is that the people who would really
>> review the ABI might not even notice until step 5 or 6 or so.  Another
>> is that it takes some time for userspace to get experience with a new
>> ABI.
>>
>> I'm wondering if there are other models that could work.  I think it
>> would be nice for us to be able to land a kernel in Linus tree and
>> still wait a while before stabilizing it.  Rust, for example, has a
>> strict policy for this that seems to work quite well.
>
> What does Rust do here?
>
>> Maybe we could pull something off where big new features hide behind a
>> named feature gate for a while.  That feature gate can only be enabled
>> under some circumstances that make it very hard to mistake it for true
>> stability.  (For example, maybe you *can't* enable feature gates on a
>> final kernel unless you manually patch something.)
>>
>> Here are a few examples that come to mind for where this would have helped:
>>
>>  - Whatever that new RDMA socket type was that was deemed totally
>> broken but only just after it hit a real release.
>>  - O_TMPFILE.  I discovered that it corrupted filesystems in -rc6 or
>> -rc7.  That got fixed, but the API is still a steaming pile of crap.
>>  - Some cgroup+bpf stuff that got cleaned up in a -rc7 or so a few releases ago.
>>
>> I'm sure there are tons more.
>>
>> Is this too crazy, or is it worth discussing?
>
> I think it is, it keeps coming up over and over and it's not getting any
> easier.  We are long past the time when we only had to duplicate what
> other operating systems do, adding new features is much different.
>
> I like the "manually patch" thing as a good idea for how to maybe do
> this, but who is going to do that patching for testing?  What's the rule
> for how much time has to pass before it can be enabled?

Imo the real fix is to crank up requirements. What we have right now
for gpu drivers is:
- Kernel patch, fully reviewed.
- Full docs for the internals for the same.
- Testcases for the same.
- The entire userspace stack using that feature. Not some tech demo,
prototype or testcase, but the real deal. Also no vendor forks. In
extreme cases this means patches to 3+ projects (other than the
kernel) to wire the entire thing through the desktop stack.
- Testcases for all those userspace bits.
- Full review on all those userspace bits. This generally means kernel
hackers review the userspace and userspace people the kernel bits too.
- Not quite there yet, but some starts on real docs for the uapi.
- Also not quite where it needs to be yet, but full CI over everything
(at least for the i915 driver, which is still the team that pushes
for most of the generic drm uapi additions).

Some drivers chicken a bit on some of these pieces (less docs/review),
but for the big ones and all the core bits, that's what we do. That's
some steep requirements, and it makes adding new uapi a fairly big
pain, but as a result we generally ship it as soon as it hits
linux-next, with very few regrets in hindsight. Ship as in, it e.g.
hits a stable Mesa release which distros often ship in their stable
release way before the much slower kernel release process has gone
through its paces.

And yes, each of the rules we have for new uapi is because there's at
least one case where we did regret not doing that :-)
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
  2017-08-04  2:26 ` Theodore Ts'o
  2017-08-04  3:27   ` Stephen Rothwell
@ 2017-08-04  8:42   ` Jiri Kosina
  2017-08-04  8:53     ` Hannes Reinecke
  2017-08-04  8:57     ` Julia Lawall
  1 sibling, 2 replies; 37+ messages in thread
From: Jiri Kosina @ 2017-08-04  8:42 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: ksummit-discuss, Andy Lutomirski

On Thu, 3 Aug 2017, Theodore Ts'o wrote:

> One way that we could try to make things better is by having some kind
> of semi-automated system which monitors changes in include/uapi/*.h in
> linux-next.  

It's unfortunately just uapi though, and for sysfs it's a bit more 
difficult to define a pathname pattern to watch for.

-- 
Jiri Kosina
SUSE Labs


* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
  2017-08-04  8:42   ` Jiri Kosina
@ 2017-08-04  8:53     ` Hannes Reinecke
  2017-08-04 16:04       ` Greg KH
  2017-08-04  8:57     ` Julia Lawall
  1 sibling, 1 reply; 37+ messages in thread
From: Hannes Reinecke @ 2017-08-04  8:53 UTC (permalink / raw)
  To: ksummit-discuss

On 08/04/2017 10:42 AM, Jiri Kosina wrote:
> On Thu, 3 Aug 2017, Theodore Ts'o wrote:
> 
>> One way that we could try to make things better is by having some kind
>> of semi-automated system which monitors changes in include/uapi/*.h in
>> linux-next.  
> 
> It's unfortunately just uapi though, and for sysfs it's a bit more 
> difficult to define a pathname pattern to watch for.
> 
Yeah; that has been my main headache with the kABI stuff.
Nowadays sysfs is considered part of the kABI, but we have no way of
tracking it; we basically rely on people filling out some off-side
documentation, and hope they're not missing anything.
And we don't mess up when generating patches :-)

That, and the infamous 'internal symbol' discussion.
(Meaning that we only can declare symbols as exported, even though they
really should only be visible to that driver, not anything else)
Which leads to tons of false positives, and discussions about why this
really is meant to be an internal symbol.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		               zSeries & Storage
hare@suse.com			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
  2017-08-04  8:42   ` Jiri Kosina
  2017-08-04  8:53     ` Hannes Reinecke
@ 2017-08-04  8:57     ` Julia Lawall
  2017-08-04 11:27       ` Michael Kerrisk (man-pages)
  1 sibling, 1 reply; 37+ messages in thread
From: Julia Lawall @ 2017-08-04  8:57 UTC (permalink / raw)
  To: Jiri Kosina; +Cc: Andy Lutomirski, ksummit-discuss



On Fri, 4 Aug 2017, Jiri Kosina wrote:

> On Thu, 3 Aug 2017, Theodore Ts'o wrote:
>
> > One way that we could try to make things better is by having some kind
> > of semi-automated system which monitors changes in include/uapi/*.h in
> > linux-next.
>
> It's unfortunately just uapi though, and for sysfs it's a bit more
> difficult to define a pathname pattern to watch for.

I think that Michael Kerrisk has a big list of regexps in this direction.

julia


* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
  2017-08-04  8:57     ` Julia Lawall
@ 2017-08-04 11:27       ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 37+ messages in thread
From: Michael Kerrisk (man-pages) @ 2017-08-04 11:27 UTC (permalink / raw)
  To: Julia Lawall; +Cc: ksummit-discuss, Andy Lutomirski

On 4 August 2017 at 10:57, Julia Lawall <julia.lawall@lip6.fr> wrote:
>
>
> On Fri, 4 Aug 2017, Jiri Kosina wrote:
>
>> On Thu, 3 Aug 2017, Theodore Ts'o wrote:
>>
>> > One way that we could try to make things better is by having some kind
>> > of semi-automated system which monitors changes in include/uapi/*.h in
>> > linux-next.
>>
>> It's unfortunately just uapi though, and for sysfs it's a bit more
>> difficult to define a pathname pattern to watch for.
>
> I think that Michael Kerrisk has a big list of regexps in this direction.

The list I have is not very big, and is rather unsophisticated (and
manually maintained). But it helps me spot some new features that I
would otherwise miss and that should be documented in the man pages.
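
For readers wondering what such a list might look like, here is a
hypothetical Python sketch.  These patterns are not the actual list,
only guesses at the kind of path that tends to signal a user-visible
change; the labels and the function name are made up for illustration.

```python
import re

# Hypothetical watch list; first matching pattern wins.
WATCH_PATTERNS = {
    "uapi header": re.compile(r"^include/uapi/.*\.h$"),
    "arch uapi header": re.compile(r"^arch/[^/]+/include/uapi/.*\.h$"),
    "sysfs/ABI docs": re.compile(r"^Documentation/ABI/"),
    "syscall table": re.compile(r"syscall.*\.tbl$"),
}


def flag_paths(changed_paths):
    """Return (path, label) pairs for changed paths worth a manual look."""
    hits = []
    for path in changed_paths:
        for label, pattern in WATCH_PATTERNS.items():
            if pattern.search(path):
                hits.append((path, label))
                break
    return hits
```

The input could be the output of "git diff --name-only" between two
releases; anything the list does not match is simply not reported, so
new sysfs files added without a Documentation/ABI entry slip through.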

Cheers,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
  2017-08-04  5:13     ` Julia Lawall
@ 2017-08-04 14:20       ` Theodore Ts'o
  2017-08-04 15:47         ` Julia Lawall
  0 siblings, 1 reply; 37+ messages in thread
From: Theodore Ts'o @ 2017-08-04 14:20 UTC (permalink / raw)
  To: Julia Lawall; +Cc: Andy Lutomirski, ksummit-discuss

On Fri, Aug 04, 2017 at 07:13:10AM +0200, Julia Lawall wrote:
> I did some work on a semantic patch for collecting the error codes
> returned by all of the system calls.  Things were going fairly well until
> I discovered that it is fairly common near the user level to return error
> codes in reference parameters rather than by direct returns, and that
> meant that I was going to have to duplicate my entire rule set.  I also
> observed that the documentation is not always that precise.  It will say
> typically returns -E1, -E2, -E3, and may return other stuff, so in that
> case there is less to check.

Yeah, including potential new error returns as "changes to the
ABI/API" is probably simply not practical.  Adding a return for, say,
ENOMEM instead of causing a kernel oops is not something that needs to
be debated on the linux-api mailing list!

I recall, many years ago, an executive being indignant because Linux
returned a network-related errno for some syscall operation involving
a network file system, and that errno was not explicitly listed in
POSIX for a file system related syscall.  He demanded that we fix the
problem.  I had to gently point out to said gentleman (since I was
working for the Linux Foundation at the time and he worked for a
platinum sponsor :-) that POSIX, as a blanket statement, allows
conforming implementations' system calls to return additional error
codes as necessary.

I think people are much more concerned when there is a new system
call, or a new flag added to a core syscall (e.g., O_TMPFILE).  I
suspect that if we required all new device ioctls and new flags to
device ioctls to get the linux-api@ treatment, we would get mass
resistance and the workload would not be practical.  And this list
doesn't even consider new sysfs files, new tracepoints, etc., etc.

Although technically speaking these are all "APIs", I think we need to
pick our battles and start with a tractable subset of the problem...

          	 	     	   	- Ted


* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
  2017-08-04 14:20       ` Theodore Ts'o
@ 2017-08-04 15:47         ` Julia Lawall
  0 siblings, 0 replies; 37+ messages in thread
From: Julia Lawall @ 2017-08-04 15:47 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Andy Lutomirski, ksummit-discuss



On Fri, 4 Aug 2017, Theodore Ts'o wrote:

> On Fri, Aug 04, 2017 at 07:13:10AM +0200, Julia Lawall wrote:
> > I did some work on a semantic patch for collecting the error codes
> > returned by all of the system calls.  Things were going fairly well until
> > I discovered that it is fairly common near the user level to return error
> > codes in reference parameters rather than by direct returns, and that
> > meant that I was going to have to duplicate my entire rule set.  I also
> > observed that the documentation is not always that precise.  It will say
> > typically returns -E1, -E2, -E3, and may return other stuff, so in that
> > case there is less to check.
>
> Yeah, including potential new error returns as "changes to the
> ABI/API" is probably simply not practical.  Adding a return for, say,
> ENOMEM instead of causing a kernel oops is not something that needs to
> be debated on the linux-api mailing list!
>
> I recall, many years ago, an executive being indignant because Linux
> returned a network-related errno for some syscall operation involving
> a network file system, and that errno was not explicitly listed in
> POSIX for a file system related syscall.  He demanded that we fix the
> problem.  I had to gently point out to said gentleman (since I was
> working for the Linux Foundation at the time and he worked for a
> platinum sponsor :-) that POSIX, as a blanket statement, allows
> conforming implementations' system calls to return additional error
> codes as necessary.
>
> I think people are much more concerned when there is a new system
> call, or a new flag added to a core syscall (e.g., O_TMPFILE).  I
> suspect that if we required all new device ioctls and new flags to
> device ioctls to get the linux-api@ treatment, we would get mass
> resistance and the workload would not be practical.  And this list
> doesn't even consider new sysfs files, new tracepoints, etc., etc.

I guess that new system calls would be easy to recognize, if they all
start with SYSCALL_DEFINE1, etc?

New flags could be #defines that are added to a uapi .h file and that are
used in a similar way to other flags mentioned in the documentation?
So if the code already contains e.g. x & O_APPEND, and x & O_TMPFILE
appears, and the documentation mentions O_APPEND, then it should now
mention O_TMPFILE too?
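
A minimal sketch of that heuristic, assuming the input is a unified diff
(for example, output of "git diff" between two linux-next tags).  The
function names scan_diff and missing_from_docs are hypothetical, the
regexes are deliberately naive, and nothing here is an existing kernel
tool:

```python
import re

# "+++ b/path" header naming the file that the following hunks modify.
FILE_RE = re.compile(r"^\+\+\+ b/(\S+)")
# Added line opening a syscall definition; the name is the first argument.
SYSCALL_RE = re.compile(r"^\+\s*(?:COMPAT_)?SYSCALL_DEFINE[0-6]\(\s*(\w+)")
# Added "#define NAME ..." line.
DEFINE_RE = re.compile(r"^\+\s*#\s*define\s+([A-Z_][A-Z0-9_]*)\b")


def scan_diff(diff_text):
    """Return (new_syscalls, new_uapi_defines) found in a unified diff."""
    syscalls, defines = [], []
    current_file = None
    for line in diff_text.splitlines():
        m = FILE_RE.match(line)
        if m:
            current_file = m.group(1)
            continue
        m = SYSCALL_RE.match(line)
        if m:
            syscalls.append(m.group(1))
            continue
        m = DEFINE_RE.match(line)
        # Only #defines added under include/uapi/ are treated as new flags.
        if m and current_file and current_file.startswith("include/uapi/"):
            defines.append((current_file, m.group(1)))
    return syscalls, defines


def missing_from_docs(names, doc_text):
    """Return the names that never appear as a whole word in doc_text."""
    return [n for n in names
            if not re.search(r"\b" + re.escape(n) + r"\b", doc_text)]
```

The second helper covers the documentation half of the question: given
the flag names collected from the diff and the text of the relevant man
page, it lists the flags that the page never mentions.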

julia

>
> Although technically speaking these are all "APIs", I think we need to
> pick our battles and start with a tractable subset of the problem...
>
>           	 	     	   	- Ted
>


* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
  2017-08-04  8:53     ` Hannes Reinecke
@ 2017-08-04 16:04       ` Greg KH
  2017-08-04 17:14         ` Theodore Ts'o
  2017-08-14 19:49         ` Steven Rostedt
  0 siblings, 2 replies; 37+ messages in thread
From: Greg KH @ 2017-08-04 16:04 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: ksummit-discuss

On Fri, Aug 04, 2017 at 10:53:01AM +0200, Hannes Reinecke wrote:
> On 08/04/2017 10:42 AM, Jiri Kosina wrote:
> > On Thu, 3 Aug 2017, Theodore Ts'o wrote:
> > 
> >> One way that we could try to make things better is by having some kind
> >> of semi-automated system which monitors changes in include/uapi/*.h in
> >> linux-next.  
> > 
> > It's unfortunately just uapi though, and for sysfs it's a bit more 
> > difficult to define a pathname pattern to watch for.
> > 
> Yeah; that has been my main headache with the kABI stuff.
> Nowadays sysfs is considered part of the kABI, but we have no way of
> tracking it; we basically rely on people filling out some off-side
> documentation, and hope they're not missing anything.
> And we don't mess up when generating patches :-)

We could start searching linux-next for new additions of sysfs files
(search for the ATTR macros), and complain that there are no matching
Documentation/ABI/ updates at the same time.  I try to do that when
reviewing patches that come through my trees, but yes, this is hard to
keep up to date with.

Sounds like a good GSoC project though, setting up the infrastructure to
do this in a semi-automated fashion.
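
A rough sketch of what such a check could look like, assuming a unified
diff as input.  The function name check_sysfs_docs is made up for
illustration, and the regex only approximates "the ATTR macros"
(DEVICE_ATTR, DEVICE_ATTR_RO, __ATTR and similar), so it will produce
both false positives and misses:

```python
import re

# "+++ b/path" header naming the file that the following hunks modify.
FILE_RE = re.compile(r"^\+\+\+ b/(\S+)")
# Added line that uses a macro whose name ends in ATTR, optionally with
# an _RO/_WO/_RW suffix; the attribute name is the first argument.
ATTR_RE = re.compile(r"^\+.*\b\w*ATTR(?:_RO|_WO|_RW)?\s*\(\s*(\w+)")


def check_sysfs_docs(diff_text):
    """Return added sysfs attribute names if the diff adds no ABI docs.

    An empty list means either no attribute was added or the same diff
    also touches something under Documentation/ABI/.
    """
    attrs = []
    touches_abi_docs = False
    for line in diff_text.splitlines():
        m = FILE_RE.match(line)
        if m:
            if m.group(1).startswith("Documentation/ABI/"):
                touches_abi_docs = True
            continue
        m = ATTR_RE.match(line)
        if m:
            attrs.append(m.group(1))
    return [] if touches_abi_docs else attrs
```

This only checks that some file under Documentation/ABI/ changed, not
that the new attribute itself is described there; matching attribute
names against the "What:" entries would be the obvious next step.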

greg k-h


* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
  2017-08-04 16:04       ` Greg KH
@ 2017-08-04 17:14         ` Theodore Ts'o
  2017-08-04 17:53           ` Greg KH
  2017-08-14 19:49         ` Steven Rostedt
  1 sibling, 1 reply; 37+ messages in thread
From: Theodore Ts'o @ 2017-08-04 17:14 UTC (permalink / raw)
  To: Greg KH; +Cc: ksummit-discuss

On Fri, Aug 04, 2017 at 09:04:54AM -0700, Greg KH wrote:
> 
> We could start searching linux-next for new additions of sysfs files
> (search for the ATTR macros), and complain that there are no matching
> Documentation/ABI/ updates at the same time.  I try to do that when
> reviewing patches that come through my trees, but yes, this is hard to
> keep up to date with.
> 
> Sounds like a good GSoC project though, setting up the infrastructure to
> do this in a semi-automated fashion.

This sounds like an obvious thing to add to checkpatch?

     	    	    	    	     	 - Ted


* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
  2017-08-04 17:14         ` Theodore Ts'o
@ 2017-08-04 17:53           ` Greg KH
  2017-08-04 22:52             ` Joe Perches
  2017-08-09 20:06             ` Geert Uytterhoeven
  0 siblings, 2 replies; 37+ messages in thread
From: Greg KH @ 2017-08-04 17:53 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: ksummit-discuss

On Fri, Aug 04, 2017 at 01:14:44PM -0400, Theodore Ts'o wrote:
> On Fri, Aug 04, 2017 at 09:04:54AM -0700, Greg KH wrote:
> > 
> > We could start searching linux-next for new additions of sysfs files
> > (search for the ATTR macros), and complain that there are no matching
> > Documentation/ABI/ updates at the same time.  I try to do that when
> > reviewing patches that come through my trees, but yes, this is hard to
> > keep up to date with.
> > 
> > Sounds like a good GSoC project though, setting up the infrastructure to
> > do this in a semi-automated fashion.
> 
> This sounds like an obvious thing to add to checkpatch?

Probably, but lots of times this would be a false-positive as
documentation shows up in a later patch in the series to make things
easier to review.

But if you want to try to add it to checkpatch, be my guest; last time I
tried, I gave up as the regexes there brought me to tears...

good luck!

greg k-h


* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
  2017-08-04 17:53           ` Greg KH
@ 2017-08-04 22:52             ` Joe Perches
  2017-08-09 20:06             ` Geert Uytterhoeven
  1 sibling, 0 replies; 37+ messages in thread
From: Joe Perches @ 2017-08-04 22:52 UTC (permalink / raw)
  To: Greg KH, Theodore Ts'o; +Cc: ksummit-discuss

On Fri, 2017-08-04 at 10:53 -0700, Greg KH wrote:
> On Fri, Aug 04, 2017 at 01:14:44PM -0400, Theodore Ts'o wrote:
> > On Fri, Aug 04, 2017 at 09:04:54AM -0700, Greg KH wrote:
> > > 
> > > We could start searching linux-next for new additions of sysfs files
> > > (search for the ATTR macros), and complain that there are no matching
> > > Documentation/ABI/ updates at the same time.  I try to do that when
> > > reviewing patches that come through my trees, but yes, this is hard to
> > > keep up to date with.
> > > 
> > > Sounds like a good GSoC project though, setting up the infrastructure to
> > > do this in a semi-automated fashion.
> > 
> > This sounds like an obvious thing to add to checkpatch?
> 
> Probably, but lots of times this would be a false-positive as
> documentation shows up in a later patch in the series to make things
> easier to review.
> 
> But if you want to try to add it to checkpatch, be my guest; last time I
> tried, I gave up as the regexes there brought me to tears...

Piker.  Even Linus has adapted to perl regexes...

cheers, Joe


* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
  2017-08-04  1:16 [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates? Andy Lutomirski
  2017-08-04  1:30 ` Greg KH
  2017-08-04  2:26 ` Theodore Ts'o
@ 2017-08-09  0:00 ` NeilBrown
  2017-08-09 11:54   ` Laurent Pinchart
                     ` (2 more replies)
  2 siblings, 3 replies; 37+ messages in thread
From: NeilBrown @ 2017-08-09  0:00 UTC (permalink / raw)
  To: Andy Lutomirski, ksummit-discuss

On Thu, Aug 03 2017, Andy Lutomirski wrote:

> [Note: I'm not entirely sure I can make it to the kernel summit this
> year, due to having a tiny person and tons of travel]
>
> This may be highly controversial, but: there seems to be a weakness in
> the kernel development model in the way that new ABI features become
> stable.  The current model is, roughly:
>
> 1. Someone writes the code.  Maybe they cc linux-abi, maybe they don't.
> 2. People hopefully review the code.
> 3. A subsystem maintainer merges the code.  They hope the ABI is right.
> 4. Linus gets a pull request.  Linus probably doesn't review the ABI
> for sanity, style, blatant bugs, etc.  If Linus did, then he'd never
> get anything else done.
> 5. The new ABI lands in -rc1.
> 6. If someone finds a problem or objects, it had better get fixed
> before the next real release.
>
> There's a few problems here.  One is that the people who would really
> review the ABI might not even notice until step 5 or 6 or so.  Another
> is that it takes some time for userspace to get experience with a new
> ABI.
>
> I'm wondering if there are other models that could work.  I think it
> would be nice for us to be able to land a kernel in Linus tree and
> still wait a while before stabilizing it.  Rust, for example, has a
> strict policy for this that seems to work quite well.
>
> Maybe we could pull something off where big new features hide behind a
> named feature gate for a while.  That feature gate can only be enabled
> under some circumstances that make it very hard to mistake it for true
> stability.  (For example, maybe you *can't* enable feature gates on a
> final kernel unless you manually patch something.)
>
> Here are a few examples that come to mind for where this would have helped:
>
>  - Whatever that new RDMA socket type was that was deemed totally
> broken but only just after it hit a real release.
>  - O_TMPFILE.  I discovered that it corrupted filesystems in -rc6 or
> -rc7.  That got fixed, but the API is still a steaming pile of crap.
>  - Some cgroup+bpf stuff that got cleaned up in a -rc7 or so a few releases ago.
>
> I'm sure there are tons more.
>
> Is this too crazy, or is it worth discussing?

I think this is a real issue and it would be good to see improvements.

I think this is primarily a social/communication issue.  We need to know
what is expected and what can be trusted.  We need clear rules that
everyone knows and that work for everyone.  Currently we have (fairly)
clear rules that work fairly well in many cases, but can be problematic.

The rules, as you outline, are that users should not experience
regressions from one released kernel to a subsequent released kernel.
So people working on -rc kernels can expect to experience regressions.
Also kernel devs are free to create theoretical regressions as long as
no-one experiences them.

My strawman is to suggest that we relax this.  We change the promise "if
it works on a released kernel, it will work on all future released
kernels", to "if it works on N consecutive released kernels, it will
work on all future released kernels", and then bikeshed the value of N,
but probably settle on N=2.
This should give important new freedom to kernel developers, and impose
a (hopefully) small burden on application developers.  They should be
testing their code anyway (we all should), now they have to test it
twice.
To make that burden smaller, we could aim to apply all "new API fixes"
to the -stable kernels promptly.
If a new API appears in Linux N it might behave differently in N+1, but
in that case the first N.M stable kernel released after N+1 will also
have the new behaviour.
So developing against that N.M should always be safe.  Any APIs it has
are declared to be stable.
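
To make the strawman concrete, here is a small sketch of the rule as a
predicate.  The function name promise_applies and the behaviour labels
are invented for illustration; n=1 corresponds to the current promise
and n=2 to the relaxed one:

```python
def promise_applies(history, n=2):
    """Decide whether the stability promise covers an API.

    history lists the API's observable behaviour in consecutive released
    kernels, oldest first, with None where the API did not exist yet.
    The promise applies once any n consecutive releases behaved the same.
    """
    run = 0
    previous = None
    for behaviour in history:
        if behaviour is not None and behaviour == previous:
            run += 1
        else:
            # A new or changed behaviour starts a fresh run of length 1;
            # a release without the API resets the run to 0.
            run = 1 if behaviour is not None else 0
        previous = behaviour
        if run >= n:
            return True
    return False
```

With n=2, an API that appears in release N and is changed in N+1 is not
yet covered (history ["v1", "v2"]), but it is covered as soon as N+2
ships with the N+1 behaviour (history ["v1", "v2", "v2"]).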

My other strawman is to declare that if an API is not documented, then
it isn't stable.  People are welcome to use undocumented APIs, but when
their app breaks, they get to keep both parts.  Of course, if the
documentation is wrong, that puts us in an awkward place - especially if
the documented behaviour is impossible to implement.  We can then
schedule the release of the documentation at whatever time seems
appropriate given the complexity and utility of the particular API.

My main point here is that I think the only real solution here is to
revise the current social contract.  Trying to use technology to detect
API changes - as has been suggested in this thread - is not a bad idea,
but is unlikely to catch the really important problems.

Thanks,
NeilBrown


* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
  2017-08-09  0:00 ` NeilBrown
@ 2017-08-09 11:54   ` Laurent Pinchart
  2017-08-14 20:07     ` Steven Rostedt
  2017-08-09 20:21   ` Linus Torvalds
  2017-08-15 18:26   ` Michael Kerrisk (man-pages)
  2 siblings, 1 reply; 37+ messages in thread
From: Laurent Pinchart @ 2017-08-09 11:54 UTC (permalink / raw)
  To: ksummit-discuss; +Cc: Andy Lutomirski

Hi Neil,

On Wednesday 09 Aug 2017 10:00:51 NeilBrown wrote:
> On Thu, Aug 03 2017, Andy Lutomirski wrote:
> > [Note: I'm not entirely sure I can make it to the kernel summit this
> > year, due to having a tiny person and tons of travel]
> > 
> > This may be highly controversial, but: there seems to be a weakness in
> > the kernel development model in the way that new ABI features become
> > stable.  The current model is, roughly:
> > 
> > 1. Someone writes the code.  Maybe they cc linux-abi, maybe they don't.
> > 2. People hopefully review the code.
> > 3. A subsystem maintainer merges the code.  They hope the ABI is right.
> > 4. Linus gets a pull request.  Linus probably doesn't review the ABI
> > for sanity, style, blatant bugs, etc.  If Linus did, then he'd never
> > get anything else done.
> > 5. The new ABI lands in -rc1.
> > 6. If someone finds a problem or objects, it had better get fixed
> > before the next real release.
> > 
> > There's a few problems here.  One is that the people who would really
> > review the ABI might not even notice until step 5 or 6 or so.  Another
> > is that it takes some time for userspace to get experience with a new
> > ABI.
> > 
> > I'm wondering if there are other models that could work.  I think it
> > would be nice for us to be able to land a kernel in Linus tree and
> > still wait a while before stabilizing it.  Rust, for example, has a
> > strict policy for this that seems to work quite well.
> > 
> > Maybe we could pull something off where big new features hide behind a
> > named feature gate for a while.  That feature gate can only be enabled
> > under some circumstances that make it very hard to mistake it for true
> > stability.  (For example, maybe you *can't* enable feature gates on a
> > final kernel unless you manually patch something.)
> > 
> > Here are a few examples that come to mind for where this would have
> > helped:
> >  - Whatever that new RDMA socket type was that was deemed totally
> >    broken but only just after it hit a real release.
> >  - O_TMPFILE.  I discovered that it corrupted filesystems in -rc6 or
> > >    -rc7.  That got fixed, but the API is still a steaming pile of crap.
> >  - Some cgroup+bpf stuff that got cleaned up in a -rc7 or so a few
> >    releases ago.
> >
> > I'm sure there are tons more.
> > 
> > Is this too crazy, or is it worth discussing?
> 
> I think this is a real issue and it would be good to see improvements.
> 
> I think this is primarily a social/communication issue.  We need to know
> what is expected and what can be trusted.  We need clear rules that
> everyone knows and that work for everyone.  Currently we have (fairly)
> clear rules that work fairly well in many cases, but can be problematic.
> 
> The rules, as you outline, are that users should not experience
> regressions from one released kernel to a subsequent released kernel.
> So people working on -rc kernels can expect to experience regressions.
> > Also kernel devs are free to create theoretical regressions as long as
> no-one experiences them.
> 
> My strawman is to suggest that we relax this.  We change the promise "if
> it works on a released kernel, it will work on all future released
> kernels", to "if it works on N consecutive released kernels, it will
> work on all future released kernels", and then bikeshed the value of N,
> but probably settle on N=2.
> This should give important new freedom to kernel developers, and impose
> a (hopefully) small burden on application developers.  They should be
> testing their code anyway (we all should), now they have to test it
> twice.
> To make that burden smaller, we could aim to apply all "new API fixes"
> to the -stable kernels promptly.
> If a new API appears in Linux N it might behave differently in N+1, but
> in that case the first N.M stable kernel released after N+1 will also
> have the new behaviour.
> So developing against that N.M should always be safe.  Any APIs it has
> are declared to be stable.

I fear this will lead us to a situation where new APIs will receive less 
scrutiny because developers will rely on the ability to change the API for the 
next kernel. Of course they will then be sidetracked by something else, and 
the next kernel will be released without any API change.

I might be overly pessimistic here, but I don't think we will be able to 
tackle what is largely a human problem (not paying enough attention to new 
APIs) with a small process adjustment. Let's face it, as long as we don't 
educate developers about APIs, we won't get this right, exactly the same way 
that developers need to be educated about security or race conditions.

Education is a slow process but gives the best results. What we should first 
aim for, in my opinion, isn't to turn everybody into an API expert, but to 
have enough reviewers who can spot API changes and wave a red flag if the 
change hasn't gone to a proper review process. Part of this could possibly be 
automated as discussed in this mail thread, but at the end of the day it's 
really about a culture change to make sure APIs are treated with enough care.

Now, assuming we can fix this first problem and get all new APIs properly 
reviewed and tested, the next question is what a proper review and test 
process should be. The DRM/KMS subsystem has put a process in place (as 
explained by Daniel Vetter in this mail thread) where every new API has to be 
implemented in real userspace components (and thus not just in test tools) and 
approved by the appropriate maintainers. The bar is pretty high, and possibly 
too high, but it is in my opinion better than the other way around.

Yes, this will slow down patch acceptance, but I don't think that's a problem, 
quite the contrary. I'd rather slow down merging new APIs upstream than having 
to live with lots of crappy APIs, as long as the development process at the 
subsystem level is not slowed down. That's where process and infrastructure 
could help, to ensure that userspace components consuming new APIs can easily 
find the kernel code they need to test. I don't think named feature gates, as 
proposed by Andy, are needed (we had that a while ago, it was called 
CONFIG_EXPERIMENTAL, and proved to be useless), but I'm open to discussion in 
that area.

> My other strawman is to declare that if an API is not documented, then
> it isn't stable.  People are welcome to use undocumented APIs, but when
> their app breaks, they get to keep both parts.  Of course, if the
> documentation is wrong, that puts us in an awkward place - especially if
> the documented behaviour is impossible to implement.  We can then
> schedule the release of the documentation at whatever time seems
> appropriate given the complexity and utility of the particular API.

I'd go one step further and say that every API has to be documented. There 
will always be undocumented features in every API as no documentation is 
perfect, and corner cases that nobody thought about can result in interesting 
undocumented behaviour that userspace starts relying on, but documentation is 
a must, and should not be written after the code stabilizes. Writing 
documentation is actually a good way to realize that an API is broken.

> My main point here is that I think the only real solution here is to
> revise the current social contract.  Trying to use technology to detect
> API changes - as has been suggested in this thread - is not a bad idea,
> but is unlikely to catch the really important problems.

-- 
Regards,

Laurent Pinchart


* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
  2017-08-04 17:53           ` Greg KH
  2017-08-04 22:52             ` Joe Perches
@ 2017-08-09 20:06             ` Geert Uytterhoeven
  1 sibling, 0 replies; 37+ messages in thread
From: Geert Uytterhoeven @ 2017-08-09 20:06 UTC (permalink / raw)
  To: Greg KH; +Cc: ksummit-discuss

On Fri, Aug 4, 2017 at 7:53 PM, Greg KH <greg@kroah.com> wrote:
> On Fri, Aug 04, 2017 at 01:14:44PM -0400, Theodore Ts'o wrote:
>> On Fri, Aug 04, 2017 at 09:04:54AM -0700, Greg KH wrote:
>> > We could start searching linux-next for new additions of sysfs files
>> > (search for the ATTR macros), and complain that there are no matching
>> > Documentation/ABI/ updates at the same time.  I try to do that when
>> > reviewing patches that come through my trees, but yes, this is hard to
>> > keep up to date with.
>> >
>> > Sounds like a good GSoC project though, setting up the infrastructure to
>> > do this in a semi-automated fashion.
>>
>> This sounds like an obvious thing to add to checkpatch?
>
> Probably, but lots of times this would be a false-positive as
> documentation shows up in a later patch in the series to make things
> easier to review.

On the sender side, checkpatch will work fine, as it works against your
current tree, which usually contains all patches from the series you've just
created.

On the receiver side, a different order will indeed cause false positives.
But usually you can catch people not having run checkpatch before sending
their patches by the presence of other checkpatch issues, so those can be
used as a canary to switch to "more thorough review mode".
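
The difference between the two sides can be sketched as follows.  This
is not how checkpatch works internally; it is a hypothetical standalone
illustration, with invented function names, of why a per-patch check
warns on a series that a whole-series check accepts:

```python
import re

# "+++ b/path" header naming the file that the following hunks modify.
FILE_RE = re.compile(r"^\+\+\+ b/(\S+)")
# Naive match for an added line using one of the *ATTR macros.
ATTR_RE = re.compile(r"^\+.*\b\w*ATTR(?:_RO|_WO|_RW)?\s*\(\s*(\w+)")


def summarize(patch_text):
    """Return (adds_sysfs_attr, adds_abi_docs) for one unified diff."""
    adds_attr = adds_docs = False
    for line in patch_text.splitlines():
        m = FILE_RE.match(line)
        if m:
            if m.group(1).startswith("Documentation/ABI/"):
                adds_docs = True
        elif ATTR_RE.match(line):
            adds_attr = True
    return adds_attr, adds_docs


def per_patch_warnings(series):
    """Indices of patches adding an attribute with no docs in that patch."""
    return [i for i, p in enumerate(series) if summarize(p) == (True, False)]


def series_warning(series):
    """True only if the whole series adds attributes but no ABI docs."""
    results = [summarize(p) for p in series]
    return any(a for a, _ in results) and not any(d for _, d in results)
```

For a two-patch series where the first patch adds the attribute and the
second adds its Documentation/ABI/ entry, per_patch_warnings flags the
first patch while series_warning stays quiet.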

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds


* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
  2017-08-09  0:00 ` NeilBrown
  2017-08-09 11:54   ` Laurent Pinchart
@ 2017-08-09 20:21   ` Linus Torvalds
  2017-08-11  6:21     ` NeilBrown
  2017-08-15 18:26   ` Michael Kerrisk (man-pages)
  2 siblings, 1 reply; 37+ messages in thread
From: Linus Torvalds @ 2017-08-09 20:21 UTC (permalink / raw)
  To: NeilBrown; +Cc: ksummit-discuss, Andy Lutomirski

On Tue, Aug 8, 2017 at 5:00 PM, NeilBrown <neilb@suse.com> wrote:
>
> I think this is primarily a social/communication issue.  We need to know
> what is expected and what can be trusted.  We need clear rules that
> everyone knows and that work for everyone.  Currently we have (fairly)
> clear rules that work fairly well in many cases, but can be problematic.
>
> The rules, as you outline, are that users should not experience
> regressions from one released kernel to a subsequent released kernel.
> So people working on -rc kernels can expect to experience regressions.
> Also kernel devs are free to create theoretical regressions as long as
> no-one experiences them.
>
> My strawman is to suggest that we relax this.

No.

The whole "no regressions" is a hard rule, and it will remain so.
It's pretty much the only really hard rule we have, and I will
continue to insist on it.

There are no loopholes. No "but it's been only one release". No, no,
no. The whole point is that users are supposed to be able to *trust*
the kernel. If we do something, we keep on doing it.

And if it makes it harder to add new user-visible interfaces, then
that's a *good* thing.

                   Linus


* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
  2017-08-09 20:21   ` Linus Torvalds
@ 2017-08-11  6:21     ` NeilBrown
  2017-08-11  6:39       ` Linus Torvalds
  0 siblings, 1 reply; 37+ messages in thread
From: NeilBrown @ 2017-08-11  6:21 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: ksummit-discuss, Andy Lutomirski

On Wed, Aug 09 2017, Linus Torvalds wrote:

> On Tue, Aug 8, 2017 at 5:00 PM, NeilBrown <neilb@suse.com> wrote:
>>
>> I think this is primarily a social/communication issue.  We need to know
>> what is expected and what can be trusted.  We need clear rules that
>> everyone knows and that work for everyone.  Currently we have (fairly)
>> clear rules that work fairly well in many cases, but can be problematic.
>>
>> The rules, as you outline, are that users should not experience
>> regressions from one released kernel to a subsequent released kernel.
>> So people working on -rc kernels can expect to experience regressions.
>> Also kernel devs are free to create theoretical regressions as long as
>> no-one experiences them.
>>
>> My strawman is to suggest that we relax this.
>
> No.
>
> The whole "no regressions" is a hard rule, and it will remain so.
> It's pretty much the only really hard rule we have, and I will
> continue to insist on it.
>
> There are no loopholes. No "but it's been only one release". No, no,
> no. The whole point is that users are supposed to be able to *trust*
> the kernel. If we do something, we keep on doing it.

I completely agree with the "trust" issue.  I don't think my proposal
violates it.  It just changes the names of the things that can be
trusted.  You could easily change them back.
e.g.
- When you are ready to release 4.13, call it 4.13-pre1 instead
- You then open the merge window and pull changes onto this base
  working towards 4.14-rc1.  Just as you currently do, you accept
  changes to the API for interfaces that have not appeared in a
  released kernel (and 4.13 hasn't been released at this point).
- Greg takes over 4.13-pre and applies patches tagged for "stable"
  exactly as he currently does, except that he calls the first
  few releases "4.13-pre2" and "4.13-pre3" etc.  These "stable"
  patches might include changes to APIs that were introduced since 4.12,
  changes that you have already included in 4.14-rcX.
- After you release 4.14-pre1, the next kernel that Greg releases
  in the 4.13-preX series gets called "4.13", and subsequent kernels in
  that series are "4.13.1" etc as normal.

With this pattern, people can still trust an X.Y kernel, possibly more
than they currently do (how many people wait for X.Y.3 before they will
move?).  With this pattern, we still get an X.Y every 2 months or so
(except for one 4 month gap at the change-over).
The main difference is that we are a bit more honest about how long it
takes to bake a kernel before it is ready.  We also get more time to
document and fix broken APIs.

"pre" is probably weird, and "rc" doesn't really mean "release
candidate" these days, except for rc7.  Maybe you could call your "rc"
kernels dev1, dev2, etc. Then Greg could use "rcX" for the real release
candidates.  But naming is hard.

>
> And if it makes it harder to add new user-visible interfaces, then
> that's a *good* thing.

I think the point is that it is currently too easy to add user-visible
changes (extra flags, etc), and none of the proposals actually make it
harder.  They just try to make them more visible.  The proposal makes it
harder in that it forces an extra 2 month delay.

Thanks,
NeilBrown


* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
  2017-08-11  6:21     ` NeilBrown
@ 2017-08-11  6:39       ` Linus Torvalds
  2017-08-11  8:02         ` NeilBrown
  0 siblings, 1 reply; 37+ messages in thread
From: Linus Torvalds @ 2017-08-11  6:39 UTC (permalink / raw)
  To: NeilBrown; +Cc: ksummit-discuss, Andy Lutomirski

On Thu, Aug 10, 2017 at 11:21 PM, NeilBrown <neilb@suse.com> wrote:
>
> With this pattern, people can still trust an X.Y kernel,

I do *NOT* want people to trust an X.Y kernel.

Quite the opposite.

I want people to realize that the version doesn't matter, and that
they should feel safe in upgrading. The X and the Y don't matter, and
they *MUST*NOT*MATTER*.

If they do, the process is completely and utterly broken.

So what people should be able to trust is that they can always upgrade.

Not the shit that I see *ALL* the time, where you upgrade something,
and it breaks.

And no, the excuse "but the API was new in X.Y, so it could change in
X.Y+1" does *not* hold water.

It very much violates that basic principle of trust and makes people
go "I don't want to upgrade, because it might break something I do".

                    Linus

* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
  2017-08-11  6:39       ` Linus Torvalds
@ 2017-08-11  8:02         ` NeilBrown
  2017-08-11 23:10           ` Linus Torvalds
  0 siblings, 1 reply; 37+ messages in thread
From: NeilBrown @ 2017-08-11  8:02 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: ksummit-discuss, Andy Lutomirski

On Thu, Aug 10 2017, Linus Torvalds wrote:

> On Thu, Aug 10, 2017 at 11:21 PM, NeilBrown <neilb@suse.com> wrote:
>>
>> With this pattern, people can still trust an X.Y kernel,
>
> I do *NOT* want people to trust an X.Y kernel.
>
> Quite the opposite.
>
> I want people to realize that the version doesn't matter, and that
> they should feel safe in upgrading. The X and the Y don't matter, and
> they *MUST*NOT*MATTER*.
>
> If they do, the process is completely and utterly broken.
>
> So what people should be able to trust is that they can always upgrade.

What do you mean by "upgrade"?
Can I upgrade from 3.15 to 3.16-rc1?  If not, why not?

NeilBrown


>
> Not the shit that I see *ALL* the time, where you upgrade something,
> and it breaks.
>
> And no, the excuse "but the API was new in X.Y, so it could change in
> X.Y+1" does *not* hold water.
>
> It very much violates that basic principle of trust and makes people
> go "I don't want to upgrade, because it might break something I do".
>
>                     Linus

* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
  2017-08-11  8:02         ` NeilBrown
@ 2017-08-11 23:10           ` Linus Torvalds
  2017-08-14  4:19             ` NeilBrown
  0 siblings, 1 reply; 37+ messages in thread
From: Linus Torvalds @ 2017-08-11 23:10 UTC (permalink / raw)
  To: NeilBrown; +Cc: ksummit-discuss, Andy Lutomirski

On Fri, Aug 11, 2017 at 1:02 AM, NeilBrown <neilb@suse.com> wrote:
>
> What do you mean by "upgrade"?
> Can I upgrade from 3.15 to 3.16-rc1?  If not, why not?

Yes..

Of course, bugs happen, and then they get fixed.

But yes, even including things like -rc1 (or just "random untagged
kernel of the day") you should

 (a) feel safe in always upgrading to any higher version (I *hope* you
can always also downgrade to a lower kernel version, but obviously at
some point user space may start depending on newer features that
simply don't exist in older kernels).

 (b) also feel that if something breaks, it's a bug, and people will
take it seriously and not dismiss it with some crazy "N+1 version"
excuse.

There are some cases where we may not be able to avoid breakage: the
main two are "security issues" and "insanely old hardware".

And even for security issues, we try really really hard to avoid breakage.

And the key word in "insanely old hardware" is that "insanely" part.
At some point it just gets too hard to test (and sometimes the
hardware is too broken, like the original i386 non-working supervisor
page fault workarounds).

Now, it can get really interesting if somebody notices an ABI change
so late that others have started to depend on that ABI change. At that
point, it's a "damned if you do, damned if you don't". We've actually
been able to handle even that occasionally (by just adjusting behavior
automatically based on some pattern), but at some point it obviously
is impossible to fix both cases.

And then I say "if it took you three years to upgrade and notice a
behavioral change that nobody else noticed, it's no longer _our_
fault".

So there is _some_ onus on people actually testing and reporting these
things, but I can't off-hand actually remember any case of this really
being a major issue. So it's largely a theoretical thing.

                Linus

* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
  2017-08-11 23:10           ` Linus Torvalds
@ 2017-08-14  4:19             ` NeilBrown
  2017-08-14 18:34               ` Linus Torvalds
  0 siblings, 1 reply; 37+ messages in thread
From: NeilBrown @ 2017-08-14  4:19 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: ksummit-discuss, Andy Lutomirski

On Fri, Aug 11 2017, Linus Torvalds wrote:

> On Fri, Aug 11, 2017 at 1:02 AM, NeilBrown <neilb@suse.com> wrote:
>>
>> What do you mean by "upgrade"?
>> Can I upgrade from 3.15 to 3.16-rc1?  If not, why not?
>
> Yes..
>
> Of course, bugs happen, and then they get fixed.
>
> But yes, even including things like -rc1 (or just "random untagged
> kernel of the day") you should
>
>  (a) feel safe in always upgrading to any higher version (I *hope* you
> can always also downgrade to a lower kernel version, but obviously at
> some point user space may start depending on newer features that
> simply don't exist in older kernels).
>
>  (b) also feel that if something breaks, it's a bug, and people will
> take it seriously and not dismiss it with some crazy "N+1 version"
> excuse.

I think the issue is that some of us would like a clearer statement on
what values of "some point" we will honor, and which values are
crazy-talk.

This relates slightly to your comment:

>
> And then I say "if it took you three years to upgrade and notice a
> behavioral change that nobody else noticed, it's no longer _our_
> fault".

Can we also say "if you started depending on an API that has only been
in the kernel for 3 weeks and we had to revise it, then it is not _our_
fault if you depend on it already"?

In the original post in this thread, Andy seemed to think that as long
as it gets "fixed before the next real release", we are safe.  Your
comments could be read to mean that you don't agree and that there is no
clear line at which we are safe.

You mentioned trust earlier:

> The whole point is that users are supposed to be able to *trust*
> the kernel.

I agree, but I think trust works best if it works both ways.  Can we
trust application developers not to depend on an API which hasn't
reached maturity?  I think we can if we tell them what we expect - what
constitutes "maturity" - and make it a reasonable expectation.  Do we
have to declare "maturity" the moment an API hits mainline (or hits
-next, or hits mailing lists), or can we negotiate a formal grace
period?

Yes, "no regressions" is an important rule that should remain, but there
has always been a loophole.  The loophole is "no harm, no foul".  If we
can negotiate an understanding that results in "no harm" from early
revisions to an API, then those revisions will not cause actual
regressions. "No foul".


But maybe I'm wrong and all the people talking about automatic tooling
to discover and highlight API changes in linux-next are on the right
track... or would be if everything actually went through linux-next.

Thanks,
NeilBrown


* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
  2017-08-14  4:19             ` NeilBrown
@ 2017-08-14 18:34               ` Linus Torvalds
  2017-08-14 18:40                 ` Linus Torvalds
  0 siblings, 1 reply; 37+ messages in thread
From: Linus Torvalds @ 2017-08-14 18:34 UTC (permalink / raw)
  To: NeilBrown; +Cc: ksummit-discuss, Andy Lutomirski

On Sun, Aug 13, 2017 at 9:19 PM, NeilBrown <neilb@suse.com> wrote:
>
> I think the issue is that some of us would like a clearer statement on
> what values of "some point" we will honor, and which values are
> crazy-talk.

What?

I thought I was *VERY* clear. It has *ALWAYS* been very clear.

There is absolutely *no* cut-off. If a feature has been in a released
kernel, we support it. End of story.

Stop fishing for "some point". It doesn't exist. It never has. And it
never will.

If you worry about how good and stable your ABI is, and aren't willing
to support that ABI forever, don't send the patch. Seriously. Just
don't.

This whole discussion is pointless.

                      Linus

* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
  2017-08-14 18:34               ` Linus Torvalds
@ 2017-08-14 18:40                 ` Linus Torvalds
  2017-08-14 23:23                   ` Andy Lutomirski
  0 siblings, 1 reply; 37+ messages in thread
From: Linus Torvalds @ 2017-08-14 18:40 UTC (permalink / raw)
  To: NeilBrown; +Cc: ksummit-discuss, Andy Lutomirski

On Mon, Aug 14, 2017 at 11:34 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> If you worry about how good and stable your ABI is, and aren't willing
> to support that ABI forever, don't send the patch. Seriously. Just
> don't.
>
> This whole discussion is pointless.

To clarify, and to strengthen the point: the regression has always
been about actual breakage. You can change semantics all you want, if
nobody ever notices.

But if somebody does notice, and something breaks, it gets fixed.

That's the rule. No exceptions. If you aren't willing to fix the bugs
you introduce, you shouldn't be working on the kernel.

It's that simple. Find some other project to mess up - there are tons
of sh*t projects out there that think that changing ABI's is a good
idea and should be done regularly.

But the kernel cares about regressions.

Christ, this is not a new rule.

               Linus

* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
  2017-08-04 16:04       ` Greg KH
  2017-08-04 17:14         ` Theodore Ts'o
@ 2017-08-14 19:49         ` Steven Rostedt
  2017-08-14 19:51           ` Linus Torvalds
  1 sibling, 1 reply; 37+ messages in thread
From: Steven Rostedt @ 2017-08-14 19:49 UTC (permalink / raw)
  To: Greg KH; +Cc: ksummit-discuss

On Fri, 4 Aug 2017 09:04:54 -0700
Greg KH <greg@kroah.com> wrote:


> We could start searching linux-next for new additions of sysfs files
> (search for the ATTR macros), and complain that there are no matching
> Documentation/ABI/ updates at the same time.  I try to do that when
> reviewing patches that come through my trees, but yes, this is hard to
> keep up to date with.
> 
> Sounds like a good GSoC project though, setting up the infrastructure to
> do this in a semi-automated fashion.

And perhaps do the same for new tracepoints.

I'm wondering if we should start documenting all tracepoints, and have
them not be added unless there's documentation with them.

-- Steve

* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
  2017-08-14 19:49         ` Steven Rostedt
@ 2017-08-14 19:51           ` Linus Torvalds
  2017-08-15  7:13             ` Julia Lawall
  0 siblings, 1 reply; 37+ messages in thread
From: Linus Torvalds @ 2017-08-14 19:51 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: ksummit

On Mon, Aug 14, 2017 at 12:49 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Fri, 4 Aug 2017 09:04:54 -0700
> Greg KH <greg@kroah.com> wrote:
>
>
>> We could start searching linux-next for new additions of sysfs files
>> (search for the ATTR macros), and complain that there are no matching
>> Documentation/ABI/ updates at the same time.  I try to do that when
>> reviewing patches that come through my trees, but yes, this is hard to
>> keep up to date with.
>>
>> Sounds like a good GSoC project though, setting up the infrastructure to
>> do this in a semi-automated fashion.
>
> And perhaps do the same for new tracepoints.

Honestly, the *real* issue has traditionally been new ioctl's, not so
much sysfs files or tracepoints.

People add random device-specific crud that then has issues with
alignment or word size.
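
As an illustration of the alignment and word-size problem described above, here is a minimal sketch. It is not taken from any real driver: the struct names `demo_args_fragile` and `demo_args_fixed` are hypothetical. A 64-bit member following a 32-bit one is commonly 8-byte aligned on x86-64 but only 4-byte aligned on 32-bit x86, so the implicit padding (and the struct size) can differ between a 32-bit program and a 64-bit kernel. Making the padding explicit keeps the layout the same on both.

```c
#include <stddef.h>   /* offsetof */
#include <stdint.h>   /* fixed-width integer types */

/*
 * Hypothetical ioctl argument layout, for illustration only.
 * The compiler decides how much padding follows 'flags', and that
 * decision depends on the ABI's alignment rule for 64-bit integers.
 */
struct demo_args_fragile {
	uint32_t flags;
	uint64_t addr;
};

/*
 * Same fields with the padding spelled out. Every member now sits at a
 * naturally aligned offset, so no hidden padding is needed.
 */
struct demo_args_fixed {
	uint32_t flags;
	uint32_t reserved;	/* explicit padding; callers would pass 0 */
	uint64_t addr;
};

/* Compile-time checks that document the intended layout. */
_Static_assert(offsetof(struct demo_args_fixed, addr) == 8,
	       "addr is expected at offset 8");
_Static_assert(sizeof(struct demo_args_fixed) == 16,
	       "no hidden padding is expected");
```

The fragile variant is shown only for contrast; its size is deliberately not asserted because it depends on the target ABI.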

                  Linus

* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
  2017-08-09 11:54   ` Laurent Pinchart
@ 2017-08-14 20:07     ` Steven Rostedt
  0 siblings, 0 replies; 37+ messages in thread
From: Steven Rostedt @ 2017-08-14 20:07 UTC (permalink / raw)
  To: Laurent Pinchart; +Cc: Andy Lutomirski, ksummit-discuss

On Wed, 09 Aug 2017 14:54:10 +0300
Laurent Pinchart <laurent.pinchart@ideasonboard.com> wrote:

> Hi Neil,
> 

> > > I'm wondering if there are other models that could work.  I think it
> > > would be nice for us to be able to land a kernel in Linus tree and
> > > still wait a while before stabilizing it.  Rust, for example, has a
> > > strict policy for this that seems to work quite well.

I like the model of having to add a patch to implement a new ABI.
Because then technically it never broke if some user space app
depended on it, as the API never existed without modifying the kernel.

But I also wonder if we could have a linux-api similar to linux-next.
Where the linux-api is used to test api's until they are ready. It
should follow Linus's tree, similar to linux-next, where new features
get merged in nightly. But the difference from linux-next is that the
api doesn't have to go into Linus's tree at the next merge window. It
would be a place that APIs could be tested and changed, and not get
into Linus's tree till it is ready.



> Education is a slow process but gives the best results. What we should first 
> aim for, in my opinion, isn't to turn everybody into an API expert, but to 
> have enough reviewers who can spot API changes and wave a red flag if the 
> change hasn't gone to a proper review process. Part of this could possibly be 
> automated as discussed in this mail thread, but at the end of the day it's 
> really about a culture change to make sure APIs are treated with enough care.

I would recommend a static analyzer that looks at linux-next for new
APIs, and flags anything that it finds, where others can audit them.
Perhaps make a rule that no new API is added without documentation, or
if we have the above linux-api, going through that too. This will only
work if it is automated. Linus could see the list of new APIs added and
determine if it should be pulled or not. It would be too much work for
him to search the code for new APIs, but a tool that gives a simple
list that he can check them off after he agrees with them, may be
scalable.
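
A rough sketch of the kind of check such a tool might start from, assuming it is fed a unified diff line by line. The function names are hypothetical, and the substring match on `_ATTR` (following the "search for the ATTR macros" idea quoted earlier in the thread) is only a heuristic; a real analyzer would need to parse the C code.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: true when a unified-diff line adds code that
 * mentions a sysfs attribute macro (anything containing "_ATTR").
 * Lines starting with "+++" are treated as file headers, not additions.
 */
bool diff_line_adds_attr(const char *line)
{
	if (line[0] != '+' || strncmp(line, "+++", 3) == 0)
		return false;
	return strstr(line + 1, "_ATTR") != NULL;
}

/*
 * Hypothetical helper: true when a diff file header names a path under
 * Documentation/ABI/, i.e. the patch also touches the ABI documentation.
 */
bool diff_line_touches_abi_docs(const char *line)
{
	return strncmp(line, "+++ ", 4) == 0 &&
	       strstr(line, "Documentation/ABI/") != NULL;
}
```

A patch for which the first check fires on some line while the second never does would be a candidate for the "simple list" that a human then reviews.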

> 
> Now, assuming we can fix this first problem and get all new APIs properly 
> reviewed and tested, the next question is what a proper review and test 
> process should be. The DRM/KMS subsystem has put a process in place (as 
> explained by Daniel Vetter in this mail thread) where every new API has to be 
> implemented in real userspace components (and thus not just in test tools) and 
> approved by the appropriate maintainers. The bar is pretty high, and possibly 
> too high, but it is in my opinion better than the other way around.

As Linux becomes more advanced and used in more critical systems, I
want that bar to rise. I'm trying to police myself with new features,
and make sure they are all documented before adding them as well.

> 
> Yes, this will slow down patch acceptance, but I don't think that's a problem, 
> quite the contrary. I'd rather slow down merging new APIs upstream than having 
> to live with lots of crappy APIs, as long as the development process at the 
> subsystem level is not slowed down. That's where process and infrastructure 
> could help, to ensure that userspace components consuming new APIs can easily 
> find the kernel code they need to test. I don't think named feature gates, as 
> proposed by Andy, are needed (we had that a while ago, it was called 
> CONFIG_EXPERIMENTAL, and proved to be useless), but I'm open to discussion in 
> that area.

We could add a new label CONFIG_TEST_ABI which acts like CONFIG_BROKEN
and doesn't compile that code. One would have to manually remove the
config (hence patch the kernel) to have it compile.

	depends on CONFIG_TEST_ABI

would need to be manually changed to

	depends on CONFIG_RUN_ABI

(OK, I suck with names) and then it would be compiled in. This would
still be in line with Linus's rule (don't break existing kernels), as the
code he ships will never actually execute without modification. And if
you modify it to run an app, then it's your fault if the app breaks
because it changes.
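
The effect of such a gate can be sketched in C, under the assumption that the gated code is simply absent unless the tree has been patched. `demo_new_abi_entry` is a hypothetical entry point, and `CONFIG_RUN_ABI` is the placeholder name from the text above, not an existing option. (In a real Kconfig file the symbol would be written without the `CONFIG_` prefix, e.g. `depends on TEST_ABI`.)

```c
#include <errno.h>

/*
 * Hypothetical entry point for a not-yet-stable interface.
 * Nothing defines CONFIG_RUN_ABI unless someone edits the tree, so an
 * unmodified build behaves as if the interface does not exist.
 */
long demo_new_abi_entry(long arg)
{
#ifdef CONFIG_RUN_ABI
	return arg + 1;		/* stand-in for the real feature */
#else
	(void)arg;
	return -ENOSYS;		/* "function not implemented" */
#endif
}
```

A program that only works after the kernel has been patched this way cannot claim a regression against any kernel that was actually shipped, which is the point of the proposal.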

> I'd go one step further and say that every API has to be documented. There 
> will always be undocumented features in every API as no documentation is 
> perfect, and corner cases that nobody thought about can result in interesting 
> undocumented behaviour that userspace starts relying on, but documentation is 
> a must, and should not be written after the code stabilizes. Writing 
> documentation is actually a good way to realize that an API is broken.

+1

> 
> > My main point here is that I think the only real solution here is to
> > revise the current social contract.  Trying to use technology to detect
> > API changes - as has been suggested in this thread - is not a bad idea,
> > but is unlikely to catch the really important problems.  
> 

But it will definitely help. We can't implement any of this without
tools to track down API changes. If a new API is added, at the very
minimum, there must be some documentation with it. Linus will have the
final say, but it would go a long way if he was able to run some tool
on a series of pull requests to see what new APIs have been added, and
then decide if he should revert them or not if it appears that they
will become unmaintainable in the future.

-- Steve

* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
  2017-08-14 18:40                 ` Linus Torvalds
@ 2017-08-14 23:23                   ` Andy Lutomirski
  2017-08-15  0:54                     ` Linus Torvalds
  0 siblings, 1 reply; 37+ messages in thread
From: Andy Lutomirski @ 2017-08-14 23:23 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: ksummit-discuss, Andy Lutomirski

On Mon, Aug 14, 2017 at 11:40 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Mon, Aug 14, 2017 at 11:34 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> If you worry about how good and stable your ABI is, and aren't willing
>> to support that ABI forever, don't send the patch. Seriously. Just
>> don't.
>>
>> This whole discussion is pointless.
>
> To clarify, and to strengthen the point: the regression has always
> been about actual breakage. You can change semantics all you want, if
> nobody ever notices.
>
> But if somebody does notice, and something breaks, it gets fixed.
>
> That's the rule. No exceptions. If you aren't willing to fix the bugs
> you introduce, you shouldn't be working on the kernel.

What I was trying to get at with this thread was: is there a way that
we can enable a new feature for testing in a way that it *can't* get
used by real programs that expect stability?  There are certainly
nasty solutions.  For example, cgroup v2 cpu controller support has
been available as an out-of-tree patch for many cycles.  It's finally
being hashed out in a way that's incompatible with programs that
target that patch, but no one is going to say "screw you, 4.14 broke my
setup" because that setup didn't work on any earlier kernel either.

I'm wondering if we can maybe make this more systematic and less
nasty.  For example, what if we could have a way to enable features
such that they work in -rc kernels (with big warnings!) but do *not*
work in released kernels?  Then people who want to develop against
them would have to explicitly run -rc kernels, which would make it
obvious that nothing's stable and might even get more -rc testers to
boot.
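
One way to picture the idea, as a sketch only: decide whether the gate is open from the kernel release string. The function name is hypothetical, and matching on "-rc" is an assumption about how release strings look; vendor kernels may use other suffixes, so this is not a complete policy.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical policy check: a release string such as "4.13.0-rc5" is
 * treated as a development kernel, anything without "-rc" as a release.
 * Gated features would be usable only when this returns true.
 */
bool demo_gate_allowed(const char *release)
{
	return strstr(release, "-rc") != NULL;
}
```

With this check, a feature that works on "4.13.0-rc5" stops working on "4.13.0", so nobody can mistake it for a stable interface.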

--Andy

* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
  2017-08-14 23:23                   ` Andy Lutomirski
@ 2017-08-15  0:54                     ` Linus Torvalds
  2017-08-15 16:11                       ` Andy Lutomirski
  0 siblings, 1 reply; 37+ messages in thread
From: Linus Torvalds @ 2017-08-15  0:54 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: ksummit-discuss

On Mon, Aug 14, 2017 at 4:23 PM, Andy Lutomirski <luto@kernel.org> wrote:
>
> What I was trying to get at with this thread was: is there a way that
> we can enable a new feature for testing in a way that it *can't* get
> used by real programs that expect stability?

Honestly, I can't think of a case where that would actually have been an issue.

Make a config option out of it, and mark it expert, and maybe that would do it.

But realistically, that just doesn't make any sense in reality -
because in reality, user programs get written not on top of the
development kernel, but on vendor kernels.

So the scenario you describe simply never happens.

The _reverse_ scenario does happen: vendors who do their own kernel
patches that introduce something their customers need, and people
start depending on those semantics.

Android may be the case where that happens today, but it's not the
only case. We've merged code that was in use by various Linux distro
people and where there already was an active user base of the new ABI.

So I think your issue is pretty much theoretical, and _would_ be easy
to fix with some kind of "this option is only enabled for rc kernels,
and gets disabled on release", but such an option just doesn't make
sense because that's not how development actually happens.

               Linus

* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
  2017-08-14 19:51           ` Linus Torvalds
@ 2017-08-15  7:13             ` Julia Lawall
  0 siblings, 0 replies; 37+ messages in thread
From: Julia Lawall @ 2017-08-15  7:13 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: ksummit



On Mon, 14 Aug 2017, Linus Torvalds wrote:

> On Mon, Aug 14, 2017 at 12:49 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> > On Fri, 4 Aug 2017 09:04:54 -0700
> > Greg KH <greg@kroah.com> wrote:
> >
> >
> >> We could start searching linux-next for new additions of sysfs files
> >> (search for the ATTR macros), and complain that there are no matching
> >> Documentation/ABI/ updates at the same time.  I try to do that when
> >> reviewing patches that come through my trees, but yes, this is hard to
> >> keep up to date with.
> >>
> >> Sounds like a good GSoC project though, setting up the infrastructure to
> >> do this in a semi-automated fashion.
> >
> > And perhaps do the same for new tracepoints.
>
> Honestly, the *real* issue has traditionally been new ioctl's, not so
> much sysfs files or tracepoints.
>
> People add random device-specific crud that then has issues with
> alignment or word size.

In terms of documentation, there are around 2500 names defined using _IOW,
_IOR, or _IOWR, and around 500 of them are mentioned somewhere in
Documentation.

julia

* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
  2017-08-15  0:54                     ` Linus Torvalds
@ 2017-08-15 16:11                       ` Andy Lutomirski
  0 siblings, 0 replies; 37+ messages in thread
From: Andy Lutomirski @ 2017-08-15 16:11 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: ksummit-discuss, Andy Lutomirski

On Mon, Aug 14, 2017 at 5:54 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Mon, Aug 14, 2017 at 4:23 PM, Andy Lutomirski <luto@kernel.org> wrote:
>>
>> What I was trying to get at with this thread was: is there a way that
>> we can enable a new feature for testing in a way that it *can't* get
>> used by real programs that expect stability?
>
> Honestly, I can't think of a case where that would actually have been an issue.
>
> Make a config option out of it, and mark it expert, and maybe that would do it.

That seems optimistic to me.

>
> But realistically, that just doesn't make any sense in reality -
> because in reality, user programs get written not on top of the
> development kernel, but on vendor kernels.

Plenty of user programs get written against development kernels.
iproute2 is a prime example.  But IIRC the reason that RDMA disaster
didn't get reverted is that people thought that user programs using it
existed something like one week after the stable kernel containing
the feature showed up.

>
> So the scenario you describe simply never happens.
>
> The _reverse_ scenario does happen: vendors who do their own kernel
> patches that introduce something their customers need, and people
> start depending on those semantics.
>
> Android may be the case where that happens today, but it's not the
> only case. We've merged code that was in use by various Linux distro
> people and where there already was an active user base of the new ABI.
>
> So I think your issue is pretty much theoretical, and _would_ be easy
> to fix with some kind of "this option is only enabled for rc kernels,
> and gets disabled on release", but such an option just doesn't make
> sense because that's not how development actually happens.

But maybe it would be a good thing if more development happened that
way.  If nothing else, we'd get lots more testing :)

>
>                Linus

* Re: [Ksummit-discuss] [MAINTAINER TOPIC] ABI feature gates?
  2017-08-09  0:00 ` NeilBrown
  2017-08-09 11:54   ` Laurent Pinchart
  2017-08-09 20:21   ` Linus Torvalds
@ 2017-08-15 18:26   ` Michael Kerrisk (man-pages)
  2 siblings, 0 replies; 37+ messages in thread
From: Michael Kerrisk (man-pages) @ 2017-08-15 18:26 UTC (permalink / raw)
  To: NeilBrown, Andy Lutomirski, ksummit-discuss

On 08/09/2017 02:00 AM, NeilBrown wrote:
> On Thu, Aug 03 2017, Andy Lutomirski wrote:
> 

[...]

> My other strawman is to declare that if an API is not documented, then
> it isn't stable.  People are welcome to use undocumented APIs, but when
> their app breaks, they get to keep both parts.  Of course, if the
> documentation is wrong, that puts us in an awkward place - especially if
> the documented behaviour is impossible to implement.  We can then
> schedule the release of the documentation at whatever time seems
> appropriate given the complexity and utility of the particular API.

Given that features sometimes exist for years (and in rare cases
decades) before they are documented, and many longstanding features
remain incompletely documented, the notion that "if an API is not
documented, then it isn't stable" makes no sense, really.

> My main point here is that I think the only real solution here is to
> revise the current social contract.  Trying to use technology to detect
> API changes - as has been suggested in this thread - is not a bad idea,
> but is unlikely to catch the really important problems.

Agreed. There are existing techniques (thorough documentation, more tests)
that, if adhered to more strictly, could certainly alleviate many of the
API design mess-ups. That sort of stuff requires people to really
think about what they are doing, and gives other people greater
insight into what they are doing, and in the process uncovers
implementation and design bugs. (I can't count the number of times
I discovered implementation bugs while I wrote manual pages.)

Some technology to discover API changes would certainly be helpful, but
it doesn't solve the deeper problem. It needs human beings to look
at this stuff. We seem to have learned the lesson that ungoverned API
design leads to chaos in the case of cgroups v1. And the approach in
cgroups v2 made a critical change: there are actually individuals
with overall design responsibility for the interface.

I think that solution could be applied more generally. Have a *paid*
user-space ABI maintainer whose job is to track ABI changes, make sure
that someone has documented them sufficiently well, and that thorough
tests have been written, before the interface can be released in the
mainline kernel.  Ideally, documented real-world use cases, along with
real-world applications using the new interface would also be part of
the package required for acceptance of a new interface.[*]

Cheers,

Michael

[*] Note, by the way, that I'm not proposing that the ABI maintainer
should write the docs or tests (although they might on occasion), 
just that they act as the gatekeeper to make sure that someone has 
done those tasks to a sufficient standard.


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
