linux-kernel.vger.kernel.org archive mirror
* regression in 4.14-rc2 caused by apparmor: add base infastructure for socket mediation
@ 2017-10-03  4:02 James Bottomley
  2017-10-03  4:11 ` John Johansen
  0 siblings, 1 reply; 19+ messages in thread
From: James Bottomley @ 2017-10-03  4:02 UTC (permalink / raw)
  To: John Johansen, Seth Arnold; +Cc: linux-kernel

The specific problem is that dnsmasq refuses to start on openSUSE Leap
42.2.  The specific cause is that an attempt to open a PF_LOCAL socket
gets EACCES.  This means that networking doesn't function on a system
running a 4.14-rc2 kernel.

Reverting commit 651e28c5537abb39076d3949fb7618536f1d242e (apparmor:
add base infastructure for socket mediation) causes the system to
function again.
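
One way to check the symptom outside dnsmasq is to try creating a local
socket and report the errno. The Python sketch below is only an
illustration under stated assumptions, not dnsmasq's code: the helper name
`try_local_socket` is hypothetical, and an unconfined process will normally
succeed, so an EACCES result would only be expected when the script runs
under an AppArmor profile that lacks a matching rule.

```python
import errno
import socket


def try_local_socket(sock_type=socket.SOCK_STREAM):
    """Try to create a PF_LOCAL (AF_UNIX) socket.

    Returns (True, None) on success, or (False, errno_name) on failure.
    The helper name is hypothetical; it is not part of dnsmasq or AppArmor.
    """
    try:
        s = socket.socket(socket.AF_UNIX, sock_type)
    except OSError as exc:
        # Map the numeric errno (e.g. 13) to its symbolic name (e.g. EACCES).
        return False, errno.errorcode.get(exc.errno, str(exc.errno))
    s.close()
    return True, None


if __name__ == "__main__":
    ok, err = try_local_socket()
    print("PF_LOCAL socket created" if ok else f"socket() failed: {err}")
```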

James

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: regression in 4.14-rc2 caused by apparmor: add base infastructure for socket mediation
  2017-10-03  4:02 regression in 4.14-rc2 caused by apparmor: add base infastructure for socket mediation James Bottomley
@ 2017-10-03  4:11 ` John Johansen
  2017-10-03  5:15   ` James Bottomley
  0 siblings, 1 reply; 19+ messages in thread
From: John Johansen @ 2017-10-03  4:11 UTC (permalink / raw)
  To: James Bottomley, Seth Arnold; +Cc: linux-kernel

On 10/02/2017 09:02 PM, James Bottomley wrote:
> The specific problem is that dnsmasq refuses to start on openSUSE Leap
> 42.2.  The specific cause is that an attempt to open a PF_LOCAL socket
> gets EACCES.  This means that networking doesn't function on a system
> running a 4.14-rc2 kernel.
> 
> Reverting commit 651e28c5537abb39076d3949fb7618536f1d242e (apparmor:
> add base infastructure for socket mediation) causes the system to
> function again.
> 

This is not a kernel regression; it is because openSUSE's dnsmasq is
starting with a policy that doesn't allow access to PF_LOCAL sockets.

Christian Boltz, the openSUSE AppArmor maintainer, has been working
on a policy update for openSUSE; see bug

https://bugzilla.opensuse.org/show_bug.cgi?id=1061195

* Re: regression in 4.14-rc2 caused by apparmor: add base infastructure for socket mediation
  2017-10-03  4:11 ` John Johansen
@ 2017-10-03  5:15   ` James Bottomley
  2017-10-03  6:32     ` John Johansen
  2017-10-03  6:48     ` Vlastimil Babka
  0 siblings, 2 replies; 19+ messages in thread
From: James Bottomley @ 2017-10-03  5:15 UTC (permalink / raw)
  To: John Johansen, Seth Arnold; +Cc: linux-kernel

On Mon, 2017-10-02 at 21:11 -0700, John Johansen wrote:
> On 10/02/2017 09:02 PM, James Bottomley wrote:
> > 
> > The specific problem is that dnsmasq refuses to start on openSUSE
> > Leap 42.2.  The specific cause is that an attempt to open a
> > PF_LOCAL socket gets EACCES.  This means that networking doesn't
> > function on a system running a 4.14-rc2 kernel.
> > 
> > Reverting commit 651e28c5537abb39076d3949fb7618536f1d242e
> > (apparmor: add base infastructure for socket mediation) causes the
> > system to function again.
> > 
> 
> This is not a kernel regression,

Regression means something that worked in a previous version of the
kernel which is broken now. This problem falls within that definition.

>  it is because openSUSE's dnsmasq is starting with a policy that
> doesn't allow access to PF_LOCAL sockets.

Because there was no co-ordination between their version of the patch
and yours.  If you're sending in patches that you know might break
systems because they need a co-ordinated rollout of something in
userspace then it would be nice if you could co-ordinate it ...

Doing it in the merge window and not in -rc2 would also be helpful
because I have more expectation of a userspace mismatch from stuff in
the merge window.

> Christian Boltz the opensuse apparmor maintainer has been working
> on a policy update for opensuse see bug
> 
> https://bugzilla.opensuse.org/show_bug.cgi?id=1061195

Well, that looks really encouraging: The line about "To give you an
impression what "lots of" means - I had to adjust 40 profiles on my
laptop".  The upshot is that, apart from a bandaid, openSUSE still has
no co-ordinated fix for this.

James

* Re: regression in 4.14-rc2 caused by apparmor: add base infastructure for socket mediation
  2017-10-03  5:15   ` James Bottomley
@ 2017-10-03  6:32     ` John Johansen
  2017-10-03  6:48     ` Vlastimil Babka
  1 sibling, 0 replies; 19+ messages in thread
From: John Johansen @ 2017-10-03  6:32 UTC (permalink / raw)
  To: James Bottomley, Seth Arnold; +Cc: linux-kernel

On 10/02/2017 10:15 PM, James Bottomley wrote:
> On Mon, 2017-10-02 at 21:11 -0700, John Johansen wrote:
>> On 10/02/2017 09:02 PM, James Bottomley wrote:
>>>
>>> The specific problem is that dnsmasq refuses to start on openSUSE
>>> Leap 42.2.  The specific cause is that an attempt to open a
>>> PF_LOCAL socket gets EACCES.  This means that networking doesn't
>>> function on a system running a 4.14-rc2 kernel.
>>>
>>> Reverting commit 651e28c5537abb39076d3949fb7618536f1d242e
>>> (apparmor: add base infastructure for socket mediation) causes the
>>> system to function again.
>>>
>>
>> This is not a kernel regression,
> 
> Regression means something that worked in a previous version of the
> kernel which is broken now. This problem falls within that definition.
> 

Sure, it's a regression for SUSE-based systems. It isn't, however, a
regression in the kernel code or interface. The kernel makes the
information available; it's a matter of how the userspace and policy
are configured.

It is entirely possible to use the 4.14 kernel on SUSE without having
to modify policy if the policy version/feature set is pinned. However,
this is not a feature that SUSE seems to be using. Instead, SUSE policy
is tracking and enforcing all kernel-supported features when they
become available, regardless of whether the policy has been updated.
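
The difference between tracking and pinning can be pictured with a toy
model. The Python sketch below is purely illustrative and is not how the
AppArmor userspace tools are implemented: the feature names, both helper
functions, and the set-based representation are hypothetical. It assumes
that a class of operation is denied when its feature is enforced and the
profile has no rule for it.

```python
def enforced_features(kernel_features, pinned_features=None):
    """Features that policy is compiled against and enforced (toy model).

    Tracking (pinned_features is None): everything the kernel advertises.
    Pinned: only features both advertised by the kernel and in the pinned set.
    """
    if pinned_features is None:
        return set(kernel_features)
    return set(kernel_features) & set(pinned_features)


def denied(profile_rules, kernel_features, pinned_features=None):
    """Enforced feature classes for which the profile has no rule."""
    return enforced_features(kernel_features, pinned_features) - set(profile_rules)


# Hypothetical feature names, for illustration only.
old_kernel = {"file", "capability"}
new_kernel = {"file", "capability", "network"}
profile = {"file", "capability"}  # written before socket mediation existed

print(denied(profile, old_kernel))                              # set()
print(denied(profile, new_kernel))                              # {'network'}
print(denied(profile, new_kernel, pinned_features=old_kernel))  # set()
```

In this model the unchanged profile keeps working on the newer kernel only
when the feature set is pinned to what the profile was written for.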

This makes sense for a policy developer's machine, not so much for a
general user. I will have to discuss this with Christian and Goldwyn.


>>  it is because openSUSE's dnsmasq is starting with a policy that
>> doesn't allow access to PF_LOCAL sockets.
> 
> Because there was no co-ordination between their version of the patch
> and yours.  If you're sending in patches that you know might break
> systems because they need a co-ordinated rollout of something in
> userspace then it would be nice if you could co-ordinate it ...
>

This information was communicated more than once. That is not to say
there were no issues with the landing, or else you would not have seen
this. In fact, I would say this particular sync was handled poorly, and
we as an upstream certainly have to take some of the blame for it.

The userspace that supported the 4.14 kernel changes landed long
ago. It was specific policy updates that were missing.

Ideally your policy would have been pinned to a specific kernel
feature set, so that kernel changes would not have resulted in policy
issues.

> Doing it in the merge window and not in -rc2 would also be helpful
> because I have more expectation of a userspace mismatch from stuff in
> the merge window.
> 

Certainly, and this would have landed during the merge window except
for an issue with the security tree. This particular series lived in
-next for several weeks before landing, and I would never have asked
for it to be pulled as late as it was except for the issue around the
security tree this last cycle.

>> Christian Boltz the opensuse apparmor maintainer has been working
>> on a policy update for opensuse see bug
>>
>> https://bugzilla.opensuse.org/show_bug.cgi?id=1061195
> 
> Well, that looks really encouraging: The line about "To give you an
> impression what "lots of" means - I had to adjust 40 profiles on my
> laptop".  The upshot being apart from a bandaid, openSUSE still has no
> co-ordinated fix for this.
> 

Yes, it is a change that affects policy; the same can be said for any
other MAC system when new mediation is added. It can be fixed either by
configuring the feature set/version that policy is targeting or by
updating policy.

As for policy changes, this particular change can mostly be fixed by an
adjustment to the abstractions. The bandaid referenced has to do with
Christian choosing to use only what is supported in 4.14 instead of
the upstream solution, which contains rules for work targeted beyond
4.14, even though userspace supports those rules already and will
compile them to a policy that works in 4.14.

However, Christian wants to update the SUSE policy using the 4.14
kernel because he does not feel that he can properly verify the
upstream policy changes on SUSE with 4.14. This is an understandable
stance for him to take, but it does mean there is some disconnect
between what is in the upstream AppArmor project and what is in SUSE.

Regardless, this is a change that you shouldn't have noticed, so it's
obvious the coordination was off and needs to be improved.

* Re: regression in 4.14-rc2 caused by apparmor: add base infastructure for socket mediation
  2017-10-03  5:15   ` James Bottomley
  2017-10-03  6:32     ` John Johansen
@ 2017-10-03  6:48     ` Vlastimil Babka
  2017-10-03  7:17       ` John Johansen
  1 sibling, 1 reply; 19+ messages in thread
From: Vlastimil Babka @ 2017-10-03  6:48 UTC (permalink / raw)
  To: James Bottomley, John Johansen, Seth Arnold; +Cc: linux-kernel

On 10/03/2017 07:15 AM, James Bottomley wrote:
> On Mon, 2017-10-02 at 21:11 -0700, John Johansen wrote:
>> On 10/02/2017 09:02 PM, James Bottomley wrote:
>>>
>>> The specific problem is that dnsmasq refuses to start on openSUSE
>>> Leap 42.2.  The specific cause is that an attempt to open a
>>> PF_LOCAL socket gets EACCES.  This means that networking doesn't
>>> function on a system running a 4.14-rc2 kernel.
>>>
>>> Reverting commit 651e28c5537abb39076d3949fb7618536f1d242e
>>> (apparmor: add base infastructure for socket mediation) causes the
>>> system to function again.
>>>
>>
>> This is not a kernel regression,
> 
> Regression means something that worked in a previous version of the
> kernel which is broken now. This problem falls within that definition.

Hm, but if this was because the openSUSE kernel and AppArmor rules relied
on an out-of-tree patch, then it's not an upstream regression?

>>  it is because openSUSE's dnsmasq is starting with a policy that
>> doesn't allow access to PF_LOCAL sockets.
> 
> Because there was no co-ordination between their version of the patch
> and yours.  If you're sending in patches that you know might break
> systems because they need a co-ordinated rollout of something in
> userspace then it would be nice if you could co-ordinate it ...
> 
> Doing it in the merge window and not in -rc2 would also be helpful
> because I have more expectation of a userspace mismatch from stuff in
> the merge window.

Agreed, but with rc2 there's still plenty of time, and running rcX means
some issues can be expected...

>> Christian Boltz the opensuse apparmor maintainer has been working
>> on a policy update for opensuse see bug
>>
>> https://bugzilla.opensuse.org/show_bug.cgi?id=1061195
> 
> Well, that looks really encouraging: The line about "To give you an
> impression what "lots of" means - I had to adjust 40 profiles on my
> laptop".  The upshot being apart from a bandaid, openSUSE still has no
> co-ordinated fix for this.

Note that the openSUSE Leap 42.2 kernel is 4.4, so running 4.14 means
you are unsupported from the distro POV, and you can't expect that the
42.2 AppArmor profiles will ever be updated. I reported the bug above
for the Tumbleweed rolling distro, which gets new kernels after the
final version is released and passes QA. rcX kernels are packaged for
testing, but you have to add the repo explicitly. So there's still
enough time to co-ordinate a fix of the profiles with the final 4.14,
even for Tumbleweed.

> James
> 

* Re: regression in 4.14-rc2 caused by apparmor: add base infastructure for socket mediation
  2017-10-03  6:48     ` Vlastimil Babka
@ 2017-10-03  7:17       ` John Johansen
  2017-10-24  6:39         ` Thorsten Leemhuis
  0 siblings, 1 reply; 19+ messages in thread
From: John Johansen @ 2017-10-03  7:17 UTC (permalink / raw)
  To: Vlastimil Babka, James Bottomley, Seth Arnold; +Cc: linux-kernel

On 10/02/2017 11:48 PM, Vlastimil Babka wrote:
> On 10/03/2017 07:15 AM, James Bottomley wrote:
>> On Mon, 2017-10-02 at 21:11 -0700, John Johansen wrote:
>>> On 10/02/2017 09:02 PM, James Bottomley wrote:
>>>>
>>>> The specific problem is that dnsmasq refuses to start on openSUSE
>>>> Leap 42.2.  The specific cause is that an attempt to open a
>>>> PF_LOCAL socket gets EACCES.  This means that networking doesn't
>>>> function on a system running a 4.14-rc2 kernel.
>>>>
>>>> Reverting commit 651e28c5537abb39076d3949fb7618536f1d242e
>>>> (apparmor: add base infastructure for socket mediation) causes the
>>>> system to function again.
>>>>
>>>
>>> This is not a kernel regression,
>>
>> Regression means something that worked in a previous version of the
>> kernel which is broken now. This problem falls within that definition.
> 
> Hm, but if this was because opensuse kernel and apparmor rules relied on
> an out-of-tree patch, then it's not an upstream regression?
> 

While it's true that previous openSUSE kernels were relying on an
out-of-tree patch for doing mediation in this area, the real issue is
that the userspace on the system is set up to enforce new policy
features advertised by the kernel, regardless of whether policy has
been updated to deal with them.
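
The features a kernel advertises can be inspected from userspace. The
Python sketch below assumes securityfs is mounted at /sys/kernel/security
and that AppArmor exposes its feature tree under apparmor/features there;
the helper name `list_features` is hypothetical, and reading the tree may
require root. It only lists what is advertised and changes no
configuration.

```python
import os

# Assumed location; adjust if securityfs is mounted elsewhere.
FEATURES_ROOT = "/sys/kernel/security/apparmor/features"


def list_features(root=FEATURES_ROOT):
    """Return sorted relative paths of all files under the feature tree.

    Returns an empty list if the directory does not exist or is unreadable,
    because os.walk() ignores directory-listing errors by default.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(found)


if __name__ == "__main__":
    for entry in list_features():
        print(entry)
```

Comparing the output before and after a kernel upgrade would show which
newly advertised features a tracking userspace would start enforcing.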

Distros should be pinning the supported feature set because, as you
note below, policy will not get updated for unsupported kernels and you
will end up in an unsupported state where regressions like this can
happen.

There are reasons why distros don't, largely because certain packages
would like to take advantage of new features, or only want to support
a single policy version across multiple releases and are relying on
the userspace tools to properly compile the policy to different
kernels.

The current pinning support doesn't allow for mixing policy versions,
which can make supporting updated packages difficult at the moment, but
there is work (that hasn't landed yet) to allow for policy of different
versions by putting the requirements within the individual profiles,
which will completely avoid the problems encountered here.


>>>  it is because openSUSE's dnsmasq is starting with a policy that
>>> doesn't allow access to PF_LOCAL sockets.
>>
>> Because there was no co-ordination between their version of the patch
>> and yours.  If you're sending in patches that you know might break
>> systems because they need a co-ordinated rollout of something in
>> userspace then it would be nice if you could co-ordinate it ...
>>
>> Doing it in the merge window and not in -rc2 would also be helpful
>> because I have more expectation of a userspace mismatch from stuff in
>> the merge window.
> 
> Agree, but with rc2 there's still plenty of time, and running rcX means
> some issues can be expected...
> 
>>> Christian Boltz the opensuse apparmor maintainer has been working
>>> on a policy update for opensuse see bug
>>>
>>> https://bugzilla.opensuse.org/show_bug.cgi?id=1061195
>>
>> Well, that looks really encouraging: The line about "To give you an
>> impression what "lots of" means - I had to adjust 40 profiles on my
>> laptop".  The upshot being apart from a bandaid, openSUSE still has no
>> co-ordinated fix for this.
> 
> Note that the openSUSE Leap 42.2 kernel is 4.4, so by running 4.14 means
> you are unsupported from the distro POV and you can't expect that the
> 42.2 apparmor profiles will ever be updated. I reported the bug above
> for the Tumbleweed rolling distro, which gets new kernels after the
> final version is released and passes QA. rcX kernels are packaged for
> testing, but you have to add the repo explicitly. So there's still
> enough time to co-ordinate fix of profiles and final 4.14 even for
> Tumbleweed.
> 
>> James
>>
> 

* Re: regression in 4.14-rc2 caused by apparmor: add base infastructure for socket mediation
  2017-10-03  7:17       ` John Johansen
@ 2017-10-24  6:39         ` Thorsten Leemhuis
  2017-10-24 11:03           ` James Bottomley
  2017-10-24 11:31           ` John Johansen
  0 siblings, 2 replies; 19+ messages in thread
From: Thorsten Leemhuis @ 2017-10-24  6:39 UTC (permalink / raw)
  To: John Johansen, Vlastimil Babka, James Bottomley, Seth Arnold; +Cc: linux-kernel

Lo, your friendly regression tracker here!

On 03.10.2017 09:17, John Johansen wrote:
> On 10/02/2017 11:48 PM, Vlastimil Babka wrote:
>> On 10/03/2017 07:15 AM, James Bottomley wrote:
>>> On Mon, 2017-10-02 at 21:11 -0700, John Johansen wrote:
>>>> On 10/02/2017 09:02 PM, James Bottomley wrote:
>>>>>
>>>>> The specific problem is that dnsmasq refuses to start on openSUSE
>>>>> Leap 42.2.  The specific cause is that an attempt to open a
>>>>> PF_LOCAL socket gets EACCES.  This means that networking doesn't
>>>>> function on a system running a 4.14-rc2 kernel.
>>>>> Reverting commit 651e28c5537abb39076d3949fb7618536f1d242e
>>>>> (apparmor: add base infastructure for socket mediation) causes the
>>>>> system to function again.
>>>> This is not a kernel regression,
>>> Regression means something that worked in a previous version of the
>>> kernel which is broken now. This problem falls within that definition.
>> Hm, but if this was because opensuse kernel and apparmor rules relied on
>> an out-of-tree patch, then it's not an upstream regression?
> While it's true that previous openSUSE kernels were relying on an
> out-of-tree patch for doing mediation in this area, the real issue is
> that the userspace on the system is set up to enforce new policy
> features advertised by the kernel, regardless of whether policy has
> been updated to deal with them.

Did anything come out of this discussion? I checked LKML and recent
commits, but may have missed it if anything happened. It seems this
problem annoys quite a few people on various distros. It turned out
that one of the regressions in my last regression report seemed to be
due to the changes in AppArmor. See:

https://bugzilla.kernel.org/show_bug.cgi?id=197137#7

That comment links to two bugs filed for Debian and Ubuntu:
https://bugs.launchpad.net/ubuntu/+source/apparmor/+bug/1724450
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=877581

The stuff even made the news:
https://www.phoronix.com/scan.php?page=news_item&px=AppArmor-Linux-4.14

It's obviously up to Linus to decide in the end, but from my
understanding of the whole "no regressions" rule this looks quite a lot
like a regression to me.

Ciao, Thorsten


* Re: regression in 4.14-rc2 caused by apparmor: add base infastructure for socket mediation
  2017-10-24  6:39         ` Thorsten Leemhuis
@ 2017-10-24 11:03           ` James Bottomley
  2017-10-24 11:57             ` John Johansen
  2017-10-24 15:19             ` Vlastimil Babka
  2017-10-24 11:31           ` John Johansen
  1 sibling, 2 replies; 19+ messages in thread
From: James Bottomley @ 2017-10-24 11:03 UTC (permalink / raw)
  To: Thorsten Leemhuis, John Johansen, Vlastimil Babka, Seth Arnold
  Cc: linux-kernel

On Tue, 2017-10-24 at 08:39 +0200, Thorsten Leemhuis wrote:
> Lo, your friendly regression tracker here!
> 
> On 03.10.2017 09:17, John Johansen wrote:
> > 
> > On 10/02/2017 11:48 PM, Vlastimil Babka wrote:
> > > 
> > > On 10/03/2017 07:15 AM, James Bottomley wrote:
> > > > 
> > > > On Mon, 2017-10-02 at 21:11 -0700, John Johansen wrote:
> > > > > 
> > > > > On 10/02/2017 09:02 PM, James Bottomley wrote:
> > > > > > 
> > > > > > 
> > > > > > The specific problem is that dnsmasq refuses to start on
> > > > > > > openSUSE Leap 42.2.  The specific cause is that an attempt
> > > > > > > to open a PF_LOCAL socket gets EACCES.  This means that
> > > > > > > networking doesn't function on a system running a 4.14-rc2
> > > > > > > kernel. Reverting commit
> > > > > > 651e28c5537abb39076d3949fb7618536f1d242e
> > > > > > (apparmor: add base infastructure for socket mediation)
> > > > > > causes the system to function again.
> > > > > This is not a kernel regression,
> > > > Regression means something that worked in a previous version of
> > > > the kernel which is broken now. This problem falls within that
> > > > definition.
> > > Hm, but if this was because opensuse kernel and apparmor rules
> > > relied on an out-of-tree patch, then it's not an upstream
> > > regression?
> > While it's true that previous openSUSE kernels were relying on an
> > out-of-tree patch for doing mediation in this area, the real issue
> > is that the userspace on the system is set up to enforce new policy
> > features advertised by the kernel, regardless of whether policy has
> > been updated to deal with them.
> 
> Did anything come out of this discussion?

Not really, no.  I've got the patch reverted locally, so it's not
causing *me* problems anymore.

>  I checked LKML and recent commits, but may have missed it if anything
> happened. It seems this problem annoys quite a few people on various
> distros. It turned out that one of the regressions in my last
> regression report seemed to be due to the changes in AppArmor. See:
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=197137#7
> 
> That commit links to two bugs filed for Debian and Ubuntu:
> https://bugs.launchpad.net/ubuntu/+source/apparmor/+bug/1724450
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=877581
> 
> The stuff even made the news:
> https://www.phoronix.com/scan.php?page=news_item&px=AppArmor-Linux-4.14
> 
> It's obviously up to Linus to decide in the end, but from my
> understanding of the whole "no regressions" rule this looks quite a
> lot like a regression to me.

It's certainly a lack of co-ordination between all the AppArmor-using
upstreams, yes.  I think of it as a regression because I have no way of
getting my system running again other than reverting the patch.

I'd also argue that treating this as a regression might possibly
encourage better co-ordination in future.

James

* Re: regression in 4.14-rc2 caused by apparmor: add base infastructure for socket mediation
  2017-10-24  6:39         ` Thorsten Leemhuis
  2017-10-24 11:03           ` James Bottomley
@ 2017-10-24 11:31           ` John Johansen
  2017-10-26  9:11             ` Thorsten Leemhuis
  1 sibling, 1 reply; 19+ messages in thread
From: John Johansen @ 2017-10-24 11:31 UTC (permalink / raw)
  To: Thorsten Leemhuis, Vlastimil Babka, James Bottomley, Seth Arnold
  Cc: linux-kernel

On 10/23/2017 11:39 PM, Thorsten Leemhuis wrote:
> Lo, your friendly regression tracker here!
> 
> On 03.10.2017 09:17, John Johansen wrote:
>> On 10/02/2017 11:48 PM, Vlastimil Babka wrote:
>>> On 10/03/2017 07:15 AM, James Bottomley wrote:
>>>> On Mon, 2017-10-02 at 21:11 -0700, John Johansen wrote:
>>>>> On 10/02/2017 09:02 PM, James Bottomley wrote:
>>>>>>
>>>>>> The specific problem is that dnsmasq refuses to start on openSUSE
>>>>>> Leap 42.2.  The specific cause is that an attempt to open a
>>>>>> PF_LOCAL socket gets EACCES.  This means that networking doesn't
>>>>>> function on a system running a 4.14-rc2 kernel.
>>>>>> Reverting commit 651e28c5537abb39076d3949fb7618536f1d242e
>>>>>> (apparmor: add base infastructure for socket mediation) causes the
>>>>>> system to function again.
>>>>> This is not a kernel regression,
>>>> Regression means something that worked in a previous version of the
>>>> kernel which is broken now. This problem falls within that definition.
>>> Hm, but if this was because opensuse kernel and apparmor rules relied on
>>> an out-of-tree patch, then it's not an upstream regression?
>> While it's true that previous openSUSE kernels were relying on an
>> out-of-tree patch for doing mediation in this area, the real issue is
>> that the userspace on the system is set up to enforce new policy
>> features advertised by the kernel, regardless of whether policy has
>> been updated to deal with them.
> 
> Did anything come out of this discussion? I checked LKML and recent
> commits, but may have missed it if anything happened. It seems this
> problem annoys quite a few people on various distros. It turned out
> that one of the regressions in my last regression report seemed to be
> due to the changes in AppArmor. See:
> 

Yes, there has been testing and discussion, and a regression was
found, just not the "regression" you are encountering. A fix for that
regression is in testing and I will send a pull request for it soon.

> https://bugzilla.kernel.org/show_bug.cgi?id=197137#7
> 

Yes, this is the same issue you have encountered.

> That commit links to two bugs filed for Debian and Ubuntu:
> https://bugs.launchpad.net/ubuntu/+source/apparmor/+bug/1724450

This is actually a different issue. Ubuntu hasn't SRUed the most
recent maintenance releases or even just cherry-picked a specific
patch into their userspace packaging.

> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=877581
> 

This is largely the same issue as Ubuntu's.

> The stuff even made the news:
> https://www.phoronix.com/scan.php?page=news_item&px=AppArmor-Linux-4.14
> 
> It's obviously up to Linus to decide in the end, but from my
> understanding of the whole "no regressions" rule this looks quite a lot
> like a regression to me.
> 

I understand your POV: it's breaking you, so it is a regression. However,
this is not a regression in the kernel nor in the AppArmor interfaces
between userspace and the kernel. It is a userspace configuration
issue.

Your userspace is set up to basically do policy development. At the
moment this is the default configuration that all distros are using;
however, the Debian maintainer is planning to use feature ABI pinning
for stable releases.

However, if you are doing things like using kernels that run ahead of
the distro's AppArmor policy, that also means you need to either do
some policy revision, pin the feature ABI (userspace configuration),
or disable AppArmor.


* Re: regression in 4.14-rc2 caused by apparmor: add base infastructure for socket mediation
  2017-10-24 11:03           ` James Bottomley
@ 2017-10-24 11:57             ` John Johansen
  2017-10-26 17:36               ` Linus Torvalds
  2017-10-24 15:19             ` Vlastimil Babka
  1 sibling, 1 reply; 19+ messages in thread
From: John Johansen @ 2017-10-24 11:57 UTC (permalink / raw)
  To: James Bottomley, Thorsten Leemhuis, Vlastimil Babka, Seth Arnold
  Cc: linux-kernel

On 10/24/2017 04:03 AM, James Bottomley wrote:
> On Tue, 2017-10-24 at 08:39 +0200, Thorsten Leemhuis wrote:
>> Lo, your friendly regression tracker here!
>>
>> On 03.10.2017 09:17, John Johansen wrote:
>>>
>>> On 10/02/2017 11:48 PM, Vlastimil Babka wrote:
>>>>
>>>> On 10/03/2017 07:15 AM, James Bottomley wrote:
>>>>>
>>>>> On Mon, 2017-10-02 at 21:11 -0700, John Johansen wrote:
>>>>>>
>>>>>> On 10/02/2017 09:02 PM, James Bottomley wrote:
>>>>>>>
>>>>>>>
>>>>>>> The specific problem is that dnsmasq refuses to start on
>>>>>>> openSUSE Leap 42.2.  The specific cause is that an attempt
>>>>>>> to open a PF_LOCAL socket gets EACCES.  This means that
>>>>>>> networking doesn't function on a system with a 4.14-rc2
>>>>>>> system. Reverting commit
>>>>>>> 651e28c5537abb39076d3949fb7618536f1d242e
>>>>>>> (apparmor: add base infastructure for socket mediation)
>>>>>>> causes the system to function again.
>>>>>> This is not a kernel regression,
>>>>> Regression means something that worked in a previous version of
>>>>> the kernel which is broken now. This problem falls within that
>>>>> definition.
>>>> Hm, but if this was because opensuse kernel and apparmor rules
>>>> relied on an out-of-tree patch, then it's not an upstream
>>>> regression?
>>> While it's true that previous opensuse kernels were relying on an
>>> out of tree patch for doing mediation in this area, the real issue
>>> is that the configuration of the userspace on the system is set up to
>>> enforce new policy features advertised by the kernel, regardless of
>>> whether policy has been updated to deal with it.
>>
>> Did anything come out of this discussion?
> 
> Not really, no.  I've got the patch reverted locally, so it's not
> causing *me* problems anymore.
> 

Actually, a lot of work and testing has been done. A regression was
found, the fix is in testing and it should land soon, but it's not the
regression you are having issues with.

>>  I checked LKML and recent commits, but missed if anything happened.
>> But it seems this problem annoys quite a few people on various
>> distros. It turned out one of the regressions in my last
>> regression report seemed to be due to the changes in apparmor. See:
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=197137#7
>>
>> That commit links to two bugs filed for Debian and Ubuntu:
>> https://bugs.launchpad.net/ubuntu/+source/apparmor/+bug/1724450
>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=877581
>>
>> The stuff even made the news:
>> https://www.phoronix.com/scan.php?page=news_item&px=AppArmor-Linux-4.14
>>
>> It's obviously up to Linus to decide in the end, but from my understanding
>> of the whole "no regressions" rule this looks quite a lot like a
>> regression to me.
> 
> It's certainly a lack of co-ordination between all the apparmor-using
> upstreams, yes.  I think of it as a regression because I have no way
> other than reverting the patch of getting my system running again.
> 

It's not a lack of coordination. Regardless of coordination, this would
have happened for people front-running the SUSE distro kernel, due to
how the apparmor userspace is configured. Could it have been handled
better for kernel devs? Certainly, but the issue you are encountering
is a userspace configuration issue.

Distros get to choose how apparmor is configured, and what certain
defaults are. The suse default configuration is to not pin the
features abi (there are pros and cons depending on what you want),
which essentially means if you are front running the distro kernel and
policy with an upstream kernel you are a policy dev, and you are going
to be hit with stuff like this.

You have a couple of userspace options available to you, and a couple of
kernel options, without having to revert the patch.

- You can update your policy.

- You can pin the feature set abi to that of the distro kernel, and
  what your policy was developed for.

- You can disable apparmor in grub

- You can disable apparmor in the kernel config
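
The difference between pinned and unpinned feature sets can be
illustrated with a toy model. This is a minimal sketch under stated
assumptions, not AppArmor's actual implementation: the function names,
the feature strings ("file", "network") and the sample profile are all
hypothetical, chosen only to show why a profile written for an older
kernel starts denying sockets once a newer kernel advertises socket
mediation and the feature set is not pinned.

```python
# Toy model only: names and feature strings are illustrative assumptions,
# not AppArmor's real API or feature identifiers.

def effective_features(kernel_features, pinned_features=None):
    """Return the feature set that policy is compiled and enforced against.

    Unpinned: everything the running kernel advertises.
    Pinned: only features that are both advertised and in the pinned set.
    """
    if pinned_features is None:
        return set(kernel_features)
    return set(kernel_features) & set(pinned_features)


def socket_allowed(profile_rules, features, family):
    """Decide whether a confined task may create a socket of `family`."""
    if "network" not in features:
        # Socket mediation is not in effect, so the profile is not consulted.
        return True
    return ("network", family) in profile_rules


# A hypothetical profile written for the older kernel: no network rules.
old_profile = {("file", "/etc/dnsmasq.conf")}
distro_kernel = {"file"}
new_kernel = {"file", "network"}

# Older kernel: sockets are not mediated, so creation succeeds.
print(socket_allowed(old_profile, effective_features(distro_kernel), "unix"))
# Newer kernel, unpinned: mediation is enforced and the old profile denies.
print(socket_allowed(old_profile, effective_features(new_kernel), "unix"))
# Newer kernel, pinned to the older feature set: behaviour is unchanged.
print(socket_allowed(old_profile, effective_features(new_kernel, distro_kernel), "unix"))
```

In this model the three calls print True, False and True, which mirrors
the trade-off described above: pinning keeps old policy working on a new
kernel, at the cost of not using the newly advertised mediation.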

I have been discussing pinning the feature abi with distro
maintainers (the ability has been around for years) and the Debian
maintainer has been planning, for a while now, to use it by default
for stable releases.  However, a similar decision has not been made for
other distros yet.

Long term we are working on a more flexible solution, that won't
require having to choose to pin the features abi but it is not
available yet.

> I'd also argue that treating this as a regression might possibly
> encourage better co-ordination in future.
> 
> James
> 


* Re: regression in 4.14-rc2 caused by apparmor: add base infastructure for socket mediation
  2017-10-24 11:03           ` James Bottomley
  2017-10-24 11:57             ` John Johansen
@ 2017-10-24 15:19             ` Vlastimil Babka
  1 sibling, 0 replies; 19+ messages in thread
From: Vlastimil Babka @ 2017-10-24 15:19 UTC (permalink / raw)
  To: James Bottomley, Thorsten Leemhuis, John Johansen, Seth Arnold
  Cc: linux-kernel

On 10/24/2017 01:03 PM, James Bottomley wrote:
> On Tue, 2017-10-24 at 08:39 +0200, Thorsten Leemhuis wrote:
>> Lo, your friendly regression tracker here!
>>
>> On 03.10.2017 09:17, John Johansen wrote:
>>>
>>> On 10/02/2017 11:48 PM, Vlastimil Babka wrote:
>>>>
>>>> On 10/03/2017 07:15 AM, James Bottomley wrote:
>>>> Hm, but if this was because opensuse kernel and apparmor rules
>>>> relied on an out-of-tree patch, then it's not an upstream
>>>> regression?
>>> While it's true that previous opensuse kernels were relying on an
>>> out of tree patch for doing mediation in this area, the real issue
>>> is that the configuration of the userspace on the system is set up to
>>> enforce new policy features advertised by the kernel, regardless of
>>> whether policy has been updated to deal with it.
>>
>> Did anything come out of this discussion?
> 
> Not really, no.  I've got the patch reverted locally, so it's not
> causing *me* problems anymore.

openSUSE Tumbleweed has already been fixed, and IIRC the maintainer also
said that Leap would be fixed as well, even though 4.14 is not an official
kernel. You could check whether the apparmor-profiles package has been
updated yet on your version...


* Re: regression in 4.14-rc2 caused by apparmor: add base infastructure for socket mediation
  2017-10-24 11:31           ` John Johansen
@ 2017-10-26  9:11             ` Thorsten Leemhuis
  2017-10-26 18:13               ` Linus Torvalds
  0 siblings, 1 reply; 19+ messages in thread
From: Thorsten Leemhuis @ 2017-10-26  9:11 UTC (permalink / raw)
  To: John Johansen, Vlastimil Babka, James Bottomley, Seth Arnold
  Cc: linux-kernel, Linus Torvalds

On 24.10.2017 13:31, John Johansen wrote:
> On 10/23/2017 11:39 PM, Thorsten Leemhuis wrote:
>> Lo, your friendly regression tracker here!
>> On 03.10.2017 09:17, John Johansen wrote:
>>> On 10/02/2017 11:48 PM, Vlastimil Babka wrote:
>>>> On 10/03/2017 07:15 AM, James Bottomley wrote:
>>>>> On Mon, 2017-10-02 at 21:11 -0700, John Johansen wrote:
>>>>>> On 10/02/2017 09:02 PM, James Bottomley wrote:
>>>>>>>
>>>>>>> The specific problem is that dnsmasq refuses to start on openSUSE
>>>>>>> Leap 42.2.  The specific cause is that an attempt to open a
>>>>>>> PF_LOCAL socket gets EACCES.  This means that networking doesn't
>>>>>>> function on a system with a 4.14-rc2 system.
>>>>>>> Reverting commit 651e28c5537abb39076d3949fb7618536f1d242e
>>>>>>> (apparmor: add base infastructure for socket mediation) causes the
>>>>>>> system to function again.
>>>>>> This is not a kernel regression,
>>>>> Regression means something that worked in a previous version of the
>>>>> kernel which is broken now. This problem falls within that definition.
>>>> Hm, but if this was because opensuse kernel and apparmor rules relied on
>>>> an out-of-tree patch, then it's not an upstream regression?
>>> While it's true that previous opensuse kernels were relying on an out
>>> of tree patch for doing mediation in this area, the real issue is that the
>>> configuration of the userspace on the system is set up to enforce new
>>> policy features advertised by the kernel, regardless of whether policy
>>> has been updated to deal with it.
>> Did anything come out of this discussion? I checked LKML and recent
>> commits, but missed if anything happened. But it seems this problem
>> annoys quite a few people on various distros. It turned out one of
>> the regressions in my last regression report seemed to be due to the
>> changes in apparmor. See:
> 
> yes, there has been testing and discussions, and a regression was
> found just not the "regression" you are encountering. A fix for that
> regression is in testing and I will send a pull request for it soon.

Just out of curiosity: any pointer to the discussion or the fix?

>> https://bugzilla.kernel.org/show_bug.cgi?id=197137#7
> yes, this is the same issue you have encountered

FWIW: I didn't encounter any of this, I'm just doing regression tracking
and have now hit the point where I need to escalate the issue to Linus...

>> That commit links to two bugs filed for Debian and Ubuntu:
>> https://bugs.launchpad.net/ubuntu/+source/apparmor/+bug/1724450
> this is actually a different issue. Ubuntu hasn't SRUed the most
> recent maintenance releases or even just cherry-picked a specific
> patch into their userspace packaging.
> 
>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=877581
> this is largely the same issue as ubuntu.

Well, afaics it boils down to this: things stop working on two (or even
three?) mainstream distros if their users update to 4.14 without updating
their userland. Is that still the case after the fix you mentioned gets
merged? Then this definitely is a regression.
>> The stuff even made the news:
>> https://www.phoronix.com/scan.php?page=news_item&px=AppArmor-Linux-4.14
>> It's obviously up to Linus to decide in the end, but from my understanding of
>> the whole "no regressions" rule this looks quite a lot like a regression
>> to me.
> I understand your pov, it's breaking you so it is a regression. However
> this is not a regression in the kernel nor the apparmor interfaces
> between userspace and the kernel. It is a userspace configuration
> issue.
>
> It is a userspace configuration issue. Your userspace is set up to
> basically do policy development. Atm this is the default configuration
> that all distros are using, however the debian maintainer is planning
> to use feature abi pinning for stable releases.
> 
> However if you are doing things like using kernels that run ahead of
> the distro's apparmor policy, that also means you need to either do
> some policy revision, pin the feature abi (userspace configuration),
> or disable apparmor.

All that afaics doesn't matter. If a new kernel breaks things for people
(that especially includes people that do *not* update their userland)
then it's a kernel regression, even if the root of the problem is in
userland. Linus (CCed) has said that often enough (I really should sit down
and collect his mails on this from the web and put them in one
document). For example, he recently said in
https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2017-August/004746.html
that people should "feel safe in always upgrading to any higher
version". And that's not the case afaics -- or am I missing something?
See also this discussion, where the problem was quite similar iirc:
https://lkml.org/lkml/2012/12/23/75 "If a change results in user
programs breaking, it's a bug in the kernel. We never EVER blame the
user programs. How hard can this be to understand? […]"

Ciao, Thorsten

>>> Distros should be pinning the feature set supported because as you
>>> note below, policy will not get updated for unsupported kernels and you
>>> will end up in an unsupported state where regressions like this can
>>> happen.
>>>
>>> There are reasons why distros don't, largely because certain packages
>>> would like to take advantage of new features, or only want to support
>>> a single policy version across multiple releases and are relying on
>>> the userspace tools to properly compile the policy to different
>>> kernels.
>>>
>>> The current pinning support doesn't allow for mixing policy versions
>>> which can make supporting updated packages difficult atm, but there is
>>> work (that hasn't landed yet) to allow for policy of different version
>>> by putting the requirements within the individual profiles and will
>>> completely avoid the problems encountered here.
>>>
>>>
>>>>>>  it is because opensuse dnsmasq is starting with policy that
>>>>>> doesn't allow access to PF_LOCAL socket
>>>>>
>>>>> Because there was no co-ordination between their version of the patch
>>>>> and yours.  If you're sending in patches that you know might break
>>>>> systems because they need a co-ordinated rollout of something in
>>>>> userspace then it would be nice if you could co-ordinate it ...
>>>>>
>>>>> Doing it in the merge window and not in -rc2 would also be helpful
>>>>> because I have more expectation of a userspace mismatch from stuff in
>>>>> the merge window.
>>>>
>>>> Agree, but with rc2 there's still plenty of time, and running rcX means
>>>> some issues can be expected...
>>>>
>>>>>> Christian Boltz the opensuse apparmor maintainer has been working
>>>>>> on a policy update for opensuse see bug
>>>>>>
>>>>>> https://bugzilla.opensuse.org/show_bug.cgi?id=1061195
>>>>>
>>>>> Well, that looks really encouraging: The line about "To give you an
>>>>> impression what "lots of" means - I had to adjust 40 profiles on my
>>>>> laptop".  The upshot being apart from a bandaid, openSUSE still has no
>>>>> co-ordinated fix for this.
>>>>
>>>> Note that the openSUSE Leap 42.2 kernel is 4.4, so running 4.14 means
>>>> you are unsupported from the distro POV and you can't expect that the
>>>> 42.2 apparmor profiles will ever be updated. I reported the bug above
>>>> for the Tumbleweed rolling distro, which gets new kernels after the
>>>> final version is released and passes QA. rcX kernels are packaged for
>>>> testing, but you have to add the repo explicitly. So there's still
>>>> enough time to co-ordinate fix of profiles and final 4.14 even for
>>>> Tumbleweed.
>>>>> James


* Re: regression in 4.14-rc2 caused by apparmor: add base infastructure for socket mediation
  2017-10-24 11:57             ` John Johansen
@ 2017-10-26 17:36               ` Linus Torvalds
  2017-10-26 18:54                 ` James Morris
  2017-10-26 19:59                 ` John Johansen
  0 siblings, 2 replies; 19+ messages in thread
From: Linus Torvalds @ 2017-10-26 17:36 UTC (permalink / raw)
  To: John Johansen
  Cc: James Bottomley, Thorsten Leemhuis, Vlastimil Babka, Seth Arnold,
	linux-kernel

On Tue, Oct 24, 2017 at 1:57 PM, John Johansen
<john.johansen@canonical.com> wrote:
>
> actually a lot of work and testing has been done. A regression was
> found, the fix is in testing and it should land soon, but it's not the
> regression you are having issues with.

Stop this f*cking idiocy already!

As far as the kernel is concerned, a regression is THE KERNEL NOT
GIVING THE SAME END RESULT WITH THE SAME USER SPACE.
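
That definition lends itself to a direct check: run the same user space
operation on both kernels and compare the outcome. The following is a
minimal sketch, not code from this thread. It assumes a Linux system;
the helper names are hypothetical. The probe repeats the operation from
the original report (opening a PF_LOCAL, i.e. AF_UNIX, socket), and it
would only show the reported EACCES when run under a confining profile.

```python
import errno
import socket


def probe_local_socket():
    """Try to create a PF_LOCAL (AF_UNIX) stream socket.

    Returns "ok" on success, otherwise the symbolic errno name
    (for example "EACCES").
    """
    try:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    sock.close()
    return "ok"


def is_regression(result_on_old_kernel, result_on_new_kernel):
    """Same user space, different kernel: success turning into failure."""
    return result_on_old_kernel == "ok" and result_on_new_kernel != "ok"


# Result on the currently running kernel.
print(probe_local_socket())
# The case reported in this thread: worked before, EACCES afterwards.
print(is_regression("ok", "EACCES"))
```

The comparison deliberately ignores why the result changed; under this
definition only the end result seen by unchanged user space counts.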

The regression was in the kernel. You trying to shift the regression
somewhere else is bogus SHIT.

And seriously, it's the kind of garbage that makes me think your
opinion and your code cannot be relied on.

If you are not willing to admit that your commit 651e28c5537a
("apparmor: add base infastructure for socket mediation") caused a
regression, then honestly, I don't want to get commits from you.

It's that simple.

I'm *very* unhappy with the security layer as is, the last thing I
want to see is some security layer developer that then goes on to try
to re-define what regression means.

If you break existing user space setups THAT IS A REGRESSION.

It's not ok to say "but we'll fix the user space setup".

Really. NOT OK.

I think I will have to revert that garbage, for the simple reason that
I refuse to have code in the kernel from maintainers that cannot even
understand the first rule of kernel development.

The first rule is:

 - we don't cause regressions

and the corollary is that when regressions *do* occur, we admit to
them and fix them, instead of blaming user space.

The fact that you have apparently been denying the regression now for
three weeks means that I will revert, and I will stop pulling apparmor
requests until the people involved understand how kernel development
is done.

               Linus


* Re: regression in 4.14-rc2 caused by apparmor: add base infastructure for socket mediation
  2017-10-26  9:11             ` Thorsten Leemhuis
@ 2017-10-26 18:13               ` Linus Torvalds
  0 siblings, 0 replies; 19+ messages in thread
From: Linus Torvalds @ 2017-10-26 18:13 UTC (permalink / raw)
  To: Thorsten Leemhuis
  Cc: John Johansen, Vlastimil Babka, James Bottomley, Seth Arnold,
	linux-kernel

On Thu, Oct 26, 2017 at 11:11 AM, Thorsten Leemhuis
<regressions@leemhuis.info> wrote:
>
> All that afaics doesn't matter. If a new kernel breaks things for people
> (that especially includes people that do *not* update their userland)
> then it's a kernel regression, even if the root of the problem is in
> userland. Linus (CCed) said that often enough (I really should sit down
> and collect his mails on this from the web and put them in one
> document).

Thorsten is very much correct.

People should basically always feel like they can update their kernel
and simply not have to worry about it.

I refuse to introduce "you can only update the kernel if you also
update that other program" kind of limitations. If the kernel used to
work for you, the rule is that it continues to work for you.

There have been exceptions, but they are few and far between, and they
generally have some major and fundamental reasons for having happened,
that were basically entirely unavoidable, and people _tried_hard_ to
avoid them. Maybe we can't practically support the hardware any more
after it is decades old and nobody uses it with modern kernels any
more. Maybe there's a serious security issue with how we did things,
and people actually depended on that fundamentally broken model. Maybe
there was some fundamental other breakage that just _had_ to have a
flag day for very core and fundamental reasons.

And notice that this is very much about *breaking* people's environments.

Behavioral changes happen, and maybe we don't even support some
feature any more. There's a number of fields in /proc/<pid>/stat that
are printed out as zeroes, simply because they don't even *exist* in
the kernel any more, or because showing them was a mistake (typically
an information leak). But the numbers got replaced by zeroes, so that
the code that used to parse the fields still works. The user might not
see everything they used to see, and so behavior is clearly different,
but things still _work_, even if they might no longer show sensitive
(or no longer relevant) information.
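
A small parser shows why zeroing a field is compatible while removing it
would not be. This is a minimal sketch, assuming the usual layout of a
/proc/<pid>/stat line (pid, then the command name in parentheses, then
space-separated positional fields). The sample lines are hypothetical
and shortened to a few fields; they are not real kernel output.

```python
def parse_stat(line):
    """Split a /proc/<pid>/stat line into pid, comm and positional fields."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' rather than on whitespace alone.
    head, _, tail = line.rpartition(")")
    pid_text, _, comm = head.partition("(")
    return int(pid_text), comm, tail.split()


# Hypothetical sample lines: in "after", the last two values were zeroed.
before = "1234 (my daemon) S 1 1234 1234 0 -1 4194560 153 87"
after = "1234 (my daemon) S 1 1234 1234 0 -1 4194560 0 0"

_, _, fields_before = parse_stat(before)
pid, comm, fields_after = parse_stat(after)

# Every field keeps its position, so existing index-based code still works.
state, ppid = fields_after[0], int(fields_after[1])
print(pid, comm, state, ppid)
print(len(fields_before) == len(fields_after))
```

Because the field count and order are unchanged, a parser that reads
fields by index keeps working; it simply sees zeroes where values used
to be.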

But if something actually breaks, then the change must get fixed or
reverted. And it gets fixed in the *kernel*. Not by saying "well, fix
your user space then". It was a kernel change that exposed the
problem, it needs to be the kernel that corrects for it, because we
have an "upgrade in place" model. We don't have an "upgrade with new
user space".

And I seriously will refuse to take code from people who do not
understand and honor this very simple rule.

This rule is also not going to change.

And yes, I realize that the kernel is "special" in this respect. I'm
proud of it.

I have seen, and can point to, lots of projects that go "We need to
break that use case in order to make progress" or "you relied on
undocumented behavior, it sucks to be you" or "there's a better way to
do what you want to do, and you have to change to that new better
way", and I simply don't think that's acceptable outside of very early
alpha releases that have experimental users that know what they signed
up for. The kernel hasn't been in that situation for the last two
decades.

We do API breakage _inside_ the kernel all the time. We will fix
internal problems by saying "you now need to do XYZ", but then it's
about internal kernel APIs, and the people who do that then also
obviously have to fix up all the in-kernel users of that API. Nobody
can say "I now broke the API you used, and now _you_ need to fix it
up". Whoever broke something gets to fix it too.

And we simply do not break user space.

                Linus


* Re: regression in 4.14-rc2 caused by apparmor: add base infastructure for socket mediation
  2017-10-26 17:36               ` Linus Torvalds
@ 2017-10-26 18:54                 ` James Morris
  2017-10-26 19:02                   ` Linus Torvalds
  2017-10-26 19:59                 ` John Johansen
  1 sibling, 1 reply; 19+ messages in thread
From: James Morris @ 2017-10-26 18:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: John Johansen, James Bottomley, Thorsten Leemhuis,
	Vlastimil Babka, Seth Arnold, linux-kernel

On Thu, 26 Oct 2017, Linus Torvalds wrote:

> I'm *very* unhappy with the security layer as is

What are you unhappy with?


-- 
James Morris
<james.l.morris@oracle.com>


* Re: regression in 4.14-rc2 caused by apparmor: add base infastructure for socket mediation
  2017-10-26 18:54                 ` James Morris
@ 2017-10-26 19:02                   ` Linus Torvalds
  2017-10-26 19:06                     ` James Morris
  0 siblings, 1 reply; 19+ messages in thread
From: Linus Torvalds @ 2017-10-26 19:02 UTC (permalink / raw)
  To: James Morris
  Cc: John Johansen, James Bottomley, Thorsten Leemhuis,
	Vlastimil Babka, Seth Arnold, linux-kernel

On Thu, Oct 26, 2017 at 8:54 PM, James Morris <james.l.morris@oracle.com> wrote:
> On Thu, 26 Oct 2017, Linus Torvalds wrote:
>
>> I'm *very* unhappy with the security layer as is
>
> What are you unhappy with?

We had two big _fundamental_ problems this merge window:

 - untested code that clearly didn't do what it claimed it did, and
which caused me to not even accept the main pull request

 - apparmor code that had a regression, where it took three weeks for
that regression to be escalated to me simply because the developer was
denying the regression.

Tell me why I *shouldn't* be unhappy with the security layer?

I shouldn't be in the situation where I start reviewing the code and
go "that can't be right".

And I *definitely* shouldn't be in the situation where I need to come
in three weeks later and tell people what a regression is!

             Linus


* Re: regression in 4.14-rc2 caused by apparmor: add base infastructure for socket mediation
  2017-10-26 19:02                   ` Linus Torvalds
@ 2017-10-26 19:06                     ` James Morris
  2017-10-26 20:08                       ` John Johansen
  0 siblings, 1 reply; 19+ messages in thread
From: James Morris @ 2017-10-26 19:06 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: John Johansen, James Bottomley, Thorsten Leemhuis,
	Vlastimil Babka, Seth Arnold, linux-kernel

On Thu, 26 Oct 2017, Linus Torvalds wrote:

> On Thu, Oct 26, 2017 at 8:54 PM, James Morris <james.l.morris@oracle.com> wrote:
> > On Thu, 26 Oct 2017, Linus Torvalds wrote:
> >
> >> I'm *very* unhappy with the security layer as is
> >
> > What are you unhappy with?
> 
> We had two big _fundamental_ problems this merge window:
> 
>  - untested code that clearly didn't do what it claimed it did, and
> which caused me to not even accept the main pull request
> 
>  - apparmor code that had a regression, where it took three weeks for
> that regression to be escalated to me simply because the developer was
> denying the regression.
> 
> Tell me why I *shouldn't* be unhappy with the security layer?
> 
> I shouldn't be in the situation where I start reviewing the code and
> go "that can't be right".
> 
> And I *definitely* shouldn't be in the situation where I need to come
> in three weeks later and tell people what a regression is!

Agreed on both counts, and sorry for these problems.

-- 
James Morris
<james.l.morris@oracle.com>


* Re: regression in 4.14-rc2 caused by apparmor: add base infastructure for socket mediation
  2017-10-26 17:36               ` Linus Torvalds
  2017-10-26 18:54                 ` James Morris
@ 2017-10-26 19:59                 ` John Johansen
  1 sibling, 0 replies; 19+ messages in thread
From: John Johansen @ 2017-10-26 19:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: James Bottomley, Thorsten Leemhuis, Vlastimil Babka, Seth Arnold,
	linux-kernel

On 10/26/2017 10:36 AM, Linus Torvalds wrote:
> On Tue, Oct 24, 2017 at 1:57 PM, John Johansen
> <john.johansen@canonical.com> wrote:
>>
>> actually a lot of work and testing has been done. A regression was
>> found, the fix is in testing and it should land soon, but it's not the
>> regression you are having issues with.
> 
> Stop this f*cking idiocy already!
> 
> As far as the kernel is concerned, a regression is THE KERNEL NOT
> GIVING THE SAME END RESULT WITH THE SAME USER SPACE.
> 
> The regression was in the kernel. You trying to shift the regressions
> somewhere  else is bogus SHIT.
> 
> And seriously, it's the kind of garbage that makes me think your
> opinion and your code cannot be relied on.
> 
> If you are not willing to admit that your commit 651e28c5537a
> ("apparmor: add base infastructure for socket mediation") caused a
> regression, then honestly, I don't want to get commits from you.
> 
> It's that simple.
> 
> I'm *very* unhappy with the security layer as is, the last thing I
> want to see is some security layer developer that then goes on to try
> to re-define what regression means.
> 
> If you break existing user space setups THAT IS A REGRESSION.
>
You're right, sorry. I really wasn't thinking about this the right way.

> It's not ok to say "but we'll fix the user space setup".
> 
> Really. NOT OK.
> 
> I think I will have to revert that garbage, for the simple reason that
> I refuse to have code in the kernel from maintainers that cannot even
> understand the first rule of kernel development.
> 
> The first rule is:
> 
>  - we don't cause regressions
> 
> and the corollary is that when regressions *do* occur, we admit to
> them and fix them, instead of blaming user space.
> 
> The fact that you have apparently been denying the regression now for
> three weeks means that I will revert, and I will stop pulling apparmor
> requests until the people involved understand how kernel development
> is done.
> 

ack, and understood. I will update the apparmor module kernel abi to
ensure that existing userspaces won't break here. After that we will
implement full policy versioning to ensure that userspace and the
kernel agree on the version of security policy that should be used.

Going forward, if for any reason there is a regression, we will either
get a patch to you asap or ask for the offending patch to be reverted.

Again, sorry, our perspective was too narrow. We will make it right.


* Re: regression in 4.14-rc2 caused by apparmor: add base infastructure for socket mediation
  2017-10-26 19:06                     ` James Morris
@ 2017-10-26 20:08                       ` John Johansen
  0 siblings, 0 replies; 19+ messages in thread
From: John Johansen @ 2017-10-26 20:08 UTC (permalink / raw)
  To: James Morris, Linus Torvalds
  Cc: James Bottomley, Thorsten Leemhuis, Vlastimil Babka, Seth Arnold,
	linux-kernel

On 10/26/2017 12:06 PM, James Morris wrote:
> On Thu, 26 Oct 2017, Linus Torvalds wrote:
> 
>> On Thu, Oct 26, 2017 at 8:54 PM, James Morris <james.l.morris@oracle.com> wrote:
>>> On Thu, 26 Oct 2017, Linus Torvalds wrote:
>>>
>>>> I'm *very* unhappy with the security layer as is
>>>
>>> What are you unhappy with?
>>
>> We had two big _fundamental_ problems this merge window:
>>
>>  - untested code that clearly didn't do what it claimed it did, and
>> which caused me to not even accept the main pull request
>>
>>  - apparmor code that had a regression, where it took three weeks for
>> that regression to be escalated to me simply because the developer was
>> denying the regression.
>>
>> Tell me why I *shouldn't* be unhappy with the security layer?
>>
>> I shouldn't be in the situation where I start reviewing the code and
>> go "that can't be right".
>>
>> And I *definitely* shouldn't be in the situation where I need to come
>> in three weeks later and tell people what a regression is!
> 
> Agreed on both counts, and sorry for these problems.
> 

I am fine with doing either, whatever Linus finds works best for
him. The only reason I went to the direct pull request for apparmor
was that, as I understood it, Linus wanted the larger LSM pull
requests separated out so that it was easier for him to see what was
in them.

And again, sorry. I screwed up; it should not have happened. My
perspective was incorrect and I know I need to make it right.


